
CMS data transfer tests



1. CMS data transfer tests
• Data transfer tests:
  • Focused and short-term programme of work
  • Objective: apply existing knowledge and tools to the problem of sustained data throughput for LHC and running experiments
  • Typical goal: CMS DC04. Others exist (will likely run in parallel)
    • 60 Mbps aggregate into/out of MSS
    • Up to 200 Mbps across WAN, data exchange with T0 / T1 peers
    • Sustained for months, not hours (rough scale estimate after this slide)
• People: (at least)
  • Tier-1: M. Bly, A. Sansum, N. White ++
  • Net: P. Clarke, R. Hughes-Jones, Y. Li, M. Rio, R. Tasker ++
  • CMSUK: T. Barrass, O. Maroney, S. Metson, D. Newbold
  • CMS: I. Fisk (UCSD), N. Sinanis (CERN), T. Wildish (CERN) ++
Dave Newbold, University of Bristol, 16/12/2002
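For a rough sense of scale, the sustained-rate targets above translate into monthly data volumes as follows. This is plain arithmetic (assuming 1 TB = 10^12 bytes and a 30-day month), not a figure from the tests themselves.

```python
# Rough scale of the throughput targets: sustained rate -> monthly volume.
# Plain arithmetic, not a measurement; 1 TB is taken as 10^12 bytes.

def tb_per_month(rate_mbps, days=30):
    """Data volume in TB moved at a sustained rate (megabits/s) over `days` days."""
    bytes_per_s = rate_mbps * 1e6 / 8
    return bytes_per_s * 86400 * days / 1e12

for rate in (60, 200):  # MSS and WAN targets from the slide
    print(f"{rate} Mbps sustained ~= {tb_per_month(rate):.0f} TB/month")

# 60 Mbps sustained  ~= 19 TB/month
# 200 Mbps sustained ~= 65 TB/month
```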

2. Programme of work
• The overall programme (of CMS), next few months:
  • A: infrastructure tests / optimisation (at this stage now in UK)
  • B: replica management system functional / stress tests (starting now); test (break) a selection of ‘products’ at different layers
    • Globus RM, mysql-based EDG RM, SRB; dcache, EDG SE, etc.
  • C: large-scale deployment of chosen combination with existing data (ties to GridPP milestones); around 20 TB to publish
  • D: use as baseline replica management service for DC04
• Short-term goals, next few weeks:
  • Measure what the real throughput situation is (within UK and between T1s); repeat regular measurements
  • Attack the problem at bottleneck points, typically through hardware and software improvements at endpoints
  • Start to deploy selected high-level replica management tools

3. First results
• First attempts at controlled monitoring:
  • Endpoints: RAL, Bristol, CERN
  • Simple RTT and throughput measurements over 24 hours
  • Throughput: measured with iperf, 8 streams, up to 256k buffers (see the sketch after this slide)
  • Clearly can be done more effectively by deploying a ‘real’ monitoring package
• ‘Real world’ tests (not in parallel!):
  • Instrumented existing production data movement tools
  • Averaged figures for a 1 TB data copy (more detail soon)
  • Disk -> disk, RAL to CERN
  • Disk -> MSS, RAL
• Hardware available:
  • RAL, Bristol: dedicated machines (which we can tweak)
  • CERN: ‘fast-path’ but shared server; new hardware coming
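A minimal sketch of how the repeated iperf measurements described above could be scripted is shown below: 8 parallel TCP streams, 256 KB socket buffers, one run per hour over 24 hours, raw output appended to a log. The hostname, interval, and log file are illustrative assumptions, not the actual setup used for these tests.

```python
# Sketch of a repeated iperf throughput measurement (8 streams, 256 KB buffers).
# Hostname is a placeholder; the real tests used RAL, Bristol and CERN endpoints.
import subprocess
import time

ENDPOINTS = ["iperf-server.example.ac.uk"]  # placeholder, not a real test host
INTERVAL = 3600                             # one measurement per hour over 24 h

def measure(host):
    # -c client mode, -P 8 parallel streams, -w 256K buffers, -t 60 s test length
    result = subprocess.run(
        ["iperf", "-c", host, "-P", "8", "-w", "256K", "-t", "60"],
        capture_output=True, text=True)
    return result.stdout

if __name__ == "__main__":
    for _ in range(24):
        for host in ENDPOINTS:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            with open("iperf_log.txt", "a") as log:
                log.write(f"--- {stamp} {host} ---\n{measure(host)}\n")
        time.sleep(INTERVAL)
```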

4. Results: RTT

5. Results: mem-mem copy

6. Results: ‘real world’ copy
• Disk -> disk:
  • csfnfs15.rl.ac.uk -> cmsdsrv08.cern.ch
  • 1 TB dataset transferred from disk -> disk (and thence to Castor)
  • bbcp using 8 streams and CRC check; not CPU- or disk-limited
  • 65 Mbps aggregate throughput; somewhat ‘lumpy’ (transfer-time estimate after this slide)
  • Castor not yet a bottleneck (but did we fill up the stage pool?)
• Disk -> tape:
  • csfnfs15.rl.ac.uk -> csfb.rl.ac.uk (NFS mount) -> datastore
  • Same dataset as above; average file size ~1.5 GByte
  • Three parallel write processes; volumes created on the fly
  • 40 Mbps aggregate throughput
  • Try in-order and out-of-order readback next
  • Reason to believe resources were not ‘shared’ in this case
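As a cross-check on what these aggregate rates mean in practice, the sketch below converts a sustained rate into the wall-clock time needed for the 1 TB dataset above (assuming 1 TB = 10^12 bytes); it reproduces the measured figures, it is not part of the transfer tooling itself.

```python
# Back-of-envelope check of the measured aggregate rates for a 1 TB dataset.
# 1 TB is taken as 10^12 bytes; rates are in megabits per second.

def transfer_hours(size_tb, rate_mbps):
    """Wall-clock hours to move `size_tb` terabytes at a sustained rate."""
    return size_tb * 8e12 / (rate_mbps * 1e6) / 3600

print(f"disk -> disk at 65 Mbps: {transfer_hours(1, 65):.0f} h")  # ~34 h
print(f"disk -> tape at 40 Mbps: {transfer_hours(1, 40):.0f} h")  # ~56 h
```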

7. Upcoming work
• Monitoring:
  • Deploy ‘real’ tools at some stage soon (at many sites)
• Hardware / infrastructure:
  • High-spec dedicated server obtained at CERN (US generosity); online soon
  • HW upgrades / OS and config tweaks at all sites as necessary
  • Investigate disk and MSS performance at RAL in more detail
    • “experts working”
• Replica management:
  • Installation of latest EDG tools awaits end of CMS stress test
    • Have achieved unprecedented levels of stress 8-)
  • SRB / dcache work going on in US; we will follow later
  • Need to work to understand how the SE system fits into this
  • Make use of short-term MSS interface at RAL (talk today?)

8. Other points
• ‘Horizontal’ approach:
  • Focused ‘task force’ to solve a well-understood (?) problem
  • Interesting contrast to the ‘vertical’ EDG approach
  • We will see if it works - looks good so far
• Hardware, etc.:
  • CERN hardware situation somewhat embarrassing
  • We have been bailed out by a US institute with ‘spare’ hardware
  • How do we leverage the UK LCG contribution for resources to attack this kind of problem? (A little goodwill goes a very long way)
• EDG situation:
  • Has become very clear that many parts of the system do not scale (yet)
• Data challenge planning:
  • We are solving the problem for one (two?) experiments
  • We have no idea what the scaling factor for other DCs in 03/04 is
  • New areas will become bottlenecks, so we need to know!
