
Massive Data Transfers


Presentation Transcript


  1. George McLaughlin, Mark Prior • AARNet • Massive Data Transfers

  2. “Big Science” projects driving networks • Large Hadron Collider • Coming on-stream in 2007 • Particle collisions generating terabytes/second of “raw” data at a single, central, well-connected site • Need to transfer data to global “Tier 1” sites; a Tier 1 site must have a 10 Gbps path to CERN • Tier 1 sites need to ensure gigabit capacity to the Tier 2 sites they serve • Square Kilometre Array • Coming on-stream in 2010? • An even greater data generator than the LHC • Up to 125 sites at remote locations; data needs to be brought together for correlation • Can’t determine “noise” prior to correlation • Many logistic issues to be addressed • Billion-dollar, globally funded projects with massive data transfer needs

  3. From very small to very big

  4. Scientists and network engineers coming together • The HEP community and the R&E network community have figured out mechanisms for interaction, probably because HEP is pushing network boundaries • e.g. the ICFA workshops on HEP, Grid and the Global Digital Divide bring together scientists, network engineers and decision makers, and achieve results • http://agenda.cern.ch/List.php

  5. What’s been achieved so far • A new generation of real-time Grid systems is emerging to support worldwide data analysis by the physics community • Leading role of HEP in developing new systems and paradigms for data-intensive science • Transformed view and theoretical understanding of TCP as an efficient, scalable protocol with a wide field of use • Efficient standalone and shared use of 10 Gbps paths of virtually unlimited length; progress towards 100 Gbps networking • Emergence of a new generation of “hybrid” packet- and circuit-switched networks
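The last two points come down to the bandwidth-delay product: to keep a 10 Gbps intercontinental path full, the sender must hold a whole round trip’s worth of data in flight, which the classic 64 KB TCP window cannot do. A minimal sketch of that arithmetic in Python (the 300 ms round-trip time is an assumed figure for an intercontinental path, not taken from the slides):

# Bandwidth-delay product: why default TCP windows throttle long, fat paths.
# The 300 ms round-trip time is an assumed intercontinental figure.
link_bps = 10e9                    # 10 Gbps path
rtt_s = 0.300                      # assumed round-trip time
default_window_bytes = 64 * 1024   # classic TCP window without window scaling

bdp_bytes = link_bps / 8 * rtt_s                  # data that must be "in flight" to fill the pipe
ceiling_bps = default_window_bytes * 8 / rtt_s    # throughput ceiling with a 64 KB window

print(f"Data in flight needed to fill the path: {bdp_bytes / 1e6:.0f} MB")
print(f"Throughput ceiling with a 64 KB window: {ceiling_bps / 1e6:.2f} Mbps")

Run as written this prints roughly 375 MB and 1.75 Mbps, which is why window scaling, stack tuning and new congestion-control work were needed before 10 Gbps paths could be used efficiently.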

  6. LHC data (simplified) • Scale of units: 1 Megabyte (1 MB) ~ a digital photo; 1 Gigabyte (1 GB) = 1000 MB ~ a DVD movie; 1 Terabyte (1 TB) = 1000 GB ~ world annual book production; 1 Petabyte (1 PB) = 1000 TB ~ 10% of the annual production by the LHC experiments; 1 Exabyte (1 EB) = 1000 PB ~ world annual information production • Per experiment (CMS, LHCb, ATLAS, ALICE): • 40 million collisions per second • After filtering, 100 collisions of interest per second • A Megabyte of digitised information for each collision = recording rate of 100 Megabytes/sec • 1 billion collisions recorded = 1 Petabyte/year
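The per-experiment figures on this slide multiply out directly; a quick sanity check (the ~10^7 seconds of effective running time per year is an assumed round number, not stated on the slide):

# Sanity check of the per-experiment LHC numbers on this slide.
collisions_per_s = 100      # collisions of interest kept after filtering
event_size_mb = 1           # ~1 MB of digitised data per collision
seconds_per_year = 1e7      # assumed effective running time per year

rate_mb_s = collisions_per_s * event_size_mb                # recording rate in MB/s
events_per_year = collisions_per_s * seconds_per_year       # ~1 billion collisions
petabytes_per_year = events_per_year * event_size_mb / 1e9  # MB -> PB

print(f"Recording rate: {rate_mb_s} MB/s")
print(f"Collisions recorded per year: {events_per_year:.0e}")
print(f"Data volume: ~{petabytes_per_year:.0f} PB/year per experiment")

With those assumptions the numbers reproduce the slide: 100 MB/s of recorded data, about a billion collisions and roughly a Petabyte per experiment per year.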

  7. LHC Computing Hierarchy (diagram) • Online system at the experiment: ~PByte/sec, with ~100-1500 MBytes/sec flowing to the CERN centre (Tier 0 +1; PBs of disk, tape robot) • Tier 0/1 to Tier 1 centres (FNAL, IN2P3, INFN, RAL): ~2.5-10 Gbps • Tier 1 to Tier 2 centres: ~2.5-10 Gbps • Tier 2 to Tier 3 institute physics data caches: 0.1 to 10 Gbps • Tier 4: workstations • CERN/outside resource ratio ~1:2; Tier 0 : Tier 1 : Tier 2 ~1:1:1 • Tens of Petabytes by 2007-8; an Exabyte ~5-7 years later

  8. Lightpaths for massive data transfers • From CANARIE: a small number of users with large data transfer needs can use more bandwidth than all other users

  9. Why? • Cees de Laat classifies network users into three broad groups: • Lightweight users: browsing, mailing, home use; need full Internet routing, one to many • Business applications: multicast, streaming, VPNs, mostly LAN; need VPN services and full Internet routing, several to several plus uplink • Scientific applications: distributed data processing, all sorts of grids; need very fat pipes, limited multiple Virtual Organizations, few to few, peer to peer • Type 3 users: High Energy Physics, astronomers, eVLBI, High Definition multimedia over IP; massive data transfers from experiments running 24x7

  10. What is the GLIF? • Global Lambda Integrated Facility - www.glif.is • International virtual organization that supports persistent data-intensive scientific research and middleware development • Provides the ability to create dedicated international point-to-point Gigabit Ethernet circuits for “fixed term” experiments

  11. Huygens Space Probe – a practical example • Very Long Baseline Interferometry (VLBI) is a technique where widely separated radio telescopes observe the same region of the sky simultaneously to generate images of cosmic radio sources • The Cassini spacecraft left Earth in October 1997 to travel to Saturn • On Christmas Day 2004, the Huygens probe separated from Cassini • It started its descent through the dense atmosphere of Titan on 14 Jan 2005 • Using this technique, 17 telescopes in Australia, China, Japan and the US were able to accurately position the probe to within a kilometre (Titan is ~1.5 billion kilometres from Earth) • This required transferring Terabytes of data between Australia and the Netherlands

  12. AARNet - CSIRO ATNF contribution • Created a “dedicated” circuit • The data from two of the Australian telescopes (Parkes [The Dish] and Mopra) was transferred via light plane to CSIRO Marsfield (Sydney) • CeNTIE-based fibre from CSIRO Marsfield to the AARNet3 GigaPOP • SXTransPORT 10G to Seattle • “Lightpath” to the Joint Institute for VLBI in Europe (JIVE) across CA*net4 and SURFnet optical infrastructure

  13. But……….. • Although the time from concept to undertaking the scientific experiment was only 3 weeks: • 9 organisations in 4 countries were involved in “making it happen” • Required extensive human-human interaction (mainly emails……. lots of them) • Although a 1 Gbps path was available, maximum throughput was around 400 Mbps • Issues with protocols, stack tuning, disk-to-disk transfer, firewalls, different formats, etc. • Currently scientists and engineers need to test thoroughly before important experiments; it is not yet “turn up and use” • The ultimate goal is for the control plane issues to be transparent to the end-user, who simply presses the “make it happen” icon
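To put the achieved throughput in perspective, a rough transfer-time comparison (the 1 TB volume is illustrative; the slides only say “Terabytes”):

# Rough transfer-time comparison for the Huygens VLBI data.
# The 1 TB data volume is illustrative; the slides only say "Terabytes".
data_bytes = 1e12        # 1 TB of raw telescope data
path_bps = 1e9           # the dedicated 1 Gbps lightpath
achieved_bps = 400e6     # ~400 Mbps actually sustained end to end

def hours(volume_bytes, rate_bps):
    return volume_bytes * 8 / rate_bps / 3600

print(f"At the full 1 Gbps: {hours(data_bytes, path_bps):.1f} h per TB")
print(f"At the ~400 Mbps achieved: {hours(data_bytes, achieved_bps):.1f} h per TB")

Roughly 2.2 hours per Terabyte at line rate versus about 5.6 hours at the achieved rate, which is why stack tuning, disk-to-disk performance and firewall handling matter when the experiment itself runs on a tight schedule.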

  14. International path for Huygens transfer

  15. EXPReS and the Square Kilometre Array • The SKA will be an even bigger data generator than the LHC, but in a remote location • Australia is one of the countries bidding for the SKA; significant infrastructure challenges • Also, the EU Commission has funded the EXPReS project to link 16 radio telescopes around the world at gigabit speeds

  16. In conclusion • Scientists and network engineers working together can exploit the new opportunities that high-capacity networking opens up for “big science” • Need to solve the issues associated with scalability, the control plane and ease of use • QUESTIONS?
