
LHC Scale Physics in 2008: Grids, Networks and Petabytes

Presentation Transcript


  1. LHC Scale Physics in 2008: Grids, Networks and Petabytes Shawn McKee (smckee@umich.edu) May 18th, 2005 Pan-American Advanced Studies Institute (PASI) Mendoza, Argentina

  2. Acknowledgements • Much of this talk was constructed from various sources. I would like to acknowledge: • Rob Gardner (U Chicago) • Harvey Newman (Caltech) • Paul Avery (U Florida) • Ian Foster (U Chicago/ANL) • Alan Wilson (Michigan) • The Globus Team • The ATLAS Collaboration • Trillium Shawn McKee - PASI - Mendoza, Argentina

  3. Outline • Large Datasets in High Energy Physics • Overview of High Energy Physics and the LHC • The ATLAS Experiment’s Data Model • Managing LHC Scale Data • Grids and Networks Computing Model • Current Planning, Tools, Middleware and Projects • LHC Scale Physics in 2008 • Grids and Networks at Michigan • Virtual Data • The Future of Data Intensive Science Shawn McKee - PASI - Mendoza, Argentina

  4. Large Datasets in High Energy Physics

  5. Introduction to High-Energy Physics • Before I talk in detail about large datasets, I want to provide some quick context for where all this data comes from. • High-energy physics explores the smallest constituents of nature by colliding “high energy” particles and reconstructing the zoo of particles that result. • One of the most intriguing issues we are trying to address in high-energy physics is the origin of mass… Shawn McKee - PASI - Mendoza, Argentina

  6. Physics with ATLAS: The Higgs Particle • The Riddle of Mass • One of the main goals of the ATLAS program is to discover and study the Higgs particle. The Higgs particle is of critical importance in particle theories and is directly related to the concept of particle mass and therefore to all masses. Shawn McKee - PASI - Mendoza, Argentina

  7. High-Energy: From an Electron-Volt to Trillions of Electron-Volts • Energies are often expressed in units of "electron-volts". An electron-volt (eV) is the energy acquired by an electron (or any particle with the same charge) when it is accelerated through a potential difference of 1 volt. • Typical energies involved in atomic processes (such as chemical reactions or the emission of light) are of order a few eV. That is why batteries typically produce about 1 volt, and have to be connected in series to get much larger potentials. • Energies in nuclear processes (like nuclear fission or radioactive decay) are typically of order one million electron-volts (1 MeV). • The highest-energy accelerator now operating (at Fermilab) accelerates protons to 1 million million electron-volts (1 TeV = 10¹² eV). • The Large Hadron Collider (LHC) at CERN will accelerate each of two counter-rotating beams of protons to 7 TeV per proton. Shawn McKee - PASI - Mendoza, Argentina
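
To make the unit scales on this slide concrete, here is a short Python sketch that converts the quoted energies to joules, using the standard value 1 eV ≈ 1.602×10⁻¹⁹ J (the labels are just the examples named above).

```python
# Convert the energy scales quoted above to joules.
EV_IN_JOULES = 1.602e-19  # 1 eV expressed in joules

scales_ev = {
    "chemical reaction (~few eV)": 1.0,
    "nuclear process (~1 MeV)":    1.0e6,
    "Tevatron proton (1 TeV)":     1.0e12,
    "LHC proton (7 TeV)":          7.0e12,
    "LHC collision (7 + 7 TeV)":   14.0e12,
}

for name, ev in scales_ev.items():
    print(f"{name:30s} {ev:12.3g} eV  =  {ev * EV_IN_JOULES:10.3g} J")
```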

  8. What is an Event? • In the ATLAS detector there will be about a billion collision events per second, a data rate equivalent to twenty simultaneous telephone conversations by every person on Earth. • ATLAS will measure the collisions of 7 TeV protons. Each proton-proton collision (or single-particle decay) is called an “event”. Shawn McKee - PASI - Mendoza, Argentina

  9. How Many Collisions? • If two bunches of protons meet head on, the number of collisions can be anything from zero upwards. How often are there actually collisions? • For a fixed bunch size, this depends on how many protons there are in each bunch, and how large each proton is. • A proton can be roughly thought of as being about 10⁻¹⁵ meters in radius. If you had bunches 10⁻⁶ meters in radius, and only, say, 10 protons in each bunch, the chance of even one proton-proton collision when two bunches met would be extremely small. • If each bunch had a billion-billion (10¹⁸) protons, so that its entire cross section were just filled with protons, every proton from one bunch would collide with one from the other bunch, and you would have a billion-billion collisions per bunch crossing. • The LHC situation is in between these two extremes: a few collisions (up to 20) per bunch crossing, which requires about a billion protons in each bunch. As you will see, this leads to a lot of data to sift through. Shawn McKee - PASI - Mendoza, Argentina
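
The slide's geometric argument fits in a few lines of Python. This is only an order-of-magnitude sketch built from the illustrative numbers above (a 10⁻¹⁵ m proton, a 10⁻⁶ m bunch radius, a billion protons per bunch), not the real LHC luminosity calculation.

```python
import math

# Order-of-magnitude estimate of proton-proton collisions per bunch crossing,
# following the geometric argument on the slide (illustrative numbers only).
r_proton  = 1e-15   # effective proton radius (m), as quoted above
r_bunch   = 1e-6    # transverse bunch radius (m), illustrative
n_protons = 1e9     # protons per bunch ("about a billion")

sigma_pp   = math.pi * (2 * r_proton) ** 2   # two protons overlap if centres come within 2r
area_bunch = math.pi * r_bunch ** 2

# Each proton in one bunch has a chance ~ N * sigma / A of hitting a proton
# in the oncoming bunch, so the expected number of collisions per crossing is:
collisions = n_protons * (n_protons * sigma_pp / area_bunch)
print(f"~{collisions:.0f} collisions per bunch crossing")
```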

  10. The Large Hadron Collider (LHC), CERN, Geneva: 2007 Start • 27 km tunnel spanning Switzerland & France • First beams: April 2007; physics runs from Summer 2007 • Experiments: ATLAS (pp, general purpose; heavy ions), CMS + TOTEM (pp, general purpose; heavy ions), ALICE (heavy ions), LHCb (B-physics) Shawn McKee - PASI - Mendoza, Argentina

  11. Data Comparison: LHC vs. Prior Experiments • [Chart: Level-1 trigger rate (10²-10⁶ Hz) vs. event size (10⁴-10⁷ bytes) for LEP, UA1, NA49, H1/ZEUS, ALICE, CDF/D0, KLOE, HERA-B, TeV II, LHCb, ATLAS and CMS; the LHC experiments combine a high Level-1 rate (up to ~1 MHz), a high number of channels, high bandwidth (~500 Gbit/s) and a petabyte-scale data archive. From Hans Hoffmann, DOE/NSF Review, Nov 2000] Shawn McKee - PASI - Mendoza, Argentina

  12. The ATLAS Experiment Shawn McKee - PASI - Mendoza, Argentina

  13. Shawn McKee - PASI - Mendoza, Argentina

  14. ATLAS • A Toroidal LHC ApparatuS • Collaboration • 150 institutes • 1850 physicists • Detector • Inner tracker • Calorimeter • Magnet • Muon • United States ATLAS • 29 universities, 3 national labs • 20% of ATLAS Shawn McKee - PASI - Mendoza, Argentina

  15. Data Flow from ATLAS • 40 MHz (~PB/sec) → Level 1 (special hardware) → 75 kHz (75 GB/sec) → Level 2 (embedded processors) → 5 kHz (5 GB/sec) → Level 3 (PC farm) → 200 Hz (100-400 MB/sec) → data recording & offline analysis • ATLAS: ~10 PB/year (simulated + raw + summary) Shawn McKee - PASI - Mendoza, Argentina
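
A back-of-envelope check shows how the rates above lead to the ~10 PB/year figure. The only number not on the slide is the assumed ~10⁷ live seconds of data taking per year, a common rule of thumb.

```python
# Yearly raw-data volume implied by the Level-3 output rates above.
live_seconds = 1e7  # assumed live seconds of running per year (rule of thumb, not from the slide)

for mb_per_s in (100, 400):                      # 200 Hz at roughly 0.5-2 MB per event
    pb_per_year = mb_per_s * live_seconds / 1e9  # MB -> PB
    print(f"{mb_per_s} MB/s  ->  ~{pb_per_year:.0f} PB/year of raw data")

# Simulated data and derived (reconstructed/summary) copies multiply this several-fold,
# consistent with the ~10 PB/year quoted on the slide.
```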

  16. LHC Timeline for Service Challenges We are here … not much time to get things ready! Shawn McKee - PASI - Mendoza, Argentina

  17. Managing LHC Scale Data

  18. The Data Challenge for LHC • There is a very real challenge to managing tens of petabytes of data yearly for a globally distributed collaboration of 2000 physicists! • While much of the interesting data we seek is small in volume, we must understand and sort through a huge volume of relatively uninteresting “events” to discover new physics. • The primary (only!) plan for LHC is to utilize Grid middleware and high-performance networks to harness the complete global resources of our collaborations to manage this data analysis challenge. Shawn McKee - PASI - Mendoza, Argentina

  19. Managing LHC Scale Data Grids and Networks Computing Model

  20. The Problem Petabytes… Shawn McKee - PASI - Mendoza, Argentina

  21. The Solution Shawn McKee - PASI - Mendoza, Argentina

  22. What is “The Grid”? • There are many answers and interpretations • The term was originally coined in the mid-1990s (in analogy with the power grid) and can be described as follows: “The grid provides flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources (virtual organizations: VOs)” Shawn McKee - PASI - Mendoza, Argentina

  23. Grid Perspectives • User’s Viewpoint: • A virtual computer which minimizes time to completion for my application while transparently managing access to inputs and resources • Programmer’s Viewpoint: • A toolkit of applications and APIs which provide transparent access to distributed resources • Administrator’s Viewpoint: • An environment to monitor, manage and secure access to geographically distributed computers, storage and networks. Shawn McKee - PASI - Mendoza, Argentina

  24. Data Grids for High Energy Physics • [Tiered computing model diagram; ATLAS version from Harvey Newman’s original] • CERN/Outside resource ratio ~1:4; Tier0/(all Tier1)/(all Tier2) ~1:2:2 • Online system receives ~PByte/sec from the detector and records ~100-400 MBytes/sec to the offline farm / CERN Computer Centre (~25 TIPS, Tier 0+1) with HPSS mass storage • Tier 1 centres (France, Italy, UK, BNL), each with HPSS, connect to Tier 0+1 at 10-40 Gbits/sec • Tier 2 centres connect at ~10+ Gbps • Tier 3: institute servers (~0.25 TIPS) with physics data caches; Tier 4: workstations, at 100-10000 Mbits/sec • Physicists work on analysis “channels”; each institute has ~10 physicists working on one or more channels Shawn McKee - PASI - Mendoza, Argentina

  25. Managing LHC Scale Data Current Planning, Tools, Middleware and Testbeds

  26. Grids and Networks: Why Now? • Moore’s law improvements in computing produce highly functional end systems • The Internet and burgeoning wired and wireless networks provide ~universal connectivity • Changing modes of working and problem solving emphasize teamwork and computation • Network exponentials produce dramatic changes in geometry and geography Shawn McKee - PASI - Mendoza, Argentina

  27. Living in an Exponential World (1): Computing & Sensors • Moore’s Law: transistor count doubles each ~18 months • [Images: magnetohydrodynamics and star-formation simulations] Shawn McKee - PASI - Mendoza, Argentina

  28. Living in an Exponential World (2): Storage • Storage density doubles every ~12 months • This has led to dramatic growth in HEP online data (1 petabyte = 1000 terabytes = 1,000,000 gigabytes) • 2000 ~0.5 petabyte • 2005 ~10 petabytes • 2010 ~100 petabytes • 2015 ~1000 petabytes • It’s transforming entire disciplines in the physical and, increasingly, biological sciences; humanities next? Shawn McKee - PASI - Mendoza, Argentina

  29. Network Exponentials • Network vs. computer performance • Computer speed doubles every 18 months • Network speed doubles every 9 months • Difference = order of magnitude per 5 years • 1986 to 2000 • Computers: x 500 • Networks: x 340,000 • 2001 to 2010 • Computers: x 60 • Networks: x 4000 • Moore’s Law vs. storage improvements vs. optical improvements: graph from Scientific American (Jan 2001) by Cleo Vilett, source Vinod Khosla, Kleiner Perkins Caufield & Byers. Shawn McKee - PASI - Mendoza, Argentina
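
The growth factors on this slide follow from compounding the quoted doubling times (computing every ~18 months, networks every ~9 months); a quick check in Python:

```python
# Growth factor implied by a fixed doubling time over a span of years.
def growth(years, doubling_months):
    return 2 ** (years * 12 / doubling_months)

for label, years in (("1986-2000", 14), ("2001-2010", 9)):
    cpu = growth(years, 18)  # computing doubles every ~18 months
    net = growth(years, 9)   # network capacity doubles every ~9 months
    print(f"{label}: computing x{cpu:,.0f}, networks x{net:,.0f}")
```

This reproduces the 2001-2010 projections (x60, x4000) to within rounding, and the measured 1986-2000 factors to the same order of magnitude.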

  30. The Network • As can be seen in the previous transparency, it can be argued that the evolution of the network has been the primary motivator for the Grid. • Ubiquitous, dependable worldwide networks have opened up the possibility of tying together geographically distributed resources • The success of the WWW for sharing information has spawned a push for a system to share resources • The network has become the “virtual bus” of a virtual computer. • More on this later… Shawn McKee - PASI - Mendoza, Argentina

  31. What Is Needed for LHC-HEP? • We require a number of high-level capabilities to do High-Energy Physics: • Data Processing: All data needs to be reconstructed, first into fundamental components like tracks and energy deposition and then into “physics” objects like electrons, muons, hadrons, neutrinos, etc. • Raw -> Reconstructed -> Summarized (a minimal sketch of this chain follows below) • Simulation follows the same path; it is critical to understanding our detectors and the underlying physics. • Data Discovery: We must be able to locate events of interest • Data Movement: We must be able to move discovered data as needed for analysis or reprocessing • Data Analysis: We must be able to apply our analysis to the data to determine whether it contains new physics • Collaborative Tools: Vital to maintain our global collaborations • Policy and Resource Management: Allow resource owners to specify the conditions under which they will share, and allow them to manage those resources as they evolve Shawn McKee - PASI - Mendoza, Argentina
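
The Raw -> Reconstructed -> Summarized chain under Data Processing can be pictured as a simple composition of stages. The sketch below is purely illustrative; none of these functions correspond to real ATLAS software.

```python
# Purely illustrative sketch of the processing chain: raw detector data ->
# reconstructed tracks/clusters -> physics objects -> compact analysis summaries.
from typing import Dict, List

def reconstruct(raw_event: bytes) -> Dict:
    """Placeholder: turn raw readout into tracks and energy deposits."""
    return {"tracks": [], "clusters": []}

def identify(reco: Dict) -> Dict:
    """Placeholder: build physics objects (electrons, muons, jets, ...) from reco output."""
    return {"electrons": [], "muons": [], "jets": []}

def summarize(physics: Dict) -> Dict:
    """Placeholder: keep only the compact quantities an analysis actually reads."""
    return {"n_leptons": len(physics["electrons"]) + len(physics["muons"]),
            "n_jets": len(physics["jets"])}

def process(raw_events: List[bytes]) -> List[Dict]:
    # Simulated events follow the same path, which is what allows them to be
    # compared directly with real data.
    return [summarize(identify(reconstruct(e))) for e in raw_events]
```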

  32. Monitoring Example on OSG-ITB Shawn McKee - PASI - Mendoza, Argentina

  33. Collaborative Tools Example: EVO

  34. Managing LHC Scale Data HEP Related Grid/Network Projects

  35. Shawn McKee - PASI - Mendoza, Argentina

  36. The Evolution of Data Movement • The recent history of data movement capabilities exemplifies the evolution of network capacity. • NSFNET started with 56 Kbit/s links as the US network backbone • Current networks are so fast that end systems can only fully drive them when storage clusters are used at each end Shawn McKee - PASI - Mendoza, Argentina

  37. NSFNET 56 Kb/s Site Architecture • Bandwidth in terms of burst data transfer and user wait time for a 1024 MB dataset on VAX/fuzzball hosts, from across the room to across the country: • 4 MB/s → 256 s (4 min) • 1 MB/s → 1024 s (17 min) • 0.007 MB/s → 150,000 s (41 hrs) Shawn McKee - PASI - Mendoza, Argentina

  38. 2002 Cluster-WAN Architecture • Moving a 1 TB dataset between clusters (n x GbE, small n) over an OC-48 cloud with OC-12 site links: • Across the room at 0.5 GB/s → 2000 s (33 min) • Across the country at 78 MB/s → 13k s (3.6 h) Shawn McKee - PASI - Mendoza, Argentina

  39. Distributed Terascale Cluster • Moving 10 TB between two clusters (n x GbE, large n) joined by a big, fast interconnect over OC-192: • 5 GB/s (wire-speed limit… not yet achieved) → 2000 s (33 min) Shawn McKee - PASI - Mendoza, Argentina
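
The three preceding slides all make the same point: what changes between eras is how long a user waits for the era's typical dataset. A small helper reproduces the quoted wait times from the dataset sizes and sustained bandwidths.

```python
# Reproduce the wait times quoted on the three preceding slides.
def wait_time(size_mb, mb_per_s):
    seconds = size_mb / mb_per_s
    if seconds > 3600:
        return f"{seconds:,.0f} s ({seconds / 3600:.1f} h)"
    return f"{seconds:,.0f} s ({seconds / 60:.0f} min)"

cases = [
    ("NSFNET era, 1024 MB across the country", 1024,       0.007),
    ("2002 cluster-WAN, 1 TB over OC-12",      1_000_000,  78),
    ("Terascale cluster, 10 TB at 5 GB/s",     10_000_000, 5000),
]
for label, size_mb, bandwidth in cases:
    print(f"{label:42s} {wait_time(size_mb, bandwidth)}")
```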

  40. UltraLight Goal (Near Future) • A more modest bandwidth goal is being targeted by the UltraLight collaboration: • Build, tune and deploy moderately priced servers capable of delivering 1 GB/s between two such servers over the WAN • Provides the ability to utilize the full capability of lambdas, as available, without requiring 10s-100s of nodes at each end. • Easier to manage, coordinate and deploy a smaller number of performant servers than a much larger number of less capable ones • Easier to scale up as needed to match the available bandwidth Shawn McKee - PASI - Mendoza, Argentina

  41. What is UltraLight? • UltraLight is a program to explore the integration of cutting-edge network technology with the grid computing and data infrastructure of HEP/Astronomy • The program intends to explore network configurations ranging from common shared infrastructure (current IP networks) through dedicated point-to-point optical paths. • A critical aspect of UltraLight is its integration with two driving application domains in support of their national and international eScience collaborations: LHC-HEP and eVLBI-Astronomy • The Collaboration includes: • Caltech • Florida Int. Univ. • MIT • Univ. of Florida • Univ. of Michigan • UC Riverside • BNL • FNAL • SLAC • UCAID/Internet2 Shawn McKee - PASI - Mendoza, Argentina

  42. UltraLight Network: PHASE I • Implementation via “sharing” with HOPI/NLR • MIT not yet “optically” coupled Shawn McKee - PASI - Mendoza, Argentina

  43. UltraLight Network: PHASE III By 2008 • Move into production – Terabyte datasets in 10 minutes • Optical switching fully enabled amongst primary sites • Integrated international infrastructure Shawn McKee - PASI - Mendoza, Argentina
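
The arithmetic behind the "Terabyte datasets in 10 minutes" target:

```python
# Sustained throughput needed to move 1 TB in 10 minutes (the Phase III target).
bits    = 1e12 * 8   # 1 TB expressed in bits
seconds = 10 * 60
print(f"~{bits / seconds / 1e9:.1f} Gbit/s sustained")  # ~13.3 Gbit/s, more than one full 10 GbE path
```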

  44. LHC Scale Physics in 2008

  45. ATLAS Discovery Potential for SM Higgs Boson • Good sensitivity over the full mass range from ~100 GeV to ~1 TeV • For most of the mass range at least two channels are available • Detector performance is crucial: b-tag, leptons, γ, E resolution, γ/jet separation, ... Shawn McKee - PASI - Mendoza, Argentina

  46. ATLAS Shawn McKee - PASI - Mendoza, Argentina

  47. Data Intensive Computing and Grids • The term “Data Grid” is often used • Unfortunate, as it implies a distinct infrastructure, which it isn’t; but it is easy to say • Data-intensive computing shares numerous requirements with collaboration, instrumentation, computation, … • Security, resource management, information services, etc. • Important to exploit commonalities, as it is very unlikely that multiple infrastructures can be maintained • Fortunately this seems easy to do! Shawn McKee - PASI - Mendoza, Argentina

  48. A Model Architecture for Data Grids • [Diagram: an application presents an attribute specification to a metadata catalog, which resolves it to a logical collection and logical file names; a replica catalog maps these to multiple physical locations; replica selection uses performance information & predictions (MDS, NWS) to choose the selected replica; GridFTP control and data channels then move the data among the disk caches, disk arrays and tape libraries at replica locations 1-3] Shawn McKee - PASI - Mendoza, Argentina
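
One way to read the diagram above is as a replica-selection loop: resolve a logical file name to its physical copies, rank them with performance predictions, and fetch the best one over GridFTP. The sketch below is hypothetical; the function bodies stand in for real services (replica catalog, NWS forecasts, a GridFTP client) and are not actual Globus APIs.

```python
# Hypothetical sketch of the selection flow in the diagram above; every function
# here is a placeholder, not a real Globus/NWS API.
from typing import List

def lookup_replicas(lfn: str) -> List[str]:
    """Placeholder replica-catalog query: logical file name -> physical URLs."""
    return ["gsiftp://site-a.example.org/data/evt.root",
            "gsiftp://site-b.example.org/data/evt.root"]

def predicted_mb_per_s(url: str) -> float:
    """Placeholder NWS-style bandwidth forecast to the site holding this replica."""
    host = url.split("/")[2]
    return {"site-a.example.org": 40.0, "site-b.example.org": 95.0}[host]

def fetch(url: str, local_path: str) -> None:
    """Placeholder for a GridFTP transfer (e.g. invoking a command-line client)."""
    print(f"fetching {url} -> {local_path}")

def get_best_copy(lfn: str, local_path: str) -> None:
    replicas = lookup_replicas(lfn)
    best = max(replicas, key=predicted_mb_per_s)  # replica with the best forecast wins
    fetch(best, local_path)

get_best_copy("lfn:some-dataset.root", "/tmp/some-dataset.root")
```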

  49. Examples ofDesired Data Grid Functionality • High-speed, reliable access to remote data • Automated discovery of “best” copy of data • Manage replication to improve performance • Co-schedule compute, storage, network • “Transparency” wrt delivered performance • Enforce access control on data • Allow representation of “global” resource allocation policies • Not there yet! Back to the physics… Shawn McKee - PASI - Mendoza, Argentina

  50. Needles in LARGE Haystacks • When protons collide, some events are "interesting" and may tell us about exciting new particles or forces, whereas many others are "ordinary" collisions (often called "background"). The ratio of their rates is about 1 interesting event for every 10 million background events. One of our key needs is to separate the interesting events from the ordinary ones. • Furthermore, the information must be sufficiently detailed and precise to allow eventual recognition of certain "events" that may occur only once in a million-million collisions (1 in 10¹²), a very small fraction of the recorded events, which are in turn a very small fraction of all events. • I will outline the steps ATLAS takes in getting to these interesting particles Shawn McKee - PASI - Mendoza, Argentina
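
Turning the slide's ratios into rates, using the ~10⁹ collisions per second from slide 8 and an assumed ~10⁷ live seconds of running per year:

```python
# Event rates implied by the selection ratios above.
collision_rate   = 1e9    # collisions per second in ATLAS (slide 8)
live_seconds     = 1e7    # assumed live seconds of running per year (rule of thumb)
interesting_frac = 1e-7   # ~1 interesting event per 10 million background events
rare_frac        = 1e-12  # processes occurring once per million-million collisions

collisions_per_year = collision_rate * live_seconds
print(f"collisions per year:        {collisions_per_year:.1e}")
print(f"'interesting' events/sec:   {collision_rate * interesting_frac:.0f}")
print(f"rare-process events/year:   {collisions_per_year * rare_frac:.0f}")
```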
