
High Energy Physics and Data Grids



Presentation Transcript


  1. High Energy Physics and Data Grids. Paul Avery, University of Florida, http://www.phys.ufl.edu/~avery/, avery@phys.ufl.edu. US/UK Grid Workshop, San Francisco, August 4-5, 2001

  2. Essentials of High Energy Physics • Better name → "Elementary Particle Physics" • Science: elementary particles, fundamental forces • Particles: leptons (e, νe), (μ, νμ), (τ, ντ); quarks (u, d), (c, s), (t, b) • Forces: strong → gluon; electro-weak → γ, W, Z⁰; gravity → graviton • Goal → unified theory of nature • Unification of forces (Higgs, superstrings, extra dimensions, …) • Deep connections to large-scale structure of the universe • Large overlap with astrophysics, cosmology, nuclear physics

  3. HEP Short History + Frontiers (distance scale, energy scale, time after the Big Bang)
  • 1900: Quantum Mechanics, atomic physics (10⁻¹⁰ m, ~10 eV, >300,000 yr)
  • 1940-50: Quantum Electrodynamics
  • 1950-65: Nuclei, hadrons; symmetries, field theories (10⁻¹⁵ m, MeV – GeV, ~3 min)
  • 1965-75: Quarks, gauge theories (10⁻¹⁶ m, >>GeV, ~10⁻⁶ sec)
  • 1970-83: SPS: electroweak unification, QCD (10⁻¹⁸ m, ~100 GeV, ~10⁻¹⁰ sec)
  • 1990: LEP: 3 families, precision electroweak
  • 1994: Tevatron: top quark, origin of masses
  • 2007: LHC: Higgs? Supersymmetry? (10⁻¹⁹ m, ~10² GeV, ~10⁻¹² sec)
  • The next step: Grand Unified Theories? Proton decay (underground)? (10⁻³² m, ~10¹⁶ GeV, ~10⁻³² sec)
  • The origin of the universe: quantum gravity? superstrings? (Planck scale: 10⁻³⁵ m, ~10¹⁹ GeV, ~10⁻⁴³ sec)

  4. HEP Research • Experiments primarily accelerator based • Fixed target, colliding beams, special beams • Detectors: small, large, general purpose, special purpose • … but wide variety of other techniques • Cosmic rays, proton decay, g-2, neutrinos, space missions • Increasing scale of experiments and laboratories • Forced on us by ever higher energies • Complexity, scale, costs → large collaborations • International collaborations are the norm today • Global collaborations are the future (LHC) • LHC discussed in next few slides

  5. The CMS Collaboration
  • 31 countries, 144 institutions, 1809 physicists and engineers
  • Participants: Austria, Belgium, Bulgaria, Finland, France, Germany, Greece, Hungary, Italy, Portugal, Slovak Republic, Spain, Switzerland, UK, Armenia, Belarus, China, China (Taiwan), Croatia, Cyprus, Estonia, Georgia, India, Korea, Pakistan, Poland, Russia, Turkey, Ukraine, USA, Uzbekistan, and CERN, plus associated institutes
  • Number of laboratories: Member States 58, Non-Member States 50, USA 36, total 144
  • Number of scientists: Member States 1010, Non-Member States 448, USA 351, total 1809
  • Associated institutes: 36 scientists, 5 laboratories

  6. CERN LHC site: CMS, LHCb, ALICE, ATLAS

  7. High Energy Physics at the LHC • "Compact" Muon Solenoid at the LHC (CERN) • Smithsonian standard man (for scale)

  8. Collisions at LHC (2007?)
  • [Event diagrams: Higgs → Z⁰Z⁰ → e⁺e⁻ ℓ⁺ℓ⁻; SUSY event with jets]
  • Proton-proton collisions: 2835 bunches/beam, 10¹¹ protons/bunch
  • Beam energy: 7 TeV (7×10¹² eV)
  • Luminosity: 10³⁴ cm⁻²s⁻¹
  • Bunch crossing rate: 40 MHz (every 25 nsec)
  • Proton collision rate: ~10⁹ Hz (average ~20 collisions per crossing; each collision involves partons: quarks, gluons)
  • New physics rate: ~10⁻⁵ Hz
  • Selection: 1 in 10¹³
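
The rate figures above can be cross-checked with a few lines of arithmetic. The sketch below (Python, all inputs taken from this slide) is only an order-of-magnitude consistency check, not a physics calculation.

```python
# Back-of-envelope check of the LHC rate figures quoted on this slide.
BUNCH_CROSSING_RATE_HZ = 40e6      # 40 MHz, one crossing every 25 ns
COLLISIONS_PER_CROSSING = 20       # average pile-up quoted on the slide
SELECTIVITY = 1e-13                # "1 in 10^13" selection quoted on the slide

collision_rate_hz = BUNCH_CROSSING_RATE_HZ * COLLISIONS_PER_CROSSING
print(f"proton collision rate ~ {collision_rate_hz:.0e} Hz")       # ~8e8, i.e. ~10^9 Hz

selected_rate_hz = collision_rate_hz * SELECTIVITY
print(f"selected event rate ~ {selected_rate_hz:.0e} Hz")
# ~1e-4 Hz, i.e. one selected event every few hours of running, and within an
# order of magnitude of the ~1e-5 Hz new-physics rate quoted on the slide.
```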

  9. HEP Data • Scattering is the principal technique for gathering data • Collisions of beam-beam or beam-target particles • Typically caused by a single elementary interaction • But also background collisions → obscure the physics • Each collision generates many particles: an "event" • Particles traverse the detector, leaving an electronic signature • Information is collected and put into mass storage (tape) • Each event is independent → trivial computational parallelism • Data-intensive science • Size of raw event record: 20 KB – 1 MB • 10⁶ – 10⁹ events per year • 0.3 PB per year (2001): BaBar (SLAC) • 1 PB per year (2005): CDF, D0 (Fermilab) • 5 PB per year (2007): ATLAS, CMS (LHC)
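
The per-year volumes quoted above follow directly from event counts and raw event sizes. A minimal sketch; the inputs are the round numbers from this slide and are illustrative, not official experiment parameters.

```python
# Annual raw-data volume scales simply as (events per year) x (event size).
def yearly_volume_pb(events_per_year: float, event_size_bytes: float) -> float:
    """Return the raw-data volume in petabytes (decimal PB)."""
    return events_per_year * event_size_bytes / 1e15

# Low end of the slide's ranges (10^6 events/year, 20 KB/event):
print(yearly_volume_pb(1e6, 20e3))    # ~2e-5 PB, i.e. a few tens of GB
# High end, roughly the scale quoted for the LHC era (10^9 events/year, 1 MB/event):
print(yearly_volume_pb(1e9, 1e6))     # 1.0 PB per year
```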

  10. Data Rates: From Detector to Storage
  • Detector output: 40 MHz, ~1000 TB/sec → physics filtering
  • Level 1 Trigger (special hardware): 75 KHz, 75 GB/sec
  • Level 2 Trigger (commodity CPUs): 5 KHz, 5 GB/sec
  • Level 3 Trigger (commodity CPUs): 100 Hz, 100 MB/sec
  • Raw data to storage
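
The trigger chain is a cascade of rate reductions, and the stage-by-stage data rates can be recomputed from the accept rates alone. A small sketch, assuming the ~1 MByte raw event size quoted on the CMS Data Grid slide later in the talk.

```python
# Recompute each trigger stage's output data rate and rejection factor.
EVENT_SIZE_BYTES = 1e6  # ~1 MB per event (figure used elsewhere in the talk)

# (stage name, output event rate in Hz) as quoted on this slide
stages = [
    ("Level 1 (special hardware)", 75e3),
    ("Level 2 (commodity CPUs)",   5e3),
    ("Level 3 (commodity CPUs)",   100.0),
]

prev_rate = 40e6  # 40 MHz bunch-crossing rate into the trigger
for name, rate_hz in stages:
    rejection = prev_rate / rate_hz
    data_rate_mb_s = rate_hz * EVENT_SIZE_BYTES / 1e6
    print(f"{name}: {rate_hz:.0f} Hz out, ~{data_rate_mb_s:.0f} MB/s, "
          f"rejection ~{rejection:.0f}x")
    prev_rate = rate_hz
# Reproduces the 75 GB/s, 5 GB/s and 100 MB/s figures on the slide.
```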

  11. LHC Data Complexity • "Events" resulting from beam-beam collisions: • Signal event is obscured by 20 overlapping uninteresting collisions in the same crossing • CPU time does not scale from previous generations • [Event display comparison: 2000 vs 2007]

  12. Example: Higgs Decay into 4 Muons • 40M events/sec, selectivity: 1 in 10¹³

  13. LHC Computing Challenges • Complexity of the LHC environment and resulting data • Scale: petabytes of data per year (100 PB by ~2010), millions of SpecInt95s of CPU • Geographical distribution of people and resources: 1800 physicists, 150 institutes, 32 countries

  14. Transatlantic Net WG (HN, L. Price): Tier0 – Tier1 BW Requirements [*] ([*] installed BW in Mbps; maximum link occupancy 50%; work in progress)

  15. Hoffmann LHC Computing Report 2001: Tier0 – Tier1 link requirements
  • (1) Tier1 ↔ Tier0 data flow for analysis: 0.5 – 1.0 Gbps
  • (2) Tier2 ↔ Tier0 data flow for analysis: 0.2 – 0.5 Gbps
  • (3) Interactive collaborative sessions (30 peak): 0.1 – 0.3 Gbps
  • (4) Remote interactive sessions (30 flows peak): 0.1 – 0.2 Gbps
  • (5) Individual (Tier3 or Tier4) data transfers: 0.8 Gbps (limit to 10 flows of 5 MBytes/sec each)
  • TOTAL per Tier0 – Tier1 link: 1.7 – 2.8 Gbps
  • Corresponds to ~10 Gbps baseline BW installed on the US-CERN link
  • Adopted by the LHC experiments (Steering Committee Report)
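
The per-link total and the 0.8 Gbps for item (5) follow from simple arithmetic plus the 50% maximum-occupancy rule quoted on the previous slide. A sketch of that bookkeeping; the last comment about the ~10 Gbps baseline is an interpretation, not a statement from the report.

```python
# Assemble the per-link requirement from the components quoted on this slide.
components_gbps = {
    "(1) Tier1-Tier0 analysis flow":          (0.5, 1.0),
    "(2) Tier2-Tier0 analysis flow":          (0.2, 0.5),
    "(3) interactive collaborative sessions": (0.1, 0.3),
    "(4) remote interactive sessions":        (0.1, 0.2),
    "(5) individual Tier3/Tier4 transfers":   (0.8, 0.8),
}

low = sum(lo for lo, hi in components_gbps.values())
high = sum(hi for lo, hi in components_gbps.values())
print(f"total required per Tier0-Tier1 link: {low:.1f} - {high:.1f} Gbps")   # 1.7 - 2.8

# Item (5): 10 flows of 5 MB/s is 0.4 Gbps of traffic; doubling it for the
# 50% occupancy ceiling gives the 0.8 Gbps quoted above.
item5_gbps = 10 * 5e6 * 8 / 1e9
print(f"item (5): {item5_gbps:.1f} Gbps traffic -> {2 * item5_gbps:.1f} Gbps installed")

# Applying the same 50% rule to the total gives 3.4 - 5.6 Gbps installed;
# presumably the ~10 Gbps US-CERN baseline adds further headroom for growth.
print(f"total at 50% occupancy: {2 * low:.1f} - {2 * high:.1f} Gbps installed")
```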

  16. LHC Computing Challenges • Major challenges associated with: • Scale of computing systems • Network distribution of computing and data resources • Communication and collaboration at a distance • Remote software development and physics analysis • Result of these considerations: Data Grids

  17. Global LHC Data Grid Hierarchy
  • [Diagram: Tier 0 at CERN feeding Tier 1 centers, each serving Tier 2 centers, with Tier 3 and Tier 4 resources below]
  • Tier0 = CERN; Tier1 = national lab; Tier2 = regional center (university, etc.); Tier3 = university workgroup; Tier4 = workstation
  • Key ideas: hierarchical structure • Tier2 centers • Operate as a unified Grid
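
One way to make the hierarchy concrete is as a small tree data structure. A minimal sketch; the site names and fan-out counts are placeholders, and only the Tier0–Tier4 scheme itself comes from the slide.

```python
# Toy representation of the Tier0-Tier4 hierarchy as a tree of sites.
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    tier: int                      # 0 = CERN ... 4 = workstation
    children: list = field(default_factory=list)

    def add(self, child: "Site") -> "Site":
        self.children.append(child)
        return child

cern = Site("CERN", tier=0)
for lab in ["national-lab-A", "national-lab-B"]:            # hypothetical names
    t1 = cern.add(Site(lab, tier=1))
    for rc in ["regional-center-1", "regional-center-2"]:   # hypothetical names
        t2 = t1.add(Site(f"{lab}/{rc}", tier=2))
        t2.add(Site(f"{lab}/{rc}/workgroup", tier=3))

def walk(site: Site, indent: int = 0) -> None:
    """Print the hierarchy with one indented line per site."""
    print("  " * indent + f"Tier{site.tier}: {site.name}")
    for child in site.children:
        walk(child, indent + 1)

walk(cern)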

  18. Example: CMS Data Grid
  • Experiment generates ~PBytes/sec; one bunch crossing per 25 nsec, 100 triggers per second, each event ~1 MByte in size
  • Online System → ~100 MBytes/sec → Tier 0+1: CERN Computer Center (>20 TIPS, HPSS mass storage)
  • CERN/outside resource ratio ~1:2; Tier0 : (Σ Tier1) : (Σ Tier2) ~ 1:1:1
  • Tier 1 (2.5 Gbits/sec links from Tier 0): national centers with HPSS in France, Italy, UK, USA
  • Tier 2 (2.5 Gbits/sec links from Tier 1): regional Tier2 centers
  • Tier 3 (~622 Mbits/sec links): institutes (~0.25 TIPS) with physics data caches; physicists work on analysis "channels", and each institute has ~10 physicists working on one or more channels
  • Tier 4 (100 – 1000 Mbits/sec): workstations and other portals
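
The link speeds above translate directly into replication times. A back-of-envelope sketch; the dataset sizes are illustrative, and the 50% occupancy derating follows the rule quoted on slide 14.

```python
# How long it takes to move a dataset between tiers at the quoted link speeds.
def transfer_days(dataset_bytes: float, link_gbps: float, occupancy: float = 0.5) -> float:
    """Days needed to move dataset_bytes over a link_gbps link at the given occupancy."""
    seconds = dataset_bytes * 8 / (link_gbps * 1e9 * occupancy)
    return seconds / 86400

# 100 TB (illustrative) from Tier 0 to a Tier 1 center over 2.5 Gbps: ~7.4 days
print(transfer_days(100e12, 2.5))
# 10 TB (illustrative) over a ~622 Mbps link toward a Tier 3 institute: ~3.0 days
print(transfer_days(10e12, 0.622))
```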

  19. Tier1 and Tier2 Centers • Tier1 centers • National laboratory scale: large CPU, disk, tape resources • High-speed networks • Many personnel with broad expertise • Central resource for a large region • Tier2 centers • New concept in the LHC distributed computing hierarchy • Size ≈ (national lab × university)^1/2 • Based at a large university or small laboratory • Emphasis on small staff, simple configuration & operation • Tier2 role • Simulations, analysis, data caching • Serve a small country, or a region within a large country

  20. LHC Tier2 Center (2001) • [Diagram components: data server, >1 RAID, tape, GEth switch, multiple FEth switches, FEth router, high-speed channel to the WAN]

  21. Hardware Cost Estimates • Buy late, but not too late: phased implementation • R&D Phase 2001-2004 • Implementation Phase 2004-2007 • R&D to develop capabilities and the computing model itself • Prototyping at increasing scales of capability & complexity

  22. HEP-Related Data Grid Projects • Funded projects: GriPhyN (USA, NSF, $11.9M + $1.6M); PPDG I (USA, DOE, $2M); PPDG II (USA, DOE, $9.5M); EU DataGrid (EU, $9.3M) • Proposed projects: iVDGL (USA, NSF, $15M + $1.8M + UK); DTF (USA, NSF, $45M + $4M/yr); DataTAG (EU, EC, $2M?); GridPP (UK, PPARC, >$15M) • Other national projects: UK e-Science (>$100M for 2001-2004); Italy, France, (Japan?)

  23. (HEP-Related) Data Grid Timeline, Q2 2000 – Q3 2001 • Milestones: submit GriPhyN proposal ($12.5M); GriPhyN approved ($11.9M + $1.6M); submit PPDG proposal ($12M); outline of US-CMS Tier plan; Caltech-UCSD install proto-T2; EU DataGrid approved ($9.3M); 1st Grid coordination meeting; submit DataTAG proposal ($2M); submit iVDGL preproposal; submit DTF proposal ($45M); submit iVDGL proposal ($15M); 2nd Grid coordination meeting; PPDG approved ($9.5M); DataTAG approved; DTF approved?; iVDGL approved?

  24. Coordination Among Grid Projects
  • Particle Physics Data Grid (US, DOE): Data Grid applications for HENP; funded 1999, 2000 ($2M); funded 2001-2004 ($9.4M); http://www.ppdg.net/
  • GriPhyN (US, NSF): Petascale Virtual-Data Grids; funded 9/2000 – 9/2005 ($11.9M + $1.6M); http://www.griphyn.org/
  • European Data Grid (EU): Data Grid technologies, EU deployment; funded 1/2001 – 1/2004 ($9.3M); http://www.eu-datagrid.org/
  • HEP in common • Focus: infrastructure development & deployment • International scope
  • Now developing joint coordination framework (GridPP, DTF, iVDGL → very soon?)

  25. Data Grid Management

  26. [Diagram: PPDG links the data-management efforts and user communities of BaBar, D0, CDF, ATLAS, CMS, Nuclear Physics, and HENP GC with the Condor, Globus, and SRB teams and their users]

  27. EU DataGrid Project

  28. PPDG and GriPhyN Projects • PPDG focus on today's (evolving) problems in HENP • Current HEP: BaBar, CDF, D0 • Current NP: RHIC, JLAB • Future HEP: ATLAS, CMS • GriPhyN focus on tomorrow's solutions • ATLAS, CMS, LIGO, SDSS • Virtual data, "Petascale" problems (Petaflops, Petabytes) • Toolkit, export to other disciplines, outreach/education • Both emphasize • Application science drivers • CS/application partnership (reflected in funding) • Performance • Explicitly complementary

  29. PPDG Multi-site Cached File Access System
  • Primary site: data acquisition, tape, CPU, disk, robot
  • Satellite sites: tape, CPU, disk, robot
  • Universities: CPU, disk, users
  • Functions: resource discovery, matchmaking, co-scheduling/queueing, tracking/monitoring, problem trapping + resolution
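
The functions listed above (resource discovery, matchmaking, co-scheduling) can be illustrated with a toy site-selection routine. This is not PPDG code; the site table and the scoring rule are invented for illustration.

```python
# Toy matchmaking: prefer a site that already caches the requested file and
# has free CPU, otherwise fall back to the primary (archive) site.
sites = [
    # name, files cached locally, free CPU slots, is primary archive?
    {"name": "primary",     "cached": {"run42.raw"}, "free_cpu": 2,  "primary": True},
    {"name": "satellite-1", "cached": {"run42.raw"}, "free_cpu": 40, "primary": False},
    {"name": "satellite-2", "cached": set(),         "free_cpu": 80, "primary": False},
]

def match(filename: str) -> dict:
    """Return the best site for a job that reads `filename`."""
    candidates = ([s for s in sites if filename in s["cached"]]
                  or [s for s in sites if s["primary"]])
    return max(candidates, key=lambda s: s["free_cpu"])

print(match("run42.raw")["name"])   # satellite-1: cached copy plus idle CPUs
print(match("run43.raw")["name"])   # primary: only the archive holds it
```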

  30. GriPhyN: PetaScale Virtual-Data Grids • Users: production teams, individual investigators, workgroups • Scale: ~1 Petaflop, ~100 Petabytes • Tools: interactive user tools; request planning & scheduling tools; request execution & management tools; virtual data tools • Underlying services: resource management services, security and policy services, other Grid services • Transforms applied to raw data sources across distributed resources (code, storage, CPUs, networks)

  31. Virtual Data in Action • A data (item) request may: compute locally, compute remotely, access local data, or access remote data • Scheduling based on local policies, global policies, and cost • Resources span major facilities and archives, regional facilities and caches, and local facilities and caches
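
The compute-or-fetch, local-or-remote decision driven by policy and cost can be sketched as a small cost model. All costs, rates, and weights below are invented for illustration; only the four options and the cost-based choice come from the slide.

```python
# Toy cost model: satisfy a data request by the cheapest of four options.
def plan_request(size_gb: float, cpu_hours: float, have_local_copy: bool,
                 wan_gb_per_hour: float = 50.0, local_cpu_free: bool = True) -> str:
    options = {}                                        # option -> estimated hours
    if have_local_copy:
        options["access local data"] = 0.01             # essentially free
    options["access remote data"] = size_gb / wan_gb_per_hour
    if local_cpu_free:
        options["compute locally"] = cpu_hours
    options["compute remotely"] = cpu_hours * 1.5       # queueing/transfer overhead
    best = min(options, key=options.get)
    return f"{best} (est. {options[best]:.2f} h)"

# A derived dataset that is cheap to recompute but large to ship:
print(plan_request(size_gb=500, cpu_hours=2, have_local_copy=False))  # compute locally
# The same request when a local replica already exists:
print(plan_request(size_gb=500, cpu_hours=2, have_local_copy=True))   # access local data
```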

  32. GriPhyN Goals for Virtual Data • Transparency with respect to location: caching, catalogs in a large-scale, high-performance Data Grid • Transparency with respect to materialization: exact specification of algorithm components, traceability of any data product, cost of storage vs CPU vs networks • Automated management of computation • Issues of scale, complexity, transparency • Complications: calibrations, data versions, software versions, … • Explore the concept of virtual data and its applicability to data-intensive science
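
The materialization idea, recording exactly how a data product is derived so it can either be fetched from a replica or recomputed on demand, can be sketched as a tiny catalog. The class and functions below are illustrative and are not the GriPhyN toolkit API.

```python
# Sketch of a virtual-data record: a product is defined by its derivation,
# so the system can choose between reuse and re-materialization.
from dataclasses import dataclass

@dataclass(frozen=True)
class VirtualDataProduct:
    transformation: str        # exact program/version that produces the product
    inputs: tuple              # logical names of the input datasets
    parameters: tuple          # e.g. calibration version, software version

replica_catalog = {}           # product -> list of physical locations

def materialize(product: VirtualDataProduct) -> str:
    """Return an existing replica if one is known, else (re)derive the product."""
    if replica_catalog.get(product):
        return f"use replica at {replica_catalog[product][0]}"
    # Provenance is complete, so recomputation is always possible:
    location = f"/cache/{abs(hash(product))}"          # pretend the job ran here
    replica_catalog[product] = [location]
    return f"derived via {product.transformation}, stored at {location}"

ntuple = VirtualDataProduct("reco-v3.1", ("run42.raw",), ("calib-2001a",))
print(materialize(ntuple))   # first request: derived and cached
print(materialize(ntuple))   # second request: served from the replica
```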

  33. Data Grid Reference Architecture
  • Application layer: discipline-specific Data Grid applications
  • Collective services: request planning, request management, usage, accounting, and consistency services; replica selection, replica management, system monitoring, and resource brokering services; distributed catalog, community authorization, online certificate repository, information, and coallocation services
  • Resource layer: storage management, compute management, network management, catalog management, code repository management, and service enquiry/registration protocols
  • Connectivity layer: communication, service discovery (DNS), authentication, delegation
  • Fabric layer: storage systems, compute systems, networks, catalogs, code repositories
