Presentation Transcript


  1. Grid Computing in High Energy Physics: Enabling Data Intensive Global Science Paul Avery University of Florida avery@phys.ufl.edu Lishep2004 Workshop UERJ, Rio de Janeiro, February 17, 2004 Paul Avery

  2. The Grid Concept • Grid: Geographically distributed computing resources configured for coordinated use • Fabric: Physical resources & networks provide raw capability • Ownership: Resources controlled by owners and shared w/ others • Middleware: Software ties it all together: tools, services, etc. • Goal: Transparent resource sharing (figure: US-CMS "Virtual Organization") Paul Avery

  3. LHC: Key Driver for Data Grids • Complexity: Millions of individual detector channels • Scale: PetaOps (CPU), 100s of Petabytes (Data) • Distribution: Global distribution of people & resources (CMS collaboration: 2000+ physicists, 159 institutes, 36 countries) Paul Avery

  4. LHC Data and CPU Requirements (CMS, ATLAS, LHCb) • Storage: Raw recording rate 0.1 – 1 GB/s, plus Monte Carlo data; 100 PB total by ~2010 • Processing: PetaOps (> 300,000 3 GHz PCs) Paul Avery
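A quick back-of-envelope check of these numbers (a sketch only; the ~10^7 seconds of live data taking per year is an assumed duty cycle, not a figure from the slide):

```python
# Back-of-envelope check of the slide's storage numbers (illustrative only).
SECONDS_PER_YEAR = 1e7          # ~10^7 s of live data taking per year (assumption)
raw_rates_gb_s = (0.1, 1.0)     # raw recording rate per experiment, GB/s (from slide)

for rate in raw_rates_gb_s:
    pb_per_year = rate * SECONDS_PER_YEAR / 1e6   # 1 PB = 10^6 GB
    print(f"{rate:4.1f} GB/s  ->  ~{pb_per_year:5.1f} PB/year per experiment")

# With several experiments plus Monte Carlo on top, ~100 PB by ~2010 is plausible.
```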

  5. Complexity: Higgs Decay into 4 muons • 10^9 collisions/sec, selectivity: 1 in 10^13 Paul Avery
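To make the selectivity figure concrete, the round numbers quoted on the slide imply only a handful of selected candidate events per day (illustrative arithmetic only):

```python
# Rough event-rate arithmetic behind the "1 in 10^13" selectivity figure.
collision_rate = 1e9      # proton-proton collisions per second (from slide)
selectivity = 1e-13       # fraction selected as Higgs -> 4 muon candidates (from slide)

events_per_second = collision_rate * selectivity
events_per_day = events_per_second * 86_400
print(f"~{events_per_second:.1e} events/s  ->  ~{events_per_day:.1f} candidate events/day")
```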

  6. LHC Global Collaborations (CMS, ATLAS): several thousand collaborators per experiment Paul Avery

  7. Driver for Transatlantic Networks (Gb/s) 2001 estimates, now seen as conservative! Paul Avery

  8. HEP Bandwidth Roadmap (Gb/s) HEP: Leading role in network development/deployment Paul Avery

  9. Global LHC Data Grid Hierarchy (CMS experiment) • Online system → Tier 0 (CERN Computer Center): 0.1 – 1.5 GBytes/s • Tier 0 → Tier 1 national centers (e.g. Korea, Russia, UK, USA): 10-40 Gb/s • Tier 1 → Tier 2 centers: 2.5-10 Gb/s • Tier 2 → Tier 3 (institutes): 1-2.5 Gb/s • Tier 3 → Tier 4 (PCs), physics caches: 1-10 Gb/s • ~10s of Petabytes/yr by 2007-8; ~1000 Petabytes in < 10 yrs? Paul Avery
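A minimal sketch of this tier hierarchy as a data structure, assuming each tier records the bandwidth of its uplink to the parent tier; the site names are the examples from the slide and everything else is illustrative:

```python
# Sketch of the tiered distribution model described on the slide.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Tier:
    name: str
    uplink_gbps: Optional[Tuple[float, float]]   # (min, max) Gb/s of link to parent tier
    children: List["Tier"] = field(default_factory=list)

tier0 = Tier("CERN Tier 0", None)
for country in ("USA", "UK", "Russia", "Korea"):
    t1 = Tier(f"{country} Tier 1", (10, 40))
    t1.children.append(Tier("Tier 2 center", (2.5, 10)))
    tier0.children.append(t1)

def show(node: Tier, depth: int = 0) -> None:
    """Print the hierarchy with the uplink bandwidth of each node."""
    link = "" if node.uplink_gbps is None else f"  ({node.uplink_gbps[0]}-{node.uplink_gbps[1]} Gb/s uplink)"
    print("  " * depth + node.name + link)
    for child in node.children:
        show(child, depth + 1)

show(tier0)
```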

  10. Most IT Resources Outside CERN (chart: estimated 2008 resources, Tier0 + Tier1 + Tier2, with shares of 10%, 17%, 38%) Paul Avery

  11. Analysis by Globally Distributed Teams • Non-hierarchical: Chaotic analyses + productions • Superimpose significant random data flows Paul Avery

  12. Data Grid Projects Paul Avery

  13. Global Context of Data Grid Projects • U.S. projects: GriPhyN (NSF), iVDGL (NSF), Particle Physics Data Grid (DOE), PACIs and TeraGrid (NSF), DOE Science Grid (DOE), NEESgrid (NSF), NSF Middleware Initiative (NSF) • EU and Asia projects: European Data Grid (EU), EDG-related national projects, DataTAG (EU), LHC Computing Grid (CERN), EGEE (EU), CrossGrid (EU), GridLab (EU), Japanese and Korean projects • HEP-related Grid infrastructure projects • Two primary project clusters: US + EU • Not exclusively HEP, but driven/led by HEP (with CS) • Many 10s of $M brought into the field Paul Avery

  14. International Grid Coordination • Global Grid Forum (GGF) • Grid standards body • New PNPA research group • HICB: HEP Inter-Grid Coordination Board • Non-competitive forum, strategic issues, consensus • Cross-project policies, procedures and technology, joint projects • HICB-JTB Joint Technical Board • Technical coordination, e.g. GLUE interoperability effort • LHC Computing Grid (LCG) • Technical work, deployments • POB, SC2, PEB, GDB, GAG, etc. Paul Avery

  15. LCG: LHC Computing Grid Project • A persistent Grid infrastructure for LHC experiments • Matched to the decades-long research program of the LHC • Prepare & deploy computing environment for LHC expts • Common applications, tools, frameworks and environments • Deployment oriented: no middleware development(?) • Move from testbed systems to real production services • Operated and supported 24x7 globally • Computing fabrics run as production physics services • A robust, stable, predictable, supportable infrastructure Paul Avery

  16. LCG Activities • Relies on strong leverage • CERN IT • LHC experiments: ALICE, ATLAS, CMS, LHCb • Grid projects: Trillium, DataGrid, EGEE • Close relationship with EGEE project (see Gagliardi talk) • EGEE will create EU production Grid (funds 70 institutions) • Middleware activities done in common (same manager) • LCG deployment in common (LCG Grid is EGEE prototype) • LCG-1 deployed (Sep. 2003) • Tier-1 laboratories (including FNAL and BNL) • LCG-2 in process of being deployed Paul Avery

  17. Sep. 29, 2003 announcement Paul Avery

  18. LCG-1 Sites Paul Avery

  19. Grid Analysis Environment • GAE and analysis teams • Extending locally, nationally, globally • Sharing worldwide computing, storage & network resources • Utilizing intellectual contributions regardless of location • HEPCAL and HEPCAL II • Delineate common LHC use cases & requirements (HEPCAL) • Same for analysis (HEPCAL II) • ARDA: Architectural Roadmap for Distributed Analysis • ARDA document released as LCG-2003-033 (Oct. 2003) • Workshop January 2004, presentations by the 4 experiments (CMS: CAIGEE, ALICE: AliEn, ATLAS: ADA, LHCb: DIRAC) Paul Avery

  20. One View of ARDA (diagram of core services): User Application, Storage Element, Job Monitor, Computing Element, Information Service, Job Provenance, Authentication, Authorization, Auditing, Grid Access Service, Accounting, Metadata Catalog, Grid Monitoring, File Catalog, Workload Management, Site Gatekeeper, Data Management, Package Manager; from LCG-2003-033 (Oct. 2003) Paul Avery
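The decomposition above can be made concrete with a few interface stubs. This is not the ARDA specification, only a hedged sketch; the method names (lookup, query, submit, run_analysis) are assumptions chosen for illustration:

```python
# Illustrative service interfaces for a few of the ARDA components named above.
from abc import ABC, abstractmethod
from typing import List

class FileCatalog(ABC):
    @abstractmethod
    def lookup(self, logical_name: str) -> List[str]:
        """Return the physical replicas registered for a logical file name."""

class MetadataCatalog(ABC):
    @abstractmethod
    def query(self, selection: str) -> List[str]:
        """Return logical file names whose metadata match the selection."""

class WorkloadManagement(ABC):
    @abstractmethod
    def submit(self, job_description: dict) -> str:
        """Submit a job and return a job identifier."""

class GridAccessService(ABC):
    """Single entry point the user application talks to; it delegates to the
    catalogs and workload management behind the scenes."""
    @abstractmethod
    def run_analysis(self, metadata_selection: str, executable: str) -> str: ...
```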

  21. CMS Example: CAIGEE • Clients talk standard protocols (HTTP, SOAP, XML-RPC) to a "Grid Services Web Server", a.k.a. the Clarens data/services portal with a simple web-services API • The Clarens portal hides the complexity of Grid services from the client • Key features: global scheduler, catalogs, monitoring, and Grid-wide execution service • Clarens servers form a global peer-to-peer network (diagram: analysis clients, Grid services web server, scheduler, catalogs, metadata, virtual data, applications, data management, monitoring, replica, fully-abstract / partially-abstract / fully-concrete planners, grid execution priority manager, Grid-wide execution service) Paul Avery
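A minimal sketch of what "clients talk standard protocols" could look like from the analysis-client side, using Python's standard XML-RPC support. The portal URL and the file.ls method are hypothetical, not the documented Clarens API; system.listMethods is the generic XML-RPC introspection call, which a server may or may not expose:

```python
# Sketch of an analysis client talking XML-RPC to a Clarens-style portal.
import xmlrpc.client

PORTAL_URL = "https://clarens.example.org:8443/clarens/"   # hypothetical endpoint

portal = xmlrpc.client.ServerProxy(PORTAL_URL)
try:
    # Standard XML-RPC introspection, if the server supports it
    print(portal.system.listMethods())
    # Hypothetical catalog-browsing method exposed by the portal
    print(portal.file.ls("/store/user/higgs"))
except (xmlrpc.client.Fault, OSError) as err:
    print(f"portal call failed: {err}")
```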

  22. LCG Timeline • Phase 1: 2002 – 2005 • Build a service prototype, based on existing grid middleware • Gain experience in running a production grid service • Produce the Computing TDR for the final system • Phase 2: 2006 – 2008 • Build and commission the initial LHC computing environment • Phase 3: ? Paul Avery

  23. “Trillium”: U.S. Physics Grid Projects • Trillium = PPDG + GriPhyN + iVDGL • Large overlap in leadership, people, experiments • HEP members are main drivers, esp. LHC experiments • Benefit of coordination • Common software base + packaging: VDT + PACMAN • Collaborative / joint projects: monitoring, demos, security, … • Wide deployment of new technologies, e.g. Virtual Data • Forum for US Grid projects • Joint strategies, meetings and work • Unified U.S. entity to interact with international Grid projects • Build significant Grid infrastructure: Grid2003 Paul Avery

  24. Science Drivers for U.S. HEP Grids (chart: community growth and data growth, 2001-2009) • LHC experiments: 100s of Petabytes • Current HENP experiments: ~1 Petabyte (1000 TB) • LIGO: 100s of Terabytes • Sloan Digital Sky Survey: 10s of Terabytes • Future Grid resources: massive CPU (PetaOps), large distributed datasets (>100 PB), global communities (1000s) Paul Avery

  25. Virtual Data Toolkit • A unique resource for managing, testing, supporting, deploying, packaging, upgrading, & troubleshooting complex sets of software (build & test diagram: sources in CVS from contributors (VDS, etc.); NMI and VDT builds & tests on a Condor pool of 37 computers; packaging and patching into a Pacman cache, RPMs, and GPT source bundles; will use NMI processes soon) Paul Avery

  26. Virtual Data Toolkit: Tools in VDT 1.1.12 • Globus Alliance: Grid Security Infrastructure (GSI), job submission (GRAM), information service (MDS), data transfer (GridFTP), Replica Location Service (RLS) • Condor Group: Condor/Condor-G, DAGMan, Fault Tolerant Shell, ClassAds • EDG & LCG: Make Gridmap, Certificate Revocation List Updater, Glue Schema/Info provider • ISI & UC: Chimera & related tools, Pegasus • NCSA: MyProxy, GSI OpenSSH • LBL: PyGlobus, Netlogger • Caltech: MonALISA • VDT: VDT System Profiler, configuration software • Others: KX509 (U. Mich.) Paul Avery
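Since Condor-G and DAGMan are part of the toolkit, a multi-step workflow can be expressed as a DAG of jobs. The sketch below writes a two-step DAGMan input file; the submit-file names are placeholders, and only the JOB / PARENT-CHILD structure is the point:

```python
# Sketch: a simulation -> reconstruction workflow as a DAGMan input file.
dag_text = """\
JOB simulate simulate.sub
JOB reconstruct reconstruct.sub
PARENT simulate CHILD reconstruct
"""

with open("analysis.dag", "w") as f:
    f.write(dag_text)

# DAGMan would then run the workflow, e.g. via: condor_submit_dag analysis.dag
print(dag_text)
```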

  27. VDT Growth (currently 1.1.13) (chart milestones: VDT 1.0 with Globus 2.0b and Condor 6.3.1; VDT 1.1.3, 1.1.4 & 1.1.5 pre-SC2002; VDT 1.1.7, switch to Globus 2.2; VDT 1.1.8, first real use by LCG; VDT 1.1.11, Grid2003) Paul Avery

  28. Virtual Data: Derivation and Provenance • Most scientific data are not simple “measurements” • They are computationally corrected/reconstructed • They can be produced by numerical simulation • Science & eng. projects are more CPU and data intensive • Programs are significant community resources (transformations) • So are the executions of those programs (derivations) • Management of dataset dependencies critical! • Derivation: Instantiation of a potential data product • Provenance: Complete history of any existing data product • Previously: Manual methods • GriPhyN: Automated, robust tools (Chimera Virtual Data System) Paul Avery
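An illustrative model of the transformation / derivation / provenance distinction described above (not the Chimera Virtual Data Language itself): each derivation records which transformation produced a dataset and from which inputs, so the full history of a data product can be walked back or re-executed. The names and catalog entries are hypothetical.

```python
# Toy model of transformations, derivations, and provenance tracking.
from dataclasses import dataclass

@dataclass(frozen=True)
class Transformation:              # a registered program: a community resource
    name: str
    version: str

@dataclass(frozen=True)
class Derivation:                  # one execution of a transformation
    transformation: Transformation
    inputs: tuple                  # logical file names consumed
    output: str                    # logical file name produced

catalog = [
    Derivation(Transformation("reco", "v3"), ("raw_run1001",), "reco_run1001"),
    Derivation(Transformation("select_WW", "v1"), ("reco_run1001",), "ww_candidates"),
]

def provenance(lfn, derivations):
    """Yield every derivation step that the given logical file depends on."""
    for d in derivations:
        if d.output == lfn:
            yield d
            for parent in d.inputs:
                yield from provenance(parent, derivations)

for step in provenance("ww_candidates", catalog):
    inputs = ", ".join(step.inputs)
    print(f"{step.output} <- {step.transformation.name} {step.transformation.version} ({inputs})")
```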

  29. LHC Higgs Analysis with Virtual Data • Scientist adds a new derived data branch & continues analysis (diagram: derivation tree with branches decay = bb, decay = WW, decay = ZZ; WW → leptons, WW → e; cuts such as mass = 160, Pt > 20, and other cuts) Paul Avery

  30. Chimera: Sloan Galaxy Cluster Analysis (figures: Sloan data, galaxy cluster size distribution) Paul Avery

  31. Outreach: QuarkNet-Trillium Virtual Data Portal • More than a web site • Organize datasets • Perform simple computations • Create new computations & analyses • View & share results • Annotate & enquire (metadata) • Communicate and collaborate • Easy to use, ubiquitous • No tools to install • Open to the community • Grow & extend • Initial prototype implemented by graduate student Yong Zhao and M. Wilde (U. of Chicago) Paul Avery

  32. CHEPREO: Center for High Energy Physics Research and Educational Outreach, Florida International University • Physics Learning Center • CMS Research • iVDGL Grid Activities • AMPATH network (S. America) • Funded September 2003

  33. Grid2003: An Operational Grid • 28 sites (2100-2800 CPUs) • 400-1300 concurrent jobs • 10 applications • Running since October 2003 • http://www.ivdgl.org/grid2003 (site map includes Korea) Paul Avery

  34. Grid2003 Participants (diagram: US-CMS, US-ATLAS, iVDGL, GriPhyN, PPDG, DOE Labs, Biology, Korea) Paul Avery

  35. Grid2003 Applications • High energy physics • US-ATLAS analysis (DIAL) • US-ATLAS GEANT3 simulation (GCE) • US-CMS GEANT4 simulation (MOP) • BTeV simulation • Gravity waves • LIGO: blind search for continuous sources • Digital astronomy • SDSS: cluster finding (maxBcg) • Bioinformatics • Bio-molecular analysis (SnB) • Genome analysis (GADU/Gnare) • CS demonstrators • Job Exerciser, GridFTP, NetLogger-grid2003 Paul Avery

  36. Grid2003 Site Monitoring Paul Avery

  37. Grid2003: Three Months Usage Paul Avery

  38. Grid2003 Success • Much larger than originally planned • More sites (28), CPUs (2800), simultaneous jobs (1300) • More applications (10) in more diverse areas • Able to accommodate new institutions & applications • U Buffalo (Biology) Nov. 2003 • Rice U. (CMS) Feb. 2004 • Continuous operation since October 2003 • Strong operations team (iGOC at Indiana) • US-CMS using it for production simulations (next slide) Paul Avery

  39. Production Simulations on Grid2003 • US-CMS Monte Carlo simulation • Used 1.5 × the US-CMS resources (chart: USCMS vs. non-USCMS usage) Paul Avery

  40. Grid2003: A Necessary Step • Learning how to operate a Grid • Add sites, recover from errors, provide info, update, test, etc. • Need tools, services, procedures, documentation, organization • Need reliable, intelligent, skilled people • Learning how to cope with “large” scale • “Interesting” failure modes as scale increases • Increasing scale must not overwhelm human resources • Learning how to delegate responsibilities • Multiple levels: Project, Virtual Org., service, site, application • Essential for future growth • Grid2003 experience critical for building “useful” Grids • Frank discussion in “Grid2003 Project Lessons” doc Paul Avery

  41. Grid2003 Lessons (1): Investment • Building momentum • PPDG 1999 • GriPhyN 2000 • iVDGL 2001 • Time for projects to ramp up • Building collaborations • HEP: ATLAS, CMS, Run 2, RHIC, Jlab • Non-HEP: Computer science, LIGO, SDSS • Time for collaboration sociology to kick in • Building testbeds • Build expertise, debug Grid software, develop Grid tools & services • US-CMS 2002 – 2004+ • US-ATLAS 2002 – 2004+ • WorldGrid 2002 (Dec.) Paul Avery

  42. Grid2003 Lessons (2): Deployment • Building something useful draws people in • (Similar to a large HEP detector) • Cooperation, willingness to invest time, striving for excellence! • Grid development requires significant deployments • Required to learn what works, what fails, what’s clumsy, … • Painful, but pays for itself • Deployment provides powerful training mechanism Paul Avery

  43. Grid2003 Lessons (3): Packaging • Installation and configuration (VDT + Pacman) • Simplifies installation, configuration of Grid tools + applications • Major advances over 13 VDT releases • A strategic issue and critical to us • Provides uniformity + automation • Lowers barriers to participation → scaling • Expect great improvements (Pacman 3) • Automation: the next frontier • Reduce FTE overhead, communication traffic • Automate installation, configuration, testing, validation, updates • Remote installation, etc. Paul Avery

  44. Grid2003 and Beyond • Further evolution of Grid3: (Grid3+, etc.) • Contribute resources to persistent Grid • Maintain development Grid, test new software releases • Integrate software into the persistent Grid • Participate in LHC data challenges • Involvement of new sites • New institutions and experiments • New international partners (Brazil, Taiwan, Pakistan?, …) • Improvements in Grid middleware and services • Integrating multiple VOs • Monitoring • Troubleshooting • Accounting • … Paul Avery

  45. U.S. Open Science Grid • Goal: Build an integrated Grid infrastructure • Support US-LHC research program, other scientific efforts • Resources from laboratories and universities • Federate with LHC Computing Grid • Getting there: OSG-1 (Grid3+), OSG-2, … • Series of releases → increasing functionality & scale • Constant use of facilities for LHC production computing • Jan. 12 meeting in Chicago • Public discussion, planning sessions • Next steps • Creating interim Steering Committee (now) • White paper to be expanded into roadmap • Presentation to funding agencies (April/May?) Paul Avery

  46. Inputs to Open Science Grid (diagram: technologists, Trillium, US-LHC, university facilities, education community, multi-disciplinary facilities, laboratory centers, computer science, and other science applications feeding into the Open Science Grid) Paul Avery

  47. Grids: Enhancing Research & Learning • Fundamentally alters conduct of scientific research • "Lab-centric": Activities center around large facility • "Team-centric": Resources shared by distributed teams • "Knowledge-centric": Knowledge generated/used by a community • Strengthens role of universities in research • Couples universities to data intensive science • Couples universities to national & international labs • Brings front-line research and resources to students • Exploits intellectual resources of formerly isolated schools • Opens new opportunities for minority and women researchers • Builds partnerships to drive advances in IT/science/eng • HEP ↔ physics, astronomy, biology, CS, etc. • "Application" sciences ↔ computer science • Universities ↔ laboratories • Scientists ↔ students • Research community ↔ IT industry Paul Avery

  48. Grids and HEP’s Broad Impact • Grids enable 21st century collaborative science • Linking research communities and resources for scientific discovery • Needed by LHC global collaborations pursuing “petascale” science • HEP Grid projects driving important network developments • Recent US – Europe “land speed records” • ICFA-SCIC, I-HEPCCC, US-CERN link, ESNET, Internet2 • HEP Grids impact education and outreach • Providing technologies, resources for training, education, outreach • HEP has recognized leadership in Grid development • Many national and international initiatives • Partnerships with other disciplines • Influencing funding agencies (NSF, DOE, EU, S.A., Asia) • HEP Grids will provide scalable computing infrastructures for scientific research Paul Avery

  49. Grid References • Grid2003: www.ivdgl.org/grid2003 • Globus: www.globus.org • PPDG: www.ppdg.net • GriPhyN: www.griphyn.org • iVDGL: www.ivdgl.org • LCG: www.cern.ch/lcg • EU DataGrid: www.eu-datagrid.org • EGEE: egee-ei.web.cern.ch • Grid book (2nd edition): www.mkp.com/grid2 Paul Avery

  50. Extra Slides Paul Avery
