
Paul Avery University of Florida avery@phys.ufl

Integrating Universities and Laboratories In National Cyberinfrastructure. Paul Avery, University of Florida, avery@phys.ufl.edu. PASI Lecture, Mendoza, Argentina, May 17, 2005. Outline of Talk: Cyberinfrastructure and Grids; Data intensive disciplines and Data Grids

Presentation Transcript


  1. Integrating Universities and Laboratories In National Cyberinfrastructure. Paul Avery, University of Florida, avery@phys.ufl.edu. PASI Lecture, Mendoza, Argentina, May 17, 2005

  2. Outline of Talk • Cyberinfrastructure and Grids • Data-intensive disciplines and Data Grids • The Trillium Grid collaboration: GriPhyN, iVDGL, PPDG • The LHC and its computing challenges • Grid3 and the Open Science Grid • A bit on networks • Education and Outreach • Challenges for the future • Summary. Presented from a physicist’s perspective!

  3. Cyberinfrastructure (cont) • Software programs, services, instruments, data, information, and knowledge applicable to specific projects, disciplines, and communities. • Cyberinfrastructure: a layer of enabling hardware, algorithms, software, communications, institutions, and personnel; a platform that empowers researchers to innovate and eventually revolutionize what they do, how they do it, and who participates. • Base technologies: computation, storage, and communication components that continue to advance in raw capacity at exponential rates. [Paraphrased from NSF Blue Ribbon Panel report, 2003] Challenge: creating and operating advanced cyberinfrastructure and integrating it in science and engineering applications.

  4. Cyberinfrastructure and Grids • Grid: geographically distributed computing resources configured for coordinated use • Fabric: physical resources & networks provide raw capability • Ownership: resources are controlled by their owners and shared with others • Middleware: software that ties it all together (tools, services, etc.) • Enhancing collaboration via transparent resource sharing (Figure: US-CMS “Virtual Organization”)

  5. Data Grids & Collaborative Research • Team-based 21st-century scientific discovery • Strongly dependent on advanced information technology • People and resources distributed internationally • Dominant factor: data growth (1 Petabyte = 1000 TB): 2000 ~0.5 Petabyte; 2005 ~10 Petabytes; 2010 ~100 Petabytes; 2015–17 ~1000 Petabytes? • Drives need for powerful linked resources: “Data Grids” • Computation: massive, distributed CPU • Data storage and access: distributed high-speed disk and tape • Data movement: international optical networks • Collaborative research and Data Grids: data discovery, resource sharing, distributed analysis, etc. How to collect, manage, access, and interpret this quantity of data?
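As a quick check on what these estimates imply, the short Python sketch below converts each five-year step into a compound annual growth factor and a doubling time. It uses only the slide's figures; reading "2015–17" as 2015 is an assumption made here for the arithmetic.

```python
import math

# Estimated data volumes (petabytes) from the slide. "2015-17" is read as 2015,
# an assumption made only for this calculation.
VOLUMES_PB = {2000: 0.5, 2005: 10, 2010: 100, 2015: 1000}


def annual_growth(v0: float, v1: float, years: int) -> float:
    """Compound annual growth factor that turns v0 into v1 over `years` years."""
    return (v1 / v0) ** (1 / years)


def doubling_time(factor: float) -> float:
    """Years needed to double at a constant annual growth factor."""
    return math.log(2) / math.log(factor)


years = sorted(VOLUMES_PB)
for y0, y1 in zip(years, years[1:]):
    g = annual_growth(VOLUMES_PB[y0], VOLUMES_PB[y1], y1 - y0)
    print(f"{y0}-{y1}: x{g:.2f} per year, doubling every {doubling_time(g):.1f} years")
```

Read this way, the volume roughly doubles every 1.2 years between 2000 and 2005 and every 1.5 years after that.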

  6. Examples of Data Intensive Disciplines • High energy & nuclear physics (primary driver): Belle, BaBar, Tevatron, RHIC, JLAB; Large Hadron Collider (LHC) • Astronomy: digital sky surveys, “Virtual” Observatories; VLBI arrays with multiple-Gb/s data streams • Gravity wave searches: LIGO, GEO, VIRGO, TAMA, ACIGA, … • Earth and climate systems: Earth observation, climate modeling, oceanography, … • Biology, medicine, imaging: genome databases; proteomics (protein structure & interactions, drug delivery, …); high-resolution brain scans (1–10 μm, time dependent)

  7. Our Vision & Goals • Develop the technologies & tools needed to exploit a Grid-based cyberinfrastructure • Apply and evaluate those technologies & tools in challenging scientific problems • Develop the technologies & procedures to support a permanent Grid-based cyberinfrastructure • Create and operate a persistent Grid-based cyberinfrastructure in support of discipline-specific research goals. End-to-end: GriPhyN + iVDGL + DOE Particle Physics Data Grid (PPDG) = Trillium

  8. Our Science Drivers • Experiments at Large Hadron Collider: new fundamental particles and forces; 100s of Petabytes; 2007 – ? • High Energy & Nuclear Physics expts: top quark, nuclear matter at extreme density; ~1 Petabyte (1000 TB); 1997 – present • LIGO (gravity wave search): search for gravitational waves; 100s of Terabytes; 2002 – present • Sloan Digital Sky Survey: systematic survey of astronomical objects; 10s of Terabytes; 2001 – present (Figure: community growth and data growth over time, 2001–2009)

  9. Grid Middleware: Virtual Data Toolkit (Figure: VDT build and test pipeline. Sources (CVS) from many contributors go through patching; binaries are built and tested on the NMI Build & Test Condor pool (22+ operating systems); the VDT is packaged as a Pacman cache, RPMs, and GPT source bundles.) A unique laboratory for testing, supporting, deploying, packaging, upgrading, & troubleshooting complex sets of software!

  10. VDT Growth Over 3 Years (www.griphyn.org/vdt/) (Figure: number of components per VDT release. Milestones: VDT 1.0, Globus 2.0b, Condor 6.3.1; VDT 1.1.7, switch to Globus 2.2; VDT 1.1.8, first real use by LCG; VDT 1.1.11, Grid3.)

  11. Components of VDT 1.3.5: Globus 3.2.1, Condor 6.7.6, RLS 3.0, ClassAds 0.9.7, Replica 2.2.4, DOE/EDG CA certs, ftsh 2.0.5, EDG mkgridmap, EDG CRL Update, GLUE Schema 1.0, VDS 1.3.5b, Java, Netlogger 3.2.4, Gatekeeper-Authz, MyProxy 1.11, KX509, System Profiler, GSI OpenSSH 3.4, Monalisa 1.2.32, PyGlobus 1.0.6, MySQL, UberFTP 1.11, DRM 1.2.6a, VOMS 1.4.0, VOMS Admin 0.7.5, Tomcat, PRIMA 0.2, Certificate Scripts, Apache, jClarens 0.5.3, New GridFTP Server, GUMS 1.0.1

  12. Collaborative Relationships: A CS + VDT Perspective (Figure: Computer Science Research passes techniques & software to the Virtual Data Toolkit, which reaches the Larger Science Community through tech transfer and production deployment, in U.S. Grids, international projects, and outreach. Partner science, networking, and outreach projects supply requirements and take part in prototyping & experiments; partners shown include Globus, Condor, NMI, iVDGL, PPDG, EU DataGrid, LHC Experiments, QuarkNet, CHEPREO, and Digital Divide.) • Other linkages: work force, CS researchers, industry

  13. U.S. “Trillium” Grid Partnership • Trillium = PPDG + GriPhyN + iVDGL • Particle Physics Data Grid: $12M (DOE) (1999 – 2006) • GriPhyN: $12M (NSF) (2000 – 2005) • iVDGL: $14M (NSF) (2001 – 2006) • Basic composition (~150 people) (PPDG: 4 universities, 6 labs; GriPhyN: 12 universities, SDSC, 3 labs; iVDGL: 18 universities, SDSC, 4 labs, foreign partners) • Experiments: BaBar, D0, STAR, JLab, CMS, ATLAS, LIGO, SDSS/NVO • Coordinated internally to meet broad goals (GriPhyN: CS research, Virtual Data Toolkit (VDT) development; iVDGL: Grid laboratory deployment using VDT, applications; PPDG: “end-to-end” Grid services, monitoring, analysis) • Common use of VDT for underlying Grid middleware • Unified entity when collaborating internationally

  14. Goal: Peta-scale Data Grids for Global Science (Figure: layered architecture. Users (single researchers, workgroups, production teams) work through Interactive User Tools, backed by Request Planning & Scheduling Tools, Request Execution & Management Tools, and Virtual Data Tools; these rely on Resource Management Services, Security and Policy Services, and Other Grid Services, running on distributed resources (code, storage, CPUs, networks) and the raw data source. Also labeled: Transforms. Targets: PetaOps, Petabytes, Performance.)

  15. Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN (Figure: galaxy cluster size distribution from Sloan data)
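The idea behind "virtual data" is that a derived product (here, a galaxy cluster size distribution) is described by the transformation and inputs that produce it, so the system can either return an existing copy or re-derive it on demand. The toy Python catalog below illustrates that idea under simple assumptions; the class and method names are hypothetical and are not the interface of the GriPhyN Virtual Data System (VDS), and the galaxy data are made up.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class VirtualDataCatalog:
    """Toy virtual-data catalog (hypothetical, not the VDS API).

    Each derived product is recorded as a recipe (transformation + inputs) and
    is only computed when someone requests it; results are then reused.
    """
    transformations: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    derivations: Dict[str, Tuple[str, Tuple[str, ...]]] = field(default_factory=dict)
    materialized: Dict[str, Any] = field(default_factory=dict)
    runs: int = 0  # number of transformations actually executed

    def add_transformation(self, name: str, fn: Callable[..., Any]) -> None:
        self.transformations[name] = fn

    def add_derivation(self, product: str, transformation: str, *inputs: str) -> None:
        self.derivations[product] = (transformation, inputs)

    def request(self, product: str) -> Any:
        # Reuse an existing copy if one has already been materialized.
        if product in self.materialized:
            return self.materialized[product]
        # Otherwise derive it, recursively materializing its inputs first.
        transformation, inputs = self.derivations[product]
        args = [self.request(name) for name in inputs]
        self.runs += 1
        value = self.transformations[transformation](*args)
        self.materialized[product] = value
        return value


catalog = VirtualDataCatalog()
# Made-up raw data: galaxy id -> id of the cluster it was assigned to.
catalog.materialized["galaxies"] = {1: "A", 2: "A", 3: "B", 4: "A", 5: "C", 6: "B"}
catalog.add_transformation("count_members", lambda g: Counter(g.values()))
catalog.add_transformation("histogram", lambda sizes: Counter(sizes.values()))
catalog.add_derivation("cluster_sizes", "count_members", "galaxies")
catalog.add_derivation("size_distribution", "histogram", "cluster_sizes")

print(catalog.request("size_distribution"))  # computed: 2 transformations run
print(catalog.request("size_distribution"))  # reused: no new runs
```

The second request returns the stored result without re-running anything, which is the reuse that virtual data aims for; a real system would also track where each materialized copy lives.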

  16. The LIGO Scientific Collaboration (LSC) and the LIGO Grid • LIGO Grid: 6 US sites + 3 EU sites (Birmingham and Cardiff/UK, AEI Golm/Germany) • iVDGL has enabled the LSC to establish a persistent production grid (Map legend: LHO, LLO = observatory sites; LSC = LIGO Scientific Collaboration; sites marked as iVDGL supported)

  17. Large Hadron Collider & its Frontier Computing Challenges

  18. Large Hadron Collider (LHC) @ CERN • 27 km tunnel in Switzerland & France • Experiments: ATLAS, CMS, ALICE, LHCb, TOTEM • Search for: origin of mass, new fundamental forces, supersymmetry, other new particles • 2007 – ?

  19. CMS: “Compact” Muon Solenoid (Figure: detector drawing with people for scale, captioned “Inconsequential humans”)

  20. LHC Data Rates: Detector to Storage • Detector: 40 MHz, ~TBytes/sec • Physics filtering: Level 1 Trigger (special hardware): 75 kHz, 75 GB/sec; Level 2 Trigger (commodity CPUs): 5 kHz, 5 GB/sec; Level 3 Trigger (commodity CPUs): 100 Hz, 0.15 – 1.5 GB/sec • Raw data to storage (+ simulated data)
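The trigger chain can be read as a sequence of filters. The Python sketch below divides each stage's input rate by its output rate to get a rejection factor, and divides bandwidth by event rate to get the implied average event size. It assumes the bandwidth is carried entirely by accepted events, and for Level 3 it uses the upper end of the quoted 0.15 – 1.5 GB/sec range (the lower end would give 1.5 MB per event).

```python
# Input rate to the Level 1 trigger (from the slide).
INPUT_RATE_HZ = 40e6

# (trigger level, accepted event rate in Hz, output bandwidth in bytes/s), per the slide.
# Level 3 is quoted as 0.15-1.5 GB/sec; the upper end is used here.
LEVELS = [
    ("Level 1", 75e3, 75e9),
    ("Level 2", 5e3, 5e9),
    ("Level 3", 100.0, 1.5e9),
]


def trigger_summary(input_rate, levels):
    """Return (level, rejection factor, implied MB per accepted event) per stage."""
    rows = []
    rate_in = input_rate
    for name, rate_out, bandwidth in levels:
        rejection = rate_in / rate_out         # events seen per event kept
        event_mb = bandwidth / rate_out / 1e6  # average size of a kept event
        rows.append((name, rejection, event_mb))
        rate_in = rate_out
    return rows


for name, rejection, event_mb in trigger_summary(INPUT_RATE_HZ, LEVELS):
    print(f"{name}: keeps 1 in {rejection:,.0f}, ~{event_mb:g} MB/event")
print(f"Overall: 1 in {INPUT_RATE_HZ / LEVELS[-1][1]:,.0f}")
```

The product of the three rejection factors (about 533 × 15 × 50) is the overall reduction of 400,000 from 40 MHz to 100 Hz.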

  21. Complexity: Higgs Decay to 4 Muons (+30 minimum bias events) (Figure panels: all charged tracks with pT > 2 GeV; reconstructed tracks with pT > 25 GeV) • 10^9 collisions/sec, selectivity: 1 in 10^13
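Combining the two numbers on this slide gives the rate of the rare events being sought. The arithmetic, in a few lines of Python, uses only the slide's collision rate and selectivity:

```python
COLLISIONS_PER_SECOND = 1e9  # from the slide
SELECTIVITY = 1e-13          # 1 in 10^13, from the slide

selected_per_second = COLLISIONS_PER_SECOND * SELECTIVITY  # 10^-4 per second
seconds_between = 1 / selected_per_second                  # 10^4 seconds
selected_per_day = selected_per_second * 86_400

print(f"One selected event every {seconds_between / 3600:.1f} hours "
      f"(about {selected_per_day:.1f} per day)")
```

That is roughly one selected event every 2.8 hours of running, or about 8.6 per day.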

  22. LHC: Petascale Global Science • Complexity: millions of individual detector channels • Scale: PetaOps (CPU), 100s of Petabytes (data) • Distribution: global distribution of people & resources • BaBar/D0 example (2004): 700+ physicists, 100+ institutes, 35+ countries • CMS example (2007): 5000+ physicists, 250+ institutes, 60+ countries

  23. LHC Global Data Grid (2007+) • 5000 physicists, 60 countries • 10s of Petabytes/yr by 2008 • 1000 Petabytes in < 10 yrs? (Figure: CMS tiered data flow. CMS Experiment → Online System → 150 – 1500 MB/s → Tier 0 (CERN Computer Center) → 10 – 40 Gb/s → Tier 1 → >10 Gb/s → Tier 2 → 2.5 – 10 Gb/s → Tier 3 → Tier 4 (physics caches, PCs). Sites labeled: Korea, Russia, UK, USA, U Florida, Caltech, UCSD, Iowa, FIU, Maryland.)
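The rates in this diagram can be turned into yearly volumes and daily link capacities with simple arithmetic. The Python sketch below assumes about 10^7 seconds of data taking per year, a common high-energy-physics rule of thumb rather than a figure from the slide, and assumes links run fully loaded around the clock.

```python
LIVE_SECONDS_PER_YEAR = 1e7  # assumed data-taking time per year (rule of thumb)


def yearly_volume_pb(rate_mb_per_s: float, live_seconds: float = LIVE_SECONDS_PER_YEAR) -> float:
    """Petabytes recorded per year at a sustained rate in MB/s."""
    return rate_mb_per_s * 1e6 * live_seconds / 1e15


def link_tb_per_day(gbit_per_s: float) -> float:
    """Terabytes a fully used link moves in 24 hours."""
    return gbit_per_s * 1e9 / 8 * 86_400 / 1e12


# Online system -> Tier 0 (CERN Computer Center): 150 - 1500 MB/s
for rate in (150, 1500):
    print(f"{rate} MB/s -> {yearly_volume_pb(rate):.1f} PB/yr")

# Tier-to-tier link speeds from the diagram (Gb/s)
for label, gbps in [("Tier 0 -> 1 (low)", 10), ("Tier 0 -> 1 (high)", 40), ("Tier 2 -> 3 (low)", 2.5)]:
    print(f"{label}: {link_tb_per_day(gbps):.0f} TB/day")
```

Under this assumption the 150 – 1500 MB/s range corresponds to roughly 1.5 – 15 PB of raw data per year, the same order of magnitude as the slide's "10s of Petabytes/yr" at the upper end.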

  24. University Tier2 Centers • Tier2 facility: essential university role in extended computing infrastructure; 20 – 25% of a Tier1 national laboratory, supported by NSF; validated by 3 years of experience (CMS, ATLAS, LIGO) • Functions: perform physics analysis, simulations; support experiment software; support smaller institutions • Official role in Grid hierarchy (U.S.): sanctioned by MOU with parent organization (ATLAS, CMS, LIGO); selection by collaboration via careful process; local P.I. with reporting responsibilities

  25. Grids and Globally Distributed Teams • Non-hierarchical: chaotic analyses + productions • These superimpose significant random data flows

  26. Grid3 and Open Science Grid

  27. Grid3: A National Grid Infrastructure • 32 sites, 4000 CPUs: universities + 4 national labs • Part of LHC Grid; running since October 2003 • Sites in US, Korea, Brazil, Taiwan • Applications in HEP, LIGO, SDSS, genomics, fMRI, CS • http://www.ivdgl.org/grid3

  28. Grid3 World Map

  29. Grid3 Components • Computers & storage at ~30 sites: 4000 CPUs • Uniform service environment at each site: the Globus Toolkit provides basic authentication, execution management, and data movement; Pacman installs numerous other VDT and application services • Global & virtual organization services: certification & registration authorities, VO membership services, monitoring services • Client-side tools for data access & analysis: virtual data, execution planning, DAG management, execution management, monitoring • IGOC: iVDGL Grid Operations Center • Grid testbed: Grid3dev (middleware development and testing, new VDT versions, etc.)
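"DAG management" refers to running jobs whose inputs depend on other jobs' outputs, expressed as a directed acyclic graph. The Python sketch below uses the standard library's `graphlib` (Python 3.9+) to group a hypothetical simulation-and-analysis workflow into waves of jobs that could run concurrently. It illustrates the concept only and is not the input format of Condor's DAGMan or of any other VDT tool; the job names are made up.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
WORKFLOW = {
    "generate": set(),
    "simulate": {"generate"},
    "reconstruct": {"simulate"},
    "analyze_a": {"reconstruct"},
    "analyze_b": {"reconstruct"},
    "merge": {"analyze_a", "analyze_b"},
}


def schedule_waves(graph):
    """Group jobs into waves; every job's dependencies finish in an earlier wave."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all jobs whose inputs are available
        waves.append(ready)
        # In a real grid these jobs would be submitted to sites here;
        # this sketch simply marks them finished.
        sorter.done(*ready)
    return waves


for i, wave in enumerate(schedule_waves(WORKFLOW)):
    print(f"wave {i}: {', '.join(wave)}")
```

Grouping by waves keeps the sketch simple; a production DAG manager would instead submit each job as soon as its own parents finish and retry failed nodes.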

  30. Grid3 Applications www.ivdgl.org/grid3/applications

  31. Grid3 Shared Use Over 6 Months (Figure: CPUs in use over time, with ATLAS DC2 and CMS DC04 labeled; axis marked Sep 10)

  32. Grid3 Production Over 13 Months

  33. U.S. CMS 2003 Production • 10M p-p collisions; largest ever • 2x simulation sample • ½ manpower • Multi-VO sharing

  34. Grid3 as CS Research Lab: e.g., Adaptive Scheduling • Adaptive data placement in a realistic environment (K. Ranganathan) • Enables comparisons with simulations
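To make "adaptive data placement" concrete, the Python sketch below implements one very simple policy: a site that repeatedly reads a file held elsewhere gets its own replica once its remote reads reach a threshold. This is a hypothetical illustration of the general idea, not the strategy studied in the work cited on the slide; the file name, site pairing, and threshold are made up.

```python
from collections import Counter
from typing import Dict, Iterable


class PopularityReplicator:
    """Toy placement policy: replicate a file to a site after `threshold` remote reads."""

    def __init__(self, replicas: Dict[str, Iterable[str]], threshold: int = 3):
        # file name -> set of sites currently holding a copy
        self.replicas = {name: set(sites) for name, sites in replicas.items()}
        self.threshold = threshold
        self.remote_reads = Counter()  # (site, file) -> number of remote reads

    def access(self, site: str, filename: str) -> str:
        """Record a read; return "local" or "remote", replicating popular files."""
        if site in self.replicas[filename]:
            return "local"
        self.remote_reads[(site, filename)] += 1
        if self.remote_reads[(site, filename)] >= self.threshold:
            self.replicas[filename].add(site)  # later reads from this site are local
        return "remote"


placer = PopularityReplicator({"run42.root": {"FNAL"}}, threshold=2)
print([placer.access("UF", "run42.root") for _ in range(4)])
```

With a threshold of 2, the first two reads from UF are remote and the next two are local. Running the same access pattern on a live grid and in a simulator is the kind of comparison the slide describes.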

  35. Grid3 Lessons Learned • How to operate a Grid as a facility: tools, services, error recovery, procedures, docs, organization; delegation of responsibilities (project, VO, service, site, …); crucial role of the Grid Operations Center (GOC) • How to support people-to-people relations: face-to-face meetings, phone conferences, one-on-one interactions, mailing lists, etc. • How to test and validate Grid tools and applications: vital role of testbeds • How to scale algorithms, software, process: some successes, but “interesting” failure modes still occur • How to apply distributed cyberinfrastructure: successful production runs for several applications

  36. Grid3  Open Science Grid • Iteratively build & extend Grid3 • Grid3  OSG-0  OSG-1  OSG-2  … • Shared resources, benefiting broad set of disciplines • Grid middleware based on Virtual Data Toolkit (VDT) • Emphasis on “end to end” services for applications • OSG collaboration • Computer and application scientists • Facility, technology and resource providers (labs, universities) • Further develop OSG • Partnerships and contributions from other sciences, universities • Incorporation of advanced networking • Focus on general services, operations, end-to-end performance • Aim for Summer 2005 deployment Paul Avery

  37. http://www.opensciencegrid.org

  38. OSG Organization (Figure: organization chart. OSG Council (all members above a certain threshold; Chair, officers); Executive Board (8–15 representatives; Chair, officers); Core OSG Staff (few FTEs, manager); Advisory Committee; Technical Groups, which sponsor Activities. Members include universities and labs, service providers, sites, researchers, VOs, research Grid projects, and enterprise.)

  39. OSG Technical Groups & Activities • Technical Groups address and coordinate technical areas: they propose and carry out activities related to their given areas; liaise & collaborate with other peer projects (U.S. & international); participate in relevant standards organizations; and their chairs participate in Blueprint, Integration and Deployment activities • Activities are well-defined, scoped tasks contributing to OSG: each Activity has deliverables and a plan, is self-organized and operated, and is overseen & sponsored by one or more Technical Groups • TGs and Activities are where the real work gets done

  40. OSG Technical Groups

  41. OSG Activities

  42. Connections to European Projects: LCG and EGEE

  43. The Path to the OSG Operating Grid (Figure: flow chart. Labels: OSG Integration Activity; readiness plan (effort, resources); readiness plan adopted; VO application software installation; software & packaging; middleware interoperability; functionality & scalability tests; application validation; feedback; metrics & certification; release candidate; release description; OSG Deployment Activity (service deployment); OSG Operations-Provisioning Activity.)

  44. OSG Integration Testbed (Figure: map of testbed sites, including Brazil)

  45. Status of OSG Deployment • OSG infrastructure release accepted for deployment • US CMS MOP “flood testing” successful • D0 simulation & reprocessing jobs running on selected OSG sites • Others in various stages of readying applications & infrastructure (ATLAS, CMS, STAR, CDF, BaBar, fMRI) • Deployment process underway: end of July? • Open OSG and transition resources from Grid3 • Applications will use growing ITB & OSG resources during transition • http://osg.ivdgl.org/twiki/bin/view/Integration/WebHome

  46. Interoperability & Federation • Transparent use of federated Grid infrastructures is a goal • Some sites appear as part of “LCG” as well as part of OSG/Grid3 • D0 is bringing reprocessing to LCG sites through an adaptor node • CMS and ATLAS can run their jobs on both LCG and OSG • Increasing interaction with TeraGrid: CMS and ATLAS sample simulation jobs are running on TeraGrid; plans for a TeraGrid allocation for jobs running in the Grid3 model (group accounts, binary distributions, external data management, etc.)

  47. Networks

  48. Evolving Science Requirements for Networks (DOE High Perf. Network Workshop) See http://www.doecollaboratory.org/meetings/hpnpw/

  49. UltraLight: Advanced Networking in Applications • Funded by ITR 2004 • 10 Gb/s+ network • Caltech, UF, FIU, UM, MIT • SLAC, FNAL • Int’l partners • Level(3), Cisco, NLR

  50. UltraLight: New Information System • A new class of integrated information systems • Includes networking as a managed resource for the first time • Uses “hybrid” packet-switched and circuit-switched optical network infrastructure • Monitor, manage & optimize network and Grid systems in real time • Flagship applications: HEP, eVLBI, “burst” imaging • “Terabyte-scale” data transactions in minutes • Extend real-time eVLBI to the 10 – 100 Gb/s range • Powerful testbed: significant storage, optical networks for testing new Grid services • Strong vendor partnerships: Cisco, Calient, NLR, CENIC, Internet2/Abilene
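"Terabyte-scale data transactions in minutes" follows directly from the link speeds. The Python sketch below computes the ideal transfer time for 1 TB at 10 and 100 Gb/s. The `efficiency` parameter is an added assumption, meant to model real links that sustain less than their nominal rate.

```python
TB = 1e12  # bytes (decimal terabyte)


def transfer_minutes(size_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Minutes to move size_bytes over a link_gbps link at the given efficiency (0-1]."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_gbps * 1e9 * efficiency) / 60


for gbps in (10, 100):
    print(f"1 TB at {gbps} Gb/s: {transfer_minutes(TB, gbps):.1f} min "
          f"({transfer_minutes(TB, gbps, efficiency=0.5):.1f} min at 50% efficiency)")
```

At 10 Gb/s an ideal transfer of 1 TB takes about 13 minutes; at 100 Gb/s, under a minute and a half.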
