
Cyberinfrastructure and the Role of Grid Computing



Presentation Transcript


  1. Cyberinfrastructure and the Role of Grid Computing (or, “Science 2.0”). Ian Foster, Computation Institute, Argonne National Lab & University of Chicago

  2. “Web 2.0”
  • Software as services
  • Data- & computation-rich network services
  • Services as platforms
  • Easy composition of services to create new capabilities (“mashups”), which themselves may be made accessible as new services
  • Enabled by massive infrastructure buildout
    • Google projected to spend $1.5B on computers, networks, and real estate in 2006
    • Dozens of others are spending substantially
    • Paid for by advertising
  Declan Butler, Nature

  3. Science 2.0: e.g., Virtual Observatories
  (Figure: users reach discovery tools, analysis tools, and data archives through a gateway. Figure: S. G. Djorgovski)

  4. Science 2.0: e.g., Cancer Bioinformatics Grid (caBIG)
  (Figure: a BPEL engine takes a BPEL workflow document and workflow inputs, invokes a data service @ uchicago.edu and analytic services @ duke.edu and @ osu.edu, and returns the workflow results. caBIG: https://cabig.nci.nih.gov/; BPEL work: Ravi Madduri et al.)
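The pattern in this figure, a BPEL engine chaining a data service and two analytic services, can be illustrated with a minimal Python sketch. This is not BPEL and not the caBIG API; the endpoint URLs and JSON payloads are placeholders chosen only to show the orchestration shape.

```python
# Sketch of the orchestration the BPEL engine performs: fetch data from a
# data service, pass it through two analytic services, collect the results.
# All endpoints below are hypothetical, not real caBIG services.
import json
from urllib.request import Request, urlopen

DATA_SERVICE = "http://data.example.uchicago.edu/query"     # hypothetical
ANALYTIC_A   = "http://analytic.example.duke.edu/analyze"   # hypothetical
ANALYTIC_B   = "http://analytic.example.osu.edu/analyze"    # hypothetical

def invoke(url, payload):
    """POST a JSON payload to a service and return its JSON response."""
    req = Request(url, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    return json.load(urlopen(req))

def run_workflow(workflow_inputs):
    data = invoke(DATA_SERVICE, workflow_inputs)   # workflow inputs -> data service
    step1 = invoke(ANALYTIC_A, data)               # analytic service @ duke.edu
    step2 = invoke(ANALYTIC_B, step1)              # analytic service @ osu.edu
    return step2                                   # workflow results

if __name__ == "__main__":
    print(run_workflow({"query": "example gene list"}))
```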

  5. The Two Dimensions of Science 2.0: Function and Resource
  • Decompose across the network; clients integrate dynamically
    • Select & compose services
    • Select “best of breed” providers
    • Publish results as new services
  • Decouple resource & service providers
  (Figure: users, discovery tools, analysis tools, data archives. Fig: S. G. Djorgovski)

  6. Technology Requirements: Integration & Decomposition
  • Service-oriented applications
    • Wrap applications & data as services
    • Compose services into workflows
  • Service-oriented Grid infrastructure
    • Provision physical resources to support application workloads
  (Figure: users drive composition (workflows), invocation (application services), and provisioning. “The Many Faces of IT as Service”, ACM Queue, Foster, Tuecke, 2005)
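As a rough illustration of “wrapping an application as a service”, the sketch below exposes a command-line program behind an HTTP endpoint using only the Python standard library. The wrapped command, port, and JSON request format are illustrative assumptions, not part of the Globus tooling described on the following slides.

```python
# Sketch: wrap a command-line application behind an HTTP service endpoint.
# The wrapped command ("echo") is a stand-in for a real scientific code.
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

APPLICATION = ["echo"]   # placeholder application

class ApplicationService(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        params = json.loads(self.rfile.read(length) or b"{}")
        # Run the wrapped application with the caller's arguments.
        result = subprocess.run(APPLICATION + params.get("args", []),
                                capture_output=True, text=True)
        body = json.dumps({"stdout": result.stdout,
                           "returncode": result.returncode}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8081), ApplicationService).serve_forever()
```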

  7. Globus Software Enables Grid Infrastructure
  • Web service interfaces for behaviors relating to integration and decomposition
    • Primitives: resources, state, security
    • Services: execution, data movement, …
  • Open source software that implements those interfaces
    • In particular, Globus Toolkit (GT4)
  • All standard Web services
    • “Grid is a use case for Web services, focused on resource management”

  8. Open Source Grid Software: Globus Toolkit v4 (www.globus.org)
  (Figure: GT4 component map.
  Security: Authentication & Authorization, Credential Mgmt, Delegation, Community Authorization.
  Data Mgmt: GridFTP, Reliable File Transfer, Replica Location, Data Replication, Data Access & Integration.
  Execution Mgmt: Grid Resource Allocation & Management, Community Scheduling Framework, Workspace Management, Grid Telecontrol Protocol.
  Info Services: Index, Trigger, WebMDS.
  Common Runtime: Java, C, and Python runtimes.)
  Globus Toolkit Version 4: Software for Service-Oriented Systems, LNCS 3779, 2-13, 2005
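For concreteness, a data transfer with the toolkit's GridFTP client can be scripted as below. This assumes a GT4 installation with globus-url-copy on the PATH and a valid proxy credential (e.g., from grid-proxy-init); the host and paths are placeholders.

```python
# Sketch of driving a GridFTP transfer via the globus-url-copy client.
import subprocess

def gridftp_copy(src_url, dst_url):
    """Copy one file between GridFTP endpoints (or to a local file:// URL)."""
    subprocess.run(["globus-url-copy", src_url, dst_url], check=True)

if __name__ == "__main__":
    gridftp_copy("gsiftp://source.example.org/data/run01.dat",  # hypothetical host
                 "file:///tmp/run01.dat")
```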

  9. dev.globus: Community-Driven Improvement of Globus Software (http://dev.globus.org), NSF OCI
  • Guidelines (Apache)
  • Infrastructure (CVS, email, bugzilla, Wiki)
  • Projects include …

  10. Hosted Science Services
  1) Integrate services from external sources
    • Virtualize “services” from providers
  2) Coordinate & compose
    • Create new services from existing ones
  (Figure: community content and community services layered over provider services and provider capacity. “Service-Oriented Science”, Science, 2005)

  11. The Globus-Based LIGO Data Grid (LIGO Gravitational Wave Observatory)
  • Replicating >1 Terabyte/day to 8 sites (including Birmingham, Cardiff, and AEI/Golm)
  • >40 million replicas so far
  • MTBF = 1 month
  www.globus.org/solutions

  12. Data Replication Service
  • Pull “missing” files to a storage system
  (Figure: given a list of required files, the Data Replication Service uses data location services (Local Replica Catalog, Replica Location Index) and data movement services (Reliable File Transfer Service, GridFTP) to replicate the files locally. “Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005)
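A minimal sketch of the pull-“missing”-files pattern follows. The catalog lookups are stubbed with placeholder functions standing in for the Local Replica Catalog and Replica Location Index, and the transfer step shells out to globus-url-copy; the real Data Replication Service works through the RLS and RFT service interfaces rather than these stand-ins.

```python
# Sketch of the "pull missing files" pattern: diff the required list against
# local replicas, locate remote copies, and fetch what is missing.
import subprocess

def local_replicas():
    """Stand-in for a Local Replica Catalog query: files already on this site."""
    return {"run01.dat"}                                         # placeholder

def locate_remote(filename):
    """Stand-in for a Replica Location Index lookup: a GridFTP URL for the file."""
    return f"gsiftp://remote.example.org/archive/{filename}"     # hypothetical host

def replicate(required_files, local_storage="file:///data/"):
    missing = set(required_files) - local_replicas()
    for name in sorted(missing):
        src = locate_remote(name)
        dst = local_storage + name
        subprocess.run(["globus-url-copy", src, dst], check=True)  # data movement
        # A real DRS would then register the new replica in the catalogs.

if __name__ == "__main__":
    replicate(["run01.dat", "run02.dat", "run03.dat"])
```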

  13. Example: Biology
  • Public PUMA Knowledge Base: information about proteins analyzed against ~2 million gene sequences
  • Back-office analysis on the Grid: millions of BLAST, BLOCKS, etc., runs on OSG and TeraGrid
  Natalia Maltsev et al., http://compbio.mcs.anl.gov/puma2
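The back-office analysis is essentially a many-task pattern: one independent analysis per sequence, fanned out across whatever resources are available. The sketch below shows that shape with a local process pool and a placeholder command; the actual PUMA runs were dispatched to OSG and TeraGrid schedulers rather than a single machine, and the directory layout here is invented.

```python
# Sketch of the many-task fan-out: one analysis job per sequence file.
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def analyze(sequence_file: Path) -> int:
    """Run one analysis task on one sequence file (placeholder command)."""
    result = subprocess.run(["echo", "analyze", str(sequence_file)])  # stand-in
    return result.returncode

if __name__ == "__main__":
    sequences = sorted(Path("sequences").glob("*.fasta"))  # hypothetical layout
    with ProcessPoolExecutor() as pool:
        failures = sum(rc != 0 for rc in pool.map(analyze, sequences))
    print(f"{len(sequences)} tasks, {failures} failures")
```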

  14. Example: Earth System Grid
  • Climate simulation data
    • Per-collection control
    • Different user classes
    • Server-side processing
  • Implementation (GT)
    • Portal-based User Registration (PURSE)
    • PKI, SAML assertions
    • GridFTP, GRAM, SRM
  • >2000 users
  • >100 TB downloaded
  www.earthsystemgrid.org (DOE OASCR)

  15. Under the Covers

  16. Example: Astro Portal Stacking Service
  • Purpose: on-demand “stacks” of image cutouts at random locations within a ~10 TB dataset (Sloan data), accessed via Web page or Web Service
  • Challenge: rapid access to 10-10K “random” files; time-varying load
  • Solution: dynamic acquisition of compute & storage
  (Figure: many cutouts summed into one stacked image.)
  Joint work with Ioan Raicu & Alex Szalay
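The core of the service is the stacking (co-addition) step. The sketch below shows it with NumPy, stubbing out the cutout fetch that the real service performs against the ~10 TB Sloan dataset on dynamically acquired storage; the array sizes and random “cutouts” are purely illustrative.

```python
# Sketch of image stacking: co-add small cutouts at many sky locations.
import numpy as np

def fetch_cutout(location, size=64):
    """Stand-in for reading a cutout around one sky location from the archive."""
    rng = np.random.default_rng(abs(hash(location)) % (2**32))
    return rng.normal(size=(size, size))

def stack(locations, size=64):
    """Co-add cutouts at many locations to boost signal-to-noise."""
    total = np.zeros((size, size))
    for loc in locations:
        total += fetch_cutout(loc, size)
    return total / len(locations)

if __name__ == "__main__":
    locations = [(185.0 + i * 0.01, 15.0) for i in range(100)]  # (RA, Dec) pairs
    print(stack(locations).shape)
```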

  17. Preliminary Performance (TeraGrid, LAN GPFS) Joint work with Ioan Raicu & Alex Szalay

  18. Example: CyberShake
  Calculate hazard curves by generating synthetic seismograms from an estimated rupture forecast.
  (Figure: workflow stages: rupture forecast → strain Green tensor → synthetic seismogram → spectral acceleration → hazard curve → hazard map. Tom Jordan et al., Southern California Earthquake Center)
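The workflow shape, each stage feeding the next until a hazard curve emerges, can be sketched as chained functions. Every stage body below is a stub with made-up numbers; the real CyberShake calculation runs these stages as a large workflow on TeraGrid resources, as the next slide shows.

```python
# Sketch of the CyberShake pipeline shape: rupture forecast -> strain Green
# tensor -> synthetic seismograms -> spectral acceleration -> hazard curve.
def rupture_forecast(site):
    return [{"rupture": i, "magnitude": 6.5 + 0.1 * i} for i in range(3)]   # stub

def strain_green_tensor(site):
    return {"site": site, "sgt": "volume"}                                  # stub

def synthetic_seismograms(ruptures, sgt):
    return [{"rupture": r["rupture"], "peak": 0.1 * r["magnitude"]} for r in ruptures]

def spectral_acceleration(seismograms):
    return [s["peak"] for s in seismograms]

def hazard_curve(accelerations, thresholds=(0.1, 0.5, 1.0)):
    # Fraction of ruptures exceeding each shaking threshold (toy statistic).
    return {t: sum(a > t for a in accelerations) / len(accelerations)
            for t in thresholds}

if __name__ == "__main__":
    site = "USC"
    ruptures = rupture_forecast(site)
    sgt = strain_green_tensor(site)
    seis = synthetic_seismograms(ruptures, sgt)
    print(hazard_curve(spectral_acceleration(seis)))
```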

  19. Enlisting TeraGrid Resources
  (Figure: a workflow scheduler/engine, guided by a data catalog, provenance catalog, VO service catalog, and VO scheduler, moves data between SCEC storage and TeraGrid storage and runs computation on TeraGrid compute resources: 20 TB of data, 1.8 CPU-years. Ewa Deelman, Carl Kesselman, et al., USC Information Sciences Institute)

  20. Science 1.0 → Science 2.0
  • Gigabytes → Terabytes
  • Tarballs → Services
  • Journals → Wikis
  • Individuals → Communities
  • Community codes → Science gateways
  • Supercomputer centers → TeraGrid, OSG, campus
  • Makefile → Workflow
  • Computational science → Science as computation
  • Physical sciences → All sciences (& humanities)
  • Computational scientists → All scientists
  • NSF-funded → NSF-funded

  21. Science 2.0 Challenges
  • A need for new technologies, skills, & roles
    • Creating, publishing, hosting, discovering, composing, archiving, explaining … services
  • A need for substantial software development
    • “30-80% of modern astronomy projects is software” (S. G. Djorgovski)
  • A need for more & different infrastructure
    • Computers & networks to host services
    • Can we leverage commercial spending? To some extent, but not straightforward

  22. Acknowledgements
  • Carl Kesselman for many discussions
  • Many colleagues, including those named on slides, for research collaborations and/or slides
  • Colleagues involved in the TeraGrid, Open Science Grid, Earth System Grid, caBIG, and other Grid infrastructures
  • Globus Alliance members for Globus software R&D
  • DOE OASCR, NSF OCI, & NIH for support

  23. For More Information
  • Globus Alliance: www.globus.org
  • dev.globus: dev.globus.org
  • Open Science Grid: www.opensciencegrid.org
  • TeraGrid: www.teragrid.org
  • Background: www.mcs.anl.gov/~foster
  • The Grid (2nd Edition): www.mkp.com/grid2
  Thanks to DOE, NSF, and NIH for research support!
