
The Open Science Grid


Presentation Transcript


  1. The Open Science Grid. Miron Livny, OSG Facility Coordinator, University of Wisconsin-Madison

  2. Some History and background …

  3. U.S. “Trillium” Grid Partnership • Trillium = PPDG + GriPhyN + iVDGL • Particle Physics Data Grid: $18M (DOE) (1999 – 2006) • GriPhyN: $12M (NSF) (2000 – 2005) • iVDGL: $14M (NSF) (2001 – 2006) • Basic composition (~150 people) • PPDG: 4 universities, 6 labs • GriPhyN: 12 universities, SDSC, 3 labs • iVDGL: 18 universities, SDSC, 4 labs, foreign partners • Expts: BaBar, D0, STAR, Jlab, CMS, ATLAS, LIGO, SDSS/NVO • Complementarity of projects • GriPhyN: CS research, Virtual Data Toolkit (VDT) development • PPDG: “End to end” Grid services, monitoring, analysis • iVDGL: Grid laboratory deployment using VDT • Experiments provide frontier challenges • Unified entity when collaborating internationally

  4. From Grid3 to OSG (timeline figure): the transition from Grid3 (11/03) through the OSG 0.2.1, 0.4.0, 0.4.1, and 0.6.0 releases, spanning roughly 2/05 to 7/06.

  5. What is OSG? The Open Science Grid is a US national distributed computing facility that supports scientific computing via an open collaboration of science researchers, software developers, and computing, storage, and network providers. The OSG Consortium is building and operating the OSG, bringing resources and researchers from universities and national laboratories together and cooperating with other national and international infrastructures to give scientists from a broad range of disciplines access to shared resources worldwide.

  6. The OSG Project • Co-funded by DOE and NSF at an annual rate of ~$6M for 5 years starting FY-07 • Current main stakeholders are from physics: the US LHC experiments, LIGO, the STAR experiment, the Tevatron Run II experiments, and astrophysics experiments • A mix of DOE-Lab and campus resources • Active “engagement” effort to add new domains and resource providers to the OSG consortium

  7. OSG Consortium

  8. OSG Project Execution (organization chart) • OSG PI: Miron Livny • Executive Director: Ruth Pordes • Deputy Executive Directors: Rob Gardner, Doug Olson • Resources Managers: Paul Avery, Albert Lazzarini • Facility Coordinator: Miron Livny • Applications Coordinators: Torre Wenaus, Frank Würthwein • Education, Training, Outreach Coordinator: Mike Wilde • Engagement Coordinator: Alan Blatecky • Operations Coordinator: Leigh Grundhoefer • Security Officer: Don Petravick • Software Coordinator: Alain Roy • The chart also shows the OSG Executive Board, external projects, and which roles include provision of middleware.

  9. OSG Principles • Characteristics - • Provide guaranteed and opportunistic access to shared resources. • Operate a heterogeneous environment both in services available at any site and for any VO, and multiple implementations behind common interfaces. • Interface to Campus and Regional Grids. • Federate with other national/international Grids. • Support multiple software releases at any one time. • Drivers - • Delivery to the schedule, capacity and capability of LHC and LIGO: • Contributions to/from and collaboration with the US ATLAS, US CMS, LIGO software and computing programs. • Support for/collaboration with other physics/non-physics communities. • Partnerships with other Grids - especially EGEE and TeraGrid. • Evolution by deployment of externally developed new services and technologies.

  10. Grid of Grids - from Local to Global (diagram): campus, community, and national grids.

  11. Who are you? • A resource can be accessed by a user via the campus, community or national grid. • A user can access a resource with a campus, community or national grid identity.

  12. OSG sites

  13. Chart of running (and monitored) “OSG jobs” in 06/06.

  14. Example GADU run in 04/06

  15. CMS Experiment - an exemplar community grid (map): sites across OSG and EGEE, including CERN, Caltech, Florida, MIT, Purdue, UCSD, UNL, Wisconsin, and sites in France, Germany, Italy, Taiwan, and the UK. Data & jobs move locally, regionally & globally within the CMS grid, transparently across grid boundaries from campus to global.

  16. The CMS Grid of Grids • Job submission: 16,000 jobs per day submitted across EGEE & OSG via the INFN Resource Broker (RB). • Data transfer: peak I/O of 5 Gbps from FNAL to 32 EGEE and 7 OSG sites. All 7 OSG sites have reached the 5 TB/day goal; 3 OSG sites (Caltech, Florida, UCSD) exceeded 10 TB/day.
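
To put these numbers in perspective, here is a back-of-the-envelope sketch (not from the slides) relating a sustained transfer rate in gigabits per second to the volume moved per day: the 5 TB/day per-site goal needs well under 1 Gbit/s sustained, while a full day at the 5 Gbps peak would move on the order of 50 TB.

```python
# Rough conversion between a sustained network rate and daily transfer volume.
def tb_per_day(rate_gbps: float) -> float:
    """Terabytes moved in 24 hours at a sustained rate of `rate_gbps` gigabits/s."""
    seconds_per_day = 24 * 60 * 60            # 86,400 s
    gigabits = rate_gbps * seconds_per_day    # total gigabits in one day
    return gigabits / 8 / 1000                # gigabits -> gigabytes -> terabytes

print(round(tb_per_day(5.0), 1))   # ~54.0 TB/day if 5 Gbit/s were sustained all day
print(round(tb_per_day(0.5), 1))   # ~5.4 TB/day, roughly the per-site goal
```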

  17. CMS transfers on OSG (chart): all sites have exceeded 5 TB per day in June.

  18. CMS transfers from FNAL to the world (chart): the US CMS center at FNAL transfers data to 39 sites worldwide in the CMS global transfer challenge; peak transfer rates of ~5 Gbps are reached.

  19. EGEE–OSG inter-operability • Agree on a common Virtual Organization Management System (VOMS) • Active Joint Security groups: leading to common policies and procedures. • Condor-G interfaces to multiple remote job execution services (GRAM, Condor-C). • File Transfers using GridFTP. • SRM V1.1 for managed storage access. SRM V2.1 in test. • Publish OSG BDII to shared BDII for Resource Brokers to route jobs across the two grids. • Automate ticket routing between GOCs.
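
As an illustration of the Condor-G piece of this interoperability, the sketch below submits a grid-universe job to a remote pre-WS GRAM gatekeeper. The gatekeeper host, jobmanager, and file names are hypothetical; a local Condor-G installation and a valid grid proxy are assumed.

```python
import subprocess
from pathlib import Path

# Minimal Condor-G "grid universe" submit description targeting a remote
# pre-WS GRAM (GT2) gatekeeper. Host, jobmanager, and file names are made up.
submit_description = """\
universe            = grid
grid_resource       = gt2 gatekeeper.example.edu/jobmanager-condor
executable          = analyze.sh
transfer_executable = true
output              = job.out
error               = job.err
log                 = job.log
queue
"""

Path("grid_job.sub").write_text(submit_description)
subprocess.run(["condor_submit", "grid_job.sub"], check=True)  # hands the job to Condor-G
```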

  20. OSG Middleware Layering (diagram, bottom to top) • Infrastructure: NSF Middleware Initiative (NMI): Condor, Globus, MyProxy • Virtual Data Toolkit (VDT) common services: NMI + VOMS, CEMon (common EGEE components), MonALISA, Clarens, AuthZ • OSG Release Cache: VDT + configuration, validation, VO management • Applications: CMS services & framework, ATLAS services & framework, CDF/D0 SAMGrid & framework, LIGO Data Grid.

  21. OSG Middleware Pipeline: domain science requirements → OSG stakeholder and middleware developer (joint) projects with Condor, Globus, EGEE, etc. → test on a “VO-specific grid” → integrate into a VDT release → deploy on the OSG integration grid and test interoperability with EGEE and TeraGrid → provision in an OSG release and deploy to OSG production.

  22. The Virtual Data Toolkit. Alain Roy, OSG Software Coordinator, Condor Team, University of Wisconsin-Madison

  23. What is the VDT? • A collection of software • Grid software (Condor, Globus and lots more) • Virtual Data System (Origin of the name “VDT”) • Utilities • An easy installation • Goal: Push a button, everything just works • Two methods: • Pacman: installs and configures it all • RPM: installs some of the software, no configuration • A support infrastructure

  24. How much software?

  25. Who makes the VDT? • The VDT is a product of the Open Science Grid (OSG) • VDT is used on all OSG grid sites • OSG is new, but the VDT has been around since 2002 • Originally, the VDT was a product of the GriPhyN and iVDGL projects • VDT was used on all Grid2003 sites

  26. Who makes the VDT? 1 mastermind + 3 FTEs: Miron Livny, Alain Roy, Tim Cartwright, Andy Pavlo

  27. Who uses the VDT? • Open Science Grid • LIGO Data Grid • LCG • LHC Computing Grid, from CERN • EGEE • Enabling Grids for E-Science

  28. Why should you care? • The VDT gives insight into technical challenges in building a large grid • What software do you need? • How do you build it? • How do you test it? • How do you deploy it? • How do you support it?

  29. What software is in the VDT? • Job Management • Condor (including Condor-G & Condor-C) • Globus GRAM • Data Management • GridFTP (data transfer) • RLS (replica location) • DRM (storage management) • Globus RFT • Information Services • Globus MDS • GLUE schema & providers • Security • VOMS (VO membership) • GUMS (local authorization) • mkgridmap (local authorization) • MyProxy (proxy management) • GSI SSH • CA CRL updater • Monitoring • MonALISA • gLite CEMon • Accounting • OSG Gratia • Note: The type, quantity, and variety of software is more important to my talk today than the specific software I’m naming
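
As a concrete taste of the data-management tools in this list, here is a sketch of a GridFTP fetch driven from Python via the globus-url-copy client shipped with the VDT. The host name and paths are hypothetical, and a valid grid proxy is assumed to exist already.

```python
import subprocess

# Fetch a file from a GridFTP server using the globus-url-copy client.
# Server, path, and destination are illustrative only.
src = "gsiftp://se.example.edu/store/data/run123/file.root"
dst = "file:///scratch/file.root"

subprocess.run(
    ["globus-url-copy", "-p", "4", src, dst],  # -p 4: use four parallel TCP streams
    check=True,
)
```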

  30. What software is in the VDT? • Client tools • Virtual Data System • SRM clients (V1 and V2) • UberFTP (GridFTP client) • Developer Tools • PyGlobus • PyGridWare • Testing • NMI Build & Test • VDT Tests • Support • Apache • Tomcat • MySQL (with MyODBC) • Non-standard Perl modules • Wget • Squid • Logrotate • Configuration Scripts • And More!

  31. Building the VDT • We distribute binaries • Expecting everyone to build from source is impractical • Essential to be able to build on many platforms, and replicate builds • We build all binaries with NMI Build and Test infrastructure

  32. Building the VDT (diagram): sources from CVS, together with patches from contributors, are built and tested on the NMI Build & Test Condor pool (70+ computers); the tested binaries are packaged into the Pacman cache and RPM downloads used by VDT users.

  33. Testing the VDT • Every night, we test: • Full VDT install • Subsets of VDT • Current release: You might be surprised how often things break after release! • Upcoming release • On all supported platforms • Supported means “we test it every night” • VDT works on some unsupported platforms • We care about interactions between the software
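
The nightly matrix described above can be pictured with a small driver like the following. This is an illustrative sketch only, not the actual NMI/VDT harness; the wrapper script name, platform labels, and subsets are hypothetical.

```python
import itertools
import subprocess

# Sketch of a nightly test matrix: every (release, platform, subset) combination
# gets its own install-and-test run, and a summary line is printed per combination.
releases  = ["current", "upcoming"]
platforms = ["rhas-3", "rhas-4", "sl-3", "fc-4"]
subsets   = ["full", "client-only", "ce-only"]

for release, platform, subset in itertools.product(releases, platforms, subsets):
    result = subprocess.run(
        ["./run_vdt_tests.sh", release, platform, subset],  # hypothetical wrapper script
        capture_output=True, text=True,
    )
    status = "PASS" if result.returncode == 0 else "FAIL"
    print(f"{status}: {release} on {platform} ({subset})")
```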

  34. Supported Platforms • RedHat 7 • RedHat 9 • Debian 3.1 • RHAS 3 • RHAS 3/ia64 • RHAS 3/x86-64 • RHAS 4 • Scientific Linux 3 • Fedora Core 3 • Fedora Core 4 • Fedora Core 4/x86-64 • ROCKS 3.3 • SuSE 9/ia64 • The number of Linux distributions grows constantly, and they have important differences • People ask for new platforms, but rarely ask to drop platforms • System administration for heterogeneous systems is a lot of work

  35. Tests • Results on web • Results via email • A daily reminder!

  36. Deploying the VDT • We want to support root and non-root installations • We want to assist with configuration • We want it to be simple • Our solution: Pacman • Developed by Saul Youssef, BU • Downloads and installs with one command • Asks questions during install (optionally) • Does not require root • Can install multiple versions at same time
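
A minimal sketch of what a non-root Pacman install looks like in practice, driven from Python: Pacman installs into the current working directory, so each VDT version can live in its own user-owned directory and several versions can coexist. The cache URL and package name below are illustrative rather than an exact VDT cache address.

```python
import os
import subprocess

# Non-root install of a VDT package with Pacman into a user-owned directory.
# The cache URL and package name are placeholders, not a real VDT cache address.
install_dir = os.path.expanduser("~/vdt-install")
os.makedirs(install_dir, exist_ok=True)

subprocess.run(
    ["pacman", "-get", "http://vdt.cs.wisc.edu/some_vdt_cache:VDT-Client"],
    cwd=install_dir,   # Pacman installs into the directory it is run from
    check=True,
)
```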

  37. Challenges we struggle with • How should we smoothly update a production service? • In-place vs. on-the-side • Preserve old configuration while making big changes. • As easy as we try to make it, it still takes hours to fully install and set up from scratch • How do we support more platforms? • It’s a struggle to keep up with the onslaught of Linux distributions • Mac OS X? Solaris?
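
One common way to realize the “on-the-side” update pattern mentioned above is to install the new release next to the old one and switch a stable symlink, keeping the old tree around for rollback. The sketch below illustrates the pattern with hypothetical paths; it is not the VDT’s own update tooling.

```python
import os

# "On-the-side" update: install the new release alongside the old one, then flip
# a stable symlink that services and users reference. Paths are hypothetical.
new_release = "/opt/vdt-1.3.11"       # freshly installed tree
current     = "/opt/vdt-current"      # everything points at this stable path

tmp_link = current + ".new"
if os.path.lexists(tmp_link):
    os.remove(tmp_link)
os.symlink(new_release, tmp_link)     # stage the new link next to the live one
os.replace(tmp_link, current)         # atomic switch; re-point at the old tree to roll back
```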

  38. More challenges • Improving testing • We care about interactions between the software: “When using a VOMS proxy with Condor-G, can we run a GT4 job with GridFTP transfer, keeping the proxy in MyProxy, while using PBS as the backend batch system…” • Some people want native packaging formats • RPM • Deb • What software should we have? • New storage management software

  39. One more challenge • Hiring • We need high quality software developers • Creating the VDT involves all aspects of software development • But: Developers prefer writing new code instead of • Writing lots of little bits of code • Thorough testing • Lots of debugging • User support

  40. Where do you learn more? • http://vdt.cs.wisc.edu • Support: • Alain Roy: roy@cs.wisc.edu • Miron Livny: miron@cs.wisc.edu • Official Support: vdt-support@ivdgl.org

  41. Security Infrastructure • Identity: X.509 certificates • OSG is a founding member of the US TAGPMA. • DOEGrids provides script utilities for bulk requests of host certs, CRL checking, etc. • VDT downloads CA information from the IGTF. • Authentication and authorization using VOMS extended attribute certificates. • DN -> account mapping done at the site (multiple CEs, SEs) by GUMS. • Standard authorization callouts to Prima (CE) and gPlazma (SE).
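
The DN -> account idea is easiest to see in the classic grid-mapfile format that mkgridmap generates (a GUMS service answers the same question dynamically). The sketch below parses such a file; the DNs and account names are made up.

```python
import re

# Parse grid-mapfile style entries: a quoted certificate DN followed by the
# local account it maps to. Entries here are invented for illustration; a real
# site would consult GUMS or /etc/grid-security/grid-mapfile.
GRID_MAPFILE = '''
"/DC=org/DC=doegrids/OU=People/CN=Alice Analyst 12345" uscms01
"/DC=org/DC=doegrids/OU=People/CN=Bob Builder 67890" usatlas02
'''

def parse_gridmap(text: str) -> dict:
    """Return a {certificate DN: local account} mapping."""
    mapping = {}
    for match in re.finditer(r'^"([^"]+)"\s+(\S+)', text, re.MULTILINE):
        dn, account = match.groups()
        mapping[dn] = account
    return mapping

accounts = parse_gridmap(GRID_MAPFILE)
print(accounts["/DC=org/DC=doegrids/OU=People/CN=Alice Analyst 12345"])  # -> uscms01
```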

  42. Security Infrastructure • Security process modeled on NIST procedural controls, starting from an inventory of the OSG assets: • Management - risk assessment, planning, service auditing and checking • Operational - incident response, awareness and training, configuration management • Technical - authentication and revocation, auditing and analysis • End-to-end trust in the quality of code executed on a remote CPU - signatures?

  43. User and VO Management • VO registers with the Operations Center • Provides URL for the VOMS service to be propagated to the sites. • Several VOMSes are shared with EGEE as part of WLCG. • User registers through VOMRS or a VO administrator • User added to the VOMS of one or more VOs. • VO responsible for users signing the AUP. • VO responsible for VOMS service support. • Site registers with the Operations Center • Signs the Service Agreement. • Decides which VOs to support (striving for default admit). • Populates GUMS from the VOMSes of all VOs; chooses account UID policy for each VO & role. • VOs and sites provide Support Center contact and joint operations. • For WLCG: US ATLAS and US CMS Tier-1s directly registered to WLCG; other support centers propagated through the OSG GOC to WLCG.

  44. Operations and User Support • Virtual Organization (VO) • Group of one or more researchers • Resource Provider (RP) • Operates Compute Elements and Storage Elements • Support Center (SC) • An SC provides support for one or more VOs and/or RPs • VO support centers • Provide end-user support, including triage of user-related trouble tickets • Community Support • Volunteer effort to provide an SC for RPs and VOs without their own SC, plus a general help discussion mailing list

  45. Operations Model (diagram) • Real support organizations often play multiple roles. • Lines represent communication paths and, in our model, agreements; we have not progressed very far with agreements yet. • Gray shading indicates that OSG Operations is composed of effort from all the support centers.

  46. OSG Release Process (diagram): applications → integration → provision → deploy, moving from the Integration Testbed (ITB, 15-20 sites) to OSG production (50+ sites, including Taiwan, S. Korea, and Sao Paulo).

  47. Integration Testbed (screenshots): ITB and production site status as reported in the GridCat status catalog service, including a facility/operations map and Tier-2 site status.

  48. OSG 0.4.0 Release Schedule (timeline, 01/06 through 9/08): OSG 0.4.0 and 0.4.1, followed by incremental updates, OSG 0.6.0, OSG 0.8.0, and OSG 1.0.0 (with minor releases in between), with functionality driven by milestones such as SC4, CMS CSA06, WLCG Service Commissioned, the ATLAS Cosmic Ray Run, and Advanced LIGO.

  49. OSG Release Timeline (figure, 2/05 through 7/06): production releases OSG 0.2.1, 0.4.0, 0.4.1, and 0.6.0, alongside integration releases ITB 0.1.2, 0.1.6, 0.3.0, 0.3.4, 0.3.7, and 0.5.0.

  50. Deployment and Maintenance • Distribute software through the VDT and OSG caches. • Progress technically via weekly VDT office hours - problems, help, planning - fed from multiple sources (Ops, Int, VDT-Support, mail, phone). • Publish plans and problems through the VDT “To do list”, the Int-Twiki, and ticket systems. • Critical updates and patches follow Standard Operating Procedures.
