Open Science Grid at Condor Week



  1. Open Science Grid at Condor Week. Ruth Pordes, Fermilab, April 25th 2006

  2. Outline
  • OSG goals and organization
  • Drivers and use today
  • Middleware
  • Focus and roadmap

  3. What, Who is Open Science Grid?
  • High Throughput Distributed Facility
    • Shared opportunistic access to existing clusters, storage and networks.
    • Owner controlled resources and usage policies.
  • Supports Science
    • Funded by NSF and DOE projects.
    • Common technologies & cyber-infrastructure.
  • Open & Inclusive
    • Collaboration of users, developers, grid technologists & facility administrators.
    • Training & help for existing & new administrators and users.
  • Heterogeneous

  4. OSG Organization

  5. OSG Organization
  • Executive Director: Ruth Pordes
  • Facility Coordinator: Miron Livny
  • Application Coordinators: Torre Wenaus & fkw
  • Resource Managers: P. Avery & A. Lazzarini
  • Education Coordinator: Mike Wilde
  • Engagement Coordinator: Alan Blatecky
  • Middleware Coordinator: Alain Roy
  • Operations Coordinator: Leigh Grundhoefer
  • Security Officer: Don Petravick
  • Liaison to EGEE: John Huth
  • Liaison to TeraGrid: Mark Green
  • Council Chair: Bill Kramer
  (Slide callout: Condor project)

  6. OSG Drivers
  • Research groups transitioning from & extending (legacy) systems to Grids:
    • LIGO - gravitational wave physics; STAR - nuclear physics; CDF, D0 - high energy physics; SDSS - astrophysics; GADU - bioinformatics; Nanohub; etc.
    • Evolution of iVDGL, GriPhyN, PPDG.
    • US LHC Collaborations: contribute to & depend on milestones, functionality, capacity of OSG; commitment to general solutions, sharing resources & technologies.
  • Application Computer Scientists (NMI, Condor, Globus, SRM):
    • Contribute to technology, integration, operation.
    • Excited to see technology put to production use!
  • Federations with Campus Grids (GLOW, FermiGrid, GROW, Crimson, TIGRE):
    • Bridge & interface Local & Wide Area Grids.
  • Interoperation & partnerships with national/international infrastructures (EGEE, TeraGrid, INFNGrid):
    • Ensure transparent and ubiquitous access.
    • Work towards standards.

  7. LHC Physics gives schedule and performance stress:
  • Beam starts in 2008: the distributed system must serve 20PB of data across 30PB of disk distributed over 100 sites worldwide, to be analyzed by 100MSpecInt2000 of CPU.
  • Service Challenges give steps to the full system. (Slide callout: 1 GigaByte/sec)
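
A quick back-of-the-envelope check (our arithmetic, not a figure from the slide) shows the 1 GigaByte/sec callout is the right order of magnitude for serving 20PB over a year of running:

```latex
% Year-averaged rate for serving 20 PB of data (~3.15e7 seconds per year)
\frac{20 \times 10^{15}\ \text{bytes}}{3.15 \times 10^{7}\ \text{s}}
  \approx 6 \times 10^{8}\ \text{bytes/s} \approx 0.6\ \text{GB/s}
```

Peak transfers during analysis would need to run well above that average, hence the 1 GigaByte/sec target.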

  8. Priority to many other stakeholders
  • New science enabled by opportunistic use of resources.
    • E.g. from the OSG Proposal: LIGO, with an annual science run collecting roughly a terabyte of raw data per day, considers it critical to transparently carry out LIGO data analysis on the opportunistic cycles available on other VOs' hardware.
  • Opportunity to share use of a “standing army” of resources.
    • E.g. from OSG news: the Genome Analysis and Database Update (GADU) system uses OSG resources to run BLAST, Blocks and Chisel over all publicly available genome sequence data, used by over 2,400 researchers worldwide. GADU processes 3.1 million protein sequences, about 500 batch jobs per day for several weeks, repeated monthly.
  • Interface existing computing and storage facilities and Campus Grids to a common infrastructure.
    • E.g. FermiGrid strategy: allow opportunistic use of otherwise dedicated resources; save effort by implementing shared services; work coherently to move all of our applications and services to run on the Grid.
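
The GADU workload described above is the canonical high-throughput pattern: many independent jobs over partitioned input. A minimal sketch of how such a daily batch might be submitted to a Condor pool (the chunked file layout, file names, and blastall arguments are illustrative assumptions, not GADU's actual pipeline):

```python
import subprocess

N_CHUNKS = 500  # ~500 batch jobs per day, per the slide

# Vanilla-universe submit description; Condor expands $(Process) to 0..N-1.
# Assumes the sequence data was pre-split into chunk_<n>.fasta files.
submit = f"""
universe   = vanilla
executable = blastall
arguments  = -p blastp -d nr -i chunk_$(Process).fasta -o chunk_$(Process).out
output     = blast_$(Process).stdout
error      = blast_$(Process).stderr
log        = blast.log
queue {N_CHUNKS}
"""

with open("gadu_blast.sub", "w") as f:
    f.write(submit)

# One condor_submit call queues the whole batch.
subprocess.run(["condor_submit", "gadu_blast.sub"], check=True)
```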

  9. OSG Use - the Production Grid

  10. OSG Sites Today
  • 23 VOs; more than 18,000 batch slots registered…
  • … but only about 15% of that capacity is used via grid interfaces that are monitored:
    • Large fraction of local use rather than grid use.
    • Not all registered slots are available to grid users.
    • Not all available slots are available to every grid user.
    • Not all slots used are monitored.

  11. Use - Daily Monitoring, 04/23/2006: 2000 running jobs, 500 waiting jobs. Chart callouts: “bridged” GLOW jobs; Genome analysis (GADU).

  12. Use - 1 Year View. Chart callouts: Deploy OSG Release 0.4.0; LHC; OSG “launch”.

  13. Common Middleware provided through the Virtual Data Toolkit
  • Inputs: domain science requirements; Globus, Condor, EGEE etc.; OSG stakeholder and middleware developer (joint) projects.
  • Cycle: test on “VO specific grid” → integrate into VDT Release → deploy on OSG integration grid → include in OSG release & deploy to OSG production.
  (Slide callout: Condor project)

  14. Middleware Services
  • Grid Access: X509 user certificate extended using EGEE-VOMS; Condor-G job submission; jobs may place initial data.
  • VO Management: EGEE-VOMS extends the certificate with attributes based on Role.
  • Grid Site Interfaces:
    • Monitoring & Information.
    • Processing Service / Job Submission: GRAM (pre-WS or WS). Overhead mitigations: Condor Gridmonitor; Condor-managed fork queue. Site and VO based authorization and account mapping.
    • Data Movement: GridFTP.
    • Identity & Authorization: GUMS certificate-to-account mapping.
    • Data Storage and Management: Storage Resource Management (SRM) or local environment variables ($APP, $GRID_WRITE etc); shared file systems; site and VO based authorization and access mapping.
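
To make the access path above concrete, here is a hedged sketch of the user-side flow: obtain a VOMS-extended proxy, then hand the job to Condor-G, which speaks GRAM to the site gatekeeper. The VO name and gatekeeper host are hypothetical; `gt2` selects pre-WS GRAM, while `gt4` would target WS-GRAM:

```python
import subprocess

VO = "myvo"                      # hypothetical VO name
GATEKEEPER = "gate.example.edu"  # hypothetical site gatekeeper

# 1. VO Management: extend the X509 user certificate with role attributes.
subprocess.run(["voms-proxy-init", "-voms", VO], check=True)

# 2. Grid Access: Condor-G grid-universe submit against pre-WS GRAM.
submit = f"""
universe      = grid
grid_resource = gt2 {GATEKEEPER}/jobmanager-condor
executable    = analyze.sh
transfer_input_files = input.dat
output        = job.out
error         = job.err
log           = job.log
queue
"""

with open("gridjob.sub", "w") as f:
    f.write(submit)

subprocess.run(["condor_submit", "gridjob.sub"], check=True)
```

On the site side, GUMS maps the proxy's VO attributes to a local account, and GridFTP carries the staged input file.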

  15. Providing a Secure, Usable, Reliable, Flexible Distributed Facility: OSG allows you to S U R F The Grid.

  16. Security - Management, Operational and Technical Controls
  • Management: risk assessment and security planning; service auditing and checking.
  • Operational: incident response; awareness and training; configuration management.
  • Technical: authentication and revocation; auditing and analysis. End-to-end trust in the quality of code executed on remote CPUs.

  17. Usability - Throughput, Scaling, Fault Diagnosis
  • OSG users need high throughput: focus on effective utilization.
  • VO control of use, for example (see the configuration sketch below):
    • Batch system priority is based on VO activity.
    • Write-authorization and quotas are controlled based on VO role.
    • Data transfer is prioritized based on the role of the user.
  • Usability goals include:
    • Minimize entry threshold for resource owners: minimize software stack; minimize support load.
    • Minimize entry threshold for users: feature-rich software stack; excellent user support.
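
One concrete way a site can express “batch system priority based on VO activity” is Condor's group accounting and quotas. A minimal configuration sketch, with invented group names and slot counts (actual OSG sites configured this in many different ways):

```
# Hypothetical negotiator config for a pool shared by three VOs
GROUP_NAMES             = group_cms, group_ligo, group_gadu
GROUP_QUOTA_group_cms   = 600
GROUP_QUOTA_group_ligo  = 300
GROUP_QUOTA_group_gadu  = 100
# Let a group run past its quota when slots would otherwise sit idle
GROUP_AUTOREGROUP       = True
```

Jobs then attach themselves to a group via a `+AccountingGroup = "group_cms.alice"` line in their submit description (the group and user names are again illustrative).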

  18. Reliability - Central Operations Activities
  • Automated validation of basic services and site configuration (a probe sketch follows below).
  • Configuration of HeadNode and Storage to reduce errors:
    • Remove dependence on a shared file system.
    • Condor-managed GRAM fork queue.
  • Scaling tests of WS-GRAM and GridFTP.
  • Daily Grid Exerciser.
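
The flavor of that automated validation can be sketched as a simple probe: authenticate to each gatekeeper, run a trivial fork job, and push a small file over GridFTP. The host list and test paths are placeholders, and the production site-validation tests were considerably more extensive:

```python
import subprocess

SITES = ["gate1.example.edu", "gate2.example.edu"]  # placeholder gatekeepers

for site in SITES:
    # Authentication-only ping of the GRAM gatekeeper.
    auth = subprocess.run(["globusrun", "-a", "-r", site])
    # Trivial fork job: is the jobmanager actually running jobs?
    fork = subprocess.run(["globus-job-run", site, "/bin/date"])
    # Small GridFTP transfer to exercise the data path.
    copy = subprocess.run(["globus-url-copy", "file:///etc/hostname",
                           f"gsiftp://{site}/tmp/osg_probe_test"])
    ok = auth.returncode == fork.returncode == copy.returncode == 0
    print(f"{site}: {'OK' if ok else 'FAIL'}")
```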

  19. Flexibility
  • Multiple user interfaces.
  • Support for multiple versions of the Middleware Releases.
  • Heterogeneous resources.
  • Sites have local control of use of the resources, including storage and space (no global file system).
  • Distributed support model through VO and Facility Support Centers, which contribute to global operations.

  20. Finally - the OSG Roadmap
  • Sustain the Users, the Facility, and the Experts.
  • Continually improve effectiveness and throughput.
  • Capabilities and schedule driven by Science Stakeholders: maintain the production system while integrating new features.
  • Enable new Sites, Campuses and Grids to contribute while maintaining self-control.
