Operations and Support for Grid Environments

This presentation discusses the OSG operations model and implementation, including monitoring instruments, support workflows, and conclusions.

Presentation Transcript


  1. Operations and Support for Grid Environments • Leigh Grundhoefer, Indiana University, leighg@indiana.edu

  2. Agenda • Introduction to OSG Operations model and implementation • Monitoring Instruments • Support Workflows • Conclusions

  3. Defining Grid Support • What kind of infrastructure? • Definition of “instrumentation” software • Deployment policies and procedures • Error handling methods • What is the structure for the support? • Try to reduce duplication of effort • Integration of grid support with a variable set of existing resource provider support mechanisms • Interfacing support staff and grid experts

  4. Integrating grid support (diagram labels: NOC, Facility, Machine Operators, Support, Security Czar, Grid ops, Network Admins, System Admin, Resources)

  5. Ops Storage Security Integration site admins Activity OSG landscape VOs & apps TG Mon&Info TG Policy Arch MIS Policy OSG deployment TG Storage Support Centers Technical Group oversees Operations Activity (Ops) TG Security TG Support Centers Chairs

  6. Operations Scope • Runs the grid-wide services, including provisioning and installation of middleware and operational support for those services, resource providers and VOs running on OSG. • Coordinates with other Grids and between support organizations. • Applies User and Service Agreements. • Provides a repository for collected registrations and agreements of participating organizations.

  9. Engineering • Maintain grid-controlled software packages and cache • Provide common grid software support through VDT • Verify software compatibilities • Provision releases of the OSG middleware and services • Troubleshoot service failures • Deployment guidance and assistance • Liaison to other service support centers • Monitor status of grid resources • Publish monitoring information for grid resources

  10. Infrastructure • Trouble Ticketing system and interface • Monitoring tools development and maintenance • Accounting services • Discovery services • Identity services • Grid information index • Grid Catalog • VO-level services for monitoring services • Knowledge base • Mailing Lists • Formal and collaborative web information repositories

  11. Provisioning Tasks • Set up the pre-release candidates for production installation tests • Add version control to production release • Deploy and validate auxiliary services • Adjust middleware configuration setup • Pre-release testing of production installation • Pre-release test of services • Full documentation preparation • Installation Manuals • Release Notes, Change Logs, Patches, Upgrades • Description of the services provided for the release and access information

  12. Support Services • Coordinates and tracks: • Problems for service providers • Security incidents • Requests for assistance • Schedules grid service and middleware changes • Monitors policy compliance • Detailed later in this talk

  13. Agenda • Introduction to OSG Operations model and implementation • Monitoring Instruments • Support Workflows • Conclusions

  14. OSG Service Integration • Grid Catalog -- GridCat • MonALISA • MIS-CI • New “core” monitoring services

  15. Integrated Monitoring Framework • Globus Meta Directory System (LDAP directory) • MonALISA, Monitoring Agents using a Large Integrated Services Architecture (Pub/Sub) • MonALISA repository (WS/WAP) • Ganglia performance monitoring (Multicast/Hierarchical) • Job Monitoring System at the Advanced Center for Distributed Computing (non-invasive archive) • The Grid Site Status Cataloging System at iGOC (human/automatic managed DB)

  16. Grid Telemetry • Information • Site list • Test result • Load • Jobs running • Jobs queued • Heterogeneous • Redundant
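
The telemetry items on this slide (test result, load, jobs running, jobs queued) arrive from heterogeneous and redundant monitors, so some reconciliation step is needed. The following is a minimal Python sketch of one way to do that. The record layout, the field names, and the "newest report wins" rule are assumptions made for illustration; they are not taken from the OSG tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteReport:
    """One telemetry sample for one site from one monitor (hypothetical layout)."""
    site: str
    source: str        # which monitor produced the sample, e.g. "gridcat"
    timestamp: float   # seconds since the epoch
    test_passed: bool
    load: float
    jobs_running: int
    jobs_queued: int


def latest_per_site(reports):
    """Keep the newest report for each site, whichever monitor sent it."""
    newest = {}
    for report in reports:
        current = newest.get(report.site)
        if current is None or report.timestamp > current.timestamp:
            newest[report.site] = report
    return newest


def disagreeing_sites(reports):
    """Return the sites where redundant monitors disagree on the test result."""
    outcomes = {}
    for report in reports:
        outcomes.setdefault(report.site, set()).add(report.test_passed)
    return sorted(site for site, seen in outcomes.items() if len(seen) > 1)
```

Redundancy is useful here precisely because of `disagreeing_sites`: when two monitors report different results for the same site, the fault may lie in a monitor and not in the site.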

  17. What is GridCat? • A Grid Site Status Cataloging System - a web app • High-level simple status map: • Computing Resource Information Collector/Presenter • Static and dynamic information about all sites • Simple grid status presentation on the web • Identifies site readiness • A web application easy to develop and deploy • Displays disk space and CPU slots • Parallel information collecting, storing, and archiving among sites (BE) • Web pages: in templated HTML+PHP+JS (FE)

  18. iGOC GridCat View

  19. iGOC MonALISA View

  20. iGOC MonALISA View (partial)

  21. OSG MIS advancement • Monitoring and Information Services - Core Infrastructure (MIS-CI) • MIS Compute Element and Storage Element • Discovery Service • Consumer Interface • Resource Repository • Information Gathering

  22. MIS-CI Architecture (diagram; labels in extraction order) MIS-CI Resource Throttling Discovery Service User SQLite Local Historical Repository SQLite Remote VO Historical Repository Cron jobs Schema Eval gridftp Resource Gatekeeper diskspace • jobmanager-mis • -profile (default) • jobs • gridftp • diskspace • -accounting • policy • environment • software • VO • statistics • security SQLite Database accounting MIS-CI Consumer Throttling Self-Monitor policy environment software VO SQLite Database Backup statistics MIS-CI Profiles security ? Custom Remote Repository Grid Scheduler OSG Auditing/Accounting
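
The diagram names a local historical repository backed by SQLite and fed by cron jobs. As a rough illustration of that idea only, here is a small Python sketch using the standard `sqlite3` module. The table name, the column names, and the helper functions are hypothetical and do not reflect the actual MIS-CI schema; only the profile names `diskspace` and `jobs` come from the slide.

```python
import sqlite3
import time


def open_repository(path=":memory:"):
    """Create (or open) the local history database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        " profile TEXT NOT NULL,"
        " metric TEXT NOT NULL,"
        " value REAL NOT NULL,"
        " collected_at REAL NOT NULL)"
    )
    return conn


def record_sample(conn, profile, metric, value, collected_at=None):
    """Store one measurement; a cron job could call this once per run."""
    if collected_at is None:
        collected_at = time.time()
    conn.execute(
        "INSERT INTO samples (profile, metric, value, collected_at)"
        " VALUES (?, ?, ?, ?)",
        (profile, metric, value, collected_at),
    )
    conn.commit()


def history(conn, profile, metric):
    """Return (collected_at, value) pairs for one metric, oldest first."""
    rows = conn.execute(
        "SELECT collected_at, value FROM samples"
        " WHERE profile = ? AND metric = ? ORDER BY collected_at",
        (profile, metric),
    )
    return rows.fetchall()
```

Keeping the history in a single local file means a consumer can read past samples from the repository without placing additional load on the gatekeeper host.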

  23. Agenda • Introduction to OSG Operations model and implementation • Monitoring Instruments • Support Workflows • Conclusions

  24. Grid Operations Center Operations Indiana iGOC VO Support Centers Service Support Centers • Provisioning • Ops procedures • Coordination Resource Provider Support Centers

  25. Leveraging the NOC • Global NOC at Indiana University • The Global NOC provides 24x7 network engineering and operations services for research and education networks and international interconnections, including Internet2 Abilene, National LambdaRail, TransPAC and AMPATH networks, the STAR TAP and MANLAN layer 3 international exchange points, and the STAR LIGHT optical exchange. In addition, the Global NOC supports activities of the iVDGL Grid Operations Center and the REN-ISAC cybersecurity Watch Desk. By virtue of the R&E network, grid, and cybersecurity activities, the Global NOC possesses a unique and embracing view of R&E cyberinfrastructure.

  26. NOC Grid Systems and Services (run every 15m) Trouble Tickets • Ticket 894 GOC NOC Mon Nagios Contact DB Monitoring the GOC services

  27. http://www.ivdgl.org/grid3

  28. Problem to Trouble Ticket • Scope • A single resource / Multiple resources • Application wide • VO wide • Grid wide • Operations Resource / Operations Service • Severity • Critical, High, Elevated, Normal • Problem Owner • Problem Contact • Problem Description
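
The ticket fields listed on this slide map naturally onto a small record type. The sketch below is one possible Python rendering. The class and field names, and the numeric ordering of the severity levels, are assumptions for illustration; the slide gives only the labels.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Scope(Enum):
    """Scope values as listed on the slide."""
    SINGLE_RESOURCE = "single resource"
    MULTIPLE_RESOURCES = "multiple resources"
    APPLICATION_WIDE = "application wide"
    VO_WIDE = "VO wide"
    GRID_WIDE = "grid wide"
    OPERATIONS_RESOURCE = "operations resource"
    OPERATIONS_SERVICE = "operations service"


class Severity(IntEnum):
    """Severity labels from the slide; the numeric order is an assumption."""
    NORMAL = 1
    ELEVATED = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class TroubleTicket:
    ticket_id: int
    scope: Scope
    severity: Severity
    owner: str
    contact: str
    description: str


def triage(tickets):
    """Order tickets most severe first; ties fall back to ticket_id."""
    return sorted(tickets, key=lambda t: (-t.severity, t.ticket_id))
```

Using an `IntEnum` for severity lets a ticket queue be sorted directly, while scope stays a plain `Enum` because the slide implies no ordering among scopes.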

  29. Monitoring Event GOC Site Fails Grid Catalog Test (run every 5 hours) Trouble Tickets NOC Monitors Grid Catalog Map • Ticket 854 Grid Experts GOC Mon GridCat MonALISA Contact DB Security/Incident Handling Resource VO support Or Facility Resource Resource
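
The flow on this slide (a site fails the periodic Grid Catalog test, a trouble ticket is opened, and the contact database supplies who to notify) can be sketched in a few lines of Python. Everything below is hypothetical: the function name, the dictionary layout, the fallback contact `"goc"`, and the rule of opening at most one ticket per failing site are illustrative assumptions, not the iGOC procedure.

```python
from itertools import count


def tickets_for_failures(test_results, contact_db, already_open, next_id):
    """Open one ticket per failing site that has no open ticket yet.

    test_results: dict mapping site name to True (passed) or False (failed)
    contact_db:   dict mapping site name to a contact address
    already_open: set of sites with an open ticket (updated in place)
    next_id:      iterator yielding ticket numbers, e.g. itertools.count(1)
    """
    new_tickets = []
    for site in sorted(test_results):
        if test_results[site] or site in already_open:
            continue
        new_tickets.append({
            "id": next(next_id),
            "site": site,
            # Assumed fallback: route to the GOC when the site has no contact.
            "contact": contact_db.get(site, "goc"),
            "description": f"{site} failed the site status test",
        })
        already_open.add(site)
    return new_tickets
```

Tracking `already_open` matters because the test runs repeatedly: without it, a site that stays down would generate a fresh ticket on every test cycle.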

  30. Trouble Tickets • Ticket 803 • Ticket 823 • Ticket 833 • Ticket 843 Reactive Support workflow igoc@ivdgl.org GOC Web form & Telephone Grid Experts Web Docs Developers Contact DB User/Admin Application Failure Planned Outages Security problems Installation help Configuration assistance Identity management Authorization problems Other Support Centers Security/Incident Handling

  31. Agenda • Introduction to OSG Operations model and implementation • Monitoring Instruments • Support Workflows • Conclusions

  32. Operations Enables Applications • Provide operational services that give applications the “instruments” to: • Publish site policies and environment • Know the status of grid middleware on sites • Know the job queue for compute resources • Know the status and load of grid resources • Access historical monitoring information • Manage grid services • Keep apprised of security incidents in the collaboration

  33. Lessons Learned • Configuration management efforts in the development and deployment areas are rewarded many times over during production. • A monitoring infrastructure provides a significant problem-solving advantage, especially when the monitoring is redundant. • Establishing clear communication between resource providers, users and Virtual Organizations is hard.

  34. More Lessons Learned • Human interactions in grid building are costly • Keeping resource provider requirements light led to heavy loads on gatekeeper hosts (monitoring framework) • The diverse set of resource configurations made exchanging job requirements difficult • Troubleshooting: efficiency for submitted jobs was not as high as we’d like.

  35. Upcoming Challenges • Shared problem handling with application-centric and VO-centric support structures • Ticket passing to and from other Grid environments • Establishing a working monitoring framework for distributed storage resources and virtual data cataloging infrastructure

  36. Thank You - leighg@indiana.edu