Regional Grid Monitoring Introduction & database components

Regional Grid Monitoring Introduction & database components. Wojciech Lapka SAM Team CERN EGEE’09 Conference, 21 - 25 September 2009, Barcelona. Outline. Introduction to the new Service Availability Monitoring System Description of the Database Components

Presentation Transcript


  1. Regional Grid Monitoring: Introduction & database components. Wojciech Lapka, SAM Team, CERN. EGEE’09 Conference, 21 - 25 September 2009, Barcelona

  2-3. Outline. Introduction to the new Service Availability Monitoring System. Description of the Database Components: • Aggregated Topology Provider (ATP) • Metric Description Database (MDDB) • Metric Results Store (Metric Store)

  4. SAM – existing architecture

  5. SAM - enhanced architecture

  6-14. Data Flow (sequence of nine diagram slides)

  15. MyEGEE portal & iGoogle

  16. Outline Introduction to the new Service Availability Monitoring System Description of the Database Components • Aggregated Topology Provider (ATP) • Metric Description Database (MDDB) • Metric Results Store (Metric Store)

  17. Databases - ATP ? ? ? How will it be tested? What to do with test results? What will be tested?

  18. Databases - ATP ? ? Aggregated Topology Provider How will it be tested? What to do with test results? What will be tested?

  19. Databases - ATP What information is provided by the ATP? • Topology information containing: • Projects (WLCG) and grid infrastructures (EGEE, OSG, NDGF) • Sites, Services, VOs and their groupings • Downtimes • A history of the above Why do we need it? • For availability re-calculations, history of grid topology is needed • We couldn’t name groups of arbitrary grid resources (e.g. ATLAS clouds) • Single authoritative information source with topology information
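
The need for topology history in availability re-calculations can be illustrated with a minimal Python sketch. It is not the ATP schema: the record fields, host names and site names below are assumptions chosen for illustration. Each record carries a validity interval, so a query for a past date returns the topology as it was then.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ServiceRecord:
    """One versioned row: a service end-point at a site, valid for a date range.

    All field names are hypothetical; the real ATP tables may differ.
    """
    site: str
    hostname: str
    flavour: str                      # e.g. "CE" or "SRM" (illustrative values)
    infrastructure: str               # e.g. "EGEE", "OSG", "NDGF"
    valid_from: date
    valid_to: Optional[date] = None   # None means "still valid"

    def valid_on(self, day: date) -> bool:
        # The interval is half-open: valid_from is included, valid_to is not.
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


def topology_on(records: List[ServiceRecord], day: date) -> List[ServiceRecord]:
    """Return the services that were part of the topology on the given day."""
    return [r for r in records if r.valid_on(day)]


# Made-up history: ce01 was replaced by ce02 on 1 June 2009.
history = [
    ServiceRecord("SITE-A", "ce01.example.org", "CE", "EGEE", date(2009, 1, 1), date(2009, 6, 1)),
    ServiceRecord("SITE-A", "ce02.example.org", "CE", "EGEE", date(2009, 6, 1)),
    ServiceRecord("SITE-B", "srm.example.org", "SRM", "OSG", date(2009, 3, 1)),
]

# A re-calculation for April 2009 must see ce01, not its replacement ce02.
april_hosts = sorted(r.hostname for r in topology_on(history, date(2009, 4, 15)))
september_hosts = sorted(r.hostname for r in topology_on(history, date(2009, 9, 21)))
print(april_hosts)
print(september_hosts)
```

Without the validity columns only the current topology would be known, and an availability figure recomputed for April would wrongly be based on ce02.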

  20. ATP - why do we need it? • Current flow of Grid topology data across various monitoring tools:

  21. ATP - why do we need it? Streamlined grid topology data flow using the ATP:

  22. ATP – data sources (diagram). Sources shown feeding the ATP sync component of the Aggregated Topology Provider: OSG IM, GOCDB, BDII, Gstat 2.0, CIC Portal, WLCG MOU Portal. Data labels: VO / service mappings, OSG topology & downtimes, EGEE topology & downtimes, VO feeds, installed capacity, Alice Voboxes, project feeds, VO cards

  23. ATP – status What do we have today? • MySQL and Oracle versions • Synchronizer • A programmatic interface to retrieve ATP information (XML/JSON)

  24. ATP – status What needs to be added? • History tables to record changes in topology information • Programmatic Interface - parameterised queries (similar to SAM PI)

  25. Databases ? ? Aggregated Topology Provider How will it be tested? What to do with test results? What will be tested?

  26. Databases - MDDB ? Metric Description Database Aggregated Topology Provider How will it be tested? What to do with test results? What will be tested?

  27. Databases - MDDB What information is provided by the MDDB? • Metrics which are used to test Grid infrastructure • Profiles – combinations of metrics for computation of different availabilities and configuration of Nagios installations Why do we need it? • More flexible availability calculations: • Example: CMS would like to test Tier-1 and Tier-2 sites differently • Maintain a history of which metrics and calculations were valid at each point in time
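
The idea of a profile as a named combination of metrics can be sketched in a few lines of Python. The profile names, metric names and the "all metrics must be OK" rule are assumptions made for this example, not the MDDB's actual content or algorithm.

```python
from typing import Dict, List

# Hypothetical profiles: each names the metrics that count towards availability.
PROFILES: Dict[str, List[str]] = {
    "CMS_TIER1": ["job-submit", "srm-put", "srm-get"],
    "CMS_TIER2": ["job-submit"],
}


def is_available(profile: str, latest_status: Dict[str, str]) -> bool:
    """Treat a site as available if every metric in the profile is OK.

    Metrics outside the profile are ignored, and a metric with no result
    counts as not OK. Both rules are assumptions made for this sketch.
    """
    return all(latest_status.get(metric) == "OK" for metric in PROFILES[profile])


# The same (made-up) results give different answers under different profiles.
results = {"job-submit": "OK", "srm-put": "CRITICAL", "srm-get": "OK"}
tier1_ok = is_available("CMS_TIER1", results)
tier2_ok = is_available("CMS_TIER2", results)
print(tier1_ok, tier2_ok)
```

Keeping the metric list in data rather than in code is what would let a VO such as CMS apply one rule set to Tier-1 sites and another to Tier-2 sites without changing the calculation itself.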

  28. MDDB - Architecture CENTRAL MDDB MDDB Sync Local Cache

  29. MDDB - Status What do we have today? • MySQL and Oracle versions • Integration with ATP • Web User Interface • A programmatic interface to retrieve MDDB information (JSON) What needs to be added? • Synchronizer between Central DB and local (ROC) caches • Interface for populating and querying profiles • Profiles: Mapping with grid resources
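
One possible shape for the planned synchronizer between the central MDDB and a local (ROC) cache is sketched below in Python. The slides do not describe its design, so everything here is an assumption: metric descriptions are modelled as versioned entries, and the cache is refreshed by copying newer versions and dropping entries that no longer exist centrally.

```python
from typing import Dict, Tuple

# Hypothetical representation: metric name -> (version, description).
MetricTable = Dict[str, Tuple[int, str]]


def sync_cache(central: MetricTable, local: MetricTable) -> MetricTable:
    """Return a new local cache brought up to date with the central table.

    Entries are copied when missing locally or when the central version is
    newer; entries no longer present centrally are dropped.
    """
    updated: MetricTable = {}
    for name, (version, description) in central.items():
        if name in local and local[name][0] >= version:
            updated[name] = local[name]          # local copy is already current
        else:
            updated[name] = (version, description)
    return updated


# Made-up content for illustration only.
central = {
    "job-submit": (2, "Submit a test job"),
    "srm-put": (1, "Copy a file to storage"),
}
local = {
    "job-submit": (1, "Submit a job"),
    "old-metric": (1, "Retired"),
}
refreshed = sync_cache(central, local)
print(refreshed)
```

The function returns a new table instead of modifying the cache in place, so a failed or partial synchronization would leave the previous cache usable.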

  30. Databases ? Metric Description Database Aggregated Topology Provider How will it be tested? What to do with test results? What will be tested?

  31. Databases – Metric Store Metric Results Store Metric Description Database Aggregated Topology Provider How will it be tested? What will be tested? What to do with test results?

  32. Databases – Metric Store What information is provided by the Metric Store? • Metric results for service end-points for the grid infrastructure • Status changes for service end-points in the infrastructure What do we have today? • MySQL and Oracle versions • Integration with MDDB and ATP • Per-service status change calculation for Profiles • Data loader • Data from 11 ROCs is being loaded into the Central Metric Store • Some records are rejected (mainly because service end-points are not defined correctly in GOCDB)
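
The difference between metric results and status changes can be shown with a short Python sketch. It is a simplification under stated assumptions: results are plain (timestamp, end-point, status) tuples, profiles are left out, and the first result seen for an end-point counts as a change. The real per-profile calculation in the Metric Store may work differently.

```python
from typing import Dict, Iterable, List, Tuple

Result = Tuple[int, str, str]  # (timestamp, service end-point, status)


def status_changes(results: Iterable[Result]) -> List[Result]:
    """Keep only the results where an end-point's status differs from its previous one."""
    last: Dict[str, str] = {}
    changes: List[Result] = []
    for ts, endpoint, status in sorted(results):   # process in time order
        if last.get(endpoint) != status:
            changes.append((ts, endpoint, status))
            last[endpoint] = status
    return changes


# Made-up results for two end-points, deliberately out of order.
raw = [
    (100, "ce01", "OK"),
    (200, "ce01", "OK"),
    (300, "ce01", "CRITICAL"),
    (150, "srm01", "OK"),
    (400, "ce01", "OK"),
    (450, "srm01", "OK"),
]
changes = status_changes(raw)
for change in changes:
    print(change)
```

Six results collapse to four status changes here, which is why storing changes separately makes it cheaper to answer questions such as "how long was this end-point in CRITICAL?".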

  33. Metric Store – status What needs to be added: • MySQL – tuning of DB (e.g. table partitioning) • Programmatic Interface - parameterised queries • Purging mechanism • Alerting mechanism integrated with Nagios (e.g. when not enough metric results received in given period of time)
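
The alerting and purging items above can be sketched as two small Python functions. The thresholds, the ROC names and the use of plain integer timestamps are assumptions for illustration; how such a check would be wired into Nagios is not covered in the slides and is not shown here.

```python
from typing import Dict, List


def missing_results_alerts(timestamps_by_roc: Dict[str, List[int]], now: int,
                           window: int, minimum: int) -> List[str]:
    """Return the ROCs that delivered fewer than `minimum` results in the last `window` seconds."""
    alerts = []
    for roc, timestamps in sorted(timestamps_by_roc.items()):
        recent = [t for t in timestamps if now - window <= t <= now]
        if len(recent) < minimum:
            alerts.append(roc)
    return alerts


def purge(timestamps: List[int], now: int, retention: int) -> List[int]:
    """Drop results older than the retention period (in seconds)."""
    return [t for t in timestamps if t >= now - retention]


# Made-up arrival times (seconds) of metric results per ROC.
received = {
    "ROC-A": [3400, 3500, 3590],
    "ROC-B": [100, 3550],
}
alerts = missing_results_alerts(received, now=3600, window=600, minimum=2)
kept = purge(received["ROC-B"], now=3600, retention=1000)
print(alerts)
print(kept)
```

In this example ROC-B has sent only one result in the last ten minutes, so it is the only one reported; a monitoring check could turn that list into a warning.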

  34. Central Metric Store Population Active & Passive Checks Results Service Definition Metric & Profile Definition

  35. Outline Introduction to the new Service Availability Monitoring System Description of the Database Components • Aggregated Topology Provider (ATP) • Metric Description Database (MDDB) • Metric Results Store (Metric Store) Publicity

  36. Publicity - Demo Watch our demo and vote for it: • Tuesday 16:30-17:00 • Wednesday lunch • http://tinyurl.com/EgeeSAM (YouTube) • http://www.youtube.com/watch?v=PADq2x8q0kw

  37. Acknowledgments Thanks to the following people for their contributions: • James Casey (CERN) • Emir Imamagic (SRCE) • Pradyumna Joshi (BARC) • Rajesh Kalmady (BARC) • Vaibhav Kumar (BARC) • Steve Traylen (CERN) SAM Team at CERN: • John Shade • David Collados • Karolis Eigelis • Judit Novak • Konstantin Skaburskas

  38. Summary The new, enhanced SAM system, based on Nagios (a popular and powerful open-source tool), will: • Simplify the transition to the EGI era • Help site administrators with fabric monitoring. The ATP, acting as a single authoritative information aggregator, will simplify the job of assimilating grid resource information. The MDDB will allow flexible availability calculations. The Metric Results Store will help the MyEGEE portal display test results. Demo: http://tinyurl.com/EgeeSAM

  39. Thank you! • Questions? • egee3-operations-automation-discuss@cern.ch
