1 / 10

Grid Monitoring

Grid Monitoring. A conceptual introduction to GridICE. Sergio Andreozzi sergio.andreozzi@cnaf.infn.it DataTAG WP4. OUTLINE. Definition of Grid Monitoring Service GridICE: Architecture overview Data flow: from resources to users Current deployment in the DataTAG/Glue Testbed.

Download Presentation

Grid Monitoring

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Grid Monitoring A conceptual introduction to GridICE Sergio Andreozzi sergio.andreozzi@cnaf.infn.it DataTAG WP4

  2. OUTLINE • Definition of Grid Monitoring Service • GridICE: • Architecture overview • Data flow: from resources to users • Current deployment in the DataTAG/Glue Testbed

  3. Defining Grid Monitoring Service • Grid Monitoring Service: • the activity of measuring significant grid resources related parameters • in order to • analyzeusage, behaviorandperformance of the grid • detect and notify • fault situations • contract violations (SLA) • user-defined events • Essential part of a Grid Management Activity

  4. Components for a Grid Monitoring Service • Measurement Service: • service able to probe the resources for certain parameters (especially QoS related) • Discovery Service: • service able to find out which resources are currently available • rely on Grid Information Service • Detection and Notification Service: • Fault situations, SLA violations, user-defined events • Data Analyzer: • Performance, Usage, general reports and statistics • Presentation Service: • web-based graphic user interface • role-based view

  5. Measurement Service • service able to probe the resources for certain • parameters • Parameters: • Based on Glue Schema (vers. 1.1) • Richer host related parameter set • soon: • Glue Network Service, Job Details Monitoring, Collective Services • Collecting observations: • Worker node related: • Rely on EDG WP4 fmon (customized) in order to collect worker nodes params at cluster head node • Injecting params in the GIS: • Standard EDG4Glue + extensions for worker nodes info

  6. Discovery Service • service able to find out which resources are • currently available: • rely on available Grid Information Service in order to be able to automatic discovery resources • at the moment, MDS 2.x is supported • Porting on LCG-1/EDG 2.0 Grid Information Service is straightforward

  7. Detection and Notification Service • Rely on the following Nagios functionalities: • Activity Scheduler • Event Notification • At the moment a pre-defined set of events is checked for notification • LCG should quickly define the interesting set of events • Dynamic event configuration is foreseen as a low priority development task

  8. Presentation Service • Web-based graphic user interface • Role-based view: • VO-manager • Resources available to the VO • Total running jobs owned by users part of the VO • Site manager • Status of local resources • Grid manager (e.g. Operator of LCG GOC) • Status of all general services • User • Total Accessible Free Processor

  9. Data flow from resources to user Presentation Service Data Collection (historical) Detection & Notification Data Analyzer Grid Information Service Measurement Service

  10. CentralMonitoringDatabase GRIS (GLUE schema) Second discovery phase information providers First discovery phase GIIS (GLUE schema) web interface EDG-WP4 monitoring agent EDG-WP4 monitoring agent EDG-WP4 fmonserver run metric output run metric output WP4 sensor WP4 sensor read read metric output metric output /procfilesystem /procfilesystem cluster worker node cluster workernode An example scenario: cluster ldap query informationindex ldap query monitoringserver write run ldif output farm monitoringarchive read cluster head node

More Related