
Overview of monitoring tools for Grid Systems

Varenna, 12 May 2008

Antonio Pierro

INFN-BARI (Italy)

Antonio.pierro <at> ba.infn.it

Overview on monitoring tools for Grid Systems - Antonio Pierro (INFN-BARI)



Outline

Overview of EGEE monitoring tools:

SAM (Service Availability Monitoring)

GridMap

GStat (Global Grid Information Monitoring System)

GridView

GridICE (infrastructure and application monitoring)




Why do we need monitoring?

  • Resource Utilization and Performance Evaluation

    • Resource observability is needed to optimize Grid utilization

  • Management Decisions

    • To reduce the time spent waiting for resources to become available

    • To always be aware of what is happening

  • Debugging purposes

    • To help the operations team locate and troubleshoot problems

    • Grid resources and services are subject to failures




    Requirements for a Grid Monitoring tool

    • Scalable

    • Dynamic

    • Robust

    • Integrated with other Grid technologies and middleware (security infrastructure, resource brokers, schedulers, ...)




    SAM (introduction)

    Service Availability Monitoring (SAM) framework:

    Monitors all Grid services and nodes, not only CEs

    It is used in the validation process of sites and services

    SAM wiki: http://goc.grid.sinica.edu.tw/gocwiki/SAM

    SAM portal: https://lcg-sam.cern.ch:8443/sam/sam.py

    Service and site status is recorded (several snapshots per day)

    Daily, weekly and monthly availability is calculated by integrating (averaging) the status over the given period

    Used for the official evaluation of T0, T1 and T2 sites
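The availability figure described above can be illustrated with a small time-weighted average. This is a minimal sketch, not SAM's actual algorithm: it assumes that each recorded snapshot's OK/failed status holds until the next snapshot, and `Snapshot` and `availability` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Snapshot:
    """One recorded status sample of a site or service (hypothetical structure)."""
    time: datetime
    ok: bool


def availability(snapshots, start, end):
    """Time-weighted fraction of [start, end) during which the subject was OK.

    Assumption: each snapshot's status holds until the next snapshot, and the
    last one holds until `end`. Time before the first snapshot is not counted.
    """
    samples = sorted((s for s in snapshots if start <= s.time < end),
                     key=lambda s: s.time)
    if not samples:
        return None  # no data: availability is unknown, not zero
    up = timedelta(0)
    for current, following in zip(samples, samples[1:] + [None]):
        until = following.time if following is not None else end
        if current.ok:
            up += until - current.time
    return up / (end - samples[0].time)


# Usage: four snapshots on one day, failed between 06:00 and 12:00.
day = datetime(2008, 5, 12)
snaps = [Snapshot(day + timedelta(hours=h), ok)
         for h, ok in [(0, True), (6, False), (12, True), (18, True)]]
print(availability(snaps, day, day + timedelta(days=1)))  # 0.75
```

Weekly or monthly availability follows from the same function by widening the `[start, end)` window.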




    SAM (performed tests) 1/2

    CE

    job submission - UI->RB->CE->WN chain

    version of the CA certificates and of the middleware software installed (on the WN!)

    replica management tests - using lcg-utils, the default SE defined on the WN and a selected “central” SE

    accessibility of the experiments' software directory - environment variable, directory existence

    accessibility of the VO tag management tools

    other tests: R-GMA client check, APEL accounting records

    SE, SRM

    storing a file from the UI - using the lcg-cr command with LFC registration

    getting the file back to the UI - using the lcg-cp command

    removing the file - using the lcg-del command with LFC de-registration




    SAM (performed tests) 2/2

    LFC

    directory listing - using the lfc-ls command on /grid

    creating a file entry in the /grid/<VO> area

    FTS

    checking whether FTS is published correctly in the BDII

    channel listing - using the glite-transfer-channel-list command with the ChannelManagement service

    transfer test (in development)

    Standalone tests

    GSTAT, RB

    VO-specific tests as well




    SAM - CE sensor tests, France region, VO OPS



    SAM - CE sensor tests, France region, VO OPS

    OK: normal status

    subject may fail soon

    • *** Running R-GMA client test on alifarm57.ct.infn.it ***

    • Inserting tuple: ERROR: Could not contact R-GMA server at grid005.ct.infn.it:8443 – (104, 'Connection reset by peer')

    • ERROR: Could not contact R-GMA server at grid005.ct.infn.it:8443 – (104, 'Connection reset by peer')

    • Failed Timeout when executing test CE-sft-rgma after 600 seconds!

    Error: subject has failed and problem is localized
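The timeout message in the log above is what a test harness reports when a probe does not return in time. A minimal sketch, assuming a hypothetical `run_probe` helper that runs one test as a subprocess and reports only OK or ERROR; SAM's real sensors and status levels are richer than this.

```python
import subprocess
import sys

OK, ERROR = "OK", "ERROR"  # simplified status levels (assumption)


def run_probe(cmd, timeout_s=600):
    """Run one test command and return a (status, message) pair.

    A non-zero exit code, or not finishing within `timeout_s` seconds,
    is reported as ERROR, like the CE-sft-rgma timeout shown above.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return ERROR, f"Timeout when executing test after {timeout_s} seconds!"
    if result.returncode == 0:
        return OK, result.stdout.strip()
    return ERROR, (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    # Stand-in probes; a real sensor would run grid client commands instead.
    print(run_probe([sys.executable, "-c", "print('tuple inserted')"]))
    print(run_probe([sys.executable, "-c", "import time; time.sleep(5)"],
                    timeout_s=1))
```

Bounding every probe with a timeout keeps one hung service (here an unreachable R-GMA server) from stalling the whole test cycle.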



    GridMap

    • It presents the same data as SAM in a different way

    • It is a simple, interactive and user-friendly interface for viewing the state of the Grid

    • Sites or services of the Grid are represented by rectangles of different sizes and colours, allowing two dimensions of data to be visualized simultaneously.

    • This representation of monitoring data requires much less space than conventional sorted tables or bar charts.
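To show how rectangle size and colour carry two dimensions at once, here is a sketch of the simplest treemap layout, slice-and-dice. This is an assumption for illustration: GridMap's own layout algorithm is not described here, and the site names, sizes and colour thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    name: str
    x: float
    y: float
    w: float
    h: float
    colour: str


def colour_for(availability):
    """Hypothetical traffic-light scale for availability in [0, 1]."""
    if availability >= 0.9:
        return "green"
    if availability >= 0.5:
        return "orange"
    return "red"


def slice_layout(items, x, y, w, h):
    """Slice-and-dice treemap (one level).

    `items` are (name, size, availability) tuples. Each rectangle's area is
    proportional to `size` (first dimension); its colour encodes
    `availability` (second dimension). The box is cut along its longer side.
    """
    total = sum(size for _, size, _ in items)
    horizontal = w >= h
    rects, offset = [], 0.0
    for name, size, avail in items:
        frac = size / total
        if horizontal:
            rects.append(Rect(name, x + offset, y, w * frac, h, colour_for(avail)))
            offset += w * frac
        else:
            rects.append(Rect(name, x, y + offset, w, h * frac, colour_for(avail)))
            offset += h * frac
    return rects


# Usage with hypothetical sites: size could be, e.g., the number of CPUs.
sites = [("SITE-A", 400, 0.98), ("SITE-B", 100, 0.60), ("SITE-C", 500, 0.20)]
for r in slice_layout(sites, 0, 0, 100, 50):
    print(r)
```

Practical treemaps usually prefer a squarified layout, which keeps rectangles closer to square, and recurse to nest services inside sites; the one-level version keeps the size/colour idea visible.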




    GridMap

    GridMap Prototype – visualizing the state of the grid

    Daily availability


    the state of the grid – SAM test



    GridView 1/2

    It is a visualization system for viewing monitoring information

    Approach:

    Collects monitoring information from different sources, e.g.:

    SAM, GridFTP monitor, RB logs

    The monitoring records are stored in a central Oracle database at CERN

    Summary data are visualized through a Web interface

    Target: Grid operators, site administrators, VO managers
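The collect-then-summarize approach above can be sketched with an in-memory SQLite database standing in for the central Oracle database. The table layout, source labels and metric names are hypothetical; only the pattern (several sources feeding one store, summarized for display) comes from the slide.

```python
import sqlite3


def build_store():
    """Central table of monitoring records (SQLite stands in for Oracle)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE records (source TEXT, site TEXT, metric TEXT, value REAL)")
    return db


def ingest(db, source, rows):
    """Store (site, metric, value) rows collected from one monitoring source."""
    db.executemany("INSERT INTO records VALUES (?, ?, ?, ?)",
                   [(source, site, metric, value) for site, metric, value in rows])
    db.commit()


def summary(db, metric):
    """Per-site total of one metric across all sources, as a web view might show it."""
    return db.execute(
        "SELECT site, SUM(value) FROM records WHERE metric = ? "
        "GROUP BY site ORDER BY site", (metric,)).fetchall()


# Usage with hypothetical sites and metric names.
db = build_store()
ingest(db, "GridFTP", [("SITE-A", "transferred_GB", 120.0), ("SITE-B", "transferred_GB", 30.0)])
ingest(db, "GridFTP", [("SITE-A", "transferred_GB", 5.0)])
ingest(db, "RB", [("SITE-A", "jobs_running", 42)])
print(summary(db, "transferred_GB"))  # [('SITE-A', 125.0), ('SITE-B', 30.0)]
```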




    GridView (web page) 2/2

    Data transfer statistics

    Running jobs

    Service availability




    GStat 1/2

    GStat is built using Python scripts that generate web-based reports used by Grid site administrators to troubleshoot Information System (IS) issues or to access usage information.

    GStat scripts are executed periodically to query and collect the information published by each site in the Grid infrastructure.

    The published information is then processed by an extensible analysis framework that checks for IS failures and errors.

    Target:

    Grid operators

    Site administrators
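A minimal sketch of the collect-and-check cycle, assuming the information has already been retrieved from the information system as LDIF text (for example by an LDAP query against a BDII). The attribute names are GLUE 1.x CE attributes; the host names are hypothetical, and the specific checks are illustrative examples of IS sanity tests, not GStat's actual test suite.

```python
SAMPLE_LDIF = """\
dn: GlueCEUniqueID=ce1.example.org:2119/jobmanager-pbs-ops,mds-vo-name=local,o=grid
objectClass: GlueCE
GlueCEUniqueID: ce1.example.org:2119/jobmanager-pbs-ops
GlueCEInfoTotalCPUs: 64
GlueCEStateRunningJobs: 12
GlueCEStateWaitingJobs: 3

dn: GlueCEUniqueID=ce2.example.org:2119/jobmanager-pbs-ops,mds-vo-name=local,o=grid
objectClass: GlueCE
GlueCEUniqueID: ce2.example.org:2119/jobmanager-pbs-ops
GlueCEInfoTotalCPUs: 0
GlueCEStateRunningJobs: n/a
"""

REQUIRED = ("GlueCEUniqueID", "GlueCEInfoTotalCPUs",
            "GlueCEStateRunningJobs", "GlueCEStateWaitingJobs")
NUMERIC = REQUIRED[1:]


def parse_ldif(text):
    """Parse plain-text LDIF into a list of {attribute: [values]} dicts.

    Folded lines (continuations starting with one space) are joined;
    base64-encoded values ('attr:: ...') are not decoded in this sketch.
    """
    lines = []
    for line in text.splitlines():
        if line.startswith(" ") and lines:
            lines[-1] += line[1:]
        else:
            lines.append(line)
    entries, current = [], {}
    for line in lines:
        if not line.strip():
            if current:
                entries.append(current)
                current = {}
        elif not line.startswith("#") and ":" in line:
            attr, _, value = line.partition(":")
            current.setdefault(attr.strip(), []).append(value.strip())
    if current:
        entries.append(current)
    return entries


def check_ce(entry):
    """Return a list of problems found in one GlueCE entry (illustrative checks)."""
    problems = [f"missing {a}" for a in REQUIRED if a not in entry]
    for attr in NUMERIC:
        if attr not in entry:
            continue
        try:
            n = int(entry[attr][0])
        except ValueError:
            problems.append(f"{attr} is not an integer: {entry[attr][0]!r}")
            continue
        if n < 0:
            problems.append(f"{attr} is negative")
    if entry.get("GlueCEInfoTotalCPUs") == ["0"]:
        problems.append("GlueCEInfoTotalCPUs is 0")
    return problems


for ce in parse_ldif(SAMPLE_LDIF):
    if "GlueCE" in ce.get("objectClass", []):
        print(ce["GlueCEUniqueID"][0], check_ce(ce) or "OK")
```

Keeping each check as a small independent rule is one way to get the "extensible" property described above: a new test is one more rule, and the collection step is unchanged.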




    GStat 2/2

    The main page of GStat shows the overall status and usage statistics for each site.

    GStat site detailed report

    GStat site resource status




    GridICE: Overview

    • It is a distributed monitoring tool for Grid systems

      • It is evolving in the context of EU-EGEE and many other EU Grid projects

    • It is fully integrated with the gLite-3.x middleware

    • Self-configurable collection and presentation

      • just give the URL of the root Grid Information Service (GIS)

    • Installed servers are monitoring Grid resources in the scope of:

    EGEE EGEE-SWE RDIG EGEE-SEE Grid.it GILDA CMS ATLAS EUMedGrid EUChinaGrid

    EUIndiaGrid BalticGrid LIBI BioinfoGRID EELA OMIIBeGrid




    Recent evolution of GridICE: lightweight sensor + VOMS information

    • Attributes measured by the Job Monitoring sensor

    • To reduce its intrusiveness in terms of resource consumption:

    • Two daemons run continuously and a probe is executed periodically

    • They listen to a set of log files and collect the relevant information

    • Only a few LRMS commands are issued to retrieve job status

    • The status of all jobs is stored in a cache (stateful behaviour)
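The stateful behaviour described above can be sketched as a cache that consumes only newly appended log lines on each poll. The log line format `<timestamp> <job_id> <state>` and the `JobStatusCache` class are hypothetical; GridICE's real sensor parses LRMS and middleware logs whose formats differ.

```python
from collections import Counter


class JobStatusCache:
    """Last known state of every job, updated incrementally from a log.

    Hypothetical log format: '<timestamp> <job_id> <state>'. Each poll parses
    only the lines appended since the previous poll, so the sensor does not
    need to query the LRMS for every job on every run.
    """

    def __init__(self):
        self.state = {}     # job_id -> last seen state
        self._consumed = 0  # number of log lines already processed

    def update(self, log_lines):
        for line in log_lines[self._consumed:]:
            parts = line.split()
            if len(parts) == 3:  # skip malformed lines
                _, job_id, state = parts
                self.state[job_id] = state
        self._consumed = len(log_lines)

    def counts(self):
        return dict(Counter(self.state.values()))


log = ["2008-05-12T10:00 101 queued",
       "2008-05-12T10:01 102 queued",
       "2008-05-12T10:05 101 running"]
cache = JobStatusCache()
cache.update(log)
print(cache.counts())  # {'running': 1, 'queued': 1}
log.append("2008-05-12T11:00 101 done")
cache.update(log)      # only the new line is parsed
print(cache.counts())  # {'done': 1, 'queued': 1}
```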




    Integration with local monitoring systems (LEMON)

    • Grid monitoring integrated with local monitoring

    • The latest server version is very simple to install

      • The client installation may be turned on in the standard LCG middleware installation (no additional operations are needed)

    • The LEMON monitoring system and alarm management are integrated in the new version of the GridICE server

    • The local sensor currently used for farm monitoring can be interfaced with GridICE to collect all the available data

    • The back-end is realized with LEMON

      • Local farms monitored with LEMON can be integrated with GridICE




    LRMSinfo

    The LRMS Info sensor provides aggregated information about the Local Resource Management System (LRMS)




    We focus on the following categories of users:

    VO manager

    actual set of resources accessible to VO members: “How many jobs submitted by my users are running or queued?” (with details of the VOMS groups and/or single user)

    Grid operator

    all resources under responsibility of a Grid Operator Center (“How many resources are available?”)

    Site administrator

    site resources offered to a Grid (“Is there any service down?”)

    Grid users

    The status of their jobs on the Grid
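The VO manager's question ("How many jobs submitted by my users are running or queued?") can be answered by grouping job records by VOMS group and by user. A sketch, assuming hypothetical `(group, user_dn, state)` tuples as input and lowercase state names:

```python
from collections import Counter


def vo_job_summary(jobs, vo):
    """Count running and queued jobs of one VO, per VOMS group and per user.

    `jobs` holds hypothetical (group, user_dn, state) tuples, e.g.
    ('/cms/analysis', '/C=IT/O=INFN/CN=Example User', 'running').
    """
    per_group, per_user = Counter(), Counter()
    root = f"/{vo}"
    for group, user_dn, state in jobs:
        if state not in ("running", "queued"):
            continue
        if group != root and not group.startswith(root + "/"):
            continue  # job belongs to another VO
        per_group[(group, state)] += 1
        per_user[(user_dn, state)] += 1
    return per_group, per_user


jobs = [("/cms", "/C=IT/O=INFN/CN=User One", "running"),
        ("/cms/analysis", "/C=IT/O=INFN/CN=User Two", "queued"),
        ("/cms/analysis", "/C=IT/O=INFN/CN=User Two", "running"),
        ("/atlas", "/C=IT/O=INFN/CN=User Three", "running"),
        ("/cms", "/C=IT/O=INFN/CN=User One", "done")]
groups, users = vo_job_summary(jobs, "cms")
for (group, state), n in sorted(groups.items()):
    print(group, state, n)
```

Matching on `root + "/"` rather than a bare prefix keeps a VO named, say, `/cmsx` from being counted as part of `/cms`.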




    How do we identify the user/role?


    • Users are identified by the digital certificate installed in their browser

      • a valid CA certificate

      • a server based on the HTTPS protocol

    • The new sensor is able to retrieve VOMS information

      • VOMS information: groups and roles of the users submitting the jobs

      • The related role (e.g., site manager, VO manager) can be retrieved from the GridICE database.
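VOMS groups and roles are expressed as FQANs (Fully Qualified Attribute Names) of the form `/vo/group/Role=role/Capability=cap`. The sketch below splits an FQAN into VO, group and role, and looks up a role for a certificate DN; `REGISTERED_ROLES` is a hypothetical stand-in for the role table in the GridICE database, and the DNs are made up.

```python
def parse_fqan(fqan):
    """Split a VOMS FQAN into (vo, group, role).

    Example: '/cms/analysis/Role=production/Capability=NULL'
    -> ('cms', '/cms/analysis', 'production'). 'Role=NULL' means no role.
    """
    group_parts, role = [], None
    for part in filter(None, fqan.split("/")):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue  # capability is ignored in this sketch
        else:
            group_parts.append(part)
    if not group_parts:
        raise ValueError(f"not a valid FQAN: {fqan!r}")
    return group_parts[0], "/" + "/".join(group_parts), role


# Hypothetical stand-in for the role table kept in the GridICE database.
REGISTERED_ROLES = {
    "/C=IT/O=INFN/OU=Personal Certificate/L=Bari/CN=Example Manager": {"VO manager"},
}


def portal_roles(dn):
    """Roles registered for a certificate DN; users with none are standard users."""
    return REGISTERED_ROLES.get(dn) or {"standard user"}


print(parse_fqan("/cms/analysis/Role=production/Capability=NULL"))
print(portal_roles("/C=IT/O=INFN/CN=Someone Else"))
```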



    “Standard user” monitoring (1)

    • A user who has submitted no jobs and has no registered role




    “Standard user” monitoring (2)

    An authenticated user sees only his/her own jobs




    “Standard user” monitoring (3)

    An authenticated user sees only his/her own jobs

    exit status = 0 => successful jobs

    exit status <> 0 => failed jobs
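The exit-status rule above, together with the restriction to the user's own jobs, fits in a few lines. A sketch with a hypothetical `Job` record whose `exit_status` is `None` while the job is still queued or running:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: str
    owner_dn: str
    exit_status: Optional[int]  # None while the job is still queued or running


def user_job_report(jobs, user_dn):
    """Classify the jobs owned by `user_dn`.

    exit status 0 -> successful, any other exit status -> failed,
    no exit status yet -> active. Other users' jobs are never returned.
    """
    report = {"successful": [], "failed": [], "active": []}
    for job in jobs:
        if job.owner_dn != user_dn:
            continue  # an authenticated user sees only his/her own jobs
        if job.exit_status is None:
            report["active"].append(job.job_id)
        elif job.exit_status == 0:
            report["successful"].append(job.job_id)
        else:
            report["failed"].append(job.job_id)
    return report


me = "/C=IT/O=INFN/CN=Example User"  # hypothetical DN
jobs = [Job("j1", me, 0), Job("j2", me, 1), Job("j3", me, None),
        Job("j4", "/C=IT/O=INFN/CN=Other User", 0)]
print(user_job_report(jobs, me))
```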




    Grid monitoring from the VO Manager perspective




    Grid monitoring from the Site Manager perspective




    Acronyms and Abbreviations (1):

    ACL - Access Control List

    APEL - Accounting Processor for Event Logs

    API - Application Programming Interface

    BDII - Berkeley Database Information Index

    CA - Certificate Authority

    CE - Computing Element: a Grid-enabled computing resource

    CERN - European Organisation for Nuclear Research

    dCache - disk pool management system

    DN - Distinguished Name (X.500, LDAP)

    EGEE - Enabling Grids for E-sciencE

    FTS - File Transfer Service (EGEE)

    GARR - Gruppo per l'Armonizzazione delle Reti della Ricerca

    GGUS - Global Grid User Support

    GIIS - Grid Index Information Service (also expanded as Grid Information Index Server). MDS index node; aggregates information

    GILDA - Grid Infn Laboratory for Dissemination Activities

    GRIS - Grid Resource Information Service. Collects information for MDS.

    IN2P3 - Institut National de Physique Nucléaire et de Physique des Particules

    INFN - Istituto Nazionale di Fisica Nucleare (in Italy)

    ISO - International Organization for Standardization

    JDL - Job Description Language

    LB - Logging and Bookkeeping service

    LEMON - LHC Era Monitoring




    Acronyms and Abbreviations (2):

    LCG - LHC Computing Grid

    LDAP - Lightweight Directory Access Protocol

    LDIF - LDAP Data Interchange Format

    LDN - Logical Dataset Name

    LFC - LCG File Catalog

    LFN - Logical File Name

    LHC - Large Hadron Collider. Under construction. Hosts CMS, ATLAS, and other experiments.

    LRMS - Local Resource Management System

    MDS - Meta Directory Service, or Monitoring and Discovery Service (Globus)

    MPI - Message Passing Interface

    PhEDEx - Physics Experiment Data Export (CMS)

    RFIO - Remote File I/O

    R-GMA - Relational Grid Monitoring Architecture (EGEE). A monitoring system similar to MDS

    ROC - Regional Operations Centre

    RLS - Replica Location Service

    SE - Storage Element

    SOAP - Simple Object Access Protocol

    SRM - Storage Resource Management

    VO - Virtual Organization, e.g., an experiment

    VOBOX - VO box

    VOMRS - Virtual Organization Management Registration Service

    VOMS - Virtual Organization Membership Service

    X.509 - (ITU-T standard for Public-key and attribute certificate frameworks)




    References

    SAM

    http://goc.grid.sinica.edu.tw/gocwiki/SAME_Planning

    https://lcg-sam.cern.ch:8443/sam/sam.py?sensors=CE&regions=

    GridMap

    http://gridmap.cern.ch/gm/

    http://cerncourier.com/cws/article/cnl/31986

    GStat

    http://goc.grid.sinica.edu.tw/gstat/

    GridView

    Portal: http://gridview.cern.ch/

    TWiki: https://twiki.cern.ch/twiki/bin/view/LCG/GridView

    GridICE

    http://gridice.forge.cnaf.infn.it/




    Conclusions

    • There are several monitoring tools available for Grid systems

    • Which tool should you use?

      • It depends on your role in the Grid

    • Sometimes you may need to use several tools at the same time to satisfy your needs




    Thank You


