Grid discovery and monitoring systems


Grid Discovery and Monitoring Systems

Laura Pearlman

USC/Information Sciences Institute

With materials from Ben Clifford and others from the Globus Project Team

Outline


  • Overview of information systems

  • Some real implementations

    • Globus MDS2 / BDII

    • Globus MDS4

    • Inca

    • GMA / R-GMA

Discovery and Monitoring

  • Discovery: finding which resources exist at a given moment, possibly restricted to those meeting some criteria

    • E.g., “find linux boxes with Java 1.5 installed”

  • Monitoring: determining the state of one or more resources

    • E.g., “how much memory is free on machine X?”

  • “Monitoring” and “Discovery” information sometimes overlap

    • “find me machines with 2G memory” vs. “how much memory does Machine X have”
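The distinction above can be sketched in a few lines of Python. This is only an illustration: the `Resource` fields, the catalog contents, and the `discover` / `monitor` helpers are all made up for the example and do not correspond to any of the systems discussed later.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    os: str
    memory_gb: float        # total memory: fairly static, typical discovery criterion
    free_memory_gb: float   # free memory: changes constantly, typical monitoring value
    software: set = field(default_factory=set)

# Hypothetical catalog of three machines.
CATALOG = [
    Resource("hostA", "linux", 2.0, 0.5, {"java-1.5"}),
    Resource("hostB", "linux", 4.0, 3.0, set()),
    Resource("hostC", "solaris", 2.0, 1.5, {"java-1.5"}),
]

def discover(catalog, predicate):
    """Discovery: return the names of all resources meeting some criteria."""
    return [r.name for r in catalog if predicate(r)]

def monitor(catalog, name, attribute):
    """Monitoring: report one piece of state for one named resource."""
    for r in catalog:
        if r.name == name:
            return getattr(r, attribute)
    raise KeyError(name)

# "find linux boxes with Java 1.5 installed"
linux_with_java = discover(
    CATALOG, lambda r: r.os == "linux" and "java-1.5" in r.software)

# "how much memory is free on machine X?"
free_on_a = monitor(CATALOG, "hostA", "free_memory_gb")
```

Note that the same attribute (`memory_gb`) can serve either kind of question, which is the overlap the last bullet points out.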

Examples of Useful Information

  • Characteristics of a compute resource

    • Software available, networks connected to, load, type of CPU, disk space

  • Characteristics of a network

    • Bandwidth and latency, protocols

  • Information about a service

    • Contact info, version number, etc.

Who uses this information?

  • Individual users, trying to pick the ‘best’ resource

  • Brokers or workflow systems trying to find suitable resources

  • VO administrators who want to know the state of every resource.

    • System administrators may use this information, but probably also have local site monitoring systems in place

What Interfaces are Needed?

  • Graphic and command-line interfaces for individual users and administrators

  • Programmatic interfaces for brokers, workflow systems, etc.

  • Asynchronous notifications for administrators

    • “send me mail when we’re almost out of disk space”

Monitoring/Discovery Problems in Grids

  • Dynamic in nature

    • VOs come and go

    • Resources join and leave VOs

    • Resources change status and fail

  • Geographically distributed users

  • Geographically distributed resources

  • Heterogeneous implementations

Grid Information: Facts of Life

  • Information is always old

  • Distributed state hard to obtain

  • Components will fail

    • We must deal with this gracefully

  • Scalability and overhead

  • Many different usage scenarios

Resource Discovery/Monitoring

[Figure: dispersed users]

  • Distributed users and resources

  • Variable resource status

  • Variable grouping



Resource Discovery/Monitoring

[Figure: dispersed users]

  • Some resources have failed

  • A network partition has occurred

  • Still, some work can get done…




  • Large numbers

    • Many resources

    • Many users

  • Independence

    • Resources shouldn’t affect one another

    • VOs shouldn’t affect one another

  • Graceful degradation of service

    • “As much function as possible”

    • Tolerate partitions, prune failures

Failure Scenarios

  • User is disconnected

  • Resource fails or is disconnected

  • Discovery service fails or is disconnected

  • Network partition

When a user is disconnected

  • This should not adversely affect other users

  • Some state (such as the user’s subscriptions) may need to be cleaned up.

  • Some systems use soft-state to deal with this issue:

    • Subscriptions are valid for a limited time and must be periodically refreshed

    • If the user does not come back in time to refresh the subscription, it will be removed automatically.
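A minimal sketch of the soft-state idea, assuming a fixed lifetime per subscription. The `SoftStateTable` class and its method names are hypothetical; the clock is injectable only so the behaviour can be shown without real waiting.

```python
import time

class SoftStateTable:
    """Subscriptions expire unless refreshed within `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._expires = {}   # subscriber id -> expiry time

    def refresh(self, subscriber):
        """Create or renew a subscription."""
        self._expires[subscriber] = self.clock() + self.ttl

    def prune(self):
        """Drop every subscription whose lifetime has run out."""
        now = self.clock()
        expired = [s for s, t in self._expires.items() if t <= now]
        for s in expired:
            del self._expires[s]
        return expired

    def active(self):
        """Return the subscribers that are still registered."""
        self.prune()
        return sorted(self._expires)

# Usage with a fake clock: alice keeps refreshing, bob disconnects.
now = [0.0]
table = SoftStateTable(ttl=10, clock=lambda: now[0])
table.refresh("alice")
table.refresh("bob")
now[0] = 6
table.refresh("alice")
now[0] = 12
still_active = table.active()
```

No explicit "unsubscribe" is needed: a disconnected user's state disappears on its own, at the cost of periodic refresh traffic from the users who remain.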

When a resource disappears

  • Monitoring services should indicate that the resource is no longer there

  • Discovery services should stop advertising the resource

  • Neither of these can be guaranteed to happen instantaneously.

When a discovery service dies

  • Users cannot discover new resources.

  • They may have old information cached – this data is still useful, although it degrades in quality/usefulness over time.

  • Users can contact the resources directly and determine their status.

  • Some implementations allow for mirroring of discovery services.

When the network is partitioned

  • This can be seen as a generalization of the previous scenarios: each of them can be modelled as an appropriate network partition.

  • If there is a discovery service in a user’s partition, the user should be able to discover resources in that partition.

Information Systems

  • We sometimes refer to Discovery and Monitoring as “Information Systems”

    • This is misleading, as we’re not including general-purpose database systems

  • Discovery and Monitoring information is:

    • Often stale as soon as it’s reported

    • Sometimes inconsistent

    • Often updated by running probes, either on-demand or periodically

Discovery Services

  • Used to locate monitoring services with information about resources.

  • May cache some resource data

    • May even cache enough resource data to act as a monitoring system.

  • Generally involve a database-like query interface

    • Languages like LDAP, XPath, SQL

  • Usually a relatively small number (maybe even just one, or one with a mirror) are deployed in a VO.

Two Models for Discovery Services

[Figure: two models for discovery services; surviving label fragment: “& Discovery”]

Monitoring Services

  • Used to monitor the state of a resource

  • Service interface usually involves db-like queries

    • With languages like LDAP, XPath, SQL

    • Often also provides for asynchronous notification

  • Typically also includes a back-end provider interface

    • Allows locally-written scripts, programs, etc. to collect information for the monitoring service

  • Typically deployed on each host that houses a resource.

How Different Implementations Differ

  • Overall architecture

    • Are monitoring and discovery separate?

  • Wire protocol

    • LDAP, Web Services, custom

  • Query Language

    • LDAP, XPath, SQL

  • Caching Strategies

  • Schemas

    • Really more a deployment issue

MDS2 / BDII history

  • MDS2 was developed as part of the Globus Toolkit

    • It’s now superseded by MDS4, which has a different architecture.

  • BDII is a reimplementation of MDS2 by EGEE, and is still in use.

MDS2 Architecture Overview

  • The Grid Resource Information Service (GRIS) collects information about a local resource and responds to requests for that information

    • Uses pluggable information providers

  • The Grid Index Information Service (GIIS) aggregates information from various GRIS servers

  • Users may query the GIIS for aggregated information or query the GRIS servers directly.

  • GIIS servers may be arranged hierarchically.

MDS2 Architecture

[Figure: MDS2 architecture]

MDS2 GIIS

  • Grid Index Information Service (GIIS) servers aggregate information from GRIS servers and other GIIS servers.

    • These other servers register themselves to the GIIS server.

    • Registrations must be periodically refreshed

  • GIIS servers cache information (results from previous queries).

  • If a GIIS server receives a query for which there is no fresh cached information, it forwards the query to its registered servers.

MDS2 GRIS

  • A Grid Resource Information Service (GRIS) server:

    • Runs on each host that has resources to be monitored.

    • Accepts requests for information about local resources

      • May come from users or GIIS servers

    • Runs a local “information provider” to collect and format the information

      • Unless the requested information is cached and relatively fresh

    • Caches the information and replies to the request
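The GIIS and GRIS behaviour described on the last two slides can be sketched as follows. This is a toy model, not MDS2 code: the class and method names, the single-value cache, and the time-to-live parameters are all assumptions, and the real services speak LDAP rather than returning Python dictionaries.

```python
import time

class GRIS:
    """Answers queries about one host; runs its provider only when the cache is stale."""

    def __init__(self, host, provider, cache_ttl, clock=time.monotonic):
        self.host, self.provider = host, provider
        self.cache_ttl, self.clock = cache_ttl, clock
        self._cached, self._cached_at = None, None
        self.provider_runs = 0

    def query(self):
        now = self.clock()
        if self._cached is None or now - self._cached_at > self.cache_ttl:
            self._cached = self.provider()     # run the information provider
            self._cached_at = now
            self.provider_runs += 1
        return {self.host: self._cached}

class GIIS:
    """Aggregates answers from registered servers (GRIS or other GIIS)."""

    def __init__(self, registration_ttl, cache_ttl, clock=time.monotonic):
        self.registration_ttl = registration_ttl
        self.cache_ttl, self.clock = cache_ttl, clock
        self._registered = {}                  # server -> registration expiry
        self._cached, self._cached_at = None, None

    def register(self, server):
        """Register or refresh a server; registrations lapse if not refreshed."""
        self._registered[server] = self.clock() + self.registration_ttl

    def query(self):
        now = self.clock()
        if self._cached is not None and now - self._cached_at <= self.cache_ttl:
            return self._cached                # fresh cached answer
        result = {}
        for server, expiry in list(self._registered.items()):
            if expiry <= now:
                del self._registered[server]   # registration was not refreshed
            else:
                result.update(server.query())  # forward the query
        self._cached, self._cached_at = result, now
        return result
```

Because `GIIS` exposes the same `query()` method as `GRIS`, one GIIS can register with another, which gives the hierarchical arrangement mentioned in the architecture overview.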

MDS2 Query Language

  • Both the GIIS and GRIS servers use LDAP as the service protocol and query language.

LDAP Basics

  • Hierarchical data model

  • Each entry has a distinguished name and a set of attribute/value pairs

  • Distinguished name

    • Is a collection of name-value pairs

    • Must be unique

    • Determines the entry’s place in the hierarchy

      • Each entry’s DN must include its parent’s DN

  • Queries

    • Can search on attributes or DNs

    • Results can include children (or not) or include only certain attributes.
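To make the data model concrete, here is a small pure-Python imitation of an LDAP directory and search. It is a sketch only: the entries and attribute names are invented, DNs are split naively on commas (real DNs can contain escaped commas), and a Python predicate stands in for an LDAP filter string.

```python
# Entries keyed by distinguished name (DN); each DN ends with its parent's DN.
DIRECTORY = {
    "o=grid": {"objectclass": ["organization"]},
    "host=hostA,o=grid": {
        "objectclass": ["host"], "os": ["linux"], "memory-mb": ["2048"]},
    "queue=batch,host=hostA,o=grid": {
        "objectclass": ["queue"], "free-slots": ["4"]},
    "host=hostB,o=grid": {
        "objectclass": ["host"], "os": ["solaris"], "memory-mb": ["4096"]},
}

def parent_dn(dn):
    """The parent's DN is everything after the first name=value component."""
    _, _, rest = dn.partition(",")
    return rest or None

def search(directory, base, scope="sub", match=lambda attrs: True, attributes=None):
    """Search from `base`.

    scope: 'base' (the entry itself), 'one' (direct children) or 'sub' (subtree).
    attributes: if given, only these attributes are returned for each entry.
    """
    results = {}
    for dn, attrs in directory.items():
        if scope == "base":
            in_scope = dn == base
        elif scope == "one":
            in_scope = parent_dn(dn) == base
        else:
            in_scope = dn == base or dn.endswith("," + base)
        if in_scope and match(attrs):
            if attributes is None:
                results[dn] = attrs
            else:
                results[dn] = {k: v for k, v in attrs.items() if k in attributes}
    return results

# Direct children of o=grid running linux, returning only their memory size.
linux_hosts = search(
    DIRECTORY, "o=grid", scope="one",
    match=lambda a: "linux" in a.get("os", []),
    attributes=["memory-mb"],
)
```

The `scope` and `attributes` arguments correspond to the last bullet: a search can include children or not, and can return only selected attributes.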

MDS4 Overview

  • MDS4 is a redesign of MDS

  • The MDS4 Index Service acts as both a monitoring and discovery service.

    • Uses WSRF standard resource property queries as its query interface.

  • A second monitoring service, the MDS4 Trigger Service, examines aggregated information and takes action when certain conditions are met.

    • E.g., “send email when a remote system appears to be down”.

  • MDS4 uses WSRF standards for its query and registration interfaces.

WS-Resource Review

  • A WS-Resource is a Web Service that exposes internal state as Resource Properties

    • Each Resource Property is an XML element of arbitrary complexity

  • Each WS-Resource has a Resource Property Document

    • An XML document that includes all its Resource Properties

  • Example: The WS-GRAM service advertises information about its associated queues and clusters as a resource property.

Retrieving Resource Properties

  • GetResourceProperty

    • Gets a single named resource property

  • GetMultipleResourceProperties

    • Gets a set of named resource properties

  • QueryResourceProperties

    • Returns the results of a query against a resource’s resource property set

  • Subscription/notification

    • Clients subscribe and get periodic or occasional notifications
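The three request styles above can be imitated locally against an XML document. This is a rough sketch, not a WSRF client: the document contents and the Python function names are invented, namespaces are omitted, and Python's `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource property document; real ones are namespaced.
RP_DOCUMENT = ET.fromstring("""
<ResourceProperties>
  <Version>4.0</Version>
  <Queue name="batch"><FreeSlots>4</FreeSlots></Queue>
  <Queue name="debug"><FreeSlots>1</FreeSlots></Queue>
</ResourceProperties>
""")

def get_resource_property(doc, name):
    """Return every top-level element with the given name (a property may repeat)."""
    return doc.findall(name)

def get_multiple_resource_properties(doc, names):
    """Return the named properties, in the order requested."""
    return [el for name in names for el in doc.findall(name)]

def query_resource_properties(doc, xpath):
    """Evaluate a query against the whole resource property document."""
    return doc.findall(xpath)

queues = get_resource_property(RP_DOCUMENT, "Queue")
debug_slots = query_resource_properties(
    RP_DOCUMENT, "./Queue[@name='debug']/FreeSlots")
```

The first two functions select whole properties by name; only the query form can reach inside a property.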

What this means…

  • Standard requests can be used to get state information from any WS-Resource.

  • This means that every WS-Resource is also a monitoring service!

    • But not necessarily monitoring anything (i.e., providing any interesting state)

  • We sometimes want information from sources other than WS Resources

    • Non-WSRF services

    • General system information

    • Catalogues of installed software

Service Groups Review

  • A service group is a service that represents a group of other services or resources

  • Service groups contain Service Group Entries (SGEs), which consist of:

    • The address of the SGE itself,

    • The address of the Service Group that the SGE belongs to, and

    • A Content element consisting of arbitrarily-formatted data

  • SGEs are created via the Service Group Add request

The MDS4 Index Service

  • Acts as a Discovery Service

    • Gathers information from other WS-Resources

    • Including other Index Servers

  • Acts as a Monitoring Service

    • Caches all the information it gathers

    • Also has a pluggable interface for Information Providers

      • Programs or Java classes that gather information

An MDS4 Index Deployment

[Figure: an MDS4 Index deployment]

The MDS4 Index Data Model

  • The Index Service keeps its data as a Service Group

    • Registering a new resource to be monitored is accomplished by adding a service group entry to the service group.

  • The data in each SGE contains both:

    • Configuration information

      • E.g., “query the X resource property from server Y”

    • and the actual collected data.

Index Data Model (simplified)

[Figure: Index Service Group]

Data Model continued

  • In the Index Service data model, data is grouped with its configuration information

  • Can have the “same” data in two different places in the tree, if it was acquired from two different information sources.

    • E.g., information about a host’s load average from two different GRAM servers running on that host.

  • Relatively easy to find where each piece of data came from.

How the Index Updates its Data

  • Periodically, the Index Service examines each SGE in its Service Group

  • If the SGE’s registration has expired and not been renewed, it is destroyed.

  • Otherwise, the Index

    • looks at the Config part of the SGE content,

    • gathers data as specified by that config information, and

    • updates the data in the Data part of the SGE content

  • Data is updated periodically, not on demand.
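The update cycle above can be sketched as a single pass over the entries. This is an illustration only: `Entry` is a stand-in for a service group entry, its field names are assumptions, and a Python callable replaces the real configuration ("query the X resource property from server Y").

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Entry:
    """Stand-in for a service group entry: config plus the data it produced."""
    source: str                    # label for where the data comes from
    collect: Callable[[], Any]     # config: how to gather the data
    expires_at: float              # end of the current registration lifetime
    data: Any = None               # most recently collected value

def update_pass(entries, now):
    """One periodic pass: destroy expired entries, refresh the data of the rest."""
    survivors = []
    for entry in entries:
        if entry.expires_at <= now:
            continue                   # registration lapsed: entry is destroyed
        entry.data = entry.collect()   # gather data as the config specifies
        survivors.append(entry)
    return survivors

# Two GRAM servers on the same host report the "same" load independently,
# so the value appears twice, each copy next to its own config.
entries = [
    Entry("gram-1@hostA", lambda: {"load": 0.3}, expires_at=100),
    Entry("gram-2@hostA", lambda: {"load": 0.4}, expires_at=100),
    Entry("stale@hostB", lambda: {"load": 9.9}, expires_at=10),
]
entries = update_pass(entries, now=50)
```

Queries are answered from `data` as it stood after the most recent pass, which is why the information a client sees is never newer than one update interval.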

Querying the Index Service

  • The Index Service advertises its service group as a resource property

    • You can fetch the whole thing with GetRP or GetMultipleRPs

    • Most people use QueryRP to query it.

  • QueryRP allows you to specify a dialect and a query

    • Currently, only XPath is supported as a dialect

XPath Queries

  • Search an XML document and return some subset of its nodes (typically elements).

  • If an element is included in the results, it’s included in its entirety

    • Unlike LDAP, no way to leave out attributes or children
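A short illustration of the "included in its entirety" point, using Python's `xml.etree.ElementTree`, which implements a subset of XPath. The document below is invented for the example and is not the Index Service's actual schema.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Index>"
    "<Host name='hostA'><OS>linux</OS><Load>0.3</Load></Host>"
    "<Host name='hostB'><OS>solaris</OS><Load>1.2</Load></Host>"
    "</Index>"
)

# Select every Host element whose OS child has the text 'linux'.
matches = doc.findall("./Host[OS='linux']")

# Each match is the complete element: its attributes and all of its children.
child_tags = [[child.tag for child in m] for m in matches]
serialized = [ET.tostring(m, encoding="unicode") for m in matches]
```

To get only the load, the query itself has to select a narrower node (for example `./Host[OS='linux']/Load`); there is no separate "return only these attributes" option as in an LDAP search.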

MDS4 Trigger Service

  • A second monitoring service in MDS4

  • The Index is geared more towards queries intended for resource location and selection.

  • The Trigger service is intended to alert people to problems.

    • Can be configured to take action (e.g., send mail to an administrator) when issues arise.

MDS4 Trigger Service

  • Maintains information in a service group, like the Index Service

  • SGE config information also includes an XPath query and an action

    • The action is the name of a program to run.

  • Periodically, the trigger service looks at each SGE in its service group:

    • It evaluates the SGE’s XPath query against the SGE’s data.

    • If the query returns true, it runs the program specified by the action.
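The evaluation loop above can be sketched as follows. Several things here are assumptions made for the example: triggers are plain dictionaries, a non-empty match is treated as "true", and the action is a Python callable rather than the name of an external program.

```python
import xml.etree.ElementTree as ET

def run_triggers(triggers):
    """Evaluate each trigger's query against its own data; run the action on a match."""
    fired = []
    for trigger in triggers:
        if trigger["data"].findall(trigger["query"]):   # non-empty result counts as true
            trigger["action"](trigger["name"])
            fired.append(trigger["name"])
    return fired

alerts = []   # stands in for "send mail to an administrator"
triggers = [
    {"name": "hostA-disk",
     "query": "./Disk[@status='low']",
     "data": ET.fromstring("<Host><Disk status='low'/></Host>"),
     "action": alerts.append},
    {"name": "hostB-disk",
     "query": "./Disk[@status='low']",
     "data": ET.fromstring("<Host><Disk status='ok'/></Host>"),
     "action": alerts.append},
]
fired = run_triggers(triggers)
```

In a setup closer to the slide's description, the action would launch the configured program (for instance with `subprocess.run`) instead of appending to a list.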

MDS4 WebMDS

  • Provides a simple HTTP interface to query an MDS Index Service

    • Really, to query resource properties of any WS-Resource

  • Optionally applies XSLT transforms to the query results.

  • Designed as a user interface, to be used with a web browser

    • But some people are using it to provide a REST-like interface to MDS4.

Inca

  • Monitoring system developed at SDSC

  • Users define tests for Inca to run.

  • Inca runs them and stores the results in a database.

  • Users can view the results on a web page.

  • Can be configured to send mail if tests fail, etc.

  • Can run tests using the user’s credentials


From the Inca 2.1 User’s Guide,

Inca Query Interface

  • Uses an SQL database internally

  • End-users can query using a web page or receive notifications via email.

  • A web-services interface is also available

    • Uses a custom query language

  • Overall a nice monitoring/testing framework

  • Not designed as a discovery service

GMA (Grid Monitoring Architecture)

  • Proposed architecture with three components:

    • Producers produce information

    • Consumers consume information

    • Directories keep track of what information is available

      • what producers can be queried, not the actual data

Diagram from “A Grid Monitoring Architecture”, B. Tierney et al.,

R-GMA

  • Relational Grid Monitoring Architecture

  • Implements the GMA model

    • Except that users never interact with the directory service (called a “registry” in R-GMA)

    • A consumer service does that instead, and users query the consumer service.

  • Uses SQL as its query language.

An R-GMA Query

  • Client sends SQL query to Consumer Service

  • Consumer Service contacts registry for list of producers to contact

  • Consumer service queries producers and buffers results

  • Client retrieves results from consumer service
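The four steps above can be walked through with a toy model. None of this is R-GMA's API: the class names, the explicit `table` argument (the real consumer service works from the SQL alone), and the in-memory SQLite producers are all assumptions made to keep the sketch self-contained.

```python
import sqlite3

class Producer:
    """Holds rows for one table in its own in-memory SQLite database."""

    def __init__(self, table, rows):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(f"CREATE TABLE {table} (host TEXT, load_avg REAL)")
        self.db.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

    def run(self, sql):
        return self.db.execute(sql).fetchall()

class Registry:
    """Directory: knows which producers publish which table, not the data itself."""

    def __init__(self):
        self._by_table = {}

    def register(self, table, producer):
        self._by_table.setdefault(table, []).append(producer)

    def producers_for(self, table):
        return list(self._by_table.get(table, []))

class ConsumerService:
    """Mediates between the client and the registry and producers."""

    def __init__(self, registry):
        self.registry = registry
        self._buffer = []

    def submit(self, table, sql):
        # Steps 2 and 3: ask the registry for producers, query them, buffer results.
        for producer in self.registry.producers_for(table):
            self._buffer.extend(producer.run(sql))

    def retrieve(self):
        # Step 4: hand the buffered rows to the client and clear the buffer.
        rows, self._buffer = self._buffer, []
        return rows

registry = Registry()
registry.register("cpu_load", Producer("cpu_load", [("hostA", 0.3), ("hostB", 1.2)]))
registry.register("cpu_load", Producer("cpu_load", [("hostC", 2.5)]))

consumer = ConsumerService(registry)
consumer.submit("cpu_load", "SELECT host FROM cpu_load WHERE load_avg > 1.0 ORDER BY host")
busy_hosts = consumer.retrieve()
```

The client never touches the registry or the producers directly, which matches the way R-GMA departs from the plain GMA model.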

Diagram from “R-GMA: Architectural Design” at

For More Information

  • Globus:

  • Inca:

  • R-GMA:

  • XML / XPath / XSLT: