CrossGrid Approach to Application Performance Measurement and Monitoring

Marian Bubak

Bartosz Baliś, Włodzimierz Funika, Roland Wismueller,

Tomasz Arodź, Marcin Kurdziel, Marcin Radecki, Tomasz Szepieniec

Institute of Computer Science & ACC CYFRONET, AGH, Kraków, Poland

TUM Munich, Germany

Institute for Software Science, University of Vienna, Austria

Outline

  • Introduction

  • Performance analysis of interactive grid applications

  • G-PM tool

    • Architecture

    • Measurements

    • Example of a use case

  • OCM-G

    • Motivation

    • Architecture

    • Functionality

    • Security

  • Status

  • Future work

Use Case: Description

  • Medical simulation application with visualization kernel

  • Simulation on different site (server) than the visualization (client)

  • Task

    • Analyse the performance of simulation-to-visualization communication

Features of Interactive Grid Computing

  • Run time application control

    • Performance data on-line

  • Possible effects of decisions

    • Access to benchmark information

  • Interpreting application behavior in a heterogeneous, open system

    • Access to infrastructure performance

  • Information meaningful in the context of application field

    • more application specific performance data

  • Need for on-line standard and user-defined metrics

Background

  • 1995 OMIS 1.0

  • 1997 OMIS 2.0

  • 1997 OCM for PVM clusters

  • 1997 OMIS Tools (Detop, Patop, …)

  • 1997 Collaboration LRR TUM-ICS AGH

  • 1999 Porting to MPI

  • 2000 First proposal of OCM for Grids

G-PM Tool – Objectives

  • Evaluation of grid application performance:

    • Providing a rich set of predefined measurements

    • Allowing for user-defined measurements

    • Allowing for probe-based measurements

    • Providing on-line performance measurement visualization

    • Compliant with OMIS 2.0 monitoring standard interface

G-PM Tool Architecture

  • User Interface and Visual Component:

    • Measurement specification and performance visualization classes

  • Performance Measurement Component:

    • Provides predefined measurement classes

  • High Level Analysis Component:

    • Classes supporting user-defined measurements

  • Common interface to pre- and user-defined measurements

  • External interface to monitoring tool OCM-G based on OMIS (CG Task 3.3)

Standard Metrics (1)

  • Wall clock/CPU time

    • Total

    • In communication:

      • Send, Receive, Collective, Barrier

    • In I/O:

      • Read, Write

  • Data volume

    • communication

    • I/O

  • Number of library calls

    • communication

    • I/O

Standard Metrics (2)

  • Host metrics

    • CPU load

    • Available memory

  • Network metrics

    • Load

    • Bandwidth

  • Benchmark metrics

    • CPU, Network

User-defined Metrics

  • Support for high level performance analysis

  • Custom metrics

    • Defined on the basis of standard metrics

    • Providing higher level of abstraction

    • Programmed in a dedicated specification language

  • Probes

    • Special function calls inserted into source code by the programmer

    • Define events that can be used in definition of custom metrics

    • Provide a way of passing arguments to G-PM

Measurements Parametrisation

  • Measurements can be restricted to specific:

    • Objects

      • Sites, hosts, processes, files

    • Partner objects

      • Sites, hosts, processes

    • Locations in source code

      • Modules, functions

  • Time resolution

    • Integral, Mean value, Current value

  • Virtual time
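
The three time resolutions listed above can be illustrated with a minimal sketch. It assumes that a measurement delivers timestamped samples of a rate-like quantity. The `Sample` type and the function names are hypothetical and are not part of G-PM, and the trapezoidal rule is only one reasonable way to integrate sampled data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    time: float   # seconds since the measurement started
    value: float  # sampled quantity, e.g. a transfer rate in bytes/s


def current_value(samples: Sequence[Sample]) -> float:
    """Most recent sampled value."""
    return samples[-1].value


def integral(samples: Sequence[Sample]) -> float:
    """Integral of the sampled quantity over time (trapezoidal rule)."""
    total = 0.0
    for a, b in zip(samples, samples[1:]):
        total += 0.5 * (a.value + b.value) * (b.time - a.time)
    return total


def mean_value(samples: Sequence[Sample]) -> float:
    """Time-weighted mean over the sampled interval."""
    duration = samples[-1].time - samples[0].time
    return integral(samples) / duration


# Hypothetical samples: (time in s, rate in bytes/s)
samples = [Sample(0.0, 10.0), Sample(1.0, 20.0), Sample(3.0, 20.0)]
```

With these samples, `integral(samples)` accumulates the quantity over the three seconds, `mean_value(samples)` divides that by the elapsed time, and `current_value(samples)` simply reports the last sample.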

Types of Measurements

  • Sampled measurements

    • Quantities that change continuously and can only be sampled at some intervals

    • Based on a direct query about an object by the OCM-G

    • Example: CPU time

  • Function-based measurements

    • Quantities that change as a result of function calls, defined by the calls’ input and output parameters

    • Require library instrumentation

    • Based on counters/integrators

    • Provide a hierarchy of metrics: e.g. send volume

  • User-defined measurements
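
A function-based measurement built on counters and integrators might look like the following sketch. It is an illustration under stated assumptions, not OCM-G code: `Counter`, `Integrator`, `instrument_send`, and `fake_send` are hypothetical names, and a Python wrapper stands in for an instrumented communication library.

```python
import time
from functools import wraps


class Counter:
    """Counts events or accumulates a quantity such as bytes."""

    def __init__(self):
        self.value = 0

    def add(self, amount=1):
        self.value += amount


class Integrator:
    """Accumulates the time elapsed between start() and stop()."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._started = None
        self.total = 0.0

    def start(self):
        self._started = self._clock()

    def stop(self):
        self.total += self._clock() - self._started
        self._started = None


def instrument_send(send, calls, volume, delay):
    """Wrap a send function so each call updates the three metrics."""
    @wraps(send)
    def wrapper(dest, payload):
        calls.add()                # number of library calls
        volume.add(len(payload))   # data volume, from the input parameter
        delay.start()              # time spent inside the call
        try:
            return send(dest, payload)
        finally:
            delay.stop()
    return wrapper


def fake_send(dest, payload):
    """Hypothetical stand-in for a real communication call."""
    return len(payload)


calls, volume, delay = Counter(), Counter(), Integrator()
send = instrument_send(fake_send, calls, volume, delay)
send(1, b"abcd")
send(2, b"xy")
```

Only the three accumulated values need to be read by a tool, which is what distinguishes this style from recording every call in a trace.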

User-defined Measurements: Metrics

  • Possible ways to define a metric:

    • A metric defined by an existing metric, measured during a part of the execution (e.g. delimited by two probes)

    • A metric defined by a parameter of a probe

    • A metric derived from an existing set of metrics via aggregation or comparison

User-defined Measurements: Example

  • Example of a new metric:

    IO_volume_for_interaction(Process[] processes, File[] files,
                              Region[] regions, TimeInterval currTime)

    volume[p][vt] = IO_volume(p, files, regions) AT end(p, vt)
                  - IO_volume(p, files, regions) AT begin(p, vt);
    globalVol[vt] = SUM(volume[p][vt] WHERE p IN processes);
    result = SUM(globalVol[vt] WHERE vt IN currTime);
    RETURN result;

  • Components of the metric definition:

    • Two probes: begin/end of a user interaction

    • Standard metric IO_volume for total disk I/O

    • Volume accumulated over space (p) and time (vt)

  • Optimization: distributed measurements
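
The arithmetic of the metric above can be traced with a small sketch. It assumes the value of the standard `IO_volume` metric has already been recorded at the `begin` and `end` probes for each process `p` and each virtual time step `vt`. The dictionaries, the process tokens, and the numbers are made up for illustration, and the files and regions arguments are omitted.

```python
def io_volume_for_interaction(processes, curr_time, volume_at_begin, volume_at_end):
    """Sum, over virtual time steps and processes, of the I/O volume
    accumulated between the begin and end probes."""
    result = 0
    for vt in curr_time:
        # volume[p][vt] = IO_volume AT end(p, vt) - IO_volume AT begin(p, vt)
        global_vol = sum(
            volume_at_end[(p, vt)] - volume_at_begin[(p, vt)]
            for p in processes
        )
        result += global_vol
    return result


# Hypothetical cumulative I/O volumes in bytes, keyed by (process, vt)
processes = ["p_1", "p_2"]
curr_time = [0, 1]
volume_at_begin = {("p_1", 0): 100, ("p_2", 0): 50, ("p_1", 1): 180, ("p_2", 1): 90}
volume_at_end = {("p_1", 0): 150, ("p_2", 0): 70, ("p_1", 1): 200, ("p_2", 1): 120}

total = io_volume_for_interaction(processes, curr_time, volume_at_begin, volume_at_end)
```

Here the per-interaction differences are 50 and 20 bytes at step 0 and 20 and 30 bytes at step 1, so `total` is 120.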

Probes for User-defined Measurements

  • High-level performance data

    • Particular, relevant events, e.g. start/end of user interaction

    • Associated events, e.g. start/end events in different processes – entry/exit from the same computation phase

    • Data computed within application, e.g. residuum value

  • Instrumentation code inserted into application code

  • Probe – special function call

  • Additional parameters for application-specific data

  • The same probe for different metrics

  • Low overhead of inactive instrumentation
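
The probe idea described above can be sketched as a function call that does almost nothing until a tool activates it. This is a minimal illustration, not the OCM-G probe API: `probe`, `activate`, `deactivate`, and the probe name are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Listener = Callable[[str, Tuple], None]

# Listeners registered by tools, keyed by probe name (hypothetical registry)
_listeners: Dict[str, List[Listener]] = {}


def probe(name: str, *args) -> None:
    """Called from application code; a cheap no-op while nobody listens."""
    listeners = _listeners.get(name)
    if not listeners:
        return
    for listener in listeners:
        # Extra arguments carry application-specific data to the tool.
        listener(name, args)


def activate(name: str, listener: Listener) -> None:
    """Attach a listener; several metrics may share the same probe."""
    _listeners.setdefault(name, []).append(listener)


def deactivate(name: str) -> None:
    """Detach all listeners from a probe."""
    _listeners.pop(name, None)


events = []
probe("interaction_begin", 1)   # inactive: nothing is recorded
activate("interaction_begin", lambda n, a: events.append((n, a)))
probe("interaction_begin", 2)   # active: the event is recorded
deactivate("interaction_begin")
probe("interaction_begin", 3)   # inactive again
```

In this sketch an inactive probe costs one dictionary lookup; how the real instrumentation keeps its inactive overhead low is not described in the slides.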

Measurement Definition Window

The fields of the window specify:

  • What should be measured, e.g. Receive Volume

  • Where the measurement should be done, e.g. on which site, host, process, etc.

  • Which part of the code should be measured, e.g. a particular function

  • The second partner, in measurements that involve two processes, such as "traffic between process A and process B"

Use Case: Description

  • Medical simulation application with visualization kernel

  • Simulation on different site (server) than the visualization (client)

  • Task

    • Analyse the performance of simulation-to-visualization communication

Use Case: Code Instrumentation

  • Programmer inserts three probes:

    • In the source code on server:

      • Probe A

        • After server asks client to visualize frame

      • Probe B

        • After data is sent to client

    • In the source code on client:

      • Probe C

        • Before data is passed to graphics engine

  • Programmer recompiles the application

Use Case: New Metrics

  • Three new custom metrics:

    • Generated frames/sec = 1 / (time between invocations of probe A)

    • Compression factor = (data passed to probe C) / (volume sent between execution of probe A and probe B)

    • Visualisation kernel processing time/frame = (time interval between execution of probe A and probe B)

  • New metrics can be used in the same way as the built-in ones
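
The three custom metrics can be worked through numerically. The sketch below assumes the probe timestamps and byte counts have already been collected. The function names and all values are hypothetical, chosen so the arithmetic is easy to follow.

```python
def frames_per_second(probe_a_times):
    """1 / (time between consecutive invocations of probe A)."""
    return [1.0 / (t1 - t0) for t0, t1 in zip(probe_a_times, probe_a_times[1:])]


def compression_factor(bytes_passed_to_probe_c, bytes_sent_between_a_and_b):
    """(data passed to probe C) / (volume sent between probes A and B)."""
    return bytes_passed_to_probe_c / bytes_sent_between_a_and_b


def time_per_frame(probe_a_time, probe_b_time):
    """Time interval between execution of probe A and probe B."""
    return probe_b_time - probe_a_time


# Hypothetical measurements
a_times = [0.0, 0.5, 1.0]                 # probe A fired every half second
fps = frames_per_second(a_times)          # two intervals of 0.5 s each
factor = compression_factor(4000, 1000)   # 4000 bytes shown, 1000 bytes sent
elapsed = time_per_frame(0.5, 0.625)      # probe B fired 0.125 s after probe A
```

With these inputs the frame rate is 2 frames/sec for both intervals, the compression factor is 4, and the time per frame is 0.125 s.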

Why OMIS / OCM-G?

  • Long experience in OMIS monitoring

  • 150k lines of reusable OCM code, in existence since 1997-1999

  • Existing OMIS Tools

    • Relatively easy to port due to universal interface

  • Versatility of the approach

    • Monitoring services for different types of tools

    • Information / manipulation / event services

    • Extensibility

  • Transparency for the user

  • Portability, flexibility

From OCM to OCM-G

  • Inherited from the OCM

    • Core monitoring concepts

    • 99% of monitoring functionality

    • Instrumentation techniques

  • New in the OCM-G

    • Grid-enabled start-up

    • GSI security

    • Permanent service concept

    • Grid-specific services

    • Probes – support for user-defined arbitrary events

    • New objects – sites

OCM-G – Architecture

[Figure: architecture diagram. Labelled components: tool (e.g. G-PM), Service Manager (SM), Local Monitor, Application Module, Application Process]





Interfaces


  • OMIS

    On-line Monitoring Interface Specification

  • Target Interface

    • /proc

    • ptrace

    • shared memory









OCM-G and GGF’s GMA (1)

[Figure: diagram relating OCM-G to the GGF Grid Monitoring Architecture. Surviving labels: "discover", "Ext. Inf. System"]

OCM-G and GGF’s GMA (2)

  • GMA defines two monitoring scenarios

    • Query / Response (one or more events returned): unconditional requests

    • Subscribe (event stream returned): conditional requests

Short Overview of OMIS

  • Target system view

    • hierarchical set of objects

      • sites, nodes, processes, threads

    • objects identified by tokens, e.g. n_1, p_1, etc.

  • Three types of services

    • Information

    • Manipulation

    • Event
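
The hierarchical target system view above, with objects identified by tokens, can be sketched as a small tree. This is an illustration only: the `MonitoredObject` class and the helper functions are hypothetical, and while the `n_` and `p_` prefixes follow the slide, the `s_` and `t_` prefixes for sites and threads are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MonitoredObject:
    token: str
    children: List["MonitoredObject"] = field(default_factory=list)


def index_by_token(root: MonitoredObject) -> Dict[str, MonitoredObject]:
    """Map every token in the hierarchy to its object."""
    index = {root.token: root}
    for child in root.children:
        index.update(index_by_token(child))
    return index


def leaves_below(obj: MonitoredObject) -> List[str]:
    """Tokens of the lowest-level objects (here: threads) under obj."""
    if not obj.children:
        return [obj.token]
    tokens: List[str] = []
    for child in obj.children:
        tokens.extend(leaves_below(child))
    return tokens


# Hypothetical target system: one site, two nodes, one process per node
site = MonitoredObject("s_1", [
    MonitoredObject("n_1", [
        MonitoredObject("p_1", [MonitoredObject("t_1"), MonitoredObject("t_2")]),
    ]),
    MonitoredObject("n_2", [
        MonitoredObject("p_2", [MonitoredObject("t_3")]),
    ]),
])
objects = index_by_token(site)
```

Looking up `objects["n_1"]` and calling `leaves_below` on it yields the threads running on that node, which is the kind of navigation a hierarchical object model makes possible.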

OMIS Services

  • Information services

    • obtain information on target system

    • e.g. node_get_info = obtain information on nodes in the target system

  • Manipulation services

    • perform manipulations on the target system

    • e.g. thread_stop = stop specified threads

  • Event services

    • detect events in the target system

    • e.g. thread_started_libcall = detect invocations of specified functions

  • Information + manipulation services = actions

OMIS Requests

Services are combined into two types of monitoring requests:

  • Unconditional requests

    • executed immediately and only once

  • Conditional requests

    • execute actions whenever the event occurs
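
The difference between the two request types can be shown with a toy dispatcher. It is a sketch under stated assumptions and does not use OMIS syntax: `ToyMonitor`, its methods, and the event name `libcall_started` are hypothetical.

```python
from typing import Callable, Dict, List


class ToyMonitor:
    def __init__(self):
        # Actions registered by conditional requests, keyed by event name
        self._conditional: Dict[str, List[Callable[..., object]]] = {}

    def unconditional(self, action: Callable[[], object]) -> object:
        """Execute the action immediately and only once."""
        return action()

    def conditional(self, event: str, action: Callable[..., object]) -> None:
        """Register an action to run whenever the event occurs."""
        self._conditional.setdefault(event, []).append(action)

    def event_occurred(self, event: str, **params) -> List[object]:
        """Called when the monitored system detects an event."""
        return [action(**params) for action in self._conditional.get(event, [])]


monitor = ToyMonitor()
log = []

# Unconditional request: one action, one reply
info = monitor.unconditional(lambda: "node info")

# Conditional request: the action runs for every matching event
monitor.conditional("libcall_started", lambda function: log.append(function))
monitor.event_occurred("libcall_started", function="MPI_Send")
monitor.event_occurred("libcall_started", function="MPI_Recv")
```

The unconditional request produces a single result, while the conditional request keeps producing results for as long as events arrive.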

Distribution of a Request

[Figure: distribution of the request :thread_stop([a_1])]
















Transparency

  • Preparing an application for monitoring is straightforward

    • ocm mpicc -o ping ping.c

    • no need for manual source code instrumentation or automatic instrumentation tools

  • Start-up of the OCM-G entirely transparent

  • Application submitted to run as usual

    • mpirun -np 2 ping --ocmg-regcont --ocmg-appname "myapp"

  • Tools can be attached to a running application at any time

Efficiency

  • Selective instrumentation

    • Activated or deactivated on demand

  • Buffering and preprocessing

    • Data stored in a local buffer

    • Counters and integrators used

    • Only summarized information sent to the OCM-G, not a full trace

  • Evaluation

    • Monitoring overhead with an excessive number of events: ~4%

    • Zero overhead of inactive instrumentation
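
The buffering and preprocessing idea above, where only summarized information leaves the application, can be sketched as follows. `LocalSummary` and its fields are hypothetical and only illustrate the principle. The actual buffer layout of the OCM-G is not described in the slides.

```python
class LocalSummary:
    """Keeps a running summary locally instead of a full event trace."""

    def __init__(self):
        self.count = 0
        self.total_bytes = 0

    def record(self, nbytes: int) -> None:
        # Each event only updates two numbers; nothing is appended.
        self.count += 1
        self.total_bytes += nbytes

    def flush(self) -> dict:
        """Return the summary to be sent to the monitor and reset it."""
        summary = {"count": self.count, "total_bytes": self.total_bytes}
        self.count = 0
        self.total_bytes = 0
        return summary


buffer = LocalSummary()
for size in (100, 250, 50):   # three hypothetical events
    buffer.record(size)
summary = buffer.flush()
```

However many events occur between two flushes, the message sent to the monitor stays the same size, which is what keeps the monitoring overhead bounded.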

OCM-G Start-up Sequence

[Figure: start-up sequence diagram. Surviving labels: Tool, "find SM", "find LM", External Localization Mechanism, Site 1]

Security Issues

  • OCM-G components handle multiple users, tools and applications

  • Authentication and authorization needed at two levels

    • Tool-SM – check if the user is allowed to manipulate objects

    • SM-LM – check that the request comes from the SM and that the user is authorized

Security – Solutions

  • LMs are user-bound

    • Run as user processes

    • Security ensured by OS mechanisms

  • Service Managers are permanent

    • Run as unprivileged processes (nobody)

    • User Grid Id checked internally (partial security)

    • Grid certificates for users, tools and SMs incorporated (ultimate security)

Status – Prototype Completed

  • Typical metrics

  • 90% of monitoring services implemented

  • New Grid-enabled start-up mechanism

  • Support for multiple applications and tools

  • Works on one site

  • Not yet a permanent Grid service

Integration of G-PM and OCM-G

  • G-PM = Grid Performance Measurement tool

  • OCM-G is the data source for G-PM

  • Full integration easily achieved (OMIS!)

  • Measurements

    • CPU usage

    • Delay of communication

    • Volume of data transfer

Future Work

  • Full set of measurements

  • Integration with Grid info services

  • Support for multiple sites

  • Permanent Grid service

    • single instance of the OCM-G

    • support for multiple users

  • Incorporate security based on GSI

  • Support for dynamic application behavior

    • migration, creation

www.eu-crossgrid.org