CrossGrid Approach to Application Performance Measurement and Monitoring

CrossGrid Approach to Application Performance Measurement and Monitoring. Marian Bubak Bartosz Baliś, Włodzimierz Funika, Roland Wismueller, Tomasz Arodź, Marcin Kurdziel, Marcin Radecki, Tomasz Szepieniec Institute of Computer Science & ACC CYFRONET, AGH, Kraków, Poland TUM Munich, Germany

CrossGrid Approach to Application Performance Measurement and Monitoring

Marian Bubak

Bartosz Baliś, Włodzimierz Funika, Roland Wismueller,

Tomasz Arodź, Marcin Kurdziel, Marcin Radecki, Tomasz Szepieniec

Institute of Computer Science & ACC CYFRONET, AGH, Kraków, Poland

TUM Munich, Germany

Institute for Software Science, University of Vienna, Austria

www.eu-crossgrid.org


Outline

  • Introduction

  • Performance analysis of interactive grid applications

  • G-PM tool

    • Architecture

    • Measurements

    • Example of a use case

  • OCM-G

    • Motivation

    • Architecture

    • Functionality

    • Security

  • Status

  • Future work


Use Case: Description

  • Medical simulation application with visualization kernel

  • Simulation on different site (server) than the visualization (client)

  • Task

    • analyse the performance of simulation-to-visualization communication


Features of Interactive Grid Computing

  • Run time application control

    • Performance data on-line

  • Possible effects of decisions

    • Access to benchmark information

  • Interpreting the application’s behavior in a heterogeneous open system

    • Access to infrastructure performance

  • Information meaningful in the context of the application field

    • More application-specific performance data

  • Need for on-line standard and user-defined metrics


Background

  • 1995 OMIS 1.0

  • 1997 OMIS 2.0

  • 1997 OCM for PVM clusters

  • 1997 OMIS Tools (Detop, Patop, …)

  • 1997 Collaboration LRR TUM-ICS AGH

  • 1999 Porting to MPI

  • 2000 First proposal of OCM for Grids


G-PM Tool – Objectives:

  • Evaluation of grid application performance:

    • Providing a rich set of predefined measurements

    • Allowing for user-defined measurements

    • Allowing for probe-based measurements

    • Providing on-line visualization of performance measurements

    • Compliance with the OMIS 2.0 monitoring interface standard


G-PM Tool Architecture

  • User Interface and Visual Component:

    • Measurement specification and performance visualization classes

  • Performance Measurement Component:

    • Provides predefined measurement classes

  • High Level Analysis Component:

    • Classes supporting user-defined measurements

  • Common interface to pre- and user-defined measurements

  • External interface to the monitoring tool OCM-G, based on OMIS (CG Task 3.3)


Standard Metrics (1)

  • Wall clock/CPU time

    • Total

    • In communication:

      • Send, Receive, Collective, Barrier

    • In I/O:

      • Read, Write

  • Data volume

    • communication

    • IO

  • Number of library calls

    • communication

    • IO


Standard Metrics (2)

  • Host metrics

    • CPU load

    • Available memory

  • Network metrics

    • Load

    • Bandwidth

  • Benchmark metrics

    • CPU, Network


User-defined Metrics

  • Support for high level performance analysis

  • Custom metrics

    • Defined on the basis of standard metrics

    • Providing a higher level of abstraction

    • Programmed in a dedicated specification language

  • Probes

    • Special function calls inserted into source code by the programmer

    • Define events that can be used in definition of custom metrics

    • Provide a way of passing arguments to G-PM


Measurements Parametrisation

  • Measurements can be restricted to specific:

    • Objects

      • Sites, hosts, processes, files

    • Partner objects

      • Sites, hosts, processes

    • Locations in source code

      • Modules, functions

  • Time resolution

    • Integral, Mean value, Current value

  • Virtual time
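
The three time-resolution options can be illustrated with a small sketch. It assumes that a measurement delivers timestamped samples of a rate-like quantity (for example bytes per second); the function names are hypothetical and are not part of G-PM.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, sampled value)


def current_value(samples: Sequence[Sample]) -> float:
    """Most recent sampled value."""
    return samples[-1][1]


def integral(samples: Sequence[Sample]) -> float:
    """Quantity accumulated over the window (trapezoidal rule)."""
    total = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        total += 0.5 * (v0 + v1) * (t1 - t0)
    return total


def mean_value(samples: Sequence[Sample]) -> float:
    """Time-weighted average over the window."""
    duration = samples[-1][0] - samples[0][0]
    if duration <= 0:
        return current_value(samples)
    return integral(samples) / duration
```

With samples taken at irregular intervals, the mean is weighted by time rather than by sample count, which is why it is derived from the integral.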


Types of Measurements

  • Sampled measurements

    • Quantities that change continuously and can only be sampled at some intervals

    • Based on a direct query about an object by the OCM-G

    • Example: CPU time

  • Function-based measurements

    • Quantities that change as a result of function calls, defined by the calls’ input and output parameters

    • Require library instrumentation

    • Based on counters/integrators

    • Provide a hierarchy of metrics: e.g. send volume

  • User-defined measurements
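
A minimal sketch of how function-based measurements might rest on counters and integrators. It assumes that a counter accumulates discrete amounts (calls, bytes) and that an integrator accumulates a level over time (here 1 while inside a send call, 0 otherwise); the class and function names are hypothetical and do not reproduce the OCM-G implementation.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    """Accumulates discrete amounts, e.g. number of calls or bytes sent."""
    value: int = 0

    def add(self, amount: int = 1) -> None:
        self.value += amount


@dataclass
class Integrator:
    """Integrates a piecewise-constant level over time."""
    total: float = 0.0
    level: float = 0.0
    last_change: float = 0.0

    def set_level(self, now: float, level: float) -> None:
        # Close the interval that ends now, then switch to the new level.
        self.total += self.level * (now - self.last_change)
        self.level = level
        self.last_change = now


def record_send(t_enter: float, t_exit: float, nbytes: int,
                calls: Counter, volume: Counter, time_in_send: Integrator) -> None:
    """What an instrumented send wrapper would record for one call."""
    calls.add()
    volume.add(nbytes)
    time_in_send.set_level(t_enter, 1.0)  # entering the library call
    time_in_send.set_level(t_exit, 0.0)   # leaving the library call
```

The same three objects give a small hierarchy of metrics for one function: number of calls, send volume, and time spent in send.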


User-defined Measurements: Metrics

  • Possible ways of defining a metric:

    • A metric defined by an existing metric, measured over a period of the execution (e.g. delimited by 2 probes)

    • A metric defined by a parameter of a probe

    • A metric derived from an existing set of metrics via aggregation or comparison


User-defined Measurements: Example

  • Example of a new metric:

    IO_volume_for_interaction(Process[] processes, File[] files,
                              Region[] regions, TimeInterval currTime)
    {
      volume[p][vt] = IO_volume(p, files, regions) AT end(p, vt)
                    - IO_volume(p, files, regions) AT begin(p, vt);
      globalVol[vt] = SUM(volume[p][vt] WHERE p IN processes);
      result = SUM(globalVol[vt] WHERE vt IN currTime);
      RETURN result;
    }

  • Components of the metric definition:

    • Two probes: begin/end of a user interaction

    • Standard metric IO_volume for total disk I/O

    • Volume accumulated over space (p) and time (vt)

  • Optimization: distributed measurements
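
The arithmetic of the metric definition above can be mirrored in a few lines of Python. This is only a sketch: it assumes the IO_volume readings at the begin and end probes are already available per process and per interaction, and the `Readings` layout and function name are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (process, interaction index vt) -> (IO_volume at begin probe, IO_volume at end probe)
Readings = Dict[Tuple[str, int], Tuple[int, int]]


def io_volume_for_interaction(readings: Readings,
                              processes: Iterable[str],
                              interactions: Iterable[int]) -> int:
    """Sum the I/O volume over processes (space) and interactions (time)."""
    procs = list(processes)
    result = 0
    for vt in interactions:
        # volume[p][vt] = reading at end(p, vt) - reading at begin(p, vt)
        global_vol = sum(
            readings[(p, vt)][1] - readings[(p, vt)][0]
            for p in procs
            if (p, vt) in readings
        )
        result += global_vol
    return result
```

Processes that did not take part in a given interaction simply contribute nothing to that interaction's sum.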


Probes for User-defined Measurements

  • High-level performance data

    • Particular, relevant events, e.g. start/end of user interaction

    • Associated events, e.g. start/end events in different processes – entry/exit from the same comp. phase

    • Data computed within application, e.g. residuum value

  • Instrumentation code is inserted into the application code

  • Probe – a special function call

  • Additional parameters for app.-specific data

  • The same probe for different metrics

  • Low overhead of inactive instrumentation
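
A probe can be pictured as a function call that does almost nothing until a measurement activates it. The sketch below is an assumption about the general idea, not the G-PM or OCM-G probe API; `probe`, `activate` and `deactivate` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

Listener = Callable[..., None]
_listeners: Dict[str, List[Listener]] = {}


def probe(name: str, *args: Any) -> None:
    """Called from application code; cheap when no measurement is active."""
    listeners = _listeners.get(name)
    if not listeners:
        return  # inactive probe: a single dictionary lookup
    for listener in listeners:
        listener(*args)  # pass application-specific data to each metric


def activate(name: str, listener: Listener) -> None:
    """Attach a metric to a probe; several metrics may share one probe."""
    _listeners.setdefault(name, []).append(listener)


def deactivate(name: str) -> None:
    """Detach all metrics from a probe."""
    _listeners.pop(name, None)
```

An application would call, for example, `probe("interaction_begin", residuum)` at the relevant point in its source, and the tool would decide at run time whether anything listens.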


Measurement Definition Window

  • Specifies what should be measured, e.g. Receive Volume

  • Specifies where the measurement should be done, e.g. on which site, host or process

  • Specifies which part of the code should be measured, e.g. a particular function

  • In measurements that involve two processes, such as "traffic between process A and process B", specifies the second partner


Example of Visualization Widget


Use Case: Description

  • Medical simulation application with visualization kernel

  • Simulation on different site (server) than the visualization (client)

  • Task

    • analyse the performance of simulation-to-visualization communication


Use Case: Code Instrumentation

  • Programmer inserts three probes:

    • In the source code on server:

      • Probe A

        • After server asks client to visualize frame

      • Probe B

        • After data is sent to client

    • In the source code on client:

      • Probe C

        • Before data is passed to graphics engine

  • Programmer recompiles the application


Use Case: New Metrics

  • Three new custom metrics:

    • Generated frames/sec =

      = 1 / (time between invocations of probe A)

    • Compression factor =

      = (data passed to probe C) / (sent volume between execution of probe A and probe B)

    • Visualisation Kernel processing time/frame =

      = (time interval between execution of probe A and probe B)

  • New metrics can be used in the same way as the built-in ones
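
The three formulas can be sketched directly, assuming the probe timestamps and data volumes for each frame have already been collected into one record. The `Frame` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    t_a: float        # time probe A fired (seconds)
    t_b: float        # time probe B fired (seconds)
    sent_bytes: int   # volume sent between probe A and probe B
    bytes_at_c: int   # data size passed as a parameter to probe C


def frames_per_second(previous: Frame, current: Frame) -> float:
    """1 / (time between invocations of probe A)."""
    return 1.0 / (current.t_a - previous.t_a)


def compression_factor(frame: Frame) -> float:
    """(data passed to probe C) / (volume sent between probes A and B)."""
    return frame.bytes_at_c / frame.sent_bytes


def processing_time(frame: Frame) -> float:
    """Time interval between execution of probe A and probe B."""
    return frame.t_b - frame.t_a
```

Note that the frame rate needs two consecutive frames, while the other two metrics are computed per frame.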


Why OMIS / OCM-G ?

  • Long experience in OMIS monitoring

  • 150k reusable lines of OCM code already existing since 1997-1999

  • Existing OMIS Tools

    • Relatively easy to port due to universal interface

  • Versatility of the approach

    • Monitoring services for different types of tools

    • Information / manipulation / event services

    • Extensibility

  • Transparency for the user

  • Portability, flexibility


From OCM to OCM-G

  • Inherited from the OCM

    • Core monitoring concepts

    • 99% of monitoring functionality

    • Instrumentation techniques

  • New in the OCM-G

    • Grid-enabled start-up

    • GSI security

    • Permanent service concept

    • Grid-specific services

    • Probes – support for user-defined arbitrary events

    • New objects – sites


OCM-G – Architecture

Figure (architecture diagram); components shown:

  • Tool, e.g. G-PM

  • SM – Service Manager (shown at site level)

  • LM – Local Monitor (shown at node level)

  • AM – Application Module, inside each AP – Application Process


Interfaces

  • OMIS

    On-line Monitoring Interface Specification

  • Target Interface

    • /proc

    • ptrace

    • shared memory

Figure: the Tool, the SM (site), the LMs (node) and the APs, each AP containing an AM.


OCM-G and GGF’s GMA (1)

Figure: OCM-G components mapped onto GMA roles. Components in the diagram: Tool, Ext. Inf. System, SM, LM, Registry. Roles: Producer, Consumer. Operations: register, discover.


OCM-G and GGF’s GMA (2)

  • GMA defines two monitoring scenarios, each with an OCM-G counterpart:

    • GMA: Query / Response (one or more events returned); OCM-G: unconditional requests

    • GMA: Subscribe (event stream returned); OCM-G: conditional requests


Short Overview of OMIS

  • Target system view

    • hierarchical set of objects

      • sites, nodes, processes, threads

    • objects identified by tokens, e.g. n_1, p_1, etc.

  • Three types of services

    • Information

    • Manipulation

    • Event


OMIS Services

  • Information services

    • obtain information on target system

    • e.g. node_get_info = obtain information on nodes in the target system

  • Manipulation services

    • perform manipulations on the target system

    • e.g. thread_stop = stop specified threads

  • Event services

    • detect events in the target system

    • e.g. thread_started_libcall = detect invocations of specified functions

  • Information + manipulation services = actions


OMIS Requests

Services are combined into two types of monitoring requests:

  • Unconditional requests

    • executed immediately and only once

  • Conditional requests

    • execute actions whenever event occurs
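
The difference between the two request types can be shown with a toy model. This is not OMIS syntax or the OCM-G API; `ToyMonitor` and its methods are hypothetical and only illustrate "run once now" versus "run whenever the event occurs".

```python
from typing import Callable, Dict, List


class ToyMonitor:
    """Toy model of unconditional and conditional monitoring requests."""

    def __init__(self) -> None:
        self._conditional: Dict[str, List[Callable[[str], str]]] = {}

    def unconditional(self, action: Callable[[], str]) -> str:
        """Execute the action immediately and only once."""
        return action()

    def conditional(self, event: str, action: Callable[[str], str]) -> None:
        """Register an action to run every time the event is detected."""
        self._conditional.setdefault(event, []).append(action)

    def event_occurred(self, event: str, token: str) -> List[str]:
        """Called by the monitoring side when an event is detected on an object."""
        return [action(token) for action in self._conditional.get(event, [])]
```

An unconditional request returns its result right away, whereas a conditional request produces one result per occurrence of the event for as long as it stays registered.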


Distribution of a Request

Figure: the Tool issues :thread_stop([a_1]). On its way through the SMs (site1, site2) and LMs (node1, node2, node3) the request is split into :thread_stop([p_1,p_2,p_3]), :thread_stop([p_1,p_2]), :thread_stop([p_3]) and :thread_stop([p_4]), until each application process AP1 to AP4 receives a Stop.
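
The splitting of one request into per-site and per-node sub-requests might look like the sketch below. The placement of processes on nodes and sites is an assumption read off the figure labels, and `split_request` is a hypothetical helper, not an OCM-G function.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed placement: process token -> (site, node)
LOCATION: Dict[str, Tuple[str, str]] = {
    "p_1": ("site1", "node1"),
    "p_2": ("site1", "node1"),
    "p_3": ("site1", "node2"),
    "p_4": ("site2", "node3"),
}


def split_request(service: str, tokens: List[str],
                  location: Dict[str, Tuple[str, str]]) -> Dict[str, Dict[str, str]]:
    """Group tokens by site and node, and build one sub-request per node."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for token in tokens:
        site, node = location[token]
        grouped[site][node].append(token)
    return {
        site: {node: ":%s([%s])" % (service, ",".join(toks))
               for node, toks in nodes.items()}
        for site, nodes in grouped.items()
    }
```

Each site-level entry corresponds to what one SM would forward, and each node-level entry to what one LM would execute locally.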


Transparency

  • Preparing an application for monitoring is straightforward

    • ocm mpicc -o ping ping.c

    • no need for manual source code instrumentation or for automatic instrumentation tools

  • Start-up of the OCM-G is entirely transparent

  • Application is submitted to run as usual

    • mpirun -np 2 ping --ocmg-regcont --ocmg-appname "myapp"

  • Tools can be attached to a running application at any time


Efficiency

  • Selective instrumentation

    • Activated or deactivated on demand

  • Buffering and preprocessing

    • Data stored in a local buffer

    • Counters and integrators used

    • Only summarized information sent to the OCM-G, not a full trace

  • Evaluation

    • Monitoring overhead with a very large number of events: about 4%

    • Zero overhead of inactive instrumentation
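
The buffering idea, keeping a running summary locally and shipping only that summary instead of a full event trace, can be sketched as follows. `LocalSummary` is a hypothetical class for illustration, not part of the OCM-G.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LocalSummary:
    """Summarizes events inside the application process."""
    active: bool = False
    calls: int = 0
    total_bytes: int = 0

    def on_event(self, nbytes: int) -> None:
        if not self.active:
            return  # instrumentation deactivated: nothing is recorded
        self.calls += 1
        self.total_bytes += nbytes

    def flush(self) -> Dict[str, int]:
        """Return the summary to be sent to the monitor and reset the buffer."""
        summary = {"calls": self.calls, "bytes": self.total_bytes}
        self.calls = 0
        self.total_bytes = 0
        return summary
```

However many events occur between two flushes, the amount of data sent to the monitor stays constant.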


OCM-G Start-up Sequence

Figure: start-up sequence on Site 1. Components in the diagram: Tool, SM, LMs, application processes P1 and P2, External Localization Mechanism. Steps shown: fork(), find LM, find SM, connect.


Security Issues

  • OCM-G components handle multiple users, tools and applications

  • Authentication and authorization needed at two levels

    • Tool-SM – check if the user is allowed to manipulate objects

    • SM-LM – check that the request comes from the SM and that the user is authorized


Security – Solutions

  • LMs are user-bound

    • Run as user processes

    • Security ensured by OS mechanisms

  • Service Managers are permanent

    • Run as unprivileged processes (nobody)

    • User Grid Id checked internally (partial security)

    • Grid certificates for users, tools and SMs incorporated (ultimate security)


Status – Prototype Completed

  • Typical metrics

  • 90% of monitoring services implemented

  • New Grid-enabled start-up mechanism

  • Support for multiple applications and tools

  • Works on one site

  • Not yet a permanent Grid service


Integration of G-PM and OCM-G

  • G-PM = Grid Performance Measurement tool

  • OCM-G is the data source for G-PM

  • Full integration easily achieved (OMIS!)

  • Measurements

    • CPU usage

    • Delay of communication

    • Volume of data transfer


Future work

  • Full set of measurements

  • Integration with Grid info services

  • Support for multiple sites

  • Permanent Grid service

    • single instance of the OCM-G

    • support for multiple users

  • Incorporate security based on GSI

  • Support for dynamic application behavior

    • migration, creation


www.eu-crossgrid.org

