Pipe dreams

PIPE Dreams. Trouble Shooting Network Performance for Production Science Data Grids. Presented by Warren Matthews at CHEP’03, San Diego, March 24-28, 2003.

PIPE Dreams

Trouble Shooting Network Performance for Production Science Data Grids

Presented by Warren Matthews at CHEP’03, San Diego March 24-28, 2003


Abstract

The vision of science grids allocating resources to analyze huge quantities of HENP data clearly depends on reliable network performance. Tools developed at SLAC in conjunction with the Internet2 PIPES project will help to ensure this. In this talk, these tools will be discussed, and the procedure for publishing performance data, in particular using the Globus Toolkit's MDS and web services, will be reviewed. The subsequent analysis and troubleshooting methodology will be discussed with real-world examples from the Particle Physics Data Grid (PPDG) and the European Data Grid (EDG).


Overview

  • What is the problem?

  • What is PIPES?

  • Network performance monitoring

  • Problem identification


Network Monitoring for the Grid

  • The Data Grid consists of many components that must interoperate

[Diagram: farms, data stores, and requestors connected through the network, together with a resource broker.]


Allocate Resources

  • The resource broker must be fully informed

  • Measurement is required!

[Diagram: farms, data stores, requestors, and a resource broker connected through the network, annotated with example measurements: 12% pkt loss, 80% Utilization, OC48.]
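
As a minimal sketch of how a broker might use such measurements, the Python below ranks candidate data sources by measured packet loss and then by link utilization. The `PathMeasurement` type, the site names, and the ranking rule are illustrative assumptions and are not taken from the talk; only the 12% loss and 80% utilization figures appear on the slide.

```python
from dataclasses import dataclass


@dataclass
class PathMeasurement:
    source: str             # hypothetical name of a site holding the data
    loss_pct: float         # measured packet loss, in percent
    utilization_pct: float  # measured link utilization, in percent


def rank_sources(paths):
    """Return paths ordered best-first: lowest loss, then lowest utilization."""
    return sorted(paths, key=lambda p: (p.loss_pct, p.utilization_pct))


paths = [
    PathMeasurement("site-a", loss_pct=12.0, utilization_pct=30.0),
    PathMeasurement("site-b", loss_pct=0.1, utilization_pct=80.0),
    PathMeasurement("site-c", loss_pct=0.1, utilization_pct=40.0),
]
best = rank_sources(paths)[0]
print(best.source)
```

A real broker would weigh many more inputs (CPU availability, storage, policy); the point is only that the choice cannot be made without current network measurements.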


What is PIPES?

  • Internet2

  • End-to-end performance initiative

  • PI Performance Evaluation System (PIPES)

  • PIPES Monitoring Platform (PMP)

  • Overlap with goals of HENP

  • Tremendous resources


IEPM-BW

  • Package developed at SLAC

    • Measurement Engine

      • iperf, bbftp, bbcp, ping, traceroute

      • ABwE, OWAMP, UDPmon, GridFTP

    • Job Manager

    • Data Storage and data server

    • Analysis Engine


[Diagram: network map with a "Monitoring Site" legend. Labels: LANL, EDG, KEK, CERN, TRIUMF, NIKHEF, FNAL, NERSC, IN2P3, ANL, PPDG/GriPhyN, CHI, SNV, ESnet, ORNL, RAL, JLAB, NY, UCL, SLAC, UManc, Imperial, JAnet, DL, NNW, BNL, APAN, Stanford, RIKEN, INFN-Roma, INFN-Padua, Geant, CalREN, INFN-Milan, Abilene, SEA, CESnet, NASA, WASH, SOX, HSTN, DNVR, ATL, CLV, IPLS, UTAH, SDSC, UFL, CALTECH, I2, UTDallas, UMich, Rice, NCSA.]

[Diagram: BaBar Grid network map. Labels: NNW, Manchester, TVN, RAL, Janet, ESnet, SWERN, SLAC, Bristol, Geant, Stanford, DFN, Dresden, Calren, Abilene, Renater, IN2P3. Link speeds shown: 10 Gbps, 622 Mbps, 1 Gbps, 2.5 Gbps.]


Problem Identification

  • Typical Scenario

    • User complains file transfer is slow

    • Net admin runs ping, traceroute, iperf test

    • Complain to upstream provider

  • Proactive

    • What do we mean by throughput?

    • How do we know there was a performance hit?

    • Our approach is to look at diurnal changes
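
One way to read "diurnal changes" is to compare each new throughput measurement with earlier measurements taken at the same hour of day, so that the normal daily cycle is not mistaken for a fault. The sketch below assumes that interpretation; the function names, the sample history, and the 20% tolerance are hypothetical.

```python
from collections import defaultdict
from statistics import median


def hourly_baseline(history):
    """history: iterable of (hour_of_day, throughput_mbps) pairs.
    Returns {hour: median throughput previously seen at that hour}."""
    by_hour = defaultdict(list)
    for hour, mbps in history:
        by_hour[hour].append(mbps)
    return {hour: median(values) for hour, values in by_hour.items()}


def is_performance_hit(baseline, hour, mbps, tolerance=0.2):
    """True if mbps is more than `tolerance` below the baseline for that hour.
    Hours with no history cannot be judged, so they return False."""
    if hour not in baseline:
        return False
    return mbps < baseline[hour] * (1.0 - tolerance)


history = [(3, 400.0), (3, 420.0), (3, 410.0),
           (14, 200.0), (14, 220.0), (14, 210.0)]
baseline = hourly_baseline(history)
```

With this made-up history, 205 Mbps at 14:00 is ordinary afternoon performance, while the same 205 Mbps at 03:00 is flagged as a hit.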


Alarms

  • Too much to keep track of

  • Rather not wait for complaints

  • Automated Alarms

  • Rolling average à la RIPE-TT

    • May not be the best approach

  • AMP Automated Detection System
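
A rolling-average alarm of the kind mentioned above can be sketched as follows. This is an illustrative simplification and not the RIPE-TT or AMP algorithm: the window size and the fraction of the average that triggers an alarm are assumed values.

```python
from collections import deque


class RollingAverageAlarm:
    """Flag a measurement that falls below a fraction of the rolling mean."""

    def __init__(self, window=5, fraction=0.6):
        self.samples = deque(maxlen=window)  # most recent accepted measurements
        self.fraction = fraction             # assumed alarm threshold factor

    def update(self, value):
        """Add a measurement; return True if it should raise an alarm."""
        alarm = False
        # Only judge new values once the window is full.
        if len(self.samples) == self.samples.maxlen:
            mean = sum(self.samples) / len(self.samples)
            alarm = value < mean * self.fraction
        # Alarming samples are kept out of the window so that one bad
        # reading does not immediately lower the baseline.
        if not alarm:
            self.samples.append(value)
        return alarm


alarm = RollingAverageAlarm(window=3, fraction=0.6)
results = [alarm.update(v) for v in [500.0, 510.0, 490.0, 250.0, 495.0]]
```

Here the fourth sample (250) falls below 60% of the rolling mean of 500 and raises an alarm; the following sample of 495 does not. Excluding alarming samples from the window is one design choice among several, and it means a permanent drop in capacity keeps alarming until the baseline is reset.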


Limitations

  • Could be over an hour before alarm is generated

  • More frequent measurements impact the network and measurements overlap

  • Low impact tools allow finer grained measurement

    • Use NWS multi-variate method

    • Use SciDAC ABwE tool

    • Use PingER, OWAMP
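
The trade-off can be made concrete with a little arithmetic. The sketch below assumes tests run one after another so that they do not overlap; the number of paths, the number of tests per path, and the per-test durations are hypothetical figures chosen for illustration.

```python
def cycle_minutes(num_paths, seconds_per_test, tests_per_path):
    """Time to measure every path once when tests run sequentially."""
    return num_paths * tests_per_path * seconds_per_test / 60.0


# Hypothetical heavy load: 30 remote hosts, 4 tools each, 30 s per test.
heavy = cycle_minutes(num_paths=30, seconds_per_test=30, tests_per_path=4)

# Hypothetical low-impact probe: 1 s per path, one tool.
light = cycle_minutes(num_paths=30, seconds_per_test=1, tests_per_path=1)
```

Under these assumptions a full cycle of heavy tests takes 60 minutes, so a problem that begins just after a path is measured goes unseen for about an hour, while the lightweight probe revisits every path in 30 seconds.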


Publishing

  • Many monitoring projects exist; publishing data allows them to interoperate

  • MDS

    • EDG NM Schema

  • Web Services

    • GLUE NE Schema

  • GGF NMWG

    • Hierarchy Doc

    • Tools Doc

./get_data

2003 3 18 6 1 41 1.61 1.601 1.62 0
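
The field layout of the `./get_data` output is not documented in the slides. The sketch below assumes that the first six fields are year, month, day, hour, minute, and second, and it leaves the remaining four fields unnamed rather than guessing what they measure. The function name is hypothetical.

```python
from datetime import datetime


def parse_record(line):
    """Split one whitespace-separated record into a timestamp and values.
    Assumption: fields 1-6 are a date and time; the rest stay unlabeled."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least six fields")
    timestamp = datetime(*(int(f) for f in fields[:6]))
    values = [float(f) for f in fields[6:]]
    return timestamp, values


timestamp, values = parse_record("2003 3 18 6 1 41 1.61 1.601 1.62 0")
```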


Net Rat

  • Alarm System

    • Multiple tools

    • Multiple measurement points

    • Trigger further measurements

    • Cross reference off site stats

  • Informant database

  • No measurement is ‘authoritative’

    • Cannot even believe a measurement
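
The idea that no single measurement is authoritative can be sketched as a voting rule: an alarm is confirmed only when enough independent tools or measurement points agree. The function name, the report format, and the two-report quorum are assumptions for illustration, not a description of Net Rat's actual logic.

```python
def confirm_alarm(reports, quorum=2):
    """reports: {(tool, vantage_point): True if that measurement looks degraded}.
    Returns True only when at least `quorum` independent reports agree."""
    degraded = [key for key, bad in reports.items() if bad]
    return len(degraded) >= quorum


reports = {
    ("iperf", "local"): True,
    ("ping", "local"): False,
    ("iperf", "offsite"): False,
}
single = confirm_alarm(reports)        # only one report says "degraded"

reports[("ping", "local")] = True      # a follow-up measurement agrees
corroborated = confirm_alarm(reports)
```

In this sketch a lone iperf reading does not confirm the alarm; it is confirmed once a second, triggered measurement reports the same degradation.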


Log

03/20/2003 20:13:46 ALARM pcgiga throughput=305.224 ctresh=512.95 athresh=312.91

03/20/2003 20:13:48 TRACE no change in route detected

03/20/2003 20:16:07 CALM Throughput within acceptable limits. ALARM CANCELLED
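
Log lines in this format are straightforward to parse for later analysis. The sketch below assumes the layout seen in the three sample lines (date, time, event type, free-text message) and extracts any numeric `key=value` pairs from the message, keeping the key names exactly as logged. The function name is hypothetical.

```python
import re
from datetime import datetime

LINE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")
PAIR = re.compile(r"(\w+)=([0-9.]+)")


def parse_log_line(line):
    """Return (timestamp, event, message, numeric key=value fields)."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized log line")
    stamp, event, message = match.groups()
    timestamp = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    fields = {key: float(value) for key, value in PAIR.findall(message)}
    return timestamp, event, message, fields


ts, event, message, fields = parse_log_line(
    "03/20/2003 20:13:46 ALARM pcgiga throughput=305.224 ctresh=512.95 athresh=312.91"
)
calm = parse_log_line(
    "03/20/2003 20:16:07 CALM Throughput within acceptable limits. ALARM CANCELLED"
)
```

In the sample ALARM line the parsed throughput (305.224) is below the `athresh` value (312.91), which is consistent with the alarm being raised, although the slides do not define the two thresholds.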


Toward a Monitoring Infrastructure

  • MAGGIE

    • Measurement and Analysis package built on NIMI/Akenti

  • EDEE

    • production-quality Data Grid for Europe


More Information

  • IEPM Home Page

  • IEPM-BW

  • I2 E2E and PIPES

  • RIPE-TT

  • AMP Automated Event Detection

  • NWS

  • ABwE


End

This talk made possible by the IEPM team at SLAC (Les Cottrell, Connie Logg, Jiri Navratil, Jerrod Williams, Fabrizio Coccetti), and the many developers and maintainers around the world.