
SLAC and PerfSONAR

Yee-Ting Li

PerfSONAR developers workshop

October 2006


SLAC IEPM

  • SLAC used to be primarily a High Energy Particle Physics institute

  • Now beginning to diversify into other sciences

    • Photon Science (SSRL and LCLS)

    • Impact on chemistry and molecular biology

  • First US-based web page was at SLAC!

  • Internet End-to-end Performance Monitoring Group

    • Focus on problem detection and long term performance/trend analysis

    • Origins in PingER monitoring

    • Currently deploying more intrusive IEPM-BW tests


PingER

  • PingER project started (1995) to measure network performance for the US, European and Japanese HEP communities

  • Extended this century to measure Digital Divide

  • Last year added monitoring sites in S. Africa, Pakistan & India

  • Uses ICMP to determine:

    • RTT

    • Loss

    • Connectivity

    • Derived TCP throughput, i.e. ∝ 1/sqrt(loss)
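The derived-throughput bullet can be illustrated with the widely used Mathis et al. approximation, throughput ≈ C · MSS / (RTT · sqrt(loss)). This is a minimal sketch, not PingER's actual code: the function name, the 1460-byte MSS default and the constant C = 1 are assumptions made for illustration.

```python
import math


def derived_tcp_throughput_bps(rtt_s, loss_fraction, mss_bytes=1460, c=1.0):
    """Estimate TCP throughput in bits/s from ping RTT and loss.

    Uses the Mathis et al. approximation C * MSS / (RTT * sqrt(loss)).
    mss_bytes and c are illustrative defaults, not values from the slides.
    """
    if rtt_s <= 0:
        raise ValueError("RTT must be positive")
    if not 0 < loss_fraction <= 1:
        raise ValueError("loss must be in (0, 1]; the formula is undefined at zero loss")
    return c * mss_bytes * 8 / (rtt_s * math.sqrt(loss_fraction))


# Example: 100 ms RTT with 1% packet loss.
estimate = derived_tcp_throughput_bps(rtt_s=0.1, loss_fraction=0.01)
```

Because the estimate scales with 1/sqrt(loss), a fourfold reduction in loss only doubles the derived throughput, and a path with zero measured loss needs separate handling.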


PingER: Deployment

  • ~120 countries

  • 99% of the world’s connected population

  • 35 monitor sites in 14 countries

  • Over 600 nodes currently being monitored worldwide


PingER: Digital Divide

  • Years behind Europe:

    • 6 yrs: Russia, Latin America

    • 7 yrs: Mid-East, SE Asia

    • 10 yrs: South Asia

    • 11 yrs: Central Asia

    • 12 yrs: Africa


IEPM-BW

  • Developed as an exhibit for SC2001

  • Conducts tests using various tools

    • Achievable BW: iperf, thrulay

    • Estimated BW: pathchirp, pathload, abwe

    • File Transfer: bbcp, bbftp, gridftp

    • Latency/Loss: ping, traceroute, owamp

  • MySQL backend with Web-based front end

  • Collection of scripts to:

    • Start/stop daemons

    • Conduct analysis (and produce web-accessible graphs)

    • Forecasting and Event detection (and notification)


IEPM-BW: Deployment

  • Running at CERN, SLAC, FNAL, BNL, Caltech and Taiwan, testing to about 40 remote sites (in a semi-mesh)

  • 40 target hosts in 13 countries

  • Bottlenecks vary from 0.5 Mbit/s to 10 Gbit/s

  • Traverse ~50 ASes, 15 major Internet providers

  • 5 targets at PoPs, rest at end sites


IEPM-BW: Presentation

  • Timeseries plots

  • Diurnal plots

  • CDF diagrams

IEPM-BW: Topology

  • Topology


IEPM-BW: Event Detection

  • Automated problem identification:

    • Administrators cannot review hundreds of graphs each day

    • Alerts for network administrators

      • Changes in time-series, loss, latency, iperf, SNMP

    • Alerts for systems people

      • OS/Host metrics

    • Anomalies for security

  • Anomalous event detection

    • A series of missing measurements (network outage?)

    • Determine that something ‘wrong’ has happened: the measured value differs significantly from the expected value

  • Forecasts

    • Given trends in previous measurements, determine what is within tolerance of being ‘okay’


Event Detection: Plateau

[Figure: observations over time against the history mean and the "history mean − 2 × stdev" line, with the trigger buffer fill level (% full) and a detected event marked *]

  • Circular buffer of observations (history buffer)

  • Define trigger buffer of results

    • Buffer fills if an observation deviates significantly from mean of circular buffer

  • Event occurs when trigger buffer exceeds threshold

  • Filters:

    • If (mh − mt) / mh > D and 90% of the trigger buffer filled in the last T mins, then raise an event (mh, mt: history and trigger buffer means)

    • Move trigger buffer to history buffer

  • Parameters: history length = 1 day, trigger length t = 3 hours, standard deviations = 2
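The plateau steps can be sketched roughly as follows. This is a simplified, count-based illustration rather than the IEPM-BW implementation: the slide's buffers are time-based (1 day and 3 hours), while here they hold a fixed number of samples. The class name, the `min_rel_drop` value standing in for D, and the rule that normal observations drain the trigger buffer are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class PlateauDetector:
    """Count-based sketch of plateau detection for drops in a metric."""

    def __init__(self, history_len, trigger_len, n_stdev=2.0,
                 trigger_full=0.9, min_rel_drop=0.1):
        if history_len < 2 or trigger_len < 1:
            raise ValueError("history_len must be >= 2 and trigger_len >= 1")
        self.history = deque(maxlen=history_len)  # circular buffer of normal values
        self.trigger = deque()                    # recent values that deviated
        self.trigger_len = trigger_len
        self.n_stdev = n_stdev                    # deviation threshold in stdevs
        self.trigger_full = trigger_full          # fraction of trigger buffer to fill
        self.min_rel_drop = min_rel_drop          # hypothetical value for D

    def update(self, value):
        """Feed one observation; return True when an event is raised."""
        if len(self.history) < self.history.maxlen:
            self.history.append(value)            # warm-up: fill the history first
            return False
        mh = mean(self.history)
        if value < mh - self.n_stdev * pstdev(self.history):
            self.trigger.append(value)            # significant drop below history mean
        else:
            self.history.append(value)
            if self.trigger:
                self.trigger.popleft()            # assumption: normal values drain it
        if len(self.trigger) < self.trigger_full * self.trigger_len:
            return False
        mt = mean(self.trigger)
        is_event = mh != 0 and (mh - mt) / mh > self.min_rel_drop
        self.history.extend(self.trigger)         # move trigger buffer to history
        self.trigger.clear()
        return is_event


# Steady throughput around 101, then a sustained drop to 50.
detector = PlateauDetector(history_len=50, trigger_len=10)
samples = [100.0 + 2 * (i % 2) for i in range(100)] + [50.0] * 9
event_indices = [i for i, v in enumerate(samples) if detector.update(v)]
```

Moving the trigger buffer into the history after each check means a persistent step change gradually becomes the new baseline instead of raising alerts indefinitely.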


Event Detection: K-S

  • For each observation: compare the previous 100 observations with the next 100 observations

    • Compare the vertical difference in CDFs

    • How does it differ from random CDFs

    • Expressed as % difference

    • Define threshold for % difference
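A minimal sketch of the sliding-window comparison is shown below. It computes the two-sample Kolmogorov-Smirnov statistic (the largest vertical gap between the two empirical CDFs) directly and thresholds it as a fraction. This is a simplification: the slide also compares against random CDFs and expresses the result as a % difference, which is not reproduced here. The function names and default threshold are illustrative.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest vertical difference between the empirical CDFs of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def ks_change_points(series, window=100, threshold=0.5):
    """Return (index, statistic) wherever the windows before and after differ."""
    hits = []
    for i in range(window, len(series) - window + 1):
        d = ks_statistic(series[i - window:i], series[i:i + window])
        if d > threshold:
            hits.append((i, d))
    return hits


# A step change half-way through a short series, using a smaller window.
series = [100.0 + (i % 5) for i in range(50)] + [50.0 + (i % 5) for i in range(50)]
hits = ks_change_points(series, window=20, threshold=0.5)
best = max(hits, key=lambda h: h[1])
```

Because the test needs a full window of later observations, a change can only be reported after that many further measurements have arrived.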


Event Detection: Holt-Winters

  • Use Holt-Winters (H-W) technique:

    • Uses triple exponentially weighted moving average

    • Three parameters (α, β, γ) that take into account local smoothing, long-term seasonal smoothing, and trends respectively

  • Choose parameters by minimizing (1/N) Σ (F_t − y_t)²

    • F_t = forecast for time t as a function of the parameters, y_t = observation at time t

  • H-W is a forecasting technique; need to complement with a method to identify events

    • If a percentage of residuals are outside twice the EWMA of absolute deviation, then generate event (HWE)

    • Apply Plateau on H-W residuals (PHR) and K-S on H-W residuals (KHR)
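The forecasting and HWE steps might look roughly like this. It is a sketch under assumptions, not the IEPM-BW code: it uses the additive form of Holt-Winters with one-step-ahead forecasts, names the three weights descriptively instead of mapping them to the slide's symbols, and picks the window, fraction and deviation weight arbitrarily.

```python
def holt_winters_residuals(y, period, level_weight=0.3,
                           season_weight=0.1, trend_weight=0.05):
    """One-step-ahead additive Holt-Winters; returns residuals y_t - F_t."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons to initialise")
    first, second = y[:period], y[period:2 * period]
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2   # average per-step change
    season = [v - level for v in first]                # initial seasonal offsets
    residuals = []
    for t in range(period, len(y)):
        s = season[t % period]
        forecast = level + trend + s
        residuals.append(y[t] - forecast)
        new_level = level_weight * (y[t] - s) + (1 - level_weight) * (level + trend)
        trend = trend_weight * (new_level - level) + (1 - trend_weight) * trend
        season[t % period] = season_weight * (y[t] - new_level) + (1 - season_weight) * s
        level = new_level
    return residuals


def mean_squared_error(residuals):
    """The quantity minimised when choosing the weights: (1/N) * sum((F_t - y_t)^2)."""
    return sum(r * r for r in residuals) / len(residuals)


def hwe_events(residuals, window=12, fraction=0.8, dev_weight=0.1):
    """Indices where enough recent residuals exceed twice the EWMA of |residual|."""
    if len(residuals) <= window:
        return []
    dev = sum(abs(r) for r in residuals[:window]) / window  # warm-up estimate
    flags, events = [], []
    for t in range(window, len(residuals)):
        r = residuals[t]
        flags.append(abs(r) > 2 * dev)                 # compare before updating dev
        dev = dev_weight * abs(r) + (1 - dev_weight) * dev
        recent = flags[-window:]
        if len(recent) == window and sum(recent) / window >= fraction:
            events.append(t)
    return events


# A perfectly periodic series is forecast almost exactly, so no events fire.
periodic = [float(v) for v in range(24)] * 4
periodic_residuals = holt_winters_residuals(periodic, period=24)
periodic_events = hwe_events(periodic_residuals)
```

In practice the three weights would be chosen by searching for the combination with the lowest `mean_squared_error` on historical data. The same residuals can also be fed to a plateau or K-S detector, which is what the PHR and KHR variants describe.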


Event Diagnosis

  • Once we get alert(s) of Events, how do we correlate to diagnose problems?

  • Define heuristics of ‘effect and cause’

    • Define probabilities to pin-point the location of the problem

  • First pass: narrows down to where the problem occurs on a high level

    • End-host or network?

  • Next step: define heuristics for the location of problems in a network path and subsystems on hosts

    • Interrogate using tools such as pS, Ganglia, Nagios

    • Cross-correlate with other measurements (e.g. meshed traceroutes)
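As a purely hypothetical illustration of a first-pass heuristic (the slides do not specify one), alerts could be grouped by endpoint. If one host appears in alerts with several distinct peers, that host or its local connectivity is the likelier culprit; otherwise the network path between the pair is suspected. The function name, the labels and the `min_peers` cut-off are all invented for this sketch.

```python
from collections import defaultdict


def first_pass_diagnosis(alerts, min_peers=3):
    """Classify each alerting (monitor, target) pair as end-host or network-path.

    alerts: iterable of (monitor_host, target_host) pairs with a detected event.
    Returns {(monitor, target): (label, suspected_host_or_None)}.
    """
    pairs = set(alerts)
    targets_by_monitor = defaultdict(set)
    monitors_by_target = defaultdict(set)
    for monitor, target in pairs:
        targets_by_monitor[monitor].add(target)
        monitors_by_target[target].add(monitor)

    verdicts = {}
    for monitor, target in pairs:
        if len(targets_by_monitor[monitor]) >= min_peers:
            verdicts[(monitor, target)] = ("end-host", monitor)   # many targets affected
        elif len(monitors_by_target[target]) >= min_peers:
            verdicts[(monitor, target)] = ("end-host", target)    # many monitors affected
        else:
            verdicts[(monitor, target)] = ("network-path", None)
    return verdicts


# Hypothetical alerts: one monitor sees problems to three targets, another to one.
verdicts = first_pass_diagnosis(
    [("slac", "a"), ("slac", "b"), ("slac", "c"), ("cern", "z")]
)
```

A verdict of this kind only narrows the search; the next step would still be to interrogate the suspected host or path with other measurements such as traceroutes.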


PerfSONAR

  • De-centralised network monitoring

    • Reduces overhead for us at IEPM to gather network statistics

  • Unified access to network information

    • Should enable easier methods to gather and use the network information

    • However, not all sites may provide the most useful information for our purposes

      • Define/recommend a base set of MPs? (e.g. ping, traceroute, port up?…)

  • Middleware platform

    • Therefore requires applications to prove usefulness of design

    • Alarm services (event detection), trend analysis etc.


PerfSONAR Interests to SLAC/IEPM

  • More statistics allow us to better understand Internet performance

  • Event Diagnosis - pS enables easier gathering of network performance data

    • Backbone and End-to-end allows us to corroborate suspicions

    • First need event detection in order to identify where problems are seen

  • Grid software development

    • SLAC will become an LHC ATLAS Tier-2 site

    • Network services

      • Use of network metrics to help replica management, light path reservations etc


PerfSONAR Questions

  • Test and possibly extend NMWG schemas to support the metrics that we are interested in

  • Interface for recurring and scheduled test initialisation

    • Waiting on AAA?

    • Conflicting tests?

  • Porting of our visualisation and analysis tools

    • Currently decoupling and modularising analysis tools from IEPM-BW infrastructure

    • API

      • Input: use NMWG/pS

      • Output: Extend perfSONAR API to support ‘alerts’?

  • Access patterns for data:

    • We are more interested in gathering large windows of data rather than individual results

    • Too slow to gather data dynamically?

    • Should we cache data locally for our analysis?


PerfSONAR: Installation

  • Java Version

  • Relatively easy; however, I have worked with Java and web services in the past

  • Documentation could do with more detail

    • What are all the ‘extra’ packages actually for? E.g. eXist

    • Had to install separately; why couldn’t the perfSONAR install do that?

    • List of prerequisites/requirements

      • Machine types

      • Security requirements/Ports opened etc


PerfSONAR: SQL-MA

  • Idea was to create an IEPM-BW MA

    • Provide extra characteristics

    • Easiest way to enable NMWG compliant reports

    • Tests NMWG for our purposes

  • SQL-MA

    • All data currently in MySQL tables!

    • Installation problems

      • Different snapshots give different errors!

      • Difficult to get help due to time-zone differences

      • Security policies at SLAC prevent quick and easy access to non-SLAC users

    • Class diagrams seem to make sense

      • Will report on how easy it is to actually get it working!


PerfSONAR: Security Issues

  • SLAC (DOE) does not allow us to run application servers individually (e.g. ports are blocked)

  • We are currently deploying pS on a ‘community’ Tomcat installation

  • Running two instances of Tomcat for LS and MA is not possible for us

  • SLAC has a ‘prove that you need it’ attitude to allow external access to network data


Summary

  • De-centralised management of pS allows us to concentrate more on analysis rather than deployment/maintenance

  • IEPM would like specific tools that have proven to be the most useful for diagnosis

    • Latency (connectivity) and traceroute

    • Extend to other metrics such as throughput etc.

  • PerfSONAR allows transparent data access

    • pS enables the unification of both end-to-end and router metric representation

      • Worry about finding correlations for diagnosis rather than determining ‘how’ to gather the data

    • Porting of our analysis tools

      • Test perfSONAR APIs

      • Provide useful features such as event detection, other UI4 examples etc

