An algebraic approach to practical and scalable overlay network monitoring

An Algebraic Approach to Practical and Scalable Overlay Network Monitoring

Yan Chen

David Bindel, Hanhee Song, and Randy H. Katz

University of California at Berkeley

Northwestern University

ACM SIGCOMM 2004


Motivation

  • Infrastructure ossification led to the rise of overlay and P2P applications

  • Such applications are flexible in choosing paths and targets, and thus can benefit from E2E distance monitoring

    • Overlay routing/location

    • VPN management/provisioning

    • Service redirection/placement …

  • Requirements for E2E monitoring system

    • Scalable & efficient: small amount of probing traffic

    • Accurate: capture congestion/failures

    • Adaptive: nodes join/leave, topology changes

    • Robust: tolerate measurement errors

    • Balanced measurement load


Related Work

  • General metrics: RON (O(n²) measurements)

  • Latency estimation

    • Link-level measurement: min set cover (Ozmutlu et al.); a similar approach gives bounds for other metrics (Tang & McKinley)

    • Clustering-based: IDMaps, Internet Isobar, etc.

    • Coordinate-based: GNP, Virtual Landmarks, Vivaldi, etc.

  • Network tomography

    • Focusing on inferring the characteristics of physical links rather than E2E paths

    • Limited measurements -> under-constrained system, unidentifiable links


Problem Formulation

Given an overlay of n end hosts and O(n²) paths, how do we select a minimal subset of paths to monitor so that the loss rates/latencies of all other paths can be inferred?

Assumptions:

  • Topology measurable

  • Can only measure the E2E path, not the link


Outline

  • An algebraic approach framework

  • Algorithms for a fixed set of overlay nodes

  • Scalability analysis

  • Adaptive dynamic algorithms

  • Measurement load balancing

  • Handling topology measurement errors

  • Simulations and Internet experiments


Our Approach

[Figure: an Overlay Network Operation Center gathers topology information and measurements from the end hosts]

Select a basis set of k paths that fully describes all O(n²) paths (k ≪ O(n²))

  • Monitor the loss rates of k paths, and infer the loss rates of all other paths

  • Applicable for any additive metrics, like latency


Modeling of Path Space

[Figure: example overlay with end hosts A, B, C, D and numbered links; path p1 highlighted]

Path loss rate p, link loss rate l
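Written out, the relation the slide encodes is the standard log transform that makes loss rates additive along a path:

```latex
1 - p_i \;=\; \prod_{j \,\in\, \text{path } i} (1 - l_j)
\quad\Longrightarrow\quad
\underbrace{\log\frac{1}{1-p_i}}_{b_i} \;=\; \sum_{j} G_{ij}\,
\underbrace{\log\frac{1}{1-l_j}}_{x_j},
\qquad
G_{ij} = \begin{cases} 1 & \text{link } j \text{ on path } i \\ 0 & \text{otherwise.} \end{cases}
```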


Putting All Paths Together

In total r = O(n²) paths and s links, with s ≪ r:

G x = b
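As a quick sketch (my own illustration, not from the talk), the linear system G x = b can be assembled and checked with NumPy for a made-up 3-path, 3-link overlay:

```python
import numpy as np

# Hypothetical 3-link topology; G[i, j] = 1 iff path i traverses link j.
G = np.array([[1, 1, 0],    # path 1: links 1, 2
              [0, 1, 1],    # path 2: links 2, 3
              [1, 0, 1]],   # path 3: links 1, 3
             dtype=float)

l = np.array([0.01, 0.05, 0.02])   # per-link loss rates (assumed values)
x = np.log(1.0 / (1.0 - l))        # link vector: x_j = log 1/(1 - l_j)
b = G @ x                          # path vector: b_i = log 1/(1 - p_i)

p = 1.0 - np.exp(-b)               # recover end-to-end path loss rates
# Each path loss rate matches the product form 1 - prod(1 - l_j):
assert np.allclose(1.0 - p, [(1-0.01)*(1-0.05),
                             (1-0.05)*(1-0.02),
                             (1-0.01)*(1-0.02)])
```

The same construction works for any additive metric (e.g. latency, with x the raw link latencies and no log transform).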


Sample Path Matrix

[Figure: sample topology with end hosts A, B, C, D and paths b1, b2, b3 over numbered links; virtualization collapses the physical links into virtual links 1 and 2]

  • x1 − x2 unknown ⇒ cannot compute x1, x2

  • To separate identifiable vs. unidentifiable components: x = xG + xN

  • All E2E paths (rows of G) are orthogonal to xN, i.e., G xN = 0
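The decomposition x = xG + xN can be computed directly: the minimum-norm least-squares solution of G x = b is exactly the identifiable row-space component. A small NumPy sketch with an invented rank-deficient G:

```python
import numpy as np

# Hypothetical rank-deficient path matrix: links 1 and 2 always appear
# together, so only x1 + x2 is identifiable, never x1 and x2 separately.
G = np.array([[1, 1, 0],
              [0, 0, 1],
              [1, 1, 1]], dtype=float)
x = np.array([0.3, 0.7, 0.2])            # true (unknown) link vector

b = G @ x
# Minimum-norm least-squares solution = xG, the row-space component of x.
xG, *_ = np.linalg.lstsq(G, b, rcond=None)
xN = x - xG

assert np.allclose(G @ xN, 0)            # null-space part invisible to paths
assert np.allclose(G @ xG, b)            # row-space part explains every path
assert np.allclose(xG, [0.5, 0.5, 0.2])  # only x1 + x2 is known, so split evenly
```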


Intuition through Topology Virtualization

[Figure: two virtualization examples, mapping real links (solid) and the overlay paths (dotted) traversing them onto virtual links; one yields Rank(G) = 2, the other Rank(G) = 3]

Virtual links: minimal path segments whose loss rates are uniquely identifiable

  • Can fully describe all paths

  • xG is composed of virtual links


Algorithms

  • Select k = rank(G) linearly independent paths to monitor (one time)

    • Use QR decomposition

    • Leverage sparse matrices: O(rk²) time and O(k²) memory

      • E.g., 79 seconds for n = 300 (r = 44850) and k = 2541

  • Compute the loss rates of other paths (continuously)

    • O(k²) time and O(k²) memory
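A minimal end-to-end sketch of the two steps (toy matrix invented for illustration; a greedy rank test stands in for the talk's sparse QR selection, which is far faster but picks an equivalent basis):

```python
import numpy as np

def select_basis(G):
    """Greedily pick indices of linearly independent rows of G
    (stand-in for the paper's QR-based one-time selection)."""
    basis = []
    for i in range(len(G)):
        if np.linalg.matrix_rank(G[basis + [i]]) > len(basis):
            basis.append(i)
    return basis

# Toy overlay: 4 paths over 4 links; path 3 = path 0 - path 1 + path 2.
G = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)

basis = select_basis(G)                  # monitor only these k paths
assert basis == [0, 1, 2]

b_meas = np.array([0.10, 0.05, 0.15])    # measured b on the basis (made up)
xG, *_ = np.linalg.lstsq(G[basis], b_meas, rcond=None)
b_all = G @ xG                           # inferred b for every path
assert abs(b_all[3] - (0.10 - 0.05 + 0.15)) < 1e-9
```

The monitored paths fix xG, and every other path's metric follows by one matrix-vector product, which is the continuously running inference step.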



Outline

  • An algebraic approach framework

  • Algorithms for a fixed set of overlay nodes

  • Scalability analysis

  • Adaptive dynamic algorithms

  • Measurement load balancing

  • Handling topology measurement errors

  • Simulations and Internet experiments


How Many Measurements Saved?

Is k ≪ O(n²)?

For a power-law Internet topology:

  • When the majority of end hosts are on the overlay: k = O(n) (with proof)

  • When a small portion of end hosts are on the overlay:

    • If the Internet were purely hierarchical (a tree): k = O(n)

    • If the Internet had no hierarchy at all (worst case, a clique): k = O(n²)

    • The Internet has a moderately hierarchical structure [TGJ+02]: for reasonably large n (e.g., 100), k = O(n log n)


Linear Regression Tests of the Hypothesis

[Figure: regression fits on a BRITE 20K-node hierarchical topology and the Mercator 284K-node real router topology]

  • BRITE router-level topologies

    • Barabási-Albert, Waxman, and hierarchical models

  • Mercator real topology

  • Most fit best with O(n), except the hierarchical topologies, which fit best with O(n log n)
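The regression test itself is straightforward; the sketch below uses synthetic (n, k) pairs, fabricated to follow k ∝ n log n, and compares the relative residuals of the two candidate fits:

```python
import numpy as np

# Synthetic (n, k) pairs, made up here to mimic k ~ 3 * n log n growth.
n = np.array([100, 200, 400, 800, 1600], dtype=float)
k = 3.0 * n * np.log(n)

def fit_residual(feature):
    """Least-squares fit k = alpha * feature; return the relative residual."""
    alpha = (feature @ k) / (feature @ feature)
    return np.linalg.norm(k - alpha * feature) / np.linalg.norm(k)

r_linear = fit_residual(n)               # hypothesis k = O(n)
r_nlogn  = fit_residual(n * np.log(n))   # hypothesis k = O(n log n)

assert r_nlogn < 1e-12 < r_linear        # n log n is the better fit here
```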


Outline

  • An algebraic approach framework

  • Algorithms for a fixed set of overlay nodes

  • Scalability analysis

  • Adaptive dynamic algorithms

  • Measurement load balancing

  • Handling topology measurement errors

  • Simulations and Internet experiments


Topology Changes

  • Basic building block: add/remove one path

    • Incremental changes: O(k²) time (vs. O(n²k²) for a full re-scan)

    • Add path: check linear dependency with the old basis set

    • Delete path p: hard when p is in the basis set

    • Intuitively, two steps

  • Add/remove end hosts, routing changes

    • Routing is relatively stable on the order of a day

      ⇒ incremental detection
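The add-path dependency check can be sketched as a least-squares residual test against the current basis (illustrative only; the paper's incremental O(k²) update avoids re-solving from scratch):

```python
import numpy as np

def must_monitor(basis_rows, new_row, tol=1e-9):
    """True iff new_row is linearly independent of the basis rows, i.e.
    the new path adds information and has to be measured directly."""
    B = np.asarray(basis_rows, dtype=float)
    if B.size == 0:
        return bool(np.linalg.norm(new_row) > tol)
    # Try to express new_row as a combination of the basis rows.
    coeff, *_ = np.linalg.lstsq(B.T, new_row, rcond=None)
    return bool(np.linalg.norm(B.T @ coeff - new_row) > tol)

basis = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
assert not must_monitor(basis, np.array([1.0, 0, 0, 1]))  # = r0 - r1 + r2
assert must_monitor(basis, np.array([0.0, 0, 0, 1]))      # truly new info
```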


Topology Change Example

[Figure: the sample topology (end hosts A, B, C, D; paths b1, b2, b3) revisited, with its virtualization into virtual links 1 and 2]


Other Practical Issues

  • Measurement load balancing

    • Randomly reorder the paths in G before scanning them for basis selection

    • Has no effect on the loss rate estimation accuracy

  • Tolerance of topology measurement errors

    • We care about path loss rates rather than those of interior links

    • Router aliases

      ⇒ let it be: the aliased links are simply assigned similar loss rates

    • Path (segments) without topology info

      ⇒ add virtual links to bypass them
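The load-balancing idea can be illustrated directly: permuting the rows of G changes which paths the scan selects, but not how many (hypothetical 4-path overlay; `select_basis` is my stand-in for the pivoted scan):

```python
import numpy as np

def select_basis(G):
    """Greedy independent-row scan (stand-in for the pivoted QR scan)."""
    basis = []
    for i in range(len(G)):
        if np.linalg.matrix_rank(G[basis + [i]]) > len(basis):
            basis.append(i)
    return basis

# Hypothetical 4-path overlay of rank 3: any 3 of the 4 paths form a basis.
G = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)

rng = np.random.default_rng(0)
perm = rng.permutation(len(G))                  # randomize the scan order
basis = [int(perm[i]) for i in select_basis(G[perm])]

# A different ordering may select different paths (spreading probe load),
# but the basis size (= rank) and the spanned path space never change.
assert len(basis) == 3
assert np.linalg.matrix_rank(G[basis]) == np.linalg.matrix_rank(G)
```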


Outline

  • An algebraic approach framework

  • Algorithms for a fixed set of overlay nodes

  • Scalability analysis

  • Adaptive dynamic algorithms

  • Measurement load balancing

  • Handling topology measurement errors

  • Simulations and Internet experiments


Evaluation

  • Extensive Simulations

    • See paper

  • Experiments on PlanetLab

    • 51 hosts, each from a different organization

      • 51 × 50 = 2,550 paths

    • Simultaneous loss rate measurement

      • 300 trials, 300 msec each

      • In each trial, send a 40-byte UDP pkt to every other host

    • Topology measurement (traceroute)

    • 100 experiments during peak hours in North America


PlanetLab Experiment Results

  • Loss rate distribution

  • On average k = 872 out of 2550

  • Metrics

    • Absolute error |p – p’ |:

      • Average 0.0027 for all paths, 0.0058 for lossy paths

    • Relative error [BDPT02]

      • Average 1.1 for all paths, and 1.7 for lossy paths


More Experiment Results

  • Running time

    • Setup (path selection): 0.75 seconds

    • Update (for all 2550 paths): 0.16 seconds

    • More results on topology change adaptation: see paper

  • Robustness

    • Out of 14 sets of pair-wise traceroute …

    • On average 245 out of 2550 paths have no or incomplete routing information

    • No router aliases resolved

      Conclusion: robust against topology measurement errors


Results for Measurement Load Balancing

  • Simulation on an overlay of 300 end hosts, average load 8.5

  • With balancing: Gaussian-like load distribution

  • Without balancing: heavily skewed, with the maximum almost 20 times the average


Conclusions

  • A tomography-based overlay network monitoring system

    • Given n end hosts, characterize O(n²) paths with a basis set of O(n log n) paths

    • Selectively monitor the basis set for their loss rates, then infer the loss rates of all other paths

    • Adaptive to topology changes

    • Balanced measurement load

    • Topology measurement error tolerance

  • Both simulation and PlanetLab experiments show promising results

  • Built an adaptive overlay streaming media system on top of it


Backup Slides


Other Practical Issues

  • Tolerance of topology measurement errors

    • We care about path loss rates rather than those of interior links

    • Poor router alias resolution

      ⇒ assign similar loss rates to the same links

    • Unidentifiable routers

      ⇒ add virtual links to bypass them

  • Measurement load balancing on end hosts

    • Randomly order the paths for scanning and basis selection


Modeling of Path Space

[Figure: the same example overlay (end hosts A, B, C, D) with path p1 highlighted]

Path loss rate p, link loss rate l

Put all r = O(n²) paths together; s links in total


Sample Path Matrix

[Figure: the sample topology with paths b1, b2, b3, alongside the geometry of link space: the path/row space (measured), e.g. the direction (1, 1, 0), versus the null space (unmeasured), e.g. the direction (1, −1, 0)]

  • x1 − x2 unknown ⇒ cannot compute x1, x2

  • The set of vectors orthogonal to all path rows forms the null space

  • To separate identifiable vs. unidentifiable components: x = xG + xN

  • All E2E paths are in the path space, i.e., G xN = 0


Intuition through Topology Virtualization

[Figure: the sample topology after virtualization into virtual links 1 and 2, alongside the same path/row space vs. null space geometry]

Virtual links:

  • Minimal path segments whose loss rates are uniquely identifiable

  • Can fully describe all paths

  • xG is composed of virtual links

All E2E paths are in the path space, i.e., G xN = 0


Algorithms

  • Select k = rank(G) linearly independent paths to monitor

    • Use a rank-revealing decomposition

    • Leverage sparse matrices: O(rk²) time and O(k²) memory

      • E.g., 10 minutes for n = 350 (r = 61075) and k = 2958

  • Compute the loss rates of other paths

    • O(k²) time and O(k²) memory


Practical Issues

  • Tolerance of topology measurement errors

    • We care about path loss rates rather than those of interior links

    • Poor router alias resolution

      ⇒ assign similar loss rates to the same links

    • Unidentifiable routers

      ⇒ add virtual links to bypass them

  • Measurement load balancing on end hosts

    • Randomly order the paths for scanning and basis selection

  • Topology changes

    • Efficient algorithms for incremental updates when adding/removing end hosts and under routing changes


More Experiment Results

  • Measurement load balancing

    Putting load values of each node in 10 equally spaced bins

  • Running time

    • Setup (path selection): 0.75 seconds

    • Update (for all 2550 paths): 0.16 seconds

    • More results on topology change adaptation: see paper

[Figure: load histograms with and without load balancing]


Work in Progress

  • Provide it as a continuous service on PlanetLab

  • Network diagnostics:

    Which links or path segments are down

  • Iterative methods for better speed and scalability


Evaluation

  • Simulation

    • Topology

      • BRITE: Barabási-Albert, Waxman, hierarchical; 1K – 20K nodes

      • Real topology from Mercator: 284K nodes

    • Fraction of end hosts on the overlay: 1 - 10%

    • Loss rate distribution (90% of links are good)

      • Good link: 0-1% loss rate; bad link: 5-10% loss rates

      • Good link: 0-1% loss rate; bad link: 1-100% loss rates

    • Loss model:

      • Bernoulli: independent packet drops

      • Gilbert: bursty packet drops

    • Path loss rate simulated via transmission of 10K pkts

  • Experiments on PlanetLab
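The Gilbert loss model listed above can be sketched as a two-state chain (the transition probabilities below are my own illustrative choices, not the values used in the paper):

```python
import numpy as np

def gilbert_losses(n_pkts, p_gb=0.01, p_bg=0.5, loss_in_bad=1.0, seed=0):
    """Two-state Gilbert model: Good <-> Bad; packets are dropped (with
    probability loss_in_bad) only in the Bad state, giving bursty losses."""
    rng = np.random.default_rng(seed)
    bad = False
    dropped = np.zeros(n_pkts, dtype=bool)
    for i in range(n_pkts):
        # Stay Bad with prob 1 - p_bg; enter Bad from Good with prob p_gb.
        bad = rng.random() < (1 - p_bg if bad else p_gb)
        dropped[i] = bad and rng.random() < loss_in_bad
    return dropped

drops = gilbert_losses(10_000)
loss_rate = drops.mean()
# Stationary Bad-state probability is p_gb / (p_gb + p_bg) ~ 0.02.
assert 0.0 < loss_rate < 0.1
```

A Bernoulli model is the degenerate case where each packet is dropped independently with a fixed probability, with no burst structure.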


Evaluation

  • Extensive Simulations

  • Experiments on PlanetLab

    • 51 hosts, each from a different organization

    • 51 × 50 = 2,550 paths

    • On average k = 872

  • Results Highlight

    • Avg real loss rate: 0.023

    • Absolute error mean: 0.0027; 90% < 0.014

    • Relative error mean: 1.1; 90% < 2.0

    • On average 248 out of 2550 paths have no or incomplete routing information

    • No router aliases resolved


Sensitivity Test of Sending Frequency

  • The number of lossy paths jumps sharply when the sending rate exceeds 12.8 Mbps


PlanetLab Experiment Results

  • Loss rate distribution

  • Metrics

    • Absolute error |p – p’ |:

      • Average 0.0027 for all paths, 0.0058 for lossy paths

    • Relative error [BDPT02]

    • Lossy path inference: coverage and false positive ratio

  • On average k = 872 out of 2550


Accuracy Results for One Experiment

  • 95% of absolute error < 0.0014

  • 95% of relative error < 2.1


Accuracy Results for All Experiments

  • For each experiment, get its 95% absolute & relative errors

  • Most have absolute error < 0.0135 and relative error < 2.0


Lossy Path Inference Accuracy

  • 90 out of 100 runs have coverage over 85% and a false positive rate below 10%

  • Many errors are caused by boundary effects around the 5% lossy-path threshold


Performance Improvement with Overlay

  • With single-node relay

  • Loss rate improvement

    • Among 10,980 lossy paths:

    • 5,705 paths (52.0%) have loss rate reduced by 0.05 or more

    • 3,084 paths (28.1%) change from lossy to non-lossy

  • Throughput improvement

    • Estimated with a standard TCP throughput model

    • 60,320 paths (24%) with non-zero loss rate, throughput computable

    • Among them, 32,939 (54.6%) paths have throughput improved, 13,734 (22.8%) paths have throughput doubled or more

  • Implications: use overlay path to bypass congestion or failures
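The single-node relay selection above amounts to minimizing the combined loss 1 − (1 − p_sr)(1 − p_rd) over intermediate nodes r; a sketch with a fabricated 4-host loss-rate matrix:

```python
import numpy as np

def best_relay(loss, src, dst):
    """Pick the intermediate node minimizing the relayed loss rate
    1 - (1 - loss[src][r]) * (1 - loss[r][dst]); loss is an n x n matrix."""
    n = len(loss)
    candidates = [r for r in range(n) if r not in (src, dst)]
    relayed = {r: 1 - (1 - loss[src][r]) * (1 - loss[r][dst])
               for r in candidates}
    r = min(relayed, key=relayed.get)
    return r, relayed[r]

# Hypothetical inferred loss-rate matrix for 4 hosts (row = source).
loss = np.array([[0.00, 0.01, 0.30, 0.02],
                 [0.01, 0.00, 0.02, 0.05],
                 [0.30, 0.02, 0.00, 0.04],
                 [0.02, 0.05, 0.04, 0.00]])

r, p = best_relay(loss, 0, 2)   # direct 0 -> 2 loses 30% of packets
assert r == 1                   # relaying via host 1 gives ~3% combined loss
assert p < loss[0][2]           # the overlay path beats the congested one
```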


Adaptive Overlay Streaming Media

[Figure: streaming testbed spanning Stanford, UC San Diego, UC Berkeley, and HP Labs; a failing direct path (X) is bypassed through an overlay node]

  • Implemented with Winamp client and SHOUTcast server

  • Congestion introduced with a Packet Shaper

  • Skip-free playback: server buffering and rewinding

  • Total adaptation time < 4 seconds



Conclusions

  • A tomography-based overlay network monitoring system

    • Given n end hosts, characterize O(n²) paths with a basis set of O(n log n) paths

    • Selectively monitor O(n log n) paths to compute the loss rates of the basis set, then infer the loss rates of all other paths

  • Both simulation and real Internet experiments show promising results

  • Built adaptive overlay streaming media system on top of monitoring services

    • Bypass congestion/failures for smooth playback within seconds

