Internet Iso-bar: A Scalable Overlay Distance Monitoring System

Yan Chen, Lili Qiu, Chris Overton and Randy H. Katz

Applications of end-to-end distance monitoring/estimation

- Overlay Routing/Location
- Peer-to-peer Systems
- VPN Management/Provisioning
- Service Redirection/Placement
- Cache-infrastructure Configuration
Requirements for E2E distance monitoring system

- Scalable: a small amount of probing traffic and system load
- Accurate: capture congestion/failures + latency estimation
- Fast: small computation for real-time estimation
- Incrementally deployable
- Easy to use
Benefit applications

- Application-driven measurement
- Inference techniques for troubleshooting and root cause analysis
- Improve application performance and reliability

- Given N end hosts, how to select a subset of them as monitors and build a scalable overlay distance monitoring service without knowing the underlying topology?
- Distance info desired: report congestion/failure if it occurs, otherwise latency

- Based on the National Laboratory for Applied Network Research (NLANR) Active Measurement Project (AMP) data set
- 104 sites in the US (including Alaska and Hawaii) and Australia; every host pings all other hosts every minute
- Sliding window of 10 samples, using the minimum RTT as the latency sample
- 105M measurements, 6/25/01 – 7/1/01
- Congestion/failures (uniformly denoted as congestion) defined as a measurement “loss” or latency > geometric mean × geometric standard deviation
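A minimal Python sketch of this sample definition, assuming each probe either returns an RTT or is lost (`None`), and that a window counts as a measurement "loss" only when none of its 10 probes returned. `geo_mean_stdev` and `PathMonitor` are hypothetical names; the threshold is the training-data geometric mean multiplied by the geometric standard deviation.

```python
import math
from collections import deque
from typing import Optional, Sequence, Tuple


def geo_mean_stdev(samples: Sequence[float]) -> Tuple[float, float]:
    """Geometric mean and geometric standard deviation of positive RTTs."""
    logs = [math.log(x) for x in samples]
    mu = sum(logs) / len(logs)
    var = sum((v - mu) ** 2 for v in logs) / len(logs)
    return math.exp(mu), math.exp(math.sqrt(var))


class PathMonitor:
    """Hypothetical per-path state: sliding window of the last `window` pings."""

    def __init__(self, geo_mean: float, geo_stdev: float, window: int = 10):
        # Congestion threshold from the slide: geo mean x geo stdev.
        self.threshold = geo_mean * geo_stdev
        self.window = deque(maxlen=window)

    def add_ping(self, rtt_ms: Optional[float]) -> None:
        # None marks a lost probe; the deque evicts the oldest sample itself.
        self.window.append(rtt_ms)

    def latency_sample(self) -> Optional[float]:
        # Minimum RTT over the window; None if every probe in it was lost.
        received = [r for r in self.window if r is not None]
        return min(received) if received else None

    def is_congested(self) -> bool:
        # Assumption: a window with no replies counts as a measurement "loss".
        sample = self.latency_sample()
        return sample is None or sample > self.threshold
```

Taking the window minimum filters out transient queueing spikes, so only persistent elevation above the threshold (or total loss) is flagged.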

- Congestion is not common: only 0.96% of samples
- A few congested links dominate E2E congestion
- Apart from last-mile congestion, E2E congestion exhibits strong spatial correlation

- Procedures
- Cluster hosts that perceive similar performance to a small set of sites (landmarks)
- For each cluster, select a monitor for active and continuous probing
- Estimate distance between any pair of hosts using inter- and intra-cluster distance

- Define a correlation distance between each pair of hosts
- Existing work uses network proximity: cor_dist(i,j) = net_dist(i,j) (denoted p_ij)
- Iso-bar uses the network distance vector (k landmarks, used for clustering only): netV_i = [p_i1, p_i2, …, p_ik]^T
- Euclidean distance based:
- Cosine vector similarity based:
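The two correlation distances can be sketched as follows. The formulas are not given here, so this assumes the standard forms: the Euclidean norm of netV_i − netV_j, and 1 minus the cosine similarity of the two vectors (so identical directions give distance 0). Function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean_cor_dist(v_i: Sequence[float], v_j: Sequence[float]) -> float:
    """Euclidean distance between two landmark distance vectors netV_i, netV_j."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_i, v_j)))


def cosine_cor_dist(v_i: Sequence[float], v_j: Sequence[float]) -> float:
    """1 - cosine similarity (an assumed way to turn similarity into distance)."""
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norm_i = math.sqrt(sum(a * a for a in v_i))
    norm_j = math.sqrt(sum(b * b for b in v_j))
    return 1.0 - dot / (norm_i * norm_j)


# Hypothetical RTTs (ms) from two hosts to k = 3 landmarks.
net_v_i = [10.0, 50.0, 80.0]
net_v_j = [12.0, 52.0, 79.0]
print(euclidean_cor_dist(net_v_i, net_v_j))  # 3.0
```

Unlike the proximity metric cor_dist(i,j) = p_ij, neither function needs a measurement between i and j themselves; hosts are compared only through their landmark vectors.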

- Apply generic clustering methods
- Optimize the worst case: minimize the maximum radius of all clusters (limit_num_minRmax)
- Optimize the average case: minimize the sum of total host-monitor distance (limit_num_minDistSum)
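Both objectives can be approximated with standard heuristics. This sketch swaps in farthest-first traversal (a well-known 2-approximation for minimizing the maximum cluster radius) for limit_num_minRmax, and a simple k-medoids assign/update loop for limit_num_minDistSum. Neither is claimed to be the exact method used by Iso-bar, and all function names are hypothetical. Points are the netV_i vectors, `dist` is any correlation distance, and each resulting center or medoid would serve as the cluster's monitor.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
DistFn = Callable[[Vector, Vector], float]


def assign(points: Sequence[Vector], centers: List[int], dist: DistFn) -> List[int]:
    """Label each point with the index of its nearest center (first wins ties)."""
    return [min(centers, key=lambda c: dist(p, points[c])) for p in points]


def k_center_greedy(points: Sequence[Vector], k: int, dist: DistFn, first: int = 0) -> List[int]:
    """Farthest-first traversal: approximately minimizes the maximum cluster radius."""
    centers = [first]
    while len(centers) < min(k, len(points)):
        # Distance from every point to its closest chosen center.
        gaps = [min(dist(p, points[c]) for c in centers) for p in points]
        farthest = max(range(len(points)), key=gaps.__getitem__)
        if gaps[farthest] == 0:
            break  # every point already coincides with a center
        centers.append(farthest)
    return centers


def k_medoids(points: Sequence[Vector], k: int, dist: DistFn,
              iters: int = 20) -> Tuple[List[int], List[int]]:
    """Alternate assignment and medoid update to reduce total host-monitor distance."""
    medoids = k_center_greedy(points, k, dist)
    for _ in range(iters):
        labels = assign(points, medoids, dist)
        updated = []
        for m in medoids:
            members = [i for i, lab in enumerate(labels) if lab == m]
            if not members:
                updated.append(m)
                continue
            # New medoid: the member with the smallest summed distance to the others.
            updated.append(min(members, key=lambda c: sum(dist(points[c], points[i]) for i in members)))
        if sorted(updated) == sorted(medoids):
            break  # converged
        medoids = updated
    return medoids, assign(points, medoids, dist)
```

Seeding k-medoids with the farthest-first centers is a design choice for the sketch; any initial set of k distinct hosts would work.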

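[Figure: Diagram of Internet Iso-bar. End hosts grouped into Cluster A, Cluster B and Cluster C, with landmarks.]

[Figure: Diagram of Internet Iso-bar. Each cluster has a monitor; distance probes run from each monitor to its hosts and among monitors.]

[Figure: Estimation cases. Intra-cluster: hosts i and j share monitor m. Inter-cluster: host i with monitor mi, host j with monitor mj.]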

- Intra-cluster estimation
- If path(m, i) or path(m, j) is congested, report path(i, j) as congestion
- Otherwise, pDist(i,j) = (mDist(m, i) + mDist(m, j)) / 2, where pDist is the predicted and mDist the measured distance

- Inter-cluster estimation
- If path(mi, i), path(mi, mj) or path(mj, j) is congested, report path(i, j) as congestion
- Otherwise, pDist(i,j) = mDist(mi, mj)
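The two estimation rules can be sketched as below, assuming each monitored path is stored per unordered host pair together with a congestion flag, and that i and j are end hosts (not monitors). `Measurement`, `estimate`, and the dictionary layout are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Union

CONGESTED = "congested"


@dataclass(frozen=True)
class Measurement:
    latency_ms: float  # current latency sample on a monitored path
    congested: bool    # True if the path is currently flagged as congested


Paths = Dict[FrozenSet[str], Measurement]


def measured(paths: Paths, a: str, b: str) -> Measurement:
    # Monitored paths are treated as symmetric, keyed by the unordered pair {a, b}.
    return paths[frozenset((a, b))]


def estimate(i: str, j: str, monitor_of: Dict[str, str], paths: Paths) -> Union[float, str]:
    """Predict the distance between end hosts i and j from monitor measurements."""
    m_i, m_j = monitor_of[i], monitor_of[j]
    if m_i == m_j:
        # Intra-cluster: both hosts are probed by the same monitor m.
        legs = [measured(paths, m_i, i), measured(paths, m_i, j)]
        if any(leg.congested for leg in legs):
            return CONGESTED
        return (legs[0].latency_ms + legs[1].latency_ms) / 2
    # Inter-cluster: host -> own monitor -> other monitor -> other host.
    legs = [measured(paths, m_i, i), measured(paths, m_i, m_j), measured(paths, m_j, j)]
    if any(leg.congested for leg in legs):
        return CONGESTED
    return legs[1].latency_ms  # only the monitor-to-monitor distance is used
```

Congestion on any leg is propagated to the host pair, while the latency estimate uses only the legs named in the rules above.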

- Internet measurement data
- NLANR AMP data set
- Clustering based on the geometric mean of the training-date measurements
- Estimation dates: 6/25/01 – 7/24/01, 12/06/01

- Keynote CDN measurement data
- 63 agents covering all major ISPs in US, Europe, Asia & Australia
- 2 targets (CDN re-directors) in Boston and Texas
- Measure TCP connection time (2/3 of handshake) from each agent to target every minute
- Training date: 10/21/2002
- Estimation dates: 10/21/2002 – 11/25/2002

- NLANR AMP data set
- Latency estimation results are similar for both data sets; NLANR results are presented

- Estimation metric
- Relative error of latency estimates for uncongested paths
- Stability
- For dynamic monitoring systems, amount of congestion captured and false positive ratio
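The metrics can be sketched as follows. They are not defined precisely here, so this assumes the symmetric relative error |predicted − actual| / min(predicted, actual), a nearest-rank percentile (for the 90th-percentile summaries), and a false positive ratio measured over the paths reported as congested. All names are hypothetical. Under this definition, a relative error of 0.5 on a 60 ms path allows predictions from 40 ms to 90 ms.

```python
import math
from typing import Sequence, Tuple


def relative_error(predicted: float, actual: float) -> float:
    """Assumed symmetric relative error: |pred - actual| / min(pred, actual)."""
    return abs(predicted - actual) / min(predicted, actual)


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for 0 < q <= 100."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def capture_and_false_positive(predicted: set, actual: set) -> Tuple[float, float]:
    """Share of truly congested measurements reported, and share of reports that were wrong."""
    hits = len(predicted & actual)
    captured = hits / len(actual) if actual else 1.0
    false_pos = (len(predicted) - hits) / len(predicted) if predicted else 0.0
    return captured, false_pos
```

Dividing by the smaller of the two values penalizes under- and over-estimates by the same ratio, which is why the 0.5 bound is asymmetric in milliseconds.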

- Internet distance estimation techniques evaluated
- Omniscient: uses the geometric mean of each (source, destination) pair on the training date
- Global Network Positioning (GNP)
- Clustering with network distance vector (Iso-bar)
- Clustering with network proximity

- 15 clusters vs. 15 landmarks of GNP

- Training date: 06/25/01
- Estimation dates: 06/25/01 - 12/06/01
- Summary of the 90th percentile relative error for various distance estimation methods

- Latency estimation when un-congested
- Omniscient is the most accurate, but unscalable
- GNP and Iso-bar come second
- Both have good accuracy and stability for distance estimation
- GNP is a static approach and does not scale for online monitoring

- Iso-bar outperforms proximity-based clustering by 50%
- 90th percentile relative error < 0.5: for a 60 ms latency, 40 ms < prediction < 90 ms

- Congestion/failures estimation
- 6/25/01 – 7/01/01: on average, 148K congested measurements per day
- Iso-bar captures 78% of them, with a 32% false positive ratio
- Only 3% of RON's monitoring overhead
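A back-of-the-envelope count shows where the savings come from. It assumes RON-style monitoring probes every unordered host pair, while Iso-bar probes each non-monitor host from its monitor plus every monitor pair. The host and cluster counts (104 NLANR sites, 15 clusters) are taken from elsewhere in this evaluation and combined here only for illustration; probe frequency and self-diagnostic traffic are ignored, so this is not an exact reproduction of the reported overhead.

```python
def full_mesh_pairs(n: int) -> int:
    """Paths probed when every pair of n hosts is monitored (RON-style)."""
    return n * (n - 1) // 2


def isobar_pairs(n: int, k: int) -> int:
    """Monitor-to-host paths for the n - k non-monitors, plus all monitor pairs."""
    return (n - k) + k * (k - 1) // 2


n_hosts, n_clusters = 104, 15
mesh = full_mesh_pairs(n_hosts)             # 5356
iso = isobar_pairs(n_hosts, n_clusters)     # 89 + 105 = 194
print(f"{iso} / {mesh} = {iso / mesh:.1%}")  # 194 / 5356 = 3.6%
```

The full mesh grows as O(N²), while Iso-bar grows as O(N + K²), so the ratio shrinks further as N grows with K fixed.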

- Propose Internet Iso-bar
- Cluster hosts based on the network similarity
- Inter- and intra-cluster latency estimation with a first-step heuristic for congestion/failure detection
- Preliminary results are promising
- High accuracy and stability for normal latency estimation
- A simple congestion-estimation heuristic captures 78% of congestion, with a 32% false positive ratio and only 3% of RON's monitoring overhead

- Current focus: switching from latency estimation to congestion/failure estimation
- Apply topology information, e.g. lossy link detection with network tomography
- Cluster and choose monitors based on the lossy links

- Benefit applications
- Dynamic node join/leave for P2P systems
- A joining client pings the landmark sites to get its distance vector, compares it with those of the monitors, and joins the closest one
- Split/merge clusters
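The join step can be sketched as a nearest-neighbor lookup over distance vectors, assuming the client has already pinged the same k landmarks used for clustering and that Euclidean correlation distance is used by default. `choose_monitor` is a hypothetical name; `math.dist` requires Python 3.8+.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def choose_monitor(
    client_vector: Vector,
    monitor_vectors: Dict[str, Vector],
    dist: Callable[[Vector, Vector], float] = math.dist,
) -> str:
    """Return the monitor whose landmark distance vector is closest to the client's."""
    if not monitor_vectors:
        raise ValueError("no monitors to join")
    return min(monitor_vectors, key=lambda m: dist(client_vector, monitor_vectors[m]))


# Hypothetical RTTs (ms) to three landmarks.
monitors = {"m1": [12.0, 48.0, 90.0], "m2": [70.0, 15.0, 40.0]}
print(choose_monitor([10.0, 50.0, 85.0], monitors))  # m1
```

The join costs only k pings plus one comparison per monitor, so no existing host needs to probe the newcomer.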

- Multi-path selection

- Dynamic node join/leave for P2P systems
- More comprehensive evaluation
- Simulate with large network
- Deploy on PlanetLab and operate at a finer granularity

Problem formulation:

Given N end hosts, how to select a subset of them as monitors and build a scalable overlay distance monitoring service without knowing the underlying topology?

Distance info desired: report congestion/failure if it occurs, otherwise latency

Our approach:

- Cluster hosts that perceive similar performance to a small set of sites (landmarks)
- For each cluster, select a monitor for active and continuous probing
- Estimate distance between any pair of hosts using inter- and intra-cluster distance
Performance evaluation

- Using real Internet measurement data
- Compared with other distance estimation services: GNP, RON
- Performance metrics: accuracy and stability

- Congestion/failures analysis
- Congestion/failures (uniformly denoted as congestion) not common
- Defined as measurement “loss” or (latency > geo mean × geo stdev)
- Only 0.96% out of 105M NLANR ping measurements over a week

- Suggests that a few congested links dominate E2E congestion
- Apart from last-mile congestion, E2E congestion exhibits strong spatial correlation

- Estimation algorithms
- Intra-cluster estimation (i and j use the same monitor m)
- If path(m, i) or path(m, j) is congested, report path(i, j) as congestion
- Otherwise, predictedDist(i,j) = (measuredDist(m, i) + measuredDist(m, j)) / 2

- Inter-cluster distance estimation
- If path(monitor_i, i), path(monitor_i, monitor_j) or path(monitor_j, j) is congested, report path(i, j) as congestion
- Otherwise, predictedDist(i,j) = measuredDist(monitor_i, monitor_j)

- Self-diagnostics of monitors, check for last-mile congestion

- Intra-cluster estimation (i and j use the same monitor m)