
An Algebraic Approach to Practical and Scalable Overlay Network Monitoring


Presentation Transcript


  1. An Algebraic Approach to Practical and Scalable Overlay Network Monitoring Yan Chen, David Bindel, Hanhee Song, Randy H. Katz Presented by Mahesh Balakrishnan

  2. Motivation • Overlay networks • Monitoring of end-to-end paths • The need for a separate Monitoring Service • Metrics: Latency... Loss Rate? • The Goal: A Scalable Overlay Loss Rate Monitoring Service

  3. Existing Work… • Latency-only schemes • Clustering: • Nodes are clustered together, and the cluster representative is monitored • Claim: inaccurate for congestion detection • Coordinates: • Cannot give congestion information

  4. Existing Work (contd.) • Network tomography: determining internal network properties from black-box measurements • Shavitt, et al.: algebraic approach • Ozmutlu, et al.: selecting a minimal set of paths to cover all links • General metric systems: RON

  5. Core Idea • Assumptions: • Access to link composition of paths • Ability to measure path (but not link) characteristics • From the possible n² end-to-end paths, select a basis set of k paths (k << n²) to monitor. • The characteristics of all paths can be inferred from this basis set. • Centralized algorithm: all nodes send measurements to a central node.

  6. The Math [figure: example network with nodes A–D and links l1, l2] • Eq 1: a path's end-to-end success rate is the product of its links' success rates, (1 - p_i) = Π_{j ∈ path i} (1 - l_j); taking logs turns this into a linear relation • Represent paths as vectors of the links they traverse: AD, BD, AC

  7. System of Linear Equations: G x = b • G: path matrix (one row per path, one column per link) • x: link rates (log-transformed link loss rates) • b: path rates (log-transformed measured path loss rates)
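The system above can be sketched concretely in NumPy. The topology here (3 paths over 3 links) is invented for illustration and is not the slide's figure; the loss rates are likewise assumed values.

```python
import numpy as np

# Hypothetical 3-path, 3-link example: rows = paths, columns = links;
# a 1 means the path traverses that link.
G = np.array([[1, 1, 0],   # path p1 uses links l1, l2
              [0, 1, 1],   # path p2 uses links l2, l3
              [1, 0, 1]],  # path p3 uses links l1, l3
             dtype=float)

link_loss = np.array([0.01, 0.05, 0.002])   # assumed per-link loss rates

# Linearize: with x_j = log(1 / (1 - l_j)), multiplicative path losses
# become the linear system G x = b.
x = np.log(1.0 / (1.0 - link_loss))
b = G @ x                                   # "measured" path values
path_loss = 1.0 - np.exp(-b)                # back to ordinary loss rates

# This G happens to have full rank, so the link terms are recoverable exactly.
x_hat, *_ = np.linalg.lstsq(G, b, rcond=None)
```

In general G is rank-deficient, so only path-level (not link-level) quantities are uniquely determined; the next slides address that.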

  8. Example Network [figure: example network with paths AB, AC, BC] • k = number of essential paths, 1 < k ≤ s • G is rank deficient: k < s

  9. More Math • k = # of essential paths = rank(G) • k ≤ s • Usually G is rank-deficient: k < s • Select k linearly independent paths to monitor • One-time QR decomposition: O(rk²) time… O(n⁴)! • Inferring other paths: O(k²)
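A minimal sketch of basis selection and inference, using a chain topology A–B–C where the path AC is the sum of AB and BC. The paper does the selection with a one-time QR decomposition; the repeated `matrix_rank` call below is a slower but equivalent stand-in for illustration.

```python
import numpy as np

# Chain A -l1- B -l2- C: three paths over two links; row AC = AB + BC.
G = np.array([[1, 0],    # AB
              [0, 1],    # BC
              [1, 1]],   # AC (linearly dependent on the first two)
             dtype=float)

# Greedy basis selection: keep a path only if it raises the rank.
basis = []
for i in range(G.shape[0]):
    if np.linalg.matrix_rank(G[basis + [i]]) > len(basis):
        basis.append(i)

# Monitor only the k basis paths, then infer every other path from the
# least-squares solution for the link terms.
x_true = np.array([0.02, 0.03])              # assumed log-loss per link
b_basis = G[basis] @ x_true                  # measurements on basis paths only
x_hat, *_ = np.linalg.lstsq(G[basis], b_basis, rcond=None)
b_all = G @ x_hat                            # includes the unmonitored path AC
```

Here k = 2 while there are 3 candidate paths; AC's loss rate is inferred, never measured.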

  10. Assessment Criteria • Accuracy • Scalability: how does k grow w.r.t. n? • Other concerns: • centralized solution • compute time under churn • storage load

  11. Effect of Topology on k Growth • Star topology, strict hierarchy: s = O(n) ⇒ k = O(n) • Clique: each path (end-host pair) contains a unique link, hence k = O(n²) • Hierarchy is good, dense connectivity is bad • Conjecture: k = O(n log n) for the Internet • What if only a small % of end nodes are on the overlay?
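The star-topology claim is easy to check numerically: with n leaves hanging off one hub, there are O(n²) candidate paths but rank(G) is only n. The helper name below is invented for this sketch.

```python
import numpy as np
from itertools import combinations

def star_path_matrix(n):
    """Star with n leaves: link i joins leaf i to the hub, so the path
    between leaves i and j traverses exactly links i and j."""
    pairs = list(combinations(range(n), 2))
    G = np.zeros((len(pairs), n))
    for row, (i, j) in enumerate(pairs):
        G[row, i] = G[row, j] = 1
    return G

# O(n^2) candidate paths, but only k = rank(G) = n of them are essential.
for n in (4, 8, 16):
    G = star_path_matrix(n)
    assert G.shape[0] == n * (n - 1) // 2
    assert np.linalg.matrix_rank(G) == n
```

A clique, by contrast, gives every pair its own private link, so every row carries a unique nonzero column and all O(n²) paths are essential.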

  12. Linear Regression Tests Synthetic Hierarchical Real

  13. Handling Change • Path addition: O(k²) • Path removal: O(k²) [naïve: O(rk²)] • Node addition: O(nk²) • Node removal: O(nk²) • Cannot use the path removal algorithm directly; a removed path would be replaced by another path involving the departing node • Remove all the node's paths, then look for replacements • Cubic in n: churn in large systems?
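The core of path addition is a rank test: the new path joins the basis only if its row vector lies outside the span of the current basis rows. The paper achieves O(k²) per addition by updating the QR factors incrementally; the `lstsq` projection below (function name invented here) is a simpler stand-in that shows the test itself.

```python
import numpy as np

def is_essential(basis_rows, g_new, tol=1e-9):
    """A newly added path is essential iff its row vector is not spanned
    by the current basis rows, i.e. its projection residual is nonzero.
    (A real implementation would update a QR factorization instead.)"""
    c, *_ = np.linalg.lstsq(basis_rows.T, g_new, rcond=None)
    resid = g_new - basis_rows.T @ c
    return np.linalg.norm(resid) > tol

basis_rows = np.array([[1, 1, 0],
                       [0, 1, 1]], dtype=float)

independent = is_essential(basis_rows, np.array([1.0, 0.0, 1.0]))
duplicate = is_essential(basis_rows, np.array([1.0, 1.0, 0.0]))
```

Node removal is harder precisely because every deleted path may have been serving as the basis representative for other, still-live paths through that node.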

  14. Routing Changes • End-to-end Internet paths are generally stable • Traceroute • Topology is checked on a daily basis, and in the presence of drastic loss rate changes • If a path has changed at certain links, other paths containing those links are checked as well

  15. Load Balancing / Topology Measurement Errors • Paths in G are randomly reordered before the basis set is selected • Untraceable paths/segments are modeled as single links; they are always selected into the basis • Router aliases (one physical link presented as several virtual links): all virtual links get similar loss rates

  16. Evaluation: Simulation • Three synthetic BRITE topologies: Barabasi-Albert, Waxman, hierarchical • One ‘real’ router topology (Mercator) • Methodology: • Loss distribution: good links 0–1%, bad links 5–10% • Loss model: Bernoulli or Gilbert • Simulate loss for the selected basis paths, infer it for all other paths
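The Gilbert model referenced above produces bursty rather than independent losses: the link alternates between a lossless good state and a bad state that drops packets. A minimal sketch, with transition probabilities and the function name chosen for illustration (they are not the paper's parameters):

```python
import numpy as np

def gilbert_losses(n_packets, p_enter_bad, p_exit_bad, seed=0):
    """Two-state Gilbert loss model: every packet sent while the link is
    in the 'bad' state is dropped; state transitions form a Markov chain."""
    rng = np.random.default_rng(seed)
    bad = False
    lost = 0
    for _ in range(n_packets):
        if bad:
            lost += 1
        # Stay bad w.p. 1 - p_exit_bad; enter bad from good w.p. p_enter_bad.
        bad = rng.random() < ((1 - p_exit_bad) if bad else p_enter_bad)
    return lost / n_packets

# Long-run bad-state fraction is p_enter_bad / (p_enter_bad + p_exit_bad),
# so these settings give roughly 5% loss delivered in bursts.
rate = gilbert_losses(200_000, p_enter_bad=0.01, p_exit_bad=0.2)
```

Bernoulli loss is the degenerate case where each packet is dropped independently with a fixed probability.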

  17. Accuracy: Synthetic Topology • All configurations: average absolute error under 0.008, average error factor under 1.18

  18. Accuracy: Real Topology

  19. Accuracy [figures: synthetic hierarchical topology, real topology]

  20. Running Time • 3 seconds for 100 nodes, 21 minutes for 500!

  21. Load Balancing

  22. Effect of Churn/Routing Change • Path addition: 125 msec • Path removal: 445 msec • Node addition: 1.18 sec • Node removal: 16.9 sec • What about n >> 60? [figures: node addition, network link removal, node deletion]

  23. PlanetLab Experiments • 51 hosts, each from a different organization • Each node sends a UDP packet to every other host in each trial • 300 trials of 300 msec each • The receiver counts packets to compute the loss rate • Traceroute used for topology measurement

  24. PlanetLab Results • Average absolute error = 0.0027, average error factor = 1.1 [figures: cumulative coverage/false positives, cumulative error (worst run)]

  25. Effect of traffic on loss rates • Sensitivity Analysis done at night, on empty networks • Threshold at 12.8 Mbps • Why do this?

  26. Conclusion • Algebraic Method for inferring loss rates of all paths from a basis set • Quite Accurate • Reasonable load imposed on each node • But is it really scalable? • Centralized solution, cubic dependence on n for handling node addition/removal
