
Scalable Network Sensing Infrastructure

Presentation Transcript


  1. Scalable Network Sensing Infrastructure Praveen Yalagandula Joint work with: Sujata Banerjee, Sujoy Basu, SJ Lee, Puneet Sharma, Alok Shriram, Han Hee Song HP Labs, Palo Alto, CA http://networking.hpl.hp.com

  2. Motivation • Emerging distributed applications require real-time, comprehensive network state • Fault diagnosis, meeting QoS guarantees, and improving performance • Current network state information: • Centralized and/or fragmented → not scalable, no E2E picture • Device-centric → not flow-centric • SNMP-based (typically >5-min frequency) → not real-time enough • Exposes only a few metrics • Example applications • Interactive streaming media systems • CHART: Next-generation Internet control plane • Massive multi-player online games

  3. Interactive Streaming Media System • [Figure: a client choosing between Streaming server A and Streaming server B, with high cross traffic on one server's path] • Automatic server selection depending on network conditions • Imperceptible to the end user

  4. CHART: Control for High-throughput Adaptive Resilient Transport • [Figure: nodes A and B with client-end daemons, connected across a link with a 10% loss rate] • GOAL: Improve end-to-end TCP/IP performance 10x under multiple communication link impairments • DARPA-funded joint project with UC Berkeley, Princeton, George Mason University, UC Santa Barbara, and Anagran • Adaptive routing under fine real-time control • Constant, pervasive sensing of network conditions • Rapid routing response to link failures and degradations

  5. What do these applications need? • Sensing feedback that is • Responsive: Seconds instead of minutes • Scalable: Low overhead even for ubiquitous sensing • Shareable: • Between different applications • Between different components of an application • Robust: Adapt to infrastructure failures • Flexible and Extensible: • Per-flow, per-application to meet multiple application requirements

  6. S3: Scalable Sensing Service • Provides system state in real-time • Both individual network and node state • Monitors actively and passively • E2E but leverages network element info where possible • Flexible and extensible • Easy to add new measurement tools • Configurable time scales to measure • Supports complex queries • To which node do I have large bandwidth? • Which game server is within 10ms latency? • Shares measurement info across applications • Eliminates redundant expensive measurements • Scalable, Secure, and Reliable

  7. Outline • Introduction • S3 : Scalable Sensing Service • Sensor Pods • Backplane • Scalable Inference Engines • Deployment: All pair network metrics on PlanetLab • S3 in CHART project • Summary and Future work

  8. S3: Architecture • Sensor pods • Measure system state from a node’s view • Web-Service enabled collection of sensors • Backplane • Distributed programmable fabric • Connects pods and aggregates measured system state • Inference Engines • Infer O(n²) E2E path info by measuring a few paths • Schedule measurements on pods • Aggregate data on backplane • Applications

  9. Sensor Pod Goals • Flexibility and Extensibility • Should be easy to add new sensors • Should be able to measure at application specified time scales • Easy accessibility • Use standard protocols for communication • Measured data should be shareable • Between different applications • Between different components of an application • Security

  10. Sensor Pod • Web-Service (WS) enabled collection of sensors • [Figure: pod architecture with a Secure Web Interface (API: query, control, and notification), a Controller, a Configuration & Data Repository, and sensors for Latency, Bandwidth, Lossrate, Capacity, Load, and Memory]

  11. Sensor Pod • Secure Web Interface: • Standard communication protocols • Flexible interface • [Figure: pod architecture, highlighting the Secure Web Interface]

  12. Sensor Pod • Network measurement/monitoring sensors (Latency, Bandwidth, Lossrate, Capacity) • [Figure: pod architecture, highlighting the network sensors]

  13. Sensor Pod • Node monitoring sensors (Load, Memory) • [Figure: pod architecture, highlighting the node sensors]

  14. Sensor Pod • Configuration & Data Repository: • Archive measurement data for sharing • Store sensor invocation configurations • [Figure: pod architecture, highlighting the repository]

  15. Sensor Pod • Controller: processes requests and invokes sensors according to installed configurations • [Figure: pod architecture, highlighting the controller]
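
As an illustration of how an application might use a pod's query API over the secure web interface, here is a minimal client sketch. The endpoint path, parameter names, and response fields are hypothetical; the slides only describe the API as "query, control, and notification".

```python
import json
import urllib.request

POD_URL = "https://pod.example.org:8443"  # hypothetical pod endpoint

def query_sensor(sensor, src, dst, period_s):
    """Ask a pod to run (or reuse) a measurement and return the latest sample."""
    params = {"sensor": sensor, "src": src, "dst": dst, "period": period_s}
    req = urllib.request.Request(
        f"{POD_URL}/query",
        data=json.dumps(params).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # e.g. {"metric": "latency_ms", "value": 42.1}

# Example: latency from nodeA to nodeB, refreshed at least every 30 s
# sample = query_sensor("latency", "nodeA", "nodeB", 30)
```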

  16. Sensors available in S3 • Focus on leveraging existing tools instead of building new ones • [Table listing the available sensors not captured in the transcript]

  17. Challenges in adopting existing tools • Tools previously tested only in point-to-point configurations • Deployment in a large-scale setting exposed several issues: • Hard-coded port numbers leading to port conflicts • Need to be started at source and destination simultaneously • High resource requirements leading to end-node crashes • Long running times leading to web server timeouts

  18. Measure CAP(AB) CAP(B) Start CAP_RCV Sensor Pod Flexibility • Tools that need to be started at both ends simultaneously • Capacity • Pathrate • Available BW • PathChirp • Spruce Node A • Start CAP_SEND • Start CAP_RCV at B • 3) Measure Node B

  19. Sharing Measurement Information • Avoid redundant measurements • Same sensor invocations with different time periods: perform measurements at the lowest (most frequent) period • Provide the same measurement info to all clients • Guarantee provided: if a client asks for periodicity T, the returned value is no staler than T • Example: clients with 3-second and 5-second periods • Measured separately: (20 + 12) = 32 measurements/minute • With our guarantee: 20 measurements/minute
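
A minimal sketch of the sharing rule: all clients of one sensor invocation are served from a single schedule run at the smallest requested period, so each returned value is no staler than any client's own period. The class and method names are illustrative, not part of S3.

```python
class SharedMeasurement:
    """Serve all clients of one sensor invocation from a single schedule."""

    def __init__(self):
        self.requested = {}  # client_id -> requested period in seconds

    def register(self, client_id, period_s):
        self.requested[client_id] = period_s

    def effective_period(self):
        # Measure at the smallest requested period, so the cached value is
        # never staler than any client's own period.
        # Slide example: 3 s and 5 s clients -> measure every 3 s,
        # i.e. 20 runs/min instead of 20 + 12 = 32 if run separately.
        return min(self.requested.values())
```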

  20. Sensor Pods • Web-Service (WS) enabled collection of sensors • Goal is to provide a flexible and extensible framework • Focus is not on building new measurement tools • Allows easy plugging-in of new sensors • WS enables composition → aggregate sensors • E.g., Spruce needs capacity; use Pathrate • Archival of sensing information

  21. Outline • Introduction • S3 : Scalable Sensing Service • Sensor Pods • Backplane • Scalable Inference Engines • Deployment: All pair network metrics on PlanetLab • S3 in CHART project • Summary and Future work

  22. Sensing Backplane • [Figure: Sensing Information Management Backplane]

  23. Sensing Backplane • Distributed programmable middleware • Aggregate data from end-points • Distribute configurations to end-points • Configurable and self-managing • Exploring SDIMS [Yalagandula et al. SIGCOMM’04] • DHT based information management middleware • Scalable: with both nodes and attributes • Flexible: supports different aggregation strategies • Reliable: self-configuration in the face of failures • Adaptive: efficiently handle dynamic load patterns
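
To make the backplane's aggregation role concrete, here is a minimal, generic sketch of tree-based aggregation of pod readings (SDIMS-style). The tree representation and function names are assumptions for illustration, not the SDIMS API.

```python
def aggregate(tree, root, values, combine):
    """Bottom-up aggregation of per-node readings over a tree.

    tree:    dict mapping a node to its list of children ([] for leaves)
    root:    the node to aggregate up to
    values:  dict mapping leaf nodes to locally measured values
    combine: aggregation function, e.g. max, min, or sum
    """
    def agg(node):
        children = tree.get(node, [])
        if not children:
            return values[node]
        return combine(agg(c) for c in children)
    return agg(root)

# Example: worst loss rate observed by any pod in the system
# worst_loss = aggregate(tree, "root", per_pod_loss, max)
```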

  24. Outline • Introduction • S3 : Scalable Sensing Service • Sensor Pods • Backplane • Scalable Inference Engines • Deployment: All pair network metrics on PlanetLab • S3 in CHART project • Summary and Future work

  25. Scalable Inference Engines • Difficult to collect complete sensing information • Large overhead for probing and data exchange: O(N²) measurements in a network of N nodes • Dynamically changing → need frequent probing • Measurement/monitoring failures: failed or slow end machines, measurement tool failures • Inference based on incomplete information • Exploit properties such as the triangle inequality (e.g., for latency) • A coarse estimate may suffice for many applications • Prediction based on archived information • E.g., Network Weather Service
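
As one concrete example of exploiting the triangle inequality for latency, the sketch below bounds an unmeasured pairwise latency from measured legs through intermediate nodes. This is a generic illustration, not the specific estimator used in S3.

```python
def latency_bound(lat_ac, lat_cb):
    """Triangle-inequality upper bound on latency(A, B), given the measured
    legs latency(A, C) and latency(C, B) through an intermediate node C."""
    return lat_ac + lat_cb

def coarse_latency_estimate(legs):
    """Tightest such bound over several candidate intermediate nodes.

    legs: iterable of (latency(A, C), latency(C, B)) pairs.
    """
    return min(a + b for a, b in legs)
```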

  26. Scalable Inference Engines • Latency and Proximity estimation • Co-ordinates: e.g., GNP, Vivaldi • Landmark based: e.g., Netvigator, NodeBay • Others: e.g., Meridian, IDMaps, King • Lossrate • Subset Path Selection: NetQuest • Bandwidth: Non-additive, highly dynamic → hard • Route sharing model: BRoute • Clustering [Alok Shriram, UNC] • Resource-Aware inference [Han Hee Song, UT]

  27. Scalable Inference Engines • Latency and Proximity estimation • Co-ordinates: e.g., GNP, Vivaldi • Landmark based: e.g., Netvigator, NodeBay • Others: e.g., Meridian, IDMaps, King • Lossrate • Subset Path Selection: NetQuest • Bandwidth: Non-additive, highly dynamic → hard • Route sharing model: BRoute • Clustering [Alok Shriram, UNC] • Resource-Aware inference [Han Hee Song, UT]

  28. NetVigator: Methodology • O(nL) vs. O(n²) measurements • Leverage route information: traceroute instead of ping • [Figure: hosts, landmarks, and routers (milestones), each host holding a distance vector d1, d2, …, dn to the landmarks/milestones] • [P. Sharma et al., "Estimating Network Proximity and Latency," ACM SIGCOMM Computer Communication Review, Volume 36, July 2006]

  29. NetVigator: “min_sum” algorithm • InferDist(n, c) = min over l ∈ L(n, c) of [ dist(n, l) + dist(c, l) ] • [Figure: node n, candidate node c, and the shared landmark set L]
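
Read as code, the min_sum rule takes the minimum, over the landmarks/milestones L(n, c) seen by both nodes, of the summed measured distances. The dictionary-based representation below is an assumption for illustration.

```python
def infer_dist(dist_n, dist_c):
    """min_sum estimate of the distance between nodes n and c.

    dist_n: dict landmark/milestone -> measured distance from n
    dist_c: dict landmark/milestone -> measured distance from c
    Only landmarks probed by both nodes (the set L(n, c)) are used.
    """
    common = set(dist_n) & set(dist_c)
    if not common:
        return float("inf")  # no shared landmark, no estimate
    return min(dist_n[l] + dist_c[l] for l in common)
```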

  30. NetVigator performance (evaluation with PlanetLab data) • High accuracy: over 90% • Low overhead: 15% measurement overhead • Robust to bad measurements • Performs 3 to 15 times better than GNP

  31. NetVigator: Summary • Proximity estimation is key to finding the “best” resources • Closest game server, closest media service, etc. • Netvigator: finds the “k closest nodes to a given node” • Features: • Landmark-based scheme: O(NL) vs. O(N²) measurement overhead • Highly scalable and accurate • Robust to bad measurements and to the choice of landmarks • Allows incremental computation • Uses widely deployed tools: ping, traceroute

  32. Scalable Inference Engines • Latency and Proximity estimation • Co-ordinates: e.g., GNP, Vivaldi • Landmark based: e.g., Netvigator, NodeBay • Others: e.g., Meridian, IDMaps, King • Lossrate • Subset Path Selection: NetQuest • Bandwidth: Non-additive, highly dynamic → hard • Route sharing model: BRoute • Clustering [Alok Shriram, UNC] • Resource-Aware inference [Han Hee Song, UT]

  33. Available bandwidth inference • Complete N² BW monitoring is infeasible • Serialized measurements take too long: not real-time • Concurrent measurements require large BW and other end-node resources • Our approach: Clustering • Cluster nodes that have similar properties to all other nodes • Perform BW measurements from only one node in each cluster (the cluster head) • Infer all-pair BW from those measurements • Two approaches: path-based or capacity-based • [Joint work with Alok Shriram and Jasleen Kaur, UNC]

  34. Path-Based Clustering • Cluster nodes using last-m-hop similarity • Compute a distance metric for each node pair: • d(p,q) = average of path differences to all other nodes • PD(p,q,s) = number of differing hops in the last m positions between paths (p,s) and (q,s) • Cluster nodes using K-means clustering • Choose a cluster head in every cluster: the node with the highest access link capacity and compute power
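
A sketch of the clustering metric above; the route representation (hop lists keyed by source/destination pair) and the default m are illustrative assumptions.

```python
def path_diff(path_p, path_q, m):
    """PD(p, q, s): number of differing hops in the last m positions of the
    routes from p and from q to the same destination s (unmatched positions
    count as different)."""
    tail_p, tail_q = path_p[-m:], path_q[-m:]
    same = sum(1 for hp, hq in zip(tail_p, tail_q) if hp == hq)
    return max(len(tail_p), len(tail_q)) - same

def cluster_distance(p, q, routes, nodes, m=3):
    """d(p, q): average path difference over all other destination nodes.

    routes[(x, s)] is the hop list of the route from x to s.
    """
    others = [s for s in nodes if s not in (p, q)]
    return sum(path_diff(routes[(p, s)], routes[(q, s)], m)
               for s in others) / len(others)
```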

  35. Measurements and Inference • Within a cluster: each node measures to every other node in the cluster • Across clusters: each cluster head probes all nodes outside its cluster • Overhead: (k-1)·N + Σ_{i=1..k} |C_i|², where C = {C_1, C_2, …, C_k}, k = #clusters, N = #nodes • Inferred bandwidth(A, B) = min( BW(A → head(C_A)), BW(head(C_A) → B) ), where C_A is the cluster containing A, and A and B are in different clusters
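
The inference rule, as code: for nodes in different clusters, the available bandwidth A→B is approximated by the bottleneck of the measured legs A→head(C_A) and head(C_A)→B. The table layout below is an assumption for illustration.

```python
def inferred_bw(a, b, cluster_of, head_of, measured):
    """Infer available bandwidth a -> b from cluster-head measurements.

    cluster_of: node -> cluster id
    head_of:    cluster id -> cluster-head node
    measured:   dict (src, dst) -> measured available bandwidth
    """
    ca = cluster_of[a]
    if ca == cluster_of[b]:
        return measured[(a, b)]  # intra-cluster pairs are measured directly
    head = head_of[ca]
    if a == head:
        return measured[(head, b)]
    # bottleneck of the two measured legs through a's cluster head
    return min(measured[(a, head)], measured[(head, b)])
```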

  36. Inference results • Average error = 27%, 25th percentile = 12%, 75th percentile = 46%

  37. Scalable Inference Engines • Latency and Proximity estimation • Co-ordinates: e.g., GNP, Vivaldi • Landmark based: e.g., Netvigator, NodeBay • Others: e.g., Meridian, IDMaps, King • Lossrate • Subset Path Selection: NetQuest • Bandwidth: Non-additive, highly dynamic → hard • Route sharing model: BRoute • Clustering [Alok Shriram, UNC] • Resource-Aware inference [Han Hee Song, UT]

  38. Resource Aware Inference • Current inference tools • Try to minimize global # of measurements • Not end-host resource aware • Need resource aware inference techniques • NetQuest algorithms for Loss-rate inference [Song et al SIGMETRICS’06]

  39. Challenges of Concurrent Measurements • Problems with concurrent measurements: resource requirements w.r.t. CPU, memory, network BW • [Figures: CPU usage of PathChirp; memory usage of PathChirp]

  40. Challenges of Concurrent Measurements • Problems with concurrent measurements: resource requirements w.r.t. CPU, memory, network BW • [Figure: network usage of different measurement tools]

  41. Challenges of Concurrent Measurements • Problems with concurrent measurements: resource requirements w.r.t. CPU, memory, network BW • Interference on node and network • [Figures: response time; failure rate]

  42. Resource Adaptive Inference • Characterize resource requirements of measurement tools • CPU, Memory, and BW usage • Monitor resource availability at end nodes • Given resource constraints e.g., use only 5% of available CPU • Determine the number of concurrent measurements • Subset Path selection • Select paths to measure that maximize inference accuracy • While satisfying the resource constraints • Measure and Infer • Measure selected e2e path properties • Use inference algorithms
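
A rough sketch of the step "given resource constraints, determine the number of concurrent measurements": per-tool costs are assumed to come from the characterization step, and the concurrency level is the largest count that fits every budget. All names and numbers here are illustrative, not measured tool costs.

```python
def max_concurrent(tool_cost, budget):
    """Largest number of concurrent runs of one tool that fits the budget.

    tool_cost / budget: dicts with matching keys, e.g.
        {"cpu_frac": 0.01, "mem_mb": 20, "bw_kbps": 300}
    """
    return min(int(budget[r] // tool_cost[r]) for r in tool_cost)

# Example: allow at most 5% CPU, 200 MB RAM, 2 Mbps of probing traffic
# n = max_concurrent({"cpu_frac": 0.01, "mem_mb": 20, "bw_kbps": 300},
#                    {"cpu_frac": 0.05, "mem_mb": 200, "bw_kbps": 2000})
```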

  43. Evaluation: inference accuracy comparison • Mean Absolute Error (MAE) of inferred path performance

  44. S3: Architecture • Sensor pods • Web-Service enabled collection of sensors • Measure system state from a node’s view • Backplane • Programmable fabric • Connects pods and aggregates measured system state • Inference Engines • Infer O(n²) E2E path info by measuring a few paths • Schedule measurements on pods • Aggregate data on backplane • Applications

  45. Outline • Introduction • S3 : Scalable Sensing Service • Sensor Pods • Backplane • Scalable Inference Engines • Deployment: All pair network metrics on PlanetLab • Summary and Future work

  46. Deployment: PlanetLab • 700+ nodes scattered across 350+ sites • Running since early January 2006 • All-pair network metrics: E2E latency, BW, capacity, loss • Simple backplane: a central server maintains pods, schedules measurements, collects and publishes data • Stats: ~14 GB raw data every day, ~1 GB compressed

  47. S3 Data Usage • Web server stats (2006): • ~200 unique visitors/month • ~20 GB download BW/month • Projects • Internal: bandwidth inference, resource-aware monitoring, semantic store • External: MSR, University of Washington, Purdue, Georgia Tech, Harvard, Princeton, Boston University, etc.

  48. Screenshot: Hop by hop loss sensor

  49. S3 Screenshot

  50. Network Visualizer
