Routing Dynamics in Simultaneous Overlay Networks

Mukund Seshadri, Randy Katz (mukunds@cs.berkeley.edu, randy@cs.berkeley.edu) • Berkeley-Helsinki Short Course, Aug. 2003

Problem • Consider overlay routing when multiple independent overlay networks/flows interact: • Can this be unstable/inefficient? • Identify such scenarios. • Suggest improvements. • Identify scope for reduction of measurement overhead.

General Motivation • End-host controlled routing can become significant • Pure Overlay Network protocols (RON[3], Detour[4], ESM[5]) • Overlay primitives (“Path reflection”[1], i3-based [2]) • Better routing than Internet/BGP (resilience/performance/multicast/etc.) • What if several entities set up their own overlays? • Companies setting up distribution overlay networks… • Or, more ad-hoc users setting up overlay networks… • Flows within a single overlay… • Consider overlay networks/flows which have some physical links in common, but don’t explicitly coordinate with each other.

Unstable Routing Example 1+ Mbps (L2) Primary Paths Alternate Paths 2 Mbps L1 Sources Bottleneck Phy. Link Destinations 1 Mbps (L3) Ov.Nw. Nodes (2 Ovns) • L1 failure can cause synchronized oscillation of both flows between the two alternate paths

Focus • Main application: multimedia streams • Long-lived (medium-lived) flows: ~1 hr (~5 min) • Flows require specified bandwidth levels • Flows require route stability (packet reordering and jitter are undesirable) • Secondary application: long, high-volume transfers/sessions • Problem considered: selection of best routes (not location/DHTs) • Size: 50-500 overlay flows; 10-50 nodes each • Independent decision makers, with no explicit information sharing • Unlike PlanetLab [6], the underlay model [7], or the i3-based solution [2] • Independent administration might be desirable. • Don’t have to wait for infrastructure nodes to come up. • Most protocols like ESM can’t scale to thousands of nodes.

Overlay Network Model • Given M overlay networks/flows with N nodes each • Probing of all potential paths is done (O(N) cost). • Path characteristics are inferred from probes in some time window • With some error factor • We consider only bandwidth • Best path is selected to send traffic on (GREEDY) • Route change based on bandwidth improvement threshold (H) • Path-level simulator • Characterizes shared bottleneck links. • The level of sharing is characterized by “path density” • Unicast CBR flows with bandwidth requirement. • Metrics of interest • Loss Rate (related to bandwidth) • Stabilization time
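The GREEDY rule with a hysteresis threshold can be sketched as follows. This is a minimal sketch, not the simulator's code: the function name `greedy_select` is hypothetical, and treating H as a relative improvement (a fraction of the current path's bandwidth) is an assumption, since the slides do not say whether H is relative or absolute.

```python
def greedy_select(current, measured_bw, h):
    """Pick a path greedily, re-routing only past the hysteresis threshold.

    current     -- path in use now, or None if no path has been chosen yet
    measured_bw -- dict mapping path name to inferred available bandwidth
    h           -- hysteresis threshold, ASSUMED here to be a relative
                   improvement (0.1 means "at least 10% better")
    """
    best = max(measured_bw, key=measured_bw.get)
    if current is None or current not in measured_bw:
        return best
    # Switch only if the best path beats the current one by more than H.
    if measured_bw[best] > measured_bw[current] * (1.0 + h):
        return best
    return current


print(greedy_select("p1", {"p1": 1.0, "p2": 1.05}, h=0.1))  # stays on p1
print(greedy_select("p1", {"p1": 1.0, "p2": 1.2}, h=0.1))   # moves to p2
```

With `h=0` this degenerates to pure greedy selection, where any measured improvement, including one caused by measurement error, triggers a route change.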

Contribution • Study the need for “restraint” in route selection • Randomness in selection • Hysteresis • Time between re-route decisions

Hysteresis Required • No hysteresis threshold (H) for route change => unstable. • We will use 99% stabilization time.

H affects loss rate… • Will explore more later in the talk…

When does Greedy “fail”? • Defaults: • 500 overlay flows, • 50 bottleneck links • link capacities ~ flow requirements • ~50% cross-traffic • 10% measurement error. • 4x variation in link b/w. • ~25 links/flow (density) • Optimal Threshold Assumed • Large flows => more effect when re-routed => lower stability

When does Greedy “fail”? • High sharing => many route changes • Flows within a single overlay. • When overlay nodes are skewed towards certain ASes, such as universities. • If several overlay flows independently use a medium-sized shared infrastructure.

Cross-Traffic • With high cross-traffic, overlay flows have less effect on available bandwidth, so Greedy is more stable. • Other factors investigated: routing window variation, measurement error, excess capacity, bandwidth distribution.

Summary of “Greedy” • The following factors contribute to poor stability and performance of “Greedy” overlay path selection • Several flows’ paths share a large number of bottleneck links. • There is not much spare capacity in paths used. • There is a large variation in link and flow bandwidths. • The overlay traffic is a high fraction of traffic on the bottleneck links • Each flow’s bandwidth is significant compared to bottleneck link bandwidth.

Improvements to Greedy • Randomly select path to be chosen • ARAND: In proportion to available bandwidths • SRAND: Best of randomly selected subset of size S • …in proportion to capacity • Reduces measurement overhead • Works well for server load balancing [8] • (but different work model: jobs arrive and leave, and are assigned to only one server for their lifetime) • GRAND: Randomly select from the best S paths
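The three randomized variants can be sketched as below. This is a sketch under assumptions rather than the authors' implementation: the function names and the `probe` callback are hypothetical, and SRAND here draws its subset uniformly at random, whereas the slide also mentions a variant weighted by capacity.

```python
import random


def arand(bw, rng=random):
    """ARAND: pick one path with probability proportional to available bandwidth.

    Raises ValueError if every weight is zero (behaviour of random.choices).
    """
    paths = list(bw)
    return rng.choices(paths, weights=[bw[p] for p in paths], k=1)[0]


def srand(paths, probe, s, rng=random):
    """SRAND: probe only a random subset of size s and take the best of it.

    `probe` is a hypothetical callback returning the measured bandwidth of a
    path; only s paths are probed, which is where the overhead saving comes from.
    """
    paths = list(paths)
    subset = rng.sample(paths, min(s, len(paths)))
    return max(subset, key=probe)


def grand(bw, s, rng=random):
    """GRAND: pick uniformly at random among the s best paths."""
    best = sorted(bw, key=bw.get, reverse=True)[:s]
    return rng.choice(best)


bw = {"a": 3.0, "b": 1.0, "c": 0.0}   # illustrative bandwidths, not from the slides
rng = random.Random(1)                # seeded so runs are repeatable
print(arand(bw, rng), srand(bw, bw.get, 2, rng), grand(bw, 2, rng))
```

Note that ARAND and GRAND need a measurement for every path, while SRAND needs only `s` probes per decision.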

Does Randomizing Help? • Randomization more useful at high densities. • More stable, lower loss, less sensitive to threshold setting.

Hysteresis Threshold • Optimal value of H very sensitive to parameters. • Flows can automatically discover the values of H. • Flows can independently “probe” values of H • No route change => decrease H • Route change => increase H • Try AIAD, MIMD, etc. • Can perform even better than with fixed H…
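The probing rule above (raise H after a route change, lower it otherwise) can be written as two small update functions. This is a minimal sketch: the step sizes, multipliers, and the clamping bounds `h_min` and `h_max` are assumed values chosen for illustration, not parameters reported in the talk.

```python
def clamp(h, h_min, h_max):
    """Keep the threshold inside an assumed valid range."""
    return max(h_min, min(h_max, h))


def aiad(h, rerouted, inc=0.05, dec=0.01, h_min=0.001, h_max=1.0):
    """Additive increase, additive decrease of the hysteresis threshold."""
    h = h + inc if rerouted else h - dec
    return clamp(h, h_min, h_max)


def mimd(h, rerouted, up=2.0, down=0.5, h_min=0.001, h_max=1.0):
    """Multiplicative increase, multiplicative decrease of the threshold."""
    h = h * up if rerouted else h * down
    return clamp(h, h_min, h_max)


# Usage: one flow adapting its own H over a made-up sequence of decisions.
h = 0.1
for rerouted in [True, True, False, False, False]:
    h = mimd(h, rerouted)
    print(round(h, 4))
```

Each flow would call one of these after every routing decision and pass the resulting `h` to its path-selection rule; no coordination between flows is needed.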

Exploring “H” • AIAD and MIMD perform very similarly; MIMD stabilizes slightly quicker… • The increase/decrease parameters are less sensitive to the simulated network parameters than H is.

Exploring “H” (Contd.) • Performs much better than with fixed threshold, loss rates close to 0 • Stabilization times similar to fixed case.

Summary • SRAND is as good as or better than GREEDY in most cases • Measurement costs lowered, with performance similar to the proportional randomization method. • Automatic discovery of H works better than fixed H (and is more feasible). • Increasing time windows can help, particularly when flows arrive/depart.

Future Work • Define a general method that combines randomization, hysteresis estimation, and time variation (like simulated annealing). • Explore dynamic scenarios (flows arrive/depart). • Explore a second-level control loop for the MIMD parameters. • Implement/simulate using real topologies. • Can we define a general notion of “friendliness” pertaining to both route selection and traffic distribution over different routes?

References • [1] Network Layer Support for Overlay Networks – John Jannotti – OpenArch 2002. • [2] Infrastructure Primitives for Overlay Networks – Karthik Lakshminarayanan et al. – under submission. • [3] Resilient Overlay Networks – Andersen et al. – SOSP 2001. • [4] Detour: a Case for Informed Routing and Transport – Savage et al. – IEEE Micro, Jan. 1999. • [5] A Case for End System Multicast – Yang-hua Chu et al. – JSAC 2002. • [6] PlanetLab – http://www.planet-lab.org • [7] A Routing Underlay for Overlay Networks – Nakao et al. – SIGCOMM 2003. • [8] How Useful is Old Information – M. Mitzenmacher – PODC 1997. • [9] An Analysis of Internet Content Delivery Systems – Saroiu et al. – OSDI 2002.

…Backup Slides…

Stabilization Times of the *RANDs • Generally SRAND and ARAND stabilize quickly and have a very low loss rate. • Also investigated the effect of subset size on SRAND

Other Factors • A small amount of cheating doesn’t hurt the good flows; a large amount does. • If link bandwidths are much higher than flow bandwidths, Greedy is more stable and performs better. • If link and flow bandwidths are similar, then high variation in those bandwidths makes Greedy fairly unstable.

Extra Slide: 2-Flow Illustration • We can randomize • Route selection • Proportional to available BW • Time intervals • Of assessment and rerouting.
