
Improving Internet Availability with Path Splicing

Murtaza Motiwala, Nick Feamster, Santosh Vempala

Presentation Transcript


  1. Improving Internet Availability with Path Splicing • Murtaza Motiwala, Nick Feamster, Santosh Vempala

  2. Availability • “It is not difficult to create a list of desired characteristics for a new Internet. Deciding how to design and deploy a network that achieves these goals is much harder. Over time, our list will evolve. It should be: 1. Robust and available. The network should be as robust, fault-tolerant and available as the wire-line telephone network is today. …”

  3. Availability of Other Services • Carrier Airlines (2002 FAA Fact Book) • 41 accidents, 6.7M departures • 99.9993% availability • 911 Phone service (1993 NRIC report +) • 29 minutes per year per line • 99.994% availability • Std. Phone service (various sources) • 53+ minutes per line per year • 99.99+% availability
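
The availability figures above follow from simple arithmetic. The sketch below reproduces them, assuming a 365-day year and treating each accident or minute of downtime as unavailability; the function names are illustrative and do not come from the slides.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def availability_from_downtime(minutes_down_per_year: float) -> float:
    """Fraction of the year a service is up, given its yearly downtime in minutes."""
    return 1.0 - minutes_down_per_year / MINUTES_PER_YEAR


def availability_from_failures(failures: int, attempts: int) -> float:
    """Fraction of attempts (e.g., departures) that did not fail."""
    return 1.0 - failures / attempts


def nines(availability: float) -> float:
    """Number of 'nines': 0.999 gives 3, 0.99999 gives 5."""
    return -math.log10(1.0 - availability)


def downtime_minutes_per_year(availability: float) -> float:
    """Yearly downtime budget implied by an availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    print(f"airlines : {availability_from_failures(41, 6_700_000):.6%}")
    print(f"911      : {availability_from_downtime(29):.4%}")
    print(f"phone    : {availability_from_downtime(53):.4%}")
    print(f"5 nines  : {downtime_minutes_per_year(0.99999):.2f} min/year")
```

Under these assumptions, a "5 nines" target leaves a budget of a little over five minutes of downtime per year.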

  4. Can the Internet Be “Always On”? • Various studies (Paxson, Andersen, etc.) show the Internet is at about 2.5 “nines” • More “critical” (or at least availability-centric) applications on the Internet • At the same time, the Internet is getting more difficult to debug • Increasing scale, complexity, disconnection, etc. • Is it possible to get to “5 nines” of availability? If so, how?

  5. High Availability: Two Aspects • Reliability: Connectivity in the routing tables should approach that of the underlying graph • If two nodes s and t remain connected in the underlying graph, there is some sequence of hops in the routing tables that will result in traffic reaching t • Recovery: In case of failure (i.e., link or node removal), nodes should quickly be able to discover a new path

  6. Where Today’s Protocols Stand • Today’s Internet routing protocols are single-path • When a link or node failure occurs, routers must recompute new paths to each destination • Meanwhile, packets are dropped, reordered, etc. • Reliability • Approach: Compute backup paths • Challenge: Many possible failure scenarios! • Recovery • Approach: Switch to a backup when a failure occurs • Challenge: Must quickly discover a new working path

  7. Multipath: Promise and Problems • Bad: If any link fails on both paths, s is disconnected from t • Want: End systems remain connected unless the underlying graph has a cut

  8. Path Splicing: Main Idea • Compute multiple forwarding trees per destination. Allow packets to switch slices midstream. • Step 1 (Perturbations): Run multiple instances of the routing protocol, each with slightly perturbed versions of the configuration • Step 2 (Slicing): Allow traffic to switch between instances at any node in the protocol

  9. Outline • Path Splicing • Achieving Reliable Connectivity • Mechanism #1: Random Perturbations • Mechanism #2: Network Slicing • Forwarding • Recovery • Properties • High Reliability • Bounded Stretch • Fast recovery • Open Questions

  10. Mechanism #1: Perturbations • Goal: Each instance provides different paths • Mechanism: Each edge is given a weight that is a slightly perturbed version of the original weight • Two schemes: Uniform and degree-based [Figure: “base” graph between s and t with link weights 3, 3, 3, and a perturbed graph with link weights 1.5, 4, 1.5, 5, 1.25, 3.5]

  11. How to Perturb the Link Weights? • Uniform: Perturbation is a function of the initial weight of the link • Degree-based: Perturbation is a linear function of the degrees of the incident nodes • Intuition: Deflect traffic away from nodes where traffic might tend to pass through by default
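
The two perturbation schemes might be sketched as follows. The slides do not give the exact formulas, so the specifics here are assumptions: the uniform scheme adds a random amount between 0 and `strength * weight`, and the degree-based scheme scales that range by `a + b * (deg(u) + deg(v))`. The names `strength`, `a`, `b`, and `build_slices` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
Weights = Dict[Edge, float]


def node_degrees(weights: Weights) -> Dict[str, int]:
    """Count how many links touch each node."""
    deg: Dict[str, int] = defaultdict(int)
    for u, v in weights:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def perturb_uniform(weights: Weights, strength: float, rng: random.Random) -> Weights:
    """Assumed uniform scheme: the perturbation range depends only on the link's own weight."""
    return {e: w + rng.uniform(0.0, strength * w) for e, w in weights.items()}


def perturb_degree_based(weights: Weights, a: float, b: float, rng: random.Random) -> Weights:
    """Assumed degree-based scheme: the range grows linearly with the endpoint degrees,
    so links next to high-degree nodes are pushed away from more strongly."""
    deg = node_degrees(weights)
    out: Weights = {}
    for (u, v), w in weights.items():
        scale = a + b * (deg[u] + deg[v])
        out[(u, v)] = w + rng.uniform(0.0, scale * w)
    return out


def build_slices(weights: Weights, k: int, rng: random.Random) -> List[Weights]:
    """Slice 0 keeps the base weights; each further slice gets its own perturbed copy."""
    return [dict(weights)] + [perturb_degree_based(weights, 0.0, 0.1, rng) for _ in range(k - 1)]


if __name__ == "__main__":
    base = {("s", "a"): 3.0, ("a", "t"): 3.0, ("s", "t"): 3.0}
    for i, slice_weights in enumerate(build_slices(base, 3, random.Random(0))):
        print(i, slice_weights)
```

Each slice's weights would then be fed to an ordinary shortest-path computation, which is what lets the routing protocol itself stay unmodified.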

  12. Mechanism #2: Network Slicing • Goal: Allow multiple instances to co-exist • Mechanism: Virtual forwarding tables [Figure: nodes s, a, b, c, t, with an example forwarding table for destination t: next hop a in Slice 1, next hop c in Slice 2]

  13. Forwarding Traffic • Packet has shim header with forwarding bits • Routers use lg(k) bits to index forwarding tables • Shift bits after inspection • To access different (or multiple) paths, end systems simply change the forwarding bits • Incremental deployment is trivial • Persistent loops cannot occur
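
A minimal sketch of the forwarding bits, assuming the header is a plain integer with the first hop's choice in the low-order bits. The slides specify only that routers use lg(k) bits and shift after inspection; the bit order, the modulo used when k is not a power of two, and the function names are assumptions.

```python
from typing import List, Tuple


def bits_per_hop(k: int) -> int:
    """ceil(lg k): bits needed to name one of k slices (0 bits when k == 1)."""
    return (k - 1).bit_length()


def encode_slices(slice_ids: List[int], k: int) -> int:
    """Pack one slice choice per hop; the first hop's choice sits in the low-order bits."""
    width = bits_per_hop(k)
    header = 0
    for hop, slice_id in enumerate(slice_ids):
        if not 0 <= slice_id < k:
            raise ValueError(f"slice id {slice_id} is out of range for k={k}")
        header |= slice_id << (hop * width)
    return header


def pop_slice(header: int, k: int) -> Tuple[int, int]:
    """What a router does: read the low-order bits, then shift them out.

    Returns (slice_id, remaining_header). The modulo maps stray bit patterns
    onto a valid slice when k is not a power of two (an assumption).
    """
    width = bits_per_hop(k)
    mask = (1 << width) - 1
    return (header & mask) % k, header >> width


if __name__ == "__main__":
    header = encode_slices([1, 0, 3, 2], k=4)
    while header:
        slice_id, header = pop_slice(header, k=4)
        print("use slice", slice_id)
```

Once the bits run out the header is zero, so every remaining hop falls back to slice 0 in this sketch.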

  14. Putting It Together • End system sets forwarding bits in packet header • Forwarding bits specify slice to be used at any hop • Router: examines/shifts forwarding bits, and forwards
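
The end-to-end behavior can be illustrated with a toy simulation. The topology and per-slice tables below are hypothetical (only the entries at s mirror the slide 12 example), and `forward`, `TABLES`, and `failed_links` are illustrative names rather than part of the authors' prototype.

```python
from typing import Dict, FrozenSet, List, Optional

# tables[node][slice_id][destination] -> next hop; k = 2 slices in this toy network.
TABLES: Dict[str, List[Dict[str, str]]] = {
    "s": [{"t": "a"}, {"t": "c"}],
    "a": [{"t": "t"}, {"t": "b"}],
    "b": [{"t": "t"}, {"t": "t"}],
    "c": [{"t": "t"}, {"t": "b"}],
}


def forward(
    tables: Dict[str, List[Dict[str, str]]],
    src: str,
    dst: str,
    header: int,
    k: int,
    failed_links: FrozenSet[FrozenSet[str]] = frozenset(),
    max_hops: int = 16,
) -> Optional[List[str]]:
    """Follow the forwarding bits hop by hop; return the path, or None if the packet is lost."""
    width = (k - 1).bit_length()  # lg(k) bits consumed per hop
    mask = (1 << width) - 1
    node, path = src, [src]
    while node != dst:
        if len(path) > max_hops:  # hop limit as a safety net in this sketch
            return None
        slice_id = (header & mask) % k  # examine the low-order bits
        header >>= width                # shift them out for the next router
        next_hop = tables[node][slice_id].get(dst)
        if next_hop is None or frozenset((node, next_hop)) in failed_links:
            return None  # no route in this slice, or the chosen link is down
        node = next_hop
        path.append(node)
    return path


if __name__ == "__main__":
    down = frozenset({frozenset(("a", "t"))})
    print(forward(TABLES, "s", "t", 0b00, 2))        # default slice at every hop
    print(forward(TABLES, "s", "t", 0b00, 2, down))  # default path hits the failed link
    print(forward(TABLES, "s", "t", 0b10, 2, down))  # different bits splice around it
```

Changing only the header bits yields a different path with no change to any router state, which is the property the end system relies on.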

  15. A Definition Motivated by Reliability • Reliability: the probability that, upon failing each edge with probability p, the graph remains connected • Reliability curve: the fraction of source-destination pairs that remain connected for various link failure probabilities p • The underlying graph has an underlying reliability (and reliability curve) • Goal: Reliability of routing system should approach that of the underlying graph.
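
One point on a reliability curve can be estimated by Monte Carlo simulation, roughly as sketched below. This assumes independent link failures and an undirected graph; `disconnected_fraction` is an illustrative name. Passing all edges of the topology gives the underlying graph's curve, while passing only the edges that appear in some slice's forwarding trees would approximate the routing system's curve.

```python
import random
from collections import deque
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def connected(edges: Sequence[Edge], s: str, t: str) -> bool:
    """Breadth-first search over the surviving undirected edges."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def disconnected_fraction(
    nodes: List[str], edges: Sequence[Edge], p: float, trials: int, rng: random.Random
) -> float:
    """Average fraction of source-destination pairs cut off when each edge fails with probability p."""
    pairs = [(s, t) for i, s in enumerate(nodes) for t in nodes[i + 1:]]
    cut = 0
    for _ in range(trials):
        surviving = [e for e in edges if rng.random() >= p]  # each edge fails independently
        cut += sum(not connected(surviving, s, t) for s, t in pairs)
    return cut / (trials * len(pairs))


if __name__ == "__main__":
    nodes = ["s", "a", "t"]
    edges = [("s", "a"), ("a", "t"), ("s", "t")]
    rng = random.Random(0)
    for p in (0.0, 0.05, 0.1, 0.2):
        print(p, disconnected_fraction(nodes, edges, p, 1000, rng))
```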

  16. Reliability Curve: Illustration [Figure: fraction of source-destination pairs disconnected, plotted against the probability of link failure (p); an arrow marks the direction of better reliability] • More edges available to end systems -> Better reliability

  17. Reliability Approaches Optimal • Sprint (Rocketfuel) topology • 1,000 trials • p indicates probability edge was removed from base graph • Reliability approaches optimal • Average stretch is only 1.3 [Figure: Sprint topology, degree-based perturbations]

  18. Recovery is Fast • Which paths can be recovered within 5 trials? • Sequential trials: 5 round-trip times • …but trials could also be made in parallel • Recovery approaches the maximum possible • Adding a few more slices improves recovery beyond the best possible reliability with fewer slices.
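
A sketch of the trial-based recovery described above, assuming the end system simply draws fresh random forwarding bits for each trial and learns whether the packet got through. The `try_header` callback stands in for sending a probe and waiting one round-trip time; it and the other names are hypothetical.

```python
import random
from typing import Callable, Optional, Tuple


def recover(
    try_header: Callable[[int], bool],
    header_bits: int,
    max_trials: int = 5,
    rng: Optional[random.Random] = None,
) -> Optional[Tuple[int, int]]:
    """Try random forwarding bits until one works.

    Returns (working_header, trials_used), or None if every trial failed.
    """
    rng = rng or random.Random()
    for trial in range(1, max_trials + 1):
        header = rng.getrandbits(header_bits)  # pick new bits at random
        if try_header(header):
            return header, trial
    return None


def success_within(q: float, trials: int) -> float:
    """Chance of recovering within `trials` tries if a fraction q of headers
    yields a working path and the tries are independent (an assumption)."""
    return 1.0 - (1.0 - q) ** trials


if __name__ == "__main__":
    # Toy stand-in: pretend only headers with the low bit set avoid the failure.
    print(recover(lambda h: h & 1 == 1, header_bits=8, rng=random.Random(1)))
    print(success_within(0.5, 5))
```

Under the independence assumption, even when only half of the possible headers avoid the failure, five sequential trials succeed with probability 1 – 0.5^5, about 97%.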

  19. Stretch is Bounded • Stretch: How much longer is the path taken by packets than the “optimal” path? • Stretch is bounded in one slice by the amount of perturbation • …but what about the stretch of spliced paths? • As long as “significant progress” (a large fraction of the distance to d) is achieved at each hop, stretch is bounded • Implication: Loops are rare.
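
Stretch can be computed for a given path as sketched below, assuming it is measured as the ratio of the path's cost to the shortest-path cost under the original (unperturbed) weights. The graph and function names are illustrative.

```python
import heapq
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: link weight}


def shortest_cost(graph: Graph, src: str, dst: str) -> float:
    """Dijkstra's algorithm; returns the cost of the cheapest src-dst path."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def path_cost(graph: Graph, path: List[str]) -> float:
    """Sum of link weights along an explicit path."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


def stretch(graph: Graph, path: List[str]) -> float:
    """Ratio of the taken path's cost to the optimal cost (1.0 means optimal)."""
    return path_cost(graph, path) / shortest_cost(graph, path[0], path[-1])


if __name__ == "__main__":
    graph = {
        "s": {"a": 1.0, "t": 3.0},
        "a": {"s": 1.0, "t": 1.0},
        "t": {"s": 3.0, "a": 1.0},
    }
    print(stretch(graph, ["s", "a", "t"]))
    print(stretch(graph, ["s", "t"]))
```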

  20. High Availability with Splicing • Reliability: Connectivity in the routing tables should approach that of the underlying graph • Approach: Overlay trees generated using random link-weight perturbations. Allow traffic to switch between them. • Result: Splicing ~10 trees achieves near-optimal reliability • Recovery: In case of failure (i.e., link or node removal), nodes should quickly be able to discover a new path • Approach: End nodes randomly select new bits. • Result: Recovery within five trials approaches best possible.

  21. Open Questions and Future Work • How does splicing interact with traffic engineering? • (How) can the bits be best encoded? • What changes are required to today’s routers to make splicing possible? • Can splicing eliminate dynamic routing?

  22. Conclusion • Simple: Forwarding bits provide access to different paths through the network • Scalable: Exponential increase in available paths, linear increase in state • Stable: Fast recovery does not require fast routing protocols • No modifications to existing routing protocols http://www.cc.gatech.edu/~feamster/tmp/splicing-hotnets.pdf

  23. History: Network Embedding • Given: virtual (V) and physical (P) network • Topology, constraints, etc. • Problem: find the appropriate mapping onto available physical resources (nodes and edges) • Idea: Define a virtual graph G’ onto which G can be embedded • A link in G can be mapped to multiple links in G’ • How to forward traffic over multiple links in G’? • …

  24. Possible Applications/Future Work • Fast recovery from poorly performing paths • Data transfer with easy multi-path • Overlay networks, CDNs, etc. • Transfer of video with multiple description • Security applications • Spatial diversity in wireless networks

  25. Significant Novelty for Modest Stretch • Novelty: difference in nodes in a perturbed shortest path from the original shortest path [Figure: fraction of edges on short path shared with long path] • Example (paths from s to d): Novelty = 1 – (1/3) = 2/3
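
The example value can be reproduced with a small sketch that takes novelty to be one minus the fraction of the short path's edges that also appear on the long path, as the figure label suggests. The intermediate node names are hypothetical, since the slide shows only s and d.

```python
from typing import FrozenSet, List, Set


def path_edges(path: List[str]) -> Set[FrozenSet[str]]:
    """Undirected edges along a path given as a list of nodes."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def novelty(short_path: List[str], long_path: List[str]) -> float:
    """1 minus the fraction of short-path edges shared with the long path."""
    short = path_edges(short_path)
    shared = short & path_edges(long_path)
    return 1.0 - len(shared) / len(short)


if __name__ == "__main__":
    short_path = ["s", "a", "b", "d"]      # three edges
    long_path = ["s", "a", "c", "e", "d"]  # shares only the edge s-a
    print(novelty(short_path, long_path))  # 1 - 1/3
```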

  26. Related Work • Pre-Computed Backup Paths • Multi-Topology Routing • Multiple Router Configuration • MPLS Fast Reroute • End-Node Controlled Traffic • Source routing • Routing deflections • Multipath routing (ECMP, MIRO, etc.) • IGP link-weight optimization • Measurement of path diversity and multihoming • Layer-3 VPNs

  27. Other Properties • Scalable • Exponential increase in paths, linear increase in state • Fast recovery from underlying failures • Automatic tuning (e.g., for traffic engineering) • Perturbations achieve property of automatically spreading traffic across different links • Standard link-weight optimization is potentially brittle in the face of link failures • Incrementally deployable

  28. Prototype Implementation • Click and Quagga on PL-VINI • http://www.vini-veritas.net/ [Figure: control plane daemons, forwarding tables, and a classifier]

  29. Variation: BGP Splicing • Observation: Many routers already learn multiple alternate routes to each destination. • Idea: Use the forwarding bits to index into these alternate routes at an AS’s ingress and egress routers. • Required new functionality • Storing multiple entries per prefix • Indexing into them based on packet headers • Selecting the “best” k routes for each destination [Figure: default and alternate paths to destination d; splice paths at ingress and egress routers]

  30. Loops, Reconsidered • Problem: Potential for loops between ASes • AS-level loops can be longer than intra-AS loops • Two possible approaches • Detection: routers mark packets and determine that packets have traversed the same AS twice • Prevention: Exploit “common” routing policies to ensure that packets are only deflected along valley-free paths

  31. Preventing Inter-AS Loops with Policy • Observation: inter-AS loops inherently involve traversal that violates valley-free • Constraints: 1. once a “down” deflection has occurred, do not deflect 2. only allow one “across” deflection • Possible relaxation: allow a limited number of violations, specified by source
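
The two constraints can be expressed as a small per-packet state check. This is a sketch that encodes only the two rules listed on the slide, not a full valley-free path validator; the direction labels ("up" toward a provider, "across" to a peer, "down" toward a customer) and the class name are assumptions.

```python
from dataclasses import dataclass

DIRECTIONS = ("up", "across", "down")


@dataclass
class DeflectionState:
    """Deflection history carried with (or inferred for) a single packet."""
    went_down: bool = False
    across_used: bool = False

    def may_deflect(self, direction: str) -> bool:
        if direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {direction}")
        if self.went_down:
            return False  # constraint 1: no deflection after a "down" deflection
        if direction == "across" and self.across_used:
            return False  # constraint 2: at most one "across" deflection
        return True

    def record(self, direction: str) -> None:
        """Apply a deflection, refusing it if a constraint would be violated."""
        if not self.may_deflect(direction):
            raise ValueError(f"deflection {direction!r} is not allowed")
        if direction == "down":
            self.went_down = True
        elif direction == "across":
            self.across_used = True


if __name__ == "__main__":
    state = DeflectionState()
    for step in ("up", "across", "across", "down", "up"):
        allowed = state.may_deflect(step)
        print(step, "allowed" if allowed else "refused")
        if allowed:
            state.record(step)
```

The relaxation mentioned on the slide could be modeled by replacing the two booleans with counters and a source-specified limit.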

  32. Definitions of Path Diversity • Connectivity: Minimum number of edges whose failure disconnects the graph (min cut) • Expansion: Intuitively, small cuts disconnect small groups of nodes from the graph

  33. Design Goals • Reachability: allow endpoints to communicate • High Diversity: expose paths to end hosts that survive failures • Capacity: the total available data rate between each source-destination pair should be high • Fault tolerance: the number of disjoint paths should be high, and the network should remain connected under failures • Low Stretch: paths should not be too circuitous • Scalability: scale to a large number of networks, destinations, routers, etc. • Today’s routing protocols do not exploit the diversity of the underlying network graph
