
Infrastructure-based Resilient Routing




1. Infrastructure-based Resilient Routing
Ben Y. Zhao, Ling Huang, Jeremy Stribling, Anthony Joseph and John Kubiatowicz
University of California, Berkeley
Sahara Winter Retreat, 2004

2. Challenges Facing Network Applications
• Network connectivity is not reliable
  • Disconnections are frequent in the wide-area Internet
  • IP-level repair is slow
    • Wide-area: BGP ~ 3 minutes
    • Local-area: IS-IS ~ 5 seconds
• Next-generation network applications
  • Mostly wide-area
  • Streaming media, VoIP, B2B transactions
  • Low tolerance of delay, jitter and faults
• Our work: a transparent, resilient routing infrastructure that adapts to faults not in seconds, but in milliseconds
ravenben@eecs.berkeley.edu

3. Talk Overview
• Motivation
• A Structured Overlay Infrastructure
• Mechanisms and policy
• Evaluation
• Summary

4. The Challenge
• Routing failures are diverse
  • Many causes: router misconfigurations, cut fiber, planned downtime, protocol implementation bugs
  • Occur anywhere, with local or global impact: a single fiber cut can disconnect AS pairs
• Isolating failures is difficult
  • Wide-area measurement is ongoing research
  • A single event leads to complex inter-protocol interactions
  • End-user symptoms are often dynamic or intermittent
• Requires:
  • Fault detection from multiple distributed vantage points
  • In-network decision making for timely responses

5. An Infrastructure Approach
• Our goals
  • An overlay focused on resiliency
  • Route around failures to maintain connectivity
  • Respond in milliseconds (react near-instantaneously to faults)
• Our approach
  • Large-scale infrastructure for fault and route discovery
  • Nodes are observation points (similar to Plato's NEWS service)
  • Nodes are also points of traffic redirection (forwarding-path determination and data forwarding)
  • Automated fault detection and circumvention
  • No edge-node involvement: fast response time, security focused on the infrastructure
  • Fully transparent; no application awareness necessary

6. An Illustration (Qwest Backbone)
• Goal: fast fault detection and route-around
• Key: on-the-fly, in-network traffic redirection

7. Why Structured Overlays
• Resilient Overlay Networks (MIT)
  • Fully connected mesh
  • Allows each node full knowledge of the network
  • Fast, independent calculation of routes
  • Nodes can construct any path: maximum flexibility
• Cost of flexibility
  • Protocol needs to choose the "right" route/nodes
  • Per-node O(n) state
  • Each node monitors n − 1 paths
  • O(n²) total path monitoring is expensive

8. The Big Picture
• Locate a nearby overlay proxy
• Establish an overlay path to the destination host
• The overlay routes traffic resiliently across the Internet

9. Traffic Tunneling
• A and B are IP addresses; legacy nodes A and B each register with a nearby proxy P'(A) and P'(B)
• Store the mapping from an end host's IP to its proxy's overlay ID in the structured peer-to-peer overlay: put(hash(B), P'(B)); senders resolve it with get(hash(B)) → P'(B)
• Similar to the approach in Internet Indirection Infrastructure (I3)
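The put/get exchange on this slide can be sketched with a toy key/value overlay. All names here (`Overlay`, `register`, `lookup_proxy`) are illustrative stand-ins, not Tapestry's actual API:

```python
# Sketch of proxy registration over a structured overlay's key/value layer.
# The Overlay class is a hypothetical single-process stand-in for the DHT.
import hashlib

class Overlay:
    """Toy stand-in for a structured P2P overlay's put/get interface."""
    def __init__(self):
        self._table = {}

    def put(self, key, value):
        self._table[key] = value

    def get(self, key):
        return self._table.get(key)

def node_id(ip_addr):
    # Hash a legacy node's IP address into the overlay identifier space.
    return hashlib.sha1(ip_addr.encode()).hexdigest()

def register(overlay, legacy_ip, proxy_id):
    # The proxy P'(B) serving legacy node B publishes hash(B) -> P'(B)
    # so that senders can locate B's entry point into the overlay.
    overlay.put(node_id(legacy_ip), proxy_id)

def lookup_proxy(overlay, legacy_ip):
    # A sender's proxy resolves hash(B) to find B's proxy P'(B).
    return overlay.get(node_id(legacy_ip))

overlay = Overlay()
register(overlay, "10.0.0.2", "proxy-B")   # P'(B) registers on behalf of B
print(lookup_proxy(overlay, "10.0.0.2"))   # -> proxy-B
```

Traffic from A to B is then tunneled A → P'(A) → P'(B) → B over the overlay's resilient routes.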

10. Tradeoffs of Tunneling via P2P
• Fewer neighbor paths to monitor per node: O(log n)
  • Large reduction in probing bandwidth: O(n) → O(log n)
  • Faster fault detection with low bandwidth consumption
• Actively maintain path redundancy
  • Manageable for a "small" number of paths
  • Redirect traffic immediately when a failure is detected; eliminates on-the-fly calculation of new routes
  • Restore redundancy when a path fails
• Fast fault detection + precomputed paths = increased responsiveness
• Cons: the overlay imposes routing stretch (mostly < 2)
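The probing-state reduction above can be illustrated numerically. The base-16 digit count below is an assumption about the overlay's routing structure, not a figure from the talk:

```python
# Compare per-node path-monitoring load: full mesh vs. structured overlay.
import math

def mesh_probes(n):
    # Full mesh (RON-style): each node monitors paths to all n - 1 others.
    return n - 1

def structured_probes(n, base=16):
    # Structured overlay: roughly log_base(n) routing levels to probe.
    # The exact count depends on the overlay's base and node density;
    # base=16 is an illustrative choice.
    return math.ceil(math.log(n, base))

for n in (100, 1000, 10000):
    print(f"n={n}: mesh={mesh_probes(n)} paths, structured={structured_probes(n)} paths")
```

At n = 1000 the mesh monitors 999 paths per node while the structured overlay probes only a handful, which is what makes aggressive (millisecond-scale) probing affordable.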

11. In-network Resiliency Mechanisms
• Efficient fault detection
  • Use soft state to periodically probe log(n) neighbor paths
  • "Small" number of routes → reduced bandwidth
  • Exponentially weighted moving average for link-quality estimation
    • Avoids route flapping due to short-term loss artifacts
    • Loss rate: Ln = (1 − α) · Ln-1 + α · p
  • Simple approach taken; ongoing research available
    • Smart fault detection / propagation (Zhuang04)
    • Intelligent and cooperative path selection (Seshadri04)
• Maintaining backup paths
  • Each hop has a flexible routing constraint
  • Create and store backup routes at node insertion
  • Restore redundancy via "intelligent" gossip after failures
  • Simple policies to choose among redundant paths
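The EWMA loss estimator on this slide can be sketched directly; the smoothing weight α = 0.2 is an illustrative choice, not a value from the talk:

```python
# EWMA link-quality estimation from slide 11: Ln = (1 - a) * Ln-1 + a * p,
# where p is 1 if the latest probe was lost and 0 otherwise.
def update_loss(prev_loss, probe_lost, alpha=0.2):
    # alpha = 0.2 is an assumed smoothing weight; a small alpha damps
    # short-term loss artifacts and avoids route flapping.
    return (1 - alpha) * prev_loss + alpha * probe_lost

loss = 0.0
for p in (0, 0, 1, 0, 1, 1):   # a run of probe outcomes (1 = probe lost)
    loss = update_loss(loss, p)
print(round(loss, 4))          # smoothed loss estimate after six probes
```

A single lost probe moves the estimate by only α, so transient loss does not immediately disqualify a path; sustained loss drives the estimate toward 1.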

12. First Reachable Link Selection (FRLS)
• Use estimated loss results to choose the shortest "usable" path
  • Sort next-hop paths by latency
  • Use the shortest path whose quality exceeds a minimum threshold T
• Correlated failures
  • Reduce with intelligent topology construction
  • Key is to leverage the available redundancy
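FRLS as described can be sketched as follows. The 70% default threshold mirrors the evaluation setup later in the talk, while the path records and field names are invented for illustration:

```python
# First Reachable Link Selection: sort candidate next-hop paths by latency
# and take the fastest one whose estimated quality clears the threshold T.
def frls(paths, quality_threshold=0.7):
    # quality is 1 - estimated loss rate (from the EWMA estimator).
    for path in sorted(paths, key=lambda p: p["latency_ms"]):
        if path["quality"] >= quality_threshold:
            return path["name"]
    return None   # no usable path: fall back to repair/redundancy restoration

paths = [
    {"name": "primary", "latency_ms": 20, "quality": 0.55},  # fast but lossy
    {"name": "backup1", "latency_ms": 35, "quality": 0.95},
    {"name": "backup2", "latency_ms": 80, "quality": 0.99},
]
print(frls(paths))   # backup1: the fastest path above the 70% threshold
```

Because the backup paths are precomputed and their quality is already known from probing, this selection is a constant-time table scan rather than an on-the-fly route computation.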

13. Evaluation
• Metrics for evaluation
  • How much routing resiliency can we exploit?
  • How fast can we adapt to faults (responsiveness)?
• Experimental platforms
  • Event-based simulations on transit-stub topologies
    • Data collected over multiple 5000-node topologies
  • PlanetLab measurements
    • Microbenchmarks on responsiveness
• More details in the paper (ICNP03) and the poster session

14. Exploiting Route Redundancy (Sim)
• Simulation of Tapestry, 2 backup paths per routing entry
• Transit-stub topology shown; results from TIER and AS graphs are similar

15. Responsiveness to Faults (PlanetLab)
• Response time increases linearly with probe period
• Minimum link-quality threshold T = 70%, 20 runs per data point

16. Link Probing Bandwidth (PlanetLab)
• Medium-sized routing overlays incur low probing bandwidth
• Bandwidth increases logarithmically with overlay size

17. Conclusion
• Pros and cons of the infrastructure approach
  • Structured routing has low path-maintenance costs
  • Allows "caching" of backup paths for quick failover
  • Transparent to user applications
  • Can no longer construct arbitrary paths
• Structured routing with low redundancy comes close to ideal connectivity
  • Incurs low routing stretch
• Fast enough for highly interactive applications
  • 300 ms beacon period → response time < 700 ms
  • On overlay networks of 300 nodes, bandwidth cost is 7 KB/s
• Ongoing questions
  • Is there a lower bound on desired responsiveness?
  • Should we use multipath redundant routing for resilience?
  • How to deploy as a single network across ISPs? A VPN-like routing service?

18. Related Work
• Redirection overlays
  • Detour (IEEE Micro 99)
  • Resilient Overlay Networks (SOSP 01)
  • Internet Indirection Infrastructure (SIGCOMM 02)
  • Secure Overlay Services (SIGCOMM 02)
• Topology estimation techniques
  • Adaptive probing (IPTPS 03)
  • Internet tomography (IMC 03)
  • Routing underlay (SIGCOMM 03)
• Many, many other structured peer-to-peer overlays
Thanks to Dennis Geels / Sean Rhea for their work on BMark
