
Proactive Techniques for Correct and Predictable Internet Routing

Nick Feamster. Internet routing is large-scale (thousands of autonomous networks) and driven by self-interest (independent economic and performance objectives), yet the networks must cooperate for global connectivity.



  1. Proactive Techniques for Correct and Predictable Internet Routing. Nick Feamster

  2. Internet Routing • Large-scale: Thousands of autonomous networks • Self-interest: Independent economic and performance objectives • But, must cooperate for global connectivity. Figure: the Internet, with example networks Abilene, Comcast, MIT, AT&T, and Cogent.

  3. Internet Routing Protocol: BGP. Example routes for one destination: (Destination 18.0.0.0/8, Next-hop 192.5.89.89, AS Path 10578..3) and (Destination 18.0.0.0/8, Next-hop 66.250.252.44, AS Path 174… 3). Figure labels: Autonomous Systems (ASes), Session, Route Advertisement, Traffic.

  4. Configuration Defines BGP Behavior. Flexibility for realizing goals in a complex business landscape: • Which neighboring networks can send traffic • Where traffic enters and leaves the network • How routers within the network learn routes to external destinations. Figure labels: Traffic, Route, No Route, Flexibility, Complexity.

  5. Problem: Configuring routers is like writing a distributed program. Operators make mistakes, often with catastrophic results.

  6. Catastrophic Configuration Faults: “…a glitch at a small ISP… triggered a major outage in Internet access across the country. The problem started when MAI Network Services...passed bad router information from one of its customers onto Sprint.” -- news.com, April 25, 1997. “Microsoft's websites were offline for up to 23 hours...because of a [router] misconfiguration…it took nearly a day to determine what was wrong and undo the changes.” -- wired.com, January 25, 2001. “WorldCom Inc…suffered a widespread outage on its Internet backbone that affected roughly 20 percent of its U.S. customer base. The network problems…affected millions of computer users worldwide. A spokeswoman attributed the outage to ‘a route table issue.’” -- cnn.com, October 3, 2002. “A number of Covad customers went out from 5pm today due to, supposedly, a DDOS (distributed denial of service attack) on a key Level3 data center, which later was described as a route leak (misconfiguration).” -- dslreports.com, February 23, 2004

  7. Operator Mailing List. Note: Only includes problems openly discussed on this list. Feamster et al., “An Empirical Study of ‘Bogon’ Route Advertisements”, SIGCOMM CCR 2005

  8. Why Correctness is Hard • Operators make mistakes • Configuration is difficult • Complex policies, distributed configuration • Interactions cause unintended consequences • Each network independently configured • Unintended policy interactions

  9. Goal: Correctness and predictability of the global routing system, examining only local configurations

  10. Today: Reactive Operation. What happens if I tweak this policy…? • Problems cause downtime • Problems often not immediately apparent. Workflow: Configure, then Observe, then check "Desired Effect?" (No: Revert; Yes: Wait for Next Problem).

  11. Thesis: Proactive Operation • Idea: Analyze configuration before deployment. Workflow: Configure, then apply proactive techniques (rcc: Detect Faults, Predict Traffic Flow), then Deploy. Many faults can be detected with static analysis.

  12. Contributions • Correctness specification and constraints • rcc (“router configuration checker”) • Static configuration analysis tool for fault detection • Used by operators of large backbone networks • Analysis of real-world network configurations from 17 autonomous systems (ASes) • Route prediction using static configuration analysis • Sufficient and necessary conditions for safe routing • http://nms.csail.mit.edu/rcc/ • About 100 downloads (70 network operators)

  13. Take-home lessons • Configuration can be factored into a few operations • Static configuration analysis uncovers errors • Major causes of error: • Distributed configuration • Intra-AS dissemination is too complex • Mechanistic expression of policy • Guaranteeing safety while preserving autonomy requires tight restrictions on expressiveness

  14. Outline • Correctness specification and constraints • Path visibility • Route validity • Safety • Proactive fault detection with rcc • Path visibility and route validity • Implementation and findings • Safety v. policy expressiveness • Local guarantees for safety • Implications

  15. Paths, Routes, and Policy. Path: $(v_1, v_2, \ldots, v_n)$ to destination $d$. Route: $(d \to v_i)$, for example $(d \to v_2)$ and $(d \to v_3)$. Policy: $P(v_{i-1}, v_i, v_{i+1}, d) \to \{0, 1\}$. Routes induce paths. Consistency: All induced paths along a path to the destination are subpaths of the original path. Policy-conformance: All nodes along an induced path have $P = 1$.

  16. Factoring Routing Configuration. Hundreds of thousands of lines of configuration in hundreds of routers. • Ranking: route selection • Dissemination: internal route advertisement • Filtering: route advertisement. Figure labels: Customer, Primary, Competitor, Backup.

  17. Path Visibility: If there exists a path, then there exists a route. If there is at least one policy-conformant path to the destination, then routers should select routes that induce one of them. Path visibility faults: • Dissemination: partition in the graph that disseminates routes (next 2 slides) • Filtering: filtering routes for usable paths

  18. Path Visibility: Internal BGP (iBGP). Default: “Full mesh” iBGP, which doesn’t scale. Large ASes use “route reflection”. Route reflector: re-advertises non-client routes over client sessions and client routes over all sessions. Client: doesn’t re-advertise iBGP routes.

  19. iBGP Signaling: Static Check. Theorem. Suppose the iBGP reflector-client relationship graph contains no cycles. Then, path visibility is satisfied if, and only if, the set of routers that are not route reflector clients forms a clique. The condition is easy to check with static analysis.
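The clique condition in the theorem can be sketched as a small static check. This is only an illustration under assumed inputs, not rcc's actual code: the names `has_cycle`, `satisfies_path_visibility`, `clients_of` (reflector to list of clients), and `sessions` (a set of unordered router pairs) are all hypothetical.

```python
from itertools import combinations


def has_cycle(clients_of):
    """Return True if the reflector -> client graph contains a cycle (DFS)."""
    state = {}  # node -> "active" while on the DFS stack, "done" afterwards

    def visit(node):
        state[node] = "active"
        for client in clients_of.get(node, ()):
            if state.get(client) == "active":
                return True
            if client not in state and visit(client):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in clients_of)


def satisfies_path_visibility(routers, sessions, clients_of):
    """Check the slide's condition: routers that are not clients form a clique.

    routers:    iterable of router names (hypothetical input format)
    sessions:   set of frozenset({a, b}) iBGP sessions
    clients_of: dict mapping a route reflector to its clients
    """
    if has_cycle(clients_of):
        raise ValueError("the theorem assumes an acyclic reflector-client graph")
    all_clients = {c for clients in clients_of.values() for c in clients}
    non_clients = sorted(set(routers) - all_clients)
    return all(frozenset(pair) in sessions
               for pair in combinations(non_clients, 2))
```

For example, three non-client routers need all three pairwise sessions; dropping any one of them makes the check fail.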

  20. Route Validity: If there exists a route, then there exists a path. Routers should select routes that induce only consistent, policy-conformant paths. Route validity faults: • Filtering (next slide): advertising routes that violate higher-level policy; originating routes for private (or unowned) address space • Dissemination: loops and “deflections” along the internal routing path. Must form beliefs about high-level policy.
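One of the filtering faults named above, originating routes for private address space, lends itself to a simple check. The sketch below is an assumption-laden illustration rather than rcc's implementation: it covers only the three RFC 1918 IPv4 blocks, says nothing about "unowned" space, and the function name `private_originations` is hypothetical.

```python
import ipaddress

# RFC 1918 private IPv4 blocks; "unowned" space is not covered by this sketch.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def private_originations(originated_prefixes):
    """Return the originated IPv4 prefixes that fall inside RFC 1918 space."""
    flagged = []
    for prefix in originated_prefixes:
        network = ipaddress.ip_network(prefix)
        # subnet_of requires Python 3.7+ and networks of the same IP version.
        if any(network.subnet_of(block) for block in RFC1918_BLOCKS):
            flagged.append(prefix)
    return flagged
```

A list such as `["18.0.0.0/8", "10.1.0.0/16"]` would flag only the second prefix.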

  21. Route Validity: Consistent Export • Settlement-free peering rules: • Advertise routes at all peering points • Advertised routes must have equal “AS path length”. That is, “equally good” routes on all BGP sessions. Some ASes routinely violate this constraint. [IMC 2004] Figure: Sprint and AT&T.
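The two peering rules above can be checked mechanically once the routes a peer advertises on each session are known. The following is a minimal sketch under an assumed data layout (`adverts` maps a hypothetical session name to a dict of prefix to AS path); it is not the method used in the cited IMC 2004 study.

```python
def inconsistent_exports(adverts):
    """Find prefixes a peer does not export consistently across sessions.

    adverts: {session_name: {prefix: as_path_tuple}} (hypothetical format)
    Returns {prefix: reason} for every prefix that breaks one of the rules.
    """
    all_prefixes = set().union(*(routes.keys() for routes in adverts.values()))
    problems = {}
    for prefix in sorted(all_prefixes):
        lengths = {session: len(routes[prefix])
                   for session, routes in adverts.items() if prefix in routes}
        if len(lengths) < len(adverts):
            # Rule 1: routes must be advertised at all peering points.
            problems[prefix] = "missing at some peering points"
        elif len(set(lengths.values())) > 1:
            # Rule 2: advertised routes must have equal AS path length.
            problems[prefix] = "unequal AS path lengths"
    return problems
```

The AS numbers and session names used to exercise it are made up; only the comparison logic matters.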

  22. Safety: The protocol does not oscillate. The protocol computes a stable path assignment for every initial state and message ordering. Depends on the interactions of rankings and filters of multiple ASes. Challenge: Guarantee safety with only “local” information. (Preserve the autonomy of each AS.)

  23. Outline • Correctness specification and constraints • Path visibility • Route validity • Safety • Proactive fault detection with rcc • Path visibility and route validity • Implementation and findings • Safety v. policy expressiveness • Local guarantees for safety • Implications

  24. rcc Overview. Figure labels: Distributed router configurations (Single AS), Normalized Representation, “rcc”, Correctness Specification, Constraints, Faults. Challenges: • Analyzing complex, distributed configuration • Defining a correctness specification • Mapping specification to constraints

  25. rcc Implementation (http://nms.csail.mit.edu/rcc/). Pipeline: Distributed router configurations (Cisco, Avici, Juniper, Procket, etc.) -> Preprocessor -> Parser -> Relational Database (mySQL) -> Verifier (with Constraints) -> Faults
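To illustrate how a constraint can be expressed as a query over parsed configuration stored in a relational database, here is a small sketch. Everything in it is an assumption for illustration: the table names, columns, and the "undefined filter reference" constraint are hypothetical, and Python's built-in sqlite3 stands in for the mySQL database named on the slide.

```python
import sqlite3


def find_undefined_filters(session_rows, route_map_rows):
    """Return sessions that reference a route map not defined on that router.

    session_rows:   iterable of (router, neighbor, route_map) tuples
    route_map_rows: iterable of (router, name) tuples
    The schema is invented for this sketch; it is not rcc's schema.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sessions (router TEXT, neighbor TEXT, route_map TEXT)")
    db.execute("CREATE TABLE route_maps (router TEXT, name TEXT)")
    db.executemany("INSERT INTO sessions VALUES (?, ?, ?)", session_rows)
    db.executemany("INSERT INTO route_maps VALUES (?, ?)", route_map_rows)
    # The constraint as a query: a LEFT JOIN leaves NULLs where no definition exists.
    query = """
        SELECT s.router, s.neighbor, s.route_map
        FROM sessions AS s
        LEFT JOIN route_maps AS m
          ON m.router = s.router AND m.name = s.route_map
        WHERE m.name IS NULL
        ORDER BY s.router, s.neighbor
    """
    faults = db.execute(query).fetchall()
    db.close()
    return faults
```

Storing the normalized configuration in tables means each new constraint is one more query rather than a new parser pass.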

  26. Path Visibility Faults in Practice. Analysis of configuration from 17 ASes: 420 sessions (8 ASes), 133 routers (7 ASes), 11 partitions (6 ASes)

  27. Route Validity Faults in Practice. Analysis of configuration from 17 ASes: 233 sessions (9 ASes), 196 sessions (6 ASes), 117 sessions (7 ASes), 45 sessions (7 ASes), 6 sessions (1 AS)

  28. Causes of Error. Every AS had faults, regardless of network size. Most faults can be attributed to distributed configuration. Chart panels: Route Validity, Path Visibility.

  29. Feedback From Network Operators “That’s wicked!” -- Nicolas Strina, ip-man.net “Thanks again for a great tool.” -- Paul Piecuch, IT Manager “...good to finally see more coverage of routing as distributed programming. From my experience, the principles of software engineering eliminate a vast majority of errors.” --Joe Provo, rcn.com “I find your approach useful, it is really not fun (but critical for the health of the network) to keep track of the inconsistencies among different routers…a configuration verifier like yours can give the operator a degree of confidence that the sky won't fall on his head real soon now.” -- Arnaud Le Tallanter, clara.net

  30. rcc: Take-home lessons • Static configuration analysis uncovers many errors • Major causes of error: • Distributed configuration • Intra-AS dissemination is too complex • Mechanistic expression of policy • http://nms.csail.mit.edu/rcc/ • About 100 downloads (70 network operators)

  31. Outline • Correctness specification and constraints • Path visibility • Route validity • Safety • Proactive fault detection with rcc • Path visibility and route validity • Implementation and findings • Safety v. policy expressiveness • Local guarantees for safety • Implications

  32. Safety: The protocol does not oscillate. If the protocol computes a stable path assignment for every initial state and message ordering, then safety is satisfied. Depends on the interactions of rankings and filters of multiple ASes. Challenge: Guarantee safety with only “local” information. (Preserve the autonomy of each AS.)

  33. Safety: No Persistent Oscillation. Depends on the interactions of rankings and filters of multiple ASes. Example with nodes 1, 2, 3 and destination 0, each node's paths listed most preferred first: node 1 ranks (1 3 0) over (1 0); node 2 ranks (2 1 0) over (2 0); node 3 ranks (3 2 0) over (3 0). Dispute wheel: global, cyclic relationship among rankings. Varadhan, Govindan, & Estrin, “Persistent Route Oscillations in Interdomain Routing”, 1996. Griffin, Shepherd, & Wilfong, “The Stable Paths Problem and Interdomain Routing”, ToN, 2002

  34. First Necessary Condition for Safety. Dispute ring: Dispute wheel where each node only appears once. We show: Dispute ring implies no safety under filtering. Problem: “No dispute ring” is still a global condition. Diagram labels: No Dispute Wheel, Safe under Filtering, No Dispute Ring, Safe.

  35. Goal: Local Constraints for Safety. Given no restrictions on filtering or topology, what are the local restrictions on rankings to guarantee global safety under filtering?

  36. ARC Function. Input: rankings (from a single AS), which preserves autonomy. Output: Accept/Reject.

  37. ARC Function Properties. Permutation Invariance: Node labels don’t matter. Example: Node 1’s rankings (1 2 0), (1 0) and Node 2’s rankings (2 3 0), (2 0) are both accepted by the ARC function. Scale Invariance: Adding new nodes does not force a node to change its rankings over old paths.

  38. Examples of ARC Functions. Accept only next-hop rankings • Captures most routing policies • Problem: system may not be safe (See Section 6.4 for proof). Accept only shortest hop count rankings • Guarantees safety under filtering • Problem: not expressive. Figure: nodes 1, 2, 3 with next-hop rankings 3*,2*,0*; 1*,3*,0*; and 2*,1*,0*.
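The two example ARC functions can be sketched as predicates over a single node's rankings. The representation is an assumption made for this sketch: a ranking is a dict from path tuple to an integer rank (lower is more preferred, ties allowed), and "shortest hop count" is read as "a strictly shorter path is always strictly preferred". The function names are hypothetical.

```python
def accepts_next_hop_ranking(rank):
    """Accept only if a path's rank depends on nothing but its next hop.

    rank: {path_tuple: int}, lower value = more preferred (assumed format).
    """
    rank_of_next_hop = {}
    for path, value in rank.items():
        # setdefault records the first rank seen for this next hop.
        if rank_of_next_hop.setdefault(path[1], value) != value:
            return False
    return True


def accepts_shortest_hop_ranking(rank):
    """Accept only if every strictly shorter path is strictly preferred."""
    paths = list(rank)
    return all(rank[p] < rank[q]
               for p in paths for q in paths if len(p) < len(q))
```

Both predicates look only at one node's rankings, which is what makes them local: no information about other ASes is needed to evaluate them.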

  39. What ARC Functions Violate Safety? Theorem. Permitting paths of length n+2 over paths of length n will violate safety under filtering. Theorem. Permitting paths of length n+1 over paths of length n will result in a dispute wheel. Proof Idea: Use the ARC function to construct a dispute ring (respectively, wheel). See Section 6.6.

  40. Outline • Correctness specification and constraints • Path visibility • Route validity • Safety • Proactive fault detection with rcc • Path visibility and route validity • Implementation and findings • Safety v. policy expressiveness • Local guarantees for safety • Implications

  41. Static Analysis in the Workflow. Workflow: Configure, then apply proactive techniques (rcc: Detect Faults, Predict Traffic Flow), then Deploy. Many faults can be detected with static analysis. Challenge: Adoption

  42. Preventing Errors in the First Place. Before: conventional iBGP. After: RCP gets “best” iBGP routes (and IGP topology). Figure labels: RCP, iBGP, eBGP. Feamster et al., “The Case for Separating Routing from Routers”, SIGCOMM FDNA, 2004. Caesar et al., “Design and Implementation of a Routing Control Platform”, NSDI, 2005

  43. Safety: Possible Steps Forward • Add constraints on filtering • Relax autonomy of rankings • Restrict expressiveness: Shortest paths routing with autonomy for setting edge weights • Routing protocol converges on a fast timescale • Policy disputes (“tussle”) resolved on a slower timescale

  44. Summary of Contributions • Correctness specification and constraints • rcc (“router configuration checker”) • Static configuration analysis tool for fault detection • Used by operators of large backbone networks • Analysis of real-world network configurations from 17 autonomous systems (ASes) • Route prediction using static configuration analysis • Sufficient and necessary conditions for safe routing

  45. Known Constraints are Too Restrictive • Only three types of business relationships • Customer: filter none, rank highest • Peer: filter other peers and providers, rank second • Provider: filter other peers and providers, rank last. Problems: • Requires acyclic hierarchy (global condition) • Too restrictive to express important business relationships. Figure: Sprint, Abovenet, Verio, and PSINet, with an edge labeled Customer. Gao & Rexford, “Stable Internet Routing without Global Coordination”, IEEE/ACM ToN, 2001
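The three relationship types above translate into a short export filter and ranking rule. This is a rough sketch of the rules as the slide states them, not code from the Gao and Rexford paper; the names `should_export` and `best_route`, the route dict layout, and the AS-path-length tie-break are assumptions.

```python
# Lower number = more preferred, following the slide's rank order.
PREFERENCE = {"customer": 0, "peer": 1, "provider": 2}


def should_export(learned_from, send_to):
    """Export filter: customers receive everything; peers and providers
    receive only routes learned from customers."""
    if send_to == "customer":
        return True
    return learned_from == "customer"


def best_route(routes):
    """Pick a route by relationship class, then by AS path length.

    routes: list of {"learned_from": str, "as_path": tuple} (assumed format).
    The path-length tie-break is an assumption, not stated on the slide.
    """
    return min(routes,
               key=lambda r: (PREFERENCE[r["learned_from"]], len(r["as_path"])))
```

Note that a longer customer route still wins over a shorter peer or provider route, which is exactly the rigidity the slide criticizes.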

  46. Correctness Specification. Path Visibility: Every destination with a usable path has a route advertisement. If there exists a path, then there exists a route. Example violation: Network partition. Route Validity: Every route advertisement corresponds to a usable path. If there exists a route, then there exists a path. Example violation: Routing loop. Safety: The protocol converges to a stable path assignment for every possible initial state and message ordering. The protocol does not oscillate. Example violation: Oscillation.

  47. (Un)Related Work • Integrity & Consistency of Route Advertisements • S-BGP [Kent 2000], soBGP [White 2003], SPV [Hu 2004], Listen/Whisper [Subramanian 2004] • Model Checking/Formal Methods • Network protocols [Hajek 1978, Bhargavan 2002] • Large programs [Musuvathi 2003] • Traffic Engineering • Intradomain [Fortz 2002] • Interdomain [Feamster 2003, Mahajan 2005] • Convergence Speed • Path exploration [Labovitz 1999], Route Flap Damping [Mao 2002]

  48. Route Validity: Consistent Export • Settlement-free peering rules: • Advertise routes at all peering points • Advertised routes must have equal “AS path length”. That is, “equally good” routes. Figure: Sprint and AT&T.

  49. Inconsistent Export Observed at AT&T. 15% of destinations inconsistent for >4 days. Plot axes: percentage of destinations with inconsistent routes; percentage of time. Feamster et al., “BorderGuard: Detecting Cold Potatoes from Peers”, ACM IMC, October 2004.
