Routing Convergence

Routing Convergence Global Routing

An Experimental Study of Delayed Internet Routing Convergence • Craig Labovitz, Abha Ahuja, Farnam Jahanian, Abhijit Bose • ACM SIGCOMM, September 2000

Hierarchical Routing -- Review • Untruths about Internet Routing: • all routers identical • network “flat” • … not true in practice • scale: with 50 million destinations: • can’t store all dest’s in routing tables! • routing table exchange would swamp links! • administrative autonomy: • internet = network of networks • each network admin may want to control routing in its own network

Hierarchical Routing • aggregate routers into regions, “autonomous systems” (AS) • routers in same AS run same routing protocol • “intra-AS” routing protocol • routers in different AS can run different intra-AS routing protocol • gateway routers • special routers in AS • run intra-AS routing protocol with all other routers in AS • also responsible for routing to destinations outside AS • run inter-AS routing protocol with other gateway routers

Intra-AS and Inter-AS routing • Gateways: • perform inter-AS routing amongst themselves • perform intra-AS routing with other routers in their AS • [Figure: three ASes A, B and C with gateways A.a, A.c, B.a and C.b; gateway A.c runs both inter-AS and intra-AS routing at the network layer, above the link layer and physical layer]

Intra-AS and Inter-AS routing • [Figure: path from Host h1 in AS A to Host h2 in AS B: intra-AS routing within AS A, inter-AS routing between A and B, intra-AS routing within AS B; gateways A.a, A.c, B.a and C.b are labeled]

AS graphs obscure topology! • The AS graph may look like this. • Reality may be closer to this… • (Tim Griffin, Leiden 2000)

Inter-AS routing (cont) • BGP (Border Gateway Protocol): the de facto standard • Path Vector protocol: an extension of Distance Vector • Each Border Gateway broadcasts to neighbors (peers) the entire path (i.e., sequence of ASs) to destination • For example, Gateway X may store the following path to destination Z: Path (X,Z) = X,Y1,Y2,Y3,…,Z

Inter-AS routing (cont) • Now, suppose Gwy X sends its path to peer Gwy W • Gwy W may or may not select the path offered by Gwy X, because of cost, policy ($$$$) or loop prevention reasons. • If Gwy W selects the path advertised by Gwy X, then: Path (W,Z) = W, Path (X,Z) • Note: path selection is based not so much on cost (e.g., # of AS hops), but mostly on administrative and policy issues (e.g., do not route packets through competitor’s AS)
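The accept-or-reject decision at Gwy W can be sketched in a few lines. This is a minimal illustration, not BGP's actual decision process: the function name `receive_advertisement` and the `policy_blocked` set are hypothetical, and cost-based comparison between competing paths is left out.

```python
from typing import List, Optional

def receive_advertisement(my_as: str, advertised_path: List[str],
                          policy_blocked: frozenset = frozenset()) -> Optional[List[str]]:
    """Return the path this gateway would install, or None if it rejects the offer."""
    # Loop prevention: a path that already contains our own AS would loop.
    if my_as in advertised_path:
        return None
    # Policy (hypothetical rule): refuse paths crossing an AS we will not use.
    if policy_blocked.intersection(advertised_path):
        return None
    # Accepted: Path(W,Z) = W, Path(X,Z)
    return [my_as] + advertised_path

path_x_to_z = ["X", "Y1", "Y2", "Y3", "Z"]
print(receive_advertisement("W", path_x_to_z))                     # accepted, W prepended
print(receive_advertisement("Y2", path_x_to_z))                    # rejected: loop
print(receive_advertisement("W", path_x_to_z, frozenset({"Y3"})))  # rejected: policy
```

Carrying the whole AS sequence is what makes the loop check a simple membership test, something a plain distance value cannot offer.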

Inter-AS routing (cont) • Peers exchange BGP messages using TCP. • OPEN msg opens the BGP session with the peer (over an established TCP connection) and authenticates sender • UPDATE msg advertises new path (or withdraws old) • KEEPALIVE msg keeps connection alive in absence of UPDATES; it also serves as ACK to an OPEN request • NOTIFICATION msg reports errors in previous msg; also used to close a connection
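The four message types share one fixed header. The slides do not show the wire format; the layout below (16-byte all-ones marker, 2-byte length, 1-byte type, with type codes 1 to 4) is taken from the BGP-4 specification, RFC 4271. The helper names `build_keepalive` and `parse_header` are made up for this sketch.

```python
import struct

# Message type codes as defined in RFC 4271 (not shown on the slides).
BGP_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}
HEADER_LEN = 19  # 16-byte marker + 2-byte length + 1-byte type

def build_keepalive() -> bytes:
    """A KEEPALIVE consists of the 19-byte header only, with no body."""
    return b"\xff" * 16 + struct.pack("!HB", HEADER_LEN, 4)

def parse_header(data: bytes):
    """Return (message type name, total message length) from a BGP header."""
    if len(data) < HEADER_LEN:
        raise ValueError("truncated BGP header")
    marker, length, msg_type = struct.unpack("!16sHB", data[:HEADER_LEN])
    if marker != b"\xff" * 16:
        raise ValueError("bad marker")
    return BGP_TYPES.get(msg_type, "UNKNOWN"), length

print(parse_header(build_keepalive()))
```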

Why different Intra- and Inter-AS routing? • Policy: Inter is concerned with policies (which provider we must select/avoid, etc.). Intra is contained in a single organization, so no policy decisions are necessary • Scale: Inter provides an extra level of routing table size and routing update traffic reduction above the Intra layer • Performance: Intra is focused on performance metrics; needs to keep costs low. In Inter it is difficult to propagate performance metrics efficiently (latency, privacy, etc.). Besides, policy related information is more meaningful. • We need BOTH!

What is Routing Policy? • Description of the routing relationship between autonomous systems • Who are the peers? • What routes are • Originated by a peer? • Imported from each peer? • Exported to each peer? • Preferred when multiple routes exist? • What to do if no route exists?

The example I mentioned earlier Date: Fri, 25 Apr 1997 20:16:47 -0500 (CDT) Subject:** ALERT – Massive Routing Failures *** At about 10:30 AM today, one of Sprints customers (AS7007, Florida Internet Exchange) began announcing a /24 route for every CIDR block in the core routing table. This was due to a configuration problem in that they imported all their routing into a classfull interior routing protocol and then redistributed the route back into BGP, becoming a source for the first class C network in every CIDR block. Sprint does no border routing filters, so they happily accepted these routes and gave them away to all…
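Why a leaked /24 is so damaging follows from longest-prefix matching: the more specific route wins over the legitimate aggregate. The sketch below uses Python's standard `ipaddress` module; the block `10.0.0.0/8` and both helper names are illustrative assumptions, not data from the incident.

```python
import ipaddress

def first_slash24(block: str):
    """Return the first /24 inside a CIDR block (the block itself if /24 or longer)."""
    net = ipaddress.ip_network(block)
    if net.prefixlen >= 24:
        return net
    return next(net.subnets(new_prefix=24))

def longest_prefix_match(dest: str, routes):
    """Pick the most specific route covering dest, as a forwarding lookup would."""
    addr = ipaddress.ip_address(dest)
    covering = [r for r in routes if addr in ipaddress.ip_network(r)]
    return max(covering, key=lambda r: ipaddress.ip_network(r).prefixlen, default=None)

aggregate = "10.0.0.0/8"                 # hypothetical legitimate CIDR block
leaked = str(first_slash24(aggregate))   # the bogus more-specific announcement
routes = [aggregate, leaked]
print(leaked)
print(longest_prefix_match("10.0.0.5", routes))  # captured by the leaked /24
print(longest_prefix_match("10.1.0.5", routes))  # still follows the aggregate
```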

Motivation • Why should we care about convergence? • Routing reliability/fault-tolerance on small time scales (minutes) not previously a priority • Emerging transaction oriented and interactive applications (e.g. Internet Telephony) will require higher levels of end-to-end network reliability • How well does the Internet routing infrastructure tolerate faults?

Conventional Routing Wisdom • Internet routing is robust under faults (the Internet is designed to survive a nuclear cataclysm) • Supports fast path re-routing and restoral, on the order of seconds • BGP has good convergence properties • Does not exhibit looping/bouncing problems of RIP • Internet fail-over will improve with faster routers and faster links • More redundant connections (multi-homing) to Internet will always improve site fault-tolerance

Contribution • Labovitz et al. show that most of the conventional wisdom about routing convergence is not accurate… • Measurement of BGP convergence in the Internet • Analysis/intuition behind delayed BGP routing convergence • Modifications to BGP implementations which would improve convergence times

Motivation • Why has fail-over and fault-tolerance not previously been a priority? • Applications like email not delay sensitive and possess fault-tolerance • TCP/IP fault-tolerance (resend) • Content replication helps improve reliability for static content • Network support is required for emerging transaction oriented and interactive applications (e.g. Internet Telephony, QoS)

Building a Reliable Internet • What network support has been proposed already? • Significant recent improvement on data-link fail-over (e.g. SRP, SONET). Solves some enterprise, intra-domain reliability problems • Also significant research on QoS and resource reservation protocols for the Internet • But all of these protocols assume a stable underlying IP forwarding path

Background • Internet sites multi-home, i.e. purchase connectivity from multiple Internet providers, to improve fault tolerance • Goal: tolerate a single link, router or ISP failure • 35% of Internet end-sites currently multi-homed

Background: Multi-homing • [Figure: a multi-homed site with BGP sessions to two providers, Sprint and Verio, both connected to the Internet]

PSTN versus Internet • Public Switched Telephone Network (PSTN) is the “other” network in place. • Trade-off between • scalability/extensibility/low cost and • fault-tolerance/service guarantees/high cost • PSTN retains significant intermediate state (i.e. circuit setup) and services on relatively few nodes. A “Smart Network” • Internet places all intelligence on end-nodes. A “Stupid Network”

Trade-Offs • [Figure: PSTN rates High on State, Reliability, Service Guarantees, Development Time, Switch Cost and Coordination, and Low on Scalability, Flexibility and Distributed Operation]

Routing • Unlike circuit-switched PSTN, packet-switched Internet uses hop-by-hop forwarding and next-hop selection • Global state and circuit-setup used in PSTN • this is like owning an atlas and planning route • Internet routers only keep local knowledge and routes learned from neighbors • like asking directions at each stop

Internet Routing • Inter-domain Internet routing protocols are distance vector (i.e. Bellman-Ford) algorithms. Unlike PSTN, no pre-computed backup paths! • Distance vector protocols are problematic • Require time to converge • Suffer from “counting to infinity”

Problems with Distance Vector Protocols: Counting to Infinity • [Figure: routers A and B, each holding a Node/Distance table, alternately raise their distance to R (R=3, R=5, R=7, …)]
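Counting to infinity between two routers A and B can be reproduced with a short simulation. It is a sketch under stated assumptions: unit link costs, no split horizon or poison reverse, B starting with distance 2 to R via A, and RIP's conventional infinity of 16. The function name is hypothetical.

```python
INFINITY = 16  # RIP's conventional "unreachable" metric (an assumption here)

def count_to_infinity(dist_b=2, link_cost=1, infinity=INFINITY):
    """Return successive (A, B) distances to R after A loses its link to R."""
    history = []
    while True:
        # A believes B still has a route to R and adopts it (no split horizon).
        dist_a = min(dist_b + link_cost, infinity)
        # B's only route to R goes through A, so it follows A's new metric.
        dist_b = min(dist_a + link_cost, infinity)
        history.append((dist_a, dist_b))
        if dist_a >= infinity and dist_b >= infinity:
            break
    return history

for a, b in count_to_infinity():
    print(f"A thinks R={a}, B thinks R={b}")
```

A's distance climbs 3, 5, 7, … exactly as in the slide, and the stale route only disappears once both routers reach the infinity value.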

Internet Routing • The Internet inter-domain routing protocol, BGP, “solves” count-to-infinity problem by keeping record of path the route announcement has traveled through network • Internet routing commonly (and incorrectly) believed to converge within 30 seconds

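BGP Routing • [Figure: the announcement for R propagates from AS1 to AS2 to AS3, and the recorded path grows at each hop: “AS1 R”, “AS2 AS1 R”, “AS3 AS2 AS1 R”]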

Open Question • After a fault in a path to a multi-homed site, how long does it take for the majority of Internet routers to fail over to the secondary path? • Routing table convergence (backbone routers reach steady-state) after a fault • End-to-end paths stable (“normal” levels of loss and latency) • [Figure: a customer with BGP sessions to a primary ISP and a backup ISP, with traffic arriving from the Internet]

Internet Fail-Over Experiments • Instrument the Internet • Inject routes into geographically and topologically diverse provider BGP peering sessions (Mae-West, Japan, Michigan, London) • Periodically fail and change these routes (i.e. send withdraws or new attributes) • Monitor impact of faults through 1) recordings of BGP peering sessions with 20 tier1/tier2 ISPs and 2) active ICMP ECHO measurements (512 byte/second to 100 random web sites) • Write lots of Perl scripts • Wait two years… (125,000 routing events)

Experiment (For the Last Two Years)

Fault Scenarios • Tup -- A new route is advertised • Tdown -- A route is withdrawn (i.e. single-homed failure) • Tshort -- Advertise a shorter/better ASPath (i.e. primary path repaired) • Tlong -- Advertise a longer/worse ASPath (i.e. primary path fails)
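The four scenarios can be told apart by comparing the route before and after an injected event. The classifier below is a hypothetical sketch that uses ASPath length alone as a stand-in for "shorter/better" and "longer/worse"; the experiments themselves may have used other attributes too.

```python
from typing import Optional, Sequence

def classify_event(old_path: Optional[Sequence[int]],
                   new_path: Optional[Sequence[int]]) -> str:
    """Map a (previous ASPath, new ASPath) pair to a fault scenario label."""
    if old_path is None and new_path is None:
        raise ValueError("no route before or after the event")
    if old_path is None:
        return "Tup"       # a new route is advertised
    if new_path is None:
        return "Tdown"     # the route is withdrawn
    if len(new_path) < len(old_path):
        return "Tshort"    # shorter path, e.g. primary repaired
    if len(new_path) > len(old_path):
        return "Tlong"     # longer path, e.g. primary failed
    return "same-length"   # not one of the four scenarios

print(classify_event(None, [5696, 2129]))
print(classify_event([5696, 2129], None))
print(classify_event([1, 5696, 2129], [5696, 2129]))
print(classify_event([5696, 2129], [1, 5696, 2129]))
```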

Major Convergence Results • Routing convergence takes an order of magnitude longer than expected (tens of minutes) • Routes converge more quickly following Tup/Repair than Tdown/Failure events (“bad news travels more slowly”) • Curiously, withdrawals (Tdown) generate several times as many announcements as new-route advertisements (Tup)

Example of BGP Convergence
TIME      BGP Message/Event
10:40:30  Route Fails/Withdrawn by AS2129
10:41:08  2117 announce 5696 2129
10:41:32  2117 announce 1 5696 2129
10:41:50  2117 announce 2041 3508 3508 4540 7037 1239 5696 2129
10:42:17  2117 announce 1 2041 3508 3508 4540 7037 1239 5696 2129
10:43:05  2117 announce 2041 3508 3508 4540 7037 1239 6113 5696 2129
10:43:35  2117 announce 1 2041 3508 3508 4540 7037 1239 6113 5696 2129
10:43:59  2117 sends withdraw
• BGP log of updates from AS2117 for route via AS2129 • One BGP withdrawal triggers 6 announcements and one withdrawal from 2117 • Increasing ASPath length until final withdraw
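The claim that the ASPath grows until the final withdrawal can be checked directly against the log. The snippet copies the six announcements from the slide; the variable and function names are arbitrary.

```python
# (time, ASPath announced by AS2117), copied from the log on the slide.
updates = [
    ("10:41:08", "5696 2129"),
    ("10:41:32", "1 5696 2129"),
    ("10:41:50", "2041 3508 3508 4540 7037 1239 5696 2129"),
    ("10:42:17", "1 2041 3508 3508 4540 7037 1239 5696 2129"),
    ("10:43:05", "2041 3508 3508 4540 7037 1239 6113 5696 2129"),
    ("10:43:35", "1 2041 3508 3508 4540 7037 1239 6113 5696 2129"),
]

def aspath_lengths(log):
    """Number of AS hops in each announced path, in log order."""
    return [len(path.split()) for _, path in log]

lengths = aspath_lengths(updates)
print(lengths)  # [2, 3, 8, 9, 9, 10]
print(all(a <= b for a, b in zip(lengths, lengths[1:])))  # True
```

The lengths never decrease: each announcement falls back to a path at least as long as the previous one before AS2117 finally gives up.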

CDF of BGP Routing Table Convergence Times • [Figure: CDF curves for New Route (Tup), Long->Short Fail-over (Tshort), Short->Long Fail-over (Tlong) and Failure (Tdown)] • Less than half of Tdown events converge within two minutes • Tup/Tshort and Tdown/Tlong form equivalence classes • Long tailed distribution (up to 15 minutes)

Impact of Delayed Convergence • Why do we care about routing table convergence? It deleteriously impacts end-to-end Internet paths • ICMP experiment results • Loss of connectivity, packet loss, latency, and packet re-ordering for an average of 3-5 minutes after a fault • Why? Routers drop packets for which they do not have a valid next hop. Also problems with cache flushing in some older routers.

End-to-End Impact Failover • ICMP loss to 100 randomly chosen web sites with VIF source address of our probe • Tlong/Tshort exhibit similar relationship as before

Delayed Convergence Background • Well known that distance vector protocols exhibit poor convergence behaviors • Counting to infinity, looping, bouncing problem • RIP redefines infinity and adds split-horizon, poison reverse, etc. • Still, slow convergence and not scalable • BGP advertises ASPaths instead of distance • Solves counting to infinity and RIP looping problem, but… • BGP can still explore “invalid” paths during convergence (i.e. the bouncing problem)

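BGP Convergence Example • [Figure: AS0, AS1 and AS2 each reach R via AS3. Initial BGP tables, with * marking the selected route: AS0: *R via 3, R via 1 3, R via 2 3 • AS1: *R via 3, R via 0 3, R via 2 3 • AS2: *R via 3, R via 0 3, R via 1 3. Later panels show longer paths such as 0 1 3, 1 0 3 and 2 0 3 being selected]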

[Figure: AS graph of paths toward AS237, annotated “N > 4?”. Path held at each AS:] • AS6453: 6453 1239 5696 237 • AS2497: 2497 5696 237 • AS6113: 6113 2914 237 • AS6461: 6461 5696 237 • AS1239: 1239 5696 237 • AS5696: 5696 237 • AS2914: 2914 237 • AS237: 237 • AS701: 701 6461 5696 237 • AS5000: 5000 237 • AS1: 1 5696 237 • AS1673: 1673 5696 237

MinRouteAdver Rounds • Implementation of MinRouteAdver timer and receiver-side loop detection leads to 30 second rounds • Time complexity is O(n-3)*30 seconds

An Experiment with SSF.OS.BGP4 • The Model • Topology: full mesh of N ASes, each with just 1 router • No route filtering • Shortest path is best • Advertise, Withdraw, Wait and Watch • Wait for system to reach stable state, then … • AS #1 advertises a bogus destination to everyone else • Wait for system to reach a stable state again, then … • AS #1 tells everyone that the bogus route is not reachable through it any more • Wait for system to reach a stable state again

[Figure: full-mesh topology of ASes 1, 2, 3, 4, 5, …, with the bogus destination attached to AS 1]
N    longest path   convergence time after withdrawal (sec)   avg # updates due to withdrawal (range)
10   9              150                                       59.50 (35-84)
20   20             480                                       269.55 (58-397)
30   28             720                                       539.10 (118-892)
40   40             1080                                      945.20 (160-1647)
50   46             1260                                      1423.66 (196-2377)
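Every convergence time in the table happens to be a whole multiple of 30 seconds. The snippet below expresses the times as a number of 30-second rounds, assuming the simulator used the 30-second MinRouteAdver interval discussed elsewhere in these slides; the slide itself does not state the timer value.

```python
# N -> convergence time after withdrawal (sec), copied from the table.
results = {10: 150, 20: 480, 30: 720, 40: 1080, 50: 1260}
ROUND_SECONDS = 30  # assumed MinRouteAdver interval

def rounds(seconds, round_len=ROUND_SECONDS):
    """Return the number of whole rounds, or None if not an exact multiple."""
    whole, remainder = divmod(seconds, round_len)
    return whole if remainder == 0 else None

for n, seconds in results.items():
    print(f"N={n}: {seconds} s = {rounds(seconds)} rounds")
```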

. . .
1610.040778415 bgp@38:1 snd update to bgp@2:1 wds=bogus
1610.040778415 bgp@38:1 snd update to bgp@20:1 wds=bogus
1610.040778415 bgp@38:1 snd update to bgp@32:1 wds=bogus
1610.040778415 bgp@38:1 snd update to bgp@44:1 wds=bogus
1610.040890567 bgp@32:1 snd update to bgp@38:1 nlri=bogus,asp=32 44 34 38 4 22 2 20 48 10 26 12 6 16 36 8 14 24 28 41 18 51 21 33 45 43 35 3 5 47 23 31 37 49 25 46 39 7 27 13 9 29 11 15 17 50 19 42 40 30 1
1610.040890567 bgp@32:1 snd update to bgp@44:1 wds=bogus
1610.040907352 bgp@44:1 snd update to bgp@38:1 wds=bogus
1610.040907352 bgp@44:1 snd update to bgp@34:1 nlri=bogus,asp=44 38 34 32 4 22 2 20 48 10 26 12 6 16 36 8 14 24 28 41 18 51 21 33 45 43 35 3 5 47 23 31 37 49 25 46 39 7 27 13 9 29 11 15 17 50 19 42 40 30 1
1610.050930294 bgp@44:1 snd update to bgp@32:1 wds=bogus
. . .

The Problem with BGP • If we assume • unbounded delay on BGP processing and propagation • Full mesh of BGP peers • Constrained shortest path first selection algorithm • BGP is O(N!), where N is the number of default-free BGP speakers • There exists a possible ordering of messages such that BGP will explore all possible ASPaths of all possible lengths
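The factorial growth can be made concrete by counting the loop-free ASPaths available in a full mesh. This is only an illustration of how fast the candidate set grows, not the paper's proof; it counts paths from one AS to an origin AS through any ordered subset of the remaining N-2 ASes, and the function name is hypothetical.

```python
from itertools import permutations

def count_candidate_paths(n: int) -> int:
    """Count loop-free ASPaths from one AS to an origin AS in a full mesh of n ASes."""
    if n < 2:
        raise ValueError("need at least a source AS and an origin AS")
    intermediates = range(n - 2)  # every AS other than source and origin
    total = 0
    for hops in range(n - 1):     # 0 .. n-2 intermediate ASes on the path
        total += sum(1 for _ in permutations(intermediates, hops))
    return total

for n in (3, 4, 5, 6, 8):
    print(n, count_candidate_paths(n))
```

With 4 ASes there are 5 candidate paths, with 5 there are 16, and with 6 there are 65, so the set an adversarial message ordering could walk through grows roughly like (N-2)!.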

BGP and RIP • RIP precisely monotonically increasing. Can explore metrics (1…N) • BGP monotonically increasing. Multiple (N!) ways to represent a path metric of N. • BGP “solved” RIP routing table loop problem by making it exponentially worse… • Example ASPaths explored for the same route:
2117 5696 2129
2117 1 5696 2129
2117 2041 3508 3508 4540 7037 1239 5696 2129
2117 1 2041 3508 3508 4540 7037 1239 5696 2129
2117 2041 3508 3508 4540 7037 1239 6113 5696 2129
2117 1 2041 3508 3508 4540 7037 1239 6113 5696 2129

BGP Best Case • What is the best we can expect from BGP? • Implementation of MinRouteAdver timer leads to 30 second rounds • Time complexity is O(n-3)*30 seconds • State/Computational complexity O(n) • At its best, BGP performs as well as RIP2 (but uses exponentially more memory in the process)

MinRouteAdver • Minimum interval between successive updates sent to a peer for a given prefix • Allow for greater efficiency/packing of updates • Rate throttle • Applied only to announcements (at least according to BGP RFC) • Applied on (prefix destination, peer) basis, but implemented on (peer) basis
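The timer's behavior on a (prefix, peer) basis can be sketched as a small rate limiter. This is a simplified, hypothetical model: the class and method names are made up, the 30-second default comes from the slides, withdrawals bypass the timer as the slide says the RFC intends, and the jitter and per-peer batching found in real routers are ignored.

```python
class MinRouteAdverLimiter:
    """Rate-limit announcements per (peer, prefix); withdrawals pass through."""

    def __init__(self, interval: float = 30.0):
        self.interval = interval
        self.last_sent = {}  # (peer, prefix) -> time of last announcement

    def may_send(self, now: float, peer: str, prefix: str,
                 is_withdrawal: bool = False) -> bool:
        if is_withdrawal:
            return True  # the timer applies only to announcements
        key = (peer, prefix)
        last = self.last_sent.get(key)
        if last is not None and now - last < self.interval:
            return False  # too soon; the update must wait for the timer
        self.last_sent[key] = now
        return True

limiter = MinRouteAdverLimiter()
print(limiter.may_send(0, "peerA", "R"))                       # True
print(limiter.may_send(10, "peerA", "R"))                      # False
print(limiter.may_send(10, "peerA", "R", is_withdrawal=True))  # True
print(limiter.may_send(30, "peerA", "R"))                      # True
```

Keying the table on `peer` alone instead of `(peer, prefix)` would model the per-peer implementation the last bullet describes.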

MinRouteAdver • 30*(N-3) second delay due to creation of mutual dependencies. The paper provides a proof that N-3 rounds are necessarily created during bounded BGP MinRouteAdver convergence • Rounds due to • Ambiguity in the BGP RFC and lack of receiver-side loop detection • Inclusion of BGP withdrawals with MinRouteAdver (in violation of RFC)
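The 30*(N-3) figure is simple arithmetic, evaluated below for a few mesh sizes. The function name is hypothetical and the 30-second interval is the value quoted on the slides; the result is the delay attributed to MinRouteAdver rounds, not a prediction of any measured convergence time.

```python
def min_route_adver_delay(n_ases: int, interval: int = 30) -> int:
    """Delay in seconds from N-3 MinRouteAdver rounds in a mesh of n_ases."""
    if n_ases < 3:
        raise ValueError("the N-3 round count needs at least 3 ASes")
    return interval * (n_ases - 3)

for n in (3, 5, 10, 20):
    print(f"N={n}: {min_route_adver_delay(n)} s")
```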

Simulation Results
