R-BGP: Staying Connected in a Connected World

R-BGP: Staying Connected in a Connected World. Nate Kushman, Srikanth Kandula, Dina Katabi, and Bruce Maggs. The problem: BGP convergence causes packet loss. When a route changes, there can be up to 30% packet loss for more than 2 minutes [Labovitz00].

R-BGP: Staying Connected in a Connected World

Nate Kushman

Srikanth Kandula, Dina Katabi, and Bruce Maggs



The Problem:

BGP Convergence Causes Packet Loss

  • When a route changes, up to 30% packet loss for more than 2 minutes [Labovitz00]

  • Even domains dual homed to tier 1 providers see many loss bursts on a route change [Wang06]

  • Even popular prefixes experience losses due to BGP convergence [Wang05]

  • 50% of VoIP disruptions are highly correlated with BGP updates [Kushman06]



Links, Links Everywhere But Not a Path to Forward!

Goal:

Ensure ASes stay connected as long as the physical network is connected


We Focus on Forwarding

  • Don’t worry about BGP’s routing

  • Ensure forwarding works by forwarding packets on pre-computed failover paths


Why Focus on Forwarding?

  • Convergence is unlikely to be fast enough

  • Strict timing constraints limit innovation



Our Contribution

Guarantee:

No BGP-caused packet loss

Low Overhead:

Just like BGP, each AS advertises at most one path to each neighbor

On link failure, we reduce disconnected ASes from 22% to Zero


What Causes Transient Disconnection?

All of Hari’s providers use him to get to MIT

BGP Rule:

An AS advertises only its current forwarding path

=> Nobody offers Hari an alternate path

[Diagram: AT&T, Sprint, Peter, Hari, MIT]


What Causes Transient Disconnection?

Link Down

Hari knows no path to MIT

Hari drops Peter and AT&T’s packets in addition to his own

LOSS!

[Diagram: AT&T, Sprint, Peter, Hari, MIT; the Hari->MIT link is down (X)]


What Causes Transient Disconnection?

Hari withdraws path

AT&T and Peter move to alternate paths

AT&T announces the Sprint path to Hari

=> Traffic flows

Transient Packet Loss

[Diagram: AT&T, Sprint, Peter, Hari, MIT; the Hari->MIT link is down (X)]


How do failover paths solve the problem?

BGP:

An AS advertises only its current path. It advertises an alternate only after a link fails

R-BGP:

Advertises an alternate, i.e. failover path, before a link fails


Failover Paths

AT&T advertises to Hari “AT&T->Sprint->MIT” as a failover path

Link fails => Hari immediately sends traffic on failover path

No Loss!

[Diagram: Peter, AT&T, Sprint, Hari, MIT; the Hari->MIT link is down (X)]


Two Challenges

Challenge 1:

Minimize the number of failover paths, while ensuring an AS always has a usable path

Challenge 2:

Transition from usable path to converged path without creating forwarding loops


Challenge 1: Minimize number of failover paths

Claim: Just like BGP, advertise one path per neighbor, either current or failover

Insight: Replace path advertised to downstream AS with a failover path

[Diagram: AT&T, Peter, Sprint, Hari, MIT; three advertisements labelled “Current path” and one labelled “Failover Path”]
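
The claim above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `advertisements`, the list-of-AS-names path representation, and AT&T's neighbor list are hypothetical and do not come from the slides. It shows that an AS still sends exactly one path to each neighbor, with the slot toward its downstream (next-hop) AS carrying the failover path.

```python
def advertisements(neighbors, current_path, failover_path):
    """Map each neighbor to the single path advertised to it.

    Both paths start at the advertising AS, so current_path[1] is the
    downstream (next-hop) neighbor. Hypothetical helper for illustration.
    """
    next_hop = current_path[1]
    return {
        n: failover_path if n == next_hop else current_path
        for n in neighbors
    }

# Hypothetical view from AT&T: current path via Hari, failover via Sprint.
ads = advertisements(
    ["Peter", "Sprint", "Hari"],
    ["AT&T", "Hari", "MIT"],
    ["AT&T", "Sprint", "MIT"],
)
```

Only Hari, the downstream AS, receives the failover path; every other neighbor still receives the current path, so the number of advertisements per neighbor is unchanged.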


Which failover path should it advertise?

Most Disjoint Path

Lemma: Advertising the most disjoint path is equivalent to advertising all paths.

[Diagram: AT&T, John, Bob, Joe, Dest; one link marked failed (x)]
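
A minimal sketch of selecting a most disjoint path follows. The slides do not define the disjointness metric, so this assumes it means sharing the fewest AS-level links with the current path, with ties broken by path length. The function names and the small topology are hypothetical.

```python
def links(path):
    """Return the set of directed AS-level links along a path."""
    return set(zip(path, path[1:]))

def most_disjoint(current, candidates):
    """Pick the candidate sharing the fewest links with the current path.

    Assumption: ties are broken in favour of the shorter path.
    """
    current_links = links(current)
    return min(candidates, key=lambda p: (len(links(p) & current_links), len(p)))

# Hypothetical topology; the slides do not spell out the actual links.
current = ["AT&T", "Joe", "Dest"]
candidates = [
    ["AT&T", "John", "Joe", "Dest"],  # shares the Joe->Dest link
    ["AT&T", "Bob", "Dest"],          # shares no link with the current path
]
failover = most_disjoint(current, candidates)
```

Here the path through Bob is chosen because it survives the failure of any single link on the current path.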



Challenge 1: Minimize number of failover paths

R-BGP Rule:

Advertise to downstream AS as a failover path the path most disjoint from the current path

When a link fails:

Theorem 1:

The AS upstream of down link knows a failover path if it will know a path at convergence


Challenge 2: Transition without loops

Hari withdraws path

Peter may choose to route through AT&T

AT&T may choose to route through Peter

Forwarding Loop!

[Diagram: AT&T, Sprint, Peter, Hari, MIT; the Hari->MIT link is down (X)]


Challenge 2: Transition without loops

Solution 2: Root Cause Information

Hari includes Root Cause Information with the withdrawal

AT&T recognizes the Peter->Hari->MIT path is down

It routes through Sprint instead

Theorem 2:

No forwarding loops will form

[Diagram: AT&T, Sprint, Peter, Hari, MIT; the Hari->MIT link is down (X); two messages labelled “Hari->MIT link down”]
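
The effect of root cause information can be illustrated with a short sketch. It assumes paths are lists of AS names and that the root cause is a single directed link; the helper names and AT&T's list of known paths are hypothetical, not taken from the slides.

```python
def uses_link(path, link):
    """True if the directed link (a, b) appears on the path."""
    return link in zip(path, path[1:])

def usable_paths(known_paths, down_link):
    """Drop every known path that traverses the link named as root cause."""
    return [p for p in known_paths if not uses_link(p, down_link)]

# Hypothetical set of paths AT&T has learned toward MIT.
att_paths = [
    ["AT&T", "Hari", "MIT"],
    ["AT&T", "Peter", "Hari", "MIT"],
    ["AT&T", "Sprint", "MIT"],
]
safe = usable_paths(att_paths, ("Hari", "MIT"))
```

With the root cause attached, AT&T rules out the path through Peter at once instead of trying it and forming a loop.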


R-BGP

Solution 1: Advertise most disjoint path to downstream AS

Solution 2: Include Root Cause Information

Final Theorem:

No AS will see BGP caused packet loss if it will have a path at convergence



Setup

  • AS-Level Simulation over the full Internet

  • AS-graph with 24,142 ASes from Routeviews BGP Data

  • Use inference algorithm to annotate links with customer-provider or peer relationships


Single Link Failure Results

  • Dual-homed AS loses one link

  • Find percentage of ASes that see transient disconnection to the destination

  • Run for all dual-homed ASes

[Diagram: failed link (X) toward the Destination]


Single Link Failure Results

Percentage of ASes transiently disconnected

22% - BGP

Zero - R-BGP

R-BGP Eliminates all Transient Disconnection


Cost of Policy Compliance

  • Most disjoint path may not be compliant with BGP routing policies

  • Still, an AS may want to advertise it:

    • To protect its own traffic

    • Because it is temporary

What if we choose most-disjoint among policy compliant paths?


Cost of Policy Compliance

Percentage of ASes transiently disconnected

22% - BGP

1.4% - R-BGP: policy compliant

Zero - R-BGP

Policy compliant failover paths may be sufficient


Multiple Link Failure Results

  • All proofs are for single link failure

  • Randomly choose a second link

[Diagram: failed link (X) toward the Destination]



Multiple Link Failure Results

Percentage of ASes transiently disconnected

22% - BGP

1.4% - R-BGP: policy compliant

0% - R-BGP

Multiple link failures are unlikely to interact


Worst Case Scenario

  • Fail link on current path

  • Fail link on corresponding failover path

[Diagram: Hari with two failed links (X) toward the Destination]


Worst Case Scenario

Percentage of ASes transiently disconnected

33% - BGP

12% - R-BGP: policy compliant

7% - R-BGP

Eliminates 80% of disconnection even in the worst case of link failures on both current and failover
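
The 80% figure can be checked against the percentages on this slide. The short calculation below is a sketch; the helper name `reduction` is hypothetical.

```python
def reduction(before, after):
    """Fraction of transient disconnection eliminated relative to BGP."""
    return 1 - after / before

# Percentages of ASes transiently disconnected, as reported on the slide.
worst_case = reduction(33, 7)   # most-disjoint R-BGP vs. BGP
policy = reduction(33, 12)      # policy-compliant R-BGP vs. BGP
```

Going from 33% to 7% removes roughly 79% of the disconnection, which the slide rounds to 80%; the policy-compliant variant removes roughly 64%.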


Conclusion

  • BGP loses connectivity even when the physical network is connected

  • R-BGP uses a few failover paths to ensure forwarding works throughout convergence

    • Guarantees no packet loss

    • Just like BGP, one path per neighbor

    • Reduces disconnected ASes from 22% to zero

Working with Cisco on prototype feasibility



Multiple Link Failure Results

Joe forwards on second best path, not most disjoint

Packets on Bob’s failover path follow Joe’s second best path to the destination

[Diagram: Joe, Bob, Destination; two failed links (X)]


Practical

  • Requires only a few modifications to BGP

    • Currently working with Cisco to prototype

  • Advertises only one path per neighbor, just like BGP

  • Convergence time 1/3 that of BGP


Challenge 1: A few Strategic Failover Paths

Solution 1: Most Disjoint Path

Theorem 1: If any AS using the down link will have a path after convergence, then R-BGP guarantees that the AS immediately above the down link knows a failover path when the link fails.


No Available Loop Free Path

Link Down

AT&T can immediately move to Sprint path

Peter is left without any usable path

Peter continues to use the old path

Moves away from old path only after receiving advertisement from AT&T

Mechanism 3: If no path without the down link is available, continue to use the old path until such a path becomes available or it is certain that no such path will become available.

[Diagram: AT&T, Sprint, Peter, Hari, MIT; two messages labelled “Hari->MIT link is down”]
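
Mechanism 3 can be sketched as a forwarding decision. This is a simplified illustration: it assumes paths are lists of AS names and that the shortest safe path is preferred, and it does not model the case where an AS becomes certain that no path will appear. The function name and Peter's path lists are hypothetical.

```python
def choose_forwarding_path(old_path, known_paths, down_link):
    """Keep forwarding on the old path until a path avoiding down_link is known."""
    def uses_down_link(path):
        return down_link in zip(path, path[1:])

    safe = [p for p in known_paths if not uses_down_link(p)]
    if safe:
        return min(safe, key=len)  # assumption: prefer the shortest safe path
    return old_path                # nothing safe yet, so keep the old path

down = ("Hari", "MIT")
old = ["Peter", "Hari", "MIT"]

# Before AT&T's advertisement arrives, Peter only knows the path through Hari.
before = choose_forwarding_path(old, [old], down)

# After AT&T advertises its Sprint path, Peter can move away from the old path.
after = choose_forwarding_path(old, [old, ["Peter", "AT&T", "Sprint", "MIT"]], down)
```

While Peter keeps using the old path, his packets still reach Hari, who forwards them on the failover path.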


Putting it all together

Mechanism 1: Ensure the failover AS knows an alternate path

Mechanism 2: Allow ASes to recognize safe paths that are guaranteed to be loop-free

Mechanism 3: Continue to forward along the old path to the failover AS until a safe path is learned

Key Idea: Disconnect forwarding from routing

Ensure that forwarding continues to work regardless of what happens at the routing layer



Final Theorem :

When a link fails:

If an AS will eventually have a path, it will see no BGP caused packet loss


Final Theorem: When a single link fails, all ASes that will eventually learn a valley-free path to the destination are guaranteed no BGP-caused packet loss during convergence

A path is valley-free if no AS transits between two non-customer ASes
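
The valley-free definition can be written as a small check. In this sketch `rel[(a, b)]` gives the role of AS `b` as seen from AS `a`. The relationship table is entirely hypothetical and exists only to exercise the check.

```python
def is_valley_free(path, rel):
    """True if every transit AS has a customer on at least one side.

    rel[(a, b)] is "customer", "provider", or "peer": the role of b from a's view.
    """
    for prev, asn, nxt in zip(path, path[1:], path[2:]):
        if rel[(asn, prev)] != "customer" and rel[(asn, nxt)] != "customer":
            return False  # asn would transit between two non-customers
    return True

# Hypothetical business relationships, for illustration only.
rel = {
    ("AT&T", "Hari"): "customer",
    ("AT&T", "Sprint"): "peer",
    ("Sprint", "AT&T"): "peer",
    ("Sprint", "MIT"): "customer",
    ("Hari", "AT&T"): "provider",
    ("Hari", "Peter"): "provider",
}

ok = is_valley_free(["Hari", "AT&T", "Sprint", "MIT"], rel)
bad = is_valley_free(["AT&T", "Hari", "Peter"], rel)
```

The first path is valley-free because each transit AS serves a customer on one side; in the second, Hari would carry traffic between two of his providers.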


Little Additional Overhead

[Chart: bar values 22K and 20K]

Less than 10% more updates network wide


Faster Convergence Times

[Chart: bar values 13 and 4]

Convergence times are 1/3 of those with BGP


Compared Schemes

  • Current BGP

  • Most-disjoint failover path

  • Most-disjoint policy-compliant failover path


Goal: Staying Connected

If an AS’s link to the destination fails

and

After convergence the AS will have a path to the destination

The AS should know a failover path to the destination when the link fails

[Diagram: failed link (X) toward the Destination]


Goal: Staying Connected

The AS immediately upstream of a down link can protect all traffic

If this AS has no failover path, all ASes using the link see disconnection

The AS upstream of the down link must know a failover path when the link fails

[Diagram: failed link (X) toward the Destination]


Challenge 2: Consistency during convergence

  • Inconsistency across ASes: routing loops and ASes unaware of available paths

  • Strong consistency: expensive

Balance between providing enough consistency and maintaining BGP’s scalability


Challenge 1: Which Failover Paths to Advertise

AS immediately upstream of a down link can protect all traffic

If this AS has no failover path, all ASes using link see disconnection

LOSS!

The AS upstream of the down link must know a failover path when the link fails

[Diagram: failed link (X) toward the Destination]


Division of Labor

  • If AS upstream of down link doesn’t know failover path everyone sees loss

  • If the AS knows a failover path no one sees loss

  • Each AS responsible for immediately downstream link

Which path does the AS far upstream offer to which neighbors?

[Diagram: failed link (X) toward the Destination]


Impossible is nothing

  • Assign each AS responsibility for downstream link

  • If AS above down link doesn’t know path everyone sees loss

  • If he knows a path no one sees loss

  • The real question is which path upstream guy offers

[Diagram: AT&T, Sprint, Peter, Hari, MIT]


immediately upstream must know, waaayyy upstream must advertise

Assigning responsibility

  • If AS above down link doesn’t know path everyone sees loss

  • If the guy knows a path you’re fine

  • Assign responsibility to that guy

  • The real question is which path upstream guy offers


The Challenges

Challenge 1: Which Failover Paths to Advertise

Ensure continuous connectivity without flooding the network with failover paths

Challenge 2: Consistency During Convergence

A large scale distributed consistency problem leaves ASes with loops and path loss


Challenge 1: Which Failover Paths to Advertise

  • Can we do this while advertising only one path per neighbor just like BGP?

  • Any path currently advertised to the next-hop neighbor is useless

Constraint: An AS advertises only one failover path, and only to its next-hop neighbor




Challenge 1: Which Failover Paths to Advertise

Solution 1: Most Disjoint Paths

Each AS advertises to its next-hop AS:

a failover path which is the path most disjoint from its primary

Theorem 1:

When a link fails and there is some path:

The AS immediately upstream of the down link knows a failover path


Challenge 2: Inconsistency During Convergence

Link Down

Hari withdraws path from AT&T and Peter

AT&T and Peter stop sending packets to Hari

Peter may choose to route through AT&T

AT&T may choose to route through Peter

Routing Loop Created!

LOSS!

[Diagram: AT&T, Sprint, Peter, Hari, MIT]


Challenge 2: Inconsistency During Convergence

Link Down

Solution 2: Root Cause Information

Hari includes Root Cause Information with the withdrawal

AT&T recognizes the Peter->Hari->MIT path is no longer available

It routes through Sprint instead

Routing Loop Avoided!

[Diagram: AT&T, Sprint, Peter, Hari, MIT; two messages labelled “Hari->MIT link down”]


Challenge 2: Inconsistency During Convergence

Solution 2: Root Cause Information

  • Include in each update Root Cause Information indicating the down link

  • Do not use paths that include the down link

Theorem 2 :

When a link fails:

If an AS will eventually have a path, it will see no BGP caused packet loss


How do failover paths solve the problem?

  • BGP often provides an alternate path only after the link fails

  • R-BGP uses pre-computed failover paths to ensure all ASes have an alternate path before the link fails




Advertise failover path to which neighbor?

BGP Rule: Advertise only best path (used path)

Advertised Path always contains downstream AS

BGP Rule: Do not use paths with your AS

Insight:

Any path advertised to the downstream neighbor can’t be used by that neighbor
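
The two BGP rules above combine into a simple acceptance test. The sketch below assumes paths are lists of AS names; the function name `accepts` and the example paths are hypothetical.

```python
def accepts(receiver, advertised_path):
    """BGP loop rule: an AS ignores any path that already contains itself."""
    return receiver not in advertised_path

# AT&T's current path goes through Hari, so Hari cannot use it.
current_useful = accepts("Hari", ["AT&T", "Hari", "MIT"])

# A failover path that avoids Hari is usable by Hari.
failover_useful = accepts("Hari", ["AT&T", "Sprint", "MIT"])
```

Because the current path is never usable by the downstream neighbor, replacing it with a failover path in that one advertisement costs that neighbor nothing.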


Multiple Link Failure Results

Percentage of ASes transiently disconnected

33% - BGP

12% - R-BGP: policy compliant

7% - R-BGP

Eliminates 80% of disconnectivity even in the worst case of link failures on both primary and failover

