R-BGP: Staying Connected in a Connected World


Nate Kushman

Srikanth Kandula, Dina Katabi, and Bruce Maggs


The Problem:

BGP Convergence Causes Packet Loss
  • When a route changes, up to 30% packet loss for more than 2 minutes [Labovitz00]
  • Even domains dual homed to tier 1 providers see many loss bursts on a route change [Wang06]
  • Even popular prefixes experience losses due to BGP convergence [Wang05]
  • 50% of VoIP disruptions are highly correlated with BGP updates [Kushman06]

Links, Links Everywhere But Not a Path to Forward!

Goal:

Ensure ASes stay connected as long as the physical network is connected

We Focus on Forwarding
  • Don’t worry about BGP’s routing
  • Ensure forwarding works by forwarding packets on pre-computed failover paths
Why Focus on Forwarding?
  • Convergence is unlikely to be fast enough
  • Strict timing constraints limit innovation
Our Contribution

Guarantee: No BGP-caused packet loss

Low Overhead: Just like BGP, each AS advertises at most one path to each neighbor

On link failure, we reduce disconnected ASes from 22% to zero

What Causes Transient Disconnection?

[Diagram: ASes AT&T, Sprint, Peter, Hari, and MIT]

BGP Rule: An AS advertises only its current forwarding path

  • All of Hari’s providers use him to get to MIT
  • -> Nobody offers Hari an alternate path

What Causes Transient Disconnection?

[Diagram: AT&T, Sprint, Peter, Hari, and MIT; the Hari->MIT link is down (X)]

  • Hari knows no path to MIT
  • Hari drops Peter and AT&T’s packets in addition to his own
  • LOSS!

What Causes Transient Disconnection?

[Diagram: AT&T, Sprint, Peter, Hari, and MIT; the Hari->MIT link is down (X)]

  • Hari withdraws path
  • AT&T and Peter move to alternate paths

What Causes Transient Disconnection?

[Diagram: AT&T, Sprint, Peter, Hari, and MIT; the Hari->MIT link is down (X)]

  • Hari withdraws path
  • AT&T and Peter move to alternate paths
  • AT&T announces the Sprint path to Hari
  • -> Traffic flows

Transient Packet Loss

How do failover paths solve the problem?

BGP:

An AS advertises only its current path. It advertises an alternate only after a link fails

R-BGP:

Advertises an alternate, i.e. failover path, before a link fails

Failover Paths

[Diagram: AT&T, Sprint, Peter, Hari, and MIT; the Hari->MIT link is down (X)]

  • AT&T advertises to Hari “AT&T -> Sprint -> MIT” as a failover path
  • Link fails -> Hari immediately sends traffic on failover path
  • No loss!
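
The failover behaviour on this slide can be sketched as a forwarding entry that stores a pre-advertised failover next hop next to the primary one. This is only an illustration under stated assumptions, not R-BGP's implementation: `ForwardingEntry`, `next_hop`, and the set of down links are hypothetical names that do not appear in the talk.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class ForwardingEntry:
    """Hypothetical per-destination entry held by one AS."""
    primary: str              # next hop on the current path
    failover: Optional[str]   # next hop on the pre-advertised failover path, if any


def next_hop(local_as: str, entry: ForwardingEntry,
             down_links: Set[Tuple[str, str]]) -> Optional[str]:
    """Return the next hop to forward to, or None if the packet must be dropped."""
    if (local_as, entry.primary) not in down_links:
        return entry.primary
    # The primary link is down: use the failover path right away,
    # without waiting for routing to converge.
    if entry.failover is not None and (local_as, entry.failover) not in down_links:
        return entry.failover
    return None


# Hari reaches MIT directly; AT&T has advertised "AT&T -> Sprint -> MIT" as failover.
hari = ForwardingEntry(primary="MIT", failover="AT&T")
print(next_hop("Hari", hari, set()))               # link up: forward to MIT
print(next_hop("Hari", hari, {("Hari", "MIT")}))   # link down: forward to AT&T
```

Without a failover entry the second call would return `None`, which corresponds to the "LOSS!" case in the plain BGP slides.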

Two Challenges

Challenge 1:

Minimize the number of failover paths, while ensuring an AS always has a usable path

Challenge 2:

Transition from usable path to converged path without creating forwarding loops

Challenge 1: Minimize number of failover paths

Claim: Just like BGP, advertise one path per neighbor, either current or failover

[Diagram: AT&T, Peter, Sprint, Hari, and MIT, with advertisements labelled “Current path” and one labelled “Failover Path”]

Insight: Replace path advertised to downstream AS with a failover path

Which failover path should it advertise?

[Diagram: AT&T, John, Bob, Joe, and Dest, with one link marked x; label: Most Disjoint Path]

Lemma: Advertising the most disjoint path is equivalent to advertising all paths.

Challenge 1: Minimize number of failover paths

R-BGP Rule: Advertise to the downstream AS, as a failover path, the path most disjoint from the current path

Theorem 1: When a link fails, the AS upstream of the down link knows a failover path if it will know a path at convergence
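
A rough sketch of the selection rule follows. The slides do not define "most disjoint" precisely, so this example assumes it means "shares the fewest links with the current path", with path length as a tie-breaker. Both the metric and the names `links` and `most_disjoint` are assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple

Path = Sequence[str]


def links(path: Path) -> Set[Tuple[str, str]]:
    """Set of directed AS-level links traversed by a path."""
    return set(zip(path, path[1:]))


def most_disjoint(current: Path, candidates: List[Path]) -> Path:
    """Pick the candidate sharing the fewest links with the current path.

    Assumed tie-breaker: prefer the shorter path.
    Raises ValueError if there are no candidates.
    """
    current_links = links(current)
    return min(candidates, key=lambda p: (len(links(p) & current_links), len(p)))


# AT&T currently forwards via Hari and knows two other paths to MIT.
current = ["AT&T", "Hari", "MIT"]
candidates = [
    ["AT&T", "Peter", "Hari", "MIT"],  # still depends on the Hari->MIT link
    ["AT&T", "Sprint", "MIT"],         # shares no link with the current path
]
print(most_disjoint(current, candidates))
```

In this toy topology the Sprint path is chosen, which matches the failover path AT&T advertises to Hari in the earlier slides.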

Challenge 2: Transition without loops

[Diagram: AT&T, Sprint, Peter, Hari, and MIT; the Hari->MIT link is down (X)]

  • Hari withdraws path

Challenge 2: Transition without loops

[Diagram: AT&T, Sprint, Peter, Hari, and MIT; the Hari->MIT link is down (X); a loop forms between AT&T and Peter]

  • Hari withdraws path
  • Peter may choose to route through AT&T
  • AT&T may choose to route through Peter
  • Forwarding Loop!

Challenge 2: Transition without loops

Solution 2: Root Cause Information

[Diagram: AT&T, Sprint, Peter, Hari, and MIT; Hari sends “Hari->MIT link down” to its neighbors]

  • Hari includes Root Cause Information with the withdrawal
  • AT&T recognizes the Peter->Hari->MIT path is down
  • It routes through Sprint instead

Theorem 2: No forwarding loops will form
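
The effect of Root Cause Information can be sketched as a filter applied before path selection. This is an illustrative sketch only: the function names, the idea that known paths are kept in preference order, and the treatment of links as directed pairs are assumptions, not details given in the talk.

```python
from typing import List, Optional, Sequence, Tuple

Path = Sequence[str]
Link = Tuple[str, str]


def uses_link(path: Path, link: Link) -> bool:
    """True if the path traverses the given directed link."""
    return link in zip(path, path[1:])


def select_after_withdrawal(known_paths: List[Path], root_cause: Link) -> Optional[Path]:
    """Return the most preferred known path that avoids the failed link.

    known_paths is assumed to be sorted by local preference, best first.
    """
    usable = [p for p in known_paths if not uses_link(p, root_cause)]
    return usable[0] if usable else None


# Paths AT&T has learned toward MIT, in (assumed) preference order.
att_paths = [["Peter", "Hari", "MIT"], ["Sprint", "MIT"]]
print(select_after_withdrawal(att_paths, ("Hari", "MIT")))
```

Without the root cause, AT&T could pick the stale Peter path while Peter picks AT&T, which is the forwarding loop shown on the previous slide.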

R-BGP

Solution 1: Advertise most disjoint path to downstream AS

Solution 2: Include Root Cause Information

Final Theorem: No AS will see BGP-caused packet loss if it will have a path at convergence

Setup
  • AS-Level Simulation over the full Internet
  • AS-graph with 24,142 ASes from Routeviews BGP Data
  • Use inference algorithm to annotate links with customer-provider or peer relationships
Single Link Failure Results
  • A dual-homed AS loses one link
  • Find percentage of ASes that see transient disconnection to the destination
  • Run for all dual-homed ASes

[Diagram: failed link (X) on the way to Destination]

Single Link Failure Results

Percentage of ASes transiently disconnected

22% - BGP

Zero - R-BGP

R-BGP Eliminates all Transient Disconnection

Cost of Policy Compliance
  • Most disjoint path may not be compliant with BGP routing policies
  • Still, an AS may want to advertise it:
    • To protect its own traffic
    • Because it is temporary

What if we choose most-disjoint among policy compliant paths?

Cost of Policy Compliance

Percentage of ASes transiently disconnected

22% - BGP

1.4% - R-BGP: policy compliant

Zero - R-BGP

Policy compliant failover paths may be sufficient

Multiple Link Failure Results
  • All proofs are for single link failure
  • Randomly choose a second link

X

Destination


Multiple Link Failure Results

Percentage of ASes transiently disconnected

22% - BGP

1.4% - R-BGP: policy compliant

0% - R-BGP

Multiple link failures are unlikely to interact

Worst Case Scenario
  • Fail link on current path
  • Fail link on corresponding failover path

X

Hari

X

Destination

Worst Case Scenario

Percentage of ASes transiently disconnected

33% - BGP

12% - R-BGP: policy compliant

7% - R-BGP

Eliminates 80% of disconnection even in the worst case of link failures on both current and failover
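
The "80%" figure can be checked against the percentages on this slide. The short calculation below is an added worked example (the function name is hypothetical); it shows that going from 33% to 7% removes a little under 79% of the disconnection, which the slide rounds to 80%.

```python
def relative_reduction(baseline_pct: float, improved_pct: float) -> float:
    """Percentage of the baseline disconnection that is eliminated."""
    return (baseline_pct - improved_pct) / baseline_pct * 100.0


bgp, rbgp_policy, rbgp = 33.0, 12.0, 7.0  # values taken from the slide
print(f"R-BGP: {relative_reduction(bgp, rbgp):.1f}% eliminated")
print(f"R-BGP, policy compliant: {relative_reduction(bgp, rbgp_policy):.1f}% eliminated")
```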

Conclusion
  • BGP loses connectivity even when the physical network is connected
  • R-BGP uses a few failover paths to ensure forwarding works throughout convergence
    • Guarantees no BGP-caused packet loss
    • Just like BGP, one path per neighbor
    • Reduces disconnected ASes from 22% to zero

Working with Cisco on prototype feasibility

Multiple Link Failure Results

[Diagram: Joe, Bob, and Destination, with two failed links (X)]

  • Joe forwards on second best path, not most disjoint
  • Packets on Bob’s failover path follow Joe’s second best path to the destination

Practical
  • Requires only a few modifications to BGP
    • Currently working with Cisco to prototype
  • Advertises only one path per neighbor, just like BGP
  • Convergence time 1/3 that of BGP
Challenge 1: A Few Strategic Failover Paths

Solution 1: Most Disjoint Path

Theorem 1: If any AS using the down link will have a path after convergence, then R-BGP guarantees that the AS immediately above the down link knows a failover path when the link fails.

No Available Loop-Free Path

[Diagram: AT&T, Sprint, Peter, Hari, and MIT; the Hari->MIT link is down; Hari sends “Hari->MIT link is down” to AT&T and Peter]

  • AT&T can immediately move to Sprint path
  • Peter is left without any usable path
  • Peter continues to use the old path
  • Moves away from old path only after receiving advertisement from AT&T

Mechanism 3: If no path without the down link is available, continue to use the old path until such a path becomes available or it is certain that no such path will become available.
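
Mechanism 3 can be sketched as a small decision rule for the forwarding path. This is a simplified illustration: `forwarding_path` and its arguments are hypothetical, and the "certain that no such path will become available" case is not modelled here.

```python
from typing import List, Sequence, Tuple

Path = Sequence[str]
Link = Tuple[str, str]


def forwarding_path(old_path: Path, known_paths: List[Path], down_link: Link) -> Path:
    """Keep forwarding on the old path until a path avoiding the down link is known."""
    for path in known_paths:
        if down_link not in zip(path, path[1:]):
            return path   # safe path learned: move away from the old path
    return old_path       # nothing safe yet: keep using the old path


down = ("Hari", "MIT")
old = ["Hari", "MIT"]  # Peter's old path, via Hari

# Right after the withdrawal Peter knows no other path, so he keeps sending to Hari,
# who can divert the traffic onto his failover path.
print(forwarding_path(old, [], down))

# Once AT&T advertises the Sprint path, Peter switches.
print(forwarding_path(old, [["AT&T", "Sprint", "MIT"]], down))
```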

Putting it all together

  • Mechanism 1: Ensure the failover AS knows an alternate path
  • Mechanism 2: Allow ASes to recognize safe paths that are guaranteed to be loop-free
  • Mechanism 3: Continue to forward along the old path to the failover AS until a safe path is learned

Key Idea: Disconnect forwarding from routing

Ensure that forwarding continues to work regardless of what happens at the routing layer
Final Theorem: When a link fails, if an AS will eventually have a path, it will see no BGP-caused packet loss

Final Theorem: When a single link fails, all ASes that will eventually learn a valley-free path to the destination are guaranteed no BGP-caused packet loss during convergence

A path is valley-free if no AS transits between two non-customer ASes
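
The valley-free definition above can be written as a direct check over each transit AS on a path. The sketch assumes a hypothetical relationship map `rel`, where `rel[(a, b)]` says what `b` is from `a`'s point of view ("customer", "provider", or "peer"); the AS names are placeholders.

```python
from typing import Dict, Sequence, Tuple

# rel[(a, b)] is b's role as seen from a: "customer", "provider", or "peer".
Rel = Dict[Tuple[str, str], str]


def is_valley_free(path: Sequence[str], rel: Rel) -> bool:
    """A path is valley-free if no AS transits between two non-customer ASes."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        # mid carries traffic between prev and nxt; at least one must be its customer.
        if rel[(mid, prev)] != "customer" and rel[(mid, nxt)] != "customer":
            return False
    return True


# B carries traffic from its customer A up to its provider C: allowed.
ok_rel = {("B", "A"): "customer", ("B", "C"): "provider"}
print(is_valley_free(["A", "B", "C"], ok_rel))

# B would carry traffic between a peer and a provider: a valley.
bad_rel = {("B", "A"): "peer", ("B", "C"): "provider"}
print(is_valley_free(["A", "B", "C"], bad_rel))
```

Paths with fewer than three ASes have no transit AS, so the check accepts them trivially.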

Little Additional Overhead

[Chart: values 22K and 20K]

Less than 10% more updates network-wide

Faster Convergence Times

[Chart: values 13 and 4]

Convergence times are 1/3 of those with BGP

Compared Schemes
  • Current BGP
  • Most-disjoint failover path
  • Most-disjoint policy-compliant failover path
Goal: Staying Connected

If an AS’s link to the destination fails and, after convergence, the AS will have a path to the destination, then the AS should know a failover path to the destination when the link fails

[Diagram: failed link (X) on the way to Destination]

Goal: Staying Connected

The AS immediately upstream of a down link can protect all traffic

Without a failover path, all ASes see disconnection

X

Destination

The AS upstream of the down link must know a failover path when the link fails

Goal: Staying Connected

AS immediately upstream of a down link can protect all traffic

If this AS has no failover path, all ASes using link see disconnection

X

The AS upstream of the down link must know a failover path when the link fails

Destination

Challenge 2: Consistency during convergence

  • Inconsistency across ASes: routing loops and ASes unaware of available paths
  • Strong consistency: expensive

Balance between providing enough consistency and maintaining BGP’s scalability

Challenge 1: Which Failover Paths to Advertise

AS immediately upstream of a down link can protect all traffic

LOSS!

If this AS has no failover path, all ASes using link see disconnection

X

The AS upstream of the down link must know a failover path when the link fails

Destination

Division of Labor

  • If the AS upstream of the down link doesn’t know a failover path, everyone sees loss
  • If the AS knows a failover path, no one sees loss
  • Each AS is responsible for its immediately downstream link

X

Which path does the AS far upstream offer to which neighbors?

Destination

Impossible is nothing

[Diagram: AT&T, Sprint, Peter, Hari, and MIT]

  • Assign each AS responsibility for downstream link
  • If AS above down link doesn’t know path, everyone sees loss
  • If he knows a path, no one sees loss
  • The real question is which path upstream guy offers
Immediately upstream must know, waaayyy upstream must advertise

Assigning responsibility

  • If AS above down link doesn’t know path, everyone sees loss
  • If the guy knows a path, you’re fine
  • Assign responsibility to that guy
  • The real question is which path upstream guy offers
The Challenges

Challenge 1: Which Failover Paths to Advertise

Ensure continuous connectivity without flooding the network with failover paths

Challenge 2: Consistency During Convergence

A large scale distributed consistency problem leaves ASes with loops and path loss

Challenge 1: Which Failover Paths to Advertise
  • Can we do this while advertising only one path per neighbor just like BGP?
  • Any path currently advertised to the next-hop neighbor is useless

Constraint: An AS advertises only one failover path, and only to its next-hop neighbor

Challenge 1: Which Failover Paths to Advertise

Solution 1: Most Disjoint Paths

Each AS advertises to its next-hop AS a failover path, which is the path most disjoint from its primary

Theorem 1: When a link fails and there is some path, the AS immediately upstream of the down link knows a failover path

Challenge 2: Inconsistency During Convergence

[Diagram: AT&T, Sprint, Peter, Hari, and MIT; the Hari->MIT link is down]

  • Hari withdraws path from AT&T and Peter
  • AT&T and Peter stop sending packets to Hari

Challenge 2: Inconsistency During Convergence

[Diagram: AT&T, Sprint, Peter, Hari, and MIT; the Hari->MIT link is down]

  • Hari withdraws path from AT&T and Peter
  • AT&T and Peter stop sending packets to Hari
  • Peter will choose to route through AT&T
  • AT&T may choose to route through Peter
  • Routing Loop Created! LOSS!

Challenge 2: Inconsistency During Convergence

Solution 2: Root Cause Information

[Diagram: AT&T, Sprint, Peter, Hari, and MIT; Hari sends “Hari->MIT link down” to its neighbors]

  • Hari includes Root Cause Information with the withdrawal
  • AT&T recognizes the Peter->Hari->MIT path is no longer available
  • It routes through Sprint instead
  • Routing Loop Avoided!

Challenge 2: Inconsistency During Convergence

Solution 2: Root Cause Information

  • Include in each update Root Cause Information indicating the down link
  • Do not use paths that include the down link

Theorem 2: When a link fails, if an AS will eventually have a path, it will see no BGP-caused packet loss

How do failover paths solve the problem?
  • BGP often provides an alternate path only after the link fails
  • R-BGP uses pre-computed failover paths to ensure all ASes have an alternate path before the link fails

Advertise failover path to which neighbor?

BGP Rule: Advertise only best path (used path)

Advertised path always contains the downstream AS

BGP Rule: Do not use paths with your AS

Insight: Any path advertised to the downstream neighbor can’t be used by that neighbor
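
The insight follows from BGP's loop-prevention rule, which can be shown in a few lines. The `accepts` helper is a hypothetical name; the example reuses the AT&T, Hari, Sprint, and MIT names from the earlier slides.

```python
from typing import Sequence


def accepts(receiver: str, as_path: Sequence[str]) -> bool:
    """BGP loop prevention: an AS does not use a path that already contains itself."""
    return receiver not in as_path


current = ["AT&T", "Hari", "MIT"]     # AT&T's used path goes through Hari
failover = ["AT&T", "Sprint", "MIT"]  # AT&T's failover path avoids Hari

# Advertising the current path to the downstream AS (Hari) is wasted: Hari rejects it.
print(accepts("Hari", current))
# Advertising the failover path in that slot gives Hari something usable.
print(accepts("Hari", failover))
```

This is why replacing the advertisement to the downstream neighbor with a failover path keeps the one-path-per-neighbor budget without taking any usable path away from that neighbor.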

Multiple Link Failure Results

Percentage of ASes transiently disconnected

33% - BGP

12% - R-BGP: policy compliant

7% - R-BGP

Eliminates 80% of disconnectivity even in the worst case of link failures on both primary and failover
