Advanced BGP Convergence Techniques

Apricot 2006. Advanced BGP Convergence Techniques. Pradosh Mohapatra pmohapat@cisco.com. Agenda. Terminology Convergence Scenarios Core Link Failure Edge Node Failure Edge Link Failure. Basic Terminology. Prefix – A route that is learnt by routing protocols. 12.0.0.0/16


Apricot 2006

Advanced BGP Convergence Techniques

Pradosh Mohapatra

pmohapat@cisco.com

Agenda
  • Terminology
  • Convergence Scenarios
    • Core Link Failure
    • Edge Node Failure
    • Edge Link Failure
Basic Terminology
  • Prefix – A route that is learnt by routing protocols.
    • 12.0.0.0/16
  • Pathlist – A list of Next Hop paths learnt by routing protocols.
    • 12.0.0.0/16
      • Via POS1/0
      • Via GE2/0, 5.5.5.5
    • 10.0.0.0/16
      • Via 5.5.5.5

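  • Non-recursive path: names an outgoing interface (Via POS1/0; Via GE2/0, 5.5.5.5).
  • Recursive path: names only a next-hop address (Via 5.5.5.5), so it depends on the resolution of the next-hop.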

Forwarding Table Structure
[Figure: a BGP prefix with its pathlist (PL: path 1, path 2) and two IGP prefixes, each with its own pathlist (PL: path 1, path 2); the leaves are labeled Intf1/NH1, Intf2/NH2, Intf3/NH3 and Intf4/NH4.]

Salient Features
  • Pathlist Sharing:
    • All BGP prefixes that have the same set of paths point to a single pathlist.
  • Hierarchical Structure:
    • BGP prefixes (recursive) point to IGP prefixes (non-recursive).
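
The two features can be sketched as a small data structure. This is only an illustration, not the router's actual implementation: the class names `PathList` and `ForwardingTable`, their methods, and the addresses in the 192.0.2.0/24 range are assumptions, and BGP next hops are matched by exact key instead of a longest-prefix lookup.

```python
class PathList:
    """A list of paths shared by every prefix that uses the same set of paths."""

    def __init__(self, paths):
        self.paths = list(paths)


class ForwardingTable:
    """Hypothetical two-level table: BGP prefixes resolve through IGP prefixes."""

    def __init__(self):
        self.igp = {}      # IGP prefix -> PathList of (interface, next hop) pairs
        self.bgp = {}      # BGP prefix -> PathList of IGP prefixes (BGP next hops)
        self._shared = {}  # tuple of BGP next hops -> the one PathList for that set

    def add_igp(self, prefix, paths):
        self.igp[prefix] = PathList(paths)

    def add_bgp(self, prefix, next_hops):
        key = tuple(next_hops)
        # Pathlist sharing: reuse the existing pathlist for this set of paths.
        if key not in self._shared:
            self._shared[key] = PathList(key)
        self.bgp[prefix] = self._shared[key]

    def resolve(self, bgp_prefix):
        """Follow the hierarchy: BGP pathlist -> IGP pathlists -> interfaces."""
        resolved = []
        for igp_prefix in self.bgp[bgp_prefix].paths:
            resolved.extend(self.igp[igp_prefix].paths)
        return resolved


table = ForwardingTable()
table.add_igp("5.5.5.5/32", [("POS1/0", None), ("GE2/0", "192.0.2.1")])
table.add_igp("6.6.6.6/32", [("GE3/0", "192.0.2.5")])
table.add_bgp("10.0.0.0/16", ["5.5.5.5/32", "6.6.6.6/32"])
table.add_bgp("10.1.0.0/16", ["5.5.5.5/32", "6.6.6.6/32"])
```

Both BGP prefixes end up referencing the same `PathList` object, so a change to that pathlist, or to an IGP pathlist beneath it, is seen by every prefix that shares it.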
Multipath BGP, Multipath IGP, IGP path goes down
[Figure: BGP pathlist (PL) with path 1 and path 2, each pointing to an IGP prefix with its own pathlist (PL: path 1, path 2).]
  • Initial organization before failure of IGP path 1.
  • Link to Path 1 goes down.
Multipath BGP, Multipath IGP, IGP path goes down
[Figure: the same structure after the failure; one IGP pathlist (PL) now holds only path 2, while the BGP pathlist and the other IGP pathlist are unchanged.]
  • IGP pathlist modified after Path 1 failure.
  • BGP Convergence = IGP Convergence.
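
Why BGP convergence equals IGP convergence in this case can be illustrated with plain dictionaries. The sketch below is hypothetical (the variable and function names, the interface names other than POS1/0, the 192.0.2.0/24 addresses and the count of 10,000 prefixes are all assumptions): a single in-place write to one IGP pathlist changes the forwarding result for every BGP prefix that resolves through it.

```python
# IGP prefixes (non-recursive): each maps to its own pathlist of (interface, next hop).
igp_pathlists = {
    "5.5.5.5/32": [("POS1/0", None), ("GE2/0", "192.0.2.1")],
    "6.6.6.6/32": [("POS3/0", None), ("GE4/0", "192.0.2.5")],
}

# One shared BGP pathlist (recursive): its two paths are the IGP prefixes above.
bgp_pathlist = ["5.5.5.5/32", "6.6.6.6/32"]
bgp_prefixes = {f"10.{i // 256}.{i % 256}.0/24": bgp_pathlist for i in range(10_000)}


def igp_path_down(igp_prefix, path):
    """Remove one failed path from an IGP pathlist, in place.

    Returns the number of forwarding structures written.
    """
    igp_pathlists[igp_prefix].remove(path)
    return 1


def forwarding_paths(bgp_prefix):
    """Resolve a BGP prefix through its pathlist down to interfaces."""
    return [p for nh in bgp_prefixes[bgp_prefix] for p in igp_pathlists[nh]]


writes = igp_path_down("5.5.5.5/32", ("POS1/0", None))
```

After the call, `writes` is 1 even though 10,000 BGP prefixes have stopped using POS1/0; no BGP entry was touched.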
Multipath BGP, Multipath IGP, IGP prefix is deleted
[Figure: BGP pathlist (PL) with Path 1 and Path 2, each pointing to an IGP prefix with its own pathlist (PL: path 1, path 2).]
  • Initial organization before deletion of IGP prefix 1.
  • IGP Prefix 1 gets deleted.
  • Fix-up BGP PL to point to the second path.
Multipath BGP, Multipath IGP, IGP prefix is deleted
[Figure: the structure after the deletion; the BGP entry (labeled LI) has a single path, pointing to the remaining IGP entry (LI: path 1, path 2).]
  • BGP pathlist modified after deletion of IGP prefix 1.
  • BGP Convergence = IGP Convergence.
Multipath BGP, Multipath IGP, IGP path modified
[Figure: BGP entry (LI: path 1, path 2) pointing to two IGP entries (each LI: path 1, path 2).]
  • Initial organization before modification of IGP Path 1.
  • IGP Path 1 gets modified.
  • BGP Convergence = IGP Convergence
Conclusion
  • In case of core link failure:
    • Sub-second convergence.
    • BGP Prefix-independent & In-place modification of the forwarding table.
    • Make-before-break solution
Edge node failure
[Figure: topology with PE1, P1, P2, PE2 and PE3.]
  • PE1 has selected the path via PE2 as bestpath and has installed only that path in the forwarding table.
  • Upon PE2’s failure, PE1 needs fast detection of PE2’s unreachability.
  • Declaring PE2 unreachable requires that all of PE2’s IGP neighbors have detected the failure and sent their LSPs to PE1.
  • PE1 then needs to point to PE3.

BGP Next-Hop Tracking
  • Event-driven reaction to BGP next-hop changes
    • BGP communicates its next-hops to RIB.
    • If RIB gets a modify/delete/add of an entry covering these next-hops, it notifies BGP.
    • BGP runs bestpath algorithm.
  • Stability requirement
    • Fast reaction to isolated events
    • Delayed reaction to too frequent events
  • Classification of Events
    • Next-hop unreachable is critical: React faster.
    • Metric Change is non-critical: React slower.
BGP NHT – Implementation highlights
  • RIB implements dampening algorithm
    • Next-hops flapping too often are dampened.
  • RIB classifies next-hop changes as critical or non-critical.
    • Critical events are sent immediately to BGP. Non-critical events are delayed up to 3 seconds.
  • BGP has an initial delay before it reacts to next-hop changes.
    • Default: 5s. Configurable.
    • Capture as many changes as possible within the initial delay before running bestpath.

router bgp 1
 bgp nexthop-trigger-delay 1
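
How the RIB's event classification and BGP's initial delay interact can be sketched as a small simulation. This is a simplified model under stated assumptions, not the actual implementation: the names `NextHopEvent`, `rib_notify_time` and `bestpath_runs` are hypothetical, non-critical events are always held for the full 3 seconds (the slide says "up to"), and dampening of flapping next-hops is not modeled.

```python
from dataclasses import dataclass

CRITICAL = "critical"          # e.g. next-hop unreachable
NON_CRITICAL = "non-critical"  # e.g. IGP metric change


@dataclass(frozen=True)
class NextHopEvent:
    time: float     # seconds at which the RIB sees the change
    next_hop: str
    kind: str       # CRITICAL or NON_CRITICAL


def rib_notify_time(event, non_critical_delay=3.0):
    """Time at which the RIB hands the event to BGP (assumed worst case)."""
    if event.kind == CRITICAL:
        return event.time
    return event.time + non_critical_delay


def bestpath_runs(events, trigger_delay=5.0):
    """Batch notifications that arrive within one initial-delay window.

    Returns a list of (run_time, next_hops): one bestpath run per window.
    """
    notifications = sorted((rib_notify_time(e), e.next_hop) for e in events)
    runs = []
    window_end = None
    batch = set()
    for notify_time, next_hop in notifications:
        if window_end is None or notify_time > window_end:
            if batch:
                runs.append((window_end, sorted(batch)))
            # The first notification after an idle period starts the timer.
            window_end = notify_time + trigger_delay
            batch = set()
        batch.add(next_hop)
    if batch:
        runs.append((window_end, sorted(batch)))
    return runs


events = [
    NextHopEvent(0.0, "5.5.5.5", CRITICAL),
    NextHopEvent(0.5, "6.6.6.6", NON_CRITICAL),
    NextHopEvent(10.0, "5.5.5.5", CRITICAL),
]
```

With the 5s default, `bestpath_runs(events)` yields two runs, the first covering both next-hops. With `trigger_delay=1.0` the reaction is faster but the same events cost three runs, which is the trade-off the initial delay controls.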

BGP NHT - example
[Figure: timeline from T1 to T4 showing link down (Lk Dn), IGP convergence (IGP CV), the RIB sending the 1st NH notification, and NHScan + BestPath.]
  • T1: Link failure triggering IGP convergence.
  • T2: First next-hop notification to BGP.
  • T3: BGP reads the next-hop updates and starts initial delay timer.
  • T4: Initial delay period expires. BGP runs NHScan and bestpath; the duration is a function of the table size.
BGP NHT
  • Principle: The first SPF must declare PE2 as unreachable
    • If PE2 fails, all its neighbors must have had time to detect the failure, originate their LSPs and flood them to PE1.
    • When PE1 starts its SPF, the LSPs of all PE2’s neighbors must already be in PE1’s database.
  • Dependency
    • fast failure detection
    • fast flooding
    • SPF Initial-wait conservative enough
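
The dependency on the SPF initial wait can be expressed as a simple check. The function name and the arrival times used below are made up for illustration; the only rule taken from the slide is that every neighbor's LSP must be in PE1's database when the first SPF starts.

```python
def first_spf_declares_unreachable(lsp_arrivals_ms, spf_initial_wait_ms):
    """Return True if every neighbor's LSP reaches PE1 before its first SPF.

    lsp_arrivals_ms: arrival time at PE1 of each neighbor's LSP, in ms
                     after the failure of PE2.
    spf_initial_wait_ms: delay between the first LSP and the first SPF run.
    """
    if not lsp_arrivals_ms:
        raise ValueError("at least one LSP arrival is required")
    # The first LSP to arrive schedules the SPF run.
    spf_start = min(lsp_arrivals_ms) + spf_initial_wait_ms
    # PE2 is declared unreachable only if no neighbor's LSP is still missing.
    return max(lsp_arrivals_ms) <= spf_start
```

For example, with LSPs arriving at 50, 80 and 120 ms, a 150 ms initial wait is conservative enough; if one neighbor detects the failure slowly and its LSP arrives at 400 ms, the first SPF still sees a path to PE2 and a second SPF is needed.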
BGP NHT – Typical Timing
  • 0: PE2 failure
  • 50ms: PE1 receives the 1st LSP and schedules SPF at T=200ms
    • the other LSPs have ample time to arrive in the meantime
  • 200ms: PE1 starts SPF
    • we budget 30ms for SPF, but with iSPF it would be ~1ms
  • 232ms: PE1 deletes PE2’s loopback and schedules BGP NHT at T=1232ms
    • there are few prefixes to modify as this is a node failure
  • 1232ms: PE1 runs BGP NHT
    • table scan: ~6us per entry: if PE1 has 20k routes: ~ 120ms
    • RIB modify: ~140us per entry: if PE1 has 5k routes from PE2, it takes ~ 700ms
    • 70ms distribution download
  • 2122ms: PE1/LC has finished modifying the BGP entries to use nh=PE3. We still need to resolve them
    • resolution starts after a delay in the range [0, 1000ms]
    • resolution lasts: ~ 100us per entry
  • 3622ms: Convergence is finished in the worst case
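
The arithmetic behind these milestones can be reproduced with a short calculation. The function and parameter names are hypothetical. The 232ms starting point and the per-entry costs are taken from the slide; that resolution is paid once for each of the 5k routes learnt from PE2 is an assumption, chosen because it matches the 3622ms total.

```python
US_PER_MS = 1000  # microseconds per millisecond


def edge_node_failure_timeline(
    total_routes=20_000,            # BGP table size scanned by NHT
    routes_via_failed_pe=5_000,     # routes whose next hop was PE2
    loopback_deleted_ms=232,        # PE1 deletes PE2's loopback
    nht_delay_ms=1_000,             # delay before BGP NHT runs (1232 - 232)
    scan_us_per_entry=6,
    rib_modify_us_per_entry=140,
    download_ms=70,                 # distribution download
    resolution_start_max_ms=1_000,  # worst case of the [0, 1000ms] window
    resolution_us_per_entry=100,
):
    """Return the milestones of the timeline, in ms after PE2's failure."""
    nht_start = loopback_deleted_ms + nht_delay_ms
    scan = total_routes * scan_us_per_entry / US_PER_MS
    modify = routes_via_failed_pe * rib_modify_us_per_entry / US_PER_MS
    entries_modified = nht_start + scan + modify + download_ms
    # Assumption: resolution cost applies to each route moved from PE2 to PE3.
    resolution = routes_via_failed_pe * resolution_us_per_entry / US_PER_MS
    worst_case = entries_modified + resolution_start_max_ms + resolution
    return {
        "nht_start_ms": nht_start,
        "table_scan_ms": scan,
        "rib_modify_ms": modify,
        "entries_modified_ms": entries_modified,
        "resolution_ms": resolution,
        "worst_case_ms": worst_case,
    }


timeline = edge_node_failure_timeline()
```

Changing a parameter, for instance `nht_delay_ms` or `routes_via_failed_pe`, shows how much each stage contributes to the worst case.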
Conclusion – Edge node failure
  • Sub-5s is achievable
    • the analyzed scenario leads to a worst case (WC) of ~3500ms
  • Sub-Second is challenging
  • Ongoing work to improve this further:
    • Backup path
Backup Path
[Figure: BGP pathlist (PL) with path 1 and a backup path, resolving through IGP prefixes whose pathlists (PL: path 1, path 2) point to Intf1/NH1, Intf2/NH2, Intf3/NH3 and Intf4/NH4.]
  • No Multipath. Prefix always points to Path 1.
  • Reroute triggered per IGP prefix: fix up Path 1 to point to the backup path.
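
The backup-path idea can be sketched as follows. This is a hypothetical model, not the design on the slide in detail: the class `BgpPathList`, the function `on_igp_prefix_lost` and the use of the router names PE2 and PE3 as next-hop keys are assumptions.

```python
class BgpPathList:
    """Hypothetical pathlist with one active path and one precomputed backup."""

    def __init__(self, primary_next_hop, backup_next_hop=None):
        self.path1 = primary_next_hop
        self.backup = backup_next_hop


def on_igp_prefix_lost(pathlists, lost_next_hop):
    """Fix up Path 1 of every pathlist that resolved through the lost IGP prefix.

    Returns the number of pathlists rerouted to their backup path.
    """
    rerouted = 0
    for pathlist in pathlists:
        if pathlist.path1 == lost_next_hop and pathlist.backup is not None:
            # In-place switchover: no bestpath run is needed at failure time.
            pathlist.path1, pathlist.backup = pathlist.backup, None
            rerouted += 1
    return rerouted


shared = BgpPathList("PE2", backup_next_hop="PE3")
bgp_prefixes = {f"10.{i}.0.0/16": shared for i in range(100)}
rerouted = on_igp_prefix_lost([shared], "PE2")
```

Because the backup is installed ahead of time, the reroute is one write per pathlist, independent of the number of BGP prefixes that share it.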

Backup Path – Contd.
  • Problem:
    • How to know the backup path? BGP advertises only one path.
    • Peering with RRs: RR sends only the bestpath it computes.
  • Solution:
    • Add-path draft.
ADD-PATH
  • Mechanism that allows the advertisement of multiple paths for the same prefix without the new paths implicitly replacing any previous ones.
  • Add a path identifier to the encoding to distinguish between different paths for the same prefix.

Labeled prefix:

+--------------------------------+
| Path Identifier (4 octets)     |
+--------------------------------+
| Length (1 octet)               |
+--------------------------------+
| Label (3 octets)               |
+--------------------------------+
| .............................  |
+--------------------------------+
| Prefix (variable)              |
+--------------------------------+

Unlabeled prefix:

+--------------------------------+
| Path Identifier (4 octets)     |
+--------------------------------+
| Length (1 octet)               |
+--------------------------------+
| Prefix (variable)              |
+--------------------------------+
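
The unlabeled encoding can be illustrated with a short encoder and decoder. This is a sketch for IPv4 prefixes only; the function names are hypothetical, and the Length field is assumed to be the prefix length in bits, as in ordinary BGP NLRI.

```python
import ipaddress
import struct


def encode_addpath_nlri(path_id, prefix):
    """Encode one IPv4 prefix as Path Identifier (4) + Length (1) + Prefix."""
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8  # only the significant octets are sent
    header = struct.pack("!IB", path_id, net.prefixlen)
    return header + net.network_address.packed[:nbytes]


def decode_addpath_nlri(data):
    """Decode a sequence of entries into (path_id, prefix) tuples."""
    entries = []
    offset = 0
    while offset < len(data):
        path_id, prefix_len = struct.unpack_from("!IB", data, offset)
        offset += 5
        if prefix_len > 32:
            raise ValueError("invalid IPv4 prefix length")
        nbytes = (prefix_len + 7) // 8
        raw = data[offset:offset + nbytes]
        if len(raw) != nbytes:
            raise ValueError("truncated NLRI")
        offset += nbytes
        # Pad the prefix back to four octets before formatting it.
        address = ipaddress.IPv4Address(raw + bytes(4 - nbytes))
        entries.append((path_id, f"{address}/{prefix_len}"))
    return entries


# Two paths for the same prefix, told apart only by the path identifier.
wire = encode_addpath_nlri(1, "12.0.0.0/16") + encode_addpath_nlri(2, "12.0.0.0/16")
paths = decode_addpath_nlri(wire)
```

Without the path identifier the second entry would implicitly replace the first; with it, the receiver keeps both and can use one of them as a backup path.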
ADD-PATH - Operation
  • New capability: Add-path
  • Advertisement of the capability indicates ability to receive multiple paths for all negotiated AFI/SAFI.
  • Advertisement of specific AFI/SAFI information in the capability indicates the intent to send multiple paths.
  • Only in these cases must the new encoding be used.
  • Concern: does the cost of advertising multiple paths outweigh the convergence benefit?
Example: PE-CE Link Failure
[Figure: L3VPN topology labeled VPN1 HQ, VPN1 site, CE1, CE2, CE3, PE1, PE2, PE3 and route reflectors RRA1, RRA2, RRB1, RRB2.]
Edge Link Failure scenarios
  • Edge Link Failure: Next-hop on the peering link
    • Convergence behavior same as the last two scenarios.
  • Edge Link Failure: Next-hop-self
    • Default behavior for L3VPN
    • In-place modification and/or BGP NHT do not help.
    • Advanced BGP signaling required.