
Advanced BGP Convergence Techniques

Apricot 2006. Advanced BGP Convergence Techniques. Pradosh Mohapatra, pmohapat@cisco.com.


Presentation Transcript


  1. Apricot 2006: Advanced BGP Convergence Techniques. Pradosh Mohapatra, pmohapat@cisco.com

  2. Agenda
     • Terminology
     • Convergence Scenarios
       • Core Link Failure
       • Edge Node Failure
       • Edge Link Failure

  3. Basic Terminology
     • Prefix – A route that is learnt by routing protocols.
       • 12.0.0.0/16
     • Pathlist – A list of next-hop paths learnt by routing protocols.
       • 12.0.0.0/16
         • Via POS1/0 (non-recursive)
         • Via GE2/0, 5.5.5.5 (non-recursive)
       • 10.0.0.0/16
         • Via 5.5.5.5 (recursive: depends on the resolution of the next-hop)
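
The terminology above can be sketched as a small data model. This is only an illustration, assuming that a path is recursive exactly when it names a next hop but no outgoing interface; the `Path` class and its field names are hypothetical and not taken from any router implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Path:
    """One entry of a pathlist (hypothetical model, not a vendor structure)."""
    next_hop: Optional[str] = None   # next-hop address, if any
    interface: Optional[str] = None  # outgoing interface, if known

    @property
    def recursive(self) -> bool:
        # Assumption: without an interface, the next hop must itself be
        # resolved through another route before packets can be forwarded.
        return self.interface is None


# Prefix -> pathlist, using the examples from the slide.
pathlists = {
    "12.0.0.0/16": [
        Path(interface="POS1/0"),
        Path(next_hop="5.5.5.5", interface="GE2/0"),
    ],
    "10.0.0.0/16": [Path(next_hop="5.5.5.5")],
}

for prefix, paths in pathlists.items():
    kinds = ["recursive" if p.recursive else "non-recursive" for p in paths]
    print(prefix, kinds)
```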

  4. Forwarding Table Structure
     [Figure: one BGP pathlist (BGP PL) and two IGP pathlists (IGP PL), each with path 1 and path 2; the IGP paths resolve to Intf1/NH1, Intf2/NH2, Intf3/NH3 and Intf4/NH4.]

  5. Salient Features
     • Pathlist Sharing:
       • All BGP prefixes that have the same set of paths point to a single pathlist.
     • Hierarchical Structure:
       • BGP prefixes (recursive) point to IGP prefixes (non-recursive).
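
A minimal sketch of these two features, assuming a toy forwarding table held in Python dictionaries. The loopback prefixes `5.5.5.5/32` and `6.6.6.6/32`, the `PathList` class and the `resolve` helper are hypothetical names chosen for illustration.

```python
class PathList:
    """A list of paths that many prefixes can point to (hypothetical model)."""

    def __init__(self, paths):
        self.paths = list(paths)


# IGP level: non-recursive pathlists that name interface/next-hop pairs.
igp_table = {
    "5.5.5.5/32": PathList(["Intf1/NH1", "Intf2/NH2"]),
    "6.6.6.6/32": PathList(["Intf3/NH3", "Intf4/NH4"]),
}

# BGP level: one recursive pathlist whose paths are IGP prefixes.
bgp_pl = PathList(["5.5.5.5/32", "6.6.6.6/32"])

# Pathlist sharing: every BGP prefix with this set of paths points to the
# same PathList object rather than holding its own copy.
bgp_table = {f"10.{i}.0.0/16": bgp_pl for i in range(1000)}


def resolve(prefix):
    """Walk the hierarchy: BGP prefix -> BGP pathlist -> IGP pathlists."""
    interfaces = []
    for igp_prefix in bgp_table[prefix].paths:
        interfaces.extend(igp_table[igp_prefix].paths)
    return interfaces


print(resolve("10.7.0.0/16"))
```

Because the 1000 BGP prefixes share one object, any change to that pathlist, or to the IGP pathlists beneath it, is visible to all of them at once.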

  6. Core Link Failure

  7. Multipath BGP, Multipath IGP, IGP path goes down
     [Figure: BGP PL (path 1, path 2) and two IGP PLs (each with path 1 and path 2).]
     • Initial organization before failure of IGP path 1.
     • Link to Path 1 goes down.

  8. Multipath BGP, Multipath IGP, IGP path goes down
     [Figure: BGP PL (path 1, path 2) and two IGP PLs; one IGP PL now holds only path 2.]
     • IGP pathlist modified after Path 1 failure.
     • BGP Convergence = IGP Convergence.
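
The claim that BGP convergence equals IGP convergence in this scenario can be illustrated with a toy table. This is a sketch under the assumption that pathlists are shared mutable lists; the prefixes, interface names and helper functions are hypothetical.

```python
# IGP pathlists (non-recursive), keyed by hypothetical loopback prefixes.
igp_table = {
    "5.5.5.5/32": ["Intf1/NH1", "Intf2/NH2"],
    "6.6.6.6/32": ["Intf3/NH3", "Intf4/NH4"],
}

# One shared BGP pathlist (recursive) used by 5000 BGP prefixes.
bgp_pl = ["5.5.5.5/32", "6.6.6.6/32"]
bgp_table = {f"10.{i}.0.0/16": bgp_pl for i in range(5000)}


def forwarding_paths(prefix):
    """Resolve a BGP prefix down to interface/next-hop pairs."""
    return [p for igp_prefix in bgp_table[prefix] for p in igp_table[igp_prefix]]


def igp_path_down(igp_prefix, path):
    """A core link fails: remove one path from one IGP pathlist, in place."""
    igp_table[igp_prefix].remove(path)


before = forwarding_paths("10.42.0.0/16")
igp_path_down("5.5.5.5/32", "Intf1/NH1")  # the only write that is needed
after = forwarding_paths("10.42.0.0/16")

print(before)
print(after)
```

No BGP entry is rewritten here: a single update to the IGP pathlist changes what all 5000 BGP prefixes resolve to, which is why the repair time does not depend on the number of BGP prefixes.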

  9. Multipath BGP, Multipath IGP, IGP prefix is deleted
     [Figure: BGP PL (Path 1, Path 2) and two IGP PLs (each with path 1 and path 2).]
     • Initial organization before deletion of IGP prefix 1.
     • IGP Prefix 1 gets deleted.
     • Fix-up BGP PL to point to the second path.

  10. Multipath BGP, Multipath IGP, IGP prefix is deleted
      [Figure: BGP LI with a single path (path 1) and one IGP LI (path 1, path 2).]
      • BGP pathlist modified after deletion of IGP prefix 1.
      • BGP Convergence = IGP Convergence.
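
The fix-up step for a deleted IGP prefix might look like the following sketch. It assumes a single shared BGP pathlist whose entries are IGP prefixes; a real forwarding table would track which pathlists depend on which IGP prefix, and all names here are hypothetical.

```python
# IGP prefixes and their (non-recursive) pathlists.
igp_table = {
    "5.5.5.5/32": ["Intf1/NH1", "Intf2/NH2"],
    "6.6.6.6/32": ["Intf3/NH3", "Intf4/NH4"],
}

# Shared BGP pathlist: path 1 resolves via 5.5.5.5, path 2 via 6.6.6.6.
bgp_pl = ["5.5.5.5/32", "6.6.6.6/32"]
bgp_table = {f"10.{i}.0.0/16": bgp_pl for i in range(5000)}


def igp_prefix_deleted(igp_prefix):
    """Delete an IGP prefix and fix up the BGP pathlist that depended on it."""
    del igp_table[igp_prefix]
    # In-place slice assignment keeps every BGP prefix pointing at the same
    # list object, so the fix-up is one write regardless of prefix count.
    bgp_pl[:] = [p for p in bgp_pl if p in igp_table]


igp_prefix_deleted("5.5.5.5/32")
print(bgp_table["10.1.0.0/16"])
```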

  11. Multipath BGP, Multipath IGP, IGP path modified
      [Figure: BGP LI (path 1, path 2) and two IGP LIs (each with path 1 and path 2).]
      • Initial organization before modification of IGP Path 1.
      • IGP Path 1 gets modified.
      • BGP Convergence = IGP Convergence.

  12. Conclusion
      • In case of core link failure:
        • Sub-second convergence.
        • In-place modification of the forwarding table, independent of the number of BGP prefixes.
        • Make-before-break solution.

  13. Edge Node Failure

  14. Edge node failure
      [Figure: topology with PE1, P1, P2, PE2 and PE3; PE2 fails.]
      • PE1 has selected the path via PE2 as bestpath and has installed only that path in the forwarding table.
      • What PE1 needs upon PE2’s failure is fast detection of unreachability.
      • Declaring PE2 unreachable requires all of PE2’s IGP neighbors to have detected the failure and to have sent their LSPs to PE1.
      • PE1 now needs to point to PE3.

  15. BGP Next-Hop Tracking
      • Event-driven reaction to BGP next-hop changes
        • BGP communicates its next-hops to RIB.
        • If RIB gets a modify/delete/add of an entry covering these next-hops, it notifies BGP.
        • BGP runs bestpath algorithm.
      • Stability requirement
        • Fast reaction to isolated events
        • Delayed reaction to too frequent events
      • Classification of Events
        • Next-hop unreachable is critical: React faster.
        • Metric Change is non-critical: React slower.

  16. BGP NHT – Implementation highlights
      • RIB implements dampening algorithm
        • Next-hops flapping too often are dampened.
      • RIB classifies next-hop changes as critical or non-critical.
        • Critical events are sent immediately to BGP. Non-critical events are delayed up to 3 seconds.
      • BGP has an initial delay before it reacts to next-hop changes.
        • Default: 5s. Configurable.
        • Capture as many changes as possible within the initial delay before running bestpath.

      router bgp 1
       bgp nexthop-trigger-delay 1
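
The interaction between event classification and the initial delay can be sketched as follows. This is a simplified model, not the actual implementation: dampening is left out, non-critical events are assumed to be held for the full 3 seconds (the slide only says "up to"), and the function name, event tuples and the next hop `PE4` are hypothetical.

```python
CRITICAL_HOLD_S = 0.0      # critical events are sent to BGP immediately
NON_CRITICAL_HOLD_S = 3.0  # assumption: worst case of "up to 3 seconds"


def schedule_bestpath(events, trigger_delay_s=5.0):
    """Return (time bestpath runs, next hops batched into that run).

    events: list of (time_s, next_hop, kind), where kind is "unreachable"
    (critical) or anything else (non-critical, e.g. "metric-change").
    """
    notifications = sorted(
        (t + (CRITICAL_HOLD_S if kind == "unreachable" else NON_CRITICAL_HOLD_S), nh)
        for t, nh, kind in events
    )
    # BGP starts its initial delay timer at the first notification it reads.
    run_at = notifications[0][0] + trigger_delay_s
    # Everything notified before the timer expires is handled in one run.
    batched = [nh for t, nh in notifications if t <= run_at]
    return run_at, batched


events = [(0.0, "PE2", "unreachable"), (0.5, "PE4", "metric-change")]
print(schedule_bestpath(events))                       # default 5s delay
print(schedule_bestpath(events, trigger_delay_s=1.0))  # delay tuned to 1s
```

In this model a longer initial delay batches more changes into one bestpath run, while a shorter one reacts sooner but may need a second run for late notifications.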

  17. BGP NHT - example
      [Figure: timeline with marks T1 to T4, labelled Lk Dn, IGP CV, RIB sends 1st NH notification, and NHScan + BestPath.]
      • T1: Link failure triggering IGP convergence.
      • T2: First next-hop notification to BGP.
      • T3: BGP reads the next-hop updates and starts initial delay timer.
      • T4: Initial delay period expires. BGP does NHScan and bestpath change (a function of the table size).

  18. BGP NHT
      • Principle: The first SPF must declare PE2 as unreachable
        • We want to make sure that if PE2 fails, all its neighbors have had the time to detect the failure, originate their LSPs and flood them to PE1.
        • We want to make sure that when PE1 starts its SPF, the LSPs of all of PE2’s neighbors are in PE1’s database.
      • Dependency
        • fast failure detection
        • fast flooding
        • SPF initial-wait conservative enough

  19. BGP NHT – Typical Timing
      • 0: PE2 failure
      • 50ms: PE1 receives the 1st LSP and schedules SPF at T=200ms
        • the other LSPs will have all the time to arrive in the meantime
      • 200ms: PE1 starts SPF
        • we account a duration of 30ms but with iSPF it will be ~1ms
      • 232ms: PE1 deletes PE2’s loopback and schedules BGP NHT at T=1232ms
        • there are few prefixes to modify as this is a node failure
      • 1232ms: PE1 runs BGP NHT
        • table scan: ~6us per entry: if PE1 has 20k routes: ~120ms
        • RIB modify: ~140us per entry: if PE1 has 5k routes from PE2, it takes ~700ms
        • 70ms distribution download
      • 2122ms: PE1/LC has finished modifying the BGP entries to use nh=PE3. We still need to resolve them
        • resolution starts [0, 1000ms]
        • resolution lasts: ~100us per entry
      • 3622ms: Convergence is finished in the worst case
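
The timeline above can be reproduced with a few lines of arithmetic. The per-entry costs and route counts come from the slide; applying the ~100us resolution cost to the 5k routes learnt from PE2 is an assumption, chosen because it reproduces the 3622ms worst case. Variable names are illustrative.

```python
# All durations in microseconds to keep the arithmetic in integers.
MS = 1000

loopback_deleted = 232 * MS  # SPF done, PE2's loopback removed (from the slide)
nht_delay = 1000 * MS        # NHT scheduled one second later (T=1232ms)

total_routes = 20_000        # routes PE1 scans
routes_via_pe2 = 5_000       # routes that must move to nh=PE3

table_scan = total_routes * 6        # ~6us per entry
rib_modify = routes_via_pe2 * 140    # ~140us per entry
download = 70 * MS                   # distribution download

nht_start = loopback_deleted + nht_delay
entries_modified = nht_start + table_scan + rib_modify + download

resolution_wait_worst = 1000 * MS    # resolution starts within [0, 1000ms]
resolution = routes_via_pe2 * 100    # assumption: ~100us for each of the 5k routes

worst_case = entries_modified + resolution_wait_worst + resolution

for name, value in [
    ("NHT starts", nht_start),
    ("table scan", table_scan),
    ("RIB modify", rib_modify),
    ("entries modified", entries_modified),
    ("worst-case convergence", worst_case),
]:
    print(f"{name}: {value // MS} ms")
```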

  20. Conclusion – Edge node failure
      • Sub-5s is achievable
        • the analyzed scenario leads to a worst case (WC) of ~3500ms
      • Sub-second is challenging
      • Ongoing work to improve this further:
        • Backup path

  21. Backup Path
      [Figure: BGP PL (path 1, backup path) and two IGP PLs (each with path 1 and path 2), resolving to Intf1/NH1, Intf2/NH2, Intf3/NH3 and Intf4/NH4.]
      • No Multipath. Prefix always points to Path 1.
      • Reroute triggered per IGP prefix: fix-up Path 1 to point to the backup path.
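
A minimal sketch of the backup-path idea, assuming the BGP pathlist stores a precomputed backup next to the primary. The class, its fields and the `igp_prefix_lost` trigger are hypothetical; how the backup is learnt is a separate problem.

```python
class BgpPathList:
    """Shared BGP pathlist with a pre-installed backup (hypothetical model)."""

    def __init__(self, primary, backup=None):
        self.primary = primary
        self.backup = backup
        self.active = primary  # no multipath: only path 1 forwards traffic


# 5000 prefixes share one pathlist: primary via PE2, backup via PE3.
bgp_pl = BgpPathList(primary="PE2", backup="PE3")
bgp_table = {f"10.{i}.0.0/16": bgp_pl for i in range(5000)}


def igp_prefix_lost(next_hop):
    """Triggered once per IGP prefix: repoint path 1 at the backup path."""
    if bgp_pl.active == next_hop and bgp_pl.backup is not None:
        bgp_pl.active = bgp_pl.backup


igp_prefix_lost("PE2")  # the IGP declares PE2's loopback unreachable
print(bgp_table["10.0.0.0/16"].active)
```

The reroute is a single write to the shared pathlist, so it does not have to wait for the BGP table scan and bestpath run.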

  22. Backup Path – Contd.
      • Problem:
        • How to know the backup path? BGP advertises only one path.
        • Peering with RRs: RR sends only the bestpath it computes.
      • Solution:
        • Add-path draft.

  23. ADD-PATH
      • Mechanism that allows the advertisement of multiple paths for the same prefix without the new paths implicitly replacing any previous ones.
      • Add a path identifier to the encoding to distinguish between different paths for the same prefix.

      Encoding with a label:

      +-----------------------------+
      | Path Identifier (4 octets)  |
      +-----------------------------+
      | Length (1 octet)            |
      +-----------------------------+
      | Label (3 octets)            |
      +-----------------------------+
      | ...                         |
      +-----------------------------+
      | Prefix (variable)           |
      +-----------------------------+

      Encoding without a label:

      +-----------------------------+
      | Path Identifier (4 octets)  |
      +-----------------------------+
      | Length (1 octet)            |
      +-----------------------------+
      | Prefix (variable)           |
      +-----------------------------+
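
The unlabeled encoding can be sketched with Python's `struct` module. This assumes IPv4 prefixes only and the usual BGP convention that the Length field counts prefix bits and that only the significant octets of the prefix are sent; the function names are illustrative, not part of any BGP library.

```python
import ipaddress
import struct


def encode_nlri(path_id, prefix):
    """Path Identifier (4 octets) + Length (1 octet, in bits) + Prefix."""
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8  # only the significant octets are sent
    return struct.pack("!IB", path_id, net.prefixlen) + net.network_address.packed[:nbytes]


def decode_nlri(data):
    """Inverse of encode_nlri for a single IPv4 entry."""
    path_id, prefixlen = struct.unpack_from("!IB", data)
    nbytes = (prefixlen + 7) // 8
    addr = data[5:5 + nbytes] + bytes(4 - nbytes)  # pad back to 4 octets
    return path_id, f"{ipaddress.IPv4Address(addr)}/{prefixlen}"


# Two paths for the same prefix differ only in their path identifier.
first = encode_nlri(1, "10.0.0.0/16")
second = encode_nlri(2, "10.0.0.0/16")
print(first.hex(), second.hex())
print(decode_nlri(first), decode_nlri(second))
```

Without the identifier the two encodings would be byte-for-byte identical, and the second advertisement would implicitly replace the first.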

  24. ADD-PATH - Operation
      • New capability: Add-path
        • Advertisement of the capability indicates ability to receive multiple paths for all negotiated AFI/SAFI.
        • Advertisement of specific AFI/SAFI information in the capability indicates the intent to send multiple paths.
        • Only in these cases must the new encoding be used.
      • Concerns: Does the cost of advertising multiple paths outweigh the convergence benefits?

  25. Edge Link Failure

  26. Example: PE-CE Link Failure
      [Figure: topology with CE1, CE2, CE3, PE1, PE2, PE3 and route reflectors RRA1, RRA2, RRB1, RRB2, connecting VPN1 HQ and a VPN1 site.]

  27. Edge Link Failure scenarios
      • Edge Link Failure: Next-hop on the peering link
        • Convergence behavior same as the last two scenarios.
      • Edge Link Failure: Next-hop-self
        • Default behavior for L3VPN
        • In-place modification and/or BGP NHT do not help.
        • Advanced BGP signaling required.

  28. Any Questions?
