Micro-loop Prevention Methods

Presentation Transcript


  1. Micro-loop Prevention Methods • draft-bryant-shand-lf-conv-frmwk-00.txt • draft-zinin-microloop-analysis-00.txt

  2. Outline of Talk • Convergence Strategy and Motivation • Solution Taxonomy • Existing Solution Space • Summary

  3. Traditional convergence strategy • Each router switches to the new topology as fast as it can, independently • Required for failures • Strategy optimized for this case • traffic to failed element is lost • so speed is essential • Used for everything else • common method • traffic can be lost due to loops • Fast-reroute prevents traffic loss due to failure, but loops can still cause loss.

  4. Micro-Loop Properties • Independent decisions can cause micro-loops. • Loops may occur between pairs of nodes or cycles of nodes. • Duration depends on relative time to update FIBs. • Implementation differences • Number of affected destinations • Propagation time • Loss while a loop persists may last an order of magnitude longer than loss during the fast-reroute failover.

  5. Controlled convergence • Made feasible for failure case by fast reroute • Traffic is not lost so can afford to take time • Can use common method for both failure and management change events • Traditional convergence is optimized for the failure case without fast-reroute. • We can do better… (but keep traditional as a safe fall-back when the single-failure assumption is violated.)

  6. Solution taxonomy • Controlled Information flow • Incremental cost change • Controlled Distributed Behavior • Synchronized FIB installation • Ordered FIB changes • Path locking

  7. Method Comparison

  8. Incremental cost change • A change in a link cost of x can only cause loops whose “cyclic” cost is ≤ x • Minimum cycle is 2 (1 in each direction) • Hence cost change of 1 can never cause a loop. • Where minimum cycle is larger, larger increments can be used. • Once cost reaches cost of alternate path no more loops possible. • No cooperation required, but can take hours.
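
A minimal sketch of the stepping rule on this slide, assuming integer link costs so that the largest loop-free increment is one less than the minimum cyclic cost. The function name, its parameters, and the example costs are illustrative assumptions, not taken from either draft.

```python
def incremental_cost_steps(start_cost, alternate_path_cost, min_cycle_cost=2):
    """Return the successive link costs to advertise, one per convergence round.

    Assumption: costs are integers, so any increment strictly below the
    minimum cyclic cost (i.e. at most min_cycle_cost - 1) cannot cause a loop.
    Stepping stops once the cost reaches the cost of the alternate path,
    after which no more loops are possible.
    """
    if min_cycle_cost < 2:
        raise ValueError("minimum cycle is 2 (1 in each direction)")
    step = min_cycle_cost - 1
    costs = []
    cost = start_cost
    while cost < alternate_path_cost:
        cost = min(cost + step, alternate_path_cost)
        costs.append(cost)
    return costs


# Hypothetical numbers: with the default minimum cycle of 2 the link cost
# moves by 1 per round; a larger minimum cycle allows fewer, bigger steps.
unit_steps = incremental_cost_steps(10, 15)
bigger_steps = incremental_cost_steps(10, 15, min_cycle_cost=4)
```

Each entry in the returned list costs one full network convergence, which is why a large gap between the current cost and the alternate path cost makes the method slow.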

  9. Synchronized FIB swap • Network synchronized change-over at predetermined time • Signal/determine time to change • Network Synchronized Time (NTP is there) • Either Two FIBs for fast swap • Substantial hardware implications • Or FIB update “fast-enough” from change-over time. • Dependent on NTP • Conceptually simple with minimal signalling, but NTP dependency and implementation concerns.

  10. Ordering by signalling alone • On change, tell old primary neighbors to wait for you • Wait for all neighbors as instructed, install FIB, and tell your old primary neighbors. • Assumes a single non-SRLG failure • Otherwise communication per destination is required • No estimation required for FIB compute/install, but requires reliable fast signalling and non-trivial protocol extensions.

  11. Ordered FIB changes • For any isolated link/node change • Determine “safe” ordering for FIB installation • bad news: update from edge to failure • good news: update from change to edge • Each router computes its “rank” with respect to the change. • Each router delays its FIB update by a number of worst-case FIB compute/install times proportional to its rank.

  12. Computing the ordering • Single Reverse SPF rooted at change node • Use old SPT to determine relevant node • For bad news: count maximum depth of sub-tree below you • For good news: count maximum hops to change
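
A rough sketch of the rank computation from slides 11 and 12, assuming the reverse SPF tree rooted at the change node has already been computed and is handed in as a parent-to-children mapping with a single path per node (no equal-cost branches). All names and the example tree are hypothetical.

```python
from collections import deque


def bad_news_rank(children, node):
    """Maximum depth of the sub-tree below `node` in the reverse SPF tree.

    Edge routers (leaves) get rank 0 and update first; routers closer to
    the failure wait longer.
    """
    kids = children.get(node, [])
    if not kids:
        return 0
    return 1 + max(bad_news_rank(children, kid) for kid in kids)


def good_news_ranks(children, root):
    """Hops from each node to the change node, found by walking down from the root.

    The router at the change gets rank 0 and updates first; edge routers
    update last.
    """
    ranks = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for kid in children.get(node, []):
            ranks[kid] = ranks[node] + 1
            queue.append(kid)
    return ranks


def fib_delay(rank, worst_case_fib_ms):
    """Delay before installing the new FIB: rank times the worst-case compute/install time."""
    return rank * worst_case_fib_ms


# Hypothetical reverse SPF tree rooted at change node R: R -> A -> C -> D, and R -> B.
example_tree = {"R": ["A", "B"], "A": ["C"], "C": ["D"]}
```

In this example tree, a bad-news change makes D and B update immediately while A waits two worst-case intervals; a good-news change reverses the direction, so R goes first and D last.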

  13. Delay Proportional to Network Diameter • For Good News, rSPF gives necessary depth. • For Bad News, rSPF is overly pessimistic for some topologies. • Strategies to reduce unnecessary delay • Prune rSPF by only considering the branch across the failure – but still too pessimistic. • Run SPF rooted at edge nodes to correctly prune them – but doesn’t scale. • Compare rSPFs before and after failure • Avoids all micro-loops and requires single FIB install. • Delay dependent on network diameter so may be unacceptable.

  14. Signalling optimization to reduce delay • Use actual FIB compute/install instead of worst-case • In many cases, actual delay is 0 because no change is needed. • Signal to parents in rSPF when • Nothing to do, or • Completed FIB changes • Can change FIB when a signal has been received from all children (or when the delay expires) • Only an optimization • Loss of signals falls back to the delay-based method
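
The install rule on this slide can be sketched as a small piece of per-router state, assuming each router knows its children in the rSPF and its own rank-based fallback delay. The class and method names are hypothetical; no protocol encoding is implied.

```python
class PendingFibInstall:
    """Tracks when a router may install its new FIB under the signalling optimization."""

    def __init__(self, children, fallback_delay_ms):
        # Children in the rSPF that have not yet signalled completion.
        self.waiting_on = set(children)
        # Rank-based delay used if signals are lost (the unoptimized behaviour).
        self.fallback_delay_ms = fallback_delay_ms

    def signal_from(self, child):
        """Record that `child` had nothing to do or has completed its FIB changes."""
        self.waiting_on.discard(child)

    def may_install(self, elapsed_ms):
        """Install once every child has signalled, or once the fallback delay has expired."""
        return not self.waiting_on or elapsed_ms >= self.fallback_delay_ms
```

Because the timer check is kept alongside the signal check, a lost signal only costs time: the router still installs when the delay-based deadline passes.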

  15. SRLG Concerns • Diverse failures may require mutually incompatible ordering • Different orderings for individual destination sets may help • Need Rules to merge multiple rSPFs

  16. Ordered SPF Summary • No forwarding changes required. • No signalling required at time of change. • Complete prevention of loops for isolated node or link changes. • Requires cooperation from all routers • May delay re-convergence for tens of seconds (unless optional signalling used) • SRLGs require per destination delays and may delay re-convergence more.

  17. Path Locking Framework • Obtain a fixed convergence delay regardless of network. • Avoid ordering issue by providing transitional paths. • Handles SRLGs • Different methods to • Determine/Create transitional paths • Direct traffic to use transitional paths • Standard trade-off of complexity versus coverage. • Tunnels for Transitional Paths • Safe Neighbors for Transitional Next-Hops • Marked Packets to Use Transitional Topology • U-turn Packets to Use New Topology

  18. Time-Line of Convergence • Change Discovery Time – At this point, all routers know about the change. Routers install transitional path support. • For some methods, immediately start use of self-determined transitional paths. • Use Transitional Paths Time (1 worst-case FIB compute/install later) – All routers use transitional paths, if available, and new primary next-hops otherwise. • Lock to New Topology Time (1 worst-case FIB compute/install later) – All routers use new primary next-hops. • All micro-loops avoided if a transitional path always exists.
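
A small sketch of this time-line, assuming times are in milliseconds and that a router keeps its old primary next-hop until the Use Transitional Paths Time (the slide notes that some methods start using transitional paths earlier, which is not modelled here). Function and key names are hypothetical.

```python
def path_locking_milestones(change_discovery_ms, worst_case_fib_ms):
    """Return the three milestones, each one worst-case FIB compute/install apart."""
    use_transitional = change_discovery_ms + worst_case_fib_ms
    lock_to_new = use_transitional + worst_case_fib_ms
    return {
        "change_discovery": change_discovery_ms,
        "use_transitional_paths": use_transitional,
        "lock_to_new_topology": lock_to_new,
    }


def next_hop_in_use(now_ms, milestones, old_primary, new_primary, transitional=None):
    """Pick the next-hop a router forwards to at time `now_ms`."""
    if now_ms < milestones["use_transitional_paths"]:
        # Assumption: old primary stays in use while transitional support is installed.
        return old_primary
    if now_ms < milestones["lock_to_new_topology"]:
        # Transitional path if one is available, new primary next-hop otherwise.
        return transitional if transitional is not None else new_primary
    return new_primary
```

Under these assumptions the total convergence delay is two worst-case FIB intervals regardless of topology, in contrast to the rank-proportional delay of ordered FIB changes.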

  19. Create Tunnels • Requires tunnel computation/creation at topology change • Old topology locking • Tunnel to the upstream side of the failure • Single tunnel for all affected destinations (if link/node failure). • New topology locking • Tunnel to first unaffected router on new primary path • Tunnels provide a transitional path that can traverse non-supporting routers. • Non-supporting routers can only loop locally originated traffic.

  20. Safe Neighbors • Find a safe neighbor to use as a transitional next-hop. • Safety condition is a neighbor that is loop-free on old topology and a downstream path on new topology. • If two neighboring routers don’t have a safe neighbor, a micro-loop can form on that link. • Analysis of real topologies shows pretty good coverage. • Local micro-loops possible with non-supporting routers.
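
A sketch of the safety test on this slide, assuming "loop-free on old topology" and "downstream path on new topology" mean the usual loop-free-alternate inequalities: D_old(N, dest) < D_old(N, S) + D_old(S, dest), and D_new(N, dest) < D_new(S, dest). The slide does not spell these out, so treat them as an interpretation. Topologies are plain dictionaries of link costs; the function names and the four-router example are hypothetical.

```python
import heapq


def distances(graph, source):
    """Dijkstra shortest-path costs from `source` over a {node: {neighbor: cost}} graph."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def safe_neighbors(old, new, router, dest):
    """Neighbors of `router` (in the new topology) that pass both safety checks."""
    inf = float("inf")
    d_old_router = distances(old, router)
    d_new_router = distances(new, router)
    safe = []
    for nbr in sorted(new[router]):
        d_old_nbr = distances(old, nbr)
        d_new_nbr = distances(new, nbr)
        # Loop-free on the old topology: nbr's old path to dest does not come back through router.
        loop_free_old = d_old_nbr.get(dest, inf) < (
            d_old_nbr.get(router, inf) + d_old_router.get(dest, inf)
        )
        # Downstream on the new topology: nbr is strictly closer to dest than router is.
        downstream_new = d_new_nbr.get(dest, inf) < d_new_router.get(dest, inf)
        if loop_free_old and downstream_new:
            safe.append(nbr)
    return safe


# Hypothetical topology: link E-D fails, so it is absent from `new_topology`.
old_topology = {
    "S": {"E": 1, "N": 1},
    "E": {"S": 1, "D": 1},
    "N": {"S": 1, "D": 2},
    "D": {"E": 1, "N": 2},
}
new_topology = {
    "S": {"E": 1, "N": 1},
    "E": {"S": 1},
    "N": {"S": 1, "D": 2},
    "D": {"N": 2},
}
```

In this example S can use N as a transitional next-hop toward D, while E has no safe neighbor: its only remaining neighbor S still reaches D through E on the old topology.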

  21. Typical Coverage

  22. Packet Marking • Can mark packets to force forwarding according to a particular topology. • Topology can be new or old. • All marking starts at the Use Transitional Paths Time • If using new topology, traffic on new topology after 1+ worst-case FIB compute/install delay.

  23. U-turn Packet • Create transitional next-hop by directing U-turn packets to the new primary next-hops. • At Use Transitional Paths Time, send traffic to new primaries (potentially explicitly marked as U-turn packets). • If U-turn packets are determined implicitly, no marking is required. • Explicit method for signalling support of U-turns

  24. Lots of Possibilities… • What are important criteria? • Time to be converged • Affects single failure assumption • Network Stability • Ballpark requirement is 10s • Simplicity • Support for SRLGs • No additional mechanisms beyond IP (but coverage may suffer…) • Common additional mechanisms for this and IPFRR advanced methods. • Should also work for LDP

  25. Conclusions & Next Steps • Incremental Cost Change is impractical. • Synchronized FIB Swap – what is the implementation complexity? Implications of coupling NTP to routing? • Ordered SPF – long delay and poor SRLG support. Is that enough to be an issue? • Path Locking • Seems most promising • Many possibilities to get similar results • Please send suggestions and comments to the list. • This solution set may not be complete.
