Advanced Networks

1. Delayed Internet Routing Convergence

2. The Impact of Internet Policy and Topology on Delayed Routing Convergence

- How to Recover from Failure Quickly?
- Phone systems recover (fail over) in milliseconds
- The Internet takes on the order of minutes
- Loss of Connectivity
- Packet Loss
- Latency

- Failover on the Internet is not very good
- Sluggish backup systems
- The Internet has to adjust to the failure
- Connectivity must be restored over a backup path

- Why does convergence take so long?
- What is the upper bound for convergence?
- What causes this delayed convergence?
- What can we do about it?

- Unexpected Interaction of:
- Protocol timers
- Router Implementation
- Policies (Safe/Unsafe)

- Distance vector algorithm has issues
- Lack of sufficient info to determine if next hop choice will cause loops

- Use of Path Vector
- Split Horizon
- Triggered updates
- Diffusion
- Timers
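
The path vector idea above can be illustrated with a minimal sketch of receiver-side loop detection: a router rejects any advertised path that already contains its own AS. The function name `accepts_route` and the string AS labels are assumptions made for illustration; they are not taken from the slides or from any BGP implementation.

```python
def accepts_route(own_as, as_path):
    """Receiver-side loop detection (illustrative sketch).

    A path vector protocol carries the full AS path in each update,
    so a receiver can discard any path in which it already appears.
    """
    return own_as not in as_path


# AS 0 announces the path "0 1 R" to its peers 1 and 2.
path = ("0", "1", "R")
print(accepts_route("1", path))  # AS 1 is already on the path
print(accepts_route("2", path))  # AS 2 is not on the path
```

A plain distance vector protocol carries only a cost, so it has no equivalent check; that is the missing information the previous slide refers to.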

- Admins can implement unsafe policies
- Policies can cause route oscillations
- Routers default to Shortest Path
- Even when policies are constrained, the upper bound on convergence may be as high as factorial

- Measure the convergence behavior of BGP4
- Already done for Bellman-Ford: O(n^3)
- Convergence in BGP is NOT much better than in RIP
- Give upper and lower bounds on convergence

- 2 year study
- 250,000 routing fault injections
- 25 Internet providers
- End to End performance measurements

- Tup: (New) Route Announcement
- Tdown: Route Withdrawal
- Tshort: Shorter Route Replaces Current
- Current Route is Withdrawn Implicitly

- Tlong: Shorter Route Replaced with longer one
- Represents a failure and failover
- Current Route is Withdrawn Implicitly
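
The four event types can be summarised as a small classifier over the route before and after an event. This is only a sketch: the function name `classify_event`, the use of `None` for "no route", and the handling of equal-length replacements (which the slides do not define) are assumptions.

```python
def classify_event(old_path, new_path):
    """Classify a routing event by comparing AS path lengths (sketch).

    old_path / new_path are sequences of AS labels, or None if the
    prefix has no route before / after the event.
    """
    if old_path is None and new_path is None:
        raise ValueError("no route before or after the event")
    if old_path is None:
        return "Tup"      # a previously unavailable route is announced
    if new_path is None:
        return "Tdown"    # the route is withdrawn
    if len(new_path) < len(old_path):
        return "Tshort"   # a shorter route implicitly replaces the current one
    if len(new_path) > len(old_path):
        return "Tlong"    # a longer route replaces the current one (failover)
    return None           # equal length: not covered by the slides


print(classify_event(None, ("1", "R")))             # Tup
print(classify_event(("1", "R"), ("1", "2", "R")))  # Tlong
```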

- Oscillation greater than 3 minutes
- 20% of Tlong
- 40% of Tdown

- Equivalence Latency Classes
- Tlong,Tdown
- Tshort,Tup

[Figure: Average Messages per Event Type (Tup, Tdown, Tshort, Tlong)]

- Why do Tlong and Tdown cause twice the number of updates?
- Why do certain ISPs produce more updates per event?
- What is the relationship between the number of updates and convergence latency?

- What makes an ISP have a higher latency?

- Interesting Points
- ISP3: Japan’s National Backbone
- ISP5: Canadian ISP
- Latency is NOT dependent on geographic distance or network distance (aka hop count)

- No relationship between day of the week and Latency!
- Independent of Network load and congestion

- Route oscillation affects performance
- Dropped packets, buffering of packets
- Out of order delivery

- Time until ICMP echo replies arrived after Tup
- Simulates a failover
- 80% of test sites began responding after 30 seconds
- 100% after one minute

- IBGP ignored
- Full Mesh
- Ignore ingress and egress filters
- Exclude MinRouteAdver
- Update messages follow FIFO ordering

- Start: 0(*R, 1R, 2R) 1(0R, *R, 2R) 2(0R, 1R, *R)

R Withdraws routes

R -> 0 W

R -> 1 W

R -> 2 W

0(-, -, *2R) 1(-, -, *2R) 2(*01R, 10R, -)

0(-, *1R, 2R) 1(*0R, -, 2R) 2(*0R, 1R, -)

- 1 and 2 receive new announcement from 0
- 0 -> 1 01R (loop)
- 0 -> 2 01R

0(-, *1R, 2R) 1(-, -, *2R) 2(01R, *1R, -)

- 0 and 2 receive new announcement from 1
- 1 -> 0 10R (loop)
- 1 -> 2 10R

0 and 1 receive new announcement from 2

2 -> 0 20R

2 -> 1 20R

0(-, -, -) 1(-, -, *20R) 2(*01R, 10R, -)

0 and 2 receive new announcement from 1

1 -> 0 12R

1 -> 2 12R

0(-, *12R, -) 1(-, -, *20R) 2(*01R, -, -) … 48 steps later

0(-, -, -) 1(-, -, -) 2(-, -, -)

- For n nodes there exist O((n-1)!) distinct paths
- When a route is withdrawn, a new route of equal or greater length is found
- Message count could be as bad as (n-1) · O((n-1)!) until convergence
- Not really possible on the internet
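
To get a feel for the factorial growth, the sketch below enumerates every loop-free path from AS 0 to destination R in a full mesh. It assumes, as in the three-node example above, that all n ASes peer with each other and each connects directly to R; the function name `simple_paths` and the labelling of ASes as 0..n-1 are illustrative choices.

```python
from itertools import permutations


def simple_paths(n, dest="R"):
    """Enumerate loop-free paths from AS 0 to dest in a full mesh (sketch).

    Assumes ASes 0..n-1 all peer with each other and each has a direct
    link to dest, so any ordering of distinct intermediate ASes is valid.
    """
    others = range(1, n)
    paths = []
    for k in range(n):                      # k = number of intermediate ASes
        for mid in permutations(others, k):
            paths.append((0, *mid, dest))
    return paths


for n in (3, 4, 5, 6):
    print(n, len(simple_paths(n)))
```

For n = 3 this yields the five paths 0R, 01R, 02R, 012R and 021R; the number of longest paths alone is (n-1)!, which is where the factorial bound comes from.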

- Made possible by MinRouteAdver timers
- (n-1) Rounds to convergence

- Minimum time between route advertisements
- Gives an AS time to pick a good route before announcing it
- In standard BGP, the timer is applied only to announcements
- It does not apply to explicit withdrawals
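
A minimal sketch of this timer behaviour follows: announcements to a peer are held back until the interval has elapsed, while explicit withdrawals always pass. The class name `MinRouteAdverTimer`, the per-peer dictionary, and the explicit `now` argument are assumptions for illustration; the 30-second default matches the delay quoted in these slides. Real routers queue and later send the held-back update rather than just refusing it.

```python
class MinRouteAdverTimer:
    """Rate-limits announcements per peer; withdrawals bypass it (sketch)."""

    def __init__(self, interval=30.0):
        self.interval = interval   # seconds between announcements to a peer
        self.last_sent = {}        # peer -> time of last announcement

    def may_send(self, peer, kind, now):
        """Return True if an update of this kind may be sent to peer at time now."""
        if kind == "withdraw":
            return True            # explicit withdrawals are not delayed
        last = self.last_sent.get(peer)
        if last is not None and now - last < self.interval:
            return False           # still inside the hold-down interval
        self.last_sent[peer] = now
        return True


timer = MinRouteAdverTimer()
print(timer.may_send("peer1", "announce", now=0.0))   # first announcement goes out
print(timer.may_send("peer1", "announce", now=10.0))  # too soon, held back
print(timer.may_send("peer1", "withdraw", now=10.0))  # withdrawals always pass
```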

- Convergence took only 13 rounds instead of 48

- Why do Tup/Tshort converge more quickly than Tdown/Tlong?
- Answer: with Tup/Tshort path lengths decrease, while with Tdown/Tlong they increase
- Once a path is selected, a longer one will not be picked
- With Tdown/Tlong you pick the next best path until you run out of choices
- O(1) for Tup while O(n) for Tdown

- Why are there different latencies among the five ISPs?
- Answer: topological factors, namely the length and number of possible paths, which depend on peering relationships, policies and agreements
- The longer the announced routes, the longer the latencies
- The longer the routes, the more MinRouteAdver rounds
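
As a rough back-of-the-envelope illustration of why more rounds mean more latency, the sketch below multiplies a round count by the MinRouteAdver interval. The helper names and the simple "rounds times interval" model are assumptions; the (n-1) round count and the 30-second interval are the figures quoted in these slides, and real convergence times also depend on propagation and processing delays.

```python
MIN_ROUTE_ADVER_SECONDS = 30  # interval quoted in the slides


def worst_case_rounds(n):
    """Rounds to convergence for n ASes, using the (n-1) figure from the slides."""
    return n - 1


def estimated_convergence_seconds(rounds, interval=MIN_ROUTE_ADVER_SECONDS):
    """Rough estimate: each MinRouteAdver round costs one timer interval."""
    return rounds * interval


for n in (3, 5, 7):
    rounds = worst_case_rounds(n)
    print(n, rounds, estimated_convergence_seconds(rounds))
```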

- Loop detection is done at the receiver side
- If it were done at the sender, you could get more out of each MinRouteAdver round
- MinRouteAdver is good but causes, at best, a 30-second delay in end-to-end communication
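
Sender-side loop detection can be sketched as a filter applied before an update is sent: a path is not announced to any peer that already appears in it, so no MinRouteAdver round is spent on an update the receiver would discard anyway. The function name `peers_to_notify` and the string AS labels are assumptions for illustration.

```python
def peers_to_notify(as_path, peers):
    """Sender-side loop detection (illustrative sketch).

    Skip any peer whose AS already appears in the path: that peer would
    reject the update on receipt, so sending it wastes a timer round.
    """
    return [peer for peer in peers if peer not in as_path]


# AS 0 has selected the path "0 1 R" and peers with AS 1 and AS 2.
print(peers_to_notify(("0", "1", "R"), ["1", "2"]))  # only AS 2 is notified
```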

- 2nd study of convergence
- 20 unique advertisements between 200 pairs of ISPs over 6 months
- Measure the impact of Policies
- Measure the impact of Topology
- Analysis

- One network, two ISPs
- Better connectivity + backup
- Failover = New route convergence
- Work done in this Paper
- Convergence Analysis of Tdown event

- Fault injection announcements
- Logged table snapshot to disk
- Survey of backbone providers
- Routing and peering policies

- Used data to discuss impact on convergence

- How policy impacts the number and length of AS paths for a given route
- Limited inbound acceptance by all ISPs

- ISP D filters its peering session with ISP G
- D accepts only G’s backbone and customer routes

- ISP A filters its peering session with D
- A accepts only D’s backbone and customer routes

- ISP A will accept G’s routes by chaining

- A will advertise routes with paths “D G” and “D” but not “C D G”
- Done by 13% of ISPs
- Combinations of AS path and prefix filters create unintentional backup transit paths

- Interaction of MinRouteAdver timers
- MinRouteAdver is applied per peer, not per prefix
- MinRouteAdver interference delays convergence

- ISP1 explored one backup path of length 2
- ISP2 explored backup paths of length 2 and 3
- ISP3 explored backup paths of length 5