
Quantifying Path Exploration in the Internet

Quantifying Path Exploration in the Internet. Ricardo Oliveira, Rafit Izhak-Ratzin, Lixia Zhang (UCLA); Beichuan Zhang (UArizona); Dan Pei (AT&T Labs -- Research). IMC’06, Rio de Janeiro.


Presentation Transcript


  1. Quantifying Path Exploration in the Internet
     Ricardo Oliveira, Rafit Izhak-Ratzin, Lixia Zhang (UCLA); Beichuan Zhang (UArizona); Dan Pei (AT&T Labs -- Research)
     IMC’06, Rio de Janeiro

  2. Motivation
     • There has been extensive work measuring BGP convergence; however, most of it:
       • was done in controlled simulation environments, e.g. [Labovitz’00]
       • used a small number of beacon-like prefixes, e.g. [Labovitz’00, Labovitz’01, Mao’03]
     • We did a systematic measurement of path exploration in the operational Internet

  3. Talk Outline • Background on BGP convergence • Measurement methodology • Event characterization • Impact of policy and topology in observed convergence

  4. BGP Background and Monitoring
     • BGP is a path-vector protocol
     • Collectors gather BGP routing tables + BGP updates
     • e.g. UCLA (X = AS52) announcing prefix 131.179/16
     [Figure: origin X, Monitor Y, Monitor Z, and a Collector; route labels for 131.179/16: [X], [Y X], [Z Y X], [Y X]]
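
The path-vector behaviour in the example above can be sketched in a few lines of Python. This is a simplified illustration, not the protocol itself: it assumes every AS simply picks the shortest path it hears (real BGP applies policy first), and the function name `propagate` and the three-node topology are hypothetical.

```python
from collections import deque

def propagate(origin, links):
    """Flood a route from `origin`; each AS prepends itself to the path it learned.

    `links` maps an AS name to the list of its neighbours. Shortest path wins,
    which is a simplification of real BGP route selection.
    """
    best = {origin: [origin]}
    queue = deque([origin])
    while queue:
        asn = queue.popleft()
        for neighbor in links.get(asn, []):
            if neighbor in best[asn]:
                continue  # loop prevention: an AS rejects paths that contain itself
            candidate = [neighbor] + best[asn]
            if neighbor not in best or len(candidate) < len(best[neighbor]):
                best[neighbor] = candidate
                queue.append(neighbor)
    return best

# Hypothetical chain X - Y - Z, matching the path labels on the slide.
links = {"X": ["Y"], "Y": ["X", "Z"], "Z": ["Y"]}
paths = propagate("X", links)
print(paths["Y"], paths["Z"])
```

With this topology, monitor Y holds the path [Y X] and monitor Z holds [Z Y X], which is what a collector peering with them would record.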

  5. What is path exploration?
     • Q: What happens if link F-G fails?
     • A: Node E explores 2 paths before declaring G unreachable…
     • Q: Why is this a problem?
       • Delays and loss of data packets
       • Extra router processing
     [Figure: topology of nodes A-G with peer and provider-customer links, link F-G marked as failed; timeline of E's updates (paths 2 and 3, then withdrawal W) labelled "Relative convergence time"]
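
A minimal sketch of why a node "explores" paths after a failure: it keeps alternatives learned from its neighbours, and each one is only invalidated when that neighbour's withdrawal arrives. The function `explore` and the concrete paths below are hypothetical, since the slide's topology is only partly recoverable from the transcript.

```python
def explore(candidates, withdrawal_order):
    """Return the sequence of best paths a node announces as alternatives disappear.

    `candidates` is ordered most-preferred first; `withdrawal_order` lists the
    same paths in the order they are invalidated. None means "unreachable".
    """
    table = list(candidates)
    announced = []
    for withdrawn in withdrawal_order:
        previous_best = table[0] if table else None
        table.remove(withdrawn)
        new_best = table[0] if table else None
        if new_best != previous_best:
            announced.append(new_best)  # the node sends an update or a withdrawal
    return announced

# Hypothetical routes from E to G: one direct, two stale alternatives.
candidates = [("F", "G"), ("D", "F", "G"), ("C", "D", "F", "G")]
explored = explore(candidates, list(candidates))
print(explored)
```

Here the node announces two alternative paths and only then a withdrawal, mirroring the "explores 2 paths before declaring G unreachable" answer on the slide.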

  6. Talk Outline • Background on BGP convergence • Measurement methodology • Event characterization • Impact of policy and topology in observed convergence

  7. Methodology
     • Data set: 50 monitors from RV + RIPE and 1 month of data (Jan’06)
     • Pipeline: Raw BGP feed → Preprocessing → Event Identification (timeout T) → Event Classification (path rank heuristic)
     • Preprocessing: removed session resets; cleaned beacons using anchor prefixes
     • Event Identification: grouped updates for the same (monitor, prefix) across time using a relative timeout T
     • Event Classification: classified events according to the explored paths and the output of the path rank heuristic
     • BGP beacons were used to calibrate the event identification scheme and to evaluate different path rank heuristics

  8. BGP Beacons
     • Periodic BGP announcements and withdrawals that are artificially injected into the network [Mao’03, RIPE]
     • Used as calibration points:
       • clean signals: no noise caused by sporadic events
       • beacon event times are known
     [Figure: timeline alternating beacon announcement (A) and beacon withdrawal (W) at 2h intervals]
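
Because beacon event times are known in advance, a calibration schedule is easy to generate. The sketch below assumes a strict alternation of announcement and withdrawal every 2 hours, as the slide's timeline suggests; the function name `beacon_schedule` and the start date are illustrative only.

```python
from datetime import datetime, timedelta

def beacon_schedule(start, count, period=timedelta(hours=2)):
    """Return `count` (time, kind) pairs alternating "A" (announce) and "W" (withdraw)."""
    kinds = ("A", "W")
    return [(start + i * period, kinds[i % 2]) for i in range(count)]

# Illustrative start time within the measurement month (Jan'06).
schedule = beacon_schedule(datetime(2006, 1, 1), 4)
for when, kind in schedule:
    print(when.isoformat(), kind)
```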

  9. Event Identification
     • A single event can trigger multiple updates
     • Need to cluster BGP updates along the time dimension for each (monitor, prefix) pair
     • Q: What relative timeout T should we use?
     • A: T = 240 s (4 min)
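
The clustering step can be sketched as follows. This is one plausible reading of "relative timeout", assuming that a gap of more than T seconds between consecutive updates for the same (monitor, prefix) pair starts a new event; the tuple layout and the name `identify_events` are assumptions, not the authors' code.

```python
from collections import defaultdict

T = 240  # seconds, the timeout value given on the slide

def identify_events(updates, timeout=T):
    """Group updates into events per (monitor, prefix).

    `updates` is an iterable of (timestamp_seconds, monitor, prefix, path).
    Returns {(monitor, prefix): [event, ...]}, each event a list of (timestamp, path).
    """
    streams = defaultdict(list)
    for ts, monitor, prefix, path in sorted(updates, key=lambda u: u[0]):
        streams[(monitor, prefix)].append((ts, path))

    events = {}
    for key, stream in streams.items():
        current = [stream[0]]
        grouped = []
        for prev, nxt in zip(stream, stream[1:]):
            if nxt[0] - prev[0] > timeout:  # quiet period longer than T: close the event
                grouped.append(current)
                current = []
            current.append(nxt)
        grouped.append(current)
        events[key] = grouped
    return events

# Hypothetical updates: three close together, then one much later.
updates = [
    (0, "Y", "131.179.0.0/16", ("Y", "X")),
    (100, "Y", "131.179.0.0/16", ("Y", "Z", "X")),
    (300, "Y", "131.179.0.0/16", None),
    (1000, "Y", "131.179.0.0/16", ("Y", "X")),
]
events = identify_events(updates)
print([len(e) for e in events[("Y", "131.179.0.0/16")]])
```

The first three updates are separated by gaps of 100 s and 200 s, so they fall into one event; the fourth arrives 700 s later and starts a second one.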

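  10. Event Classification
      • An event is a sequence of updates p1 … p5 observed over time, with initial path p0 (before the event) and final path p5
      • Same path, p0 = p5:
        • p0 = … = p5 (no other path explored): Tspath
        • otherwise (other paths explored in between): Tpdist
      • Path change, p0 ≠ p5:
        • p5 > p0 (final path preferred): Tshort
        • p0 > p5 (initial path preferred): Tlong
        • p0 = ∅ (prefix previously unreachable): Tup
        • p5 = ∅ (prefix becomes unreachable): Tdown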
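
A sketch of the classification logic, using the event names that appear elsewhere in this talk (Tspath, Tpdist, Tup, Tdown, Tshort, Tlong). The mapping from conditions to names is reconstructed from the slides, and the `prefers` callback is a stand-in for whichever path rank heuristic is chosen; the length-based one used in the example is only a placeholder.

```python
def classify_event(p0, paths, prefers):
    """Classify one event.

    p0      -- path in use before the event (None if the prefix was unreachable)
    paths   -- paths carried by the event's updates, in order; the last is the final path
    prefers -- prefers(a, b) is True when the router is assumed to prefer a over b
    """
    final = paths[-1]
    if p0 == final:
        # Same initial and final path: did the router see anything else in between?
        return "Tspath" if all(p == p0 for p in paths) else "Tpdist"
    if p0 is None:
        return "Tup"
    if final is None:
        return "Tdown"
    return "Tshort" if prefers(final, p0) else "Tlong"

# Placeholder heuristic: shorter AS path is preferred (the slides note this is not always true).
shorter = lambda a, b: len(a) < len(b)

labels = [
    classify_event(("Y", "X"), [("Y", "Z", "X"), None], shorter),
    classify_event(None, [("Y", "X")], shorter),
    classify_event(("Y", "X"), [("Y", "Z", "X"), ("Y", "X")], shorter),
    classify_event(("Y", "Z", "X"), [("Y", "X")], shorter),
]
print(labels)
```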

  11. Classifying Tlong and Tshort events: the problem of path comparison
      [Figure: one event on a timeline with updates p1, p2, p3; initial path p0, final path p3]
      • This event is classified as:
        • Tshort: if pref(p3) > pref(p0)
        • Tlong: if pref(p3) < pref(p0)
      • Because of policy routing, the shorter path is not always the preferred path…
      • Q: Which path does the router prefer: p0 or p3?

  12. Evaluating Path Rank Heuristics

  13. Evaluating Path Rank Heuristics
      [Figure: Beacons’ Tdown]
      • c_right: # of matches with calibration list
      • c_wrong: # of mismatches
      • Extending this method to all prefixes, the accuracy of each heuristic is:
        • Policy: 17%
        • Length: 65%
        • Policy + Length: 73%
        • Usage time: 95%
      • Usage time is the most accurate heuristic for determining path preference
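
The transcript does not spell out how usage time or accuracy is computed, so the following is only a sketch under two stated assumptions: that a path's usage time is the total time it was the selected route at a monitor, and that accuracy is c_right / (c_right + c_wrong). All function names are hypothetical.

```python
from collections import defaultdict

def usage_time(history, end_time):
    """Total seconds each path was in use.

    `history` is a time-sorted list of (timestamp, path) for one (monitor, prefix);
    each path is assumed to stay in use until the next entry (None = unreachable).
    """
    totals = defaultdict(float)
    boundaries = history[1:] + [(end_time, None)]
    for (start, path), (stop, _) in zip(history, boundaries):
        if path is not None:
            totals[path] += stop - start
    return dict(totals)

def prefers_by_usage(totals):
    """Build a comparison: the path used longer is assumed to be the preferred one."""
    return lambda a, b: totals.get(a, 0.0) > totals.get(b, 0.0)

def accuracy(c_right, c_wrong):
    """Assumed definition: fraction of comparisons that match the calibration list."""
    return c_right / (c_right + c_wrong)

history = [(0, ("Y", "X")), (9000, ("Z", "Y", "X")), (9600, ("Y", "X"))]
totals = usage_time(history, end_time=10000)
prefers = prefers_by_usage(totals)
print(totals, prefers(("Y", "X"), ("Z", "Y", "X")))
```

A comparison built this way could be passed as the preference callback when deciding between Tshort and Tlong.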

  14. Talk Outline • Background on BGP convergence • Measurement methodology • Event characterization • Impact of policy and topology in observed convergence

  15. Characterizing Events
      • Tshort < Tspath ~ Tup < Tlong << Tdown < Tpdist
      • Tdown convergence time is significantly higher than Tlong convergence time, in contrast with the worst-case analysis of [Labovitz’01]

  16. Talk Outline • Background on BGP convergence • Measurement methodology • Event characterization • Impact of policy and topology in observed convergence

  17. The impact of policy and topology on observed convergence
      • How is the convergence process perceived by monitors in different locations in the Internet?
      • What about the MRAI timer?
        • The BGP RFC specifies that the MRAI should have a base of 30 s, multiplied by a jitter factor between 0.75 and 1
        • Not all ISPs follow the RFC…
      [Figure: two panels labelled "Non-MRAI" and "MRAI"]
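
The jittered MRAI value described above amounts to a one-line computation. The sketch below assumes the jitter factor is drawn uniformly from [0.75, 1.0]; the function name `mrai_interval` is illustrative.

```python
import random

MRAI_BASE_SECONDS = 30.0  # base value quoted on the slide

def mrai_interval(base=MRAI_BASE_SECONDS, rng=random):
    """Return one jittered advertisement interval: base times a factor in [0.75, 1.0]."""
    return base * rng.uniform(0.75, 1.0)

samples = [mrai_interval() for _ in range(1000)]
print(min(samples), max(samples))
```

Every sample falls between 22.5 s and 30 s, so a monitor behind an MRAI-enabled router would be expected to see successive updates for a prefix spaced by roughly that much.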

  18. Impact of monitor location on observed convergence
      • Set of MRAI monitors: 4 core (tier-1), 15 middle (transit) and 3 edge (stub)
      • Convergence time by monitor location: core < middle < edge

  19. Impact of monitor location on observed convergence
      [Figure: nodes 1-7 arranged in Core, Middle and Edge tiers, connected by peer and provider-customer links]
      • Monitors at lower tiers have more paths to explore

  20. Further breaking down events by origin-monitor pair
      • Worst case: edge → {edge, middle}

  21. The Impact of Tdown Convergence
      [Figure: nodes A, B, C with prefixes 131.179.100/24 and 131.179/16]
      • In a Tdown the destination becomes unreachable, therefore we don’t care about routing convergence time… or do we?
      • Q: What happens when the /24 prefix is withdrawn?
      • A: Routers will experience Tdown convergence, even though the destination is still reachable via the /16 prefix…
      • According to recent measurements, about 1/3 of the prefixes in the routing table are in the same situation as the /24 in this example
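
Whether a withdrawn prefix is still covered by a less-specific one can be checked with Python's standard `ipaddress` module. This is a minimal sketch assuming IPv4 prefixes written in full dotted form; the helper name `covering_prefixes` and the sample table are hypothetical.

```python
import ipaddress

def covering_prefixes(prefix, table):
    """Return the entries of `table` that strictly cover `prefix` (less-specific routes)."""
    target = ipaddress.ip_network(prefix)
    covers = []
    for entry in table:
        candidate = ipaddress.ip_network(entry)
        if candidate != target and target.subnet_of(candidate):
            covers.append(entry)
    return covers

table = ["131.179.0.0/16", "10.0.0.0/8", "131.179.100.0/24"]
covered_by = covering_prefixes("131.179.100.0/24", table)
print(covered_by)
```

If the result is non-empty, traffic to the /24 can still be forwarded along the covering route while the more-specific prefix goes through Tdown convergence.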

  22. Origin of Tdown events Networks in the core are the most stable; edge networks the most unstable (proportion 1:2:3)

  23. Conclusions
      • Usage time: a new path ranking heuristic that provides 95% accuracy in determining routers’ path preference
      • Tdown convergence is by far the longest, even when compared with Tlong
      • Core-to-core convergence is the fastest case; edge-to-{edge, middle} the slowest
      • Core networks are three times more stable than edge networks

  24. Thanks! Questions? rveloso@cs.ucla.edu
