
Internet Routing Instability

Craig Labovitz, G. Robert Malan, Farnam Jahanian, "Internet Routing Instability." IEEE/ACM Transactions on Networking, 6(5):515-528, 1998. Craig Labovitz, G. Robert Malan, Farnam Jahanian, "Origins of Internet Routing Instability", IEEE INFOCOM 1999.



  1. Craig Labovitz, G. Robert Malan, Farnam Jahanian, "Internet Routing Instability." IEEE/ACM Transactions on Networking, 6(5):515-528, 1998. Craig Labovitz, G. Robert Malan, Farnam Jahanian, "Origins of Internet Routing Instability", IEEE INFOCOM 1999. Craig Labovitz, Abha Ahuja, Farnam Jahanian, "Experimental Study of Internet Stability and Wide-Area Backbone Failures." FTCS 1999. Internet Routing Instability: Three Papers. Presented by Michael A. Smith

  2. Background • Events • NSFNet backbone ended in April ‘95 • Evident • Network degradation • bandwidth shortages • lack of router switching capacity • “Death of Internet is Imminent” • reported by popular press • Routing Instability (“route flaps”) • Informally defined as: • “the rapid change of network reachability and topology information”

  3. The Internet Backbone • 12 large ISPs, tier one • 4000-6000 tier two providers • Large public exchange points are considered the “core” of the Internet. • Backbone service providers must maintain a complete map, or default-free routing table. • Divided into different regions of administrative control called autonomous systems (AS’s). • Most AS’s exchange routing information through the border gateway protocol (BGP).

  4. Routing Instability • Origins • Router configuration errors • Transient physical and data link problems • Software bugs • Effects • Poorer end-to-end network performance • Degradation of overall efficiency of the Internet infrastructure

  5. Route Flaps • Result in a large number of routing updates passed to core Internet exchange point routers. • Network instability spreads from router to router and propagates throughout the network. • Effects on Internet infrastructure: • Increased packet loss • Delays in network convergence • Resource overhead (CPU, memory, etc.)

  6. BGP • An incremental protocol • Unlike interior gateway protocols such as IGRP and OSPF, does not periodically flood the intra-domain network with all known topological information • Sends update information only upon changes in topology or policy • Uses TCP as its underlying transport mechanism (as opposed to building reliability on top of a datagram service) • As a path vector routing protocol, it limits the distribution of reachability information.

  7. Routing on the Backbone • path - sequence of intermediate AS’s between source and destination routers that form a directed route for packets to travel • Router configuration files allow the stipulation of routing policies which may: • specify the filtering of specific routes • modify path attributes before sharing • Policy decisions can be made based on: • announcement of routes from peers • attributes of announced routes (such as MED’s) • After each router makes a new local decision on the best route to a destination, it sends that route to its peers. • As the route propagates, each AS adds its unique number to the route’s ASPATH, which, in conjunction with the prefix, provides a specific handle for transit. • The ASPATH mechanism allows a router to detect and prevent routing loops.
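The ASPATH loop check described on this slide can be sketched in a few lines. This is a minimal illustration, not router code: the function name `propagate`, the prefix, and the AS numbers are all made up for the example. One detail assumed here is that BGP places the newest AS number at the front of the path.

```python
from typing import List, Optional


def propagate(prefix: str, as_path: List[int], local_as: int) -> Optional[List[int]]:
    """Return the ASPATH to re-advertise for `prefix`, or None if the route is dropped.

    Hypothetical helper: a route whose ASPATH already contains our own AS
    number has looped back to us, so it is rejected.
    """
    if local_as in as_path:
        return None  # loop detected: our AS is already on the path
    # Assumption: the newest AS number goes at the front of the path.
    return [local_as] + as_path


# AS 65003 receives a route that has already crossed AS 65001 and AS 65002.
print(propagate("192.0.2.0/24", [65002, 65001], local_as=65003))
# The same route offered back to AS 65001 is rejected.
print(propagate("192.0.2.0/24", [65002, 65001], local_as=65001))
```

The prefix plus the resulting ASPATH is the "handle" the slide refers to: two updates for the same prefix with different ASPATHs describe different routes.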

  8. Routing Information in BGP • Two forms: • Announcements • Indicate that a router has either learned a new network attachment or has made a policy decision to prefer a different route to a destination. • Withdrawals • Sent when a router decides that a network is no longer reachable • Paper distinguishes between: • Explicit – associated with an actual withdrawal message • Implicit – existing route replaced by a new route • A BGP update may contain multiple announcements and withdrawals. • Ideally, routers should only generate routing updates for relatively infrequent policy changes and the addition of new physical networks. • It’s been found that BGP’s ASPATH mechanism is not sufficient to ensure network convergence.

  9. Methodology of Studies • Geographically diverse exchange points. • Although the route servers do not forward network traffic, they do peer with over 90% of the service providers at each exchange point.

  10. Route Tracker Architecture • Developed on Sun workstations • Uses MRT and IPMA toolkits to analyze BGP updates

  11. “Internet Routing Instability” • Monitored BGP updates generated by five service provider backbone routers at the major U.S. public exchange points over a period of nine months. • Paper distinguishes three types of updates: • forwarding instability – may reflect legitimate topological changes and affects the paths on which data will be forwarded • routing policy fluctuation – reflects changes in routing policy information that do not affect forwarding paths • pathological – redundant BGP updates that reflect neither routing nor forwarding instability • Instability is defined as: • an instance of either forwarding instability or policy fluctuation • Data reflects the stability of inter-domain Internet routing, or changes in topology or policy among AS’s • “Intra-domain routing instability is not explicitly measured and is only indirectly observed through BGP information exchanged with a domain’s peer.”

  12. Results of Study • The number of BGP updates exchanged per day in the Internet core is one or more orders of magnitude larger than expected. • Routing information is dominated by pathological, or redundant updates, which may not reflect changes in routing policy or topology. • Instability and redundant updates exhibit a specific periodicity of 30 and 60 seconds. • Instability and redundant updates show a surprising correlation to network usage and exhibit corresponding daily and weekly cyclic trends. • Instability is not dominated by a small set of autonomous systems or routes.

  13. Results of Study (2) • Instability and redundant updates exhibit both strong high and low frequency components. Much of the high frequency instability is pathological. • Discounting the contribution of redundant updates, the majority (over 80%) of Internet routes exhibit a high degree of stability. • This work has led to specific architectural and protocol changes in commercial Internet routers through collaboration with vendors.

  14. Methodology of Study (2) • 12 Gb of data starting in January ’96 • Uses several tools from XYZ toolkit • Focuses on largest exchange, Mae-East • Data verification against BGP backbone logs from a number of large service providers

  15. More Background • Problems of network topology fluctuation (non-convergence): • packets get dropped • packets delivered out of order • Internet routers of the day were based on a route caching architecture. • Each interface card maintains a routing table cache of destination and next-hop lookups • If the destination is found in the cache, the packet is switched on a CPU-independent “fast path.” • Sustained levels of instability increase the probability of a packet encountering a cache miss, which leads to: • increased load on CPU • increased switching latency • dropped or lost packets • queuing delay, preventing timely routing of Keep-Alive packets • It should be noted that new generations of routers that do not require caching and are able to maintain the full routing table in memory do not exhibit the same pathological loss under heavy routing updates.
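To make the cache-miss argument concrete, here is a toy model of a caching interface card. It is a sketch under stated assumptions, not a description of any real router: the class name `CachingLineCard`, the flush-on-every-update policy, and the addresses are invented for illustration. A hit counts as the "fast path"; a miss falls back to a longest-prefix match over the full table, standing in for the CPU-bound slow path.

```python
import ipaddress
from typing import Dict, Optional


class CachingLineCard:
    """Toy interface card with a per-destination next-hop cache (hypothetical)."""

    def __init__(self) -> None:
        self.table: Dict[ipaddress.IPv4Network, str] = {}  # full routing table
        self.cache: Dict[str, Optional[str]] = {}          # destination -> next hop
        self.hits = 0
        self.misses = 0

    def apply_update(self, prefix: str, next_hop: Optional[str]) -> None:
        """Apply an announcement (next_hop given) or a withdrawal (next_hop=None)."""
        net = ipaddress.ip_network(prefix)
        if next_hop is None:
            self.table.pop(net, None)
        else:
            self.table[net] = next_hop
        # Assumption: the simplest possible invalidation, flush everything.
        self.cache.clear()

    def lookup(self, dst: str) -> Optional[str]:
        if dst in self.cache:            # fast path
            self.hits += 1
            return self.cache[dst]
        self.misses += 1                 # slow path: longest-prefix match
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self.table if addr in net]
        best = max(matches, key=lambda net: net.prefixlen, default=None)
        hop = self.table[best] if best is not None else None
        self.cache[dst] = hop
        return hop


card = CachingLineCard()
card.apply_update("198.51.100.0/24", "peer-a")
card.lookup("198.51.100.7")                      # miss
card.lookup("198.51.100.7")                      # hit
card.apply_update("198.51.100.0/25", "peer-b")   # a routing update flushes the cache
card.lookup("198.51.100.7")                      # miss again
print(card.hits, card.misses)
```

In this model every routing update sends the next packet for each destination back to the slow path, which is why a sustained stream of updates translates into CPU load.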

  16. Route Flap Storms • A failed router can instigate a “route flap storm.” • This pathological oscillation causes overloaded routers to be marked as unreachable since the required interval of Keep-Alive transmissions is not met. • Peers of the failed router find alternative paths for destinations previously reachable and transmit updates. • After the failed router recovers, it will re-initiate BGP peering sessions with peers, transmit large state dumps, and cause more routers to fail. • “Route Flap Storms” in 1996 caused extended outages for several million network customers. • Newer generations of routers provide a mechanism for giving BGP and Keep-Alive messages higher priority.
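The Keep-Alive condition that starts a storm can be sketched as a simple timer check. The function `session_drop_time` and the 90-second hold time are assumptions for illustration (the slides do not give timer values); the point is only that an overloaded router that delays its Keep-Alives past the hold time is declared unreachable even though it is still up.

```python
from typing import Iterable, Optional


def session_drop_time(keepalive_arrivals: Iterable[float],
                      hold_time: float = 90.0) -> Optional[float]:
    """Return the time at which a peer would declare the session dead, or None.

    Hypothetical model: the session starts at t=0 and every Keep-Alive that
    arrives before the current deadline pushes the deadline out by hold_time.
    """
    deadline = hold_time
    for t in sorted(keepalive_arrivals):
        if t > deadline:
            return deadline  # hold timer expired before this Keep-Alive arrived
        deadline = t + hold_time
    return None


# A healthy router sends a Keep-Alive every 30 seconds.
print(session_drop_time([30, 60, 90, 120]))
# An overloaded router stalls after t=60 and its next Keep-Alive arrives at t=200.
print(session_drop_time([30, 60, 200]))
```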

  17. Battling Routing Instability • Route Aggregation (Supernetting): • combines a number of smaller IP prefixes into a single, less specific route announcement. • reduces overall number of networks visible on the core Internet • fails in multi-homing (when end-sites have redundant connections to the Internet via multiple service providers). • In 1996, more than 25% (and growing) of prefixes were multi-homed and therefore non-aggregatable. • Deployment of route dampening algorithms • “hold-down” updates that exceed certain parameters (e.g., a quota of updates per hour) • can introduce artificial connectivity problems as “legitimate” announcements are delayed.
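Both countermeasures on this slide are easy to sketch. The aggregation part uses Python's standard `ipaddress.collapse_addresses`; the hold-down part is a deliberately crude per-prefix quota, as the slide describes it. The names `aggregate` and `QuotaHoldDown` and all prefixes are made up for the example, and deployed dampening schemes are more elaborate than a fixed hourly quota.

```python
import ipaddress
from collections import defaultdict, deque
from typing import Deque, Dict, List


def aggregate(prefixes: List[str]) -> List[str]:
    """Collapse adjacent prefixes into the fewest covering announcements."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


class QuotaHoldDown:
    """Hypothetical hold-down: suppress a prefix once it exceeds an hourly quota."""

    def __init__(self, max_updates_per_hour: int) -> None:
        self.max_updates = max_updates_per_hour
        self.seen: Dict[str, Deque[float]] = defaultdict(deque)

    def accept(self, prefix: str, now: float) -> bool:
        window = self.seen[prefix]
        while window and now - window[0] >= 3600:  # forget updates older than 1 hour
            window.popleft()
        window.append(now)                         # suppressed updates still count
        return len(window) <= self.max_updates


# Three contiguous customer prefixes collapse into one /24; the fourth stays separate.
print(aggregate(["198.51.100.0/26", "198.51.100.64/26",
                 "198.51.100.128/25", "203.0.113.0/24"]))

damper = QuotaHoldDown(max_updates_per_hour=2)
print([damper.accept("192.0.2.0/24", t) for t in (0, 10, 20, 3700)])
```

The third update in the example is held down even if it is the one that reports a genuine recovery, which is the "artificial connectivity problem" the slide warns about.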

  18. Problems • The Internet continues to exhibit high levels of routing instability despite the increased emphasis on aggregation and route dampening. • Internet topology is becoming increasingly less hierarchical with the addition of new exchange points and peering relationships. • The behavior and dynamics of Internet routing stability had gone mostly without formal study prior to the publication of the paper. Little was known!

  19. Observations • Disproportionalism: • 42,000 Internet prefixes • 1300 Autonomous Systems • 1500 Unique ASPATHs • 3-6 million routing updates per day • 125 updates per network per day • At times, 100 prefix announcements per sec. • Updates once exceeded 30 million, monitor crashed! • This is a problem for all but the most high-end of commercial routers, and even they exhibit problems.
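The per-network figure follows from the other numbers on this slide. A quick check, using only the slide's own values (the helper name `updates_per_prefix` is invented for the example):

```python
PREFIXES = 42_000  # Internet prefixes, from the slide


def updates_per_prefix(updates_per_day: int, prefixes: int = PREFIXES) -> float:
    """Average number of routing updates per prefix per day."""
    return updates_per_day / prefixes


low = updates_per_prefix(3_000_000)
high = updates_per_prefix(6_000_000)
print(f"{low:.1f} to {high:.1f} updates per prefix per day")

# The slide's 125 updates per network per day corresponds to this daily total:
print(125 * PREFIXES)

# A sustained 100 announcements per second would amount to this many per day:
print(100 * 86_400)
```

So 3-6 million updates per day spread over 42,000 prefixes gives roughly 71 to 143 updates per prefix per day, and the quoted 125 sits inside that range (about 5.25 million updates per day).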

  20. Classification of BGP Updates • WADiff – A route is explicitly withdrawn as it becomes unreachable and is later replaced with an alternative route to the same destination; forwarding instability. • AADiff – A route is implicitly withdrawn and replaced by an alternative route as the original route becomes unreachable, or a preferred alternative path becomes available; forwarding instability. • WADup – A route is explicitly withdrawn and then re-announced as reachable. This may reflect a transient topological (link or router) failure, or it may represent a pathological oscillation; forwarding instability or pathological behavior (see next slide). • All considered to be instability

  21. Classification of Pathological Behavior (Redundant Updates) • AADup – A route is implicitly withdrawn and replaced with a duplicate of the original route (a router should only send an update for a change in topology). • WWDup – The repeated transmission of BGP withdrawals for a prefix that is currently unreachable. • All considered to be pathological instability. • Pathological updates may have a minimal impact on the performance of the Internet.
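The five categories from this slide and the previous one can be expressed as a small state machine over the stream of updates seen for each (peer, prefix) pair. This is a sketch of the taxonomy only, not the papers' tooling: the class `UpdateClassifier`, the use of an opaque `attrs` value to stand for "the same route", and the choice to label a withdrawal from a peer that never announced the prefix as WWDup are all assumptions made for the example.

```python
from typing import Dict, Hashable, Optional, Tuple

Key = Tuple[str, str]  # (peer, prefix)


class UpdateClassifier:
    """Label updates as WADiff, AADiff, WADup, AADup or WWDup (hypothetical sketch)."""

    def __init__(self) -> None:
        self._announced: Dict[Key, bool] = {}       # is the route currently announced?
        self._last_attrs: Dict[Key, Hashable] = {}  # attributes of the last announcement

    def classify(self, peer: str, prefix: str, kind: str,
                 attrs: Hashable = None) -> Optional[str]:
        """kind is "A" (announcement) or "W" (withdrawal); returns a label or None."""
        key = (peer, prefix)
        was_announced = self._announced.get(key, False)
        if kind == "W":
            self._announced[key] = False
            # Withdrawing something that is not announced is redundant.
            return None if was_announced else "WWDup"
        if kind != "A":
            raise ValueError("kind must be 'A' or 'W'")
        seen_before = key in self._last_attrs
        same_route = seen_before and self._last_attrs[key] == attrs
        self._announced[key] = True
        self._last_attrs[key] = attrs
        if not seen_before:
            return None  # first announcement: nothing to compare against
        if was_announced:
            return "AADup" if same_route else "AADiff"
        return "WADup" if same_route else "WADiff"


c = UpdateClassifier()
events = [("A", "path-1"), ("A", "path-1"), ("A", "path-2"), ("W", None),
          ("W", None), ("A", "path-2"), ("W", None), ("A", "path-3")]
for kind, attrs in events:
    print(kind, attrs, c.classify("peer-1", "192.0.2.0/24", kind, attrs))
```

A legitimate withdrawal returns `None` here because it only becomes a WADiff or WADup once the following announcement is seen.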

  22. Expected Instability • Problems affecting aggregation into supernets: • Multi-homing • initial lack of hierarchical IP address space allocation • reluctance to renumber IP addresses • Result: Large number of globally visible addresses • Each globally visible address is reachable by one or more paths. • You would expect Internet instability to be proportional to the total number of available paths to all globally visible network addresses or aggregates

  23. Mae-East Routing Updates • Most WWDup withdrawals are transmitted by routers belonging to AS’s that never previously announced reachability for the withdrawn prefixes. • On average, 500,000 – 6 million pathological withdrawals per day

  24. Update Totals per ISP on a Given Day • Many of the exchange point routers withdraw an order of magnitude more routes than they announce during a given day. • Provider I shows the disproportionate effect that a single service provider can have on the global routing mesh.

  25. More Observations • Guess what: • There is a strong causal relationship between the manufacturer of the router used by an ISP and the ISP’s exhibited level of pathological BGP behavior. • Routing updates have a regular, specific periodicity, usually either 30 or 60 seconds. • The persistence of instability is the duration of time that routing information fluctuates before it stabilizes.

  26. Origins of Routing Pathologies • Some pathological withdrawals can be attributed to implementation decisions • time-space trade-off in not maintaining state of advertisements • stateless BGP = O(N*U) updates • Presentation of results led to a router vendor’s updating of software to a partial state • Stateless BGP contributes an insignificant number of updates and does not account for oscillating behavior of WWDup and AADup updates.
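The time-space trade-off behind stateless BGP can be illustrated as follows. This is a sketch, assuming the simplest possible behavior on each side: the stateless speaker keeps no record of what it advertised and so sends a withdrawal to every peer, while the stateful one remembers what each peer was told. The names `stateless_withdraw` and `StatefulSpeaker` and the peer labels are hypothetical.

```python
from typing import Dict, List, Set


def stateless_withdraw(peers: List[str], prefix: str) -> List[str]:
    """No per-peer memory: every peer gets the withdrawal, needed or not."""
    return list(peers)


class StatefulSpeaker:
    """Remembers which prefixes were advertised to which peer (costs memory)."""

    def __init__(self, peers: List[str]) -> None:
        self.advertised: Dict[str, Set[str]] = {p: set() for p in peers}

    def announce(self, peer: str, prefix: str) -> None:
        self.advertised[peer].add(prefix)

    def withdraw(self, prefix: str) -> List[str]:
        """Return only the peers that were actually told about the prefix."""
        targets = [p for p, sent in self.advertised.items() if prefix in sent]
        for p in targets:
            self.advertised[p].discard(prefix)
        return targets


peers = ["peer-a", "peer-b", "peer-c"]
print(stateless_withdraw(peers, "192.0.2.0/24"))  # all three peers

speaker = StatefulSpeaker(peers)
speaker.announce("peer-b", "192.0.2.0/24")
print(speaker.withdraw("192.0.2.0/24"))  # only peer-b
print(speaker.withdraw("192.0.2.0/24"))  # nobody: a repeat withdrawal is suppressed
```

In the stateless case the withdrawals sent to peers that never heard the announcement are exactly the kind of update a monitor would record as WWDup.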

  27. Origins of Routing Pathologies (2) • Single-homed, stateless peer routers should result in at most O(N) updates, but instead: • It seemed that each legitimate withdrawal induces some type of short-lived pathological network oscillation • Persistence of these updates is between 1 and 5 minutes • Periodic routing instability may be caused by: • inadvertent synchronization on update transmission • improper configuration of interaction between IGP and BGP (conversion is lossy) • Internet routing instability still remains poorly understood

  28. Forwarding Instability • Instability Density • Black squares are above a particular threshold (mean of detrended data) (345 updates in March, 770 in September)

  29. Forwarding Instability (2) • A week of raw forwarding • Little instability over the weekend

  30. Forwarding Instability (3) • Time series analyses, FFT and MEM spectral estimation, validate results. • Routing instability corresponds closely to trends in Internet bandwidth usage and packet loss (intuitively obvious?) • Rigorous justification of network usage equating to routing instability is problematic due to the size and heterogeneity of the Internet.

  31. Fine-grained Instability Stats. • No single AS consistently dominates the instability statistics. • There is no correlation between the size of an AS (the number of routes it is responsible for in the table) and its proportion of the instability statistics. • A small set of paths or prefixes does not dominate the instability statistics; instability is evenly distributed across routes

  32. Fine-grained Instability Stats. (2) • Internet routing tables are dominated by 6-8 ISPs • Over the course of the month, their share of the default-free routing tables did not change significantly

  33. Fine-grained Instability Stats. (3) • Internet routing tables are dominated by 6-8 ISPs • Over the course of the month, their share of the default-free routing tables did not change significantly

  34. Fine-grained Instability Stats. (4) • 80-100% of the daily instability is contributed by Prefix+AS pairs announced fewer than 50 times. • (a) ISP A announced seven routes between 630 and 650 times with no withdrawals

  35. Fine-grained Instability Stats. (5) • 80-100% of the daily instability is contributed by Prefix+AS pairs announced fewer than 50 times. • (c) ISP A announced seven routes between 630 and 650 times with no withdrawals

  36. Fine-grained Instability Stats. (6) • (a) 20-90% of AADiff events are contributed by routes that changed 10 times or fewer • No single route consistently dominates the instability measured. • Some days, a single Prefix+AS pair contributes substantially (40%); this accounts for the lowest curve in (a) (ISP A) • WADiff climbs to a plateau about 95% faster than the other three categories. • WADiff has the fewest Prefix+AS pairs that dominate their days. • Comforting, since categories probably best represent topological instability • Investigation on prefix alone provided similar results.

  37. Temporal Properties of Instability Statistics • Update frequency distributions for instability events at Prefix+AS level • Update frequency is the inverse of the inter-arrival time between routing updates; higher frequency corresponds to a short inter-arrival time • Other work has been able to capture the lower frequencies through both routing table snapshots and end-to-end techniques

  38. Temporal Properties of Instability Statistics (2) • Histogram distribution captured in 30 second and 1 minute bins • You would expect a Poisson distribution reflecting exogenous events, such as power outages, fiber cuts, and other natural and human events. • 30 second periodicity suggests a widespread systematic influence in origin.
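The inter-arrival measurement and the 30-second binning can be sketched as below. The function names and the sample timestamps are invented for illustration; the input is assumed to be the arrival times, in seconds, of updates for a single Prefix+AS pair.

```python
from collections import Counter
from typing import Dict, List


def inter_arrival_times(timestamps: List[float]) -> List[float]:
    """Gaps, in seconds, between consecutive updates for one Prefix+AS pair."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def bin_gaps(gaps: List[float], bin_seconds: int = 30) -> Dict[int, int]:
    """Histogram of gaps; each key is the lower edge of a bin_seconds-wide bin."""
    counts = Counter(int(gap // bin_seconds) * bin_seconds for gap in gaps)
    return dict(sorted(counts.items()))


arrivals = [0, 30, 60, 95, 155]              # made-up arrival times in seconds
gaps = inter_arrival_times(arrivals)
print(gaps)                                  # inter-arrival times
print([round(1 / g, 4) for g in gaps])       # update frequency = 1 / inter-arrival time
print(bin_gaps(gaps))                        # counts per 30-second bin
```

Independent external failures would spread the gaps smoothly across bins; a spike in the 30-second and 60-second bins is the kind of signature the slide attributes to a systematic cause.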

  39. Temporal Properties of Instability Statistics (3) • Histogram distribution captured in 30 second and 1 minute bins • You would expect a Poisson distribution reflecting exogenous events, such as power outages, fiber cuts, and other natural and human events. • 30 second periodicity suggests a widespread systematic influence in origin.

  40. Conclusions • Routing instability can have a significant deleterious impact on Internet infrastructure • Majority (99%) of routing information is pathological and may not reflect real network topological changes. • Instability is well distributed across AS’s and prefix space. • Instability and redundant routing information exhibit a strong periodicity (of unknown origin).

  41. Conclusions (2) • Proportion of Internet Routes affected by routing updates

  42. Conclusions (3) • Current trends in the evolution of the Internet may have a significant impact on routing instability and the future performance of the network. • 25% of networks are multi-homed and the growth rate is about linear • Proliferation of exchange points is leading to a less hierarchical Internet. • This research helps characterize the effect of added topological complexity since the end of the NSFNet backbone.

  43. “Origins of Internet Routing Instability” • 28 months gathering data from more than 40 commercial routers, switches, and Unix-based PC routers • Also collected IBGP information at the state of Michigan’s public Internet backbone, MichNet • Maintains that routing instability remains well distributed across prefix and AS space but that instability is not related to prefix length. • Since previous paper’s work, the volume of inter-domain routing messages in the Internet core has decreased by an order of magnitude.

  44. Research Pays Off • Number of BGP updates almost doubled in 28 months • Number of announcements per day eventually (finally) surpassed the number of withdrawals at Mae-East. • On average, across the backbone, exchange point routers generated only half as many withdrawals as announcements

  45. New Routing Update Categories • We still have AADiff, AADup, and WWDup, but we add: • Tup and Tdown – fluctuation in the reachability for a given prefix. An announced route is withdrawn and transitions down (Tdown), or a currently unreachable prefix is announced as reachable and transitions up (Tup)
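Tup and Tdown are simply reachability transitions, so they can be detected with one bit of state per prefix. The sketch below assumes that reading of the slide; the class name `ReachabilityTracker` and the prefix are illustrative, and updates that do not change reachability are left to the other categories.

```python
from typing import Optional, Set


class ReachabilityTracker:
    """Emit "Tup"/"Tdown" when a prefix changes reachability (hypothetical sketch)."""

    def __init__(self) -> None:
        self.reachable: Set[str] = set()

    def observe(self, prefix: str, kind: str) -> Optional[str]:
        """kind is "A" (announcement) or "W" (withdrawal)."""
        if kind == "A":
            if prefix in self.reachable:
                return None  # still reachable: an AADiff/AADup case, not a transition
            self.reachable.add(prefix)
            return "Tup"
        if kind == "W":
            if prefix not in self.reachable:
                return None  # already unreachable: a WWDup case
            self.reachable.discard(prefix)
            return "Tdown"
        raise ValueError("kind must be 'A' or 'W'")


tracker = ReachabilityTracker()
for kind in ["A", "A", "W", "W", "A"]:
    print(kind, tracker.observe("192.0.2.0/24", kind))
```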

  46. Breakdown of BGP Updates • Tup roughly equal to Tdown, connection recovery (good!) • Fluctuation in prefix reachability accounts for over 40% of all non-WWDup BGP traffic • After January ’98, AADup comprised the largest category of updates.

  47. Analysis of AADiffs • 90% of MED oscillations involve only two large ISPs, product of their specific routing policies.

  48. Dynamically Mapped MED • AS2 always wants traffic flowing from AS3 to AS1 to take the shortest path through its network, so instead of setting the MED value via static configuration rules, AS2 dynamically maps the IGP distance between R5 and R3, and between R5 and R4 to the MED attribute value associated with route advertisements from routers R3 and R4 to AS1. • AS2 influences AS1 who wants to reach Network A. AS1 will prefer the route via R4.
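A small sketch shows why a dynamically mapped MED turns internal IGP changes into external BGP updates. The router names R3 and R4 come from the slide; the distances, function names, and the simple "lowest MED wins" rule are assumptions for illustration (real route selection considers other attributes before MED).

```python
from typing import Dict, Set


def meds_from_igp(igp_distance_from_r5: Dict[str, int]) -> Dict[str, int]:
    """AS2's policy: the MED advertised by each border router is its IGP distance."""
    return dict(igp_distance_from_r5)


def preferred_exit(meds: Dict[str, int]) -> str:
    """AS1 prefers the advertisement with the lowest MED."""
    return min(meds, key=meds.get)


def changed_advertisements(old: Dict[str, int], new: Dict[str, int]) -> Set[str]:
    """Border routers that must re-announce because their MED changed."""
    return {router for router in new if new[router] != old.get(router)}


before = meds_from_igp({"R3": 20, "R4": 10})  # made-up IGP distances
print(preferred_exit(before))                 # R4, as on the slide

# An internal link-cost change inside AS2 makes R4 farther from R5.
after = meds_from_igp({"R3": 20, "R4": 30})
print(changed_advertisements(before, after))  # R4 re-announces the same prefix
print(preferred_exit(after))                  # AS1 now switches to R3
```

Each such re-announcement carries a different attribute for an unchanged prefix, so a monitor in the core counts it as an AADiff even though nothing changed between the two AS's.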

  49. More Results

  50. Conclusions • Improvement • Routing update messages reduced by an order of magnitude • Suppressed pathological withdrawals • Instability is still well distributed across AS and prefix space • More bugs in router software led to anomalies
