
February 2008

Would Diversity Really Increase the Robustness of the Routing Infrastructure against Software Defects? The answer is: Yes. February 2008. Juan Caballero, Theocharis Kampouris (Carnegie Mellon); Dawn Song (Carnegie Mellon & UC Berkeley); Jia Wang (AT&T Labs).


Presentation Transcript


  1. Would Diversity Really Increase the Robustness of the Routing Infrastructure against Software Defects? The answer is: Yes February 2008 Juan Caballero, Theocharis Kampouris Carnegie Mellon Dawn Song Carnegie Mellon & UC Berkeley Jia Wang AT&T Labs

  2. Software defects in routers • Defects in router software not uncommon • Multiple vulnerabilities in routers uncovered • DoS: maliciously crafted packets cause reload [CERT5] • DoS: maliciously crafted packets cause excessive resource consumption [CERT2,CERT4] • Remote execution of system-level commands [CERT3] • Unauthorized privileged access [CERT1] • Possible remote shell execution [CERT7]

  3. Simultaneous router failure • Routing infrastructure highly homogeneous • What if a software defect makes it possible to simultaneously take down many routers? • Worst case scenario. Rare. • But, huge impact: highly damaging to ISP’s reputation • Diversity • Multiple implementations from different code bases • Reduces number of nodes affected by a bug [Zhang01, Junqueira05, O’Donnell04] • But, how well would it work on routers?

  4. Scope • We focus on the effect on network connectivity • Impact on higher layers left as future work • Includes: routing convergence, packet loss, delay… • Why? • Because no connectivity means no communication • What about fundamental limitations of diversity? • Vulnerabilities that are shared among vendors • General problem with no good solution • Deployment cost • Depends on how much diversity is already available

  5. Statement • This paper does not claim: • that diversity can protect against all software defects • that we should redesign all networks to accommodate diversity • Rather, we show: • that diversity greatly helps with simultaneous router failures • that networks might already have a surprising amount of diversity • But, it is not used to increase the robustness!

  6. Contributions • Answering four fundamental questions: • How do we measure robustness of a network against simultaneous router failures? • How to best use the diversity? • How much diversity is needed to guarantee a certain degree of robustness? • Is there enough diversity already in the network or do we need to introduce more?

  7. Problem definition • Graph theoretic approach G = (V,E) • Nodes are routers (V), Edges are links (E) • A version of a graph coloring problem where: • Colors represent implementations • A failure is a color removal • Different from well-known optimal coloring problem • Network Robustness = Resilience to simultaneous router failure • How connected is the network when multiple nodes fail? • The goal is to assign a color to each router from a set of k available colors such that the network robustness (Φ) is maximized

  8. Determining the best coloring • Abilene network with 2 colors (k = 2) • [Figure: four colorings of the Abilene network, with Φ = 0.23, Φ = 0.18, Φ = 0.05, and Φ = 0.42] • We want to automatically select the best coloring
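
For a very small graph, "selecting the best coloring" can be illustrated by exhaustive search. The sketch below is not the authors' algorithm: it assumes Φ is the worst-case normalized size of the largest component after removing one color, normalized by the original node count, and it runs on a hypothetical six-node ring instead of Abilene. All function names are illustrative.

```python
from itertools import product


def largest_component_size(nodes, edges):
    """Size of the largest connected component among `nodes` (0 if empty)."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in adj and v in adj:  # ignore edges touching removed nodes
            adj[u].add(v)
            adj[v].add(u)
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            n = stack.pop()
            size += 1
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        best = max(best, size)
    return best


def min_robustness(nodes, edges, coloring, colors):
    """Assumed Φ: worst-case surviving fraction over the removal of each color."""
    scores = []
    for c in colors:
        survivors = [n for n in nodes if coloring[n] != c]
        scores.append(largest_component_size(survivors, edges) / len(nodes))
    return min(scores)


def best_coloring(nodes, edges, colors):
    """Try all k^n colorings; only feasible for tiny graphs."""
    best, best_score = None, -1.0
    for assignment in product(colors, repeat=len(nodes)):
        coloring = dict(zip(nodes, assignment))
        score = min_robustness(nodes, edges, coloring, colors)
        if score > best_score:
            best, best_score = coloring, score
    return best, best_score


# Hypothetical topology: a ring of six routers, k = 2 colors.
ring_nodes = list(range(6))
ring_edges = [(i, (i + 1) % 6) for i in range(6)]
coloring, score = best_coloring(ring_nodes, ring_edges, ["red", "blue"])
print(coloring, score)
```

Exhaustive search grows as k^n, which is why heuristics such as the region coloring algorithms later in the talk are needed for ISP-sized topologies.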

  9. Outline • Introduction • Metrics • Connectivity • Robustness • Algorithms • Evaluation

  10. Metrics • Need metrics to quantify the robustness of the colored graph, i.e., its resilience to the failure • We need two types of metrics: • Connectivity metrics: Given a graph determine how connected it is • Many graph connectivity metrics already proposed • We select some existing ones • Robustness metrics: Given a colored graph determine how robust it is • We propose new ones • The robustness metrics will be a function of the connectivity metrics

  11. Outline • Introduction • Metrics • Connectivity • Robustness • Algorithms • Evaluation

  12. Connectivity metrics: NSLC • Given a graph determine how connected it is • Normalized size of largest component (NSLC) [Albert00] • [Figure: example A, 1 component, NSLC = 1; example B, 2 components, NSLC = 0.66]

  13. Connectivity metrics: PC • Pair Connectivity (PC) [Park03] • [Figure: example A, 1 component, PC = 1; example B, 2 components, PC = 0.33] • We have versions of the metrics that support node weights
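
The two connectivity metrics on slides 12 and 13 can be sketched in a few lines of Python. The sketch assumes a three-node example, which is consistent with the values on the slides (2/3 ≈ 0.66 and 1/3 ≈ 0.33) but is not taken from the original figure. PC is assumed here to be the fraction of node pairs joined by some path. The function names are illustrative and unrelated to the authors' implementation, and node weights are not modeled.

```python
from collections import defaultdict


def components(nodes, edges):
    """Connected components of an undirected graph, as a list of sets."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def nslc(nodes, edges):
    """Normalized size of the largest component: |largest| / |V|."""
    if not nodes:
        return 0.0  # convention chosen for this sketch
    return max(len(c) for c in components(nodes, edges)) / len(nodes)


def pair_connectivity(nodes, edges):
    """Fraction of node pairs that lie in the same component."""
    n = len(nodes)
    if n < 2:
        return 0.0  # convention chosen for this sketch
    total_pairs = n * (n - 1) // 2
    connected_pairs = sum(len(c) * (len(c) - 1) // 2
                          for c in components(nodes, edges))
    return connected_pairs / total_pairs


# Assumed example: three routers in a line, then the same graph with one link cut.
nodes = ["r1", "r2", "r3"]
connected = [("r1", "r2"), ("r2", "r3")]
split = [("r1", "r2")]

print(nslc(nodes, connected), pair_connectivity(nodes, connected))
print(nslc(nodes, split), pair_connectivity(nodes, split))
```

PC drops faster than NSLC when a graph fragments, because it counts lost pairs instead of lost nodes.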

  14. Outline • Introduction • Metrics • Connectivity • Robustness • Algorithms • Evaluation

  15. Robustness metrics • Robustness of a colored graph measures the remaining connectivity when a color is removed • Remove a color => Disconnect all nodes using the color • Robustness is a function of the connectivity metric f applied over the different color-removal subgraphs • Probability of failure of each color is unknown • Two metrics: average and minimum (worst-case)
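
A minimal sketch of the average and minimum robustness, assuming the connectivity metric f is NSLC and that it is normalized by the node count of the original graph (an assumption, though it matches the values on slide 16, which sum to 1). The four-router line topology and the function names are hypothetical.

```python
def largest_component_size(nodes, edges):
    """Size of the largest connected component among `nodes` (0 if empty)."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in adj and v in adj:  # drop links attached to removed routers
            adj[u].add(v)
            adj[v].add(u)
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            n = stack.pop()
            size += 1
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        best = max(best, size)
    return best


def robustness(nodes, edges, coloring):
    """Return (average, minimum) NSLC over the removal of each color in use."""
    scores = []
    for color in sorted(set(coloring.values())):
        survivors = [n for n in nodes if coloring[n] != color]
        scores.append(largest_component_size(survivors, edges) / len(nodes))
    return sum(scores) / len(scores), min(scores)


# Hypothetical topology: four routers in a line, a - b - c - d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]

contiguous = {"a": "red", "b": "red", "c": "blue", "d": "blue"}
alternating = {"a": "red", "b": "blue", "c": "red", "d": "blue"}
lopsided = {"a": "red", "b": "blue", "c": "blue", "d": "blue"}

for name, coloring in [("contiguous", contiguous),
                       ("alternating", alternating),
                       ("lopsided", lopsided)]:
    print(name, robustness(nodes, edges, coloring))
```

In this toy case the contiguous and lopsided colorings have the same average but different minimums, which is why both metrics are reported.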

  16. Minimum and average robustness • Average robustness good • Minimum robustness bad • Average robustness can be misleading by itself • [Figure: colored graph G2 and its color-removal subgraphs, G2red with NSLC = 0.18 and G2blue with NSLC = 0.82]
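
Assuming the average is the unweighted mean over the two color removals (the slides do not spell out the formula, and the subscripted Φ notation is introduced here only for illustration), the two NSLC values on this slide work out as:

```latex
\[
\Phi_{\text{avg}} = \frac{0.18 + 0.82}{2} = 0.50,
\qquad
\Phi_{\text{min}} = \min(0.18,\ 0.82) = 0.18
\]
```

An average of 0.50 looks acceptable, yet one of the two color failures leaves only 18% of the nodes in the largest component.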

  17. Outline • Introduction • Metrics • Algorithms • Evaluation

  18. Algorithms • We have devised a total of 9 algorithms which can be classified into 4 families • Only the Region coloring algorithms are presented in the paper • The rest are in the extended version [ColoringTR] • Region coloring algorithms outperform others in evaluation

  19. Region coloring algorithms • Divide the network into contiguous regions • Regions are automatically found • Includes 2 algorithms: Cluster & Partition • Algorithms accept number of regions (k) as input • Graph partitioning algorithms try to balance the number of nodes in each partition (i.e., region) • [Figure: network divided into Region 1 and Region 2]

  20. Results overview • There is usually a trade-off between perfectly balanced partitions and contiguous partitions • Results will show that: • Balanced regions are better • Slightly imbalanced but contiguous partitions are better than perfectly balanced but discontiguous partitions • [Figure: good partition (Region 1, Region 2); bad partition (Region 1, Region 2, Region 1)]

  21. Roles and Replicated nodes • Roles: • Not all routers can use all implementations • Two roles: Access / Backbone • One color-set for each role • Nodes have roles and can only use implementations from the color-set of their role • Replicated nodes: • ISPs usually replicate important nodes • Increases resilience against single node failures • Load-sharing • In real networks, replicas are colored identically • For robustness, replicas need to be colored differently
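
The two constraints on this slide (role-specific color-sets, and replicas colored differently) can be expressed as a simple validity check on a candidate coloring. This is a minimal sketch under assumed data structures; the function name, the vendor labels, and the router names are all hypothetical.

```python
def coloring_violations(coloring, roles, color_sets, replica_groups):
    """List the ways a coloring breaks the role and replica constraints.

    coloring:       node -> color (implementation)
    roles:          node -> "access" or "backbone"
    color_sets:     role -> set of colors allowed for that role
    replica_groups: iterable of node groups that replicate each other
    """
    problems = []
    for node, color in coloring.items():
        role = roles[node]
        if color not in color_sets[role]:
            problems.append(f"{node}: {color} not allowed for role {role}")
    for group in replica_groups:
        used = [coloring[n] for n in group]
        if len(set(used)) < len(used):
            problems.append(f"replicas {sorted(group)} share a color")
    return problems


# Hypothetical example: two replicated backbone routers and one access router.
roles = {"bb1": "backbone", "bb2": "backbone", "acc1": "access"}
color_sets = {"backbone": {"vendorA", "vendorB"}, "access": {"vendorC"}}
replicas = [{"bb1", "bb2"}]

good = {"bb1": "vendorA", "bb2": "vendorB", "acc1": "vendorC"}
bad = {"bb1": "vendorA", "bb2": "vendorA", "acc1": "vendorA"}

print(coloring_violations(good, roles, color_sets, replicas))
print(coloring_violations(bad, roles, color_sets, replicas))
```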

  22. Extended Partition Algorithm • Color all backbone routers • Create backbone graph by removing all access routers • First color replicas with different colors • Then color rest using partition algorithm • Color the access routers • Create the access graph by collapsing all backbone nodes into a single node • Two cases depending on independence of access / backbone implementations
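
The two graph constructions named on this slide can be sketched as follows. Only the construction of the backbone graph and the access graph is shown; the partitioning step itself is not reproduced. The function names, the supernode label, and the example topology are assumptions for illustration.

```python
def backbone_graph(edges, roles):
    """Backbone graph: drop all access routers and any link touching them."""
    nodes = {n for n, role in roles.items() if role == "backbone"}
    kept = {frozenset((u, v)) for u, v in edges if u in nodes and v in nodes}
    return nodes, kept


def access_graph(edges, roles, supernode="BACKBONE"):
    """Access graph: collapse every backbone router into one supernode."""
    def rename(n):
        return supernode if roles[n] == "backbone" else n

    nodes = {rename(n) for n in roles}
    collapsed = set()
    for u, v in edges:
        a, b = rename(u), rename(v)
        if a != b:  # backbone-to-backbone links vanish inside the supernode
            collapsed.add(frozenset((a, b)))
    return nodes, collapsed


# Hypothetical topology: two backbone routers and three access routers.
roles = {"b1": "backbone", "b2": "backbone",
         "a1": "access", "a2": "access", "a3": "access"}
edges = [("b1", "b2"), ("a1", "b1"), ("a2", "b2"), ("a3", "a2")]

bb_nodes, bb_edges = backbone_graph(edges, roles)
acc_nodes, acc_edges = access_graph(edges, roles)
print(sorted(bb_nodes), sorted(acc_nodes))
```

Collapsing the backbone into one node lets the access routers be partitioned by how they attach to the core, without the backbone's internal structure influencing the regions.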

  23. Outline • Introduction • Metrics • Algorithms • Evaluation

  24. Evaluation Setup • [Table: evaluation topologies, grouped as Real, Rocketfuel, and Synth.] • Metrics + algorithms implemented using the JUNG graph library [JUNG] • Graph clustering algorithm from Wu et al. [Wu04] • Graph partition algorithm from Karypis et al. [Karypis00]

  25. Coloring Algorithms: Setup • Same topology (Tier-1 ISP) colored using different algorithms • Random as “lower bound” • Max as “upper bound”

  26. Coloring Algorithms: Results • Partition/Cluster best on average • Region coloring minimizes impact • Partition best on worst case • More balanced coloring than Cluster • Partition performs close to Max in both average/worst cases • Non-contiguous partitions are bad (dip at k=5)

  27. Redistributing the existing diversity • Tier-1 ISP contains 8 implementations (2 backbone, 6 access) • Due to: legacy routers, vendor change, budget constraints • Two implementations used by 90% of the nodes • What happens if we redistribute the same diversity using our algorithms? • Number of nodes in largest component goes from 5% to 76% • Requires: • Changing the number of nodes that use each implementation • Changing the geographical distribution of the implementations

  28. Minimal diversity for decent robustness • Two colors are enough for the backbone • Most backbone routers are replicated • Decent robustness starts with 3 colors for access routers • Using more than 5 colors for access routers does not buy much

  29. Related Work • Diversity as solution against software defects • Diversity in all network layers [Zhang01] • Diversity in distributed systems [Junqueira05] • Diversity to slow malware propagation [O’Donnell04] • Analysis of the Internet robustness [Albert00, Faloutsos99, Li04, Magoni03, Palmer01, Park03, Tangmunarunkit02, Zegura97] • Analysis of failures in networks [Markopoulou04, NIST02] • Router-level topologies [Spring02] • Node Importance metrics [Freeman77, Lorrain71, Newman02, Tauro01] • Clustering and Partitioning [Karypis00, Wu04, etc.]

  30. Conclusions • How do we measure robustness of a network against simultaneous router failures? • Proposed robustness metrics • How to use the diversity best? • Proposed coloring algorithms that achieve robustness close to the one obtained by a fully connected network • How much diversity is needed to guarantee a certain degree of robustness? • Not much. 2 backbone + 3 access for Tier-1 ISP • Is there enough diversity already in the network or do we need to introduce more? • Amount of diversity surprisingly high • Redistributing the diversity can increase the number of nodes surviving a failure from 5% to 76%

  31. Questions?
