
Optimization-Based Approaches to Understanding and Modeling Internet Topology


Presentation Transcript


  1. Optimization-Based Approaches to Understanding and Modeling Internet Topology David Alderson California Institute of Technology INFORMS Telecom Conference 2004 March 8, 2004

  2. Acknowledgments • This talk represents joint work with John Doyle (Caltech), Lun Li (Caltech), and Walter Willinger (AT&T—Research) • Many others have contributed to this story • Reiko Tanaka (Caltech) • Steven Low (Caltech) • Ramesh Govindan (USC) • Neil Spring (U.Washington) • Stanislav Shalunov (Abilene) • Heather Sherman (CENIC) • John Dundas (Caltech)

  3. Today’s Agenda • Review recent work of empiricists and theoreticians to understand the router-level topology of the Internet • Understand the causes and implications of “heavy tails” in the complex structure of the Internet • Illustrate how recently popular “scale-free” models of Internet topology are not just wrong, but wildly so • Describe the importance of optimization in the development of explanatory models of Internet topology • Present the HOT framework as an alternative means to understanding the “robust, yet fragile” structure of the Internet and other complex engineering systems • Highlight some open research problems and areas where contributions can be made by the OR/MS community

  4. First Question: Why should we care about modeling the topology of the Internet?

  5. Understanding the topology of the current (and future) Internet is important for many reasons • Design and evaluation of networking protocols • Topology affects performance, not correctness • Understanding large-scale network behavior • Closed-loop feedback: topology design vs. protocols • Is the current design a result of the dominant routing protocols? • Or are the presently used routing protocols the result of some prevailing network design principles? • Ability to study what-if scenarios • Operating policy shifts • Economic changes in the ISP market • Implications for tomorrow’s networks • Provisioning requirements, Traffic engineering, Operating and management policies

  6. Performance Evaluation via Simulation [diagram: protocol info, network topology info (connectivity, bandwidths, delays), and application-specific traffic demand info feed a network simulator (e.g. ns2), which produces performance measures] (Traditional) topology generators provide only connectivity information!

  7. Performance Evaluation via Simulation [diagram: protocol info, the network topology (connectivity, bandwidths, delays), and traffic demand info feed a network simulator (e.g. ns2), which produces performance measures] What is needed is an annotated network graph!

  8. The Internet as an Infrastructure As the Internet has grown in capability and importance, we have become increasingly dependent on it. • Directly: communication (email, instant messenger, VOIP), information and entertainment, e-commerce • Indirectly: business, education, government have (permanently) replaced physical/manual methods with electronic processes, many of which rely on the Internet. The central importance, open architecture, and evolving technology landscape make the Internet an attractive target for asymmetric attack.

  9. The Internet as a Case Study The Internet is a great starting point for the study of other highly engineered complex network systems: • To the user, it creates the illusion of a simple, robust, homogeneous resource enabling endless varieties and types of technologies, physical infrastructures, virtual networks, and applications (heterogeneous). • Its complexity is starting to approach that of simple biological systems • Our understanding of the underlying technology together with the ability to perform detailed measurements means that most conjectures about its large-scale properties can be unambiguously resolved, though often not without substantial effort.

  10. Question Two: Why is research on Internet topology interesting/difficult?

  11. A Challenging Problem Since the decommissioning of the NSFNet in 1995, it has been difficult to obtain comprehensive knowledge about the topology of the Internet • The network has grown dramatically (number of hosts, amount of traffic, number of ISPs, etc.) • There have been economic incentives for ISPs to maintain secrecy about their topologies • Direct inspection usually not allowed

  12. Qwest US Fiber Map – June 2001 [map, with links to Europe, Japan, and Hawaii] Source: www.qwest.com

  13. Sprint backbone

  14. Measuring Topology • Marketing documents not very helpful • The task of “discovering” the network has been left to experimentalists • Must develop sophisticated methods to infer this topology from appropriate network measurements. • Many possible measurements that can be made.

  15. Router-Level Topology [diagram: routers and hosts] • Nodes are machines (routers or hosts) running the IP protocol • Measurements taken from traceroute experiments that infer topology from traffic sent over the network • Subject to sampling errors and bias • Requires careful interpretation
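
As an aside (not part of the original slides), the kind of traceroute measurement referred to here can be scripted in a few lines. This is a minimal sketch, assuming a Unix-like host with the standard traceroute utility installed; the target hostname is an arbitrary example.

```python
import re
import subprocess

def traceroute_hops(host: str) -> list:
    """Run the system traceroute (numeric output) and return the router IPs seen along the path."""
    result = subprocess.run(
        ["traceroute", "-n", host],
        capture_output=True, text=True, timeout=120,
    )
    hops = []
    for line in result.stdout.splitlines()[1:]:  # skip traceroute's header line
        match = re.search(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", line)
        hops.append(match.group(0) if match else None)  # None marks a non-responding hop ("* * *")
    return hops

if __name__ == "__main__":
    # Edges between consecutive responding hops give one (partial, biased) view of the router graph.
    path = traceroute_hops("www.caltech.edu")
    edges = [(a, b) for a, b in zip(path, path[1:]) if a and b]
    print(edges)
```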

  16. AS Topology [diagram: AS1, AS2, AS3, AS4 connected by peering links] • Nodes are entire networks (ASes) • Links = peering relationships between ASes • Relationships inferred from Border Gateway Protocol (BGP) information • Really a measure of business relationships, not network structure

  17. Measuring Topology • Marketing documents not very helpful • The task of “discovering” the network has been left to experimentalists • Must develop sophisticated methods to infer this topology from appropriate network measurements. • Many possible measurements that can be made. • Each type of measurement has its own strengths, weaknesses, and idiosyncrasies, and results in a distinct view of the network topology. • Hard to know what “matters”…

  18. “Standard” Approach • Choose a sequence of well-understood metrics or observed features of interest, such as • hierarchy • node-degree distributions • clustering coefficients • Develop a method that matches these metrics Pros: • Always possible to obtain a good “fit” on a chosen metric. Cons: • Hard to choose the “right” metric. What is “right” is apt to vary, depending on the intended use of the topology. • A method that does a good job of matching the chosen metric often does not fit other metrics well. • No predictive power. We call this approach descriptive (evocative) modeling.
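
For illustration only (not from the talk), the kinds of metrics listed above are straightforward to compute once a topology is in hand. A minimal sketch using the networkx library, run here on a stand-in random graph in place of a measured topology:

```python
import networkx as nx

def descriptive_metrics(G: nx.Graph) -> dict:
    """Compute a few of the 'standard' metrics a descriptive model might try to match."""
    degrees = sorted((d for _, d in G.degree()), reverse=True)
    return {
        "degree_sequence": degrees,                  # basis for the node-degree distribution
        "avg_clustering": nx.average_clustering(G),  # clustering coefficient
        "diameter": nx.diameter(G) if nx.is_connected(G) else None,
    }

# Stand-in graph; in practice G would come from traceroute- or BGP-derived measurements.
G = nx.erdos_renyi_graph(n=200, p=0.03, seed=1)
print(descriptive_metrics(G))
```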

  19. An Alternative Approach • Identify the causal forces at work in the design and evolution of real topologies. • Develop methods that generate and evolve topologies in a manner consistent with these forces. Pros: • Ability to generate (and design!) topologies at different levels of hierarchy • More realistic topology • Greater predictive power • Possibly reveal some relationship with routing protocols Cons: • Difficult to identify the causal forces • Requires careful development and diligent validation We call this approach explanatory modeling.

  20. Our Approach: Focus on the ISP • Capture and represent realistic drivers of Internet deployment and operation at the level of the single ISP • Many important networking issues are relevant at the level of the ISP (e.g. configuration, management, pricing, provisioning) • Common topologies represented in terms of ISPs • Router-level graphs → connectivity within the ISP • AS graphs → connectivity between ISPs • First-Order Objective: the ability to generate a “realistic, but fictitious” ISP topology at different levels of hierarchy

  21. ISP Driving Forces • Economic Factors: • Cost of procuring, installing, and maintaining the necessary facilities and equipment • Limited budget for capital expenditures • Need to balance expenditures with revenue streams • Need to leverage investment in existing infrastructure • Location of customers • Technological Factors: • Hardware constraints (e.g. router speeds, limited # interfaces or line cards per router) • Level 2 Technologies (Sonet, ATM, WDM) • Existing legacy infrastructure • Location and availability of dark fiber

  22. Mathematical Framework • Use combinatorial optimization to represent the problem and its constraints • Objectives (min cost, max profitability, satisfy demand) • Constraints (equipment costs/capacities, legacy infrastructure) • Parameters (pricing, provisioning, facility location) • Study and explore how this framework allows for a range of ISP behavior • Effect of objectives and constraints are the most important • Part of a more general framework • Highly Optimized Tolerance (HOT), Carlson and Doyle, 1999
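
To make this concrete, here is one illustrative way such a combinatorial design problem could be written down (a sketch under assumed notation, not the authors' actual model): binary variables x_e select which candidate links to build, flows f^d_e route forecast demands d, c_e are link costs, u_e link capacities, and B the capital budget.

```latex
\begin{align*}
\min_{x,\,f} \quad & \sum_{e \in E} c_e x_e
  && \text{(procurement and installation cost)} \\
\text{s.t.} \quad
  & \sum_{e \in \delta^+(v)} f^d_e - \sum_{e \in \delta^-(v)} f^d_e = b^d_v
  && \forall v, d \quad \text{(each demand is routed)} \\
  & \sum_{d} f^d_e \le u_e\, x_e
  && \forall e \quad \text{(capacity only on built links)} \\
  & \sum_{e \in E} c_e x_e \le B, \qquad x_e \in \{0,1\}, \quad f^d_e \ge 0
  && \text{(capital budget)}
\end{align*}
```

Legacy infrastructure, equipment (router) constraints, and facility-location decisions would enter as additional constraints or fixed variables in the same framework.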

  23. Question Three: How is research on Internet topology different from what OR/MS researchers are used to doing on other complex engineering networks?

  24. A Different Type of Problem • Researchers in network optimization are used to problems in which the objectives and constraints are well-defined • Here, we are trying to uncover the “most significant” drivers of topology evolution so that we can create “fictitious, yet realistic” network counterparts, and also so that we can design and build improved networks • We will need to iterate between modeling, measurement, and analysis to get it right • In this sense, this is a bit more like biology • Recent progress has given hope that a comprehensive theory for the Internet may be possible

  25. Heterogeneity of the Internet • Full of “high variability” • Link bandwidth: Kbps – Gbps • File sizes: a few bytes – Mega/Gigabytes • Flows: a few packets – 100,000+ packets • In/out-degree (Web graph): 1 – 100,000+ • Delay: Milliseconds – seconds and beyond • Diversity in the technologies that comprise the physical and link layers • Diversity in the applications and services that are supported • This heterogeneity has evolved organically from an architecture that was designed to be robust to changes (failures or innovation) and is permanent

  26. The Internet hourglass [diagram] — Applications: Web, FTP, Mail, News, Video, Audio, ping, Kazaa • Transport protocols: TCP, SCTP, UDP, ICMP • IP • Link technologies: Ethernet, 802.11, power lines, ATM, optical, satellite, Bluetooth

  27. The Internet hourglass [diagram, with TCP at the waist] — Applications: Web, FTP, Mail, News, Video, Audio, ping, Kazaa • TCP • IP • Link technologies: Ethernet, 802.11, power lines, ATM, optical, satellite, Bluetooth

  28. The Internet hourglass [diagram] — “Everything on IP” above the waist (Applications: Web, FTP, Mail, News, Video, Audio, ping, Kazaa; TCP) and “IP on everything” below it (Link technologies: Ethernet, 802.11, power lines, ATM, optical, satellite, Bluetooth)

  29. A Theory for the Internet? [diagram: protocol stack — Applications / TCP-AQM / IP / Link] • Vertical decomposition of the protocol stack allows for the treatment of layers in isolation (a separation theorem) • Assume that layers not considered perform in a near-optimal manner • Use an engineering design-based perspective in two ways: • Analysis: explain the complex structure that is observed • Synthesis: suggest changes or improvements to the current design

  30. A Theory for the Internet? [stack diagram: Applications / TCP-AQM / IP / Link, with the Applications layer in focus] How to design an application that “performs well” in meeting user demands, subject to the resources/constraints made available by TCP/IP?

  31. A Theory for the Internet? [stack diagram: Applications / TCP-AQM / IP / Link, with the IP layer in focus] How to design a network that “performs well” and satisfies traffic demands, subject to the physical resources/constraints?

  32. A Theory for the Internet? [stack diagram: Applications give ? — TCP/AQM gives IP] If TCP/AQM is the answer, what is the question? Primal/dual model of TCP/AQM congestion control… (sketched below)
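
The primal/dual model alluded to here is the standard network utility maximization view of TCP/AQM congestion control (Kelly et al. 1998; Low and Lapsley 1999). In the usual notation, which the slide does not spell out, with source rates x_s, source utilities U_s, routing matrix R, and link capacities c:

```latex
\max_{x \ge 0} \; \sum_{s} U_s(x_s)
\qquad \text{subject to} \qquad R x \le c .
```

TCP plays the primal role (sources adapt their rates x_s), while AQM plays the dual role (links adapt congestion prices, the Lagrange multipliers of the capacity constraints, which sum along each route).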

  33. A Theory for the Internet? [stack diagram: Applications / TCP-AQM / IP / Link, with a “?” at the IP layer] If the current topology of the Internet is the answer, what is the question?

  34. Next Question: What’s been done to try and understand the large-scale structure of the Internet? How should we think about the Internet’s router-level topology?

  35. Trends in Topology Modeling (Observation → Modeling Approach) • Long-range links are expensive (router-level) → Random graph models (Waxman, 1988) generate connectivity-only topologies • Real networks are not random, but have obvious hierarchy (router-level) → Structural models (GT-ITM, Calvert/Zegura, 1996) generate connectivity-only topologies with inherent hierarchy • Router-level and AS graphs exhibit heavy-tailed distributions (power laws) in characteristics such as node degree → Degree-based models (including popular “scale-free” models) generate connectivity-only topologies with inherent power laws in the node-degree distribution

  36. Power Laws and Internet Topology Observed scaling in node degree and other statistics: • Autonomous System (AS) graph • Router-level graph [log-log plots of number of connections vs. rank: a few nodes have lots of connections, most nodes have few connections. Source: Faloutsos et al. (1999)] How to account for high variability in node degree?
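
As a side note (not on the slide), rank plots like those of Faloutsos et al. are easy to reproduce for any measured edge list; a roughly straight line on log-log axes is the signature of the scaling described above. A minimal sketch using matplotlib, with a placeholder edge list standing in for real measurements:

```python
import matplotlib.pyplot as plt

def degree_rank(edges):
    """Return node degrees sorted in decreasing order (rank 1 = highest degree)."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sorted(deg.values(), reverse=True)

# In practice, edges would come from AS- or router-level measurements.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]
degrees = degree_rank(edges)
plt.loglog(range(1, len(degrees) + 1), degrees, marker="o", linestyle="none")
plt.xlabel("rank")
plt.ylabel("number of connections (degree)")
plt.show()
```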

  37. Power Laws in Topology Modeling • Recent emphasis has been on whether or not a given topology model/generator can reproduce the same types of macroscopic statistics, especially power law-type degree distributions • Lots of degree-based models have been proposed • All of them are based on random graphs, usually with some form of preferential attachment • All of them are connectivity-only models and tend to ignore engineering-specific system details • Examples: BRITE, INET, Barabasi-Albert, GLP, PLRG, CMU-generator
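
For concreteness, the preferential-attachment mechanism underlying most of these degree-based generators can be sketched in a few lines. This is an illustrative toy, not a faithful implementation of any generator named above: each new node attaches to existing nodes with probability proportional to their current degree.

```python
import random

def preferential_attachment(n: int, m: int = 2, seed: int = 0):
    """Grow a graph to n nodes; each new node attaches m links,
    choosing targets with probability proportional to current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]      # seed graph: a single edge
    targets = [0, 1]      # node list with multiplicity equal to degree,
                          # so uniform sampling from it is degree-proportional
    for new in range(2, n):
        chosen = set()
        while len(chosen) < min(m, new):
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.append((new, t))
            targets.extend([new, t])
    return edges

# The resulting degree sequence is heavy-tailed (power-law-like), which is exactly
# the macroscopic statistic these connectivity-only models are built to reproduce.
edges = preferential_attachment(10_000)
```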

  38. Models of Internet Topology • These topology models are merely descriptive • Measure some feature of interest (connectivity) • Develop a model that replicates that feature • Make claims about the similarity between the real system and the model • A type of “curve fitting”? • Unfortunately, by focusing exclusively on the node-degree distribution, these models get the story wrong • We seek something that is explanatory • Consistent with the drivers of topology design and deployment • Consistent with the engineering-related details • Can be verified through the measurement of appropriate system-specific details

  39. Our Perspective • Must consider the explicit design of the Internet • Protocol layers on top of a physical infrastructure • Physical infrastructure constrained by technological and economic limitations • Emphasis on network performance • Critical role of feedback at all levels • Consider the ability to match large scale statistics (e.g. power laws) as secondary evidence of having accounted for key factors affecting design

  40. Trends in Topology Modeling (Observation → Modeling Approach) • Long-range links are expensive (router-level) → Random graph models (Waxman, 1988) generate connectivity-only topologies • Real networks are not random, but have obvious hierarchy (router-level) → Structural models (GT-ITM, Calvert/Zegura, 1996) generate connectivity-only topologies with inherent hierarchy • Router-level and AS graphs exhibit heavy-tailed distributions (power laws) in characteristics such as node degree → Degree-based models (including popular “scale-free” models) generate connectivity-only topologies with inherent power laws in the node-degree distribution • Physical networks have hard technological (and economic) constraints → Optimization-driven models generate annotated topologies consistent with the design tradeoffs of network engineers

  41. HOT = Highly / Heavily / Heuristically Optimized / Organized Tolerance / Tradeoffs • Based on ideas of Carlson and Doyle (1999) • The complex structure (including power laws) of highly engineered technological (and biological) systems is viewed as the natural by-product of tradeoffs between system-specific objectives and constraints • Non-generic, highly engineered configurations are extremely unlikely to occur by chance • The result is “robust, yet fragile” system behavior

  42. Heuristic Network Design What factors dominate network design? • Economic constraints • User demands • Link costs • Equipment costs • Technology constraints • Router capacity • Link capacity

  43. Internet End-User Bandwidths [log-log plot of Connection Speed (Mbps) vs. Rank (number of users): high-performance computing on POS/Ethernet at 1-10 Gbps, academic and corporate users on Ethernet at 10-100 Mbps, residential and small business on broadband cable/DSL at ~500 Kbps, and dial-up at ~56 Kbps] A few users have very high speed connections; most users have low speed connections. How to build a network that satisfies these end-user demands?

  44. Economic Constraints • Network operators have a limited budget to construct and maintain their networks • Links are tremendously expensive • Tremendous drive to operate network so that traffic shares the same links • Enabling technology: multiplexing • Resulting feature: traffic aggregation at edges • Diversity of technologies at network edge (Ethernet, DSL, broadband cable, wireless) is evidence of the drive to provide connectivity and aggregation using many media types

  45. Heuristically Optimal Network [diagram: hosts, edge routers, and core routers] • Mesh-like core of fast, low-degree routers • High-degree nodes are at the edges

  46. Heuristically Optimal Network Claim: economic considerations alone suggest a structure having • Mesh-like core of high-speed, low degree routers • High degree, low-speed nodes at the edge • Is this consistent with technology capability? • Is this consistent with real network design?
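
A toy construction in the spirit of this claim (purely illustrative; the node counts, capacities, and naming are invented for the sketch, not taken from the talk): a sparse, low-degree mesh core whose routers each feed high-degree access routers that aggregate many low-speed hosts.

```python
import networkx as nx

def toy_hot_network(core_size=4, access_per_core=3, hosts_per_access=20):
    """Build a toy 'heuristically optimal' topology: a sparse mesh of low-degree
    core routers, with high-degree access routers aggregating hosts at the edge."""
    G = nx.Graph()
    cores = [f"core{i}" for i in range(core_size)]
    # Sparse mesh core: a ring plus one chord (low degree, high speed).
    for i in range(core_size):
        G.add_edge(cores[i], cores[(i + 1) % core_size], capacity="10Gbps")
    G.add_edge(cores[0], cores[core_size // 2], capacity="10Gbps")
    # Edge: each access router hangs off one core router and aggregates many hosts.
    for c in cores:
        for a in range(access_per_core):
            access = f"{c}-acc{a}"
            G.add_edge(c, access, capacity="1Gbps")
            for h in range(hosts_per_access):
                G.add_edge(access, f"{access}-host{h}", capacity="10Mbps")
    return G

G = toy_hot_network()
# The highest-degree nodes are the access routers at the edge; core routers stay low degree.
print(sorted(G.degree(), key=lambda nd: -nd[1])[:5])
```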

  47. Cisco 12000 Series Routers • Modular in design, creating flexibility in configuration. • Router capacity is constrained by the number and speed of line cards inserted in each slot. Source: www.cisco.com

  48. Cisco 12000 Series Routers Technology constrains the number and capacity of line cards that can be installed, creating a feasible region.
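
One simple way to abstract that feasible region (an illustrative formulation, not Cisco's published specification): for a router with S slots, at most p ports per line card, per-port speed at most B_max, and total switching capacity C, a configuration with degree d and per-port bandwidth b is feasible only if

```latex
b \;\le\; \min\!\left( B_{\max}, \; \frac{C}{d} \right),
\qquad d \;\le\; S \, p .
```

So, within a given router generation, supporting many connections necessarily means supporting each one at lower speed, which is the technology-driven reason for placing high-degree nodes at the low-speed edge.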

  49. Cisco 12000 Series Routers Pricing info: State of Washington Master Contract, June 2002 (http://techmall.dis.wa.gov/master_contracts/intranet/routers_switches.asp) [chart: list prices for different chassis/line-card configurations, ranging from $128,500 to $2,762,500]
