
Presentation Transcript


  1. Scaling routers: Where do we go from here? HPSR, Kobe, Japan May 28th, 2002 Nick McKeown Professor of Electrical Engineering and Computer Science, Stanford University nickm@stanford.edu www.stanford.edu/~nickm Nick McKeown

  2. [Growth chart] Router capacity grows x2.2 every 18 months; Moore's law: x2 every 18 months. Nick McKeown

  3. [Growth chart] Router capacity grows x2.2 every 18 months; Moore's law: x2 every 18 months; DRAM access rate: x1.1 every 18 months. Nick McKeown

  4. Router vital statistics • Cisco GSR 12416: capacity 160 Gb/s, power 4.2 kW, 19" wide, 6 ft tall, 2 ft deep. • Juniper M160: capacity 80 Gb/s, power 2.6 kW, 19" wide, 3 ft tall, 2.5 ft deep. Nick McKeown

  5. [Growth chart] Internet traffic doubles every year (x2/yr) while router capacity grows x2.2 every 18 months; the cumulative gap is roughly 5x. Nick McKeown

  6. Fast (large) routers • Big POPs need big routers [Figure: a POP built from many smaller routers vs. a POP built from a few large routers] • Interfaces: price >$200k, power >400W. • About 50-60% of interfaces are used for interconnection within the POP. • Industry trend is towards a large, single router per POP. Nick McKeown

  7. Job of the router architect • For a given set of features: maximize capacity, within a given power and volume budget. Nick McKeown

  8. Mind the gap • Operators are unlikely to deploy 5 times as many POPs, or make them 5 times bigger, with 5 times the power consumption. • Our options: • Make routers simple • Use more parallelism • Use more optics Nick McKeown

  9. Mind the gap • Operators are unlikely to deploy 5 times as many POPs, or make them 5 times bigger, with 5 times the power consumption. • Our options: • Make routers simple • Use more parallelism • Use more optics Nick McKeown

  10. Make routers simple • We tell our students that Internet routers are simple: all a router does is make a forwarding decision, update a header, then forward the packet to the correct outgoing interface. • But I don't understand them anymore. • The list of required features is huge and still growing, • Software is complex and unreliable, • Hardware is complex and power-hungry. Nick McKeown

  11. Router linecard [Block diagram of an OC192c linecard: optics; physical layer; framing & maintenance; packet processing with lookup tables; buffer management & scheduling with buffer & state memory; scheduler] • 30M gates • 2.5 Gbits of memory • 1 m² of board area • $25k cost, $200k price. Nick McKeown

  12. Things that slow routers down • 250 ms of buffering • Requires off-chip memory, more board space, pins and power (see the sizing arithmetic below). • Multicast • Affects everything! • Complicates design, slows deployment. • Latency bounds • Limits pipelining. • Packet sequence • Limits parallelism. • Small internal cell size • Complicates arbitration. • DiffServ, IntServ, priorities, WFQ etc. • Others: IPv6, drop policies, VPNs, ACLs, DoS traceback, measurement, statistics, … Nick McKeown
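The buffering bullet is worth a quick sanity check. Using the classic rule of thumb that a router needs about one round-trip time of buffering (buffer = RTT × line rate), and the 160 Gb/s linecard rate used later in the talk (slides 22 and 24), a minimal sketch:

```python
# Buffer sizing rule of thumb: one round-trip time at line rate.
line_rate_bps = 160e9   # 160 Gb/s linecard (slides 22 and 24)
rtt_s = 0.250           # the 250 ms of buffering cited on this slide

buffer_bits = line_rate_bps * rtt_s
print(f"buffer: {buffer_bits / 1e9:.0f} Gbits")  # 40 Gbits -- the figure on slide 24
```

The result, 40 Gbits, is exactly the buffer the linecard on slide 24 is asked to hold, which is why the memory must be off-chip DRAM.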

  13. An example: Packet processing [Chart: CPU instructions available per minimum-length packet, falling steadily since 1996; a rough reconstruction follows] Nick McKeown
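The shape of that chart can be reproduced with rough numbers. The line rates and CPU speeds below are illustrative assumptions, not the slide's actual data points; the point is that packet inter-arrival time shrinks faster than CPU cycle time:

```python
# Instructions available per minimum-length packet = CPU rate x inter-arrival time.
PKT_BITS = 40 * 8  # 40-byte minimum-length IP packet

# (year, line rate in Gb/s, CPU speed in MIPS) -- illustrative assumptions
for year, gbps, mips in [(1996, 0.622, 400), (2002, 10.0, 2000)]:
    arrival_ns = PKT_BITS / gbps              # bits / (Gb/s) comes out in ns
    instrs = mips * 1e6 * arrival_ns * 1e-9   # instructions per packet
    print(f"{year}: one packet every {arrival_ns:.0f} ns -> ~{instrs:.0f} instructions")
```

Even though the CPU gets 5x faster in this sketch, the instruction budget per packet falls from roughly 200 to roughly 64.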

  14. Reducing complexity: Conclusion • Need aggressive reduction in complexity of routers. • Get rid of irrelevant requirements and irrational tests. • It is not clear who has the right incentive to make this happen. • Else, be prepared for core routers to be replaced by optical circuit switches. Nick McKeown

  15. Mind the gap • Operators are unlikely to deploy 5 times as many POPs, or make them 5 times bigger, with 5 times the power consumption. • Our options: • Make routers simpler • Use more parallelism • Use more optics Nick McKeown

  16. Use more parallelism • Parallel packet buffers • Parallel lookups • Parallel packet switches • Things that make parallelism hard: • Maintaining packet order, • Making throughput guarantees, • Making delay guarantees, • Latency requirements, • Multicast. Nick McKeown

  17. Parallel Packet Switches [Diagram: a router with N external ports at rate R; each input is demultiplexed, bufferlessly, over k slower parallel packet switches and remultiplexed at the N outputs] Nick McKeown

  18. Characteristics • Advantages (per switch plane, with k planes): • 1/k the memory bandwidth, • 1/k the lookup/classification rate, • 1/k the routing/classification table size. • With appropriate algorithms: • Packets remain in order, • 100% throughput, • Delay guarantees (at least in theory). Nick McKeown

  19. Mind the gap • Operators are unlikely to deploy 5 times as many POPs, or make them 5 times bigger, with 5 times the power consumption. • Our options: • Make routers simpler • Use more parallelism • Use more optics Nick McKeown

  20. All-optical routers don't make sense • A router is a packet switch, and so requires: • A switch fabric, • Per-packet address lookup, • Large buffers for times of congestion. • Packet processing/buffering is infeasible with optics: • A typical 10 Gb/s router linecard has 30 Mgates and 2.5 Gbits of memory. • Research problem • How to optimize the architecture of a router that uses an optical switch fabric? Nick McKeown

  21. 100Tb/s optical router: Stanford University Research Project • Collaboration • 4 Professors at Stanford (Mark Horowitz, Nick McKeown, David Miller and Olav Solgaard), and our groups. • Objective • To determine the best way to incorporate optics into routers. • Push technology hard to expose new issues. • Photonics, Electronics, System design • Motivating example: the design of a 100 Tb/s Internet router • Challenging but not impossible (~100x current commercial systems) • It identifies some interesting research problems Nick McKeown

  22. 100Tb/s optical router [Diagram: 625 electronic linecards, each terminating 40 Gb/s lines and performing IP packet processing and packet buffering, connect at 160-320 Gb/s to a central optical switch; a 40 Gb/s request/grant arbitration path runs alongside. 100 Tb/s = 625 × 160 Gb/s] Nick McKeown

  23. Research Problems • Linecard • Memory bottleneck: Address lookup and packet buffering. • Architecture • Arbitration: Computation complexity. • Switch Fabric • Optics: Fabric scalability and speed, • Electronics: Switch control and link electronics, • Packaging: Three surface problem. Nick McKeown

  24. 160Gb/s Linecard: Packet Buffering [Diagram: a queue manager with on-chip SRAM fronting banks of off-chip DRAM, 160 Gb/s in and out] • Problem • Packet buffer needs the density of DRAM (40 Gbits) and the speed of SRAM (2 ns per packet). • Solution • Hybrid solution uses on-chip SRAM and off-chip DRAM (sketched below). • Identified optimal algorithms that minimize the size of the SRAM (12 Mbits). • Precisely emulates the behavior of a 40 Gbit, 2 ns SRAM. klamath.stanford.edu/~nickm/papers/ieeehpsr2001.pdf Nick McKeown
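A minimal sketch of the hybrid-buffer idea, assuming a simple batched transfer policy. This is not the paper's optimal algorithm (the linked HPSR 2001 paper derives the SRAM-minimizing schedule); the class name and batch size here are illustrative:

```python
from collections import deque

DRAM_BATCH = 8  # move packets to/from DRAM in blocks, amortizing the slow
                # DRAM random-access time over many packets

class HybridQueue:
    """One FIFO: fast SRAM caches at head and tail, bulk storage in DRAM."""
    def __init__(self):
        self.tail_sram = deque()  # arriving packets land here at SRAM speed
        self.dram = deque()       # dense but slow bulk storage
        self.head_sram = deque()  # departures are served from here at SRAM speed

    def enqueue(self, pkt):
        self.tail_sram.append(pkt)
        if len(self.tail_sram) >= DRAM_BATCH:   # write one large block to DRAM
            self.dram.extend(self.tail_sram)
            self.tail_sram.clear()

    def dequeue(self):
        if not self.head_sram:                  # refill the head cache in a block,
            src = self.dram if self.dram else self.tail_sram  # preserving FIFO order
            for _ in range(min(DRAM_BATCH, len(src))):
                self.head_sram.append(src.popleft())
        return self.head_sram.popleft() if self.head_sram else None
```

The external behavior is that of one large, fast queue; only block transfers ever touch the DRAM, which is how the density of DRAM and the speed of SRAM are combined.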

  25. The Arbitration Problem • A packet switch fabric is reconfigured for every packet transfer. • At 160 Gb/s, a new IP packet can arrive every 2 ns (see the arithmetic below). • The configuration is picked to maximize throughput and not waste capacity. • Known algorithms are too slow. Nick McKeown
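The 2 ns figure is just the transmission time of a minimum-length packet at line rate:

```python
# Arbitration time budget: one minimum-length packet at 160 Gb/s.
pkt_bits = 40 * 8   # 40-byte minimum-length IP packet
line_rate = 160e9   # b/s
print(f"{pkt_bits / line_rate * 1e9:.0f} ns between packets")  # 2 ns
```

A new fabric configuration must therefore be computed every 2 ns, which is what rules out the known matching algorithms.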

  26. Approach • We know that a crossbar with VOQs and uniform Bernoulli i.i.d. arrivals gives 100% throughput for the following scheduling algorithms: • Pick a permutation uniformly at random from all permutations. • Pick a permutation uniformly at random from a set of N permutations in which each input-output pair (i,j) is connected exactly once. • From the same set, repeatedly cycle through the fixed sequence of N different permutations (simulated below). • Can we make non-uniform, bursty traffic uniform "enough" for the above to hold? Nick McKeown
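A minimal simulation of the third scheme, assuming the concrete spanning set in which input i connects to output (i + t) mod N at time t, so every input-output pair is served exactly once per N slots. Port count, load and horizon are illustrative:

```python
import random

N, T, LOAD = 8, 200_000, 0.95  # ports, time slots, offered load per input

voq = [[0] * N for _ in range(N)]  # voq[i][j]: packets queued at input i for output j
arrived = served = 0

for t in range(T):
    # uniform Bernoulli i.i.d. arrivals: each input receives a packet with
    # probability LOAD, destined to an output chosen uniformly at random
    for i in range(N):
        if random.random() < LOAD:
            voq[i][random.randrange(N)] += 1
            arrived += 1
    # cycle through N fixed permutations: input i -> output (i + t) mod N
    for i in range(N):
        j = (i + t) % N
        if voq[i][j]:
            voq[i][j] -= 1
            served += 1

print(f"throughput ~ {served / arrived:.3f}")  # tends to 1.0: 100% throughput
```

Each VOQ sees arrivals at rate LOAD/N and service at rate 1/N, so for any load below 1 the queues are stable and the delivered throughput matches the offered load.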

  27. 2-Stage Switch [Diagram: external inputs 1..N feed a first spanning set of permutations, whose internal outputs 1..N feed a second spanning set of permutations to the external outputs 1..N] • Recently shown to have 100% throughput • Mild conditions: weakly mixing arrival processes C.S. Chang et al.: http://www.ee.nthu.edu.tw/~cschang/PartI.pdf Nick McKeown

  28. 2-Stage Switch [Diagram: the same two-stage structure as slide 27, shown without annotations] Nick McKeown

  29. Problem: Unbounded Mis-sequencing [Diagram: two packets of the same flow take different paths through the two stages and can leave out of order; a toy illustration follows] • Side-note: mis-sequencing is maximized when arrivals are uniform. Nick McKeown
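A toy illustration of the failure mode, with assumed queue occupancies (not the slide's example): two cells of one flow are spread to different mid-stage buffers, and the cell that lands in the emptier buffer overtakes its predecessor:

```python
# Departure slot = arrival slot + backlog of the midstage buffer the cell joins.
backlog = {1: 5, 2: 0}       # midstage 1 is congested, midstage 2 is empty

cells = [("cell A", 0, 1),   # (name, arrival slot, midstage it is spread to)
         ("cell B", 1, 2)]

departures = sorted((slot + backlog[mid], name) for name, slot, mid in cells)
print([name for _, name in departures])  # ['cell B', 'cell A'] -> out of order
```

Because the spreading stage ignores flows, nothing bounds how far apart two mid-stage backlogs can drift; hence the mis-sequencing is unbounded.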

  30. Preventing Mis-sequencing [Diagram: large congestion buffers at the inputs, small coordination buffers in the mid-stage, and the 'FFF' algorithm between the two spanning sets of permutations] • The Full Frames First (FFF) algorithm: • Keeps packets ordered, and • Guarantees a delay bound within that of the optimum. Infocom'02: klamath.stanford.edu/~nickm/papers/infocom02_two_stage.pdf Nick McKeown

  31. Example: Optical 2-Stage Switch [Diagram: linecards 1..3, each with lookup and buffer, connected by a single optical stage traversed twice, in phase 1 and phase 2] • Idea: use a single stage twice. Nick McKeown

  32. Example: Passive Optical 2-Stage "Switch" [Diagram: ingress linecards 1..n, midstage linecards 1..n and egress linecards 1..n, with every ingress-midstage and midstage-egress pair joined by a fixed channel of rate R/N] • It is helpful to think of it as spreading rather than switching (a sketch of the spreading rule follows). Nick McKeown
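A sketch of why no scheduler is needed: the "switch" is a fixed rule. Assuming the concrete spanning set (i + t) mod n, as in the earlier crossbar sketch, each ingress walks through all midstage linecards once every n slots over its dedicated R/N channels:

```python
# Passive spreading rule: in time slot t, ingress i sends its head-of-line cell
# over its fixed R/N channel to midstage (i + t) mod n. No arbitration needed:
# at every t the mapping is a permutation, so no channel is ever contended.
n = 4

def midstage_for(ingress: int, t: int) -> int:
    return (ingress + t) % n

for t in range(n):
    print(f"slot {t}:", {i: midstage_for(i, t) for i in range(n)})
```

Every ingress-midstage pair is used exactly once per n slots, which is precisely the uniform spreading the two-stage throughput result requires.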

  33. 2-Stage spreading [Diagram: a spreading stage, a buffer stage, and a second spreading stage across linecards 1..N] Nick McKeown

  34. Passive Optical Switching [Diagram: ingress linecards 1..n reach midstage linecards 1..n, and midstage linecards reach egress linecards 1..n, through an integrated AWGR or diffraction-grating-based wavelength router; each path is carried on its own wavelength λ1..λn] Nick McKeown
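The AWGR is what makes the spreading stage passive: light entering input port i on wavelength k leaves on an output port that is a fixed, cyclic function of (i, k). The (i + k) mod n indexing below is the common convention for cyclic AWGRs and is used here as an assumption; the point is that tuning the transmit laser selects the destination with no moving parts:

```python
# Cyclic AWGR: output port is a fixed function of (input port, wavelength index).
n = 4

def awgr_output(in_port: int, wavelength: int) -> int:
    return (in_port + wavelength) % n

# Ingress 2 wants to reach midstage 1: tune to wavelength (1 - 2) % n = 3.
assert awgr_output(2, (1 - 2) % n) == 1
```

Combined with the time-slot spreading rule above, an ingress implements (i + t) mod n simply by hopping its wavelength each slot.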

  35. 100Tb/s Router [Diagram: racks of 160 Gb/s linecards connected by optical links to a central optical switch fabric] Nick McKeown

  36. Racks with 160Gb/s linecards [Diagram: each linecard contains a lookup engine and a queue manager with on-chip SRAM and banks of off-chip DRAM] Nick McKeown

  37. Additional Technologies [Chip micrograph, 40 μm scale] • Demonstrated or in development: • Chip-to-chip optical interconnects with total power dissipation of several mW. • Demonstration of wavelength-division-multiplexed chip interconnect. • Integrated laser modulators. • 8 Gsample/s serial links. • Low-power variable-power-supply serial links. • Integrated arrayed waveguide routers. Nick McKeown

  38. Mind the gap • Operators are unlikely to deploy 5 times as many POPs, or make them 5 times bigger, with 5 times the power consumption. • Our options: • Make routers simpler • Use more parallelism • Use more optics Nick McKeown

  39. Some predictions about core Internet routers • The need for more capacity for a given power and volume budget will mean: • Fewer functions in routers: • Little or no optimization for multicast, • Continued overprovisioning will lead to little or no support for QoS, DiffServ, …, • Fewer unnecessary requirements: • Mis-sequencing will be tolerated, • Latency requirements will be relaxed. • Less programmability in routers, and hence no network processors. • Greater use of optics to reduce power in the switch. Nick McKeown

  40. What I believe is most likely • The need for capacity and reliability will mean: • Widespread replacement of core routers with transport switching based on circuits: • Circuit switches have proved simpler, more reliable, lower power, higher capacity and lower cost per Gb/s. Eventually, this is going to matter. • The Internet will evolve to become edge routers interconnected by a rich mesh of WDM circuit switches. Nick McKeown
