
Scaling Internet Routers Using Optics

Isaac Keslassy, Shang-Tse Chuang, Kyoungsik Yu, David Miller, Mark Horowitz, Olav Solgaard, Nick McKeown. Department of Electrical Engineering, Stanford University.


Presentation Transcript


  1. Scaling Internet Routers Using Optics. Isaac Keslassy, Shang-Tse Chuang, Kyoungsik Yu, David Miller, Mark Horowitz, Olav Solgaard, Nick McKeown. Department of Electrical Engineering, Stanford University.

  2. Backbone router capacity. [Figure: router capacity per rack on an axis from 1Gb/s to 1Tb/s; capacity grows 2x every 18 months.]

  3. Backbone router capacity. [Figure: same plot, axis from 1Gb/s to 1Tb/s; traffic grows 2x every year, router capacity per rack grows 2x every 18 months.]

  4. Extrapolating. [Figure: the two trends extended, axis from 1Tb/s to 100Tb/s; traffic 2x every year, router capacity 2x every 18 months.] 2015: 16x disparity.
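The 16x figure on slide 4 follows from the two growth rates alone. A minimal sketch of that arithmetic, assuming both trends hold steady; the function names are illustrative and not from the talk:

```python
import math

def disparity(years, traffic_doubling_years=1.0, capacity_doubling_years=1.5):
    """Ratio of traffic growth to per-rack capacity growth after `years`."""
    traffic = 2 ** (years / traffic_doubling_years)
    capacity = 2 ** (years / capacity_doubling_years)
    return traffic / capacity

def years_until(target, traffic_doubling_years=1.0, capacity_doubling_years=1.5):
    """Years needed for the disparity to reach `target` (assumes steady trends)."""
    gap_rate = 1 / traffic_doubling_years - 1 / capacity_doubling_years
    return math.log2(target) / gap_rate

print(disparity(12))    # disparity after 12 years of steady growth
print(years_until(16))  # roughly 12 years to reach a 16x gap
```

Under these assumptions the gap doubles every three years, so a 16x disparity in 2015 corresponds to a baseline about 12 years earlier.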

  5. Consequence • Unless something changes, operators will need: • 16 times as many routers, consuming • 16 times as much space, • 256 times the power, • costing 100 times as much. • Actually need more than that…

  6. Stanford 100Tb/s Internet Router. [Figure: an optical switch connecting Electronic Linecard #1 through Electronic Linecard #625. Each linecard does line termination, IP packet processing, and packet buffering. Rate labels: 40Gb/s, 160Gb/s, 160-320Gb/s; 100Tb/s = 640 * 160Gb/s.] Goal: Study scalability • Challenging, but not impossible • Two orders of magnitude faster than deployed routers • We will build components to show feasibility

  7. Throughput Guarantees • Operators increasingly demand throughput guarantees: • To maximize use of expensive long-haul links • For predictability and planning • Despite lots of effort and theory, no commercial router today has a throughput guarantee.

  8. Requirements of our router • 100Tb/s capacity • 100% throughput for all traffic • Must work with any set of linecards present • Use technology available within 3 years • Conform to RFC 1812

  9. What limits router capacity? [Figure: approximate power consumption per rack.] Power density is the limiting factor today.

  10. Trend: Multi-rack routers. Reduces power density. [Figure labels: Crossbar, Linecards; Switch, Linecards.]

  11. [Photos of multi-rack routers: Juniper TX8/T640, Alcatel 7670 RSP, Avici TSR, Chiaro.]

  12. Limits to scaling • Overall power is dominated by linecards • Sheer number • Optical WAN components • Per packet processing and buffering. • But power density is dominated by switch fabric

  13. Limit today ~2.5Tb/s • Electronics • Scheduler scales <2x every 18 months • Opto-electronic conversion. [Figure from slide 10 repeated. Trend: Multi-rack routers. Reduces power density. Labels: Switch, Linecards.]

  14. Multi-rack routers. [Figure: linecards, each with WAN In and Out ports, connected through a switch fabric.]

  15. Question • Instead, can we use an optical fabric at 100Tb/s with 100% throughput? • Conventional answer: No. • Need to reconfigure switch too often • 100% throughput requires complex electronic scheduler.

  16. Outline • How to guarantee 100% throughput? • How to eliminate the scheduler? • How to use an optical switch fabric? • How to make it scalable and practical?

  17. 100% Throughput. [Figure: N inputs (In) and N outputs (Out), each at rate R; the rate needed on each internal link is unknown (?).] Switch capacity = N²R. Router capacity = NR.

  18. If traffic is uniform. [Figure: mesh between inputs (In) and outputs (Out), each at rate R; every internal link runs at R/N.]

  19. Real traffic is not uniform. [Figure: the same mesh with R/N links; some input-output pairs carry rate R, so the R/N links are insufficient (?).]

  20. Two-stage load-balancing switch. [Figure: a load-balancing stage followed by a switching stage; inputs (In) and outputs (Out) run at rate R, and every internal link in both stages runs at R/N.] 100% throughput for weakly mixing, stochastic traffic. [C.-S. Chang, Valiant]
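To see why R/N links suffice on slide 20 even when traffic is not uniform, the average load on every internal link can be computed from a traffic matrix. This is a minimal sketch, not the authors' analysis: it models only average rates (no queueing), and `stage_link_loads` and the example matrix are illustrative assumptions.

```python
from fractions import Fraction

def stage_link_loads(traffic):
    """traffic[j][k] is the rate from input j to output k, in units of R.

    Returns the average load on every link of both stages, assuming the
    first stage spreads each input's traffic evenly over all N middle ports.
    """
    n = len(traffic)
    # Stage 1: input j sends 1/N of everything it receives to each middle port.
    stage1 = [[sum(traffic[j]) / n for _ in range(n)] for j in range(n)]
    # Stage 2: each middle port holds 1/N of every flow and forwards it to output k.
    stage2 = [[sum(traffic[j][k] for j in range(n)) / n for k in range(n)]
              for _ in range(n)]
    return stage1, stage2

# Non-uniform example: each input sends its full rate R to a single output.
N = 3
traffic = [[Fraction(int(j == k)) for k in range(N)] for j in range(N)]
stage1, stage2 = stage_link_loads(traffic)
worst = max(max(row) for row in stage1 + stage2)
direct_worst = max(max(row) for row in traffic)
print(worst)         # heaviest link in the two-stage switch: 1/3, i.e. R/N
print(direct_worst)  # heaviest link if the same traffic crossed one mesh directly: 1, i.e. R
```

As long as no input or output is oversubscribed (row and column sums at most R), no link in either stage carries more than R/N on average, whatever the traffic pattern.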

  21. [Figure: two-stage switch walkthrough, first frame. Three packets labeled 3 wait at one input; the middle ports are numbered 1, 2, 3; inputs and outputs run at R and all internal links at R/N.]

  22. [Figure: two-stage switch walkthrough, second frame. The three packets labeled 3 have been spread, one to each of middle ports 1, 2, 3; all internal links run at R/N.]

  23. Chang’s load-balanced switch: Good properties • 100% throughput for broad class of traffic • No scheduler needed → Scalable

  24. Chang’s load-balanced switch: Bad properties • Packet mis-sequencing • Pathological traffic patterns → Throughput 1/N-th of capacity • Uses two switch fabrics → Hard to package • Doesn’t work with some linecards missing → Impractical. FOFF: Load-balancing algorithm • Packet sequence maintained • No pathological patterns • 100% throughput, always • Delay within bound of ideal • (See paper for details)

  25. Single Mesh Switch. [Figure: one mesh with every link at rate 2R/N; each In/Out pair at rate R is labeled as one linecard.]

  26. Packaging. [Figure: linecards, each with In and Out at rate R, attached to a backplane over links at rate 2R/N.]

  27. Many fabric options. [Figure: N channels C1, C2, …, CN, each at rate 2R/N, between In and Out ports; any permutation network.] Options • Space: Full uniform mesh • Time: Round-robin crossbar • Wavelength: Static WDM
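The "Time: Round-robin crossbar" option on slide 27 can be pictured as a fixed cycle of permutations that needs no scheduler. A minimal sketch, assuming the simple rotation rule output = (input + slot) mod N; the slides do not specify the exact sequence, and `round_robin_schedule` is an illustrative name:

```python
def round_robin_schedule(n):
    """Return n permutations; in slot t, input i is connected to output (i + t) % n."""
    return [[(i + t) % n for i in range(n)] for t in range(n)]

schedule = round_robin_schedule(4)
for t, perm in enumerate(schedule):
    print(f"slot {t}: input -> output {perm}")
```

Each slot is a full permutation, and over N slots every input meets every output exactly once, so each input-output pair gets an equal 1/N share of the link time.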

  28. Static WDM switching. [Figure: an Array Waveguide Router (AWGR) with four inputs A, B, C, D. Input fibers carry A, A, A, A / B, B, B, B / C, C, C, C / D, D, D, D; every output fiber carries A, B, C, D.] 4 WDM channels, each at rate 2R/N. Passive and almost zero power.
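The routing pattern in the slide 28 figure matches the cyclic wavelength routing commonly attributed to AWGRs. A minimal sketch under that assumption: the index rule output = (input + wavelength) mod N is one convention among several, and `awgr_output` is a hypothetical helper, not something from the talk.

```python
def awgr_output(input_port, wavelength, n):
    """Assumed cyclic rule: light entering `input_port` on `wavelength` exits here."""
    return (input_port + wavelength) % n

ports = "ABCD"
n = len(ports)
# received[out][w] = which input's signal arrives at `out` on wavelength w
received = {out: [None] * n for out in ports}
for i, src in enumerate(ports):
    for w in range(n):
        out = ports[awgr_output(i, w, n)]
        received[out][w] = src

for out in ports:
    print(out, received[out])
```

Every output ends up with exactly one channel from each of A, B, C, D, which is the full mesh of the earlier slides realized with a passive device.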

  29. Linecard dataflow. [Figure: a linecard with In and Out at rate R; WDM multiplexers and demultiplexers carry wavelengths λ1, λ2, …, λN, each fiber at rate R; numbered markers 1 to 4 trace the path of packets through the linecard.]

  30. Problems of scale • For N < 64, WDM is a good solution. • We want N = 640. • Need to decompose.

  31. Decomposing the mesh. [Figure: a mesh between nodes 1 to 8 on each side, every link at rate 2R/8.]

  32. Decomposing the mesh. [Figure: the same 8-node mesh decomposed using WDM and TDM; link labels 2R/8 and 2R/4.]

  33. When N is too large: Decompose into groups (or racks). [Figure: Group/Rack 1 through Group/Rack G, each holding linecards 1, 2, …, L with links at rate 2R, connected through Array Waveguide Routers (AWGR) carrying wavelengths λ1, λ2, …, λG.]

  34. When a linecard is missing • Each linecard spreads its data equally over every other linecard. • Problem: If one is missing, or failed, then the spreading no longer works.

  35. When a linecard fails. [Figure: three linecards, In and Out at rate R, meshed with 2R/3 links. Capacity labels: 2R/3 + 2R/3 = 4R/3 ≈ 1.33R; 2R/3 + 2R/6 + 2R/3 + 2R/6 = 2R.] • Solution: • Move light beams • Replace AWGR with MEMS switch. • Reconfigure when linecard added, removed or fails. • Finer channel granularity • Multiple paths.
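The capacity labels on slide 35 can be reproduced with exact fractions. This is a sketch of one reading of the figure, assuming each linecard was provisioned with N links of 2R/N and that a failed linecard's links become unusable; `per_linecard_capacity` is an illustrative name.

```python
from fractions import Fraction

def per_linecard_capacity(n_built, n_present, extra_per_link=Fraction(0)):
    """Fabric capacity, in units of R, available to one working linecard.

    Links were provisioned at 2R/n_built each; only the n_present links that
    still reach a working linecard count. extra_per_link models capacity
    moved onto each surviving link after reconfiguration.
    """
    link = Fraction(2, n_built) + extra_per_link
    return n_present * link

# Three linecards built, one fails: two 2R/3 links remain.
print(per_linecard_capacity(3, 2))                  # 4/3, short of the 2R provisioned
# Moving the dead link's 2R/3 onto the survivors adds 2R/6 to each.
print(per_linecard_capacity(3, 2, Fraction(2, 6)))  # 2, the full 2R again
```

With nothing moved, each surviving linecard sees only 4R/3; redistributing the failed linecard's share restores 2R, which is what moving light beams with a reconfigurable switch is meant to achieve.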

  36. Solution: Use transparent MEMS switches. [Figure: Group/Rack 1 through Group/Rack G = 40, each holding linecards 1, 2, …, L with links at rate 2R, connected through MEMS switches, each with ports 1 to G.] MEMS switches reconfigured only when linecard added, removed or fails. Theorems: 1. Require L+G-1 MEMS switches 2. Polynomial time reconfiguration algorithm

  37. Challenges. [Figure: linecard dataflow with R = 160Gb/s: In, address lookup, packet switch, WDM over wavelengths λ1, λ2, …, λG, Out; numbered markers 1 to 4.] • Low-cost, low-power optoelectronic conversion? • How to build a 250ms 160Gb/s buffer?
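The buffer challenge on slide 37 becomes more concrete once the delay and line rate are turned into a memory size. A minimal sketch of that arithmetic; the function name and unit choices are illustrative:

```python
def buffer_size_gbit(rate_gbps, delay_ms):
    """Bits needed to hold `delay_ms` of traffic arriving at `rate_gbps`, in Gbit."""
    return rate_gbps * delay_ms / 1000

size_gbit = buffer_size_gbit(160, 250)
size_gbyte = size_gbit / 8
print(size_gbit, "Gbit")    # 40.0 Gbit
print(size_gbyte, "GByte")  # 5.0 GByte
```

So each linecard would need about 40 Gbit (5 GByte) of buffering that can be written and read at line rate.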

  38. What we are building. Chip #1: 160Gb/s Packet Buffer. [Figure labels: Buffer Manager, 90nm ASIC; 250ms DRAM, 320Gb/s; 160Gb/s in, 160Gb/s out.] Chip #2: 16 x 55 Opto-electronic crossbar. [Figure labels: CMOS ASIC; 16 x 10Gb/s, To Linecards; 55 x 10Gb/s, To Optical Fabric; Optical source, Optical Detector, Optical Modulator.]

  39. 100Tb/s Load-Balanced Router. [Figure: Linecard Rack 1 through Linecard Rack G = 40, each with L = 16 160Gb/s linecards, connected to a switch rack (< 100W) of 40 x 40 MEMS switches numbered 1, 2, …, 55, 56.]
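The sizing numbers on slides 36 to 39 can be cross-checked against each other. A short sketch using only values quoted on the slides (G = 40 racks, L = 16 linecards per rack, R = 160Gb/s, and the L+G-1 theorem); the variable names are illustrative:

```python
G = 40        # linecard racks (groups)
L = 16        # linecards per rack
R_GBPS = 160  # line rate per linecard, in Gb/s

linecards = G * L                          # total linecards in the router
capacity_tbps = linecards * R_GBPS / 1000  # aggregate capacity in Tb/s
mems_switches = L + G - 1                  # switch count from the slide 36 theorem

print(linecards)      # 640
print(capacity_tbps)  # about 102.4 Tb/s
print(mems_switches)  # 55
```

The 640 linecards match the "640 * 160Gb/s" on slide 6, and the 55 switches match the 55 fabric-side ports of the 16 x 55 crossbar on slide 38.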
