Scaling Internet Routers Using Optics


Scaling Internet Routers Using Optics

Isaac Keslassy, Shang-Tse Chuang, Kyoungsik Yu, David Miller, Mark Horowitz, Olav Solgaard, Nick McKeown

Department of Electrical Engineering

Stanford University


Backbone router capacity

[Figure: log-scale plot, 1Gb/s to 1Tb/s. Router capacity per rack doubles every 18 months.]


Backbone router capacity

[Figure: same plot with traffic overlaid. Traffic doubles every year; router capacity per rack doubles every 18 months.]


Extrapolating

[Figure: extrapolation to 100Tb/s. Traffic doubles every year, router capacity doubles every 18 months; by 2015 there is a 16x disparity between them.]
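The 16x figure follows from the two growth rates over a 12-year horizon (the ~2003 starting point is an assumption; the deck only states the 2015 endpoint and the 16x result):

```python
# Disparity between traffic growth (2x every year) and router
# capacity growth (2x every 18 months) after `years` years.
# A ~2003 baseline is assumed; the deck states only the 2015
# endpoint and the resulting 16x gap.
def disparity(years):
    traffic_growth = 2 ** years            # doubles every year
    capacity_growth = 2 ** (years / 1.5)   # doubles every 18 months
    return traffic_growth / capacity_growth

print(disparity(12))  # 16.0
```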


Consequence

  • Unless something changes, operators will need:

    • 16 times as many routers, consuming

    • 16 times as much space and

    • 256 times the power,

    • costing 100 times as much.

  • Actually need more than that…


Stanford 100Tb/s Internet Router

[Figure: an optical switch interconnects electronic linecards #1 through #625. Each linecard terminates a 40Gb/s line, performs IP packet processing and packet buffering, and connects to the optical fabric at 160-320Gb/s. 100Tb/s = 640 * 160Gb/s.]

Goal: Study scalability

  • Challenging, but not impossible

  • Two orders of magnitude faster than deployed routers

  • We will build components to show feasibility


Throughput Guarantees

  • Operators increasingly demand throughput guarantees:

    • To maximize use of expensive long-haul links

    • For predictability and planning

  • Despite lots of effort and theory, no commercial router today has a throughput guarantee.


Requirements of our router

  • 100Tb/s capacity

  • 100% throughput for all traffic

  • Must work with any set of linecards present

  • Use technology available within 3 years

  • Conform to RFC 1812


What limits router capacity?

[Table: approximate power consumption per rack]

Power density is the limiting factor today.


Trend: Multi-rack routers reduce power density

[Figure: linecard racks connected to a central crossbar switch rack. Examples: Juniper TX8/T640, Alcatel 7670 RSP, Avici TSR, Chiaro.]


Limits to scaling

  • Overall power is dominated by linecards:

    • Sheer number

    • Optical WAN components

    • Per-packet processing and buffering

  • But power density is dominated by the switch fabric

  • The limit today is ~2.5Tb/s, due to:

    • Electronics

    • A scheduler that scales <2x every 18 months

    • Opto-electronic conversion


Multi-rack routers

[Figure: linecards (WAN in/out) on both sides of a central switch fabric.]


Question

  • Instead, can we use an optical fabric at 100Tb/s with 100% throughput?

  • Conventional answer: No.

    • Need to reconfigure switch too often

    • 100% throughput requires complex electronic scheduler.


Outline

  • How to guarantee 100% throughput?

  • How to eliminate the scheduler?

  • How to use an optical switch fabric?

  • How to make it scalable and practical?


100% Throughput

[Figure: N inputs at rate R and N outputs at rate R, interconnected by a crossbar; a scheduler must decide which input connects to which output.]

Switch capacity = N²R

Router capacity = NR


If traffic is uniform

[Figure: full uniform mesh. Each input at rate R spreads its traffic at rate R/N over N channels, one to every output.]


Real traffic is not uniform

[Figure: with non-uniform traffic, the fixed R/N mesh channels no longer match the offered load: some input-output pairs are overloaded while others sit idle.]


Two-stage load-balancing switch

[Figure: two meshes in series. The load-balancing stage spreads each input's rate-R traffic at R/N across all N intermediate linecards; the switching stage delivers it at R/N to the outputs.]

100% throughput for weakly mixing, stochastic traffic. [C.-S. Chang, Valiant]
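A minimal sketch of why the first stage helps (a toy slotted model of ours, not the deck's exact design): the first stage spreads cells round-robin without looking at their destinations, so whatever the arrival pattern, the traffic offered to the second stage is uniform, and a uniform mesh sustains 100% throughput.

```python
from collections import Counter

# Toy model of the load-balancing stage of a two-stage switch.
# Stage 1 spreads each input's cells round-robin over the N
# intermediate ports, ignoring the destination entirely.
N = 4

def load_balance(arrivals):
    """arrivals: list of (input, output) cells, in arrival order.
    Returns per-intermediate-port cell counts after stage 1."""
    rr = [0] * N                      # round-robin pointer per input
    mid_load = Counter()
    for inp, out in arrivals:
        mid = rr[inp]                 # chosen without looking at `out`
        rr[inp] = (rr[inp] + 1) % N
        mid_load[mid] += 1
    return mid_load

# Adversarial pattern: every input sends only to output 0.
arrivals = [(i, 0) for i in range(N) for _ in range(100)]
print(load_balance(arrivals))   # every intermediate port gets 100 cells
```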


[Figure: animation, cells from the inputs being spread across the first-stage mesh.]


[Figure: animation, spread cells being switched to their outputs across the second-stage mesh.]


Chang’s load-balanced switch: good properties

  • 100% throughput for a broad class of traffic

  • No scheduler needed → scalable

Chang’s load-balanced switch: bad properties

  • Packet mis-sequencing

  • Pathological traffic patterns → throughput 1/N-th of capacity

  • Uses two switch fabrics → hard to package

  • Doesn’t work with some linecards missing → impractical

  • FOFF: Load-balancing algorithm

    • Packet sequence maintained

    • No pathological patterns

    • 100% throughput - always

    • Delay within bound of ideal

    • (See paper for details)
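The mis-sequencing problem is easy to reproduce in a toy model (our illustration, not the deck's): two cells of the same flow routed through different intermediate ports see different queue backlogs and can depart out of order. FOFF avoids this; see the paper for the actual algorithm.

```python
import heapq

# Toy demonstration of mis-sequencing under naive load balancing
# (our illustration). Two cells of one flow take intermediate
# ports with different backlogs, so the later cell departs first.
queue_backlog = {0: 5, 1: 0}   # cells already queued at each mid port

departures = []
for seq, mid in [(1, 0), (2, 1)]:         # cell 1 -> port 0, cell 2 -> port 1
    depart_time = queue_backlog[mid] + 1  # one slot per queued cell
    queue_backlog[mid] += 1
    heapq.heappush(departures, (depart_time, seq))

order = [seq for _, seq in sorted(departures)]
print(order)  # [2, 1]: cell 2 overtakes cell 1
```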


Single Mesh Switch

[Figure: the two meshes collapsed into one. Each linecard (in at R, out at R) exchanges a 2R/N channel with every other linecard, carrying both the load-balancing and the switching traffic.]


2R/N

2R/N

Backplane

Out

R

2R/N

2R/N

2R/N

2R/N

Out

R

2R/N

2R/N

R/N

Out

R

Packaging

R

In

R

In

R

In


Many fabric options

[Figure: N channels, each at rate 2R/N, carried through any permutation network between inputs and outputs.]

Options:

  • Space: Full uniform mesh

  • Time: Round-robin crossbar

  • Wavelength: Static WDM
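The time option can be sketched as a fixed cyclic schedule (our illustration): in timeslot t, input i connects to output (i + t) mod N, so each input-output pair gets exactly one slot per N-slot cycle, realizing the 2R/N-per-channel mesh in time with no scheduler.

```python
# Fixed round-robin crossbar schedule (illustrative sketch).
# In timeslot t, input i connects to output (i + t) % N; over an
# N-slot cycle every input-output pair is served exactly once.
N = 4

def schedule(t):
    return [(i, (i + t) % N) for i in range(N)]

for t in range(N):
    outputs = [o for _, o in schedule(t)]
    assert sorted(outputs) == list(range(N))  # a permutation each slot

pairs = {conn for t in range(N) for conn in schedule(t)}
print(len(pairs))  # 16: every input-output pair served once per cycle
```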


Static WDM switching

[Figure: a passive Arrayed Waveguide Grating Router (AWGR) with 4 inputs and 4 outputs. Each input sends 4 WDM channels (A, B, C, D), each at rate 2R/N, and the AWGR delivers one wavelength from every input to every output. Passive and almost zero power.]
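A common model of the AWGR's static routing (a standard idealization, not taken from the deck) is cyclic: a signal entering input i on wavelength j exits output (i + j) mod N, so every output statically collects one wavelength from every input.

```python
# Cyclic AWGR routing model (a standard idealization; the deck
# only shows the 4x4 wavelength pattern). Input i on wavelength
# j exits output (i + j) % N -- no switching, almost no power.
N = 4

def awgr_output(inp, wavelength):
    return (inp + wavelength) % N

for out in range(N):
    sources = {inp for inp in range(N)
               for j in range(N) if awgr_output(inp, j) == out}
    assert sources == set(range(N))  # every input reaches every output

print("ok")
```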


Linecard dataflow

[Figure: on each linecard, traffic arriving at rate R is spread over wavelengths λ1, λ2, …, λN by a WDM mux into the passive fabric; on the receive side, a WDM demux collects one wavelength from every linecard, restoring rate R toward the output.]


Problems of scale

  • For N < 64, WDM is a good solution.

  • We want N = 640.

  • Need to decompose.


Decomposing the mesh

[Figure: 8 linecards in a full mesh, each channel at rate 2R/8.]


Decomposing the mesh

[Figure: the 8-linecard mesh decomposed into groups of linecards; 2R/8 channels between groups are aggregated onto 2R/4 links, combining WDM between groups with TDM within a group.]

When N is too large: decompose into groups (or racks)

[Figure: Group/Rack 1 through Group/Rack G, each connected by fibers at rate 2R to a central passive Arrayed Waveguide Grating Router (AWGR); each fiber carries wavelengths λ1, λ2, …, λG.]


When a linecard is missing

  • Each linecard spreads its data equally over every other linecard.

  • Problem: if a linecard is missing or has failed, the fixed spreading no longer works.


When a linecard fails

[Figure: after a linecard fails, the remaining linecards spread their 2R over fewer channels (2R/3 each), and the fixed rates no longer match: some outputs receive up to 2R/3 + 2R/6 + 2R/3 + 2R/6 = 2R, more than they can handle.]

  • Solution:

  • Move light beams

    • Replace AWGR with MEMS switch.

    • Reconfigure when a linecard is added, removed or fails.

  • Finer channel granularity

    • Multiple paths.


Solution: use transparent MEMS switches

[Figure: Group/Rack 1 through Group/Rack G=40, each linked at rate 2R to a bank of MEMS switches. The MEMS switches are reconfigured only when a linecard is added, removed or fails.]

Theorems:

1. L+G-1 MEMS switches are required.

2. A polynomial-time reconfiguration algorithm exists.
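Plugging in the deck's later numbers (L = 16 linecards per rack, G = 40 racks), Theorem 1 gives L + G - 1 = 55 MEMS switches, consistent with the 16 x 55 crossbar chip and the 55 x 10Gb/s fabric ports described on the "What we are building" slide:

```python
# Theorem 1 from the deck: the grouped fabric needs L + G - 1
# MEMS switches, with L linecards per rack and G racks. The
# deck's L=16, G=40 yields 55, matching its 16 x 55 crossbar chip.
def mems_switches(L, G):
    return L + G - 1

print(mems_switches(16, 40))  # 55
```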


Challenges

  • How to build a 250ms buffer at 160Gb/s?

  • Low-cost, low-power optoelectronic conversion?

[Figure: linecard dataflow at R = 160Gb/s: address lookup, packet buffering, and WDM mux/demux over wavelengths λ1, λ2, …, λG into the packet switch.]
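The 250ms buffer question translates to a concrete memory size via the standard rule buffer = buffering time x line rate (the byte figure is our arithmetic; the deck only poses the question):

```python
# Buffer sizing at line rate: 250 ms x 160 Gb/s. The deck poses
# the question; the conversion to bytes is our arithmetic.
rtt_s = 0.250            # 250 ms of buffering
rate_bps = 160e9         # 160 Gb/s line rate
buffer_bits = rtt_s * rate_bps
print(buffer_bits / 8 / 1e9)  # 5.0 GB of packet buffer
```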


What we are building

  • Chip #1: 160Gb/s packet buffer. A 90nm ASIC buffer manager with 250ms of DRAM (320Gb/s memory interface), 160Gb/s in and 160Gb/s out.

  • Chip #2: 16 x 55 opto-electronic crossbar. A CMOS ASIC with optical source, optical modulator and optical detector: 16 x 10Gb/s toward the linecards, 55 x 10Gb/s toward the optical fabric.


100Tb/s Load-Balanced Router

[Figure: the full system. G = 40 linecard racks, each holding L = 16 linecards at 160Gb/s, interconnected through a switch rack of 40 x 40 MEMS switches consuming under 100W.]
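The headline capacity checks out (our arithmetic): 40 racks x 16 linecards x 160Gb/s gives the "100Tb/s = 640 * 160Gb/s" quoted earlier, nominally 100Tb/s.

```python
# Aggregate capacity of the final design: G racks x L linecards
# per rack x 160 Gb/s per linecard. Matches the deck's
# "100Tb/s = 640 * 160Gb/s" (nominally 100Tb/s).
G, L, rate_gbps = 40, 16, 160
linecards = G * L
capacity_tbps = linecards * rate_gbps / 1000
print(linecards, capacity_tbps)  # 640 102.4
```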

