Scaling Internet Routers Using Optics


Isaac Keslassy, Shang-Tse Chuang, Kyoungsik Yu, David Miller, Mark Horowitz, Olav Solgaard, Nick McKeown

Department of Electrical Engineering

Stanford University

[Figure: growth trends on a scale from 1Gb/s to 100Tb/s. Traffic doubles every year; router capacity per rack doubles every 18 months. By 2015: 16x disparity between traffic and router capacity.]

- Unless something changes, operators will need:
- 16 times as many routers, consuming
- 16 times as much space,
- 256 times the power,
- Costing 100 times as much.

- Actually need more than that…
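The 16x figure follows from the two doubling periods quoted on the slides. The sketch below is only an illustration of that arithmetic; the function names are hypothetical, and the baseline year is not stated in the slides, so the code works with elapsed years rather than calendar dates.

```python
import math

def disparity(years, traffic_doubling_years=1.0, capacity_doubling_years=1.5):
    """Traffic growth divided by router-capacity growth after `years`."""
    traffic_growth = 2 ** (years / traffic_doubling_years)
    capacity_growth = 2 ** (years / capacity_doubling_years)
    return traffic_growth / capacity_growth

def years_until(target_disparity, traffic_doubling_years=1.0, capacity_doubling_years=1.5):
    """Years needed for the traffic/capacity gap to reach `target_disparity`."""
    # Net growth of the gap, in doublings per year.
    net_rate = 1 / traffic_doubling_years - 1 / capacity_doubling_years
    return math.log2(target_disparity) / net_rate

print(disparity(12))     # gap after 12 years of growth
print(years_until(16))   # years needed for a 16x gap
```

With these doubling periods the gap itself doubles every three years, so a 16x disparity corresponds to roughly twelve years of growth.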

[Figure: 100Tb/s router. An optical switch interconnects electronic linecards #1 through #625 at 160-320Gb/s per linecard. Each linecard serves 40Gb/s lines, 160Gb/s in total, and performs:]

- Line termination
- IP packet processing
- Packet buffering

100Tb/s = 640 * 160Gb/s

Goal: Study scalability

- Challenging, but not impossible
- Two orders of magnitude faster than deployed routers
- We will build components to show feasibility

- Operators increasingly demand throughput guarantees:
- To maximize use of expensive long-haul links
- For predictability and planning

- Despite lots of effort and theory, no commercial router today has a throughput guarantee.

- 100Tb/s capacity
- 100% throughput for all traffic
- Must work with any set of linecards present
- Use technology available within 3 years
- Conform to RFC 1812

[Figure: Approximate power consumption per rack, split between linecards and switch (crossbar), for Juniper TX8/T640, Alcatel 7670 RSP, Avici TSR and Chiaro.]

Power density is the limiting factor today

- Overall power is dominated by linecards
- Sheer number
- Optical WAN components
- Per packet processing and buffering.

- But power density is dominated by switch fabric

- Limit today ~2.5Tb/s
- Electronics
- Scheduler scales <2x every 18 months
- Opto-electronic conversion

[Figure: linecards connected through a switch fabric; each linecard has In and Out ports facing the WAN.]

- Instead, can we use an optical fabric at 100Tb/s with 100% throughput?
- Conventional answer: No.
- Need to reconfigure switch too often
- 100% throughput requires complex electronic scheduler.

- How to guarantee 100% throughput?
- How to eliminate the scheduler?
- How to use an optical switch fabric?
- How to make it scalable and practical?

[Figure: linecards with inputs and outputs at rate R; the rate required on each internal link is marked "?".]

Switch capacity = N²R

Router capacity = NR
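To make the gap between the two capacities concrete, the sketch below plugs in the values used elsewhere in the slides (640 linecards at 160Gb/s). It assumes a mesh with one link per (input, output) pair, each running at the full line rate R; the function name is hypothetical.

```python
def full_mesh_capacity(n, link_rate):
    """Aggregate rate of a mesh with one link per (input, output) pair."""
    return n * n * link_rate

N = 640        # number of linecards (from the slides)
R_GBPS = 160   # line rate per linecard in Gb/s (from the slides)

router_capacity = N * R_GBPS                     # N * R
switch_capacity = full_mesh_capacity(N, R_GBPS)  # N^2 * R

print(router_capacity, switch_capacity, switch_capacity // router_capacity)
```

Under these assumptions the mesh has to carry N times the capacity the router actually delivers, which is why links at rate R between every pair of linecards do not scale.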

[Figure: mesh between linecard inputs and outputs, each at rate R, with internal links at rate R/N.]

100% throughput for weakly mixing, stochastic traffic.

[C.-S. Chang, Valiant]

[Figure: two-stage switch. A load-balancing stage is followed by a switching stage; inputs and outputs run at rate R and every internal link at rate R/N. Packets labeled 1, 2 and 3 are shown passing through the two stages.]

- 100% throughput for broad class of traffic
- No scheduler needed → Scalable
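The reason R/N links suffice can be seen in a simple rate (fluid) model of the load-balancing stage and the switching stage. The sketch below is an illustration under stated assumptions, not the authors' implementation: it ignores packetization and queueing, assumes each input spreads its traffic evenly over all N intermediate ports, and uses hypothetical names throughout.

```python
def two_stage_link_loads(traffic):
    """traffic[i][j] is the rate from input i to output j.

    Returns (stage1, stage2): stage1[i][k] is the load on the link from
    input i to intermediate port k, stage2[k][j] the load on the link
    from intermediate port k to output j.
    """
    n = len(traffic)
    # Stage 1: input i spreads everything it receives evenly over n ports.
    stage1 = [[sum(traffic[i]) / n for _ in range(n)] for i in range(n)]
    # Stage 2: each intermediate port holds 1/n of every flow, so it
    # forwards 1/n of the total traffic destined to output j.
    to_output = [sum(traffic[i][j] for i in range(n)) for j in range(n)]
    stage2 = [[to_output[j] / n for j in range(n)] for _ in range(n)]
    return stage1, stage2

R = 1.0
N = 4
# A deliberately non-uniform pattern: input i sends all its traffic to output i+1.
traffic = [[R if j == (i + 1) % N else 0.0 for j in range(N)] for i in range(N)]

stage1, stage2 = two_stage_link_loads(traffic)
print(max(max(row) for row in stage1), max(max(row) for row in stage2))
```

In this model no link carries more than R/N as long as no input or output is oversubscribed, regardless of how skewed the traffic matrix is.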

- FOFF: Load-balancing algorithm
- Packet sequence maintained
- No pathological patterns
- 100% throughput - always
- Delay within bound of ideal
- (See paper for details)

- Packet mis-sequencing
- Pathological traffic patterns → Throughput 1/N-th of capacity
- Uses two switch fabrics → Hard to package
- Doesn’t work with some linecards missing → Impractical

[Figure: one linecard combines an input and an output, each at rate R. Linecards are interconnected across the backplane by links at rate 2R/N.]

[Figure: each input sends channels C1, C2, …, CN (N channels, each at rate 2R/N) through an "any permutation" network to the outputs.]

Options

Space: Full uniform mesh

Time: Round-robin crossbar

Wavelength: Static WDM
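For the time-division option, a round-robin crossbar can be sketched as a fixed cyclic sequence of permutations. The cyclic-shift pattern below is an assumption chosen for illustration (the slides do not specify the sequence), and the function name is hypothetical.

```python
def round_robin_schedule(n):
    """Return n time slots; slot t maps input i to output (i + t) mod n."""
    return [[(i + t) % n for i in range(n)] for t in range(n)]

schedule = round_robin_schedule(4)
for t, slot in enumerate(schedule):
    print(t, slot)
```

Each slot is a permutation, and over N slots every input meets every output exactly once, so each pair receives 1/N of the crossbar's time without any scheduling decision being made.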

[Figure: static WDM mesh with four linecards. Each input sends 4 WDM channels, each at rate 2R/N, into an Arrayed Waveguide Grating Router (AWGR): input A carries A, A, A, A (likewise B, C and D), and every output receives A, B, C, D.]

Passive and almost zero power
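The A, B, C, D pattern can be reproduced with a small model of an AWGR. The sketch assumes the commonly used cyclic mapping, in which the output port depends only on the input port and the wavelength index; real devices may number ports and wavelengths differently, and the function names are hypothetical.

```python
def awgr_output(input_port, wavelength, n):
    """Assumed cyclic routing: wavelength w on input i exits at (i + w) mod n."""
    return (input_port + wavelength) % n

def received_at_outputs(n):
    """For each output, list the (input, wavelength) pairs that arrive there."""
    received = [[] for _ in range(n)]
    for i in range(n):
        for w in range(n):
            received[awgr_output(i, w, n)].append((i, w))
    return received

for out, pairs in enumerate(received_at_outputs(4)):
    print(out, pairs)
```

Under this mapping every output hears from every input on a distinct wavelength, which gives the uniform mesh without any active switching element.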

[Figure: linecard datapath with WDM. Each linecard, with input and output at rate R, transmits and receives on wavelengths λ1, λ2, …, λN.]

- For N < 64, WDM is a good solution.
- We want N = 640.
- Need to decompose.

[Figure: example with eight linecards (1 to 8). The mesh of 2R/8 channels is decomposed using a combination of WDM and TDM, with channels at rate 2R/4.]

[Figure: G groups (racks) of L linecards each. The linecards in Group/Rack 1 through Group/Rack G connect at rate 2R to an Arrayed Waveguide Grating Router (AWGR), using wavelengths λ1, λ2, …, λG.]

- Each linecard spreads its data equally over every other linecard.
- Problem: If one is missing, or failed, then the spreading no longer works.

[Figure: linecards with inputs and outputs at rate R; link loads are labeled 2R/3 and 2R/3 + 2R/6.]

2R/3 + 2R/3 = 4R/3

2R/3 + 2R/6 + 2R/3 + 2R/6 = 2R

- Solution:
- Move light beams
- Replace AWGR with MEMS switch.
- Reconfigure when linecard added, removed or fails.

- Finer channel granularity
- Multiple paths.

[Figure: G groups (racks) of L linecards, from Group/Rack 1 to Group/Rack G = 40, interconnected through MEMS switches with 2R links.]

MEMS switches reconfigured only when linecard added, removed or fails.

Theorems:

1. Require L+G-1 MEMS switches

2. Polynomial time reconfiguration algorithm
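Applying Theorem 1 to the dimensions quoted in the slides (L = 16 linecards per rack, G = 40 racks) gives the number of MEMS switches. The sketch below only restates that arithmetic; the function name is hypothetical.

```python
def mems_switches_required(linecards_per_group, groups):
    """Theorem 1 from the slides: L + G - 1 MEMS switches."""
    return linecards_per_group + groups - 1

L = 16   # linecards per group/rack (from the slides)
G = 40   # groups/racks (from the slides)

total_linecards = L * G
switches = mems_switches_required(L, G)
print(total_linecards, switches)
```

The result is consistent with the "16 x 55" crossbar dimension that appears in the chip diagram later in the deck.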

[Figure: linecard datapath at R = 160Gb/s, with address lookup, packet switch and WDM on wavelengths λ1, λ2, …, λG. Open questions marked on the figure:]

- Low-cost, low-power optoelectronic conversion?
- How to build a 250ms 160Gb/s buffer?
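The size of the buffer in question follows directly from the two numbers on the slide: 250ms of traffic at 160Gb/s. The sketch below is a back-of-the-envelope calculation with hypothetical names; it uses integer arithmetic and decimal units (1 Gb = 10^9 bits).

```python
def buffer_bits(rate_gbps, duration_ms):
    """Bits needed to hold `duration_ms` of traffic arriving at `rate_gbps`."""
    rate_bps = rate_gbps * 10**9
    return rate_bps * duration_ms // 1000

bits = buffer_bits(160, 250)
gigabits = bits // 10**9
gigabytes = bits // 8 // 10**9
print(gigabits, gigabytes)
```

That is 40 Gb (5 GB) of storage per linecard, which must also sustain simultaneous writes and reads at line rate.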

[Figure: implementation sketch.]

- Chip #1: 160Gb/s Packet Buffer. Buffer Manager (90nm ASIC); 250ms DRAM; 160Gb/s and 320Gb/s interfaces.
- Chip #2: 16 x 55 Opto-electronic crossbar. CMOS ASIC; optical source; 16 x 10Gb/s to linecards; 55 x 10Gb/s to optical fabric.
- Optical Detector; Optical Modulator.
- Linecard Rack 1 through Linecard Rack G = 40, each with L = 16 linecards at 160Gb/s.
- Switch Rack < 100W, with 40 x 40 MEMS switches numbered 1, 2, …, 55, 56.