The Fork-Join Router

Nick McKeown

Assistant Professor of Electrical Engineering and Computer Science, Stanford University


http://www.stanford.edu/~nickm


Outline

  • Quick Background on Packet Switches

  • What’s the problem?

    “What if data rates exceed memory bandwidth?”

  • The Fork-Join Router

  • Parallel Packet Switches


First Generation Packet Switches

[Figure: a CPU with buffer memory on a shared backplane, serving several line interfaces, each with a MAC and DMA. Annotations: "Fixed length 'DMA' blocks or cells. Reassembled on egress linecard"; "Fixed length cells or variable length packets".]


Second Generation Packet Switches

[Figure: line cards, each with a MAC, DMA and local buffer memory, together with a CPU and its buffer memory.]


Third Generation Packet Switches

[Figure: line cards, each with a MAC and local buffer memory, and a CPU card (CPU and memory), connected by a switched backplane; a line interface is also shown.]


Fourth Generation Packet Switches


Two Basic Techniques

  • Shared Memory: N+N = 2N memory operations per cell time (N writes and N reads).

  • Input-queued Crossbar: 1+1 = 2 memory operations per cell time (one write and one read per input buffer).
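As a rough illustration of the operation counts above, the sketch below converts them into memory bandwidth for an N-port switch with line rate R, assuming every cell is written to and read from memory exactly once. The function name is hypothetical.

```python
from typing import Tuple


def memory_bandwidth_gbps(n_ports: int, line_rate_gbps: float) -> Tuple[float, float]:
    """Return (shared-memory bandwidth, per-input-buffer bandwidth) in Gb/s.

    Assumes each cell is written once and read once:
      - shared memory: one memory handles N writes + N reads per cell time;
      - input-queued crossbar: each input's buffer handles 1 write + 1 read.
    """
    shared = 2 * n_ports * line_rate_gbps   # N + N = 2N operations per cell time
    per_input = 2 * line_rate_gbps          # 1 + 1 = 2 operations per cell time
    return shared, per_input


shared, per_input = memory_bandwidth_gbps(16, 10.0)
print(f"shared memory: {shared} Gb/s, each input buffer: {per_input} Gb/s")
```

The shared memory's requirement grows with N, while each input-queued buffer only has to keep up with its own line.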


Shared Memory: The Ideal

[Figure: packets, labeled by letter, queued in a shared memory.]

A large body of work has proven and made possible:

  • Fairness

  • Delay Guarantees

  • Delay Variation Control

  • Loss Guarantees

  • Statistical Guarantees


Precise Emulation of an Output Queued Switch

[Figure: a combined input-output queued switch with a scheduler, beside an output queued switch, each with ports 1 to N, asking whether the two are equivalent ("= ?").]


Result

Theorem:

A speedup of 2-1/N is necessary and sufficient for a combined input- and output-queued switch to precisely emulate an output-queued switch for all traffic.

Joint work with Balaji Prabhakar at Stanford.
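The theorem's speedup is easy to tabulate. The sketch below, with hypothetical function names, computes 2-1/N exactly and, under an assumption about reads and writes per cell time stated in the code, the bandwidth each buffer then needs; compare it with the 2NR a shared memory needs for N+N operations per cell time.

```python
from fractions import Fraction


def cioq_speedup(n_ports: int) -> Fraction:
    """Speedup 2 - 1/N from the theorem, as an exact fraction."""
    return 2 - Fraction(1, n_ports)


def cioq_buffer_bandwidth(n_ports: int, line_rate_gbps: float) -> float:
    """Bandwidth each input (or output) buffer needs with speedup S.

    Assumption: per cell time an input buffer does 1 write and up to S reads
    (an output buffer: up to S writes and 1 read), i.e. (1 + S) * R.
    """
    return float(1 + cioq_speedup(n_ports)) * line_rate_gbps


for n in (2, 4, 16, 32):
    s = cioq_speedup(n)
    print(f"N={n:2d}  S={s} ({float(s):.4f})  per-buffer={cioq_buffer_bandwidth(n, 10.0):.4f} Gb/s")
```

The speedup never reaches 2, so under this assumption each buffer stays below 3R regardless of N.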


Outline

  • Quick Background on Packet Switches

  • What’s the problem?

    “What if data rates exceed memory bandwidth?”

  • The Fork-Join Router

  • Parallel Packet Switches


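How Fast Can I Make a Packet Buffer?

[Figure: a buffer memory built from 5ns SRAM, with a 64-byte wide bus in and a 64-byte wide bus out.]

Rough Estimate:

  • 5ns per memory operation.

  • Two memory operations per packet (one write, one read).

  • Each operation moves 64 bytes = 512 bits, so the memory sustains 512 bits / 5ns = 102.4Gb/s; with two operations per packet, the maximum is therefore 51.2Gb/s.

  • In practice, closer to 40Gb/s.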
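The arithmetic behind the estimate can be written as a small function so the bus width and access time can be varied. The name and parameters are illustrative, and the model ignores overheads (for example, packets that do not fill a whole 64-byte word), so it gives an upper bound only.

```python
def max_line_rate_gbps(bus_bytes: int, access_ns: float, ops_per_packet: int = 2) -> float:
    """Upper bound on the line rate a single packet buffer can sustain.

    Each memory operation moves bus_bytes in access_ns; every packet needs
    ops_per_packet operations (one write on arrival, one read on departure).
    """
    bits_per_op = bus_bytes * 8
    raw_gbps = bits_per_op / access_ns      # bits per ns is the same as Gb/s
    return raw_gbps / ops_per_packet


print(max_line_rate_gbps(64, 5.0))  # 51.2 (the 5ns SRAM, 64-byte bus example)
```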


Buffer Memory: Is It Going to Get Better?

[Figure: two plots against time: memory bandwidth (to core); and Specmarks, memory size, and gate density.]


Optical Physical Layers… are Going to Make Things “Worse”

DWDM:

  • More λ’s per fiber → more “ports” per switch.

  • # ports: 16, …, 1000’s.

Data rate:

  • More b/s per λ → higher capacity.

  • Data rates: 2.5Gb/s, 10Gb/s, 40Gb/s, 160Gb/s, …


Approach #1: Ping-pong Buffering

[Figure: two buffer memories, with 64-byte wide buses.]

Memory bandwidth doubled to ~80 Gb/s (twice the ~40Gb/s achievable with a single memory).
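One way to see why two memories help: if each memory can perform only one operation per time slot, a FIFO buffer can still accept one packet and send one packet every slot by writing each arrival into whichever memory is not being read. The sketch below simulates that steering rule; the rule and the function are assumptions for illustration, not necessarily the scheme on the slide.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


def pingpong(slots: List[Tuple[Optional[str], bool]]) -> Tuple[List[str], List[List[int]]]:
    """Simulate a FIFO packet buffer built from two memories, bank 0 and bank 1.

    Assumption: each memory performs at most one operation (read OR write) per
    time slot. slots[t] = (packet arriving in slot t or None,
                           whether the output link wants a packet in slot t).
    Returns the departure order and, for each slot, the operations per bank.
    """
    fifo: Deque[Tuple[str, int]] = deque()  # (packet, bank it was written to)
    departed: List[str] = []
    usage: List[List[int]] = []
    for arrival, wanted in slots:
        ops = [0, 0]
        read_bank: Optional[int] = None
        if wanted and fifo:
            packet, read_bank = fifo.popleft()   # read the head-of-line packet
            ops[read_bank] += 1
            departed.append(packet)
        if arrival is not None:
            # Steer the write to the memory that is not being read this slot.
            write_bank = 1 if read_bank == 0 else 0
            ops[write_bank] += 1
            fifo.append((arrival, write_bank))
        assert max(ops) <= 1, "a memory was asked for two operations in one slot"
        usage.append(ops)
    return departed, usage


slots = [("a", False), ("b", True), ("c", True), (None, True), (None, True)]
departed, usage = pingpong(slots)
print(departed)  # ['a', 'b', 'c']
print(usage)     # [[1, 0], [1, 1], [1, 1], [1, 0], [0, 0]]
```

In slots 1 and 2 both memories are busy, one reading and one writing, yet neither exceeds one operation per slot.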


Approach #2: Multiple Parallel Buffers (aka Banking, Interleaving)

[Figure: four buffer memories operating in parallel.]
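A minimal sketch of interleaving, assuming consecutive cells are striped round-robin across k banks (the function names are hypothetical): each bank then handles only about 1/k of the operations, and reading the banks back in the same order restores the original sequence.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def stripe(cells: Sequence[T], k: int) -> List[List[T]]:
    """Write consecutive cells round-robin across k banks: cell n -> bank n mod k."""
    banks: List[List[T]] = [[] for _ in range(k)]
    for n, cell in enumerate(cells):
        banks[n % k].append(cell)
    return banks


def unstripe(banks: List[List[T]]) -> List[T]:
    """Read the banks back in the same round-robin order to restore the sequence."""
    out: List[T] = []
    depth = max((len(b) for b in banks), default=0)
    for row in range(depth):
        for bank in banks:
            if row < len(bank):
                out.append(bank[row])
    return out


cells = list(range(10))
banks = stripe(cells, 4)
print(banks)                      # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
print(unstripe(banks) == cells)   # True
```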


Outline

  • Quick Background on Packet Switches

  • What’s the problem?

    “What if data rates exceed memory bandwidth?”

  • The Fork-Join Router

  • Parallel Packet Switches


The Fork-Join Router

[Figure: N input ports and N output ports, each at rate R, connected through k parallel routers (1, 2, …, k); labeled "Bufferless".]


The Fork-Join Router

  • Advantages

    • k↑ → memory bandwidth ↓

    • k↑ → lookup/classification rate ↓

    • k↑ → routing/classification table size ↓

  • Problems

    • How to demultiplex prior to lookup/classification?

    • How does the system perform/behave?

    • Can we predict/guarantee performance?
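To make the first two advantages concrete, here is a rough per-router calculation for one port. The 160Gb/s rate comes from the data-rate list earlier; the even split of traffic over the k routers, the 40-byte packet size, and the function name are assumptions for illustration. Table size is not modeled.

```python
from typing import Tuple


def per_router_load(line_rate_gbps: float, k: int, packet_bytes: int = 40) -> Tuple[float, float, float]:
    """Load seen by each of the k routers for one port of a fork-join router.

    Hypothetical back-of-envelope model: the fork spreads traffic evenly over
    the k routers, each buffered packet is written once and read once, and
    every packet needs one lookup. Returns (Gb/s per port, memory Gb/s,
    lookups per second).
    """
    rate = line_rate_gbps / k
    memory = 2 * rate                          # one write + one read per packet
    lookups = rate * 1e9 / (packet_bytes * 8)  # packets (= lookups) per second
    return rate, memory, lookups


for k in (1, 2, 4):
    print(k, per_router_load(160.0, k))
```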


Outline

  • Quick Background on Packet Switches

  • What’s the problem?

    “What if data rates exceed memory bandwidth?”

  • The Fork-Join Router

  • Parallel Packet Switches


A Parallel Packet Switch

[Figure: N input ports and N output ports, each at rate R, connected through k parallel output queued switches (layers 1, 2, …, k).]


Parallel Packet Switch: Questions

  • Can it be work-conserving?

  • Can it emulate a single big output queued switch?

  • Can it support delay guarantees, strict-priorities, WFQ, …?

  • What happens with multicast?


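Parallel Packet Switch: Work Conservation

[Figure: an input port and an output port at rate R, connected through layers 1, 2, …, k by internal links of rate R/k, with numbered cells 1 to 5 queued; annotated "Input Link Constraint" and "Output Link Constraint".]

Parallel Packet Switch: Work Conservation

[Figure: the same switch with cells 1 to 5 distributed across layers 1, 2, …, k; annotated "Output Link Constraint".]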
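The output link constraint can be illustrated with a deliberately simplified model (hypothetical, not the analysis behind the theorems): each layer-to-output link runs at S·R/k, so a layer can hand the output a cell only once every k/S slots, while the output line itself sends one cell per slot.

```python
from fractions import Fraction
from typing import Dict, List


def departure_times(layers: List[int], k: int, S: Fraction = Fraction(1)) -> List[Fraction]:
    """Earliest FCFS departure slot of each cell leaving one output of a PPS.

    Simplified, hypothetical model:
      - all cells for this output are already buffered at time 0;
      - cell n is held in layer layers[n];
      - the output line sends at most one cell per slot (rate R), in order;
      - each layer -> output link runs at S*R/k, so it can deliver a cell
        at most once every k/S slots.
    """
    gap = Fraction(k) / S
    last_from_layer: Dict[int, Fraction] = {}
    times: List[Fraction] = []
    t = Fraction(-1)
    for layer in layers:
        t += 1                                   # next free slot on the output line
        if layer in last_from_layer:             # wait for this layer's slow link
            t = max(t, last_from_layer[layer] + gap)
        last_from_layer[layer] = t
        times.append(t)
    return times


print([str(t) for t in departure_times([0, 0, 1], k=3)])  # ['0', '3', '4']
print([str(t) for t in departure_times([0, 1, 2], k=3)])  # ['0', '1', '2']
```

In the first case two consecutive cells sit in the same layer, so the output idles in slots 1 and 2 although cells are waiting: the switch is not work-conserving. Spreading the cells over the layers, or raising S, avoids the idle slots in this model.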


Parallel Packet Switch: Work Conservation

[Figure: a parallel packet switch with N ports at rate R and k output queued switch layers, with internal links sped up to S(R/k).]

Precise Emulation of an Output Queued Switch

[Figure: a parallel packet switch (ports 1 to N) beside an output queued switch (ports 1 to N), asking whether the two are equivalent ("= ?").]


Parallel Packet Switch: Theorems

  1. If S > 2k/(k+2) ≈ 2, then a parallel packet switch can be work-conserving for all traffic.

  2. If S > 2k/(k+2) ≈ 2, then a parallel packet switch can precisely emulate a FCFS output-queued switch for all traffic.

  3. If S > 3k/(k+3) ≈ 3, then a parallel packet switch can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.
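Tabulating the two bounds shows how quickly they approach 2 and 3 as the number of layers k grows. The function name is hypothetical; it returns the threshold that S must strictly exceed.

```python
from fractions import Fraction


def pps_speedup_bound(k: int, c: int) -> Fraction:
    """Threshold c*k/(k+c) that the speedup S must exceed in the theorems:
    c = 2 for work conservation and FCFS emulation,
    c = 3 for emulating WFQ, strict priorities and other QoS."""
    return Fraction(c * k, k + c)


for k in (2, 4, 8, 32, 128):
    print(k, float(pps_speedup_bound(k, 2)), float(pps_speedup_bound(k, 3)))
```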


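An aside: Unbuffered Clos Circuit Switch

Expansion factor required = 2-1/N

Clos Network

[Figure: a three-stage Clos network: input switches I1 … IX, each with m inputs; R middle-stage switches; output switches O1 … OX, each with m outputs. Alongside it, a connection matrix with rows I1, I2, I3, …, Ix and columns O1, O2, O3, …, Ox, whose entries (such as a, b, c) name the middle-stage switches in use.]

  • <= min(R,m) entries in each row

  • <= min(R,m) entries in each column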



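Define: UIL(Ii) = set of used links from input switch Ii to the middle stages.

UOL(Oj) = set of used links from the middle stages to output switch Oj.

If we wish to connect Ii to Oj:

When adding the connection, Ii and Oj each have at least one free external port, so |UIL(Ii)| <= m-1 and |UOL(Oj)| <= m-1.

Worst case: |UIL(Ii) ∪ UOL(Oj)| = 2m-2, i.e. 2m-2 middle-stage switches are unusable for this connection.

Therefore, if R >= 2m-1, there is always a free middle-stage switch (an expansion factor R/m >= 2-1/m).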
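The counting argument can be checked with a small simulation. The sketch below models an unbuffered Clos switch by the sets UIL and UOL, routes each request through any middle switch free at both ends, and runs random connect/disconnect churn with R = 2m-1; it then builds a case with R = 2m-2 where a request blocks. The class, function names, and random traffic are assumptions for illustration.

```python
import random
from typing import List, Optional, Set


class Clos:
    """Unbuffered three-stage Clos circuit switch (a simplified model).

    x input switches I_0..I_{x-1} and x output switches O_0..O_{x-1}, each with
    m external ports and one link to each of R middle-stage switches.
    uil[i] is UIL(I_i) and uol[j] is UOL(O_j): the middle switches in use.
    """

    def __init__(self, x: int, m: int, R: int) -> None:
        self.m, self.R = m, R
        self.uil: List[Set[int]] = [set() for _ in range(x)]
        self.uol: List[Set[int]] = [set() for _ in range(x)]

    def connect(self, i: int, j: int) -> Optional[int]:
        """Route I_i -> O_j via any middle switch free at both ends.

        Returns the middle switch used, or None if the request is blocked.
        The caller must only ask when I_i and O_j each have a free port.
        """
        assert len(self.uil[i]) < self.m and len(self.uol[j]) < self.m
        busy = self.uil[i] | self.uol[j]
        mid = next((k for k in range(self.R) if k not in busy), None)
        if mid is not None:
            self.uil[i].add(mid)
            self.uol[j].add(mid)
        return mid

    def disconnect(self, i: int, j: int, mid: int) -> None:
        self.uil[i].discard(mid)
        self.uol[j].discard(mid)


def never_blocks(x: int, m: int, steps: int = 10_000, seed: int = 1) -> bool:
    """Random connect/disconnect churn with R = 2m - 1 middle switches."""
    rng = random.Random(seed)
    sw = Clos(x, m, 2 * m - 1)
    active = []  # (i, j, middle switch) for each live connection
    for _ in range(steps):
        if active and rng.random() < 0.4:
            sw.disconnect(*active.pop(rng.randrange(len(active))))
            continue
        free_in = [i for i in range(x) if len(sw.uil[i]) < m]
        free_out = [j for j in range(x) if len(sw.uol[j]) < m]
        if not free_in or not free_out:
            continue
        i, j = rng.choice(free_in), rng.choice(free_out)
        mid = sw.connect(i, j)
        if mid is None:
            return False
        active.append((i, j, mid))
    return True


print(never_blocks(x=4, m=3))  # True: 2m - 1 middle switches always suffice

# With only R = 2m - 2 = 2 middle switches, a request can block:
sw = Clos(x=2, m=2, R=2)
sw.uil[0].add(0)
sw.uol[1].add(0)        # existing I_0 -> O_1 via middle switch 0
sw.uil[1].add(1)
sw.uol[0].add(1)        # existing I_1 -> O_0 via middle switch 1
print(sw.connect(0, 0))  # None: each middle switch is busy at one end
```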


An aside: Unbuffered Clos Circuit Switch

Expansion factor required = 2-1/N

Expansion → 2 - 4/(k+2) (= 2k/(k+2))
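The two expansion factors can be checked directly. With R = 2m-1 middle switches for m ports per outer switch, the Clos expansion is R/m; and the parallel packet switch expression simplifies to the 2k/(k+2) bound in the theorems:

```latex
\frac{R}{m} \ge \frac{2m-1}{m} = 2 - \frac{1}{m}
\qquad\qquad
2 - \frac{4}{k+2} = \frac{2(k+2) - 4}{k+2} = \frac{2k}{k+2}
```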


Fork-Join Router Project: What’s next?

  • Theory:

    • Extending results to distributed algorithms.

    • Extending results to multicast.

  • Implementation/Prototyping:

    • Under discussion...

