The Fork-Join Router

The Fork-Join Router

Nick McKeown

Assistant Professor of Electrical Engineering and Computer Science, Stanford University

[email protected]

http://www.stanford.edu/~nickm


Outline

  • Quick Background on Packet Switches

  • What’s the problem?

    “What if data rates exceed memory bandwidth?”

  • The Fork-Join Router

  • Parallel Packet Switches


First Generation Packet Switches

[Figure: block diagram with a shared backplane, a CPU with buffer memory, and three line interfaces, each with DMA, MAC, and memory. Annotations: fixed-length “DMA” blocks or cells, reassembled on the egress linecard; the line interface carries fixed-length cells or variable-length packets.]


Second Generation Packet Switches

[Figure: block diagram with three line cards, each with DMA, MAC, and local buffer memory, plus a CPU with buffer memory.]


Third Generation Packet Switches

[Figure: a switched backplane connecting two line cards (each with local buffer memory and a MAC) and a CPU card. Other labels: line interface, CPU, memory.]


Fourth Generation Packet Switches


Two Basic Techniques

  • Input-queued Crossbar: 1+1 = 2 operations per cell time

  • Shared Memory: N+N = 2N operations per cell time
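The operation counts above translate directly into a memory access-time budget. The following is a minimal sketch of that arithmetic, assuming 64-byte cells (borrowed from the bus width used later in the talk); the function names and the example port count and line rate are illustrative and do not come from the slides.

```python
def cell_time_ns(line_rate_gbps, cell_bytes=64):
    """Time for one cell to arrive on a line, in ns (bits / Gb/s = ns)."""
    return cell_bytes * 8 / line_rate_gbps


def access_time_ns(n_ports, line_rate_gbps, shared, cell_bytes=64):
    """Time available per memory operation.

    Shared memory: N writes + N reads = 2N operations per cell time.
    Input-queued crossbar: 1 write + 1 read = 2 operations per cell time.
    """
    ops_per_cell_time = 2 * n_ports if shared else 2
    return cell_time_ns(line_rate_gbps, cell_bytes) / ops_per_cell_time


if __name__ == "__main__":
    # Hypothetical example: 16 ports at 10 Gb/s.
    print("shared memory:", access_time_ns(16, 10, shared=True), "ns per operation")
    print("input-queued: ", access_time_ns(16, 10, shared=False), "ns per operation")
```

The shared memory budget shrinks with the number of ports, while the input-queued budget depends only on the line rate.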


Shared Memory: The Ideal

A large body of work has proven and made possible:

  • Fairness

  • Delay Guarantees

  • Delay Variation Control

  • Loss Guarantees

  • Statistical Guarantees


Precise Emulation of an Output Queued Switch

[Figure: Combined Input-Output Queued Switch (N ports, with a scheduler) = ? Output Queued Switch (N ports)]


Result

Theorem:

A speedup of 2-1/N is necessary and sufficient for a combined input- and output-queued switch to precisely emulate an output-queued switch for all traffic.

Joint work with Balaji Prabhakar at Stanford.
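To get a feel for the bound in the theorem, the sketch below tabulates 2-1/N for a few port counts. The function name is hypothetical; only the formula comes from the slide.

```python
from fractions import Fraction


def cioq_speedup(n_ports):
    """Speedup 2 - 1/N stated in the theorem, as an exact fraction."""
    return 2 - Fraction(1, n_ports)


if __name__ == "__main__":
    for n in (1, 2, 4, 16, 64):
        print(f"N = {n:3d}: speedup = {float(cioq_speedup(n)):.4f}")
```

The value starts at 1 for a single port and approaches 2 as N grows.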


Buffer Memory: How Fast Can I Make a Packet Buffer?

[Figure: a 5ns SRAM buffer memory with a 64-byte wide bus on each side.]

Rough Estimate:

  • 5ns per memory operation.

  • Two memory operations per packet.

  • Therefore, maximum 51.2Gb/s (64 bytes = 512 bits every 2 × 5ns = 10ns).

  • In practice, closer to 40Gb/s.
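The rough estimate can be reproduced in a few lines. This is a sketch under the slide's own numbers (64-byte bus, 5ns per operation, two operations per packet), with the added assumption that each operation moves exactly one bus-width word; the function and parameter names are illustrative.

```python
def buffer_bandwidth_gbps(bus_bytes, op_time_ns, ops_per_packet=2):
    """Peak buffer throughput in Gb/s.

    Each packet needs one write and one read (ops_per_packet = 2), and each
    operation is assumed to move one bus-width word of bus_bytes bytes.
    """
    bits_per_packet = bus_bytes * 8
    time_per_packet_ns = op_time_ns * ops_per_packet
    return bits_per_packet / time_per_packet_ns  # bits per ns = Gb/s


if __name__ == "__main__":
    print(buffer_bandwidth_gbps(64, 5))  # 51.2
```

The gap between this figure and the 40Gb/s quoted for practice is not modeled here.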


Buffer Memory: Is It Going to Get Better?

[Figure: two plots against time. One shows Specmarks, memory size, and gate density; the other shows memory bandwidth (to core).]


Optical Physical Layers… …are Going to Make Things “Worse”

DWDM:

  • More λ’s per fiber → more “ports” per switch.

  • # ports: 16, …, 1000’s.

Data rate:

  • More b/s per λ → higher capacity.

  • Data rates: 2.5Gb/s, 10Gb/s, 40Gb/s, 160Gb/s, …


Approach #1: Ping-pong Buffering

[Figure: two buffer memories, with a 64-byte wide bus on each side.]

Memory bandwidth doubled to ~80 Gb/s


Approach #2: Multiple Parallel Buffers, aka Banking, Interleaving

[Figure: four buffer memories in parallel.]
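As a back-of-the-envelope illustration of banking, the sketch below estimates how many parallel buffers a given line rate would need. It assumes roughly 40 Gb/s of usable bandwidth per buffer (the practical figure quoted earlier) and perfectly even load spreading across banks, which real interleaving schemes only approximate. The function name and defaults are hypothetical.

```python
import math


def banks_needed(line_rate_gbps, per_bank_gbps=40.0):
    """Smallest number of parallel buffers whose combined bandwidth covers the line rate.

    Assumes ideal, even spreading of packets over the banks.
    """
    return math.ceil(line_rate_gbps / per_bank_gbps)


if __name__ == "__main__":
    for rate in (40, 80, 160):
        print(f"{rate} Gb/s -> {banks_needed(rate)} bank(s)")
```

Two banks at this rate correspond to the ~80 Gb/s quoted for ping-pong buffering.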


The Fork-Join Router

[Figure: inputs 1…N and outputs 1…N, each at rate R, connected through parallel routers 1, 2, …, k. Label: “Bufferless”.]


The Fork-Join Router

  • Advantages

    • k ↑ → memory bandwidth ↓

    • k ↑ → lookup/classification rate ↓

    • k ↑ → routing/classification table size ↓

  • Problems

    • How to demultiplex prior to lookup/classification?

    • How does the system perform/behave?

    • Can we predict/guarantee performance?


A Parallel Packet Switch

[Figure: inputs 1…N and outputs 1…N, each at rate R, connected through k parallel output queued switches (1, 2, …, k).]


Parallel Packet Switch: Questions

  • Can it be work-conserving?

  • Can it emulate a single big output queued switch?

  • Can it support delay guarantees, strict-priorities, WFQ, …?

  • What happens with multicast?


Parallel Packet Switch: Work Conservation

[Figure: one input and one output at rate R, connected through layers 1, 2, …, k by internal links of rate R/k. The input link constraint and the output link constraint are marked.]

[Figure: the same arrangement with numbered cells (1 to 5) spread across the layers, highlighting the output link constraint.]

[Figure: inputs 1…N and outputs 1…N at rate R, connected to k output queued switches by internal links of rate S(R/k).]


Precise Emulation of an Output Queued Switch

[Figure: Parallel Packet Switch (N ports) = ? Output Queued Switch (N ports)]


Parallel Packet Switch: Theorems

1. If S > 2k/(k+2) ≈ 2 then a parallel packet switch can be work-conserving for all traffic.

2. If S > 2k/(k+2) ≈ 2 then a parallel packet switch can precisely emulate an FCFS output-queued switch for all traffic.

3. If S > 3k/(k+3) ≈ 3 then a parallel packet switch can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.
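The speedup thresholds in these theorems are easy to tabulate. The sketch below evaluates 2k/(k+2) and 3k/(k+3) for a few values of k and computes the internal link rate S(R/k) shown on the work-conservation slides. The function names and the example values of k, R, and S are illustrative assumptions; only the formulas come from the slides.

```python
from fractions import Fraction


def pps_speedup_bound(k, c):
    """Threshold c*k/(k+c): c = 2 for work conservation and FCFS emulation,
    c = 3 for WFQ / strict-priority emulation. The theorems require S to be
    strictly greater than this value."""
    return Fraction(c * k, k + c)


def internal_link_rate(line_rate, k, speedup):
    """Rate S*(R/k) of each link between a port and one of the k layers."""
    return speedup * line_rate / k


if __name__ == "__main__":
    for k in (2, 4, 10, 100):
        print(f"k = {k:3d}: "
              f"2k/(k+2) = {float(pps_speedup_bound(k, 2)):.3f}, "
              f"3k/(k+3) = {float(pps_speedup_bound(k, 3)):.3f}")
    # Hypothetical example: R = 10 Gb/s lines, k = 10 layers, S = 2.
    print("internal link rate:", internal_link_rate(10.0, 10, 2), "Gb/s")
```

Both thresholds stay below their limits of 2 and 3 for every finite k, which is why the slides round them to 2 and 3.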


An Aside: Unbuffered Clos Circuit Switch

Expansion factor required = 2-1/N


Clos Network

[Figure: three-stage Clos network with input switches I1…IX and output switches O1…OX, each with m ports, and R middle stage switches; links labeled a, b, c. Alongside it, a connection matrix with rows I1, I2, I3, …, Ix and columns O1, O2, O3, …, Ox.]

  • <= min(R,m) entries in each row

  • <= min(R,m) entries in each column

Define: UIL(Ii) = used links at switch Ii to connect to middle stages.

UOL(Oi) = used links at switch Oi to connect to middle stages.

If we wish to connect Ii to Oi:

When adding connection: |UIL(Ii)| <= m-1 and |UOL(Oi)| <= m-1

Worst-case: |UIL(Ii) U UOL(Oi)| = 2m-2

Therefore, if R > 2m-2 (that is, R >= 2m-1) there are always enough middle stages: at least one middle stage is unused by both Ii and Oi.
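The counting argument can be checked with a small sketch. It builds the worst case, in which the m-1 middle switches already used by Ii and the m-1 already used by Oi overlap as little as possible, and reports which middle switches remain free for the new connection. Because up to 2m-2 middle switches can be busy, one more is needed, which is the classic Clos condition R >= 2m-1 (matching the expansion factor 2-1/N quoted in the aside). The function name and the numbering of middle switches are hypothetical.

```python
def worst_case_free(R, m):
    """Middle switches still free for a new Ii -> Oi connection in the worst case.

    Ii already uses up to m-1 middle switches and Oi up to m-1 others; the two
    sets are placed at opposite ends of 0..R-1 so they overlap as little as possible.
    """
    busy = min(m - 1, R)
    used_in = set(range(busy))          # UIL(Ii): first `busy` middle switches
    used_out = set(range(R - busy, R))  # UOL(Oi): last `busy` middle switches
    return set(range(R)) - used_in - used_out


if __name__ == "__main__":
    m = 4
    for R in (2 * m - 2, 2 * m - 1):
        free = worst_case_free(R, m)
        print(f"m = {m}, R = {R}: free middle switches = {sorted(free)}")
```

With m = 4, six middle switches can all be busy, while seven always leave one free.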


An Aside: Unbuffered Clos Circuit Switch

Expansion factor required = 2-1/N

Expansion → 2 - 4/(k+2)
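The expression 2 - 4/(k+2) on this slide is algebraically the same quantity as the speedup threshold 2k/(k+2) from the parallel packet switch theorems. A short derivation, using only symbols already on the slides:

```latex
\[
\frac{2k}{k+2} \;=\; \frac{2(k+2) - 4}{k+2} \;=\; 2 - \frac{4}{k+2},
\qquad
\lim_{k \to \infty} \left( 2 - \frac{4}{k+2} \right) = 2 .
\]
```

The threshold therefore approaches 2 from below as k grows, in the same way that the Clos expansion factor 2-1/N approaches 2 as N grows.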


Fork-Join Router Project: What’s Next?

  • Theory:

    • Extending results to distributed algorithms.

    • Extending results to multicast.

  • Implementation/Prototyping:

    • Under discussion...

