Analysis of a Packet Switch with Memories Running Slower than the Line Rate

Sundar Iyer, Amr Awadallah, Nick McKeown
(sundaes,aaa,nickm)@stanford.edu
Departments of Electrical Engineering & Computer Science, Stanford University
http://klamath.stanford.edu/pps

Problem Statement
Motivation: To design an extremely high speed packet switch with memories running slower than the line rate.

Architecture of a PPS
[Figure: N = 4 ports, each running at line rate R. A demultiplexor at each input spreads cells over k = 3 OQ switches (the layers) through internal links of rate R/k, and a multiplexor at each output recombines them onto a link of rate R.]

Parallel Packet Switch Questions
• Can it behave like a single big output queued switch?
• Can it provide delay guarantees, strict-priorities, WFQ, …?

Parallel Packet Switch Results
• If S > 2k/(k+2) ≈ 2 then a PPS can precisely emulate a FIFO output queued switch for all traffic patterns.
• If S > 3k/(k+3) ≈ 3 then a PPS can precisely emulate an OQ switch with WFQ or strict priorities for all traffic patterns.
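The two thresholds above are easy to tabulate. The following is a minimal sketch, not part of the original slides; the function names `fifo_speedup_bound` and `priority_speedup_bound` are made up for illustration. It only evaluates the expressions 2k/(k+2) and 3k/(k+3) exactly, to show how they approach 2 and 3 as the number of layers k grows.

```python
from fractions import Fraction


def fifo_speedup_bound(k):
    """Threshold 2k/(k+2) from the slide; S must be strictly greater than this."""
    return Fraction(2 * k, k + 2)


def priority_speedup_bound(k):
    """Threshold 3k/(k+3) from the slide (WFQ or strict priorities)."""
    return Fraction(3 * k, k + 3)


if __name__ == "__main__":
    # Print both thresholds for a few illustrative layer counts.
    for k in (3, 10, 100, 1000):
        print(k, float(fifo_speedup_bound(k)), float(priority_speedup_bound(k)))
```

For the k = 3 example used in the architecture slide, the expressions evaluate to 6/5 and 3/2, noticeably below the limiting values of 2 and 3.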

Parallel Packet Switch Results
• If S > 2sqrt(N) then a PPS can precisely emulate a multicast FIFO OQ switch.
• If S > 2sqrt(2N) then a PPS can precisely emulate a multicast OQ switch with WFQ or strict priorities for all traffic patterns.

Questions
• Can we have a completely distributed algorithm?
• Can we reduce the speedup further?
  • “Two is too much”
• Can we smooth the load across all the middle stage switches?

Completely Distributed Algorithm
• Local Available Output Link Set (LAOL)
• Definition:
  • The LAOL consists of the (k/S - 1) “oldest” layers used by an input for that output.
  • We can prevent a layer from appearing in the LAOL until another k - k/S + 1 cells have been sent to other layers for that output.
• Result:
  • For any given output, a layer is used again only after k - k/S + 1 cells to that output have been sent.
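The LAOL rule can be sketched for a single (input, output) pair as a least-recently-used ordering of the layers. This is an illustrative sketch under stated assumptions, not the authors' implementation: `LaolTracker` and `min_reuse_gap` are hypothetical names, the speedup is assumed to be an integer that divides k so that k/S - 1 is a whole number, and the layer is picked at random from the LAOL because the slide does not say which member is chosen.

```python
import random
from collections import deque


class LaolTracker:
    """Tracks layer age for one (input, output) pair; oldest layer first."""

    def __init__(self, k, s):
        # Assumption: integer speedup s dividing k, with a non-empty LAOL.
        if k % s != 0 or k // s - 1 < 1:
            raise ValueError("need k divisible by s and k/s - 1 >= 1")
        self.size = k // s - 1            # |LAOL| = k/S - 1
        self.order = deque(range(k))      # least recently used layer at the left

    def laol(self):
        """Return the k/S - 1 oldest layers."""
        return list(self.order)[: self.size]

    def use(self, layer):
        """Send a cell to `layer`, which then becomes the newest layer."""
        if layer not in self.laol():
            raise ValueError("layer is not in the LAOL")
        self.order.remove(layer)
        self.order.append(layer)


def min_reuse_gap(k, s, cells, seed=0):
    """Smallest number of cells sent to other layers between two uses of a layer."""
    rng = random.Random(seed)
    tracker = LaolTracker(k, s)
    last_used = {}
    gaps = []
    for n in range(cells):
        layer = rng.choice(tracker.laol())
        if layer in last_used:
            gaps.append(n - last_used[layer] - 1)
        last_used[layer] = n
        tracker.use(layer)
    return min(gaps)
```

With k = 8 and S = 2 the LAOL holds 3 layers, and a layer that has just been used sits behind the other 7. It moves up one position per cell, so it re-enters the LAOL only after k - k/S + 1 = 5 cells have gone to other layers, which is what the sketch measures.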

Conflict Free Ordering
[Figure: parallel packet switch with N = 4 demultiplexors, each fed at line rate R and connected to the layers by internal links of rate sR/k; numbered cells are shown queued on the internal links.]

Re-Sequencing
• A cell might be delayed by as much as N/S time slots.
• Cells might leave out of order.
• A buffer of size Nk/S is needed to re-sequence cells and prevent out-of-order transmission.
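A re-sequencing buffer of the kind described above can be sketched with a min-heap keyed on a per-flow sequence number. This is a minimal illustration, not the authors' design: the class name `Resequencer`, the assumption that sequence numbers start at 0 and are consecutive, and the unbounded buffer (the slide bounds it by Nk/S) are all choices made for the example.

```python
import heapq


class Resequencer:
    """Holds out-of-order cells and releases them in sequence-number order."""

    def __init__(self):
        self.next_seq = 0      # assumption: sequence numbers start at 0
        self.heap = []         # (sequence number, cell) pairs awaiting release
        self.max_held = 0      # largest number of cells held at once

    def push(self, seq, cell):
        """Accept one cell; return the list of cells that can now depart."""
        heapq.heappush(self.heap, (seq, cell))
        self.max_held = max(self.max_held, len(self.heap))
        released = []
        # Release every cell that is next in sequence.
        while self.heap and self.heap[0][0] == self.next_seq:
            released.append(heapq.heappop(self.heap)[1])
            self.next_seq += 1
        return released
```

A cell that arrives early simply waits in the heap until all of its predecessors have arrived; `max_held` records how much buffering a given arrival pattern actually needed.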

A Practical Distributed Algorithm
• If S > 2k/(k+2) ≈ 2 then a PPS with a completely distributed algorithm can precisely emulate a FIFO output queued switch for all traffic patterns. The PPS will have a fixed latency of Nk/S time slots. A re-sequencing buffer of size Nk/S is needed.

PPS with no Speedup
• Speedup = 1
• LAOL is round robin
• |LAOL| = 1
• D(i,l): Number of cells sent by demultiplexor i to layer l
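With |LAOL| = 1 the demultiplexor has no choice: each output's cells go to the layers in strict rotation. The sketch below illustrates that behaviour and the counter D(i,l) for one demultiplexor i. The class name `RoundRobinDemux` and the choice to start every output's pointer at layer 0 are assumptions made for the example.

```python
class RoundRobinDemux:
    """One demultiplexor spreading each output's cells over k layers in rotation."""

    def __init__(self, num_outputs, num_layers):
        self.k = num_layers
        # Assumption: every output's round-robin pointer starts at layer 0.
        self.pointer = [0] * num_outputs
        # sent[l] plays the role of D(i, l) for this demultiplexor i.
        self.sent = [0] * num_layers

    def dispatch(self, output):
        """Pick the layer for a cell destined to `output` and count it."""
        layer = self.pointer[output]
        self.pointer[output] = (layer + 1) % self.k
        self.sent[layer] += 1
        return layer


if __name__ == "__main__":
    demux = RoundRobinDemux(num_outputs=4, num_layers=3)
    # One cell to each of the N = 4 outputs, back to back.
    print([demux.dispatch(out) for out in range(4)], demux.sent)
```

In this example the first cell for each of the N = 4 outputs lands on the same layer, so N consecutive cells target one internal link that can carry only one cell every k time slots. This kind of temporary concentration is what a buffer at the demultiplexor has to absorb.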

Buffer Degree
[Figure: a demultiplexor fed at line rate R, with a buffer on each internal link of rate sR/k; cells labelled a to e are shown queued in the buffers, and the degree of a buffer is marked on the figure.]

Buffered AIL Set (BAIL)
• Buffered Available Input Link Set (BAIL)
• “Set of layers which have fewer cells in the buffer (including transmission) for layer l than the buffer degree”
• “It is the set of layers which can start sending the arriving cell between time n and n + k”
• Until now we have only considered a PPS with buffer degree 0

Claim
• BAIL is never empty
• The buffer never overflows for some buffer degree
• LAOL is always satisfied
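The claim can be explored with a toy simulation of one demultiplexor at speedup 1. This is a sketch under simplifying assumptions, not the authors' model: at most one cell arrives per time slot, each internal link takes k time slots to send a cell, a cell counts towards the occupancy of its buffer until its transmission finishes, and destinations are drawn uniformly at random. The function name `max_layer_occupancy` and its parameters are hypothetical.

```python
import random


def max_layer_occupancy(n_ports, k, slots, seed=0, load=1.0):
    """Largest per-layer buffer occupancy seen at one round-robin demultiplexor."""
    rng = random.Random(seed)
    pointer = [rng.randrange(k) for _ in range(n_ports)]  # per-output round robin
    queued = [0] * k       # cells buffered per layer, including the one in transit
    remaining = [0] * k    # time slots left for the cell currently in transit
    worst = 0
    for _ in range(slots):
        # At most one arrival per time slot at line rate R.
        if rng.random() < load:
            output = rng.randrange(n_ports)
            layer = pointer[output]
            pointer[output] = (layer + 1) % k
            queued[layer] += 1
            worst = max(worst, queued[layer])
        # Each internal link runs at R/k: one cell every k time slots.
        for layer in range(k):
            if remaining[layer] == 0 and queued[layer] > 0:
                remaining[layer] = k
            if remaining[layer] > 0:
                remaining[layer] -= 1
                if remaining[layer] == 0:
                    queued[layer] -= 1
    return worst
```

Running it for N = 4 and k = 3 gives a feel for how much buffering round-robin dispatch needs in this model; the value can then be compared against the buffer degree of N used later in the deck.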

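Buffer Occupancy Sequence
[Figure: buffer occupancy timeline with occupancy levels 0, 1, 2, … up to the buffer degree, and time marks t1, t2, …, ti-1, ti, …, t-k+1, t.]
• The last of the cells that have left the buffer did so by time t-k+1 at the latest.
• Number of cells that have left >= (t-k+1-ti)/k >= (t-ti)/k - 1
• D(i,l) = number of cells that have left + buffer degree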

Buffer Occupancy Sequence..
[Figure: buffer occupancy timeline with occupancy levels 0, 1, 2, … and time marks t1, t2, …, ti-1, ti, …, t-k+1, t.]
• Buffer degree = N gives a contradiction.

Observations
• Each cell reaches the middle stage switch with a variable input delay, Di = 1..N.
• If every cell is delayed at the input of the middle stage switches by a further “N - Di”, then all cells reach the outputs of the middle stage in order.
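The observation amounts to padding every cell's delay up to the same total of N time slots. A small sketch follows; the function name `equalized_release_times` is hypothetical, and arrival times are taken to be the time slots at which cells enter the demultiplexor.

```python
def equalized_release_times(arrivals, delays, n):
    """Release time of each cell after holding it for a further N - Di slots."""
    released = []
    for arrival, delay in zip(arrivals, delays):
        if not 1 <= delay <= n:
            raise ValueError("input delay Di must lie in 1..N")
        reached = arrival + delay      # when the cell reaches the middle stage
        hold = n - delay               # extra wait that equalizes the total delay
        released.append(reached + hold)
    return released


if __name__ == "__main__":
    arrivals = [0, 1, 2, 3]
    delays = [4, 1, 3, 1]
    # Without the extra hold, cells reach the middle stage out of order.
    print([a + d for a, d in zip(arrivals, delays)])
    # With the hold, every cell is delayed by exactly N = 4 slots.
    print(equalized_release_times(arrivals, delays, n=4))
```

Because every cell ends up delayed by exactly N, the release order is the arrival order, at the cost of always paying the worst-case input delay.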

Symmetry Argument
• Demultiplexors
  • Cells arrive at rate R
  • Each cell has a property: output
  • Cells to the same output are written in a round robin manner
  • Cells leave at link rate R
  • The buffer is used to prevent temporary overload on the same middle stage switch
  • Max Delay = N

Symmetry Argument …
• Multiplexors
  • Cells need to be read in at rate R
  • Each cell has a property: input
  • Cells from the same input are read in a round robin manner
  • Cells leave at a rate k(R/k) = R
  • The buffer is used to re-order cells and send them in the correct order.
  • Max Delay = N

Buffered PPS Results
• A PPS with a completely distributed algorithm, no speedup, and a buffer degree of N can precisely emulate a FIFO output queued switch for all traffic patterns within a delay bound of 2N time slots.

Conclusions
• Implementation
  • Timestamps
  • Sequence Numbers
• Open questions
  • Making QoS practical.
  • Making multicasting practical.
