Analysis of a Packet Switch with Memories Running Slower than the Line Rate

Sundar Iyer, Amr Awadallah, Nick McKeown, (sundaes,aaa,nickm)@stanford.edu. Departments of Electrical Engineering & Computer Science, Stanford University. http://klamath.stanford.edu/pps

Presentation Transcript


  1. Analysis of a Packet Switch with Memories Running Slower than the Line Rate. Sundar Iyer, Amr Awadallah, Nick McKeown, (sundaes,aaa,nickm)@stanford.edu. Departments of Electrical Engineering & Computer Science, Stanford University. http://klamath.stanford.edu/pps

  2. Problem Statement • Motivation: to design an extremely high-speed packet switch whose memories run slower than the line rate.

  3. Architecture of a PPS [Figure: N = 4 ports and k = 3 layers. Each input, at line rate R, feeds a demultiplexor; each demultiplexor connects over internal links of rate R/k to the OQ switches; multiplexors recombine the layers onto the outputs at rate R.]

  4. Parallel Packet Switch: Questions • Can it behave like a single big output-queued (OQ) switch? • Can it provide delay guarantees, strict priorities, WFQ, …?

  5. Parallel Packet Switch: Results • If the speedup S > 2k/(k+2) ≈ 2, then a PPS can precisely emulate a FIFO output-queued switch for all traffic patterns. • If S > 3k/(k+3) ≈ 3, then a PPS can precisely emulate an OQ switch with WFQ or strict priorities for all traffic patterns.
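
As a rough numerical check of the two thresholds on this slide, the sketch below evaluates 2k/(k+2) and 3k/(k+3) for a few layer counts k. The function names are illustrative and do not come from the slides.

```python
def fifo_speedup_bound(k: int) -> float:
    """Threshold 2k/(k+2) quoted for emulating a FIFO OQ switch."""
    return 2 * k / (k + 2)


def priority_speedup_bound(k: int) -> float:
    """Threshold 3k/(k+3) quoted for WFQ or strict priorities."""
    return 3 * k / (k + 3)


if __name__ == "__main__":
    # Both thresholds grow with k but stay below 2 and 3 respectively.
    for k in (2, 4, 10, 100):
        print(k, round(fifo_speedup_bound(k), 3), round(priority_speedup_bound(k), 3))
```

Both expressions increase with k and approach 2 and 3 from below, which matches the "≈ 2" and "≈ 3" on the slide for large k.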

  6. Parallel Packet Switch: Results • If S > 2√N, then a PPS can precisely emulate a multicast FIFO OQ switch. • If S > 2√(2N), then a PPS can precisely emulate a multicast OQ switch with WFQ or strict priorities for all traffic patterns.
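
For comparison with the unicast case, the multicast thresholds on this slide depend on the port count N rather than on k. A minimal sketch, with illustrative function names that are not from the slides:

```python
import math


def multicast_fifo_bound(n_ports: int) -> float:
    """Threshold 2*sqrt(N) quoted for multicast FIFO OQ emulation."""
    return 2 * math.sqrt(n_ports)


def multicast_priority_bound(n_ports: int) -> float:
    """Threshold 2*sqrt(2N) quoted for multicast with WFQ or strict priorities."""
    return 2 * math.sqrt(2 * n_ports)


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, multicast_fifo_bound(n), round(multicast_priority_bound(n), 3))
```

Unlike the unicast thresholds, these values keep growing as N increases.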

  7. Questions • Can we have a completely distributed algorithm? • Can we reduce the speedup further? • “Two is too much” • Can we smooth the load across all the middle-stage switches?

  8. Completely Distributed Algorithm • Local Available Output Link Set (LAOL) • Definition: • The LAOL consists of the (k/s - 1) “oldest” layers used by an input for that output. • We can prevent a layer from appearing in the LAOL until another k - k/s + 1 cells have been sent to other layers for that output. • Result: • For any given output, a layer is used again only after k - k/s + 1 cells to that output have been sent.
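
The LAOL bookkeeping for one (input, output) pair can be sketched as a least-recently-used ordering of the k layers. This is an illustration under stated assumptions, not the authors' implementation: the class name `LaolTracker` is hypothetical, s is taken to be the speedup written S elsewhere in the slides, k/s is rounded down to an integer, and the default choice of the oldest layer is an arbitrary tie-break.

```python
from collections import deque
from typing import List, Optional


class LaolTracker:
    """Layer usage for one (input, output) pair. Hypothetical helper."""

    def __init__(self, k: int, s: int) -> None:
        # Assumption: k // s stands in for k/s, and it must be at least 2
        # so that the LAOL holds k/s - 1 >= 1 layers.
        self.size = k // s - 1
        if self.size < 1:
            raise ValueError("need k // s >= 2")
        # Left end = least recently used layer, right end = most recent.
        self.by_age = deque(range(k))

    def laol(self) -> List[int]:
        """The (k/s - 1) oldest layers for this output."""
        return list(self.by_age)[: self.size]

    def send(self, layer: Optional[int] = None) -> int:
        """Send one cell on a layer from the LAOL and mark it most recent."""
        candidates = self.laol()
        if layer is None:
            layer = candidates[0]  # assumption: default to the oldest layer
        if layer not in candidates:
            raise ValueError("layer is not in the LAOL")
        self.by_age.remove(layer)
        self.by_age.append(layer)
        return layer
```

With k = 8 and s = 2, the LAOL holds 3 layers, and a layer that has just been used reappears only after k - k/s + 1 = 5 cells have gone to other layers, which is the spacing stated on the slide.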

  9. Conflict-Free Ordering [Figure: parallel packet switch with N = 4; demultiplexors with input rate R send numbered cells (1 to 7) over internal links of rate sR/k.]

  10. Re-Sequencing • A cell might be delayed by as much as N/S time slots. • Cells might leave in the wrong order. • A buffer of size Nk/S is needed to re-sequence cells and prevent out-of-order transmission.
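
One common way to build a re-sequencing buffer is a min-heap keyed on a sequence number. The sketch below assumes each cell carries a consecutive sequence number starting at 0; the class name is hypothetical, and the Nk/S capacity bound from the slide is not enforced here.

```python
import heapq
from typing import Any, List, Tuple


class Resequencer:
    """Holds early cells until every earlier cell has been released."""

    def __init__(self) -> None:
        self.next_seq = 0                        # next number to release
        self.pending: List[Tuple[int, Any]] = []  # min-heap of (seq, cell)

    def push(self, seq: int, cell: Any) -> List[Any]:
        """Accept one cell and return the cells that can now leave in order."""
        heapq.heappush(self.pending, (seq, cell))
        released = []
        # Drain the heap while its smallest entry is the one we are waiting for.
        while self.pending and self.pending[0][0] == self.next_seq:
            released.append(heapq.heappop(self.pending)[1])
            self.next_seq += 1
        return released
```

A cell that arrives early simply waits in the heap; as soon as the missing cell shows up, both are released together in sequence order.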

  11. A Practical Distributed Algorithm • If S > 2k/(k+2) ≈ 2, then a PPS with a completely distributed algorithm can precisely emulate a FIFO output-queued switch for all traffic patterns. • The PPS will have a fixed latency of Nk/S time slots. • A re-sequencing buffer of size Nk/S is needed.

  12. PPS with no Speedup • Speedup = 1 • LAOL is round robin • |LAOL| = 1 • D(i,l): number of cells sent by demultiplexor i to layer l
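
With |LAOL| = 1 the layer choice reduces to a per-output round-robin pointer. The sketch below is a hedged illustration: the class name `RoundRobinDemux` and the dictionary layout are assumptions, and the counter `sent` stands in for D(i,l).

```python
from collections import defaultdict


class RoundRobinDemux:
    """Demultiplexor i that spreads each output's cells over k layers in turn."""

    def __init__(self, demux_id: int, k: int) -> None:
        self.demux_id = demux_id
        self.k = k
        self.next_layer = defaultdict(int)  # output -> next layer to use
        self.sent = defaultdict(int)        # (i, l) -> D(i,l)

    def dispatch(self, output: int) -> int:
        """Pick the layer for one arriving cell destined to `output`."""
        layer = self.next_layer[output]
        self.next_layer[output] = (layer + 1) % self.k
        self.sent[(self.demux_id, layer)] += 1
        return layer


if __name__ == "__main__":
    d = RoundRobinDemux(demux_id=0, k=3)
    print([d.dispatch(output=1) for _ in range(4)])
    print(d.dispatch(output=2), dict(d.sent))
```

Because each output keeps its own pointer, consecutive cells for different outputs can be assigned to the same layer, so D(i,l) can temporarily run ahead on one layer.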

  13. Buffer Degree • Degree of buffer ([?]) [Figure: a demultiplexor with input rate R and per-layer links of rate sR/k; cells labelled a, b, c, d, e.]

  14. Buffered AIL Set (BAIL) • Buffered Available Input Link Set (BAIL) • “The set of layers l which have fewer cells in the buffer for layer l (including the cell in transmission) than the buffer degree.” • “It is the set of layers which can start sending the arriving cell between time n and n + k.” • Until now we have only considered a PPS with buffer degree 0.

  15. Claim • BAIL is never empty • The buffer never overflows for some buffer degree • LAOL is always satisfied

  16. Buffer Occupancy Sequence [Figure: timeline of times t1, t2, …, ti-1, ti, followed by t-k+1 and t, with the buffer occupancy 0, 1, 2, … shown above it.] • The last of the [?]i cells left by time t-k+1 at the latest. • [?]i >= (t-k+1-ti)/k >= (t-ti)/k - 1 • D(i,l) = [?]i + [?]

  17. Buffer Occupancy Sequence (continued) [Figure: same timeline as on the previous slide.] • [?] = N gives a contradiction.

  18. Observations • Each cell reaches the middle-stage switch with a variable input delay Di, between 1 and N. • If every cell is delayed at the input of the middle-stage switches by a further N - Di, then all cells reach the outputs of the middle stage in order.
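
The padding argument can be illustrated with a few lines of arithmetic: a cell that arrives at slot a and is delayed by Di, then held for N - Di more slots, is ready at a + N regardless of Di. The function name and the (arrival, delay) tuple layout below are assumptions made for this sketch.

```python
from typing import List, Tuple


def equalized_ready_times(cells: List[Tuple[int, int]], n_ports: int) -> List[int]:
    """Return the slot at which each cell is ready after padding its delay to N.

    `cells` holds (arrival_slot, input_delay) pairs with 1 <= input_delay <= N.
    """
    ready = []
    for arrival, delay in cells:
        if not 1 <= delay <= n_ports:
            raise ValueError("input delay must lie between 1 and N")
        reach = arrival + delay            # slot at which the cell reaches the middle stage
        ready.append(reach + (n_ports - delay))  # extra hold of N - Di slots
    return ready


if __name__ == "__main__":
    cells = [(0, 3), (1, 1), (2, 4)]
    print([a + d for a, d in cells])         # unpadded reach times
    print(equalized_ready_times(cells, 4))   # padded ready times
```

In this example the unpadded reach times are out of arrival order, while the padded times preserve it because every cell sees the same total delay N.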

  19. Symmetry Argument • Demultiplexors • Cells arrive at rate R • Each cell has a property: output • Cells to the same output are written in a round-robin manner • Cells leave at link rate R • The buffer is used to prevent temporary load on the same middle-stage switch • Max delay = N

  20. Symmetry Argument … • Multiplexors • Cells need to be read in at rate R • Each cell has a property: input • Cells from the same input are read in a round-robin manner • Cells leave at a rate k(R/k) = R • The buffer is used to re-order cells and send them in the correct order • Max delay = N

  21. Buffered PPS: Results • A PPS with a completely distributed algorithm, no speedup, and a buffer degree of N can precisely emulate a FIFO output-queued switch for all traffic patterns within a delay bound of 2N time slots.

  22. Conclusions • Implementation • Timestamps • Sequence numbers • Open questions • Making QoS practical • Making multicasting practical