
Packet-Mode Emulation of Output-Queued Switches

Packet-Mode Emulation of Output-Queued Switches. David Hay, CS, Technion. Joint work with Hagit Attiya (CS) and Isaac Keslassy (EE). Outline: Cell-Mode Scheduling vs. Packet-Mode Scheduling; Impossibility of an Exact Emulation; Speedup-RQD Tradeoff; Emulation with S ≈ 4; Emulation with S ≈ 2.


Presentation Transcript


  1. Packet-Mode Emulation of Output-Queued Switches David Hay, CS, Technion Joint work with Hagit Attiya (CS) and Isaac Keslassy (EE)

  2. Outline • Cell-Mode Scheduling vs. Packet-Mode Scheduling • Impossibility of an Exact Emulation • Speedup-RQD Tradeoff • Emulation with S ≈ 4 • Emulation with S ≈ 2 • Emulation of OQ switch w/ bounded buffer • Simulation Results

  3. CIOQ Switches

  4. Cell-Mode Scheduling

  5. Cell-Mode Scheduling

  6. Cell-Mode Scheduling

  7. Trend towards Packet-Mode • Cell-mode scheduling is getting too hard • Fragmentation and reassembly should work very fast, at the external rate • Extra header for each cell → loss of bandwidth • For optical switches such fragmentation and reassembly are prohibitive • Cell-mode schedulers are packet-oblivious • Degradation of the overall performance

  8. Packet-Mode Scheduling

  9. Packet-Mode Scheduling [Marsan et al., 2002][Ganjali et al., 2003][Turner, 2006] • No need for fragmentation and reassembly • Must ensure contiguous packet delivery over the fabric • While input i delivers a packet to output j, neither input i nor output j can handle other packets. • Can packet-mode schedulers provide performance guarantees similar to those of cell-mode schedulers?

  10. Output Queuing Emulation • OQ switches are considered optimal with respect to queuing delay and throughput • But too hard to implement in practice… • Emulation: Same input traffic → same output traffic • How hard is it for a cell-mode / packet-mode CIOQ switch to emulate an OQ switch?

  11. Output Queuing Emulation • OQ switches are considered optimal with respect to queuing delay and throughput • But too hard to implement in practice… • Emulation: Same input traffic → same output traffic • How hard is it for a cell-mode / packet-mode CIOQ switch to emulate an OQ switch?

  12. Cell-Mode Emulation is Possible • Easy with speedup S=N • N scheduling decisions every time-slot: • In the 1st decision forward the cell of input 1 • In the 2nd decision forward the cell of input 2 • In the Nth decision forward the cell of input N

  13. Cell-Mode Emulation is Possible • Easy with speedup S=N • N scheduling decisions every time-slot: • In the 1st decision forward the cell of input 1 • In the 2nd decision forward the cell of input 2 • In the Nth decision forward the cell of input N

  14. Cell-Mode Emulation w/ S=2 [Chuang et al.,1999] • 1st Key Concept: Slackness of a cell C (on the input side): L(C) = OC(C) - IT(C) • Output Cushion OC(C) (“good guys”): how many cells are queued in the output buffer of C’s destination and should leave the OQ switch before C • Input Thread IT(C) (“bad guys”): how many cells precede C in its input-port buffer • Slackness may decrease by at most 2 in every time-slot: • A cell leaves the destination of C → OC-- • A cell arrives at the input and is queued before C → IT++ • Initial slackness can be made non-negative: when C arrives, insert it in the OC(C)th place of its input buffer • Plan: Ensure that the scheduling decisions always increase slackness by 2 • Slackness is never negative • All cells are delivered on time
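
The slackness definition on slide 14 can be sketched in a few lines of Python. This is only an illustration: the function names, the list-based queues, and the `departure_time` map (the departure time of each cell in the shadow OQ switch) are assumptions made for the example, not part of the presentation.

```python
def output_cushion(cell, output_queue, departure_time):
    """OC(C): cells in the destination's output buffer that leave before C."""
    return sum(1 for c in output_queue
               if departure_time[c] < departure_time[cell])


def input_thread(cell, input_queue):
    """IT(C): cells queued ahead of C in its input-port buffer."""
    return input_queue.index(cell)


def slackness(cell, input_queue, output_queue, departure_time):
    """L(C) = OC(C) - IT(C)."""
    return (output_cushion(cell, output_queue, departure_time)
            - input_thread(cell, input_queue))


# Hypothetical state: cells a, b, c already sit in the output buffer,
# cell x waits behind cell y in its input buffer.
departure_time = {"a": 1, "b": 2, "c": 3, "x": 4, "y": 5}
output_queue = ["a", "b", "c"]
input_queue = ["y", "x"]

example_slackness = slackness("x", input_queue, output_queue, departure_time)
```

In this state OC(x) = 3 and IT(x) = 1, so the slackness of x is 2; as long as it stays non-negative, x can reach its output before its turn to depart.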

  15. Cell-Mode Emulation w/ S=2 [Chuang et al.,1999] • Stable Marriage (stable matching): Given two equal-size sets M, W and preference lists from every m ∈ M, w ∈ W, find a matching in which there are no two pairs (m,w), (m’,w’) s.t. • m prefers w’ over w • w’ prefers m over m’ • Classical problem in CS • A stable marriage always exists • Many algorithms exist
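
As one concrete instance of the "many algorithms" mentioned on slide 15, the sketch below uses the classical Gale-Shapley proposal algorithm. The slides do not say which algorithm is intended, so this choice, the function names, and the dictionary-based preference lists are assumptions for illustration.

```python
def stable_matching(men_prefs, women_prefs):
    """Gale-Shapley with men proposing; returns a dict man -> woman."""
    # rank[w][m] is the position of m in w's preference list (lower is better).
    rank = {w: {m: r for r, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}   # next woman each man proposes to
    engaged_to = {}                           # woman -> man
    free = list(men_prefs)
    while free:
        m = free.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = engaged_to.get(w)
        if current is None:
            engaged_to[w] = m
        elif rank[w][m] < rank[w][current]:
            engaged_to[w] = m                 # w trades up, current is free again
            free.append(current)
        else:
            free.append(m)                    # w rejects m
    return {m: w for w, m in engaged_to.items()}


def is_stable(matching, men_prefs, women_prefs):
    """True if no pair (m, w') prefers each other over their partners."""
    partner_of = {w: m for m, w in matching.items()}
    for m, w in matching.items():
        for better in men_prefs[m][:men_prefs[m].index(w)]:
            rival = partner_of[better]
            if women_prefs[better].index(m) < women_prefs[better].index(rival):
                return False
    return True


men_prefs = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women_prefs = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}
matching = stable_matching(men_prefs, women_prefs)
```

Both men prefer w1, but w1 prefers m2, so m1 ends up with w2 and no blocking pair remains.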

  16. Cell-Mode Emulation w/ S=2 [Chuang et al.,1999] • Critical Cell First (CCF) algorithm performs stable marriage at each decision: • M is the set of inputs, W is the set of outputs • i prefers o1 over o2 if there is a cell for o1 that is queued before all cells for o2 • o prefers i1 over i2 if there is a cell from i1 that should leave before all cells from i2
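
The two preference rules on slide 16 can be written out as a small sketch. The data layout (each input queue as an ordered list of `(cell, output)` pairs, plus a `departure_time` map taken from the shadow OQ switch) is an assumption made for this example; only the ordering rules come from the slide.

```python
def input_preferences(input_queue):
    """Outputs ordered by the position of their first cell in the input queue."""
    prefs = []
    for _cell, output in input_queue:
        if output not in prefs:
            prefs.append(output)
    return prefs


def output_preferences(output, input_queues, departure_time):
    """Inputs ordered by the earliest departure time among their cells for `output`."""
    earliest = {}
    for inp, queue in input_queues.items():
        times = [departure_time[c] for c, out in queue if out == output]
        if times:
            earliest[inp] = min(times)
    return sorted(earliest, key=earliest.get)


# Hypothetical 2x2 switch state.
input_queues = {
    1: [("a", "o1"), ("b", "o2")],
    2: [("c", "o2"), ("d", "o1")],
}
departure_time = {"a": 5, "b": 1, "c": 2, "d": 3}

prefs_of_input_1 = input_preferences(input_queues[1])
prefs_of_output_o1 = output_preferences("o1", input_queues, departure_time)
```

These lists are what a stable-marriage routine would consume at each scheduling decision, with inputs playing the role of M and outputs the role of W.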

  17. Cell-Mode Emulation w/ S=2 [Chuang et al.,1999] • For each cell C from input-port i to output-port j, and each scheduling decision, one of the following holds: • C is forwarded (and we don’t care about it) • C’ was forwarded from i, and i preferred to forward it → IT-- • C’ was forwarded to j, and j preferred to receive it → OC++ • Two scheduling decisions every time-slot → Slackness always increases by 2

  18. Cell-Mode Emulation • Easy with speedup S=N • Possible with speedup S=2 (w/ CCF) • Lower bound: S≥2-1/N is required [Chuang et al.,1999] What is the speedup required for packet-mode emulation?

  19. Outline • Cell-Mode Scheduling vs. Packet-Mode Scheduling • Impossibility of an Exact Emulation • Speedup-RQD Tradeoff • Emulation with S ≈ 4 • Emulation with S ≈ 2 • Emulation of OQ switch w/ bounded buffer • Simulation Results

  20. Packet-Mode Emulation is Impossible • Regardless of speedup • Even with speedup S=N

  21. Packet-Mode Emulation is Impossible

  22. Packet-Mode Emulation is Impossible

  23. Packet-Mode Emulation is Impossible

  24. Packet-Mode Emulation is Impossible

  25. Packet-Mode Emulation is Impossible

  26. Outline • Cell-Mode Scheduling vs. Packet-Mode Scheduling • Impossibility of an Exact Emulation • Speedup-RQD Tradeoff • Emulation with S ≈ 4 • Emulation with S ≈ 2 • Emulation of OQ switch w/ bounded buffer • Simulation Results

  27. Emulation w/ Relative Queuing Delay • The CIOQ switch is allowed a bounded lag behind the shadow OQ switch • Exact same behavior as the optimal OQ switch, but with some extra delay • Called relative queuing delay Can we provide packet-mode OQ emulation with bounded RQD and small speedup?

  28. Our Results: Speedup-RQD tradeoff [Plot: speedup (axis marks 2, 4, 2Lmax) vs. RQD] • Lmax = maximum packet size (known value) • Speedup 2Lmax: generalization of cell-mode scheduling with S=2, taking each packet of size ≤ Lmax as one huge cell • Lower bound on RQD (even with infinite speedup) • Lower bound on the speedup (from cell-mode scheduling)

  29. Intuition for Emulation Algorithms [Diagram labels: Packet Mode CIOQ; Cell Mode CIOQ w/ S=2; Packet Mode OQ]

  30. PIFO Cell-Mode OQ Switch • FIFO = First-In First-Out

  31. PIFO Cell-Mode OQ Switch • FIFO = First-In First-Out • PIFO = Push-In First-Out

  32. PIFO Cell-Mode OQ Switch • FIFO = First-In First-Out • PIFO = Push-In First-Out • A FIFO Packet-Mode OQ Switch is a PIFO Cell-Mode Switch
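
A minimal sketch of a push-in first-out queue may help show why a FIFO packet-mode output queue behaves like a PIFO cell-mode queue. The class and function names are hypothetical, and the placement rule (a cell goes right after the last queued cell of its own packet, or to the tail if none is queued) is an assumption that ignores packets already partly transmitted.

```python
class PIFOQueue:
    """Cells may be pushed in at any position but leave only from the head."""

    def __init__(self):
        self._cells = []

    def push_in(self, cell, position=None):
        # Default position is the tail, which gives plain FIFO behaviour.
        if position is None:
            position = len(self._cells)
        self._cells.insert(position, cell)

    def pop_first(self):
        return self._cells.pop(0)

    def snapshot(self):
        return list(self._cells)


def push_packet_cell(queue, packet_id, seq):
    """Push a cell right after the last queued cell of the same packet."""
    cells = queue.snapshot()
    position = len(cells)
    for idx, (pid, _seq) in enumerate(cells):
        if pid == packet_id:
            position = idx + 1
    queue.push_in((packet_id, seq), position)


# Cells of packets A and B arrive interleaved (from different inputs).
queue = PIFOQueue()
for packet_id, seq in [("A", 0), ("B", 0), ("A", 1), ("B", 1), ("A", 2)]:
    push_packet_cell(queue, packet_id, seq)
final_order = queue.snapshot()
```

Although the arrivals interleave, the queue ends up holding all cells of A followed by all cells of B, so each packet leaves contiguously while the relative order of cells already queued never changes.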

  33. Underlying CCF Algorithm • Cell-Mode CIOQ w/ CCF (and speedup S=2) emulates any PIFO cell-mode OQ switch [Chuang et al.,1999] • But, CCF does not maintain contiguous packet forwarding over the fabric! [Diagram labels: Packet Mode CIOQ; Cell Mode CIOQ w/ S=2; PIFO Cell-Mode OQ = Packet Mode OQ]

  34. Intuition for Emulation Algorithms • Two sub-steps: • Framing • Contiguous Decomposition [Diagram labels: Packet Mode CIOQ; Cell Mode CIOQ w/ S=2; Packet Mode OQ]

  35. Frame-Based Schedulers • Work in a pipelined, frame-based manner • Within each frame: • Build a demand matrix for this frame • Schedule the demand matrix of the previous frame [Figure: consecutive frames along a time axis]

  36. Building the Demand Matrix • At each frame of size T, CCF forwards at most 2T cells from each input and to each output. [Figure: demand matrix whose entry (1,1) is the number of cells CCF sent from input 1 to output 1 in the last frame; every row sum and every column sum is ≤ 2T] • Problem: A packet may span several frames.

  37. Building the Demand Matrix • Count only packets whose last cell is forwarded by the CCF in the frame • The sum of each row/column in the matrix is bounded by 2T+N(Lmax-1) • For each input-output pair only cells of one additional packet can be added. • Translates into RQD of 2T+(Lmax-2).
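
The counting rule on slide 37 can be sketched as follows. The log format is an assumption for illustration: each record describes one cell forwarded by the underlying CCF schedule as `(time_slot, input, output, packet_length, is_last_cell)`.

```python
def build_demand_matrix(ccf_log, n_ports, frame_start, frame_len):
    """Count whole packets whose last cell CCF forwards inside the frame.

    Entry [i][j] is the number of cells (summed packet lengths) that the
    packet-mode scheduler must deliver from input i to output j.
    """
    demand = [[0] * n_ports for _ in range(n_ports)]
    frame_end = frame_start + frame_len
    for slot, i, j, packet_len, is_last in ccf_log:
        if is_last and frame_start <= slot < frame_end:
            demand[i][j] += packet_len
    return demand


# Hypothetical CCF log for a 2x2 switch and frames of size T = 4.
ccf_log = [
    (0, 0, 1, 3, False),
    (1, 0, 1, 3, False),
    (2, 0, 1, 3, True),    # 3-cell packet completes in frame [0, 4)
    (3, 1, 0, 2, False),
    (4, 1, 0, 2, True),    # 2-cell packet spans two frames, counted in [4, 8)
]
first_frame = build_demand_matrix(ccf_log, n_ports=2, frame_start=0, frame_len=4)
second_frame = build_demand_matrix(ccf_log, n_ports=2, frame_start=4, frame_len=4)
```

The packet that straddles the frame boundary is charged entirely to the frame in which its last cell is forwarded, which is why a row or column can exceed 2T by cells of one extra packet per input-output pair.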

  38. Intuition for Emulation Algorithms • Two sub-steps: • Framing • Contiguous Decomposition [Diagram labels: Packet Mode CIOQ; Cell Mode CIOQ w/ S=2; Packet Mode OQ]

  39. Decomposing the Demand Matrix • Challenge: Decompose the matrix into permutations while maintaining contiguous packet delivery. • Each permutation dictates a scheduling decision. • First try: optimal Birkhoff von-Neumann decomposition results in 2T+N(Lmax-1) permutations.

  40. Contiguous Greedy Decomposition • To maintain contiguous packet delivery: • If (i,j) was matched in iteration t-1 and there are more (i,j) cells to schedule → keep (i,j) for iteration t. • Find a greedy matching for the rest of the matrix. • Speedup: S = 4 + N(Lmax-1)/T (as reported on slide 41) • RQD: 2T+Lmax-1
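
The two rules on slide 40 can be sketched as a short routine. This is a minimal illustration under stated assumptions: the demand matrix is a list of lists of cell counts, the greedy matching scans inputs and outputs in index order, and the function name is hypothetical.

```python
def contiguous_greedy_decomposition(demand):
    """Return a list of matchings (dict input -> output), one per iteration."""
    n = len(demand)
    remaining = [row[:] for row in demand]
    schedule = []
    previous = {}
    while any(any(row) for row in remaining):
        match = {}
        used_outputs = set()
        # Rule 1: keep every pair matched in the previous iteration
        # that still has cells left to schedule.
        for i, j in previous.items():
            if remaining[i][j] > 0:
                match[i] = j
                used_outputs.add(j)
        # Rule 2: greedily match the remaining inputs to free outputs.
        for i in range(n):
            if i in match:
                continue
            for j in range(n):
                if j not in used_outputs and remaining[i][j] > 0:
                    match[i] = j
                    used_outputs.add(j)
                    break
        for i, j in match.items():
            remaining[i][j] -= 1
        schedule.append(match)
        previous = match
    return schedule


demand = [[2, 1],
          [1, 0]]
schedule = contiguous_greedy_decomposition(demand)
```

For this matrix the pair (0,0) is held for two consecutive iterations until its two cells are delivered; only then are (0,1) and (1,0) served together.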

  41. Our Results: Speedup-RQD tradeoff [Plot: speedup (axis marks 2, 4, 2Lmax) vs. RQD] • S = 4 + N(Lmax-1)/T, RQD = 2T+Lmax-1 • Next…
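
To see how the frame size T trades speedup against delay, the two formulas on slide 41 can be evaluated directly. The function name and the sample values of N, Lmax, and T are assumptions chosen only for illustration.

```python
from fractions import Fraction


def greedy_frame_tradeoff(n_ports, l_max, frame_len):
    """Evaluate S = 4 + N(Lmax-1)/T and RQD = 2T + Lmax - 1."""
    speedup = 4 + Fraction(n_ports * (l_max - 1), frame_len)
    rqd = 2 * frame_len + l_max - 1
    return speedup, rqd


# Hypothetical switch: N = 4 ports, Lmax = 3 cells.
small_frame = greedy_frame_tradeoff(n_ports=4, l_max=3, frame_len=8)
large_frame = greedy_frame_tradeoff(n_ports=4, l_max=3, frame_len=80)
```

With T = 8 the speedup is 5 and the RQD is 18 time-slots; with T = 80 the speedup drops to 41/10 while the RQD grows to 162, so larger frames bring the speedup closer to 4 at the cost of a longer relative queuing delay.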

  42. Intuition for Emulation Algorithms • Two sub-steps: • Framing • Contiguous Decomposition [Diagram labels: Packet Mode CIOQ; Cell Mode CIOQ w/ S=2; Packet Mode OQ]

  43. Emulation w/ S2 - Framing • Keep a separate demand matrix for every possible packet size • Example: Possible packets sizes are 3,4,6 # of size 3 packets # of size 4 packets # of size 6 packets

  44. Emulation w/ S ≈ 2 - Framing • Concatenate packets of the same size into mega-packets of size k=LCM(1,…,Lmax) • Leftover matrix for each size m [Figure: matrices labeled Mega Packets (of size 12), size 3, size 4, size 6]

  45. Emulation w/ S ≈ 2 - Framing • Concatenate packets of the same size into mega-packets of size k=LCM(1,…,Lmax) • Leftover matrix for each size m [Figure: matrices labeled Mega Packets (of size k=12), size 3, size 4, size 6]

  46. Emulation w/ S ≈ 2 - Framing • Concatenate packets of the same size into mega-packets of size k=LCM(1,…,Lmax) • Leftover matrix for each size m [Figure: matrices labeled Mega Packets (of size 12), size 3 (leftovers), size 4, size 6]

  47. Emulation w/ S ≈ 2 - Framing • Concatenate packets of the same size into mega-packets of size k=LCM(1,…,Lmax) • Leftover matrix for each size m [Figure: matrices labeled Mega Packets (of size 12), size 3 (leftovers), size 4 (leftovers), size 6]

  48. Emulation w/ S ≈ 2 - Framing • Concatenate packets of the same size into mega-packets of size k=LCM(1,…,Lmax) • Leftover matrix for each size m [Figure: matrices labeled Mega Packets (of size 12), size 3 (leftovers), size 4 (leftovers), size 6 (leftovers)]
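
The framing step on slides 43 to 48 can be sketched as integer division per packet size. The names below are hypothetical, and the example follows the slides' illustration with packet sizes 3, 4, 6 and k = 12 (the LCM of those sizes); the per-size matrices hold packet counts, not cell counts.

```python
from functools import reduce
from math import gcd


def lcm(values):
    """Least common multiple of an iterable of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), values)


def split_into_mega_packets(counts_by_size, k):
    """Split per-size packet counts into mega-packets of k cells plus leftovers.

    counts_by_size maps a packet size m to an N x N matrix of packet counts.
    A mega-packet of size k is built from k // m packets of size m.
    """
    n = len(next(iter(counts_by_size.values())))
    mega = [[0] * n for _ in range(n)]
    leftovers = {}
    for m, counts in counts_by_size.items():
        per_mega = k // m
        leftovers[m] = [[c % per_mega for c in row] for row in counts]
        for i in range(n):
            for j in range(n):
                mega[i][j] += counts[i][j] // per_mega
    return mega, leftovers


# Hypothetical per-size demand for a 2x2 switch.
counts_by_size = {
    3: [[9, 1], [0, 4]],
    4: [[3, 0], [2, 7]],
    6: [[1, 2], [5, 0]],
}
k = lcm(counts_by_size)          # 12 for sizes 3, 4, 6
mega, leftovers = split_into_mega_packets(counts_by_size, k)
```

Every leftover entry for size m is strictly smaller than k/m, which matches the per-entry bounds annotated on the leftover matrices in the slides.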

  49. Emulation w/ S ≈ 2 - Framing • Sum of each row/column is bounded • For mega packets matrix: ≤ (2T+N(Lmax-1))/k • For each leftover matrix of size m: ≤ N(k -1)/m [Figure: matrices labeled Mega Packets (of size 12), size 3 (leftovers) with entries < 12/3, size 4 (leftovers) with entries < 12/4, size 6 (leftovers) with entries < 12/6]

  50. Emulation w/ S2 - Decomposition • Optimally decompose (w/ Birkhoff von-Neumann) the mega-packets matrix and then the leftover matrices Hold each permutation k times for contiguous (mega)-packet delivery Bound on the mega-packets matrix
