Presentation Transcript

  1. Randomization and Heavy Traffic Theory: New approaches to the design and analysis of switch algorithms (Devavrat Shah, Stanford University)

  2. Network algorithms • Algorithms implemented in networks, e.g. in • switches/routers: scheduling algorithms, routing lookup, packet classification • memory/buffer managers: maintaining statistics, active queue management, bandwidth partitioning • load balancers: randomized load balancing • web caches: eviction schemes, placement of caches in a network

  3. Network algorithms: challenges • Time constraint: need to make complicated decisions very quickly • e.g. line speeds in the Internet core are 10 Gbps (soon to be 40 Gbps) • packets arrive roughly every 50 ns (roughly every 10 ns at 40 Gbps) • Limited computational resources • due to rigid space and heat dissipation constraints • Algorithms need to be very simple so as to be implementable • but simple algorithms may perform poorly, if not well-designed

  4. An illustration • Consider the following simple system: several queues sharing a single server of capacity 1 (a small simulation sketch follows this slide) • Algorithms • serve a randomly chosen queue: very easy • longest queue first (LQF): complicated to implement • Performance • serve at random: does not give maximum throughput • LQF: max throughput and minimizes the maximum backlog
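A minimal simulation sketch of this illustration (my own code, not from the talk): N queues share one server of capacity 1, and "serve a random queue" is compared with LQF under an asymmetric but admissible load. The arrival rates and names below are illustrative.

```python
import random

def simulate(policy, rates, steps=200_000, seed=0):
    """Average total backlog: Bernoulli arrivals, one service per slot (capacity = 1)."""
    rng = random.Random(seed)
    n = len(rates)
    q = [0] * n
    total = 0
    for _ in range(steps):
        for i, p in enumerate(rates):
            if rng.random() < p:
                q[i] += 1
        if policy == "random":
            i = rng.randrange(n)                  # may waste the slot on an empty queue
        else:                                     # "lqf": serve the longest queue
            i = max(range(n), key=lambda k: q[k])
        if q[i] > 0:
            q[i] -= 1
        total += sum(q)
    return total / steps

# Asymmetric but admissible load (total 0.94 < 1): random service starves queue 0,
# while LQF keeps every queue bounded.
rates = [0.90, 0.02, 0.02]
for policy in ("random", "lqf"):
    print(policy, simulate(policy, rates))
```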

  5. Input queued switch • Multiple queues and servers

  6. Input queued switch [Figure: 3×3 crossbar fabric with inputs and outputs labeled 1-3; one connection pattern is marked "Not allowed"] • Crossbar constraints • each input can connect to at most one output • each output can connect to at most one input

  7. Switch scheduling [Figure: 3×3 crossbar fabric with a candidate connection pattern] • Crossbar constraints • each input can connect to at most one output • each output can connect to at most one input

  10. Notation and definitions [Figure: an N×N input-queued switch (shown for N = 3); Q_ij(t) denotes the backlog of the virtual output queue (VOQ) at input i holding packets for output j, and λ_ij its arrival rate]

  11. Useful performance metrics • Throughput • an algorithm is stable (or delivers 100% throughput) if, for any admissible arrival process, the average backlog remains bounded (one standard way to write this is sketched below) • Average delay or average backlog (queue size)
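The exact formula on this slide did not survive extraction; the following is a standard way to write the two conditions, assuming the notation λ_ij and Q_ij(t) introduced above.

```latex
% Admissible arrivals: no input i and no output j is oversubscribed
\sum_{j} \lambda_{ij} < 1 \;\;\forall i,
\qquad
\sum_{i} \lambda_{ij} < 1 \;\;\forall j .

% Stability (100% throughput): time-averaged expected backlog stays bounded
\limsup_{T \to \infty} \; \frac{1}{T} \sum_{t=1}^{T}
  \mathbb{E}\Big[ \sum_{i,j} Q_{ij}(t) \Big] \;<\; \infty .
```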

  12. Schedule ≡ Bipartite graph matching [Figure: VOQ backlogs (19, 3, 4, 21, 1, 18, 7) drawn as edge weights of a bipartite graph between inputs and outputs; a schedule is a matching in this graph]

  13. Scheduling algorithms [Figure: the same weighted bipartite graph] • iSLIP (Nick McKeown): used in Cisco's GSR 12000 series • Wave Front Arbiter (Tamir and Chi) • Parallel Iterative Matching (Anderson, Owicki, Saxe, Thacker) • These are practical maximal matching algorithms: not stable

  14. Scheduling algorithms [Figure: a max weight matching and a max size matching on the weighted graph from slide 12] • Max Wt Matching: stable (Tassiulas-Ephremides 92, McKeown et al. 96, Dai-Prabhakar 00) • Max Size Matching: not stable (McKeown-Ananthram-Walrand 96) • Practical maximal matchings: not stable

  15. The Maximum Weight Matching Algorithm • MWM: performance • throughput: stable (Tassiulas-Ephremides 92; McKeown et al 96; Dai-Prabhakar 00) • backlogs: very low on average (Leonardi et al 01; Shah-Kopikare 02) • MWM: implementation • has cubic worst-case complexity (roughly 30³ ≈ 27,000 iterations for a 30-port switch) • MWM algorithms involve backtracking: i.e. edges laid down in one iteration may be removed in a subsequent iteration • algorithm not amenable to pipelining
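For concreteness, a sketch of computing a max weight matching with an off-the-shelf assignment solver (recent versions of SciPy). This only illustrates the cubic-cost computation; it is not the hardware-oriented algorithm discussed in the talk, and the backlog matrix below is made up.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Q[i][j] = backlog of the VOQ at input i destined to output j (illustrative values).
Q = np.array([[19,  3,  4],
              [21,  0,  1],
              [18,  7,  0]])

# Hungarian-style solver: cubic worst-case complexity in the number of ports.
rows, cols = linear_sum_assignment(Q, maximize=True)
print(list(zip(rows, cols)))   # matched (input, output) pairs
print(Q[rows, cols].sum())     # weight of the maximum weight matching
```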

  16. Switch algorithms [Figure: the three algorithm families placed on axes of "better performance" vs. "easier to implement"] • Max Wt Matching: stable and low backlogs • Maximal matching: not stable • Max Size Matching: not stable • the better-performing algorithms are harder to implement

  17. Summarizing… • Need algorithms that are simple and perform well • can process packets economically at line speeds • deliver a high throughput • and low latencies/backlogs • Need to develop methods for analyzing the performance of these algorithms • traditional methods aren’t good enough

  18. Rest of the talk • A simple, high performance scheduling algorithm • Randomized approximation of Max Wt Matching (joint work with P. Giaccone and B. Prabhakar) • A method for analyzing backlogs based on heavy traffic theory (joint work with D. Wischik) • A new approach to switch scheduling • Randomized edge coloring (joint work with G. Aggarwal, R. Motwani and A. Zhu)

  19. Algorithm I: Randomized approximation to the Max Wt Matching algorithm

  20. Randomized approximation to MWM • Consider the following randomized approximation: • At every time • - sample d matchings independently and uniformly • - use the heaviest of these d matchings to schedule packets • Ideally we would like to use a small value of d. However,… Theorem. Even with d = N, this algorithm is not stable. In fact, when d = N, the throughput ≤ 1 − 1/e ≈ 0.63. (Giaccone-Prabhakar-Shah 02)
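A minimal sketch of the sampling scheme just described, with a matching represented as a permutation (sched[i] is the output matched to input i); the function names are illustrative. As the theorem states, even d = N independent samples are not enough for stability.

```python
import random

def weight(Q, sched):
    """Total backlog served by a matching, where sched[i] is the output of input i."""
    return sum(Q[i][sched[i]] for i in range(len(sched)))

def heaviest_of_d_samples(Q, d, rng=random):
    """Sample d matchings uniformly at random and return the heaviest one."""
    n = len(Q)
    best = None
    for _ in range(d):
        perm = list(range(n))
        rng.shuffle(perm)              # a uniformly random matching
        if best is None or weight(Q, perm) > weight(Q, best):
            best = perm
    return best
```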

  21. Tassiulas’ algorithm [Diagram: at each time t, a random matching R(t) is compared (MAX) with the previous schedule S(t-1); the heavier of the two becomes the current schedule S(t), which is remembered for the next time]

  22. Tassiulas’ algorithm [Figure: worked example with W(R(t)) = 150 and W(S(t-1)) = 160; the MAX step keeps S(t) = S(t-1)]
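A sketch of Tassiulas' scheme in the same permutation representation (reusing `weight` from the sketch above): draw one fresh random matching per slot and keep whichever of it and the remembered schedule is heavier. In the example on this slide, W(R(t)) = 150 < W(S(t-1)) = 160, so the old schedule is kept.

```python
import random

def tassiulas_step(Q, prev_sched, rng=random):
    """One slot of Tassiulas' scheme: S(t) is the heavier of R(t) and S(t-1)."""
    n = len(Q)
    r = list(range(n))
    rng.shuffle(r)                     # random matching R(t)
    return r if weight(Q, r) > weight(Q, prev_sched) else prev_sched
```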

  23. Performance of Tassiulas’ algorithm Theorem (Tassiulas 98): The above scheme is stable under any admissible Bernoulli IID inputs.

  24. Backlogs under Tassiulas’ algorithm

  25. Reducing backlogs: the Merge operation [Figure: S(t-1) (weight 160) and R(t) (weight 150) are overlaid; their union splits into cycles, and on each cycle the two matchings' contributions are compared, e.g. 30 vs. 120 on one cycle and 130 vs. 30 on another]

  26. Reducing backlogs: the Merge operation [Figure: result of the Merge: keeping the heavier side of each cycle gives W(S(t)) = 250, better than either W(S(t-1)) = 160 or W(R(t)) = 150 alone]
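A sketch of the Merge operation in the same permutation representation: overlay the two matchings, walk each cycle of their union, and keep on that cycle whichever matching contributes more weight, so the merged schedule is at least as heavy as either input (here 250 vs. 160 and 150).

```python
def merge(Q, s, r):
    """Merge two matchings: on each cycle of their union keep the heavier side."""
    n = len(Q)
    r_inv = [0] * n
    for i, j in enumerate(r):
        r_inv[j] = i                        # input matched to output j under r
    merged = [None] * n
    visited = [False] * n
    for start in range(n):
        if visited[start]:
            continue
        cycle, i = [], start
        while not visited[i]:               # walk the alternating s/r cycle through `start`
            visited[i] = True
            cycle.append(i)
            i = r_inv[s[i]]                 # follow the s-edge to output s[i], then the r-edge back
        w_s = sum(Q[i][s[i]] for i in cycle)
        w_r = sum(Q[i][r[i]] for i in cycle)
        winner = s if w_s >= w_r else r
        for i in cycle:
            merged[i] = winner[i]
    return merged
```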

  27. Performance of Merge algorithm Theorem (GPS): The Merge scheme is stable under any admissible Bernoulli IID inputs.

  28. Merge v/s Max

  29. Use arrival information: Serena [Figure: the previous schedule S(t-1), with W(S(t-1)) = 209, shown next to this slot's arrival graph (the edges on which packets just arrived)]

  31. Use arrival information: Serena [Figure: the arrival graph is turned into a matching of weight W = 121 and Merged with S(t-1) (W(S(t-1)) = 209); the result is the new schedule S(t) with W(S(t)) = 243]
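A hedged sketch of the Serena idea as it appears on these slides: turn the slot's arrival graph into a matching and then Merge it with the previous schedule. The reduction used here (keep, per output, the arriving edge with the largest backlog, then complete arbitrarily) and the tie-breaking are my guesses at the details, not a faithful transcription of the published algorithm.

```python
def serena_step(Q, prev_sched, arrivals):
    """arrivals: list of (input, output) pairs for this slot, at most one per input."""
    n = len(Q)
    # Resolve the arrival graph into a partial matching: per output, keep the
    # arriving edge whose VOQ backlog is largest.
    best = {}
    for i, j in arrivals:
        if j not in best or Q[i][j] > Q[best[j]][j]:
            best[j] = i
    new_sched = [None] * n
    for j, i in best.items():
        new_sched[i] = j
    # Complete to a full matching by pairing leftover inputs and outputs arbitrarily.
    free_outputs = [j for j in range(n) if j not in best]
    for i in range(n):
        if new_sched[i] is None:
            new_sched[i] = free_outputs.pop()
    # Merge with the previous schedule (see the `merge` sketch above).
    return merge(Q, prev_sched, new_sched)
```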

  32. Performance of Serena algorithm Theorem (GPS): The Serena algorithm is stable under any admissible Bernoulli IID inputs.

  33. Backlogs under Serena

  34. Summary of main ideas and results • Randomization alone is not enough • Memory (using past sample) is very powerful • Exploiting problem structure: the Merge operation • Using arrival information • We obtained an algorithm • whose performance is close to that of MWM • and is very simple to implement

  35. A new method for analyzing backlogs via Heavy Traffic Theory

  36. Some context… • We have seen the design of a simple, high throughput switch scheduling algorithm • The backlog/delay induced by an algorithm is another very important measure of its performance • For example, recall Tassiulas’ scheme

  37. Delay or backlog analysis • Delay analysis is harder than throughput analysis because • throughput measures the long-term efficiency of an algorithm • whereas, the delay (or backlog) induced by an algorithm is a measure of its instantaneous efficiency • It is also harder to design low delay schedulers

  38. Current delay analysis methods are not suitable • Queuing theory • requires knowledge of service time distribution • but this depends on the scheduling algorithm • service distribution is too complicated to determine • Large deviations: very successful for output-queued switch • N-port OQ switch can be “decomposed” into N independent 1-port switches • but an N-port input-queued switch is not decomposable • Stochastic coupling arguments • effective for establishing: Delay(Alg A) < Delay(Alg B) (without saying how much better) • very hard to make for input-queued switches

  39. Our approach • Use the machinery of heavy traffic theory (developed by J. M. Harrison, M. Bramson, R. Williams and many others) • Focus on the following intriguing observation • consider the MWM-a algorithm: edge (i,j) has weight Q_ij^a • Keslassy-McKeown 01 have observed that the average delay increases with a

  40. Why this is worth focusing on • The MWM class of algorithms is the most analyzed • huge body of literature on throughput analysis • hardly anything known about their delay properties • in particular, no explanation of the Keslassy-McKeown observation • Note the singularity at a=0 in the K-M observation • “continuity” implies that delay should be smallest when a=0 • however, MWM-0 is the Maximum Size Matching algorithm • MSM is known to be unstable; i.e. delays are infinite when a=0 ! (McKeown-Ananthram-Walrand 96) • Question: if MSM is not the optimum, which algorithm is? • Heavy traffic theory helps answer these questions

  41. Heavy traffic theory • Consider a network with N queues • as some parameter r → ∞ • let the load on the system approach its capacity (hence the name HT) • speed up time by a factor r², and scale down space by a factor r; i.e. the quantity of interest is Q_i(r²t)/r = Q_i^r(t), say • as r → ∞, we’re looking at the system zoomed out in both time and space • this type of scaling occurs in the Central Limit Theorem [Figure: the original system vs. the scaled system, with time sped up by r² and space shrunk by 1/r]

  42. Heavy traffic theory (continued) • Main result: as r → ∞ • (Q_1^r(t),…,Q_N^r(t)) → (Q_1^∞(t),…,Q_N^∞(t)) = Q^∞(t), which is a multidimensional Reflected Brownian Motion (RBM) • such limiting results are independent of the details of the arrival distribution
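Restating the scaling and the limit from this slide in standard notation (the hats and the double arrow for weak convergence are my notation, not the slide's):

```latex
% Heavy traffic (diffusion) scaling: time sped up by r^2, space shrunk by r
\hat{Q}^{\,r}_i(t) \;=\; \frac{Q_i(r^2 t)}{r}, \qquad i = 1, \dots, N .

% Main result: as the load approaches capacity (r -> infinity),
\big(\hat{Q}^{\,r}_1(t), \dots, \hat{Q}^{\,r}_N(t)\big)
  \;\Longrightarrow\;
\big(\hat{Q}^{\,\infty}_1(t), \dots, \hat{Q}^{\,\infty}_N(t)\big),
% and the limit process is a multidimensional reflected Brownian motion (RBM).
```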

  43. State space collapse: dimension reduction • Q^∞(t) can roam in a lower dimensional space, say of dimension K < N

  44. State space collapse: dimension reduction • Q^∞(t) can roam in a lower dimensional space, say of dimension K < N • The HT limit procedure for the LQF system • load: λ^r = λ_1^r + … + λ_N^r = 1 − 1/r → 1 • state: Q^r(t) = (Q_1^r(t), …, Q_N^r(t)) → Q^∞(t) = (Q_1^∞(t), …, Q_N^∞(t)) • State space collapse • because of the LQF policy, Q_1^∞(t) = … = Q_N^∞(t) • so the dimension of the state space is just 1, not N [Figure: N queues with arrival rates λ_1^r, …, λ_N^r sharing a server of capacity 1]

  45. State space collapse: performance • More importantly, the state space induced by an IQ switch scheduling algorithm gives an indication of its performance; i.e. State space(Alg A) ⊂ State space(Alg B) ⇒ Alg B is better than Alg A • So, to resolve the Keslassy-McKeown observation, we need to • determine the state space of MWM-a, and • show • State space(MWM-a1) ⊂ State space(MWM-a2) if a1 > a2

  46. Switch: state space collapse • The dimension of the state space is at most N² • what is the dimension after state space collapse under MWM? • Given that MWM tries to equalize the weights of all matchings, • can we determine the minimum number of queue sizes we need to know in order to work out the sizes of all N² queues? • Lemma (Shah-Wischik): Knowing the sizes of any 2N−2 queues is not sufficient, but there exist 2N−1 queues whose sizes suffice to determine all the rest. • Consider an example of a 3-port switch [Figure, built up over the following animation frames: five of the nine queue sizes (19, 17, 3, 4, 6) are given along with MWM = 28; the equal-weight conditions then pin down the remaining queue sizes (7, 5 and 18 in this example) one at a time]
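A hypothetical 2-port illustration of the equal-weight idea behind the lemma (not the 3-port example from the slides): if MWM keeps both matchings of a 2×2 switch at the same weight, then

```latex
Q_{11} + Q_{22} \;=\; Q_{12} + Q_{21},
```

so knowing any 2N − 1 = 3 of the four queue sizes determines the fourth.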
