1 / 74

740 likes | 832 Views

Randomization and Heavy Traffic Theory: New approaches to the design and analysis of switch algorithms. Devavrat Shah Stanford University. Network algorithms. Algorithms implemented in networks, e.g. in switches/routers scheduling algorithms routing lookup

Download Presentation
## Devavrat Shah Stanford University

**An Image/Link below is provided (as is) to download presentation**
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.
Content is provided to you AS IS for your information and personal use only.
Download presentation by click this link.
While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

**Randomization and Heavy Traffic Theory:**New approaches to the design and analysis of switch algorithms Devavrat Shah Stanford University**Network algorithms**• Algorithms implemented in networks, e.g. in • switches/routers scheduling algorithms routing lookup packet classification • memory/buffer managers maintaining statistics active queue management bandwidth partitioning • load balancers randomized load balancing • web caches eviction schemes placement of caches in a network**Network algorithms: challenges**• Time constraint: Need to make complicated decisions very quickly • e.g. line speeds in the Internet core 10Gbps (soon to be 40Gbps) • packets arrive roughly every 50ns (roughly 10ns) • Limited computational resources • due to rigid space and heat dissipation constraints • Algorithms need to be very simple so as to be implementable • but simple algorithms may perform poorly, if not well-designed**An illustration**• Consider the following simple system Capacity =1 • Algorithms • serve a randomly chosen queue: very easy • longest queue first (LQF): complicated to implement • Performance • serve at random: does not give maximum throughput • LQF: max throughput and minimizes the maximum backlog**Input queued switch**• Multiple queues and servers**Input queued switch**Not allowed Crossbar fabric 1 2 3 1 2 3 • Crossbar constraints • each input can connect to at most one output • each output can connect to at most one input**Switch scheduling**Crossbar fabric 1 2 3 1 2 3 • Crossbar constraints • each input can connect to at most one output • each output can connect to at most one input**Switch scheduling**Crossbar fabric 1 2 3 1 2 3 • Crossbar constraints • each input can connect to at most one output • each output can connect to at most one input**Switch scheduling**Crossbar fabric 1 2 3 1 2 3 • Crossbar constraints • each input can connect to at most one output • each output can connect to at most one input**Notation and definitions**1 2 3 1 2 3**Useful performance metric**• Throughput • an algorithm is stable (or delivers 100% throughput)if for any admissible arrival, the average backlog is bounded; i.e. • Average delay or average backlog (queue-size)**Schedule ´ Bipartite graph matching**Schedule or Matching 19 3 4 21 1 18 7**Scheduling algorithms**19 3 4 21 1 18 7 • iSLIP: Cisco’s GSR 12000 Series • Nick McKeown • Wave Front Arbiter • Tamir and Chi • Parallel Iterative Matching • Anderson, Owicki, Saxe, Thacker Practical Maximal Matchings Not stable**Scheduling algorithms**19 19 18 1 7 Max Wt Matching Max Size Matching 19 3 4 21 1 18 7 Practical Maximal Matchings Stable (Tassiulas-Ephremides 92, McKeown et. al. 96, Dai-Prabhakar 00) Not stable Not stable (McKeown-Ananthram-Walrand 96)**The Maximum Weight Matching Algorithm**• MWM: performance • throughput: stable(Tassiulas-Ephremides 92; McKeown et al 96; Dai-Prabhakar 00) • backlogs: very low on average(Leonardi et al 01; Shah-Kopikare 02) • MWM: implementation • has cubic worst-case complexity (approx. 27,000 iterations for a 30-port switch) • MWM algorithms involve backtracking: i.e. edges laid down in one iteration may be removed in a subsequent iteration • algorithm not amenable to pipelining**Switch algorithms**19 19 18 1 7 Better performance Max Wt Matching Maximal matching Max Size Matching Easier to implement Not stable Not stable Stable and low backlogs**Summarizing…**• Need algorithms that are simple and perform well • can process packets economically at line speeds • deliver a high throughput • and low latencies/backlogs • Need to develop methods for analyzing the performance of these algorithms • traditional methods aren’t good enough**Rest of the talk**• A simple, high performance scheduling algorithm • Randomized approximation of Max Wt Matching (joint work with P. Giaccone and B. Prabhakar) • A method for analyzing backlogs based on heavy traffic theory (joint work with D. Wischik) • A new approach to switch scheduling • Randomized edge coloring (joint work with G. Aggrawal, R. Motwani and A. Zhu)**Algorithm IRandomized approximationto the Max Wt Matching**algorithm**Randomized approximation to MWM**• Consider the following randomized approximation: • At every time • - sample d matchings independently and uniformly • - use the heaviest of these d matchings to schedule packets • Ideally we would like to use a small value of d. However,… Theorem. Even with d = N, this algorithm is not stable. In fact, when d=N, the throughput · 1 – e-1¼ 0.63. (Giaccone-Prabhakar-Shah 02)**Tassiulas’ algorithm**Previous schedule S(t-1) Random Matching R(t) Next time MAX Current schedule S(t)**Tassiulas’ algorithm**MAX 10 50 10 40 30 10 70 10 60 20 S(t-1) R(t) W(R(t))=150 W(S(t-1))=160 S(t)**Performance of Tassiulas’ algorithm**Theorem (Tassiulas 98): The above scheme is stable under any admissible Bernoulli IID inputs.**Reducing backlogs: the Merge operation**Merge 30v/s120 130 v/s30 10 50 40 10 30 10 70 10 60 20 S(t-1) R(t) W(R(t))=150 W(S(t-1))=160**Reducing backlogs: the Merge operation**Merge 10 50 40 10 30 10 70 10 60 20 S(t-1) R(t) W(R(t))=150 W(S(t-1))=160 W(S(t)) = 250**Performance of Merge algorithm**Theorem (GPS): The Merge scheme is stable under any admissible Bernoulli IID inputs.**Use arrival information: Serena**23 7 47 89 3 11 31 2 97 5 S(t-1) The arrival graph W(S(t-1))=209**Use arrival information: Serena**23 47 89 3 11 31 2 97 5 S(t-1) The arrival graph W(S(t-1))=209**Use arrival information: Serena**23 Merge 0 23 89 3 31 97 S(t) W(S(t))=243 23 47 89 3 11 31 97 6 S(t-1) W=121 W(S(t-1))=209**Performance of Serena algorithm**Theorem (GPS): The Serena algorithm is stable under any admissible Bernoulli IID inputs.**Summary of main ideas and results**• Randomization alone is not enough • Memory (using past sample) is very powerful • Exploiting problem structure: the Merge operation • Using arrival information • We obtained an algorithm • whose performance is close to that of MWM • and is very simple to implement**Some context…**• We have seen the design of a simple, high throughput switch scheduling algorithm • The backlog/delay induced by an algorithm is another very important measure of its performance • For example, recall Tassiulas’ scheme**Delay or backlog analysis**• Delay analysis is harder than throughput analysis because • throughput measures the long-term efficiency of an algorithm • whereas, the delay (or backlog) induced by an algorithm is a measure of its instantaneous efficiency • It is also harder to design low delay schedulers**Current delay analysis methods are not suitable**• Queuing theory • requires knowledge of service time distribution • but this depends on the scheduling algorithm • service distribution is too complicated to determine • Large deviations: very successful for output-queued switch • N-port OQ switch can be “decomposed” into N independent 1-port switches • but an N-port input-queued switch is not decomposable • Stochastic coupling arguments • effective for establishing: Delay(Alg A) < Delay(Alg B) (without saying how much better) • very hard to make for input-queued switches**Our approach**a • Use the machinery of heavy traffic theory (developed by J. M. Harrison, M. Bramson, R. Williams and many others) • Focus on the following intriguing observation • consider the MWM-a algorithm: edge(i,j) has weight Qija • Keslassy-McKeown 01 have observed that the average delay increases with a**Why this is worth focusing on**• The MWM class of algorithms are the most analyzed • huge body of literature on throughput analysis • hardly anything known about their delay properties • in particular, no explanation of the Keslassy-McKeown observation • Note the singularity at a=0 in the K-M observation • “continuity” implies that delay should be the smallest when a=0 • however, MWM-0 is the Maximum Size Matching algorithm • MSM is known to be unstable; i.e. delays are infinite when a=0 ! (McKeown-Ananthram-Walrand 96) • Question: if MSM is not the optimum, who is? • Heavy traffic theory helps answer these questions**Heavy traffic theory**• Consider a network with N queues • as some parameter r ! 1 • let the load on the system approach its capacity (hence the name HT) • speed up time by a factor r2, and scale down space by a factor r; i.e. quantity of interest: Qi(r2t)/r = Qir(t), say • as r !1, we’re looking at the zoomed out system both in time and space • this type of scaling occurs in the Central Limit Theorem time Original system 1/r r2 Scaled system**Heavy traffic theory**• Consider a network with N queues • as some parameter r ! 1 • let the load on the system approach its capacity (hence the name HT) • speed up time by a factor r2, and scale down space by a factor r; i.e. quantity of interest: Qi(r2t)/r = Qir(t), say • as r !1, we’re looking at the zoomed out system both in time and space • this type of scaling occurs in the Central Limit Theorem • Main result: As r !1 • (Q1r(t),…,QNr(t)) ! (Q11(t),…,QN1(t)) = Q1(t), which is a multidimensional Reflected Brownian Motion (RBM) (such limiting results are independent of details of distribution) w**State space collapse: dimension reduction**• Q1(t) can roam in lower dimensional space, say of dim K < N**State space collapse: dimension reduction**• Q1(t) can roam in lower dimensional space, say of dim K < N • The HT limit procedure for LQF system • load: lr=l1r+L+lNr = 1-r-1 ! 1 • state: Qr(t)=(Q1r(t),L,QNr(t)) ! Q1(t) = (Q11(t),L, QN1(t)) • State space collapse • because of the LQF policy, Q11(t) = L = QN1(t) • or, the dimension of the state space is just 1, not N Q1r(t) l1r lNr Capacity =1 QNr(t)**State space collapse: performance**• More importantly, the state space induced by an IQ switch scheduling algorithm gives an indication of its performance; i.e. State space(Alg A) ½ State space(Alg B) ) Alg B is better than Alg A • So , to resolve the Keslassy-McKeown observation, we need to • determine the state space of MWM-a, and • show • State space(MWM-a1) ½ State space(MWM-a2) if a1 > a2**Switch: state space collapse**17 4 ? • The dimension of the state space is N2, at most • what is the dimension due to state space collapse under MWM? • Given that MWM tries to equalize the weights of all matchings, • can we determine the minimum number of queue-sizes we need to know in order to work out the size of all the N2 queues • Lemma (Shah-Wischik): Knowing the sizes of any 2N-2 queues is not sufficient. But there exist 2N-1 queues whose sizes are sufficient to know. • Consider an example of 3-port switch 19 17 3 4 MWM=28 6**Switch: state space collapse**19 17 3 4 MWM=28 6 19 17 4 7 ? • The dimension of the state space is N2, at most • what is the dimension due to state space collapse under MWM? • Given that MWM tries to equalize the weights of all matchings, • can we determine the minimum number of queue-sizes we need to know in order to work out the size of all the N2 queues • Lemma (Shah-Wischik): Knowing the sizes of any 2N-2 queues is not sufficient. But there exist 2N-1 queues whose sizes are sufficient to know. • Consider an example of 3-port switch 7**Switch: state space collapse**19 17 3 4 MWM=28 6 4 • The dimension of the state space is N2, at most • what is the dimension due to state space collapse under MWM? • Given that MWM tries to equalize the weights of all matchings, • can we determine the minimum number of queue-sizes we need to know in order to work out the size of all the N2 queues • Lemma (Shah-Wischik): Knowing the sizes of any 2N-2 queues is not sufficient. But there exist 2N-1 queues whose sizes are sufficient to know. • Consider an example of 3-port switch 7 5**Switch: state space collapse**19 17 3 4 MWM=28 6 ? • The dimension of the state space is N2, at most • what is the dimension due to state space collapse under MWM? • Given that MWM tries to equalize the weights of all matchings, • can we determine the minimum number of queue-sizes we need to know in order to work out the size of all the N2 queues • Lemma (Shah-Wischik): Knowing the sizes of any 2N-2 queues is not sufficient. But there exist 2N-1 queues whose sizes are sufficient to know. • Consider an example of 3-port switch 7 5**Switch: state space collapse**19 17 3 4 MWM=28 6 18 • The dimension of the state space is N2, at most • what is the dimension due to state space collapse under MWM? • Given that MWM tries to equalize the weights of all matchings, • can we determine the minimum number of queue-sizes we need to know in order to work out the size of all the N2 queues • Lemma (Shah-Wischik): Knowing the sizes of any 2N-2 queues is not sufficient. But there exist 2N-1 queues whose sizes are sufficient to know. • Consider an example of 3-port switch 7 5

More Related