
### Handout # 7: Input-queued Switches – Head of Line Blocking, Scheduling

CSC 2203 – Packet Switch and Network Architectures

Professor Yashar Ganjali

Department of Computer Science

University of Toronto

yganjali@cs.toronto.edu

http://www.cs.toronto.edu/~yganjali


Announcements

- Reading for next week: [6] and [7]
- Final project proposal
  - Due: 5PM Friday October 12th.
  - Try to be as specific as you can; and
  - Start as soon as you can.
- Presentations
  - Each team presents 1 paper
  - Preferably related to their own project
  - Talk to me before choosing a paper
  - Each presentation 25 mins (including Q&A).
  - Volunteers for next week?

University of Toronto – Fall 2012

Where We Are

- We have studied output-queued and shared-memory switches
- Why they provide an ideal performance (work-conserving)
- Why they can hardly be implemented (speed-up)
- We have also studied techniques for
- Output link scheduling
- Fairness
- Parallelism


Next

- We will now study input-queued switches
- Why they solve the speed-up problem
- A first problem: head-of-line blocking reduces throughput
- Solution: virtual output queues
- A second problem: arbitration between virtual output queues
- Solution: scheduling algorithms


Outline – Part I

- Head-of-Line Blocking
- HoL Blocking in Small Switches
- 58% Throughput


Input-Queued Switch: How It Works

The switch matches inputs and outputs…

Packets are queued at the inputs.


Input-Queued Switch: Speed-Up Advantage

At most one packet leaves each input (and at most one arrives at each output) per time-slot ⇒ speed-up = 1, not N


Head-of-Line Blocking

[Figure: three head-of-line packets marked "Blocked!"]

The switch is NOT work-conserving!


Glimpse: Virtual Output Queues


Question: Do More Lanes Help?

- Answer: It depends on the scheduling.

Head of Line Blocking


VOQs with Bad Scheduling


Good Scheduling? Depends on traffic matrix…


Assumptions

- As in analysis of OQ switch:
- Time is slotted
- At each time-slot, at each of the N inputs: Bernoulli IID packet arrivals with probability
- Each packet is destined for one of the N outputs uniformly at random
- By symmetry, consider some given output
- Scheduling: at each time-slot, each output picks one of the HoL packets destined to it uniformly at random (u.a.r.)

Problem. What throughput can we get?


HoL Blocking in 2x2 Switch


Balls-and-Bins Model

- Saturated switch
- Assume infinite number of packets in each queue
- They are all destined to some output u.a.r. (random coloring of packets)
- Balls-and-bins model
- N outputs ↔ N bins
- N HoL packets ↔ N balls
- At each time-slot
- Remove one ball from each non-empty bin
- Assign free balls to bins independently and u.a.r.
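The balls-and-bins dynamics above can be sketched as a short Monte Carlo simulation. This is only an illustrative sketch: the function name `simulate_saturated_switch`, the slot count, and the seed are assumptions made here, not part of the handout.

```python
import random


def simulate_saturated_switch(n, slots=100_000, seed=0):
    """Estimate per-output throughput of a saturated n x n switch with HoL blocking."""
    rng = random.Random(seed)
    # bins[j] = number of HoL packets (balls) currently destined to output j.
    bins = [0] * n
    for _ in range(n):
        bins[rng.randrange(n)] += 1

    served = 0
    for _ in range(slots):
        # Remove one ball from each non-empty bin: that output serves one packet.
        free = 0
        for j in range(n):
            if bins[j] > 0:
                bins[j] -= 1
                free += 1
        served += free
        # Each freed input exposes a new HoL packet with a u.a.r. destination.
        for _ in range(free):
            bins[rng.randrange(n)] += 1

    return served / (slots * n)


if __name__ == "__main__":
    for size in (2, 3, 8, 32):
        print(size, round(simulate_saturated_switch(size), 3))
```

For small `n` the estimate should land close to the values derived in the Markov chain analysis (about 0.75 for 2x2), and it drifts down toward roughly 0.59 as `n` grows.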


Markov Chain

- There are three states for the bin occupancy: (2,0), (1,1), (0,2)
- E.g., (2,0) means both HoL packets are destined to first output
- We get a Markov chain:

[Figure: Markov chain over the states (2,0), (1,1), (0,2)]

Transition Probabilities in Markov Chain

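- Transition from (2,0): one packet is served and the freed input draws a new HoL packet, which goes to either output w.p. 1/2, so the chain stays in (2,0) w.p. 1/2 and moves to (1,1) w.p. 1/2
- Transition from (0,2): symmetric (stays w.p. 1/2, moves to (1,1) w.p. 1/2)
- Transition from (1,1): to (2,0) w.p. 1/4, stays in (1,1) w.p. 1/2, to (0,2) w.p. 1/4
- Equilibrium state distribution: $\pi = \{1/4, 1/2, 1/4\}$
- Output throughput = 1 - P(output empty) = 75%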

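Side Note: State Collapse

- Symmetric Markov chain
- State collapse: (2,0) and (1,1), where (2,0) now also stands for (0,2)
- Collapsed transitions: each state stays put w.p. 1/2 and moves to the other w.p. 1/2
- Equilibrium (collapsed) state distribution: (1/2, 1/2) ⇒ get the real state distribution by splitting the (2,0) mass equally between (2,0) and (0,2)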

3x3 Switch

- Markov chain with the following states: (3,0,0), (0,3,0), (0,0,3), (2,1,0), (2,0,1), (1,2,0), (0,2,1), (0,1,2), (1,0,2), (1,1,1)
- State collapse into: (3,0,0), (2,1,0) and (1,1,1)
- Collapsed transition probabilities:
  - From (3,0,0): stay w.p. 1/3, to (2,1,0) w.p. 2/3
  - From (2,1,0): to (3,0,0) w.p. 1/9, stay w.p. 2/3, to (1,1,1) w.p. 2/9
  - From (1,1,1): to (3,0,0) w.p. 1/9, to (2,1,0) w.p. 2/3, stay w.p. 2/9

- Equilibrium state distribution
- Per-output throughput
- 75% for 2x2, 68% for 3x3… but state space explosion for large N
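As a check on the small-switch numbers, the full (uncollapsed) Markov chain can be enumerated and its equilibrium distribution approximated by repeated application of the transition probabilities. This is a sketch under the saturated balls-and-bins assumptions; the function names and the iteration count are choices made here for illustration.

```python
from itertools import product


def occupancy_states(n):
    """All ways to place n balls (HoL packets) into n bins (outputs)."""
    return [s for s in product(range(n + 1), repeat=n) if sum(s) == n]


def transitions(state):
    """Return {next_state: probability} for one time-slot."""
    n = len(state)
    # Every non-empty bin serves exactly one ball.
    remaining = [max(x - 1, 0) for x in state]
    free = sum(1 for x in state if x > 0)
    out = {}
    # Each freed ball picks a bin independently and u.a.r.
    for dests in product(range(n), repeat=free):
        nxt = list(remaining)
        for d in dests:
            nxt[d] += 1
        key = tuple(nxt)
        out[key] = out.get(key, 0.0) + 1.0 / n ** free
    return out


def saturated_throughput(n, iterations=2000):
    """Per-output throughput = expected fraction of non-empty bins in equilibrium."""
    states = occupancy_states(n)
    step = {s: transitions(s) for s in states}
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        new = dict.fromkeys(states, 0.0)
        for s, p in pi.items():
            for t, q in step[s].items():
                new[t] += p * q
        pi = new
    return sum(p * sum(1 for x in s if x > 0) for s, p in pi.items()) / n


if __name__ == "__main__":
    for size in (2, 3, 4):
        print(size, saturated_throughput(size))
```

For N = 3 the exact value works out to 43/63 ≈ 0.68. The number of states grows quickly with N, which is the state space explosion noted above.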


Method #2: Recurrence Equations

- Consider a given bin (output)
- Let $X_t$ be the number of balls in this bin
  - Number of HoL packets for this output
- Let $A_t$ be the number of arrivals to this bin
- Let $B_t$ be the number of departures from all bins
- The recurrence equation is:

- The only queues with new HoL packets are those from which HoL packets left at the last time-slot
- $A_{t+1}$ is the sum of $B_t$ i.i.d. Bernoulli($1/N$) variables, i.e., $A_{t+1} \sim \mathrm{Binomial}(B_t, 1/N)$
- Steady-state: $E[B]$ is $N$ times the per-output throughput
- As $N \to \infty$, the binomial goes to Poisson, with mean $E[B] \times (1/N)$, i.e., the per-output throughput (approximation)
- Same equations lead to same results (cf. OQ switch)
- When the switch is saturated, there are $N$ balls for $N$ bins: $E[X] = 1$
- Hence the saturation throughput is $2 - \sqrt{2} \approx 0.586$, i.e., about 58%
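The steps of this argument can be written out as follows. This is a sketch of the standard derivation, not a transcription of the slide: the symbols $\rho$ (per-output throughput) and $U_t$ (indicator that the bin is non-empty) are notation assumed here, and $A_{t+1}$ is treated as independent of $X_t$, which is part of the large-$N$ approximation.

```latex
% Assumed notation: \rho = per-output throughput; U_t = 1 if X_t > 0, else 0.
\begin{align*}
X_{t+1} &= X_t - U_t + A_{t+1}
  && \text{(one ball leaves a non-empty bin, } A_{t+1} \text{ arrive)} \\
E[A] &= E[U] = \rho
  && \text{(take expectations in steady state)} \\
E[A^2] &= \rho + \rho^2
  && \text{(Poisson limit, } N \to \infty\text{)} \\
0 &= E[U] + E[A^2] - 2E[X] + 2E[X]\,E[A] - 2E[U]\,E[A]
  && \text{(square the recurrence; } X_t U_t = X_t,\ U_t^2 = U_t\text{)} \\
E[X] &= \frac{\rho\,(2 - \rho)}{2\,(1 - \rho)} \\
E[X] = 1 \;&\Longrightarrow\; \rho^2 - 4\rho + 2 = 0
  \;\Longrightarrow\; \rho = 2 - \sqrt{2} \approx 0.586
\end{align*}
```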


Where We Are

- We introduced Input-Queued switches.
- We saw that HoL blocking reduces throughput.
- We use VOQs to solve the HoL blocking problem.


Next

- Scheduling in Input-Queued Switches
- Uniform Traffic
- Maximum Size Matching (MSM)
- Maximum Weighted Matching (MWM)
- Maximal matching with speedup
- Heuristic algorithms (PIM, iSLIP, …)


Outline

- Uniform traffic
  - Uniform cyclic
  - Random permutation
  - Wait-until-full
- Non-uniform traffic, known traffic matrix
  - Birkhoff-von-Neumann
- Unknown traffic matrix
  - Maximum Size Matching
  - Maximum Weight Matching


Basic Switch Model

[Figure: N x N switch with inputs 1..N and outputs 1..N, labeled with arrivals $A_i(n)$ and $A_{ij}(n)$, VOQs $Q_{ij}(n)$, departures $D_{ij}(n)$, and the schedule $S(n)$]

Notations: Arrivals

- $A_{ij}(n)$: packet arrivals at input $i$ for output $j$ at time-slot $n$
  - $A_{ij}(n) = 0$ or $1$
- $\lambda_{ij} = E[A_{ij}(n)]$: arrival rate
- $\Lambda = [\lambda_{ij}]$: traffic matrix
- $A = [A_{ij}(n)]$ admissible iff:
  - For all $i$, $\sum_j \lambda_{ij} < 1$: no input is oversubscribed
  - For all $j$, $\sum_i \lambda_{ij} < 1$: no output is oversubscribed
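The admissibility condition is a pair of row-sum and column-sum checks on the traffic matrix. A minimal sketch, assuming the matrix is given as a list of rows of arrival rates (the function name `is_admissible` and the sample matrices are illustrative):

```python
def is_admissible(rates):
    """Return True if no input (row) and no output (column) is oversubscribed."""
    n = len(rates)
    if any(len(row) != n for row in rates):
        raise ValueError("traffic matrix must be square")
    row_sums = [sum(row) for row in rates]
    col_sums = [sum(rates[i][j] for i in range(n)) for j in range(n)]
    return all(s < 1 for s in row_sums) and all(s < 1 for s in col_sums)


if __name__ == "__main__":
    print(is_admissible([[0.4, 0.5], [0.5, 0.4]]))  # every row and column sums to 0.9
    print(is_admissible([[0.6, 0.5], [0.1, 0.1]]))  # input 1 is offered 1.1
```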


Notations: Schedule

- $Q_{ij}(n)$: queue size of VOQ $(i,j)$
  - $Q = [Q_{ij}(n)]$
- $S_{ij}(n)$: whether the schedule connects input $i$ to output $j$
  - $S_{ij}(n) = 0$ or $1$
- No speedup: each input is connected to at most one output, each output to at most one input
- We will assume that each input is connected to exactly one output, and each output to exactly one input ⇒ $S = [S_{ij}(n)]$ is a permutation matrix


Scheduling Algorithm

- What it does: determine S(n)
- How:
  - Either using the traffic matrix $\Lambda$,
  - Or, in most cases, using queue sizes $Q(n)$ (because $\Lambda$ is unknown)
- Objective: 100% throughput
- So that lines are fully utilized
- Secondary objective: minimize packet delays/backlogs


What is “100% throughput”?

- Work-conserving scheduler
  - Definition: If there are one or more packets in the system for an output, then the output is busy.
  - An output-queued switch is work-conserving.
  - Each output can be modeled as an independent single-server queue.
  - If $\lambda < \mu$ then $E[Q_{ij}(n)] < C$ for some $C$.
  - Therefore, we say it achieves “100% throughput”.
  - For fixed-sized packets, work-conservation also minimizes average packet delay.
  - Q: What happens when packet sizes vary?
- Non work-conserving scheduler
  - An input-queued switch is, in general, non work-conserving.
  - Q: What definitions make sense for “100% throughput”?

Common Definitions of 100% throughput

Listed from stronger to weaker:

- Work-conserving
- For all $n, i, j$: $Q_{ij}(n) < C$
- For all $n, i, j$: $E[Q_{ij}(n)] < C$ (we will focus on this definition)
- Departure rate = arrival rate


Uniform Traffic

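- Definition: $\lambda_{ij} = \lambda$ for all $i, j$
  - i.e., all input-output pairs have the same traffic rate
- Condition for admissible traffic: $\lambda < 1/N$
- Example: Bernoulli traffic
  - $\lambda = \rho/N$, where $\rho$ is the load offered to each input
  - Arrivals at input $i$ are Bernoulli($\rho$) and i.i.d.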


100% Throughput for Uniform Traffic

- Nearly all algorithms in literature can give 100% throughput when traffic is uniform
- For example:
- Uniform cyclic.
- Random permutation.
- Wait-until-full [simulations].
- Maximum size matching (MSM) [simulations].
- Maximal size matching (e.g. WFA, PIM, iSLIP) [simulations].

Uniform Cyclic Scheduling

[Figure: 4x4 switch with ports 1–4 and A–D, stepping through the cyclic permutations]

- Each (i, j) pair is served every N time slots: Geom/D/1
- Arrival rate $\lambda = \rho/N < 1/N$, service rate $1/N$
- Stable for $\rho < 1$
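A uniform cyclic schedule is easy to generate. The sketch below assumes ports are numbered from 0 and that the schedule at time-slot `slot` connects input `i` to output `(i + slot) mod N`; the function names are illustrative, not from the handout.

```python
def cyclic_schedule(n_ports, slot):
    """Permutation used at time-slot `slot`: entry i is the output for input i."""
    return [(i + slot) % n_ports for i in range(n_ports)]


def service_counts(n_ports, n_slots):
    """Count how often each (input, output) pair is connected over n_slots slots."""
    counts = [[0] * n_ports for _ in range(n_ports)]
    for slot in range(n_slots):
        for i, j in enumerate(cyclic_schedule(n_ports, slot)):
            counts[i][j] += 1
    return counts


if __name__ == "__main__":
    for row in service_counts(4, 8):
        print(row)
```

Over any window of N consecutive slots every (i, j) pair is connected exactly once, which is where the deterministic service rate of 1/N comes from.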


Wait Until Full

- We don’t have to do much at all to achieve 100% throughput when arrivals are Bernoulli IID uniform.
- Simulation suggests that the following algorithm leads to 100% throughput.
- Wait-until-full:
- If any VOQ is empty, do nothing (i.e. serve no queues).
- If no VOQ is empty, pick a random permutation.


Maximum Size Matching (MSM)

- Intuition: maximize instantaneous throughput
- Simulations suggest 100% throughput for uniform traffic.

[Figure: request graph (an edge from input i to output j when $Q_{ij}(n) > 0$) and the bipartite maximum size match selected from it]

Simple Algorithms with 100% Throughput

[Figure comparing wait-until-full, uniform cyclic, a maximal matching algorithm (iSLIP), and MSM]

Uniform Random Scheduling

[Figure: 4x4 switch with ports 1–4 and A–D]

- At each time-slot, pick a schedule u.a.r. among:
  - The N cyclic permutations
  - Or the N! permutations
- Then $P(S_{ij} = 1) = 1/N$
  - Q: why?

- We get a Geom/Geom/1 system:
  - Birth-death chain
  - Arrival rate $\lambda = \rho/N$, service rate $\mu = 1/N$
- Stable when $\rho < 1$
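The claim $P(S_{ij} = 1) = 1/N$ can be checked by brute force for small N, by counting the schedules that connect a given input to a given output. A sketch, with ports numbered from 0 and an illustrative function name:

```python
from fractions import Fraction
from itertools import permutations


def connection_probability(n_ports, i, j, cyclic_only=False):
    """P(input i is connected to output j) when the schedule is drawn u.a.r."""
    if cyclic_only:
        # The N cyclic shifts of the identity permutation.
        schedules = [
            tuple((a + shift) % n_ports for a in range(n_ports))
            for shift in range(n_ports)
        ]
    else:
        # All N! permutations.
        schedules = list(permutations(range(n_ports)))
    hits = sum(1 for s in schedules if s[i] == j)
    return Fraction(hits, len(schedules))


if __name__ == "__main__":
    print(connection_probability(4, 0, 2))
    print(connection_probability(4, 0, 2, cyclic_only=True))
```

In both cases the count is symmetric in j: exactly one of the N cyclic shifts, and (N-1)! of the N! permutations, map input i to output j.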


Non-Uniform Traffic

- Assume the traffic matrix is:
- $\Lambda$ is admissible
- … and non-uniform


Uniform Schedule?

- What if we use a uniform schedule?
- Each VOQ is serviced at rate $\mu = 1/N = 1/4$
- But arrivals to VOQ(1,2) have rate $\lambda_{12} = 0.57$
- Arrival rate > departure rate ⇒ switch unstable!

Need to adapt schedule to traffic matrix.


Example 1 – Scheduling (Trivial)

- Assume we know the traffic matrix, it is admissible, and it follows a permutation:
- Then we can simply choose:


Example 2 - Scheduling

- Assume we know the traffic matrix, and it doesn’t follow a permutation. For example:
- Then we can choose the sequence of service permutations:
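- And either cycle through it or pick randomly
- In general, if we know an admissible $\Lambda$, can we pick a sequence $S(n)$ so that $\lambda_{ij} < \mu_{ij}$ for every VOQ, where $\mu_{ij}$ is the rate at which VOQ $(i,j)$ is served?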


Definitions

- Doubly Stochastic Matrix: An NxN matrix with nonnegative entries where all rows and all columns sum to 1.
- Doubly Sub-Stochastic Matrix: An NxN matrix with nonnegative entries where the sum of entries in each row or column is less than or equal to 1.


Doubly Stochastic Matrices

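- $\Lambda$ is admissible, or “doubly sub-stochastic”
- Theorem 1 (von Neumann): There exists $\Lambda' = \{\lambda'_{ij}\}$ such that $\Lambda < \Lambda'$ (entrywise) and $\Lambda'$ is doubly stochastic: $\sum_i \lambda'_{ij} = \sum_j \lambda'_{ij} = 1$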
- Example:

Fact 1. The set of doubly stochastic matrices is convex and compact (closed and bounded) in $\mathbb{R}^{N^2}$.

Fact 2. Any convex, compact set in $\mathbb{R}^{N^2}$ has extreme points, and is equal to the convex hull of its extreme points (Krein-Milman Theorem).

Theorem 2 (Birkhoff): Permutation matrices are the extreme points of the set of doubly stochastic matrices.

In other words: given $\Lambda'$, there exist $K$ numbers $\phi_k > 0$ and $K$ permutation matrices $P_k$ such that $\Lambda' = \sum_{k=1}^{K} \phi_k P_k$ and $\sum_{k=1}^{K} \phi_k = 1$.

Note: $K \le N^2 - 2N + 2$.


Birkhoff-von Neumann (BvN) Scheduling

BvN decomposition:

- $\Lambda' \rightarrow \{\phi_k, P_k\}$

BvN weighted random scheduling:

- Pick $P_k$ with probability $\phi_k$

Theorem:

- BvN scheduling achieves 100% throughput
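One way to compute such a decomposition is the greedy procedure sketched below: repeatedly find a permutation supported on the positive entries, subtract it with the largest possible weight, and continue until nothing is left. This is an illustrative sketch, not necessarily the procedure used in the lecture's example; it assumes the input is exactly doubly stochastic (hence the use of `Fraction`), and all function names are hypothetical.

```python
import random
from fractions import Fraction


def perfect_matching(matrix):
    """Find a permutation that uses only positive entries (augmenting paths)."""
    n = len(matrix)
    match_of_col = [None] * n  # match_of_col[j] = row currently matched to column j

    def try_row(i, seen):
        for j in range(n):
            if matrix[i][j] > 0 and j not in seen:
                seen.add(j)
                if match_of_col[j] is None or try_row(match_of_col[j], seen):
                    match_of_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, set()):
            raise ValueError("no perfect matching: matrix is not doubly stochastic")
    perm = [None] * n
    for j, i in enumerate(match_of_col):
        perm[i] = j
    return perm  # perm[i] = output connected to input i


def bvn_decompose(matrix):
    """Return a list of (weight, permutation) pairs whose weighted sum is `matrix`."""
    work = [[Fraction(x) for x in row] for row in matrix]
    terms = []
    while any(x > 0 for row in work for x in row):
        perm = perfect_matching(work)
        weight = min(work[i][perm[i]] for i in range(len(work)))
        for i, j in enumerate(perm):
            work[i][j] -= weight  # at least one entry drops to zero each round
        terms.append((weight, perm))
    return terms


def pick_schedule(terms, rng=random):
    """BvN weighted random scheduling: pick permutation k with probability weight_k."""
    perms = [p for _, p in terms]
    weights = [float(w) for w, _ in terms]
    return rng.choices(perms, weights=weights, k=1)[0]


if __name__ == "__main__":
    half, quarter = Fraction(1, 2), Fraction(1, 4)
    example = [[half, half, 0], [quarter, quarter, half], [quarter, quarter, half]]
    for weight, perm in bvn_decompose(example):
        print(weight, perm)
```

Each round zeroes at least one entry while keeping all row and column sums equal, so a permutation on the positive entries always exists and the loop terminates.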


BvN Example


Proof: 100% Throughput

- Lindley’s equation:
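- Arrival Rate: $P(A_{ij}(n)=1) = E[A_{ij}(n)] = \lambda_{ij}$
- Departure Rate: VOQ $(i,j)$ is connected with probability $\lambda'_{ij}$, the total weight of the permutations $P_k$ that connect input $i$ to output $j$
- Arrival rate < departure rate ⇒ 100% throughput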


Unknown Traffic Matrix

- We want to maximize throughput
- Traffic matrix unknown ⇒ cannot use BvN
- Idea: maximize instantaneous throughput
- In other words: transfer as many packets as possible at each time-slot
- Maximum Size Matching (MSM) algorithm


Maximum Size Matching (MSM)

- MSM maximizes instantaneous throughput
- MSM algorithm: among all maximum size matches, pick a random one

[Figure: request graph (an edge from input i to output j when $Q_{ij}(n) > 0$) and the bipartite maximum size match selected from it]

Implementing MSM

- How can we find maximum size matches?
- We do so by recasting the problem as a network flow problem

Network Flows

[Figure: directed graph with source s, sink t, and intermediate nodes a, b, c, d; edge capacities are 10 or 1]

- Let G = [V,E] be a directed graph with capacity cap(v,w) on edge [v,w].
- A flow is an (integer) function, f, that is chosen for each edge so that $f(v,w) \le \mathrm{cap}(v,w)$.
- We wish to maximize the flow allocation.

Maximum Network Flow Example – By Inspection

Step 1:

[Figure: the same graph with one s–t path saturated; its three edges are labeled "10, 10" (capacity, flow)]

Flow is of size 10

A Maximum Network Flow Example

Step 2:

[Figure: one more unit routed through b and d; the edges on that path are labeled "10, 1", "1, 1", "10, 1" (capacity, flow)]

Flow is of size 10+1 = 11

Maximum flow:

[Figure: edge labels include "10, 9", "10, 10", "1, 1", "10, 2" (capacity, flow)]

Flow is of size 10+2 = 12

Ford-Fulkerson Method of Augmenting Paths

- Set f(v,w) = -f(w,v) on all edges.
- Define a Residual Graph, R, in which res(v,w) = cap(v,w) – f(v,w)
- Find paths from s to t for which there is positive residue.
- Increase the flow along the paths to augment them by the minimum residue along the path.
- Keep augmenting paths until there are no more to augment.
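The steps above can be sketched in a few lines. The version below picks augmenting paths by breadth-first search, i.e., shortest path first (the Edmonds-Karp variant); the function name, the dictionary-based graph encoding, and the small test graph are assumptions made for illustration and are not the graph from the slides.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Ford-Fulkerson with shortest augmenting paths.

    capacity: dict mapping a directed edge (v, w) to its capacity.
    Returns the value of a maximum flow from source to sink.
    """
    residual = dict(capacity)
    neighbours = {}
    for (v, w) in capacity:
        residual.setdefault((w, v), 0)  # reverse edge lets later paths undo flow
        neighbours.setdefault(v, set()).add(w)
        neighbours.setdefault(w, set()).add(v)

    total = 0
    while True:
        # Breadth-first search for an s-t path with positive residue.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            v = queue.popleft()
            for w in neighbours.get(v, ()):
                if w not in parent and residual[(v, w)] > 0:
                    parent[w] = v
                    queue.append(w)
        if sink not in parent:
            return total  # no augmenting path left
        # Walk back from the sink to recover the path.
        path = []
        w = sink
        while parent[w] is not None:
            path.append((parent[w], w))
            w = parent[w]
        # Augment by the minimum residue along the path.
        bottleneck = min(residual[e] for e in path)
        for (v, w) in path:
            residual[(v, w)] -= bottleneck
            residual[(w, v)] += bottleneck
        total += bottleneck


if __name__ == "__main__":
    # Hypothetical graph, not the one from the slides.
    graph = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
    print(max_flow(graph, "s", "t"))
```

The reverse residual edges are what allow a later path to "undo" flow sent by an earlier one, as in the residual-graph examples.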

Example of Residual Graph

Step 1:

[Figure: flow of size 10 and its residual graph R, where res(v,w) = cap(v,w) – f(v,w); an augmenting path is marked in R]

Example of Residual Graph

Step 2:

[Figure: flow of size 10+1 = 11 and its residual graph]

Complexity of Network Flow Problems

- In general, it is possible to find a solution by considering at most |V|·|E| paths, by picking the shortest augmenting path first.
- There are many variations, such as picking the most augmenting path first.

Finding a Maximum Size Match

How do we find the maximum size match?

[Figure: bipartite graph with nodes A–F on one side and 1–6 on the other]

Network Flows and Bipartite Matching

Finding a maximum size bipartite matching is equivalent to solving a network flow problem with capacities and flows of size 1.

[Figure: the bipartite graph (A–F, 1–6) extended with a source s and a sink t]
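With unit capacities, each augmenting path in the flow network either adds one new input-output pair or re-routes existing pairs to make room for one, so the flow computation reduces to the classic augmenting-path matching routine. A minimal sketch, assuming the request graph is given as a dictionary from each input to the list of outputs it has packets for (the names and the sample graph are hypothetical, not the graph from the slides):

```python
def maximum_size_matching(requests):
    """requests[i] = outputs j with a non-empty VOQ (i, j). Returns {input: output}."""
    match_of_output = {}  # output -> input currently matched to it

    def augment(i, visited):
        """Try to match input i, re-routing already matched inputs if needed."""
        for j in requests[i]:
            if j in visited:
                continue
            visited.add(j)
            if j not in match_of_output or augment(match_of_output[j], visited):
                match_of_output[j] = i
                return True
        return False

    for i in requests:
        augment(i, set())
    return {i: j for j, i in match_of_output.items()}


if __name__ == "__main__":
    # Hypothetical request graph: input 1 must give up output "A" so input 2 can use it.
    print(maximum_size_matching({1: ["A", "B"], 2: ["A"]}))
```

Note that an input matched earlier is never dropped by a later augmentation; it is only moved to a different output.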

Example: Maximum Size Matching (Ford-Fulkerson method)

All figures use the bipartite graph with nodes A–F and 1–6, plus source s and sink t:

- [Figure: residual graph for the first three paths]
- [Figure: residual graph for the next two paths]
- [Figure: residual graph for the augmenting path]
- [Figure: residual graph for the last augmenting path] Note that the path augments the match: no input and output is removed from the match during the augmenting step.
- [Figure: maximum flow graph]
- [Figure: maximum size matching]

Question

- Is the intuition right?
- Answer: NO!
- There is a counter-example for which, in a given VOQ $(i,j)$, $\lambda_{ij} < \mu_{ij}$ but MSM does not provide 100% throughput.

Counter-example

[Figure: the traffic pattern and the possible matches, S(n)]

- Consider the following non-uniform traffic pattern, with Bernoulli IID arrivals:
- Consider the case when $Q_{21}$, $Q_{32}$ both have arrivals, w.p. $(1/2 - \delta)^2$.
- In this case, input 1 is served w.p. at most 2/3.
- Overall, the service rate for input 1, $\mu_1$, is at most
  - $\frac{2}{3}\,(1/2-\delta)^2 + 1 \cdot [1 - (1/2-\delta)^2]$
  - i.e. $\mu_1 \le 1 - \frac{1}{3}\,(1/2-\delta)^2$.
- Switch unstable for $\delta \le 0.0358$
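The instability threshold can be reproduced numerically. The service-rate bound below is the one stated above; the arrival rate at input 1 is an assumption, since the traffic matrix itself is not recoverable from this transcript: following the example in the first reference, input 1 is taken to carry two flows of rate $1/2 - \delta$ each. Function names are illustrative.

```python
import math


def service_rate_bound(delta):
    """Upper bound on the service rate of input 1 under MSM (from the bullets above)."""
    p_both = (0.5 - delta) ** 2  # Q21 and Q32 both have arrivals
    return (2 / 3) * p_both + 1.0 * (1 - p_both)


def arrival_rate_input1(delta):
    """ASSUMPTION: input 1 carries two flows, each of rate 1/2 - delta."""
    return 2 * (0.5 - delta)


def instability_threshold():
    """Delta at which the assumed arrival rate equals the service-rate bound.

    1 - 2*delta = 1 - (1/3) * (1/2 - delta)**2
    <=> delta**2 - 7*delta + 1/4 = 0, taking the smaller root.
    """
    return (7 - math.sqrt(48)) / 2


if __name__ == "__main__":
    d = instability_threshold()
    print(d, arrival_rate_input1(d), service_rate_bound(d))
```

Below the threshold the assumed arrival rate exceeds the bound on what MSM can serve, so the queues at input 1 grow without limit even though the traffic is admissible.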


Simulation of Simple 3x3 Example


References

- “Achieving 100% Throughput in an Input-queued Switch (Extended Version)”. Nick McKeown, Adisak Mekkittikul, Venkat Anantharam and Jean Walrand. IEEE Transactions on Communications, Vol. 47, No. 8, August 1999.
- “A Practical Scheduling Algorithm to Achieve 100% Throughput in Input-Queued Switches”. Adisak Mekkittikul and Nick McKeown. IEEE Infocom 98, Vol. 2, pp. 792-799, April 1998, San Francisco.
