Flow control

Flow Control. Z. Lu / A. Jantsch / I. Sander [email protected] Dally and Towles, Chapter 12, 13. Overview. Interconnect Network Introduction. Topology (Regular, irregular). Deadlock, Livelock. Router Architecture (pipelined, classic) . Routing (algorithms, mechanics). Network Interface

Flow Control

Z. Lu / A. Jantsch / I. Sander

[email protected]

Dally and Towles, Chapter 12, 13


Overview

  • Interconnect Network Introduction

  • Topology (regular, irregular)

  • Deadlock, Livelock

  • Router Architecture (pipelined, classic)

  • Routing (algorithms, mechanics)

  • Network Interface (message passing, shared memory)

  • Flow Control (circuit switching, packet switching (SAF, wormhole, virtual channel))

  • Performance Analysis and QoS

  • Implementation

  • Evaluation

  • Summary

  • Concepts

SoC Architecture


Outline

  • Flow control basics

    • What is flow control?

    • Resources and allocation units

  • Network-level flow control

    • Bufferless flow control

    • Buffered flow control

  • Link-level (switch-to-switch) flow control

Flow Control

  • Flow Control determines how the resources of a network, such as channel bandwidth and buffers, are allocated to packets traversing a network.

  • The goal is to use resources as efficiently as possible to achieve high throughput

  • Efficient flow control is a prerequisite for good network performance

Flow Control

  • Flow Control can be viewed as a problem of

    • Resource allocation

    • Contention resolution

  • Resources in the form of channel bandwidth, buffers and control state must be allocated to each packet

  • If two packets compete for the same channel, flow control can only assign the channel to one packet, but must also deal with the other packet.

Resources in a Network Node

  • Channel bandwidth: to travel to the next node, bandwidth has to be allocated for the packet.

  • Buffer: the packet is stored in a buffer before it is sent to the next node.

  • Control state: tracks the resources allocated to the packet in the node and the state of the packet.

[Figure: a network node with its control state, channel bandwidth and buffer capacity]

What is the first thing to consider for resource allocation?

Units of Resource Allocation

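[Figure: a message is divided into packets; a packet consists of a header with routing information (RI) and a sequence number (SN) and is divided into a head flit, body flits and a tail flit; a flit carries a type and a VC field and is divided into phits]

Why different granularity?

Isn’t using a single unit much simpler?

1 message = 1 packet?

1 packet = 1 flit = 1 phit?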

Messages, Packets, Flits and Phits are handled in different layers of the network protocol

Units of Resource Allocation

  • A message is a contiguous (adjacent in time) group of bits that is delivered from source terminal to destination terminal. A message consists of packets.

  • A packet is the basic unit for routing and sequencing. The control state is assigned to a packet. Packets may be divided into flits.

Units of Resource Allocation

  • A flit (flow control digit) is the basic unit of bandwidth and storage allocation.

    • Head flit, body flit, tail flit, head/tail flit.

    • The head flit allocates channel state for a packet and the tail flit de-allocates it.

    • Body and tail flits do not carry any routing or sequence information; they follow the route set up by the head flit for the whole packet.

  • A phit (physical transfer digit) is the unit that is transferred across a channel in a single clock cycle.
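
The relationship between packets, flits and phits can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Flit`, `packet_to_flits` and `flit_to_phits` are hypothetical, sizes are counted in bytes for convenience, and header contents (routing information, sequence number, VC field) are left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Flit:
    kind: str       # "head", "body", "tail" or "head/tail"
    payload: bytes  # the slice of the packet carried by this flit


def packet_to_flits(packet: bytes, flit_bytes: int) -> list[Flit]:
    """Split one packet into flits of at most flit_bytes each."""
    if flit_bytes <= 0:
        raise ValueError("flit_bytes must be positive")
    chunks = [packet[i:i + flit_bytes]
              for i in range(0, len(packet), flit_bytes)] or [b""]
    if len(chunks) == 1:
        # A packet that fits into one flit is both head and tail.
        return [Flit("head/tail", chunks[0])]
    kinds = ["head"] + ["body"] * (len(chunks) - 2) + ["tail"]
    return [Flit(kind, chunk) for kind, chunk in zip(kinds, chunks)]


def flit_to_phits(flit: Flit, phit_bytes: int) -> list[bytes]:
    """Split one flit into the phits sent over the channel, one per cycle."""
    return [flit.payload[i:i + phit_bytes]
            for i in range(0, len(flit.payload), phit_bytes)]


flits = packet_to_flits(bytes(16), flit_bytes=4)
print([f.kind for f in flits])                      # head, body, body, tail
print(len(flit_to_phits(flits[0], phit_bytes=2)))   # phits per flit
```

With a 16-byte packet and 4-byte flits the packet becomes one head flit, two body flits and one tail flit, and each flit needs two cycles on a channel that is 2 bytes wide.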

Packets or Flits?

  • Contradictory requirements on packets

    • Packets should be very large in order to reduce overhead of routing and sequencing

    • Packets should be very small to allow efficient and fine-grained resource allocation and minimize blocking latency

  • Flits try to eliminate this conflict

    • Packets can be large (low overhead)

    • Flits can be small (efficient resource allocation)

Size: Phit, Flit, Packet

  • There are no fixed rules for the size of phits, flits and packets

    • Message: arbitrarily long

    • Packets: restricted maximum length (constant or variable)

  • Typical values

    • Phits: 1 bit to 64 bits

    • Flits: 16 bits to 512 bits

    • Packets: 128 bits to 1024 bits

Flow Control

Now that routing can deliver my packets to their destination, why do we have to control packet movement?

What are the alternatives for controlling the movement?

  • Flow Control can be divided into

    • Bufferless flow control

      • Packets are either dropped or misrouted

      • Circuit switching

        • Decouple transmission from path setup

    • Buffered flow control

      • Packets that cannot be routed via the desired channel are stored in buffers

Bufferless Flow Control

  • No buffers means lower implementation cost

  • If more than one packet is to be routed to the same output, one of them has to be

    • misrouted, or

    • dropped

  • In this example two packets, A and B (each consisting of several flits), arrive at a network node

[Figure: packets A and B arrive at a network node and both request output 0]

Dropping Flow Control

  • Packet B is dropped and must be resent

  • There must be a protocol that informs the sending node that the packet has been dropped

    • e.g. resend if no acknowledgement has been received within a given time (timeout)

With this kind of retransmission, what problem may arise at the receiver?

[Figure: packet A is forwarded on output 0; packet B is dropped]

Misrouting Flow Control

  • Packet B is misrouted

  • No further action is required here, but

    • at the receiving node packets have to be sorted into original order

What information is needed to enable the correct re-ordering?

[Figure: packet A is forwarded on output 0; packet B is misrouted to output 1]
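
One common way to restore the original order at the receiver is to number the packets at the source and hold early arrivals until the gap is filled. The sketch below assumes that every packet carries a sequence number (the SN field of the packet header) and that numbering starts at `first_seq`; the function name `reorder` is hypothetical.

```python
def reorder(arrivals, first_seq=0):
    """Return payloads in sequence order, given (seq, payload) pairs
    in the order in which they arrive at the receiver."""
    pending = {}          # early arrivals, keyed by sequence number
    expected = first_seq  # next sequence number to hand to the application
    delivered = []
    for seq, payload in arrivals:
        pending[seq] = payload
        # Release every packet that is now in order.
        while expected in pending:
            delivered.append(pending.pop(expected))
            expected += 1
    return delivered


# Packet 0 was misrouted and arrives after packet 1; packet 2 arrives after 3.
print(reorder([(1, "b"), (0, "a"), (3, "d"), (2, "c")]))
```

The price of this scheme is storage at the receiver: `pending` has to hold every packet that arrives ahead of a delayed one.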

Circuit Switching

  • Circuit switching is a form of bufferless flow control in which several channels are reserved to form a circuit.

  • A request (R) propagates from source to destination and is answered by an acknowledgement (A)

  • Then data are sent (here two five-flit packets (D)) and a tail flit (T) is sent to de-allocate the channels

[Figure: time-space diagram (channels 0 to 4 versus cycle) showing the request (R), acknowledgement (A), data (D) and tail (T) flits]

Circuit Switching

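  • Circuit switching does not suffer from dropped or misrouted packets

  • However there are two weaknesses:

    • High zero-load (non-contention) latency: T = 3 H tr + L/b, where H is the number of hops, tr the router delay per hop, L the packet length and b the channel bandwidth

    • Low throughput, since the channel is used for a large fraction of the time for signaling rather than for delivering the payload

[Figure: time-space diagram (channels 0 to 4 versus cycle) showing the request (R), acknowledgement (A), data (D) and tail (T) flits]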

How can we improve the efficiency of the flow control?

Buffered Flow Control

  • More efficient flow control can be achieved by adding buffers

    • With sufficient buffers, packets do not need to be misrouted or dropped, since packets can wait for the outgoing channel to be ready.

Buffered Flow Control

  • Two main approaches

    • Packet-Buffer Flow Control

      • Store-And-Forward (SAF)

      • (Virtual) Cut-Through

    • Flit-Buffer Flow Control

      • Wormhole flow control

      • Virtual Channel (VC) flow control

SAF Flow Control

  • Each node along a route waits until a packet is completely received (stored) and then the packet is forwarded to the next node

  • Two resources are needed

    • Packet-sized buffer in the switch

    • Exclusive use of the outgoing channel during the full packet transmission period

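[Figure: time-space diagram (channels 0 to 3 versus cycle) of a packet with head (H), body (B) and tail (T) flits under store-and-forward flow control]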

SAF Flow Control

  • Advantage: While waiting to acquire resources, no channels are being held idle and only a single packet buffer on the current node is occupied

  • Disadvantage: Very high latency

    • T = H (tr + L/b)

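[Figure: time-space diagram (channels 0 to 3 versus cycle) of a packet with head (H), body (B) and tail (T) flits under store-and-forward flow control]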

At maximum, how many routing cycles are tolerable without compromising performance?

Cut-Through Flow Control

  • Transmission on the next channel starts as soon as possible, without waiting for the entire packet to be received. (Otherwise it behaves like store-and-forward.)

  • The channel is released after the tail flit

[Figure: time-space diagram (channels 0 to 3 versus cycle) of a packet with head (H), body (B) and tail (T) flits under cut-through flow control]

What happens from cycle 2 to 5 for the head flit?

Cut-Through Flow Control

  • Advantages

    • Cut-through reduces the latency

      • T = H tr + L/b

    • Very high channel utilization

  • Disadvantages (also valid for store-and-forward)

    • Poor utilization of buffers, since they are allocated in units of packets, which requires packet-sized buffers.

    • Contention latency is increased, since packets must wait until a whole packet leaves the occupied channel
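
The three zero-load latency formulas given in these slides (circuit switching, store-and-forward and cut-through) can be compared numerically. The sketch below simply evaluates them; the symbols are taken to mean H = number of hops, tr = router delay per hop, L = packet length and b = channel bandwidth, and the example values (4 hops, 1 cycle per router, 64-bit packet, 16 bits per cycle) are made up for illustration.

```python
def circuit_switching_latency(hops, t_r, length, bandwidth):
    """T = 3 H tr + L/b: request, acknowledgement and data each cross all hops."""
    return 3 * hops * t_r + length / bandwidth


def store_and_forward_latency(hops, t_r, length, bandwidth):
    """T = H (tr + L/b): the whole packet is serialized again at every hop."""
    return hops * (t_r + length / bandwidth)


def cut_through_latency(hops, t_r, length, bandwidth):
    """T = H tr + L/b: the serialization delay is paid only once."""
    return hops * t_r + length / bandwidth


# Hypothetical example values, all times in cycles.
H, T_R, L, B = 4, 1, 64, 16
print("circuit switching:", circuit_switching_latency(H, T_R, L, B))
print("store-and-forward:", store_and_forward_latency(H, T_R, L, B))
print("cut-through:      ", cut_through_latency(H, T_R, L, B))
```

For these values the serialization delay L/b is 4 cycles, so store-and-forward pays it four times while cut-through pays it once; the gap between the two grows with both the hop count and the packet length.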

[Figure: time-space diagram (channels 0 to 3 versus cycle) of a packet under cut-through flow control]

Packet-sized buffer? Flit-sized channel? How does this work?

What if we use flit-sized buffer, flit-sized channel?

Wormhole Flow Control

  • Wormhole flow control operates like cut-through, but with channel and buffers allocated to flits rather than packets

  • When the head flit arrives at a node, it must acquire three resources before it can be forwarded to the next node along a route

    • A virtual channel (channel state) for the whole packet

    • One flit buffer

    • Bandwidth corresponding to one flit

  • Body flits use a virtual channel acquired by the head flit and have to acquire one flit buffer and bandwidth corresponding to one flit

  • Tail flits behave like body flits, but also release the channel
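
The rules above amount to a small state machine per input virtual channel: idle, waiting for an output channel, and active. The following sketch models one input with a two-flit buffer. It is a simplification under stated assumptions: the class name `InputVirtualChannel` is hypothetical, flits are represented by the strings "H", "B" and "T", and once the packet holds the output channel it is never stalled by the downstream node.

```python
from collections import deque


class InputVirtualChannel:
    """One input virtual channel of a wormhole router (simplified model)."""

    def __init__(self, depth=2):
        self.depth = depth      # number of flit buffers
        self.buffer = deque()   # buffered flits, oldest first
        self.state = "idle"     # idle -> waiting -> active -> idle

    def accept(self, flit):
        """Buffer an arriving flit; return False if no flit buffer is free."""
        if len(self.buffer) >= self.depth:
            return False
        self.buffer.append(flit)
        if flit == "H" and self.state == "idle":
            self.state = "waiting"   # head flit waits for an output channel
        return True

    def forward(self, output_free):
        """Send the oldest flit if possible; return it, or None if blocked."""
        if not self.buffer:
            return None
        if self.buffer[0] == "H":
            self.state = "waiting"
            if not output_free:
                return None          # output channel is held by another packet
            self.state = "active"    # head flit acquires the output channel
        flit = self.buffer.popleft()
        if flit == "T":
            self.state = "idle"      # tail flit releases the channel
        return flit


vc = InputVirtualChannel(depth=2)
vc.accept("H")
print(vc.state, vc.forward(output_free=False))   # head is blocked
vc.accept("B")
print(vc.accept("B"))                            # buffer full, flit refused
print(vc.forward(output_free=True), vc.state)    # head leaves, VC is active
```

The refused body flit in the middle of the trace is the congestion case: with only two flit buffers, the rest of the packet has to wait in the upstream routers until the head flit gets the output channel.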

Example for Wormhole Flow Control

  • Input virtual channel is in idle state (I)

  • Upper output channel is occupied: it is allocated to the lower channel (L)

[Figure: input VC state I; flit buffers empty; flits H, B, B, T arriving at the input]

Example for Wormhole Flow Control

  • Input channel enters the waiting state (W)

  • Head flit is buffered

[Figure: input VC state W; flit buffers hold H; flits B, B, T still to arrive]

Example for Wormhole Flow Control

  • Body flit is also buffered

  • No more flits can be buffered, thus congestion arises if more flits want to enter the switch

[Figure: input VC state W; flit buffers hold H and B; flits B, T still to arrive]

Example for Wormhole Flow Control

  • Virtual channel enters active state (A)

  • Head flit is output on upper channel

  • Second body flit is accepted

[Figure: input VC state A; flit buffers hold B and B; H on the upper output; T still to arrive]

Example for Wormhole Flow Control

  • First body flit is output

  • Tail flit is accepted

[Figure: input VC state A; flit buffers hold B and T; H and B on the upper output]

Example for Wormhole Flow Control

  • Second body flit is output

[Figure: input VC state A; flit buffer holds T; H, B and B on the upper output]

Example for Wormhole Flow Control

  • Tail flit is output

  • Virtual channel is deallocated and returns to idle state

[Figure: input VC state I; flit buffers empty; H, B, B and T on the upper output]

Wormhole Flow Control

  • The main advantage of wormhole over cut-through is that the buffers in the routers do not need to hold full packets, but only a small number of flits

  • This allows smaller and faster routers

[Figure: time-space diagram (channels 0 and 1 versus cycle) of a packet with head (H), body (B) and tail (T) flits under wormhole flow control]

Wormhole Flow Control

  • Virtual channels hold the state needed to coordinate the handling of flits of a packet over a channel

  • Comparison to cut-through

    • Wormhole flow control makes far more efficient use of buffer space.

      • Typically, cut-through requires at least an order of magnitude more storage than wormhole flow control.

    • Throughput may be lower, since wormhole flow control may block a channel mid-packet.

Blocking: Cut-Through and Wormhole

  • If a packet is blocked, the flits of the wormhole packet are stored in different routers while holding the channel.

[Figure: a blocked packet under cut-through (buffer size 1 packet) and under wormhole (buffer size 2 flits)]

In both cases, is the channel held by the entire packet? Why?

Problem with Wormhole Flow Control

  • There is only one virtual channel for each physical channel

  • Packet A is blocked and cannot acquire channel p

  • Though channels p and q are idle, packet A cannot use these channels since B owns channel p.

[Figure: nodes 1, 2 and 3 connected by channels p and q; packet B is blocked while it holds channel p, so channels p and q stay idle]

How can we improve this situation for higher performance?

Virtual Channel Flow Control

  • In virtual channel flow control, several virtual channels are associated with a single physical channel

  • This makes it possible to use bandwidth that would otherwise be left idle when a packet blocks the channel

  • Unlike wormhole flow control, subsequent flits are not guaranteed bandwidth, since they have to compete for bandwidth with other flits

Virtual Channel Flow Control

  • There are several virtual channels (like lanes in our road systems) for each physical channel.

  • Packet A can use a second virtual channel and thus proceed over channel p and q.

[Figure: nodes 1, 2 and 3 connected by channels p and q; packet B is blocked, while packet A passes it on a second virtual channel]

Virtual Channel Allocation

  • Flits must be delivered in order, H, B1, …Bn, T.

    • Only the head flit carries routing information

  • Allocate VC at the packet level, i.e., packet-by-packet

    • The head flit is responsible for allocating VCs along the route.

    • Body and tail flits must follow the VC path, and the tail flit releases the VCs.

  • The flits of a packet cannot interleave with those of any other packet in buffers.

[Figure: flits of packets A and B held in separate buffers along nodes 1, 2 and 3]

Why must this ordering be maintained?

Why can the interleaving not be allowed?

Virtual Channel Flow Control: Fair Bandwidth Arbitration

  • Packets A and B use the shared link, interleaving their flits

  • This results in a high average latency

[Figure: timing diagram with rows In1 (AH, A1 to A6, AT), In2 (BH, B1 to B6, BT), the number of flits in each VC buffer, and Out (AH, BH, A1, B1, ..., AT, BT); the flits of A and B alternate on the output]

Virtual Channel Flow Control: Winner-Take-All Arbitration

  • A winner-take-all arbitration reduces the average latency with no throughput penalty

[Figure: timing diagram with rows In1 (AH, A1 to A6, AT), In2 (BH, B1 to B6, BT), the number of flits in each VC buffer, and Out (AH, A1, ..., AT, BH, B1, ..., BT); all flits of A leave before the first flit of B]
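
The effect of the two arbitration policies on latency can be checked with a small calculation. The sketch below takes the output orders shown in the two timing diagrams (two 8-flit packets, A and B) and computes the cycle in which each packet's last flit leaves. It assumes one flit per cycle on the output and counts cycles from 1; the function name `completion_times` is hypothetical.

```python
def completion_times(output_order):
    """Map packet name -> cycle in which its last flit leaves the router.

    output_order lists flit labels such as "AH" or "B3", one per cycle;
    the first character of a label names the packet.
    """
    done = {}
    for cycle, flit in enumerate(output_order, start=1):
        done[flit[0]] = cycle   # later flits overwrite earlier ones
    return done


suffixes = ["H", "1", "2", "3", "4", "5", "6", "T"]
packet_a = ["A" + s for s in suffixes]
packet_b = ["B" + s for s in suffixes]

# Fair arbitration: flits of A and B alternate on the output.
fair = [flit for pair in zip(packet_a, packet_b) for flit in pair]
# Winner-take-all: A keeps the output until its tail flit has left.
winner_take_all = packet_a + packet_b

for name, order in [("fair", fair), ("winner-take-all", winner_take_all)]:
    done = completion_times(order)
    average = sum(done.values()) / len(done)
    print(name, done, "average:", average)
```

In both cases the last flit leaves in cycle 16, so the throughput is the same, but under winner-take-all packet A finishes in cycle 8 instead of cycle 15, which lowers the average.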

Virtual Channel Flow Control: Buffer Storage

  • Buffer storage is organized in two dimensions

    • Number of virtual channels

    • Number of flits that can be buffered per channel

Virtual Channel Flow Control: Buffer Storage

To allow maximum throughput, what is the minimum buffer size per VC? Why?

  • Each virtual channel buffer should be at least deep enough to cover the round-trip credit latency

  • It is usually better to add more virtual channels than to increase the buffer size

Virtual Channel Router

  • Router has

    • 2 input channels

    • 2 output channels

    • 2 virtual channels

    • 3 flit buffers

Buffer Management

How do I make sure that my downstream buffer will not overflow?

  • In buffered flow control, nodes need to communicate with each other in order to report the availability of buffers

  • Backpressure informs upstream nodes that they must stop sending to a downstream node when the buffers of that downstream node are full

[Figure: traffic flows from the upstream node to the downstream node]

Credit-Based Flow Control

  • The upstream router keeps a count of the number of free flit buffers in each virtual channel downstream

  • Each time the upstream router forwards a flit, it decrements the counter

  • If a counter reaches zero, the downstream buffer is full and the upstream node cannot send a new flit

  • If the downstream node forwards a flit, it frees the associated buffer and sends a credit to the upstream router, which increments its counter
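
The credit mechanism described above fits in a few lines. The sketch below models one virtual channel of one link; the class name `CreditLink` and its methods are hypothetical, and credits are returned instantly, so the credit round-trip delay is deliberately ignored.

```python
class CreditLink:
    """One virtual channel between an upstream and a downstream router."""

    def __init__(self, buffers):
        self.credits = buffers   # upstream count of free downstream buffers
        self.downstream = []     # flits currently buffered downstream

    def send(self, flit):
        """Upstream side: forward a flit only if a credit is available."""
        if self.credits == 0:
            return False         # downstream buffer is full
        self.credits -= 1
        self.downstream.append(flit)
        return True

    def drain(self):
        """Downstream side: forward one flit and return a credit upstream."""
        if not self.downstream:
            return None
        flit = self.downstream.pop(0)
        self.credits += 1        # the credit arrives at the upstream router
        return flit


link = CreditLink(buffers=2)
print(link.send("a"), link.send("b"), link.send("c"))  # third send is refused
print(link.drain(), link.send("c"))                    # a credit frees a slot
```

Because the upstream router never sends without a credit, the downstream buffer cannot overflow and no flit is ever dropped.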

Credit-Based Flow Control

  • The minimum time between a credit being sent at time t1 and the next credit being sent for the same buffer at time t5 is the credit round-trip delay tcrt

  • The credit round-trip delay, including wire delay, is a critical parameter for any router because it determines the maximum throughput that can be supported by the flow control mechanism.

Credit-Based Flow Control

  • If there is only a single flit buffer, each flit has to wait for a new credit and the maximum throughput is limited to one flit per tcrt

  • The bit rate would then be Lf / tcrt, where Lf is the length of a flit in bits

Credit-Based Flow Control

  • If there are F flit buffers on the virtual channel, F flits can be sent before waiting for a credit, which gives a throughput of F flits per tcrt and a bit rate of F Lf / tcrt

Credit-Based Flow Control

  • In order not to limit the throughput by low-level flow control, the number of flit buffers F should be at least

      F ≥ tcrt · b / Lf

    where b is the bandwidth of a channel
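
The buffer bound follows from requiring that the credit-limited bit rate F Lf / tcrt is at least the channel bandwidth b. The sketch below evaluates both quantities; the function names and the example numbers (a round-trip delay of 4 cycles, 32-bit flits, 32 bits per cycle) are assumptions chosen for illustration.

```python
import math


def min_flit_buffers(t_crt, bandwidth, flit_bits):
    """Smallest integer F with F * Lf / tcrt >= b."""
    return math.ceil(t_crt * bandwidth / flit_bits)


def credit_limited_rate(buffers, t_crt, bandwidth, flit_bits):
    """Achievable bit rate: F * Lf / tcrt, capped by the channel bandwidth."""
    return min(bandwidth, buffers * flit_bits / t_crt)


# Hypothetical link: tcrt = 4 cycles, b = 32 bits/cycle, Lf = 32 bits.
T_CRT, B, LF = 4, 32, 32
print("minimum F:", min_flit_buffers(T_CRT, B, LF))
for f in (1, 2, 4):
    print(f, "buffers ->", credit_limited_rate(f, T_CRT, B, LF), "bits/cycle")
```

With a single buffer this link reaches only a quarter of its bandwidth; four buffers are needed to keep the channel busy while credits are in flight.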

Credit-Based Flow Control

  • For each flit sent downstream, a corresponding credit is sent upstream (one-to-one correspondence).

  • Thus there is a large amount of upstream signaling, which can represent a large overhead, especially for small flits!

How can we improve this?

On/Off Flow Control

  • On/off flow control tries to reduce the amount of upstream signaling

  • An off signal is sent to the upstream node if the number of free buffers falls below the threshold Foff

  • An on signal is sent to the upstream node if the number of free buffers rises above the threshold Fon

  • With carefully dimensioned buffers, on/off flow control can achieve a very low overhead in the form of upstream signaling
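
The two thresholds can be illustrated with a small model of the downstream node. The class name `OnOffReceiver` and the example values (8 buffers, Foff = 3, Fon = 5) are assumptions; signal propagation delay is ignored, which is exactly what the thresholds have to compensate for in a real design.

```python
class OnOffReceiver:
    """Downstream node that signals 'off' and 'on' to its upstream node."""

    def __init__(self, capacity, f_off, f_on):
        if not 0 < f_off <= f_on < capacity:
            raise ValueError("expected 0 < f_off <= f_on < capacity")
        self.free = capacity          # number of free flit buffers
        self.f_off, self.f_on = f_off, f_on
        self.sender_on = True         # last state signalled upstream
        self.signals = []             # log of signals sent upstream

    def receive(self):
        """A flit arrives and occupies one buffer."""
        if self.free == 0:
            raise OverflowError("flit arrived with no free buffer")
        self.free -= 1
        if self.sender_on and self.free < self.f_off:
            self.sender_on = False
            self.signals.append("off")

    def forward(self):
        """A flit leaves and frees one buffer."""
        self.free += 1
        if not self.sender_on and self.free > self.f_on:
            self.sender_on = True
            self.signals.append("on")


rx = OnOffReceiver(capacity=8, f_off=3, f_on=5)
for _ in range(6):
    rx.receive()
for _ in range(4):
    rx.forward()
print(rx.signals, "free buffers:", rx.free)
```

Ten flit events produce only two upstream signals here, whereas credit-based flow control would have sent one credit for each of the four forwarded flits.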

Ack/Nack Flow Control

  • In ack/nack flow control, the upstream node sends packets without knowing whether there are free buffers in the downstream node

Ack/Nack Flow Control

  • If there is no buffer available

    • the downstream node sends nack and drops the flit

    • the flit must be resent

    • flits must be reordered at the downstream node

  • If there is a buffer available

    • the downstream node sends ack and stores the flit in a buffer
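
A rough sketch of the ack/nack exchange is given below. It is heavily simplified: the function name `simulate_ack_nack` is hypothetical, the upstream node sends one flit per step and retries the same flit immediately after a nack, and the downstream node forwards one flit every `drain_period` steps. Because of the immediate retry, flits stay in order here; in a pipelined link later flits would already be in flight when the nack arrives, which is why reordering is needed in practice.

```python
from collections import deque


def simulate_ack_nack(flits, capacity, drain_period):
    """Return (delivered flits, number of nacks) for a single link."""
    pending = deque(flits)   # upstream keeps each flit until it is acked
    buffer = deque()         # downstream flit buffer
    delivered, nacks, step = [], 0, 0
    while pending or buffer:
        step += 1
        if pending:
            if len(buffer) < capacity:
                buffer.append(pending.popleft())   # ack: flit is stored
            else:
                nacks += 1                         # nack: flit is dropped
        if step % drain_period == 0 and buffer:
            delivered.append(buffer.popleft())     # downstream forwards a flit
    return delivered, nacks


delivered, nacks = simulate_ack_nack(["a", "b", "c", "d"],
                                     capacity=1, drain_period=2)
print(delivered, "nacks:", nacks)
```

Every nack in this run is a channel cycle spent on a flit that had to be sent again, which illustrates the bandwidth inefficiency of the scheme.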

When to use the flow control schemes?

  • Because of its buffer and bandwidth inefficiency, ack/nack is rarely used.

  • Credit-based flow control is used in systems with small numbers of buffers.

  • On/off flow control is used in systems that have large numbers of flit buffers.

Summary

  • Bufferless flow control

    • Dropping or misrouting packets

    • Circuit switching

  • Buffered flow control

    • Packet-Buffer Flow Control: SAF vs. Cut-Through

    • Flit-Buffer Flow Control: Wormhole and Virtual Channel

  • Switch-to-switch (link-level) flow control

    • Credit-based

    • On/Off

    • Ack/Nack

What is the reasoning behind the flow control schemes?

How does one scheme improve over another?

What are their pros and cons?

Why is link-level flow control necessary?

What are their pros and cons?

How is the minimum required buffer size determined?
