
Router Architecture


[Figure: mesh network-on-chip — switches (S) interconnected by channels, each attached to a terminal node (T)]
Network-on-Chip

  • Information in the form of packets is routed via channels and switches from one terminal node to another

  • The interface between the interconnection network and the terminals (clients) is called the network interface

SoC Architecture



Router Architecture

  • The discussion concentrates on a typical virtual-channel router

  • Modern routers are pipelined and work at the flit level

  • Head flits proceed through pipeline stages that perform route computation and virtual-channel allocation

  • All flits pass through switch allocation and switch traversal stages

  • Most routers use credits to allocate buffer space


A typical virtual channel router

  • A router's functional blocks can be divided into

    • Datapath: handles storage and movement of a packet's payload

      • Input buffers

      • Switch

      • Output buffers

    • Control plane: coordinates the movement of packets through the resources of the datapath

      • Route Computation

      • VC Allocator

      • Switch Allocator


[Diagram: packet lifecycle — Routing → VC Allocation → Output Port Allocation → Switch Allocation → Switching → VC Deallocation]

A typical virtual channel router



A typical virtual channel router

  • The input unit

    • Contains a set of flit buffers

    • Maintains the state for each virtual channel

      • G = Global State

      • R = Route

      • O = Output VC

      • P = Pointers

      • C = Credits



Virtual channel state fields (Input)



A typical virtual channel router

  • During route computation the output port for the packet is determined

  • Then the packet requests an output virtual channel from the virtual-channel allocator



A typical virtual channel router

  • Flits are forwarded through the virtual channel by using the switch allocator to allocate a time slot on the switch and the output channel

  • Flits are forwarded to the appropriate output during this time slot

  • The output unit forwards the flits to the next router in the packet’s path


Virtual channel state fields (Output)


Packet Rate and Flit Rate

  • The control of the router operates at two distinct frequencies

    • Packet Rate (performed once per packet)

      • Route computation

      • Virtual-channel allocation

    • Flit Rate (performed once per flit)

      • Switch allocation

      • Pointer and credit count update



The Router Pipeline

  • A typical router pipeline includes the following stages

    • RC (Routing Computation)

    • VC (Virtual Channel Allocation)

    • SA (Switch Allocation)

    • ST (Switch Traversal)

  • The following cycle-by-cycle walkthrough assumes no pipeline stalls
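As an illustration of the stall-free case, the stage each flit occupies per cycle can be tabulated with a short sketch (hypothetical Python, not the authors' code; only the head flit visits RC and VA, as the later slides explain):

```python
def pipeline_trace(num_flits: int) -> dict[str, list[str]]:
    """Stage occupied by each flit per cycle, assuming no stalls.

    Cycle numbering follows the walkthrough: the head flit arrives in
    cycle 0, then passes RC (1), VA (2), SA (3), ST (4).  Flit i arrives
    in cycle i, waits in the input buffer, and enters SA in cycle 3 + i.
    """
    trace = {}
    for i in range(num_flits):
        name = "head" if i == 0 else ("tail" if i == num_flits - 1 else f"body{i}")
        if i == 0:
            stages = ["arr", "RC", "VA", "SA", "ST"]
        else:
            # body/tail flits skip RC and VA: arrive, buffer, then SA and ST
            stages = ["--"] * i + ["arr"] + ["buf"] * 2 + ["SA", "ST"]
        trace[name] = stages
    return trace

t = pipeline_trace(3)
# head: arr RC VA SA ST ; body1 enters SA in cycle 4, tail in cycle 5
```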


The Router Pipeline

  • Cycle 0

    • Head flit arrives and the packet is directed to a virtual channel of the input port (G = I)


The Router Pipeline

  • Cycle 1

    • Routing computation

    • Virtual channel state changes to routing (G = R)

    • Head flit enters the RC stage

    • First body flit arrives at router


The Router Pipeline

  • Cycle 2: Virtual Channel Allocation

    • Route field (R) of virtual channel is updated

    • Virtual channel state is set to “waiting for output virtual channel” (G = V)

    • Head flit enters the VA stage

    • First body flit enters RC stage

    • Second body flit arrives at router


The Router Pipeline

  • Cycle 2: Virtual Channel Allocation

    • The result of the routing computation is input to the virtual channel allocator

    • If successful, the allocator assigns a single output virtual channel

    • The state of the virtual channel is set to active (G = A)


The Router Pipeline

  • Cycle 3: Switch Allocation

    • All further processing is done on a per-flit basis

    • Head flit enters SA stage

    • Any active VC (G = A) that contains buffered flits (indicated by P) and has downstream buffers available (C > 0) bids for a single-flit time slot through the switch from its input VC to the output VC
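The bidding condition, and the bookkeeping on a successful grant described on the next slide, can be phrased as a small model (hypothetical sketch; field names follow the G/P/C state-field legend):

```python
class VC:
    """Minimal input-VC model for switch allocation."""
    def __init__(self, credits: int):
        self.g = "A"            # G: assume the VC is already active
        self.flits = []         # P: buffered flits awaiting the switch
        self.credits = credits  # C: free buffers at the downstream router

    def can_bid(self) -> bool:
        # bids only when active, with a buffered flit and a downstream buffer
        return self.g == "A" and bool(self.flits) and self.credits > 0

    def on_switch_grant(self):
        # successful SA: advance the pointer field, decrement the credit field
        flit = self.flits.pop(0)
        self.credits -= 1
        return flit
```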


The Router Pipeline

  • Cycle 3: Switch Allocation

    • If successful, pointer field is updated

    • Credit field is decremented


The Router Pipeline

  • Cycle 4: Switch Traversal

    • Head flit traverses the switch

  • Cycle 5:

    • Head flit starts traversing the channel to the next router


The Router Pipeline

  • Cycle 7:

    • Tail flit traverses the switch

    • Output VC set to idle

    • Input VC set to idle (G = I), if buffer is empty

    • Input VC set to routing (G = R), if another head flit is in the buffer


The Router Pipeline

  • Only the head flits enter the RC and VC stages

  • The body and tail flits are stored in the flit buffers until they can enter the SA stage


Pipeline Stalls

  • Pipeline stalls can be divided into

    • Packet stalls

      • can occur if the virtual channel cannot advance to its R, V, or A state

    • Flit stalls

      • If a virtual channel is in the active state and the flit cannot successfully complete switch allocation due to

        • Lack of a flit to send (input buffer empty)

        • Lack of credits (no downstream buffer space)

        • Losing arbitration for the switch time slot


Example for Packet Stall

Virtual-channel allocation stall

The head flit of packet A can only enter the VA stage once the tail flit of packet B completes switch allocation and releases the virtual channel


Example for Flit Stalls

Switch allocation stall

Second body flit fails to allocate the requested connection in cycle 5


Example for Flit Stalls

Buffer empty stall

Body flit 2 is delayed three cycles. However, since it does not have to enter the RC and VA stages, the output is delayed by only one cycle!


Credits

  • A buffer is allocated in the SA stage on the upstream (transmitting) node

  • To reuse the buffer, a credit is returned over a reverse channel after the same flit departs the SA stage of the downstream (receiving) node

  • When the credit reaches the input unit of the upstream node, the buffer becomes available and can be reused
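A minimal model of this credit exchange (hypothetical sketch, not from the slides): the upstream node decrements its credit count when it sends a flit, and increments it when a credit returns.

```python
class UpstreamChannel:
    """Credit-based flow control for one VC with F downstream flit buffers."""
    def __init__(self, num_buffers: int):
        self.credits = num_buffers

    def try_send(self) -> bool:
        if self.credits == 0:
            return False          # credit stall: no downstream buffer free
        self.credits -= 1         # buffer allocated in the upstream SA stage
        return True

    def on_credit_return(self):
        self.credits += 1         # flit left the downstream SA stage; buffer reusable

ch = UpstreamChannel(4)
sent = sum(ch.try_send() for _ in range(5))
# only 4 of the 5 sends succeed until a credit comes back
```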


Credits

  • The credit loop can be viewed as a token that

    • Starts at the SA stage of the upstream node

    • Travels downstream with the flit

    • Reaches the SA stage of the downstream node

    • Returns upstream as a credit


Credit Loop Latency

  • The credit loop latency tcrt, expressed in flit times, gives a lower bound on the number of flit buffers needed on the upstream side for the channel to operate at full bandwidth

  • tcrt in flit times is given by

    tcrt = tf + tc + 2Tw + 1

    where tf is the flit pipeline delay, tc is the credit pipeline delay, and Tw is the one-way wire delay


Credit Loop Latency

  • If the number of flit buffers available per virtual channel is F, the duty factor of the channel will be

    d = min(1, F / tcrt)

  • The duty factor will be 100% as long as there are sufficient flit buffers to cover the round trip latency
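Both formulas can be checked numerically (hypothetical sketch; tf = 4, tc = 2, Tw = 2 are the values used on the credit-stall example slide):

```python
def credit_loop_latency(t_f: int, t_c: int, t_w: int) -> int:
    """tcrt = tf + tc + 2*Tw + 1, in flit times."""
    return t_f + t_c + 2 * t_w + 1

def duty_factor(num_buffers: int, t_crt: int) -> float:
    """d = min(1, F / tcrt): fraction of channel bandwidth sustainable."""
    return min(1.0, num_buffers / t_crt)

t_crt = credit_loop_latency(t_f=4, t_c=2, t_w=2)   # -> 11
d = duty_factor(num_buffers=4, t_crt=t_crt)        # -> 4/11: credit-stalled
full = duty_factor(num_buffers=11, t_crt=t_crt)    # -> 1.0: enough buffers
```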


Credit Stall

Virtual-channel router with 4 flit buffers

[Timing diagram: flit pipeline stages (tf), one-way wire delays (Tw), and credit pipeline stages (tc) around the credit loop, with Credit Transmit and Credit Update on the return path; white = upstream pipeline stages, grey = downstream pipeline stages]

tf = 4, tc = 2, Tw = 2  =>  tcrt = 11


Flit and Credit Encoding

  • Option 1: Flits and credits are sent over separate lines, possibly with different widths

  • Option 2: Flits and credits are transported over the same line. This can be done by

    • Piggybacking credits on flits

    • Multiplexing flits and credits at the phit level


Network Interface

Slides are adapted from previous slides by

Ingo Sander and Axel Jantsch.







Network Interface


  • Different terminals with different interfaces shall be connected to the network

  • The network uses a specific protocol, and all traffic on the network has to comply with the format of this protocol

[Figure: a terminal node (resource) attached through a network interface to a switch of the network]


Network Interface

  • The network interface plays an important role in a network-on-chip

    • it shall translate between the terminal protocol and the protocol of the network

    • it shall enable the client to communicate at the speed of the network

      • it shall not further reduce the available bandwidth of the network

      • it shall not increase the latency imposed by the network

  • A poorly designed network interface is a bottleneck and can increase the latency considerably


Network Interfaces

  • For message passing: symmetric

    • Processor-Network Interface

  • For shared memory: asymmetric, load & store

    • Processor-Network Interface

    • Memory-Network Interface

  • Packet admission/ejection (line-fabric) Interface

    • May reside in a switch or router

    • Input queuing and output queuing


Basic Functionality of Network Interfaces

  • Packetization/depacketization

    • The network delivers packets; it knows nothing about messages or transactions

    • Sender side: packetization (messages to packets); Receiver side: de-packetization (packets to messages)

  • Multiplexing/demultiplexing

    • Scheduling packets to be sent and received

    • Multiple threads running

    • Sender: multiplexing; Receiver: de-multiplexing

  • Re-ordering

    • A network service may not guarantee ordering

  • End-to-end flow control
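The sender/receiver duties above can be sketched end to end (hypothetical Python model; the network is assumed to deliver packets unordered but reliably, and packet formats are my own):

```python
import random

def packetize(message: bytes, msg_id: int, payload_size: int = 4):
    """Sender side: split a message into (msg_id, seq, total, payload) packets."""
    chunks = [message[i:i + payload_size]
              for i in range(0, len(message), payload_size)]
    return [(msg_id, seq, len(chunks), c) for seq, c in enumerate(chunks)]

def depacketize(packets):
    """Receiver side: re-order by sequence number and reassemble the message."""
    _mid, _, total, _ = packets[0]
    assert len(packets) == total          # all packets of the message present
    ordered = sorted(packets, key=lambda p: p[1])   # re-ordering
    return b"".join(p[3] for p in ordered)

pkts = packetize(b"hello world", msg_id=7)
random.shuffle(pkts)                      # the network may not preserve order
assert depacketize(pkts) == b"hello world"
```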


Network Interfaces for Message Passing

  • Two-register interface

  • Register-mapped interface

  • Descriptor-based interface

  • Message reception


Two-Register Interface

  • For sending, the processor writes to a specific Net-out register

  • For receiving, the processor reads a specific Net-in register

  • Pro:

    • Efficient for short messages

  • Cons:

    • Inefficient for long messages

    • Processor acts as DMA controller

    • Not safe: for longer messages, the processor may block network resources indefinitely

[Figure: register file R0–R31 with dedicated Net-out and Net-in registers connected to the network]
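In software terms, the two-register scheme amounts to the processor copying every word itself (hypothetical sketch illustrating why the processor ends up acting as its own DMA controller; the list stands in for the Net-out/Net-in registers):

```python
def send(net_out, message):
    """Processor-driven send: one Net-out register write per word.
    Fine for short messages, but the processor is busy for the whole transfer."""
    for word in message:
        net_out.append(word)

def receive(net_in, length):
    """Processor-driven receive: one Net-in register read per word."""
    return [net_in.pop(0) for _ in range(length)]

channel = []          # stand-in for the network between the two registers
send(channel, [1, 2, 3])
assert receive(channel, 3) == [1, 2, 3]
```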


Descriptor-Based Interface

  • The processor composes a message in a set of dedicated message descriptor registers

  • Each descriptor contains

    • An immediate value, or

    • A reference to a processor register, or

    • A reference to a block of memory

  • A co-processor steps through the descriptors and composes the messages

  • Safe because the network is protected from the processor’s SW

[Figure: descriptor list (Send Start, Immediate, RN, Addr/Length, END) referencing processor registers R0–R31 and a block of memory]
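The co-processor's walk over the descriptor list can be sketched like this (hypothetical model; descriptor kinds follow the slide: immediate values, register references, memory blocks, terminated by END):

```python
# descriptor kinds: ("imm", value), ("reg", index), ("mem", addr, length), ("end",)
def compose_message(descriptors, registers, memory):
    """Co-processor sketch: step through descriptors until END, gathering words."""
    words = []
    for d in descriptors:
        kind = d[0]
        if kind == "end":
            break
        if kind == "imm":
            words.append(d[1])                        # immediate value
        elif kind == "reg":
            words.append(registers[d[1]])             # value of processor register RN
        elif kind == "mem":
            addr, length = d[1], d[2]
            words.extend(memory[addr:addr + length])  # block of memory
    return words

regs = {5: 0xAB}
mem = list(range(100))
msg = compose_message([("imm", 1), ("reg", 5), ("mem", 10, 3), ("end",)], regs, mem)
assert msg == [1, 0xAB, 10, 11, 12]
```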


Receiving Messages

  • A co-processor or a dedicated thread is triggered upon reception of an incoming message

  • It unpacks the message and stores it in local memory

  • It informs the receiving task via an interrupt or a status register update


Shared Memory Interfaces

  • The interconnection network is used to transmit memory read/write transactions between processors and memories

  • We will further discuss

    • Processor-Network Interface

    • Memory-Network Interface


Processor-Network Interface

  • Requests are stored in the request register

  • Requests are tagged so that each reply can be associated with its request

  • In case of a cache miss, the request is stored in an MSHR (miss status holding register)


Processor-Network Interface

  • An uncacheable read request results in a pending read

  • After forming and transmitting the request message, the status changes to read requested

  • When the reply returns from the network, the status changes to read complete

  • Completed MSHRs are forwarded to the reply register, and the status changes to idle
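The MSHR lifecycle for an uncacheable read can be written as a small state machine (hypothetical sketch of the transitions listed above; status names follow the slide):

```python
class MSHR:
    """Miss status holding register: tracks one outstanding read."""
    TRANSITIONS = {
        "idle": "read pending",             # uncacheable read issued
        "read pending": "read requested",   # message formed and transmitted
        "read requested": "read complete",  # reply returned by the network
        "read complete": "idle",            # result forwarded to the reply register
    }

    def __init__(self):
        self.status = "idle"

    def advance(self):
        self.status = self.TRANSITIONS[self.status]
        return self.status

m = MSHR()
states = [m.advance() for _ in range(4)]
assert states == ["read pending", "read requested", "read complete", "idle"]
```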


Processor-Network Interface

  • Cache coherence protocols change the operation of the processor-network interface

  • Complete cache lines are loaded into the cache

  • The protocol requires a larger message vocabulary

    • Exclusive read request

    • Invalidation and updating of cache lines

  • Cache coherence protocol requires interface to send messages and update state in response to received messages


Memory-Network Interface

  • The interface receives memory request messages and sends replies

  • Messages received from the network are stored in a TSHR (transaction status holding register)


Memory-Network Interface

  • A request queue is used to hold request messages when all TSHRs are busy

  • The TSHR tracks messages in the same way as the MSHR

  • The bank control and message transmit units monitor changes in the TSHRs


Memory-Network Interface

  • A read request initializes a TSHR with status read pending

  • A subsequent memory access changes the status to bank activated

  • Two cycles before the first word is returned from the memory bank, the status is changed to read complete

  • Message transmit unit formats message and injects it into the network and the TSHR entry is marked idle

  • Requests can be handled out of order


Memory-Network Interface

  • Cache coherence protocols can be implemented with this structure; however, the TSHR must be extended


Packet Admission/Ejection (Line-Fabric) Interface

  • The network has higher bandwidth than the input and output lines, but links may be blocked due to congestion

  • Packets aiming for different destinations come from the same input port.

  • Queues are needed to store packets that

    • cannot enter the network because of congestion in the network

    • cannot enter the terminal


Packet Admission/Ejection Interface

  • Why parallel queues rather than a single FIFO?

    • If there are traffic classes with different priorities, there should be a queue for every traffic class

      • high-priority traffic is not blocked by low-priority traffic

      • Alleviate head-of-line blocking

      • Implement an admission/ejection control policy based on priority, rate etc.


Summary

  • Network interfaces bridge processor to processor and processor to memory

    • Message passing interfaces

    • Shared memory interfaces, complicated by cache coherence

  • Packet admission and ejection interfaces at the network boundary are also important for using the network better (higher throughput, lower latency)
