Allocator Implementations for Network-on-Chip Routers

Daniel U. Becker and William J. Dally

Concurrent VLSI Architecture Group

Stanford University


Overview

  • Allocators have major impact on router performance

    • Zero-load latency, throughput under load, cycle time

  • On-chip environment imposes stringent constraints

    • Cycle time, power, no iterative / multi-cycle allocators

  • Main Contributions:

    • RTL-based performance & cost evaluation of virtual channel and switch allocators for NoC routers

    • Sparse VC allocation scheme reduces delay, area & power

    • Pessimistic speculation scheme minimizes delay penalty

Allocator Implementations for NoC Routers


Separable Allocators

Input-first:

  • Implement allocation as two phases

    • Local arbitration at each input

    • Global arbitration at each output

  • Pros:

    • Straightforward implementation

    • Delay scales logarithmically

  • Cons:

    • Arbiters within each phase are independent

    • Bad choice in first phase can limit matching
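The two-phase scheme above can be sketched in a few lines of Python. This is only an illustrative model, not the RTL evaluated in the presentation: the function names are hypothetical, and fixed-priority arbiters are assumed for brevity (a hardware allocator would typically use round-robin or matrix arbiters with priority updates).

```python
from typing import List

def fixed_priority_arbiter(requests: List[bool]) -> int:
    """Return the index of the first asserted request, or -1 if there is none."""
    for idx, asserted in enumerate(requests):
        if asserted:
            return idx
    return -1

def separable_input_first(req: List[List[bool]]) -> List[List[bool]]:
    """req[i][j] is True when input i requests output j; returns a grant matrix."""
    n_in, n_out = len(req), len(req[0])
    # Phase 1: local arbitration, each input independently picks one output.
    chosen = [fixed_priority_arbiter(row) for row in req]
    # Phase 2: global arbitration, each output picks one input that chose it.
    grant = [[False] * n_out for _ in range(n_in)]
    for out in range(n_out):
        contenders = [chosen[i] == out for i in range(n_in)]
        winner = fixed_priority_arbiter(contenders)
        if winner >= 0:
            grant[winner][out] = True
    return grant

# Both inputs request both outputs, yet both pick output 0 in phase 1,
# so only one grant is issued although two were possible.
print(separable_input_first([[True, True], [True, True]]))
# [[True, False], [False, False]]
```

The example shows the listed drawback: because the input arbiters do not coordinate, a poor first-phase choice leaves output 1 idle.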

[Figure: input-first and output-first separable allocators; each diagram connects Inputs to Outputs]


Wavefront Allocator [Tamir ’93]

  • Consider inputs and outputs together

    • Grant requests on diagonal, kill conflicts

    • Repeat for other diagonals

  • Pros:

    • Tends to generate better matchings

    • Tiled design facilitates full-custom implementation

  • Cons:

    • Delay scales linearly

    • Orig. design has (false) combinational loops
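A behavioral sketch of the diagonal sweep described above, assuming a square request matrix and a hypothetical `priority_diag` parameter that selects the starting diagonal. It models the grant decisions only, not the tiled circuit or its combinational loops.

```python
from typing import List

def wavefront(req: List[List[bool]], priority_diag: int = 0) -> List[List[bool]]:
    """req[i][j] is True when input i requests output j (n x n matrix)."""
    n = len(req)
    row_free = [True] * n   # input i has not been granted yet
    col_free = [True] * n   # output j has not been granted yet
    grant = [[False] * n for _ in range(n)]
    # Visit the n wrapped diagonals one after another, starting at priority_diag.
    # This sequential sweep is why delay grows linearly with n.
    for d in range(n):
        diag = (priority_diag + d) % n
        for i in range(n):
            j = (diag - i) % n  # cells with (i + j) % n == diag
            # Cells on one diagonal never share a row or column, so all of
            # them can be granted together; a grant then blocks ("kills")
            # later requests in the same row and column.
            if req[i][j] and row_free[i] and col_free[j]:
                grant[i][j] = True
                row_free[i] = False
                col_free[j] = False
    return grant

# Both inputs request both outputs: the first diagonal grants both.
print(wavefront([[True, True], [True, True]]))
# [[True, False], [False, True]]
```

Because every request is examined against the rows and columns still free, the result is always a maximal matching: no further request could be granted without removing an existing grant.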

[Figure: wavefront allocator diagram; axes: Inputs, Outputs]


Evaluation Methodology

  • Analytical models useful for developing intuition

  • But becoming increasingly inaccurate

    • Wire delay impact, synthesized vs. full-custom logic, …

  • Use two-pronged evaluation approach:

    • Delay & cost via detailed RTL-based evaluation

      • Synthesized using Synopsys Design Compiler in topo mode

      • Commercial 45nm low power library @ worst case

    • Network-level performance via simulation

      • Cycle-oriented interconnection network simulator

      • 64-node networks: 2D mesh & 2D flattened butterfly

      • Request-reply traffic, synthetic traffic patterns


Virtual Channel Allocation

  • Virtual channels (VCs) allow multiple packet flows to share physical resources (buffers, channels)

  • Before packets can proceed through router, need to claim ownership of VC buffer at next router

  • VC allocator assigns waiting packets at inputs to output VC buffers that are not currently in use

    • P×V inputs (input VCs), P×V outputs (output VCs)

    • Once assigned, VC is used for entire packet’s duration


Sparse VC Allocation (1)

  • VCs are used for variety of purposes:

    • Deadlock avoidance

      • Break cyclic dependencies

      • Routing deadlock (within network)

      • Protocol deadlock (at network boundary)

    • Flow control

      • Decouple buffers and channels to avoid head-of-line blocking

  • Idea: Partition set of VCs to restrict legal requests

    • Significantly reduces VC allocator logic complexity

    • Delay/area/power savings of up to 41%/90%/83%
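To see how partitioning shrinks the request space, the sketch below counts the legal (input VC, output VC) pairs between one input port and one output port. The partitioning rule is an assumption made for illustration: a packet stays in its message class (for example request vs. reply), and within that class may stay in its resource class or advance to a later one, but never go back. The function and parameter names are hypothetical.

```python
from itertools import product

def count_legal_requests(msg_classes: int, res_classes: int, vcs_per_class: int) -> int:
    """Count legal (input VC, output VC) pairs for one input/output port pair."""
    # Each VC is identified by (message class, resource class, index in class).
    vcs = list(product(range(msg_classes), range(res_classes), range(vcs_per_class)))
    legal = 0
    for (m_in, r_in, _), (m_out, r_out, _) in product(vcs, vcs):
        # Assumed rule: same message class, resource class may only stay or advance.
        if m_in == m_out and r_out >= r_in:
            legal += 1
    return legal

print(count_legal_requests(1, 1, 8))  # 8 VCs, no partitioning: 64
print(count_legal_requests(2, 1, 4))  # 2x4 VCs: 32
print(count_legal_requests(2, 2, 2))  # 2x2x2 VCs: 24
```

Under this assumed rule, the same eight VCs per port give 64, 32, or 24 legal requests depending on how they are partitioned, so the allocator needs correspondingly less logic.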


Sparse VC Allocation (2)

[Figure: input VC (IVC) to output VC (OVC) request matrices for three VC configurations]

  • 8 VCs: 64 requests

  • 2×4 VCs: 32 requests

  • 2×2×2 VCs: 24 requests

  • Other labels: REQ and REP, each with NM (P×2 requests) and MIN (P×4 requests); P×8 requests


VC Allocator Performance

[FBfly, 2×2×2 VCs]


VC Allocator Delay


VC Allocator Cost


Switch Allocation

  • Flits require crossbar access to traverse router

  • VCs at each input port share crossbar input

  • Switch allocator generates crossbar schedule

    • Allocation performed on cycle-by-cycle basis

    • P×V inputs (input VCs), P outputs (output ports)

    • At most one VC per input can be granted in each cycle

  • Speculative allocation reduces zero-load latency

    • Start switch allocation before VC allocation completes


Pessimistic Speculation (1)

  • Conventional approach:

    • Separate allocators for spec. and non-spec. requests

    • Non-spec. grants mask conflicting spec. grants

    • Conflict detection is on critical path

  • At low load, most requests are granted

  • Idea: Assume all requests will be granted

    • Mask spec. grants with non-spec. requests

    • Overlap conflict detection and allocation

    • Sacrifice speculation accuracy for lower delay

    • But preserve zero-load latency improvement
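The difference between the two masking schemes can be illustrated with a small model. It assumes, for the sake of the sketch, that a speculative grant conflicts with non-speculative traffic whenever the two share an input port or an output port; the helper name `mask_spec_grants` is hypothetical.

```python
from typing import List

Matrix = List[List[bool]]  # m[i][j]: input port i, output port j

def mask_spec_grants(spec_grants: Matrix, nonspec: Matrix) -> Matrix:
    """Drop every speculative grant that shares an input or output port
    with an asserted entry in `nonspec`."""
    n_in, n_out = len(nonspec), len(nonspec[0])
    in_busy = [any(row) for row in nonspec]
    out_busy = [any(nonspec[i][j] for i in range(n_in)) for j in range(n_out)]
    return [[spec_grants[i][j] and not in_busy[i] and not out_busy[j]
             for j in range(n_out)] for i in range(n_in)]

# Inputs 0 and 1 both request output 0 non-speculatively; input 0 wins.
nonspec_requests = [[True, False, False], [True, False, False], [False, False, False]]
nonspec_grants   = [[True, False, False], [False, False, False], [False, False, False]]
spec_grants      = [[False, False, False], [False, False, True], [False, True, False]]

# Conventional: mask with non-spec. GRANTS (must wait for the allocator result).
conventional = mask_spec_grants(spec_grants, nonspec_grants)
# Pessimistic: mask with non-spec. REQUESTS (available before allocation finishes).
pessimistic = mask_spec_grants(spec_grants, nonspec_requests)

print(conventional)  # [[False, False, False], [False, False, True], [False, True, False]]
print(pessimistic)   # [[False, False, False], [False, False, False], [False, True, False]]
```

In this example the pessimistic scheme discards the speculative grant at input 1 even though that input's non-speculative request lost arbitration. That is the accuracy given up in exchange for taking conflict detection off the critical path. With no non-speculative requests at all, as at zero load, both schemes pass every speculative grant through.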


Pessimistic Speculation (2)

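[Figure: block diagram. A nonspec. allocator turns nonspec. requests into nonspec. grants, a spec. allocator turns spec. requests into spec. grants, and a conflict detection block drives the mask applied to the spec. grants.]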


Switch Allocator Performance (1)

[Mesh, 2×1×1 VCs]


Switch Allocator Performance (2)

[FBfly, 2×2×4 VCs]

>20%


Switch Allocator Delay


Switch Allocator Cost


Speculation Performance (1)

[Mesh, 2×1×1 VCs]


Speculation Performance (2)

[FBfly, 2×2×4 VCs]


Speculation Implementation


Conclusions

  • Network-level performance is largely insensitive to VC allocator implementation

    • Light effective load facilitates near-ideal matchings

  • Sparse VC allocation can greatly reduce delay & cost

    • Partition set of VCs based on functionality

    • Restrict possible requests allocator must handle

  • For switch allocation, wavefront allocator produces better matchings but increases delay & cost

    • Difference increases with number of ports, VCs

  • Pessimistic speculation reduces switch allocator delay

    • At the cost of some performance degradation near saturation
