
Non-Minimal Routing Strategy for Application-Specific Networks-on-Chips

Hiroki Matsutani (Keio Univ.), Michihiro Koibuchi (National Institute of Informatics), Yutaka Yamada (Toshiba RDC), Jouraku Akiya (Keio Univ.), Hideharu Amano (Keio Univ.)


Presentation Transcript


  1. Non-Minimal Routing Strategy for Application-Specific Networks-on-Chips
     • Hiroki Matsutani (Keio Univ.)
     • Michihiro Koibuchi (National Institute of Informatics)
     • Yutaka Yamada (Toshiba RDC)
     • Jouraku Akiya (Keio Univ.)
     • Hideharu Amano (Keio Univ.)

  2. Tile-based Multi-Core
     • Core: Execution
     • Router: Packet delivery
     • Examples:
       • RAW: 2D Mesh [Taylor, Micro2002]
       • ACM: Tree [Furtek, FPL2004]
       • aSoC: 2D Mesh [Liang, TVLSI2004]
     [Figure: Network-on-Chip (NoC) of nine tiles (RISC, RAM, I/O), numbered 0 to 8]

  3. Tile-based Multi-Core
     • Core: Execution
     • Router: Packet delivery
     • Examples:
       • RAW: 2D Mesh [Taylor, Micro2002]
       • ACM: Tree [Furtek, FPL2004]
       • aSoC: 2D Mesh [Liang, TVLSI2004]
     [Figure: Network-on-Chip (NoC) tile containing MIPS, Memory, and Router]

  4. Network-on-Chip (NoC)
     ○ Advantage
       • Better Wiring Delay
         • Global wiring
         • Limited-length Links
       • Improve Modularity
         • Standard Network I/F
     × Drawback
       • Overhead
     [Figure: nine tiles (RISC, RAM, I/O), numbered 0 to 8]
     SoC is growing! → NoC is one of the scalable on-chip interconnects.

  5. Stream Processing ~Simulation~
     • MPEG, JPEG, Viterbi
     • System Level Design (from high abstraction to detailed design)
       • UTF Model (UnTimed Functional): No clock for execution
       • BCA Model (Bus Cycle Accurate): Communication is cycle accurate
       • RTL Model: Detailed design
     [Figure: Module(a) sends Data to Module(b) at each abstraction level]
     The application is divided into tasks based on simulation.

  6. Stream Processing ~Map, Route~
     • Shared Links
       • Link Congestion → Throughput is degraded
     • Optimization (in general)
       • Mapping: Minimum Communication Length
       • Routing: Minimal Paths
     [Figure: Task Flow Graph mapped onto the physical tiles of the NoC]
     Strong access locality!
     Paths are too short for minimal routing to distribute congestion.

  7. Existing Routing ~Is non-minimal path useful?~
     Common features of SAN & NoC
       • Packet delivery
       • WH (wormhole) switching
       • Deadlock freedom
       • Turn-Model, …
     Features of SAN
       • Various applications, various traffic patterns
       • Non-minimal paths cause an unstable state
     Features of NoC
       • Fixed application, fixed traffic patterns
       • System level simulation [Ho, HPCA2003]
     Predictable communication → Load balancing with non-minimal paths

  8. Flee ~Non-minimal routing strategy~
     • Stream processing in NoCs
       • Strong access locality!
       • Paths are too short to distribute congestion
     • Partially non-minimal paths
       • Path establishment based on traffic amount
       • Heavy-traffic communication → Minimal path
       • Light-traffic communication → Avoiding congestion
     Non-minimal paths are basically inefficient…
     Increase the number of alternative paths by introducing non-minimal paths.

  9. Flee ~Traffic pattern Analysis~
     Traffic Pattern
       # time, src, dst, size
       10000 (0) (1) 32
       10000 (0) (2) 4
       10000 (0) (3) 4
       10010 (1) (2) 32
       10010 (0) (1) 32
       (0) (2) 4
       10010 (0) (3) 4
       10020 (2) (3) 32
       10020 (1) (2) 32
       10030 (2) (3) 4
     Traffic Analysis
       1. For each src-dst pair, sum the packet sizes.
          E.g., src-dst pair (0,1): 32 + 32 → 64
       2. Sort in descending order of TotalSize.
     Analysis Record
       # src→dst, TotalSize
       (0) → (1) 8192
       (1) → (2) 8192
       (2) → (3) 8192
       (0) → (2) 1024
       (0) → (3) 1024
       …
     Heavy! The src-dst pair with the largest TotalSize is on the first line.
     Each src-dst pair gets a path in the order of the Analysis Record.
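
     The two analysis steps on slide 9 can be sketched in a few lines of Python. This is only an illustration, not the authors' tool: the function name `analyze_traffic` and the tuple layout are assumptions, and the trace is a shortened version of the one on the slide.

```python
from collections import defaultdict

def analyze_traffic(trace):
    """Sum packet sizes per (src, dst) pair, then sort by total, largest first.

    `trace` is a list of (time, src, dst, size) tuples, mirroring the
    "# time, src, dst, size" format on the slide (tuple layout is assumed).
    """
    totals = defaultdict(int)
    for _time, src, dst, size in trace:
        totals[(src, dst)] += size  # step 1: total size per src-dst pair
    # Step 2: descending order of TotalSize. sorted() is stable, so pairs
    # with equal totals keep the order in which they first appeared.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Shortened example trace (hypothetical subset of the slide's records)
trace = [
    (10000, 0, 1, 32),
    (10000, 0, 2, 4),
    (10000, 0, 3, 4),
    (10010, 1, 2, 32),
    (10010, 0, 1, 32),
    (10020, 2, 3, 32),
]

analysis_record = analyze_traffic(trace)
for (src, dst), total in analysis_record:
    print(f"({src}) -> ({dst}) {total}")
```

     The first line of the output is the heaviest pair, (0) -> (1) with a total of 64, which matches the slide's "32 + 32 → 64" example.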

  10. Flee ~Establishing Paths~
      • Each link has a "Cost"
      • In order of traffic amount:
        • Search for the lowest-cost path
        • Increase the cost of the links selected
      • There will be several alternative paths … → A link with high cost is a hotspot …
      Analysis Record
        # src→dst, TotalSize
        (0) → (1) 8192
        (1) → (2) 8192
        (2) → (3) 8192
        (0) → (2) 1024
        (0) → (3) 1024
        …
      [Figure: nodes (0), (1), (2), (3)]
      Paths are assigned so as not to disturb previously established paths.
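
      A minimal sketch of the path-establishment loop on slide 10 follows. The slides do not give the cost metric, so two choices here are assumptions: every link starts at cost 1, and a selected link's cost grows by the pair's TotalSize. The names `mesh_links`, `lowest_cost_path`, and `establish_paths` are hypothetical, and the sketch does not address deadlock freedom.

```python
import heapq

def mesh_links(width, height, initial_cost=1):
    """Directed links of a 2D mesh (row-major node numbers), each with a cost."""
    links = {}
    for y in range(height):
        for x in range(width):
            n = y * width + x
            if x + 1 < width:          # horizontal neighbor
                links[(n, n + 1)] = initial_cost
                links[(n + 1, n)] = initial_cost
            if y + 1 < height:         # vertical neighbor
                links[(n, n + width)] = initial_cost
                links[(n + width, n)] = initial_cost
    return links

def lowest_cost_path(links, src, dst):
    """Dijkstra's algorithm; path cost is the sum of the link costs."""
    adjacent = {}
    for u, v in links:
        adjacent.setdefault(u, []).append(v)
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                   # stale heap entry
        for v in adjacent.get(u, []):
            nd = d + links[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:             # walk back from dst to src
        path.append(prev[path[-1]])
    return path[::-1]

def establish_paths(links, analysis_record):
    """Assign paths in Analysis Record order, raising the cost of used links."""
    paths = {}
    for (src, dst), total_size in analysis_record:
        path = lowest_cost_path(links, src, dst)
        for u, v in zip(path, path[1:]):
            links[(u, v)] += total_size  # assumed cost update
        paths[(src, dst)] = path
    return paths

links = mesh_links(3, 3)
paths = establish_paths(links, [((0, 1), 8192), ((0, 2), 1024)])
print(paths[(0, 1)])  # heavy pair keeps the minimal path [0, 1]
print(paths[(0, 2)])  # light pair takes a 4-hop detour avoiding link 0 -> 1
```

      With these assumed costs, the heavy pair is routed first and gets the minimal path, while the lighter pair is steered around the link that has become a hotspot.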

  11. Simulation Environments
      • Router Model
        • 4 ports for adjacent routers
        • 1 port for core
      • Network Topology
        • 4×4 Mesh
        • 4×4 Torus
      [Figure: 16-node 2D mesh of routers and cores, nodes numbered 0 to 15]
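
      To make the difference between the two topologies concrete, the sketch below lists the routers adjacent to a node in a 4×4 mesh and a 4×4 torus. Row-major node numbering and the function name `neighbors` are assumptions; the slide's figure layout is not recoverable from the transcript.

```python
def neighbors(node, width=4, height=4, torus=False):
    """Adjacent routers of `node`, assuming row-major numbering.

    In a mesh, edge and corner routers have fewer than four neighbors;
    in a torus, links wrap around so every router has four.
    """
    x, y = node % width, node // width
    result = []
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if torus:
            nx %= width    # wrap-around link in the X dimension
            ny %= height   # wrap-around link in the Y dimension
        elif not (0 <= nx < width and 0 <= ny < height):
            continue       # no link beyond the mesh boundary
        result.append(ny * width + nx)
    return result

print(sorted(neighbors(0)))              # [1, 4]
print(sorted(neighbors(0, torus=True)))  # [1, 3, 4, 12]
print(sorted(neighbors(5)))              # [1, 4, 6, 9]
```

      The wrap-around links are why the torus offers more alternative paths than the mesh for the same 16 nodes.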

  12. Applications for Evaluation
      • App. Traces
        • Viterbi Decoder
        • JPEG Codec
        • IPsec
        • Uniform
      Tile mapping example of JPEG Codec (the figure marks each tile as Decoder or Encoder):
        (0) Header Analysis
        (1) Huffman Decode
        (2) Inverse Quant.
        (3) I-DCT for Row
        (4)
        (5) Yuv-rgb Convert
        (6) MCU Mapping
        (7) I-DCT for Col
        (8) Rgb-yuv Convert
        (9) MCU Sampling
        (10) I-DCT for Col
        (11) I-DCT for Row
        (12)
        (13) Stream Gen.
        (14) Huffman Code
        (15) Quant.

  13. Results ~Viterbi @ 2D Mesh~
      Communication in the Viterbi trace includes fork and join.
      [Plot: X-axis: Accepted Traffic [flit/cycle/node]; Y-axis: Latency [cycle]]
      • Flee: Avg hop count 2.52
      • DOR (Dimension-Order Routing): Avg hop count 1.84
      • 14.2% improved

  14. Results ~Viterbi @ 2D Torus~
      Communication in the Viterbi trace includes fork and join.
      [Plot: X-axis: Accepted Traffic [flit/cycle/node]; Y-axis: Latency [cycle]]
      • Flee: Avg hop count 1.87
      • DOR (Dimension-Order Routing): Avg hop count 1.48
      • 22.2% improved
      Flee improves throughput by 22.2% with non-minimal paths.

  15. Results ~JPEG @ 2D Mesh~
      In the JPEG trace, data is processed sequentially. There is no fork-and-join pattern.
      [Plot: X-axis: Accepted Traffic [flit/cycle/node]; Y-axis: Latency [cycle]]
      • Flee: Avg hop count 1.01
      • DOR (Dimension-Order Routing): Avg hop count 1.00
      • No difference
      Communication is between neighbors → No need for non-minimal paths

  16. Results ~Effect of Traffic Analysis~ (Viterbi @ 2D Mesh)
      • Flee: Known data amount
      • Flee (Incomplete): Unknown data amount → All data transfer sizes are "1"
      [Plot: X-axis: Accepted Traffic [flit/cycle/node]; Y-axis: Latency [cycle]]
      Incomplete Flee: Not improved

  17. Results ~Effect of Traffic Analysis~ (Viterbi @ 2D Torus)
      • Flee: Known data amount
      • Flee (Incomplete): Unknown data amount → All data transfer sizes are "1"
      [Plot: X-axis: Accepted Traffic [flit/cycle/node]; Y-axis: Latency [cycle]]
      Incomplete Flee: Partially improved
      Communication size is a key factor in improving performance.

  18. Summary ~Non-minimal routing strategy~
      • Stream processing in NoCs
        • Strong access locality!
        • Paths are too short to distribute congestion
      • Flee: Non-minimal routing strategy
        • Heavy-traffic communication → Minimal paths
        • Light-traffic communication → Avoiding congestion
        • Improves throughput by 22.2%
      Increase the number of alternative paths by introducing non-minimal paths.

  19. Thank you for listening.
