Synthesis of Application-Specific On-Chip Networks

ECE 284 On-Chip Interconnection Networks, Spring 2013.


Presentation Transcript


  1. Synthesis of Application-Specific On-Chip Networks ECE 284 On-Chip Interconnection Networks Spring 2013

  2. Application-Specific NoC Synthesis • Last time • Described the application-specific NoC synthesis problem • Presented four algorithms based on the idea of set partitioning and rectilinear Steiner trees • Problems • Number of set partitions grows fast with problem size • Long execution times when problem size is big • Results still can be improved

  3. Reminder: The Specification Model • [Figure: Communication Demand Graph (VOPD application example) shown next to its floorplan] • Module positions in the floorplan: arm (1, 2), vld (2, 4), rld (1, 4), iquant (2, 2), idct (4, 2), iscan (0, 4), acdc (2, 0), upsamp (3, 0), vopr (2, 1), smem (1, 0), vopm (0, 2), pad (4, 1) • Flow rates: λ(e1) = 70, λ(e2) = 357, λ(e3) = 16, λ(e4) = 362, λ(e5) = 362, λ(e6) = 353, λ(e7) = 27, λ(e8) = 49, λ(e9) = 300, λ(e10) = 500, λ(e11) = 313, λ(e12) = 313, λ(e13) = 94 • CDG: annotated directed hypergraph H(V, E, π, λ) • each node vi: a module port with position πi • each directed hyperedge ek = s → D = s → {d1, d2, …}: a traffic flow with rate λ(ek) • |D| > 1 for multicast flows
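The CDG definition above maps naturally onto a small data structure. The following is a minimal sketch, not the authors' implementation: the class names `Flow` and `CDG` are hypothetical, the positions are taken from the slide, and the pairing of rates with endpoints is illustrative only, because the edge endpoints in the figure did not survive in this transcript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One directed hyperedge e_k = s -> D with rate lambda(e_k)."""
    source: str
    dests: frozenset
    rate: float

    @property
    def is_multicast(self) -> bool:
        # |D| > 1 marks a multicast flow
        return len(self.dests) > 1


@dataclass
class CDG:
    """Annotated directed hypergraph H(V, E, pi, lambda)."""
    positions: dict  # pi: node name -> (x, y) position of the module port
    flows: list      # E: list of Flow objects

    def total_rate(self) -> float:
        return sum(f.rate for f in self.flows)


# Positions come from the slide; the flow endpoints below are assumptions.
positions = {"arm": (1, 2), "vld": (2, 4), "rld": (1, 4), "iscan": (0, 4)}
flows = [
    Flow("vld", frozenset({"rld"}), 70.0),         # unicast (illustrative)
    Flow("rld", frozenset({"iscan"}), 362.0),      # unicast (illustrative)
    Flow("arm", frozenset({"vld", "rld"}), 16.0),  # multicast (illustrative)
]
cdg = CDG(positions, flows)
multicast_count = sum(f.is_multicast for f in cdg.flows)
```

Storing the destination set as a `frozenset` keeps unicast and multicast flows in one representation, which is the point of using hyperedges in the model.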

  4. RRRM: Ripup-Reroute-and-Router-Merging • Iteratively identify increasingly improving solutions • Use a rip-up and reroute procedure for network synthesis and optimization • Find best topology through multicast flow routing • Formulate as minimum directed spanning tree • Optimize topology by resource sharing • Use a router merging procedure

  5. Initial Network Construction • [Figure: seven cores v0 to v6 on the floorplan, at positions (0,5), (4,5), (2,4), (4,3), (2,2), (1,0), and (0,0), with one router R0 to R6 per core and the fully connected RCG over R0 to R6] • Flow rates: λ(e1) = 200, λ(e2) = 200, λ(e3) = 200, λ(e4) = 200, λ(e5) = 200, λ(e6) = 400, λ(e7) = 100 • Allocate routers at cores • Construct fully connected Router Cost Graph (RCG)

  6. Initial Network Construction • Construct initial topology by using direct connections for flows between routers • [Figure: connectivity graph after initialization and RCG after initialization, over routers R0 to R6, with links labeled by rates of 100, 200, and 400]
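The initialization step can be sketched as follows, assuming one router per core and one direct link per source/destination pair. The function names and the example flows are hypothetical and do not reproduce the figure; port counts here cover router-to-router links only and leave out the local port to the core.

```python
from collections import defaultdict


def initial_topology(flows):
    """Build direct router-to-router links for every flow.

    flows: list of (source_core, [dest_cores], rate); core i gets router "Ri".
    Returns {(src_router, dst_router): aggregated rate}.
    """
    links = defaultdict(int)
    for src, dests, rate in flows:
        for dst in dests:
            links[(f"R{src}", f"R{dst}")] += rate
    return dict(links)


def router_ports(links):
    """Count (input, output) link ports per router, ignoring the local core port."""
    ins, outs = defaultdict(int), defaultdict(int)
    for u, v in links:
        outs[u] += 1
        ins[v] += 1
    return {r: (ins[r], outs[r]) for r in set(ins) | set(outs)}


# Illustrative flows (assumed, not read off the slide's figure).
example_flows = [(0, [2], 200), (2, [5], 200), (2, [4, 6], 100)]
links = initial_topology(example_flows)
ports = router_ports(links)
```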

  8. Flow Rip-Up and Rerouting • Flows are ripped up and rerouted one by one • The current path is deleted and the resources it occupied are released • Each multicast routing step is formulated as a minimum directed spanning tree problem • A Multicast Routing Graph (MRG) and a Multicast Routing Tree (MRT) are constructed to help routing • The Chu-Liu/Edmonds algorithm is used to find the multicast routing tree for each flow

  9. Example • When a flow is ripped up • The current path is deleted • The resources it occupied are released • [Figure: connectivity before and after rip-up of e7, over routers R0 to R6, with routers annotated by port sizes such as 2x3 and links annotated by rates]
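A rip-up step of this kind can be sketched as below, assuming the network state is kept as a per-link load table plus the list of links each flow currently uses. The names `rip_up`, `link_load`, and `flow_paths` are hypothetical, and the numbers are illustrative rather than taken from the figure.

```python
def rip_up(link_load, flow_paths, flow_id, rate):
    """Remove a flow's current path and release the bandwidth it occupied.

    link_load:  {(u, v): total rate carried on the link}
    flow_paths: {flow_id: list of (u, v) links used by that flow}
    A link whose load drops to zero is removed from the topology.
    """
    for link in flow_paths.pop(flow_id):
        link_load[link] -= rate
        if link_load[link] <= 0:
            del link_load[link]


# Illustrative state: e7 shares R2->R4 with other traffic and is alone on R4->R6.
link_load = {("R2", "R4"): 500, ("R4", "R6"): 100}
flow_paths = {"e7": [("R2", "R4"), ("R4", "R6")]}
rip_up(link_load, flow_paths, "e7", 100)
```

After the call, the shared link keeps the remaining load and the link used only by e7 disappears, which in turn shrinks the port counts of the routers at its ends.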

  10. RCG Construction • The RCG related to this flow is constructed • It depends on the current network connectivity and resource occupation • The cost of each edge depends on the sizes and connectivity of the routers, the traffic already routed, and the existence of physical links between routers • Opening a new physical channel has a cost • It increases switch size, hence power consumption for this and other previously routed flows • Sharing an existing physical channel has a cost too • [Figure: RCG over routers R0 to R6 with edge costs between 0.25 and 0.90]

  11. Routing Path Computation • The Multicast Routing Graph (MRG) is constructed from the RCG • [Figure: RCG over routers R0 to R6 and the MRG derived from it over R2, R4, R5, and R6, both with edge costs]

  12. Flow Rerouting • Multicast routing is formulated as finding a minimum directed spanning tree • The Multicast Routing Tree (MRT) is computed from the MRG using the Chu-Liu/Edmonds algorithm • [Figure: MRG over R2, R4, R5, and R6 with edge costs, and the resulting MRT]
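For readers unfamiliar with the Chu-Liu/Edmonds algorithm, here is a compact recursive sketch that returns the edges of a minimum directed spanning tree rooted at the flow's source. It is a textbook-style version written for clarity, not the authors' code. The function names are hypothetical, and the edge weights in the example are made-up integers (costs scaled by 100), not the values from the figure.

```python
def find_cycle(parent):
    """Return a list of nodes forming a cycle in the parent map, or None."""
    seen = {}
    for start in parent:
        path, v = [], start
        while v in parent and v not in seen:
            seen[v] = start
            path.append(v)
            v = parent[v]
        if v in parent and seen.get(v) == start:
            return path[path.index(v):]
    return None


def min_arborescence(nodes, edges, root):
    """Chu-Liu/Edmonds. edges: {(u, v): w}. Returns the chosen {(u, v): w} or None."""
    # Step 1: pick the cheapest incoming edge of every non-root node.
    best = {}
    for (u, v), w in edges.items():
        if v != root and u != v and (v not in best or w < edges[(best[v], v)]):
            best[v] = u
    if any(v not in best for v in nodes if v != root):
        return None  # some node cannot be reached
    cycle = find_cycle(best)
    if cycle is None:
        return {(u, v): edges[(u, v)] for v, u in best.items()}
    # Step 2: contract the cycle into a supernode and reweight entering edges.
    cyc, c = set(cycle), ("cycle", frozenset(cycle))
    new_edges, origin = {}, {}
    for (u, v), w in edges.items():
        if u in cyc and v in cyc:
            continue
        if v in cyc:
            key, w2 = (u, c), w - edges[(best[v], v)]
        elif u in cyc:
            key, w2 = (c, v), w
        else:
            key, w2 = (u, v), w
        if key not in new_edges or w2 < new_edges[key]:
            new_edges[key], origin[key] = w2, (u, v)
    new_nodes = [n for n in nodes if n not in cyc] + [c]
    sub = min_arborescence(new_nodes, new_edges, root)
    if sub is None:
        return None
    # Step 3: expand the supernode, dropping the cycle edge into the entry node.
    result = {origin[k]: edges[origin[k]] for k in sub}
    entered = next(origin[k][1] for k in sub if k[1] == c)
    for v in cycle:
        if v != entered:
            result[(best[v], v)] = edges[(best[v], v)]
    return result


# Illustrative MRG rooted at R2 (weights are assumptions, scaled to integers).
mrg = {("R2", "R4"): 70, ("R2", "R5"): 80, ("R4", "R5"): 40, ("R5", "R4"): 35,
       ("R4", "R6"): 35, ("R5", "R6"): 47, ("R2", "R6"): 72}
mrt = min_arborescence(["R2", "R4", "R5", "R6"], mrg, "R2")
mrt_cost = sum(mrt.values())
```

In this example the cheapest incoming edges of R4 and R5 form a cycle, so the algorithm contracts them, picks the cheapest way into the contracted node, and then breaks the cycle at the entry point.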

  13. Flow Rerouting -- 2 • After the path is determined, routers and links on the chosen path are updated • [Figure: connectivity before and after rerouting e7, over routers R0 to R6, with routers annotated by port sizes and links annotated by rates]

  14. After Rip-Up and Reroute • Path of each flow is decided • Sizes of routers and links, connectivity of routers and links are decided • Network topology is decided • Implementation cost is decided

  15. Router Merging • Adjacent routers are considered for merging to reduce implementation cost • Connected routers can be merged to reduce ports and cost • Routers connected to the same router can be merged to reduce ports and cost • Routers are merged iteratively until no further improvement is possible • [Figure: topology before and after router merging, with cores v0, v3, v5, v6, routers annotated by port sizes, and links annotated by rates]
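The merging loop can be sketched as a greedy procedure: try every candidate pair, apply the merge that lowers cost the most, and stop when no merge helps. The slides do not give the cost function here, so the one below (a fixed per-router overhead plus an inputs-times-outputs crossbar proxy) is purely an assumption for illustration, as are all function names.

```python
from itertools import combinations

FIXED = 1  # assumed per-router overhead; stands in for a real power model


def cost(links, local_ports):
    """Hypothetical cost: FIXED + (input ports * output ports) per router."""
    total = 0
    for r, n_local in local_ports.items():
        ins = n_local + sum(1 for _, v in links if v == r)
        outs = n_local + sum(1 for u, _ in links if u == r)
        total += FIXED + ins * outs
    return total


def contract(links, a, b):
    """Merge router b into router a; links inside the merged router vanish."""
    renamed = {(a if u == b else u, a if v == b else v) for u, v in links}
    return {(u, v) for u, v in renamed if u != v}


def candidates(links):
    """Pairs that are directly connected or connected to the same router."""
    nbrs = {}
    for u, v in links:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    pairs = set()
    for r, ns in nbrs.items():
        pairs.update(tuple(sorted((r, n))) for n in ns)
        pairs.update(combinations(sorted(ns), 2))
    return sorted(pairs)


def merge_routers(links, local_ports):
    """Greedily merge routers until no merge lowers the cost."""
    links, local_ports = set(links), dict(local_ports)
    while True:
        current, best = cost(links, local_ports), None
        for a, b in candidates(links):
            new_links = contract(links, a, b)
            new_ports = {r: n for r, n in local_ports.items() if r != b}
            new_ports[a] = local_ports[a] + local_ports[b]
            c = cost(new_links, new_ports)
            if c < current and (best is None or c < best[0]):
                best = (c, new_links, new_ports)
        if best is None:
            return links, local_ports
        _, links, local_ports = best


# Illustrative chain R1 -> R2 -> R4, one core attached to each router.
final_links, final_ports = merge_routers(
    {("R1", "R2"), ("R2", "R4")}, {"R1": 1, "R2": 1, "R4": 1})
```

With this cost model, merging R1 and R2 pays off, but folding everything into a single router does not, so the loop stops with two routers. A different cost model would change where it stops.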

  16. Experiment Setup • Used GeoSteiner-4.0 as RST solver • Used Parquet [Adya03] as floorplanner • Benchmarks: • Unicast: 3 groups of benchmarks

  17. Experiment Setup -- 2 • Benchmarks: • 2D Multicast: generated using bandwidth-version of Rent’s rule • Varying hop counts and data rate distribution • 10% are multicast flows with different group sizes • Objective: optimize the total power consumption, including leakage power and switching power

  18. Experiment Setup -- 3 • NoC model library: • Use power simulator Orion [Peh03] to estimate router powers • 70 nm, 1 GHz frequency, 128 bits/flit, 4 flits/buffer • Use link power model from [Chen06, Zhang07] • RC wires with repeated buffers

  19. Experiment Setup -- 4 • Algorithm evaluation • Compare CLUSTER, DECOMPOSE, PERTURB, R_PERTURB, and RRRM with a full-mesh implementation • Also compare with an “optimized” mesh, meaning we remove routers and links that are not traversed by any flow when using XY routing • Evaluate power, performance, area, and execution time of each algorithm • Run on a 1.5 GHz Intel P4 processor with 512 MB of memory
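The "optimized" mesh baseline can be illustrated with a short sketch: route every flow with XY routing (travel along X first, then along Y) and keep only the routers and links that some flow touches. The function names and the example flow are hypothetical, and cores are assumed to sit at integer mesh coordinates.

```python
def xy_route(src, dst):
    """Return the directed mesh links used by XY routing from src to dst."""
    (x, y), (dx, dy) = src, dst
    path = []
    while x != dx:  # move along X first
        nx = x + (1 if dx > x else -1)
        path.append(((x, y), (nx, y)))
        x = nx
    while y != dy:  # then move along Y
        ny = y + (1 if dy > y else -1)
        path.append(((x, y), (x, ny)))
        y = ny
    return path


def optimized_mesh(flows):
    """Keep only routers and links traversed by at least one flow.

    flows: list of (src, dst) mesh coordinates.
    """
    routers, links = set(), set()
    for src, dst in flows:
        routers.update((src, dst))
        for a, b in xy_route(src, dst):
            links.add((a, b))
            routers.update((a, b))
    return routers, links


# One illustrative flow on a 3x2 mesh: routers (0, 1) and (1, 1) are never used.
used_routers, used_links = optimized_mesh([((0, 0), (2, 1))])
```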

  20. Power Results unicast benchmarks multicast benchmarks

  21. Avg. Hops and Router Area Results Avg. Hop Counts Router Area

  22. Execution Times