230 likes | 305 Views
This paper presents the design and synthesis of irregular Application-Specific NoCs, highlighting advantages and challenges compared to regular networks, with a focus on minimizing power, latency, and hardware costs.
E N D
Presentation ofDesigning Efficient Irregular Networks forHeterogeneous Systems-on-ChipbyChristian Neeb and Norbert WehnandWorkload Driven Synthesis of Irregular Application Specific NoC'sbyBen Meakin
Introduction N1 N2 N3 N1 N2 N3 • Irregular vs Regular Networks N4 N5 N6 N4 N5 N6 N7 N8 N9 N7 N8 N9 • Advantages: • Efficient for general purpose applications • Easy routing • Easy floor planning/P&R • Disadvantages: • Expensive • Under utilized channels • Avg. latency not as scalable • Advantages: • High performance/low power for specific application • Lower hardware cost of network • Disadvantages: • Difficult routing • Difficult floor planning/P&R
Background and Motivation • NoC power a function of network hops: • Pperflit = (buff + xbar + arb)*Hops + wire*D • NoC latency a function of network hops: • L = Rpipeline * Hops (clock cycles) • NoC hardware cost a function of router radix: • C = sum (k1*radix_n + k2) n = 1....nodes • Minimize hops and router complexity!
Design of Efficient Irregular Networks for Heterogeneous Systems-on-ChipChristian Neeb and Norbert WehnUniversity of Kaiserslautern9th EUROMICRO Conference on Digital System Design 2006
Application Communication Model • Resource Communication Graph (RCG) • Vertex: hardware component • Edge: abstract communication channel • Edge Weight: average communication rate (bandwidth) • Normalized to physical bandwidth of one channel link • Efficient mapping of task communication to RCG is assumed and is not discussed in this paper
INOC Architecture • Nodes are tiles consisting of a chip resource and a router • Routers: Wormhole with virtual channels • Round-Robin switch arbitration • Routing: • Deterministic: one output channel per destination address • Adaptive: > one output channel per destination address
Routing in Irregular Topologies • Channel Dependency Graph (CDG) • Vertex: channel • Edge: pair of channels with a dependency due to routing function • Deadlock free if there are no cycles • Node Numbering • Arbitrary in this paper • Channels Increasing / Channels Decreasing • Acyclic CDGs • Determined by source/destination node numbers • Deadlock Free Routing • Route packets on increasing channels, then decreasing • General form of dimension order routing
Routing in Irregular Topologies • Unsafe paths allowed provided there is a safe path (acyclic) that a packet can escape to • Routing table filled via Dijkstra's algorithm • Virtual channels are used to expand routing function using similar rules as dimension order schemes
Network Generation • Start with a bidirectional connection of all nodes in ascending order • Ensures a correct routing path exists • Add shortcuts based on network traffic estimation and RCG • Provides an optimal routing path that will be used in the common case
Results • Network traffic patterns randomly generated with constraints to model heterogeneous SoC traffic • Simulated with 100 different patterns • Variable flit injections rates
Limitations • Naive implementation of “safe path” increases cost/complexity of network • RCG is the most efficient topology • More efficient deadlock free solutions may exist • Efficient mapping of TCG to RCG is assumed • Can be a difficult problem • Network generation could be linked to a specific programming model or API for simple TCG to RCG mapping
Workload Driven Synthesis of Irregular Application-Specific NoC's Ben Meakinmeakin@cs.utah.edu
Project Objectives • Synthesis tool for INoC's • Generation of optimal network topology • MCAPI workload specification • Synthesis of deadlock free routing • Comparison of irregular to regular NoC performance, power, and cost
Motivation • CAD for ASICS always needs improvement • Reduce design cost, time-to-market, etc. • Synthesis of INoCs could be useful in implementing parallel algorithms in programmable logic • Average router complexity needs to be minimized • Asynch NoCs can realize improvement in average case performance • GALS INoCs could be future direction in heterogeneous SoC • Minimize hops to reduce power and latency
Specification of Workload • Communication channels: ScalarChan N1 N2 256e9 10 Comm Type Sender Receiver BW (bits/sec) Priority • Optimization Effort: E = 100 * ((BW / Norm)/2 + (PR / Norm)/2)
Network Synthesis Algorithm • Sort channels by priority/effort • Add all nodes to network • For each channel: • if sender has links: • while link has not been added • if receiver has links, add link; else increase search depth • else recursive call from next sender neighbor node • while not done • if network is not correct, add link; else done
Network Synthesis Example • Max Radix is 3 • N1 – N2 added • Highest priority link N1 N2
Network Synthesis Example • N1 – N4 added • N1 – N3 added • N3 – N4 added N4 N1 N2 N3
Network Synthesis Example • N1 – N5 added • Max radix of N1 reached, so N2 used as via • N6 – N7 added • N5 – N8 added • Are we done? N4 N1 N2 N6 N8 N3 N5 N7
Network Synthesis Example • N2 – N6 added for correctness • Every node must have a path to every other node N4 N1 N2 N6 N8 N3 N5 N7
Synthesis of Routing Function • Rename nodes with naming algorithm • Depth first, breadth first, root node selection • Most efficient synthesis for application • For each node pair in workload • check if shortest path is correct • if not, add a virtual channel to make it correct • Most correct synthesis for general purpose • For ALL node pairs in network • check if shortest path is correct • if not, add a virtual channel to make it correct • Synthesize ROM for each node
Uses of this Tool • Incorporate with previous MCAPI hardware implementation • Allow user to synthesize a working HDL model based on MCAPI workloads and a set of IP blocks • Modern FPGA's could make this a feasible and cost effective alternative to fabricating ASIC's • Aid for studying optimal topology for known network traffic