1 / 36

A High Throughput Network-on-Chip Architecture for System-on-Chip Interconnect

A High Throughput Network-on-Chip Architecture for System-on-Chip Interconnect. Abdelhafid Bouhraoua and M.E.S El-Rabaa Computer Engineering Department (COE) College of Computer Science and Engineering (CCSE) King Fahd University of Petroleum and Minerals (KFUPM)

mbuss
Download Presentation

A High Throughput Network-on-Chip Architecture for System-on-Chip Interconnect

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A High Throughput Network-on-Chip Architecture for System-on-Chip Interconnect Abdelhafid Bouhraoua and M.E.S El-Rabaa Computer Engineering Department (COE) College of Computer Science and Engineering (CCSE) King Fahd University of Petroleum and Minerals (KFUPM) Dhahran, Eastern Province, Saudi Arabia

  2. Outline • Networks-on-Chips • State of The Art • Fat Tree Network Properties • Modified Fat Tree • Router Architecture • Performance Evaluation • Conclusion

  3. Outline • Networks-on-Chips • State of The Art • Fat Tree Network Properties • Modified Fat Tree • Router Architecture • Performance Evaluation • Conclusion

  4. Semiconductor Industry Future Technology Evolution is Faster than Design Evolution

  5. Chip Design Methodology • Chip Complexity: 100M – 3B. • ASIC Methodology Not Suitable • RTL  Synthesis  Back End • Impossible to handle Full RTL Design for the whole project • IP-Reuse Based Methodology • Opens a Wide Range of Possibilities • IP Blocks Together On a Chip  “System-on-Chip” From ASIC  System-on-Chip Era

  6. SoC Constraints • Very Short Time-To-Market • Compressed Schedule • Very Short Lifecycle • Low Development Cost • Small Team • High Complexity • Available Silicon Resources to Produce Cost-Effective Highly Integrated SoCs. • Broad Range of IP Blocks • Impossibility to know them all

  7. SoC Methodology Main Task: Integration of The IP Blocks • System Level Integration • Data Formatting and Conversion • Protocol Interfacing • Control Interfacing • Interconnection Level Integration • Signal Interfacing • Data Transfer Interfacing • Wire Interconnect and Back-end Integration

  8. Interconnecting The IPs Brute Force Method: • Design The Interface Block for Every Pair of IPs in the SoC • Point-to-Point Communication between IPs Problems: • Design Effort Similar to that of a new IP Block • If 20 Different Blocks  Around 400 New Designs !!!! • Point-to-Point Communication  Wiring Mess • 50 Blocks; 8bits bidirectional  More than 20,000 Globally Routed Wires Should Look For a Better Way

  9. Networks-on-Chips “Route Packets NOT Wires”, William J. Dally Idea: Build a Complete on-Chip Network • Unified Communication Model (Similar to OSI Stack) • No Ad-hoc Effort • Standardized Interfacing (May be provided by IP Vendors) • Unified Network Elements (Routers, Link Interfaces) • No Design required by the SoC Teams • Flexible Interconnect and Reduced Global Wiring

  10. NoC Requirements • Performance • How fast packets are moved across the network? • How much traffic is carried at the same time and for how long? • Overhead • How Big is its required Size (in Gates) ? • Adaptivity • Does it Adapt Easily to new Designs ? • Complexity • How Easy is Interfacing to it ?

  11. Outline • Networks-on-Chips • State of The Art • Fat Tree Network Properties • Modified Fat Tree • Router Architecture • Performance Evaluation • Conclusion

  12. Previous Work • Majority directly derived from other research (Interconnection Networks for Parallel Architectures) • Reproduce what has been learned in the area of inter-chip networks, • Focus on the router architecture alone to achieve certain goals in latency • Asynchronous design of NoCs, mainly GALS • Circuit switching techniques introduced to provide a certain guarantee for the latency. • Did not fully take advantage of the fact that the network is on-chip where the main gain is no-pin limitation. • Router architectures directly derived from inter-chip architectures where the routers were implemented on a single chip  substantial overhead. • Added complexity to achieve guaranteed latency is an overkill in the on-chip context.

  13. Which Network? Most Straightforward  Crossbar • Good Throughput (maxes at 66%) • Non Scalable (Quadratic) • Complexity Of Implementation for Higher Number of I/Os.

  14. 2-D Mesh • Very Popular Topology in NoCs. • Very Suitable for the 2D nature of Chip Floorplanning (Tiling) • Very High Constraints • Inefficient routing algorithms (deadlock-free by construction) • Efficient routing algorithms (Complex implementation) • Poor performance: Saturation reached at 30 %.

  15. Analysis • Low throughput. Means: latency cannot be guaranteed above the maximum throughput levels • Low throughput cause by contention over the output ports of routers among several incoming packets • Cannot prevent contention from happening. Contention makes router architectures more complex because they need to integrate buffering and prioritization logic. • Routers that implement both packet and circuit switching makes the architecture even more complex.

  16. Methodology • Take advantage of the On-Chip Context: • Design frozen before tape out • No internal IO limitations • Aim for a High Throughput Architecture • Circuitry used at 30% of its maximum is NOT an optimal Solution (Clock frequency, power). • Reduced router size • Integrate a large number of routers • Wormhole routing vs. Store and Forward • Reduce required buffers in routers

  17. R R R R R R R R R R R R C C C C C C C C Fat Tree What topology resembles a crossbar? Banyans or Multistage Interconnection Networks. • Bidirectional multistage or folded multistage networks • Bidirectional multistage are two entities: • The Fat Tree (FT) • The butterfly. • Fat Tree better than butterfly (previous work) • n+1 Stages (or rows) • Size is • Routers = n x 2n • Clients = 2n+1 • Diameter = 2logk + 1; n = log k

  18. Outline • Networks-on-Chips • State of The Art • Fat Tree Network Properties • Modified Fat Tree • Router Architecture • Performance Evaluation • Conclusion

  19. UP Router LEFT RIGHT Routing in Fat Tree Routing reduced to routing in a binary tree. Binary Trees Three Routing Directions • UP • RIGHT • LEFT

  20. [0, l[ U [u, 2n+1 -1] Router (r,c) [l, l+2r-1[ [l+2r-1, u[ Routing in Fat Tree • Matrix n rows x 2(n-1) columns. • Router (r,c) • r : row index (rows are indexed from 0 to n-1) • c: column index (columns are indexed from 0 to 2(n+1) -1) • Size of the clients’ address space reachable using the downside ports is equal to 2r • It is always a continuous interval of addresses of the form [l, u[. • Lower bound l : smallest address reached from the router (r,c). • Smallest address within the range obtained by clearing the lowest r bits of the column c. • l = (c/2r) x 2r. • Upper bound u: largest address reached from the router (r,c). • Largest address obtained by adding 2r to the lower bound l. • u = l+ 2r.

  21. R R R R R R R R R R R R C C C C C C C C Routing In Fat Tree “Summit” Routers Alternate Paths Routing UP: Adaptive Routing Down: Deterministic

  22. Outline • Networks-on-Chips • State of The Art • Fat Tree Network Properties • Modified Fat Tree • Router Architecture • Performance Evaluation • Conclusion

  23. Contention in Fat Tree • Packets coming from the UP links are never routed up • Only packets coming from the bottom links are routed up. • Since the number of UP links is equal to the number of bottom links, there cannot be any contention when routing up. • Contention occurs only when going down. • Bottom links are split in RIGHT and LEFT links, deterministic routing of packets will lead to contention. UP Contention on the way down Many Choices for Going UP LEFT RIGHT

  24. R R R R R R R R R R R R C C C C C C C C Modified Fat Tree Doubling of downward links eliminates contention

  25. Outline • Networks-on-Chips • State of The Art • Fat Tree Network Properties • Modified Fat Tree • Router Architecture • Performance Evaluation • Conclusion

  26. k k Right Inputs Left Inputs 2k + 2 2k + 2 Router Architecture • No Crossbar • No Buffers (Pushed to the Clients) • Every downstream input simultaneously connected to two outputs. • Contention eliminated between the inputs going downstream. • Number of outputs is 2k+2 for k inputs (case of when the router is a summit) • Router models differ from each other only by two items: • Number of input and output ports on the down link • Routing function constants (r,c)

  27. Address A B ≥ < LEFT l A B ≥ < RIGHT L A B ≥ < u UP Routing Circuitry • All network elements are constants and frozen at design time. • All lower bound and upper bound values, used to generate the routing functions, are constants for each router. • These constants are entered as inputs into the routing function • Routing Function implemented using comparators. • Constants needed by the routing function are: • l • L = l + 2r-1 • u

  28. FIFO FIFO FIFO FIFO FIFO Client Interface • Buffers pushed to the Client Interfaces • Each incoming link is terminated with a FIFO memory. • The different FIFO memories connected to the client through a single shared bus. Up Link Down Links (from router) Client/IP Block • Bus can be wider to perform data transfers faster than what is received in the FIFOs. • The size of FIFOs customizable by design team according to the specifications

  29. Outline • Networks-on-Chips • State of The Art • Fat Tree Network Properties • Modified Fat Tree • Router Architecture • Performance Evaluation • Conclusion

  30. Simulation Conditions • Uniform Traffic Generation • Uniform Distribution of Destinations • Traffic Rate constant fraction of Maximum Link Bandwidth • Variable Packet Size (within a predetermined range; eg. 64 bytes +/- 10%) • Simulation Platform: Cycle-based C-based. Developed for this purpose

  31. Throughput • More than 90% Throughput achieved • Compare with Regular Fat Tree

  32. Latency

  33. Area and Speed Buffer-less architecture less costly

  34. Client Buffer Utilization • Buffers pushed to the client interfaces. • Considerable number of buffer lanes is necessary for every client interface. • Simulations shows a linear progression of the maximum number of lanes used during operation. • Obtained figures are an order of magnitude lower than the number imposed by the architecture. • Number of buffer lanes in the client interface can be tailored to suit the class of applications at hand while reducing buffering area.

  35. Outline • Networks-on-Chips • State of The Art • Fat Tree Network Properties • Modified Fat Tree • Router Architecture • Performance Evaluation • Conclusion

  36. Conclusion • A contention-free modified FT architecture is proposed. • Proposed architecture achieves maximum theoretical throughput and has smaller latency than conventional FTs. • Latency increases linearly with input load. • Achieved performance is actual performance using a contention-free network. • The area of the network is kept small because of the absence of buffers in the router architecture. • Number of buffer lanes in the client interfaces can be tailored for a specific platform to suit the class of applications at hand while reducing buffering area.

More Related