MINIMISING DYNAMIC POWER CONSUMPTION IN ON-CHIP NETWORKS

This research focuses on reducing power consumption in on-chip networks by applying various power-saving techniques such as clock and signal gating and gate-level optimizations. The study analyzes power consumption breakdown and explores the potential of low-power link technologies and architecture optimizations for further power savings.


Presentation Transcript


  1. MINIMISING DYNAMIC POWER CONSUMPTION IN ON-CHIP NETWORKS Robert Mullins Computer Architecture Group Computer Laboratory University of Cambridge, UK

  2. Communication-Centric Architectures • Future performance gains will primarily come from increasing the number of IP cores in a system not their complexity or operating frequency • Many reasons: • Diminishing returns from simply scaling what we have • Energy efficiency • Complexity • Fault tolerance • Economics

  3. On-Chip Networks • An efficient general purpose chip-wide communication infrastructure is becoming essential • One flexible networking option is to use packet-switched networks with support for virtual-channels

  4. The Lochside Router • [Figure: Lochside chip (2004/05), 180nm technology; each tile contains a router (R) with traffic generator, debug and test logic] • Router Architecture • Highly parameterised implementation • Packet-switched network with virtual-channel flow-control • Best-case latency is one cycle per network hop • Results presented here are from post-P&R simulations targeting a 90nm technology

  5. Exploiting Speculation to Reduce Communication Latency Peh/Dally (2001)

  6. Exploiting Speculation to Reduce Communication Latency

  7. Aims of this work • Apply existing power saving techniques to an on-chip network design • e.g. clock and signal gating, gate-level optimisations etc. • Importance of applying such techniques before making comparisons • Measure power consumption and provide an accurate breakdown of where the remaining power is dissipated • Where is the best place to look for future power savings?

  8. Measuring and Optimizing Dynamic Power • Our Test Case • 8mm x 8mm die • 4x4 mesh network • Low-latency routers, best-case latency is one cycle per hop (incl. interconnect) • 1.2V, 90nm technology • 4 input buffers per VC • 4 VCs per input port • 48 x 80-bit network links • 800MHz at worst-case (WC) PVT • ~32 FO4 clock period • Results reported at 250MHz
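
The test-case figures on this slide can be cross-checked with some quick arithmetic. The sketch below is illustrative only: the function names are invented here, and it assumes that the 48 links are counted as unidirectional router-to-router links and that the quoted ~32 FO4 clock period refers to the 800MHz operating point.

```python
# Back-of-envelope checks of the test-case figures quoted on the slide.
# All names are hypothetical; nothing here comes from the original design files.

def mesh_unidirectional_links(k: int) -> int:
    """Count unidirectional router-to-router links in a k x k mesh."""
    # k rows and k columns, each containing k - 1 neighbour-to-neighbour channels.
    bidirectional_channels = 2 * k * (k - 1)
    # Assumption: each channel is a pair of one-way links.
    return 2 * bidirectional_channels


def clock_period_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


def link_bandwidth_gbps(width_bits: int, freq_mhz: float) -> float:
    """Peak bandwidth of one link in Gbit/s, at one transfer per cycle."""
    return width_bits * freq_mhz / 1000.0


if __name__ == "__main__":
    period = clock_period_ps(800)
    print("links in a 4x4 mesh:", mesh_unidirectional_links(4))
    print("clock period at 800MHz (ps):", period)
    print("implied FO4 delay (ps):", period / 32)
    print("peak bandwidth per link at 250MHz (Gbit/s):", link_bandwidth_gbps(80, 250))
```

Under these assumptions a 4x4 mesh has 48 one-way links, which matches the slide. An 800MHz clock has a 1250ps period, so a ~32 FO4 cycle implies an FO4 delay of roughly 39ps at the worst-case corner. Each 80-bit link peaks at 20 Gbit/s at the 250MHz reporting frequency.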

  9. Interconnect Delay/Energy Trade-offs • Power dissipated in network links depends on how links are spaced and buffered • At least a factor of 3 difference in energy consumption over range of potential interconnect options • Could move to low-swing differential schemes for even greater energy savings • For our results we assume minimum-spaced wires and optimal energy x delay product

  10. Clock Gating • Clock gating optimisations applied at two levels: • Local Clock Gating • Automated clock gating within router • Some tuning of RTL involved to maximise opportunities for synthesis tool • Router Level Clock Gating • Exploit opportunities to gate clock as it enters the router • Isolates router’s clock completely, only static power consumption remains

  11. Router-Level Clock Gating • Clock gating exposes clock tree insertion delay • Need to know early if router will be required • Generate ‘early valid’ signals in neighbouring routers • Early-valid signals are slightly pessimistic • Based on what is requested not granted
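
The pessimism of the early-valid scheme can be illustrated with a small cycle-level model. This is a simplified sketch rather than the router's actual logic: it assumes a single neighbour, a one-cycle lead between the early-valid signal and the flit's arrival, and a router that needs its clock only in the arrival cycle. The names `shift_one_cycle` and `gating_summary` are hypothetical.

```python
from typing import Dict, List


def shift_one_cycle(signal: List[bool]) -> List[bool]:
    """Delay a per-cycle signal by one cycle (nothing is seen in cycle 0)."""
    return [False] + list(signal[:-1])


def gating_summary(requested: List[bool], granted: List[bool]) -> Dict[str, int]:
    """Compare an early-valid clock enable with the cycles that truly need a clock.

    requested[t]: the neighbour asked in cycle t to send a flit to this router.
    granted[t]:   that request won arbitration, so a flit arrives in cycle t + 1.
    """
    if len(requested) != len(granted):
        raise ValueError("traces must have the same length")
    if any(g and not r for r, g in zip(requested, granted)):
        raise ValueError("a grant without a request is not possible")

    # Early-valid is derived from requests, so the clock is enabled one cycle later.
    clock_on = shift_one_cycle(requested)
    # A flit really arrives only where the request was granted.
    flit_arrives = shift_one_cycle(granted)

    return {
        "clocked": sum(clock_on),
        "gated": sum(not c for c in clock_on),
        "wasted": sum(c and not f for c, f in zip(clock_on, flit_arrives)),
        "missed": sum(f and not c for c, f in zip(clock_on, flit_arrives)),
    }


if __name__ == "__main__":
    requested = [True, True, False, False, True, False, False, False]
    granted = [True, False, False, False, True, False, False, False]
    print(gating_summary(requested, granted))
```

In this trace the router is clocked for 3 of 8 cycles. One of those is wasted because a request was made but not granted, and no arrival is ever missed. That is the trade-off the slide describes: basing early-valid on requests rather than grants costs a few unnecessary clocked cycles but never gates the clock when a flit arrives.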

  12. Gate-Level Optimizations and Signal Gating • Automated signal gating and gate-level power optimisations had minimal impact • Inserting signal gating logic manually did reduce input FIFO power requirements significantly • The reported results could be further improved (by 12%) by enabling logic optimisation across module boundaries • Cross-boundary optimisation was disabled so that power could be attributed accurately to each module

  13. Analysis of Power Consumption • [Chart: power consumption of a single router and its links] • Simple power optimisations can cut power requirements to a quarter, with many more opportunities to save power • Network is ~5% of core area • Perhaps 10% of system power at present • Don’t make comparisons without optimizing power!

  14. Analysis of Power Consumption • 22% Static power, 11% Inter-Router Links • ~1% Global Clock tree • 65% Dynamic Power • Power Breakdown • ~50% of dynamic power is consumed in local clock tree and input FIFOs • ~30% on router datapath • ~20% on scheduling and arbitration • Scheduling is probably more complex than typical implementations due to speculation
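
To see what the nested percentages mean in absolute terms, the dynamic-power split can be rescaled to shares of total power. This is a sketch under two assumptions: that the 65% dynamic figure is the router's dynamic power excluding links and the global clock tree, and that the ~50/30/20 split applies to that 65%. The slide's figures are approximate, and the dictionary keys are labels chosen here.

```python
# Shares of total power for one router and its links, as quoted on the slide (percent).
TOTAL_SHARES = {
    "static": 22,
    "inter_router_links": 11,
    "global_clock_tree": 1,   # quoted as "~1%"
    "router_dynamic": 65,
}

# Approximate split of the router's dynamic power (percent of dynamic power).
DYNAMIC_SPLIT = {
    "local_clock_tree_and_input_fifos": 50,
    "router_datapath": 30,
    "scheduling_and_arbitration": 20,
}


def dynamic_split_as_share_of_total() -> dict:
    """Rescale each dynamic component to a percentage of total power."""
    dynamic = TOTAL_SHARES["router_dynamic"]
    return {name: dynamic * pct / 100 for name, pct in DYNAMIC_SPLIT.items()}


if __name__ == "__main__":
    for name, share in dynamic_split_as_share_of_total().items():
        print(f"{name}: {share}% of total power")
    # The quoted shares are rounded, so they need not sum to exactly 100.
    print("sum of quoted top-level shares:", sum(TOTAL_SHARES.values()))
```

On these assumptions the local clock tree and input FIFOs account for roughly a third of total power (about 32.5%), the datapath for about 19.5%, and scheduling and arbitration for about 13%. The first of these is larger than the static or link contributions.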

  15. Low-Power On-Chip Networks • Interconnect and static power set to increase • Many low-power link technologies • Low-swing differential techniques • Power gating and other leakage reduction techniques • Potential power savings begin to require lots of different techniques – no one silver bullet?

  16. Low-Power On-Chip Networks • Topology • Don’t want to sacrifice general or at least multi-purpose nature of our networked SoC • Results suggest higher radix routers and longer interconnects could reduce power • Probably not a long term solution • Reduces path diversity, bad for fault-tolerance • Architecture • Scope for minimising memory required to store precomputed router schedule (particular to our router) • Simpler routers • Single cycle routers reduce power? Speculation for low-power?

  17. Supporting Best-Effort (BE) and Guaranteed Services (GS) Efficiently • Current timing of the datapath and link suggests additional GS data could be routed in the same clock cycle • Allocate datapath/link to GS traffic for first ½ of clock cycle • Double capacity of network • Exploit simpler GS circuit-switched routing when possible • Reduce power • Very little additional overhead

  18. Clocking On-Chip Networks • Network system timing issues are interesting • naturally event-driven not synchronous • Work is investigating placing local data-driven clock generators in each network router • Clock is stretched when no data to be routed • Clock matches rate of incoming data streams • Robust synchronisation solution (true GALS) • Also investigating incorporating power gating support • See also Distributed Clock Generator – DCG (Fairbanks/Moore)

  19. Challenges and Future Work • These are early results from a much more rigorous study of the power requirements of networked on-chip communication • Much more soon! • Exploiting a general-purpose on-chip network • Exploiting execution diversity to improve energy-efficiency • Multi-use platforms and Virtual-IP • Fault tolerance • Networks of processing elements or networks that process? • Scope for removing unnecessary interfaces and boundaries • Impact of networking on IP and processor core design

  20. Thank You
