
By: Jabulani Nyathi Washington State University School of EECS April 30, 2009



Presentation Transcript


  1. Circuits and Architectures to Deliver Low Power and High Speed Systems. By: Jabulani Nyathi, Washington State University, School of EECS, April 30, 2009

  2. Outline • CMOS scaling: its benefits and the challenges it brings about • Various techniques for limiting leakage currents, and their shortfalls • Bridging the speed-power gap: the Tunable Body Biasing scheme • Emerging devices and technologies • Concluding remarks

  3. CMOS Scaling and its Benefits • Aggressive CMOS scaling has been a very positive development, allowing: • Fast-switching devices, and thus high-speed computing • Massive integration due to miniaturization: we no longer need multiple chips to implement a microprocessor and its peripherals; in fact, we can now place multiple computing elements on a single die, resulting in a system on a chip (SoC).

  4. CMOS Scaling and its Challenges • CMOS scaling results in increased leakage currents (5X per technology node) and increased dynamic power dissipation. • The interconnect does not scale as well as the transistor, so highly integrated designs require elaborate clock distribution schemes. • IP blocks within a system on a chip would be difficult to synchronize with a single clock source.

  5. Scaling Implications [Figure: Module1 and Module2 linked by local interconnects, global interconnects, and scaled global interconnects]

  6. Dynamic Vs Leakage Power

  7. Research Motivation • Desire to bridge the speed-power gap by exploring the feasibility of optimizing devices to operate effectively at both sub-threshold and above-threshold voltages. • Emerging ultra-low-power technologies can benefit from increased speed: • Wearable computers, sensor networks, implantable medical technology • Emphasis on design for energy efficiency

  8. Existing Low Power Design Approaches • Address the energy dissipation problem from the standpoint of the region of operation • Sub-threshold design • DTMOS: shows a 5.5 times increase in current • Dynamic threshold provides energy efficiency • SBB: 4.4 times frequency increase • Above-threshold (super-threshold) design • MTCMOS: high- and low-threshold devices • VT scheme: reduces power by 50% using ABB and “sleep”/“active” modes • Architectural • Gating techniques: 45% of total power

  9. SBB, DTMOS, TBB [Figure: traditional, DTMOS/SBB, and TBB body connections, annotated 1.8 V and 600 mV; output voltage clamping]

  10. Proposed Approach • Broaden the approach to cover all possible operating regions: Tunable Body Biasing (TBB) • Sub-threshold and super-threshold operation are bridged • Ultra-low energy at low speed, or high energy at high speed • Utilize body biasing to improve the performance of sub-threshold operation • Target increased performance at sub-threshold and slightly above threshold • Save energy by eliminating idle time and processing continuously with variable power supplies (just-in-time task completion) • Target applications • Mobile, battery-operated (power-constrained), variable-processing devices • Cell phones, PDAs, notebooks, wireless sensors, embedded systems, ASICs, medical technology, etc.
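The "just in time" idea can be illustrated with a first-order energy comparison. The sketch below is not the author's method: it assumes a dynamic energy of C_eff * VDD^2 per cycle, ignores leakage while the task runs, and uses hypothetical operating points (VDD, maximum frequency) and a hypothetical idle leakage power. All names and numbers are illustrative. It compares finishing a task at the fastest point and then idling ("race to idle") with running at the lowest supply that still meets the deadline.

```python
from dataclasses import dataclass


@dataclass
class OperatingPoint:
    vdd: float    # supply voltage (V)
    f_max: float  # highest clock frequency reachable at this supply (Hz)


def dynamic_energy(cycles: int, c_eff: float, vdd: float) -> float:
    """Switching energy of a task: cycles * C_eff * VDD^2 (activity folded into C_eff)."""
    return cycles * c_eff * vdd ** 2


def race_to_idle(cycles, deadline, c_eff, points, p_idle):
    """Finish at the fastest operating point, then leak at p_idle until the deadline."""
    fast = max(points, key=lambda p: p.f_max)
    busy = cycles / fast.f_max
    if busy > deadline:
        raise ValueError("deadline cannot be met")
    return dynamic_energy(cycles, c_eff, fast.vdd) + p_idle * (deadline - busy)


def just_in_time(cycles, deadline, c_eff, points):
    """Run at the lowest supply whose f_max still completes the task by the deadline."""
    needed = cycles / deadline
    feasible = [p for p in points if p.f_max >= needed]
    if not feasible:
        raise ValueError("deadline cannot be met")
    slow = min(feasible, key=lambda p: p.vdd)
    return dynamic_energy(cycles, c_eff, slow.vdd)


# Hypothetical numbers, chosen only for illustration.
POINTS = [OperatingPoint(0.3, 5e6), OperatingPoint(0.6, 200e6), OperatingPoint(1.2, 1e9)]
CYCLES, DEADLINE, C_EFF, P_IDLE = 1_000_000, 0.01, 1e-12, 1e-5

e_race = race_to_idle(CYCLES, DEADLINE, C_EFF, POINTS, P_IDLE)
e_jit = just_in_time(CYCLES, DEADLINE, C_EFF, POINTS)
print(f"race to idle: {e_race:.3e} J, just in time: {e_jit:.3e} J")
```

In this toy model the saving comes entirely from the VDD^2 term; a fuller comparison would also charge leakage during the longer run at the lower supply.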

  11. TBB Implementation • Goals • Attain ON-state current gain while minimizing the increase in OFF-state leakage current • Highlight the advantages of sub-threshold operation while allowing super-threshold operation when needed • Drive the bulk terminal to tunable potentials depending on VDD and the desired region of operation • MOS Bulk Control Circuits • Multiplexer-based approach • Two transistors per bulk control circuit • Utilizes Vthn0 (the zero-bias NMOS threshold voltage)

  12. TBB Bulk Control Circuits • Relies on the pass-transistor property that an NMOS passes a strong logic “0” but a poor (degraded) logic “1”, while a PMOS passes a strong logic “1” but a poor logic “0” • Requires external control signals • SubVt and SubVt_b

  13. TBB Bulk Control Circuit Simulation • Super-threshold: pBulk = VDD - Vthn0 • Sub-threshold: pBulk = 0 V
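The two pBulk levels can be read as follows (an interpretation of slides 12 and 13, not the author's schematic): in sub-threshold the PMOS bulk is pulled to 0 V, and in super-threshold it is fed through an NMOS pass transistor, which can only pass a degraded logic 1 and so settles near VDD - Vthn0. Either way, the PMOS source-to-bulk junction is forward biased by at most VDD (sub-threshold) or Vthn0 (super-threshold). The Vthn0 value and the function names below are hypothetical, and the `sub_vt` flag merely stands in for the SubVt control signal.

```python
# Hypothetical zero-bias NMOS threshold voltage (V); the slides do not give a value.
VTHN0 = 0.45


def pmos_bulk_voltage(vdd: float, sub_vt: bool, vthn0: float = VTHN0) -> float:
    """PMOS bulk potential chosen by the bulk control circuit.

    sub_vt stands in for the external SubVt control signal.
    Sub-threshold: the bulk is pulled to 0 V.
    Super-threshold: the bulk is fed through an NMOS pass transistor, which passes
    only a degraded logic 1, so it settles near VDD - Vthn0 (body effect on the
    pass transistor is ignored in this idealization).
    """
    if sub_vt:
        return 0.0
    return max(vdd - vthn0, 0.0)


def pmos_forward_bias(vdd: float, sub_vt: bool, vthn0: float = VTHN0) -> float:
    """Source-to-bulk forward bias of a PMOS whose source sits at VDD."""
    return vdd - pmos_bulk_voltage(vdd, sub_vt, vthn0)


for vdd, sub_vt in [(0.3, True), (0.4, True), (1.8, False)]:
    print(f"VDD={vdd:.1f} V  pBulk={pmos_bulk_voltage(vdd, sub_vt):.2f} V  "
          f"forward bias={pmos_forward_bias(vdd, sub_vt):.2f} V")
```

With these assumed values the forward bias stays below the roughly 0.6 V at which a silicon p-n junction typically starts to conduct significantly, which is the usual reason forward body bias is kept limited.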

  14. Device Optimization • TBB encourages varying supply voltages • How should devices be sized for optimal operation at any supply voltage? • Maintain symmetric switching • Examine the inverter at varying supply voltages

  15. Device Optimization (Switching Point)
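Whether a sizing chosen at one supply carries over to another can be sketched with two textbook first-order models of the inverter switching point VM, the input voltage at which the NMOS and PMOS currents are equal. These are assumptions, not the models used in this work: a long-channel square-law model for super-threshold, and an exponential sub-threshold model with the same slope factor n for both devices and drain-source voltages well above kT/q. The threshold values, slope factor, and function names are hypothetical.

```python
import math

THERMAL_V = 0.02585  # kT/q near 300 K (V)


def vm_square_law(vdd, vtn, vtp_abs, r):
    """Super-threshold VM, both devices saturated; r = sqrt(k_p / k_n) set by sizing."""
    return (vtn + r * (vdd - vtp_abs)) / (1 + r)


def r_symmetric_square_law(vdd, vtn, vtp_abs):
    """Strength ratio r that places VM at VDD/2 (needs VDD/2 above both thresholds)."""
    return (vdd / 2 - vtn) / (vdd / 2 - vtp_abs)


def vm_subthreshold(vdd, vtn, vtp_abs, wp_over_wn, i0p_over_i0n=1.0, n=1.5):
    """Sub-threshold VM from I = I0 * W * exp((|Vgs| - |Vt|) / (n * kT/q)) for each device."""
    return ((vdd + vtn - vtp_abs) / 2
            + (n * THERMAL_V / 2) * math.log(i0p_over_i0n * wp_over_wn))


def wp_over_wn_symmetric_sub(vtn, vtp_abs, i0p_over_i0n=1.0, n=1.5):
    """Width ratio that places VM at VDD/2 in sub-threshold; VDD does not appear."""
    return math.exp((vtp_abs - vtn) / (n * THERMAL_V)) / i0p_over_i0n


VTN, VTP_ABS = 0.40, 0.45  # hypothetical thresholds (V)

for vdd in (1.0, 1.4, 1.8):
    print(f"super-threshold VDD={vdd:.1f} V: r for VM=VDD/2 is "
          f"{r_symmetric_square_law(vdd, VTN, VTP_ABS):.2f}")

w = wp_over_wn_symmetric_sub(VTN, VTP_ABS)
for vdd in (0.2, 0.3):
    print(f"sub-threshold VDD={vdd:.1f} V: Wp/Wn={w:.2f} gives "
          f"VM={vm_subthreshold(vdd, VTN, VTP_ABS, w):.3f} V")
```

Under these assumptions the super-threshold ratio changes with VDD (2.0 at 1.0 V, about 1.11 at 1.8 V), while the sub-threshold width ratio does not depend on VDD at all, which is one way to see why a single sizing can serve a range of sub-threshold supplies.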

  16. Sub-threshold Noise Margins • Noise margins are essential for preserving proper logic levels • TBB and traditional static CMOS inverters have comparable noise margins • TBB VIH is 12.5% worse • TBB VIL is 14.3% better
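For reference, VIL and VIH are commonly taken as the input voltages where the inverter's voltage transfer characteristic (VTC) has a slope of -1. The sketch below finds those points numerically on a sampled curve and derives the noise margins. The tanh-shaped curve is a hypothetical stand-in for simulated data, the helper names are illustrative, and the VOH/VOL convention used here (output voltage at VIL and at VIH) is one of several in use.

```python
import math


def unity_gain_points(vtc, vin):
    """Return (VIL, VIH): the first and last inputs where dVout/dVin crosses -1,
    using finite differences on a sampled transfer characteristic."""
    vout = [vtc(v) for v in vin]
    slopes = [(vout[i + 1] - vout[i]) / (vin[i + 1] - vin[i]) for i in range(len(vin) - 1)]
    mids = [(vin[i] + vin[i + 1]) / 2 for i in range(len(vin) - 1)]
    crossings = []
    for i in range(len(slopes) - 1):
        a, b = slopes[i] + 1.0, slopes[i + 1] + 1.0  # distance from a slope of -1
        if a == 0.0 or a * b < 0.0:
            t = a / (a - b) if a != b else 0.0  # linear interpolation of the crossing
            crossings.append(mids[i] + t * (mids[i + 1] - mids[i]))
    if len(crossings) < 2:
        raise ValueError("transfer curve never reaches a gain of -1")
    return crossings[0], crossings[-1]


def noise_margins(vtc, vdd, steps=3000):
    """NML = VIL - VOL and NMH = VOH - VIH, with VOH = Vout(VIL) and VOL = Vout(VIH)."""
    vin = [vdd * i / steps for i in range(steps + 1)]
    vil, vih = unity_gain_points(vtc, vin)
    voh, vol = vtc(vil), vtc(vih)
    return {"VIL": vil, "VIH": vih, "NML": vil - vol, "NMH": voh - vih}


# Hypothetical smooth inverter curve standing in for simulated data.
VDD, VM, GAIN = 0.3, 0.15, 40.0


def toy_vtc(vin):
    return VDD / 2 * (1 - math.tanh(GAIN * (vin - VM)))


nm = noise_margins(toy_vtc, VDD)
print({k: round(v, 4) for k, v in nm.items()})
```

With simulated curves for the traditional and TBB inverters in place of `toy_vtc`, the same procedure yields the VIL and VIH values that such percentage comparisons are based on.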

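  17. Propagation Delay (Traditional vs. TBB) • TG (transmission gate): traditional 98 ns, TBB 14 ns (86% decrease) • Inverter: traditional 125 ns, TBB 20 ns (84% decrease) • NAND: traditional 133 ns, TBB 18 ns (86% decrease) • NOR: traditional 163 ns, TBB 25 ns (85% decrease) • XOR: traditional 289 ns, TBB 40 ns (86% decrease)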
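The percentage decrease follows from the two delays as 100 * (traditional - TBB) / traditional, and the speed-up as traditional / TBB. This short script recomputes both from the delays on the slide (the dictionary and function names are illustrative). By this arithmetic the speed-ups range from about 6.3x (inverter) to about 7.4x (NAND), and the XOR entry works out to about 86%.

```python
# Delays (ns) from the propagation-delay slide: gate -> (traditional, TBB)
DELAYS_NS = {
    "TG": (98, 14),
    "Inv": (125, 20),
    "NAND": (133, 18),
    "NOR": (163, 25),
    "XOR": (289, 40),
}


def percent_decrease(traditional, tbb):
    """Delay reduction relative to the traditional design, in percent."""
    return 100.0 * (traditional - tbb) / traditional


def speedup(traditional, tbb):
    """How many times faster the TBB gate switches."""
    return traditional / tbb


for gate, (t, b) in DELAYS_NS.items():
    print(f"{gate:5s} {percent_decrease(t, b):5.1f}% decrease  {speedup(t, b):4.2f}x faster")
```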

  18. Review of Sub-threshold Circuit Benefits • So far, the presentation has shown that: • TBB requires control of the MOS bulks to span the operating regions of interest, and the implementation is successful • The study of simple logic gates showed that: • TBB gives a dramatic speed increase (up to 7x) • The static CMOS design style is suitable for both sub-threshold and super-threshold operation • Efficient device sizing for the TBB approach is possible • However, how will a complex system perform? • Design using the knowledge gained so far (logic style, sizing) • Analyze post-layout simulations

  19. Complex System-on-Chip Design Using TBB • This work addresses the challenges of: • Global interconnect delays • Clock distribution • Synchronization of unrelated clocks • Power dissipation

  20. Conclusion • A TBB scheme, a new kind of body biasing, has been devised to span all regions of operation, from ultra-low power to high speed • Forward biasing causes an exponential gain in sub-threshold current • This leads to a 7 times frequency increase in simple logic gates • Focus on sub-threshold and slightly above-threshold operation to utilize leakage current • Bulk control circuits are effective • 4% area and 8.9% power dissipation increase • Static CMOS is the ideal overall design style • Sizing devices at either sub-threshold or super-threshold allows efficient operation with variable supply voltages
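The "exponential" part of this claim follows from two textbook relations, used here as assumptions rather than as the models behind these results. First, forward body bias lowers the threshold voltage through the body effect. Second, sub-threshold current depends exponentially on gate overdrive, so a threshold reduction ΔVt multiplies the current by exp(ΔVt / (n kT/q)). The sketch uses NMOS sign convention (V_BS > 0 is forward bias), ignores the change of n with bias, and uses hypothetical values for Vt0, γ, 2φF, and n.

```python
import math

THERMAL_V = 0.02585  # kT/q near 300 K (V)


def threshold_with_body_bias(vt0, v_bs, gamma=0.4, two_phi_f=0.8):
    """Long-channel body-effect model, NMOS convention: v_bs > 0 is forward bias."""
    if v_bs >= two_phi_f:
        raise ValueError("body-effect model not valid for V_BS >= 2*phi_F")
    return vt0 + gamma * (math.sqrt(two_phi_f - v_bs) - math.sqrt(two_phi_f))


def subthreshold_current_gain(v_bs, vt0=0.45, n=1.4, gamma=0.4, two_phi_f=0.8):
    """I(v_bs) / I(0) for a device in sub-threshold at fixed gate and drain bias."""
    delta_vt = vt0 - threshold_with_body_bias(vt0, v_bs, gamma, two_phi_f)
    return math.exp(delta_vt / (n * THERMAL_V))


for v_bs in (0.0, 0.1, 0.2, 0.3):
    print(f"forward bias {v_bs:.1f} V -> current x{subthreshold_current_gain(v_bs):.2f}")
```

Because the threshold shift sits in the exponent, even a few tens of millivolts of reduction yields a multiplicative gain in ON current, which is why forward body bias is most effective in the sub-threshold and near-threshold regions.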

  21. Concluding Remarks • Tunable operation lets the designer choose the operating point (kHz, MHz, GHz), which in turn affects energy dissipation • Other schemes do not offer this flexibility • TBB can lead to significant energy savings • LFSR results show that TBB gives: • A maximum speed increase of 5.7 times (sub-threshold) • Comparable energy at super-threshold and favorable energy at sub-threshold • Favorable EDP in all operating regions • Operation at the same speed with less energy dissipation • Idle-state leakage current can be minimized by collapsing the supply voltage

  22. Integrating Research Into Instruction • Data Path Circuits • Memory Design • Sub-System • Router Chip

  23. Incorporating Research into Instruction • A long-term objective is to place some of the integrated chips on development boards such as those produced by Digilent Inc. • The integrated chips become part of a system and can be used in some of our lower-level courses. • Most important is the use of these programmable boards to showcase the research outcomes, particularly to visiting prospective students. • A sample development board [image]

  24. Questions and Comments Welcome!

  25. Multiple Clock Domain Synchronization

  26. Reducing Interconnect Delays • Improved latency and bandwidth • Global interconnects are pipelined at or near the rate of computation
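One simple way to see what pipelining a global interconnect "at or near the rate of computation" requires: an unbuffered distributed RC wire has a 50% delay of roughly 0.38 * r * c * L^2, so splitting it into N registered segments divides the per-segment delay by N^2. The sketch finds the smallest N whose segment delay fits in one clock period. The per-millimetre r and c, wire length, clock rate, register overhead, and function names are hypothetical, and real global wires would normally also use repeaters.

```python
def distributed_rc_delay(r_per_mm, c_per_mm, length_mm):
    """50% delay of an unbuffered distributed RC wire: about 0.38 * r * c * L^2 (seconds)."""
    return 0.38 * r_per_mm * c_per_mm * length_mm ** 2


def pipeline_stages(r_per_mm, c_per_mm, length_mm, clock_hz, reg_overhead_s=0.0):
    """Smallest number of equal wire segments whose delay fits in one clock cycle."""
    budget = 1.0 / clock_hz - reg_overhead_s
    if budget <= 0:
        raise ValueError("register overhead exceeds the clock period")
    n = 1
    # Segment delay falls as 1/n^2, so this loop always terminates.
    while distributed_rc_delay(r_per_mm, c_per_mm, length_mm / n) > budget:
        n += 1
    return n


# Hypothetical 20 mm global wire: 50 ohm/mm, 0.2 pF/mm, 5 GHz clock, 50 ps register overhead.
stages = pipeline_stages(50.0, 0.2e-12, 20.0, 5e9, reg_overhead_s=50e-12)
print(f"{stages} stages: one transfer per cycle, {stages} cycles of latency")
```

Once pipelined, the wire delivers one transfer per clock cycle regardless of its length, and the length only adds latency in whole cycles.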

  27. Sources of Power Consumption • The most straightforward way to reduce power consumption from any source is to reduce VDD. • Controlling the frequency directly manipulates dynamic power. • Controlling the device threshold manipulates leakage current, affecting both leakage and short-circuit power.
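The three knobs on this slide map onto the usual first-order power model P = α * C * VDD^2 * f + VDD * I_leak + P_sc. The sketch below is that textbook model, not a result from this work. It treats I_leak as a fixed input, although in practice I_leak also depends on VDD and the threshold voltage, and all numbers and function names are hypothetical.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Switching power: activity factor * switched capacitance * VDD^2 * clock frequency."""
    return alpha * c_load * vdd ** 2 * freq


def leakage_power(vdd, i_leak):
    """Static power from a leakage current drawn continuously from the supply."""
    return vdd * i_leak


def total_power(alpha, c_load, vdd, freq, i_leak, p_short_circuit=0.0):
    """First-order total; short-circuit power is supplied by the caller."""
    return dynamic_power(alpha, c_load, vdd, freq) + leakage_power(vdd, i_leak) + p_short_circuit


# Hypothetical block: 10% activity, 1 nF switched, 100 MHz, 1 mA leakage.
for vdd in (1.8, 0.9):
    print(f"VDD={vdd} V: dynamic {dynamic_power(0.1, 1e-9, vdd, 100e6):.4f} W, "
          f"leakage {leakage_power(vdd, 1e-3):.4f} W")
```

Halving VDD quarters the dynamic term, while the frequency only scales it linearly, which is why supply reduction is the most effective single lever.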

  28. Distributed FIFO Control Circuitry

  29. Traditional vs. Tunable Body Biasing The synchronizer/buffer shows an increase in performance at sub-threshold voltages when using tunable body biasing

  30. Tunable Body Biasing

  31. Pursuit of Low Power Operation • It is likely that not all IP blocks in a SoC need to operate at high speed. • Power dissipation for those IP blocks could be reduced by operating them at a lower voltage. • TBB offers the possibility to dynamically operate at either sub-threshold or super-threshold voltages.

  32. Variable Voltage SoC • Consider a SoC with 50 IP blocks, each requiring communication at a rate of 10 MHz. • Each IP block could operate at sub-threshold levels. • The channel could operate at super-threshold voltages while the IP blocks are in sub-threshold. [Figure: IP blocks on separate supplies Vdd1, Vdd2, Vdd3, Vdd4, Vdd5]

  33. Idle vs Operating Power • During idle periods, it is advantageous to reduce leakage current by: • Reducing the power supply voltage, or • Increasing the threshold voltage (e.g. bulk voltage manipulation)
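Both options act on the sub-threshold off current. A common first-order model, used here as an assumption, is I_off = I0 * exp((-Vt + η * VDD) / (n * kT/q)) * (1 - exp(-VDD / (kT/q))), where η models drain-induced barrier lowering (DIBL). Idle leakage power is VDD * I_off, so lowering VDD shrinks both factors, and raising Vt (for example with reverse body bias) cuts the current exponentially. I0, n, η, the threshold values, and the function names are hypothetical.

```python
import math

THERMAL_V = 0.02585  # kT/q near 300 K (V)


def off_current(vdd, vt, i0=1e-7, n=1.4, dibl=0.08):
    """Sub-threshold current with VGS = 0; dibl * VDD models the threshold drop from DIBL."""
    return (i0 * math.exp((-vt + dibl * vdd) / (n * THERMAL_V))
            * (1 - math.exp(-vdd / THERMAL_V)))


def idle_leakage_power(vdd, vt, **model):
    """Power wasted while idle: supply voltage times off current."""
    return vdd * off_current(vdd, vt, **model)


baseline = idle_leakage_power(1.8, 0.45)
print(f"collapse VDD to 0.3 V: {baseline / idle_leakage_power(0.3, 0.45):.1f}x less idle power")
print(f"raise Vt by 100 mV:    {baseline / idle_leakage_power(1.8, 0.55):.1f}x less idle power")
```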

  34. Speed at Varying VDD • TBB is 5.7x faster at 376.2 mV • TBB is 20% faster at 1.8 V

  35. Energy-Delay Product • The EDP of TBB is better than that of the traditional design in ALL operating regions, significantly so in super-threshold

  36. Regions of Operation • 1.1 GHz with 3.85 nJ/cycle • 3.9 MHz with 0.6 fJ/cycle • 222.2 MHz with 103 fJ/cycle
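Energy per cycle and clock frequency are enough to derive two further figures: average power (E * f) and the per-cycle energy-delay product (E * T = E / f). The sketch applies these definitions to the three operating points exactly as quoted on the slide. It only illustrates the arithmetic and adds no new measurements; the dictionary and function names are illustrative.

```python
# Operating points as quoted on the slide: label -> (clock frequency in Hz, energy per cycle in J)
POINTS = {
    "1.1 GHz": (1.1e9, 3.85e-9),
    "3.9 MHz": (3.9e6, 0.6e-15),
    "222.2 MHz": (222.2e6, 103e-15),
}


def average_power(energy_per_cycle, freq):
    """Energy per cycle times cycles per second (W)."""
    return energy_per_cycle * freq


def energy_delay_product(energy_per_cycle, freq):
    """Energy per cycle times cycle time 1/f (J*s)."""
    return energy_per_cycle / freq


for label, (freq, energy) in POINTS.items():
    print(f"{label:>10}: P = {average_power(energy, freq):.3e} W, "
          f"EDP = {energy_delay_product(energy, freq):.3e} J*s")
```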

  37. Contributions of this Work • The proposed scheme alleviates the communication bottleneck and offers a way to synchronize multiple clocks in a SoC • Performs data transfers at up to 10 GHz • The proposed scheme maintains high performance under any clock skew • 6.5 GHz for any process corner and any skew • Low-power FIFO scheme with a small area impact when used in SoCs with many modules

  38. Contributions of this Work • Process corners have a minor impact on performance, reducing speed by 10% • The optimal supply voltage for minimum energy consumption per transaction is 2Vth • Introduction of TBB to address leakage and dynamic power dissipation • 500% increase in performance at sub-threshold voltages with a modest 80% increase in power • 5-10% less power dissipation than traditional body biasing

  39. Summary of Proposed FIFO Scheme • Linear FIFO scheme that addresses: • Signal propagation across the communication channel • Sustained throughput over long distances • Successful synchronization of equal, rational, and arbitrary clocks • 6.5 GHz sustained performance after process corner analysis, using 3 stages • Compared to the CN scheme: • Fewer devices per stage, fewer stages needed • 25% higher performance, 12% lower power • Operates at both super- and sub-threshold voltages • Lower instantaneous power demands from local clocks (less di/dt) • Optimal energy per transaction at 0.7 V in a 65 nm process • Sub-threshold operation reduces power by 3 orders of magnitude • Tunable Body Biasing provides 50% increased performance in sub-threshold while maintaining super-threshold operation

  40. TBB Scalability • At 180 nm, the TBB sub-threshold static power percentage is large • At 90 nm, the percentage difference is much smaller • Total TBB sub-threshold power is large • Total TBB sub-threshold power isn’t so large

  41. LFSR Energy vs. Frequency

  42. TBB Implementation Cont.

  43. TBB Implementation Cont.

  44. Logic Gate Analysis (Power)

  45. Inverter Power Dissipation

  46. Logic Gate Analysis (Energy)

  47. Logic Gate Analysis (EDP)

  48. Logic Gate Analysis (Fan-in)

  49. Logic Gate Analysis (Logic Styles)

  50. LFSR Power Dissipation
