
Part I: Low Power SoC Design Methods



  1. Part I: Low Power SoC Design Methods • Sungjoo Yoo • Embedded System Architecture Lab., POSTECH • sungjoo.yoo@postech.ac.kr

  2. Agenda • Introduction • Low power design issues • Solutions overview • Clock/power gating, DVFS (dynamic voltage & frequency scaling), … • Power Consumption Characteristics and Estimation • Processor and hardware blocks • System-level and RTL Low Power Design Methods • Dynamic power management • System-level clock/power gating management • RTL design for system-level clock/power gating • Dynamic voltage scaling • DRAM Memory and Battery • Recent Trends in Low Power Design • EPI (energy per instruction) throttling • Summary

  3. Power Consumption Issues • Range … • >100W/chip (e.g., Google Computing Center, Columbia River, Oregon) • ~10W/chip • ~1W/chip • <<1W/chip (e.g., RFID)

  4. [J. Hamilton, 2008] Data Center Case

  5. [E. Chung, 2009] Cell Phone Case • Power consumption for running a streaming video application

  6. Low Power Design Methods: Spectrum (spanning the process/library, circuit, architecture, and software levels) • New device/material (e.g., HK/MG (high-k/metal gate), FinFET) • Multi-Vt/Tox/Lg • Clock gating • Power gating • Multi-Vdd, DVFS • Parallelization • QoS vs. energy

  7. [LPMM, 2007] Switching Power

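  8. [LPMM, 2007] (Sub-Threshold) Leakage Power • Thermal voltage $v_{th} = kT/q$ (25.9mV at room temperature); subthreshold slope factor $n$ = 1.0~2.9 • $I_{SUB} \propto e^{-V_T/(n v_{th})}$: decreases exponentially as $V_T$ (= $V_{T0} - \gamma V_{BS}$) increases • Increases with $V_{BS}$ (body bias), which lowers $V_T$ • $I_{SUB} \propto T^2 e^{-a/T}$ ($a > 0$): increases with $T$ (temperature)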
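The exponential dependence on V_T is easier to appreciate with numbers. The sketch below evaluates the slide's proportionalities while holding every prefactor fixed; n = 1.5 is an arbitrary pick from the 1.0~2.9 range, and the V_T0 and gamma values in the example are made up for illustration, not taken from any process.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant [J/K]
Q_E = 1.602176634e-19  # elementary charge [C]


def thermal_voltage(temp_k: float) -> float:
    """Thermal voltage kT/q in volts."""
    return K_B * temp_k / Q_E


def threshold_voltage(vt0: float, gamma: float, vbs: float) -> float:
    """Linearized body effect: V_T = V_T0 - gamma * V_BS."""
    return vt0 - gamma * vbs


def leakage_ratio(delta_vt: float, n: float = 1.5, temp_k: float = 300.0) -> float:
    """Factor by which I_SUB grows when V_T drops by delta_vt volts,
    from I_SUB ~ exp(-V_T / (n * kT/q)) with all other terms held fixed."""
    return math.exp(delta_vt / (n * thermal_voltage(temp_k)))


def subthreshold_swing(n: float = 1.5, temp_k: float = 300.0) -> float:
    """Gate-voltage change (V) per decade of I_SUB: n * (kT/q) * ln(10)."""
    return n * thermal_voltage(temp_k) * math.log(10)


if __name__ == "__main__":
    print(f"kT/q at 300 K: {thermal_voltage(300.0) * 1e3:.1f} mV")
    print(f"I_SUB increase for a 100 mV lower V_T: {leakage_ratio(0.1):.1f}x")
    # Forward body bias (V_BS > 0) lowers V_T; illustrative numbers only.
    vt = threshold_voltage(vt0=0.35, gamma=0.2, vbs=0.3)
    print(f"V_T with forward body bias: {vt:.2f} V, "
          f"leakage x{leakage_ratio(0.35 - vt):.1f}")
```

At 300 K, `thermal_voltage` gives about 25.9 mV, matching the slide; with n = 1.5, lowering V_T by 100 mV raises I_SUB by roughly 13x, which is why V_T cannot simply track Vdd down.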

  9. [LPMM, 2007] Trend • Scaling trend • Vdd: 1.2V (90nm) → 1.0V (45nm; 1.2V in LP) • # of transistors: 4x at 45nm w.r.t. 90nm • ΔVT: ~ΔVdd in ratio • Compute density $\propto k^{3}$ • Leakage power density $\propto k^{2.7}$ • Active power density $\propto k^{1.9}$ • $I_{SUB} \propto e^{-V_T}$

  10. Power Density Comparison • High-end CPU: ~100W/cm² vs. human brain: 15mW/cm³

  11. [Rabaey, 2009] Human Brain

  12. Low Power Design Approaches • Switching power • Less switching (less work, less power consumption) • Removing redundant work • Clock gating → dynamic power management • Voltage scaling • With more area: parallelization • With more delay: low frequency operation and pipelining • With dynamic voltage scaling • Leakage power • Power gating → dynamic power management • Exploiting the stacking effect • Multi-Vth/Tox/Lg design • Body bias • Cooling → dynamic thermal management
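The "with more area: parallelization" trade-off can be made concrete with the standard switching power expression P = α·C·Vdd²·f. The sketch below compares an n-way parallel datapath, each copy clocked at f/n so throughput is unchanged, against the original. The capacitance overhead and the reduced-Vdd ratio are inputs the designer must obtain from circuit data; the values in the example are hypothetical.

```python
def switching_power(alpha: float, cap: float, vdd: float, freq: float) -> float:
    """Switching power P = alpha * C * Vdd^2 * f (consistent units assumed)."""
    return alpha * cap * vdd ** 2 * freq


def parallel_power_ratio(n_units: int, cap_overhead: float, v_scale: float) -> float:
    """Power of an n-way parallel datapath relative to the original one.

    Each of the n copies runs at f/n, so total throughput is unchanged.
    cap_overhead: total switched capacitance relative to the original (about n,
        or more once muxing and wiring are included).
    v_scale: reduced Vdd relative to the original Vdd that the slower clock
        allows. Both are inputs taken from circuit data, not derived here.
    """
    original = switching_power(alpha=1.0, cap=1.0, vdd=1.0, freq=1.0)
    parallel = switching_power(alpha=1.0, cap=cap_overhead, vdd=v_scale,
                               freq=1.0 / n_units)
    return parallel / original


if __name__ == "__main__":
    # Hypothetical case: two copies, ideal 2x capacitance, Vdd lowered to 60%.
    print(f"{parallel_power_ratio(2, 2.0, 0.6):.2f}")
    # Without lowering Vdd, parallelization alone saves nothing.
    print(f"{parallel_power_ratio(2, 2.0, 1.0):.2f}")
```

The second case shows that the saving comes entirely from the Vdd² term: doubling capacitance while halving frequency leaves power unchanged unless the voltage also drops.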

  13. Dynamic Power Management: Chip-level View [Figure: timeline of the CPU, 3D, Audio, and Video blocks turning on and off over 30 minutes (t in min) as mpeg4, game, and mp3 tasks start (run) and finish (done)]

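  14. Dynamic Voltage & Frequency Scaling • Principle: exploiting $P \propto V^2 f$ and $V \propto f$ (so $P \propto f^3$) • Case 1: run at frequency F (Vdd ∝ F), finish the work at D/2, then idle (ideally at zero power) until the deadline D: $E \propto F^3 \cdot D/2$ • Case 2: run at F/2 (Vdd halved) and finish exactly at the deadline D: $E \propto (F/2)^3 \cdot D = F^3 \cdot D/8$ • Energy reduction by 75%! • Key: workload prediction!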
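The two cases on this slide can be reproduced numerically. The sketch below assumes Vdd scales linearly with frequency, so the dynamic energy per cycle (∝ C·Vdd²) scales with f², and it ignores leakage and idle power; all quantities are normalized.

```python
def dynamic_energy(cycles: float, freq: float) -> float:
    """Normalized dynamic energy of executing `cycles` clock cycles at `freq`.

    Assumes Vdd is proportional to frequency, so energy per cycle (~ C * Vdd^2)
    scales with freq^2. Leakage and idle power are ignored.
    """
    return cycles * freq ** 2


def min_frequency(cycles: float, deadline: float) -> float:
    """Lowest frequency that still finishes `cycles` by `deadline`."""
    return cycles / deadline


F, D = 1.0, 1.0     # normalized peak frequency and deadline
work = F * D / 2    # workload that takes D/2 at full speed

e_fast = dynamic_energy(work, F)                       # run at F, then idle
e_slow = dynamic_energy(work, min_frequency(work, D))  # run at F/2 until D
reduction = 1 - e_slow / e_fast
print(f"energy at F: {e_fast}, at F/2: {e_slow}, reduction: {reduction:.0%}")
# prints: energy at F: 0.5, at F/2: 0.125, reduction: 75%
```

Note that `min_frequency` needs the cycle count up front; this is exactly why the slide stresses workload prediction.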

  15. [E. Chung, 2009] Dynamic Thermal Management • Reactive DTM • At high temperature, reduce work! • Fewer instructions, lower clock/Vdd, etc. • These techniques are triggered when a thermal limit is reached
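A reactive DTM policy reduces to a threshold check on each temperature sample. The sketch below steps a clock level down when the limit is reached and back up once the chip has cooled. The hysteresis margin, the clock levels, and the class name are illustrative assumptions, not part of the slide.

```python
from dataclasses import dataclass, field


@dataclass
class ReactiveDTM:
    """Reactive DTM sketch: throttle only once the thermal limit is reached.

    levels: available clock levels (e.g., MHz), slowest first.
    t_limit: temperature (deg C) that triggers throttling.
    hysteresis: margin below t_limit required before speeding up again
        (an added assumption to avoid oscillating around the limit).
    """
    levels: list
    t_limit: float
    hysteresis: float = 5.0
    index: int = field(init=False)

    def __post_init__(self):
        self.index = len(self.levels) - 1  # start at the fastest level

    def step(self, temp_c: float):
        """Return the clock level to use after observing temperature temp_c."""
        if temp_c >= self.t_limit and self.index > 0:
            self.index -= 1  # too hot: lower clock (and Vdd) to reduce work
        elif temp_c < self.t_limit - self.hysteresis and self.index < len(self.levels) - 1:
            self.index += 1  # cooled down: restore performance step by step
        return self.levels[self.index]


if __name__ == "__main__":
    dtm = ReactiveDTM(levels=[400, 800, 1200], t_limit=85.0)
    for temp in [70, 86, 88, 84, 79, 75]:
        print(temp, dtm.step(temp))
```

For the trace above the controller selects 1200, 800, 400, 400, 800, and 1200 MHz: it only reacts after the limit is crossed, which is what distinguishes reactive DTM from predictive schemes.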

  16. Agenda • Introduction • Low power design issues • Solutions overview • Clock/power gating, DVFS (dynamic voltage & frequency scaling), … • Power Consumption Characteristics and Estimation • Processor and hardware blocks • System-level and RTL Low Power Design Methods • Dynamic power management • System-level clock/power gating management • RTL design for system-level clock/power gating • Dynamic voltage scaling • DRAM memory and battery • Recent Trends in Low Power Design • EPI (energy per instruction) throttling • Summary

  17. Why Power Estimation? • To evaluate the effects of our designs on power consumption • Power estimation at various design levels • Software code optimization • RTL block design • Chip architecture (e.g., bus) design • Chip implementation (e.g., gate-level netlist or layout) • Flow: Design → Power estimation → Power OK? If yes, done; if not, optimize and estimate again

  18. Power Consumption Characteristics • Processor • E.g., Cortex A8 (1GHz) targets 300mW → 0.3mW/MHz • Hardware IPs • E.g., Video codec → 100~200mW for 1080p @ 30fps • DDR DRAM • E.g., LPDDR2 800 → 100~300mW

  19. Processor Power Estimation • Processor power model, ARM926 case • Two power states for the core: Active and Idle • A more refined model for the cache • Hit/miss, SEQ/NS, fill buffer, …

  20. Power Consumption Characteristics of Cache • Sequential (SEQ) and non-sequential (NS) accesses have different power consumption levels • Power consumption • SEQ with fill buffer hit < SEQ < NS
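The model on the last two slides combines a state-based core model with an event-based cache model. The sketch below shows that structure. Every power and energy number in it is hypothetical, chosen only to respect the slide's ordering (SEQ with fill-buffer hit < SEQ < NS); real values would come from the characterization flow.

```python
# Hypothetical characterization results (NOT measured ARM926 data).
CORE_POWER_MW = {"active": 60.0, "idle": 5.0}
# Per-access cache energy (nJ), ordered as on the slide:
# SEQ with fill-buffer hit < SEQ < NS; misses cost the most.
CACHE_ENERGY_NJ = {"seq_fillbuf": 0.05, "seq": 0.10, "ns": 0.25, "miss": 1.00}


def estimate_energy_mj(state_time_s: dict, cache_accesses: dict) -> float:
    """State-based core energy plus event-based cache energy, in mJ."""
    core_mj = sum(CORE_POWER_MW[s] * t for s, t in state_time_s.items())  # mW*s = mJ
    cache_nj = sum(CACHE_ENERGY_NJ[k] * n for k, n in cache_accesses.items())
    return core_mj + cache_nj * 1e-6  # nJ -> mJ


def average_power_mw(state_time_s: dict, cache_accesses: dict) -> float:
    """Average power over the profiled interval (mJ / s = mW)."""
    return estimate_energy_mj(state_time_s, cache_accesses) / sum(state_time_s.values())


if __name__ == "__main__":
    # Residency and access counts would come from a simulator or trace (assumed).
    states = {"active": 0.8, "idle": 0.2}
    accesses = {"seq_fillbuf": 2_000_000, "seq": 3_000_000, "ns": 1_000_000, "miss": 100_000}
    print(f"{average_power_mw(states, accesses):.2f} mW")
```

Keeping the core and cache terms separate lets the cache table be refined (hit/miss, SEQ/NS, fill buffer) without touching the two-state core model.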

  21. Cache Power Consumption Example

  22. Processor Power Characterization Flow

  23. Power Estimation Accuracy • 93~98% w.r.t. post-layout gate-level power estimation

  24. Power Consumption Characteristics • Processor • E.g., Cortex A8 (1GHz) targets 300mW → 0.3mW/MHz • Hardware IPs • E.g., Video codec → 50~100mW for 1080p @ 30fps • DDR DRAM • E.g., LPDDR2 800 → 50~100mW

  25. Issues in RTL Power Estimation • Commercial solutions, e.g., PowerTheater • Estimation accuracy • RTL power estimation sometimes fails to account for the clock network and aggressive low power design methods, e.g., power/clock gating, multi-Vdd/Vth optimization • Estimation speed • Some commercial RTL power estimators run too slowly • Thus, a simple model is frequently used instead • Power consumption = toggle rate * power/gate * gate count • Toggle rate: 5~15%

  26. Chip-level Scenario-based Power Estimation • Video playback • Codec (10M gates, 10% toggle rate) • LCD (1M gates, 10% toggle rate) • ARM@200MHz (0.3mW/MHz) • SDRAM@200MHz (50mW) • Total power = 10% * 100mW/MG * (10MG + 1MG) + 0.3mW/MHz * 200MHz + 50mW = 110mW + 60mW + 50mW = 220mW • Video recording • Audio play • Game
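The simple toggle-rate model and the scenario sum can be combined into a small estimator. In the sketch below, 100mW/MG is read as the power of one million gates at a 100% toggle rate (an interpretation of the slide's figure); the `Scenario` class is a hypothetical helper, and the other scenarios (recording, audio, game) would just be further instances.

```python
from dataclasses import dataclass

# Figures from the video playback example on the slide.
MW_PER_MGATE = 100.0   # read as: 1M gates at a 100% toggle rate (assumption)
CPU_MW_PER_MHZ = 0.3   # ARM core


@dataclass
class Scenario:
    name: str
    logic_mgates: float   # gate count of the active HW IPs, in millions
    toggle_rate: float    # average toggle rate of that logic (typically 5~15%)
    cpu_mhz: float
    dram_mw: float


def toggle_model_mw(toggle_rate: float, mgates: float) -> float:
    """Simple RTL estimate: toggle rate * power/gate * gate count."""
    return toggle_rate * MW_PER_MGATE * mgates


def scenario_power_mw(s: Scenario) -> float:
    logic = toggle_model_mw(s.toggle_rate, s.logic_mgates)
    cpu = CPU_MW_PER_MHZ * s.cpu_mhz
    return logic + cpu + s.dram_mw


if __name__ == "__main__":
    playback = Scenario("video playback", logic_mgates=10 + 1, toggle_rate=0.10,
                        cpu_mhz=200, dram_mw=50)
    print(f"{playback.name}: {scenario_power_mw(playback):.0f} mW")
    # prints: video playback: 220 mW
```

With the slide's figures the three terms are 110 mW (codec + LCD), 60 mW (ARM), and 50 mW (SDRAM), for 220 mW in total.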

  27. Agenda • Introduction • Low power design issues • Solutions overview • Clock/power gating, DVFS (dynamic voltage & frequency scaling), … • Power Consumption Characteristics and Estimation • Processor and hardware blocks • System-level and RTL Low Power Design Methods • Dynamic power management • System-level clock/power gating management • RTL design for system-level clock/power gating • Dynamic voltage scaling • DRAM memory and battery • Recent Trends in Low Power Design • EPI (energy per instruction) throttling • Summary

  28. Entering and Exiting Power Modes (States)

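  29. Overhead and Break-even Time in Power State Transition • Power State & Transitions [Figure: timeline (t1~t4) of workload requests (Req.), device activity (Busy → Idle → Busy), and power state (Working → Sleeping → Working, with shutdown time Tsd and wakeup time Twu), with the corresponding power]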

  30. Overhead and Break-even Time in Power State Transition • Overhead of shutdown and wakeup • Break-even time (Tbe) • Tbe is the idle period length at which the transition energy overhead plus the energy spent in the low-power state equals the energy of staying in the high-power state, and Tbe ≥ Tsd + Twu • E.g., HDD suffers from a high transition cost due to spin-up
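The break-even condition can be solved directly. Writing E_tr for the total shutdown + wakeup energy and T_tr = Tsd + Twu, sleeping through an idle period T costs E_tr + P_low·(T − T_tr), while staying awake costs P_high·T; equating the two gives Tbe, which is then clamped to at least T_tr. The disk-like numbers in the example are hypothetical.

```python
def break_even_time(p_high: float, p_low: float, e_tr: float,
                    t_sd: float, t_wu: float) -> float:
    """Shortest idle period (s) for which entering the low-power state saves energy.

    p_high: power (W) of staying in the high-power state while idle
    p_low:  power (W) in the low-power state
    e_tr:   total energy (J) of the shutdown + wakeup transitions
    t_sd, t_wu: shutdown and wakeup times (s)
    """
    if p_high <= p_low:
        raise ValueError("the low-power state must draw less power")
    t_tr = t_sd + t_wu
    # Solve e_tr + p_low * (t_be - t_tr) = p_high * t_be for t_be ...
    t_energy = (e_tr - p_low * t_tr) / (p_high - p_low)
    # ... and never shorter than the transition itself.
    return max(t_energy, t_tr)


if __name__ == "__main__":
    # Hypothetical disk-like numbers: 2 W spinning idle, 0.5 W standby,
    # 6 J for spin-down plus spin-up, 0.5 s + 1.5 s of transition time.
    tbe = break_even_time(p_high=2.0, p_low=0.5, e_tr=6.0, t_sd=0.5, t_wu=1.5)
    print(f"Tbe = {tbe:.2f} s")
```

A large transition energy (as with HDD spin-up) pushes Tbe up, so only long idle periods are worth exploiting.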

  31. How to Achieve Effective Dynamic Power Management? • Rule of thumb in DPM • #1 Turn off unused blocks as soon as possible • #2 The gain needs to be greater than the power state transition overhead (in energy and runtime) • How to achieve effective DPM? • Fine-grained power states (or power/clock control regions) • Low power state transition overhead • If the transition time point is explicit, then use direct control (mostly by the OS) • Otherwise, we need good idle period prediction policies

  32. IBM HDD: Conventional (before ’99) Power States and DPM • Idle: Still spinning, but servo stopped, head parked, etc., 40ms wakeup time • Standby: Not spinning, most of the electronics off, 1.5 ~ 5 sec wakeup time • Sleep: Entered by a specific command (by the OS), ~0.1W

  33. DPM with More Power States • With only a few power states • Only long idle periods can be exploited • With more fine-grained power states • Power state transition overhead tends to be reduced • Shorter idle periods can be exploited!

  34. IBM Paper: DPM with Fine-Grained Power States • Performance idle: No entry delay. Some electronics off • Fast idle: 40ms delay, head parked, servo control off • Low power idle: 400ms delay, head unloaded • Standby: conventional Standby, but can be self-managed

  35. Impact of Fine-grain Power States • Difference in energy consumption (40ms delay): 30.8J vs. 23J → 25% reduction! • How to create fine-grained power states? • Clock/power gating • Voltage and frequency scaling • Etc. [Figure: power vs. time (0~15) for the two cases compared, with power levels of 2.2 and 0.9]

  36. Coarse-Grained Clock Gating: Clock Domain • Clock gating targets the power consumed by the clock network while subsystems are idle • Clock gating is one of the easiest ways to obtain low power states • Wakeup overhead is a few clock cycles
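At the RTL level, a gated clock is commonly generated with a latch-based integrated clock gating (ICG) cell; this implementation detail is not on the slide. The sketch below models waveforms as sampled lists (4 samples per clock period, an assumption of the model) and shows why the enable latch matters: an enable change during the high phase chops a clock pulse when the clock is simply ANDed, but not when the enable is latched.

```python
def latch_based_gate(clk, en):
    """Latch-based clock gate (ICG), sampled waveform model.

    The enable latch is transparent while clk is low and holds while clk is
    high, so an enable change during the high phase cannot chop a clock pulse.
    gclk = clk AND latched_enable.
    """
    latched, gclk = 0, []
    for c, e in zip(clk, en):
        if c == 0:
            latched = e  # transparent phase: follow the enable
        gclk.append(c & latched)
    return gclk


def and_only_gate(clk, en):
    """Naive gating without the latch, for comparison: gclk = clk AND en."""
    return [c & e for c, e in zip(clk, en)]


if __name__ == "__main__":
    clk = [0, 0, 1, 1] * 3   # 4 samples per clock period
    en = [0] * 7 + [1] * 5   # enable rises in the middle of a high phase
    print("and-only:", and_only_gate(clk, en))     # 1-sample glitch at index 7
    print("latched: ", latch_based_gate(clk, en))  # only full pulses
```

The truncated pulse from the AND-only gate could clock some flip-flops and not others, which is why gated clocks are normally built from ICG cells.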

  37. [Rabaey, 2009] Clock Gating Effect

  38. Power Gating and Power Domain • Power gating targets the leakage power that is still consumed after the clock is gated • Power domain: the unit of power gating, often called a voltage island

  39. [LPMM, 2007] Power Gating • Clock gating still leaves leakage power consumption • Power gating reduces leakage power consumption at the cost of power on/off transition overhead

  40. [LPMM, 2007] Effects of Clock & Power Gating 90nm generic ARM926-based chip

  41. Power States Related to Power Gating • Active state • Power is applied to logic gates as well as memory (including FFs) • Retention • Power is applied only to (some) memory • Vretention (e.g., 0.6V) < Vdd (1.1V) • Used to reduce the transition delay to (re)store states on power-on/off transitions

  42. Putting It All Together • There can be multiple clock domains inside a power domain [Figure: a power domain containing multiple clock domains]

  43. Dynamic Power Management: Architectural View [Figure: timeline of the CPU, 3D, Audio, and Video blocks turning on and off over 30 minutes (t in min) as mpeg4, game, and mp3 tasks start (run) and finish (done)]

  44. Case Study of Dynamic Power Gating • Linux 2.6.x on a high-performance mobile application processor (AP) • User space: (power-aware) applications; applications can control clock/power (optional) • Kernel space: DPM daemon (kdpmd, HW independent); device drivers notify the daemon; the clock HAL and power domain HAL perform clock/power control (HW dependent) • Hardware: power domain control registers, clock control registers

  45. Clock/Power On Sequence [Figure: codec, IP2, IP3, and IP4 in a power domain; sequence among Application, Device Driver, DPM Daemon, Clock/Power HAL, and Power Domain] • open(codec) is called • The device driver notifies the daemon of the codec usage and waits for an event from the daemon • The power domain is turned on (power domain wake-up delay: power on + state restore) • The event is sent and the codec’s clock is turned on • open() returns, and the application starts execution • When the codec IP needs to start, if the power domain has been off, the power domain is turned on first. Sungjoo Yoo, 2007
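The ordering in this sequence (power domain first, then the IP clock, then `open()` returns) can be captured in a small user-level model. Everything below is hypothetical: `PowerDomain`, `DPMDaemon`, `open_device`, and the domain name `media` are illustrative stand-ins for the kdpmd daemon and its HALs, not the kernel code from the case study. Reference counting is an added assumption so that the domain powers off only when its last IP stops.

```python
class PowerDomain:
    """Hypothetical power domain shared by several IPs (e.g., codec, IP2~IP4)."""

    def __init__(self, name: str, log: list):
        self.name, self.log = name, log
        self.on, self.users = False, 0

    def acquire(self):
        if not self.on:  # wake-up delay: power on + state restore
            self.log.append(f"power on {self.name} + restore state")
            self.on = True
        self.users += 1

    def release(self):
        self.users -= 1
        if self.users == 0:  # last user gone: save state and cut power
            self.log.append(f"save state + power off {self.name}")
            self.on = False


class DPMDaemon:
    """Hypothetical stand-in for kdpmd driving the clock/power HALs."""

    def __init__(self, log: list):
        self.log = log

    def start(self, ip: str, domain: PowerDomain):
        domain.acquire()  # power domain first, if it was off
        self.log.append(f"clock on {ip}")
        self.log.append(f"event to {ip} driver")

    def stop(self, ip: str, domain: PowerDomain):
        self.log.append(f"clock off {ip}")
        domain.release()


def open_device(ip: str, domain: PowerDomain, daemon: DPMDaemon, log: list):
    """Driver-side open(): notify the daemon and wait for its event."""
    log.append(f"open({ip}) called")
    daemon.start(ip, domain)
    log.append(f"open({ip}) returns")


if __name__ == "__main__":
    log = []
    media = PowerDomain("media", log)  # hypothetical domain name
    kdpmd = DPMDaemon(log)
    open_device("codec", media, kdpmd, log)
    open_device("IP2", media, kdpmd, log)  # domain already on: clock only
    kdpmd.stop("IP2", media)
    kdpmd.stop("codec", media)             # last user: domain powers off
    print("\n".join(log))
```

Opening a second IP in an already-powered domain skips the power-on step, so only the first `open()` pays the domain wake-up delay.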

  46. How to Achieve Effective Dynamic Power Management? • Rule of thumb in DPM • #1 Turn off unused blocks as soon as possible • #2 The gain needs to be greater than the power state transition overhead (in energy and runtime) • How to achieve effective DPM? • Fine-grained power states (or power/clock control regions) • Low power state transition overhead • If the transition time point is explicit, then use direct control (mostly by the OS) • Otherwise, we need good idle period prediction policies

  47. Idle Period Prediction • If idleness is not explicit, the idle period must be predicted when a new idle period starts • E.g., when an SSD has no requests from the host CPU, when can it enter a low power state? → Predict the future idle period, then decide whether or not to enter the low power state

  48. An Illustration of the Time-Out Method • A widely used simple method • After an idle period of TO time units, enter a low power state • Energy reduction vs. performance overhead • A fixed (large) TO may lose opportunities for further energy reduction • A small TO will cause too frequent power on/off transitions, so we may suffer a large performance overhead due to the power on/off transition delay
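The time-out trade-off can be evaluated on a trace of idle periods. The sketch below charges on-state power while waiting for the time-out, sleep power afterwards, plus a fixed transition energy and wakeup latency for every shutdown. Folding the shutdown time into the transition energy is a simplifying assumption, and the trace and power numbers are hypothetical.

```python
def timeout_policy(idle_periods, timeout, p_on, p_sleep, e_tr, t_wu):
    """Evaluate a fixed time-out DPM policy over a trace of idle periods.

    After `timeout` seconds of idleness the device sleeps; when the next
    request arrives it must wake up, costing `e_tr` joules and `t_wu`
    seconds of added latency. Shutdown time is folded into e_tr (assumption).
    Returns (energy_J, transitions, added_delay_s).
    """
    energy, transitions, delay = 0.0, 0, 0.0
    for idle in idle_periods:
        if idle > timeout:
            energy += p_on * timeout + p_sleep * (idle - timeout) + e_tr
            transitions += 1
            delay += t_wu
        else:
            energy += p_on * idle  # never reached the time-out: stayed on
    return energy, transitions, delay


if __name__ == "__main__":
    idle = [0.5, 3.0, 10.0, 1.2, 20.0]  # hypothetical idle periods (s)
    params = dict(p_on=2.0, p_sleep=0.5, e_tr=6.0, t_wu=1.5)
    for to in (1.0, 5.0, float("inf")):
        e, n, d = timeout_policy(idle, to, **params)
        print(f"TO={to}: energy={e:.1f} J, transitions={n}, added delay={d:.1f} s")
```

On this trace a 1 s time-out uses the least energy (48.1 J vs. 51.4 J for 5 s and 69.4 J for never sleeping) but doubles the number of wakeups; the 1.2 s idle period even sleeps for only 0.2 s at the full transition cost, illustrating the "too small TO" problem.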

  49. Idle Period Detection/Prediction • Prediction of start time and duration • Start time • Time-based • Fixed, variable (e.g., day or night), or adaptive (e.g., $p[n+1] = a \cdot I[n] + (1-a) \cdot p[n]$, where $I[n]$ is the n-th observed value and $p[n]$ its prediction) • Rate-based • Declare an idle period when the estimated request rate falls below a threshold • Moving average (over a fixed time period) • Event window (moving average over the n previous events) • Duration • Infinity or fixed • (Filtered) moving average (higher than a threshold, i.e., break-even time) • (Filtered) backoff: e.g., geometric increase/arithmetic decrease • Autocorrelation → joint probability of event arrival rates (periods)
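The adaptive formula p[n+1] = a·I[n] + (1−a)·p[n] is an exponential moving average. The sketch below applies it to a trace of idle-period lengths (one possible use of the formula, assumed here) and enters the low-power state only when the prediction exceeds the break-even time. The smoothing factor a, the initial guess p0, and the trace are hypothetical.

```python
def exp_average_predictions(idle_periods, a=0.5, p0=0.0):
    """Apply p[n+1] = a*I[n] + (1-a)*p[n] to a trace of idle-period lengths.

    Returns [p[0], p[1], ..., p[N]]: p[n] is the prediction available when
    the n-th idle period starts; the last entry predicts the next, unseen one.
    """
    preds = [p0]
    for actual in idle_periods:
        preds.append(a * actual + (1 - a) * preds[-1])
    return preds


def sleep_decisions(preds, t_be):
    """Enter the low-power state only if the predicted idle period exceeds Tbe."""
    return [p > t_be for p in preds]


if __name__ == "__main__":
    trace = [4.0, 4.0, 8.0]                   # hypothetical idle periods (s)
    preds = exp_average_predictions(trace, a=0.5)
    print(preds)                              # [0.0, 2.0, 3.0, 5.5]
    print(sleep_decisions(preds, t_be=3.33))  # [False, False, False, True]
```

A larger `a` reacts faster to changes in the workload; a smaller `a` filters out occasional outliers.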

  50. Idle Period Detection/Prediction (Cont’d) • Hierarchical prediction, e.g., multiple experts • Takes in the results from multiple predictors and gives one final prediction • Low-pass, e.g., discard results shorter than a threshold, i.e., break-even time • Quality, e.g., choose the result from the predictor with the highest accuracy [Helmbold, 2000] • Stochastic prediction [Benini, 2000] • Good for stationary patterns that can be modeled in terms of rate
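The "quality" and "low-pass" rules can be sketched as a selector over a few simple predictors. The three experts below are hypothetical examples, and the cited work may combine experts differently (e.g., by weighting them); this sketch uses only the pick-the-most-accurate rule listed on the slide. Each expert is scored by replaying the history, and a winning prediction shorter than the break-even time is reported as 0 (stay on).

```python
from statistics import mean


# Hypothetical "experts": each maps past idle lengths to a prediction.
def last_value(history):
    return history[-1]


def window_mean(history, n=3):
    return mean(history[-n:])


def fixed_guess(history, value=2.0):
    return value


def mean_abs_error(expert, history):
    """Replay the history: predict each period from the ones before it."""
    return mean(abs(expert(history[:k]) - history[k]) for k in range(1, len(history)))


def hierarchical_predict(history, experts, t_be):
    """Quality: follow the expert with the lowest past error.
    Low-pass: a prediction shorter than Tbe is reported as 0 (stay on)."""
    best = min(experts, key=lambda e: mean_abs_error(e, history))
    prediction = best(history)
    return (prediction if prediction >= t_be else 0.0), best.__name__


if __name__ == "__main__":
    history = [5.0, 6.0, 5.0, 6.0, 5.0, 6.0]  # hypothetical idle periods (s)
    experts = [last_value, window_mean, fixed_guess]
    print(hierarchical_predict(history, experts, t_be=3.33))
```

On the alternating trace above, the windowed mean has the lowest replay error (0.7 s vs. 1.0 s for the last value), so its prediction of about 5.67 s is used; since that exceeds Tbe, the device would be put to sleep.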
