
System-level Power Estimation and Optimization

System-level Power Estimation and Optimization. 2006.09.03 Chong-Min Kyung KAIST. Contents. Introduction System-level Power Estimation System-level Power Optimization. Introduction. Power classification Static power ≈ leakage power Dynamic power Switching power Short-circuit power


Presentation Transcript


  1. System-level Power Estimation and Optimization 2006.09.03 Chong-Min Kyung KAIST

  2. Contents • Introduction • System-level Power Estimation • System-level Power Optimization

  3. Introduction • Power classification • Static power ≈ leakage power • Dynamic power • Switching power • Short-circuit power • Glitch power

  4. Introduction • Power calculation • Static power • $P_{total\_leakage} = \sum P_{cell\_leakage}$ • Dynamic power • $P_{internal} = \mathrm{Func}(C_{load}, TR_{input,output})$ • TR: toggle rate • $P_{glitch} = V_{dd}^2 \sum (C_{load\_net} \cdot f_{glitch} \cdot \tau)$ • $f_{glitch}$: the glitch frequency • $\tau$: a factor for the glitch width • $P_{switching} = \frac{1}{2} V_{dd}^2 \sum (C_{load} \cdot TR_{output})$ • Ratio • > 0.1 μm: switching power is 70~90% • < 0.07 μm: leakage power is > 50% • Data-intensive applications • Switching power is the dominant factor.
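
The leakage and switching formulas on this slide can be tried out with a short sketch. The function names, the per-net tuple layout, and the example values are illustrative assumptions, and the toggle rate is taken to be in toggles per second so that the result comes out in watts.

```python
def leakage_power(cell_leakages_w):
    """P_total_leakage: sum of the per-cell leakage powers (watts)."""
    return sum(cell_leakages_w)


def switching_power(vdd, nets):
    """P_switching = 1/2 * Vdd^2 * sum(C_load * TR_output).

    nets is an iterable of (c_load_farads, toggle_rate_hz) pairs.
    Treating TR as toggles per second is an assumption of this sketch.
    """
    return 0.5 * vdd ** 2 * sum(c * tr for c, tr in nets)


def glitch_power(vdd, nets):
    """P_glitch = Vdd^2 * sum(C_load_net * f_glitch * tau).

    nets is an iterable of (c_load_farads, f_glitch_hz, tau) triples.
    """
    return vdd ** 2 * sum(c * f * tau for c, f, tau in nets)


if __name__ == "__main__":
    # Hypothetical two-net design at 1.2 V.
    nets = [(10e-15, 100e6), (20e-15, 50e6)]
    print(switching_power(1.2, nets))
```

Keeping each term as its own function mirrors the slide's classification, so the terms can be summed or compared per technology node.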

  5. Introduction • Opportunities for power reduction • For low power design 1. Power model generation 2. Power estimation 3. Power optimization [Figure labels: academic research, commercial tool]

  6. System-level Power Estimation

  7. Contents • Power Model Generation • Analytical Method • Empirical Method • System-level Power Estimation • Hardware Power Estimation • Software Power Estimation • Bus Power Estimation

  8. Power Model Generation 1. Analytical method • Use average values of design parameters, without considering different circuit styles, clock strategies, or layout techniques • Average capacitance, equivalent gate count, number of primary inputs, etc. • Mainly used for behavior-level power estimation • when no technology library or implementation information is available • Very low accuracy 2. Empirical method • Use parameters measured from existing implementations 2-1. Fixed-activity model 2-2. Activity-sensitive model

  9. Power Model Generation 2-1. Fixed-activity model • Use the data sheet of a specific hardware block • $P_{processor} = C_{processor} \times V_{DD}^2 \times freq$ • $C_{processor} = P_{data\_sheet} / (V_{data\_sheet}^2 \times freq_{data\_sheet})$ • Low accuracy • Mainly used for coarse-grained system-level power estimation 2-2. Activity-sensitive model • Use signal activity or its statistics, which depend on the testbench • Transition-sensitive model • Power model is a Look-Up Table (LUT). • Very high accuracy • Statistical activity model • Power model is a LUT or an equation. • High accuracy [Table: transition-sensitive LUT indexed by (previous input vector, current input vector), each an n-bit vector; each entry is a switch capacitance in pF ($Cap_0$, $Cap_1$, …)]
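
A minimal sketch of the fixed-activity model: derive the effective capacitance from data-sheet figures, then rescale to another operating point. The function names and the data-sheet numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def effective_capacitance(p_data_sheet, v_data_sheet, freq_data_sheet):
    """C_processor = P_data_sheet / (V_data_sheet^2 * freq_data_sheet)."""
    return p_data_sheet / (v_data_sheet ** 2 * freq_data_sheet)


def processor_power(c_processor, vdd, freq):
    """P_processor = C_processor * VDD^2 * freq."""
    return c_processor * vdd ** 2 * freq


if __name__ == "__main__":
    # Hypothetical data sheet: 400 mW at 1.5 V and 200 MHz.
    c_eff = effective_capacitance(0.400, 1.5, 200e6)
    # Estimate at 1.2 V and 100 MHz: 0.4 * (1.2/1.5)^2 * (100/200) = 0.128 W.
    print(processor_power(c_eff, 1.2, 100e6))
```

Because activity is folded into a single constant, the estimate cannot react to the workload, which is why the slide rates this model as low accuracy.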

  10. Macro Modeling Method • Macro modeling method • Raise the abstraction of the power model by characterizing macro cells • Mainly used to reduce power model complexity in activity-sensitive power model generation • Macro cell • 32-bit adder, multiplier, MUX, etc. • Reduced computation complexity at the cost of accuracy • Macro cell characterization • Synthesize the macro cell with a basic cell library • Estimate the power of the macro cell with various testbenches • Generate the power model and reduce its complexity • This concept can be used to raise the abstraction of the power model in hardware-level or software-level power estimation.

  11. Macro Modeling Method • Power model of macro modeling method • Statistical activity model • LUT-based model • For each bus component, build 3-D LUT (with axes of Pin, Din, Dout) • Fill power value at each point (Pin, Din, Dout) • Requires a lot of memory space • Equation-based model • Build a polynomial approximating power consumption. • From a large number of input patterns, perform analysis to determine the coefficients. • Requires little memory space • Pin: average input signal probability • Din : average input switching activity • Dout: average output zero delay switching activity
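
The LUT-based variant can be sketched as a table keyed by quantized (Pin, Din, Dout) values. The class name, the uniform binning, and the averaging of samples that fall into the same cell are assumptions of this sketch, not details given on the slide.

```python
class LutPowerModel:
    """3-D look-up table over (Pin, Din, Dout), each assumed to lie in [0, 1]."""

    def __init__(self, bins):
        self.bins = bins
        self._sum = {}    # cell -> accumulated power
        self._count = {}  # cell -> number of samples

    def _index(self, x):
        # Uniform quantization; x == 1.0 is clamped into the last bin.
        return min(int(x * self.bins), self.bins - 1)

    def _cell(self, p_in, d_in, d_out):
        return (self._index(p_in), self._index(d_in), self._index(d_out))

    def characterize(self, p_in, d_in, d_out, power):
        """Record one measured or simulated power sample."""
        cell = self._cell(p_in, d_in, d_out)
        self._sum[cell] = self._sum.get(cell, 0.0) + power
        self._count[cell] = self._count.get(cell, 0) + 1

    def estimate(self, p_in, d_in, d_out):
        """Return the mean power of the matching cell (KeyError if empty)."""
        cell = self._cell(p_in, d_in, d_out)
        return self._sum[cell] / self._count[cell]


if __name__ == "__main__":
    model = LutPowerModel(bins=4)
    model.characterize(0.50, 0.20, 0.10, 2.0)
    model.characterize(0.55, 0.22, 0.12, 4.0)
    print(model.estimate(0.60, 0.10, 0.05))
```

A dense table with b bins per axis holds b³ entries per component, which illustrates the memory cost the slide mentions; the equation-based model trades that storage for a handful of fitted coefficients.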

  12. System-level Power Estimation • Estimation speed and power model • Trade-off between estimation speed and accuracy of power model • Abstraction of power estimation • System-level power estimation • Software-level power estimation • Hardware-level power estimation • Behavior-level, RT-level, gate-level, circuit-level [Figure labels: relative power results, absolute power results]

  13. System-level Power Estimation • System-level power estimation • Relative value of power consumption is important. • Objective • Power profiling and design exploration • System-level power estimation is composed of 1. Hardware power estimation 2. Software power estimation in processor 3. Bus power estimation

  14. Hardware Power Estimation • RT-level power estimation • Dynamic simulation-based power estimation with coarse-grained net model from power macro model database and testbench

  15. Hardware Power Estimation Tool • There are some commercial tools for hardware power estimation • RT-level • Synopsys Power Compiler™ • Gate-level • Synopsys PrimePower™, Synopsys Power Compiler™ • Circuit-level • SPICE, Synopsys PowerMill™, Cadence VoltageStorm™

  16. Software Power Estimation • Estimation • A processor is too complex to estimate at RT-level. • Power consumption is related to each instruction and instruction sequence. • Estimation method • A power model is added to the ISS for instruction-level power profiling. • $E = \sum_i B_i N_i + \sum_{i,j} O_{ij} N_{ij} + \sum_k S_k$ • $B_i$: energy consumption of instruction i • $N_i$: number of executions of instruction i • $O_{ij}$: energy consumption when instruction i is followed by instruction j • $N_{ij}$: number of occurrences of the pair (i, j) • $S_k$: other instruction effects such as cache misses, pipeline stalls, etc.
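
The symbols defined on this slide suggest an energy sum of the form ΣBᵢNᵢ + ΣOᵢⱼNᵢⱼ + ΣSₖ. A minimal sketch of how an ISS might accumulate it from an instruction trace follows; the function name, the mnemonic strings, and all energy values are hypothetical.

```python
from collections import Counter


def program_energy(trace, base_cost, pair_overhead, other_effects=()):
    """Instruction-level energy: sum(B_i*N_i) + sum(O_ij*N_ij) + sum(S_k).

    trace         : executed instruction mnemonics, in order
    base_cost     : B_i, energy per execution of instruction i
    pair_overhead : O_ij, keyed by (i, j); missing pairs count as zero
    other_effects : S_k, energies of cache misses, stalls, and so on
    """
    n_i = Counter(trace)                   # N_i
    n_ij = Counter(zip(trace, trace[1:]))  # N_ij for adjacent pairs
    base = sum(base_cost[i] * n for i, n in n_i.items())
    inter = sum(pair_overhead.get(p, 0.0) * n for p, n in n_ij.items())
    return base + inter + sum(other_effects)


if __name__ == "__main__":
    trace = ["add", "mul", "add", "add"]
    base = {"add": 1.0, "mul": 3.0}
    overhead = {("add", "mul"): 0.5, ("mul", "add"): 0.25}
    print(program_energy(trace, base, overhead, other_effects=[2.0]))
```

In this example the base term is 6.0, the inter-instruction term is 0.75, and the single other effect adds 2.0.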

  17. Software Power Estimation • Power model • Instruction-level power model • Inter-instruction effect consideration • Dynamic effect (cache miss, branch prediction, etc) • Power modeling method 1) White-box approach 2) Black-box approach

  18. White-box Approach • Power model • Activity-sensitive model • Characterization • Use macro modeling method • Process • Run gate-level simulation • Find predominant parameter • Reduce power model complexity • Simple equation or reduced LUT • Make instruction-level power model • Reducing the power model complexity degrades accuracy and increases estimation speed. [Figure: accuracy (high to low) versus speed (low to high)]

  19. Black-box Approach • Characterization flow • Measurement • Characterization [Figure: measurement uses the ammeter principle (r << R), with an oscilloscope reading the voltage V across r to obtain I(t); characterization then produces the instruction-level power model]

  20. Black-box Approach • Measurement • By current measurement of a real chip • Power model • Activity-sensitive power model • Statistical activity model • Characterization process • Current is measured on a real chip over multiple iterations of a subroutine • Compare the measured values with the ISS, including dynamic effects • Find a power equation that resembles the measured power graph • Decide the coefficients of the power equation by experimental iteration • It is important to find the equation closest to the measurement results.

  21. Black-box Approach • Measurement method • The program under measurement is isolated using an interrupt signal, NOP instructions, and processor wait states, both to find the exact measurement position and for synchronization. [Figure labels: pulse/pattern generator, synchronization signal, digital sampling oscilloscope, interrupt signal, current signal, clock, target chip under measurement] R. Muresan and C. Gebotys, “Current dynamics-based macro-model for power simulation in a complex VLIW DSP processor,” IEE Proc.-Comput. Digit. Tech., 2002

  22. Software Power Estimation Tool for Research Purpose • SimplePower • Functional simulator • SimplePower core based on SimpleScalar ISA • Power model • Activity-sensitive power model • Direct simulation and profiling based on input transitions • Generate switch capacitance tables • Cycle-accurate activation information • Implementation-based signal generation [Table: switch capacitance table indexed by (previous input vector, current input vector), each an n-bit vector; each entry is a switch capacitance in pF ($Cap_0$, $Cap_1$, …)]

  23. Software Power Estimation Tool for Research Purpose • Wattch • Architecture-level power estimation • Functional simulator • SimpleScalar: cycle-level performance simulator • Power model • Fixed activity power model • Categories • Array structure • Fully associative CAM • Combinational logic and wires • Clocking logic • Example: Array structure • Power = C1 + C2 * A + C3 * B • A: Bit line number, B: Word line number • C1: Diffusion cap., C2: Gate cap., C3: Metal cap.

  24. Bus Power Estimation • Power consumed on the bus consists of two parts • Bus component power • Power consumed internally in the bus components • Arbiter, decoder, muxes • Interconnection power • Power consumed on the bus wires that connect the master and slave interfaces and the bus components • Address bus, data bus, control signals

  25. Bus Component Power Estimation • At the system level, only structural information about the bus architecture can be obtained. • Bus interconnection • Bus width • A global bus power model is used for estimation • The characterized power model of each bus component is in the global bus power model • Arbiter, decoder, multiplexer • Behavior, FSM [Figure labels: memory, IP #2, IP #1, processor, bus, global bus power model]

  26. Bus Component Characterization • Macro model • Pre-calculated power cubic • Useful to apply on system level power estimation. • Input parameter of the macro models • Data and address bus width, or the operating frequency • The number of masters and slaves • Input/output data characteristics • The switching activity, the probability of signal or the Hamming distance of two successive data

  27. Bus Power Analysis • AMBA AHB bus power analysis • A standard for on-chip communication • Power analysis process • Bus structure decomposition • Arbiter • Decoder • Multiplexer • Build a macro model of each component • Bus behavior decomposition and build power FSM • IDLE, READ, WRITE, and IDLE with handover • Monitor bus signal activity • Power analysis through power FSM [Figure: global bus power model; labels: arbiter, decoder, MUX, masters #1 to #3, slaves #1 to #3]
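
The power-FSM step can be sketched as classifying each monitored bus cycle into one of the four states and charging a per-state energy. The per-cycle energies, the three monitored flags, and the function names are hypothetical, and charging energy per state rather than per state transition is a simplification made for this sketch.

```python
# Hypothetical per-cycle energies (pJ) from macro-model characterization.
STATE_ENERGY_PJ = {
    "IDLE": 1.0,
    "READ": 8.0,
    "WRITE": 10.0,
    "IDLE_HO": 3.0,  # IDLE with handover between masters
}


def classify_cycle(transfer_active, is_write, master_changed):
    """Map monitored bus signals for one cycle onto a power-FSM state."""
    if transfer_active:
        return "WRITE" if is_write else "READ"
    return "IDLE_HO" if master_changed else "IDLE"


def bus_energy_pj(cycles, state_energy=STATE_ENERGY_PJ):
    """Sum the state energies over a trace of monitored cycles.

    cycles is an iterable of (transfer_active, is_write, master_changed).
    """
    return sum(state_energy[classify_cycle(*c)] for c in cycles)


if __name__ == "__main__":
    trace = [
        (False, False, False),  # IDLE
        (True, False, False),   # READ
        (True, True, False),    # WRITE
        (False, False, True),   # IDLE with handover
    ]
    print(bus_energy_pj(trace))
```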

  28. Interconnection Power Estimation • Power consumption on each wire • $P = \frac{1}{2} V_{dd}^2 \cdot C \cdot f \cdot \alpha$ • $V_{dd}$: voltage swing between logic levels 1 and 0 • C: capacitance of the wire • f: clock frequency • α: switching activity • $V_{dd}$ and f are given as fixed values. • We need to find C and α. • C can be obtained from the wire capacitance model. • α can be obtained from system-level simulation.

  29. Interconnection Power Estimation • Wire capacitance model • $C = \varepsilon_{ox} \cdot W \cdot L / x_{int}$ * • $\varepsilon_{ox}$: constant, $3.45 \times 10^{-13}$ F/cm, permittivity of SiO2 • $x_{int}$: oxide thickness underneath the interconnect • W: interconnect width • L: interconnect length • W and $x_{int}$ can be obtained from the technology parameters. • L can be estimated from the area of the chip • (where A is the area of the chip) * J. P. Uyemura, Circuit Design for CMOS VLSI, Kluwer Academic Publishers, 1992.
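
A small sketch that chains the two interconnect formulas, assuming the parallel-plate form C = ε_ox·W·L/x_int that the listed symbols point to. The wire length is passed in directly because the slide's expression for L in terms of chip area is not preserved; the function names and dimensions are illustrative.

```python
EPS_OX = 3.45e-13  # F/cm, permittivity of SiO2 (value from the slide)


def wire_capacitance(width_cm, length_cm, x_int_cm):
    """Parallel-plate estimate C = eps_ox * W * L / x_int (farads).

    The parallel-plate form is an assumption of this sketch.
    """
    return EPS_OX * width_cm * length_cm / x_int_cm


def wire_power(vdd, cap, freq, alpha):
    """P = 1/2 * Vdd^2 * C * f * alpha (watts)."""
    return 0.5 * vdd ** 2 * cap * freq * alpha


if __name__ == "__main__":
    # Hypothetical wire: 0.5 um wide, 1 mm long, over 1 um of oxide.
    cap = wire_capacitance(0.5e-4, 0.1, 1e-4)
    print(cap)
    print(wire_power(1.2, cap, 100e6, 0.25))
```

All lengths are in centimetres so that they match the F/cm unit of the permittivity constant.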

  30. Interconnection Power Estimation • Switching activity model • Switching activity can be obtained from bus transactions. • The bus model monitors bus transitions and counts bus switching. [Figure: system-level simulation in which the bus model, connected to the CPU, memory, DSP, and IP, monitors bus transitions]
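
Counting bus switching amounts to taking the Hamming distance between successive bus words. A minimal sketch, in which the function names and the normalization (toggles per wire per transfer) are assumptions:

```python
def count_toggles(prev_word, curr_word):
    """Hamming distance between two successive bus words."""
    return bin(prev_word ^ curr_word).count("1")


def switching_activity(words, bus_width):
    """Average fraction of bus wires that toggle per transfer."""
    if len(words) < 2:
        return 0.0
    toggles = sum(count_toggles(a, b) for a, b in zip(words, words[1:]))
    return toggles / (bus_width * (len(words) - 1))


if __name__ == "__main__":
    # 4-bit bus: 0000 -> 1111 (4 toggles) -> 1111 (0) -> 0011 (2)
    print(switching_activity([0x0, 0xF, 0xF, 0x3], bus_width=4))
```

The resulting value can serve as α in the per-wire power formula when every wire is assumed to have the same capacitance.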

  31. Bus Power Estimation • Power estimation • An application example is simulated in the system-level simulator. • The power estimator reports power consumption using the power models of the bus components and the interconnection. • Values monitored in the bus transitions are used as the input of the power estimator. [Figure: system-level simulator in which the bus model, connected to the CPU, memory, DSP, and IP, feeds the power estimator]

  32. System-level Power Optimization

  33. Contents • Low Power System Implementation Techniques • Circuit level • Clock gating • MTCMOS • Multiple voltage supply • Architecture level • Memory Optimization • Bus Optimization • Dynamic Power Management in System Level • Introduction to DPM • Structure of DPM • Component-level DPM scheme • DPM Policy • Dynamic Voltage Scaling

  34. Circuit Level Low Power System Implementation Techniques • Clock gating • Most popular method for power reduction of clock signals • Need circuit to generate enable signal • Increases complexity of control logic • Timing critical to avoid clock glitches at AND gate output • Additional gate delay on clock signal

  35. Circuit Level Low Power System Implementation Techniques • MTCMOS • Low-VTH devices in the logic maintain performance when active. • A high-VTH current switch (header or footer) cuts off the leakage path in sleep mode. • The scheduling algorithm that controls the sleep signal is important. [Figure: logic between input and output, connected to VDD through a sleep-controlled header (virtual VDD) and to ground through a sleep-controlled footer (virtual GND)]

  36. Circuit Level Low Power System Implementation Techniques • Multiple voltage supply • Slows down non-critical paths with a lower supply voltage • Two or more power grids • Needs high-efficiency voltage converters for dynamic voltage scaling • The dynamic power scheduling algorithm is important. [Figure labels: critical path: need high speed logic; low voltage supply; high voltage supply]

  37. Architecture Level Low Power System Implementation Techniques • Memory Optimization • Code density optimization • Goal • Minimize program memory occupation to reduce the bandwidth of processor-memory communication • Approaches • Custom instruction sets • Object code compression

  38. Memory Optimization • Custom instruction set • Instructions shorter than those of the regular instruction set • Example: ARM Thumb code (16-bit instructions) • Needs a specific architecture for 16-bit instruction support [Figure: five instructions stored one per 32-bit word versus packed two per 32-bit word; "In this case, 3/5 bandwidth reduction"]

  39. Memory Optimization • Object code compression • All instructions have the same size, but some or all of them are encoded before being stored in instruction memory. • Available solution for embedded processors • A specific architecture supporting a different type of instruction is not needed. • Exploits the small subset of instructions used by firmware code • Approaches • Full code compression • Selective code compression

  40. Memory Optimization • Full code compression • Replace all instructions with binary patterns of minimum width. • $\lceil \log_2 N \rceil$ bits, where N is the number of distinct instructions • Advantage • Memory bandwidth for instructions is decreased. • Disadvantage • The IDT may be very large because N is not small. • $\log_2 N$ may not be a multiple of 8. [Figure labels: memory, address, core, k-bit instructions, IDT, $\log_2 N$-bit codes] IDT: Instruction Decompression Table
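
Full code compression can be sketched as building the IDT from the distinct instruction words and storing only table indices. The function names and the example instruction words are illustrative assumptions; a real implementation would also pack the indices into memory at the computed bit width.

```python
def build_idt(program):
    """Return (idt, compressed, index_bits) for a list of instruction words.

    idt        : distinct instructions in order of first appearance
    compressed : the program rewritten as indices into idt
    index_bits : ceil(log2 N) for N distinct instructions (at least 1)
    """
    idt = list(dict.fromkeys(program))
    index = {inst: k for k, inst in enumerate(idt)}
    # (N - 1).bit_length() equals ceil(log2 N) for N >= 1, without floats.
    index_bits = max(1, (len(idt) - 1).bit_length())
    return idt, [index[inst] for inst in program], index_bits


def decompress(idt, compressed):
    """Expand indices back to the original instruction words."""
    return [idt[k] for k in compressed]


if __name__ == "__main__":
    program = [0xE3A00001, 0xE2800001, 0xE3A00001, 0xE1A0F00E, 0xE2800001]
    idt, packed, bits = build_idt(program)
    print(packed, bits)
    assert decompress(idt, packed) == program
```

With three distinct instructions the index needs 2 bits, which also illustrates the slide's remark that the width is rarely a multiple of 8.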

  41. Memory Optimization • Selective code compression • Most of a program trace is covered by a small subset of instructions. • Compress only that subset: the instructions that maximize program coverage • The program is a mix of compressed and uncompressed instructions. [Figure labels: memory, address, core, buffer, IDT (8 bits), controller, k-bit instructions]

  42. Memory Optimization • Advantage • Size of IDT is fixed and limited. • Instruction fetching/decompression logic has reduced complexity. • Disadvantage • Requires a controller to handle instruction fetching
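
Selective compression can be sketched by keeping only the most frequent instructions in a fixed-size IDT. The 256-entry table matches the 8-bit index in the figure; the "C"/"U" tags that distinguish compressed from uncompressed entries, the function names, and the bit-count helper (which ignores the cost of those tags) are assumptions of this sketch.

```python
from collections import Counter


def build_selective_idt(program, table_size=256):
    """Pick the table_size most frequent instructions for the IDT."""
    return [inst for inst, _ in Counter(program).most_common(table_size)]


def compress(program, idt):
    """Emit ('C', index) for IDT hits and ('U', instruction) otherwise."""
    index = {inst: k for k, inst in enumerate(idt)}
    return [("C", index[i]) if i in index else ("U", i) for i in program]


def decompress(stream, idt):
    """Role of the fetch controller: expand indices, pass the rest through."""
    return [idt[v] if tag == "C" else v for tag, v in stream]


def stream_bits(stream, index_bits=8, inst_bits=32):
    """Fetched bits, ignoring whatever marks an entry as compressed."""
    return sum(index_bits if tag == "C" else inst_bits for tag, _ in stream)


if __name__ == "__main__":
    program = [1, 1, 1, 2, 2, 3]
    idt = build_selective_idt(program, table_size=2)
    stream = compress(program, idt)
    assert decompress(stream, idt) == program
    print(stream_bits(stream), "bits instead of", 32 * len(program))
```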

  43. Memory Optimization • Data density optimization • Same principle as code density optimization • For the purpose of reducing memory traffic • dynamic size of the data-set • More complex than code compression, because both compression and decompression are required • Hardware compression/decompression unit needed • Design trade-off between speed and power

  44. Architecture Level Low Power System Implementation Techniques • Bus power optimization • A large amount of power is dissipated in data communication over heavily loaded on-chip or off-chip buses. • Reduce switching activity on buses via signal encoding to save power • Approaches • Bus-invert coding • Gray code addressing • $P_{Bus} = n \times C \times V_{dd}^2 \times freq \times activity$, for an n-bit bus

  45. Bus Optimization • Bus-invert coding • Add a redundant line INV to the bus • When INV = 0 • Data is equal to the remaining bus lines • When INV = 1 • Data is the complement of the remaining bus lines • At each cycle, decide whether sending the true or the complemented signal leads to fewer toggles [Figure labels: source data, polarity decision logic, data bus, INV signal, received data]
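
A minimal sketch of bus-invert coding as described on this slide. It assumes the bus lines start at all zeros and that the polarity decision looks only at the data lines (inverting when more than half of them would toggle); the function names are illustrative.

```python
def bus_invert_encode(words, width):
    """Return a list of (bus_value, inv) pairs for the given data words."""
    mask = (1 << width) - 1
    prev_bus = 0  # assumed initial state of the data lines
    encoded = []
    for word in words:
        toggles = bin((word ^ prev_bus) & mask).count("1")
        if toggles > width / 2:
            bus, inv = ~word & mask, 1  # send the complement
        else:
            bus, inv = word & mask, 0   # send the true value
        encoded.append((bus, inv))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the data words from (bus_value, inv) pairs."""
    mask = (1 << width) - 1
    return [(~bus & mask) if inv else bus for bus, inv in encoded]


if __name__ == "__main__":
    data = [0x00, 0xFF, 0x0F]
    coded = bus_invert_encode(data, width=8)
    print(coded)
    assert bus_invert_decode(coded, width=8) == data
```

Sending 0xFF after 0x00 would toggle all eight lines, so the encoder leaves the data lines at 0x00 and raises INV instead.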

  46. Bus Optimization • Gray code addressing • Most instruction addresses are consecutive • Use Gray code for addresses • Word-oriented machines • Addresses increment by 4 (32-bit) or by 8 (64-bit) • Modify the Gray code to switch 1 bit per increment • A Gray code adder is needed for jumps (i: increment)
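
One possible way to get a single bit flip per word increment is to Gray-encode the word index and leave the byte-offset bits untouched. This is an assumption about what the slide's modified Gray code looks like, and the function names are illustrative.

```python
def to_gray(n):
    """Standard binary-reflected Gray code."""
    return n ^ (n >> 1)


def from_gray(g):
    """Inverse of to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def word_gray_address(addr, word_bytes=4):
    """Gray-encode the word index; keep the byte offset in binary.

    word_bytes must be a power of two (4 for 32-bit, 8 for 64-bit words).
    """
    shift = word_bytes.bit_length() - 1
    offset = addr & (word_bytes - 1)
    return (to_gray(addr >> shift) << shift) | offset


if __name__ == "__main__":
    for addr in range(0, 32, 4):
        print(format(addr, "06b"), format(word_gray_address(addr), "06b"))
```

Consecutive word addresses then differ in exactly one address line, whereas plain binary addresses stepping by 4 can flip several lines at once.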

  47. Introduction to DPM • Dynamic Power Management (DPM) • DPM controls the power consumption of components based on their usage. • Prediction of component usage is essential. • Methods • Shutdown (clock gating, power gating) • Slowdown (frequency scaling, voltage scaling, VTH scaling) [Figure labels: f, VDD, idle, 0.6 VDD, T/2, T]

  48. Structure of DPM • Levels of embodiment of DPM • Component level • Circuit, block • Power mode • System level • Policy • The procedure that controls the power level of each module in a system [Figure: a system-level policy exchanges requests and power modes with blocks 1 to n, each containing several circuits]

  49. Component Level DPM Scheme • Circuit level • Clock off by clock gating • Power off by the footer/header of MTCMOS • Multiple voltage supply • Block level • Power off by shutting down the power supply to IPs • When the power-off patterns of two blocks are similar, shut them down together. [Figure labels: VDD source, virtual VDD, IP #1, IP #2, virtual GND, GND source]

  50. Component Level DPM Scheme • Power mode • Each state has a combination of enabled DPM techniques. • Example: a system that uses clock gating and block shutdown • Transitions between modes of operation have a cost. [Figure: power state machine for the StrongARM processor. States: Run (P = 400 mW), Idle (P = 50 mW, wait for interrupt), Sleep (P = 0.16 mW, wait for wake-up event). Transition times shown: 10 μs, 10 μs, 90 μs, 90 μs, 160 ms.] SA-1100 Microprocessor Technical Reference Manual, Intel, 1998
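
The cost of mode transitions can be explored with a small power-state-machine sketch using the figures on this slide. Which latency belongs to which transition is an assumed reading of the flattened figure, and charging transitions at run-mode power is a further assumption, since the slide states only that transitions have a cost.

```python
POWER_W = {"run": 0.400, "idle": 0.050, "sleep": 0.16e-3}

# Assumed assignment of the slide's latencies to transitions (seconds).
TRANSITION_S = {
    ("run", "idle"): 10e-6,
    ("idle", "run"): 10e-6,
    ("run", "sleep"): 90e-6,
    ("idle", "sleep"): 90e-6,
    ("sleep", "run"): 160e-3,
}


def schedule_energy(schedule, transition_power_w=POWER_W["run"]):
    """Energy in joules of a list of (state, seconds) entries.

    Each state change adds transition_power_w times its latency.
    """
    energy = 0.0
    prev = None
    for state, seconds in schedule:
        if prev is not None and state != prev:
            energy += transition_power_w * TRANSITION_S[(prev, state)]
        energy += POWER_W[state] * seconds
        prev = state
    return energy


if __name__ == "__main__":
    for gap in (1.0, 10.0):
        idle = schedule_energy([("run", 0), ("idle", gap), ("run", 0)])
        sleep = schedule_energy([("run", 0), ("sleep", gap), ("run", 0)])
        print(gap, idle, sleep)
```

Under these assumptions the 160 ms wake-up makes sleep more expensive than idle for a short gap and cheaper for a long one, which is the kind of break-even decision a DPM policy has to make.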
