Low Power Processor Design VLSI Systems Lab. March 28. Bong-Il Park
Introduction • Processor power consumption
Power       Strategy              Cost
1 Watt      none                  none
3-5 Watt    heat sink, air flow   $1-5
5-15 Watt   fan sink              $10-15
15+ Watt    exotic                $50+
Device classes marked on the slide: Low power processor, Laptop Computer
Power Reduction Technique • Process level: low voltage, low capacitance • Circuit level: TR sizing, adiabatic circuit, low power arithmetic components • Logic level: precomputation logic, logic synthesis, retiming • System level: frequency reduction, voltage reduction, power management mode
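The voltage and frequency entries above can be related through the standard CMOS dynamic power relation P = a * C * V^2 * f, which the slide itself does not state. The sketch below is only an illustration; the activity factor, capacitance, voltages and frequencies are placeholder values, not figures from the presentation.

```python
def dynamic_power(alpha, cap_farads, vdd_volts, freq_hz):
    """Standard CMOS dynamic power estimate: alpha * C * Vdd^2 * f."""
    return alpha * cap_farads * vdd_volts ** 2 * freq_hz

# Placeholder operating points (assumptions, not data from the slides).
base = dynamic_power(0.2, 1e-9, 3.3, 100e6)
half_freq = dynamic_power(0.2, 1e-9, 3.3, 50e6)   # frequency reduction
low_vdd = dynamic_power(0.2, 1e-9, 2.5, 100e6)    # voltage reduction

print(f"frequency halved: {half_freq / base:.2f} of baseline power")
print(f"3.3 V -> 2.5 V:   {low_vdd / base:.2f} of baseline power")
```

Power falls linearly with frequency but quadratically with supply voltage, which is why voltage reduction is usually the stronger lever.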
System Level • Execution unit idle time(PowerPC 603)
System Level • Power management support
Power Estimation • Simulation-based techniques • circuit simulation • switch level simulation : IRSIM • transistor level simulation : PowerMill • gate level simulation • Monte Carlo simulation
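As a rough illustration of the Monte Carlo entry, the sketch below applies random input vectors to a toy gate-level model and averages the node toggles per cycle until the estimate stabilizes. The 4-bit ripple-carry netlist, the zero-delay evaluation, the batch size and the stopping rule are all assumptions made for this example; they are not taken from the slides or from IRSIM or PowerMill.

```python
import random
import statistics

def full_adder_nodes(a, b, cin):
    """Return the internal node values of one full adder (zero-delay model)."""
    p = a ^ b
    g = a & b
    s = p ^ cin
    t = p & cin
    cout = g | t
    return (p, g, s, t, cout)

def ripple_adder_nodes(x, y, width=4):
    """Evaluate every node of a width-bit ripple-carry adder."""
    nodes, carry = [], 0
    for i in range(width):
        fa = full_adder_nodes((x >> i) & 1, (y >> i) & 1, carry)
        nodes.extend(fa)
        carry = fa[4]
    return nodes

def monte_carlo_toggles(width=4, batch=200, rel_tol=0.02, max_batches=500, seed=1):
    """Estimate average node toggles per cycle; stop when the standard
    error of the batch means drops below rel_tol times the running mean."""
    rng = random.Random(seed)
    prev = ripple_adder_nodes(rng.getrandbits(width), rng.getrandbits(width), width)
    batch_means = []
    for _ in range(max_batches):
        total = 0
        for _ in range(batch):
            cur = ripple_adder_nodes(rng.getrandbits(width), rng.getrandbits(width), width)
            total += sum(p != c for p, c in zip(prev, cur))
            prev = cur
        batch_means.append(total / batch)
        if len(batch_means) >= 5:
            mean = statistics.mean(batch_means)
            stderr = statistics.stdev(batch_means) / len(batch_means) ** 0.5
            if stderr < rel_tol * mean:
                break
    return statistics.mean(batch_means), len(batch_means)

mean_toggles, batches_used = monte_carlo_toggles()
print(mean_toggles, batches_used)
```

The toggle count is only a proxy for power; a real flow would weight each node by its capacitance.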
Power Estimation • Probabilistic techniques • combinational circuits • zero delay model • real delay model • sequential circuits
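A minimal sketch of the zero delay model for combinational circuits: signal probabilities are propagated gate by gate and turned into a per-cycle switching activity. It assumes spatially independent gate inputs and temporally independent cycles, which real circuits violate through reconvergent fanout and input correlation; the example function y is hypothetical.

```python
def p_and(pa, pb):
    """Probability that AND(a, b) is 1, for independent inputs."""
    return pa * pb

def p_or(pa, pb):
    """Probability that OR(a, b) is 1, for independent inputs."""
    return pa + pb - pa * pb

def p_not(pa):
    """Probability that NOT(a) is 1."""
    return 1.0 - pa

def activity(p):
    """Probability that a node toggles between two independent cycles."""
    return 2.0 * p * (1.0 - p)

# Hypothetical example: y = (a AND b) OR (NOT c), all inputs at probability 0.5.
p_y = p_or(p_and(0.5, 0.5), p_not(0.5))
print(p_y, activity(p_y))
```

The real delay model additionally accounts for glitches caused by unequal path delays, which this sketch ignores.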
Logic Level • Logic Synthesis • precomputation logic • retiming • state assignment • path balancing • technology mapping • gate resizing • [Figure labels: R1, R2, R3, A, g, R]
Memory • Architectural selection • select as little of the array as possible • dynamically powering up sense amp. • Clocking only as needed • [Figure: Memory Block Diagram; Split Array, half of columns active; labels: 1/2 Cell Array (two), Row Decoder, 1/2 Column Decoder (two), Addr, Data, W/R]
Clock • Fast transition time and low skew consume lots of power • 10~20% of total chip power • Clock power management • clock branches are segmented and can be enabled as needed • [Figure label: PLL]
Datapath Signal Activity • PowerPC 601
Instruction Type    # of instructions   # of 0 to 1   # of 1 to 0   Switching Factor
Shift               3175                1.98          4.63          0.21
ADD/SUB             4937                3.41          4.31          0.24
EA calculation      15496               3.15          2.37          0.17
MUL/DIV             1070                1.50          1.56          0.10
Control Register    192                 1.44          2.26          0.12
Compare             2349                3.81          4.11          0.25
Branch              58                  5.03          13.50         0.58
Total               27277               3.05          3.13          0.19
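The Total row of the PowerPC 601 table can be cross-checked from the per-type rows. The sketch below assumes that the total switching factor is the instruction-count-weighted average of the per-type factors; the slide does not say how it was computed, but the numbers agree with that reading.

```python
# (instruction type, number of instructions, switching factor) from the table.
rows = [
    ("Shift", 3175, 0.21),
    ("ADD/SUB", 4937, 0.24),
    ("EA calculation", 15496, 0.17),
    ("MUL/DIV", 1070, 0.10),
    ("Control Register", 192, 0.12),
    ("Compare", 2349, 0.25),
    ("Branch", 58, 0.58),
]

total_count = sum(count for _, count, _ in rows)
weighted_factor = sum(count * factor for _, count, factor in rows) / total_count

print(total_count)                 # matches the Total row: 27277
print(round(weighted_factor, 2))   # matches the Total row: 0.19
```

EA calculation dominates the instruction mix, so its low factor pulls the average down even though Branch has by far the highest per-instruction activity.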
Use enabling logic • Enable only the adder needed • reduce the signal activities • Minimizing temporal bit transition activity • gray coding • bus inversion coding • [Figure labels: Traditional Method; inputs A, B, C, D; Adder, MUX, control, clk]
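To illustrate why gray coding reduces temporal bit transitions, the sketch below compares the bit flips of a 4-bit counter in plain binary and in the reflected Gray code. The counting sequence is an assumption chosen for the example; the benefit depends on how sequential the real address or data stream is.

```python
def binary_to_gray(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def count_transitions(values):
    """Total number of bit flips between consecutive values."""
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))

sequence = list(range(16))                       # 4-bit counter, 0..15
binary_flips = count_transitions(sequence)
gray_flips = count_transitions([binary_to_gray(n) for n in sequence])

print(binary_flips, gray_flips)   # 26 flips in binary, 15 in Gray code
```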
Datapath Components: Adder
Adder Type (32 bit)   Delay (in gate units)   # of gates   # of transitions (average)
Ripple Carry          68                      288          182
Carry Skip(1)         33                      304          392
Carry Skip(2)         19                      350          437
Carry Lookahead       14                      401          405
Carry Select          14                      597          711
Conditional Sum       15                      857          1323
• Characteristics • The number of transitions varies widely across the different structures
Datapath Components: Multiplier
Multiplier Type (32 bit)   Delay (in gate units)   # of gates   # of transitions (average)
Modified Array             98                      2405         7348
Wallace/Dadda              51                      2569         3874
• Characteristics • Many transitions • More than 50% of the nodes have a transition probability of one transition per clock
Future Works • Low power processor design • Analysis of arithmetic components • Proposal of arithmetic components for low power • Proposal of a low power processor architecture • Power analysis of various architectures To be continued...
Continued Story:ACCENT_Light VLSI Systems Lab. KAIST. Mar. 28. 1998. You-Sung Chang
Previous Work • Dr. Bong has done! • Everyone knows well now. • Nothing to explain.
Feature of Accent • Highly integrated CISC Processor-Core • 4-stage Pipelined Architecture • Configuration • Pre-fetch Cache • Decode • Execution • Memory Management • Micro-code • External Interface • Embedded DRAM • . . .
Low Power in Accent • Support Programmable Very Complex Code • Micro-code based Stripe Power Control • Pre-charging Biasing in Mask-ROM • Inverse Data Store in Embedded DRAM • Minimizing switching in BUS transfer
Very Complex Code • Maximize the advantage of the CISC micro-code approach • Adaptive Programmable Micro-code • Program analyzer extracts application-specific instructions • Compile micro-code ROM and decoder • A small loop is translated into a complex instruction • Small code size • Gives more idle time to the pre-fetch and decode units • Low power comes from the small code size and from blocking the clock during the resulting idle time of the pre-fetch and decode units
Stripe Power Control • Clocking only as needed • Obvious! • How? • Cut data-path in strips • Power control using micro-code field information • Request enables clocking for peripheral units • [Figure labels: Gated clock 1 to Gated clock 4; Pass Latch, Func1, F/F, Pass Latch, Func2, F/F]
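The idea of driving gated clocks from a micro-code field can be sketched as follows. The encoding is hypothetical: each micro-instruction is assumed to carry a bit mask with one enable bit per data-path strip, and the sketch simply counts how many strip clock pulses are delivered with and without gating. The actual Accent micro-code format is not given in the slides.

```python
def clock_pulses(strip_enable_masks, num_strips):
    """Count strip clock pulses for a micro-code trace.

    strip_enable_masks: one integer per cycle; bit i set means strip i
    needs a clock in that cycle (hypothetical micro-code field).
    Returns (ungated_pulses, gated_pulses).
    """
    valid = (1 << num_strips) - 1
    ungated = len(strip_enable_masks) * num_strips
    gated = sum(bin(mask & valid).count("1") for mask in strip_enable_masks)
    return ungated, gated

# Hypothetical 4-cycle trace over 4 strips.
trace = [0b0001, 0b0011, 0b0000, 0b1000]
ungated, gated = clock_pulses(trace, num_strips=4)
print(ungated, gated)   # 16 pulses without gating, 4 with gating
```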
Mask ROM • Selective pre-charging/discharging for Micro-code ROM. • Using the static statistics, assemble Micro-code ROM cell column by column. • Simulation shows • Not so effective for Micro-code ROM • Some potential for constant ROM
Embedded DRAM • Full voltage pre-charging • Does not need half voltage generator • Single-ended type • Read/Write word by word • To save power, minimize switching in the bit-line • Store inverted data, with an indicator, if '0' dominates • [Figure labels: Pre-charged (high cap.), Reference, Pre-charged (low cap.), Sense Amplifier]
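A minimal sketch of the inverse data store idea, assuming that a stored '0' is the costly value on a bit-line pre-charged to full voltage and that one extra indicator bit per word records whether the word was inverted. The function names and the word width are illustrative assumptions.

```python
def store_word(word, width):
    """Return (stored_bits, indicator); invert the word when zeros dominate."""
    mask = (1 << width) - 1
    word &= mask
    zeros = width - bin(word).count("1")
    if 2 * zeros > width:            # more than half of the bits are '0'
        return (~word) & mask, 1
    return word, 0

def load_word(stored, indicator, width):
    """Recover the original word from the stored bits and the indicator."""
    mask = (1 << width) - 1
    return (~stored) & mask if indicator else stored & mask

stored, flag = store_word(0b00000001, 8)
print(bin(stored), flag)             # inverted copy, indicator set
print(load_word(stored, flag, 8))    # original value restored
```

With this rule no stored word holds more than width/2 zeros, at the cost of one indicator bit per word.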
BUS Transfer • One-Hot Coding • Gray Coding • Bus Inversion Coding (Stan, 1994) • [Figure labels: BUS, N+1 lines; 1: inversion indicator; N, N]
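Bus inversion coding can be sketched as below: before each transfer the new word is compared with the value currently on the bus, and if more than half of the N lines would toggle, the inverted word is sent with the indicator line set. The initial bus state of all zeros and the example data are assumptions for illustration.

```python
def bus_invert_encode(words, width):
    """Encode words for an N-line bus plus one inversion indicator line."""
    mask = (1 << width) - 1
    bus, encoded = 0, []                       # assumed initial bus state: 0
    for w in words:
        w &= mask
        if 2 * bin(w ^ bus).count("1") > width:
            bus, inv = (~w) & mask, 1          # send inverted, raise indicator
        else:
            bus, inv = w, 0
        encoded.append((bus, inv))
    return encoded

def bus_invert_decode(encoded, width):
    """Recover the original words at the receiver."""
    mask = (1 << width) - 1
    return [(~bus) & mask if inv else bus for bus, inv in encoded]

def count_transitions(values, start=0):
    """Total line toggles for a sequence of bus states, from a start state."""
    total, prev = 0, start
    for v in values:
        total += bin(prev ^ v).count("1")
        prev = v
    return total

data = [0x00, 0xFF, 0x00, 0xFF]
coded = bus_invert_encode(data, 8)
plain_toggles = count_transitions(data)
coded_toggles = count_transitions([(inv << 8) | bus for bus, inv in coded])
print(plain_toggles, coded_toggles)   # 24 toggles uncoded, 3 with bus inversion
```

The coded count includes the indicator line itself, so the comparison covers all N+1 wires.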
Self Evaluation • Evaluation of Anticipation • Support Programmable Very Complex Code (H) • Micro-code based Stripe Power Control (M) • Pre-charging Biasing in Mask-ROM (M) • Inverse Data Store in Embedded DRAM (H) • Minimizing switching in BUS transfer (X) H: high M: medium L: low X: X, its dedicated signification
Further Work • Complete power estimation for each block • Functional Blocks in Data-path • Pre-fetch and Decoder • Inspect physical constraints in pre-charging biasing • Estimate the power advantage of inverse data store • Target : Workshop ~ • Task force : Caviar, Woosee, Bipark