Low-Power Design Techniques for SoC

Jun-Dong Cho, SungKyunKwan University, VADA Lab.

·Content • Introduction • SOC Design Trends • System Level Low Power Design • Architecture Level Low Power Design • Conclusion

·SOC Design Trends • SoCs are expected to integrate increasingly complex functions: web browsing, real-time video processing, speech recognition and synthesis • Average operating power must stay at or below 100 mW and standby power at or below 2 mW • Performance must increase from 300 million operations per second (MOPS) today to 2500 MOPS in 2016

Achieving functionality while maximizing battery life and minimizing size: GPS, noise-cancellation headphones, cochlear implants, cellular phones, medical watches, hearing aids, portable audio, digital still cameras, digital radio

QoS vs. Power • How accurate should I make my FDCT?
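The accuracy-versus-power question can be explored numerically. The sketch below is an illustration under assumptions, not the method from the slides: it quantizes the coefficients of an orthonormal 8-point DCT-II to a chosen number of fractional bits (fewer bits meaning narrower, cheaper multipliers) and reports the worst-case output error against a floating-point reference. The function names and the pixel row are hypothetical.

```python
import math

def dct_matrix(n=8):
    """Orthonormal DCT-II basis (floating-point reference)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows

def quantize(matrix, frac_bits):
    """Round each coefficient to a fixed-point value with frac_bits fractional bits."""
    step = 2.0 ** -frac_bits
    return [[round(c / step) * step for c in row] for row in matrix]

def fdct(matrix, x):
    """1-D forward DCT as a matrix-vector product."""
    return [sum(c * xi for c, xi in zip(row, x)) for row in matrix]

def max_error(x, frac_bits):
    """Worst-case coefficient error of the fixed-point FDCT versus the float reference."""
    ref_m = dct_matrix(len(x))
    ref = fdct(ref_m, x)
    approx = fdct(quantize(ref_m, frac_bits), x)
    return max(abs(r - a) for r, a in zip(ref, approx))

pixels = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical 8-pixel row
for bits in (4, 6, 8, 12):
    print(f"{bits:2d} fractional bits: max error {max_error(pixels, bits):.4f}")
```

Choosing the smallest coefficient width whose error is acceptable for the target quality is one way to trade QoS for power.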

SOC Design Characteristics • The new version of ITRS predicts that Moore’s law will continue on a two to three year cycle throughout this period (2001-2016) • One of the key design challenges is to effectively use the dramatically increasing transistor counts, given certain power and productivity constraints • “Bottom-up” - based on system constraints “Top-down” - based on design resource constraints

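Energy-Flexibility Gap [Chart: energy efficiency (MOPS/mW, log scale) vs. flexibility] • Dedicated signal-processing ASICs: ~200 MOPS/mW • Reconfigurable architectures: 10-80 MOPS/mW • Signal-processing processors (ASIPs, DSPs): ~3 MOPS/mW • Embedded processors (ARM): ~0.5 MOPS/mW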

Radio systems • WiFi – 10-100 Mbit/sec, unlicensed band • OFDM, M-ary coding • 3G – .1-2 Mbit/sec, wide-area cellular • CDMA, GMSK • Bluetooth – .8 Mbit/sec, cable replacement • Frequency hopping • ZigBee – .02-.2 Mbit/sec, low power, low cost • QPSK • UWB – recently allowed by the FCC • Short pulses (no carrier), bi-phase or PPM

Data rate [Chart: data rate (10 kbit/s to 100 Mbit/s) vs. operating frequency (0 to 6 GHz) for UWB, 802.11a/b/g, 3G, Bluetooth, and ZigBee]

Cost (projections) [Chart: projected cost ($0.10 to $1000) vs. operating frequency (0 to 6 GHz) for 3G, 802.11a/b/g, UWB, Bluetooth, and ZigBee]

Power Dissipation [Chart: power dissipation (1 mW to 10 W) vs. operating frequency (0 to 6 GHz) for 802.11a/b/g, 3G, Bluetooth, UWB, and ZigBee]

Why Low-Power Devices? • Practical reasons (Reducing power requirements of high throughput portable applications) • Financial reasons (Reducing packaging costs and achieving memory savings) • Technological reasons (Excessive heat prevents the realization of high density chips and limits their functionalities)

Different Constraints for Different Application Fields • Portable devices: Battery life-time • Telecom and military: Reliability (reduced power decreases electromigration, hence increases reliability) • High volume products: Unit cost (reduced power decreases packaging cost)

Driving Forces for Low-Power: Deep-Submicron Technology • ADVANTAGES • Smaller geometries • Higher clock frequencies • DISADVANTAGES • Higher power consumption • Lower reliability

Dynamic Power Consumption • Average power consumed by a node that makes a transition (0→1 or 1→0) in every period T: $P_{avg} = \frac{C_L V_{DD}^2}{2T} = \frac{1}{2} C_L V_{DD}^2 f$, with $f = 1/T$ • Average power consumed by a node with partial activity (only a fraction α of the periods has a transition): $P_{avg} = \frac{1}{2} \alpha C_L V_{DD}^2 f$

·Power Model • Power dissipation in logic blocks consists of both dynamic (switching) and static (standby) power
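A minimal sketch of this power model, assuming the dynamic term uses the partial-activity form $\frac{1}{2}\alpha C_L V_{DD}^2 f$ and the static term is a constant leakage current drawn from $V_{DD}$. The block parameters (capacitance, clock, activity, leakage) are hypothetical.

```python
def dynamic_power(alpha, c_load, vdd, f_clk):
    """P_dyn = 1/2 * alpha * C_L * VDD^2 * f; alpha = fraction of periods with a transition."""
    return 0.5 * alpha * c_load * vdd ** 2 * f_clk

def static_power(i_leak, vdd):
    """Standby power: leakage current drawn continuously from the supply."""
    return i_leak * vdd

def total_power(alpha, c_load, vdd, f_clk, i_leak):
    return dynamic_power(alpha, c_load, vdd, f_clk) + static_power(i_leak, vdd)

# Hypothetical block: 1 nF switched capacitance, 100 MHz, 10% activity, 1 mA leakage.
p_33 = total_power(0.1, 1e-9, 3.3, 100e6, 1e-3)
p_18 = total_power(0.1, 1e-9, 1.8, 100e6, 1e-3)
print(f"3.3 V: {p_33 * 1e3:.2f} mW, 1.8 V: {p_18 * 1e3:.2f} mW")
```

The quadratic $V_{DD}$ dependence of the dynamic term is why most of the techniques that follow aim at lowering the supply voltage.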

·Power Model • Memory power is due primarily to row/column decoders and bit and word line switching activity • Consider the power dissipated when the bitlines are switched by approximately VDD during write cycles
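A first-order estimate of the bitline write power, assuming each bitline is charged once per write cycle through a swing ΔV, so the supply delivers $C_{BL} V_{DD} \Delta V$ per bitline; with a full swing ($\Delta V \approx V_{DD}$) this becomes $C_{BL} V_{DD}^2$. The array size, bitline capacitance, and write rate below are hypothetical.

```python
def bitline_write_power(n_bitlines, c_bitline, vdd, delta_v, f_write):
    """Supply power for bitline switching: n * C_BL * VDD * delta_V per write, times write rate."""
    return n_bitlines * c_bitline * vdd * delta_v * f_write

# Hypothetical array: 256 bitlines of 0.5 pF, 50 MHz write rate, full swing (delta_V = VDD).
p_18 = bitline_write_power(256, 0.5e-12, 1.8, 1.8, 50e6)
p_12 = bitline_write_power(256, 0.5e-12, 1.2, 1.2, 50e6)
print(f"1.8 V: {p_18 * 1e3:.1f} mW, 1.2 V: {p_12 * 1e3:.1f} mW")
```

Because writes swing the bitlines by roughly $V_{DD}$, write power scales with $V_{DD}^2$, like logic power.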

·Chip Composition (Future) • Low-power digital SOC designs of the future will be 90-95% memory and 5-10% logic, including overhead • Future chips may be dominated by memory due to power and resource constraints

Three Factors Affecting Energy • Reducing waste by hardware simplification: redundant hardware extraction, locality of reference, demand-driven/data-driven computation, application-specific processing, preservation of data correlations • All-in-one approach (SoC): I/O pin and buffer reduction • Voltage-reducible hardware: 2-D pipelining (systolic arrays), parallel processing

Low-power design techniques • Voltage and process scaling • Design methodologies • Power-aware design flows and tools; trading area for lower power • Architecture design • Power-down techniques • Clock gating, dynamic power management • Dynamic voltage scaling based on workload • Power-conscious RT/logic synthesis • Better cell library design and resizing methods • Capacitance reduction, threshold control, transistor layout

SoC Design Flow

Power Analysis • Fast and accurate analysis early in the design process • Power budgeting • Knowledge-based architectural and implementation decisions • Package selection • Power-hungry module identification • Detailed and comprehensive analysis at the later stages • Satisfaction of power budget and constraints • Hot spots

Power Savings

Estimation Expectations

System Level Power Optimization • Algorithm selection / algorithm transformation • Identification of hot spots • Low Power data encoding • Quality of Service vs. Power • Low Power Memory mapping • Resource Sharing / Allocation

Flow • C/C++ Compilation • Program Execution • Building design representation • Loading profiling data • Setting constraints • Power estimation • Identification of Hot Spots

IBM’s PowerPC • Optimum supply voltage through hardware parallelism, pipelining, and parallel instruction execution • Five execution units operating in parallel (IU, FPU, BPU, LSU, SRU), RISC • The FPU is pipelined, so a multiply-add instruction can be issued every clock cycle • Low-power 3.3 V design • The 603e provides four software-controllable power-saving modes • Copper processor with SOI • IBM’s Blue Logic ASIC: the new design reduces power by a factor of 10

Silicon-on-Insulator • How does SOI reduce capacitance? • Junction capacitance is largely eliminated because an insulating layer (similar to glass) is placed between the doped regions (impurities) and the silicon substrate • Benefits: high performance, low power, low soft-error rate

Why Copper Processor? • Motivation: Aluminum resists the flow of electricity as wires are made thinner and narrower. • Performance: 40% speed-up • Cost: 30% less expensive • Power: Less power from batteries • Chip Size: 60% smaller than Aluminum chip

Factors Influencing Ceff • Circuit function • Circuit technology • Input probabilities • Circuit topology

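Some Basic Definitions • Signal probability of a signal g(t) is given by $P(g) = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} g(t)\,dt$ • Signal activity of a logic signal g(t) is given by $A(g) = \lim_{T \to \infty} \frac{n_g(T)}{T}$, where $n_g(T)$ is the number of transitions of g(t) in the time interval between −T/2 and T/2.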
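For a finite, sampled waveform both limits reduce to simple counts. The sketch below assumes one sample per clock period of a hypothetical length; the waveform itself is hypothetical.

```python
def signal_probability(samples):
    """Fraction of time the signal is 1 (discrete form of (1/T) * integral of g(t))."""
    return sum(samples) / len(samples)

def signal_activity(samples, period):
    """Transitions per unit time, n_g(T) / T, with T = number of samples * period."""
    transitions = sum(1 for a, b in zip(samples, samples[1:]) if a != b)
    return transitions / (len(samples) * period)

g = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]   # hypothetical waveform, one sample per clock period
P = signal_probability(g)
A = signal_activity(g, period=1e-9)  # hypothetical 1 ns clock period
print(P, A)
```

Here the signal is 1 half of the time and toggles five times in 10 ns.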

Factors Influencing Ceff: Circuit Function • Assume that there are M mutually independent signals g_1, g_2, ..., g_M, each having a signal probability P_i and a signal activity A_i, for i = 1, ..., M. • For static CMOS, the signal probability at the output of a gate is determined by the probability of 1s (or 0s) in the logic description of the gate. For inputs with probabilities P_1 and P_2: AND gate, $P_1 P_2$; OR gate, $1-(1-P_1)(1-P_2)$; inverter, $1-P_1$.
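These propagation rules can be composed gate by gate. The sketch assumes mutually independent inputs (as the slide does) and, for the activity estimate, that successive clock cycles are also independent, so a node with probability p switches with probability $2p(1-p)$ per cycle. The helper names are hypothetical.

```python
def p_not(p1):
    return 1 - p1

def p_and(p1, p2):
    return p1 * p2

def p_or(p1, p2):
    return 1 - (1 - p1) * (1 - p2)

def switching_probability(p):
    """Per-cycle transition probability (either direction), assuming temporal independence."""
    return 2 * p * (1 - p)

# Example: a NAND gate built as NOT(AND) with two equiprobable inputs.
p_nand = p_not(p_and(0.5, 0.5))
print(p_nand, switching_probability(p_nand))
```

Reconvergent fanout makes gate inputs correlated, in which case these simple products are only approximations.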

Factors Influencing Ceff: Circuit Function (Static CMOS) • Transistors connected to the same input turn on and off simultaneously when the input changes • C_L of a static CMOS gate is charged to VDD whenever a 0→1 transition at the output node is required. • C_L of a static CMOS gate is discharged to ground whenever a 1→0 transition at the output node is required. [Figure: NOR gate]

Factors Influencing Ceff:Circuit Function (Static CMOS) • State transition diagram of the NOR gate
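Since only 0→1 output transitions draw energy from the supply, the state transition diagram can be summarized by the probability of a rising output edge. The sketch enumerates the NOR truth table, assuming independent inputs with probabilities pa and pb and independence between consecutive cycles; the function names are hypothetical.

```python
from itertools import product

def nor(a, b):
    return int(not (a or b))

def output_one_probability(gate, pa, pb):
    """Sum the probabilities of input combinations that drive the output to 1."""
    total = 0.0
    for a, b in product((0, 1), repeat=2):
        weight = (pa if a else 1 - pa) * (pb if b else 1 - pb)
        total += weight * gate(a, b)
    return total

def p_rise(gate, pa, pb):
    """P(output 0 in one cycle and 1 in the next), assuming temporal independence."""
    p1 = output_one_probability(gate, pa, pb)
    return (1 - p1) * p1

print(p_rise(nor, 0.5, 0.5))
```

With equiprobable inputs the NOR output is 1 with probability 1/4, so a 0→1 transition occurs with probability (3/4)(1/4) = 3/16.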

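Factors Influencing Ceff: Input Probabilities (Static CMOS) • Signal activity calculation: Boolean difference $\frac{\partial f}{\partial x_i} = f|_{x_i=1} \oplus f|_{x_i=0}$ • It signifies the condition under which output f is sensitized to input x_i • If the primary inputs to function f are not spatially correlated, the signal activity at f is $A(f) = \sum_{i=1}^{n} P\left(\frac{\partial f}{\partial x_i}\right) A(x_i)$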
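The Boolean-difference formula can be evaluated by brute-force truth-table enumeration for small functions. This is a sketch assuming spatially uncorrelated inputs with given signal probabilities and activities; the helper names and the AND/XOR examples are hypothetical.

```python
from itertools import product

def boolean_difference_prob(f, i, probs):
    """P(df/dx_i = 1): probability that f(x_i=1) XOR f(x_i=0) is 1 over the other inputs."""
    n = len(probs)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for bits in product((0, 1), repeat=n - 1):
        x = [0] * n
        weight = 1.0
        for j, b in zip(others, bits):
            x[j] = b
            weight *= probs[j] if b else 1 - probs[j]
        x1, x0 = list(x), list(x)
        x1[i], x0[i] = 1, 0
        if f(x1) != f(x0):           # output is sensitized to x_i
            total += weight
    return total

def output_activity(f, probs, activities):
    """A(f) = sum_i P(df/dx_i) * A(x_i)."""
    return sum(boolean_difference_prob(f, i, probs) * activities[i] for i in range(len(probs)))

def and2(x):
    return x[0] & x[1]

def xor2(x):
    return x[0] ^ x[1]

print(output_activity(and2, [0.5, 0.5], [1.0, 1.0]))
print(output_activity(xor2, [0.5, 0.5], [1.0, 1.0]))
```

An AND input only propagates when the other input is 1, so each input contributes half its activity; an XOR propagates every input transition.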

Power Reduction Methods: Architecture-Driven Supply Voltage Scaling • Strategy: 1. Modify the architecture of the system to make it faster. 2. Reduce VDD until the original speed is restored. Power consumption decreases. • The most common architectural changes exploit parallelization and pipelining. • Drawback: the additional circuitry required to compensate for the speed degradation may dominate, and the power consumption may increase. • Consequence: parallelism and pipelining do not always pay off.

Parallel Architectures Ppar=0.36Pref

Parallel-Pipelined Architectures P_par-pipe = 0.2 P_ref
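Both ratios follow from $P \propto C V_{DD}^2 f$. The sketch below plugs in scaling factors commonly quoted in the literature for this example (parallel: capacitance 2.15×, clock halved, voltage 0.58×; parallel-pipelined: capacitance 2.5×, clock halved, voltage 0.4×); treat these factors as assumptions, since the slides do not list them.

```python
def relative_power(c_ratio, f_ratio, v_ratio):
    """P_new / P_ref for dynamic power P ~ C * VDD^2 * f."""
    return c_ratio * f_ratio * v_ratio ** 2

# Assumed scaling factors (not derived here).
parallel = relative_power(c_ratio=2.15, f_ratio=0.5, v_ratio=0.58)
par_pipe = relative_power(c_ratio=2.5, f_ratio=0.5, v_ratio=0.4)
print(f"parallel: {parallel:.2f} P_ref, parallel-pipelined: {par_pipe:.2f} P_ref")
```

The extra capacitance (duplicated hardware, pipeline registers, multiplexers) eats into the savings, which is exactly the drawback noted above: if C grows faster than $V_{DD}^2$ shrinks, power rises.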

Loop unrolling • Loop unrolling replicates the body of a loop some number of times (the unrolling factor u) and then iterates by step u instead of step 1. This transformation reduces loop overhead, increases instruction parallelism, and improves register, data-cache, or TLB locality. With u = 2, loop overhead is cut in half because each pass through the loop performs two of the original iterations. If array elements are assigned to registers, register locality improves because A(i) and A(i+1) are each used twice in the loop body. Instruction parallelism increases because the second assignment can be performed while the results of the first are being stored and the loop variables are being updated.
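The structure of the transformation can be sketched with a hypothetical smoothing loop b[i] = a[i] + a[i+1], unrolled by u = 2. Python itself gains nothing from unrolling; the sketch only shows the shape a compiler or hardware datapath would exploit: one loop test per two outputs, a shared element loaded once, and two independent additions.

```python
def smooth_rolled(a):
    """b[i] = a[i] + a[i+1]: one output per iteration."""
    b = [0] * max(len(a) - 1, 0)
    for i in range(len(a) - 1):
        b[i] = a[i] + a[i + 1]
    return b

def smooth_unrolled(a):
    """Same computation with unrolling factor u = 2."""
    n = max(len(a) - 1, 0)          # number of outputs
    b = [0] * n
    i = 0
    while i + 1 < n:                # loop test and index update paid once per two outputs
        shared = a[i + 1]           # loaded once, used by both statements (register reuse)
        b[i] = a[i] + shared
        b[i + 1] = shared + a[i + 2]  # independent of b[i], so the two adds can overlap
        i += 2
    if i < n:                       # clean-up iteration when the output count is odd
        b[i] = a[i] + a[i + 1]
    return b

data = [3, 1, 4, 1, 5, 9, 2, 6]
print(smooth_unrolled(data) == smooth_rolled(data))
```

The clean-up iteration is needed whenever the trip count is not a multiple of u.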

Loop Unrolling (IIR filter example) Two output samples are computed in parallel from two input samples. Unrolling by itself alters neither the switched capacitance nor the supply voltage. However, loop unrolling enables several other transformations (distributivity, constant propagation, and pipelining). After distributivity and constant propagation, the transformed filter has a critical path of 3, so the supply voltage can be lowered.
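A sketch of the idea on a hypothetical first-order IIR filter $y[n] = x[n] + a\,y[n-1]$ (not necessarily the filter on the slide). Unrolling by two and applying distributivity gives $y[n+1] = x[n+1] + a\,x[n] + a^2 y[n-1]$, where $a^2$ is folded into a constant; both outputs then depend only on $y[n-1]$ and can be computed side by side.

```python
def iir_reference(x, a, y_init=0.0):
    """y[n] = x[n] + a * y[n-1], one output per iteration."""
    y, prev = [], y_init
    for xn in x:
        prev = xn + a * prev
        y.append(prev)
    return y

def iir_unrolled2(x, a, y_init=0.0):
    """Two outputs per iteration; a*a is precomputed (constant propagation)."""
    a2 = a * a
    y, prev = [], y_init
    n = 0
    while n + 1 < len(x):
        y0 = x[n] + a * prev                  # y[n]
        y1 = x[n + 1] + a * x[n] + a2 * prev  # y[n+1] via distributivity; needs only y[n-1]
        y.extend((y0, y1))
        prev = y1
        n += 2
    if n < len(x):                            # odd-length tail
        prev = x[n] + a * prev
        y.append(prev)
    return y

samples = [1.0, 0.5, -0.25, 2.0, 0.0, 1.5, -1.0]
print(iir_unrolled2(samples, 0.5))
```

The recursive loop now advances two samples per iteration, so each iteration has twice the time budget at the same throughput, which is what allows the supply voltage to be lowered.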

Loop Unrolling for Low Power

Encoding • Bus-invert (BI) code • Appropriate for random data patterns • Redundant code (1 extra bus line, INV) • Reduces average transitions by up to 25% • Encoder: a majority voter compares the next data word D with the current bus value Z; if more than half of the bus lines would toggle, the inverted word is sent and INV is asserted [Figure: example data sequence X and encoded bus sequence Z with the INV line; encoder with majority voter] R. J. Fletcher, “Integrated circuit having outputs configured for reduced state changes,” U.S. Patent 4667337, May 1987. M. R. Stan and W. P. Burleson, “Bus-invert coding for low-power I/O,” IEEE Trans. on VLSI Systems, Mar. 1995, pp. 49-58.
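A minimal bus-invert encoder and decoder, following the majority rule (invert when more than half of the lines would toggle). The function names are hypothetical, and the bus and INV line are assumed to start at all zeros. The data pattern is a hypothetical worst case of alternating all-zeros and all-ones words.

```python
def hamming(a, b):
    """Number of bus lines that differ between two words."""
    return bin(a ^ b).count("1")

def bus_invert_encode(words, width):
    """Send ~D and assert INV when more than width/2 lines would toggle."""
    mask = (1 << width) - 1
    bus = 0                                 # assumption: bus starts at all zeros
    encoded = []
    for d in words:
        if hamming(bus, d) > width // 2:    # majority of lines would switch
            bus, inv = (~d) & mask, 1
        else:
            bus, inv = d, 0
        encoded.append((bus, inv))
    return encoded

def bus_invert_decode(encoded, width):
    mask = (1 << width) - 1
    return [(z ^ mask) if inv else z for z, inv in encoded]

def transitions(seq):
    """Total toggles over a sequence of (bus, inv) pairs, starting from all zeros."""
    prev_z, prev_inv, total = 0, 0, 0
    for z, inv in seq:
        total += hamming(prev_z, z) + (inv != prev_inv)
        prev_z, prev_inv = z, inv
    return total

data = [0x00, 0xFF, 0x00, 0xFF]             # hypothetical worst-case 8-bit pattern
raw = transitions([(d, 0) for d in data])
coded = transitions(bus_invert_encode(data, 8))
print(raw, coded)
```

Unencoded, every data line toggles on each word; encoded, only the INV line toggles. Random data sees a much smaller average gain than this extreme case.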

Different Supply Voltages for Different Units • Partition the chip into multiple sub-units, each designed to operate at its own supply voltage; slower, non-critical units can run at the lower supply [Figure: chip partitioned into fast and slow sub-units supplied at 5 V and 3 V]
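The benefit of voltage islands can be estimated with $P \propto C V_{DD}^2 f$. The sketch assumes a hypothetical chip with two units of equal switched capacitance and clock rate, where the non-critical unit has enough timing slack to run at 3 V. Level converters between voltage domains are not modeled.

```python
def dynamic_power(c_eff, vdd, f):
    """Dynamic power with activity folded into the effective capacitance c_eff."""
    return c_eff * vdd ** 2 * f

# Hypothetical units: equal switched capacitance and clock rate.
units = {
    "fast_unit": {"c_eff": 200e-12, "f": 50e6, "critical": True},
    "slow_unit": {"c_eff": 200e-12, "f": 50e6, "critical": False},
}

def chip_power(units, v_fast, v_slow):
    """Critical units keep the high supply; non-critical units use the lower one."""
    return sum(
        dynamic_power(u["c_eff"], v_fast if u["critical"] else v_slow, u["f"])
        for u in units.values()
    )

single = chip_power(units, 5.0, 5.0)
dual = chip_power(units, 5.0, 3.0)
print(f"single supply: {single:.3f} W, dual supply: {dual:.3f} W ({dual / single:.0%})")
```

The slow unit alone drops to $(3/5)^2 = 36\%$ of its 5 V power, provided it still meets its timing at 3 V.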

Block diagram of the COFDM modem for Eureka 147/KDMB

DMB modulator/demodulator: domestic and international status

Status of low-power technology development

VADA Lab’s Low-Power IPs • Low-power equalizer for xDSL: 21% power reduction, SNR = 40 dB • Next-generation low-power security processor chip for smart cards: ECC, Rijndael, DES, SHA • Maximizing memory data reuse for lower-power motion estimation: 33% power reduction, 52 MHz, 2.1× area increase (SCI journal paper) • OFDM-based high-speed wireless LAN platform: 20.7 MHz, 237,000 gates • Low-power design and co-design of the double-dwell searcher for IS-95-based CDMA: 67% power reduction, 41% area reduction • Fast and low-power Viterbi search engine using an inverse hidden Markov model: 68% power reduction, 71% speed improvement, 1.9× area increase; Samsung Humantech Paper Award (excellence prize), 2002 • Highly flexible design of an OFDM transceiver for DVB-T (under development)
