**ISLPED 2004** 8/10/2004 Power-Optimal Pipelining in Deep Submicron Technology Seongmoo Heo and Krste Asanović Computer Architecture Group, MIT CSAIL

**Traditional Pipelining** • Goal: Maximum performance Vdd Setup Clk-Q Propagation Delay Clk Clk Clk

**Pipelining as a Low-Power Tool** • Goal: Low-Power, Fixed Throughput Vdd Setup Clk-Q Propagation Delay Clk Time Slack Clk Time Slack Clk

**Pipelining as a Low-Power Tool** • Goal: Low-Power, Fixed Throughput Vdd Setup Clk-Q Propagation Delay Clk Time Slack Clk Traded for Power (supply voltage scaling) Time Slack Clk

**Pipelining as a Low-Power Tool** Power * Clock frequency fixed Flip-flop Power Overhead Pipelining Time slack Delay

**Pipelining as a Low-Power Tool** Power * Clock frequency fixed Power Saving Supply voltage scaling Delay

**Power-Optimal Pipelining** • Power reduction from pipelining limited by power overhead of increased number of flip-flops Power-Optimal Pipelining

**Power-Optimal Pipelining** • Power reduction from pipelining limited by power overhead of increased number of flip-flops Power-Optimal Pipelining Power Too shallow pipelining Delay

**Power-Optimal Pipelining** • Power reduction from pipelining limited by power overhead of increased number of flip-flops Power-Optimal Pipelining Power Too deep pipelining Too shallow pipelining Delay

**Power-Optimal Pipelining** • Power reduction from pipelining limited by power overhead of increased number of flip-flops Power-Optimal Pipelining Power Too deep pipelining Too shallow pipelining Optimal pipelining Optimal Power Saving Delay

**Contribution** • Pipelining is an old idea. • Research focus has been on performance impact of pipelining. • Idea of using pipelining [Chandrakasan ’92] to lower power has not been fully explored in deep submicron technology. • Analysis and circuit-level simulation of Power-Optimal Pipelining for different regimes of Vth, activity factor, clock gating

**Bottom-to-Top Approach** • Impact of pipelining on power component • Impact of pipelining on total power (with/without clock-gating) Power Total Power (clock-gated) active active inactive Time Idle Power Component Leakage Power Component Switching Power Component

**Bottom-to-Top Approach** • Impact of pipelining on power component • Impact of pipelining on total power (with/without clock-gating) Power Total Power (not clock-gated) active active inactive Time *Idle power = power consumed when circuit is idle and not clock-gated Idle Power Component Leakage Power Component Switching Power Component

**Methodology** • Target digital system: Fixed throughput, Highly parallel computation, Logic-dominant • Test bench • BPTM (Berkeley Predictive Technology Model) 70nm process: • LVT(0.17/-0.2), MVT(0.19/-0.22), HVT(0.21/-0.24) • Hspice simulation at 100°C, Clock = 2 GHz Baseline N FO4 inverters (N = 2 ~ 24) TG flip-flops TG flip-flops One Pipeline Stage

**Pipelining and Switching Power:Analytical Trend** Optimal Saving O(N2) Flip-flop overhead Switching Power Quadratic reduction of logic switching power O(1/N) Vdd2 N2 Optimal FO4 Number of FO4 per stage, N

**Pipelining and Leakage Power:Analytical Trend** Optimal Saving O(1/N) O(N ) (1<< 2) Flip-flop overhead Superlinear reduction of logic leakage power Leakage Power Optimal FO4 Vdd * e(Vdd) N DIBL effect Number of FO4 per stage, N

**Pipelining and Idle Power:Analytical Trend** • Clock-gating is not always possible • Increased control complexity • insufficient setup time of clock enable signal • Leakage Power + Flip-flop Switching Power • Between leakage power scaling and flip-flop switching power scaling depending on leakage level

**Pipelining and Idle Power:Analytical Trend** Leakage Power Scale Flip-flop Switching Power Scale Optimal Saving Idle Power Optimal Saving O(N) Optimal FO4 Linear reduction of Flip-flop switching power Relative Power O(N ) (1<< 2) O(1/N) 1/N * Vdd2 N Optimal FO4 O(1/N) Number of FO4 per stage, N Number of FO4 per stage, N

**Simulation Results:Power Components** Fixed Throughput @ 2 GHz N = Number of FO4 inverters per stage (Not including flip-flop delay) N* = Optimal N Saving* = Optimal power saving by pipelining

**Optimal Power Saving** Optimal FO4 = 6 Optimal FO4 = 6~8 No Clock Gating Clock Gating *2 GHz *Flip-flop delay not included in optimal FO4 relative power relative power activity factor activity factor

**Optimal Power Saving** Optimal FO4 = 6 Optimal FO4 = 6~8 No Clock Gating Clock Gating Idle Power Leakage Power relative power relative power Switching Power Switching Power activity factor activity factor

**Optimal Power Saving** Optimal FO4 = 6 Optimal FO4 = 6~8 No Clock Gating Clock Gating relative power relative power LVT activity factor activity factor

**Discussion** • LVT can be fast and power-efficient • enables lower Vdd • Flip-flop delay more important than flip-flop power for power-optimal pipelining

**Limitation of This Work**

**Conclusion** • Pipelining is an effective low-power tool when used to support voltage scaling in digital system implementing highly parallel computation. • Optimal Logic Depth: 6-8 FO4 • ~ 8-10 FO4 including flip-flop delay • Optimal Power Saving: 55 – 80% • It depends on Vth, AF, Clock-Gating • Insights: • Pipelining is more effective with High AF • Pipelining is most effective at saving switching power • Pipelining is more effective with lower Vth • Except for when leakage power is dominant. • Pipelining is more effective with clock-gating • reduced flip-flop overhead.

**Acknowledgments** • Thanks to SCALE group members and anonymous reviewers • Funded by NSF CAREER award CCR-0093354, NSF ITR award CCR-0219545, and a donation from Intel Corporation.

**BACKUP SLIDES**