Nikhil Jayakumar Sunil P. Khatri Presented by Ayodeji Coker

##### Presentation Transcript

1. An Algorithm to Minimize Leakage through Simultaneous Input Vector Control and Circuit Modification • Nikhil Jayakumar, Sunil P. Khatri • Presented by Ayodeji Coker • Texas A&M University, College Station, TX, USA

2. Contribution of Leakage Power • Leakage is a major contributor to total power consumption. • “Standby / Sleep” leakage reduction is crucial for portable electronics. • Some popular techniques are: • MTCMOS / sleep transistor • Body biasing • Input Vector Control (IVC)

3. Intuition Behind Input Vector Control • Stack effect: leakage falls as more series-connected transistors are cut off. • Leakage in the best state can be about 2 orders of magnitude lower than the maximum. • Cannot set all gates to their minimum leakage state due to logical interdependencies. • NAND3: min leakage state = 000 • NOR3: min leakage state = 111 • [Figure: Leakage of a NAND3 gate]

4. Traditional Input Vector Control • Find the Minimum Leakage Vector (MLV) at the primary inputs. • Finding the MLV is an NP-hard problem. • Several heuristics exist to find a good (near-minimum) MLV. • Apply the inputs through a scan chain or through MUXes at the primary inputs (flip-flop outputs) during standby / sleep. • Can we do more? • Why restrict ourselves to only primary inputs?
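
As a rough illustration of what searching for an MLV at the primary inputs means, the sketch below tries every input vector of a toy three-gate NAND2 circuit. The leakage table holds placeholder numbers in arbitrary units, and the circuit and all names (`NAND2_LEAKAGE`, `circuit_leakage`, `exhaustive_mlv`) are assumptions made for this example, not taken from the talk. Exhaustive search is only feasible for very small circuits, which is why heuristics are used in practice.

```python
from itertools import product

# Hypothetical per-state leakage of a NAND2 gate (arbitrary units).
# Placeholder values for illustration only; 00 is the lowest-leakage state.
NAND2_LEAKAGE = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 3.0, (1, 1): 9.0}


def nand2(a, b):
    """Logic function of a 2-input NAND gate."""
    return 1 - (a & b)


def circuit_leakage(a, b, c, d):
    """Total leakage of a toy circuit: J = NAND2(G, H), G = NAND2(a, b), H = NAND2(c, d)."""
    g = nand2(a, b)
    h = nand2(c, d)
    return NAND2_LEAKAGE[(a, b)] + NAND2_LEAKAGE[(c, d)] + NAND2_LEAKAGE[(g, h)]


def exhaustive_mlv(num_inputs, leakage_fn):
    """Return the primary-input vector with the lowest total leakage (brute force)."""
    return min(product((0, 1), repeat=num_inputs), key=lambda v: leakage_fn(*v))


mlv = exhaustive_mlv(4, circuit_leakage)
print(mlv, circuit_leakage(*mlv))
```

With these placeholder numbers the best vector puts G and H in state 00, which forces J into state 11, its highest-leakage state. Primary-input control alone cannot fix that, which is the limitation the last two questions on the slide point at.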

5. Previous Approaches • “Leakage Current Reduction in CMOS VLSI Circuits by Input Vector Control” (TVLSI ’04) – Abdollahi et al. • Similar to our approach – uses control points and IVC. • Our choice of gate variants allows greater flexibility at control points. • “Enhanced Leakage Reduction by Gate Replacement” (DAC ’05) – Yuan et al. • “A Fast Simultaneous Input Vector Generation and Gate Replacement Algorithm for Leakage Power Reduction” (DAC ’06) – Cheng et al. • These use gate replacement like we do, but a gate G is replaced by a gate G’ to reduce the leakage of gate G, not to control internal nodes. • Previous approaches have an associated delay penalty to get a reasonable leakage reduction. • We get a significant leakage reduction with no expected delay penalty.

6. Our Approach - Overview • Modify the circuit such that we control internal nodes of the circuit. • Create variants of each gate that can replace the original. • Traverse the circuit from inputs to outputs and replace gates in the circuit. • Reduce leakage through the stack effect for the gates in the fanout of a gate. • Do not necessarily reduce leakage of the gate being replaced. • Perform gate replacement such that leakage is reduced but circuit delay is not increased.

7. Variants of a Gate • Regular NAND2

8. Variants of a Gate • sngl1out0: Used when output of gate is 1 in standby, but all the fanout gates require an output of 0.

9. Variants of a Gate • sngl1out1: Used when output of gate is 0 in standby, but all the fanout gates require an output of 1.

10. Variants of a Gate • snglmx0: Used when output of gate is 1 in standby, but some fanout gates require an output of 0.

11. Variants of a Gate • snglmx1: Used when output of gate is 0 in standby, but some fanout gates require an output of 1.

12. Variants of a Gate • dbl variants : Larger counterparts of the sngl variants (devices sized < 2X) • Adds more flexibility to choices for replacement.
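
The rules on the variant slides can be summarized as a small selection function. This is a minimal sketch under stated assumptions: the function name `choose_variant` and its interface are hypothetical, it returns only the sngl variant names, and it ignores leakage cost and timing, which decide whether a replacement is actually made and whether a dbl variant is tried.

```python
def choose_variant(standby_output, fanout_requirements):
    """Pick a variant name from a gate's standby output and its fanouts' wishes.

    standby_output: 0 or 1, the value the regular gate produces in standby.
    fanout_requirements: iterable of 0/1 values, one per fanout gate, giving
        the value that fanout gate would like to see for low leakage.
    """
    wanted = set(fanout_requirements)
    if not wanted or wanted == {standby_output}:
        # Every fanout is already satisfied by the regular gate.
        return "regular"
    opposite = 1 - standby_output
    if wanted == {opposite}:
        # All fanouts want the opposite value: force the single output.
        return f"sngl1out{opposite}"
    # Fanouts disagree: some need the opposite value, some the current one.
    return f"snglmx{opposite}"


print(choose_variant(1, [0, 0]))  # all fanouts want 0 while the gate outputs 1
print(choose_variant(0, [0, 1]))  # mixed requirements while the gate outputs 0
```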

13. The Gate Replacement Algorithm • Assume inputs of gates at the first level can be set independently. • Gates at the first level can all be set to their minimum leakage state. • Pick a gate G from the first level. Let g be its output signal. • Find what value of g each gate in the fanout of G requires. • Replace G with a variant only if there is a net savings in leakage and there is no timing violation.
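
The acceptance test in the last bullet can be sketched as below. The `Candidate` record, its field names, and the numbers in the usage lines are assumptions made for illustration; the talk does not give the data structures or the timing check used in the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical candidate replacement of one gate by one variant."""
    variant: str
    fanout_leakage_saving: float  # leakage removed from the fanout gates
    variant_leakage_cost: float   # extra leakage introduced by the variant itself
    slack_after: float            # worst timing slack after replacement


def net_saving(c):
    """Leakage saved in the fanout minus the leakage cost of the variant."""
    return c.fanout_leakage_saving - c.variant_leakage_cost


def should_replace(c):
    """Accept only if leakage improves overall and timing is still met."""
    return net_saving(c) > 0 and c.slack_after >= 0


# Placeholder numbers in arbitrary units, for illustration only.
good = Candidate("sngl1out0", fanout_leakage_saving=5.0, variant_leakage_cost=2.0, slack_after=0.1)
slow = Candidate("sngl1out0", fanout_leakage_saving=5.0, variant_leakage_cost=2.0, slack_after=-0.05)
print(should_replace(good), should_replace(slow))
```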

14. Example • First set gate G to its lowest leakage state, 00. • Next look at the fanout of gate G – gate J is in its fanout. • If the output of G = 1 (the current value), the best state possible at J is 10. • Choose from 10, 11. • If the output of G can instead be 0 or 1, the best state possible for J is 00. • Choose from 00, 01, 10, 11. • Leakage improvement possible = (Leakage of J at state 10 – Leakage of J at state 00 – Leakage cost of replacing gate G with a sngl1out0 variant). • [Figure: gates G and H driving gate J, annotated with standby logic values]

15. Example • First set gate G to its lowest leakage state, 00. • Next look at the fanout of gate G – gate J is in its fanout. • If the output of G = 1 (the current value), the best state possible at J is 10. • Choose from 10, 11. • If the output of G can instead be 0 or 1, the best state possible for J is 00. • Choose from 00, 01, 10, 11. • Leakage improvement possible = (Leakage of J at state 10 – Leakage of J at state 00 – Leakage cost of replacing gate G with a sngl1out0 variant). • [Figure: gates G and H driving gate J, annotated with standby logic values]

16. Example • Next set gate H to its lowest leakage state, 00. • Then look at the fanout of gate H – gate J is in its fanout. • If the output of H = 1 (the current value), the best state possible at J is 01. • 01 is the only choice. • If the output of H can instead be 0 or 1, the best state possible for J is 00. • Choose from 00, 01. • Leakage improvement possible = (Leakage of J at state 01 – Leakage of J at state 00 – Leakage cost of replacing gate H with a sngl1out0 variant). • [Figure: gates G and H driving gate J, annotated with standby logic values]

17. …Replacement Algorithm • If both logic 0 and logic 1 are required at some node – then try snglmx variants. • If sngl variants cause timing violations – try dbl variants. • Use dbl variants only if leakage improvement is positive. • Traverse circuit from inputs to output in levelized order.
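
Levelized order means that a gate is visited only after every gate driving it. A minimal sketch of computing such an order is shown below, assuming the netlist is given as a dictionary from each gate to the signals that drive it; the `levelize` function and this netlist format are hypothetical and not taken from the authors' implementation.

```python
def levelize(fanins):
    """Return gate names sorted by logic level (primary inputs are level 0).

    fanins: dict mapping each gate name to the list of names driving it.
        Any name that is not a key of the dict is treated as a primary input.
    """
    levels = {}

    def level(node):
        if node not in fanins:
            return 0  # primary input
        if node not in levels:
            # A gate sits one level above its deepest driver.
            levels[node] = 1 + max((level(d) for d in fanins[node]), default=0)
        return levels[node]

    for gate in fanins:
        level(gate)
    # Sort by level, breaking ties by name so the order is deterministic.
    return sorted(fanins, key=lambda g: (levels[g], g))


# The three-gate example: G and H are first-level gates, J is in their fanout.
netlist = {"J": ["G", "H"], "G": ["a", "b"], "H": ["c", "d"]}
order = levelize(netlist)
print(order)
```

The recursive helper is adequate for small examples; a large netlist would need an iterative traversal to avoid Python's recursion limit.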

18. Experimental Results • Cell library characterization done in SPICE. • bsim100 Berkeley Predictive Technology Model (BPTM) cards, 1.2V VDD • Algorithm implemented in Perl. • Run on a 3GHz Pentium 4 with 2GB RAM, running Fedora Core 3.

19. Experimental Results • On average 30% improvement in leakage over applying MLV at primary inputs alone. • Existing approaches that use IVC and control points to get a similar leakage improvement have a delay penalty of 10 to 15%.

20. Experimental Results • There is never a delay increase. • Delay decreases in some instances due to the use of dbl variants. • sngl1out variants improve delay in one transition. • Runtime is low. • Current implementation is in Perl – expected to speed up when implemented in C/C++.

21. Experimental Results • Total Active area overhead on average = 24%. • Real area overhead would be lower after layout, place and route. • A lot of the area is used by sleep cut-off transistors. • These can be shared – would reduce area, delay and leakage.

22. Experimental Results • dblmx variants did not get used. • sngl1out variants used the most.

23. Conclusion • We extended input vector control to control internal nodes – not just primary inputs. • 30% leakage decrease with no delay penalty • Leakage decrease is over MLV at primary inputs alone. • Delay improvement in many cases. • Active area increase = 24%, but this is mostly sleep cut-off transistor area • Placed and routed area is expected to be much lower. • Dynamic power estimated to increase by 1.5% on average.

24. Thank you Contact info of authors: nikhil_AT_ece_DOT_tamu_DOT_edu sunilkhatri_AT_tamu_DOT_edu