64-bit Kogge-Stone Adders in Different Logic Styles – A Study
Rob McNish, Satyanand Nalam
Objectives
• To compare speed and power dissipation of a 64-bit Kogge-Stone adder in 3 logic styles:
  • Static CMOS
  • Dynamic logic
  • Static Output Prediction Logic (OPL)
• To reduce leakage power dissipation in OPL circuits using MTCMOS techniques.
The Adder
• Technology used: 90nm PTM
• Hierarchical design
• Inverting sub-blocks (dot and square) implement the nodes of the Kogge-Stone tree
• The basic sub-blocks are implemented in static, dynamic and OPL-static styles
• Minimal changes to the tree netlist are needed to construct the 3 adders
• Schematic for a 16-bit adder is shown. It can be extended to a 64-bit adder.
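The parallel-prefix structure of the Kogge-Stone tree can be sketched at the behavioral level. This is a minimal Python model, not the authors' netlist: it uses non-inverting generate/propagate logic instead of the inverting dot and square sub-blocks, and the function name `kogge_stone_add` and its parameters are hypothetical.

```python
def kogge_stone_add(a, b, width=64, carry_in=0):
    """Behavioral Kogge-Stone addition. Returns (sum, carry_out).

    Assumption: plain (non-inverting) generate/propagate logic; the
    slides use inverting sub-blocks, which this sketch does not model.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    p0 = a ^ b              # per-bit propagate
    g = a & b               # per-bit generate
    if carry_in:
        g |= p0 & 1         # fold the carry-in into bit 0's generate
    p = p0

    # log2(width) prefix levels; level d combines each bit with the
    # group that ends d positions below it.
    d = 1
    while d < width:
        g = (g | (p & (g << d))) & mask
        p = (p & (p << d)) & mask
        d <<= 1

    # g[i] is now the carry out of bit i; the carry into bit i is g[i-1].
    carries = ((g << 1) | (1 if carry_in else 0)) & mask
    total = (p0 ^ carries) & mask
    carry_out = (g >> (width - 1)) & 1
    return total, carry_out


if __name__ == "__main__":
    s, c = kogge_stone_add(0x0123456789ABCDEF, 0x0FEDCBA987654321)
    print(hex(s), c)
```

For a 64-bit word the loop runs for d = 1, 2, 4, 8, 16, 32, i.e. six prefix levels, which is why the tree depth grows only logarithmically with the word width.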
Output Prediction Logic (OPL)
• A technique that can be applied to different logic families to increase speed
• Retains the attributes of the underlying family (e.g., static, dynamic, pseudo-NMOS)
• Relies on the alternating nature of the logical output values along a critical path: when every gate is inverting, the settled outputs of the gates along any critical path alternate between 0 and 1.
OPL Concept
• OPL predicts that every gate output will be 1 after the transitions are completed.
• Since all gates are inverting, the predictions will be correct one half of the time => at least 2X speedup.
• Problem: a 1 at the output of every inverting gate is not a stable state.
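The "correct one half of the time" argument can be checked on a toy critical path. The sketch below assumes a chain of plain inverters stands in for the inverting gates on a critical path. The helper names are hypothetical and no timing is modeled.

```python
def inverter_chain_outputs(first_input, stages):
    """Settled output of each gate in a chain of inverters."""
    outputs = []
    value = first_input
    for _ in range(stages):
        value = 1 - value          # each inverting stage flips the value
        outputs.append(value)
    return outputs


def correct_predictions(outputs, predicted=1):
    """Count the gates whose settled output matches the OPL prediction."""
    return sum(1 for v in outputs if v == predicted)


if __name__ == "__main__":
    outs = inverter_chain_outputs(first_input=1, stages=8)
    print(outs, correct_predictions(outs))
```

With 8 stages the settled outputs alternate, so the prediction of 1 is already correct at 4 of the 8 gates, and only the other 4 need to make a transition.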
OPL Example
• Solution: tri-state each gate with a clock => a 1 at the input and a 1 at the output is possible.
• Example shown: 3-input NOR gate in OPL-static, where the predicted value is a 1.
• CLK=0 => gate is tri-stated, with its output precharged to 1.
• CLK=1 => conventional CMOS gate.
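The two clock phases of the OPL-static NOR gate described above can be captured in a small logic-level model. This is only a sketch: the function name `opl_static_nor3` is hypothetical, and glitching and transistor-level behavior are not modeled.

```python
def opl_static_nor3(clk, a, b, c):
    """Logic-level model of a 3-input NOR gate in OPL-static style."""
    if clk == 0:
        # Gate is tri-stated: the output holds the precharged
        # (predicted) value 1 regardless of the inputs.
        return 1
    # CLK = 1: the gate behaves as a conventional CMOS NOR.
    return 0 if (a or b or c) else 1


if __name__ == "__main__":
    for clk in (0, 1):
        for inputs in ((0, 0, 0), (1, 0, 0), (1, 1, 1)):
            print(clk, inputs, opl_static_nor3(clk, *inputs))
```

Note that with CLK=0 the model returns 1 even when an input is 1, which is the otherwise unstable "1 at input and 1 at output" condition that the clocked tri-state makes possible.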
OPL clocking
• Clock separation too small => heavy glitching, and the precharge value is lost
• Clock separation too large => minimal glitching, but the speedup achieved is limited by the clock, not by the data
• Optimal clock separation => limited glitching, and the circuit is not clock-blocked.
Delay Plot – Static CMOS
• The best-case delay for the carry tree is along the path for C0, as this consists entirely of inverters.
• The best-case delay distribution for the static CMOS adder is shown. The mean was 144 ps.
• The input vectors (in hex) are A=0000 0000 0000 0001, B=0000 0000 0000 0000 -> 0000 0000 0000 0001
Delay Plot – Static CMOS
• The delay plot for a random case is shown.
• The input vectors (in hex) are A=8000 0000 0000 0000, B=0000 0000 0000 0000 -> 8000 0000 0000 0000
Delay Plot – Dynamic logic
• The delay plot for a random case is shown.
• The input vectors (in hex) are A=8000 0000 0000 0000, B=0000 0000 0000 0000 -> 8000 0000 0000 0000
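To see what the input vectors listed on the delay-plot slides do arithmetically, the short sketch below evaluates the 64-bit sum and carry-out before and after B switches. It uses ordinary integer addition, not a circuit simulation, and the helper `add64` is hypothetical.

```python
MASK64 = (1 << 64) - 1


def add64(a, b):
    """Return (64-bit sum, carry_out) using plain integer arithmetic."""
    total = a + b
    return total & MASK64, total >> 64


# (A, B before the transition, B after the transition), taken from the slides
CASES = {
    "best case": (0x0000000000000001, 0x0, 0x0000000000000001),
    "random case": (0x8000000000000000, 0x0, 0x8000000000000000),
}

if __name__ == "__main__":
    for name, (a, b_before, b_after) in CASES.items():
        print(name, add64(a, b_before), "->", add64(a, b_after))
```

In the first case the B transition generates a carry at bit 0, so the sum changes from 1 to 2. In the second case it generates a carry at bit 63, so the sum changes from 0x8000 0000 0000 0000 to 0 with a carry-out of 1.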
Power dissipation
• Power dissipation was measured for the three adders using Spectre for the
The novelty: Using a high-VT footer to reduce leakage power in OPL gates
• A high-VT footer transistor was added to reduce leakage power in standby mode for the OPL adder.
• Footers were added to the basic sub-blocks.
• The high-VT transistor was modeled by applying a negative voltage to the bulk of the footer transistor.
Leakage power reduction
• A 10x reduction in leakage power was obtained by using the high-VT footer in the OPL adder.
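A rough way to relate the negative bulk bias to a leakage reduction is the textbook body-effect expression together with the rule that subthreshold leakage falls by one decade per subthreshold swing of VT increase. The sketch below is illustrative only. All numeric values (VT0, body-effect coefficient, surface potential, swing) are assumptions chosen for the example, not values from the 90nm PTM models or from the authors' simulations.

```python
import math


def threshold_with_body_bias(vt0, gamma, two_phi_f, v_sb):
    """Textbook NMOS body effect:
    VT = VT0 + gamma * (sqrt(2*phi_F + V_SB) - sqrt(2*phi_F))."""
    return vt0 + gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))


def leakage_reduction(delta_vt, swing_per_decade):
    """Factor by which subthreshold leakage drops for a VT increase of
    delta_vt volts, given the subthreshold swing in volts per decade."""
    return 10 ** (delta_vt / swing_per_decade)


if __name__ == "__main__":
    # Assumed, illustrative parameters (not taken from the slides).
    vt0, gamma, two_phi_f = 0.2, 0.4, 0.8     # V, V^0.5, V
    v_sb = 0.5                                # source at 0 V, bulk at -0.5 V
    swing = 0.1                               # 100 mV/decade

    vt = threshold_with_body_bias(vt0, gamma, two_phi_f, v_sb)
    print("VT with body bias:", round(vt, 3), "V")
    print("leakage reduction:", round(leakage_reduction(vt - vt0, swing), 1), "x")
```

Under this simplified model, a leakage reduction of about 10x corresponds to a threshold increase of roughly one subthreshold swing. The actual figure depends on the device models and on the stacking of the footer with the gate's pull-down network.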
References
1. Hiroshi Kawaguchi, Kouichi Kanda, "A 0.5V, 400MHz, VDD-Hopping Processor with Zero-VTH FD-SOI Technology," ISSCC 2003, Session 6, Low-Power Digital Techniques, Paper 6.3.
2. McMurchie, L.; Kio, S.; Yee, G.; Thorp, T.; Sechen, C., "Output prediction logic: a high-performance CMOS design technique," Computer Design, 2000. Proceedings. 2000 International Conference on, 17-20 Sept. 2000, pp. 247-254.
3. Sheng Sun; Yi Han; Xinyu Guo; Kian Haur Chong; McMurchie, L.; Sechen, C., "409ps 4.7 FO4 64b adder based on output prediction logic in 0.18um CMOS," VLSI, 2005. Proceedings. IEEE Computer Society Annual Symposium on, 11-12 May 2005, pp. 52-58.