
Mohab Anis, Shawki Areibi *, Mohamed Mahmoud and Mohamed Elmasry

Dynamic and Leakage Power Reduction in MTCMOS Circuits Using an Automated Efficient Gate Clustering Technique. Mohab Anis, Shawki Areibi *, Mohamed Mahmoud and Mohamed Elmasry VLSI Research Group, University of Waterloo, Canada * School of Engineering, University of Guelph, Canada.


Presentation Transcript


  1. Dynamic and Leakage Power Reduction in MTCMOS Circuits Using an Automated Efficient Gate Clustering Technique Mohab Anis, Shawki Areibi *, Mohamed Mahmoud and Mohamed Elmasry VLSI Research Group, University of Waterloo, Canada * School of Engineering, University of Guelph, Canada

  2. Presentation Outline • Low Power Design in DSM • Concept of sleep transistors • Previous work • Sizing the sleep transistor • Bin-Packing technique • Set-Partitioning technique • Conclusion and extended work done

  3. Why Low Power Design? • Growing market of mobile and handheld electronic systems. • Difficulty in providing adequate cooling. Fans create noise and add to cost. • Heat dissipation impacts packaging technology and cost. • Increasing standby time of portable devices. • In DSM regimes, leakage power has become as big a problem as dynamic power.

  4. Concept of sleep transistors • MTCMOS technology is an increasingly popular technique to reduce leakage power. • Proper ST sizing is a key issue: a larger ST increases area, Pdynamic, and Pleakage; a smaller ST increases delay. • [Figure: an LVT logic block connected through the virtual ground node VX to an HVT sleep transistor gated by SLEEP; the sleep transistor is modeled as a resistor R carrying current I.]

  5. First Approach [1] • A single ST supports the whole circuit. • Interconnect resistance increases for distant blocks. • The ST size must increase to compensate for the added resistance, which raises area, Pdynamic, and Pleakage. • The effect is more significant in the DSM regime. • [Figure: LVT logic circuit with a single HVT sleep transistor gated by SLEEP.] [1] S. Mutoh et al., “1-V Power Supply High-Speed Digital Circuit Technology with Multi-Threshold Voltage CMOS,” IEEE J. of Solid-State Circuits, pp. 847-853, 1995.

  6. Second Approach [2] • A single ST is sized according to a mutually exclusive discharge pattern algorithm. • ST assignments are wasteful. • Interconnect resistance increases for distant blocks. • The ST size must increase to compensate for the added resistance, which raises Pdynamic and Pleakage. • The effect is more significant in the DSM regime. • [Figure: example circuit of gates G1 to G10.] [2] J. Kao et al., “MTCMOS Hierarchical Sizing Based on Mutual Exclusive Discharge Patterns,” in Proc. of 35th DAC, pp. 495-500, Las Vegas, 1998.

  7. Sizing the sleep transistor • Objective: Constant ST size, causing 5% degradation in circuit speed. • $(W/L)_{sleep} = \dfrac{I_{sleep}}{0.05\,\mu_n C_{ox}\,(V_{dd}-V_{tL})(V_{dd}-V_{tH})}$ • $I_{sleep}$ is chosen to be 250 μA. • $(W/L)_{sleep} \approx 6$ for 0.18 μm CMOS technology, with $V_{tL}$ = 350 mV and $V_{tH}$ = 500 mV.
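The sizing rule on this slide can be checked numerically. The sketch below evaluates it with the slide's values for the sleep current and the two threshold voltages. The supply voltage (1.8 V) and the process transconductance (440 μA/V²) are not given in the slides; they are illustrative assumptions picked as plausible for a 0.18 μm process, and with them the result lands close to the quoted ratio of about 6.

```python
def sleep_transistor_aspect_ratio(i_sleep, mu_n_cox, vdd, vt_low, vt_high,
                                  degradation=0.05):
    """Return (W/L) of the sleep transistor for a given speed degradation.

    Implements (W/L) = I_sleep / (degradation * mu_n*Cox * (Vdd-VtL) * (Vdd-VtH)).
    All quantities are in SI units (A, A/V^2, V).
    """
    return i_sleep / (degradation * mu_n_cox * (vdd - vt_low) * (vdd - vt_high))


# From the slide: I_sleep = 250 uA, VtL = 350 mV, VtH = 500 mV, 5% degradation.
# ASSUMED (not in the slides): Vdd = 1.8 V and mu_n*Cox = 440 uA/V^2.
w_over_l = sleep_transistor_aspect_ratio(
    i_sleep=250e-6, mu_n_cox=440e-6, vdd=1.8, vt_low=0.35, vt_high=0.50
)
print(f"(W/L)_sleep = {w_over_l:.2f}")
```

Because the ratio scales inversely with the assumed transconductance and supply terms, a different process corner would shift the result proportionally.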

  8. 4-bit CLA Adder

  9. Preprocessing of Gate Currents • Random inputs are applied to the CLA adder; the highest current discharge of each gate is monitored and multiplied by the corresponding switching activity. • The peak current value, its time of occurrence, and its duration are monitored. • Currents are combined into a single equivalent current $I_{eq} = \max\{I_i\}$ when $\sum_i I_i(t) \le \max\{I_i\}$ over time.
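A minimal sketch of the combination rule, assuming each gate current is a list of samples on a shared time grid. The function name and the sample values are hypothetical and chosen only to show one pair of gates that can share an equivalent current and one pair that cannot.

```python
def equivalent_current(currents):
    """Return Ieq = max{Ii} if the summed current never exceeds it, else None.

    currents: list of equal-length sample lists, one per gate.
    """
    peak = max(max(samples) for samples in currents)
    summed = [sum(column) for column in zip(*currents)]  # total current per time step
    return peak if max(summed) <= peak else None


# Hypothetical waveforms (arbitrary units): i1 and i2 discharge at different
# times, while i1 and i3 overlap.
i1 = [0, 11, 33, 65, 33, 11, 0, 0, 0, 0, 0, 0]
i2 = [0, 0, 0, 0, 0, 0, 0, 24, 55, 79, 55, 24]
i3 = [0, 0, 30, 60, 30, 0, 0, 0, 0, 0, 0, 0]

print(equivalent_current([i1, i2]))  # disjoint pulses: combinable
print(equivalent_current([i1, i3]))  # overlapping pulses: not combinable
```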

  10. Timing Diagram • [Figure: gates G1 (FO=2, delay T1) and G2 (FO=4, delay T2) with current waveforms I1 (G1) and I2 (G2) versus time, peaking at 65 and 79; time markers T1 = 80 psec, 120 psec, T1+T2 = 210 psec, and 260 psec.] • I1 (G1): 0 0 11 22 33 43 54 65 54 43 33 22 11, then 0 • I2 (G2): 0, then 6 12 18 24 30 37 43 49 55 61 67 73 79 73 67 61 55 49 43 37 30 24 18 12 6 0 0 0 0 0 0 0

  11. Preprocessing Heuristic
      1. Initialize current vectors
      2. Set all gates free to move to a sub-cluster
      3. For all gates i in circuit
           If gate i is not clustered yet
             assign gate i to new cluster k
             update cluster current vector
             calculate max current, start, end time
             For all other gates j in circuit
               If (gate j is not clustered yet)
                 add current of gate j to cluster k
                 If (combination ≤ max current)
                   append gate to cluster
                   update cluster info
                   set gate j locked in cluster k
             End For
         End For
      4. Return all clusters formed.
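The heuristic above can be sketched as a greedy pass over the gates. This is one reading of the slide, not the authors' implementation: it assumes "max current" means the largest individual peak among the gates being combined, it discards a trial addition that violates the test, and it omits the start and end time bookkeeping. The gate names and waveforms are hypothetical.

```python
def preprocess_gates(gate_currents):
    """Greedily merge gates whose summed current stays within the largest peak.

    gate_currents: dict mapping gate name -> list of current samples
                   (all lists share the same time grid).
    Returns a list of clusters, each a dict with its gates, summed current
    vector, and equivalent current Ieq.
    """
    clustered = set()
    clusters = []
    for gate_i, vec_i in gate_currents.items():
        if gate_i in clustered:
            continue
        # Start a new cluster k seeded with gate i.
        members = [gate_i]
        cluster_vec = list(vec_i)
        member_peak = max(vec_i)
        clustered.add(gate_i)
        for gate_j, vec_j in gate_currents.items():
            if gate_j in clustered:
                continue
            # Tentatively add gate j's current to the cluster.
            trial = [a + b for a, b in zip(cluster_vec, vec_j)]
            limit = max(member_peak, max(vec_j))
            if max(trial) <= limit:
                members.append(gate_j)   # lock gate j in cluster k
                cluster_vec = trial
                member_peak = limit
                clustered.add(gate_j)
        clusters.append(
            {"gates": members, "current": cluster_vec, "ieq": max(cluster_vec)}
        )
    return clusters


# Hypothetical waveforms: G1 and G2 never discharge together, G3 overlaps G1.
example = {
    "G1": [0, 20, 65, 20, 0, 0, 0, 0],
    "G2": [0, 0, 0, 0, 10, 79, 10, 0],
    "G3": [0, 30, 50, 30, 0, 0, 0, 0],
}
result = preprocess_gates(example)
for cluster in result:
    print(cluster["gates"], cluster["ieq"])
```

The outcome depends on the order in which gates are visited, which is typical of a greedy pass of this kind.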

  12. Bin-Packing Technique • Objective: Minimize the number of STs used. • Subject to: 1. $\sum I_{eq} \le I_{max}$ for any ST. 2. Each $I_{eq}$ is assigned only once.
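To illustrate the formulation, the sketch below packs equivalent currents into sleep transistors with the standard first-fit-decreasing heuristic. The slides do not say which solver was used, so this is a swapped-in approximation rather than the authors' method, and it does not guarantee the minimum number of STs. The individual current values are hypothetical; only the 250 μA capacity comes from the slides.

```python
def first_fit_decreasing(ieq, i_max):
    """Assign each equivalent current to the first ST that still has room.

    ieq:   dict mapping name -> equivalent current (same unit as i_max).
    i_max: current capacity of one sleep transistor.
    Returns a list of bins, each {"load": total current, "items": [names]}.
    """
    bins = []
    # Place the largest currents first; ties keep their original order.
    for name, current in sorted(ieq.items(), key=lambda kv: kv[1], reverse=True):
        if current > i_max:
            raise ValueError(f"{name} exceeds the capacity of a single ST")
        for st in bins:
            if st["load"] + current <= i_max:
                st["items"].append(name)
                st["load"] += current
                break
        else:
            # No existing ST can take this current: open a new one.
            bins.append({"load": current, "items": [name]})
    return bins


# Hypothetical equivalent currents in uA; capacity Imax = 250 uA.
currents_ua = {"IEQ1": 70, "IEQ2": 60, "IEQ3": 100, "IEQ4": 90,
               "IEQ5": 60, "IEQ6": 50, "IEQ7": 60}
assignment = first_fit_decreasing(currents_ua, i_max=250)
for index, st in enumerate(assignment, start=1):
    print(f"ST {index}: {st['items']} -> {st['load']} uA")
```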

  13. Currents Assignment • Sleep transistor 1: equivalent currents IEQ3, IEQ4, IEQ7; Σ currents = 250 μA • Sleep transistor 2: equivalent currents IEQ1, IEQ2, IEQ5, IEQ6; Σ currents = 240 μA • Assigned gates: G5 G6 G7, G8 G14 G16 G18 G23, G1, G2 G3, G4 G9 G10, G11, G12 G13 G15 G17 G19, G20 G21, G22, G24, G25 G26, G27 G28

  14. Clustering of CLA adder

  15. Set-Partitioning Technique • [Figure: standard-cell layout showing cell height, cell Lmin, the sleep device cavity, and alternating ground and Vdd rails. Row 1: G1 G6 G7 G2 G3 G5 G8 G4 • Row 2: G11 G12 G19 G9 G10 G14 G16 G13 G15 G17 G18 • Row 3: G22 G27 G23 G28 G26 G21 G25 G24 G20]

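  16. Cost Function • $C_j = (w_1 \cdot C_{j1}) + (w_2 \cdot C_{j2})$ • $C_{j1}$ = Sleep_Transistor max_current $- \sum_i \text{current}_i$ • $C_{j2} = \sum d_{uv}$ over the gates in a group $S_j$ • [Figure: group $S_j$ containing gates Gu, Gv, Gw with pairwise distances duv, dvw, dwu.]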
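A minimal sketch of this cost function for one group. The slides do not state the distance metric or the weight values, so Euclidean distance between gate coordinates and unit weights are assumptions here, as are the gate positions and currents in the example.

```python
import math
from itertools import combinations


def group_cost(group, currents, positions, i_max, w1=1.0, w2=1.0):
    """Cost Cj = w1*Cj1 + w2*Cj2 of one candidate group of gates.

    Cj1: unused ST capacity (i_max minus the sum of the group's currents).
    Cj2: sum of pairwise distances between the gates in the group
         (Euclidean here, which is an assumption).
    """
    slack = i_max - sum(currents[g] for g in group)
    wiring = sum(
        math.dist(positions[u], positions[v]) for u, v in combinations(group, 2)
    )
    return w1 * slack + w2 * wiring


# Hypothetical gates placed on a 3-4-5 triangle, currents in uA.
positions = {"Gu": (0, 0), "Gv": (3, 0), "Gw": (3, 4)}
currents = {"Gu": 80, "Gv": 70, "Gw": 60}
cost = group_cost(["Gu", "Gv", "Gw"], currents, positions, i_max=250)
print(cost)  # slack of 40 plus pairwise distances 3 + 4 + 5
```

Raising `w2` relative to `w1` favors compact groups over fully loaded sleep transistors, which is the trade-off the two weights expose.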

  17. Clustering Heuristic
      Create_Clusters ( )
        Calculate distances between all gates;
        Initialize maxgates_per_cluster = n;
        Create clusters with single gates;
        For cl = 2; cl ≤ maxgates_per_cluster
          Create_n_Gate_Clusters (cl)
        For all clusters created calculate_cost ( )
      Create_n_Gate_Clusters (cl)
        For cluster of type cl
          create_new_cluster ( )
          While not done
            Choose gate with minimum distances
            If sum of currents ≤ capacity
              append gate to newly created cluster
            End If
            If total gates within cluster ≥ limit
              break;
          End While
        End For
        Return newly created cluster
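One possible reading of this heuristic is sketched below: every gate seeds a cluster of each size, and the cluster grows by repeatedly taking the gate with the smallest total distance to its current members, provided the ST capacity is not exceeded. The seeding strategy, the Euclidean metric, the tie-break by gate name, and all example data are assumptions, not details taken from the slides.

```python
import math


def grow_cluster(seed, size, currents, positions, i_max):
    """Grow a cluster of exactly `size` gates around `seed`, or return None."""
    cluster = [seed]
    load = currents[seed]
    candidates = set(currents) - {seed}
    while len(cluster) < size and candidates:
        # Gate with minimum summed distance to the cluster (name breaks ties).
        nearest = min(
            candidates,
            key=lambda g: (
                sum(math.dist(positions[g], positions[m]) for m in cluster),
                g,
            ),
        )
        candidates.remove(nearest)
        if load + currents[nearest] <= i_max:
            cluster.append(nearest)
            load += currents[nearest]
    return cluster if len(cluster) == size else None


def create_clusters(currents, positions, i_max, max_gates_per_cluster):
    """Return the set of distinct candidate clusters (as frozensets)."""
    found = set()
    for size in range(1, max_gates_per_cluster + 1):
        for seed in currents:
            cluster = grow_cluster(seed, size, currents, positions, i_max)
            if cluster is not None:
                found.add(frozenset(cluster))
    return found


# Hypothetical placement: two pairs of neighbouring gates, 100 uA each.
positions = {"G1": (0, 0), "G2": (1, 0), "G3": (5, 0), "G4": (6, 0)}
currents = {"G1": 100, "G2": 100, "G3": 100, "G4": 100}
found = create_clusters(currents, positions, i_max=250, max_gates_per_cluster=3)
for cluster in sorted(found, key=lambda c: (len(c), sorted(c))):
    print(sorted(cluster))
```

With a 250 μA capacity no three-gate cluster is feasible in this example, so only single gates and the two neighbouring pairs survive as candidates.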

  18. Set-Partitioning Technique • Objective: Minimize $\sum_j C_j S_j$ • Subject to: 1. $\sum$ of currents for $S_j \le I_{max}$. 2. Groups must cover all gates with no repetition.
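The formulation can be illustrated with an exhaustive search over a small list of candidate groups. This is a sketch only: it assumes the current constraint was already enforced when the candidates were generated, the candidate costs are hypothetical, and the search is exponential in the number of candidates, so realistic instances would need an integer-programming solver instead.

```python
from itertools import combinations


def best_partition(gates, candidates):
    """Pick the cheapest set of groups covering every gate exactly once.

    gates:      iterable of gate names.
    candidates: dict mapping frozenset of gate names -> group cost Cj.
    Returns (total cost, tuple of chosen groups), or None if no exact cover exists.
    """
    universe = frozenset(gates)
    groups = list(candidates)
    best = None
    for k in range(1, len(groups) + 1):
        for combo in combinations(groups, k):
            # Equal total size plus full coverage implies the groups are disjoint.
            if sum(len(g) for g in combo) != len(universe):
                continue
            if frozenset().union(*combo) != universe:
                continue
            cost = sum(candidates[g] for g in combo)
            if best is None or cost < best[0]:
                best = (cost, combo)
    return best


# Hypothetical candidate groups and costs.
candidates = {
    frozenset({"G1"}): 150, frozenset({"G2"}): 150,
    frozenset({"G3"}): 150, frozenset({"G4"}): 150,
    frozenset({"G1", "G2"}): 51, frozenset({"G3", "G4"}): 51,
    frozenset({"G2", "G3"}): 54,
}
best = best_partition(["G1", "G2", "G3", "G4"], candidates)
print(best[0], [sorted(g) for g in best[1]])
```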

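  19. Grouping of gates • [Figure: standard-cell layout showing cell height, cell Lmin, the sleep device cavity, and alternating ground and Vdd rails, with gates grouped. Row 1: G1 G6 G7 G2 G3 G5 G8 G4 • Row 2: G11 G12 G19 G9 G10 G14 G16 G17 G13 G15 G18 • Row 3: G22 G27 G23 G28 G26 G21 G25 G24 G20]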

  20. Computational Time • [Figure: BP/SP CPU time. BP CPU time and SP CPU time in seconds (axis from -200 to 2000) versus number of gates (28, 30, 31, 61, 160, 204).]

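  21. Results (% Savings)

  BP:

  | Benchmark | No. of gates | Pdynamic to [1] | Pdynamic to [2] | Pleakage to [1] | Pleakage to [2] | ST_Area [1],[2] |
  |---|---|---|---|---|---|---|
  | 4-bit CLA adder | 28 | 14 % | 12 % | 96 % | 93 % | 95, 92 % |
  | 32-bit Parity Checker | 31 | 18 % | 16 % | 92 % | 85 % | 92, 85 % |
  | 6-bit Multiplier | 30 | 31 % | 23 % | 95 % | 78 % | 95, 78 % |
  | 4-bit 74181 ALU | 61 | 17 % | 14 % | 93 % | 83 % | 93, 83 % |
  | 32-bit Single Error Correcting C499 | 202 | 20 % | 19 % | 95 % | 89 % | 95, 89 % |
  | 27-channel interrupt controller C432 | 160 | 2 % | 0 % | 99 % | 89 % | 99, 88 % |

  SP:

  | Benchmark | No. of gates | Pdynamic to [1] | Pdynamic to [2] | Pleakage to [1] | Pleakage to [2] | ST_Area [1],[2] |
  |---|---|---|---|---|---|---|
  | 4-bit CLA adder | 28 | 7 % | 5 % | 87 % | 78 % | 87, 77 % |
  | 32-bit Parity Checker | 31 | 9 % | 6 % | 85 % | 70 % | 84, 69 % |
  | 6-bit Multiplier | 30 | 19 % | 9 % | 85 % | 35 % | 85, 34 % |
  | 4-bit 74181 ALU | 61 | 11 % | 8 % | 86 % | 66 % | 86, 67 % |
  | 32-bit Single Error Correcting C499 | 202 | 9 % | 8 % | 87 % | 71 % | 86, 70 % |
  | 27-channel interrupt controller C432 | 160 | 2 % | 0 % | 98 % | 77 % | 98, 76 % |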

  22. % Power Savings (Bin-Packing)

  23. % Power Savings (Set-Partitioning)

  24. % ST Area Saving (Bin-Packing)

  25. % ST Area Saving (Set-Partitioning)

  26. Conclusion • The BP technique clusters gates in MTCMOS circuits. Pdynamic and Pleakage are reduced by 15% and 90% compared to [1] and [2], respectively. • SP takes routing complexity into consideration. Pdynamic and Pleakage are reduced by 11% and 77% compared to [1] and [2], respectively.

  27. Extended Work Done • A hybrid clustering technique that combines the BP and SP techniques is devised to produce a more efficient and faster solution. • Noise associated with ground bounce is taken as a design criterion (< 50 mV). • Investigating the effect of different ST sizes on circuit parameters. • Investigating the effect of the cost function weights w1 and w2 on circuit parameters.
