
An Energy Efficient Two-Phase Clocking Scheme

Brad Bridgeman, Yanqing Zhang


Presentation Transcript


  1. An Energy Efficient Two-Phase Clocking Scheme. Brad Bridgeman, Yanqing Zhang

  2. Outline
     • Overview of Place and Route
       - Main steps
       - Introduction to our problem
     • Two-Phase Clock Explanation
       - Explanation of two-phase clock
       - Benefits of two-phase clock
     • Implementation
       - Step by step project implementation
     • Conclusions
       - Rules in using the two-phase clock
     • Summary
     • Questions, Thoughts, Problems

  3. Overview of Place and Route Steps
     1. Import a synthesized netlist
        - We will use a previously designed PIC
     2. Define chip core size
        - This is where the cells will be placed
     3. Draw power rings and rails
        - Supply and ground rails are placed around the core
     4. Place standard cells
        - The tool automatically places standard cells within the defined core

  4. Overview of Place and Route Steps
     5. 1st/Trial route
        - The tool performs a provisional route to estimate how the final routing will look
     6. Clock tree generation/synthesis
        - The tool creates the clock tree and inserts clock buffers in the design
     7. Timing closure
        - The tool inserts/deletes buffers to resolve hold/setup time violations
     8. Nanoroute
        - With the netlist finalized, the tool performs the actual routing

  5. Our Problem
     • Clock buffering
       - 339 buffers are placed to drive the clock and generate an H-tree
     • Hold time violation fixes
       - Buffers are inserted on logic paths whose delay is too small to meet the hold time constraint
     The problem is that these buffers cost a lot of area and power while performing no logic function. We think we can do better than this.

  6. An Intro to Our Clock Buffering Scheme
     • We propose a two-phase clocking scheme. Our motivation is that it may reduce overall area and power in the design.
     • Our approach centers on the idea that the two-phase clock eliminates hold time violations.

  7. An Intro to Our Clock Buffering Scheme
     So how will this help us reduce buffer area and power?

  8. Benefits of Two-Phase Clock Scheme
     First, let's not forget what the buffers were for:
     1. To fix hold time violations
     (Figure: old clocking scheme vs. two-phase clocking scheme)
     On a micro-architectural level: The two-phase clock removes the need for hold time buffers. Of course, the cost of the 2nd phase generator and the cost of adapting the registers to the two-phase clock must be taken into account. This is discussed later.
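As a reminder of what the hold time buffers are fixing, the standard hold check can be sketched as follows: data launched by one register must not reach the next register before that register's hold window has closed, after accounting for clock skew. This is a minimal sketch, not taken from the slides; the function name and every number in it are illustrative assumptions.

```python
def hold_slack_ps(t_clk_to_q_ps, t_logic_min_ps, t_hold_ps, skew_ps):
    """Return the hold slack in picoseconds; a negative value is a violation.

    skew_ps is the capture-clock arrival time minus the launch-clock
    arrival time (positive skew makes the hold check harder to meet).
    """
    return t_clk_to_q_ps + t_logic_min_ps - (t_hold_ps + skew_ps)


# Hypothetical shift-register style path: almost no logic between registers.
slack_without_fix = hold_slack_ps(t_clk_to_q_ps=100, t_logic_min_ps=0,
                                  t_hold_ps=50, skew_ps=120)

# The conventional fix: add delay buffers (here 80 ps, an assumed value)
# on the data path until the slack is no longer negative.
slack_with_buffers = hold_slack_ps(t_clk_to_q_ps=100, t_logic_min_ps=80,
                                   t_hold_ps=50, skew_ps=120)

print(slack_without_fix, slack_with_buffers)
```

The inserted buffers contribute only delay, which is why the deck treats them as area and power that could be recovered.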

  9. Benefits of Two-Phase Clock Scheme
     First, let's not forget what the buffers were for:
     2. To drive the clock signal at sinks (the clock inputs of registers)
     3. To balance paths in the H-tree
     (Figure: old clocking scheme vs. two-phase clocking scheme)
     On a macro-architectural level: We may be able to reduce the number of clock buffers because the clock load is reduced. We may also be able to remove some of the buffers at the deeper levels of the H-tree. Because the pulse generator can eliminate skew problems of up to 300 ps, we can allow the skew in paths to approach 300 ps, which reduces the buffer requirement. (These benefits are shown only conceptually.)

  10. Implementation 1. Design of the 2nd phase clock generator (pulse generator)

  11. Implementation
      Estimated costs: Area: 35 µm², Power: 0.06 µW
      (Figure timing annotations: 2.6 ns, 1.1 ns, 0.8 ns)

  12. Implementation 2. Verification of hold time fix using designed pulse generator

  13. Implementation Latches correctly on the next clock cycle

  14. Implementation
      3. We find the paths that violate hold time. They are shift register paths:
         - R20_reg_0 -> PC_run_reg_10
         - R27_reg_3 -> STACKLEVEL_reg_1
         - R26_reg_7 -> PC_run_reg_9
         - R23_reg_5 -> PC_run_reg_8
         - R10_reg_2 -> STACKLEVEL_reg_0
         - R19_reg_6 -> W_int_reg_6
         - R19_reg_5 -> W_int_reg_5
         - R19_reg_2 -> W_int_reg_2
         - R19_reg_4 -> W_int_reg_4
         - R19_reg_3 -> W_int_reg_3
         - R21_reg_1 -> STATUS_int_reg_2

  15. Implementation
      4. We compare how the hold time violation is fixed by the tool versus by our method.
         Power saved by using our method: ∆P = -0.002 µW
         The saving is negative, which means our method consumes more power than the tool's fix in this example.

  16. Implementation
      We repeat the comparison for every path to see which path(s) benefit from our clocking scheme:
         - R20_reg_0 -> PC_run_reg_10
         - R27_reg_3 -> STACKLEVEL_reg_1
         - R26_reg_7 -> PC_run_reg_9
         - R23_reg_5 -> PC_run_reg_8
         - R10_reg_2 -> STACKLEVEL_reg_0
         - R19_reg_6 -> W_int_reg_6
         - R19_reg_5 -> W_int_reg_5
         - R19_reg_2 -> W_int_reg_2
         - R19_reg_4 -> W_int_reg_4
         - R19_reg_3 -> W_int_reg_3
         - R21_reg_1 -> STATUS_int_reg_2
      Can't win them all…

  17. Conclusion 1
      On a micro-architectural level:
      Qualitatively:
      - We use our clocking scheme only where its cost is lower than the cost of the buffering it replaces. The scheme becomes attractive as the imbalance in the H-tree, and therefore the skew, increases.
      Quantitatively:
      - In this case, the clocking scheme saves power when more than 3 buffers are placed to fix a hold time violation.
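The break-even rule above can be sketched as a simple power comparison between the buffers that would be removed and the pulse generator that replaces them. The pulse generator power (0.06 µW) is the estimate from slide 11; the per-buffer power is a hypothetical value chosen only so that the break-even point falls between 3 and 4 buffers, and the cost of adapting the registers is ignored. All names are assumptions for illustration.

```python
PULSE_GEN_POWER_UW = 0.06      # estimated pulse generator power (slide 11)
HOLD_BUFFER_POWER_UW = 0.0175  # HYPOTHETICAL per-buffer power, not from the slides


def power_saved_uw(n_hold_buffers,
                   buffer_power_uw=HOLD_BUFFER_POWER_UW,
                   pulse_gen_power_uw=PULSE_GEN_POWER_UW):
    """Power saved by replacing n hold buffers with one pulse generator.

    A negative result means the pulse generator costs more than it saves.
    Register adaptation cost is deliberately left out of this sketch.
    """
    return n_hold_buffers * buffer_power_uw - pulse_gen_power_uw


def worth_replacing(n_hold_buffers):
    """True when the replacement saves power under the assumed numbers."""
    return power_saved_uw(n_hold_buffers) > 0


for n in range(1, 6):
    print(n, worth_replacing(n))
```

With real per-buffer numbers from the design, the same comparison would give the actual threshold for a given technology.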

  18. Implementation
      5. We make the following analysis: Our clocking scheme does not save much power because few paths are so imbalanced that they need a lot of buffering. Driving JUST ONE path with a pulse generator is costly. Near a location where a pulse generator is already needed, we can remove some buffers near the end of the H-tree for other registers to CREATE skew, and have the pulse generator drive those registers as well. However, how much can a pulse generator drive?
      (Figure: Reg11 -> Reg12 is the path with less skew, which we deliberately skew further; Reg21 -> Reg22 is the path with more skew; one pulse generator drives both.)

  19. Implementation
      6. We simulate to estimate how many registers one pulse generator can drive. We find that it can drive 3.

  20. Implementation On the brink of failing

  21. Conclusion 2
      On a macro-architectural level:
      Qualitatively:
      - We can take buffers out of other path(s) to create the greatest tolerable skew, and have the pulse generator drive those paths as well.
      Quantitatively:
      - In this case, the pulse generator can drive at most 2 other paths, and the maximum tolerable skew is 300 ps.
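The 300 ps limit can be read as a skew budget for buffer removal. The sketch below estimates how many clock buffers could be removed from a path before the limit is reached, under the simplifying assumption that removing one buffer adds exactly one buffer delay of skew. The function name and the example delays are hypothetical; only the 300 ps limit comes from the slides.

```python
MAX_TOLERABLE_SKEW_PS = 300  # maximum tolerable skew reported in the slides


def removable_clock_buffers(current_skew_ps, buffer_delay_ps,
                            max_skew_ps=MAX_TOLERABLE_SKEW_PS):
    """Number of clock buffers that can be removed without exceeding max skew.

    Assumes (as a simplification) that each removed buffer increases the
    path skew by exactly buffer_delay_ps.
    """
    if buffer_delay_ps <= 0:
        raise ValueError("buffer_delay_ps must be positive")
    headroom_ps = max_skew_ps - current_skew_ps
    if headroom_ps <= 0:
        return 0  # the path is already at or beyond the tolerable skew
    return int(headroom_ps // buffer_delay_ps)


# Hypothetical path: 60 ps of existing skew, 80 ps per clock buffer.
print(removable_clock_buffers(current_skew_ps=60, buffer_delay_ps=80))
```

In practice the skew added per removed buffer would come from timing analysis rather than a fixed constant.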

  22. Summary
      Steps to improve buffering conditions:
      1. Search for the path(s) that violate the hold time constraint
      2. Replace excessive register-to-register buffering with a 2nd phase clock pulse generator driving the downstream register
      3. On the same branch but a different path in the H-tree, remove buffers driving the upstream register in that path until the path reaches the maximum tolerable skew
      4. Have the pulse generator drive the downstream register in that path
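The summary steps can be sketched as a small planning routine. This is an illustrative sketch under stated assumptions, not the authors' flow: the `Path` and `PulseGenGroup` types, the branch labels, and the example data are all hypothetical. The three constants come from the slides (more than 3 hold buffers to pay off, 3 registers per pulse generator, 300 ps maximum skew).

```python
from dataclasses import dataclass, field

BUFFER_THRESHOLD = 3  # scheme pays off above this many hold buffers (slide 17)
MAX_FANOUT = 3        # registers one pulse generator can drive (slide 19)
MAX_SKEW_PS = 300     # maximum tolerable skew (slide 21)


@dataclass
class Path:
    name: str           # hypothetical path label
    branch: str         # hypothetical H-tree branch label
    hold_buffers: int   # buffers the tool inserted to fix hold time
    skew_ps: float      # current clock skew on the path


@dataclass
class PulseGenGroup:
    seed: Path                                  # path that justifies the generator
    extra: list = field(default_factory=list)   # neighbours sharing the generator


def plan_pulse_generators(paths):
    """Assign pulse generators to heavily buffered paths, then share each
    generator with up to MAX_FANOUT - 1 paths on the same H-tree branch."""
    groups, used = [], set()
    seeds = [p for p in paths if p.hold_buffers > BUFFER_THRESHOLD]
    for seed in sorted(seeds, key=lambda p: -p.hold_buffers):
        if seed.name in used:
            continue
        used.add(seed.name)
        group = PulseGenGroup(seed)
        for other in paths:
            if len(group.extra) >= MAX_FANOUT - 1:
                break
            if other.name in used or other.branch != seed.branch:
                continue
            # Only paths still within the skew limit can be skewed up to it.
            if other.skew_ps <= MAX_SKEW_PS:
                group.extra.append(other)
                used.add(other.name)
        groups.append(group)
    return groups


example = [
    Path("path_a", "branch_0", 5, 250.0),
    Path("path_b", "branch_0", 1, 120.0),
    Path("path_c", "branch_0", 0, 90.0),
    Path("path_d", "branch_0", 2, 100.0),
    Path("path_e", "branch_1", 2, 50.0),
]
for g in plan_pulse_generators(example):
    print(g.seed.name, [p.name for p in g.extra])
```

The greedy ordering (most heavily buffered path first) is a design choice of this sketch; the slides do not specify how paths are grouped.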

  23. Questions, Thoughts, Problems
      1. We didn't have the sub-vt models for our technology
         - This project was meant for sub-vt, but our models broke down at Vdd = 450 mV
      2. Better pulse generator?
         - Our pulse generator costs a lot of power/area, and it is not a good generator in sub-vt
      3. Simulation conditions
         - We didn't simulate within the whole design, because it was hard to figure out the inputs needed
