
Low-power Clock Trees for CPUs

Dong-Jin Lee, Myung-Chul Kim and Igor L. Markov, Dept. of EECS, University of Michigan


Presentation Transcript


  1. Low-power Clock Trees for CPUs. Dong-Jin Lee, Myung-Chul Kim and Igor L. Markov, Dept. of EECS, University of Michigan. ICCAD 2010, Dong-Jin Lee, University of Michigan

  2. Outline: Motivation and challenges; Modeling and objectives (local skew with variation, local-skew slack, modeling process variation); Proposed methodology and techniques (initial tree construction and buffer insertion, robustness improvements, wire snaking and delay buffer insertion); Empirical validation; Summary.

  3. Motivation. Clock networks contribute a significant fraction of dynamic power and are a limiting factor in high-performance CPUs and SoCs. Challenges: interconnect performance lags while transistors continue to scale; multi-objective optimization must balance traditional clock-network synthesis constraints, the increasing impact of process variation, and power-performance-cost trade-offs.

  4. Tree vs Mesh. Objectives: minimize the skew of a high-performance clock tree and minimize the impact of PVT variations. Clock trees vs meshes, subject to skew < 7.5ps. [Figure: robustness vs power efficiency for meshes, trees and ideal clock networks]

  5. Our Contributions: the notion of local-skew slack for clock trees; a tabular technique to estimate the impact of variations; a path-based technique to enhance robustness; a time-budgeting algorithm for clock-tree tuning with minimal power resources; fine tuning of clock trees that is accurate, fast and power-efficient; an implementation, Contango2.0; strong empirical results: low skew, robustness and low power.

  6. Modeling and Objectives

  7. Local Skew. Main objective (concept): minimize local skew in the presence of variation. Definition (skew): Ψ is the clock tree and λ(si) is the clock latency (insertion delay) at sink si ∈ Ψ. Definition: the global skew is ωΨ = max over si, sj ∈ Ψ of |λ(si) − λ(sj)|.
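As a small illustration of the skew definitions (a sketch, not code from the talk): assuming latencies are given as a mapping from sink name to λ(si) in ps, the global skew ωΨ is the largest latency difference over all sink pairs, which equals max λ minus min λ. The function name, sink names and latency values are hypothetical.

```python
def global_skew(latency: dict[str, float]) -> float:
    """Global skew: max over sink pairs of |lambda(si) - lambda(sj)|.

    The largest pairwise difference equals max(latency) - min(latency),
    so no explicit pairwise loop is needed.
    """
    if not latency:
        raise ValueError("clock tree has no sinks")
    values = latency.values()
    return max(values) - min(values)


# Hypothetical insertion delays (ps) for four sinks.
lat = {"s1": 412.0, "s2": 415.5, "s3": 409.0, "s4": 414.0}
print(global_skew(lat))  # 6.5
```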

  8. Local Skew. Definition: the worst nominal local skew is ωΨΔ = max over si, sj ∈ Ψ with dist(si,sj) ≤ Δ of |λ(si) − λ(sj)|, where Δ is the local-skew distance bound and dist(si,sj) is the Manhattan distance between sinks si, sj ∈ Ψ. Definition: the worst local skew with variation is ωΨΔ,ν,y = f⁻¹(y), where ν is the variation model, y is the yield (0 < y ≤ 1), and f(t) is the cumulative distribution function of ωΨΔ,ν.
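A minimal sketch of the worst nominal local skew ωΨΔ: only sink pairs within Manhattan distance Δ are compared. The function names, coordinates (um), latencies (ps) and Δ are hypothetical, and the all-pairs loop favors clarity over speed.

```python
from itertools import combinations


def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def worst_local_skew(sinks, latency, delta):
    """sinks: name -> (x, y); latency: name -> ps; delta: distance bound.

    Returns the largest |lambda(si) - lambda(sj)| over pairs with
    dist(si, sj) <= delta (0.0 if no pair is that close).
    """
    worst = 0.0
    for a, b in combinations(sinks, 2):
        if manhattan(sinks[a], sinks[b]) <= delta:
            worst = max(worst, abs(latency[a] - latency[b]))
    return worst


sinks = {"s1": (0, 0), "s2": (300, 100), "s3": (2000, 0), "s4": (2100, 200)}
latency = {"s1": 410.0, "s2": 413.0, "s3": 402.0, "s4": 405.5}
print(worst_local_skew(sinks, latency, delta=600))  # 3.5
```

Only the s1/s2 and s3/s4 pairs fall within the bound here, so the local skew is 3.5ps even though the global skew of the same latencies is 11ps.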

  9. Modeling and Objectives: Example. The worst local skew with variation, ωΨΔ,ν,y, is read from the distribution of ωΨΔ,ν through the inverse CDF. With ΩΔ = 7.5ps and y = 95%, ωΨΔ,ν,y = 6.05ps, so ωΨΔ,ν,y < ΩΔ holds. [Figure: PDF and CDF of ωΨΔ,ν (ps), marking ωΨΔ,ν,y = 6.05ps at y = 0.95 and the limit ΩΔ]
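The example reads ωΨΔ,ν,y off the inverse CDF. Assuming the distribution of ωΨΔ,ν is represented by Monte Carlo samples (one worst-local-skew value per simulated run), an empirical version takes the smallest sample t whose empirical CDF is at least y. The function name and the 20 sample values below are hypothetical, not the data behind the slide's 6.05ps.

```python
import math


def worst_skew_at_yield(samples, y):
    """Empirical inverse CDF: smallest sample t with F_n(t) >= y, 0 < y <= 1."""
    if not 0 < y <= 1:
        raise ValueError("yield must satisfy 0 < y <= 1")
    ordered = sorted(samples)
    # F_n(ordered[k-1]) >= k/n, so we need the smallest k with k/n >= y.
    # The tiny epsilon guards against floating-point noise in y * n.
    k = max(1, math.ceil(y * len(ordered) - 1e-9))
    return ordered[k - 1]


# Hypothetical worst-local-skew samples (ps), one per Monte Carlo run.
samples = [4.1, 4.4, 4.6, 4.7, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4,
           5.5, 5.5, 5.6, 5.7, 5.8, 5.9, 6.0, 6.1, 6.3, 6.9]
OMEGA_DELTA = 7.5  # local-skew limit used on the slide (ps)

w = worst_skew_at_yield(samples, 0.95)
print(w, w < OMEGA_DELTA)  # 6.3 True
```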

  10. Optimization Objectives: build variation-tolerant clock trees such that ωΨΔ,ν,y < ΩΔ (ΩΔ: the local-skew limit), subject to slew constraints, while minimizing clock-tree power. [Figure: ωΨΔ,ν,y relative to ΩΔ, in ps]

  11. Local-skew Slack σ(s) for sink s ∈ Ψ. Definition: σ(s) is the minimum amount of additional delay for s so that the tree satisfies ωΨΔ < ΩΔ. Example (ΩΔ = 5ps)

  12. Modeling Process Variation. The impact of variation on skew(si,sj) depends on the tree-path length, the number of buffers and the buffer types between si and sj. Notation: T: technology node; B: buffer and wire library; ν: variation model. Variation-estimation table ΞT,B,ν,y[w,b,t]: the worst-case increase in skew (with probability y) between two sinks connected by a tree path of length w with b buffers of type t. [Figure: sinks A, B, C and D; w: tree-path length; b: number of buffers (2); t: buffer type]
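A sketch of how a variation-estimation table ΞT,B,ν,y[w,b,t] might be stored and queried. Everything concrete here is hypothetical: the buffer-type names, the length grid, the function name and the skew increments (ps) are placeholders, not values from the talk; in practice each entry would be characterized by Monte Carlo SPICE runs for one technology node T, library B, variation model ν and yield y. Path lengths between grid points are rounded up to the next tabulated length so the lookup stays conservative.

```python
import bisect

# Hypothetical table for one (T, B, nu, y): XI[t][b] lists the worst-case
# skew increase (ps) at each tabulated tree-path length in LENGTHS_UM.
LENGTHS_UM = [500, 1000, 2000, 4000]
XI = {
    "small": {1: [0.8, 1.3, 2.2, 3.9], 2: [1.0, 1.6, 2.6, 4.4]},
    "large": {1: [0.5, 0.8, 1.4, 2.5], 2: [0.6, 1.0, 1.7, 2.9]},
}


def xi_lookup(w, b, t):
    """Return XI[w, b, t], rounding path length w up to the next grid point."""
    i = bisect.bisect_left(LENGTHS_UM, w)
    if i == len(LENGTHS_UM):
        raise ValueError(f"path length {w} exceeds table range")
    return XI[t][b][i]


print(xi_lookup(1500, 2, "small"))  # 2.6
```

Rounding up rather than interpolating trades some pessimism for a guarantee that the lookup never underestimates relative to the tabulated points.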

  13. Modeling Process Variation. varEst(si,sj): the worst-case variational skew(si,sj). Key constraint.
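The key constraint itself is not captured in this transcript. One natural reading, assumed here rather than taken from the talk, is that for every sink pair within distance Δ the nominal skew plus varEst(si,sj) must stay below ΩΔ. The sketch checks that assumed reading; the function name, pair list, latencies and varEst values are hypothetical.

```python
def violations(pairs, latency, var_est, omega):
    """Return sink pairs whose nominal skew + varEst reaches the limit omega.

    pairs:   sink pairs already filtered to dist(si, sj) <= Delta
    latency: sink -> nominal latency (ps)
    var_est: (si, sj) -> assumed worst-case variational skew increase (ps)
    """
    bad = []
    for si, sj in pairs:
        nominal = abs(latency[si] - latency[sj])
        if nominal + var_est[(si, sj)] >= omega:
            bad.append((si, sj))
    return bad


latency = {"s1": 410.0, "s2": 413.0, "s3": 402.0, "s4": 405.5}
pairs = [("s1", "s2"), ("s3", "s4")]
var_est = {("s1", "s2"): 4.0, ("s3", "s4"): 4.5}
print(violations(pairs, latency, var_est, omega=7.5))  # [('s3', 's4')]
```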

  14. Initial Tree Construction. ZST-DME algorithm* based on Elmore delay; a simple and robust technique for obstacle avoidance**. Initial buffer insertion: t0 is the initial buffer type, chosen using the variation-estimation table with path lengths from the initial tree. Once t0 is determined, we adapt the fast variant of van Ginneken’s algorithm*** for initial buffer insertion, minimizing insertion delay with reliable slew rates. * J.-H. Huang et al., “On Bounded-Skew Routing Tree Problem,” DAC’95. ** D.-J. Lee et al., “Contango: Integrated Optimization of SoC Clock Networks,” DATE’10. *** W. Shi et al., “A Fast Algorithm for Optimal Buffer Insertion,” Trans. on CAD 24(6), 2005.
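Zero-skew DME builds the tree from zero-skew merging points under the Elmore delay model. The sketch below is the classic Elmore zero-skew merge of two subtrees over a single wire (Tsay's formula), shown only to illustrate the delay model, not Contango2's implementation. The function name and all inputs are hypothetical, in arbitrary but consistent units: t1 and t2 are subtree delays, c1 and c2 subtree load capacitances, r and c the per-unit-length wire resistance and capacitance.

```python
def zero_skew_merge(t1, c1, t2, c2, length, r, c):
    """Place a tap on a wire of `length` so Elmore delays to both subtrees match.

    Returns (x, delay, cap): the tap position as a fraction of `length`
    measured from subtree 1, the merged Elmore delay, and the merged load.
    """
    wire_res, wire_cap = r * length, c * length
    x = (t2 - t1 + wire_res * (c2 + wire_cap / 2)) / (
        wire_res * (wire_cap + c1 + c2))
    if not 0.0 <= x <= 1.0:
        # Zero skew is unreachable on this wire; elongation (snaking) is needed.
        raise ValueError("merging point falls outside the wire")
    # Elmore delay from the tap through the subtree-1 side of the wire.
    delay = t1 + x * wire_res * (x * wire_cap / 2 + c1)
    return x, delay, c1 + c2 + wire_cap


print(zero_skew_merge(10.0, 20.0, 10.0, 20.0, 100.0, 0.1, 0.2))  # (0.5, 135.0, 60.0)
```

Identical subtrees merge at the wire midpoint; when one subtree is much slower, x leaves [0, 1] and the faster side needs extra wire, which is where snaking enters.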

  15. Robustness Improvement. Improve robustness after initial buffer insertion so that ωΨΔ,ν,y < ΩΔ holds after skew optimization. The target buffer type for a tree path between sinks si and sj, t(si,sj), is defined as the smallest buffer type t that meets the robustness requirement, since choosing smaller buffers reduces capacitance.
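A sketch of choosing a target buffer type t(si,sj). The exact condition on the slide is not captured, so this assumes it means: scan buffer types from smallest (least capacitance) to largest and return the first whose tabulated variation estimate, added to the pair's nominal skew, stays below ΩΔ. The function name, type names and per-type values are hypothetical.

```python
# Hypothetical worst-case skew increase (ps) for one tree path (fixed w, b),
# listed from the smallest buffer type to the largest.
VAR_BY_TYPE = [("x1", 5.2), ("x2", 3.9), ("x4", 2.7), ("x8", 2.1)]


def target_buffer_type(nominal_skew, omega, var_by_type=VAR_BY_TYPE):
    """Smallest buffer type with nominal + variation < omega, else None."""
    for t, var in var_by_type:  # smallest (least capacitance) first
        if nominal_skew + var < omega:
            return t
    return None  # no type in the library is robust enough


print(target_buffer_type(3.0, 7.5))  # x2
```

Scanning from the smallest type upward matches the slide's rationale: any larger buffer than necessary only adds capacitance.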

  16. Local Skew Optimization: Wire Snaking. Local-skew optimization techniques use the optimal tuning amount from the slack-computation algorithms with varEst(si,sj). Improved wire-snaking algorithm in terms of speed, accuracy and routing resources. [Figure: edge e with Ttarget(e) = 11ps tuned iteratively: T1target(e) = 11ps, T1actual(e) = 7ps; T2target(e) = 4ps, T2actual(e) = 3ps; T3target(e) = 1ps, T3actual(e) = 1ps; invariant Titarget(e) ≥ Tiactual(e). Other panels compare Ttarget(e) = 11ps with Tactual(e) values of 11ps, 7ps and 10ps.]
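The figure's sequence (targets 11ps, 4ps, 1ps; realized delays 7ps, 3ps, 1ps) suggests an iterative loop: snake for the remaining target, measure what was realized, and retarget on the remainder, keeping Tiactual(e) ≤ Titarget(e) so the edge never overshoots. A minimal sketch of that loop, where `tune_edge` and the `snake` callback are hypothetical names; `snake` stands in for one round of snaking plus a SPICE measurement.

```python
def tune_edge(target, snake, tol=0.5, max_iters=10):
    """Iteratively add delay to an edge until the remainder is within tol.

    snake(t) returns the delay actually realized when aiming for t;
    it must satisfy 0 <= snake(t) <= t so the edge never overshoots.
    Returns the list of (target_i, actual_i) pairs.
    """
    history = []
    remaining = target
    for _ in range(max_iters):
        if remaining <= tol:
            break
        actual = snake(remaining)
        if not 0 <= actual <= remaining:
            raise ValueError("snaking overshot the target")
        history.append((remaining, actual))
        remaining -= actual
    return history


# Hypothetical measurements reproducing the slide's example for edge e.
measured = {11: 7, 4: 3, 1: 1}
print(tune_edge(11, measured.__getitem__))  # [(11, 7), (4, 3), (1, 1)]
```

Because each step undershoots, the loop converges from below; the cost of a conservative delay model shows up as extra iterations rather than as skew violations.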

  17. Delay Model for Wire Snaking. α: used to keep Tiactual(e) ≤ Titarget(e) efficiently; the delay model for wire snaking aims for Tiactual(e) to satisfy this inequality with the highest α possible. Look-up tables for length estimation improve the quality of the wire-snaking estimates; they are built from a set of SPICE simulations for each technology environment, which includes the technology model, the buffer and wire types, and the variation specification. We achieved α values between 60% and 70% for the ISPD 2010 CNS contest benchmarks.
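The look-up-table idea can be sketched as inverse interpolation: given a table of (snaking length, added delay) pairs characterized by SPICE for one technology environment, find the length whose predicted delay matches a goal. The table values, the function name and the `margin` safety factor are all hypothetical; `margin` is not the slide's α, just one simple way to bias estimates low so that Tiactual(e) tends to stay at or below Titarget(e).

```python
import bisect

# Hypothetical SPICE characterization: added delay (ps) vs snaking length (um).
LENGTH_UM = [0.0, 50.0, 100.0, 200.0, 400.0]
DELAY_PS = [0.0, 1.2, 2.6, 5.8, 13.0]


def length_for_delay(target_ps, margin=0.9):
    """Inverse-interpolate the table for margin * target_ps.

    margin < 1 (hypothetical safety factor) biases the estimate low so the
    realized delay tends to stay at or below the target.
    """
    goal = margin * target_ps
    if not 0.0 <= goal <= DELAY_PS[-1]:
        raise ValueError("target outside characterized range")
    i = bisect.bisect_left(DELAY_PS, goal)
    if DELAY_PS[i] == goal:
        return LENGTH_UM[i]
    # Linear interpolation between the bracketing table points.
    d0, d1 = DELAY_PS[i - 1], DELAY_PS[i]
    l0, l1 = LENGTH_UM[i - 1], LENGTH_UM[i]
    return l0 + (goal - d0) * (l1 - l0) / (d1 - d0)


print(round(length_for_delay(4.0), 2))  # 131.25
```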

  18. Optimal Node Selection for Wire Snaking. Wire snaking at buffer outputs is more accurate than at other nodes, and limiting wire snaking to buffer outputs reduces the number of SPICE calls. Example

  19. Delay Buffer Insertion. Highly unbalanced sink capacitances or layout obstacles may result in significant local skew. Delay buffer insertion: skew can be reduced by the delay of the inserted buffer, and more precise wire snaking becomes possible because the inserted buffer isolates the target node. Example

  20. ISPD’10 Clock Network Synthesis Contest. 45nm, 2GHz CPU benchmarks from IBM and Intel. Evaluation: Monte-Carlo SPICE simulations with PVT variations; skew and slew constraints of 7.5ps and 100ps; objective: total capacitance (a proxy for dynamic power). A rare opportunity to compare multiple strategies for clock-network synthesis.
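Total capacitance works as a proxy because switching power scales linearly with switched capacitance: P = a·C·V²·f, where a clock net has activity a = 1 (one full charge and discharge per cycle). A back-of-the-envelope sketch at the contest's 2GHz, with a hypothetical 1.0V supply and a hypothetical 100pF of total clock capacitance; the function name is also illustrative.

```python
def clock_dynamic_power(cap_farads, vdd_volts, freq_hz, activity=1.0):
    """Switching power P = activity * C * V^2 * f (clock nets: activity = 1)."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz


p = clock_dynamic_power(100e-12, 1.0, 2e9)
print(f"{p:.3f} W")  # 0.200 W
```

At fixed voltage and frequency, cutting capacitance by 4× cuts this term by 4×, which is why the contest could rank solutions by capacitance alone.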

  21. Example of Our Clock Tree: ispd10cns07

  22. Empirical Validation on the ISPD 2010 benchmarks: 2.6ps nominal local skew; capacitance smaller than CNSrouter and NTUclock by 4.22× and 4.13×, respectively. Our clock trees achieve yield above 95%, while CNSrouter violates yield constraints on 3 benchmarks and NTUclock on 7.

  23. ICCAD 2010 Proceedings. All local-skew constraints are satisfied; capacitance is smaller than NTU and CUHK by 2.09× and 4.24×, respectively; more robust with smaller capacitance.

  24. Skew Profiles for Contango2 & CNSrouter: probability density functions (PDF) for skew on ISPD’10 benchmarks

  25. Trade-off: Power vs Robustness to Variations. When local skew constraints are tight, large buffers ensure robustness but increase capacitance; much capacitance can be saved when local skew constraints are loose. Experiments on ispd10cns08.

  26. Summary. A tree solution for CPU clock routing that reduces power consumption under tight skew constraints in the presence of variation. Clock trees can be tuned to have nominal skew below 5ps and low total skew in the presence of variation. 4× capacitance improvement on average over mesh structures. Our clock trees have a higher yield than meshes, and meshes are not as easy to tune for nominal skew.

  27. Questions and Answers. Thank you! Questions?

  28. Questions and Answers
