Clocked Storage Elements for High-Performance and Low-Power Systems ICCD 2001 Tutorial

Vojin G. Oklobdzija, University of California Davis (http://www.ece.ucdavis.edu/acsel); Integration Corp., Berkeley, CA 94708 (http://www.integration-corp.com).

Presentation Transcript


  1. Clocked Storage Elements for High-Performance and Low-Power Systems, ICCD 2001 Tutorial • Vojin G. Oklobdzija, University of California Davis, http://www.ece.ucdavis.edu/acsel • Integration Corp., Berkeley, CA 94708, http://www.integration-corp.com

  2. Outline • Importance of Clocked Storage Elements (CSE) • Basic Definitions • Difference between Latch and Flip-Flop • Timing and Power metrics • Representative designs used in High-Performance Microprocessors • Comparison • Conclusion, New Directions and Some novel designs Prof. V.G. Oklobdzija, University of California

  3. Importance of Clocked Storage Elements (CSE)

  4. Trends in high-performance systems: Higher clock frequency

  5. Power vs. Year • High-end growing at 25% / year • RISC at 12% / year • x86 at 15% / year • Consumer (low-end) at 13% / year

  6. Predictions (Source: Shekhar Borkar, Intel)

  7. Recent Interest in Clocked Storage Elements • Trends in high-performance systems • Higher clock frequency: 1.8 GHz Pentium 4 (4 GHz logic presented) • More transistors on chip (214 million, ISSCC 2001) • Consequences • Increased Flip-Flop overhead relative to cycle time • Pipeline depth of 20 or more • Cycle time 10 - 20 FO4 delays, F-F overhead 3 - 4 FO4

  8. Courtesy: Doug Carmean, Hot-Chips-13 presentation

  9. Processor Frequency Trend (Source: S. Borkar, Intel) • Frequency doubles each generation • Number of gates per clock is reduced by 25%

  10. Pentium 3 uArchitecture • Three pipeline stages, each consisting of logic followed by a register • Delay per stage: 0.6 ns logic + 0.3 ns register • The total delay from pipeline stage to pipeline stage is 0.9 ns. The maximum clock rate for this design is 1.1 GHz.

  11. The Pentium 4 Depends on Pipelines • Six pipeline stages, each consisting of logic followed by a register • Delay per stage: 0.4 ns logic + 0.16 ns register • The total delay from pipeline stage to pipeline stage is 560 ps. This design, with twice the stages, has a maximum clock rate of 1.8 GHz. • As the design is broken into more pipeline stages, the logic in each stage has less delay and the registers between stages consume a higher percentage of the cycle, causing diminishing returns. At some point the costs that come with additional stages, such as branch prediction, leave only a very marginal return. The only way out of this bottleneck is a faster register. This is one reason why the P4 is not significantly faster than a slower-clocked P3 for many applications.
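The arithmetic behind the two pipeline slides can be checked with a short sketch. It assumes the per-stage delays on the slides are in nanoseconds, which is consistent with the stated totals of 0.9 ns and 560 ps. The third design, a six-stage pipeline that keeps the older 0.3 ns register, is hypothetical and is included only to illustrate the diminishing-returns argument.

```python
def max_clock_ghz(logic_ns, register_ns):
    """Maximum clock rate in GHz for one pipeline stage (delays in ns)."""
    return 1.0 / (logic_ns + register_ns)


def register_share(logic_ns, register_ns):
    """Fraction of the stage delay spent in the register."""
    return register_ns / (logic_ns + register_ns)


# (logic delay, register delay) in ns; the first two pairs are from the slides,
# the third is a hypothetical case added for comparison.
designs = {
    "3-stage (0.6 + 0.3)": (0.6, 0.3),
    "6-stage (0.4 + 0.16)": (0.4, 0.16),
    "6-stage, old register (hypothetical)": (0.4, 0.3),
}

for name, (logic_ns, register_ns) in designs.items():
    print(f"{name}: {max_clock_ghz(logic_ns, register_ns):.2f} GHz, "
          f"register share {register_share(logic_ns, register_ns):.0%}")
```

With the slide numbers this gives roughly 1.11 GHz and 1.79 GHz. Keeping the slower register in the deeper pipeline would cap the clock near 1.43 GHz, with the register taking about 43% of the stage delay.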

  12. Courtesy: Doug Carmean, Hot-Chips-13 presentation

  13. Why Interest in Clocked Storage Elements? • Higher impact of storage element delay • High speed requires low CSE pipeline overhead: 3 FO4 or less • Logic embedding property • Limits on performance • FF delays of 10 ps - 100 ps • Higher impact of clock skew • Ability to control both edges of the clock • Higher power consumption • >100 W for recent processors • Clock system burns up to 40%, storage elements up to 20% of total power • Battery-powered applications

  14. Basic Definitions

  15. Clock Signals • Clocks are defined as pulsed, synchronizing signals that provide the time reference for the movement of data in a synchronous digital system. • The clocking in a digital system can be either single-phase or multi-phase (usually two-phase). • The clocking strategy depends on, and is largely influenced by, the choice of CSE: latch or flip-flop

  16. Clock Signal Uncertainty • Effects on cycle time: – maximum delay restriction – violation of setup time • May cause a race: – minimum delay restriction – violation of hold time • Uncertainty consists of: jitter, skew, and duty cycle
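The minimum delay restriction can be sketched as a hold-time check. This uses the common textbook formulation for edge-triggered storage elements (fastest Clk-Q plus fastest logic must exceed hold time plus skew); it is an assumption here, not something taken from the slides, and all delay values are hypothetical.

```python
def hold_slack(t_clk_q_min, t_logic_min, t_hold, t_skew):
    """Slack on the fastest path, in the same time unit as the inputs.

    Positive slack: new data arrives after the hold window has closed.
    Negative slack: a race, because new data can overwrite the old value
    before the receiving element has safely captured it.
    """
    return (t_clk_q_min + t_logic_min) - (t_hold + t_skew)


# Hypothetical delays in picoseconds.
safe = hold_slack(t_clk_q_min=60, t_logic_min=20, t_hold=30, t_skew=20)
race = hold_slack(t_clk_q_min=60, t_logic_min=0, t_hold=30, t_skew=40)

print("safe path slack:", safe)
print("racing path slack:", race)
```

Note that clock skew enters on the demanding side of the inequality, so more clock uncertainty makes a race more likely regardless of the cycle time.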

  17. Jitter • Uncertainty in consecutive edges of a periodic signal • Caused by temporal noise events • Quantified as: – cycle-to-cycle or short-term jitter, tJS – long-term jitter, tJL

  18. Clock Skew • Time difference between temporally-equivalent or concurrent edges of two periodic signals • Caused by spatial noise events

  19. Clocking Strategies • Single-phase clocking and single latch machine • Edge-triggered clocking and Flip-Flop based machine

  20. Clocking Strategies • Two-phase clocking and two-phase latch machine with single latch • Two-phase clocking and two-phase latch machine with double latch

  21. Delay Restrictions* • Clock defines hard boundaries for edge-triggered design • Clock boundaries are soft for level-sensitive clocking and they are: • Tolerant of clock edge uncertainty • Tolerant of uncertainty in data arrival • Timing slack can voluntarily be passed forward • Time can forcefully be borrowed *Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

  22. Single-Phase Clocking, Single Latch: Timing Constraints

  23. Two-Phase Clocking with Two-Phase Double Latch

  24. Two-Phase Clocking with One-Phase Double Latch • Some people erroneously refer to this clocking arrangement as a “negative edge Flip-Flop”

  25. Difference between Latch and Flip-Flop

  26. Difference between Latch and Flip-Flop • Flip-Flop: after the transition of the clock, data cannot change • Latch is “transparent”

  27. Flip-Flop and M-S Latch Arrangement • How can one recognize the difference without knowing what is inside the “black-box”?

  28. F-F and M-S Latch: Difference (Experiment)

  29. F-F and M-S Latch: Difference • Structural difference (figure labels: No Clock, Flip-Flop, M-S Latch)

  30. Flip-Flop vs. Latch • Flip-Flop: edge sensitive; easier to use as frequency increases; robustness to duty cycle; simpler logic timing requirements; fits into CAD tools • Latch: level sensitive; may consume less power for the operation; better clock skew/jitter characteristics; more difficult clock requirements

  31. Flip-Flop: Example HLFF (Partovi)

  32. Flip-Flop: Example HLFF (Partovi)

  33. Pulse-Based Flip-Flops* *Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

  34. Flip-Flop: Example SAFF, DEC Alpha 21264 (figure labels: D=0, pulse, D=1)

  35. Requirements in the Flip-Flop Design • Small Clk-Output delay, narrow sampling window • Low power • Small clock load • High driving capability (increased levels of parallelism) • Typical load ranges from 3-4 FO4 to 15-25 FO4 • High drive should be achieved by inserting inverters and following “logical effort” rules, starting with a minimal-size CSE • Symmetry: balanced D-Q and D-Q̅ (complementary output) delay • Integration of logic into the flop • Multiplexed or clock scan • Cross-talk insensitivity: dynamic/high-impedance nodes are affected

  36. Timing and Power metrics

  37. Delay • The sum of setup time U and Clk-Q delay is the only true measure of performance with respect to system speed • T = T_Clk-Q + T_Logic + T_Setup + T_Skew
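As a worked illustration of the cycle-time relation on this slide, the sketch below computes how much of a cycle is left for logic once Clk-Q delay, setup time, and skew are subtracted. The component values (in FO4 delays) are hypothetical; only the 10 to 20 FO4 cycle-time range comes from the earlier slides.

```python
def min_cycle_time(t_clk_q, t_logic, t_setup, t_skew):
    """T = T_Clk-Q + T_Logic + T_Setup + T_Skew (all in the same unit)."""
    return t_clk_q + t_logic + t_setup + t_skew


def logic_budget(cycle_time, t_clk_q, t_setup, t_skew):
    """Time left for logic after storage-element and skew overhead."""
    return cycle_time - (t_clk_q + t_setup + t_skew)


# Hypothetical overhead in FO4 delays: Clk-Q = 2, setup = 1, skew = 1.
for cycle_fo4 in (20, 10):
    budget = logic_budget(cycle_fo4, t_clk_q=2, t_setup=1, t_skew=1)
    print(f"cycle {cycle_fo4} FO4: {budget} FO4 for logic "
          f"({budget / cycle_fo4:.0%} of the cycle)")
```

The same fixed overhead costs 20% of a 20 FO4 cycle but 40% of a 10 FO4 cycle, which is why the storage-element delay matters more as cycle times shrink.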

  38. Delay vs. Setup/Hold Times

  39. Timing Characteristics

  40. Timing parameters, details • The best point to pick on the delay curve is minimum D-Q

  41. Simulation Condition and Testbench • Power • Data activity dependence as a FF characteristic • Consumption with 50% (30%) activity adopted as a figure of merit • Dissipation of driving inverters is part of total power consumption

  42. Simulation Condition and Testbench • Timing • Total FF overhead is setup + clock-to-output time • Circuit optimization towards minimum D-Q delay (tD-Q) • Clock skew robustness obtained from observing the D-Q curve • Power-Delay Product • Overall performance parameter at fixed frequency

  43. Flip-Flop Performance Comparison Test bench • Total power consumed • internal power • data power • clock power • Measured for four cases • no activity (0000… and 1111…) • maximum activity (0101010…) • average activity (random sequence) • Delay is minimum D-Q: Clk-Q + setup time
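The four data patterns listed for the test bench can be sketched as bit sequences, together with a simple measure of data activity. The `activity` helper, the sequence length, and the random seed are illustrative choices, not part of the original test bench.

```python
import random


def activity(bits):
    """Fraction of consecutive bit pairs that differ (0.0 to 1.0)."""
    if len(bits) < 2:
        return 0.0
    transitions = sum(a != b for a, b in zip(bits, bits[1:]))
    return transitions / (len(bits) - 1)


n = 1000                    # sequence length, chosen arbitrarily
rng = random.Random(0)      # fixed seed so the run is repeatable

patterns = {
    "no activity (0000...)": [0] * n,
    "no activity (1111...)": [1] * n,
    "maximum activity (0101...)": [i % 2 for i in range(n)],
    "average activity (random)": [rng.randint(0, 1) for _ in range(n)],
}

for name, bits in patterns.items():
    print(f"{name}: activity = {activity(bits):.2f}")
```

A uniformly random sequence toggles on about half of the cycles, which lines up with the 50% activity figure of merit mentioned on the earlier test bench slide.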

  44. The sources of internal power consumption

  45. Design & optimization tradeoffs • Opposite Goals • Minimal Total power consumption • Minimal Delay • Power-Delay tradeoff • Minimize Power-Delay product (PDPtot)
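A minimal sketch of ranking storage elements by power-delay product follows. The three candidates and their power and delay numbers are hypothetical and do not correspond to any design in the tutorial. Delay is taken as minimum D-Q (Clk-Q plus setup), as defined on the test bench slide.

```python
def pdp_fj(power_uw, delay_ps):
    """Power-delay product in femtojoules.

    1 uW * 1 ps = 1e-18 J, which is 1/1000 of a femtojoule.
    """
    return power_uw * delay_ps / 1000


# Hypothetical candidates: name -> (total power in uW, minimum D-Q delay in ps)
candidates = {
    "A": (200, 250),
    "B": (150, 400),   # lowest power
    "C": (300, 180),   # fastest
}

for name, (power_uw, delay_ps) in candidates.items():
    print(f"{name}: PDP = {pdp_fj(power_uw, delay_ps):.1f} fJ")

best = min(candidates, key=lambda name: pdp_fj(*candidates[name]))
print("lowest PDP:", best)
```

In this made-up data set neither the lowest-power nor the fastest candidate has the lowest PDP, which is the point of optimizing the product rather than either goal alone.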

  46. Clocked Storage Elements in High-Performance Microprocessors

  47. Master-Slave Latches • Positive setup times • Two clock phases: • distributed globally • generated locally • Small penalty in delay for incorporating MUX • Some circuit tricks needed to reduce the overall delay

  48. PowerPC 603 M-S Latch Combination (Gerosa, JSSC 12/94) • Used in PowerPC family • Low-power • High speed • Big clock load • Easily embedded scan function • Our simulations show: small internal power consumption; low-power feedback; double the clock load compared with other latches; locally generated second phase (reduces overall clock load)

  49. mC2MOS M-S Latch • Our simulations show: small clock load (local clock buffering); low-power feedback; big positive setup time; robustness to clock slope, unlike classic C2MOS structure • Reference: Y. Suzuki, “Clocked CMOS Calculator Circuitry”, IEEE J. Solid-State Circuits, Dec. 1973

  50. Advanced Flip-Flops
