Vojin G. Oklobdzija University of California Davis ece.ucdavis/acsel

Clocked Storage Elements: Master-Slave Latches and Flip-Flops for High-Performance and Low-Power Systems. Vojin G. Oklobdzija, University of California Davis, http://www.ece.ucdavis.edu/acsel; Integration Corp., Berkeley, CA 94708, http://www.integration-corp.com.


Presentation Transcript


  1. Clocked Storage Elements: Master-Slave Latches and Flip-Flops for High-Performance and Low-Power Systems Vojin G. Oklobdzija University of California Davis http://www.ece.ucdavis.edu/acsel Integration Corp. Berkeley, CA 94708 http://www.integration-corp.com

  2. Outline • Recent interest and importance • Timing and Power metrics • Master-Slave vs. Flip-Flop • Design and optimization tradeoffs • Representative designs • Comparison • Some novel designs • Conclusion Prof. V.G. Oklobdzija, University of California

  3. Prof. V.G. Oklobdzija, University of California

  4. Recent Interest in Storage Elements • Trends in high-performance systems: higher clock frequency

  5. Performance: 3X / generation. Source: ISSCC, uP Report, Hot-Chips

  6. Total transistors: 3X / generation. Logic transistors: 2X / generation. Source: ISSCC, uP Report, Hot-Chips

  7. Processor Design Challenges • Performance is tracking frequency increase • Where are the transistors contributing? • 3X per generation growth in transistors seems to be uncompensated as far as performance is concerned

  8. Power versus Year • High-end growing at 25% / year • RISC at 12% / year • X86 at 15% / year • Consumer (low-end) at 13% / year

  9. Gloom and Doom predictions. Source: Shekhar Borkar, Intel

  10. Recent Interest in Storage Elements, or Why Do Computer Architects Care? • Trends in high-performance systems • Higher clock frequency (1.5GHz Pentium, 4GHz presented) • More transistors on chip (214 million, ISSCC 2001) • Consequences • Increased Flip-Flop overhead relative to cycle time • Pipeline depth of 20 or more • Cycle time 10 - 20 FO4 delays, Flop overhead 3 - 4 FO4
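
The FO4 figures on this slide translate directly into the share of each cycle spent in the storage elements. The sketch below only divides the quoted numbers; the function name is illustrative and not from the presentation.

```python
def flop_overhead_fraction(cycle_fo4, flop_fo4):
    """Fraction of the clock cycle consumed by flip-flop overhead (both in FO4 delays)."""
    return flop_fo4 / cycle_fo4

# Endpoints of the ranges quoted on the slide: cycle 10-20 FO4, flop 3-4 FO4.
best_case = flop_overhead_fraction(cycle_fo4=20, flop_fo4=3)   # long cycle, fast flop
worst_case = flop_overhead_fraction(cycle_fo4=10, flop_fo4=4)  # short cycle, slow flop

print(f"Flop overhead: {best_case:.0%} to {worst_case:.0%} of the cycle")
```

With the slide's ranges, the overhead spans roughly 15% to 40% of the cycle time.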

  11. Processor Frequency Trend. Source: Intel, S. Borkar • Frequency doubles each generation • Number of gates per clock is reduced by 25%

  12. Traditional Pentium 3 uArchitecture. [Figure: three pipeline stages, each a logic block (delay 0.6 ns) followed by a register (delay 0.3 ns).] The total delay from pipeline stage to pipeline stage is 0.9 ns. The maximum clock rate for this design is 1.1 GHz.

  13. The Pentium 4 Depends on Pipelines. [Figure: six pipeline stages, each a logic block (delay 0.3 ns) followed by a register (delay 0.3 ns).] The total delay from pipeline stage to pipeline stage is 0.6 ns. This design, with twice the stages, only has a maximum clock rate of 1.67 GHz. As the design is broken into more pipeline stages, the logic in each stage has less delay, and the registers between stages consume a higher percentage of the delay, causing diminishing returns. At some point the costs that come with more stages, such as branch prediction, leave only a very marginal return. The only way out of this bottleneck is a faster register. This is one reason why the P4 is not significantly faster than a slower-clocked P3 for many applications.

  14. Recent Interest in Storage Elements • Difficult to control both edges of the clock • Higher impact of clock skew • Higher cross-talk and substrate coupling • Higher power consumption • Limits on performance • Clock burns up to 40%, storage elements up to 20% of total power • I have even seen 75% recently (ISSCC 2001)

  15. Solution: Faster Flip-Flops. We have developed a new fast register which can be fabricated using the standard microprocessor fabrication lines, several times faster than registers currently used. [Figure: six pipeline stages, each a logic block (delay 0.3 ns) followed by a register (delay 0.04 ns).] The total delay from pipeline stage to pipeline stage is 0.34 ns. Using our design allows a maximum nominal clock rate of 2.9 GHz. Can you achieve this performance gain with architecture?
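
The clock rates quoted on slides 12, 13 and 15 all follow from one relation: the maximum clock rate is the reciprocal of the logic delay plus the register delay of one stage. The sketch below reproduces those three numbers; the function names and dictionary labels are illustrative only.

```python
def max_clock_ghz(logic_ns, register_ns):
    """Maximum clock rate in GHz for a stage with the given delays in ns."""
    return 1.0 / (logic_ns + register_ns)

def register_share(logic_ns, register_ns):
    """Fraction of the stage delay spent in the register."""
    return register_ns / (logic_ns + register_ns)

# (logic delay, register delay) per stage in ns, as given on the slides.
designs = {
    "slide 12: 3 stages": (0.6, 0.3),
    "slide 13: 6 stages": (0.3, 0.3),
    "slide 15: 6 stages, fast register": (0.3, 0.04),
}

for name, (logic_ns, register_ns) in designs.items():
    f = max_clock_ghz(logic_ns, register_ns)
    share = register_share(logic_ns, register_ns)
    print(f"{name}: {f:.2f} GHz, register share {share:.0%}")
```

Doubling the stage count raises the clock rate by only 1.5X because the register share of each stage grows from one third to one half; shrinking the register delay attacks that share directly.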

  16. Clocked Storage Element Requirements • High speed • High-frequency applications require low FF timing overhead • Sub-nanosecond clock periods imply x10ps - x100ps FF delays • Low power • Dissipation of >100W for recent processors • Battery-supplied applications • Size • High robustness to clock imperfections • Logic embedding property

  17. Clock Signals • Clocks are defined as pulsed, synchronizing signals that provide the time reference for the movement of data in the synchronous digital system. • The clocking in a digital system can be either single-phase or multi-phase (usually two-phase). • Clocking strategy depends on, and is largely influenced by, the choice of the storage element: latch or flip-flop

  18. Clock Signal Uncertainty • Effects on cycle time: – maximum delay restriction – violation of set-up time • May cause race: – minimum delay restriction – violation of hold time • Uncertainty is: Jitter, Skew, and Duty Cycle
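
The two restrictions on this slide can be written as slack checks. The sketch below assumes the usual edge-triggered form of the constraints (the slide itself does not spell them out), lumps jitter and skew into a single worst-case `uncertainty` term, and uses hypothetical timing values in picoseconds.

```python
from dataclasses import dataclass

@dataclass
class StorageElement:
    # All times in picoseconds; the values used below are hypothetical.
    t_clk_q_min: int
    t_clk_q_max: int
    t_setup: int
    t_hold: int

def setup_slack(period, ff, logic_max, uncertainty):
    """Maximum delay restriction: negative slack means a set-up violation."""
    return period - (ff.t_clk_q_max + logic_max + ff.t_setup + uncertainty)

def hold_slack(ff, logic_min, uncertainty):
    """Minimum delay restriction: negative slack means a hold violation (race)."""
    return ff.t_clk_q_min + logic_min - (ff.t_hold + uncertainty)

ff = StorageElement(t_clk_q_min=100, t_clk_q_max=200, t_setup=50, t_hold=50)

for uncertainty in (50, 150):
    s = setup_slack(period=1000, ff=ff, logic_max=600, uncertainty=uncertainty)
    h = hold_slack(ff=ff, logic_min=20, uncertainty=uncertainty)
    print(f"uncertainty {uncertainty} ps: setup slack {s} ps, hold slack {h} ps")
```

In this example, raising the clock uncertainty from 50 ps to 150 ps uses up all of the set-up slack and turns the hold slack negative. A set-up problem can be fixed by slowing the clock, while a hold problem cannot, since the clock period does not appear in the hold check.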

  19. Jitter • Uncertainty in consecutive edges of a periodic signal • Caused by temporal noise events • Quantified as: – cycle-to-cycle or short-term jitter, tJS – long-term jitter, tJL
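
The slide names the two jitter quantities without saying how they are computed. One common way to estimate them from measured rising-edge times is sketched below, assuming peak (worst-case) values rather than RMS. The edge timestamps and the nominal period are hypothetical.

```python
def periods(edges):
    """Clock periods between consecutive edges."""
    return [b - a for a, b in zip(edges, edges[1:])]

def cycle_to_cycle_jitter(edges):
    """Peak difference between adjacent periods (short-term jitter)."""
    p = periods(edges)
    return max(abs(b - a) for a, b in zip(p, p[1:]))

def long_term_jitter(edges, nominal_period):
    """Peak deviation of any edge from its ideal position (long-term jitter)."""
    start = edges[0]
    return max(abs(t - (start + i * nominal_period)) for i, t in enumerate(edges))

# Hypothetical rising-edge times in ps for a nominal 1000 ps clock.
edges_ps = [0, 1000, 2010, 3005, 4020, 5030]

print("cycle-to-cycle jitter:", cycle_to_cycle_jitter(edges_ps), "ps")
print("long-term jitter:", long_term_jitter(edges_ps, 1000), "ps")
```

The long-term value can exceed the cycle-to-cycle value because small period errors accumulate over many cycles.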

  20. Clock Skew • Time difference between temporally-equivalent or concurrent edges of two periodic signals • Caused by spatial noise events

  21. Clocking Strategies • Single-phase clocking and single latch machine • Edge-triggered clocking and Flip-Flop based machine

  22. Clocking Strategies • Two-phase clocking and two-phase latch machine with single latch • Two-phase clocking and two-phase latch machine with double latch

  23. Delay Restrictions • Clock defines hard boundaries for edge-triggered design • Clock boundaries are soft for level-sensitive clocking and they are: • Tolerant of clock edge uncertainty • Tolerant of uncertainty in data arrival • Timing slack can voluntarily be passed forward • Time can forcefully be borrowed *Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

  24. Single-Phase Clocking, Single Latch: Timing Constraints

  25. Two-Phase Clocking with Two-Phase Double Latch

  26. Two-Phase Clocking with One-Phase Double Latch. Some people refer to this clocking arrangement as a “negative edge Flip-Flop”, erroneously!

  27. Difference between Latch and Flip-Flop • Flip-Flop: after the transition of the clock, data cannot change • Latch: is “transparent”

  28. Flip-Flop and M-S Latch Combination. How can one recognize the difference without knowing what is inside the “black-box”?

  29. F-F and M-S Latch: Difference. Experiment:

  30. F-F and M-S Latch: Difference. Structural Difference: [Figure labels: No Clock; Flip-Flop; M-S Latch]

  31. Flip-Flop vs. Latch. Flip-Flop: • Edge sensitive • Easier to use as frequency increases • Robustness on duty cycle • Simpler logic timing requirements • Fits into CAD tools. Latch: • Level sensitive • Consumes less power for the operation • Better clock skew/jitter characteristics

  32. Flip-Flop: Example HLFF (Partovi)

  33. Flip-Flop: Example HLFF (Partovi)

  34. Pulse-Based Flip-Flops* *Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

  35. Flip-Flop: Example SAFF, DEC Alpha 21264. [Figure labels: D=0; pulse; D=1]

  36. Requirements in the Flip-Flop Design • Small Clk-Output delay, narrow sampling window • Low power • Small clock load • High driving capability (increased levels of parallelism) • Typical flip-flop load in a 0.18 µm CMOS ranges from 50fF to over 200fF, with typical values of 100-150fF in critical paths • (rule-of-thumb number for gate capacitance: Cgate = 2fF/µm) • Integration of logic into the flop • Multiplexed or clock scan • Crosstalk insensitivity - dynamic/high impedance nodes are affected
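
The rule-of-thumb gate capacitance on this slide lets the quoted loads be restated as the total transistor gate width a flip-flop output has to drive. The sketch below is plain arithmetic on the slide's numbers; the function name is illustrative.

```python
C_GATE_FF_PER_UM = 2.0  # rule of thumb from the slide: 2 fF per um of gate width

def equivalent_gate_width_um(load_ff):
    """Total gate width in um that presents the given load capacitance in fF."""
    return load_ff / C_GATE_FF_PER_UM

# Loads quoted on the slide for a 0.18 um process.
for load_ff in (50, 100, 150, 200):
    width = equivalent_gate_width_um(load_ff)
    print(f"{load_ff} fF load ~ {width:.0f} um of gate width")
```

A typical critical-path load of 100-150fF therefore corresponds to roughly 50-75 µm of driven gate width.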

  37. State Element Characterization. Timing: • Propagation time (clock-to-output) • Set-up time • Hold time • Skew amortization. Power consumption: • Internal power • Input power

  38. Flip-Flop Delay • Sum of setup time and Clk-output delay is the only true measure of the performance with respect to the system speed • T = T_Clk-Q + T_Logic + T_setup + T_skew

  39. Delay vs. Setup/Hold Times

  40. Timing Characteristics

  41. Timing parameters, details. The best point to pick on the delay curve is minimum D-Q
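
Picking the minimum D-Q point can be illustrated with a sampled characterization curve. The sketch assumes D-Q delay is the data-to-clock setup time plus the Clk-Q delay measured at that setup time; the curve values are hypothetical and only mimic the typical shape, in which Clk-Q rises as data arrives closer to the clock edge.

```python
def min_d_q(points):
    """Return (t_dc, t_clk_q, t_d_q) for the sample with the smallest D-Q delay.

    points: iterable of (t_dc, t_clk_q) pairs, where t_dc is the data-to-clock
    time and t_clk_q is the clock-to-output delay measured at that t_dc.
    """
    return min(((dc, cq, dc + cq) for dc, cq in points), key=lambda p: p[2])

# Hypothetical characterization data in ps: (data-to-clock time, Clk-Q delay).
curve_ps = [(200, 100), (150, 100), (100, 102), (60, 110),
            (40, 125), (20, 160), (10, 220)]

t_dc, t_clk_q, t_d_q = min_d_q(curve_ps)
print(f"optimum: t_DC = {t_dc} ps, Clk-Q = {t_clk_q} ps, D-Q = {t_d_q} ps")
```

In this example the optimum sits at a setup time where Clk-Q has already degraded from 100 ps to 125 ps: moving data closer to the clock keeps paying off until Clk-Q starts growing faster than the setup time shrinks.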

  42. Latch and Flip-Flop latencies (tDQ) vs. Data-to-clock Set-up Time (tDC) *Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

  43. Clock Skew Considerations • Need for characterization of Flip-Flop behavior in the presence of skew/jitter • Soft Clock Edge property only qualitatively describes skew immunity • Still, designers calculate maximum useful time by incorporating all skew into clocking overhead

  44. Clock Skew Considerations • Skew Overhead of Ideal Flip-Flop • Real Skew Overhead

  45. Clock Skew Considerations

  46. Clock Skew Considerations • Skew Rejection - ratio of total skew and its impact on FF overhead • Shows how the circuit reacts to clock edge uncertainty • Helps answer the question of how far to optimize the clock distribution network

  47. Simulation Condition and Testbench • Power • Data activity dependence as an FF characteristic • Consumption with 50% activity adopted as a figure of merit • Dissipation of driving inverters is part of total power consumption

  48. Simulation Condition and Testbench • Timing • Total FF overhead is setup + clock-to-output time • Circuit optimization towards td-q • Clock skew robustness obtained from observing DQ curve • Power-Delay Product • Overall performance parameter at fixed frequency
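
The power-delay product named on this slide can be computed as power multiplied by the total FF overhead (the minimum D-Q delay). The sketch below ranks a few designs by that product; the design names and their power and delay values are hypothetical placeholders, not results from the presentation.

```python
def power_delay_product_fj(power_uw, delay_ps):
    """Power-delay product in femtojoules.

    1 uW * 1 ps = 1e-6 W * 1e-12 s = 1e-18 J = 1e-3 fJ.
    """
    return power_uw * delay_ps * 1e-3

# Hypothetical designs: (power at 50% data activity in uW, minimum D-Q in ps).
designs = {
    "design_a": (100.0, 200.0),
    "design_b": (150.0, 120.0),
    "design_c": (80.0, 300.0),
}

ranked = sorted(designs, key=lambda name: power_delay_product_fj(*designs[name]))
for name in ranked:
    power_uw, delay_ps = designs[name]
    pdp = power_delay_product_fj(power_uw, delay_ps)
    print(f"{name}: {pdp:.1f} fJ")
```

Because power scales with clock frequency, such a comparison is only meaningful when every design is measured at the same frequency, which is why the slide specifies a fixed frequency.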

  49. Flip-Flop Performance Comparison: Test Bench • Total power consumed: internal power, data power, clock power • Measured for four cases: no activity (0000… and 1111…), maximum activity (0101010…), average activity (random sequence) • Delay is minimum D-Q: Clk-Q + setup time

  50. OLD TEST BENCH: • Total Power = Drivers Power + Test Unit Power • PDP-Optimized = Equal Trade-off on Power and Delay • Improper Load on Drivers. NEW TEST BENCH: • Drivers: Fixed Gain and Driving Test Unit Only • Data-to-Output Delay • PD2P Optimized = Best for Constant-Field Scaling
