Clocked Storage Elements: Master-Slave Latches and Flip-Flops for High-Performance and Low-Power Systems
Vojin G. Oklobdzija
University of California, Davis, http://www.ece.ucdavis.edu/acsel
Integration Corp., Berkeley, CA 94708, http://www.integration-corp.com
Outline
• Recent interest and importance
• Timing and Power metrics
• Master-Slave vs. Flip-Flop
• Design and optimization tradeoffs
• Representative designs
• Comparison
• Some novel designs
• Conclusion
Prof. V.G. Oklobdzija, University of California
Recent Interest in Storage Elements
• Trends in high-performance systems: higher clock frequency
Performance: 3X / generation (Source: ISSCC, uP Report, Hot-Chips)
Total transistors: 3X / generation; logic transistors: 2X / generation (Source: ISSCC, uP Report, Hot-Chips)
Processor Design Challenges
• Performance is tracking the frequency increase
• Where are the transistors contributing?
• The 3X per generation growth in transistors seems to be uncompensated as far as performance is concerned
Power versus Year
• High-end growing at 25% / year
• RISC at 12% / year
• x86 at 15% / year
• Consumer (low-end) at 13% / year
Gloom and Doom predictions (Source: Shekhar Borkar, Intel)
Recent Interest in Storage Elements, or Why Do Computer Architects Care?
• Trends in high-performance systems
  – Higher clock frequency (1.5 GHz Pentium, 4 GHz presented)
  – More transistors on chip (214 million, ISSCC 2001)
• Consequences
  – Increased flip-flop overhead relative to cycle time
  – Pipeline depth of 20 or more
  – Cycle time 10-20 FO4 delays, flop overhead 3-4 FO4
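
The FO4 figures on this slide can be turned into a share of the cycle spent in the storage element. The sketch below only divides the quoted numbers; the function name is illustrative and not from the presentation.

```python
def flop_overhead_fraction(cycle_fo4: float, flop_fo4: float) -> float:
    """Fraction of the clock cycle consumed by the storage element."""
    return flop_fo4 / cycle_fo4

# Corner cases from the slide: 10-20 FO4 cycle, 3-4 FO4 flop overhead.
best = flop_overhead_fraction(cycle_fo4=20, flop_fo4=3)
worst = flop_overhead_fraction(cycle_fo4=10, flop_fo4=4)
print(f"flop overhead: {best:.0%} to {worst:.0%} of the cycle")
```

With the quoted ranges, the flop takes somewhere between roughly one seventh and two fifths of the cycle.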
Processor Frequency Trend (Source: S. Borkar, Intel)
• Frequency doubles each generation
• Number of gates per clock is reduced by 25%
Traditional Pentium 3 uArchitecture
• Three stages, each: logic (0.6 ns) followed by a register (0.3 ns)
• The total delay from pipeline stage to pipeline stage is 0.9 ns.
• The maximum clock rate for this design is 1.1 GHz.
The Pentium 4 Depends on Pipelines
• Six stages, each: logic (0.3 ns) followed by a register (0.3 ns)
• The total delay from pipeline stage to pipeline stage is 0.6 ns, so this design, with twice the stages, has a maximum clock rate of only 1.67 GHz.
• As the design is broken into more pipeline stages, the logic in each stage has less delay and the registers between stages consume a larger share of the cycle, causing diminishing returns.
• At some point the costs of adding more stages, such as those of branch prediction, leave only a very marginal return.
• The only way out of this bottleneck is a faster register.
• This is one reason why the P4 is not significantly faster than a slower-clocked P3 for many applications.
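
The diminishing returns described here can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the 1.8 ns of total logic (3 x 0.6 ns or 6 x 0.3 ns) splits evenly across stages; the 12- and 24-stage points are extrapolations that do not appear in the slides.

```python
TOTAL_LOGIC_NS = 1.8   # 3 stages x 0.6 ns, or 6 stages x 0.3 ns (from the slides)
REGISTER_NS = 0.3      # register delay per stage (from the slides)

def period_ns(stages: int, register_ns: float = REGISTER_NS) -> float:
    """Stage-to-stage delay when logic is split evenly (an assumption)."""
    return TOTAL_LOGIC_NS / stages + register_ns

def max_clock_ghz(stages: int, register_ns: float = REGISTER_NS) -> float:
    """Maximum clock rate in GHz for the given pipeline depth."""
    return 1.0 / period_ns(stages, register_ns)

def register_share(stages: int, register_ns: float = REGISTER_NS) -> float:
    """Fraction of each cycle spent in the register."""
    return register_ns / period_ns(stages, register_ns)

for stages in (3, 6, 12, 24):
    print(f"{stages:2d} stages: {max_clock_ghz(stages):.2f} GHz, "
          f"register share {register_share(stages):.0%}")
```

Doubling the depth from 3 to 6 stages raises the clock by only about 50%, and the register's share of the cycle keeps growing with every further split.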
Recent Interest in Storage Elements
• Difficult to control both edges of the clock
• Higher impact of clock skew
• Higher cross-talk and substrate coupling
• Higher power consumption
• Limits on performance
• Clock burns up to 40%, storage elements up to 20% of total power
• I have even seen 75% recently (ISSCC 2001)
Solution: Faster Flip-Flops
• We have developed a new fast register that can be fabricated on standard microprocessor fabrication lines and is several times faster than the registers currently used.
• Six stages, each: logic (0.3 ns) followed by a register (0.04 ns)
• The total delay from pipeline stage to pipeline stage is 0.34 ns.
• Using our design allows a maximum nominal clock rate of 2.9 GHz.
• Can you achieve this performance gain with architecture?
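
One way to approach the closing question is to ask how deep a pipeline built from the slower 0.3 ns register would have to be to reach the same 0.34 ns period. The sketch below assumes 1.8 ns of total logic (six 0.3 ns stages) that can be split evenly and without any other pipelining cost, which is an idealization not stated in the slides. Delays are in integer picoseconds to avoid rounding issues.

```python
import math

def stages_needed(total_logic_ps: int, register_ps: int, target_period_ps: int) -> int:
    """Pipeline depth needed to hit a target period with a given register delay."""
    logic_budget_ps = target_period_ps - register_ps
    if logic_budget_ps <= 0:
        raise ValueError("register delay alone exceeds the target period")
    return math.ceil(total_logic_ps / logic_budget_ps)

# 6 stages x 300 ps logic = 1800 ps; the fast-register period is 300 + 40 = 340 ps.
print(stages_needed(total_logic_ps=1800, register_ps=300, target_period_ps=340))
print(stages_needed(total_logic_ps=1800, register_ps=40, target_period_ps=340))
```

Under these assumptions the slow register needs 45 stages to match what the fast register achieves with 6.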
Clocked Storage Element Requirements
• High speed
  – High-frequency applications require low FF timing overhead
  – Sub-nanosecond clock periods: FF delays of tens to hundreds of picoseconds
• Low power
  – Dissipation of >100 W for recent processors
  – Battery-supplied applications
• Size
• High robustness to clock imperfections
• Logic embedding property
Clock Signals
• Clocks are defined as pulsed, synchronizing signals that provide the time reference for the movement of data in a synchronous digital system.
• The clocking in a digital system can be either single-phase or multi-phase (usually two-phase).
• The clocking strategy depends on, and is largely influenced by, the choice of storage element: latch or flip-flop.
Clock Signal Uncertainty
• Effects on cycle time:
  – Maximum delay restriction: violation of set-up time
• May cause race:
  – Minimum delay restriction: violation of hold time
• Uncertainty is: jitter, skew, and duty cycle
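
The two restrictions can be illustrated with a simple slack check for one register-to-register path. This is a sketch under stated assumptions: all names and numbers are hypothetical, a single lumped `uncertainty_ps` stands in for jitter plus skew, and one clock-to-output value is used for both the slow and fast case.

```python
from dataclasses import dataclass

@dataclass
class Path:
    clk_to_q_ps: int   # clock-to-output delay of the launching element
    logic_max_ps: int  # slowest logic path between the two elements
    logic_min_ps: int  # fastest logic path between the two elements
    setup_ps: int      # set-up time of the capturing element
    hold_ps: int       # hold time of the capturing element

def slacks(path: Path, period_ps: int, uncertainty_ps: int) -> tuple[int, int]:
    """Return (setup_slack, hold_slack); a negative value is a violation."""
    # Maximum delay restriction: data must arrive a set-up time before the next edge.
    setup_slack = period_ps - uncertainty_ps - (
        path.clk_to_q_ps + path.logic_max_ps + path.setup_ps)
    # Minimum delay restriction: data must not race through before hold time ends.
    hold_slack = path.clk_to_q_ps + path.logic_min_ps - path.hold_ps - uncertainty_ps
    return setup_slack, hold_slack

example = Path(clk_to_q_ps=100, logic_max_ps=700, logic_min_ps=20,
               setup_ps=50, hold_ps=60)
setup_slack, hold_slack = slacks(example, period_ps=1000, uncertainty_ps=80)
print(f"setup slack {setup_slack} ps, hold slack {hold_slack} ps")
```

In this made-up case the path meets the maximum delay restriction but fails the minimum delay restriction, which is the race condition the slide refers to. Slowing the clock fixes only the first kind of violation.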
Jitter
• Uncertainty in consecutive edges of a periodic signal
• Caused by temporal noise events
• Quantified as:
  – cycle-to-cycle or short-term jitter, tJS
  – long-term jitter, tJL
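
As an illustration, both quantities can be estimated from a list of measured rising-edge times. The definitions used here are one common convention and are an assumption, not necessarily the exact tJS and tJL definitions used in the presentation; the edge times are made up.

```python
def periods(edges):
    """Successive clock periods from a list of edge times."""
    return [b - a for a, b in zip(edges, edges[1:])]

def cycle_to_cycle_jitter(edges):
    """Largest change in period between two adjacent cycles."""
    p = periods(edges)
    return max(abs(b - a) for a, b in zip(p, p[1:]))

def long_term_jitter(edges, nominal_period):
    """Largest deviation of any edge from its ideal position."""
    t0 = edges[0]
    return max(abs(t - (t0 + n * nominal_period)) for n, t in enumerate(edges))

edges_ps = [0, 1010, 2000, 3015, 4020]   # hypothetical measured edges
print(cycle_to_cycle_jitter(edges_ps))
print(long_term_jitter(edges_ps, nominal_period=1000))
```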
Clock Skew
• Time difference between temporally equivalent or concurrent edges of two periodic signals
• Caused by spatial noise events
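
A minimal sketch of this definition: given the arrival times of the same clock edges at two points on the chip, skew is the per-edge time difference. The arrival times and the mean/worst summary are illustrative assumptions.

```python
def skew_per_edge(edges_a, edges_b):
    """Time difference between corresponding edges of two clock signals."""
    return [b - a for a, b in zip(edges_a, edges_b)]

def skew_summary(edges_a, edges_b):
    """Return (mean skew, worst-case absolute skew)."""
    diffs = skew_per_edge(edges_a, edges_b)
    return sum(diffs) / len(diffs), max(abs(d) for d in diffs)

clk_at_a_ps = [0, 1000, 2000]     # hypothetical arrival times at point A
clk_at_b_ps = [30, 1040, 2020]    # same edges observed at point B
mean_skew, worst_skew = skew_summary(clk_at_a_ps, clk_at_b_ps)
print(mean_skew, worst_skew)
```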
Clocking Strategies
• Single-phase clocking and single-latch machine
• Edge-triggered clocking and flip-flop-based machine
• Two-phase clocking and two-phase latch machine with single latch
• Two-phase clocking and two-phase latch machine with double latch
Delay Restrictions*
• The clock defines hard boundaries for edge-triggered design
• Clock boundaries are soft for level-sensitive clocking; they are:
  – Tolerant of clock edge uncertainty
  – Tolerant of uncertainty in data arrival
• Timing slack can voluntarily be passed forward
• Time can forcefully be borrowed
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation
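
The difference between hard and soft boundaries can be sketched with an idealized latch pipeline. The model below is an assumption made for illustration: latch i is transparent for one phase starting at i times the phase length, and set-up time, latch delay, and clock uncertainty are all ignored. Stage delays are hypothetical, in picoseconds.

```python
def edge_triggered_ok(stage_delays, phase):
    """Hard boundaries: every stage must fit within its own phase."""
    return all(delay <= phase for delay in stage_delays)

def latch_borrowing(stage_delays, phase):
    """Soft boundaries: return time borrowed at each latch, or None on failure."""
    departure = 0            # data launches when the first latch opens
    borrowed = []
    for i, delay in enumerate(stage_delays, start=1):
        arrival = departure + delay
        opens = i * phase    # latch i becomes transparent
        closes = opens + phase
        if arrival > closes:
            return None      # data missed the transparent window
        borrowed.append(max(0, arrival - opens))
        departure = max(arrival, opens)  # early data waits for the latch to open
    return borrowed

delays_ps = [600, 400, 500, 500]
print(edge_triggered_ok(delays_ps, phase=500))
print(latch_borrowing(delays_ps, phase=500))
```

The 600 ps stage breaks the edge-triggered design at a 500 ps phase, while the latch-based pipeline absorbs it by borrowing 100 ps from the following, shorter stage.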
Single-Phase Clocking, Single Latch: Timing Constraints
Two-Phase Clocking with Two-Phase Double Latch
Two-Phase Clocking with One-Phase Double Latch
• Some people refer to this clocking arrangement as a “negative edge Flip-Flop”, erroneously!
Difference between Latch and Flip-Flop
• Flip-flop: after the transition of the clock, data cannot change
• Latch: the latch is “transparent”
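
The distinction can be shown with two small behavioral models stepped over sampled clock and data values. This is an illustrative discrete-time sketch, not a circuit-level description of any design in the presentation.

```python
def transparent_latch(clk, d, q0=0):
    """Level-sensitive: Q follows D whenever the clock is high."""
    q, out = q0, []
    for c, x in zip(clk, d):
        if c:
            q = x
        out.append(q)
    return out

def rising_edge_flip_flop(clk, d, q0=0):
    """Edge-sensitive: Q takes D only at a 0 -> 1 clock transition."""
    q, prev, out = q0, 0, []
    for c, x in zip(clk, d):
        if c and not prev:
            q = x
        prev = c
        out.append(q)
    return out

clk = [0, 1, 1, 1, 0, 0, 1, 1]
d   = [1, 1, 0, 1, 0, 1, 0, 0]
print(transparent_latch(clk, d))       # [0, 1, 0, 1, 1, 1, 0, 0]
print(rising_edge_flip_flop(clk, d))   # [0, 1, 1, 1, 1, 1, 0, 0]
```

While the clock is high, data changes pass straight through the latch, whereas the flip-flop output stays fixed until the next rising edge.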
Flip-Flop and M-S Latch Combination
• How can one recognize the difference without knowing what is inside the “black box”?
F-F and M-S Latch: Difference
• Experiment (figure)
F-F and M-S Latch: Difference
• Structural difference (figure; labels: No Clock, Flip-Flop, M-S Latch)
Flip-Flop vs. Latch
Flip-Flop:
• Edge sensitive
• Easier to use as frequency increases
• Robustness to duty cycle
• Simpler logic timing requirements
• Fits into CAD tools
Latch:
• Level sensitive
• Consumes less power for the operation
• Better clock skew/jitter characteristics
Flip-Flop: Example HLFF (Partovi)
Pulse-Based Flip-Flops*
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation
Flip-Flop: Example (figure labels: D=0, pulse, D=1, SAFF, DEC Alpha 21264)
Requirements in the Flip-Flop Design
• Small Clk-Output delay, narrow sampling window
• Low power
• Small clock load
• High driving capability (increased levels of parallelism)
  – Typical flip-flop load in 0.18 µm CMOS ranges from 50 fF to over 200 fF, with typical values of 100-150 fF in critical paths
  – (Rule-of-thumb number for gate capacitance: Cgate = 2 fF/µm)
• Integration of logic into the flop
• Multiplexed or clock scan
• Crosstalk insensitivity: dynamic/high-impedance nodes are affected
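
The rule of thumb converts the quoted loads into an equivalent total gate width. This sketch only applies the 2 fF/µm figure from the slide; the function name is illustrative.

```python
C_GATE_FF_PER_UM = 2.0   # rule-of-thumb gate capacitance from the slide

def equivalent_gate_width_um(load_ff: float) -> float:
    """Total transistor gate width that presents the given capacitive load."""
    return load_ff / C_GATE_FF_PER_UM

for load_ff in (50, 100, 150, 200):
    print(f"{load_ff} fF is about {equivalent_gate_width_um(load_ff):.0f} um of gate width")
```

A typical critical-path load of 100-150 fF therefore corresponds to roughly 50-75 µm of driven gate width.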
State Element Characterization
• Timing
  – Propagation time (clock-to-output)
  – Set-up time
  – Hold time
• Skew amortization
• Power consumption
  – Internal power
  – Input power
Flip-Flop Delay
• The sum of setup time and Clk-output delay is the only true measure of performance with respect to system speed
• $T = T_{Clk-Q} + T_{Logic} + T_{Setup} + T_{Skew}$
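
Rearranging the cycle-time relation gives the time left for logic once the storage element and skew have taken their share. The numbers below are hypothetical and chosen only to illustrate the arithmetic.

```python
def ff_overhead_ps(t_clk_q: int, t_setup: int) -> int:
    """Flip-flop timing overhead: clock-to-output delay plus set-up time."""
    return t_clk_q + t_setup

def logic_budget_ps(period: int, t_clk_q: int, t_setup: int, t_skew: int) -> int:
    """Time available for logic, from T = T_Clk-Q + T_Logic + T_Setup + T_Skew."""
    return period - ff_overhead_ps(t_clk_q, t_setup) - t_skew

# Hypothetical values in picoseconds.
period, t_clk_q, t_setup, t_skew = 1000, 120, 60, 40
print(ff_overhead_ps(t_clk_q, t_setup))
print(logic_budget_ps(period, t_clk_q, t_setup, t_skew))
```

Every picosecond removed from the flip-flop overhead goes directly to the logic budget, which is why the sum, rather than either term alone, is the relevant figure.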
Delay vs. Setup/Hold Times
Timing Characteristics
Timing parameters, details
• The best point to pick on the delay curve is the minimum D-Q
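
Picking the minimum D-Q point can be sketched as a search over a characterized delay curve. The curve below is invented for illustration: each pair is a data-to-clock time and the clock-to-output delay observed at that setting, with the delay growing as data arrives closer to the clock edge.

```python
# Hypothetical characterization data: (data-to-clock time, Clk-Q delay) in ps.
curve = [(200, 100), (150, 100), (100, 102), (60, 110),
         (40, 125), (20, 160), (10, 230)]

def d_to_q(point):
    """Data-to-output delay: data-to-clock time plus clock-to-output delay."""
    t_dc, t_cq = point
    return t_dc + t_cq

def minimum_d_to_q(points):
    """Return the operating point with the smallest data-to-output delay."""
    return min(points, key=d_to_q)

best = minimum_d_to_q(curve)
print(best, d_to_q(best))
```

For this made-up curve the optimum is at a 40 ps data-to-clock time, where the Clk-Q delay has already started to rise but the total D-Q delay is smallest.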
Latch and Flip-Flop latencies (tDQ) vs. Data-to-clock Set-up Time (tDC)*
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation
Clock Skew Considerations
• Need for characterization of flip-flop behavior in the presence of skew/jitter
• The Soft Clock Edge property only qualitatively describes skew immunity
• Still, designers calculate maximum useful time by incorporating all skew into clocking overhead
Clock Skew Considerations
• Skew overhead of ideal flip-flop
• Real skew overhead
Clock Skew Considerations
• Skew rejection: the ratio of total skew to its impact on FF overhead
• Shows how the circuit reacts to clock edge uncertainty
• Helps answer the question of up to what point to optimize the clock distribution network
Simulation Condition and Testbench
• Power
  – Data activity dependence as a FF characteristic
  – Consumption at 50% activity adopted as a figure of merit
  – Dissipation of driving inverters is part of total power consumption
Simulation Condition and Testbench
• Timing
  – Total FF overhead is setup + clock-to-output time
  – Circuit optimization towards tD-Q
  – Clock skew robustness obtained from observing the D-Q curve
• Power-Delay Product
  – Overall performance parameter at fixed frequency
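
The power-delay product combines the two measurements into a single figure of merit. The sketch below uses invented power and delay values for two unnamed designs; only the product itself comes from the slide.

```python
def power_delay_product_fj(power_uw: float, delay_ps: float) -> float:
    """PDP in femtojoules: (power in uW) x (delay in ps) = attojoules; /1000 gives fJ."""
    return power_uw * delay_ps * 1e-3

# Hypothetical (power in uW, D-Q delay in ps) pairs for two designs.
designs = {"design A": (200.0, 165.0), "design B": (150.0, 240.0)}

for name, (power_uw, delay_ps) in designs.items():
    print(f"{name}: PDP = {power_delay_product_fj(power_uw, delay_ps):.1f} fJ")
```

In this made-up comparison the faster design wins on PDP even though it draws more power.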
Flip-Flop Performance Comparison Test Bench
• Total power consumed
  – internal power
  – data power
  – clock power
• Measured for four cases
  – no activity (0000… and 1111…)
  – maximum activity (0101010…)
  – average activity (random sequence)
• Delay is the minimum D-Q
  – Clk-Q + setup time
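
The four data patterns can be generated and checked for their switching activity as follows. This is a sketch: the sequence length and random seed are arbitrary choices, and activity is taken here to mean the fraction of cycles in which the data input toggles.

```python
import random

def activity(bits):
    """Fraction of consecutive bit pairs that differ (data transitions per cycle)."""
    transitions = sum(a != b for a, b in zip(bits, bits[1:]))
    return transitions / (len(bits) - 1)

N = 1024                      # arbitrary sequence length
rng = random.Random(0)        # fixed seed so the run is repeatable
patterns = {
    "no activity (0000...)": [0] * N,
    "no activity (1111...)": [1] * N,
    "maximum activity (0101...)": [i % 2 for i in range(N)],
    "average activity (random)": [rng.randint(0, 1) for _ in range(N)],
}

for name, bits in patterns.items():
    print(f"{name}: activity = {activity(bits):.2f}")
```

The constant patterns give zero activity, the alternating pattern gives an activity of one, and a long random sequence lands near one half.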
OLD TEST BENCH:
• Total power = drivers power + test unit power
• PDP-optimized = equal trade-off on power and delay
• Improper load on drivers
NEW TEST BENCH:
• Drivers: fixed gain and driving test unit only
• Data-to-output delay
• PD2P-optimized = best for constant-field scaling