Digital Integrated Circuits: A Design Perspective
Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolić
Timing Issues
January 2003
Latch Parameters
For a latch, $t_{c-q}$ is the maximum clock-to-output propagation delay and $t_{d-q}$ is the data-to-output delay.
Data D must be stable to be properly registered in the latch: no unintended changes while the latch is transparent. An intended change must arrive at least $t_{su}$ before the latch closes.
Delays can differ for rising and falling data transitions.
[Figure: latch symbol (D, Q, Clk) and timing diagram marking T, PWm, $t_{su}$, $t_{hold}$, $t_{d-q}$ and $t_{c-q}$]
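Register Parameters
Data must be stable before the rising edge of the clock (setup time $t_{su}$) and held sufficiently long afterwards (hold time $t_{hold}$) to be processed by the register.
Delays can differ for rising and falling data transitions.
[Figure: register symbol (D, Q, Clk) and timing diagram marking T, $t_{su}$, $t_{hold}$ and $t_{c-q}$]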
Clock Uncertainties
[Figure: sources of clock uncertainty]
Clock Nonidealities
• Clock skew (constant delay)
  • Spatial variation in temporally equivalent clock edges; deterministic + random, $t_{SK}$
• Clock jitter (random variations)
  • Temporal variations in consecutive edges of the clock signal; modulation + random noise
  • Cycle-to-cycle (short-term), $t_{JS}$
  • Long-term, $t_{JL}$
• Variation of the pulse width
  • Important for level-sensitive clocking
Clock Skew and Jitter
• Both skew and jitter affect the effective cycle time
• Only skew affects the race margin
[Figure: two clock waveforms marking $t_{SK}$ and $t_{JS}$]
Clock Skew
[Figure: number of registers versus clock delay. The earliest occurrence of the clock edge is at nominal − δ/2 and the latest at nominal + δ/2; the insertion delay and the maximum clock skew are marked.]
Positive Skew
The launching register's clock edge arrives before the receiving register's clock edge.
Negative Skew
The receiving register's clock edge arrives before the launching register's clock edge.
Timing Constraints
Here cd stands for a contamination (minimum) delay, both in the register propagation time and in the combinational logic delay.
Minimum clock cycle time: $T - \delta = t_{c-q} + t_{su} + t_{logic}$, where $\delta$ is the amount by which the receiving clock edge arrives early.
The worst case is when the receiving edge arrives early (negative skew), so a negative clock skew reduces the achievable clock frequency.
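Timing Constraints
Hold time constraint: $t_{(c-q,cd)} + t_{(logic,cd)} > t_{hold} + \delta$, where $\delta$ is the amount by which the receiving clock edge arrives late.
The worst case is when the receiving edge arrives late (positive skew), so a race between data and clock is more likely for a positive clock skew.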
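As a minimal sketch of how the setup and hold constraints on the two Timing Constraints slides can be checked numerically, the Python below evaluates both for a single register-to-register path. The function names, argument names and picosecond values are illustrative assumptions, not taken from the slides; skew is passed as a non-negative magnitude (how early or how late the receiving edge is).

```python
def min_clock_period(t_cq, t_su, t_logic, early_by):
    """Smallest T satisfying T - early_by >= t_cq + t_su + t_logic.

    early_by: amount by which the receiving clock edge arrives early
    (magnitude of a negative skew). All values share one time unit.
    """
    return t_cq + t_su + t_logic + early_by


def hold_ok(t_cq_cd, t_logic_cd, t_hold, late_by):
    """True if t_cq_cd + t_logic_cd > t_hold + late_by.

    late_by: amount by which the receiving clock edge arrives late
    (magnitude of a positive skew).
    """
    return t_cq_cd + t_logic_cd > t_hold + late_by


# Hypothetical path, all values in picoseconds.
t_min = min_clock_period(t_cq=100, t_su=50, t_logic=800, early_by=50)
safe = hold_ok(t_cq_cd=60, t_logic_cd=40, t_hold=30, late_by=50)
racy = hold_ok(t_cq_cd=60, t_logic_cd=40, t_hold=30, late_by=80)
```

With these assumed numbers the early receiving edge raises the minimum period from 950 ps to 1000 ps, and the same path passes the hold check with 50 ps of late skew but fails it with 80 ps.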
Impact of Jitter
Since jitter is a random delay, it increases the minimum clock period and increases the likelihood of a race between clock and data.
Longest Logic Path in Edge-Triggered Systems
$T_{LM}$ is the maximum logic delay.
[Figure: clock waveform of period T marking the latest point of launching, the earliest arrival of the next cycle, $T_{JI} + \delta$, $T_{Clk-Q}$, $T_{LM}$ and $T_{SU}$]
Clock Constraints in Edge-Triggered Systems
If the launching edge is late and the receiving edge is early, the data will not be too late if:
$T_{c-q} + T_{LM} + T_{SU} < T - T_{JI,1} - T_{JI,2} - \delta$
The minimum cycle time is determined by the maximum delays through the logic. With the same jitter $T_{JI}$ on both edges:
$T_{c-q} + T_{LM} + T_{SU} + \delta + 2T_{JI} < T$
Skew can be either positive or negative.
Shortest Path
The shortest path affects feedback connections, which typically have a negative clock skew.
$T_{Lm}$ is the minimum logic delay.
[Figure: clock waveforms marking the earliest point of launching, $T_{Clk-Q}$, $T_{Lm}$, the nominal clock edge, $T_H$ and the time before which data must not arrive]
Clock Constraints in Edge-Triggered Systems
If the launching edge is early and the receiving edge is late:
$T_{c-q} + T_{Lm} - T_{JI,1} > T_H + T_{JI,2} + \delta$
Minimum logic delay, with the same jitter $T_{JI}$ on both edges:
$T_{c-q} + T_{Lm} > T_H + 2T_{JI} + \delta$
For clock skew only we had:
$t_{(c-q,cd)} + t_{(logic,cd)} > t_{hold} + \delta$
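The two edge-triggered constraints with jitter can be turned into bounds on the cycle time and on the minimum logic delay. The following Python is a rough sketch under the assumption that both clock edges see the same worst-case jitter; the function names and the picosecond values are hypothetical.

```python
def period_bound(t_cq, t_lm_max, t_su, skew, jitter):
    """Lower bound on T from T_c-q + T_LM + T_SU + skew + 2*T_JI < T.

    T must be strictly larger than the returned value.
    """
    return t_cq + t_lm_max + t_su + skew + 2 * jitter


def min_logic_delay_bound(t_cq, t_h, skew, jitter):
    """Lower bound on T_Lm from T_c-q + T_Lm > T_H + 2*T_JI + skew.

    T_Lm must be strictly larger than the returned value.
    """
    return t_h + 2 * jitter + skew - t_cq


# Hypothetical values in picoseconds.
t_bound = period_bound(t_cq=100, t_lm_max=800, t_su=50, skew=50, jitter=25)
lm_bound = min_logic_delay_bound(t_cq=100, t_h=30, skew=50, jitter=25)
```

For these assumed values the cycle time has to exceed 1050 ps and the shortest logic path has to exceed 30 ps. Jitter enters both bounds twice because the launching and the receiving edge can each move by $T_{JI}$ in the unfavourable direction.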
Flip-Flop-Based Timing
Logic propagation must finish before the next rising edge of the clock.
[Figure: one clock cycle ($\phi = 1$, $\phi = 0$) divided into skew, flip-flop delay ($T_{Clk-Q}$), logic delay and $T_{SU}$; a flip-flop followed by logic]
Representation after M. Horowitz, VLSI Circuits 1996.
Flip-Flops and Dynamic Logic
In dynamic logic gates, logic propagation must finish before the clock's falling edge.
A dual relation holds for the pull-up network (PUN) controlled by inverted clocks.
Flip-flops are used only with static logic.
[Figure: two timing diagrams over $\phi = 1$ and $\phi = 0$ showing $T_{Clk-Q}$, logic delay and $T_{SU}$, with the precharge and evaluate phases marked]
Latch timing
When data arrives at a transparent latch, it passes through after $t_{D-Q}$: the latch is a 'soft' barrier.
When data arrives at a closed latch, the relevant delay is $t_{Clk-Q}$: the data has to be 're-launched'.
[Figure: latch symbol (D, Q, Clk)]
Single-Phase Clock with Latches
[Figure: latch followed by logic, clocked by $\phi$; clock waveform with period P and pulse width PW, marking $T_{skl}$ and $T_{skt}$]
Latch-Based Design
• The L1 latch is transparent when $\phi = 0$
• The L2 latch is transparent when $\phi = 1$
[Figure: L1 latch, logic, L2 latch, logic, clocked by $\phi$]
Slack-borrowing
Slack is passed to the next stage, shortening the clock period requirement.
[Figure: latches L1, L2, L1 alternating with logic blocks CLB_A (delay $t_{pd,A}$) and CLB_B (delay $t_{pd,B}$), with nodes a to e, clocked by CLK1 and CLK2; timing diagram over the clock period $T_{CLK}$ showing when each node is valid and the latch delays $t_{DQ}$]
Clock Distribution
The clock is distributed in a tree-like fashion. An H-tree balances the clock skew.
More realistic H-tree [Restle98]
The Grid Clock Distribution
• Does not require RC matching
• Large power dissipation
• Easier to satisfy the metal density requirement in fabrication
• Good thermal distribution
Example: DEC Alpha 21164
Clock frequency: 300 MHz; 9.3 million transistors
Total clock load: 3.75 nF
Power in clock distribution network: 20 W (out of 50 W)
Uses two-level clock distribution:
• Single 6-stage driver at the center of the chip
• Secondary buffers drive the left and right side clock grid in Metal3 and Metal4
Total driver size: 58 cm!
21164 Clocking
$t_{cycle}$ = 3.3 ns, $t_{skew}$ = 150 ps, $t_{rise}$ = 0.35 ns
• 2-phase single-wire clock, distributed globally
• 2 distributed driver channels
• Reduced RC delay/skew
• Improved thermal distribution
• 3.75 nF clock load
• 58 cm final driver width
• Local inverters for latching
• Conditional clocks in caches to reduce power
• More complex race checking
• Device variation affects symmetry
[Figure: clock waveform; location of the clock driver on the die, showing the pre-driver and the final drivers]
Clock Skew in Alpha Processor
[Figure: clock skew]
EV6 (Alpha 21264) Clocking
600 MHz, 0.35 micron CMOS
$t_{cycle}$ = 1.67 ns, $t_{skew}$ = 50 ps, $t_{rise}$ = 0.35 ns
• 2-phase, with multiple conditional buffered clocks
• 2.8 nF clock load
• 40 cm final driver width
• Local clocks can be gated "off" to save power
• Reduced load/skew
• Reduced thermal issues
• Multiple clocks complicate race checking
[Figure: global clock waveform]
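To put the two Alpha skew figures in proportion, the short Python sketch below expresses each quoted skew as a percentage of the quoted cycle time. The cycle and skew values are the ones given on the 21164 and EV6 clocking slides; the dictionary layout and function name are illustrative assumptions.

```python
# Cycle time and skew as quoted on the clocking slides, in picoseconds.
CLOCKS_PS = {
    "Alpha 21164": {"t_cycle": 3300, "t_skew": 150},
    "Alpha 21264 (EV6)": {"t_cycle": 1670, "t_skew": 50},
}


def skew_percent(t_skew, t_cycle):
    """Clock skew as a percentage of the clock cycle time."""
    return 100.0 * t_skew / t_cycle


skew_share = {
    name: skew_percent(c["t_skew"], c["t_cycle"])
    for name, c in CLOCKS_PS.items()
}
```

With these figures the skew takes roughly 4.5% of the cycle on the 21164 and roughly 3.0% on the EV6, so the skew budget shrank in relative terms even though the cycle time was halved.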
EV6 Clock Results
[Figure: GCLK skew (at Vdd/2 crossings), scale 5 to 50 ps; GCLK rise times (20% to 80%, extrapolated to 0% to 100%), scale 300 to 345 ps]
EV7 Clock Hierarchy
Active skew management and multiple clock domains
+ widely dispersed drivers
+ DLLs compensate static and low-frequency variation
+ divides design and verification effort
- DLL design and verification is added work
+ tailored clocks
Self-timed and Asynchronous Design
Functions of the clock in synchronous design:
1) Acts as completion signal
2) Ensures the correct ordering of events
Truly asynchronous design:
1) Completion is ensured by careful timing analysis
2) Ordering of events is implicit in logic
Self-timed design:
1) Completion ensured by completion signal
2) Ordering imposed by handshaking protocol
Synchronous Pipelined Datapath
• Make sure that the clock period T is larger than the maximum stage delay:
  $T > \max(t_{pd1}, t_{pd2}, t_{pd3}) + t_{pd,reg}$
• Problems:
  • Clock skew and jitter
  • Strong clock currents induce noise due to package inductance
  • Power dissipation
  • Uneven stage delay could be used to support faster processing
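The clock period rule for the synchronous pipeline can be illustrated in a few lines of Python. This is only a sketch: the function name and the stage delays are made up for the example, and skew and jitter are ignored.

```python
def pipeline_period_bound(stage_delays, t_pd_reg):
    """Lower bound on T from T > max(t_pd1, t_pd2, ...) + t_pd,reg.

    The slowest stage alone sets the bound; the clock period must be
    strictly larger than the returned value.
    """
    return max(stage_delays) + t_pd_reg


# Hypothetical three-stage pipeline, delays in picoseconds.
stages = [300, 500, 400]
bound = pipeline_period_bound(stages, t_pd_reg=50)

# Time each stage sits idle per cycle if T is set right at the bound.
idle = [max(stages) - d for d in stages]
```

Here the bound is 550 ps, and the 300 ps and 400 ps stages are idle for 200 ps and 100 ps of every cycle, which is the unevenness that a fixed clock cannot exploit.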
Self-Timed Pipelined Datapath
Self-timed logic requires a completion signal.
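Completion Signal Generation
• The completion signal can be generated by:
  • Replica delay
  • Dual-rail coding
Using a delay element (e.g. in memories): a delay module that replicates the critical path of the logic network turns Start into Done.
[Figure: logic network (In, Out) alongside a delay module (critical path replica) from Start to Done]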
Completion Signal Generation
Completion signal generation by dual-rail coding requires redundancy in the data representation: two bits, B0 and B1, represent a single bit value B.
[Table: value of B encoded by B0 and B1]
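One way to picture dual-rail completion detection is the Python sketch below. The particular encoding is an assumption made for this example, not something the slide text specifies: (B0, B1) = (0, 0) means "in transition or reset", (0, 1) encodes B = 0, (1, 0) encodes B = 1, and (1, 1) is treated as illegal. All function names are hypothetical.

```python
RESET = (0, 0)  # assumed code for "no valid data yet"


def encode(bit):
    """Assumed dual-rail code: returns (B0, B1) for a single bit B."""
    return (0, 1) if bit == 0 else (1, 0)


def bit_done(pair):
    """A bit is complete when exactly one of its two rails is high."""
    b0, b1 = pair
    return b0 != b1


def word_done(pairs):
    """The completion signal for a word: every bit must be complete."""
    return all(bit_done(p) for p in pairs)


def decode(pairs):
    """Recover the bit values; only meaningful once word_done is true."""
    if not word_done(pairs):
        raise ValueError("word is still in transition")
    return [1 if p == (1, 0) else 0 for p in pairs]


word = [encode(b) for b in (1, 0, 1)]
partial = [encode(1), RESET, encode(1)]  # middle bit not yet evaluated
```

In this sketch `word_done(word)` is true while `word_done(partial)` is false, so the Done signal only appears once every bit has left the reset code. The price is two wires per bit, which is the redundancy the slide refers to.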
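Completion Signal in DCVSL
Generation of a completion signal in DCVSL.
[Figure: DCVSL gate supplied from $V_{DD}$ with two pull-down networks (PDN), inputs In1 and In2, control signal Start, outputs B0 and B1, and the Done signal derived from B0 and B1]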
Self-Timed Adder
The Done signal is generated after all carry signals are stable.
Completion Signal Using Current Sensing
The current sensor outputs a low value when no current flows through the logic and a high value when the logic is switching.
Hand-Shaking Protocol
Two-phase handshake:
The sender cannot change its data once it has sent the request signal, which finishes its active cycle.
The receiver reads the data and produces the acknowledge signal. This starts a new cycle, and the sender can process new data.
Req and Ack events can be generated by both high-to-low and low-to-high transitions.
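The two-phase protocol can be walked through with a small Python sketch in which every transition on Req or Ack, rising or falling, counts as one event. The function name, the trace format and the strictly alternating sender and receiver are simplifying assumptions for illustration.

```python
def two_phase_transfer(items):
    """Send items one by one using two-phase (transition) signaling.

    Returns the received items and the trace of (signal, level) events.
    """
    req = 0
    ack = 0
    received = []
    trace = []
    for item in items:
        # The sender may only start when the previous request was acknowledged.
        if req != ack:
            raise RuntimeError("previous transfer not acknowledged")
        data = item          # data is placed and then held stable
        req ^= 1             # any transition on Req is a request event
        trace.append(("Req", req))
        # The receiver sees Req differ from Ack, so the data is valid.
        received.append(data)
        ack ^= 1             # any transition on Ack is an acknowledge event
        trace.append(("Ack", ack))
    return received, trace


got, events = two_phase_transfer(["a", "b", "c"])
```

Each item costs exactly two transitions, one on Req and one on Ack, and the signals are not returned to zero between transfers. That is why the second item in the trace is carried by falling edges.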
Event Logic – The Muller-C Element
(a) Schematic: inputs A and B, output F
(b) Truth table:
A  B  F(n+1)
0  0  0
0  1  F(n)
1  0  F(n)
1  1  1
[Figure: implementations of the Muller-C element]
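The truth table of the Muller-C element translates directly into a behavioural model. The Python below is a minimal sketch with a hypothetical function name; it models only the logical behaviour, not any of the circuit implementations.

```python
def c_element(a, b, f_prev):
    """Next output of a Muller-C element.

    When both inputs agree the output copies them; otherwise the
    output holds its previous value f_prev.
    """
    if a == b:
        return a
    return f_prev


# Apply a sequence of (A, B) input pairs, starting from F = 0.
f = 0
outputs = []
for a, b in [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]:
    f = c_element(a, b, f)
    outputs.append(f)
```

The output only rises after both inputs have risen and only falls after both have fallen, so the element waits for events on both inputs before producing one of its own.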
2-phase Handshake Protocol
Initially Req, Ack and Data Ready are 0. When Data Ready = 1, Req goes high and the data is transmitted. Once this is finished, Ack goes high and control is passed to the sender.
[Figure: sender logic and receiver logic connected by Data, Req and Ack; handshake logic built around a C-element, with Data Ready and Data Accepted signals]
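Example: Self-timed FIFO
Data is transferred on both positive and negative transitions of En. Done is a delayed En signal. Examine the operation of the FIFO by plotting the signals.
[Figure: registers R1, R2 and R3 between In and Out, with En and Done signals, controlled by a chain of C-elements that pass Req and Ack between the input side (Req_i, Ack_i) and the output side (Req_o, Ack_o)]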