
# Issues in System on the Chip Clocking November 6th, 2003 SoC Design Conference, Seoul, KOREA

Vojin G. Oklobdzija, Advanced Computer System Engineering Laboratory, University of California, Davis. Presentation available at: http://www.ece.ucdavis.edu/acsel



### Presentation Transcript

1. Issues in System on the Chip Clocking. November 6th, 2003, SoC Design Conference, Seoul, KOREA. Vojin G. Oklobdzija, Advanced Computer System Engineering Laboratory, University of California, Davis. Presentation available at: http://www.ece.ucdavis.edu/acsel

2. Directions in SoC Clocking • Synchronous / Asynchronous paradigm • Synchronous solutions: • Clock uncertainty absorption • Time borrowing • Skew-Tolerant Domino • Using both edges of the clock • Conclusion Prof. V.G. Oklobdzija, University of California

3. ISSCC-2002 Clock frequency trends

4. Processor Frequency Trends (courtesy of Intel, S. Borkar) • Frequency doubles each generation • Number of gates per clock is reduced by 25%

5. Multi-GHz Clocking Problems • Less logic between pipeline stages: • Out of the 7-10 FO4 delays allocated per stage, the FF can take 2-4 FO4 • Clock uncertainty can take another FO4 • The total could be ½ of the time allowed for computation
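The budget on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration of the figures quoted above; the function name `cycle_overhead` and the default of one FO4 for clock uncertainty are assumptions made for the example, not part of the presentation.

```python
def cycle_overhead(cycle_fo4, ff_fo4, uncertainty_fo4=1.0):
    """Return (logic_fo4, overhead_fraction) for one pipeline stage.

    cycle_fo4       -- total stage budget in FO4 delays (7-10 on the slide)
    ff_fo4          -- flip-flop overhead in FO4 delays (2-4 on the slide)
    uncertainty_fo4 -- clock uncertainty in FO4 delays (assumed 1 here)
    """
    overhead = ff_fo4 + uncertainty_fo4
    if overhead >= cycle_fo4:
        raise ValueError("overhead consumes the whole cycle")
    return cycle_fo4 - overhead, overhead / cycle_fo4


if __name__ == "__main__":
    # Sweep the corner cases quoted on the slide.
    for cycle, ff in [(10, 2), (10, 4), (7, 2), (7, 4)]:
        logic, fraction = cycle_overhead(cycle, ff)
        print(f"cycle={cycle} FO4, FF={ff} FO4 -> "
              f"logic={logic:.0f} FO4, overhead={fraction:.0%}")
```

With a 10 FO4 cycle and a 4 FO4 flip-flop, the overhead reaches half of the cycle, which is the case the last bullet refers to.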

6. Clock Uncertainties

7. Motivation for Improving on Clocked Storage Elements. Example: in a 2.0 GHz processor, T = 500 ps • Typically the clocked storage element (CSE) D-Q delay is on the order of 100-150 ps • If one can design a faster CSE, e.g. 80-100 ps D-Q, this represents a 10-15% performance improvement • If in addition one can absorb 20 ps of clock uncertainty and embed one level of logic, this can yield up to a 20% performance improvement • Try to achieve a 10-20% performance improvement by introducing new features in the architecture! • This is sufficient to turn an architect into a circuit designer!
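The percentages on this slide follow from comparing the saved picoseconds with the 500 ps cycle. The sketch below reproduces that arithmetic under the assumption that "performance improvement" means the fraction of the original cycle that is recovered; the helper name `cycle_saving_percent` is hypothetical, and the delay of the embedded logic level is left out because the slide does not give it.

```python
def cycle_saving_percent(cycle_ps, saved_ps):
    """Percent of the original clock period recovered by saving saved_ps."""
    return 100.0 * saved_ps / cycle_ps


if __name__ == "__main__":
    CYCLE_PS = 500.0  # 2.0 GHz clock, as on the slide

    # A slow CSE (150 ps D-Q) replaced by faster ones (100 ps and 80 ps).
    for new_dq_ps in (100.0, 80.0):
        saved = 150.0 - new_dq_ps
        print(f"D-Q 150 -> {new_dq_ps:.0f} ps: "
              f"{cycle_saving_percent(CYCLE_PS, saved):.0f}% of the cycle")

    # Same comparison, plus 20 ps of absorbed clock uncertainty.
    saved = (150.0 - 80.0) + 20.0
    print(f"with 20 ps absorbed: "
          f"{cycle_saving_percent(CYCLE_PS, saved):.0f}% of the cycle")
```

Embedding a level of logic in the CSE would add that gate's delay to the saved time, which is how the slide arrives at a figure of up to 20%.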

8. Consequences of multi-GHz Clocks • Pipeline boundaries start to blur • Clocked Storage Elements must include logic • Wave pipelining, domino style, signals used to clock ….. • Synchronous design only in a limited domain • Asynchronous communication between synchronous domains

9. Synchronous / Asynchronous Design on the Chip • 1 billion transistors on the chip by 2005-6 • A 64-bit, 4-way issue logic core requires ~2 million transistors • Table 1: Transistor count in typical RISC processors

10. Synchronous / Asynchronous Design on the Chip (figure labels: 10 million transistors; 1 Billion Transistors Chip)

11. Two views of the world: - Asynchronous - Synchronous

12. Asynchronous Paradigm • Logic Stage can take any time it needs • Max. Speed limited by Handshake overhead • Increased complexity of logic (de-glitching)

13. Synchronous Paradigm • Max Speed determined by the slowest logic block • Latch / FF timing overhead • Fixed clock frequency (set by longest path)

14. Synchronous Paradigm • Clocked Storage Elements (Flip-Flops and Latches) should be viewed as synchronization elements, not merely as storage elements! • Their main purpose is to synchronize fast and slow paths: • prevent the fast path from corrupting the state

15. Synchronous World: Tricks and Solutions • Clocked Storage Elements with clock uncertainty absorption features • Time Borrowing • Incorporation of Synchronization features into the logic • Skew Tolerant Domino • Utilizing both edges of the Clock

16. Clocked Storage Element Overhead • The time taken from the pipeline by the CSE is U plus the Clk-Q delay. Thus the D-Q delay is the relevant figure, not Clk-Q: $T = T_{Clk\text{-}Q} + T_{Logic} + U + T_{skew}$, with $T_{D\text{-}Q} = T_{Clk\text{-}Q} + U$ (figure: two D-Q storage elements clocked by Clk with logic between them, and a timing diagram marking $T$, $T_{Clk\text{-}Q}$, $T_{Logic}$, $U$ and $T_{skew}$)

17. Delay vs. Setup/Hold Times (plot: Clk-Output delay [ps] versus Data-Clk [ps], with the minimum Data-Output point, the setup and hold regions, and the sampling window marked)


19. Clock Uncertainty Absorption

20. Single-Ended Skew Tolerant Flip-Flop (Nedovic, Oklobdzija, Walker, ISSCC 2003)

21. Clock Uncertainty Absorption (timing diagram: early, nominal and late data arrival $D_{D\text{-}Clk}$ relative to the nominal clock at T = 0, the clock uncertainty $t_{CU}$, the output Q, and the nominal and worst-case D-Q delays $D_{DQm}$ and $D_{DQM}$)

22. Clock Uncertainty Absorption (two waveform panels showing Clk, D and Q: (a) $t_{CU}$ = 30 ps, $a_{CU}$ = 90%; (b) $t_{CU}$ = 100 ps, $a_{CU}$ = 56%. Values annotated in the figure: $U_{Opt}$ = -5 ps and $U_{Opt}$ = 30 ps; 3 ps and 44 ps; $D_{DQM}$ = 261 ps and $D_{DQM}$ = 220 ps)

23. Synchronous World: Tricks and Solutions • Clocked Storage Elements with clock uncertainty absorption features • Time Borrowing • Incorporation of Synchronization features into the logic • Skew Tolerant Domino • Utilizing both edges of the Clock

24. Time Borrowing


26. Critical Path with Time Borrowing

27. Latches as synchronizers • The purpose of a CSE is to synchronize data flow. • We need to insert a CSE to prevent "fast paths" from reaching the next logic stage too early. • If the signal arrives late, it is allowed to borrow time from the next stage • However, borrowing cannot go on forever ...
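The borrowing behavior described above can be illustrated with a very simplified timing walk. This sketch assumes ideal latches that open at multiples of the clock period and stay transparent for a fixed window, and it ignores latch delay, setup time and clock uncertainty; the function name `propagate` and the sample delays are hypothetical.

```python
def propagate(stage_delays_ps, period_ps, window_ps):
    """Walk data through a chain of transparent latches.

    Latch k opens at k * period_ps and closes window_ps later. Data that
    arrives while the latch is open passes straight through, borrowing
    time from the next stage. Returns the time borrowed at each latch and
    raises ValueError if data arrives after a latch has closed.
    """
    borrowed = []
    depart = 0.0  # data is launched at time zero
    for k, delay in enumerate(stage_delays_ps, start=1):
        arrive = depart + delay
        opens = k * period_ps
        closes = opens + window_ps
        if arrive > closes:
            raise ValueError(f"stage {k}: data arrives {arrive - closes} ps "
                             "after the latch closes")
        # Early data waits for the latch to open; late data passes at once.
        depart = max(arrive, opens)
        borrowed.append(max(0.0, arrive - opens))
    return borrowed


if __name__ == "__main__":
    # Two slow stages borrow time; a fast third stage pays it back.
    print(propagate([430, 420, 350], period_ps=400, window_ps=60))
    # Two slow stages in a row with too much accumulated delay fail.
    try:
        propagate([430, 440], period_ps=400, window_ps=60)
    except ValueError as err:
        print("timing failure:", err)
```

The second call shows the limit mentioned in the last bullet: borrowed time accumulates from stage to stage, and once it exceeds the transparency window the data misses the latch.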

28. Using Single Pulsed Latch

29. Single Pulsed Latch (courtesy of D. Markovic & Intel MRL)

30. Optimal Single Latch Clocking. Single Latch System (Unger & Tan '83):

$$P_m = P \ge D_{LM} + D_{DQM} \quad \text{(minimal clock period)}$$

$$D_{Lm} > D_{LmB} \ge W + T_T + T_L + H - D_{CQm} \quad \text{(shortest path)}$$

$$W_{opt} = T_L + T_T + U + D_{CQM} - D_{DQM} \quad \text{(minimal clock width)}$$

Example: 0.10 µm technology, FO4 = 25-40 ps, FF = 80 ps, $T_{unc}$ = 25-35 ps, $f_{max}$ = 2.5-4 GHz, T = 250-400 ps • $W_{opt} \approx 2T_{unc} \approx$ 50-70 ps • $D_{Lm} \approx 4T_{unc} + H - D_{CQm} \approx$ 100-140 ps (this is close to ½ of a cycle)
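The example numbers can be reproduced from the two bounds above. This is a sketch under assumptions that the slide implies but does not state: the leading and trailing edge uncertainties $T_L$ and $T_T$ are both taken equal to $T_{unc}$, the term $U + D_{CQM} - D_{DQM}$ is taken as roughly zero, and $H - D_{CQm}$ defaults to zero. The function name `single_latch_estimates` is hypothetical.

```python
def single_latch_estimates(t_unc_ps, hold_ps=0.0, d_cq_min_ps=0.0):
    """Return (w_opt_ps, shortest_path_bound_ps) for a single-latch system.

    Assumptions (not stated explicitly on the slide):
      T_L = T_T = t_unc_ps, and U + D_CQM - D_DQM is about 0.
    """
    t_l = t_t = t_unc_ps
    # W_opt = T_L + T_T + U + D_CQM - D_DQM, last three terms taken as 0
    w_opt = t_l + t_t
    # D_LmB >= W + T_T + T_L + H - D_CQm, evaluated at W = W_opt
    d_lmb = w_opt + t_t + t_l + hold_ps - d_cq_min_ps
    return w_opt, d_lmb


if __name__ == "__main__":
    for t_unc in (25.0, 35.0):
        w_opt, d_lmb = single_latch_estimates(t_unc)
        print(f"T_unc={t_unc:.0f} ps -> W_opt={w_opt:.0f} ps, "
              f"shortest path >= {d_lmb:.0f} ps")
```

Under these assumptions the shortest allowed path comes out at four times the clock uncertainty, which for a 250-400 ps cycle is the "close to ½ of a cycle" figure on the slide.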

31. Synchronous World: Tricks and Solutions • Clocked Storage Elements with clock uncertainty absorption features • Time Borrowing • Incorporation of Synchronization features into the logic • Skew Tolerant Domino • Utilizing both edges of the Clock

32. Skew-Tolerant Domino (a.k.a. Opportunistic Time Borrowing), Intel Patent No. 5,517,136, May 14, 1996

33. CMOS Domino as Memory Element • After the input changes, the output remembers it • Pre-charge destroys the information • Proper phasing of the clock allows the information to be passed from stage to stage

34. Skew-Tolerant Domino

35. Synchronous World: Tricks and Solutions • Clocked Storage Elements with clock uncertainty absorption features • Time Borrowing • Incorporation of Synchronization features into the logic • Skew Tolerant Domino • Utilizing both edges of the Clock

36. Dual-Edge Triggered CSE • A DET-CSE samples the input data on both edges of the clock • This reduces power consumption: • Half of the original clock frequency gives the same data throughput • Half of the clock generation, distribution and SE clock-related power is saved • However, it may introduce an overhead

37. Dual-Edge Triggered Storage Element Topologies • Structurally, there are two different designs • Latch-Mux (LM) • Flip-Flop (FF) (figure labels: DET-Latch, with non-transparency achieved by the MUX; DET-Flip-Flop)
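The latch-mux idea can be illustrated with a small behavioral model. The sketch below is not taken from the presentation: it assumes two ideal level-sensitive latches that are transparent on opposite clock phases and a multiplexer that always selects the latch that is currently holding, and it steps through discrete (clk, d) samples with no delays. The class name `LatchMuxDET` is hypothetical.

```python
class LatchMuxDET:
    """Behavioral model of a latch-mux dual-edge triggered storage element."""

    def __init__(self):
        self.latch_high = 0  # transparent while clk == 1
        self.latch_low = 0   # transparent while clk == 0

    def step(self, clk, d):
        """Apply one (clk, d) sample and return the output Q."""
        if clk:
            self.latch_high = d
        else:
            self.latch_low = d
        # The mux picks the latch that is opaque in this phase, so Q never
        # follows D directly and only changes when the clock changes.
        return self.latch_low if clk else self.latch_high


if __name__ == "__main__":
    det = LatchMuxDET()
    trace = [(0, 1), (1, 1), (1, 0), (0, 0), (0, 1), (1, 1)]
    for clk, d in trace:
        print(f"clk={clk} d={d} -> q={det.step(clk, d)}")
```

In the trace, Q picks up a new value at both the rising and the falling clock transition, while a change of D in the middle of a phase leaves Q untouched, which is the non-transparency the mux provides.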

38. Comparison with Single Edge SEs

39. Comparison with Single Edge CSEs

40. Single and Double Edge Triggered SE: Power Consumption (a=50%)

41. FO4 = 2.9

42. Symmetric Pulse Generator Flip-Flop (SPG-FF) (Nedovic, Oklobdzija, Walker, ESSCIRC 2002)

43. Conclusion • Clocking is the next challenge. Current clocking techniques may hold up to 10 GHz. Afterwards the pipeline boundaries start to vanish while more exotic clocking techniques will find their use. Synchronous design will be possible only in limited domains on the chip. A mix of Synchronous and Asynchronous design may emerge even in digital logic. • Synchronous Design: • Has not exhausted all the tricks • Asynchronous Design: • Has not solved all the problems • We need solutions from both for a successful SoC Design
