Presentation Transcript

  1. Issues in System on the Chip Clocking, November 6th, 2003, SoC Design Conference, Seoul, KOREA. Vojin G. Oklobdzija, Advanced Computer System Engineering Laboratory, University of California Davis. Presentation available at: http://www.ece.ucdavis.edu/acsel

  2. Directions in SoC Clocking • Synchronous / Asynchronous paradigm • Synchronous solutions: • Clock uncertainty absorption • Time borrowing • Skew-Tolerant Domino • Using both edges of the clock • Conclusion Prof. V.G. Oklobdzija, University of California

  3. ISSCC-2002 Clock frequency trends

  4. Processor Frequency Trends (courtesy of Intel, S. Borkar) • Frequency doubles each generation • Number of gates per clock decreases by 25%

  5. Multi-GHz Clocking Problems • Less logic between pipeline stages: • Out of the 7-10 FO4 delays allocated per stage, the FF can take 2-4 FO4 • Clock uncertainty can take another FO4 • The total could be ½ of the time allowed for computation
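The budget on this slide can be checked with a few lines of arithmetic. The sketch below is an illustration added here, not part of the presentation; the function name `overhead_fraction` is hypothetical, and the inputs are simply the FO4 ranges quoted above.

```python
def overhead_fraction(cycle_fo4, ff_fo4, uncertainty_fo4=1.0):
    """Fraction of the cycle consumed by the flip-flop plus clock uncertainty.

    All arguments are in FO4 delays. The default of one FO4 for clock
    uncertainty follows the figure quoted on the slide.
    """
    if cycle_fo4 <= 0:
        raise ValueError("cycle_fo4 must be positive")
    return (ff_fo4 + uncertainty_fo4) / cycle_fo4


if __name__ == "__main__":
    # Corner cases built from the ranges quoted on the slide.
    cases = {
        "long cycle, fast FF": (10, 2),
        "long cycle, slow FF": (10, 4),
        "short cycle, slow FF": (7, 4),
    }
    for label, (cycle, ff) in cases.items():
        print(f"{label}: {overhead_fraction(cycle, ff):.0%} of the cycle is overhead")
```

With a 10 FO4 cycle and a 4 FO4 flip-flop the overhead is exactly half of the cycle, which matches the "½" figure on the slide; a 7 FO4 cycle with the same flip-flop is worse still.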

  6. Clock Uncertainties

  7. Motivation for Improving on Clocked Storage Elements Example: In a 2.0 GHz processor, T = 500 ps • The D-Q delay of a clocked storage element (CSE) is typically on the order of 100-150 ps • If one can design a faster CSE, e.g. 80-100 ps D-Q, this represents a 10-15% performance improvement • If, in addition, one can absorb 20 ps of clock uncertainty and embed one level of logic, this can yield up to a 20% performance improvement • Try to achieve a 10-20% performance improvement by introducing new features in the architecture! • This is sufficient to turn an architect into a circuit designer!
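The percentages above can be reproduced roughly under two assumptions that are not stated on the slide: the baseline is the slow end of the quoted range (150 ps D-Q), and "performance improvement" is read as the fraction of the 500 ps period that is recovered. The helper name `cycle_saving` is hypothetical, and the gain from embedding a level of logic is not modeled.

```python
PERIOD_PS = 500.0  # 2.0 GHz clock, as on the slide


def cycle_saving(period_ps, old_dq_ps, new_dq_ps, absorbed_ps=0.0):
    """Fraction of the clock period recovered by a faster CSE.

    old_dq_ps / new_dq_ps: D-Q delay before and after the redesign.
    absorbed_ps: clock uncertainty the new CSE absorbs (assumed to add
    directly to the time returned to the logic).
    """
    return (old_dq_ps - new_dq_ps + absorbed_ps) / period_ps


if __name__ == "__main__":
    for new_dq in (100.0, 80.0):
        saving = cycle_saving(PERIOD_PS, 150.0, new_dq)
        print(f"150 ps -> {new_dq:.0f} ps D-Q: {saving:.0%} of the period")
    with_absorption = cycle_saving(PERIOD_PS, 150.0, 80.0, absorbed_ps=20.0)
    print(f"plus 20 ps absorbed uncertainty: {with_absorption:.0%} of the period")
```

Under these assumptions the faster CSE alone recovers 10% to 14% of the period, and absorbing 20 ps of uncertainty brings the total to 18%, in line with the ranges quoted on the slide.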

  8. Consequences of multi-GHz Clocks • Pipeline boundaries start to blur • Clocked Storage Elements must include logic • Wave pipelining, domino style, signals used to clock ... • Synchronous design only in a limited domain • Asynchronous communication between synchronous domains

  9. Synchronous / Asynchronous Design on the Chip • 1 billion transistors on the chip by 2005-6 • A 64-b, 4-way issue logic core requires ~2 million transistors • Table 1: Transistor count in typical RISC processors

  10. Synchronous / Asynchronous Design on the Chip [Figure labels: 10 million transistors; 1 billion transistor chip]

  11. Two views of the world: - Asynchronous - Synchronous

  12. Asynchronous Paradigm • A logic stage can take any time it needs • Max. speed limited by handshake overhead • Increased complexity of logic (de-glitching)

  13. Synchronous Paradigm • Max speed determined by the slowest logic block • Latch / FF timing overhead • Fixed clock frequency (set by longest path)

  14. Synchronous Paradigm • Clocked Storage Elements (flip-flops and latches) should be viewed as synchronization elements, not merely as storage elements! • Their main purpose is to synchronize fast and slow paths: • prevent the fast path from corrupting the state

  15. Synchronous World: Tricks and Solutions • Clocked Storage Elements with clock uncertainty absorption features • Time Borrowing • Incorporation of Synchronization features into the logic • Skew Tolerant Domino • Utilizing both edges of the Clock

  16. Clocked Storage Element Overhead • The time taken from the pipeline by the CSE is U plus the Clk-Q delay. Thus, the D-Q delay is relevant, not Clk-Q: $T = T_{Clk-Q} + T_{Logic} + U + T_{skew}$, where $T_{D-Q} = T_{Clk-Q} + U$ [Figure: two CSEs (D, Q) with a logic block between them, and a clock timing diagram marking T, $T_{Clk-Q}$, $T_{Logic}$, U, and $T_{skew}$]
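Rearranging the relation on this slide gives the time left for logic, T_Logic = T - T_Clk-Q - U - T_skew. The sketch below only illustrates that rearrangement; the function name `logic_budget_ps` and the numbers in the usage example are hypothetical and do not come from the presentation.

```python
def logic_budget_ps(period_ps, clk_q_ps, u_ps, skew_ps):
    """Time available for logic in one pipeline stage, in picoseconds.

    Implements T_Logic = T - T_Clk-Q - U - T_skew, i.e. the clock period
    minus the CSE's D-Q delay (T_Clk-Q + U) and the clock skew.
    """
    budget = period_ps - clk_q_ps - u_ps - skew_ps
    if budget < 0:
        raise ValueError("CSE overhead and skew exceed the clock period")
    return budget


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    budget = logic_budget_ps(period_ps=500, clk_q_ps=100, u_ps=30, skew_ps=20)
    print(f"logic budget: {budget} ps of a 500 ps cycle")
```

Because U enters the budget with the same weight as the Clk-Q delay, shortening Clk-Q alone does not help if it lengthens U by the same amount, which is the point of treating D-Q as the relevant figure.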

  17. Delay vs. Setup/Hold Times [Figure: Clk-Output and Data-Output delay [ps] plotted against Data-Clk [ps], marking the setup and hold points, the minimum Data-Output delay, and the sampling window]

  19. Clock Uncertainty Absorption

  20. Single-Ended Skew Tolerant Flip-Flop (Nedovic, Oklobdzija, Walker, ISSCC 2003)

  21. Clock Uncertainty Absorption [Figure: timing diagram showing the clock uncertainty $t_{CU}$ around the nominal clock (T = 0), the early, nominal, and late D-Clk cases, and the resulting $D_{DQm}$, $D_{DQM}$, and worst-case $D_{DQ}$]

  22. Clock Uncertainty Absorption [Figure: Clk, D, and Q waveforms in two panels: (a) $t_{CU}$ = 30 ps, 3 ps variation at Q, $\alpha_{CU}$ = 90%; (b) $t_{CU}$ = 100 ps, 44 ps variation at Q, $\alpha_{CU}$ = 56%. Other values shown in the figure: $U_{Opt}$ = -5 ps and 30 ps; $D_{DQM}$ = 261 ps and 220 ps]

  23. Synchronous World: Tricks and Solutions • Clocked Storage Elements with clock uncertainty absorption features • Time Borrowing • Incorporation of Synchronization features into the logic • Skew Tolerant Domino • Utilizing both edges of the Clock

  24. Time Borrowing

  26. Critical Path with Time Borrowing

  27. Latches as synchronizers • The purpose of a CSE is to synchronize data flow. • We need to insert CSEs to prevent “fast paths” from reaching the next logic stage too early. • If the signal arrives late, it is allowed to borrow time from the next stage • However, borrowing cannot go on forever ...
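The behaviour described on this slide can be sketched with a deliberately simplified model, which is an assumption of this example rather than the model used in the presentation: one latch per clock period, each transparent for a fixed window after its clock edge, with setup, hold, and latch delays ignored. The function name `propagate` and all numbers are hypothetical.

```python
def propagate(start_ps, stage_delays_ps, period_ps, window_ps):
    """Walk data through a chain of logic stages separated by latches.

    Latch k (k = 1, 2, ...) is transparent from k * period_ps to
    k * period_ps + window_ps. Data that arrives early is held until the
    latch opens, so fast paths cannot run ahead. Data that arrives inside
    the window passes straight through, borrowing time from the next stage.
    Data that arrives after the window closes is a timing failure.

    Returns the time at which data leaves each latch.
    """
    departures = []
    t = start_ps
    for k, delay in enumerate(stage_delays_ps, start=1):
        t += delay                      # arrival time at latch k
        opens = k * period_ps
        closes = opens + window_ps
        if t > closes:
            raise ValueError(
                f"latch {k}: data arrives {t - closes} ps after the window closes"
            )
        t = max(t, opens)               # early data waits for the latch to open
        departures.append(t)
    return departures


if __name__ == "__main__":
    # A slow first stage borrows 30 ps; the faster second stage pays most of it back.
    print(propagate(0, [430, 380, 410], period_ps=400, window_ps=60))
    # Three slow stages in a row accumulate more borrowing than the window allows.
    try:
        propagate(0, [430, 430, 430], period_ps=400, window_ps=60)
    except ValueError as err:
        print("timing failure:", err)
```

The second call shows why borrowing cannot continue indefinitely: each 430 ps stage pushes the arrival 30 ps later, and by the third latch the data misses the transparency window.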

  28. Using Single Pulsed Latch

  29. Single Pulsed Latch (courtesy of D. Markovic & Intel MRL)

  30. Optimal Single Latch Clocking
      Single latch system (Unger & Tan '83):
      • $P_m = P \ge D_{LM} + D_{DQM}$ (minimal clock period)
      • $D_{Lm} > D_{LmB} \ge W + T_T + T_L + H - D_{CQm}$ (shortest path)
      • $W_{opt} = T_L + T_T + U + D_{CQM} - D_{DQM}$ (minimal clock width)
      Example: 0.10 µm technology, FO4 = 25-40 ps, FF = 80 ps, $T_{unc}$ = 25-35 ps, $f_{max}$ = 2.5-4 GHz, T = 250-400 ps
      • $W_{opt} \approx 2T_{unc} \approx$ 50-70 ps
      • $D_{Lm} \approx 4T_{unc} + H - D_{CQm} \approx$ 100-140 ps (this is close to ½ of a cycle)
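The example figures on this slide follow from its two rules of thumb. The sketch below just replays that arithmetic; the function names are hypothetical, and it assumes H - D_CQm is roughly zero, which is what the quoted 100-140 ps range implies but is not stated on the slide.

```python
def period_ps(f_ghz):
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / f_ghz


def w_opt_estimate_ps(t_unc_ps):
    """Rule of thumb from the slide: W_opt is about 2 * T_unc."""
    return 2.0 * t_unc_ps


def min_short_path_estimate_ps(t_unc_ps, hold_minus_dcqm_ps=0.0):
    """Rule of thumb from the slide: D_Lm is about 4 * T_unc + H - D_CQm.

    hold_minus_dcqm_ps defaults to zero, an assumption made here so that
    the result matches the range quoted on the slide.
    """
    return 4.0 * t_unc_ps + hold_minus_dcqm_ps


if __name__ == "__main__":
    for f_ghz in (2.5, 4.0):
        print(f"f = {f_ghz} GHz -> T = {period_ps(f_ghz):.0f} ps")
    for t_unc in (25.0, 35.0):
        print(
            f"T_unc = {t_unc:.0f} ps -> W_opt ~ {w_opt_estimate_ps(t_unc):.0f} ps, "
            f"D_Lm ~ {min_short_path_estimate_ps(t_unc):.0f} ps"
        )
```

The required minimum short-path delay of 100-140 ps is a large share of a 250-400 ps period, which is why the slide flags it as close to half a cycle.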

  31. Synchronous World: Tricks and Solutions • Clocked Storage Elements with clock uncertainty absorption features • Time Borrowing • Incorporation of Synchronization features into the logic • Skew Tolerant Domino • Utilizing both edges of the Clock

  32. Skew-Tolerant Domino (a.k.a. Opportunistic Time Borrowing), Intel Patent No. 5,517,136, May 14, 1996

  33. CMOS Domino as Memory Element • After the input changes, the output remembers it • Pre-charge destroys the information • Proper phasing of the clock allows the information to pass from stage to stage

  34. Skew-Tolerant Domino

  35. Synchronous World: Tricks and Solutions • Clocked Storage Elements with clock uncertainty absorption features • Time Borrowing • Incorporation of Synchronization features into the logic • Skew Tolerant Domino • Utilizing both edges of the Clock

  36. Dual-Edge Triggered CSE • A DET-CSE samples the input data on both edges of the clock • Reduces power consumption • Half of the original clock frequency for the same data throughput • Half of the clock generation/distribution/SE-clock-related power is saved • However, it may introduce an overhead
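The "half the frequency, half the clock power" argument can be illustrated with the standard dynamic-power relation P = C · Vdd² · f for a net that charges and discharges once per cycle. This is a textbook approximation brought in for illustration, not a formula from the presentation; the function names and the capacitance, voltage, and frequency values are hypothetical, and the sketch ignores the extra overhead the slide warns about.

```python
def clock_power_w(cap_f, vdd_v, freq_hz):
    """Dynamic power of a clock net: P = C * Vdd^2 * f."""
    return cap_f * vdd_v ** 2 * freq_hz


def samples_per_second(freq_hz, edges_per_cycle):
    """Data samples taken per second by a storage element."""
    return freq_hz * edges_per_cycle


if __name__ == "__main__":
    CAP_F = 1e-9   # hypothetical total clock capacitance
    VDD_V = 1.2    # hypothetical supply voltage

    # Single-edge element at 2 GHz versus dual-edge element at 1 GHz.
    single = (clock_power_w(CAP_F, VDD_V, 2e9), samples_per_second(2e9, 1))
    dual = (clock_power_w(CAP_F, VDD_V, 1e9), samples_per_second(1e9, 2))

    print(f"single-edge: {single[0]:.2f} W clock power, {single[1]:.1e} samples/s")
    print(f"dual-edge:   {dual[0]:.2f} W clock power, {dual[1]:.1e} samples/s")
```

Both configurations sample data at the same rate, while the clock net of the dual-edge version dissipates half as much in this model, because the same capacitance switches at half the frequency.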

  37. Dual-Edge Triggered Storage Element Topologies • Structurally, there are two different designs • Latch-Mux (LM) • Flip-Flop (FF) [Figure labels: DET-Flip-Flop; DET-Latch; non-transparency achieved by MUX]

  38. Comparison with Single Edge SEs

  39. Comparison with Single Edge CSEs

  40. Single and Double Edge Triggered SE: Power Consumption (a=50%)

  41. Fo4=2.9

  42. Symmetric Pulse Generator Flip-Flop (SPG-FF) (Nedovic, Oklobdzija, Walker, ESSCIRC 2002)

  43. Conclusion • Clocking is the next challenge. Current clocking techniques may hold up to 10 GHz. Afterwards the pipeline boundaries start to vanish, while more exotic clocking techniques will find their use. Synchronous design will be possible only in limited domains on the chip. A mix of synchronous and asynchronous design may emerge even in digital logic. • Synchronous design has not exhausted all the tricks • Asynchronous design has not solved all the problems • We need solutions from both for a successful SoC design