
VLSI Design Challenges for Gigascale Integration



Presentation Transcript


  1. VLSI Design Challenges for Gigascale Integration Shekhar Borkar Intel Corp. October 25, 2005

  2. Outline • Technology scaling challenges • Circuit and design solutions • Microarchitecture advances • Multi-everywhere • Summary

  3. How do you get there? Goal: 10 TIPS by 2015 [Performance trajectory: 8086 → 286 → 386 → 486 → Pentium® Architecture → Pentium® Pro Architecture → Pentium® 4 Architecture]

  4. Technology Scaling [Transistor cross-sections: gate, source, drain, body; key scaled dimensions: junction depth Xj, oxide thickness Tox, effective channel length Leff] Scaling will continue, but with challenges!

  5. Technology Outlook

  6. 90nm MOS Transistor [Micrograph: 50nm gate over 1.2 nm SiO2 gate oxide on Si substrate] The Leakage(s)…

  7. Must Fit in Power Envelope [Chart: power (W) and power density (W/cm²) for a 10 mm die, rising steeply from 90nm through 65nm, 45nm, 32nm, 22nm, and 16nm; contributions from active power, sub-threshold (SD) leakage, and SiO2 gate leakage] Technology, Circuits, and Architecture to constrain the power

  8. Solutions • Move away from Frequency alone to deliver performance • More on-die memory • Multi-everywhere • Multi-threading • Chip level multi-processing • Throughput oriented designs • Valued performance by higher level of integration • Monolithic & Polylithic

  9. Leakage Solutions [Planar transistor: gate electrode over 1.2 nm SiO2 on silicon substrate, vs. tri-gate transistor with 3.0nm high-k gate dielectric] For a few generations, then what?

  10. Active Power Reduction [Multiple supply voltages: slow paths on a low supply voltage, fast paths on a high supply voltage. Throughput-oriented design: one logic block at Vdd gives Freq = 1, Throughput = 1, Power = 1, Area = 1, Power Density = 1; two parallel logic blocks at Vdd/2 give Freq = 0.5 each, Throughput = 1, Power = 0.25, Area = 2, Power Density = 0.125]
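The slide's numbers fall directly out of the first-order dynamic-power identity P = C·V²·f. A minimal sketch (function name and normalizations are illustrative, not from the slide):

```python
# Dynamic (switching) power of a CMOS block: P = C * V^2 * f,
# with capacitance C, supply voltage V, and frequency f all normalized to 1.

def dynamic_power(c, v, f):
    """Normalized switching power of one logic block."""
    return c * v**2 * f

# Baseline: one block at full voltage and full frequency.
base_power = dynamic_power(c=1.0, v=1.0, f=1.0)      # 1.0
base_throughput = 1.0                                 # 1 unit of work

# Throughput-oriented alternative: two parallel blocks at Vdd/2, half frequency.
low_power_each = dynamic_power(c=1.0, v=0.5, f=0.5)   # 0.5^2 * 0.5 = 0.125
total_power = 2 * low_power_each                      # 0.25
total_throughput = 2 * 0.5                            # still 1.0

print(total_power, total_throughput)
```

Halving Vdd and frequency cuts each block's power to 1/8; two such blocks restore the original throughput at a quarter of the power, and spreading that 0.25 power over 2x the area gives the slide's power density of 0.125.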

  11. Leakage Control [Three techniques: body bias (Vbp pulled +Ve, Vbn pulled -Ve around the logic block), 2-10X reduction; stack effect (stacked off transistors with equal loading), 5-10X reduction; sleep transistors, 2-1000X reduction]

  12. Pipeline & Performance [Chart: performance vs. relative frequency (pipelining) shows diminishing returns. Charts: power efficiency vs. relative frequency and vs. relative pipeline depth each show an optimum, because sub-threshold leakage increases exponentially] Maximum performance comes at an optimum pipeline depth and an optimum frequency.
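Why an optimum depth exists can be seen with a toy model (the constants are illustrative assumptions, not from the slide): splitting logic into more stages raises the clock, but per-stage latch overhead and hazard-flush penalties grow with depth.

```python
# Toy pipeline model: frequency gains from deeper pipelines are eventually
# offset by fixed per-stage latch delay and by hazard/flush penalties that
# scale with depth. All constants are illustrative.

def relative_performance(depth, t_logic=10.0, t_latch=0.5, hazard=0.05):
    freq = 1.0 / (t_logic / depth + t_latch)  # shorter stages -> faster clock
    cpi = 1.0 + hazard * depth                # deeper pipe -> costlier flushes
    return freq / cpi

best = max(range(1, 41), key=relative_performance)
print(best)  # optimum depth for these constants: 20
```

Past the optimum, extra stages still raise frequency but lower delivered performance, which is the "diminishing return" curve on the slide.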

  13. Memory Latency [CPU → Cache (small, ~few clocks) → Memory (large, 50-100ns)] Assume 50ns memory latency. A cache miss hurts performance, and hurts more at higher frequency.
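The "worse at higher frequency" point is just unit conversion: a fixed 50ns miss stalls more core cycles as the clock speeds up. A quick illustration (the example frequencies are assumptions, not from the slide):

```python
# Cycles lost per cache miss: latency in ns times cycles per ns (= GHz).

def miss_penalty_cycles(latency_ns, freq_ghz):
    """Core cycles stalled while a miss goes to memory."""
    return latency_ns * freq_ghz

print(miss_penalty_cycles(50, 1.0))  # 50.0 cycles at 1 GHz
print(miss_penalty_cycles(50, 3.0))  # 150.0 cycles at 3 GHz
```

The same 50ns miss that cost 50 cycles at 1 GHz costs 150 cycles at 3 GHz, so raising frequency alone makes each miss relatively more expensive.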

  14. Increase on-die Memory Large on-die memory provides increased data bandwidth and reduced latency, and hence higher performance for much lower power.

  15. Multi-threading [Timeline: a single thread (ST) computes, then waits for memory, even though thermals & power delivery are designed for full HW utilization. With multi-threading, threads MT1, MT2, MT3 interleave so one computes while the others wait for memory] Multi-threading improves performance without impacting thermals & power delivery.
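The utilization argument can be sketched with a toy interleaving model (the 25/75 compute/stall split is an illustrative assumption): if one thread keeps the hardware busy only a fraction of the time, N threads can fill the memory-wait gaps up to full utilization.

```python
# Toy model: each thread alternates `compute` busy cycles with `stall`
# memory-wait cycles; idle stall slots are filled by the other threads.

def hw_utilization(compute, stall, threads=1):
    """Fraction of time the execution units are busy."""
    return min(1.0, threads * compute / (compute + stall))

print(hw_utilization(25, 75, threads=1))  # 0.25 -- mostly waiting for memory
print(hw_utilization(25, 75, threads=4))  # 1.0  -- full HW utilization
```

Because the extra threads reuse execution units that would otherwise sit idle, the peak power and thermal envelope the chip was already designed for is unchanged.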

  16. Single Core Power/Performance Moore’s Law → more transistors for advanced architectures. Delivers higher peak performance, but lower power efficiency.

  17. Chip Multi-Processing C1 C2 Cache C3 C4 • Multi-core, each core Multi-threaded • Shared cache and front side bus • Each core has different Vdd & Freq • Core hopping to spread hot spots • Lower junction temperature

  18. Dual Core [Single core with cache vs. dual core with shared cache] Rule of thumb, in the same process technology: Single core: Voltage = 1, Freq = 1, Area = 1, Power = 1, Perf = 1. Dual core: Voltage = -15%, Freq = -15%, Area = 2, Power = 1, Perf = ~1.8.
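A first-order CV²f check on the slide's rule of thumb (normalizations are illustrative; the estimate counts dynamic core power only, ignoring leakage savings and the shared, un-duplicated cache and bus, which is presumably how the slide's total comes in at Power = 1):

```python
# First-order estimate relative to one core at V = 1, f = 1:
# dynamic power ~ N * V^2 * f, performance ~ N * f (perfect parallel scaling).

def scaled_cores(v_scale, f_scale, n_cores):
    power = n_cores * v_scale**2 * f_scale
    perf = n_cores * f_scale
    return power, perf

power, perf = scaled_cores(0.85, 0.85, 2)   # two cores at -15% V and f
print(round(power, 2), round(perf, 2))      # 1.23 1.7
```

The performance estimate of ~1.7 matches the slide's ~1.8; the raw dynamic-power estimate of ~1.23 sits above 1, so the equal-power claim relies on the savings this simple model leaves out.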

  19. Multi-Core [One large core vs. four small cores C1-C4 with shared cache; each small core has ~1/4 the power and ~1/2 the performance of the large core, so four small cores deliver roughly twice the performance at equal power] Multi-Core: power efficient, with better power and thermal management.

  20. Special Purpose Hardware TCP/IP Offload Engine: 2.23 mm x 3.54 mm, 260K transistors. Opportunities: network processing engines, MPEG encode/decode engines, speech engines. Special purpose HW provides the best MIPS/Watt.

  21. Performance Scaling Amdahl’s Law: Parallel Speedup = 1/(Serial% + (1-Serial%)/N). Serial% = 6.7%: N = 16 cores → Perf = 8 (N/2). Serial% = 20%: N = 6 cores → Perf = 3 (N/2). Parallel software is key to multi-core success.
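The slide's two data points drop straight out of the formula. A minimal sketch:

```python
# Amdahl's Law: speedup on N cores when a fraction S of the work is serial.

def amdahl_speedup(serial_frac, n_cores):
    """speedup = 1 / (S + (1 - S) / N)"""
    return 1.0 / (serial_frac + (1.0 - serial_frac) / n_cores)

print(round(amdahl_speedup(0.067, 16), 1))  # 8.0 -- 6.7% serial, 16 cores
print(round(amdahl_speedup(0.20, 6), 1))    # 3.0 -- 20% serial, 6 cores
```

Even a 6.7% serial fraction halves the ideal 16x speedup, which is the slide's case for making parallel software the priority.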

  22. From Multi to Many… [12 cores → 24 cores → 144 cores] 13mm die, 100W, 48MB cache, 4B transistors, in 22nm.

  23. Future Multi-core Platform: a heterogeneous multi-core platform [Grid of general purpose (GP) cores and special purpose (SP) hardware tied together by an interconnect fabric]

  24. The New Era of Computing [Era of pipelined architecture: 8086, 286, 386, 486 → era of instruction level parallelism: super scalar, speculative OOO → era of thread & processor level parallelism: multi-threaded, multi-core, special purpose HW] Multi-everywhere: MT, CMP.

  25. Summary • Business as usual is not an option • Performance at any cost is history • Must make a Right Hand Turn (RHT) • Move away from frequency alone • Future µarchitectures and designs • More memory (larger caches) • Multi-threading • Multi-processing • Special purpose hardware • Valued performance with higher integration
