
Thousand Core Chips A Technology Perspective




Presentation Transcript


  1. Thousand Core Chips: A Technology Perspective. Shekhar Borkar, Intel Corp. June 7, 2007

  2. Outline
  • Technology outlook
  • Evolution of multi-core to thousands of cores?
  • How do you feed thousands of cores?
  • Future challenges: variations and reliability
  • Resiliency
  • Summary

  3. Technology Outlook

  4. Terascale Integration Capacity. Total transistors on a 300mm² die: ~100MB cache plus ~1.5B logic transistors today, heading toward 100+B transistor integration capacity.

  5. Scaling Projections, 300mm² Die
  • Frequency scaling will slow down
  • Vdd scaling will slow down
  • Power will be too high

  6. Why Multi-core? Performance: ever-larger single cores yield diminishing performance within a fixed power envelope, while multi-cores provide the potential for near-linear performance speedup.

  7. Why Dual-core? Power. Rule of thumb, in the same process technology:
  • Single core: Voltage = 1, Freq = 1, Area = 1, Power = 1, Perf = 1
  • Dual core: Voltage = -15%, Freq = -15%, Area = 2, Power = 1, Perf = ~1.8

  8. From Dual to Multi. [Figure: one large core with cache versus four small cores C1-C4 sharing a cache; each small core runs at Power = 1/4 and Performance = 1/2.] Multi-core is power efficient, with better power and thermal management.

  9. Future Multi-core Platform: Heterogeneous Multi-Core Platform (SOC). [Figure: general purpose cores (GP), special purpose HW (SP), and small cores (C), connected by an interconnect fabric.]

  10. Fine Grain Power Management
  • Cores with critical tasks: Freq = f at Vdd; TPT = 1, Power = 1
  • Non-critical cores: Freq = f/2 at 0.7×Vdd; TPT = 0.5, Power = 0.25
  • Cores shut down: TPT = 0, Power = 0
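The slide's numbers follow from the usual dynamic-power relation P ∝ V²·f, with throughput tracking frequency. A minimal sketch (function names are mine, not from the slides):

```python
def relative_power(v_scale: float, f_scale: float) -> float:
    """Power relative to a core at full Vdd and f: (V'/V)^2 * (f'/f)."""
    return v_scale ** 2 * f_scale

# Critical core: full Vdd, full f -> TPT = 1, Power = 1
print(relative_power(1.0, 1.0))           # 1.0
# Non-critical core: 0.7*Vdd, f/2 -> TPT = 0.5
print(round(relative_power(0.7, 0.5), 3)) # 0.245; the slide rounds to 0.25
# Shut-down core -> TPT = 0, Power = 0
print(relative_power(0.0, 0.0))           # 0.0
```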

  11. Performance Scaling. Amdahl's Law: Parallel Speedup = 1/(Serial% + (1-Serial%)/N)
  • Serial% = 6.7%: N = 16 cores, Perf = 8 (N1/2 = 8)
  • Serial% = 20%: N = 6 cores, Perf = 3 (N1/2 = 3)
  Parallel software is key to multi-core success.
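The slide's two data points can be checked directly from Amdahl's formula; a small sketch (the function name is mine):

```python
def amdahl_speedup(serial_frac: float, n_cores: int) -> float:
    """Parallel speedup = 1 / (S + (1 - S) / N), S = serial fraction."""
    return 1.0 / (serial_frac + (1.0 - serial_frac) / n_cores)

# Serial% = 6.7% on 16 cores: speedup of only ~8
print(round(amdahl_speedup(0.067, 16)))  # 8
# Serial% = 20% on 6 cores: speedup of 3
print(round(amdahl_speedup(0.20, 6)))    # 3
```

Note how quickly a modest serial fraction caps the speedup, which is the slide's point: parallel software is the key to multi-core success.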

  12. From Multi to Many… [Chart: 12, 48, or 144 cores on a 13mm, 100W die with 48MB cache and 4B transistors, in 22nm.]

  13. From Many to Too Many… [Chart: 24, 96, or 288 cores on a 13mm, 100W die with 96MB cache and 8B transistors, in 16nm.]

  14. On Die Network Power, 300mm² Die. A careful balance of:
  • Throughput performance
  • Single thread performance (core size)
  • Core and network power

  15. Observations
  • Scaling multi-core demands more parallelism every generation: thread level, task level, application level
  • Many (or too many) cores does not always mean the highest performance, the highest MIPS/Watt, or the lowest power
  • If on-die network power is significant, the power picture is even worse
  Now software, too, must follow Moore’s Law.

  16. Memory BW Gap. Buses have become wider to deliver the necessary memory BW (10 to 30 GB/sec), yet memory BW is still not enough: many-core systems will demand 100 GB/sec of memory BW. How do you feed the beast?

  17. IO Pins and Power. State of the art: 100 GB/sec ≈ 1 Tb/sec = 1,000 Gb/sec × 25 mW/Gb/sec = 25 Watts. At 5 Gb/sec per pin, bus width = 1,000/5 = 200 lanes, about 400 pins (differential). Too many signal pins, too much power.

  18. High Speed Buses. Over chip-to-chip distances > 5mm, buses are transmission lines: L-R-C effects, signal termination needed, and signal processing consumes power. Solution:
  • Reduce chip-to-chip distance to << 5mm (a < 2mm R-C bus)
  • Reduce signaling speed to ~1 Gb/sec, at 1-2 mW/Gbps
  • Increase pins to deliver BW
  Then 100 GB/sec ≈ 1 Tb/sec = 1,000 Gb/sec × 2 mW/Gb/sec = 2 Watts, with bus width = 1,000/1 = 1,000 pins.
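The pin and power arithmetic on these two slides can be captured in one small sketch comparing the two signaling schemes (the function name and parameters are mine):

```python
def link_cost(bw_gbps: float, mw_per_gbps: float, gbps_per_pin: float,
              differential: bool) -> tuple:
    """Return (watts, signal_pins) for a chip-to-chip link."""
    watts = bw_gbps * mw_per_gbps / 1000.0      # total signaling power
    lanes = bw_gbps / gbps_per_pin              # lanes needed for the BW
    pins = lanes * (2 if differential else 1)   # differential doubles pins
    return watts, int(pins)

# Slide 17: state-of-the-art 5 Gb/s differential pins at 25 mW/Gb/s
print(link_cost(1000, 25, 5, differential=True))   # (25.0, 400)
# Slide 18: short R-C bus, ~1 Gb/s single-ended pins at 2 mW/Gb/s
print(link_cost(1000, 2, 1, differential=False))   # (2.0, 1000)
```

The trade is explicit: the short, slow bus cuts signaling power from 25 W to 2 W at the cost of 1,000 pins instead of 400.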

  19. Anatomy of a Silicon Chip. [Figure: Si chip in a package; heat leaves through the heat-sink on top, power and signals enter through the package below.]

  20. System in a Package. [Figure: two Si chips side by side on one package.] Pins are limited: 10mm / 50 micron = 200 pins. The signal distance is large (~10 mm), so power is higher, and the package is complex.
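The slide's "200 pins" figure is just die-edge length divided by pad pitch; a one-function sketch (names are mine):

```python
def edge_pads(edge_mm: float, pitch_um: float) -> int:
    """Pads that fit along one die edge at a given pad pitch."""
    return int(edge_mm * 1000 / pitch_um)

# A 10mm edge at 50 micron pitch, as on the slide
print(edge_pads(10, 50))  # 200
```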

  21. DRAM on Top. [Figure: DRAM die stacked on the CPU, under a heat-sink at 85°C; junction temp = 100+°C.] High temperature and hot spots are not good for DRAM.

  22. DRAM at the Bottom. [Figure: heat-sink, CPU, thin DRAM die, package.] Power and IO signals reach the CPU through the thin DRAM die, using through-DRAM vias. The most promising solution to feed the beast.

  23. Reliability
  • Extreme device variations, growing wider
  • Rising soft error FIT/chip (logic & memory)
  • Burn-in may phase out…?
  • Time dependent device degradation

  24. Implications to Reliability
  • Extreme variations (static & dynamic) will result in unreliable components
  • It will be impossible to design reliable systems as we know them today
  • Transient errors (soft errors)
  • Gradual errors (variations)
  • Time dependent errors (degradation)
  Reliable systems from unreliable components: resilient µArchitectures.

  25. Implications to Test
  • One-time factory testing will be out
  • Burn-in to catch chip infant-mortality will not be practical
  • Test HW will be part of the design
  • Dynamically self-test, detect errors, reconfigure, & adapt

  26. In a Nut-shell… 100 billion transistor integration capacity; billions of transistors unusable due to variations; some will fail over time; intermittent failures. Yet we must deliver high performance in the power & cost envelope.

  27. Resiliency with Many-Core. [Figure: a grid of cores C.]
  • Dynamic on-chip testing, with performance profiling
  • Cores in reserve (spares), and a binning strategy
  • Dynamic, fine grain, performance and power management
  • Coarse-grain redundancy checking
  • Dynamic error detection & reconfiguration
  • Decommission aging cores, swap with spares
  • Dynamically: self test & detect, isolate errors, confine, reconfigure, and adapt
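The test-isolate-reconfigure loop above can be sketched as a toy core manager. This is purely illustrative (all names and the random failure model are mine, not from the talk); real hardware would use on-chip BIST rather than software:

```python
import random

class ManyCore:
    """Toy model of a many-core chip with spare cores held in reserve."""

    def __init__(self, active: int, spares: int):
        self.active = list(range(active))
        self.spares = list(range(active, active + spares))
        self.failed = []

    def self_test(self, core: int) -> bool:
        # Stand-in for on-chip self test; here a core fails at random.
        return random.random() > 0.01

    def patrol(self):
        """Test each active core; decommission failures, swap in spares."""
        for core in list(self.active):
            if not self.self_test(core):
                self.active.remove(core)       # isolate and confine the error
                self.failed.append(core)       # decommission the core
                if self.spares:
                    self.active.append(self.spares.pop())  # reconfigure

chip = ManyCore(active=16, spares=2)
chip.patrol()  # run one self-test sweep
```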

  28. Summary
  • Moore’s Law with Terascale integration capacity will allow integration of thousands of cores
  • Power continues to be the challenge
  • On-die network power could be significant
  • Optimize for power with the size of the core and the number of cores
  • 3D memory technology is needed to feed the beast
  • Many-cores will deliver the highest performance in the power envelope, with resiliency
