1 / 24

Energy Efficient Designs with Wide Dynamic Range

Energy Efficient Designs with Wide Dynamic Range. Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs. Thermal. V-F. Load. Cores. Workload. Power. Ambient. Active. High Wide. Throughput Parallel HPC-RAS. Server. Controlled Cool.

ranee
Download Presentation

Energy Efficient Designs with Wide Dynamic Range

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

  2. Thermal V-F Load Cores Workload Power Ambient Active High Wide Throughput Parallel HPC-RAS Server Controlled Cool Med/High Wide High Fan (-less) Burst Serial-Parallel Stream Graphics Med-High Wide Controlled Varying Med/High Wide Desktop Med Fan (-less) Med-High Wide Burst Serial-Parallel Stream Graphics Uncontrolled Extreme Med/High Wide Mobile Low Burst Serial-Parallel Stream Graphics Low-Med Wide Very low Passive Fan-less Low/Med Wide Uncontrolled Extreme MID SOC Platform segment characteristics Few designs must serve all segments

  3. Dynamic platform control Workload-based core activation & shutdown Dynamic V/F control Independent V/F control regions Scenario-based power allocation Maximize performance & efficiency Deliver best user experience under constraints

  4. Processor Voltage Control Change V Frequency Control Change F Dynamic Control Unit Reconfigure Configuration Control Sensors Thermal Sensor Voltage Sensor Current Sensor Aging Sensor Dynamic adaptation & reconfiguration Adapt & reconfigure for best power-performance

  5. Resiliency framework Applications Programming System OS VM Less recovery overhead Firmware Lower error rate Less silicon overhead Microcode Microarchitecture Circuit & Design Resilient platforms Error detection Fault diagnosis Fault confinement Error correction Resilient platform features System recovery System adaptation System reconfiguration Resiliency for performance, efficiency & reliability

  6. Fine-grain power management TFLOPS at 62 watts Temporal domain Slow voltage & frequency change Coarse-grain management TODAY • V/F change latency • Active-sleep transition latency • Wake-up latency • Wake-up prediction • State transition energy • Supply noise Fast voltage & frequency change Fine-grain management FUTURE

  7. Fine-grain power management Spatial domain Fine-grain management Coarse-grain management • Each core/cluster at optimum voltage • Each core/cluster at optimum frequency • Same voltage to all cores • Same frequency for all cores • V/F domain interfaces • Synchronization overhead • Clock generation/distribution • Power grid routing • Optimum V/F for non-cores • Sub-core clock/leakage gating FUTURE TODAY

  8. Control Distribution Conversion • Fast & efficient • Load adaptive • Independent rails • Lower loss • Higher fidelity • Simpler • Area efficient • Scalable • Persistent rail Low loss Fine-grain Efficient Advanced voltage regulators VR innovations for fine-grain power management

  9. Research testchip 12.64mm I/O Area single tile 1.5mm 2.0mm 21.72mm PLL PLL TAP TAP I/O Area I/O Area

  10. FP Engine 1 FP Engine 1 Sleeping: 90% lesspower Data Memory Data Memory Sleeping:57% less power Instruction Memory Instruction Memory Sleeping:56% less power FP Engine 2 FP Engine 2 Sleeping: 90% lesspower Router Router Sleeping: 10% less power (stays on to pass traffic) Many-core power management 21 sleep regions per tile (not all shown) • Dynamic sleep • STANDBY: • Memory retains data • 50% less power/tile • FULL SLEEP: • Memories fully off • 80% less power/tile Energy efficiency of 19.4 Gflops/Watt

  11. 256 KB SRAM per core 4X C4 bump density 8490 thru-silicon vias Processor with Cu bump Processor Memory Thru-SiliconVia Package Memory integration 3D stacking with thru-silicon vias

  12. Voltage-frequency range limiters Vmax/Fmax limiters • Reliability • Thermals • Power delivery Vmax Voltage Fmax Vmin limiters Vmin • Circuit functional failures • Soft errors • Steep frequency roll-off • Aging Frequency Reliability & functional failures limit range

  13. V variation T variation F margin F margin V margin V margin IR drop Voltage Voltage Inductive droops Nominal T Load line variations Worst T Frequency Frequency MIS Path activity Aging Signal coupling Worst Critical path F margin V margin F margin EOL V margin Voltage Voltage Nominal BOL Frequency Frequency Voltage-frequency margins

  14. Array Vmin 6T SRAM Vmin Multi-V 6T SRAM SER, erratic bits Cumulative fail rate Nominal array Vmin Worst die Vmin 8T+ cell LLC density Density Array voltage Multi-voltage cache

  15. 6T SRAM cell PPU NX NPD Dynamic multi-voltage cache Dynamic voltage collapse for write Wordline underdrive for read Array to WL differential supply noise tracking Pulse width control 45nm dynamic multi-V testchip MIN cell Source: Intel 26X less fails

  16. Cache reconfiguration Reduce cache size @ low V/F by eliminating failing words/bits Bit fix Word disable Failing words Bitmap of failing words Source: Intel 1-bit ECC Word disable Bit fix 10-bit ECC Source: Intel * Normalized reference value

  17. Impact max performance Efficiency: MIPS/Watt Efficiency: MIPS/Watt Improve range Improve range & efficiency Performance Performance Low-voltage logic design Narrow muxes Robust flip-flops No stack height > 2 Robust level converters Design & technology optimizations to balance range, performance & efficiency

  18. Low-voltage motion estimation engine 65nm CMOS 70K transistors Die area ~1mm2

  19. Dynamic V & F adaptation Source: Intel Source: Intel Prototype chip in 90nm Environment-aware dynamic adaptation • Adapt F/V to V/T change  reduce V/T margin • Adapt F/V to aging  reduce aging margin

  20. Resilient circuits Error Detection Sequential (EDS) • Detect errors in critical path FFs • Propagate error signals • Correct errors by re-execution • Feedback to adaptive V/F 65nm resilient circuits testchip

  21. Resiliency experiments Response to voltage droops Source: Intel Source: Intel

  22. Summary • Energy efficiency and wide dynamic operating range are critical for all platforms • Integration, fine-grain power management, advanced voltage regulators & 3D memory stacking are key for energy efficiency • Reliability, functionality, margins & efficiency limit dynamic operating range • Multi-voltage design, dynamic adaptation, reconfiguration & resiliency are key enablers

  23. Acknowledgement CRL prototype Bangalore Design LTD Design Murli Tirumala Tomm Aldridge Mark Anders Paolo Aseron Ravi Mahajan Jerry Bautista Nitin Borkar Shekhar Borkar Ravi Prasher Keith Bowman Saurabh Dighe Zeshan Chishti Venkat Natarajan Lev Finkelstein Shih-Lien Lu Varghese George Anand Deshpande Marci Glenn George Goodman Steve Gunther Ali Farhang Matthew Haycock Chris Wilkerson Ming Zhang Amit Agarwal Andrew Henroid Yatin Hoskote Jason Howard Robert Chau Steven Hsu Tanay Karnik Himanshu Kaul Rajesh Kumar Alaa Alameldeen M. Khellah V. Erraguntla Ketan Paranjape Sean Koehl R. Krishnamurthy Partha Kundu Greg Taylor Sanu Mathew Randy Mooney A. Raychowdhury P. Vishakantaiah Alon Naveh Trang Nguyen Fabrice Paillet R. Kuppuswamy Clark Roberts Ronny Ronen Erfaim Rotem Rohit Vidwans Mark Rowland Greg Ruhl Gerhard Schrom Gautam Doshi Joe Schutz D. Somasekhar James Tschanz Sunit Tyagi Sriram Vangal Manny Vara Howard Wilson B. Chatterjee Bibiche Geuskens Steven Hsu Kevin Zhang Sati Banerjee Clair Webb Mark Bohr Gunjan Pandya

  24. Q&A

More Related