
Techniques for Multicore Thermal Management

Field Cady, Bin Fu and Kai Ren



  1. Techniques for Multicore Thermal Management Field Cady, Bin Fu and Kai Ren

  2. Techniques for Multicore Thermal Management
     • Overview and comparison of techniques
     • Plus determining the critical thread
     • DVFS details
     • Thread movement

  3. Taxonomy
     • Stop & Go vs DVFS
       • Stop & Go: suspend core operation for 30 milliseconds when the temperature is above a threshold
       • DVFS: dynamic voltage and frequency scaling, controlled with methods from control theory
     • Distributed vs Global
       • Apply the above to each core individually or to all cores at once
     • Performance asymmetry: different demands on different cores
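The two axes of the taxonomy can be sketched in a few lines. This is only an illustration: the 30 ms stall comes from the slide, while the 85 °C threshold, the proportional gain and the frequency limits are made-up values, and a simple proportional step stands in for whatever controller the cited work actually uses.

```python
STALL_MS = 30.0        # stall length quoted on the slide
THRESHOLD_C = 85.0     # hypothetical thermal threshold (assumption)

def stop_go(temp_c):
    """Stop & Go: return how long (ms) to suspend a core."""
    return STALL_MS if temp_c > THRESHOLD_C else 0.0

def dvfs_step(freq_scale, temp_c, gain=0.02, f_min=0.5, f_max=1.0):
    """DVFS: one proportional control step on the frequency scale.

    gain, f_min and f_max are illustrative assumptions.
    """
    error = THRESHOLD_C - temp_c          # positive means thermal headroom
    return max(f_min, min(f_max, freq_scale + gain * error))

def distributed_stop_go(core_temps):
    """Distributed: every core decides from its own sensor."""
    return [stop_go(t) for t in core_temps]

def global_stop_go(core_temps):
    """Global: the hottest core decides for the whole chip."""
    stall = stop_go(max(core_temps))
    return [stall] * len(core_temps)
```

With temperatures of 90, 70 and 80 °C, the distributed policy stalls only the first core, while the global policy stalls all three.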

  4. Taxonomy (cont.)
     • Migration
       • Moving threads between cores
       • Timescale on the order of a millisecond, much slower than DVFS
       • Migration is the “outer loop” of control, riding on top of DVFS or Stop-Go
       • Migrate the “critical” thread
       • Measure criticality with a heat sensor
       • Or with cache misses as a proxy
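As a rough sketch of sensor-based migration, the outer loop could swap the thread on the hottest core with the thread on the coolest one. The function name and the swap rule are assumptions for illustration; the slide does not spell out the policy.

```python
def migrate_hottest(core_temps, thread_on_core):
    """Swap the threads running on the hottest and coolest cores.

    core_temps:     dict core_id -> temperature from that core's sensor
    thread_on_core: dict core_id -> thread currently placed there
    Returns a new placement; the inputs are left untouched.
    """
    hot = max(core_temps, key=core_temps.get)
    cool = min(core_temps, key=core_temps.get)
    placement = dict(thread_on_core)
    placement[hot], placement[cool] = thread_on_core[cool], thread_on_core[hot]
    return placement
```

This would run once per migration interval, with DVFS or Stop-Go still acting as the inner loop on each core.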

  5. Aside: Criticality
     • In a separate paper, Abhishek et al. define the “critical” thread as the slowest thread
     • If we know which thread is critical:
       • Task stealing from the critical thread
       • Guide DVFS to prefer the critical thread
     • Explored proxies
     • 13-32% performance boost from task stealing on a 32-core machine

  6. Criticality (cont.)
     • Cache misses are an excellent proxy
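Using cache misses as the proxy, picking the critical thread reduces to an arg-max over per-thread miss counters. The sketch below assumes a hypothetical dict of miss counts per interval; the cited paper's actual predictor may weight different cache levels differently.

```python
def predict_critical(miss_counts):
    """Return the thread with the most cache misses in the last interval.

    miss_counts: dict thread_id -> miss count (hypothetical counter readout).
    """
    return max(miss_counts, key=miss_counts.get)

def steal_target(miss_counts, idle_thread):
    """Pick whom an idle thread should steal tasks from: the predicted
    critical thread among all the others."""
    others = {t: m for t, m in miss_counts.items() if t != idle_thread}
    return predict_critical(others)
```

The same prediction could be fed to a DVFS controller so that the critical thread's core is the last one to be slowed down.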

  7. Donald and Martonosi: comparison of techniques
     • Goal: maximize performance subject to a temperature constraint
     • Measure performance in BIPS (billions of instructions per second) and “duty cycle”, i.e. % useful time, scaled for DVFS frequency
     • Run on SPEC benchmarks
     • Simulated 4-core processor
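One way to read the duty-cycle metric is as a time-weighted average of the frequency scale, with a stalled core counting as zero. This is an interpretation of the slide's wording, not the paper's exact definition.

```python
def duty_cycle(intervals):
    """Fraction of useful time, scaled by DVFS frequency.

    intervals: list of (duration_ms, freq_scale) pairs, where freq_scale is
    the fraction of full frequency (0.0 while the core is stalled).
    """
    total = sum(duration for duration, _ in intervals)
    if total <= 0:
        raise ValueError("intervals must cover a positive amount of time")
    useful = sum(duration * scale for duration, scale in intervals)
    return useful / total
```

Under this reading, a Stop-Go core that runs 70 ms at full speed and stalls for 30 ms scores 0.7, while a DVFS core that runs half the time at full speed and half at 80% scores 0.9.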

  8. Results
     • All normalized to distributed Stop-Go

  9. Stop-Go was terrible!
     • Why didn’t they try with lower frequency?
     • Was 30 milliseconds the right time to stop?
     • They subsequently focus solely on DVFS, even though the hardware is trickier

  10. Migration Policies

  11. Summary & Conclusion
      • DVFS far superior to Stop-Go
      • Distributed control helps, esp. for Stop-Go
      • Migration helps for Stop-Go
      • Counter- and sensor-based migration comparable

  12. DVFS
      • Dynamic voltage and frequency scaling (per core).
      • Dynamic voltage scaling is a power management technique in computer architecture, where the voltage used in a component is increased or decreased.
      • Dynamic frequency scaling (also known as CPU throttling) is a technique in computer architecture where a processor is run at a less-than-maximum frequency in order to conserve power.

  13. Challenge
      • Multiple cores may need to be manipulated simultaneously to control both power and temperature for a CMP chip. This requires multi-input multi-output (MIMO) control.
      • Application software is always designed for single-core processors. Power shifting needed.
      • Heterogeneous cores
      • Workload of a CMP processor is unpredictable at design time and may vary significantly at runtime

  14. DVFS

  15. Open-Loop Control
      P(k+1) = P(k) + A Δf(k)
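The open-loop model predicts the next power reading from the current one and the chosen frequency changes. The sketch below assumes P is a per-core power vector, A a gain matrix with one row per core, and Δf a vector of per-core frequency changes; the slide does not state the dimensions, so treat those shapes as assumptions.

```python
def predict_power(p, a, df):
    """Evaluate P(k+1) = P(k) + A * delta_f(k) with plain Python lists.

    p:  current power per core, length n
    a:  n x n gain matrix (row i maps frequency changes to core i's power)
    df: frequency change per core, length n
    """
    return [
        p_i + sum(a_ij * d_j for a_ij, d_j in zip(row, df))
        for p_i, row in zip(p, a)
    ]

# Two cores, diagonal gains: raising core 0 by 0.5 and lowering core 1 by 1.0.
next_p = predict_power([10.0, 12.0], [[2.0, 0.0], [0.0, 3.0]], [0.5, -1.0])
```

Because A is fixed here, any mismatch between the model and the real workload goes uncorrected, which is the weakness of running open loop.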

  16. Using Feedback (Closed-loop)
      • Dynamically change matrix A.
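The slide does not say how A is updated, so the sketch below swaps in a normalized least-mean-squares (NLMS) step as a stand-in: after each control period, one row of A is nudged so that its predicted power change moves toward the measured one. The function name and step size are assumptions.

```python
def update_gains(a_row, df, measured_dp, step=0.5):
    """Adjust one row of A from the observed power change (NLMS update).

    a_row:       current gain estimates for one power output
    df:          frequency changes applied in the last period
    measured_dp: power change actually observed for that output
    step:        adaptation rate in (0, 1]; 1.0 fits the last sample exactly
    """
    predicted_dp = sum(a * d for a, d in zip(a_row, df))
    error = measured_dp - predicted_dp
    norm = sum(d * d for d in df)
    if norm == 0.0:
        return list(a_row)        # no excitation, nothing to learn
    return [a + step * error * d / norm for a, d in zip(a_row, df)]
```

Only the gains of cores whose frequency actually changed are updated, since a zero entry in `df` contributes nothing to the correction.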

  17. Thread Motion: Fine-Grained Power Management for Multi-Core Systems

  18. Limitations of DVFS (Motivation)
      • Coarse grained
        • Initiated by the OS on a timescale of milliseconds
        • Voltage transition delay ~10 microseconds
        • Too slow to respond to fine variations in program behavior (cache miss ~ nanoseconds)
      • Per-core DVFS with multiple VF settings
        • High cost of off-chip regulators
        • Bad scalability with a large number of cores

  19. Idea of Thread Motion
      • Moving threads between cores with two VF domains
      • Threads experience a virtually continuous voltage

  20. TM Manager
      • A separate embedded microcontroller running the TM algorithm
      • Effective IPC
        • Maintain a table of IPC for each application
        • High IPC: compute-intensive
        • Low IPC: cache misses, memory access latency

  21. Movement Policy (Thread Motion: Algorithm)
      • Assign a thread in a compute-intensive phase to a high-VF core
      • Intra-cluster movement considered first
      • Trigger points:
        • TM-interval: fixed intervals of ~200 cycles
        • Miss-driven: move a thread that has taken a cache miss
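The movement policy can be sketched with the IPC table as input. The names below are hypothetical, cores are reduced to "high" and "low" VF labels, and the intra-cluster preference is left out for brevity.

```python
def assign_by_ipc(ipc, n_high):
    """TM-interval trigger: give the n_high highest-IPC threads the high-VF cores.

    ipc: dict thread_id -> effective IPC from the TM manager's table.
    """
    ranked = sorted(ipc, key=ipc.get, reverse=True)
    return {t: ("high" if i < n_high else "low") for i, t in enumerate(ranked)}

def on_cache_miss(placement, ipc, missed):
    """Miss-driven trigger: a thread stalled on a miss gives up its high-VF
    core to the highest-IPC thread currently on a low-VF core."""
    low = [t for t, vf in placement.items() if vf == "low"]
    if placement[missed] != "high" or not low:
        return placement
    best = max(low, key=ipc.get)
    updated = dict(placement)
    updated[missed], updated[best] = "low", "high"
    return updated
```

For example, with IPCs of 1.8, 0.4 and 1.1 for threads a, b and c and one high-VF core, thread a starts on the fast core; if a then misses in the cache, c takes its place.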

  22. Thread Motion Better Quality
