Optimal Power Allocation for Multiprogrammed Workloads on Single-chip Heterogeneous Processors

Presentation Transcript


  1. Optimal Power Allocation for Multiprogrammed Workloads on Single-chip Heterogeneous Processors
     Euijin Kwon (1,2), Jae Young Jang (2), Jae W. Lee (2), Nam Sung Kim (2,3)

  2. Single-chip heterogeneous processors
     • Compared to systems based on discrete components:
       • Lower communication overhead
       • Lower power consumption
       • Lower cost (less silicon)
       • Friendly to emerging applications (sequential + parallel processing)
     • Examples: Samsung’s Exynos, Intel’s Sandy Bridge, AMD’s Llano (sources: AMD, Intel, and Samsung)

  3. Challenges
     • SCHP (single-chip heterogeneous processor) performance is limited by the power budget:
       • Total chip power budget
       • CPU/GPU power budgets
     • Multiprogrammed workload (MW):
       • Requires workload-aware power allocation
       • Must consider program characteristics and evaluation metrics
     • How can we optimize overall performance within a limited power budget?

  4. Outline
     • Motivation
       • Target platform: SCHP + MW
       • Workload-aware power allocation
         • Characteristics of programs
         • Evaluation metrics
     • Methodology
       • Power configuration
       • Benchmark programs
     • Evaluation
       • Algorithm
     • Conclusion

  5. Target platform: SCHP + MW
     • 4-core CPU + 16-SM GPU
     • Multiple V/F (voltage/frequency) domains → DVFS
     • 2 programs running
     • Hardware resources evenly divided between the programs
     [Figure: multiprogrammed workload. Program 1 runs on CPU Core0, CPU Core1, and GPU0; Program 2 runs on CPU Core2, CPU Core3, and GPU1. V/F domains: per-core CPU domains, one domain each for GPU0 and GPU1, and one for the memory controllers (MCs).]

  6. Workload-aware power allocation
     • Characteristics of programs
       • Non-uniform performance sensitivities to power
     • Evaluation metrics
       • Throughput vs. energy efficiency
     [Figure: normalized throughput vs. power allocation, using the same hardware; allocating more power to mri-q]
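The slides do not define the two metrics. As a minimal sketch, assuming throughput is the sum of each program's performance normalized to the baseline (evenly divided) allocation, and energy efficiency is that throughput per watt of chip power, the comparison could be computed as below. Both definitions and the function names are assumptions, not the authors' formulas.

```python
from typing import Sequence


def normalized_throughput(perf: Sequence[float], baseline_perf: Sequence[float]) -> float:
    """Sum of per-program speedups relative to the baseline allocation.

    Assumed definition: the slides do not give a formula for throughput.
    """
    if len(perf) != len(baseline_perf):
        raise ValueError("perf and baseline_perf must have the same length")
    return sum(p / b for p, b in zip(perf, baseline_perf))


def energy_efficiency(throughput: float, chip_power_w: float) -> float:
    """Assumed definition: throughput delivered per watt of chip power."""
    if chip_power_w <= 0:
        raise ValueError("chip_power_w must be positive")
    return throughput / chip_power_w


# Hypothetical numbers: shifting power toward a power-sensitive program
# (e.g. mri-q) speeds it up more than it slows the other program down.
baseline = [1.0, 1.0]
shifted = [1.3, 0.9]
print(normalized_throughput(shifted, baseline))  # about 2.2, versus 2.0 at the baseline
```

Under these assumed definitions, the same allocation can win on throughput and lose on energy efficiency if it raises chip power, which is why the optimal point depends on the metric.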

  7. Outline (repeat of slide 4)

  8. Methodology: shared power budget
     • The power budget can be changed for CPU 1, CPU 2, GPU 1, and GPU 2
     • Total chip power budget = 100 W
     • CPU power budget = 80 W
     • GPU power budget = 64 W
     • Baseline configuration: evenly divided (25 W for each CPU/GPU group)
     [Figure: power configurations (W) for CPU 1, CPU 2, GPU 1, and GPU 2 as input; throughput and energy efficiency as output]
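A small sketch of a feasibility check for one power configuration. Because 80 W + 64 W exceeds the 100 W chip budget, it assumes that the CPU and GPU budgets cap the combined power of the two CPU groups and of the two GPU groups, respectively; that reading, and the `PowerConfig` name, are assumptions.

```python
from dataclasses import dataclass

# Budgets from slide 8. Treating the CPU and GPU budgets as caps on the combined
# power of the two CPU groups and the two GPU groups is an assumption.
CHIP_BUDGET_W = 100.0
CPU_BUDGET_W = 80.0
GPU_BUDGET_W = 64.0


@dataclass(frozen=True)
class PowerConfig:
    """Hypothetical per-group power budgets in watts."""
    cpu1: float
    cpu2: float
    gpu1: float
    gpu2: float

    def total(self) -> float:
        return self.cpu1 + self.cpu2 + self.gpu1 + self.gpu2

    def is_feasible(self) -> bool:
        """True if every group is non-negative and all three budgets hold."""
        groups = (self.cpu1, self.cpu2, self.gpu1, self.gpu2)
        return (
            all(g >= 0 for g in groups)
            and self.total() <= CHIP_BUDGET_W
            and self.cpu1 + self.cpu2 <= CPU_BUDGET_W
            and self.gpu1 + self.gpu2 <= GPU_BUDGET_W
        )


# Baseline: evenly divided, 25 W for each CPU/GPU group.
BASELINE = PowerConfig(25.0, 25.0, 25.0, 25.0)

print(BASELINE.total(), BASELINE.is_feasible())          # 100.0 True
print(PowerConfig(45.0, 40.0, 10.0, 5.0).is_feasible())  # False: CPUs need 85 W > 80 W
```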

  9. Methodology: benchmark programs
     • Six benchmark programs used
     • Divided into 3 groups by their characteristics

  10. Outline (repeat of slide 4)

  11. Evaluation: case study 1 (compute- vs. memory-bound)
     • 19% throughput improvement
     • 32% energy-efficiency improvement
     • More power is allocated to the compute-bound program
     • The optimal point varies with the evaluation metric

  12. Evaluation: case study 2 (memory- vs. memory-bound)
     • 10% throughput improvement
     • 32% energy-efficiency improvement
     • Power allocated equally
     • Again, the optimal point depends on:
       • The evaluation metric
       • Workload characteristics (compute- or memory-bound)

  13. Evaluation: variation of optimal configuration
     • The optimal configuration varies with the programs’ characteristics and the evaluation metric

  14. Evaluation: performance improvement from optimal power allocation
     • Significant improvement achieved:
       • 12% for throughput
       • 18% for energy efficiency

  15. Algorithm for throughput maximization
     repeat:
       calculate (slope): sp1, sp2 = slopes of normalized throughput vs. power allocation for programs 1 and 2
       if abs(sp1 - sp2) < threshold: alloc(equally)
       else if sp1 > sp2: alloc(p1_more)
       else: alloc(p2_more)
       wait(regular_time)
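The flowchart can be sketched in Python as below. Assumptions not in the slides: each program's performance is available as a function of its power (the hypothetical `PerfModel`), the slope is estimated by a finite difference, the two programs share one power budget, and alloc(p1_more)/alloc(p2_more) move a fixed `step` of watts. In a running system, `control_step` would be called once every regular_time.

```python
from typing import Callable

PerfModel = Callable[[float], float]  # hypothetical: watts -> normalized performance


def slope(perf: PerfModel, watts: float, delta: float = 1.0) -> float:
    """Finite-difference estimate of performance gained per extra watt."""
    return (perf(watts + delta) - perf(watts)) / delta


def choose_allocation(sp1: float, sp2: float, threshold: float) -> str:
    """Decision part of the flowchart: 'equal', 'p1_more', or 'p2_more'."""
    if abs(sp1 - sp2) < threshold:
        return "equal"
    return "p1_more" if sp1 > sp2 else "p2_more"


def control_step(perf1: PerfModel, perf2: PerfModel, p1_watts: float,
                 budget: float, threshold: float, step: float) -> float:
    """One control interval; returns program 1's new share (program 2 gets the rest)."""
    p2_watts = budget - p1_watts
    decision = choose_allocation(slope(perf1, p1_watts), slope(perf2, p2_watts), threshold)
    if decision == "equal":
        return budget / 2
    if decision == "p1_more":
        return min(budget, p1_watts + step)
    return max(0.0, p1_watts - step)


def compute_bound(watts: float) -> float:
    # Hypothetical: performance rises steeply with power.
    return 0.02 * watts


def memory_bound(watts: float) -> float:
    # Hypothetical: performance barely responds to power.
    return 0.005 * watts


print(control_step(compute_bound, memory_bound, 50.0, 100.0, 0.005, 5.0))  # 55.0
```

Following the flowchart literally, a near-tie in slopes resets the split to equal rather than holding the current split.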

  16. Algorithm for energy efficiency maximization
     • Gradient search from the minimum power allocation:
       final = min_power
       repeat:
         MAX = max(EE(final), EE(final, p1++), EE(final, p2++))
         if EE(final) == MAX: exit
         else if EE(final, p1++) > EE(final, p2++): final = (final, p1++)
         else: final = (final, p2++)
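A sketch of this greedy search, assuming a hypothetical energy-efficiency model `ee` over (program 1 watts, program 2 watts), a fixed increment `step` for p1++ and p2++, and a total budget that bounds the search (the slide does not show one). On a tie between the two moves it picks p2++, matching the strict comparison on the slide.

```python
from typing import Callable, Tuple

Alloc = Tuple[float, float]         # (watts for program 1, watts for program 2)
EEModel = Callable[[Alloc], float]  # hypothetical energy-efficiency model


def maximize_ee(ee: EEModel, min_power: Alloc, step: float, budget: float) -> Alloc:
    """Greedy gradient search starting from the minimum power allocation."""
    if step <= 0:
        raise ValueError("step must be positive")
    final = min_power
    while True:
        # Candidate moves p2++ and p1++; p2++ is listed first so that max()
        # keeps it on ties, as in the slide's strict '>' comparison.
        moves = [(final[0], final[1] + step), (final[0] + step, final[1])]
        moves = [m for m in moves if m[0] + m[1] <= budget]  # assumed budget limit
        if not moves:
            return final
        best = max(moves, key=ee)
        # Exit when EE(final) is already the maximum of the candidates.
        if ee(best) <= ee(final):
            return final
        final = best


def toy_ee(a: Alloc) -> float:
    # Hypothetical concave model peaking at (40 W, 20 W).
    return -((a[0] - 40.0) ** 2 + (a[1] - 20.0) ** 2)


print(maximize_ee(toy_ee, (10.0, 10.0), step=5.0, budget=100.0))  # (40.0, 20.0)
```

Like any greedy hill climb, this finds a local optimum on the step grid; that is consistent with the slides' "(near-)optimal" wording.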

  17. Conclusion
     • We propose a solution for optimal power allocation:
       • Workload-aware power allocation
       • Uses program characteristics and evaluation metrics
     • Significant performance improvement achieved:
       • 12% for throughput
       • 18% for energy efficiency
     • Run-time algorithms effectively find (near-)optimal power allocations

  18. Backup slides

  19. Simulator
     • Integrated CPU + GPU simulator
       • H. Wang, V. Sathish, R. Singh, M. Schulte and N. Kim, "Workload and Power Budget Partitioning for Single-Chip Heterogeneous Processors," in PACT, 2012.
       • http://cpu-gpu-sim.ece.wisc.edu/
       • gem5 + GPGPU-Sim
     • Adaptive power allocation for multiprogrammed workloads
       • Per-core V/F domains for the CPU
       • 2 V/F domains for the GPU