
Understanding Performance, Power and Energy Behavior in Asymmetric Processors

Nagesh B Lakshminarayana and Hyesoon Kim, School of Computer Science, Georgia Institute of Technology. Outline: Background and Motivation; Thread Interactions; Dynamic Scheduling; Asymmetry Aware Scheduling.

Presentation Transcript


  1. Understanding Performance, Power and Energy Behavior in Asymmetric Processors • Nagesh B Lakshminarayana, Hyesoon Kim • School of Computer Science, Georgia Institute of Technology

  2. Outline • Background and Motivation • Thread Interactions • Dynamic Scheduling • Asymmetry Aware Scheduling • Conclusion and Future Work

  3. Heterogeneous Architectures • A particularly interesting class of parallel machines is heterogeneous architectures • Multiple types of processing elements (PEs) are available on the same machine [Figure: one PE of type A and four PEs of type B connected through an interconnect]

  4. Heterogeneous Architectures • Heterogeneous architectures are becoming very common [Figure: examples include special accelerators such as the IBM Cell processor and, the focus of this talk, asymmetric processors with two fast cores and four slow cores]

  5. Machine Configurations • M-I experiments use 8 threads; M-II experiments use 16 threads • Asymmetric multiprocessors (AMPs) are emulated by lowering the clock frequency of some cores with SpeedStep/PowerNow

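  6. Power Measurement • Total system power consumption is measured using an Extech 380801 power analyzer [Figure: setup showing the power socket, the experiment machine connected by its power cable, and a Windows machine connected to the analyzer by a serial cable]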
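The analyzer reports power, while the energy results later in the talk are power integrated over time. A minimal sketch of that conversion, assuming evenly spaced power samples, uses the trapezoidal rule. The function name, sampling interval, and readings are hypothetical and do not reflect the analyzer's actual log format.

```python
def energy_joules(samples_w, interval_s):
    """Approximate energy in joules from evenly spaced power samples in watts.

    Trapezoidal rule: each interval contributes the mean of its two
    endpoint readings multiplied by the interval length.
    """
    if len(samples_w) < 2:
        return 0.0
    return sum(0.5 * (p0 + p1) * interval_s
               for p0, p1 in zip(samples_w, samples_w[1:]))


# Hypothetical readings taken once per second (not data from the talk)
readings_w = [120.0, 150.0, 150.0, 130.0]
print(energy_joules(readings_w, 1.0))  # 425.0
```

A faster sampling rate narrows each trapezoid and tracks short power spikes more closely.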

  7. PARSEC Benchmark Suite • Desktop-oriented multithreaded benchmark suite • Application domains include animation, data mining, and financial analysis • Parallelized with Pthreads and OpenMP

  8. Performance of PARSEC Benchmarks [Figure: execution time per benchmark, grouped as slow-limited, middle-perf, and unstable] • On average, the performance of half-half lies between that of all-slow and all-fast

  9. Classification of Benchmarks [Figure: barrier-synchronized execution timelines for (a) slow-limited, (b) middle-perf, and (c) unstable benchmarks]
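A toy model helps explain the slow-limited case: when work is split statically and equally and threads meet at a barrier, each phase lasts as long as its slowest thread. The relative core speeds and work amounts below are hypothetical, and the model ignores memory effects and OS scheduling; it is an illustration, not the methodology used in the talk.

```python
def phase_time(work_per_thread, core_speeds):
    """Duration of one barrier-delimited phase.

    Thread i needs work_per_thread[i] / core_speeds[i] time units; the
    barrier releases all threads only when the slowest one arrives.
    """
    return max(w / s for w, s in zip(work_per_thread, core_speeds))


work = [100.0] * 4                 # equal, statically assigned work (hypothetical)
all_fast = [2.0, 2.0, 2.0, 2.0]    # hypothetical relative core speeds
half_half = [2.0, 2.0, 1.0, 1.0]
all_slow = [1.0, 1.0, 1.0, 1.0]

for name, speeds in [("all-fast", all_fast), ("half-half", half_half), ("all-slow", all_slow)]:
    print(name, phase_time(work, speeds))
# all-fast 50.0
# half-half 100.0
# all-slow 100.0
```

In this model half-half matches all-slow because the fast cores finish early and then sit idle at the barrier.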

  10. Energy Consumption of PARSEC [Figure: energy consumption, grouped as slow-limited and middle-perf] • In half-half and all-slow, total energy consumption is higher even though the average power consumed may be lower
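The apparent paradox follows from energy being average power multiplied by execution time: a configuration that draws less power but runs long enough can still consume more energy. The numbers below are hypothetical and chosen only to show the arithmetic; they are not results from the talk.

```python
def energy_from_avg_power(avg_power_w, exec_time_s):
    """Energy in joules = average power (W) x execution time (s)."""
    return avg_power_w * exec_time_s


# Hypothetical (average power in W, execution time in s) per configuration
configs = {
    "all-fast": (160.0, 10.0),
    "half-half": (140.0, 14.0),
    "all-slow": (120.0, 18.0),
}

for name, (power_w, time_s) in configs.items():
    print(f"{name}: {energy_from_avg_power(power_w, time_s):.0f} J")
# all-fast: 1600 J
# half-half: 1960 J
# all-slow: 2160 J
```

Because the measurement covers total system power, components whose draw does not scale with core frequency (memory, disks, power-supply losses) keep consuming power for the whole run, which is one plausible reason longer runs cost more energy.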

  11. Behavior of PARSEC Benchmarks • Observations – Different applications behave differently on AMPs – Usually, an SMP with fast processors saves energy

  12. Why do different applications behave differently on AMPs?

  13. Outline • Background and Motivation • Thread Interactions • Dynamic Scheduling • Asymmetry Aware Scheduling • Conclusion and Future Work

  14. Thread Interactions Sources of thread interactions • Critical Sections • Barriers

  15. Critical Sections (CS) • Threads wait to enter CSs [Figure: timelines for case (a) and case (b), showing useful work, critical sections, and waiting]
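The waiting shown in the figure can be reproduced with a small discrete-event sketch. It rests on assumptions of our own, not the talk's: every thread repeats the same amount of useful work followed by one critical section, the lock is granted first-come, first-served, and both useful work and the critical section run at the speed of the core executing them. The function and parameter names are hypothetical.

```python
import heapq


def simulate_cs(speeds, iterations, useful, cs):
    """Return the completion time of a toy lock-contention workload.

    speeds: relative speed of the core running each thread
    iterations: number of (useful work, critical section) pairs per thread
    useful, cs: work units per useful-work segment and per critical section
    """
    lock_free_at = 0.0
    finish = [0.0] * len(speeds)
    # Pending lock requests: (request time, thread id, iterations done).
    # Ties on request time are broken by thread id.
    pending = [(useful / s, tid, 0) for tid, s in enumerate(speeds)]
    heapq.heapify(pending)
    while pending:
        t_request, tid, done = heapq.heappop(pending)
        start = max(t_request, lock_free_at)     # wait while another thread holds the lock
        lock_free_at = start + cs / speeds[tid]  # execute the critical section
        done += 1
        if done < iterations:
            # Do the next useful-work segment, then request the lock again.
            heapq.heappush(pending, (lock_free_at + useful / speeds[tid], tid, done))
        else:
            finish[tid] = lock_free_at
    return max(finish)


# Two equal-speed threads request the lock at t=10; the second one waits 5 units.
print(simulate_cs([1.0, 1.0], iterations=1, useful=10.0, cs=5.0))  # 20.0
```

Sweeping `cs` (critical-section length) or the ratio of `cs` to `useful` (how often the lock is taken) is one way to explore the length and frequency effects discussed in the following slides; the sketch makes no claim to reproduce the measured curves.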

  16. Barriers • Threads wait at a barrier for the other threads to finish [Figure: threads waiting at barriers]

  17. Effect of Critical Section Length • CS-limited application [Figure: normalized power consumption] • As critical section length increases, the average power consumed decreases

  18. Effect of Critical Section Length • CS-limited application [Figure: normalized execution time]

  19. Effect of Critical Section Length • CS-limited application [Figure: normalized execution time] • Performance of AMPs is sensitive to CS length

  20. Effect of Critical Section Length • CS-limited application [Figure: normalized energy consumption] • Energy consumption shows the same trend

  21. Effect of Critical Section Frequency • Both the length and the frequency of CSs affect performance and energy consumption • As frequency increases, the performance difference between half-half and all-fast shrinks • If the majority of the execution time is spent waiting for locks, it is acceptable to have a few slow processors • Results are available in the paper

  22. Effect of Barriers • With few barriers, half-half performs similarly to all-slow • With many barriers, half-half performs similarly to all-fast • Results are available in the paper

  23. Outline • Background and Motivation • Thread Interactions • Dynamic Scheduling • Asymmetry Aware Scheduling • Conclusion and Future Work

  24. Dynamic Scheduling • Motivation: better run-time adaptivity • Each thread requests more work after completing its assigned work • Supported by OpenMP and Intel Threading Building Blocks

  25. Dynamic Scheduling • Can help improve performance and reduce energy consumption on AMPs • Should be preferred over the static and guided policies [Figure: results for a parallel-for application]
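Why dynamic scheduling suits AMPs can be seen in a small simulation comparing a static split (one near-equal block of iterations per thread) with dynamic self-scheduling (an idle thread grabs the next chunk), roughly what OpenMP's `schedule(static)` and `schedule(dynamic, chunk)` clauses request. The function names, core speeds, and per-iteration cost are hypothetical, and the simulation ignores scheduling overhead, which in practice grows as the chunk size shrinks.

```python
import heapq


def static_makespan(n_iters, speeds, cost=1.0):
    """One contiguous, near-equal block of iterations per core."""
    base, extra = divmod(n_iters, len(speeds))
    return max((base + (1 if i < extra else 0)) * cost / s
               for i, s in enumerate(speeds))


def dynamic_makespan(n_iters, speeds, chunk=1, cost=1.0):
    """Idle cores repeatedly take the next chunk from a shared pool."""
    idle = [(0.0, i) for i in range(len(speeds))]  # (time core becomes idle, core id)
    heapq.heapify(idle)
    remaining, finish = n_iters, 0.0
    while remaining > 0:
        t, i = heapq.heappop(idle)  # the earliest idle core grabs work next
        take = min(chunk, remaining)
        remaining -= take
        t += take * cost / speeds[i]
        finish = max(finish, t)
        heapq.heappush(idle, (t, i))
    return finish


speeds = [2.0, 2.0, 1.0, 1.0]  # hypothetical: two fast cores, two slow cores
print(static_makespan(8, speeds))   # 2.0
print(dynamic_makespan(8, speeds))  # 1.5
```

With the static split the fast cores idle once their blocks are done; with dynamic scheduling they absorb the extra iterations, which is the run-time adaptivity the slides describe.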

  26. Outline • Background and Motivation • Thread Interactions • Dynamic Scheduling • Asymmetry Aware Scheduling • Conclusion and Future Work

  27. Scheduling in AMPs • Longest Job to a Fast Processor First (LJFPF) [Lakshminarayana’08] [Figure: threads of different lengths on slow and fast cores, meeting at a barrier]
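A minimal sketch of the LJFPF idea for one barrier-delimited phase, assuming task lengths are known in advance and that each fast core runs exactly one thread. The function names, job lengths, and the 2x speed ratio are hypothetical; this illustrates the policy's intent rather than reproducing the authors' scheduler.

```python
def ljfpf_assign(job_lengths, n_fast):
    """Map each job to 'fast' or 'slow': the n_fast longest jobs get fast cores."""
    # sorted() is stable, so equal-length jobs keep their original order.
    order = sorted(range(len(job_lengths)), key=lambda j: job_lengths[j], reverse=True)
    fast_jobs = set(order[:n_fast])
    return ["fast" if j in fast_jobs else "slow" for j in range(len(job_lengths))]


def barrier_makespan(job_lengths, placement, fast_speed, slow_speed=1.0):
    """All jobs meet at a barrier, so the phase ends when the slowest job finishes."""
    return max(
        length / (fast_speed if where == "fast" else slow_speed)
        for length, where in zip(job_lengths, placement)
    )


jobs = [4.0, 8.0, 4.0, 6.0]  # hypothetical per-thread work
placement = ljfpf_assign(jobs, n_fast=2)
print(placement)                                                     # ['slow', 'fast', 'slow', 'fast']
print(barrier_makespan(jobs, placement, fast_speed=2.0))             # 4.0
print(barrier_makespan(jobs, ["fast", "fast", "slow", "slow"], 2.0)) # 6.0
```

The naive placement in the last line puts the 6-unit job on a slow core, which stretches the whole phase; LJFPF avoids this by spending the fast cores on the jobs that would otherwise finish last.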

  28. How Does the Scheduler Know the Length of Work? • Current mechanism: the application sends task-length information to the scheduler • Ongoing work: a prediction mechanism

  29. LJFPF • ITK: open-source medical image processing applications • MultiRegistration (registration method) kernel with 50 iterations • The 50 iterations are divided among 8 threads [Figure: normalized execution time and normalized energy consumption]
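The 50-iteration kernel shows why this workload is asymmetric even though every thread runs the same code. As a worked example, assume (our assumptions, not stated on the slide) that the iterations are split as evenly as possible, that at least two but fewer than eight cores are fast, that a fast core runs r > 1 times faster than a slow one, and let t_slow be the time of one iteration on a slow core:

```latex
\begin{aligned}
50 &= 8 \cdot 6 + 2 \quad\Rightarrow\quad \text{2 threads run 7 iterations, 6 threads run 6},\\
T_{\mathrm{LJFPF}} &= \max\!\left(\tfrac{7}{r},\, 6\right) t_{\mathrm{slow}},\\
T_{\mathrm{worst}} &= 7\, t_{\mathrm{slow}} \quad \text{(a 7-iteration thread placed on a slow core)}.
\end{aligned}
```

For r of at least 7/6, LJFPF finishes the phase in 6 t_slow instead of 7 t_slow. This is arithmetic under the stated assumptions, not the measured result shown in the figure.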

  30. Outline • Background and Motivation • Thread Interactions • Dynamic Scheduling • Asymmetry Aware Scheduling • Conclusion and Future Work

  31. Conclusion & Future Work • Conclusion • Evaluated the performance and energy consumption behavior of multithreaded applications on AMPs • For symmetric workloads with little thread interaction, an SMP with fast processors is preferable • For symmetric workloads with a lot of thread interaction, an AMP could be better • For asymmetric threads, an AMP could provide the lowest energy consumption • Future Work • Predict application characteristics and use the predicted information for thread scheduling on AMPs

  32. Thank you!
