
Jun Ma, Guihai Yan, Yinhe Han and Xiaowei Li State Key Laboratory of Computer Architecture

Amphisbaena: Modeling Two Orthogonal Ways to Hunt on Heterogeneous Many-cores. An analytical performance model for boosting performance. Jun Ma, Guihai Yan, Yinhe Han and Xiaowei Li, State Key Laboratory of Computer Architecture, Institute of Computing Technology, C.A.S.


Presentation Transcript


  1. Amphisbaena: Modeling Two Orthogonal Ways to Hunt on Heterogeneous Many-cores • An analytical performance model for boosting performance • Jun Ma, Guihai Yan, Yinhe Han and Xiaowei Li • State Key Laboratory of Computer Architecture, Institute of Computing Technology, C.A.S. • Univ. of Chinese Academy of Sciences

  2. Trends in Cloud Computing • Increasing computing demands • More massive • More diverse • Demanding service-level agreements (response time, throughput) • Computing platforms to meet these demands • Multicore to manycore • Homogeneous to heterogeneous

  3. Two Orthogonal Ways to Boost Performance • Scale-out speedup: exploit many cores for higher thread-level parallelism • Scale-up speedup: exploit heterogeneous cores for optimal application-core mapping

  4. Quantifying Scale-out and Scale-up Speedup • The overall performance model indicates how to improve the overall performance of each application. • How can the application-specific scale-out and scale-up speedups be determined?

  5. Amphisbaena: an Analytical Approach to Model Performance • Amphisbaena, or Φ (Phi) for short • Models the overall performance speedup coming from two orthogonal ways • α (scale-out speedup): the ratio of performance on the target multithreading configuration to performance on the current configuration, on the same type of cores. • β (scale-up speedup): the ratio of performance on the target cores to performance on the current cores, under the same multithreading configuration.
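As a minimal sketch of how the two ratios defined on this slide could be combined, the Python below computes a scale-out ratio and a scale-up ratio from performance measurements and multiplies them into an overall speedup. The multiplicative composition is an assumption based on the slide describing the two ways as orthogonal; the transcript does not preserve the actual formula. All function names and sample numbers are hypothetical.

```python
def scale_out_speedup(perf_target_threads: float, perf_current_threads: float) -> float:
    """Performance ratio between two multithreading configurations
    measured on the same type of cores."""
    return perf_target_threads / perf_current_threads


def scale_up_speedup(perf_target_cores: float, perf_current_cores: float) -> float:
    """Performance ratio between two core types measured under the
    same multithreading configuration."""
    return perf_target_cores / perf_current_cores


def overall_speedup(scale_out: float, scale_up: float) -> float:
    # Assumption: orthogonal contributions compose multiplicatively.
    return scale_out * scale_up


# Hypothetical measurements (e.g. instructions per second, arbitrary units).
alpha = scale_out_speedup(perf_target_threads=3.0, perf_current_threads=1.0)
beta = scale_up_speedup(perf_target_cores=1.5, perf_current_cores=1.0)
phi = overall_speedup(alpha, beta)
```

Keeping the two ratios as separate functions mirrors the slide's definitions: each one varies a single dimension (thread configuration or core type) while the other is held fixed.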

  6. Experimental Setup

  7. Scale-out Speedup • the serial part. • the parallelizable part. • the multithreading penalty.

  8. Observation • modulating constant. • synchronization waiting cycles per kilo-instructions (SPKI). • thread number. • modulating constant. • misses waiting cycles per kilo-instructions (MPKI). • thread number squared.

  9. The Details of Multithreading Penalty [Figure labels: offline, online]
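The terms listed on slides 7 to 9 can be sketched as an Amdahl-style run-time model with an added penalty. This is only an illustration under stated assumptions: the transcript lost the actual equations, so the additive form of the run time, the product form of each penalty term (constant × SPKI × threads, constant × MPKI × threads squared), and every name and number below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AppProfile:
    serial: float    # serial part of normalized single-thread run time
    parallel: float  # parallelizable part of normalized run time
    spki: float      # synchronization waiting cycles per kilo-instruction
    mpki: float      # miss waiting cycles per kilo-instruction
    k_sync: float    # modulating constant for the SPKI term (hypothetical)
    k_miss: float    # modulating constant for the MPKI term (hypothetical)


def penalty(app: AppProfile, threads: int) -> float:
    """Assumed multithreading penalty: linear in threads for synchronization,
    quadratic in threads for misses."""
    return app.k_sync * app.spki * threads + app.k_miss * app.mpki * threads ** 2


def run_time(app: AppProfile, threads: int) -> float:
    """Assumed normalized run time: serial part + divided parallel part + penalty."""
    return app.serial + app.parallel / threads + penalty(app, threads)


def scale_out_speedup(app: AppProfile, target: int, current: int = 1) -> float:
    """Speedup of the target thread count over the current one, same core type."""
    return run_time(app, current) / run_time(app, target)


def best_thread_count(app: AppProfile, max_threads: int) -> int:
    """Thread count in 1..max_threads with the lowest modeled run time."""
    return min(range(1, max_threads + 1), key=lambda n: run_time(app, n))


# With both constants at zero the model reduces to Amdahl's Law.
amdahl_only = AppProfile(0.1, 0.9, spki=0.0, mpki=0.0, k_sync=0.0, k_miss=0.0)
# Hypothetical application whose penalty grows with the thread count.
app = AppProfile(0.1, 0.9, spki=10.0, mpki=5.0, k_sync=1e-3, k_miss=1e-4)
chosen = best_thread_count(app, max_threads=16)
```

Under this sketch the penalty makes run time non-monotonic in the thread count, so the best configuration can be well below the number of available cores, which plain Amdahl's Law cannot express.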

  10. Alpha Model Accuracy Our prediction error is under 5% on average, compared with 11.4% for Amdahl’s Law.

  11. Scale-up Speedup How can the CPI on various types of cores be predicted? [Figure: core types C0, C1, C2 and C3, each with S and B labels]

  12. Observation • this trend is well approximated by a power law. • this trend fits an exponential function well.
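The two curve shapes named on this slide can each be fitted with a linear regression after a log transform. The sketch below shows that generic technique only; the transcript does not preserve which quantities the slide plotted, so `xs` and `ys` are placeholders and the sample data is synthetic.

```python
import math


def _linear_fit(us, vs):
    """Ordinary least squares for v = intercept + slope * u."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    sxx = sum((u - mean_u) ** 2 for u in us)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    slope = sxy / sxx
    return mean_v - slope * mean_u, slope


def fit_power_law(xs, ys):
    """Fit y = a * x**b by regressing log(y) on log(x). Needs x > 0 and y > 0."""
    intercept, slope = _linear_fit([math.log(x) for x in xs],
                                   [math.log(y) for y in ys])
    return math.exp(intercept), slope


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by regressing log(y) on x. Needs y > 0."""
    intercept, slope = _linear_fit(list(xs), [math.log(y) for y in ys])
    return math.exp(intercept), slope


# Synthetic data generated from known curves, to check the fits recover them.
xs = [1.0, 2.0, 4.0, 8.0]
power_a, power_b = fit_power_law(xs, [2.0 * x ** 0.5 for x in xs])
exp_a, exp_b = fit_exponential(xs, [3.0 * math.exp(-0.2 * x) for x in xs])
```

Fitting in log space minimizes relative rather than absolute error, which is usually acceptable for a first check of whether a trend follows either shape.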

  13. The Details of CPI Model • memory intensity. • computing intensity. • bias. [Figure labels: offline, online, online]

  14. Beta Model Accuracy Our prediction error is kept below 8% on average, compared with 12.2% for PIE.

  15. Phi Model Accuracy The prediction error of overall performance is kept below 12% on average.

  16. Orthogonality Validation • three measured values. For most applications, the orthogonality error is below 5% on average.

  17. Application of Phi Model • Using Phi for runtime management

  18. Phi Scheduling • Function: “decide the cores to map for each application.” “decide the thread number to spawn for each application.” • Policy: “the application with the largest scale-up speedup is allocated the fastest type of cores.” “an application with higher scale-out speedup should spawn more threads.” • Algorithm: “Phi scheduling uses a heuristic algorithm to maximize performance.”
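The two policy statements on this slide can be illustrated with a simple greedy heuristic. The slide does not describe the authors' actual algorithm, so the Python below is an assumed stand-in: applications are ranked by scale-up speedup and placed on core types from fastest to slowest, then each core type's cores are split in proportion to scale-out speedup. The names `App` and `phi_schedule`, the equal-sized grouping, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class App:
    name: str
    scale_up: float   # predicted speedup from moving to the fastest cores
    scale_out: float  # predicted speedup from adding threads


def phi_schedule(apps, core_types):
    """Greedy sketch. core_types is a list of (type_name, core_count),
    ordered fastest first. Returns {app_name: (type_name, thread_count)}."""
    apps_per_type = ceil(len(apps) / len(core_types))
    # Policy 1: largest scale-up speedup gets the fastest core type.
    ranked = sorted(apps, key=lambda a: a.scale_up, reverse=True)
    plan = {}
    for i, (type_name, core_count) in enumerate(core_types):
        group = ranked[i * apps_per_type:(i + 1) * apps_per_type]
        if not group:
            continue
        total_scale_out = sum(a.scale_out for a in group)
        for app in group:
            # Policy 2: higher scale-out speedup spawns more threads.
            # Every app gets at least one thread, so a group can exceed
            # core_count when some share rounds down to zero.
            share = core_count * app.scale_out / total_scale_out
            plan[app.name] = (type_name, max(1, int(share)))
    return plan


# Hypothetical workload on a machine with 4 big and 8 small cores.
apps = [App("A", 2.0, 3.0), App("B", 1.8, 1.0),
        App("C", 1.2, 2.0), App("D", 1.1, 2.0)]
plan = phi_schedule(apps, [("big", 4), ("small", 8)])
```

A greedy pass like this runs in O(n log n) time, which suits periodic runtime decisions, but it does not search the full mapping space and so gives no optimality guarantee.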

  19. Performance Comparison On average, Phi outperforms the three baselines by 12.2% (Static), 13.3% (Bias) and 12.9% (PIE).

  20. Related Works • Periodic performance prediction and optimization • Decide only the number of threads/active cores • CPR: Composable Performance Regression for Scalable Multiprocessor • [Benjamin C. Lee et al., MICRO 2008] • FDT: Feedback-Driven Threading: Power-Efficient and High-Performance Execution of Multi-threaded Workloads on CMPs • [M. Aater Suleman et al., ASPLOS 2008] • Decide only the type of heterogeneous cores • Single-ISA Heterogeneous Multi-core Architectures for Multithreaded Workload Performance • [Rakesh Kumar et al., ISCA 2004] • Scheduling Heterogeneous Multi-cores Through Performance Impact Estimation (PIE) • [Kenzo Van Craeynest et al., ISCA 2012]

  21. Conclusion • Analytical model for performance prediction • Scale-out speedup • Scale-up speedup • Overall performance • Phi scheduling • Applied to runtime management • Returns optimal performance

  22. Thanks for Your Attention Q&A
