Jun Ma, Guihai Yan, Yinhe Han and Xiaowei Li State Key Laboratory of Computer Architecture

Amphisbaena: Modeling Two Orthogonal Ways to Hunt on Heterogeneous Many-cores
An analytical performance model for boosting performance
Jun Ma, Guihai Yan, Yinhe Han and Xiaowei Li
State Key Laboratory of Computer Architecture, Institute of Computing Technology, C.A.S.
Univ. of Chinese Academy of Sciences

Trends in Cloud Computing
• The increasing computing demands
  • More massive
  • More diverse
  • High service level agreement (response time, throughput)
• The computing platform to meet these demands
  • Multicore to manycore
  • Homogeneous to heterogeneous

Two Orthogonal Ways to Boost Performance
• Scale-out speedup: exploit many cores for higher thread-level parallelism
• Scale-up speedup: exploit heterogeneous cores for optimal application-core mapping

Quantifying Scale-out and Scale-up Speedup
• The overall performance: the scale-out and scale-up speedups indicate how to improve the overall performance of each application.
• How do we determine the application-specific scale-out and scale-up speedup?

Amphisbaena: an Analytical Approach to Model Performance
• Amphisbaena, or Φ for short
• Models the overall performance speedup coming from two orthogonal ways
  • α (scale-out speedup): the ratio of performance on the target multithreading configuration to the current configuration, on the same type of cores.
  • β (scale-up speedup): the ratio of performance on the target cores to the current cores, under the same multithreading configuration.
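The two ratios defined on this slide can be written compactly. This is a sketch, not the slide's own formula, which did not survive in the transcript: the symbols P (performance), n (multithreading configuration) and c (core type) are notation introduced here, the names α and β follow the later "Alpha Model" and "Beta Model" slides, and the product form for Φ is an assumption based on the stated orthogonality of the two speedups.

```latex
\alpha = \frac{P(n_{\text{target}},\, c_{\text{cur}})}{P(n_{\text{cur}},\, c_{\text{cur}})}, \qquad
\beta  = \frac{P(n_{\text{cur}},\, c_{\text{target}})}{P(n_{\text{cur}},\, c_{\text{cur}})}, \qquad
\Phi \approx \alpha \cdot \beta
```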

Experimental Setup

Scale-out Speedup
• the serial part.
• the parallelizable part.
• the multithreading penalty.

Observation
• modulating constant.
• synchronization waiting cycles per kilo-instructions (SPKI).
• thread number.

• modulating constant.
• misses waiting cycles per kilo-instructions (MPKI).
• thread number squared.

The Details of Multithreading Penalty
[Figure labels: offline, online]
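The scale-out terms listed on the preceding slides can be combined in a minimal sketch, assuming the penalty is the sum of a synchronization term (constant × SPKI × thread number) and a memory term (constant × MPKI × thread number squared), added to an Amdahl-style execution time. The exact formula is not preserved in the transcript, so the function names, the parameter names `k_sync` and `k_mem`, and the rule that a single thread pays no penalty are all hypothetical.

```python
def multithreading_penalty(n, spki, mpki, k_sync, k_mem):
    """Assumed penalty: sync term linear in n, memory term quadratic in n.

    k_sync and k_mem stand in for the two "modulating constants";
    a single-threaded run is assumed to pay no penalty.
    """
    if n <= 1:
        return 0.0
    return k_sync * spki * n + k_mem * mpki * n ** 2


def scale_out_speedup(n, serial, parallel, spki, mpki, k_sync, k_mem):
    """Speedup of n threads over one thread on the same type of cores.

    serial and parallel are the serial and parallelizable parts of the
    single-thread execution time, in the same unit as the penalty.
    """
    t_single = serial + parallel
    t_multi = serial + parallel / n + multithreading_penalty(
        n, spki, mpki, k_sync, k_mem
    )
    return t_single / t_multi
```

With both constants set to zero the sketch reduces to Amdahl's Law, which makes the penalty term easy to isolate when comparing the two.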

Alpha Model Accuracy
The prediction error is under 5% on average, lower than the 11.4% error of Amdahl’s Law.

Scale-up Speedup
How to predict the CPI on various types of cores?
[Figure: core types C0, C1, C2 and C3, with S/B labels]

Observation
• this trend is well approximated by a power law.
• this trend fits an exponential function well.
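The transcript does not preserve which quantities the two trends relate, so the following is only a sketch of how a power-law fit and an exponential fit can be obtained from profiled samples by least squares in log space. The variable names `xs` and `ys` and all function names are hypothetical.

```python
import math


def _linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_power_law(xs, ys):
    """Fit y = a * x**b by regressing log(y) on log(x); needs x > 0, y > 0."""
    b, log_a = _linear_fit([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(log_a), b


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by regressing log(y) on x; needs y > 0."""
    b, log_a = _linear_fit(list(xs), [math.log(y) for y in ys])
    return math.exp(log_a), b
```

Fitting in log space keeps both cases down to one linear regression, at the cost of weighting relative rather than absolute error.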

The Details of CPI Model
• memory intensity.
• computing intensity.
• bias.
[Figure labels: offline, online, online]

Beta Model Accuracy
The prediction error is kept below 8% on average, lower than the 12.2% error of PIE.

Phi Model Accuracy
The prediction error of overall performance is kept below 12% on average.

Orthogonality Validation
• three measured values.
For most applications, the orthogonality error is below 5% on average.
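One way to express the orthogonality check is sketched below, assuming the three measured values are the scale-out speedup, the scale-up speedup and the overall speedup, and that orthogonality means the overall speedup equals the product of the other two. Neither assumption is spelled out in the transcript, and the function name is hypothetical.

```python
def orthogonality_error(alpha, beta, phi):
    """Relative gap between the measured overall speedup (phi) and the
    product of the measured scale-out (alpha) and scale-up (beta) speedups."""
    return abs(alpha * beta - phi) / phi
```

For example, measured speedups of 2.0 (scale-out) and 1.5 (scale-up) against a measured overall speedup of 2.9 give an error of about 3.4%.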

Application of Phi Model
• Using Phi for runtime management

Phi Scheduling
• Function
  • Decide the cores to map for each application.
  • Decide the thread number to spawn for each application.
• Policy
  • The application with the largest scale-up speedup is allocated the fastest type of cores.
  • An application with higher scale-out speedup should spawn more threads.
• Algorithm
  • Phi scheduling uses a heuristic algorithm to maximize performance.
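The two policies on this slide can be illustrated with a small greedy sketch. It is not the authors' heuristic, whose details are not in the transcript: the function name `phi_schedule`, the one-cluster-per-application mapping, and the proportional split of a shared thread budget are all assumptions made for illustration.

```python
def phi_schedule(apps, clusters, thread_budget):
    """Greedy sketch of the two scheduling policies.

    apps: dict mapping app name -> (scale_up_speedup, scale_out_speedup)
    clusters: core-type names ordered fastest first, at least one per app
    thread_budget: total number of threads to divide among applications
    """
    if len(clusters) < len(apps):
        raise ValueError("need at least one cluster per application")

    # Policy 1: the largest scale-up speedup gets the fastest core type.
    by_scale_up = sorted(apps, key=lambda name: apps[name][0], reverse=True)
    mapping = {name: clusters[i] for i, name in enumerate(by_scale_up)}

    # Policy 2: thread share grows with scale-out speedup.
    # Every app gets at least one thread, so the total can exceed the
    # budget when there are many apps with small shares.
    total = sum(scale_out for _, scale_out in apps.values())
    threads = {
        name: max(1, int(thread_budget * apps[name][1] / total))
        for name in apps
    }
    return mapping, threads
```

A call such as `phi_schedule({"a": (1.2, 3.0), "b": (2.0, 1.0)}, ["big", "small"], 8)` maps `b` to the faster cluster and gives `a` the larger share of threads.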

Performance Comparison
On average, Phi outperforms the three baselines by 12.2% (Static), 13.3% (Bias) and 12.9% (PIE).

Related Works
• Periodic performance prediction and optimization
• Only decide the number of threads/active cores
  • CPR: Composable Performance Regression for Scalable Multiprocessor [Benjamin C. Lee et al., MICRO 2008]
  • FDT: Feedback-Driven Threading: Power-Efficient and High-Performance Execution of Multi-threaded Workloads on CMPs [M. Aater Suleman et al., ASPLOS 2008]
• Only decide the type of heterogeneous cores
  • Single-ISA Heterogeneous Multi-core Architectures for Multithreaded Workload Performance [Rakesh Kumar et al., ISCA 2004]
  • Scheduling Heterogeneous Multi-cores Through Performance Impact Estimation (PIE) [Kenzo Van Craeynest et al., ISCA 2012]

Conclusion
• Analytical model for performance prediction
  • Scale-out speedup
  • Scale-up speedup
  • Overall performance
• Phi scheduling
  • Applied to runtime management
  • Returns optimal performance

Thanks for Your Attention
Q&A
