Amphisbaena: Modeling Two Orthogonal Ways to Hunt on Heterogeneous Many-cores. An analytical performance model for boosting performance. Jun Ma, Guihai Yan, Yinhe Han and Xiaowei Li. State Key Laboratory of Computer Architecture, Institute of Computing Technology, C.A.S.; Univ. of Chinese Academy of Sciences
Trends in Cloud Computing • Increasing computing demands • More massive • More diverse • Stricter service-level agreements (response time, throughput) • Computing platforms to meet these demands • Multicore to many-core • Homogeneous to heterogeneous
Two Orthogonal Ways to Boost Performance • Scale-out speedup: exploit many cores for higher thread-level parallelism • Scale-up speedup: exploit heterogeneous cores for optimal application-core mapping
Quantifying Scale-out and Scale-up Speedup • The overall performance speedup indicates how to improve the overall performance of each application. • How can the application-specific scale-out and scale-up speedups be determined?
Amphisbaena: an Analytical Approach to Model Performance • Amphisbaena, or Phi for short • Models the overall performance speedup coming from two orthogonal ways • Scale-out speedup: the ratio of performance on the target multithreading configuration to that on the current configuration, on the same type of cores. • Scale-up speedup: the ratio of performance on the target cores to that on the current cores, under the same multithreading configuration.
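The two ratios can be written compactly. The notation here is an assumption, since the formulas did not survive in this transcript: P(c, n) is a hypothetical shorthand for the performance on core type c with n threads, and the names α, β and Φ follow the Alpha, Beta and Phi models named on later slides. The product form is inferred from the orthogonality claim, not quoted from the slides.

```latex
\alpha = \frac{P(c_{\mathrm{cur}},\, n_{\mathrm{tgt}})}{P(c_{\mathrm{cur}},\, n_{\mathrm{cur}})},
\qquad
\beta = \frac{P(c_{\mathrm{tgt}},\, n)}{P(c_{\mathrm{cur}},\, n)},
\qquad
\Phi = \frac{P(c_{\mathrm{tgt}},\, n_{\mathrm{tgt}})}{P(c_{\mathrm{cur}},\, n_{\mathrm{cur}})}
     \approx \alpha \cdot \beta
```

The last step is exact if β is evaluated at n = n_tgt. Treating β as independent of the thread count is what the orthogonality assumption amounts to in this notation.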
Scale-out Speedup • the serial part. • the parallelizable part. • the multithreading penalty.
Observation • modulating constant. • synchronization waiting cycles per kilo-instruction (SPKI). • thread number. • modulating constant. • miss waiting cycles per kilo-instruction (MPKI). • thread number squared.
The Details of Multithreading Penalty • [Figure: components of the multithreading penalty, labeled "offline" and "online"]
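A minimal sketch of how these pieces might combine, under stated assumptions: the exact formula is not recoverable from the slides, so this uses an Amdahl-style execution time with an added penalty, and assumes the penalty is a synchronization term (constant × SPKI × threads) plus a miss term (constant × MPKI × threads squared). All function and parameter names are hypothetical.

```python
def multithreading_penalty(n, spki, mpki, k_sync, k_miss):
    """Assumed penalty: linear in n for synchronization, quadratic in n for misses.

    k_sync and k_miss are the modulating constants (assumed to be fitted offline);
    spki and mpki are assumed to be measured online.
    """
    return k_sync * spki * n + k_miss * mpki * n ** 2


def normalized_time(n, serial, spki, mpki, k_sync, k_miss):
    """Amdahl-style normalized execution time with n threads, plus the penalty."""
    if n < 1 or not 0.0 <= serial <= 1.0:
        raise ValueError("need n >= 1 and 0 <= serial <= 1")
    parallel = 1.0 - serial
    return serial + parallel / n + multithreading_penalty(n, spki, mpki, k_sync, k_miss)


def scale_out_speedup(n_cur, n_tgt, serial, spki, mpki, k_sync, k_miss):
    """Ratio of performance at n_tgt threads to performance at n_cur threads."""
    t_cur = normalized_time(n_cur, serial, spki, mpki, k_sync, k_miss)
    t_tgt = normalized_time(n_tgt, serial, spki, mpki, k_sync, k_miss)
    return t_cur / t_tgt
```

With both constants set to zero the sketch reduces to Amdahl's Law, which makes the penalty's contribution easy to isolate.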
Alpha Model Accuracy • The prediction error is under 5% on average, compared with 11.4% for Amdahl's Law.
Scale-up Speedup • How to predict the CPI on various types of cores? • [Figure: four core types C0, C1, C2 and C3, each with components labeled S and B]
Observation • this trend is well approximated by a power law. • this trend fits an exponential function well.
The Details of CPI Model • memory intensity. • computing intensity. • bias. • [Figure labels: offline, online, online]
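Once a CPI has been predicted for each core type, the scale-up speedup follows from the ratio of instruction throughput. The sketch below shows only that last step, not the CPI model itself. It assumes a single ISA, so the instruction count is the same on both cores; the clock-frequency parameters are an addition for illustration and do not appear in the slides. All names are hypothetical.

```python
def instructions_per_second(cpi, freq_hz):
    """Throughput of one core: clock frequency divided by cycles per instruction."""
    if cpi <= 0 or freq_hz <= 0:
        raise ValueError("cpi and freq_hz must be positive")
    return freq_hz / cpi


def scale_up_speedup(cpi_cur, cpi_tgt, freq_cur_hz=1.0, freq_tgt_hz=1.0):
    """Ratio of performance on the target core type to the current core type.

    With equal frequencies (the default) this is simply cpi_cur / cpi_tgt.
    """
    return (instructions_per_second(cpi_tgt, freq_tgt_hz)
            / instructions_per_second(cpi_cur, freq_cur_hz))
```

For example, `scale_up_speedup(2.0, 1.0)` models moving an application from a core where it runs at a CPI of 2 to one where it runs at a CPI of 1, at the same clock.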
Beta Model Accuracy • The prediction error is kept below 8% on average, compared with 12.2% for PIE.
Phi Model Accuracy The prediction error of overall performance is kept below 12% on average.
Orthogonality Validation • three measured values. • For most applications, the orthogonality error is below 5% on average.
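One plausible way to compute such an error is sketched below. It assumes the three measured values are the scale-out speedup, the scale-up speedup and the overall speedup, and that orthogonality means the overall speedup equals the product of the other two. The slide does not spell out either point, so treat both as assumptions; the function name is hypothetical.

```python
def orthogonality_error(alpha_measured, beta_measured, phi_measured):
    """Relative gap between the product of the two speedups and the overall speedup."""
    if phi_measured <= 0:
        raise ValueError("phi_measured must be positive")
    return abs(alpha_measured * beta_measured - phi_measured) / phi_measured
```

An error of zero means the two speedups compose exactly; the slide's 5% figure would correspond to a return value of 0.05 under this definition.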
Application of Phi Model • Using Phi for runtime management
Phi Scheduling • Function: decide the cores to map for each application; decide the thread number to spawn for each application. • Policy: the application with the largest scale-up speedup is allocated the fastest type of cores; an application with a higher scale-out speedup should spawn more threads. • Algorithm: Phi scheduling uses a heuristic algorithm to maximize performance.
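The two policy rules can be illustrated with a small greedy sketch. This is not the authors' algorithm, which the slides do not detail; it assumes one core type per application, a fixed thread budget, and a proportional split of threads by scale-out speedup. The `App` class, `phi_schedule` and the core-type names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    scale_up: float   # predicted scale-up speedup
    scale_out: float  # predicted scale-out speedup (must be positive)


def phi_schedule(apps, core_types_fastest_first, thread_budget):
    """Return {app name: (core type, thread count)} using the two policy rules."""
    if len(core_types_fastest_first) < len(apps) or thread_budget < len(apps):
        raise ValueError("need at least one core type and one thread per application")

    # Rule 1: the application with the largest scale-up speedup gets the fastest cores.
    by_scale_up = sorted(apps, key=lambda a: a.scale_up, reverse=True)
    core_of = {a.name: c for a, c in zip(by_scale_up, core_types_fastest_first)}

    # Rule 2: higher scale-out speedup means more threads.
    # Every application gets one thread; the rest are split proportionally.
    total = sum(a.scale_out for a in apps)
    spare = thread_budget - len(apps)
    threads = {a.name: 1 + int(spare * a.scale_out / total) for a in apps}

    # Hand out threads lost to rounding, highest scale-out speedup first.
    leftover = thread_budget - sum(threads.values())
    for a in sorted(apps, key=lambda a: a.scale_out, reverse=True)[:leftover]:
        threads[a.name] += 1

    return {a.name: (core_of[a.name], threads[a.name]) for a in apps}
```

A usage example: with `apps = [App("a", 1.2, 3.0), App("b", 2.0, 1.0)]`, core types `["big", "small"]` and a budget of 8 threads, application `b` is mapped to the faster type and application `a` receives most of the threads.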
Performance Comparison • On average, Phi outperforms the three baselines by 12.2% (Static), 13.3% (Bias) and 12.9% (PIE).
Related Work • Periodic performance prediction and optimization • Deciding only the number of threads/active cores • CPR: Composable Performance Regression for Scalable Multiprocessor Models • [Benjamin C. Lee et al., MICRO 2008] • FDT: Feedback-Driven Threading: Power-Efficient and High-Performance Execution of Multi-threaded Workloads on CMPs • [M. Aater Suleman et al., ASPLOS 2008] • Deciding only the type of heterogeneous cores • Single-ISA Heterogeneous Multi-core Architectures for Multithreaded Workload Performance • [Rakesh Kumar et al., ISCA 2004] • Scheduling Heterogeneous Multi-cores Through Performance Impact Estimation (PIE) • [Kenzo Van Craeynest et al., ISCA 2012]
Conclusion • Analytical model for performance prediction • Scale-out speedup • Scale-up speedup • Overall performance • Phi scheduling • Applies the model to runtime management • Returns optimal performance