
Cross-Platform Performance Prediction Using Partial Execution

This presentation describes a model and approach for cross-platform performance prediction using partial execution, focusing on iteration-based scientific applications. The method runs short partial executions and uses observation-based performance prediction to estimate resource usage and job wall time on a target platform. The presentation also discusses why modeling- and simulation-based approaches become harder and more expensive as machines and applications grow larger and more complex.


Presentation Transcript


  1. Cross-Platform Performance Prediction Using Partial Execution. Leo T. Yang, Xiaosong Ma*, Frank Mueller. Department of Computer Science, Center for High Performance Simulations (CHiPS), North Carolina State University (* Joint Faculty with Oak Ridge National Laboratory). Supercomputing 2005

  2. Presentation Roadmap • Introduction • Model and approach • Performance results • Conclusion and future work Supercomputing 2005

  3. Cross-Platform Performance Prediction • Users face a wide selection of machines • Need cross-platform performance prediction to • Choose platform to use / purchase • Estimate resource usage • Estimate job wall time • Machines and applications both grow larger and more complex • Modeling- and simulation-based approaches become harder and more expensive • Existing performance data not reused for prediction Supercomputing 2005

  4. Observation-based Performance Prediction • Observe cross-platform behavior • Treat applications and platforms as black boxes • Avoid case-by-case model building • Cover the entire application • Computation • Communication • I/O • Convenient with third-party libraries • Observation: existence of a “reference platform” • Goal: a cross-platform meta-predictor • Approach: performance translation based on relative performance [Diagram: a known T = 20 hrs run on one platform, an unknown T = ? hrs run on another] Supercomputing 2005

  5. Presentation Roadmap • Introduction • Model and approach • Performance results • Conclusion and future work Supercomputing 2005

  6. Main Idea: Utilizing Partial Execution • Observation: majority of scientific applications are iteration-based • Highly repetitive behavior • phases -> timesteps • Execute short partial executions • Low-cost “test drives” • Simple API (indicate number of timesteps: k) • Quit after k timesteps [Diagram: short partial runs on the target and reference systems (Partial-1, Partial-2) yield relative performance = 0.6, used to predict the full run Full-2 from the known full run Full-1] Supercomputing 2005
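
To make the translation concrete with illustrative numbers (reading “relative performance = 0.6” as the target's per-timestep time being 0.6 times the reference's): if the reference platform's full run is known to take 20 hours, the predicted full run on the target is roughly 0.6 × 20 h = 12 h, obtained from short partial runs rather than a full run on the target.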

  7. Application Model • Execution of parallel simulations modeled as regular expression I(C*[W])*F • I: one-time initialization phase • C: computation phase • W: optional I/O phase • F: one-time finalization phase • Different phases likely have different cross-platform relative performance • Major challenges • Avoid impact of initially unstable performance • Predict correct mixture of C and W phases Supercomputing 2005
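
As a concrete, hypothetical instance of the pattern: a run of 100 timesteps that writes a checkpoint after every 10th step matches the expression as I (C^10 W)^10 F, i.e. initialization, ten repetitions of ten computation phases followed by one I/O phase, and finalization. Because the C and W phases can have very different cross-platform ratios, a partial execution has to sample them in the same proportion as the full run (the second challenge above).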

  8. Partial Execution • Terminate applications prematurely • API • init_timestep() • Optional, useful with a large setup phase • begin_timestep() • end_timestep(maxsteps) • “begin” and “end” calls bracket a C or CW phase • Execution terminated after maxsteps timesteps • Easy-to-use interface • 2-3 lines of code inserted into the source code Supercomputing 2005
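
A minimal sketch of what this instrumentation might look like in a C simulation driver. Only the three API names come from the slide; their signatures, the loop structure, and the solver routines (initialize, compute_step, write_output, finalize) are assumptions made here for illustration.

    /* Prototypes for the partial-execution API named on the slide
       (signatures are assumed here; only the function names appear in the talk). */
    void init_timestep(void);          /* optional: marks the end of a large setup phase */
    void begin_timestep(void);         /* opens one C or CW phase */
    void end_timestep(int maxsteps);   /* closes the phase; terminates the run after maxsteps timesteps */

    /* Hypothetical application routines standing in for a real solver. */
    void initialize(void);
    void compute_step(int t);
    void write_output(int t);
    void finalize(void);

    int main(void)
    {
        initialize();                  /* I: one-time initialization */
        init_timestep();               /* inserted line 1 (optional) */

        for (int t = 0; t < 10000; t++) {
            begin_timestep();          /* inserted line 2 */
            compute_step(t);           /* C: computation phase */
            if (t % 10 == 0)
                write_output(t);       /* W: optional periodic I/O phase */
            end_timestep(50);          /* inserted line 3: quit after 50 timesteps */
        }

        finalize();                    /* F: one-time finalization */
        return 0;
    }

The three inserted calls bracket the timestep body exactly as the slide describes; everything else is the unmodified application.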

  9. Base Prediction Model • Given reference platform and target platform • Perform 1 or more partial executions • Compute average execution time of timestep on both platforms • Compute relative performance • Compute overall execution time estimate for target platform • Prediction performance (predicted-to-actual ratio) Supercomputing 2005
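
One way to write the base model down, using assumed variable names (the slide gives the steps only in words):

    t_ref  = average per-timestep time from the partial run on the reference platform
    t_tgt  = average per-timestep time from the partial run on the target platform
    r      = t_tgt / t_ref                       relative performance
    T_pred = r * T_ref                           predicted full execution time on the target,
                                                 given the reference's known full time T_ref
    prediction performance = T_pred / T_actual   the predicted-to-actual ratio reported later

This mirrors the diagram on slide 6, where a relative performance of 0.6 scales the reference's full run into the predicted target run.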

  10. Refined Prediction Model • Problem 1: initial performance fluctuations • Variance due to cache warm-up, etc. • May span dozens of timesteps • Problem 2: periodic I/O phases • I/O frequency often configurable and determined at run time • Unified solution • Monitor per-timestep performance variance at runtime • Identify anomalies and repeated patterns • Filter out early, unstable timestep measurements • Consider only later results once performance stabilizes • Combine early timestep overheads into initialization cost • Compute sliding window averages of per-timestep overheads • Use multiples of the observed pattern length as the window size Supercomputing 2005
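
A rough sketch of the sliding-window filtering idea in C. The window size, stability tolerance, synthetic data, and function names below are illustrative assumptions, not the authors' implementation; the point is only to show how warm-up timesteps can be separated from the stable C/W pattern.

    #include <stdio.h>
    #include <math.h>

    /* Average of the w per-timestep times ending just before index end. */
    static double window_avg(const double *t, int end, int w)
    {
        double sum = 0.0;
        for (int i = end - w; i < end; i++)
            sum += t[i];
        return sum / w;
    }

    /* Find the first "stable" timestep: the earliest index at which two adjacent
       sliding windows of size w have averages that agree within tol. Steps before
       that index are treated as warm-up and folded into the initialization cost.
       Choosing w as a multiple of the observed I/O pattern length (e.g. 10 when
       one step in ten does I/O) keeps the C/W mixture identical in every window. */
    static int first_stable_step(const double *t, int n, int w, double tol)
    {
        for (int end = 2 * w; end <= n; end++) {
            double prev = window_avg(t, end - w, w);
            double curr = window_avg(t, end, w);
            if (fabs(curr - prev) / prev < tol)
                return end - 2 * w;
        }
        return n;   /* never stabilized within this partial run */
    }

    int main(void)
    {
        /* Synthetic per-timestep times: a decaying warm-up transient, a steady
           1.0 s compute step afterwards, and a +2.0 s I/O burst every 10th step. */
        double t[60];
        for (int i = 0; i < 60; i++)
            t[i] = (i < 12 ? 2.5 - 0.1 * i : 1.0) + (i % 10 == 9 ? 2.0 : 0.0);

        int s = first_stable_step(t, 60, 10, 0.05);
        printf("stable from timestep %d, steady per-timestep average = %.2f s\n",
               s, window_avg(t, 60, 10));
        return 0;
    }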

  11. Presentation Roadmap • Introduction • Model and approach • Performance results • Conclusion and future work Supercomputing 2005

  12. Proof-of-Concept Experiments • Questions: • Is relative performance observed in a very short early period indicative of overall relative performance? • Can we reuse partial execution data in predicting executions with different configurations? • Experiment settings • Large-scale codes: • 2 ASCI Purple benchmarks (sphot and sPPM) • fusion code (Gyro) • rocket simulation (GENx) • Full runs take >5 hours • 10 supercomputers at SDSC, NCSA, ORNL, LLNL, UIUC, NCSU, and NERSC • 7 architectures (SP3, SP4, Altix, Cray X1, 3 clusters: G5, Xeon, Itanium) Supercomputing 2005

  13. Base Model Accuracy (Sphot) • High accuracy with very short partial execution Supercomputing 2005

  14. Refined Model (sPPM, Ram -> Henry2, normalized) • Issues: • Ram: initialization variance • Henry2: I/O in 1 of every 10 steps • Smarter algorithms • Initialization filter • Sliding window • Handles anomalies and periodic I/O Supercomputing 2005

  15. Application with Variable Problem Size • GENx Rocket Simulation (CSAR, UIUC), Turing -> Frost • Limited accuracy with variable timesteps Supercomputing 2005

  16. Reusing Partial Execution Data • Scientists often repeat runs with different configurations • Number of processors • Input size and data content • Computation tasks • Results from the Gyro fusion simulation on 5 platforms [Charts: average prediction errors of 12.1% - 25.8% and 5.6% - 37.9%] Supercomputing 2005

  17. Presentation Roadmap • Introduction • Model and approach • Performance results • Conclusion and future work Supercomputing 2005

  18. Conclusion • Empirical performance prediction works! • Real-world production codes • Multiple parallel platforms • Highly accurate predictions • Limitations with • Variable problem sizes • Input-size/processor scaling • Observation-based prediction • Simple • Portable • Low cost (few timesteps) [Diagram: a T = 20 hrs reference run translated into predicted times of T = 1 hr, T = 2 hrs, and T = 10 hrs on other platforms] Supercomputing 2005

  19. Related Work • Parallel program performance prediction • Application-specific analytical models • Compiler/instrumentation tools • Simulation-based predictions • Cross-platform performance studies • Mostly examine multiple platforms individually • Grid job schedulers • Do not offer cross-platform performance translation Supercomputing 2005

  20. Ongoing and Future Work • Evaluate with AMR applications • Automated partial execution • Automatic computation phase identification • Binary rewriting to avoid source code modification • Extend to non-dedicated systems • For job schedulers Supercomputing 2005
