A Framework for Elastic Execution of Existing MPI Programs

Aarthi Raveendran, Tekin Bicer, Gagan Agrawal

Presentation Transcript


  1. A Framework for Elastic Execution of Existing MPI Programs • Aarthi Raveendran, Tekin Bicer, Gagan Agrawal

  2. Motivation • Emergence of Cloud Computing • Including for HPC Applications • Key Advantages of Cloud Computing • Elasticity (dynamically acquire resources) • Pay-as-you-go model • Can be exploited to meet cost and/or time constraints • Existing HPC Applications • MPI-based, use a fixed number of nodes • Need to make existing MPI applications elastic

  3. Detailed Research Objective • To make MPI applications elastic • Exploit a key advantage of Cloud Computing • Meet user-defined time and/or cost constraints • Avoid a new programming model or significant recoding • Design a framework for • Decision making • When to expand or contract • Actual support for elasticity • Allocation, Data Redistribution, Restart

  4. Outline • Research Objective • Framework Design • Runtime support modules • Experimental Platform: Amazon Cloud Services • Applications and Experimental Evaluation • Conclusion

  5. Framework components

  6. Framework design – Approach and Assumptions • Target: iterative HPC applications • Assumption: uniform work done at every iteration • Monitoring at the start of every few iterations of the time-step loop • Checkpointing and redistribution • Calculate the required iteration time based on user input
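The "required iteration time" on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the user supplies a total time budget and a total iteration count, and it relies on the slide's stated assumption that every iteration does uniform work. The function and parameter names are hypothetical.

```python
def required_iteration_time(time_budget_s, elapsed_s, iterations_done, total_iterations):
    """Seconds each remaining iteration may take to still meet the time budget.

    Assumes uniform work per iteration, so the remaining time can be split
    evenly across the remaining iterations.
    """
    remaining = total_iterations - iterations_done
    if remaining <= 0:
        raise ValueError("no iterations remain")
    # A budget that is already exhausted leaves zero time per iteration.
    return max(time_budget_s - elapsed_s, 0.0) / remaining


# Example: a 1-hour budget, 10 minutes used, 200 of 1000 iterations done.
print(required_iteration_time(3600.0, 600.0, 200, 1000))  # 3000 s / 800 = 3.75
```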

  7. Framework design – Modification to Source Code • Progress checked based on current average iteration time • Decision made to stop and restart if necessary • Reallocation should not be done too frequently • If restarting is not necessary, the application continues running
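The monitoring check described on this slide might look roughly like the sketch below. The transcript does not give the actual decision rule, so everything here is an assumption: the tolerance band, the minimum number of iterations between changes (standing in for "not too frequently"), and the linear-scaling estimate of the ideal node count. The candidate counts 4, 8 and 16 are taken from the experiments slide.

```python
def decide_node_count(avg_iter_time_s, required_iter_time_s, current_nodes,
                      iters_since_last_change, min_iters_between_changes=50,
                      allowed_counts=(4, 8, 16), tolerance=0.10):
    """Return the node count to run on next; current_nodes means "keep running"."""
    # Avoid reallocating too frequently (hypothetical cooldown).
    if iters_since_last_change < min_iters_between_changes:
        return current_nodes
    ratio = avg_iter_time_s / required_iter_time_s
    # Close enough to the target pace: no restart needed.
    if abs(ratio - 1.0) <= tolerance:
        return current_nodes
    # Assumes iteration time scales inversely with node count (linear scaling).
    ideal = current_nodes * ratio
    for n in sorted(allowed_counts):
        if n >= ideal:
            return n
    return max(allowed_counts)


print(decide_node_count(2.0, 1.0, 4, iters_since_last_change=100))  # expand to 8
print(decide_node_count(1.0, 2.2, 8, iters_since_last_change=100))  # contract to 4
```

A returned value different from `current_nodes` would trigger the checkpoint, reallocation and restart path; otherwise the application continues.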

  8. Framework Design – Execution flow

  9. Other Runtime Steps • Steps taken to scale to a different number of nodes: • Live variables and arrays need to be collected at the master node and redistributed • Read-only data need not be restored – just retrieved • Application is restarted with each node reading its local portion of the redistributed data
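The collect-and-redistribute step can be illustrated with a small sketch. It assumes a one-dimensional block partition and plain Python lists in place of the application's arrays; the transcript does not specify the actual data layout, and the function names are hypothetical.

```python
def block_bounds(n_items, n_parts):
    """Half-open (start, stop) index ranges of a near-even block partition."""
    base, extra = divmod(n_items, n_parts)
    bounds = []
    start = 0
    for rank in range(n_parts):
        # The first `extra` ranks receive one additional element.
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def redistribute(local_blocks, new_node_count):
    """Gather per-node blocks centrally, then re-split for the new node count."""
    gathered = [x for block in local_blocks for x in block]  # collect at "master"
    return [gathered[lo:hi] for lo, hi in block_bounds(len(gathered), new_node_count)]


old = [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]   # live array spread over 4 nodes
print(redistribute(old, 2))  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

After the restart, each of the new nodes would read only its own block.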

  10. Runtime support modules – Decision layer • Interaction with user and application program • Constraints: time or cost • Monitoring the progress and making a decision • Current work: • Measuring communication overhead and estimating scalability • Moving to large-type instances if necessary
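One way to picture the interplay of the time and cost constraints is the sketch below. It is an illustration only: it assumes per-instance-hour billing with partial hours rounded up, a hypothetical price expressed in arbitrary units, and made-up runtime estimates per node count. None of these figures come from the presentation.

```python
import math


def estimated_cost(n_instances, runtime_s, price_per_instance_hour):
    """Cost under hourly billing, rounding partial hours up (an assumption)."""
    billed_hours = math.ceil(runtime_s / 3600.0)
    return n_instances * billed_hours * price_per_instance_hour


def cheapest_count_meeting_deadline(runtime_by_count, deadline_s, price):
    """Pick the cheapest node count whose estimated runtime meets the deadline."""
    feasible = {n: estimated_cost(n, t, price)
                for n, t in runtime_by_count.items() if t <= deadline_s}
    if not feasible:
        return None
    # Break cost ties in favour of fewer nodes.
    return min(feasible, key=lambda n: (feasible[n], n))


# Hypothetical runtime estimates (seconds) for 4, 8 and 16 instances.
estimates = {4: 9000, 8: 3500, 16: 2000}
print(cheapest_count_meeting_deadline(estimates, deadline_s=6000, price=10))  # 8
```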

  11. Framework design – Modification to Source Code

  12. Background – Amazon cloud • Services used in our framework: • Amazon Elastic Compute Cloud (EC2) • Virtual images called instances • Small instances: 1.7 GB of memory, 1 EC2 Compute Unit, 160 GB of local instance storage, 32-bit platform • Large instances: 7.5 GB of memory, 4 EC2 Compute Units, 850 GB of local instance storage, 64-bit platform • On-demand, reserved, and spot instances

  13. Background – Amazon cloud • Amazon Simple Storage Service (S3) • Provides a key-value store • Data stored in files • Each file restricted to 5 GB • Unlimited number of files

  14. Runtime support modules – Resource allocator • Elastic execution • Input taken from the decision layer on the number of resources • Allocating and de-allocating resources in the AWS environment • MPI configuration for these instances • Setting up the MPI cluster • Configuring passwordless login among nodes
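As a rough illustration of the "MPI configuration" step, the sketch below renders a machine file for a newly allocated set of instances. It assumes the `host:process-count` line format accepted by MPICH's process managers and one process per small instance; the host addresses are placeholders, and the actual allocation and SSH setup are not shown because the transcript gives no detail on them.

```python
def render_machinefile(hosts, procs_per_host=1):
    """Render one 'host:nprocs' line per instance (assumed MPICH machine-file format)."""
    if procs_per_host < 1:
        raise ValueError("procs_per_host must be at least 1")
    return "".join(f"{host}:{procs_per_host}\n" for host in hosts)


# Placeholder private addresses standing in for freshly allocated instances.
print(render_machinefile(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]), end="")
```

The resulting text would be written to a file and passed to the MPI launcher when the application is restarted on the new node set.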

  15. Runtime support modules – Checkpointing and redistribution • Multiple design options feasible with the support available on AWS • Amazon S3 • Unmodified arrays • Quick access from EC2 instances • Arrays stored in small-sized chunks • Remote file copy • Modified arrays (live arrays) • File writes and reads
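Storing an array as small-sized chunks means a restarted node only has to fetch the chunks that overlap its new portion. The sketch below shows that bookkeeping under assumed names: the key scheme and chunk size are hypothetical, and no real S3 calls are made.

```python
def chunk_keys(array_name, n_elements, chunk_elements):
    """List (key, start, stop) for each fixed-size chunk of an array."""
    layout = []
    for index, start in enumerate(range(0, n_elements, chunk_elements)):
        stop = min(start + chunk_elements, n_elements)
        # Hypothetical key naming scheme for the stored chunk.
        layout.append((f"{array_name}/chunk-{index:05d}", start, stop))
    return layout


def chunks_for_range(layout, lo, hi):
    """Keys of the chunks overlapping the half-open element range [lo, hi)."""
    return [key for key, start, stop in layout if start < hi and stop > lo]


layout = chunk_keys("matrix_a", n_elements=10, chunk_elements=4)
print(chunks_for_range(layout, 3, 8))
# ['matrix_a/chunk-00000', 'matrix_a/chunk-00001']
```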

  16. Runtime support modules – Checkpointing and redistribution • Current design • Knowledge of the division of the original dataset is necessary • Aggregation and redistribution done centrally on a single node • Future work • Source-to-source transformation tool • Decentralized array distribution schemes

  17. Experiments • Framework and approach evaluated using • Jacobi • Conjugate Gradient (CG) • MPICH2 used • 4, 8 and 16 small instances used for processing the data • Observations made with and without scaling the resources • Overheads of 5-10%, which is negligible

  18. Experiments – Jacobi

  19. Experiments – Jacobi

  20. Experiments – Jacobi • Matrix updated at every iteration • Updated matrix collected and redistributed at node change • Worst case total redistribution overhead – less than 2% • Scalable application – performance increases with number of nodes

  21. Experiments – CG

  22. Experiments – CG

  23. Experiments – CG • Single vector which needs to be redistributed • Communication-intensive application • Not scalable • Overheads are still low

  24. Conclusion • An overall approach to make MPI applications elastic and adaptable • An automated framework for deciding the number of instances for execution • Framework tested using two MPI applications, showing low overheads during elastic execution
