1 / 18

Adding the Easy Button to the Cloud with SnowFlock and MPI

Adding the Easy Button to the Cloud with SnowFlock and MPI. Philip Patchin , H. Andr é s Lagar-Cavilla , Eyal de Lara, Michael Brudno University of Toronto philip.patchin@utoronto.ca. Cloud Computing = 7 th Heaven?. Rapid growth of large computing clusters Reduced hardware costs

zohar
Download Presentation

Adding the Easy Button to the Cloud with SnowFlock and MPI

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Adding the Easy Button to the Cloud with SnowFlock and MPI Philip Patchin, H. Andrés Lagar-Cavilla, Eyal de Lara, Michael Brudno University of Toronto philip.patchin@utoronto.ca

  2. Cloud Computing = 7th Heaven? Rapid growth of large computing clusters Reduced hardware costs No physical hardware to manage Clouds with fast, local connectivity Upload a VM to the cloud Rent computing time Seems good for HPC Can set up large cluster affordably Leave physical machine management to others Can have privileged access in VMs

  3. Or just another circle of Hell? Still have to manage all the VMs Every VM must be started up and configured Application user or developer becomes sys admin Users must learn a new API Often arcane, provider-specific Not germane to the application Undermines the advantages of using the cloud Much time spent managing virtual infrastructure May have to leave VMs idle to reduce start-up time Idle VMs often get consolidated

  4. MPI on Static Cluster Well, at least the paint looks dry... ...but we need something quicker and easier!

  5. A Solution: SnowFlock + MPI • Instantaneous virtual clusters • SnowFlock • Standard Interface for parallel programming • MPI

  6. Advantages • Easy management • Of VMs • Of applications • Familiar interface • Application code unchanged

  7. SnowFlock Cloning Overview • Instantiate virtual cluster in ~1 second Virtual Network • Start-up master SnowFlock VM • Merge cluster

  8. Cloning – Some Detail ~1MB for 1GB VM Private State Clone 1 Memory State ? Virtual Machine Multicast VM Descriptor VM Descriptor VM Descriptor Clone 2 Private State ?

  9. Naïve SnowFlock + MPI • Instantiate virtual cluster in ~1 second Virtual Network • Merge cluster • Slow • Still significant administration • Start-up master SnowFlock VM • Start up MPI...

  10. SnowFlock-Aware MPI • Fast • Administration free Virtual Network

  11. SnowFlock-Aware MPI Needs modifying to work with SnowFlock MPICH from Argonne Open source, popular New mpirun uses SnowFlock API to initiate cloning Establishes connections between hosts Manages application MPICH API unchanged Existing applications can just be re-linked

  12. Experimental Evaluation Platform 32x4 CPU cluster Gigabit Ethernet Applications ClustalW, MPI Blast – bioinformatics MrBayes – evolutionary biology VASP – chemistry, physics Tachyon - graphics

  13. Results I Overhead 13-40%

  14. Results II Very fast cluster start-up and shutdown Optimal Footprint Little administration required No need to change applications

  15. Results III • Some overhead • Varies with application • Large numbers of connections creates a bottleneck • More state swapping decreases performance • Make MPICH more SnowFlock-aware • Remove bottleneck in architecture • Make connection management distributed

  16. Future Work Computation resizing using multi-level cloning Dynamic resizing to use newly-added hardware Increase capacity for large sub-computations Variable footprint for multiple stage computations Read-only shared memory Create state before cloning Rely on SnowFlock to transfer state rather than MPI Fault tolerance Allow failed nodes to be restarted Only re-run affected part of computation

  17. Conclusion SnowFlock-MPI Easy to use Little administration No application code changes Initial performance evaluation Overhead from 13-40% Further improvements still possible Allows for MPI extensions We have escaped from the Inferno!

  18. Questions? http://sysweb.cs.toronto.edu/snowflock philip.patchin@utoronto.ca

More Related