
Running Scientific Workflow Applications on the Amazon EC2 Cloud

Running Scientific Workflow Applications on the Amazon EC2 Cloud. Bruce Berriman, NASA Exoplanet Science Institute, IPAC; Gideon Juve, Ewa Deelman, Karan Vahi, Gaurang Mehta, Information Sciences Institute, USC; Benjamin Berman, USC Epigenome Center; Phil Maechling, Southern California Earthquake Center.


Presentation Transcript


  1. Running Scientific Workflow Applications on the Amazon EC2 Cloud
  Bruce Berriman, NASA Exoplanet Science Institute, IPAC
  Gideon Juve, Ewa Deelman, Karan Vahi, Gaurang Mehta, Information Sciences Institute, USC
  Benjamin Berman, USC Epigenome Center
  Phil Maechling, Southern California Earthquake Center

  2. Clouds (Utility Computing)
  • Pay for what you use rather than purchasing compute and storage resources that end up underutilized
  • Analogous to household utilities
  • Originated in the business domain to provide services for small companies that did not want to maintain an IT department
  • Provided by data centers built on compute and storage virtualization technologies
  • Built with commodity hardware; clouds are a "new purchasing paradigm" rather than a new technology

  3. Benefits and Concerns
  Benefits:
  • Pay only for what you need
  • Elasticity: increase or decrease capacity within minutes
  • Ease strain on the local physical plant
  • Control local system administration costs
  Concerns:
  • What if clouds become oversubscribed and users cannot increase capacity on demand?
  • How will the cost structure change with time?
  • If we become dependent on them, will we be at the cloud providers' mercy?
  • Are clouds secure?
  • Are they up to the demands of science applications?

  4. Cloud Providers
  Pricing structures vary widely:
  • Amazon EC2 charges for hourly usage
  • Skytap charges per month
  • IBM requires an annual subscription
  • Savvis offers servers for purchase
  Uses:
  • Running business applications
  • Web hosting
  • Providing additional capacity for heavy loads
  • Application testing

  5. Purposes of Our Study
  How useful is cloud computing for scientific workflow applications?
  • An experimental study of the performance of three workflows with different I/O, memory, and CPU requirements on a commercial cloud
  • A comparison of the performance of cloud resources and typical HPC resources
  • An analysis of the various costs associated with running workflows on a commercial cloud
  http://www.ncsa.illinois.edu/UserInfo/Resources/Hardware/Intel64Cluster/
  http://aws.amazon.com/ec2/

  6. The Applications: Montage
  Toolkit for assembling FITS images into science-grade mosaics (http://montage.ipac.caltech.edu)
  Processing flow: reprojection, background rectification, co-addition
  • Science grade: preserves the spatial and calibration fidelity of input images
  • Portable: runs on all common *nix platforms
  • Open-source code
  • General: supports all common coordinate systems and image projections
  • Fast: processes 40 million pixels in 32 min on 128 nodes of a 1.2 GHz Linux cluster
  • Utilities for managing and manipulating image files
  • Stand-alone modules

  7. The Applications: Broadband and Epigenome
  Broadband simulates and compares seismograms from earthquake simulation codes:
  • Generates high- and low-frequency earthquakes for several sources
  • Computes intensities of seismograms at measuring stations
  Epigenome maps short DNA segments collected with high-throughput gene-sequencing machines to a reference genome:
  • Maps chunks to a reference genome
  • Produces an output map of gene density compared with the reference genome

  8. Comparison of Resource Usage
  Montage:
  • Ran an 8 deg sq mosaic of M17 in the 2MASS J band
  • Workflow contains 10,429 tasks
  • Reads 4.2 GB of input data
  • Produces 7.9 GB of output data
  • I/O-bound: spends more than 95% of its time in I/O operations

  9. Comparison of Resource Usage
  Broadband:
  • 4 sources and 5 stations
  • Workflow contains 320 tasks
  • 6 GB of input data and 160 MB of output data
  • Memory-limited: more than 75% of its runtime is consumed by tasks requiring more than 1 GB of physical memory
  Epigenome:
  • Workflow contains 81 tasks
  • 1.8 GB of input data
  • 300 MB of output data
  • CPU-bound: spends 99% of its runtime in the CPU and only 1% on I/O and other activities
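The three profiles above reduce to a simple rule of thumb. As an illustrative sketch (the thresholds come from the fractions quoted on these slides; the function itself is not part of the study):

```python
# Coarse workflow characterization by dominant resource, using the
# fractions quoted in the study. Function and argument names are
# illustrative, not taken from the paper.

def classify_workflow(io_fraction, cpu_fraction, high_mem_task_fraction):
    """Label a workflow by its dominant resource.

    io_fraction: share of runtime spent in I/O operations
    cpu_fraction: share of runtime spent on computation
    high_mem_task_fraction: share of runtime in tasks needing > 1 GB RAM
    """
    if io_fraction > 0.95:
        return "I/O-bound"        # e.g. Montage: > 95% of time in I/O
    if high_mem_task_fraction > 0.75:
        return "memory-limited"   # e.g. Broadband
    if cpu_fraction >= 0.99:
        return "CPU-bound"        # e.g. Epigenome
    return "mixed"

print(classify_workflow(0.96, 0.03, 0.10))  # Montage-like profile: I/O-bound
```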

  10. Processing Resources: Amazon EC2 and Abe
  Processors and OS:
  • Linux Red Hat Enterprise with VMware
  • Amazon EC2 offers different instance types: compare cost vs. performance
  • c1.xlarge and abe.local are equivalent: use them to estimate the overhead due to virtualization
  • abe.lustre and abe.local differ only in file system
  Networks and file systems:
  • HPC systems use high-performance networks and parallel file systems, but Amazon EC2 uses commodity hardware
  • Ran all processes on single, multi-core nodes; used local and parallel file systems on Abe

  11. Execution Environment
  • Established equivalent software environments on the two platforms (Amazon EC2 and Abe)
  • A "submit" host was used to send jobs to EC2 or Abe
  • All workflows used the Pegasus Workflow Management System with DAGMan and Condor:
  • Pegasus transforms abstract workflow descriptions into concrete plans
  • DAGMan manages dependencies
  • Condor manages task execution
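The dependency management that DAGMan performs can be sketched as a topological release of tasks: a task becomes runnable only once all of its parents have finished. This is a minimal illustrative sketch, not the actual Pegasus/DAGMan implementation, and the task names are hypothetical:

```python
from collections import defaultdict, deque

def schedule(deps):
    """Return a valid run order for a DAG.

    deps maps each task to the list of parent tasks it depends on.
    A task is released only when all of its parents have completed,
    mirroring how a DAG manager such as DAGMan dispatches work.
    """
    pending = {task: len(parents) for task, parents in deps.items()}
    children = defaultdict(list)
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)
    ready = deque(task for task, n in pending.items() if n == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in children[task]:          # completing a task may
            pending[child] -= 1               # release its children
            if pending[child] == 0:
                ready.append(child)
    return order

# Hypothetical three-stage flow (names are illustrative only)
dag = {"reproject": [], "rectify": ["reproject"], "coadd": ["rectify"]}
print(schedule(dag))  # ['reproject', 'rectify', 'coadd']
```

In the real system the concrete plan also carries data-staging and cleanup jobs; this sketch shows only the ordering logic.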

  12. Montage Performance (I/O-Bound)
  • Slowest on m1.small, fastest on the machines with the most cores: m1.xlarge, c1.xlarge, abe.lustre, and abe.local
  • The parallel file system on abe.lustre offers a big performance advantage for I/O-bound applications; cloud providers would need to offer parallel file systems and high-speed networks to match it
  • Virtualization overhead < 10%

  13. Broadband Performance (Memory-Bound)
  • Lower I/O requirements, so there is not much difference between abe.lustre and abe.local; both have 8 GB of memory. Performance is only slightly worse on c1.xlarge, with 7.5 GB of memory
  • Poor performance on c1.medium, which has only 1.7 GB of memory; cores may sit idle to prevent the system from running out of memory
  • Virtualization overhead is small

  14. Epigenome Performance (CPU-Bound)
  • c1.xlarge, abe.lustre, and abe.local give the best performance; they are the three most powerful machines (64-bit, 2.3-2.6 GHz)
  • The parallel file system on abe.lustre offers little benefit
  • Virtualization overhead is roughly 10%, the largest of the three applications, because tasks compete with the OS for the CPU
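Because c1.xlarge and abe.local are treated as hardware-equivalent, the virtualization overhead quoted on these slides can be estimated directly from runtimes. A minimal sketch, using placeholder runtimes rather than the study's measured values:

```python
# Virtualization overhead estimated by comparing a cloud instance with
# an equivalent physical node (c1.xlarge vs. abe.local in the study).
# The runtimes below are illustrative placeholders, not measurements.

def virtualization_overhead(cloud_runtime, physical_runtime):
    """Fractional slowdown attributable to virtualization."""
    return (cloud_runtime - physical_runtime) / physical_runtime

overhead = virtualization_overhead(cloud_runtime=110.0, physical_runtime=100.0)
print(f"{overhead:.0%}")  # 10%
```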

  15. Resource Cost Analysis
  • You get what you pay for: the cheapest instances are the least powerful
  • c1.medium is a good choice for Montage, but more powerful processors are better for the other two applications

  16. Data Transfer Costs
  • For Broadband and Epigenome, it is economical to transfer data out of the cloud
  • For Montage, the output is larger than the input, so the cost of transferring data out equals or exceeds the processing cost for all but one instance type
  • Is it more economical to store the data on the cloud?
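The transfer-vs.-processing trade-off can be put in back-of-the-envelope form. The data volumes below are the Montage figures from slide 8; the prices and CPU hours are placeholders, not the actual Amazon rates used in the study:

```python
# Back-of-the-envelope workflow cost model: processing (instance-hours)
# vs. data transfer in and out of the cloud. All prices and CPU hours
# here are hypothetical placeholders.

def workflow_cost(cpu_hours, hourly_rate, gb_in, gb_out,
                  price_in_per_gb, price_out_per_gb):
    """Return (processing cost, transfer cost) in dollars."""
    processing = cpu_hours * hourly_rate
    transfer = gb_in * price_in_per_gb + gb_out * price_out_per_gb
    return processing, transfer

# Montage-like case: output (7.9 GB) exceeds input (4.2 GB), so
# transfer-out charges can rival or exceed the processing bill.
proc, xfer = workflow_cost(cpu_hours=5.0, hourly_rate=0.20,
                           gb_in=4.2, gb_out=7.9,
                           price_in_per_gb=0.10, price_out_per_gb=0.17)
print(f"processing ${proc:.2f}, transfer ${xfer:.2f}")
```

With these placeholder numbers the transfer charge already exceeds the processing charge, which is the qualitative situation the slide describes for Montage.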

  17. Storage Costs
  [Tables: storage costs of output per job, storage charges, and the bottom line]

  18. Most Cost-Effective Model?
  • Assume 1,000 2MASS mosaics of 4 deg sq centered on M17 per month for 3 years, using c1.medium instances on Amazon EC2
  Local hardware for comparison:
  • 15 dual-processor, dual-core 3.2 GHz Xeon Dell PowerEdge 2650 servers
  • Aberdeen Technologies 6 TB staging disk farm
  • Dell PowerVault MD1200 storage disks
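The cloud-vs.-local comparison amounts to pay-per-use charges against amortized hardware. A hedged sketch of that arithmetic; every dollar figure below is a hypothetical placeholder, not a number from the study:

```python
# Amortization sketch: cloud pay-per-use vs. purchased local hardware
# for a fixed monthly workload. All dollar figures are hypothetical
# placeholders, not the study's actual costs.

def cloud_total(jobs_per_month, months, cost_per_job):
    """Total cloud spend: pay per job, every month."""
    return jobs_per_month * months * cost_per_job

def local_total(hardware_cost, monthly_operating_cost, months):
    """Total local spend: up-front purchase plus running costs."""
    return hardware_cost + monthly_operating_cost * months

months = 36  # 3-year horizon, as on the slide
cloud = cloud_total(jobs_per_month=1000, months=months, cost_per_job=0.50)
local = local_total(hardware_cost=15000, monthly_operating_cost=100,
                    months=months)
print(f"cloud ${cloud:,.0f} vs local ${local:,.0f} over {months} months")
```

With these placeholder figures the two totals land in the same range, which is the qualitative conclusion the study reaches: neither model is dramatically cheaper.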

  19. Conclusions
  • Clouds can be used effectively and fairly efficiently for scientific applications; the virtualization overhead is low
  • The high-speed networks and parallel file systems of HPC clusters give them a significant performance advantage over cloud computing for I/O-bound applications
  • On Amazon EC2, the primary cost for Montage is data transfer; processing is the primary cost for Broadband and Epigenome
  • Amazon EC2 offers no dramatic cost benefit over a locally mounted image-mosaic service
  Reference: G. Juve, E. Deelman, K. Vahi, G. Mehta, B. Berriman, B. P. Berman, and P. Maechling, "Scientific Workflow Applications on Amazon EC2," Cloud Computing Workshop in Conjunction with e-Science, Oxford, UK: IEEE, 2009.
