Scott Cain
GMOD Project Coordinator
Ontario Institute for Cancer Research
firstname.lastname@example.org
GMOD in the Cloud
Genome Informatics, November 3, 2011
Introduction: GMOD is …
• A set of interoperable open-source software components for visualizing, annotating, and managing biological data.
• An active community of developers and users asking diverse questions, and facing common challenges, with their biological data.
Who uses GMOD? Plus hundreds of others
GMOD in the Cloud
What GMOD in the cloud isn't:
• Clouds
• Guy getting blown up
• Garry's MOD (aka gmod.com)
Several GMOD Cloud Projects
• Galaxy - Web-based platform for data-intensive biomedical research
• CloVR - Automated and portable sequence analysis
• GBrowse2 - Web-based, scalable genome browser
• cloud.gmod.org - Several integrated GMOD tools
http://gmod.org/wiki/Cloud
Galaxy CloudMan
• Get Galaxy without the data or usage limitations.
• Combine with Cloud BioLinux to have access to MANY tools.
• Create an analysis cluster in minutes.
• Use autoscaling to get good performance at low cost.
http://wiki.g2.bx.psu.edu/Admin/Cloud
Deploying a Galaxy cluster on AWS
[Figure: deployment steps 1 to 4]
Exercising elasticity with autoscaling
Fixed cluster size
• 5 nodes: computation time 9 hrs, computation cost $20
• 20 nodes: computation time 6 hrs, computation cost $50
Dynamic cluster size
• 1 to 16 nodes: computation time 6 hrs, computation cost $20
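The comparison above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes that cost scales linearly with node-hours, which the slide does not state, and the function names are made up for this example. Under that assumption it derives the price per node-hour implied by the two fixed-size runs and, from that, the average number of nodes the autoscaled run would have kept busy.

```python
# Figures quoted on the slide: (label, nodes, hours, cost in USD).
FIXED_RUNS = [
    ("fixed, 5 nodes", 5, 9, 20.0),
    ("fixed, 20 nodes", 20, 6, 50.0),
]
DYNAMIC_HOURS = 6
DYNAMIC_COST = 20.0


def node_hours(nodes, hours):
    """Total node-hours consumed by a fixed-size cluster."""
    return nodes * hours


def implied_rate(nodes, hours, cost):
    """Cost per node-hour, ASSUMING cost is proportional to node-hours."""
    return cost / node_hours(nodes, hours)


def implied_average_nodes(cost, hours, rate):
    """Average cluster size that would produce `cost` over `hours` at `rate`."""
    return cost / rate / hours


if __name__ == "__main__":
    rates = []
    for label, nodes, hours, cost in FIXED_RUNS:
        rate = implied_rate(nodes, hours, cost)
        rates.append(rate)
        print(f"{label}: {node_hours(nodes, hours)} node-hours, "
              f"~${rate:.2f} per node-hour")

    # Use the mean of the two implied rates as a rough price estimate.
    mean_rate = sum(rates) / len(rates)
    avg = implied_average_nodes(DYNAMIC_COST, DYNAMIC_HOURS, mean_rate)
    print(f"dynamic, 1 to 16 nodes: ~{avg:.1f} nodes in use on average")
```

Read this way, the autoscaled run matches the 20-node cluster's finishing time while paying for roughly the node-hours of the 5-node cluster, because nodes are only held while there is work for them.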
CloVR
• Cloud Virtual Resource.
• Automated pipeline for sequence analysis.
• Uses 2 GMOD tools: Workflow and Ergatis.
• Use a virtual machine locally to interact with resources in the cloud.
http://clovr.org/
Why the virtual machine?
The pipeline is run from the local machine, while the heavy lifting is done on the cloud or cluster.
GBrowse2
• A recent release of GBrowse2, installed and configured.
• Tools for automatically adding rendering servers.
• Ability to add standard data sets.
http://gmod.org/wiki/GBrowse
GBrowse2 in the Cloud
[Diagram labels: Master; Render Slaves; Amazon Snapshots: Yeast, Fly, Worm, Human]
cloud.gmod.org
• GMOD tools preinstalled
• Can be run on a micro instance (albeit slowly)
A little more on Tripal
• Based on the popular CMS Drupal.
• Several modules written to serve as an interface for Chado: Controlled Vocabularies, Features, Analyses, Libraries, Stocks
• Integrated job management
Potential use case for Cloud GMOD
Community annotation:
• Just add a Web Start Apollo and set the security group to allow it to connect to the database.
• When WebApollo is ready, it's even easier: WebApollo is an add-on to JBrowse that allows collaborative editing.
• Tripal and Drupal allow editing of most data types in Chado, and commenting on pages, similar to a blog.
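Setting the security group for Apollo could look roughly like the sketch below. It is an illustration under several assumptions, not the procedure used for cloud.gmod.org: it assumes the Chado database is PostgreSQL listening on the default port 5432, that the boto3 library and AWS credentials are available, and that the group ID, region, and address range are placeholders you would replace. The helper names are hypothetical.

```python
import ipaddress

POSTGRES_PORT = 5432  # PostgreSQL's default port (assumed for the Chado database)


def build_db_ingress_rule(annotator_cidr, port=POSTGRES_PORT):
    """Build one EC2 IpPermissions entry allowing TCP to the database port.

    Raises ValueError for a malformed CIDR or one that opens the port to
    every address, since a database should not be exposed that widely.
    """
    network = ipaddress.ip_network(annotator_cidr, strict=False)
    if network.prefixlen == 0:
        raise ValueError("refusing to open the database port to all addresses")
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [
            {"CidrIp": str(network), "Description": "Apollo annotators"}
        ],
    }


def open_database_to_annotators(group_id, annotator_cidr, region="us-east-1"):
    """Apply the rule to a security group (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rule builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[build_db_ingress_rule(annotator_cidr)],
    )


# Hypothetical usage; the group ID and address range are placeholders:
# open_database_to_annotators("sg-0123456789abcdef0", "203.0.113.0/24")
```

Restricting the rule to the annotators' own address range, rather than opening the port to everyone, keeps the database reachable by Apollo without exposing it to the whole internet.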
Why use the cloud?
• Avoid installation-related issues (saves you time and frustration!)
• Save money (how much, of course, depends)
• Availability of common genomic data sets (several projects already make these available at AWS)
Future work
• Get GBrowse2 AMI public (very soon)
• Add Apollo to cloud.gmod.org (relatively soon)
• Add WebApollo to cloud.gmod.org (as soon as it's released)
Conclusion
• http://gmod.org/wiki/Cloud for more information on GMOD work in the cloud.
• http://cloud.gmod.org/ for a running example of cloud.gmod.org.
• http://clovr.org/ for more info on CloVR and to download the client VM.
• http://getgalaxy.org/ for more information on getting CloudMan.
Acknowledgements
• Funding agencies: NIH, USDA ARS, NSF, Ontario Ministry of Economic Development and Innovation
• Lincoln Stein, Chris Vandevelde
• Enis Afgan and the Galaxy Team
• Sam Angiuoli et al. at UofM SOM
• Stephen Ficklin and the Tripal group
• Mitch Skinner and JBrowse developers
• The rest of the GMOD community