
A Grid based System for Microbial Genome Comparison and analysis




  1. A Grid based System for Microbial Genome Comparison and analysis
     Anil Wipat, University of Newcastle upon Tyne, UK

  2. Motivation: Genome Comparison
     • The past decade has seen the emergence of whole genome sequencing
     • Whole genome sequences can reveal a great deal about the biology of an organism
     • Comparing genomes is one of the most effective ways to exploit genome sequence information
     • Establishes the differences and similarities at the genetic level
     • Aids biologists in understanding pathogenicity, evolution, ecology, metabolism, etc.

  3. Microbial Genome Comparison: commonly applied at different levels
     • Protein level: all-against-all amino acid sequence comparisons between proteins
       (e.g. MCSAKMQTR.. compared with MSAKMPTR..)
     • DNA level: nucleotide sequence comparison of whole genomes
       (e.g. ..atcggatcgtacgagcgatc.. compared with ..atcccatcgaacgagcgatc..)
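
To make the DNA-level example concrete, the two nucleotide fragments shown on the slide can be compared position by position. This is only a naive sketch: the function names are hypothetical, and real tools such as Blastn or MUMmer compute alignments rather than assuming the sequences already line up.

```python
def mismatch_positions(seq_a, seq_b):
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        # Unequal lengths need a real alignment, which this sketch does not attempt.
        raise ValueError("naive comparison requires equal-length sequences")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def percent_identity(seq_a, seq_b):
    """Percentage of positions that are identical in both sequences."""
    mismatches = mismatch_positions(seq_a, seq_b)
    return 100.0 * (len(seq_a) - len(mismatches)) / len(seq_a)


# The two DNA fragments from the slide.
dna_a = "atcggatcgtacgagcgatc"
dna_b = "atcccatcgaacgagcgatc"

print(mismatch_positions(dna_a, dna_b))  # [3, 4, 9]
print(percent_identity(dna_a, dna_b))    # 85.0
```

The protein fragments on the slide (MCSAKMQTR.. and MSAKMPTR..) differ in length, so this sketch would reject them; that is exactly the case where an alignment-based tool is needed.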

  4. Motivation: Genome Comparison
     • The number of complete genome sequences is rapidly increasing as sequencing technology advances
       • e.g. ~200 whole genomes have been sequenced
     • Sequence analysis and comparison are becoming more computationally intensive
     • Large-scale genome comparison is already beyond the capability of many laboratories
     • How are we going to handle all these genomes?
     • New methods and technologies for genome comparison are required.

  5. Microbase Project Overview
     • Aims to create a scalable, Grid-enabled analytical system to support microbial genome comparison.
     • Aims to support both the biological and bioinformatics community.
     • Funded by BBSRC Bioinformatics and e-Science & DTI
     • Started April 2003.
     • Collaboration with microbiologists and industrial partners
       • Providing use cases.

  6. Microbase: Functionality
     • A system that utilises Grid resources to automatically perform genome comparisons at nucleotide and protein levels
     • An information repository that:
       • maintains and exposes the results of these comparisons to users as a base level dataset
       • provides canned algorithms for analysis
     • A Grid-enabled high-performance environment to execute remote user-specified computations
     • Data integration with remote, Grid-enabled databases
       • e.g. Genomic, Metabolic, Protein Interaction, Gene Expression databases, etc.

  7. MicrobaseLite: A Prototype
     • The first prototype of the Microbase system
     • Automatically performs all-against-all genome comparisons and exposes the resulting datasets
     • Provides services for biologists to browse and query genome sequences and comparison results
     • Informs the specification of the full Microbase system and the derivation of use cases
     • Implemented using a component-based architecture with Web services interfaces
     • Also uses existing Grid technology: the myGrid Notification Service

  8. MicrobaseLite: Datasets
     • 170+ microbial genomes including
       • Bacteria, archaea, eukaryota
       • Held in the GenomePool component
     • Results of all-against-all nucleotide sequence comparison
       • Blastn, MUMmer
     • Results of all-against-all protein sequence comparison
       • Blastp, Ssearch, Promer
       • Held in the ComparisonPool component
     • Object-oriented data model of interspecies genome rearrangements
       • The OGRE module component (current research)
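
The size of an all-against-all dataset grows roughly with the square of the number of genomes. The sketch below enumerates the genome pairs for a pool of 170 genomes. The genome identifiers and the `ordered` flag are illustrative assumptions: whether MicrobaseLite runs each pair once or in both directions is not stated in the slides.

```python
from itertools import combinations, permutations


def all_against_all(genomes, ordered=False):
    """List every pair of distinct genomes.

    ordered=False: each pair appears once (A vs B only).
    ordered=True: both directions appear (A vs B and B vs A), which matters
    for tools whose results depend on which sequence is the query.
    """
    pairing = permutations if ordered else combinations
    return list(pairing(genomes, 2))


# Hypothetical identifiers standing in for the 170+ genomes in the pool.
genomes = [f"genome_{i:03d}" for i in range(170)]

print(len(all_against_all(genomes)))                # 14365
print(len(all_against_all(genomes, ordered=True)))  # 28730
```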

  9. MicrobaseLite: Architecture
     [Architecture diagram. Labels shown: Server Side; Client Side; User Tools; Genome Comparison Pool; Microbial Genome Pool; Client Proxies; Request Builder; Task Scheduler; Notification Service; Notification Proxy; External Notification; Internal Notification; Web Services Proxy; Protein Comparison; DNA Comparison; Response Receiver; Genome Loader; Post-processing; BIOSQL; Data Processing; Graphical Viewer; Comparison Database; Web Services Query; Query & Execution; Object Model Builder; Object-oriented Database; OGRE Module]

  10. MicrobaseLite: Microbial Genome Pool
     • Provides a Web / Grid service based information repository of microbial genomes
     • Maintains a database of 170+ microbial genomes
     • A Web service implementation of BioJava interfaces
     • Uses the myGrid Notification Service to notify registered clients of new genomes
     • Available for use now with a prototype API
     [Diagram. Labels shown: Microbial Genome Pool; Comparison Pool; Clients; Notification Service; External Notification; Internal Notification; Genome Loader; BIOSQL; Web Service API]
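
The notification flow described on this slide (registered clients are told when a new genome arrives) follows a publish/subscribe pattern. The sketch below illustrates that pattern only. All class, method, and topic names are hypothetical, and it does not use or reproduce the myGrid Notification Service API.

```python
class NotificationService:
    """In-memory stand-in for a notification service (hypothetical API)."""

    def __init__(self):
        self._subscribers = {}  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a callback to be invoked for every message on a topic."""
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        """Deliver a message to every callback registered for the topic."""
        for callback in self._subscribers.get(topic, []):
            callback(message)


class GenomePool:
    """Toy genome repository that announces each newly loaded genome."""

    def __init__(self, notifier):
        self._genomes = {}
        self._notifier = notifier

    def add_genome(self, genome_id, sequence):
        self._genomes[genome_id] = sequence
        self._notifier.publish("new_genome", genome_id)

    def get_genome(self, genome_id):
        return self._genomes[genome_id]


# A client (for example, a comparison pool) registers interest in new genomes.
received = []
notifier = NotificationService()
notifier.subscribe("new_genome", received.append)

pool = GenomePool(notifier)
pool.add_genome("example_genome_1", "atcggatcgtacgagcgatc")
print(received)  # ['example_genome_1']
```

Decoupling the pool from its clients in this way means new consumers can be added without changing the loader.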

  11. MicrobaseLite: Genome Comparison Pool
     • Retrieves genomes from the Microbial Genome Pool automatically on Notification
     • Executes a variety of genome comparison tools: Blast, MUMmer, Promer, MSPcrunch
     • Incorporates a Task Scheduler for parallel processing
     • Uses N1 Grid Engine (batch system) to dispatch comparison tasks to run on Linux clusters
     • Comparison outputs processed and stored into a relational database (MySQL).
     [Diagram. Labels shown: Genome Comparison Pool; Comparison Database; N1 Grid Engine; Protein & Nucleotide Comparison; Task Scheduler; Post-processing; Parallel Cluster(s)]
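
The scheduling idea on this slide, splitting an all-against-all run into independent pairwise tasks and dispatching them in parallel, can be sketched on a single machine. This is an illustration under stated assumptions: a local thread pool stands in for N1 Grid Engine, `compare_pair` is a placeholder that counts matching positions instead of invoking Blast or MUMmer, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def compare_pair(task):
    """Placeholder comparison: count identical positions over the shared length.

    A real system would launch an external comparison tool here instead.
    """
    (id_a, seq_a), (id_b, seq_b) = task
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return (id_a, id_b), matches


def run_all_against_all(genomes, max_workers=4):
    """Run every pairwise comparison in parallel and collect the results."""
    # One independent task per unordered pair of genomes.
    tasks = list(combinations(sorted(genomes.items()), 2))
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in task order.
        return dict(executor.map(compare_pair, tasks))


# Toy input: the DNA fragments from the earlier slide, plus a copy of the first.
genomes = {
    "g1": "atcggatcgtacgagcgatc",
    "g2": "atcccatcgaacgagcgatc",
    "g3": "atcggatcgtacgagcgatc",
}

results = run_all_against_all(genomes)
print(results[("g1", "g2")])  # 17
print(results[("g1", "g3")])  # 20
```

Because the tasks share no state, the same decomposition carries over to a batch system where each task becomes one submitted job.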

  12. Task Scheduler and scalability
     [Figure: Execution times of all-against-all comparisons with 10 microbial genomes (Blastp, Blastn, MSPcrunch, MUMmer and PROmer)]

  13. MicrobaseLite: User Tools
     • Demonstration graphical tools under development
     • Genome Browser allows users to view genomes, the comparison results and the results of canned algorithms
     • Deployed at client-side operating via Web services

  14. Vision for the full Microbase System
     • Continue to explore scalability issues using MicrobaseLite as a platform
       • Towards seamless scalability
       • Harnessing of remote clusters on demand
     • A system for the submission and enactment of remotely conceived code or workflows for user-defined comparative analysis
     • Investigating the integration of Taverna core to enact SCUFL workflows within Microbase

  15. Conclusions
     • Microbase aims to exploit Grid resources to provide a scalable system for microbial genome comparison
     • MicrobaseLite produced as a prototype and demonstrator application for the biologist/bioinformatician
     • Work now underway on the full Microbase - a system to support remotely conceived computations

  16. Acknowledgements
     • The Microbase Team:
       • Anil Wipat, Yudong Sun, Matthew Pocock, Keith Flanagan, Pete Lee, and Paul Watson
     • The Microbase User Requirements/Use case contributors
     • myGrid project (particularly Southampton and EBI)
     • The Industrial supporters: NonLinear Dynamics, NCIMB, Arrow Therapeutics, Angel Biotech, Complement Genomics, ACS Dobfar, AstraZeneca
     • See www.microbase.org.uk

  17. Microbial Genome Comparison: commonly applied at two levels
     • Protein level: all-against-all amino acid sequence comparisons between proteins
       (e.g. MCSAKMQTR.. compared with MSAKMPTR..)
     • DNA level: nucleotide sequence comparison of whole genomes
       (e.g. ..atcggatcgtacgagcgatc.. compared with ..atcccatcgaacgagcgatc..)

  18. OGRE: Object-oriented Genome REarrangements Model
     • A dataset that captures genomic rearrangements between microorganisms
     • Object-Oriented (OO) concepts and formalism are being used to classify the results of the nucleotide sequence comparison
     • An ontology and OO conceptual model are being developed to describe chromosomal rearrangements and to define objects that can represent them
     • Algorithms developed to recognise defined rearrangement features in nucleotide sequence comparison data
     • Objects made persistent in an OO database
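
As a rough illustration of representing rearrangements as objects, the sketch below wraps matched segments from a nucleotide comparison in small classes and labels each one with a toy rule. Everything here is an assumption made for illustration: the class names, the fields, and the classification rule (reverse-strand match means inversion, shifted forward match means translocation) are not taken from the OGRE model, whose ontology is not described in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchSegment:
    """One matched region reported by a nucleotide comparison (hypothetical fields)."""
    start_a: int           # start coordinate in genome A
    start_b: int           # start coordinate in genome B
    length: int            # length of the matched region
    reverse: bool = False  # True if the match lies on the reverse strand of B


@dataclass(frozen=True)
class Rearrangement:
    """A classified segment; 'kind' is a label produced by the toy rule below."""
    kind: str
    segment: MatchSegment


def classify(segment, expected_offset=0, tolerance=0):
    """Label a segment as an inversion, a translocation, or collinear."""
    offset = segment.start_b - segment.start_a
    if segment.reverse:
        kind = "inversion"
    elif abs(offset - expected_offset) > tolerance:
        kind = "translocation"
    else:
        kind = "collinear"
    return Rearrangement(kind, segment)


# Made-up coordinates, not real comparison output.
segments = [
    MatchSegment(0, 0, 500),
    MatchSegment(500, 500, 300, reverse=True),
    MatchSegment(800, 2000, 400),
]
print([classify(s).kind for s in segments])
# ['collinear', 'inversion', 'translocation']
```

Immutable value objects like these can be stored and queried directly, which is the kind of use an object-oriented database is suited to.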

  19. MicrobaseLite: OGRE Module
     • Performs object-oriented analysis and storage of genome rearrangements
     • An OO dataset captures genomic rearrangements revealed through nucleotide sequence comparison
     • Made persistent in an OO database
     • Provides Web services interface for external users to query and analyse the OO dataset
     [Diagram. Labels shown: Comparison Pool; Web Services Query & Execution; Object Model Builder; Object-oriented Database; OGRE Module]
