Analysing African and European cattle with Taverna 2.2

Analysing African and European cattle with Taverna 2.2. Stuart Owen Based on the work by : Professor Andy Brass and Mohammad Khodadadi University of Manchester, UK Harry Noyes and Steve Kemp University of Liverpool, UK BOSC2010 – Boston.


Presentation Transcript


  1. Analysing African and European cattle with Taverna 2.2 • Stuart Owen • Based on the work by: • Professor Andy Brass and Mohammad Khodadadi • University of Manchester, UK • Harry Noyes and Steve Kemp • University of Liverpool, UK • BOSC2010 – Boston.

  2. Analysing African and European cattle with Taverna 2.2 A bioinformatics case study demonstrating the use of the Taverna 2 workflow system. This is a snapshot of some exciting science that is currently in progress.

  3. Analysing African and European cattle with Taverna 2.2 • 10,000 years of separation • African livestock adaptations: • Hardier • Better disease resistance • Potential outcomes: • Food security • Understanding resistance • Understanding environmental conditions • Drought • Parasites • Understanding diversity • http://news.bbc.co.uk/1/hi/science_and_environment/10403254.stm • http://www.sciencemag.org/cgi/content/full/328/5986/1640

  4. Workflow and phases MAP FILTER ANALYSIS

  5. Workflow and phases • Input SNP file • Populate DB with start SNPs and resource version numbers • Lift-over: maps between UMD3 and BTA4 cow assemblies • Exon positions from ENSEMBL • Find SNPs in exon regions • PolyPhen to mark “dangerous” SNPs
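The "populate DB" step can be sketched as follows. This is a minimal illustration only, assuming an SQLite database; the table names, column names, and function names are hypothetical and are not taken from the presentation, and the version strings passed in would be whatever the run actually used.

```python
import sqlite3


def init_db(conn):
    """Create hypothetical tables: one row per run, many start SNPs per run."""
    conn.executescript(
        """
        CREATE TABLE runs (
            run_id           INTEGER PRIMARY KEY,
            ensembl_version  TEXT NOT NULL,
            liftover_version TEXT NOT NULL,
            polyphen_version TEXT NOT NULL
        );
        CREATE TABLE start_snps (
            run_id     INTEGER NOT NULL REFERENCES runs(run_id),
            chromosome TEXT    NOT NULL,
            position   INTEGER NOT NULL,
            allele     TEXT    NOT NULL
        );
        """
    )


def create_run(conn, ensembl_version, liftover_version, polyphen_version):
    """Record the resource versions for one run and return its run ID."""
    cur = conn.execute(
        "INSERT INTO runs (ensembl_version, liftover_version, polyphen_version) "
        "VALUES (?, ?, ?)",
        (ensembl_version, liftover_version, polyphen_version),
    )
    return cur.lastrowid


def add_start_snps(conn, run_id, snps):
    """Store the starting SNPs, given as (chromosome, position, allele) tuples."""
    conn.executemany(
        "INSERT INTO start_snps (run_id, chromosome, position, allele) "
        "VALUES (?, ?, ?, ?)",
        [(run_id, chrom, pos, allele) for chrom, pos, allele in snps],
    )
```

Keying every SNP row by the run ID means any later result can be traced back to the exact resource versions it was produced with.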

  6. Little more about the phases … • Input SNP file is the result of 15-fold average coverage of an entire Boran cow • 11.9 million SNPs described. • Resulting from Next Generation Sequencing. • All initial data is stored within a database, mapped by a runID to the versions of ENSEMBL, LiftOver and PolyPhen. • LiftOver provides a mapping between 2 different reference cow assemblies: • UMD3 : more accurate assembly • BTA4 : better annotated and ENSEMBL friendly • Store BTA4 position, chromosome and allele in database • Filter out, but store, results where there is a mismatch between the bases.
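The "filter out, but store" step can be sketched as a simple partition. This is an illustration under stated assumptions: it assumes the mismatch in question is between the base recorded at the UMD3 position and the base at the lifted-over BTA4 position, and the record type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiftedSnp:
    """Hypothetical record for one SNP after lift-over from UMD3 to BTA4."""
    snp_id: str
    chromosome: str
    bta4_position: int
    umd3_base: str
    bta4_base: str


def partition_by_base_match(snps):
    """Split SNPs into (matched, mismatched) lists.

    Nothing is discarded: mismatches are returned so they can still be stored,
    while only the matched list continues to the next phase.
    """
    matched, mismatched = [], []
    for snp in snps:
        if snp.umd3_base.upper() == snp.bta4_base.upper():
            matched.append(snp)
        else:
            mismatched.append(snp)
    return matched, mismatched
```

Returning both lists, rather than dropping the mismatches, keeps the filtered records available for later inspection.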

  7. … Little more about the phases • ENSEMBL is used to retrieve annotations about the SNPs : http://www.ensembl.org/ • For all the SNPs that have the same base, we go over all the cow exons in ENSEMBL and check whether each SNP falls within any of them (exon start < SNP position < exon end); we also store geneID, allele, associated gene names, and biotype. • Filter out, but store, ENSEMBL/BTA4 mismatches. • Second phase fetches the consequence according to the BTA4 positions. • From this information a file is generated for PolyPhen, for all SNPs whose consequence is non-synonymous. • A local instance of PolyPhen is queried using a file generated from the ENSEMBL annotations to produce an indication of the level to which a SNP changes the protein. • Outcome is an annotated database of ~20,000 “interesting” SNPs
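The exon test stated on this slide (exon start < SNP position < exon end) can be sketched as below. The `Exon` type and the function name are hypothetical, and the exon list is assumed to have been fetched from ENSEMBL beforehand; the sketch makes no ENSEMBL calls itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon record holding only the fields the test needs."""
    exon_id: str
    gene_id: str
    chromosome: str
    start: int
    end: int


def exons_containing(chromosome, position, exons):
    """Return every exon on the same chromosome that strictly contains the position.

    The strict inequalities follow the slide's wording, so a SNP lying exactly
    on an exon's start or end coordinate is not counted as a match.
    """
    return [
        exon
        for exon in exons
        if exon.chromosome == chromosome and exon.start < position < exon.end
    ]
```

A linear scan like this is easy to read but slow for millions of SNPs; a real run would more likely sort the exons per chromosome or use an interval index.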

  8. [Diagram labels: 11.9 million SNPs • LiftOver • ENSEMBL • PolyPhen • Results • 50,000 annotated SNPs • 20,000 annotated SNPs + provenance] Packaged as a sharable virtual machine image

  9. Packaged as a sharable virtual machine image • LiftOver, Taverna, PolyPhen and the workflow are packaged as a virtual machine image. • Everything (except ENSEMBL) is run locally • Full cow analysis takes 2 days; previous attempts would have taken an estimated 3 months for the PolyPhen phase alone. • Results and experiment can be distributed and shared as a complete package • Re-use • Repeatable • Reproducible • Future plans to deploy the image on “The Cloud”

  10. Packaged as a sharable virtual machine image [Diagram labels: Boran Cow • Sheko Cow • N’Dama Cow • Etc … • MAP • FILTER • ANALYSIS • ENSEMBL • Annotated DB]

  11. Highlights of new Taverna 2.2 features • Officially released last Wednesday – July 7th 2010 • Loading and sharing of service sets • Ability to load and edit workflows that contain services that are offline • Reporting on the state of the workflow • Tabular representation of a workflow run • Retrying and parallelization of service calls • Consistent representation of the intermediate and workflow results • Pause/resume/cancel of a running workflow • Command line tool that allows you to execute workflows outside of the workbench. • Faster, Better, Easier
