Taverna the story from up-above

Taverna the story from up-above. Antoon Goderis The University of Manchester, UK. http://www.mygrid.org.uk/taverna http://www.omii.ac.uk. DART workshop, Brisbane, Australia, 14 December 2006. Overview. The situation in –omics Creating new biology using Taverna Taverna Key traits


Presentation Transcript


  1. Taverna: the story from up-above Antoon Goderis, The University of Manchester, UK http://www.mygrid.org.uk/taverna http://www.omii.ac.uk DART workshop, Brisbane, Australia, 14 December 2006

  2. Overview • The situation in –omics • Creating new biology using Taverna • Taverna • Key traits • Features on the OMII roadmap • Including today’s release

  3. Bioinformaticians & co.

  4. Open environment: Data, Data, Data • National Center for Biotechnology Information (USA) • EBI (Cambridge, UK) • Tokyo, Japan • SeqHound • SRS

  5. 12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt 12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt 12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct 12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt 12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt 12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt 12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg 12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga 12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc 12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa 12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa

  6. The situation in {genomics, transcriptomics, proteomics, metabolomics ..} • Lots of data • Lots of parameters to choose • An analysis takes a long time • The analysis services are unreliable • Lots of analysis steps • Need to record and explain your steps

  7. Enter workflows • Lots of data [high throughput] • Lots of parameters to choose [best practice] • An analysis takes a long time [long running] • The analysis services are unreliable [fault tolerance] • Lots of analysis steps [data and control flow] • Need to record and explain your steps [provenance]

  8. 12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt 12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt 12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct 12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt 12421 taggtgactt gcctgttttt ttttaattgg Workflow-based middleware

  9. myGrid • myGrid http://www.mygrid.org.uk • UK e-Science pilot project since 2001 • Part of the Open Middleware Infrastructure Institute UK • Build middleware for Life Scientists that enables them to undertake in silico experiments and share those experiments and their results. • Individual scientists, in under-resourced labs, who use other people’s applications. • Open source. • Workflows & Semantic Technologies for metadata management. • Data flows. Ad hoc & exploratory

  10. Overview • The situation in -omics • Creating new biology using Taverna • Taverna • Key traits • Features on the OMII roadmap • Including today’s release

  11. Microarray + QTL • Phenotypic response investigated using microarray, in the form of expressed genes, or evidence provided through QTL mapping • Genes captured in the microarray experiment and present in the QTL region [Figure labels: Phenotype, Genotype, 200, ?] [Andy Brass, Steve Kemp, Paul Fisher, 2006]

  12. Key: A – Retrieve genes in QTL region B – Annotate genes with external database Ids C – Cross-reference Ids with KEGG gene ids D – Retrieve microarray data from MaxD database E – For each KEGG gene get the pathways it’s involved in F – For each pathway get a description of what it does G – For each KEGG gene get a description of what it does [Andy Brass, Steve Kemp, Paul Fisher, 2006]
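
The lettered steps in the key can be read as a data flow from a QTL region to annotated pathways. The following is a minimal sketch of steps A, B, C, E and F only, assuming in-memory lookup tables in place of the real remote services. Every function name and identifier in it is hypothetical and not taken from the original workflow.

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins for the remote services.
# Every identifier below is made up for illustration.
QTL_GENES: Dict[str, List[str]] = {"qtl_region_1": ["geneA", "geneB"]}
EXTERNAL_IDS = {"geneA": "ext:1", "geneB": "ext:2"}
KEGG_IDS = {"ext:1": "kegg:g1", "ext:2": "kegg:g2"}
PATHWAYS = {"kegg:g1": ["path:p1"], "kegg:g2": ["path:p1", "path:p2"]}
PATHWAY_INFO = {"path:p1": "example pathway one", "path:p2": "example pathway two"}


def run_qtl_workflow(region: str) -> Dict[str, dict]:
    """Map each pathway to its description and the KEGG genes that hit it."""
    genes = QTL_GENES[region]                     # A: genes in the QTL region
    external = [EXTERNAL_IDS[g] for g in genes]   # B: annotate with external ids
    kegg_genes = [KEGG_IDS[e] for e in external]  # C: cross-reference to KEGG ids
    result: Dict[str, dict] = {}
    for gene in kegg_genes:
        for pathway in PATHWAYS[gene]:            # E: pathways for each gene
            entry = result.setdefault(
                pathway,
                {"description": PATHWAY_INFO[pathway], "genes": []},  # F
            )
            entry["genes"].append(gene)
    return result
```

Steps D (microarray data retrieval) and G (gene descriptions) would slot in as further lookups of the same shape.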

  13. Result • Captured the pathways returned by QTL and Microarray workflows over the MaxD microarray database • Identified a pathway whose correlated gene (Daxx) is believed to play a role in trypanosomiasis resistance. • Manual analysis of the microarray and QTL data had failed to identify this gene as a candidate. [Andy Brass, Steve Kemp, Paul Fisher, 2006]

  14. Trichuris muris (mouse whipworm) infection • Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite. • Manual experimentation: two-year study of candidate genes, processes unidentified • Workflows: the workflow from the trypanosomiasis cattle experiment was reused without change. • Analysis of the data by a biologist found the processes in a couple of days. [Joanne Pennock, Paul Fisher, 2006]

  15. Changing scientific practice • Systematic and comprehensive automation. • Eliminated user bias and premature filtering of datasets and results leading to single sided, expert-driven hypotheses • Dry people hypothesise, wet people validate. • “make sense of this data” -> “does this make sense?” • Workflow factories. • Different dataset, different result • Accurate provenance.

  16. Overview • The situation in -omics • Creating new biology using Taverna • Taverna • Key traits • Features on the OMII roadmap • Including today’s release

  17. User Uptake • ~25000 downloads • Systems biology • Proteomics • Gene/protein annotation • Microarray data analysis • Medical image analysis • Heart simulations • High throughput screening • Phenotypical studies • Plants, Mouse, Human • Astronomy • Dilbert Cartoons

  18. Finding and Sharing Tools 3rd Party Applications and Portals Taverna Workbench myExperiment DAS Utopia Feta Workflow Enactor Clients Workflow enactor Service Management LSIDs Provenance log Metadata Default Data Store Custom Store Results Management KAVE BAKLAVA

  19. Taverna workbench

  20. 3000+ services • Open domain services and resources, Third party. • Enforce NO common data model. • No common typing, Missing metadata. • Soaplab • InstantSoap

  21. Services Landscape

  22. User Interaction • Allows a workflow to call out to an expert human user • E.g. Used to embed the Artemis annotation editor within an otherwise automated genome annotation pipeline [University of Bergen]

  23. Tools, Tools, Tools Pedro Annotation tool Feta Search tool

  24. Capture and Curation Effort Ontology and Annotation Curation Team Franck Tanoh and Katy Wolstencroft Community Scientists Community Service Providers

  25. Workflow Execution • Taverna Workbench application • Scufl model (Simple Conceptual Unified Flow Language): nested workflows, automatic iterations, best-guess data type handling • Workflow enactor: processors, shielding & extensible plug-ins • Processor types: BioMOBY, BioMart, SeqHound, plain Web Service, Soaplab, local Java app, WF Enactor, WSRF, Beanshell
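
The "automatic iterations" trait means that a processor written for a single item can be fed a list and is then applied to each element. The sketch below is a simplified illustration of that idea in Python. The function name is hypothetical, and it does not reproduce how Taverna combines several list inputs.

```python
from typing import Any, Callable


def iterate_implicitly(processor: Callable[[Any], Any], value: Any) -> Any:
    """Apply a single-item processor to a value of any list depth.

    A plain item is passed straight to the processor; a list is walked
    element by element, so nested lists keep their shape in the output.
    """
    if isinstance(value, list):
        return [iterate_implicitly(processor, item) for item in value]
    return processor(value)
```

For example, `iterate_implicitly(str.upper, ["acat", ["ttct"]])` applies `str.upper` to each string and preserves the nesting.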

  26. Service incompatibility • Fix up the services to be compatible or…. • Shims – libraries of adapters. • Automated data type matching using reasoning over a mismatch and service ontology [Duncan Hull, myGrid; Khalid Belhajjame, ISPIDER]
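
A shim is a small adapter placed between two services whose data formats do not match. As a rough sketch, assuming a hypothetical registry keyed by (output type, input type) and made-up type names, a shim library could look like this in Python:

```python
from typing import Callable, Dict, Tuple


def fasta_to_raw(fasta: str) -> str:
    """Shim: drop FASTA header lines and join the remaining sequence lines."""
    lines = [line.strip() for line in fasta.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


# Hypothetical shim library: (producer output type, consumer input type) -> adapter.
SHIMS: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("fasta", "raw_sequence"): fasta_to_raw,
}


def connect(output_type: str, input_type: str, value: str) -> str:
    """Pass a value between two services, inserting a shim on a type mismatch."""
    if output_type == input_type:
        return value  # types already agree, no shim needed
    shim = SHIMS.get((output_type, input_type))
    if shim is None:
        raise LookupError(f"no shim from {output_type} to {input_type}")
    return shim(value)
```

Here the lookup is a plain dictionary. The slide describes choosing shims by reasoning over an ontology, which this sketch does not attempt.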

  27. Shim identification • Mismatch detection

  28. Service failure? • Most services are owned by other people • No control over service failure • Some are research level • Workflows only as good as the services they connect. • Notify failures • Instigate retries • Set criticality • Substitute services
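
The four responses listed on this slide (notify, retry, criticality, substitution) can be combined in one invocation wrapper. The following Python sketch is an illustration under assumed names and defaults, not Taverna's enactor code: services are plain callables, and the first entry in the list is the preferred one.

```python
from typing import Any, Callable, Optional, Sequence


class CriticalServiceFailure(Exception):
    """Raised when a critical step fails on every service and every retry."""


def invoke_with_fallback(
    services: Sequence[Callable[[Any], Any]],
    payload: Any,
    retries: int = 2,
    critical: bool = True,
    on_failure: Optional[Callable[[str, int, Exception], None]] = None,
) -> Any:
    """Try each service in turn, retrying each one before substituting the next."""
    for service in services:                    # substitute services, in order
        name = getattr(service, "__name__", repr(service))
        for attempt in range(1 + retries):      # first call plus retries
            try:
                return service(payload)
            except Exception as exc:
                if on_failure is not None:      # notify failures
                    on_failure(name, attempt, exc)
    if critical:                                # criticality decides the outcome
        raise CriticalServiceFailure("all services failed")
    return None                                 # non-critical: carry on without a result
```

A non-critical step returns `None` so the rest of the workflow can continue, while a critical step stops the run with an exception.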

  29. Provenance Collection • Observes events from the workflow engine • Populates an RDF triple store with information from these events • Browse interface • Simple browser replicates Taverna’s existing result and status browser • Graphical browser • ProQA Query API [Zhao et al 07 provenance challenge paper] [Figure: RDF provenance graph. Nodes include data items (urn:data1, urn:data2, urn:data:3, urn:data12, urn:data:f1, urn:data:f2, urn:hit1 … urn:hit50) and invocations (urn:BlastNInvocation3, urn:compareinvocation3, urn:invocation5). Edge labels: [input], [output], [instanceOf], [performsTask], [contains], [hasHits], [similar_sequence_to], [directlyDerivedFrom], [distantlyDerivedFrom], [type], [hasName]. Other labels: SwissProt_seq, Sequence_hit, Blast_report, Find similar sequence, Missed sequence, New sequence, DatumCollection, LSDatum. Legend: Services, Data generated by services/workflows, Concepts, Properties, literals.]
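
Provenance collection as described here turns engine events into subject-predicate-object triples. A minimal sketch in Python follows, assuming an in-memory set in place of a real RDF triple store. The class and method names are hypothetical; the predicate names and URNs are borrowed from the slide's figure for flavour only.

```python
from typing import List, Optional, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class ProvenanceStore:
    """Toy triple store fed by workflow-engine events."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def on_invocation(self, invocation: str, task: str,
                      inputs: Sequence[str], outputs: Sequence[str]) -> None:
        """Record one service invocation event as a handful of triples."""
        self.triples.add((invocation, "performsTask", task))
        for item in inputs:
            self.triples.add((invocation, "input", item))
        for item in outputs:
            self.triples.add((invocation, "output", item))

    def query(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return triples matching the given pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )
```

A browser or query API can then be layered on `query`, for instance by asking for every triple whose predicate is `output`.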

  30. Provenance Tracking • From which Ensembl gene does pathway mmu004620 come?

  31. Workflows over Results • Automatically backtrack through the data provenance graph [Figure: Entrez, Pathway_id, KEGG_id, Uniprot and Ensembl_gene_id nodes connected by dF edges]
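
Backtracking through a provenance graph amounts to following "derived from" edges from a result back to its sources. The Python sketch below illustrates this with a hypothetical edge table. Apart from the pathway id quoted on the previous slide, every identifier is made up, and the order of the chain is an assumption.

```python
from typing import Dict, List

# Hypothetical "derived from" edges: each item maps to the items it came from.
DERIVED_FROM: Dict[str, List[str]] = {
    "pathway:mmu004620": ["kegg:example_gene"],
    "kegg:example_gene": ["entrez:example_id"],
    "entrez:example_id": ["uniprot:example_acc"],
    "uniprot:example_acc": ["ensembl:example_gene"],
}


def backtrack(item: str, derived_from: Dict[str, List[str]], prefix: str) -> List[str]:
    """Collect every ancestor of `item` whose identifier starts with `prefix`."""
    found: List[str] = []
    seen = set()
    stack = [item]
    while stack:                       # depth-first walk over provenance edges
        current = stack.pop()
        if current in seen:            # guard against revisiting shared ancestors
            continue
        seen.add(current)
        if current != item and current.startswith(prefix):
            found.append(current)
        stack.extend(derived_from.get(current, []))
    return sorted(found)
```

Calling `backtrack("pathway:mmu004620", DERIVED_FROM, "ensembl:")` answers the question on the previous slide for this toy graph.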

  32. A workflow marketplace

  33. webTaverna GUI - main

  34. Overview • The situation in -omics • Creating new biology using Taverna • Taverna • Key traits • Features on the OMII roadmap • Including today’s release

  35. myGrid Alliance Source-forge community Ingest OMII-UK Release myGrid Release myGrid Pre-release Evaluation Software Engineering Quality & Test OMII Software Engineering Quality & Test Software Engineering XP Prioritise & Plan Applications & Professional Services Production Conservatives Early adopters Pioneers Early adopters Pioneers Pioneers

  36. Who are the OMII Users? Different scientific/research domains End Users Different activities Application Developers Increasing variation in requirements with the scientific domain. Service and Middleware Developers Middleware Deployers Systems Administrators

  37. Taverna is now part of OMII-UK • Taverna 1.5 – Today! • Taverna 1.6 • myExperiment

  38. Taverna 1.5 • Integrated provenance • Raven release mechanism to simplify updates for the user • +/- 300 semantic annotations for core services • Patterns for using proxies for bulk data transactions • Redeveloped plug in and enactor framework, improved iteration events, data management

  39. Taverna 1.5 • Integrated provenance

  40. Taverna 1.5 • Integrated provenance • Raven release mechanism to simplify updates for the user

  41. Taverna 1.5 • Integrated provenance • Raven release mechanism to simplify updates for the user • +/- 300 semantic annotations for core services. Example annotations: • Add_ncbi_to_string: beanshell script, need to ask Paul for more details. Input: Output: • Kegg_gene_ids_all_species (bconv): converts external IDs to KEGG IDs [mapping]. string: External ID, e.g. NCBI ID [Genebank_GI]. return: KEGG gene ID [KEGG_record_id] • Get_pathways_by_genes: Search all pathways which include all the given genes [Searching]. Input: List of KEGG gene ids [KEGG_gene_id]. Output: Return a list of pathway_id of the specified KEGG gene ids • Merge_pathways: Stringlist, Concatenated. Workflow description: This workflow takes in Entrez gene ids then adds the string "ncbi-geneid:" to the start of each gene id. These gene ids are then cross-referenced to KEGG gene ids. Each KEGG gene id is then sent to the KEGG pathway database and its relevant pathways returned.
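
The two local steps in this workflow description, prefixing Entrez ids and merging pathway lists, are simple string and list operations. The original runs them as Beanshell scripts whose source is not shown, so the Python below is only a guess at their behaviour, with hypothetical function names.

```python
from typing import List


def add_ncbi_prefix(entrez_ids: List[str]) -> List[str]:
    """Prepend "ncbi-geneid:" to each Entrez gene id, as the description states."""
    return [f"ncbi-geneid:{gene_id}" for gene_id in entrez_ids]


def merge_pathways(pathway_lists: List[List[str]]) -> List[str]:
    """Concatenate per-gene pathway lists into one flat list (no de-duplication)."""
    return [pathway for pathways in pathway_lists for pathway in pathways]
```

Whether the original merge step removes duplicates is not stated; this sketch keeps them.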

  42. Taverna 1.5 • Integrated provenance • Raven release mechanism to simplify updates for the user • +/- 300 semantic annotations for core services • Patterns for using proxies for bulk data transactions • Redeveloped plug in and enactor framework, improved iteration events, data management

  43. Taverna 1.6 • Due out Summer 2007 • Revised enactment core • Native support for long running workflows • Data proxy to deal with bulk data transactions • Improved service discovery and provenance management

  44. Obtaining Taverna • Taverna is available under the LGPL from our project site on Sourceforge.net • http://taverna.sourceforge.net • Win32, Solaris / Linux & OS-X • Includes online and downloadable user manual, examples etc. • Support via project mailing lists

  45. Conclusions • See plans for Taverna 2.0 on the myGrid wiki • Taverna development is user-driven • Please keep in touch and tell us what you would like to see via the myGrid mailing lists: Taverna Users, Taverna Hackers • Taverna http://taverna.sourceforge.net • myGrid http://www.mygrid.org.uk • OMII-UK http://www.omii.ac.uk

  46. Acknowledgements • Phase1 myGrid researchers, Phase2 OMII-UK, myGrid Research Team • Peter Li, Paul Fisher, Andy Brass, Robert Stevens, Mark Wilkinson • EPSRC, Wellcome Foundation, EU
