
Taverna and myExperiment: Designing, Exchanging and Sharing of Scientific Workflows

Katy Wolstencroft, University of Manchester.


Presentation Transcript


  1. Taverna and myExperiment: Designing, Exchanging and Sharing of Scientific Workflows Katy Wolstencroft University of Manchester

  2. Connecting things Together • Data Resources • Genome databases • Kinetic/metabolite data • Analysis tools • Sequence alignment • Similarity searching • Pattern matching • Knowledge Resources • Ontologies • Controlled vocabularies

  3. What is a Workflow? A mechanism for connecting things together • Workflows provide a general technique for describing and enacting a process • A workflow describes what you want to do, not how you want to do it • A simple language specifies how bioinformatics processes fit together • Processes are represented as web services (Diagram labels: Sequence, RepeatMasker Web Service, GenScan Web Service, Blast Web Service, Predicted Genes out)
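
The idea on slide 3 can be sketched in Python, assuming each service can be modelled as a plain function. The three step functions below are hypothetical stand-ins invented for illustration; they are not the real RepeatMasker, GenScan or BLAST web services, and this is not how Taverna itself represents workflows.

```python
from functools import reduce
from typing import Callable, List

Step = Callable[[str], str]


def mask_repeats(sequence: str) -> str:
    """Hypothetical stand-in for a repeat-masking service: lowercase bases become N."""
    return "".join("N" if base.islower() else base for base in sequence)


def predict_genes(sequence: str) -> str:
    """Hypothetical stand-in for gene prediction: keep the longest unmasked stretch."""
    return max(sequence.split("N"), key=len)


def annotate(gene: str) -> str:
    """Hypothetical stand-in for a similarity search: attach a simple label."""
    return f"{gene} (length {len(gene)})"


def run_workflow(steps: List[Step], data: str) -> str:
    """Enact the workflow: feed each step's output into the next step."""
    return reduce(lambda value, step: step(value), steps, data)


# The workflow itself is only the ordered list of steps (the "what");
# run_workflow supplies the "how".
WORKFLOW: List[Step] = [mask_repeats, predict_genes, annotate]
```

Calling `run_workflow(WORKFLOW, some_sequence)` enacts the steps in order. Swapping or reordering entries in `WORKFLOW` changes the experiment without touching the enactment code.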

  4. What is a workflow? • Business Process workflows • Tasks, Schedules, dependencies (on staff time), and costs • Scientific Workflows – on in silico data • Data throughput, dependencies (on analysis results) • Input, algorithm, output • Flow of information, scheduling of order, collection of results, intermediate results and provenance • High level description of your experiment • Workflow is the model of the experiment • Methods section in your publication • Workflow can be shared and reused

  5. Kepler, Ptolemy II, Triana, BPEL, Taverna

  6. Taverna • Open source and extensible (Screenshot labels: Workflow diagram, Available services, Tree view of workflow structure)

  7. What is a web service? • NOT the same as services on the web (i.e. web forms) • Web services support machine-to-machine interaction over a network

  8. Web Evolution (figure) • Technology: TCP/IP → HTML → XML • Connectivity → Presentation → Programmability • Innovation: FTP, E-mail, Gopher → Web Pages → Web Services • Browse the Web → Program the Web Taken from: http://www.softstar-inc.com/

  9. How do you use Web Services? • SOAP (Simple Object Access Protocol) • An XML protocol for passing messages • WSDL (Web Services Description Language) • A machine-readable description of the operations supported • Normally transferred over HTTP
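
To make "an XML protocol for passing messages" concrete, here is a minimal sketch that builds and reads a SOAP 1.1 envelope using only the Python standard library. The operation name `runBlast`, the namespace `http://example.org/blast` and the parameter names are assumptions made up for illustration; a real service's operations and message shapes come from its WSDL. Nothing is sent over the network.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_envelope(operation: str, service_ns: str, params: dict) -> bytes:
    """Wrap one operation call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def read_operation(message: bytes) -> str:
    """Return the local name of the operation carried in the SOAP body."""
    body = ET.fromstring(message).find(f"{{{SOAP_NS}}}Body")
    return body[0].tag.split("}", 1)[1]


message = build_envelope(
    "runBlast",                  # hypothetical operation name
    "http://example.org/blast",  # hypothetical service namespace
    {"program": "blastp", "sequence": "MKTAYIAKQR"},
)
```

In practice the resulting bytes would be the body of an HTTP POST to the service endpoint, and a workflow system generates such messages from the WSDL so the user never writes them by hand.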

  10. Who Provides the Services? • Open domain services and resources • Taverna accesses 3500+ services • Third party – we don’t own them – we didn’t build them • All the major providers • NCBI, DDBJ, EBI … • Enforce NO common data model. • Quality Web Services considered desirable

  11. What types of service? • WSDL Web Services • BioMart • R-processor • BioMoby • Soaplab • Local Java services • Beanshell • Workflows • Coming soon.....REST, Matlab......?

  12. A Collection of Components • Discover and reuse services (Feta) • Share, discover and reuse workflows • Create and run workflows • Manage the metadata needed and generated (RDF, OWL)

  13. What do Scientists use Taverna for? • Data gathering, annotation and model building • Data analysis from distributed tools • Data mining and knowledge management • Data curation and warehouse population • Parameter sweeps and simulation Users from: Systems Biology, Proteomics, Sequence analysis, Protein structure prediction, Gene/protein annotation, Microarray data analysis, QTL studies, Cheminformatics, Medical image analysis, Public health care epidemiology, Heart model simulation, Phenotype studies, Phylogeny, Statistical analysis, Pharmacogenomics, Text mining, Astronomy, Music, Meteorology

  14. Taverna: Selected Successful Cases of Adoption • Originally designed to support bioinformatics, now expanded into new areas

  15. Annotation Pipelines • Genome annotation pipelines • Bergen Center for Computational Science – Gene Prediction in Algal Viruses, a case study. • Workflow assembles evidence for predicted genes / potential functions • Human expert can ‘review’ this evidence before submission to the genome database • Data warehouse pipelines • e-Fungi – model organism warehouse • ISPIDER – proteomics warehouse • Annotating the up/down regulated genes in a microarray experiment

  16. Building models and knowledge management • SBML population • Comparing models and experimental data • Mining text resources and building knowledge models

  17. Systems Biology Model Construction [Peter Li, Doug Kell] Automatic reconstruction of genome-scale yeast metabolism from distributed data in the life sciences to create and manipulate Systems Biology Markup Models.

  18. LibSBML Integration • API consumer used to integrate libSBML directly into Taverna • "Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data". Peter Li, Juan I. Castrillo, Giles Velarde, Ingo Wassink, Stian Soiland-Reyes, Stuart Owen, David Withers, Tom Oinn, Matthew R. Pocock, Carole A. Goble, Stephen G. Oliver, Douglas B. Kell. Submitted to BMC Bioinformatics

  19. Data Analysis Pipelines • Access to local and remote analysis tools • You start with your own data / public data of interest • You need to analyse it to extract biological knowledge

  20. Trichuris muris • Mouse whipworm infection: a parasite model of the human parasite Trichuris trichiura • Understanding Phenotype: comparing resistant vs susceptible strains (microarrays) • Understanding Genotype: mapping quantitative traits (classical genetics, QTL) Joanne Pennock, Richard Grencis, University of Manchester

  21. Trichuris muris • Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite. • Manual experimentation: Two year study of candidate genes, processes unidentified Joanne Pennock, Richard Grencis University of Manchester

  22. Trichuris muris • Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite. • Manual experimentation: Two year study of candidate genes, processes unidentified • JO IS A LAB BIOLOGIST • JO HAS NEVER BUILT A WORKFLOW Joanne Pennock, Richard Grencis University of Manchester

  23. Sleeping Sickness in African Cattle • Caused by infection by a parasite (Trypanosoma brucei) • Some cattle breeds are more resistant than others • What are the differences between resistant and susceptible cattle? • Can we breed cattle resistant to infection? Steve Kemp, Andy Brass, Paul Fisher. Fisher et al. (2007) A systematic strategy for large-scale analysis of genotype phenotype correlations: identification of candidate genes involved in African trypanosomiasis. Nucleic Acids Res. 35(16):5625-33. http://www.genomics.liv.ac.uk/tryps/trypsindex.html

  24. Why was the Workflow Approach Successful? • Workflows are protocols – they can be reused or repurposed • Workflow analysed each piece of data systematically • Eliminated user bias and premature filtering of datasets and results leading to single sided, expert-driven hypotheses • The size of the QTL and amount of the microarray data made a manual approach impractical • Workflows capture exactly where data came from and how it was analysed • Workflow output produced a manageable amount of data for the biologists to interpret and verify • “make sense of this data” -> “does this make sense?”

  25. Sharing Experiments • Taverna supports the in silico experimental process for individual scientists • How do you share your results/experiments/experiences with your • Research group • Collaborators • Scientific community

  26. Just Enough Sharing…. • myExperiment can provide a central location for workflows from one community/group • myExperiment allows you to say • Who can look at your workflow • Who can download your workflow • Who can modify your workflow • Who can run your workflow
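
The four questions on slide 26 amount to a per-workflow access policy. The sketch below is one hypothetical way to model that in Python; the class, the action names and the users `alice` and `bob` are invented for illustration and do not describe myExperiment's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# The four kinds of access named on the slide.
ACTIONS = ("view", "download", "modify", "run")


@dataclass
class SharingPolicy:
    """Hypothetical per-workflow policy: each action maps to the users allowed to do it."""
    owner: str
    allowed: Dict[str, Set[str]] = field(
        default_factory=lambda: {action: set() for action in ACTIONS}
    )

    def grant(self, user: str, action: str) -> None:
        """Give one user permission for one action."""
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self.allowed[action].add(user)

    def can(self, user: str, action: str) -> bool:
        """The owner may always do everything; others need an explicit grant."""
        return user == self.owner or user in self.allowed.get(action, set())


policy = SharingPolicy(owner="alice")
policy.grant("bob", "view")
policy.grant("bob", "run")
```

Here `bob` can look at and run the workflow but cannot download or modify it, which is the "just enough sharing" idea: each kind of access is granted separately.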

  27. Ownership and Attribution The most important aspect of myExperiment - Designed by scientists

  28. Packs • Packs allow you to collect different items together, like you might with a "wish list" or "shopping basket" • You can collect internal things (such as workflows, files and even other packs) as well as link to things outside myExperiment • Your packs can then be shared, tagged, discovered and discussed easily on myExperiment

  29. myExperiment Plugin in Taverna Bringing myExperiment to the Taverna User

  30. Running Workflows Through myExperiment: Taverna Remote Execution (T-REX)

  31. All accepted Friendships including accepted-at time (SIOC: Semantically-Interlinked Online Communities)
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX myexp: <http://rdf.myexperiment.org/ontology#>
      PREFIX sioc: <http://rdfs.org/sioc/ns#>
      SELECT ?friend1 ?friend2 ?acceptedat
      WHERE {
        ?z rdf:type <http://rdf.myexperiment.org/ontology#Friendship> .
        ?z myexp:has-requester ?x .
        ?x sioc:name ?friend1 .
        ?z myexp:has-accepter ?y .
        ?y sioc:name ?friend2 .
        ?z myexp:accepted-at ?acceptedat
      }
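
To show what the query on slide 31 asks for, the sketch below evaluates the same pattern over a tiny in-memory list of triples in plain Python. The sample data (identifiers, names and timestamp) is invented, prefixed names are used as plain strings, and the helper assumes at most one value per subject and predicate. It illustrates the matching logic only and is not a SPARQL engine.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sample data; names and timestamp are made up for illustration.
TRIPLES: List[Triple] = [
    ("friendship1", "rdf:type", "myexp:Friendship"),
    ("friendship1", "myexp:has-requester", "user1"),
    ("friendship1", "myexp:has-accepter", "user2"),
    ("friendship1", "myexp:accepted-at", "2009-01-15T10:00:00Z"),
    ("user1", "sioc:name", "Alice"),
    ("user2", "sioc:name", "Bob"),
    # A pending friendship has no accepted-at triple, so it must not match.
    ("friendship2", "rdf:type", "myexp:Friendship"),
    ("friendship2", "myexp:has-requester", "user2"),
    ("friendship2", "myexp:has-accepter", "user1"),
]


def value_of(triples: List[Triple], subject: Optional[str], predicate: str) -> Optional[str]:
    """Return the first object for (subject, predicate), or None if absent."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def accepted_friendships(triples: List[Triple]) -> List[Tuple[str, str, str]]:
    """Mirror the query: every pattern must bind, otherwise the friendship is skipped."""
    rows = []
    for subject, predicate, obj in triples:
        if predicate == "rdf:type" and obj == "myexp:Friendship":
            friend1 = value_of(triples, value_of(triples, subject, "myexp:has-requester"), "sioc:name")
            friend2 = value_of(triples, value_of(triples, subject, "myexp:has-accepter"), "sioc:name")
            accepted_at = value_of(triples, subject, "myexp:accepted-at")
            if friend1 is not None and friend2 is not None and accepted_at is not None:
                rows.append((friend1, friend2, accepted_at))
    return rows
```

Because every triple pattern in the query is required, a friendship without an `accepted-at` value yields no row, which is how the query restricts itself to accepted friendships.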

  32. Service Discovery • Feta (“old school”): semantic discovery; ability to find service mismatches; complex queries; closed curation; ugly GUI • BioCatalogue: discovery by tags, text and semantics; social curation; web based catalogue

  33. Finding Services There are over 3500 distributed services. How do we find an appropriate one? • We need to annotate services by their functions (and not their names!) • The services might be distributed, but a registry of service descriptions can be central and queried • Annotated with terms from the myGrid ontology • Questions we can ask: Find me all the services that perform a multiple sequence alignment and accept protein sequences in FASTA format as input

  34. myGrid Ontology Logically separated into two parts: • Service ontology: physical and operational features of web services • Domain ontology: vocabulary for core bioinformatics data, data types and their relationships • Ontology developed in OWL

  35. myGrid ontology • Example : BLAST (from the DDBJ) • Performs task: Alignment • Uses Method: Similarity Search Algorithm • Uses Resources: DNA/Protein sequence databases • Inputs: • biological sequence • database name • blast program • Outputs: Blast Report
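
Slides 33 to 35 describe finding services by what they do rather than by name. The sketch below models that as a central list of annotated descriptions queried by task and input type. The record structure and the two `example_msa_*` entries are assumptions for illustration; only the BLAST entry mirrors the annotations listed on slide 35, and the strings are plain labels rather than terms resolved against the myGrid ontology.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical, simplified annotation record for one service."""
    name: str
    task: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


REGISTRY: List[ServiceDescription] = [
    # Mirrors the BLAST example on slide 35.
    ServiceDescription(
        "blast_ddbj",
        "alignment",
        frozenset({"biological sequence", "database name", "blast program"}),
        frozenset({"blast report"}),
    ),
    # The two entries below are invented so the query has something to choose between.
    ServiceDescription(
        "example_msa_protein",
        "multiple sequence alignment",
        frozenset({"protein sequence (FASTA)"}),
        frozenset({"alignment report"}),
    ),
    ServiceDescription(
        "example_msa_dna",
        "multiple sequence alignment",
        frozenset({"DNA sequence (FASTA)"}),
        frozenset({"alignment report"}),
    ),
]


def find_services(registry: List[ServiceDescription], task: str, required_input: str) -> List[str]:
    """Return the names of services annotated with this task that accept this input."""
    return [
        service.name
        for service in registry
        if service.task == task and required_input in service.inputs
    ]
```

The slide's question, "find me all the services that perform a multiple sequence alignment and accept protein sequences in FASTA format", becomes `find_services(REGISTRY, "multiple sequence alignment", "protein sequence (FASTA)")`. A real ontology-backed registry could also match subclasses of a term, which this exact-string sketch does not attempt.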

  36. Feta Search Result

  37. Limitations of the Current Model • Feta discovery tool is only accessible from the Taverna Workbench • Only pertinent to Taverna users – other people need to find and use web services • Focuses on finding services, but not workflows. For reuse, we need to do both • Closed annotation system - myGrid curator provides service descriptions

  38. BioCatalogue: A Community Resource • Expanding annotation to allow the community to join in • What is the minimum annotation we need to find the service, and to execute it? • Graduated annotation – bronze, silver, gold, platinum • Record who annotated what and when, to address service versioning and status • Service status monitors

  39. BioCatalogue: Joint Manchester-EBI • Automated Curation • Curation by Developers • Curation by Experts • Curation by the Community (Diagram labels repeated across stages: seed, refine, validate) • Launch: ISMB 2009

  40. Current work

  41. Speed and Scalability Taverna 2 enactor • Support for long running workflows • Large scale data – industrial bioinformatics • Data streaming • Passing data by reference • Integration with established computing platforms • caGrid, EGEE, KnowArc, Dutch e-Science Grid

  42. caGrid Plugin for Taverna • Enables discovery of services in caGrid service registry • Taverna support for GAARDS-secured caGrid services Lymphoma type prediction workflow

  43. Extensibility and ease of use • Drag and drop workflow building • More content • greater pool of workflows from myExperiment • More components • Gathering together commonly used sets of services • Service and workflow annotation checking • Shim libraries – for connecting incompatible services
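
A shim, as mentioned on slide 43, is a small adapter placed between two services whose formats do not match. The sketch below assumes a hypothetical upstream service that returns a structured record and a hypothetical downstream service that accepts only FASTA text; both functions and the sample sequence are invented for illustration.

```python
from typing import Dict


def fetch_record(identifier: str) -> Dict[str, str]:
    """Hypothetical upstream service: returns a structured record."""
    return {"id": identifier, "sequence": "MKTAYIAKQRQISFVKSHFSRQ"}


def sequence_length(fasta: str) -> int:
    """Hypothetical downstream service: accepts only FASTA text."""
    lines = fasta.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input is not FASTA")
    return sum(len(line.strip()) for line in lines[1:])


def record_to_fasta(record: Dict[str, str], width: int = 10) -> str:
    """The shim: reshape the upstream output into the format the downstream service expects."""
    sequence = record["sequence"]
    # Split the sequence into fixed-width lines, as FASTA files conventionally do.
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{record['id']}"] + chunks)
```

Without the shim, passing the record straight to `sequence_length` would fail. With it, the chain is `sequence_length(record_to_fasta(fetch_record(some_id)))`. A library of such adapters lets workflow builders connect services without writing the conversion each time.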

  44. Remote Execution Taverna Remote Execution Service (T-REX) • Running workflows on a server • Running workflows inside other applications Taverna is for informatics people (bioinformaticians, cheminformaticians etc). We need other interfaces for uptake by laboratory scientists and health workers

  45. Toolkits “Taverna Inside” Workflows under the hood • e-Laboratories (portals) • Systems Biology, e-Health • Web based execution • Running workflows over the web through myExperiment • Visualisation clients that call workflows in the background

  46. UTOPIA Pettifer, Kell, University of Manchester

  47. Toolkits “Taverna Inside”: Workflow development pipeline (diagram labels) • Workflows developed by bioinformaticians • Workflows enacted locally • Social support for bioinformaticians to find and reuse workflows and expertise • Access to ready made workflows for biologists • Taverna remote execution service (T-REX) • E-Labs and 3rd party clients • Configurable access to ready made workflows for biologists • Workflows embedded in applications and combined with data management systems

  48. myGrid Team
