myGrid and Taverna: Now and in the Future

Dr. K. Wolstencroft, University of Manchester. Helsinki, June 2006.

Presentation Transcript


  1. myGrid and Taverna: Now and in the Future • Dr. K. Wolstencroft, University of Manchester • Helsinki, June 2006

  2. Background • myGrid: middleware components to support in silico experiments in biology • Originally designed to support bioinformatics, chemoinformatics, health informatics, medical imaging and integrative biology

  3. History EPSRC funded UK eScience Program Pilot Project

  4. myGrid in OMII-UK • 10 developers • Dedicated design, implementation, testing and support team, moving towards production-quality software • [Diagram labels: myGrid, OMII Stack, OGSA-DAI; March 2006]

  5. Lots of Resources • NAR 2006: over 850 databases

  6. The User Community • Bioinformatics is an open community • Open access to data • Open access to resources • Open access to tools • Open access to applications • Global in silico biological research

  7. The User Community: Problems • Everything is distributed: data, resources and scientists • Heterogeneous data • Very few standards: I/O formats, data representation, annotation • Everything is a string! • Integration of data and interoperability of resources is difficult

  8. ID   MURA_BACSU STANDARD; PRT; 429 AA.
     DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
     DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
     DE   ENOLPYRUVYL TRANSFERASE) (EPT).
     GN   MURA OR MURZ.
     OS   BACILLUS SUBTILIS.
     OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
     OC   BACILLUS.
     KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
     FT   ACT_SITE 116 116 BINDS PEP (BY SIMILARITY).
     FT   CONFLICT 374 374 S -> A (IN REF. 3).
     SQ   SEQUENCE 429 AA; 46016 MW; 02018C5C CRC32;
          MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
          GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
          RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
          IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI

  9. myGrid Approach - Workflows • General technique for describing and enacting a process • Describes what you want to do, not how you want to do it • Simple language specifies how bioinformatics processes fit together; processes are web services • High-level workflow diagram is separated from any lower-level coding, so you don’t have to be a coder to build workflows • [Diagram: Sequence → RepeatMasker Web Service → GenScan Web Service → Blast Web Service → Predicted Genes out]
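
The idea on this slide, declaring which step feeds which and leaving the "how" to an enactor, can be sketched in a few lines of Python. This is only an illustration: the three functions are hypothetical stand-ins for the web services named on the slide (they apply toy string rules, not the real algorithms), the step order is read off the slide's diagram, and the code is not Scufl.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the three web services named on the slide.
# Real services would be remote calls; these only transform strings.
def repeat_masker(sequence: str) -> str:
    """Mask lowercase (repeat) bases with 'N' (toy rule, not the real tool)."""
    return "".join("N" if base.islower() else base for base in sequence)

def genscan(masked: str) -> List[str]:
    """Pretend every unmasked stretch of 3 or more bases is a predicted gene."""
    return [part for part in masked.split("N") if len(part) >= 3]

def blast(genes: List[str]) -> Dict[str, str]:
    """Return a fake 'hit' label for each predicted gene."""
    return {gene: f"hit_for_{gene}" for gene in genes}

# The workflow is only a declaration of which step feeds which (assumed order).
WORKFLOW: List[Callable] = [repeat_masker, genscan, blast]

def run_workflow(steps, data):
    """Enact the workflow by passing each step's output to the next step."""
    for step in steps:
        data = step(data)
    return data

result = run_workflow(WORKFLOW, "ATGCCaaaGGTTAcT")
print(result)
```

Swapping a service means editing the `WORKFLOW` list, not the enactor, which is the separation the slide describes.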

  10. SCUFL • Taverna Workbench • Application data flow layer: Scufl graph + service introspection • Scufl + Workflow Object Model • Execution flow layer: list management; implicit iteration mechanism; MIME & semantic type decoration; fault management; service alternates • Workflow execution: Freefluo workflow enactor • Processor invocation layer, with processors for: BioMoby, BioMart, SeqHound, plain web service, Soaplab, local app, enactor
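
The slide lists list management and an implicit iteration mechanism in the execution flow layer. A minimal Python sketch of that idea follows, assuming a processor declares the list depth it expects and the enactor maps it over anything nested more deeply. The names `depth` and `invoke` are illustrative; this is not Taverna's or Freefluo's actual code.

```python
def depth(value):
    """Nesting depth of a value: 0 for a single item, 1 for a list, and so on."""
    if not isinstance(value, list):
        return 0
    return 1 + (depth(value[0]) if value else 0)

def invoke(processor, value, expected_depth=0):
    """Call processor on value, iterating implicitly over deeper lists."""
    if depth(value) > expected_depth:
        # The input is "too deep" for the processor: apply it to each element.
        return [invoke(processor, item, expected_depth) for item in value]
    return processor(value)

# A processor written for one sequence works unchanged on a list of sequences.
print(invoke(str.upper, "acgt"))
print(invoke(str.upper, ["acgt", "ttga"]))
# A processor that expects a list (depth 1) is applied once per inner list.
print(invoke(len, [["a", "bb"], ["ccc"]], expected_depth=1))
```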

  11. Taverna Workflow Components • Scufl: Simple Conceptual Unified Flow Language • Taverna: writing, running workflows & examining results • Freefluo: workflow engine to run workflows • SOAPLAB: makes applications available • [Diagram labels: Any Application, SOAPLAB, Web Service; Web Service e.g. DDBJ BLAST]

  12. What Services we Support

  13. User Interaction Handling • Interaction Service and corresponding Taverna processor allow a workflow to call out to an expert human user • Used to embed the Artemis annotation editor within an otherwise automated genome annotation pipeline • Collaboration with the University of Bergen (ref: poster, Nettab 2005) • R for numerical analysis (microarray informatics amongst others)

  14. What shall I do when a service fails? • Most services are owned by other people • No control over service failure • Some are research level • Workflows are only as good as the services they connect! • To help, Taverna can: notify failures; instigate retries; set criticality; substitute services
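
The four behaviours listed on this slide (notify, retry, criticality, substitution) can be illustrated with a short Python sketch. Everything here is hypothetical: `ServiceError`, `call_with_fallback`, and the two toy services are invented for the example and do not reflect Taverna's implementation or configuration options.

```python
import time

class ServiceError(Exception):
    """Raised by a (toy) service when an invocation fails."""

def call_with_fallback(services, payload, retries=2, delay=0.0, critical=True):
    """Try each service in order, retrying each one before moving to the next.

    Returns (result, failures). 'failures' is the notification log.
    If every service fails, a critical step raises; a non-critical one
    returns None so the rest of the workflow can carry on.
    """
    failures = []
    for service in services:                      # substitute services in order
        for attempt in range(1, retries + 2):     # first try plus 'retries' retries
            try:
                return service(payload), failures
            except ServiceError as err:
                failures.append((service.__name__, attempt, str(err)))
                time.sleep(delay)
    if critical:
        raise ServiceError(f"all services failed: {failures}")
    return None, failures

def primary(seq):
    raise ServiceError("primary unavailable")     # simulates a dead service

def mirror(seq):
    return seq[::-1]                              # toy alternate service

result, failures = call_with_fallback([primary, mirror], "ACGT", retries=1)
print(result, failures)
```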

  15. myGrid Users • ~20000 downloads • Users in US, Singapore, UK, Europe, Australia • Systems biology • Proteomics • Gene/protein annotation • Microarray data analysis • Medical image analysis

  16. Trypanosomiasis Study • Resistance to trypanosomiasis in cattle in Kenya • Andy Brass, Paul Fisher, University of Manchester • Form of sleeping sickness in cattle, known as n’gana • Caused by Trypanosoma brucei

  17. Study involves • Microarray data • QTL • SNPs • Metabolic pathway analysis • Need to access microarray data, genomic sequence information and pathway databases AND integrate the results

  18. Workflow Reuse • Addison’s Disease SNP design • Protein annotation • Microarray analysis • myGrid Workflow Repository: http://workflows.mygrid.org.uk/repository

  19. Components designed to work together • Scufl Workflows + Taverna Workflow Workbench • Portal & Application tools • Results management: LSID, OGSA-Distributed Query Processing, mIR • Metadata & provenance management using semantics: KAVE • Publication and Discovery using semantics: Feta, Pedro, Ontology • Service management • Notification service • myGrid information model • e-Science process patterns, e-Science mediator, e-Science coordination, e-Science events

  20. Data Management • Workflows can generate vast amounts of data: how can we manage and track it? • Data AND metadata AND experiment provenance • LSIDs to identify objects • Semantic Web technologies (RDF, ontologies) to store knowledge provenance • Taverna workflow workbench & plugins ensure automated recording

  21. KAVE: Data and metadata management • Life Science Identifiers (LSIDs) • Information Model • File management • Support for custom database building • Provenance metadata capture using RDF • SRB integration • OGSA-DAI integration • [Diagram: provenance graph of data generated by services/workflows. Nodes include urn:data1, urn:data2, urn:data12, urn:data:3, urn:data:f1, urn:data:f2, urn:BlastNInvocation3, urn:compareinvocation3, urn:invocation5 and hits urn:hit1…, urn:hit2…, urn:hit5…, urn:hit8…, urn:hit10…, urn:hit50…. Concepts include SwissProt_seq, Sequence_hit, Blast_report, Find similar sequence, Missed sequence, New sequence, DatumCollection, LSDatum. Properties include instanceOf, input, output, performsTask, contains, hasHits, similar_sequence_to, directlyDerivedFrom, distantlyDerivedFrom, type, hasName. Legend: Properties, Concepts, Services, literals]
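
Provenance captured as RDF boils down to subject, property, object statements that can be queried afterwards. The Python sketch below uses plain tuples instead of an RDF library. The identifiers and property names are taken from the slide, but the way they are wired together here is an assumption, since the original diagram's edges did not survive in the transcript.

```python
# Each statement is (subject, property, object), as in RDF.
# Identifiers and property names come from the slide; which node links to
# which is ASSUMED for illustration only.
TRIPLES = [
    ("urn:data1", "instanceOf", "SwissProt_seq"),
    ("urn:BlastNInvocation3", "input", "urn:data1"),
    ("urn:BlastNInvocation3", "output", "urn:data2"),
    ("urn:data2", "instanceOf", "Blast_report"),
    ("urn:data2", "directlyDerivedFrom", "urn:data1"),
    ("urn:compareinvocation3", "input", "urn:data2"),
    ("urn:compareinvocation3", "output", "urn:data12"),
    ("urn:data12", "directlyDerivedFrom", "urn:data2"),
]

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) was recorded."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def produced_by(item):
    """Invocations that recorded 'item' as an output."""
    return [s for s, p, o in TRIPLES if p == "output" and o == item]

def lineage(item):
    """Follow directlyDerivedFrom links back towards the original inputs."""
    trail, frontier = [], [item]
    while frontier:
        current = frontier.pop()
        for source in objects(current, "directlyDerivedFrom"):
            if source not in trail:
                trail.append(source)
                frontier.append(source)
    return trail

print(produced_by("urn:data2"))
print(lineage("urn:data12"))
```

Walking `directlyDerivedFrom` transitively is one way to obtain the kind of "distantly derived from" relationship the diagram names.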

  22. Provenance Browsing in Taverna New in Taverna 1.4

  23. Feta Semantic Discovery • Over 3000 services! • Find services by their function • Questions we can ask: "Find me all the services that perform a multiple sequence alignment and accept protein sequences in FASTA format as input"

  24. myGrid Ontology • Upper level ontology • Task ontology • Informatics ontology • Molecular Biology ontology • Bioinformatics ontology • Web Service ontology • Relations shown: Specialises, Contributes to • [Diagram: example concepts sequence, biological_sequence, protein_sequence, nucleotide_sequence, DNA_sequence, protein_structure_feature; example services Similarity Search Service, BLAST service, BLASTp service, InterProScan service]
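
Discovery by function relies on the concept hierarchy: a service that accepts any biological sequence also accepts a protein sequence. The Python sketch below answers the multiple-sequence-alignment question from the Feta slide against a tiny hand-written hierarchy. The parent links are an assumed reading of the concept names on this slide, and the service descriptions and task name are entirely hypothetical; Feta itself works on RDF(S) descriptions, not Python dictionaries.

```python
# Child -> parent links: an ASSUMED reading of the concept names on the slide.
PARENT = {
    "biological_sequence": "sequence",
    "protein_sequence": "biological_sequence",
    "nucleotide_sequence": "biological_sequence",
    "DNA_sequence": "nucleotide_sequence",
}

def is_a(term, ancestor):
    """True if term equals ancestor or specialises it via PARENT links."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False

# Hypothetical service descriptions (names, tasks and formats are invented).
SERVICES = [
    {"name": "toy_protein_msa", "task": "multiple_sequence_alignment",
     "input": "protein_sequence", "format": "FASTA"},
    {"name": "toy_generic_msa", "task": "multiple_sequence_alignment",
     "input": "biological_sequence", "format": "FASTA"},
    {"name": "toy_dna_msa", "task": "multiple_sequence_alignment",
     "input": "DNA_sequence", "format": "FASTA"},
    {"name": "toy_blastp", "task": "similarity_search",
     "input": "protein_sequence", "format": "FASTA"},
]

def find_services(task, input_type, fmt):
    """Services performing 'task' whose declared input covers 'input_type'."""
    return [
        s["name"] for s in SERVICES
        if s["task"] == task
        and s["format"] == fmt
        and is_a(input_type, s["input"])   # service input subsumes the query type
    ]

print(find_services("multiple_sequence_alignment", "protein_sequence", "FASTA"))
```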

  25. Feta Architecture • [Diagram labels: User; Ontologist; Taverna Workbench; Feta GUI Client; Feta Engine; Feta Descriptions; Service Ontology Editor; DL Reasoner; myGrid Domain Ontology. Labelled steps: Build; Classification, in RDF(S); Obtain Classification; Obtain descriptions; Semantic Discovery]

  26. Annotations • Feta has been available for ~1 year • Not yet in the release • Need critical mass of services before release • Annotation experiments with users and domain experts • Domain expert annotations much better • Hiring a full-time annotator: see the myGrid website for details

  27. Gene annotation pipeline workflow • Integration and visualisation of GD annotation workflow results • Smarter workflow design incorporating visualisation • VBI collaboration • [Diagram labels: Input, Result, Provenance Record, Custom Data Model, Results Integration]

  28. Visualisation • SeqVista • Utopia

  29. New Plans for Taverna 2.0

  30. Evolving challenges • Long running data intensive workflows • Manipulation of confidential or otherwise protected information • Use with classical grid systems • Interaction with users during workflows

  31. Development • Development of Taverna 2.0 • Reworking of the processor model to include dual execution semantics incorporating data and control flow • Enhanced support for long-running workflows • Fully distributed workflow enactment and authoring • User steering • Large-scale data transfer

  32. Enhanced Processor Model • Modular dispatcher mechanism • Dynamic service binding • Recursive invocation • Data filter implementation • Retry, failover, back-off behaviours • Transparent third party data transfers • High throughput stream handling with implicit iteration semantics

  33. 3rd Party Data Transfers • Allows ‘in place’ referencing of data • Large data sets no longer round-trip between workflow engine and data provider • Allows restricted access to sensitive data • Automatic de-reference when a reference type is linked to a value type within a workflow. • Connecting a grid service to a web service
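
The difference between passing a reference and passing a value, and the automatic de-reference at the boundary between the two, can be sketched in Python as follows. All names (`Reference`, `DATA_STORE`, `link`, the `ref://` URI and the two toy services) are hypothetical and only illustrate the behaviour described on this slide; they are not part of Taverna 2.0's design.

```python
from dataclasses import dataclass

# Hypothetical data provider: large data stays here and is addressed by URI.
DATA_STORE = {"ref://store/genome1": "ACGT" * 4}
FETCHES = []  # records every time data is actually pulled from the store

@dataclass(frozen=True)
class Reference:
    """An 'in place' pointer to data held by a third party."""
    uri: str

def dereference(ref):
    """Fetch the value behind a reference (the only place data moves)."""
    FETCHES.append(ref.uri)
    return DATA_STORE[ref.uri]

def link(value, consumer, accepts_reference):
    """Connect an output to a consumer, de-referencing only when needed."""
    if isinstance(value, Reference) and not accepts_reference:
        value = dereference(value)
    return consumer(value)

def grid_service(ref):
    """Toy reference-aware service: hands the reference on without fetching."""
    return Reference(ref.uri)

def web_service(sequence):
    """Toy value-based service: needs the actual data."""
    return len(sequence)

ref = Reference("ref://store/genome1")
passed = link(ref, grid_service, accepts_reference=True)
fetches_after_grid = list(FETCHES)   # still empty: nothing has moved yet
length = link(passed, web_service, accepts_reference=False)
print(length, FETCHES)
```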

  34. Streaming Data • Allow execution of downstream workflow stages on partially complete results from upstream • Non-streaming (Taverna 1): entire iteration must complete at each stage • Streamed data: Service 2 starts operating on partial results from Service 1 • [Diagram: Service 1 → Service 2 → Service 3]
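
The contrast between the two execution modes shows up clearly in the order of events. In the Python sketch below, generators stand in for streaming stages; the two toy services and the `LOG` list are illustrative assumptions, not Taverna code.

```python
LOG = []  # order in which the services touch each item

def service1(items):
    """Toy upstream service: doubles each item, yielding results one by one."""
    for item in items:
        LOG.append(f"s1:{item}")
        yield item * 2

def service2(items):
    """Toy downstream service: adds one to each item it receives."""
    for item in items:
        LOG.append(f"s2:{item}")
        yield item + 1

def run_batch(data):
    """Non-streaming: the whole iteration finishes before the next stage starts."""
    stage1 = list(service1(data))      # forces service1 to complete first
    return list(service2(stage1))

def run_streamed(data):
    """Streaming: service2 consumes each partial result as soon as it exists."""
    return list(service2(service1(data)))

LOG.clear()
batch_result = run_batch([1, 2])
batch_log = list(LOG)

LOG.clear()
streamed_result = run_streamed([1, 2])
streamed_log = list(LOG)

print(batch_log)
print(streamed_log)
```

Both runs produce the same results; only the interleaving differs, which is what lets a downstream stage start early on long-running inputs.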

  35. Recursive Invocation • Dispatcher allowing recursive invocation to be plugged into per-operation semantics • [Diagram labels: Receive Input, Return Result]

  36. Future Direction • Enhancements to the Workflow Core • Enhancements to user interface and experience • Expanded use of semantic web technologies • Code remains open source and always will

  37. Latest News • See plans for Taverna 2.0 on the myGrid wiki • Taverna development is user-driven • Please keep in touch and tell us what you would like to see via the myGrid mailing lists: Taverna Users, Taverna Hackers • Bioinformatics curator for service annotation: details on the myGrid website

  38. Acknowledgements • The myGrid group, past and present • OMII-UK • Carole Goble • Pinar Alper • Tom Oinn • Antoon Goderis • Matthew Gamble • Daniele Turi
