
Selected Workflow and Semantic Experiences from myGrid

Professor Carole Goble. The University of Manchester, UK. carole.goble@manchester.ac.uk http://www.mygrid.org.uk http://www.omii.ac.uk


Presentation Transcript


  1. Selected Workflow and Semantic Experiences from myGrid • Professor Carole Goble • The University of Manchester, UK • carole.goble@manchester.ac.uk • http://www.mygrid.org.uk • http://www.omii.ac.uk

  2. [Slide background: GenBank-format DNA sequence listing, positions 12181 onward] • A UK e-Science project to build middleware for in silico experiments by individual life scientists, stuck in under-resourced labs, who use other people’s applications. • Sequence analysis, microarray analysis, proteomics, chemoinformatics, image processing, rendering Dilbert cartoons.

  3. Two tiers of services • myGrid services • for workflow, data management, provenance management, browser clients, service discovery etc. • Open, extensible service-oriented architecture: Web services, APIs, e-Science events, messages, plug-in framework, information model • Neat and controlled • Domain services • BioMART, BioMOBY, NCBI, EMBL-EBI, R package, Seqhound, EMBOSS, PubMed, caBIG etc. • 3000+ of these. None of them ours. • Scruffy and independent. And not much WSDL.

  4. Open World Burden • Independent third party service providers • Independent, unknown users • No compatibility compliance between domain services expected • No one application (data pipeline focused) • No common domain data model • Lightweight + Jam today

  5. Workflows • Explicit exposed description for the scientist about how to do stuff… and what you did… and the provenance of what you got. • Easier to explain, share, relocate, reuse and repurpose. • User viewpoint. • Pattern books and workflow catalogues • A market of workflows

  6. [Diagram: the Freefluo workflow enactor driving a set of Processors, one per kind of domain service: BioMOBY, BioMART, WSRF, plain Web service, Soaplab, local Java app, Styx client, R package] • How to hide the complexity of interoperating these domain services? • Bury it
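
One way to read "bury it" is that the enactor only ever sees a uniform processor interface, and each kind of domain service hides behind its own implementation. The sketch below illustrates that idea only; it is not Freefluo's API, and the names `Processor`, `LocalFunctionProcessor` and `enact` are hypothetical.

```python
from abc import ABC, abstractmethod


class Processor(ABC):
    """Hypothetical uniform interface the enactor talks to."""

    @abstractmethod
    def invoke(self, inputs: dict):
        """Run the underlying service on named inputs and return its result."""


class LocalFunctionProcessor(Processor):
    """Wraps a plain Python callable; stands in for a 'local app' processor."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, inputs: dict):
        return self.fn(**inputs)


def enact(steps, data):
    """Run a linear workflow.

    Each step is (processor, {port_name: data_name}, output_name).
    Results are added to a copy of the shared data pool, which is returned.
    """
    pool = dict(data)
    for processor, wiring, output_name in steps:
        inputs = {port: pool[source] for port, source in wiring.items()}
        pool[output_name] = processor.invoke(inputs)
    return pool


# Usage: two steps wired by data names, not by service technology.
steps = [
    (LocalFunctionProcessor(lambda seq: seq.upper()), {"seq": "raw"}, "upper"),
    (LocalFunctionProcessor(lambda seq: len(seq)), {"seq": "upper"}, "length"),
]
result = enact(steps, {"raw": "acat"})
```

A processor for a Web service or an R package would implement the same `invoke` method, so the workflow description would not change when the service technology does.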

  7. How to cope with data incompatibility between services? • Fix up the services to be compatible • Shims – libraries of adapters.
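
A shim in this sense is a small adapter that converts the output format of one service into the input format another expects. The following is a minimal sketch under stated assumptions: the registry `SHIMS`, the decorator `shim`, the function `adapt` and the format names are all hypothetical, not part of myGrid.

```python
from typing import Callable, Dict, Tuple

# Hypothetical shim library: (source_format, target_format) -> adapter function.
Shim = Callable[[str], str]
SHIMS: Dict[Tuple[str, str], Shim] = {}


def shim(source: str, target: str):
    """Decorator that registers an adapter in the shim library."""
    def register(fn: Shim) -> Shim:
        SHIMS[(source, target)] = fn
        return fn
    return register


@shim("fasta", "raw_sequence")
def fasta_to_raw(text: str) -> str:
    """Drop FASTA header lines (starting with '>') and join the sequence lines."""
    lines = [line.strip() for line in text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


@shim("genbank_origin", "raw_sequence")
def genbank_origin_to_raw(text: str) -> str:
    """Strip position numbers and spacing from GenBank ORIGIN-style lines."""
    return "".join(ch for ch in text if ch.isalpha())


def adapt(data: str, source: str, target: str) -> str:
    """Convert data between formats, failing loudly if no shim is registered."""
    if source == target:
        return data
    try:
        return SHIMS[(source, target)](data)
    except KeyError:
        raise LookupError(f"no shim from {source!r} to {target!r}") from None
```

Keeping the adapters in a library keyed by format pair is what would let a workflow tool insert them automatically and keep them out of the scientist's view.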

  8. Experience Report • Workflows and bits of workflow are popular and get exchanged. • Buy-in depends on MY service’s availability. • User-oriented workflow language hides a multitude of sins. • Shims are ok. And we should hide ‘em. • Results management is killer. • Need workflow patterns and best practice. • Did not use BPEL.

  9. 3000+ services? 100s of workflows? How do I find anything? How do I know what works with what and what it does? Service Model Ontology

  10. Experience Report • OWL Reasoning to classify and match services • Capturing and curating content bottleneck. • People vs machine descriptions. • For people - a little semantics goes a long way. Don’t be too clever. • Semantic Web Service models (OWL-S, WSMO, WSDL-S) immature
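
Matching services by classification can be pictured with a toy subclass hierarchy: a service is a candidate for a piece of data if its declared input concept subsumes the data's concept. This is a hand-rolled stand-in for illustration, not OWL reasoning; the ontology fragment and the service descriptions below are invented.

```python
# Hypothetical toy ontology: child concept -> parent concept.
SUBCLASS_OF = {
    "DNA_sequence": "Sequence",
    "Protein_sequence": "Sequence",
    "Sequence": "Data",
    "Blast_report": "Report",
    "Report": "Data",
}

# Hypothetical service descriptions annotated with ontology concepts.
SERVICES = {
    "blastn": {"input": "DNA_sequence", "output": "Blast_report"},
    "seq_length": {"input": "Sequence", "output": "Data"},
    "render_report": {"input": "Report", "output": "Data"},
}


def ancestors(concept):
    """Return the concept followed by all of its superclasses, nearest first."""
    chain = [concept]
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def subsumes(general, specific):
    """True if `general` is `specific` itself or one of its superclasses."""
    return general in ancestors(specific)


def services_accepting(data_concept):
    """Names of services whose input concept subsumes the given data concept."""
    return sorted(
        name
        for name, description in SERVICES.items()
        if subsumes(description["input"], data_concept)
    )
```

Even this much structure answers "what can I run on this data?", which fits the point that a little semantics goes a long way for people.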

  11. Workflow outcomes • A record of outcome data and its provenance. • Store data outcomes with a unique id, link together in a typed graph. • In fact store all provenance as graph! • Life Science Identifier
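
Storing outcomes under unique ids and linking them in a typed graph can be sketched with plain triples. The identifiers follow the LSID shape `urn:lsid:authority:namespace:object`; the authority `example.org`, the helper `new_lsid` and the class `ProvenanceGraph` are assumptions made for this illustration, and the property names are borrowed from the slide diagrams.

```python
from collections import defaultdict
from itertools import count

_next_object_id = count(1)


def new_lsid(namespace, authority="example.org"):
    """Mint an LSID-shaped identifier (hypothetical authority and numbering)."""
    return f"urn:lsid:{authority}:{namespace}:{next(_next_object_id)}"


class ProvenanceGraph:
    """A minimal typed graph: a set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()
        self._by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))
        self._by_subject[subject].add((predicate, obj))

    def objects(self, subject, predicate):
        """All objects linked from `subject` by `predicate`."""
        return {o for p, o in self._by_subject.get(subject, ()) if p == predicate}


# Usage: record one service invocation and the data around it.
graph = ProvenanceGraph()
query = new_lsid("data")
invocation = new_lsid("invocation")
report = new_lsid("data")

graph.add(query, "instanceOf", "DNA_sequence")
graph.add(invocation, "performsTask", "Find similar sequence")
graph.add(invocation, "input", query)
graph.add(invocation, "output", report)
graph.add(report, "instanceOf", "Blast_report")
graph.add(report, "directlyDerivedFrom", query)
```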

  12. [Figure: provenance graph of data generated by services/workflows. Data items (urn:data1, urn:data2, urn:data:3, urn:data12, urn:data:f1, urn:data:f2, urn:hit1… to urn:hit50…) and service invocations (urn:BlastNInvocation3, urn:compareinvocation3, urn:invocation5) are linked by typed properties: [instanceOf], [input], [output], [performsTask], [contains], [hasHits], [similar_sequence_to], [directlyDerivedFrom], [distantlyDerivedFrom], [type], [hasName]. Concepts shown: SwissProt_seq, Sequence_hit, Blast_report, DatumCollection, LSDatum. Task shown: Find similar sequence. Other labels: Missed sequence, New sequence. Legend: Properties, Concepts, Services, literals, Data.]

  13. [Figure: the provenance graph from the previous slide fused with a second graph from another data model. The second graph adds concepts DNA_sequence, Blast_service and Blast_report; runs urn:run5 and urn:run7; users urn:williamsA and urn:williamsB; records urn:genbank1… to urn:genbank50…; and properties instanceOf, inputOf, outputOf, runOf, createdFrom, createdBy, contains_similiar_seq_to. The two graphs meet at shared LSIDs (urn:data2, urn:data:3) and shared concepts; GenBank and UniProt appear as external sources.] • Fusion between different data models using shared concepts or data • Add assertions, add rules • Reason over assertions
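
"Add assertions, add rules, reason over assertions" can be illustrated with two small triple sets that share identifiers and one rule applied over their union. The rule used here, that [distantlyDerivedFrom] is the indirect part of the transitive closure of [directlyDerivedFrom], is an assumption made for this sketch; the slides name both properties but do not define how they relate. The triples are invented.

```python
def infer_distant(triples):
    """Derive distantlyDerivedFrom links from chains of directlyDerivedFrom.

    Assumed rule: if A is directly derived from B, and B (directly or
    distantly) from C, then A is distantly derived from C. Pairs that are
    already asserted as direct are not repeated as distant.
    """
    direct = {(s, o) for s, p, o in triples if p == "directlyDerivedFrom"}
    closure = set(direct)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in direct:
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return {(s, "distantlyDerivedFrom", o) for s, o in closure - direct}


# Two graphs from different models; fusion is a set union because they
# share the identifier urn:data2.
workflow_graph = {
    ("urn:data:3", "directlyDerivedFrom", "urn:data2"),
    ("urn:data:3", "instanceOf", "Blast_report"),
}
run_graph = {
    ("urn:data2", "directlyDerivedFrom", "urn:data1"),
    ("urn:data2", "createdBy", "urn:williamsA"),
}
fused = workflow_graph | run_graph
inferred = infer_distant(fused)
```

Neither graph alone supports the inferred link; it only appears once the shared identifier joins them.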

  14. Experience Report • Classification and reasoning over results. • Graph matching. • User provenance + machine provenance • Extensible non-prescriptive model • Maturity of standards – LSID. • Scalability and maturity of tools. • RDF graphs are not for humans. Customised presentation tools.

  15. Take home • Workflows and semantic web technologies are powerful tools. • Especially for scruffies. • Both are about description. • Both help us be flexible.
