
Performing In silico Experiments in a Service Based Architecture: Solutions and Issues

Performing In silico Experiments in a Service Based Architecture: Solutions and Issues. Chris Wroe, Phillip Lord, Robert Stevens & Carole Goble The University of Manchester, UK http://www.mygrid.org.uk. EPSRC funded UK eScience Program Pilot Project.


Presentation Transcript


  1. Performing In silico Experiments in a Service Based Architecture: Solutions and Issues Chris Wroe, Phillip Lord, Robert Stevens & Carole Goble The University of Manchester, UK http://www.mygrid.org.uk VBI Web Services Workshop 26-27 May 2005

  2. EPSRC funded UK eScience Program Pilot Project. Thanks to the other members of the Taverna project, http://taverna.sf.net

  3. Core
     • Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Jan Humble, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Ian Roberts, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson, Jimi Worthington and Chris Wroe.
     Users
     • Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK
     • Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
     • Steve Kemp, Liverpool, UK
     Postgraduates
     • Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
     Industrial
     • Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
     • Robin McEntire (GSK)
     Collaborators
     • Keith Decker

  4. Bioinformatics Services
     • A typical HAD environment: Distributed, Autonomous and very, very Heterogeneous
     • No standard API or calling mechanisms
     • Complex types are often implicit: everything is String
     • No domain typing: everything is String
     • Numerous services, and growing
     • Close the world: controlled, but constrained
     • Open the world: uncontrolled, but versatile

  5. In silico Bioinformatics
     • Bioinformatics experiments chain together anywhere from one to N services
     • The ultimate result is the goal, and some or all intermediates are part of that goal
     • Intermediates are necessary for evidence gathering
     • Experiments often need to be repeated
     • Experiments often need to be re-purposed
     • Workflows offer a suitable model for bioinformatics experiments
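
The chaining idea on this slide can be sketched in a few lines of Python. This is only an illustration, not Taverna or Scufl: `run_chain`, `mask_repeats` and `to_upper` are hypothetical names, and the two "services" are local functions standing in for remote calls. The point shown is that every intermediate is kept alongside the final result, since intermediates serve as evidence.

```python
from typing import Callable, Dict, List, Tuple

# A step is a (name, service) pair; each service maps a string to a string.
Step = Tuple[str, Callable[[str], str]]


def run_chain(steps: List[Step], data: str) -> Dict[str, str]:
    """Run services in order, feeding each output into the next step.

    All intermediates are retained, keyed by step name, so they can be
    inspected later as evidence or reused when the experiment is repeated.
    """
    results: Dict[str, str] = {"input": data}
    current = data
    for name, service in steps:
        current = service(current)
        results[name] = current
    return results


# Hypothetical stand-ins for remote services (assumptions, not real tools).
def mask_repeats(seq: str) -> str:
    return seq.replace("tttt", "NNNN")


def to_upper(seq: str) -> str:
    return seq.upper()


workflow: List[Step] = [("mask", mask_repeats), ("normalise", to_upper)]
results = run_chain(workflow, "acgttttacg")
```

Re-purposing the experiment then amounts to editing the `workflow` list rather than repeating manual steps.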

  6. Williams-Beuren Syndrome
     • Contiguous sporadic gene deletion disorder
     • 1/20,000 live births, caused by unequal crossover (homologous recombination) during meiosis
     • Haploinsufficiency of the region results in the phenotype
     [Figure: physical map of the ~1.5 Mb region at 7q11.23 on Chr 7 (~155 Mb), marking patient deletions (WBS, SVAS), clones CTA-315H11 and CTB-51J22, and a ‘Gap’]

  7. [Sequence excerpt shown on the slide]
     12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
     12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
     12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
     12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
     12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
     12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
     12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
     12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
     12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
     12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
     12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
     • Identify new, overlapping sequence of interest
     • Characterise the new sequence at nucleotide and amino acid level
     Cutting and pasting between numerous web-based services, e.g. BLAST, InterProScan etc.

  8. Middleware for data intensive in silico biology by bioinformaticians
     • The individual scientist doodling
     • Workflows & distributed queries to link up your own and others’ resources
     • Data intensive, upstream pipelines
     • Reuse: sharing and adapting workflows & resources, and their outcomes
     • Semantic descriptions for discovery, validation & linkage
     • Whole experiment lifecycle, including logging provenance
     [Diagram labels: Forming experiments; Personalisation; Discovering and reusing experiments and resources; Executing & monitoring experiments; Managing lifecycle, provenance and results; Sharing services & experiments]

  9. An Open World
     • Open source
     • Open domain services and resources
     • Open community
     • Open application
     • Nothing specific to biology, but oriented to it
     • Open model and open data
     • No prescribed typing or domain data model
     • A layered information model
     • Open architecture
     • Service Oriented Architecture
     • Loosely coupled
     • Web services based
     • Assemble your own components
     • Designed to work together
     [Component labels: Feta; Discovery; Pedro; Annotation; KAVE; Grimoire; Registry; Taverna; Freefluo; Mediator; Portal; Event Notification; LSIDs; mIR; Soaplab; Gowlab; BioNanny; Info. Model; DQP]

  10. Stakeholders: Biologists, Service Providers, Bioinformaticians

  11. Benefit, Cost, Activation Energy
     • Jam today
     • Important for take-up and community building
     • Take-up leads to much better understanding
     • Energy of bioinformaticians and service providers
     • Dealing with lots of legacy remote services
     • Incorporating my bits and pieces
     • Networking effects
     • Added value with added effort

  12. Taverna http://taverna.sourceforge.net/
     • Freefluo: Workflow engine to run workflows
     • Scufl: Simple Conceptual Unified Flow Language
     • Taverna: Writing, running workflows & examining results
     • SOAPLAB: Makes applications available
     [Diagram labels: SOAPLAB Web Service; Any Application; Web Service e.g. DDBJ BLAST; Freefluo; SeqHound Service; Special processor]

  13. Service failure protocol; Viewer plug-ins

  14. Life Science Identifiers; Model Driven Approach
     • Information Repository and Common Information model for e-Science
     • RDF; Knowledge; Added Value to Experiment
     • OWL & RDFS Ontologies: to annotate and classify entities with a common vocabulary based on a common understanding
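
As a small illustration of how Life Science Identifiers give data a stable identity, the sketch below splits an LSID into its parts. It assumes the usual URN layout `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`; the `LSID` tuple, the `parse_lsid` helper and the example identifier are hypothetical and are not taken from the myGrid code base.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(urn: str) -> LSID:
    """Split an LSID URN into authority, namespace, object and optional revision."""
    parts = urn.split(":")
    if (
        len(parts) not in (5, 6)
        or parts[0].lower() != "urn"
        or parts[1].lower() != "lsid"
    ):
        raise ValueError(f"not an LSID: {urn!r}")
    authority, namespace, object_id = parts[2:5]
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


# Hypothetical identifier used only for illustration.
example = parse_lsid("urn:lsid:example.org:sequences:abc123:2")
```

Because the identifier itself carries no location, a resolver is still needed to fetch the data or metadata; that step is outside this sketch.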

  15. Williams-Beuren Workflows: Identification of overlapping sequence; Characterisation of protein sequence; Characterisation of nucleotide sequence

  16. WBS Workflow Experience
     • Correct and biologically meaningful results: found all expected results, plus a previously unnoticed pseudogene
     • Automation: saved time, increased productivity
     • Sharing: other people have used and want to develop the workflows, notably for mouse and chicken

  17. Graves’ Disease
     Autoimmune disease of the thyroid in which the immune system of an individual attacks cells in the thyroid gland, resulting in hyperthyroidism
     • Gene annotation pipelines
     • Microarray analysis pipelines
     • Find differentially expressed genes, e.g. NF-kappa B inhibitor protein

  18. Reuse: adapting and sharing best practice and know-how across a community
     • Mouse genome
     • Trypanosomiasis in cattle
     • Chicken genome
     Chris Wroe, Carole Goble, Antoon Goderis, Phillip Lord, Simon Miles, Juri Papay, Pinar Alper, Luc Moreau. Recycling workflows and services through discovery and reuse. Concurrency and Computation: Practice and Experience, accepted for publication.

  19. [Architecture diagram labels: Third-party tools; Taverna e-Science workbench; LSID Launchpad; Haystack; Applications; Web portals; Utopia; e-Science process patterns; LSID support; myGrid information model; e-Science mediator; e-Science coordination; Metadata Management; Data Management; e-Science events; KAVE metadata store; mIR myGrid information repository; Service & workflow discovery; Feta semantic discovery; ProQA provenance manager; Custom databases; Core Services; Pedro semantic publication; Workflow enactment; Taverna-Freefluo workflow engine; GRIMOIRES registry; Notification service; myGrid ontology; Web Service (Grid Service) communication fabric; External Services; Java applications; Soaplab; Termino; Lexical mark-up; OGSA-DQP service; Executable codes with an IDL; Gowlab; OGSA-DAI databases; Legacy applications; Web Services; Web Sites]

  20. First, catch your service
     • Taverna currently ships with access to over 1000 services
     • But it wasn’t always the case!
     • Lack of available services, at least at first
     • A lot of activation energy is needed, which hopefully lessens as services get pooled
     • Service partnerships and network effects
     • If your service ain’t there, that’s an obstacle.

  21. Service Bootstrapping
     • Soaplab and Gowlab wrappers
     • http://industry.ebi.ac.uk/soaplab/
     • WSDL scavenging
     • Processor abstraction over stereotypical invocation patterns of service families
     • Many services are not plain WSDL
     • API consumer in Taverna 1.1

  22. API Consumer Interface
     • Interoperate existing APIs with SOAP services, SoapLab, BioMoby, SeqHound, caBIG, BioJava, etc.
     • Refine complex APIs to sets of task centric functionality
     • Take advantage of myGrid infrastructure (monitoring, result browsing, provenance etc.) and apply it to your APIs
     • Taverna 1.1 onwards; download API consumer and toolset at http://taverna.sf.net
     [Screenshot captions: User selects appropriate methods to be exposed within Taverna; Classes and Interfaces presented here]

  23. Import into Taverna: a previously created API definition is imported; methods and constructors appear as components alongside other services.

  24. Invocation Heterogeneity
     Processors
     • WSDL: a single Web Service operation described in a WSDL file.
     • Local Java or Beanshell function
     • Soaplab: CORBA-like stateful protocol of the Web Service operations
     • Nested workflow: implemented by a Scufl workflow.
     • BioMOBY processor.
     • SeqHound: a Representational State Transfer style interface
     • BioMart: directly accesses queries over a relational database.
     • Styx: executes a workflow subgraph containing streamed services using P2P data transfer based on the Styx Grid service protocol.
     [Diagram labels: IBM Life Sciences BLAST service; BLAST; SOAPLAB BLAST service; operations setProgram(), createJob(), setDatabase(), run(), setE_value(), getResults(), blastQuery()]
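
One way to picture the processor idea is as an adapter that hides the invocation style behind a single `invoke` call. The sketch below is an illustration in Python, not Taverna's Java plug-in API. The operation names `blastQuery`, `createJob`, `setProgram`, `setDatabase`, `setE_value`, `run` and `getResults` come from the slide; everything else is an assumption: the two fake services, the job-id argument, the `setSequence` call, and the guess that `blastQuery` belongs to a single-call service.

```python
from abc import ABC, abstractmethod
from typing import Dict

FIELDS = ("program", "database", "e_value", "sequence")


class SingleCallBlast:
    """Hypothetical stand-in for a service exposing one operation."""

    def blastQuery(self, program: str, database: str, e_value: str, sequence: str) -> str:
        return "|".join((program, database, e_value, sequence))


class StatefulBlast:
    """Hypothetical stand-in for a stateful, job-based service."""

    def __init__(self) -> None:
        self._jobs: Dict[int, Dict[str, str]] = {}

    def createJob(self) -> int:
        job_id = len(self._jobs)
        self._jobs[job_id] = {}
        return job_id

    def setProgram(self, job_id: int, value: str) -> None:
        self._jobs[job_id]["program"] = value

    def setDatabase(self, job_id: int, value: str) -> None:
        self._jobs[job_id]["database"] = value

    def setE_value(self, job_id: int, value: str) -> None:
        self._jobs[job_id]["e_value"] = value

    def setSequence(self, job_id: int, value: str) -> None:  # not on the slide
        self._jobs[job_id]["sequence"] = value

    def run(self, job_id: int) -> None:
        job = self._jobs[job_id]
        job["result"] = "|".join(job[key] for key in FIELDS)

    def getResults(self, job_id: int) -> str:
        return self._jobs[job_id]["result"]


class Processor(ABC):
    """Uniform interface seen by the workflow, whatever the service style."""

    @abstractmethod
    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]:
        ...


class SingleCallProcessor(Processor):
    def __init__(self, service: SingleCallBlast) -> None:
        self.service = service

    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]:
        return {"report": self.service.blastQuery(*(inputs[k] for k in FIELDS))}


class StatefulProcessor(Processor):
    def __init__(self, service: StatefulBlast) -> None:
        self.service = service

    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]:
        # The multi-step protocol is hidden inside one invoke() call.
        job = self.service.createJob()
        self.service.setProgram(job, inputs["program"])
        self.service.setDatabase(job, inputs["database"])
        self.service.setE_value(job, inputs["e_value"])
        self.service.setSequence(job, inputs["sequence"])
        self.service.run(job)
        return {"report": self.service.getResults(job)}


inputs = {"program": "blastn", "database": "nt", "e_value": "1e-5", "sequence": "acgt"}
processors = [SingleCallProcessor(SingleCallBlast()), StatefulProcessor(StatefulBlast())]
reports = [p.invoke(inputs)["report"] for p in processors]
```

From the workflow's point of view both processors look identical, which is what lets services with very different calling conventions sit side by side in one workflow.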

  25. Three tiered abstraction
     • Application data flow layer: Scufl graph + service introspection
     • Execution flow layer: list management; implicit iteration mechanism; MIME & semantic type decoration; fault management; service alternates
     • Processor invocation layer: one Processor per service style (BioMOBY, BioMart, SeqHound, Plain Web Service, Soaplab, Local App, Enactor)
     [Diagram labels: Taverna Workbench; Scufl + Workflow Object Model; Workflow Execution; Freefluo Workflow enactor]
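
The "implicit iteration mechanism" listed for the execution flow layer can be illustrated as follows: when a service that expects single values is handed a list, the call is repeated for each element. This is a minimal sketch under the assumption that list inputs are combined as a cross product; the slide does not say which strategy the enactor applies, and `invoke_with_implicit_iteration` and `search` are hypothetical names.

```python
from itertools import product
from typing import Any, Callable, Dict


def invoke_with_implicit_iteration(func: Callable[..., Any], inputs: Dict[str, Any]) -> Any:
    """Call func once for single values, or once per combination of list elements."""
    names = list(inputs)
    if not any(isinstance(inputs[name], list) for name in names):
        return func(**inputs)
    # Wrap single values so every input can take part in the cross product.
    value_lists = [
        inputs[name] if isinstance(inputs[name], list) else [inputs[name]]
        for name in names
    ]
    return [func(**dict(zip(names, combo))) for combo in product(*value_lists)]


# Hypothetical service that only understands one sequence and one database.
def search(seq: str, db: str) -> str:
    return f"{seq}@{db}"


single = invoke_with_implicit_iteration(search, {"seq": "a", "db": "x"})
one_list = invoke_with_implicit_iteration(search, {"seq": ["a", "b"], "db": "x"})
two_lists = invoke_with_implicit_iteration(search, {"seq": ["a", "b"], "db": ["x", "y"]})
```

The workflow author never writes the loop; the layer decides how to iterate from the shape of the data.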

  26. Architecture Confusagram
     Tom Oinn, Mark Greenwood, Matthew Addis, M. Nedim Alpdemir, Justin Ferris, Kevin Glover, Carole Goble, Duncan Hull, Darren Marvin, Peter Li, Phillip Lord, Matthew R. Pocock, Martin Senger, Robert Stevens, Anil Wipat and Chris Wroe. Taverna: Lessons in creating a workflow environment for the life sciences. Concurrency and Computation: Practice and Experience, in press.

  27. [Screenshot labels: Soaplab Service; WSDL Web Service; BioMOBY Service; Local Java Service]

  28. Workflows are not the only game
     [Diagram labels: e-Science process patterns; e-Science mediator; e-Science coordination; e-Science events; Notification service; Workflows; Applications; Protein Phosphatases; OGSA-DQP; Mediator]

  29. So many services, so poorly described
     • How to select among 1000+ services?
     • Inputs & outputs are mostly “string”
     • Domain specific descriptions of capabilities
     • Selection is part of workflow assembly by bioinformaticians
     • Selection of alternates for failure is also generally user defined; alternates are usually replicas, but need not be.

  30. Semantic discovery
     • Publish and find services (and workflows) with descriptions using an ontology (in OWL/RDF)
     • Define domain types for objects passed around and a set of dimensions with which service capabilities can be defined using processor abstraction
     • Bootstrapping descriptions
     • Mining and maintaining descriptions
     • The Expert Annotator
     • GRIMOIRE / WebDAV directory
     • Tie into BioMOBY central
     • http://phoebus.cs.man.ac.uk:8100/feta-beta/mygrid/descriptions/
     Phillip Lord, Pinar Alper, Chris Wroe, and Carole Goble. Feta: A light-weight architecture for user oriented semantic service discovery. Proc. of 2nd European Semantic Web Conference, Crete, June 2005.

  31. Semantic Web Services: Layered model
     • Generic Schema for Service (part of Information model)
     • Specific Application Ontology, e.g. caCORE
     • We don’t describe WSDL, we describe operations and processors
     • We are classifying for people not machines, so don’t be too clever!
     [Diagram labels: Processor; API; Web Interface]
     Wroe C, Goble CA, Greenwood M, Lord P, Miles S, Papay J, Payne T, Moreau L. Automating Experiments Using Semantic Data on a Bioinformatics Grid. IEEE Intelligent Systems, Jan/Feb 2004.

  32. [Service description schema]
     • Service: name, description, author, organisation
     • Operation: name, description, task, method, resource, application
     • Parameter: name, description, semantic type, format, transport type, collection type, collection format
     • Relations: hasInput, hasOutput, subclass
     • Subclasses shown: WSDL based operation; WSDL based Web service; workflow; bioMoby service; Soaplab service; Local Java code
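
The schema on this slide can be read as three record types linked by hasInput and hasOutput. The sketch below models a subset of the listed fields as Python dataclasses and adds a tiny lookup that respects subclass relations between semantic types. It is illustrative only: the toy ontology, the example services and the `find_operations` helper are assumptions, not the Feta implementation or the myGrid ontology.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Parameter:
    name: str
    semantic_type: str
    format: str = "string"


@dataclass
class Operation:
    name: str
    task: str
    inputs: List[Parameter] = field(default_factory=list)   # hasInput
    outputs: List[Parameter] = field(default_factory=list)  # hasOutput


@dataclass
class Service:
    name: str
    organisation: str
    operations: List[Operation] = field(default_factory=list)


# Hypothetical toy ontology: each term maps to its parent (None for the root).
SUBCLASS_OF: Dict[str, Optional[str]] = {
    "sequence": None,
    "nucleotide_sequence": "sequence",
    "protein_sequence": "sequence",
}


def is_a(term: str, ancestor: str) -> bool:
    """True if term equals ancestor or is one of its descendants."""
    current: Optional[str] = term
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def find_operations(services: List[Service], data_type: str) -> List[Tuple[str, str]]:
    """Return (service, operation) pairs with an input that accepts data_type."""
    return [
        (service.name, op.name)
        for service in services
        for op in service.operations
        if any(is_a(data_type, p.semantic_type) for p in op.inputs)
    ]


registry = [
    Service("AlignerA", "Example Org", [
        Operation("align", "aligning", [Parameter("query", "sequence")]),
    ]),
    Service("TranslatorB", "Example Org", [
        Operation("translate", "translating", [Parameter("dna", "nucleotide_sequence")]),
    ]),
]
matches = find_operations(registry, "protein_sequence")
```

Although every parameter here is still a string on the wire, the semantic type lets a search rule out `translate` for protein data while keeping the general-purpose `align`.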

  33. Service hassles
     • Workflows are only as good as the services they link together.
     • Licensing models
     • Instability and unreliability
     • BioNanny + QoS registry description
     • Configurable fault tolerance and fail over strategies for graceful failure
     • Few alternates and genuine replica services
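
A configurable retry and fail-over strategy of the kind mentioned on this slide might look like the sketch below. It is a generic illustration, not Taverna's fault-management code: `invoke_with_failover`, the `retries` parameter and the two fake services are hypothetical.

```python
from typing import Callable, List, Sequence


def invoke_with_failover(
    alternates: Sequence[Callable[[str], str]],
    payload: str,
    retries: int = 2,
) -> str:
    """Try each alternate in order, retrying it before moving to the next.

    Raises RuntimeError listing every failure if no alternate succeeds,
    so the workflow can fail gracefully with a useful message.
    """
    errors: List[str] = []
    for service in alternates:
        for attempt in range(retries + 1):
            try:
                return service(payload)
            except Exception as exc:  # a real client would catch narrower errors
                label = getattr(service, "__name__", repr(service))
                errors.append(f"{label} attempt {attempt + 1}: {exc}")
    raise RuntimeError("all alternates failed: " + "; ".join(errors))


calls = {"primary": 0}


def primary(seq: str) -> str:
    """Hypothetical unreliable service that is always down."""
    calls["primary"] += 1
    raise ConnectionError("primary down")


def replica(seq: str) -> str:
    """Hypothetical replica that works (here it just reverses the input)."""
    return seq[::-1]


result = invoke_with_failover([primary, replica], "acgt", retries=1)
```

The strategy only helps when a genuine alternate exists, which is the slide's final point: with few replicas available, the list of alternates is often a single entry.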

  34. Type management: Shims
     • The fiddly bits necessitated by not having a common type system or object model, or building elaborate wrappers
     • Adding functionality to Web Services
     • Shim libraries; automatic deployment at workflow assembly
     • Beanshell scripts for quick and dirty scripting
     ‘I want to identify new sequences which overlap with my query sequence and determine if they are useful’
     [Workflow diagram labels: Sequence database entry; Fasta format sequence; Genbank format sequence; Sequence, i.e. last known 3000bp; Identify new sequences and determine their degree of identity; Mask; BLAST; Simplify and Compare; Retrieve; Lister; Old BLAST result; Alignment of full query sequence v. full ‘new’ sequence; BLAST2]
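
A shim is typically a few lines of glue between two services that disagree about format. As an illustration, the sketch below converts numbered GenBank-style sequence lines into FASTA. It is written in Python rather than Beanshell, `genbank_origin_to_fasta` is a hypothetical name, and it assumes the input contains only the numbered sequence lines (no header fields or `//` terminator).

```python
import re


def genbank_origin_to_fasta(origin_text: str, identifier: str, width: int = 60) -> str:
    """Strip position numbers and spacing, then emit a FASTA record."""
    # Keep letters only: this drops the line numbers and the 10-base spacing.
    bases = re.sub(r"[^A-Za-z]", "", origin_text)
    lines = [bases[i:i + width] for i in range(0, len(bases), width)]
    return "\n".join([f">{identifier}"] + lines)


# Two lines in the numbered, space-separated layout (shortened for the example).
origin = "12181 acatttctac caacagtgga\n12241 cagtctttta"
fasta = genbank_origin_to_fasta(origin, "wbs_fragment", width=20)
```

Trivial as it is, a workflow cannot run end to end without this kind of step when the upstream service returns one format and the downstream service accepts another.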

  35. Workflow Practices
     • Put the workflow together to duplicate how they did the linking without duplicating how they did the on-the-fly integration
     • Post hoc analysis: don’t analyse data piece by piece; receive all the data at once
     • Service interoperability but fragmented results
     • Because integration needs smarter workflows and smart thinking about data types.
     • Close the world with Shims or services and build domain objects.
     • Smarter ways of visualising and linking intermediate results using provenance graphs
     • Custom visualisation application
     [Diagram labels: Provenance Record; Input; Result (repeated)]
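
Linking fragmented results through a provenance graph can be sketched as follows. Each record states which process produced an output from which inputs, and a traversal recovers everything a result depends on. The `ProvenanceLog` class and the data identifiers are hypothetical; the process names echo labels from the sequence workflow, and the real myGrid provenance store is RDF based rather than an in-memory dictionary.

```python
from typing import Dict, List, Set, Tuple


class ProvenanceLog:
    """Records, for each output, the process and inputs that produced it."""

    def __init__(self) -> None:
        self._derived_from: Dict[str, Tuple[str, List[str]]] = {}

    def record(self, output_id: str, process: str, input_ids: List[str]) -> None:
        self._derived_from[output_id] = (process, list(input_ids))

    def lineage(self, data_id: str) -> List[str]:
        """Return all upstream data ids, nearest first (breadth-first)."""
        seen: Set[str] = set()
        order: List[str] = []
        queue: List[str] = [data_id]
        while queue:
            current = queue.pop(0)
            _, parents = self._derived_from.get(current, ("", []))
            for parent in parents:
                if parent not in seen:
                    seen.add(parent)
                    order.append(parent)
                    queue.append(parent)
        return order


log = ProvenanceLog()
log.record("masked", "Mask", ["query"])
log.record("blast_report", "BLAST", ["masked"])
log.record("summary", "Simplify and Compare", ["blast_report", "old_blast"])
upstream = log.lineage("summary")
```

With such a graph, a viewer can start from one integrated result and walk back to the intermediate reports and original inputs instead of presenting them as unrelated files.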

  36. Integration and visualisation of GD annotation workflow results
     [Diagram labels: Gene annotation pipeline workflow; Provenance Record; Result; Input; Custom Data Model; Integrated results]

  37. Integration and interoperation
     • Information model is a container for domain semantics
     • Linking stuff together is Integration Lite
     [Diagram labels: Provenance Annotation; Service & Data Annotation; App & Shim Services; Domain Semantics; Ontologies; Custom Data Objects; Information Model; e-Science Semantics; Syntax; Configuration; Workflows; Processors; Shims; Invocation model; Interface; Data format; LSID; Data identity]

  38. Take Homes
     • Our apps are providing real scientific results, or at least the hypotheses…
     • The problem is not really gathering and coordinating services, but gathering and coordinating the results
     • Are you interoperating or integrating?
     • Careful thought has to go into the abstractions we apply to services for finding them and running them
     • Activation energy vs reusability of service: ROI and altruism
     • We need more services, more replicas of services, better service interfaces and better reliability and stability
     • Most of our services turn out not to be vanilla WSDL
     • Light touch vs added value

  39. Performing In silico Experiments in a Service Based Architecture: Solutions and Issues Chris Wroe, Phillip Lord, Robert Stevens & Carole Goble The University of Manchester, UK http://www.mygrid.org.uk
