
Service Discovery in myGrid and the Biocatalogue, a Life Science Service Registry



Presentation Transcript


  1. Service Discovery in myGrid and the Biocatalogue, a Life Science Service Registry • Katy Wolstencroft, myGrid, University of Manchester

  2. Lots of Resources • NAR (Nucleic Acids Research) 2008: over 1000 databases

  3. Taverna Workflow Workbench • Design and execution of workflows • Access to local and remote resources and analysis tools • Automation of data flow • Iteration over large data sets • Part of the myGrid project

  4. Who Uses Taverna? • Access to 3500+ public service operations • 55,000+ SourceForge downloads • 10,000+ downloads of v1.7 • 40+ downloads per day • Ranked 148 in SourceForge activity (11 Nov 2008) • 350+ known organisations • 17 known commercial • 1000+ active users at any one time • Users throughout UK, USA, Europe, SE Asia and South America • Organisations: Netherlands Bioinformatics Centre; Genome Canada Bioinformatics Platform; BioMOBY; US iPlant Consortium; US FLOSS social science program; RENCI; French SIGENAE farm animals project; ThaiGrid; CARMEN Neuroscience project; SPINE consortium; EU Enfin, EMBRACE, BioSapiens, Casimir; EU SysMO Consortium; NEBC (The NERC Environmental Bioinformatics Centre); Bergen Centre for Computational Biology; Max-Planck Institute for Plant Breeding Research; Genoa Cancer Research Centre; AstroGrid; caBIG/caGRID

  5. What do Scientists use Taverna for? • Application areas: systems biology model building; proteomics; sequence analysis; protein structure prediction; gene/protein annotation; microarray data analysis; QTL studies; QSAR studies; chemoinformatics; medical image analysis; public health care epidemiology; heart model simulations; high throughput screening; phenotype studies; phylogeny; statistical analysis; text mining; astronomy, music, meteorology • Data gathering, annotation and model building • Data analysis from distributed tools • Data mining and knowledge management • Hypothesis generation, modelling and text mining • Data curation and warehouse population • Parameter sweeps and simulation

  6. Open Source Workflow Environment for Scientists • Discover and reuse services (Feta) • Share, discover and reuse workflows • Create and run workflows • Create and manage services as components • Manage the metadata needed and generated (RDF, OWL) • API Consumer

  7. Workflow Reuse • Workflows allow high throughput experiments and automation • Workflows are encapsulations of experiments • Workflows developed for one experiment can be reused for others • Easier to share, reuse and repurpose • Analogy: the METHODS section of a scientific publication

  8. Recycling, Reuse, Repurposing • Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle • Paul meets Jo. Jo is investigating mouse Whipworm infection. • Jo reuses one of Paul’s workflows without change. • Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite. • Previously, a manual two-year study by Jo had failed to do this.

  9. Where are the Services From? • Over 3500 services available • Major service providers: European Bioinformatics Institute; DNA DataBank of Japan; NCBI (USA) • ‘Boutique’ services: individual research labs producing public data sets; specialist tools for niche experiments • We are not service providers

  10. What types of services? • HTML • WSDL Web Services • BioMart • R-processor • BioMoby • Soaplab • Local Java services • Beanshell • Workflows • Coming soon: REST, Matlab • Note: variable or non-existent documentation or help

  11. Taverna in an ‘open’ world • Advantages: connection to lots of resources; flexible system; can adapt to new technologies • Disadvantages: services are developed for other purposes; we can’t control how they work; we have to deal with the heterogeneity

  12. Finding Services When using services, scientists need to: • Find them – in distributed locations, produced by different host institutions • Interpret them – what do the services do - what experiments can they perform using them? • Know how to invoke them – what data and initial parameters do they need to supply?

  13. Metadata from a WSDL • Pathport Web service from the Virginia Bioinformatics Institute: http://pathport.vbi.vt.edu/services/wsdls/beta/glimmer.wsd • Slide callouts: name of the service; uninformative names for parameters; what kind of string? • WSDL excerpt: <wsdl:message name="getGlimmersResponse"> <wsdl:part name="getGlimmersReturn" type="xsd:string"/> </wsdl:message> <wsdl:message name="aboutServiceRequest"/> <wsdl:message name="getGlimmersRequest"> <wsdl:part name="in0" type="xsd:string"/> <wsdl:part name="in1" type="xsd:string"/> <wsdl:part name="in2" type="xsd:string"/> <wsdl:part name="in3" type="xsd:string"/> <wsdl:part name="in4" type="xsd:string"/> <wsdl:part name="in5" type="xsd:string"/> <wsdl:part name="in6" type="xsd:string"/> <wsdl:part name="in7" type="xsd:int"/> <wsdl:part name="in8" type="xsd:string"/>
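
The problem on slide 13 can be shown mechanically. The following is a minimal sketch using only the Python standard library. It assumes a completed copy of the excerpt: the enclosing `definitions` element, the namespace declarations, and the final closing tag are added here so the fragment parses, and they are not on the slide. The `uninformative` heuristic is also an illustration, not part of any myGrid tool.

```python
import re
import xml.etree.ElementTree as ET

# Slide excerpt wrapped so that it is well-formed XML.
# Assumption: the wrapper element and closing tag are not in the original slide.
WSDL = """<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <wsdl:message name="getGlimmersResponse">
    <wsdl:part name="getGlimmersReturn" type="xsd:string"/>
  </wsdl:message>
  <wsdl:message name="aboutServiceRequest"/>
  <wsdl:message name="getGlimmersRequest">
    <wsdl:part name="in0" type="xsd:string"/>
    <wsdl:part name="in1" type="xsd:string"/>
    <wsdl:part name="in2" type="xsd:string"/>
    <wsdl:part name="in3" type="xsd:string"/>
    <wsdl:part name="in4" type="xsd:string"/>
    <wsdl:part name="in5" type="xsd:string"/>
    <wsdl:part name="in6" type="xsd:string"/>
    <wsdl:part name="in7" type="xsd:int"/>
    <wsdl:part name="in8" type="xsd:string"/>
  </wsdl:message>
</wsdl:definitions>"""

NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}


def message_parts(wsdl_text):
    """Map each message name to its list of (part name, declared type)."""
    root = ET.fromstring(wsdl_text)
    return {
        msg.get("name"): [
            (part.get("name"), part.get("type"))
            for part in msg.findall("wsdl:part", NS)
        ]
        for msg in root.findall("wsdl:message", NS)
    }


def uninformative(name):
    """Hypothetical heuristic: generated names such as in0, arg3, param12."""
    return re.fullmatch(r"(in|out|arg|param)\d+", name) is not None


parts = message_parts(WSDL)
flagged = [name for name, _ in parts["getGlimmersRequest"] if uninformative(name)]
print(f"{len(flagged)} of {len(parts['getGlimmersRequest'])} request parts have generated names")
```

Everything the WSDL offers about the nine inputs is a positional name and an XML Schema type, which is why the later slides turn to semantic annotation.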

  14. Semantics and Web Services • SAWSDL – Semantic Annotations for WSDL working group • Virtually no uptake by bioinformatics service providers • Doesn’t address non-WSDL services

  15. Adding Semantics – Annotating Services • Find services by their function instead of their name • The services might be distributed, but a registry of service descriptions can be central and queried • We need to annotate services with semantics • In myGrid, we use the Feta Semantic Discovery tool and a semantic annotation tool, together with expert curation

  16. myGrid Ontology • Logically separated into two parts • Service ontology: physical and operational features of (web) services • Domain ontology (Semantic Content Model): annotation vocabulary for core bioinformatics data, data types and their relationships

  17. Service Ontology • Models services from the point of view of the scientist • Where is it? • How many inputs/outputs? • Who hosts it? • Invocation details are hidden by the Taverna workbench • Differs from related initiatives in this respect

  18. Domain Ontology • Informatics: captures the key concepts of data, data structures, databases and metadata. • Bioinformatics: the domain-specific data sources (e.g. the model organism sequencing databases) and domain-specific algorithms for searching and analyzing data (e.g. the sequence alignment algorithm clustalw). • Molecular biology: concepts include, for example, protein sequence and nucleic acid sequence. • Formats: a hierarchy describing bioinformatics file formats, for example fasta format for sequence data or phylip format for phylogenetic data. • Tasks: a hierarchy describing the generic tasks a service operation can perform. Examples include retrieving, displaying, and aligning.

  19. Example Service Annotation • Example : BLAST from the DDBJ • Performs task: Alignment • Uses Method: Similarity Search Algorithm • Uses Resources: DNA/Protein sequence databases • Inputs: • biological sequence (and format) • database name (and format) • blast program (and format) • Outputs: Blast Report
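
The annotation on slide 19 can be pictured as a small record that a registry stores and filters. This is a minimal sketch, not Feta's actual schema: the class, its field names, and the `find_by_task` helper are hypothetical, and the second registry entry is invented purely so the query has something to exclude.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAnnotation:
    """Hypothetical record mirroring the properties listed on slide 19."""
    name: str
    provider: str
    task: str
    method: str
    resources: tuple
    inputs: tuple
    outputs: tuple


# Values taken from the slide.
blast = ServiceAnnotation(
    name="BLAST",
    provider="DDBJ",
    task="Alignment",
    method="Similarity Search Algorithm",
    resources=("DNA/Protein sequence databases",),
    inputs=("biological sequence", "database name", "blast program"),
    outputs=("Blast Report",),
)

# Invented entry, for illustration only.
fetcher = ServiceAnnotation(
    name="fetch_sequence",
    provider="example provider",
    task="Retrieving",
    method="Database lookup",
    resources=("DNA/Protein sequence databases",),
    inputs=("database name",),
    outputs=("biological sequence",),
)

registry = [blast, fetcher]


def find_by_task(annotations, task):
    """Return annotations whose task matches, ignoring case."""
    return [a for a in annotations if a.task.lower() == task.lower()]


for hit in find_by_task(registry, "alignment"):
    print(hit.name, "from", hit.provider, "needs", ", ".join(hit.inputs))
```

Searching on `task` rather than `name` is the point made on slide 15: the scientist asks for a function, not for a service they already know.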

  20. myGrid Ontology • First version of the ontology ~2002 • Originally developed in DAML+OIL • Now developed in OWL, with a version exported to RDFS • Number of classes in the ontology ~750 • Domain and service ontology used by myGrid users and developers of myGrid-related plugins • Service ontology also used by BioMoby • W3C compliant with respect to ontology modelling

  21. How do we use the ontology? • Two methods of service description • 1. Decision support (querying): composite matches to ontology terms; multiple terms are used to query the annotations • 2. Decision making (reasoning): single description of the whole service model; enables automated detection of service mismatches; enables possibility of automated addition of services
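
The querying mode on slide 21 can be sketched as matching several ontology terms at once, where a service annotated with a more specific term still satisfies a query for a more general one. This is plain Python, not an OWL reasoner. The `PARENT` subclass links and both service entries are assumptions made for the example and are not taken from the myGrid ontology.

```python
# Assumed fragment of an is-a hierarchy: child term -> parent term.
PARENT = {
    "protein sequence": "biological sequence",
    "nucleic acid sequence": "biological sequence",
}

# Hypothetical annotations: facet -> list of ontology terms.
SERVICES = {
    "protein_aligner": {
        "task": ["aligning"],
        "input": ["protein sequence"],
    },
    "dna_fetcher": {
        "task": ["retrieving"],
        "input": ["database name"],
        "output": ["nucleic acid sequence"],
    },
}


def is_a(term, ancestor):
    """True if term equals ancestor or is a descendant of it in PARENT."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


def matches(annotation, query):
    """Composite match: every queried facet must be satisfied by some term."""
    return all(
        any(is_a(term, wanted) for term in annotation.get(facet, ()))
        for facet, wanted in query.items()
    )


def search(query):
    """Names of services whose annotations satisfy all query terms."""
    return sorted(name for name, ann in SERVICES.items() if matches(ann, query))


print(search({"task": "aligning", "input": "biological sequence"}))
print(search({"output": "biological sequence"}))
```

The second mode on the slide, reasoning over a single whole-service description, would need a description logic reasoner and is outside this sketch.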

  22. Curation Sweatshop • Steady increase in numbers of services and workflows • Users able to find annotated services BUT • Time-consuming and expensive • More and more services built daily SO • Should we encourage service providers to add value? • Should we get users involved?

  23. Collaboration between University of Manchester and EBI • Drawing on 6 years’ experience in Taverna of semantic annotation of services using RDF and OWL ontologies • Drawing on experience at EBI in service provision • Drawing on experience of social curation and networking from myExperiment • First pilot December 2008

  24. Getting the Minimum • Community annotation: must be easy and quick; must allow partial descriptions; multiple annotations of the same service • What is the minimum information to enable service discovery and service invocation?

  25. Grading Services • Bronze – enough to locate the service. Example of service invocation • Silver • Gold • Platinum – full description. All properties annotated – including dependencies between them – reliability metrics – AND CHECKED AND VERIFIED BY A CURATOR

  26. Automatic Annotation • Inferring service descriptions from workflows • Gathering usage data: how many workflows use this service? • Gathering reliability data (monitoring): when is this service available? how many times does it fail? • Helps with “shopping” for services: people who used this service also used that service; top 10 services; services that do the same things
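
The usage and reliability figures on slide 26 are simple aggregates. The following is a minimal sketch using hypothetical workflow contents and monitoring results; it is not the BioCatalogue implementation, and all function names are invented for illustration.

```python
from collections import Counter

# Hypothetical data: workflow id -> services it invokes.
WORKFLOWS = {
    "wf1": ["blast", "clustalw", "fetch_sequence"],
    "wf2": ["blast", "fetch_sequence"],
    "wf3": ["clustalw", "phylip"],
}


def usage_counts(workflows):
    """How many workflows use each service (counted once per workflow)."""
    counts = Counter()
    for services in workflows.values():
        counts.update(set(services))
    return counts


def also_used(workflows, service):
    """Services that appear in the same workflows as the given service."""
    together = Counter()
    for services in workflows.values():
        used = set(services)
        if service in used:
            together.update(used - {service})
    return together


def availability(checks):
    """Fraction of monitoring checks that succeeded, or None with no data."""
    return sum(checks) / len(checks) if checks else None


counts = usage_counts(WORKFLOWS)
print("most used:", counts.most_common(10))  # the "top 10" list
print("used with blast:", also_used(WORKFLOWS, "blast").most_common(1))
print("availability:", availability([True, True, False, True]))
```

Because these numbers come from workflows and monitoring rather than from people, they can be collected without adding to the curation effort described on slide 22.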

  27. Annotation Provenance • Who said what about what? • Harvesting community annotation • Verifying and augmenting by a curator • ‘Trust’ Models • Annotation versions • In a workflow context • As stand alone services

  28. Feta Model (diagram) • Semantic Content Model • Service Model

  29. Biocatalogue Service Profile (diagram labels) • Quantitative Content • Tags • Ontologies • Curation Model • Semantic Content Model • Service Model • Provenance • Functional • Conditions of Use • Social Standing • Operational Metrics • Operational

  30. (Diagram labels, in extraction order) Quant’ve Semantic Content Model Curation Service Model Execution Host Finding Service Profile Search A.N. Other WSDL Browse/Shop WADL Ranking Customised S-A.N. Other SAWSDL Analytics SA-REST Service Workflow

  31. Annotation Process

  32. BioCatalogue: The pilot Features: • User Registration • Service Registration • Search • Annotation • Notification • Integration with myExperiment

  33. For More Information • BioCatalogue website • http://www.biocatalogue.org/ • BioCatalogue wiki • http://www.biocatalogue.org/wiki • myGrid website • http://www.mygrid.org.uk/

  34. myGrid Team

  35. (Diagram labels) Policies • Multiple Instances • Discovery • Conditions of Use • Interoperability • Functional • Operational • Multiple Versions • Composition • Interface • Reuse • Services • Dynamic • Neutral • Operational Metrics • Provenance • Trusted Authorities • Ranking • Social Standing • Monitoring • Aggregated Feeds • Multiply described • Third Party • Multiple Sources
