
e-science is…




  1. e-science is… BioAID

  2. Legos “Science is built up of facts, as a house is built of stones; but an accumulation of facts is no more a science than a heap of stones is a house.”  – Henri Poincaré, Science and Hypothesis, 1905 http://adaptivedisclosure.org

  3. Who will annotate the annotators themselves? facilitating resource management with (semantic) web services M. Scott Marshall

  4. Examples on web • Less accessible example: WSDL list for AIDA services http://ws.adaptivedisclosure.org/ (these services “annotate”) • Human-readable service info: http://xml.ddbj.nig.ac.jp/wsdl/index.jsp • But not machine-readable…

  5. Outline • Vision – an e-science virtual laboratory • Some definitions • Some requirements • Essential concepts of semantic web • facets for interfaces • Conclusions

  6. The Vision: Scientist as knowledge worker • For Knowledge Workers: • Knowledge is the data (i.e. rules, relations, properties, hypotheses, etc.) • For Today's Biologist: • Numbers, sequences, organisms(!), and images are the data • Manipulate knowledge instead of data • Find support for relations between concepts instead of discovering table and column names and numbers. • In the virtual laboratory, everything is a resource that can be described and manipulated with semantics

  7. User • ....? • End users – scientists using our applications • API users – programmers extending and using our code • System administrators – setting up services, grids etc. • Other classes... • If you’re not sure which one someone means please shout and ask them! Slide courtesy of Tom Oinn, OMII-EBI Workshop

  8. Service Oriented Architecture (SOA) • A way of doing computing where services are somehow combined to perform some overall function • Implies a communication framework between the services • Used because it’s easier to reconfigure the arrangement of a set of services than to rewrite a script • Services as LEGO bricks  Slide courtesy of Tom Oinn, OMII-EBI Workshop

  9. Grid • Not just Globus, or EGEE, or Naregi... • No such thing as ‘the grid’ • Unlike ‘the internet’ which does exist! • We mean: • A computational facility, normally comprising multiple computers, which provides some combination of compute and data storage capacity and which can abstract over its inner workings in some fashion • Very loose definition! • Can be part of a Service Oriented Architecture Slide courtesy of Tom Oinn, OMII-EBI Workshop

  10. Knowledge “data”, “information”, “facts”, “knowledge” Knowledge is a statement that can be tested for truth (by a machine).

  11. RDF : a web format for knowledge RDF is a W3C language to express statements. RDF Triple: Subject Predicate Object Graph of Knowledge: Node Edge Node
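A minimal sketch of the triple idea in plain Python, using tuples instead of a real RDF library. All identifiers (`ex:p53`, `ex:participatesIn`, and so on) are invented placeholders, not URIs from the slides. "Testing a statement for truth" is reduced here to checking whether the statement has been asserted in the graph.

```python
# A triple is (subject, predicate, object); a graph is a set of triples.
# All identifiers below are hypothetical placeholders, not real URIs.
graph = {
    ("ex:p53", "rdf:type", "ex:Gene"),
    ("ex:p53", "ex:participatesIn", "ex:Apoptosis"),
    ("ex:Apoptosis", "rdf:type", "ex:BiologicalProcess"),
}


def holds(statement):
    """Return True only if the statement is asserted in the graph."""
    return statement in graph


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


print(holds(("ex:p53", "ex:participatesIn", "ex:Apoptosis")))
print(match(p="rdf:type"))
```

Each triple is one edge of the knowledge graph: subject and object are the nodes, and the predicate labels the edge between them.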

  12. OWL : The Web Ontology Language A W3C standard for ontology representation based on description logic.

  13. Resources are shared on the web • Shared: • CPU time • network bandwidth • memory • storage space • But also: • Data • Knowledge • Services

  14. Computational experiment: what we want to do with the resources [Diagram labels: Database, Database, ..., Database; Computational experiment in workflow environment]

  15. What are the tasks? • Search – discovering resources that match our needs • Workflow composition • Data integration • Enactment/Deployment • Access control • Registry of a resource

  16. Issues raised by computational experimentation • How will we find relevant data? • How will we automatically integrate such data into our experiment? • How will we find appropriate services? • How will we integrate our results as usable data for a new (computational) experiment? • -> annotation

  17. Finding the stone… Where is the piece that is red, has a triangular top, and was previously used to build a roof? BioAID

  18. Computational Experiments Anticipated needs of the data consumer • Data integration - combining different types of data • Data annotation: beyond formats • Not only: • Data types (integer, string, etc.) • But also: • Data semantics: What do the data represent? • Determined by the experimental design • Provenance: What has been done to the data? • Description of the procedure(s) that produced/transformed the data • Discover and enact appropriate (web) services with appropriate data • Reuse results from a computational experiment as data in another computational experiment • derived data is “tagged” and put into the repository
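One way to picture annotation "beyond formats" is a record that carries the data type, a semantic tag, and a provenance trail together. The sketch below is hypothetical: the class, its field names, and the example step names are assumptions made for illustration and do not come from AIDA, Taverna, or any other toolkit.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AnnotatedData:
    values: List[float]
    data_type: str      # format level, e.g. "float"
    semantic_tag: str   # what the data represent (hypothetical term)
    provenance: List[str] = field(default_factory=list)  # what was done


def apply_step(data: AnnotatedData, step_name: str,
               func: Callable[[float], float]) -> AnnotatedData:
    """Run one processing step and return derived data that is tagged
    with the extended provenance trail; the input is left unchanged."""
    return AnnotatedData(
        values=[func(v) for v in data.values],
        data_type=data.data_type,
        semantic_tag=data.semantic_tag,
        provenance=data.provenance + [step_name],
    )


raw = AnnotatedData([1.0, 2.0, 4.0], "float", "ex:ExpressionLevel", ["measured"])
derived = apply_step(raw, "doubled", lambda v: v * 2)
print(derived.values, derived.provenance)
```

Because the derived record keeps its semantic tag and gains a provenance entry, it could be stored and reused as input to another computational experiment.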

  19. Anticipated needs of the data supplier (and consumer) • Data in: • Simple submission/registration of data to e-science repository • Semi-automatic annotation • Data out: • Easy search and retrieval of previous datasets (my personal and my group’s data) • Easy search and retrieval of relevant datasets from public repository • Combining data: • Different types and different sources • Example: Intersecting views of data • data mapped to physical or semantic space (Examples follow..)

  20. The Semantic Gap [Diagram labels: Application, Middleware, Resources, User]

  21. The Model in the middle [Diagram labels: My Model, Model, Model; Application, Middleware, Resources, User]

  22. Why semantic annotation? We want annotation to be “machine-readable”: • Free text – arbitrary text tags generated by users won’t always match up • Simplest problem: Finding a “named” object • Synonyms - Different names exist for the same object in different contexts and roles. • Homonyms - The same name is used for different objects. • Which name should I use? • Standardized vocabulary list • can only find literal matches • Example: Using data types to search for services will find too many! • Semantic tags • allow searching for similar items: • “Find items like this one.” • allow searching with a description: • “Find items with these properties.” • semantic description of service (SA-WSDL) as well as data (OWL)
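The two kinds of semantic search named on this slide can be sketched with sets of (property, value) pairs. The service names and the property vocabulary below are invented for illustration. Note how a query on the input data type alone returns more than one service, while adding a second property narrows the result.

```python
# Each resource is described by a set of (property, value) pairs.
# Names and vocabulary are hypothetical.
resources = {
    "service_A": {("input", "ex:DNASequence"), ("task", "ex:Alignment")},
    "service_B": {("input", "ex:DNASequence"), ("task", "ex:Translation")},
    "service_C": {("input", "ex:ProteinSequence"), ("task", "ex:Alignment")},
}


def find_with_properties(description):
    """'Find items with these properties': every pair must be present."""
    return sorted(name for name, props in resources.items()
                  if description <= props)


def find_similar(name):
    """'Find items like this one': rank other items by shared pairs."""
    target = resources[name]
    scored = [(len(target & props), other)
              for other, props in resources.items() if other != name]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in ranked if score > 0]


print(find_with_properties({("input", "ex:DNASequence")}))
print(find_with_properties({("input", "ex:DNASequence"),
                            ("task", "ex:Alignment")}))
print(find_similar("service_A"))
```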

  23. What is an ontology? Definitions: • A collection of things that are defined in terms of their properties and relations to other things. • A specification of a conceptualization that is designed for reuse across multiple applications and implementations (Gruber ’93, ’95; Guarino ’96; Guarino and Giaretta ’95) General applications: • Searching for objects that are resources, documents, concepts, experimental data, or collections of these things. • Knowledge capture • Example: Biological model with hypothetical knowledge Common applications in bioinformatics: • Annotation of database entries (e.g. gene products) • Categorization of clustered elements (e.g. genes)

  24. Inheritance in ontologies • Often represented as DAGs (Directed Acyclic Graphs) or hierarchies (trees) • Power of inheritance • Subsumption relations (ISA) apply transitivity to create inheritance of class and properties downward along chains in the hierarchy. • Use an element as a metadata tag for semantic annotation (ontotag) • An ontotag serves as a pointer into a “semantic space” [Diagram labels: Animal; Bird, Mammal; Robin, Heron, Penguin]
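A small sketch of transitive subsumption, using the class names from this slide. The shape of the hierarchy (Robin, Heron, and Penguin under Bird; Bird and Mammal under Animal) is an assumption, and the properties and annotated items are invented for illustration.

```python
# Direct ISA (subclass-of) links: child -> list of parents.
# Class names come from the slide; the hierarchy shape is assumed.
ISA = {
    "Bird": ["Animal"],
    "Mammal": ["Animal"],
    "Robin": ["Bird"],
    "Heron": ["Bird"],
    "Penguin": ["Bird"],
}

# Properties asserted directly on a class (hypothetical).
PROPERTIES = {
    "Animal": {"is_living"},
    "Bird": {"has_feathers"},
}


def ancestors(cls):
    """All superclasses reachable by following ISA links transitively."""
    seen = []
    stack = list(ISA.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(ISA.get(parent, []))
    return seen


def inherited_properties(cls):
    """Properties of the class plus those inherited from its ancestors."""
    props = set(PROPERTIES.get(cls, set()))
    for ancestor in ancestors(cls):
        props |= PROPERTIES.get(ancestor, set())
    return props


def tagged_with(annotations, cls):
    """Items whose ontotag is cls or any (transitive) subclass of cls."""
    return sorted(item for item, tag in annotations.items()
                  if tag == cls or cls in ancestors(tag))


annotations = {"photo1": "Robin", "photo2": "Mammal", "photo3": "Heron"}
print(inherited_properties("Penguin"))
print(tagged_with(annotations, "Bird"))
```

The last call shows why an ontotag is more useful than a literal keyword: a search for "Bird" finds items tagged "Robin" or "Heron" even though the word "Bird" never appears in their annotation.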

  25. Gene Ontology Mouse p53: {List of GO identifiers} Process: apoptosis, DNA damage response, signal transduction by p53 class mediator... Component: cytoplasm, cytosol... Function: DNA binding, protein binding... • Cluster of genes X from microarray analysis • Collection of {List of GO identifiers} per gene in cluster • Most prevalent GO identifiers: • Apoptosis, Cytosol, Protein Binding • Significant relationships between GO classes (e.g. cell death and DNA damage response)
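The "most prevalent GO identifiers" step can be sketched as a simple count over the annotations of each gene in a cluster. The cluster below is made up (gene names and their term lists are assumptions), and term labels stand in for real GO identifiers. This only counts occurrences; a real analysis would normally also test whether a term is statistically over-represented.

```python
from collections import Counter

# Hypothetical cluster: gene -> GO term labels annotated to that gene.
cluster = {
    "gene1": ["apoptosis", "cytosol", "protein binding"],
    "gene2": ["apoptosis", "cytoplasm", "protein binding"],
    "gene3": ["DNA damage response", "cytosol", "protein binding"],
}


def most_prevalent(cluster, n=3):
    """Count in how many genes each term occurs and return the top n.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(term
                     for terms in cluster.values()
                     for term in set(terms))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


print(most_prevalent(cluster))
```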

  26. Semantic annotation - ontotags Evidence Ontology Provenance Author Gene Ontology Metadata

  27. Resource management use case: data integration: Finding a basis for relation Hypothesis Epigenetic Mechanisms Transcription “There is a relation” Chromatin Transcription Factors Histone Modification Transcription Factor Binding Sites Classes Instances Common Domain position KSinBIT’06

  28. Scenario: A Use Case is born • E-scientist explains benefits of semantic web to (wet lab) biologist • Biologist wants to see a demonstration with actual data • => Use Case: Find evidence of a relation between transcription and histone modifications • Our approach: Annotate data with our own semantic types so that we can issue a query using our own terms KSinBIT’06
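A rough sketch of the approach in this use case: data records are annotated with semantic types from our own model, and a query phrased in those terms looks for instances that share a position, the common domain named on the previous slide. Everything concrete here is an assumption for illustration: the `Feature` record, the type names, and the coordinates are invented and are not the data or query used in the original demonstration.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    semantic_type: str  # term from our own model (hypothetical)
    chrom: str
    start: int
    end: int


# Invented example records, each annotated with a semantic type.
features = [
    Feature("HistoneModification", "chr1", 100, 200),
    Feature("TranscriptionFactorBindingSite", "chr1", 150, 160),
    Feature("TranscriptionFactorBindingSite", "chr1", 500, 510),
    Feature("HistoneModification", "chr2", 150, 160),
]


def overlaps(a: Feature, b: Feature) -> bool:
    """True if both features lie on the same chromosome and intersect."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def co_located(items: List[Feature], type_a: str,
               type_b: str) -> List[Tuple[Feature, Feature]]:
    """Pairs of instances of the two semantic types that share a position."""
    return [(a, b)
            for a in items if a.semantic_type == type_a
            for b in items if b.semantic_type == type_b and overlaps(a, b)]


pairs = co_located(features, "HistoneModification",
                   "TranscriptionFactorBindingSite")
print(pairs)
```

Each returned pair is a candidate piece of evidence for the hypothesized relation; whether it is biologically meaningful is a separate question.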

  29. Computer readable model Biologist readable model E-science perspective on data integration: From cartoon to model to semantic data integration Biological concepts (‘myModel’) Data KSinBIT’06

  30. Some of the pieces we need • knowledge representation – triples • pointing at things: EPRs and URIs, not just the things but the statements about the things • unification and reasoning • annotation: linking knowledge to resources

  31. Provenance – example in Taverna

  32. Computational experiment • Some provenance should be added by the module/service itself [Diagram labels: Database, Database, ..., Database]

  33. The AIDA toolbox for knowledge extraction and knowledge management: reusable components to enhance science BioAID

  34. Living examples: dynamic interfaces • http://aida.science.uva.nl:9999/search/AID • Yahoo Pipes interface to AIDA medline search: http://pipes.yahoo.com/pipes/pipe.info?_id=cv7nIBpw3BGw4NOLJphxuA • MeSH facet interface from Exhibit: http://aida.science.uva.nl:9999/search/json_test.html • W3C Health Care and Life Sciences KB (unofficial URL): http://www.w3.org/2001/sw/hcls/notes/kb/ http://esw.w3.org/topic/HCLS/Banff2007Demo

  35. Conclusions • The Web is a collection of resources: resource sharing • Disclosure of semantic models can greatly enhance resource sharing and resource management • Semantic annotation can be applied to any type of resource: data and (web)services. • Semantic annotation and provenance can be added by the (web)services themselves. • Need text mining for web services (to support semantic annotation) • Need web services for text mining

  36. The End “Science is built up of facts, as a house is built of stones; but an accumulation of facts is no more a science than a heap of stones is a house.”  – Henri Poincaré, Science and Hypothesis, 1905 http://adaptivedisclosure.org
