This document gives a technical overview of OBD, an ontology-based knowledgebase for biomedical annotations. It covers the annotation lifecycle, the OBD model and its requirements, the current OBD architecture, and the roadmap. It argues that data become far more valuable once they can be integrated, and explains how OBD uses ontologies and ontology mappings to compare annotations from diverse sources. It also describes the generic, graph-based OBD model and its ability to represent complex, post-composed biomedical descriptions, which are essential for knowledge curation in biomedicine.
OBD: technical overview (Chris Mungall)
Outline • The annotation lifecycle • OBD Model and modeling requirements • Current OBD architecture • Roadmap
The need for OBD • The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data • Current annotations using ontologies are fragmented across multiple databases and multiple schemas • OBD provides a common means of accessing and querying across these annotations
OBD - What is it? • General purpose biomedical knowledgebase • Repository of biomedical annotations • Ontology-based queries and analysis • Annotations from multiple sources can be compared through use of ontologies and ontology mappings • Current primary use • Genotype-phenotype associations for DBPs • Future uses • Annotation of information entities • Documents, datasets, records, images • Annotation of any biomedical entity using bio-ontologies
The annotation lifecycle [Figure: example publication Dev Biol 2005 Jul 15;283(2):357-72, “Sonic hedgehog is required for cardiac outflow tract and neural crest cell development”. Diagram elements: an investigator runs an experiment/investigation recorded in a lab db; the observation (absence of aorta) is published as an information entity; agents and tools (human/computer) read it and create direct annotations of the bio-entity Shh (Shh- → absence of aorta; Shh+ → heart development) as a computational representation; the community/experts use these for query and meta-analysis.]
What is an annotation? • OBD has a very inclusive definition of annotation • An attributed statement positing some relation(s) between entities • Typically accompanied by associations to evidence-oriented entities and metadata • Examples: • Shh participates_in heart development • p53 implicated_in cancer • p53 has_function DNA repair • PMID:1234 mentions melanoma • http://… depicts (lesion that located_in CA4) • Abc[-] influences blood pressure • Trial3456 has_inclusion_criteria (age that < 65) [Figure: Shh+ participates in heart development]
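The inclusive definition above (an attributed statement plus evidence and metadata) can be pictured as a small record type. This is a minimal sketch, assuming hypothetical field names (`subject`, `relation`, `obj`, `source`, `evidence`) and made-up source identifiers; it is not OBD's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """An attributed statement positing a relation between two entities.

    Field names are illustrative assumptions, not OBD's schema.
    """
    subject: str
    relation: str
    obj: str
    source: str                     # attribution: who posited the statement
    evidence: tuple[str, ...] = ()  # evidence-oriented metadata


# Statements adapted from the slide; sources and evidence are hypothetical.
ANNOTATIONS = [
    Annotation("Shh", "participates_in", "heart development",
               source="example_lab_db", evidence=("mutant phenotype",)),
    Annotation("p53", "has_function", "DNA repair", source="example_curator"),
    Annotation("PMID:1234", "mentions", "melanoma", source="example_text_miner"),
]


def annotations_about(subject: str, annotations: list[Annotation]) -> list[Annotation]:
    """Return the statements whose subject is the given entity."""
    return [a for a in annotations if a.subject == subject]


def annotations_from(source: str, annotations: list[Annotation]) -> list[Annotation]:
    """Return the statements attributed to the given source."""
    return [a for a in annotations if a.source == source]
```

Keeping attribution on every statement is what lets annotations from different sources be stored side by side and later compared.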
OBD and annotations [Figure: the annotation lifecycle diagram extended with OBD. Annotations held in multiple local databases with multiple schemas (e.g. Shh- influences absence of aorta; Shh+ participates in heart development) are submitted to and consumed from OBD, which represents each annotation as subject, relation, object.]
Flexibility of OBD • Most ontology-based bio-curation focuses on stating associations between bio-entities and types as represented in ontologies • Where bio-entities can be types or instances • Genes, proteins, genotypes, cells, organisms, strains • OBD can also accommodate ‘tagging’ annotations • E.g. Ontrez, term extraction from literature • Associations between information entities and ontology terms • E.g. documents, document parts, datasets, images
Ontrez in OBD [Figure: the same lifecycle diagram for ‘tagging’ annotations. An information entity (the abstract of PMID:1234) is linked by a describes relation to Shh and to cardiac outflow tract; these annotations come from multiple local databases with multiple schemas and share one computational representation in OBD.]
OBD model: Requirements • Generic • We can’t define a rigid schema for all of biomedicine • Let the ontology do the modeling • Expressive • Use cases vary from simple ‘tagging’ to complex descriptions of biological phenomena • Formal semantics • Amenable to logical reasoning • Standards-compatible • Integration with semantic web • OWL-1.1
OBD Model: overview • Graph-based: nodes and links • Nodes: Classes, instances, relations • Links: Relation instances • Annotations: Posited links with attribution / evidence • Expressivity equivalent to RDF and OWL • Links correspond to axioms and facts in OWL • Attributed links can be represented via: • Named graphs • Reification • N-ary relation pattern • Supports construction of complex descriptions through the graph model
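Two of the options listed for attributed links can be sketched with plain tuples. This sketch assumes the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`); the `ex:` identifiers for statements, graphs, sources, and evidence are hypothetical, and OBD may encode attribution differently.

```python
from dataclasses import dataclass

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


@dataclass(frozen=True)
class Link:
    """A relation instance: one edge of the node/link graph."""
    subject: str
    predicate: str
    obj: str


def reify(link: Link, statement_id: str,
          attribution: dict[str, str]) -> list[tuple[str, str, str]]:
    """Attach attribution to a link using the RDF reification vocabulary."""
    triples = [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", link.subject),
        (statement_id, RDF + "predicate", link.predicate),
        (statement_id, RDF + "object", link.obj),
    ]
    # Attribution and evidence become ordinary triples about the statement node.
    triples.extend((statement_id, key, value) for key, value in attribution.items())
    return triples


def as_named_graph(link: Link, graph_id: str) -> tuple[str, str, str, str]:
    """Named-graph alternative: the link becomes a quad; attribution is stated about graph_id."""
    return (link.subject, link.predicate, link.obj, graph_id)
```

The n-ary relation pattern is structurally similar to reification, except that the intermediate node represents the annotation itself and is connected by domain-specific relations rather than the `rdf:` vocabulary.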
Constructing descriptions • The ability to compose descriptions is a key requirement for biomedical annotation • Logical expressions built using multiple classes • Post-composed at annotation time • Example (in OWL Manchester syntax*): • GO:Dendrite_spine that part_of CL:Golgi_cell • Genus-differentia description • Can be nested: • PATO:Decreased_length that inheres_in (GO:Dendrite_spine that part_of CL:Golgi_cell) • Representing and reasoning over these is a key OBD requirement * Existential quantifier omitted
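A nested genus-differentia description is naturally a recursive data structure. The sketch below is illustrative only: the `Description` type and `render` function are hypothetical, not part of OBD. It renders either in the slide's abbreviated style or with the omitted existential quantifier (`some`) spelled out in Manchester syntax.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Description:
    """Post-composed genus-differentia description: genus that relation filler."""
    genus: "Expr"
    relation: str
    filler: "Expr"


Expr = Union[str, Description]  # a named class ID or a nested description


def render(expr: Expr, explicit: bool = False) -> str:
    """Render in the abbreviated 'that' style, or with explicit 'some' if explicit=True."""
    if isinstance(expr, str):
        return expr
    genus = render(expr.genus, explicit)
    filler = render(expr.filler, explicit)
    if isinstance(expr.filler, Description):
        filler = f"({filler})"  # parenthesize nested descriptions
    if explicit:
        return f"{genus} and ({expr.relation} some {filler})"
    return f"{genus} that {expr.relation} {filler}"


spine = Description("GO:Dendrite_spine", "part_of", "CL:Golgi_cell")
phenotype = Description("PATO:Decreased_length", "inheres_in", spine)
```

For example, `render(phenotype, explicit=True)` yields `PATO:Decreased_length and (inheres_in some (GO:Dendrite_spine and (part_of some CL:Golgi_cell)))`, while `render(phenotype)` reproduces the abbreviated form on the slide.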
Reasoning over descriptions • Query requirement • Queries for annotations to “CNS neuron cell projection” • Should return: • Annotations to: GO:Dendrite_spine that part_of CL:Golgi_cell • Computational requirements • Entailments • EL++ or greater • OWL constructs • intersectionOf • equivalentClass • See: Representing Phenotypes in OWL (OWLED 2007)
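Why the query should return the dendrite-spine annotation can be shown with a structural subsumption check: `C that R some D` is subsumed by `C' that R some D'` when `C ⊑ C'` and `D ⊑ D'`. This is a minimal sketch, not an EL++ reasoner: it ignores equivalentClass definitions, relation hierarchies, and transitivity, so it is sound but incomplete. The is_a hierarchy and annotation IDs are toy assumptions, not the real GO/CL graphs.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Some:
    """Genus-differentia expression: genus that relation some filler."""
    genus: "Expr"
    relation: str
    filler: "Expr"


Expr = Union[str, Some]  # a named class (by ID) or a class expression

# Toy asserted is_a hierarchy (illustrative only).
IS_A = {
    "GO:Dendrite_spine": {"GO:Cell_projection"},
    "CL:Golgi_cell": {"CL:CNS_neuron"},
    "CL:CNS_neuron": {"CL:Neuron"},
    "CL:Neuron": {"CL:Cell"},
    "CL:Hepatocyte": {"CL:Cell"},
}


def ancestors(cls: str) -> set[str]:
    """Reflexive-transitive is_a closure of a named class."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumes(sup: Expr, sub: Expr) -> bool:
    """Structural check that sub is_a sup (sound but incomplete)."""
    if isinstance(sub, Some) and subsumes(sup, sub.genus):
        return True  # (C that R some D) is_a C, so anything subsuming C subsumes it
    if isinstance(sup, str):
        return isinstance(sub, str) and sup in ancestors(sub)
    if isinstance(sub, str):
        return False  # the toy has no definitions for named classes
    return (sub.relation == sup.relation
            and subsumes(sup.genus, sub.genus)
            and subsumes(sup.filler, sub.filler))


# Hypothetical annotations, each to a post-composed class expression.
ANNOTATIONS = {
    "ann1": Some("GO:Dendrite_spine", "part_of", "CL:Golgi_cell"),
    "ann2": Some("GO:Cell_projection", "part_of", "CL:Hepatocyte"),
}
QUERY = Some("GO:Cell_projection", "part_of", "CL:CNS_neuron")  # "CNS neuron cell projection"


def answer(query: Expr, annotations: dict[str, Expr]) -> list[str]:
    """IDs of annotations whose class expression is subsumed by the query."""
    return sorted(a for a, expr in annotations.items() if subsumes(query, expr))
```

Here `answer(QUERY, ANNOTATIONS)` returns only `ann1`, because a Golgi cell is (in the toy hierarchy) a CNS neuron while a hepatocyte is not. A production system would delegate this to an OWL reasoner supporting at least EL++.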
Example of annotation in OBD [Figure: post-composition of complex anatomical entity descriptions; post-composition of phenotype classes (PATO EQ formalism)]
OBD Architecture • Two stacks • Semantic web stack • First iteration • Built using Sesame triplestore + OWLIM • Limited developer resources • Future iterations: Science-commons Virtuoso • OBD-SQL stack • Current focus • Traditional enterprise architecture • Plugs into Semantic Web stack via D2RQ
OBD Architecture: Two stacks
OBD-SQL Stack • Alpha version of API implemented • Test clients access the API via SOAP • API wraps the org.obo model and the OBD schema • org.obo model and OBD schema share a relational abstraction layer • org.obo wraps the OWLAPI • Phenote currently connects via the JDBC connectivity in org.obo
OBDAPI illustrative examples • node = getNodeById(“OMIM:601653”) • nodes = getNodesBySearch(“p53*”) • nodes = getNodesBySource(“OMIM”) • nodes = getNodesByQuery(queryExpr) • graph = getAnnotationGraphAroundNode(“PATO:0001050”, true) • statements = getAnnotationStatementsForAnnotatedEntity(“Entrez:2138”)
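To show the shape of these calls without a running OBD node, here is a toy in-memory stand-in. It is a hypothetical sketch: the class, its snake_case method names (loosely mirroring `getNodeById`, `getNodesBySearch`, `getNodesBySource`, and `getAnnotationStatementsForAnnotatedEntity`), and the `EX:` data are assumptions, not the real OBDAPI.

```python
import fnmatch
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    id: str
    label: str
    source: str


@dataclass(frozen=True)
class Statement:
    subject: str
    relation: str
    obj: str


class ToyOBDNode:
    """Hypothetical in-memory stand-in for an OBD node."""

    def __init__(self, nodes: list[Node], statements: list[Statement]):
        self._nodes = {n.id: n for n in nodes}  # preserves insertion order
        self._statements = list(statements)

    def get_node_by_id(self, node_id: str) -> Optional[Node]:
        return self._nodes.get(node_id)

    def get_nodes_by_search(self, pattern: str) -> list[Node]:
        # Glob-style match on labels, e.g. "p53*"
        return [n for n in self._nodes.values() if fnmatch.fnmatchcase(n.label, pattern)]

    def get_nodes_by_source(self, source: str) -> list[Node]:
        return [n for n in self._nodes.values() if n.source == source]

    def get_annotation_statements_for_annotated_entity(self, entity_id: str) -> list[Statement]:
        return [s for s in self._statements if s.subject == entity_id]


# Made-up sample data.
NODES = [
    Node("EX:0001", "p53", "EX_GENES"),
    Node("EX:0002", "p53 regulator", "EX_GENES"),
    Node("EX:0003", "heart development", "EX_ONTOLOGY"),
]
STATEMENTS = [Statement("EX:0001", "has_function", "DNA repair")]
api = ToyOBDNode(NODES, STATEMENTS)
```

With this data, `api.get_nodes_by_search("p53*")` returns the two `EX_GENES` nodes, and `api.get_annotation_statements_for_annotated_entity("EX:0001")` returns the single `has_function` statement.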
Phenote as an OBD client (currently implemented)
Genome browser mashup (under development, Holmes lab) [Figure: genome browser view displaying annotations such as sensory neuron, vulva, uterine muscle, locomotion, and oviposition]
OBD Mediator Architecture • OBDAPI can act as client to other OBDAPIs • Mediator node distributes queries to source nodes
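The mediator pattern (one node fanning a query out to source nodes and merging the answers) can be sketched with a thread pool. This is an assumption-laden illustration: source nodes are modelled as plain Python callables with canned data, whereas real OBD nodes would be remote OBDAPI endpoints.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A source node is modelled as a callable: query string -> list of result strings.
SourceNode = Callable[[str], list[str]]


def mediate(query: str, sources: dict[str, SourceNode]) -> dict[str, list[str]]:
    """Send the query to every source node in parallel and merge the answers.

    Each distinct result maps to the names of the nodes that supplied it,
    so provenance survives the merge.
    """
    merged: dict[str, list[str]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(node, query) for name, node in sources.items()}
        for name, future in futures.items():  # collect in registration order
            for result in future.result():
                merged.setdefault(result, []).append(name)
    return merged


# Hypothetical source nodes with canned data.
def lab_node(query: str) -> list[str]:
    data = ["Shh participates_in heart development", "p53 has_function DNA repair"]
    return [s for s in data if query in s]


def literature_node(query: str) -> list[str]:
    data = ["Shh participates_in heart development", "PMID:1234 mentions melanoma"]
    return [s for s in data if query in s]
```

Keeping per-result provenance matters for OBD, since annotations are attributed statements and a merged answer should still say where each one came from.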
OBD-SQL Database • Generic minimal table model • Makes heavy use of views for core capabilities • E.g. analyzing information content of classes based on annotation • Views can be materialized for speed • Deductive closure of classes (named and class expressions) pre-computed • Not a blind transitive closure • OWL semantics (EL++) http://www.bioontology.org/wiki/index.php/OBD:OBD-SQL-Schema
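How views plus a materialized closure support annotation-based analysis can be sketched in SQLite. Everything here is a hypothetical toy, not the published OBD-SQL schema: two tables (`annotation`, and a pre-computed reflexive is_a `closure`), one view counting entities annotated to each class or its subclasses, and one view computing information content as IC(c) = -ln(n_c / N). The gene and class names are made up.

```python
import math
import sqlite3

# Hypothetical toy schema; NOT the published OBD-SQL schema.
SCHEMA = """
CREATE TABLE annotation (entity TEXT NOT NULL, class_id TEXT NOT NULL);
-- Materialized, reflexive is_a closure: one row per (sub, sup) with sub is_a* sup.
CREATE TABLE closure (sub TEXT NOT NULL, sup TEXT NOT NULL);
-- Distinct entities annotated to each class, directly or via a subclass.
CREATE VIEW class_annotation_count AS
  SELECT c.sup AS class_id, COUNT(DISTINCT a.entity) AS n
  FROM annotation AS a JOIN closure AS c ON a.class_id = c.sub
  GROUP BY c.sup;
-- Information content: -ln(fraction of annotated entities that reach the class).
CREATE VIEW class_ic AS
  SELECT class_id,
         -py_ln(CAST(n AS REAL) / (SELECT COUNT(DISTINCT entity) FROM annotation)) AS ic
  FROM class_annotation_count;
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Register ln() from Python, since SQLite's built-in math functions are optional.
    conn.create_function("py_ln", 1, math.log)
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO closure VALUES (?, ?)", [
        ("heart_development", "heart_development"),
        ("heart_development", "anatomical_structure_development"),
        ("neural_crest_development", "neural_crest_development"),
        ("neural_crest_development", "anatomical_structure_development"),
        ("anatomical_structure_development", "anatomical_structure_development"),
    ])
    conn.executemany("INSERT INTO annotation VALUES (?, ?)", [
        ("gene_a", "heart_development"),
        ("gene_a", "neural_crest_development"),
        ("gene_b", "heart_development"),
        ("gene_c", "neural_crest_development"),
    ])
    return conn


def information_content(conn: sqlite3.Connection) -> dict[str, float]:
    return dict(conn.execute("SELECT class_id, ic FROM class_ic"))
```

The root class reaches all three genes and gets IC 0, while each child class reaches two of three and gets ln(1.5). Materializing the closure is what keeps the view a simple join; the toy closure here is plain is_a, whereas the slide notes OBD's closure follows OWL (EL++) semantics rather than blind transitivity.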
Analysis requirements • The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data • OBD must be able to use ontologies to query and analyze data effectively
Inter-ontology reasoning
Annotation comparison • Within species • Across species • Translational research
Visualisation and display of annotations • OBD web-based interface prototype
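Once two entities are annotated with the same ontology (directly, or after applying ontology mappings, which this sketch does not model), their annotations can be compared through shared ancestors. The sketch uses the Jaccard index of ancestor-closed annotation sets, one common choice among many; it is an assumption for illustration, not OBD's documented method, and the phenotype class names are hypothetical.

```python
def closure(classes: set[str], is_a: dict[str, set[str]]) -> set[str]:
    """All classes in `classes` plus every is_a ancestor."""
    seen, stack = set(classes), list(classes)
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard_similarity(a: set[str], b: set[str], is_a: dict[str, set[str]]) -> float:
    """Jaccard index of the ancestor-closed annotation sets of two entities."""
    ca, cb = closure(a, is_a), closure(b, is_a)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


# Toy phenotype ontology (hypothetical class names).
IS_A = {
    "outflow_tract_defect": {"heart_defect"},
    "heart_defect": {"cardiovascular_phenotype"},
    "cardiovascular_phenotype": {"phenotype"},
    "skeletal_defect": {"phenotype"},
}
```

An entity annotated to `outflow_tract_defect` scores 0.75 against one annotated to `heart_defect` (three shared classes out of four) but only 0.2 against one annotated to `skeletal_defect`, even though none of the direct annotations coincide. Information-content-weighted measures are a common refinement.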
OBD API in BioPortal: two choices • Choice A: Two separate APIs • Ontology API • Annotation API • Choice B: Unified API • Use the same API for search, implementing the same behaviour • Same query model
Requirements for unified API • Expressive model • Logical entailment for both named classes and class expressions • Expressive queries • Compatible with OWL • Easy to express common queries
Open Questions • Classes vs instances
API Extensions • Data mining support • Complex queries
Current OBD Nodes http://www.bioontology.org/wiki/index.php/OBD:Querying
Distribution • Distribution is optional • Not required for supporting current DBPs • OBD Nodes should be easy to set up • Lightweight DBMS • Query mediator • Integrates queries across multiple resources • Caches nodes and links in the local node • Registry
Similar Systems • BIRN System • RDF DB (IODT) • Semantic Mediator • Freebase/Metaweb, etc. • RDF-based? • Wiki model • Currently centralized • Referent Tracking Model • Formal basis
Timeline • Current focus • OBD API • Formal representation of genotypes • Clinical Trials • Post May meeting • Distributed querying • Post BIRN meeting
Representing the world in OBD • Requires formal mappings from other models into OBD constructs • What kind of entities are being represented? • How are they related? • Example: • Qualities and their bearers • See: Representing Phenotypes in OWL (OWLED 2007)
OBD Requirements • Model: • Generic, cross-domain • Formal semantics • Supports complex annotation • Queries: • Deductive capabilities • Data mining capabilities • Efficient • Distributable • Standards-compatible