Common Anatomy Reference Ontology Workshop What an Ontology is For Barry Smith University at Buffalo http://ontology.buffalo.edu/smith
we are accumulating huge amounts of data • how do we know what data we have? • how do I know what data you have? • how do we know what data we don’t have? • how do we make different sorts of data combinable?
where in the cell? what kind of process? what kind of biological end? we need semantic annotation of data
how to create broad-coverage semantic annotation systems for biomedicine? • Semantic Web, Moby, wikis, UMLS, etc. • let a million flowers (weeds) bloom • and create integration via post hoc mappings
for science an alternative • develop high quality annotation resources in a collaborative, community effort • create an evolutionary path towards improvement • on the basis of common prospective standards • based on science
for science • science works out from a validated core, and strives to isolate and resolve inconsistencies as it extends outwards • we need to create a validated core including ontologies corresponding to the basic biomedical sciences in this core • low hanging fruit
[Figure: fragment of the FMA (Foundational Model of Anatomy) hierarchy, with is_a and part_of links among the terms Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Component, Organ Subdivision, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Tissue, Pleural Sac, Pleura (Wall of Sac), Pleural Cavity, Parietal Pleura, Visceral Pleura, Interlobar recess, Mediastinal Pleura, Mesothelium of Pleura]
for science but we need more • where do we find scientifically validated information linking gene products and other entities represented in biochemical databases to semantically meaningful terms pertaining to disease, anatomy, development, histology in different model organisms?
The methodology of annotations • science base: trained experts curating peer-reviewed literature • create an evolving set of standardized descriptions used to annotate the entities represented in the major biochemical databases • and thereby to integrate these databases
this leads to improvements and extensions of the ontology • which in turn leads to better annotations • which leads to further improvement in the quality and reach of both future annotations and the ontology itself • RESULT: a slowly growing computer-interpretable map of biological reality within which major databases are automatically integrated in semantically searchable form
Five bangs for your GO buck • cross-species database integration • cross-granularity database integration • through links to the things which are of biomedical relevance • semantic searchability links people to software • human curated science base creates de facto gold standard (benchmark for comparison)
but now we need to create a de jure standard: • improve the quality of the GO • establish common rules governing best practices for creating ontologies and for using these in annotations • apply these rules to create a complete suite of orthogonal interoperable biomedical reference ontologies
First step (2003) • a shared portal for (so far) 58 ontologies • (low regimentation) • http://obo.sourceforge.net
Second step (2004) • reform efforts initiated, e.g. linking GO to other OBO ontologies to ensure orthogonality
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
GO + Cell type = New Definition • Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
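The tag-value layout of the osteoblast entry above lends itself to simple machine reading. The following is a minimal sketch, not the official OBO parser: the helper names `parse_stanza` and `relationships` are hypothetical, and the sketch assumes one `tag: value` pair per line with no escaping or trailing modifiers.

```python
from collections import defaultdict

# The osteoblast entry from the slide, one "tag: value" pair per line.
STANZA = '''\
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
'''


def parse_stanza(text):
    """Map each tag to the list of values it carries (hypothetical helper)."""
    tags = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ": " only, so identifiers such as CL:0000062 stay whole.
        tag, _, value = line.partition(": ")
        tags[tag].append(value)
    return dict(tags)


def relationships(term):
    """Return (relation, target_id) pairs taken from the 'relationship' tag."""
    pairs = []
    for value in term.get("relationship", []):
        relation, target = value.split(maxsplit=1)
        pairs.append((relation, target))
    return pairs


term = parse_stanza(STANZA)
for relation, target in relationships(term):
    print(term["id"][0], relation, target)
```

Keeping every tag as a list reflects that tags such as `relationship` may repeat within one entry.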
Third step (2006) The OBO Foundry http://obofoundry.org/
The OBO Foundry • a family of interoperable gold standard biomedical reference ontologies to serve the annotation of inter alia • scientific literature • model organism databases • clinical trial data
A prospective standard • designed to guarantee interoperability of ontologies from the very start (contrast to: post hoc mapping) • 12 initial candidate OBO ontologies, focused primarily on basic science domains • several being constructed ab initio by influential consortia who have the authority to impose their use on large parts of the relevant communities
candidate ontologies, marked on the slide as "undergoing rigorous reform" or "new" • GO Gene Ontology • ChEBI Chemical Ontology • CL Cell Ontology • FMA Foundational Model of Anatomy • PaTO Phenotype Quality Ontology • SO Sequence Ontology • CARO Common Anatomy Reference Ontology • CTO Clinical Trial Ontology • FuGO Functional Genomics Investigation Ontology • PrO Protein Ontology • RnaO RNA Ontology • RO Relation Ontology
candidate ontologies, marked on the slide as "new" or "to be absorbed in new Ontology of Biomedical Investigations (OBI)" • GO Gene Ontology • ChEBI Chemical Ontology • CL Cell Ontology • FMA Foundational Model of Anatomy • PaTO Phenotype Quality Ontology • SO Sequence Ontology • CARO Common Anatomy Reference Ontology • CTO Clinical Trial Ontology • FuGO Functional Genomics Investigation Ontology • PrO Protein Ontology • RnaO RNA Ontology • RO Relation Ontology
all OBO Foundry developers have agreed to a common set of evolving principles reflecting best practice in ontology development designed to ensure • tight connection to the biomedical basic sciences • compatibility • interoperability, common relations • formal robustness • support for logic-based reasoning
PRINCIPLES • The ontology is OPEN and available to be used by all. • The ontology is in, or can be instantiated in, a COMMON FORMAL LANGUAGE. • The developers of the ontology agree in advance to COLLABORATE with developers of other OBO Foundry ontologies where domains overlap.
PRINCIPLES • UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement. • ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
for science orthogonality of ontologies implies additivity of annotations • if we annotate a database or body of literature with one high-quality biomedical ontology, we should be able to add annotations from a second such ontology without conflicts • science aims for consistency • because science aims for correctness
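Additivity can be illustrated with a small sketch under strong simplifying assumptions. Here disjoint identifier prefixes (for example `GO` versus `CL`) stand in as a crude proxy for orthogonality, which in the Foundry sense is a property of domains rather than of prefixes. The function `add_annotations`, the entity names `gene_product_X` and `gene_product_Y`, and the term `GO:9999991` are all hypothetical; `CL:0000062` and `CL:0000055` are taken from the osteoblast entry.

```python
def id_space(term_id):
    """Return the identifier prefix, e.g. 'CL' for 'CL:0000062'."""
    return term_id.split(":", 1)[0]


def add_annotations(first, second):
    """Union two annotation maps (entity -> set of term ids).

    Raises ValueError if both maps draw on the same identifier space,
    which this sketch treats as a sign that the two vocabularies overlap.
    """
    spaces_first = {id_space(t) for terms in first.values() for t in terms}
    spaces_second = {id_space(t) for terms in second.values() for t in terms}
    overlap = spaces_first & spaces_second
    if overlap:
        raise ValueError(f"overlapping identifier spaces: {sorted(overlap)}")
    merged = {entity: set(terms) for entity, terms in first.items()}
    for entity, terms in second.items():
        merged.setdefault(entity, set()).update(terms)
    return merged


# Hypothetical annotation sets for illustration only.
go_annotations = {"gene_product_X": {"GO:9999991"}}
cl_annotations = {
    "gene_product_X": {"CL:0000062"},
    "gene_product_Y": {"CL:0000055"},
}
merged = add_annotations(go_annotations, cl_annotations)
```

Because the two sources never compete over the same identifier space, the merge is a plain union and neither input needs to be revised.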
PRINCIPLES • IDENTIFIERS: The ontology possesses a unique identifier space within OBO. • VERSIONING: The ontology provider has procedures for identifying distinct successive versions to ensure BACKWARDS COMPATIBILITY with annotation resources already in common use. • The ontology includes TEXTUAL DEFINITIONS and where possible equivalent formal definitions of its terms.
PRINCIPLES • CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content. • DOCUMENTATION: The ontology is well-documented. • USERS: The ontology has a plurality of independent users.
PRINCIPLES • COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.* • * Smith et al., Genome Biology 2005, 6:R46
OBO Relation Ontology
IT WILL GET HARDER • Further principles will be added over time in light of lessons learned • The Foundry is not seeking to serve as a check on flexibility or creativity • BUT NOT EVERYONE NEEDS TO JOIN
GOALS • CREDIT for high quality ontology development work • KUDOS for early adopters of high quality ontologies / terminologies e.g. in reporting clinical trial results
GOALS • to introduce some of the features of SCIENTIFIC PEER REVIEW into biomedical ontology development • to provide a FRAMEWORK OF RULES to counteract the current policy of ad hoc creation • if data-schemas are formulated using a single ontology system in widespread use this supports DATA REUSABILITY
A dichotomy universals (types, kinds, classes) vs. instances (particulars, individuals)
An ontology is a representation of universals • We learn about universals by looking at scientific texts – which describe what is general in reality
[Figure: universals (substance, organism, animal, mammal, cat, siamese, frog) arranged in a hierarchy above their instances; slide labels: "universals", "instances", "leaf class"]
rule of single inheritance • no diamonds [diagram: nodes C, B and A linked by is_a2 and is_a1]
problems with multiple inheritance • A is_a1 B, A is_a2 C • ‘is_a’ no longer univocal
‘is_a’ is pressed into service to mean a variety of different things • shortfalls from single inheritance are often clues to incorrect entry of terms and relations • the resulting ambiguities make the rules for correct entry difficult to communicate to human curators
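Since departures from single inheritance are described above as clues to incorrect entries, a curator-facing check can simply list every term with more than one is_a parent. The sketch below assumes the hierarchy is available as (child, parent) pairs; the function name `multiple_inheritance` is hypothetical, and the letters A, B, C follow the slide.

```python
def multiple_inheritance(is_a_edges):
    """Return {term: sorted parents} for every term with more than one is_a parent."""
    parents = {}
    for child, parent in is_a_edges:
        parents.setdefault(child, set()).add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


# The problem case from the slide: A has two parents, B and C.
two_parents = [("A", "B"), ("A", "C")]

# A single-inheritance chain: each term has at most one parent.
chain = [("A", "B"), ("B", "C")]

violations = multiple_inheritance(two_parents)
clean = multiple_inheritance(chain)
```

A term reported here is not necessarily wrong, but it marks a place where the intended sense of is_a deserves a second look.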
is_a overloading • serves as obstacle to integration with neighboring ontologies • The success of ontology alignment depends crucially on the degree to which basic ontological relations such as is_a and part_of can be relied on as having the same meanings in the different ontologies to be aligned.
What single inheritance costs • In some respects harder to build ontologies • harder to use ontologies to find terms • Solutions: normalization, GUIs • Recommendation: if building from scratch use single inheritance
What single inheritance brings • Coherent hierarchies • Modularity • Statistical representativeness • Jointly exhaustive pairwise disjoint classification • Coherent methodology for definitions
Aristotelian definitions • When A is_a B, the definition of ‘A’ has the form: • an A =def. a B which ... • a human being =def. an animal which is rational • Each definition reflects the position in the hierarchy to which a defined term belongs.
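The genus-plus-differentia pattern can be sketched as a small data structure in which a definition is assembled from a term's is_a parent and its distinguishing property. The `Term` class, its field names, and the `article` helper are hypothetical; the example values come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    name: str
    parent: Optional[str]       # the B in "A is_a B"; None for a root term
    differentia: Optional[str]  # what distinguishes A among the Bs


def article(noun):
    """Pick 'a' or 'an' by first letter (a rough rule, adequate for this sketch)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def definition(term):
    """Build 'an A =def. a B which ...' from the term's parent and differentia."""
    if term.parent is None or term.differentia is None:
        raise ValueError(f"{term.name!r} has no parent or differentia to define it by")
    return (
        f"{article(term.name)} {term.name} =def. "
        f"{article(term.parent)} {term.parent} which {term.differentia}"
    )


human = Term(name="human being", parent="animal", differentia="is rational")
print(definition(human))
```

Deriving the genus from the is_a parent, rather than storing it separately, keeps each definition tied to the term's position in the hierarchy.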
FMA Examples • Cell =def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane with or without a cell nucleus • Plasma membrane =def. a cell part that surrounds the cytoplasm
The FMA is a canonical representation • of types and relations between types deduced from the qualitative observations of the normal human body, which have been refined and sanctioned by successive generations of anatomists and presented in textbooks and atlases of structural anatomy.