Biomedical Ontologies: The State of the Art Barry Smith and Werner Ceusters MIE, Sarajevo, August 30
Part 1: Barry Smith Ontologies are Representations of What is General in Reality Part 2: Werner Ceusters Referent Tracking: Pinning Ontologies to Instances in Reality
You’re interested in which genes control heart muscle development 17,536 results
Microarray data shows changed expression of thousands of genes. How will you spot the patterns? [Figure: microarray expression over time, control vs. attacked, with gene clusters labeled Defense response, Immune response, Response to stimulus, Toll regulated genes, JAK-STAT regulated genes, Puparial adhesion, Molting cycle, Hemocyanin, Amino acid catabolism, Lipid metabolism, Peptidase activity, Protein catabolism]
You’re interested in which of your hospital’s patient data is relevant to understanding how genes control heart muscle development
Lab / pathology data EHR data Clinical trial data Family history data Medical imaging Microarray data Model organism data Flow cytometry Mass spec Genotype / SNP data How will you spot the patterns? How will you find the data you need?
One strategy for bringing order into this huge conglomeration of data is through the use of Common Data Elements • Discipline-specific (cancer, NIAID, …) • Do not solve the problems of balkanization (data siloes) • Do not evolve gracefully as knowledge advances • Support data cumulation, but do not readily support data integration and computation
An ontology is not a terminology Existing term lists and CDEs • built to serve specific data-processing purposes • in ad hoc ways Ontologies • designed from the start to ensure integratability and reusability of data • by incorporating a common logical structure
How does the Gene Ontology work? with thanks to Jane Lomax, Gene Ontology Consortium
GO provides a controlled system of representations for use in annotating data • multi-species, multi-disciplinary, open source • contributing to the cumulativity of scientific results obtained by distinct research communities • compare use of kilograms, meters, seconds … in formulating experimental results
GO provides answers to three types of questions for each gene product: • in what parts of the cell has it been identified? • exercising what types of molecular functions? • with what types of biological processes? When is a particular gene product involved: • in the course of normal development? • in the process leading to abnormality? With what functions is the gene product associated in other biological processes?
Some pain-related terms in GO • GO:0048265 response to pain • GO:0019233 sensory perception of pain • GO:0048266 behavioral response to pain • GO:0019234 sensory perception of fast pain • GO:0019235 sensory perception of slow pain • GO:0051930 regulation of sensory perception of pain • GO:0050967 detection of electrical stimulus during sensory perception of pain • GO:0050968 detection of chemical stimulus involved in sensory perception of pain • GO:0050966 detection of mechanical stimulus involved in sensory perception of pain
GO:0050968 detection of chemical stimulus involved in sensory perception of pain
GO allows a new kind of biological research, based on analysis and comparison of the massive quantities of annotations linking GO terms to gene products
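A minimal sketch of what "analysis and comparison of annotations" can look like in code. The GO IDs are taken from the pain-related terms listed earlier; the gene product names and the annotation pairs themselves are invented for illustration and are not real curated annotations.

```python
from collections import defaultdict

# Hypothetical annotations as (gene product, GO term ID) pairs.
# The gene product names are made up; only the GO IDs come from the slides.
ANNOTATIONS = [
    ("gene_product_A", "GO:0019233"),  # sensory perception of pain
    ("gene_product_B", "GO:0019233"),
    ("gene_product_B", "GO:0048265"),  # response to pain
    ("gene_product_C", "GO:0050968"),  # detection of chemical stimulus ...
]

def index_by_term(annotations):
    """Group gene products under each GO term they are annotated with."""
    index = defaultdict(set)
    for product, term in annotations:
        index[term].add(product)
    return index

def shared_terms(annotations, product_x, product_y):
    """Return the GO terms that two gene products have in common."""
    terms_x = {term for product, term in annotations if product == product_x}
    terms_y = {term for product, term in annotations if product == product_y}
    return terms_x & terms_y

by_term = index_by_term(ANNOTATIONS)
common = shared_terms(ANNOTATIONS, "gene_product_A", "gene_product_B")
```

Because every annotation uses the same controlled term IDs, results from different sources can be pooled into one index and compared directly.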
One standard method: Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers; using functional information captured by GO for given gene product types, they identified 189 as being mutated at significant frequency and thus as providing targets for diagnostic and therapeutic intervention. Science. 2006 Oct 13;314(5797):268-74.
Uses of GO in studies of: • Biomedical discovery acceleration, with applications to craniofacial development. PMID: 19325874 • Persistent changes in spinal cord gene expression after recovery from inflammatory hyperalgesia: a preliminary study on pain memory. PMID: 18366630 • Spinal cord transcriptional profile analysis reveals protein trafficking and RNA processing as prominent processes regulated by tactile allodynia. PMID: 17069981 • Immune system involvement in abdominal aortic aneurysms. PMID: 17634102
$100 million invested in literature curation using GO • over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO • experimental results reported in 52,000 scientific journal articles manually annotated by expert biologists using GO • ontologies provide the basis for capturing biological theories in computable form
GO is amazingly successful in overcoming problems of balkanization but it covers only generic biological entities of three sorts: • cellular components • molecular functions • biological processes and it does not provide representations of diseases, symptoms, …
OBO Foundry recognized by NIH as framework to address mandates for re-usability of data collected through Federally funded research see NIH PAR-07-425: Data Ontologies for Biomedical Research (R01)
The OBO Foundry Initial Candidate Members • GO Gene Ontology • CL Cell Ontology • SO Sequence Ontology • PATO Phenotype (Quality) Ontology • FMA Foundational Model of Anatomy • ChEBI Chemical Entities of Biological Interest • CARO Common Anatomy Reference Ontology • PRO Protein Ontology
The OBO Foundry Under development • Disease Ontology • Infectious Disease Ontology • Mammalian Phenotype Ontology • Plant Trait Ontology • Environment Ontology • Ontology for Biomedical Investigations • Behavior Ontology • RNA Ontology • RO Relation Ontology
OBO Foundry is organized in terms of Basic Formal Ontology Each Foundry ontology can be seen as an extension of a single upper level ontology (BFO)
Basic Formal Ontology (BFO) • Continuant: Independent Continuant, Dependent Continuant • Occurrent (Process, Event) http://ifomis.uni-saarland.de/bfo/
Fundamental Dichotomy • Continuants preserve their identity through change vs. • Occurrents (aka processes) • have temporal parts • unfold themselves in successive phases • exist only in their phases • have all their parts of necessity
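The dichotomy can be illustrated with a small sketch. This is not BFO's formal definition, only a toy model under stated assumptions: the class names follow the slides, while the `qualities` and `phases` fields and the patient and fever examples are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Continuant:
    """Persists through time: its qualities can change while it stays the same entity."""
    name: str
    qualities: dict = field(default_factory=dict)

@dataclass(frozen=True)
class Occurrent:
    """Unfolds in time: it exists only in its successive phases (temporal parts)."""
    name: str
    phases: tuple  # immutable, mirroring "have all their parts of necessity"

# A continuant changes yet remains identical to itself (hypothetical example).
patient = Continuant("this patient", {"temperature_c": 37.0})
same_patient = patient
patient.qualities["temperature_c"] = 39.5

# An occurrent is given by its phases; a different set of phases would be a different occurrent.
fever_course = Occurrent("course of fever", ("onset", "peak", "resolution"))
```

The mutable dictionary versus the frozen tuple is the design point: change happens to continuants, whereas occurrents are what they are in virtue of their parts.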
Ontology and Referent Tracking [Diagram: types above, instances below. Types: Continuant (Independent Continuant: thing; Dependent Continuant: quality) and Occurrent (process, event)]
Rationale of OBO Foundry coverage (homesteading principle) [Table: ontologies arranged by relation to time and by granularity]
The Gene Ontology (GO) [Diagram: GO within BFO. Occurrent: biological process; Independent Continuant: cell component; Dependent Continuant: molecular function] Kumar A, Smith B, Borgelt C. Dependence relationships between Gene Ontology terms based on TIGR gene product annotations. CompuTerm 2004, 31-38. Bada M, Hunter L. Enrichment of OBO Ontologies. J Biomed Inform. 2006 Jul 26
Users of BFO GO / OBO Foundry NCI BiomedGT SNOMED CT ACGT Clinical Genomics Trials on Cancer – Master Ontology / Formbuilder (Case Report Forms for Cancer Clinical Trials) Ontology for Risks Against Patient Safety (RAPS) (EU)
Users of BFO MediCognos / Microsoft Healthvault Cleveland Clinic Semantic Database in Cardiothoracic Surgery Major Histocompatibility Complex (MHC) Ontology (NIAID) Neuroscience Information Framework Standard (NIFSTD)
IDO Infectious Disease Ontology • MITRE, Mount Sinai, UT Southwestern – Influenza • IMBB/VectorBase – Vector borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus) • Colorado State University – Dengue Fever • Duke University – Tuberculosis, Staph. aureus • Case Western Reserve – Infective Endocarditis • University of Michigan – Brucellosis
Users of BFO Interdisciplinary Prostate Ontology (IPO) Nanoparticle Ontology (NPO): Ontology for Cancer Nanotechnology Research Neural Electromagnetic Ontologies (NEMO): Ontology-based Tools for Representation and Integration of Event-related Brain Potentials Ontology for General Medical Science
depends_on [Diagram: Continuant (Independent Continuant: thing; Dependent Continuant: quality) and Occurrent (process, event); a quality depends_on its bearer]
Specifically dependent continuants • the quality of whiteness of this cheese • your role as lecturer • the disposition of this patient to experience diarrhea
depends_on [Diagram: Continuant (Independent Continuant: thing; Dependent Continuant: quality) and Occurrent (process); a temperature depends_on its bearer]
Realizable dependent continuants (continuants): plan, function, role, disposition, capability, tendency
Their realizations (occurrents): execution, expression, exercise, realization, application, course
Continuant • Independent Continuant • Dependent Continuant: Non-realizable Dependent Continuant (quality); Realizable Dependent Continuant (function, role, disposition)
realization depends_on disposition [Diagram: the disposition (Dependent Continuant) depends_on its bearer (Independent Continuant); the process of realization (Occurrent) depends_on the disposition]
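The two dependence links on this slide can be sketched as object references. This is an illustrative model only: the class and field names (`bearer`, `realizes`, `participant`) are assumptions chosen for the sketch, and the example reuses the patient disposition mentioned earlier in the deck.

```python
from dataclasses import dataclass

@dataclass
class IndependentContinuant:
    name: str

@dataclass
class Disposition:
    """A realizable dependent continuant: it cannot exist without a bearer."""
    name: str
    bearer: IndependentContinuant  # disposition depends_on bearer

@dataclass
class Realization:
    """An occurrent in which a disposition is realized."""
    name: str
    realizes: Disposition  # realization depends_on disposition

    @property
    def participant(self) -> IndependentContinuant:
        # Following both dependence links leads from the process to the bearer.
        return self.realizes.bearer

patient = IndependentContinuant("this patient")
disposition = Disposition("disposition to experience diarrhea", bearer=patient)
episode = Realization("episode of diarrhea", realizes=disposition)
```

Making `bearer` and `realizes` required constructor arguments encodes the dependence: a disposition without a bearer, or a realization without a disposition, cannot be created in this model.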