Ontologies in Biomedicine: The Good, The Bad and The Ugly

Barry Smith

http://ontology.buffalo.edu/smith

http://ncor.us

The Good
  • Foundational Model of Anatomy (FMA)
  • Pro
  • Very clear statement of scope: structural human anatomy, at all levels of granularity, from the whole organism to the biological macromolecule
  • Powerful treatment of definitions, from which the entire FMA hierarchy is generated – can serve as basis for formal reasoning
  • Con
  • Some unfortunate artifacts in the ontology deriving from its specific computer representation (Protégé)

Intermediate
  • GALEN
  • Pro
  • Allows formal representation of clinical information
  • Allows multiple views of relevant detail as needed
  • Uses powerful Description Logic (DL)-based formal structure
  • Con
  • Remains only partially developed
  • Contains errors: Vomitus contains carrot
  • – which DLs did not prevent

Intermediate
  • The Gene Ontology
  • Con
  • Poor formal architecture
  • Full of errors
  • menopause part_of death
  • Poor support for automatic reasoning and error-checking
  • Poor treatment of definitions
  • Not trans-granular
  • No relation to time or instances

The Gene Ontology
  • Pro
  • Open Source
  • Cross-Species
  • ... has recognized the need for reform, including explicit representation of granular levels

Problem of Circularity
  • GO:0042270:
  • Protection from natural killer cell mediated cytolysis
  • Definition: The process of protecting a cell from cytolysis by natural killer cells.

GO:0019836 hemolysis
  • Definition: The processes that cause hemolysis
  • X =def. the Y of X
  • this is worse than circular
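A crude way to flag definitions like the two above is to measure how much of a term reappears in its own definition. The following is a minimal sketch, not a method used by GO: the function name `circularity_score` and the six-letter prefix comparison (a rough stand-in for stemming) are assumptions made for illustration.

```python
import re


def circularity_score(term: str, definition: str, prefix_len: int = 6) -> float:
    """Fraction of the term's words that reappear in its own definition.

    Words are compared by their first `prefix_len` letters, a crude
    stand-in for stemming (so 'protection' matches 'protecting').
    """
    def stems(text: str) -> set:
        return {w[:prefix_len] for w in re.findall(r"[a-z]+", text.lower())}

    term_stems = stems(term)
    if not term_stems:
        return 0.0
    return len(term_stems & stems(definition)) / len(term_stems)


# The two GO definitions quoted on the slides.
score_protection = circularity_score(
    "Protection from natural killer cell mediated cytolysis",
    "The process of protecting a cell from cytolysis by natural killer cells.",
)
score_hemolysis = circularity_score(
    "hemolysis",
    "The processes that cause hemolysis",
)

# A hypothetical non-circular wording, for contrast only.
score_contrast = circularity_score("hemolysis", "Rupture of red blood cells")
```

A score near 1 means the definition mostly restates the term. A check like this cannot tell "circular" from "worse than circular"; it only points a curator at suspicious entries.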

The Bad
  • Reactome
  • Pro
  • Rich catalogue of biological processes
  • Con
  • Incoherent treatment of categories:
  • ReferentEntity (embracing e.g. small molecules) is a sibling of PhysicalEntity (embracing complexes, molecules, ions and particles).
  • Similarly CatalystActivity is a sibling of Event.

The Bad
  • National Cancer Institute Thesaurus
  • Pro
  • Open source; ambitiously broad coverage; DL-based
  • Con
  • Poor realization of DL formalism
  • Full of mistakes (many inherited from its UMLS sources):
    • three disjoint classes of plants: Vascular Plant, Non-vascular Plant, Other Plant
    • three disjoint kinds of cells: Cell, Normal Cell, Abnormal Cell
    • Normal Cell is_a Microanatomy

See http://ontology.buffalo.edu/medo/NCIT_Smith.html

National Cancer Institute Thesaurus
  • Duratec, Lactobutyrin and Stilbene Aldehyde classified as: Unclassified Drugs and Chemicals
  • Pro
  • NCIT, too, has recognized the need for reform
  • (NCIT is part of the OBO library)

The Ugly: UMLS Semantic Network
  • Pros
  • Broad coverage; no multiple inheritance
  • Cons
  • Incoherent use of ‘conceptual entities’
  • (e.g. the digestive system as a conceptual part of the organism)
  • Full of errors

UMLS Semantic Network
  • Edges in the graph represent merely “possible significant relations”:
    • Bacterium causes Experimental Model of Disease
    • Experimental Model of Disease affects Fungus
    • Experimental Model of Disease is_a Pathologic Function

UMLS Semantic Network
  • Unclear what the nodes of the graph are:
  • Drug Delivery Device contains Clinical Drug
  • Drug Delivery Device narrower_in_meaning_than Manufactured Object
  • The use-mention confusion:
  • “Swimming is healthy and has 8 letters”

The Ugly: Clinical Terms Version 2 (The Read Codes)
  • Classifies chemicals into:
  • chemicals whose name begins with ‘A’,
  • chemicals whose name begins with ‘B’,
  • chemicals whose name begins with ‘C’, ...

The Astonishingly (Criminally?) Ugly
  • Health Level 7
  • HL7 is a UML-based standard for exchange of information between clinical information systems
  • has proved very crumbly as a standard
  • The HL7 Reference Information Model (RIM) is supposed to overcome this problem by defining the universe of healthcare data in a rigorous way

HL7-RIM
  • Animal
  • Definition: A subtype of Living Subject representing any animal-of-interest to the Personnel Management domain.
  • Person
  • Definition: A subtype of Living Subject representing single human being [sic] who, in the context of the Personnel Management domain, must also be uniquely identifiable through one or more legal documents.
  • LivingSubject
  • Definition: A subtype of Entity representing an organism or complex animal, alive or not.

HL7 RIM: The Problem of Circularity
  • Person = Person with documents
  • has the form: ‘An A is an A which is B’
  • – useless in practical terms since neither we nor the machine can use them to find out what ‘A’ means
  • – incorporate a vicious infinite regress
  • – have the effect of making it impossible to refer to A’s which are not Bs, for example to an undocumented person

HL7 Logically Incoherent
  • act = the record of an act
  • This has the form: An X is the Y of an X
  • again worse than circular

HL7-RIM: Logically Contradictory Definitions
  • Definition of Act: An Act is an action of interest that has happened, can happen, is happening, is intended to happen, or is requested/demanded to happen.
  • Definition of Act: An Act is the record of something that is being done, has been done, can be done, or is intended or requested to be done.

HL7 RIM Ontologically Incoherent
  • The truth about the real world is constructed through a combination and arbitration of attributed statements ...
  • As such, there is no distinction between an activity and its documentation.

HL7 Incredibly Successful
  • embraced as US federal standard;
  • central part of $15 billion program to integrate all UK hospital information systems
  • made mandatory by Canada Health Infoway
  • adopted by Oracle as basis for its EHR support programs

HL7 Merchandizing

From molecules to diseases
  • A good ontology should enable us to organize our information resources in such a way that we can bridge the granularity gap between genomics and proteomics data and phenotype (clinical, pharmacological, patient-centered) data

Good ontologies require:

Coherent upper level taxonomy distinguishing

  • continuants (cells, molecules, organisms ...)
  • occurrents (events, processes)
  • dependent entities (qualities, functions ...)
  • independent entities (their bearers)
  • universals (types, kinds)
  • instances (tokens, individuals)

Coherent relation ontology supporting inference both within and between ontologies.

Good ontologies require:

Consistent use of terms, supported by logically coherent (non-circular) definitions, in both human-readable and computable formats

Open Biomedical Ontologies (OBO) Upper Biomedical Ontology (UBO)
  • root UBO:0000001:top
  • subclass BFO:continuant:continuant
    • subclass BFO:dependent_entity:dependent_entity
      • subclass UBO:0000023:quality
        • subclass UBO:0000026:phenotype
          • subclass UBO:0000025:state
        • subclass UBO:0000027:disease
          • subclass UBO:0000005:function
        • subclass GO:0003674:molecular_function
      • subclass BFO:disposition:disposition
    • subclass BFO:independent_entity:independent_entity
      • subclass UBO:0000002:substance
        • subclass UBO:0000019:protein
        • subclass GO:0005575:cellular_component
        • subclass UBO:0000006:anatomical_entity
          • subclass UBO:0000008:gross_anatomical_entity
        • subclass UBO:0000007:organism
          • subclass UBO:0000015:microbe
          • subclass UBO:0000014:plant
          • subclass UBO:0000017:animal
      • subclass BFO:fiat_part_of_substance:fiat_part_of_substance
      • subclass BFO:boundary_of_substance:boundary_of_substance
      • subclass BFO:aggregate_of_substances:aggregate_of_substances
  • subclass BFO:occurrent:occurrent
    • subclass BFO:dependent_occurrent:dependent_occurrent
      • subclass UBO:0000004:process
        • subclass GO:0008150:biological_process
      • subclass BFO:fiat_part_of_process:fiat_part_of_process
        • subclass UBO:0000029:life_cycle_stage
      • subclass BFO:aggregate_of_processes:aggregate_of_processes
        • subclass EO:0007359:environment ontology
      • subclass BFO:temporal_boundary_of_process:temporal_boundary_of_process
    • subclass BFO:independent_occurrent:independent_occurrent

OBO Relation Ontology (RO)
  • Clear distinction between universals (classes, kinds, types) and instances (individuals, tokens)
  • Precise formal definitions of relations
  • Automatic applicability to time-indexed instance-data e.g. in Electronic Health Record
  • Consistency with the Relation Ontology now a criterion for admission to the OBO ontology library
  • see Genome Biology Apr. 2006

Three types of relations
  • between instances:
  • Mary’s heart part_of Mary
  • between an instance and a universal:
  • Mary instance_of homo sapiens
  • between universals:
  • gastrulation part_of embryonic development

A suite of primitive instance-level relations
  • identical_to
  • part_of
  • located_in
  • adjacent_to
  • earlier
  • derives_from
  • ...

GALEN: Vomitus contains carrot
  • All portions of vomit contain all portions of carrot
  • All portions of vomit contain some portion of carrot
  • Some portions of vomit contain some portion of carrot
  • Some portions of vomit contain all portions of carrot
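The four readings of "Vomitus contains carrot" differ only in how the two classes are quantified. A minimal sketch, assuming the instance-level `contains` relation is given as a set of pairs; the function name `readings` and the toy data are hypothetical:

```python
from itertools import product


def readings(a_instances, b_instances, related):
    """Evaluate four quantifier readings of 'A R B' over instance data.

    `related` is the set of (a, b) pairs for which the instance-level
    relation holds.
    """
    def holds(a, b):
        return (a, b) in related

    pairs = list(product(a_instances, b_instances))
    return {
        "all-all": all(holds(a, b) for a, b in pairs),
        "all-some": all(any(holds(a, b) for b in b_instances) for a in a_instances),
        "some-some": any(holds(a, b) for a, b in pairs),
        "some-all": any(all(holds(a, b) for b in b_instances) for a in a_instances),
    }


# Hypothetical toy data: two portions of vomit, two portions of carrot.
vomit = ["v1", "v2"]
carrot = ["c1", "c2"]
contains = {("v1", "c1")}  # only v1 contains a portion of carrot

result = readings(vomit, carrot, contains)
```

On this toy data only the some-some reading comes out true, so a bare "Vomitus contains carrot" misleads if it is read in any of the other three ways.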

all-some structure
  • A part_of B =def. given any instance a of A there is some instance b of B such that a part_of b on the instance level
  • Allows automatic ontology integration via cascading reasoning:
  • A R1 B
  • B R2 C
  • ⇒ A R3 C
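The all-some definition can be checked mechanically once instance data is available. The sketch below is illustrative only: the helper `all_some`, the class names and the instance identifiers are assumptions, and the cascade is shown for the simplest case where R1, R2 and R3 are all part_of.

```python
def all_some(instances_of, relation, A, B):
    """Class-level 'A R B': every instance of A bears R to some instance of B."""
    return all(
        any((a, b) in relation for b in instances_of[B])
        for a in instances_of[A]
    )


# Hypothetical instance data (identifiers are illustrative only).
instances_of = {
    "nucleus": {"n1", "n2"},
    "cell": {"k1", "k2"},
    "organism": {"o1"},
}

# Instance-level part_of, written out already closed under transitivity.
part_of = {
    ("n1", "k1"), ("n2", "k2"),
    ("k1", "o1"), ("k2", "o1"),
    ("n1", "o1"), ("n2", "o1"),
}

nucleus_part_of_cell = all_some(instances_of, part_of, "nucleus", "cell")
cell_part_of_organism = all_some(instances_of, part_of, "cell", "organism")
# The cascaded conclusion: A part_of B, B part_of C, so A part_of C.
nucleus_part_of_organism = all_some(instances_of, part_of, "nucleus", "organism")
# The converse direction fails: no cell is part of a nucleus.
cell_part_of_nucleus = all_some(instances_of, part_of, "cell", "nucleus")
```

Because the class-level statement is defined through instances, the same check applies across ontologies that share the instance-level relation.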

adjacent_to
  • cell wall adjacent_to cytoplasm
  • intron adjacent_to exon
  • Golgi apparatus adjacent_to endoplasmic reticulum
  • periplasm adjacent_to plasma membrane
  • presynaptic membrane adjacent_to synaptic cleft

A adjacent_to B
  • every instance of A stands in the instance-level adjacent_to relation to some instance of B

adjacent_to as a relation between universals is not symmetric
  • nucleus adjacent_to cytoplasm
  • Not: cytoplasm adjacent_to nucleus
  • seminal vesicle adjacent_to urinary bladder
  • Not: urinary bladder adjacent_to seminal vesicle
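The asymmetry follows from the all-some reading even when adjacency between instances is itself symmetric. A minimal sketch under that assumption; the helper `all_some` and the toy data (including a portion of cytoplasm from a cell with no nucleus) are hypothetical:

```python
def all_some(instances_of, relation, A, B):
    """Class-level 'A R B': every instance of A bears R to some instance of B."""
    return all(
        any((a, b) in relation for b in instances_of[B])
        for a in instances_of[A]
    )


# Hypothetical instances: cy2 is cytoplasm in a cell that has no nucleus.
instances_of = {
    "nucleus": {"n1"},
    "cytoplasm": {"cy1", "cy2"},
}

# Instance-level adjacency is assumed symmetric, so both directions are listed.
adjacent_to = {("n1", "cy1"), ("cy1", "n1")}

nucleus_adj_cytoplasm = all_some(instances_of, adjacent_to, "nucleus", "cytoplasm")
cytoplasm_adj_nucleus = all_some(instances_of, adjacent_to, "cytoplasm", "nucleus")
```

Every nucleus here has adjacent cytoplasm, but not every portion of cytoplasm has an adjacent nucleus, so only the first class-level statement holds.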

The Granularity Gulf
  • most existing data-sources are of fixed, single granularity
  • many (all?) clinical phenomena cross granularities

Main obstacle to integrating genetic and EHR data

No facility for dealing with time and instances (particulars, individuals) in current ontologies

Key idea
  • To define ontological relations like
  • part_of, develops_from
  • it is not enough to look just at universals / classes / types / ‘concepts’ :
  • we need also to take account of instances and time

transformation_of
  • A transformation_of B
  • =def. any instance of A was at some earlier time an instance of B
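The definition needs time-indexed instance data, which is exactly what universals-only ontologies lack. A minimal sketch, assuming instantiation facts are stored as (instance, universal, time) triples with integer times; the function body, the names and the data are illustrative assumptions:

```python
def transformation_of(instantiation, A, B):
    """Class-level 'A transformation_of B' over time-indexed instance data.

    `instantiation` is a set of (instance, universal, time) triples.
    The relation holds when every instance of A, at each time it
    instantiates A, instantiated B at some strictly earlier time.
    """
    a_facts = [(x, t) for (x, u, t) in instantiation if u == A]
    return all(
        any(y == x and u == B and t_b < t_a for (y, u, t_b) in instantiation)
        for (x, t_a) in a_facts
    )


# Hypothetical data: times are plain integers (for example, age in years).
facts = {
    ("mary", "child", 5),
    ("mary", "adult", 30),
    ("john", "child", 8),
    ("john", "adult", 40),
}

adult_from_child = transformation_of(facts, "adult", "child")
child_from_adult = transformation_of(facts, "child", "adult")
```

The check tracks the same instance across times, which is what distinguishes transformation_of from relations where one entity gives rise to a different one.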


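transformation_of
  • [Figure: the same instance c, shown at times t and t1 along a time axis, instantiates universal C at one time and C1 at the other]
  • mature RNA transformation_of pre-RNA
  • adult transformation_of child
  • carcinomatous colon transformation_of colon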

Advantages of the methodology of enforcing commonly accepted coherent definitions
  • promote quality assurance (better coding)
  • guarantee automatic reasoning across ontologies and across data at different granularities
  • yield direct connection to times and instances in the EHR
