
Ontologies in Biomedicine: The Good, The Bad and The Ugly


Presentation Transcript


  1. Ontologies in Biomedicine: The Good, The Bad and The Ugly Barry Smith http://ontology.buffalo.edu/smith http://ncor.us

  2. The Good • Foundational Model of Anatomy (FMA) • Pro • Very clear statement of scope: structural human anatomy, at all levels of granularity, from the whole organism to the biological macromolecule • Powerful treatment of definitions, from which the entire FMA hierarchy is generated – can serve as basis for formal reasoning • Con • Some unfortunate artifacts in the ontology deriving from its specific computer representation (Protégé) http://ncor.us

  3. Intermediate • GALEN • Pro • Allows formal representation of clinical information • Allows multiple views of relevant detail as needed • Uses powerful Description Logic (DL)-based formal structure • Con • Remains only partially developed • Contains errors: Vomitus contains carrot • – which DLs did not prevent http://ncor.us

  4. Intermediate • The Gene Ontology • Con • Poor formal architecture • Full of errors • menopause part_of death • Poor support for automatic reasoning and error-checking • Poor treatment of definitions • Not trans-granular • No relation to time or instances http://ncor.us

  5. The Gene Ontology • Pro • Open Source • Cross-Species • ... has recognized the need for reform, including explicit representation of granular levels http://ncor.us

  6. Problem of Circularity • GO:0042270: • Protection from natural killer cell mediated cytolysis • Definition: The process of protecting a cell from cytolysis by natural killer cells. http://ncor.us

  7. GO:0019836 hemolysis • Definition: The processes that cause hemolysis • X =def. the Y of X • this is worse than circular http://ncor.us
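The circularity problem on the two slides above can be partly checked mechanically. The following is a minimal Python sketch and not part of the original presentation; the function name is_circular is hypothetical. It only catches the crudest case, where the defined term reappears verbatim in its own definition (as with GO:0019836). It would not flag GO:0042270, whose definition rephrases the term rather than repeating it.

```python
import re


def is_circular(term: str, definition: str) -> bool:
    """Return True if the defined term reappears verbatim in its definition.

    Hypothetical helper for illustration: a whole-word, case-insensitive match.
    """
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


# Term and definition as quoted on the slide for GO:0019836.
print(is_circular("hemolysis", "The processes that cause hemolysis"))
```

A check like this is cheap enough to run over every term in an ontology, but a clean result says nothing about definitions that are circular only in meaning.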

  8. The Bad • Reactome • Pro • Rich catalogue of biological processes • Con • Incoherent treatment of categories: • ReferentEntity (embracing e.g. small molecules) is a sibling of PhysicalEntity (embracing complexes, molecules, ions and particles). • Similarly CatalystActivity is a sibling of Event. http://ncor.us

  9. The Bad • National Cancer Institute Thesaurus • Pro • Open source; ambitiously broad coverage; DL-based • Con • Poor realization of DL formalism • Full of mistakes (many inherited from its UMLS sources): • three disjoint classes of plants: Vascular Plant, Non-vascular Plant, Other Plant • three disjoint kinds of cells: Cell, Normal Cell, Abnormal Cell • Normal Cell is_a Microanatomy • See http://ontology.buffalo.edu/medo/NCIT_Smith.html http://ncor.us

  10. National Cancer Institute Thesaurus • Duratec, Lactobutyrin and Stilbene Aldehyde classified as: Unclassified Drugs and Chemicals • Pro • NCIT, too, has recognized the need for reform • (NCIT is part of the OBO library) http://ncor.us

  11. The Ugly • UMLS Semantic Network • Pros • Broad coverage; no multiple inheritance • Cons • Incoherent use of ‘conceptual entities’ • (e.g. the digestive system as a conceptual part of the organism) • Full of errors http://ncor.us

  12. UMLS Semantic Network • Edges in the graph represent merely “possible significant relations”: • Bacterium causes Experimental Model of Disease • Experimental Model of Disease affects Fungus • Experimental Model of Disease is_a Pathologic Function http://ncor.us

  13. UMLS Semantic Network • Unclear what the nodes of the graph are: • Drug Delivery Device contains Clinical Drug • Drug Delivery Device narrower_in_meaning_than Manufactured Object • The use-mention confusion: • “Swimming is healthy and has 8 letters” http://ncor.us

  14. The Ugly • Clinical Terms Version 2 (The Read Codes) • Classifies chemicals into: • chemicals whose name begins with ‘A’, • chemicals whose name begins with ‘B’, • chemicals whose name begins with ‘C’, ... http://ncor.us

  15. The Astonishingly (Criminally?) Ugly • Health Level 7 • HL7 is a UML-based standard for exchange of information between clinical information systems • has proved very crumbly as a standard • The HL7 Reference Information Model (RIM) is supposed to overcome this problem by defining the universe of healthcare data in a rigorous way http://ncor.us

  16. HL7-RIM • Animal • Definition: A subtype of Living Subject representing any animal-of-interest to the Personnel Management domain. • Person • A subtype of Living Subject representing single human being [sic] who, in the context of the Personnel Management domain, must also be uniquely identifiable through one or more legal documents. • LivingSubject • Definition: A subtype of Entity representing an organism or complex animal, alive or not. http://ncor.us

  17. HL7 RIM: The Problem of Circularity • Person = Person with documents • has the form: ‘An A is an A which is B’ • – useless in practical terms since neither we nor the machine can use them to find out what ‘A’ means • – incorporate a vicious infinite regress • – have the effect of making it impossible to refer to A’s which are not Bs, for example to an undocumented person http://ncor.us

  18. HL7 Logically Incoherent • act = the record of an act • This has the form: An X is the Y of an X • again worse than circular http://ncor.us

  19. HL7-RIM: Logically Contradictory Definitions • Definition of Act: An Act is an action of interest that has happened, can happen, is happening, is intended to happen, or is requested/demanded to happen. • Definition of Act: An Act is the record of something that is being done, has been done, can be done, or is intended or requested to be done. http://ncor.us

  20. HL7 RIM Ontologically Incoherent • The truth about the real world is constructed through a combination and arbitration of attributed statements ... • As such, there is no distinction between an activity and its documentation. http://ncor.us

  21. HL7 Incredibly Successful • embraced as US federal standard; • central part of $15 billion program to integrate all UK hospital information systems • made mandatory by Canada Health Infoway • adopted by Oracle as basis for its EHR support programs http://ncor.us

  22. HL7 Merchandizing http://ncor.us

  23. From molecules to diseases • A good ontology should enable us to organize our information resources in such a way that we can bridge the granularity gap between genomics and proteomics data and phenotype (clinical, pharmacological, patient-centered) data http://ncor.us

  24. good ontologies require: Coherent upper level taxonomy distinguishing • continuants (cells, molecules, organisms ...) • occurrents (events, processes) • dependent entities (qualities, functions ...) • independent entities (their bearers) • universals (types, kinds) • instances (tokens, instances) Coherent relation ontology supporting inference both within and between ontologies. http://ncor.us

  25. good ontologies require: Consistent use of terms, supported by logically coherent (non-circular) definitions, in both human-readable and computable formats http://ncor.us

  26. Open Biomedical Ontologies (OBO) Upper Biomedical Ontology (UBO) • root UBO:0000001:top • subclass BFO:continuant:continuant • subclass BFO:dependent_entity:dependent_entity • subclass UBO:0000023:quality • subclass UBO:0000026:phenotype • subclass UBO:0000025:state • subclass UBO:0000027:disease • subclass UBO:0000005:function • subclass GO:0003674:molecular_function • subclass BFO:disposition:disposition • subclass BFO:independent_entity:independent_entity • subclass UBO:0000002:substance • subclass UBO:0000019:protein • subclass GO:0005575:cellular_component • subclass UBO:0000006:anatomical_entity • subclass UBO:0000008:gross_anatomical_entity • subclass UBO:0000007:organism • subclass UBO:0000015:microbe • subclass UBO:0000014:plant • subclass UBO:0000017:animal • subclass BFO:fiat_part_of_substance:fiat_part_of_substance • subclass BFO:boundary_of_substance:boundary_of_substance • subclass BFO:aggregate_of_substances:aggregate_of_substances • subclass BFO:occurrent:occurrent • subclass BFO:dependent_occurrent:dependent_occurrent • subclass UBO:0000004:process • subclass GO:0008150:biological_process • subclass BFO:fiat_part_of_process:fiat_part_of_process • subclass UBO:0000029:life_cycle_stage • subclass BFO:aggregate_of_processes:aggregate_of_processes • subclass EO:0007359:environment ontology • subclass BFO:temporal_boundary_of_process:temporal_boundary_of_process • subclass BFO:independent_occurrent:independent_occurrent http://ncor.us

  27. OBO Relation Ontology (RO) • Clear distinction between universals (classes, kinds, types) and instances (individuals, tokens) • Precise formal definitions of relations • Automatic applicability to time-indexed instance-data e.g. in Electronic Health Record • Consistency with the Relation Ontology now a criterion for admission to the OBO ontology library • see Genome Biology Apr. 2006 http://ncor.us

  28. Three types of relations • between instances: • Mary’s heart part_of Mary • between an instance and a universal: • Mary instance_of homo sapiens • between universals: • gastrulation part_of embryonic development http://ncor.us

  29. A suite of primitive instance-level relations • identical_to • part_of • located_in • adjacent_to • earlier • derives_from • ... http://ncor.us

  30. A suite of defined relations between universals http://ncor.us

  31. GALEN: Vomitus contains carrot • All portions of vomit contain all portions of carrot • All portions of vomit contain some portion of carrot • Some portions of vomit contain some portion of carrot • Some portions of vomit contain all portions of carrot http://ncor.us
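The four readings listed on the slide above differ only in how the two classes are quantified. As a rough illustration (not from the presentation), the Python sketch below evaluates all four against a tiny, invented set of instance data; the names readings, VOMITS, CARROTS and CONTAINS are hypothetical.

```python
def readings(vomits, carrots, contains):
    """Evaluate the four quantifier readings of 'Vomitus contains carrot'.

    `contains` is a set of (vomit_portion, carrot_portion) pairs that hold
    at the instance level.
    """
    def has(v, c):
        return (v, c) in contains

    return {
        "all-all": all(all(has(v, c) for c in carrots) for v in vomits),
        "all-some": all(any(has(v, c) for c in carrots) for v in vomits),
        "some-some": any(any(has(v, c) for c in carrots) for v in vomits),
        "some-all": any(all(has(v, c) for c in carrots) for v in vomits),
    }


# Invented toy data: only one portion of vomit contains one portion of carrot.
VOMITS = {"v1", "v2"}
CARROTS = {"c1", "c2"}
CONTAINS = {("v1", "c1")}

print(readings(VOMITS, CARROTS, CONTAINS))
```

On this toy data only the some-some reading holds, which is why an unquantified assertion such as Vomitus contains carrot is ambiguous until one reading is fixed.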

  32. all-some structure • A part_of B =def. given any instance a of A there is some instance b of B such that a part_of b on the instance level • Allows automatic ontology integration via cascading reasoning: • A R1 B • B R2 C • therefore A R3 C http://ncor.us
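The all-some definition above can be read directly as a check over instance data. The following Python sketch is an illustration only: the function name class_part_of, the class labels heart and human, and the individual john are assumptions added here, and time-indexing is left out for brevity.

```python
def class_part_of(cls_a, cls_b, instance_of, part_of_inst):
    """A part_of B holds iff every instance of A is part_of some instance of B.

    `instance_of` maps each individual to its class; `part_of_inst` is a set
    of (part, whole) pairs at the instance level. Note that the check is
    vacuously true when cls_a has no instances.
    """
    a_instances = [x for x, cls in instance_of.items() if cls == cls_a]
    b_instances = [x for x, cls in instance_of.items() if cls == cls_b]
    return all(
        any((a, b) in part_of_inst for b in b_instances) for a in a_instances
    )


# Toy instance data, loosely following the slides' example of Mary's heart.
INSTANCE_OF = {
    "marys_heart": "heart",
    "mary": "human",
    "johns_heart": "heart",
    "john": "human",
}
PART_OF = {("marys_heart", "mary"), ("johns_heart", "john")}

print(class_part_of("heart", "human", INSTANCE_OF, PART_OF))
print(class_part_of("human", "heart", INSTANCE_OF, PART_OF))
```

The second call fails because no human is part of a heart, which shows that the class-level relation is derived from, and constrained by, what holds between instances.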

  33. adjacent_to • cell wall adjacent_to cytoplasm • intron adjacent_to exon • Golgi apparatus adjacent_to endoplasmic reticulum • periplasm adjacent_to plasma membrane • presynaptic membrane adjacent_to synaptic cleft http://ncor.us

  34. A adjacent_to B • every instance of A stands in the instance-level adjacent_to relation to some instance of B http://ncor.us

  35. adjacent_to as a relation between universals is not symmetric • nucleus adjacent_to cytoplasm • Not: cytoplasm adjacent_to nucleus • seminal vesicle adjacent_to urinary bladder • Not: urinary bladder adjacent_to seminal vesicle http://ncor.us
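The asymmetry above follows from the all-some reading even if adjacency between individual instances is taken to be symmetric. The Python sketch below illustrates this under stated assumptions: the instance names (n1, c1, c2) and the function class_adjacent_to are hypothetical, and c2 stands for the cytoplasm of a cell that has no nucleus.

```python
def class_adjacent_to(cls_a, cls_b, instances, adjacent):
    """A adjacent_to B iff every instance of A is adjacent to some instance of B.

    `adjacent` holds unordered pairs (frozensets), so instance-level
    adjacency is symmetric by construction (an assumption of this sketch).
    """
    return all(
        any(frozenset((a, b)) in adjacent for b in instances[cls_b])
        for a in instances[cls_a]
    )


# Invented toy data: one nucleus, two portions of cytoplasm.
INSTANCES = {"nucleus": {"n1"}, "cytoplasm": {"c1", "c2"}}
ADJACENT = {frozenset(("n1", "c1"))}

print(class_adjacent_to("nucleus", "cytoplasm", INSTANCES, ADJACENT))
print(class_adjacent_to("cytoplasm", "nucleus", INSTANCES, ADJACENT))
```

Every nucleus in the data touches some cytoplasm, but c2 touches no nucleus, so the class-level relation holds in one direction only.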

  36. The Granularity Gulf • most existing data-sources are of fixed, single granularity • many (all?) clinical phenomena cross granularities http://ncor.us

  37. Main obstacle to integrating genetic and EHR data No facility for dealing with time and instances (particulars, individuals) in current ontologies http://ncor.us

  38. Key idea • To define ontological relations like • part_of, develops_from • it is not enough to look just at universals / classes / types / ‘concepts’ : • we need also to take account of instances and time http://ncor.us

  39. transformation_of • A transformation_of B • =def. any instance of A was at some earlier time an instance of B http://ncor.us
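The definition above needs time-indexed instance data, which is the point of the preceding slides. As a minimal Python sketch (not from the presentation), the check below works over records of the form (individual, class, time); the function name, the individual mary and the years are invented for illustration.

```python
def transformation_of(cls_a, cls_b, records):
    """A transformation_of B iff every instance of A was earlier an instance of B.

    `records` is a set of (individual, class, time) triples stating that the
    individual instantiates the class at that time.
    """
    for individual, cls, time in records:
        if cls != cls_a:
            continue
        was_b_earlier = any(
            other == individual and other_cls == cls_b and earlier < time
            for other, other_cls, earlier in records
        )
        if not was_b_earlier:
            return False
    return True


# Invented toy data: the same individual is a child, then an adult.
RECORDS = {("mary", "child", 1990), ("mary", "adult", 2010)}

print(transformation_of("adult", "child", RECORDS))
print(transformation_of("child", "adult", RECORDS))
```

Without the time index the two calls could not be told apart, since the same individual instantiates both classes.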

  40. transformation_of • [Figure: the same instance c is an instance of C at time t and of C1 at time t1] • mature RNA transformation_of pre-RNA • adult transformation_of child • carcinomatous colon transformation_of colon http://ncor.us

  41. [Figure: instance c of C at time t, and of C1 at time t1] • transformation_of relations cross both time and granularity http://ncor.us

  42. Advantages of the methodology of enforcing commonly accepted coherent definitions • promotes quality assurance (better coding) • guarantees automatic reasoning across ontologies and across data at different granularities • yields direct connection to times and instances in the EHR http://ncor.us
