Part I: Biomedical Ontologies: A Critical Survey

Part I: Biomedical Ontologies: A Critical Survey
  • Barry Smith
  • http://ontology.buffalo.edu/smith
I: Biomedical Ontologies: A Critical Survey
  • Ontologies, terminologies and thesauri are now in common use in the domain of biomedical informatics. Their goal is to support search and retrieval, but also to advance genuine reasoning about biomedical phenomena and to enable re-use of heterogeneous data through the use of common systems of annotations. We examine a representative collection of biomedical ontologies in light of these criteria, and draw (somewhat sad) conclusions as to the current state of the field.
  • II. The Ontology of Biomedical Reality (terminology)
  • Ontologies to support scientific research and clinical medicine have special characteristics, which we shall outline in terms of a distinction between three levels: (1) the level of reality; (2) the level of cognitive representations; and (3) the level of the publicly accessible concretizations of such cognitive representations for example in ontologies. Against this background we shall clarify the relations between ontologies, terminologies, information models, databases, and similar artifacts.
  • III. The OBO Foundry Project: Towards Scientific Standards and Principles-Based Coordination in Biomedical Ontology Development
  • The OBO Foundry is a collaborative experiment, involving a group of ontology developers who have agreed in advance to the adoption of a growing set of principles specifying best practices in ontology development. The primary objective is to establish gold standard reference ontologies, one for each core domain of biomedical science. We shall describe how this objective is already being realized, and show how it can not only help solve the problems of data retrieval and re-use but also foster the development of the powerful tools that will be needed to reason with biomedical data in the future.
Problem:
  • how to reason with data deriving from different sources, each of which uses its own system of classification?
Solution:

Ontology!

Examples of current needs for ontologies in biomedicine
  • to enforce semantic consistency within a database
  • to enable data retrieval, sharing and re-use
  • to enable data integration (bridging across data at multiple granularities)
  • to allow querying
General trend
  • on the part of NIH, FDA and other bodies to consolidate ontology-based standards for the communication and processing of biomedical data.
Old approach
  • gather terminologies in libraries
  • Unified Medical Language System
  • National Library of Medicine
UMLS

SNOMED

DEMONS

New Approach
  • MusicBeanz
Semantic Web deposits
  • Pet Profile Ontology
  • Review Vocabulary
  • Band Description Vocabulary
  • Musical Baton Vocabulary
  • MusicBrainz Metadata Vocabulary
  • Kissology
http://www.w3.org/
  • Beer Ontology
  •  all instances of hops that have ever existed are necessarily ingredients of beer.

Both UMLS- and OWL-type responses involve ad hoc creation of new terminologies by each separate community, and an open-door policy for admission. Many of these terminologies remain as torsos, gather dust, poison the wells, ...

OWL’s syntactic regimentation is not enough to ensure high-quality ontologies
  • – the use of a common syntax and logical machinery and the careful separating out of ontologies into namespaces does not solve the problem of ontology integration
from Ontological Engineering
  • location =def. a spatial point identified by a name (p. 12)
  • arrivalPlace =def. a journey ends at a location (p. 13)
  • facet =def. ternary relation that holds between a frame, a slot, and the facet (p. 51)
  • an example of function is Pays, which obtains the price of a room after applying a discount (p. 13)
from Handbook of Ontology
  • On ‘achieving consistency from multiple sources’:
  • if exact semantic identity is lacking, terms can be unified at a higher level, and information that is possibly related can be retrieved as well. When the application objective is to study and understand, the end-user can reject misleading records. (p. 94)
  • owl:InverseFunctionalProperty defines a property that for which two different objects cannot have the same value, e.g. isTheSocialSecurityNumberOf (a social number is assigned to one person only) (p. 78)
UMLS

SNOMED

DEMONS

The Good, the Bad, and the UGLY

A methodology for quality-assurance of ontologies
  • tested thus far in the biomedical domain on:
    • FMA
    • GO + other OBO Ontologies
    • FuGO
    • SNOMED
    • UMLS Semantic Network
    • NCI Thesaurus
    • ICF (International Classification of Functioning, Disability and Health)
    • ISO Terminology Standards
    • HL7-RIM
The Good
  • Foundational Model of Anatomy (FMA)
  • Pro
  • clear statement of scope: structural human anatomy, at all levels of granularity, from the whole organism to the biological macromolecule
  • Powerful treatment of definitions, from which the entire FMA hierarchy is generated – can serve as basis for formal reasoning
  • Con
  • Some unfortunate artifacts in the ontology deriving from its specific computer representation (Protégé)
[Figure: fragment of the FMA hierarchy around the pleural sac, drawn with is_a and part_of links. Node labels: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Component, Organ Subdivision, Organ Cavity, Organ Cavity Subdivision, Tissue, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Sac, Pleura (Wall of Sac), Pleural Cavity, Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Mesothelium of Pleura]

The Foundational Model of Anatomy
  • Follows formal rules for ‘Aristotelian’ definitions
  • When A is_a B, the definition of ‘A’ takes the form:
  • an A =def. a B which ...
  • a human being =def. an animal which is rational
FMA Example
  • Cell =def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane with or without a cell nucleus
  • Plasma membrane =def. a cell part that surrounds the cytoplasm
The FMA regimentation
  • Each definition reflects the position in the hierarchy to which a defined term belongs.
  • The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it.
  • The entire information content of the FMA’s term hierarchy can be translated very cleanly into a computer representation
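
The regimentation described above can be sketched in a few lines of Python. This is only an illustration, not the FMA's actual representation: the dictionary layout and function names are assumptions, the two definitions are the ones quoted on the FMA Example slide, and the genus terms are treated as undefined because their definitions are not given here.

```python
# Hypothetical mini-hierarchy: term -> (genus, differentia).
# Only the definitions of "cell" and "plasma membrane" come from the slides;
# the genus terms are left undefined (None) in this sketch.
DEFINITIONS = {
    "anatomical structure": (None, None),
    "cell part": (None, None),
    "cell": ("anatomical structure",
             "consists of cytoplasm surrounded by a plasma membrane "
             "with or without a cell nucleus"),
    "plasma membrane": ("cell part", "surrounds the cytoplasm"),
}


def with_article(noun):
    """Prefix a noun phrase with 'a' or 'an'."""
    return ("an " if noun[0].lower() in "aeiou" else "a ") + noun


def aristotelian(term):
    """Render 'an A =def. a B which C' from the stored genus and differentia."""
    genus, differentia = DEFINITIONS[term]
    if genus is None:
        return f"{term} (undefined in this sketch)"
    return f"{term} =def. {with_article(genus)} which {differentia}"


def genus_chain(term):
    """List the terms above `term`, nearest genus first."""
    chain = []
    genus = DEFINITIONS[term][0]
    while genus is not None:
        chain.append(genus)
        genus = DEFINITIONS[genus][0]
    return chain


print(aristotelian("plasma membrane"))
print(genus_chain("cell"))
```

Because every definition names its genus, the is_a hierarchy can be read off the definitions themselves, which is the sense in which the hierarchy is "generated" from them.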
Principle
  • Use Aristotelian definitions
  • An A is a B which C’s.
Intermediate
  • GALEN
  • Pro
  • Allows formal representation of clinical information
  • Allows multiple views of relevant detail as needed
  • Uses powerful Description Logic (DL)-based formal structure
  • Makes definitions easy to formulate
  • Con
  • Remains only partially developed
  • Contains errors: Vomitus contains carrot
  • – which DLs did not prevent
Principle
  • An ontology should not remain a torso
Principle
  • An ontology should have a properly personed help desk
Principle
  • An ontology should have procedures for up-dating in light of scientific advance
Intermediate
  • The Gene Ontology
  • Con
  • Poor formal architecture
  • Full of errors
  • menopause part_of death
  • Poor support for automatic reasoning and error-checking
  • Poor treatment of definitions
  • Not trans-granular
  • No relation to time or instances
The Gene Ontology
  • Pro
  • Open Source
  • Cross-Species
  • ... has recognized the need for reform, including explicit representation of granular levels
Old GO Definitions
  • hemolysis =def. the causes of hemolysis
GO now adopting structured definitions which contain both genus and differentiae

Species =def Genus + Differentiae

neuron cell differentiation =def

differentiation by which a cell acquires features of a neuron

Ontology alignment
One of the current goals of GO is to align Cell Types in GO with Cell Types in the Cell Ontology:
  • GO: cone cell fate commitment; Cell Ontology: retinal_cone_cell
  • GO: keratinocyte differentiation; Cell Ontology: keratinocyte
  • GO: adipocyte differentiation; Cell Ontology: fat_cell
  • GO: dendritic cell activation; Cell Ontology: dendritic_cell
  • GO: lymphocyte proliferation; Cell Ontology: lymphocyte
  • GO: T-cell homeostasis; Cell Ontology: T_lymphocyte
  • GO: garland cell differentiation; Cell Ontology: garland_cell
  • GO: heterocyst cell differentiation; Cell Ontology: heterocyst
Alignment of the two ontologies will permit the generation of consistent and complete definitions

Cell type (Cell Ontology entry):

id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375

GO + Cell type = New Definition:

Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
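
How a process definition might be assembled from a cell-type entry can be sketched as follows. The template wording is copied from the osteoblast example above. The function names and the way the precursor cell names are passed in are assumptions made for illustration, not the GO or Cell Ontology tooling.

```python
def with_article(noun):
    """Prefix a noun phrase with 'a' or 'an'."""
    return ("an " if noun[0].lower() in "aeiou" else "a ") + noun


def differentiation_definition(cell, cell_gloss, precursors):
    """Compose a '<cell> differentiation' definition from cell-type data.

    cell        name of the cell type (the species being defined)
    cell_gloss  short genus-plus-differentia gloss of the cell type
    precursors  names of the cell types it develops from (assumed input)
    """
    source = " or ".join(with_article(p) for p in precursors)
    return (f"{cell.capitalize()} differentiation: Processes whereby {source} "
            f"acquires the specialized features of {with_article(cell)}, "
            f"{cell_gloss}.")


definition = differentiation_definition(
    "osteoblast",
    "a bone-forming cell which secretes extracellular matrix",
    ["osteoprogenitor cell", "cranial neural crest cell"],
)
print(definition)
```

With these inputs the function reproduces the new definition shown on the slide. The same template could in principle be reused for other pairs of differentiation term and cell type.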

Other Ontologies to be aligned with GO
  • Chemical ontologies
    • 3,4-dihydroxy-2-butanone-4-phosphate synthase activity
  • Anatomy ontologies
    • metanephros development
Principle
  • Exploit existing ontologies when formulating definitions
The Bad
  • Reactome
  • Pro
  • Rich catalogue of biological processes
  • Con
  • Incoherent treatment of categories:
  • ReferentEntity (embracing e.g. small molecules) is a sibling of PhysicalEntity (embracing complexes, molecules, ions and particles).
  • Similarly CatalystActivity is a sibling of Event.
Principle
  • An ontology should be in agreement with the truths of basic science (e.g. that molecules are physical entities)
The Ugly: Disease Ontology / ICD-10
  • Other problems with special functions
  • Tuberculosis of unspecified bones and joints, tubercle bacilli not found by bacteriological or histological examination, but tuberculosis confirmed by other methods (inoculation of animals)
  • Other mineral salts, not elsewhere classified, causing adverse effects in therapeutic use
  • Other general medical examination for administrative purposes
  • Assault by other specified means
The Ugly: Disease Ontology / ICD-10
  • Other accidental submersion or drowning in water transport accident injuring other specified person
  • Accident to powered aircraft, other and unspecified, injuring occupant of military aircraft, any rank
  • Other accidental submersion or drowning in water transport accident injuring occupant of other watercraft - crew
The Ugly: Disease Ontology / ICD-10
  • Normal pregnancy
  • Fall on stairs or ladders in water transport injuring occupant of small boat, unpowered
  • Railway accident involving collision with rolling stock and injuring pedal cyclist
  • Injury due to war operations by lasers
  • Nontraffic accident involving motor-driven snow vehicle injuring pedestrian
The Ugly: Disease Ontology / ICD-10
  • Donors of other specified organ or tissue
  • Fitting and adjustment of wheelchair
  • Hot (boiling) tap water
  • Training in use of lead dog for the blind
  • Person consulting on behalf of another person
Principle
  • An ontology should have a clearly specified domain (captured by its root node)
“Circular Hierarchical Relationships in the UMLS: Etiology, Diagnosis, Treatment, Complications and Prevention” (Olivier Bodenreider)
  • Topographic regions: General terms
  • Physical anatomical entity
  • Anatomical spatial entity
  • Anatomical surface
  • Body regions
  • Topographic regions
Principle
  • Avoid cycles
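
A check for this principle is straightforward to automate. The sketch below runs a depth-first search over an is_a graph stored as a dictionary and returns the first cycle it finds. The graph encoding and the sample edges are hypothetical and are not taken from the UMLS.

```python
def find_cycle(is_a):
    """Return one is_a cycle as a list of terms, or None if there is none.

    `is_a` maps each term to the list of its direct parents.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(term):
        colour[term] = GREY
        path.append(term)
        for parent in is_a.get(term, ()):
            state = colour.get(parent, WHITE)
            if state == GREY:
                # Parent is already on the current path: a cycle closes here.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        colour[term] = BLACK
        return None

    for term in list(is_a):
        if colour.get(term, WHITE) == WHITE:
            cycle = visit(term)
            if cycle:
                return cycle
    return None


# Hypothetical edges, for illustration only.
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
acyclic = {"Pleural Sac": ["Serous Sac"], "Serous Sac": ["Organ"]}
print(find_cycle(cyclic))
print(find_cycle(acyclic))
```

The recursive form is adequate for small hierarchies. A very deep hierarchy would call for an explicit stack to stay within Python's recursion limit.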
MeSH
  • National Socialism is_a Political Systems
  • National Socialism is_a Anthropology ...
Principle
  • Use singular nouns
MeSH
  • National Socialism is_a MeSH Descriptor
Plant Ontology
  • cell = def. structural and physiological unit of a living organism; it (i.e., plant cell) consists of protoplast and cell wall; ...
Principle
  • For the sake of interoperability with other ontologies, do not give special meanings to terms with established general meanings
  • (Don’t use ‘cell’ when you mean ‘plant cell’)
ICNP: International Classification for Nursing Practice
  • water =def. a type of Nursing Phenomenon of Physical Environment with the specific characteristics: clear liquid compound of hydrogen and oxygen that is essential for most plant and animal life influencing life and development of human beings.
The NCIT reflects a recognition of the need
  • for high quality shared ontologies and terminologies the use of which by clinical researchers in large communities can ensure re-usability of data collected by different research groups
NCIT
  • “a biomedical vocabulary that provides consistent, unambiguous codes and definitions for concepts used in cancer research”
  • “exhibits ontology-like properties in its construction and use”.
Goals
  • to make use of current terminology “best practices” to relate relevant concepts to one another in a formal structure, so that computers as well as humans can use the Thesaurus for a variety of purposes, including the support of automatic reasoning;
  • to speed the introduction of new concepts and new relationships in response to the emerging needs of basic researchers, clinical trials, information services and other users.
Formal Definitions
  • of 37,261 nodes, 33,720 were stipulated to be primitive in the DL sense
  • Thus only a small portion of the NCIT ontology can be used for purposes of automatic classification and error-checking by using OWL.
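
The size of that "small portion" follows directly from the two figures on the slide. A minimal calculation, with variable names chosen here for illustration:

```python
total_nodes = 37_261       # NCIT nodes, as reported on the slide
primitive_nodes = 33_720   # nodes stipulated to be primitive in the DL sense

defined_nodes = total_nodes - primitive_nodes
primitive_share = primitive_nodes / total_nodes

print(f"{defined_nodes} nodes carry formal definitions "
      f"({1 - primitive_share:.1%} of the total)")
```

Only the defined nodes give a DL classifier anything to work with, which is why the primitive share matters for automatic error-checking.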
Principle
  • Supply definitions wherever possible
  • (both human-understandable natural language definitions, and equivalent formal definitions)
Verbal Definitions
  • About half the NCIT terms are assigned verbal definitions
  • Unfortunately some are assigned more than one
Disease Progression
  • Definition1
  • Cancer that continues to grow or spread.
  • Definition2
  • Increase in the size of a tumor or spread of cancer in the body.
  • Definition3
  • The worsening of a disease over time. This concept is most often used for chronic and incurable diseases where the stage of the disease is an important determinant of therapy and prognosis.
Principle
  • Each term should have at most one definition*
  • *which may have both natural-language and formal versions
To make matters worse Disease Progression has as subclass:
  • Cancer Progression
  • Definition:
  • The worsening of a cancer over time. This concept is most often used for incurable cancers where the stage of the cancer is an important determinant of therapy and prognosis.
Cancer
  • a process (of getting better or worse)
  • an object (which can grow and spread)
Principle
  • Distinguish continuant entities (molecule, cell, tumor, organism) from occurrent entities (processes of growth, change, ...)
Two kinds of entities
  • occurrents (processes, events, happenings)
  • cell division, ovulation, death
  • continuants (objects, qualities, ...)
  • cell, ovum, organism, temperature of organism, ...
NCIT confuses definitions with descriptions
  • Tuberculosis
  • Definition
  • A chronic, recurrent infection caused by the bacterium Mycobacterium tuberculosis. Tuberculosis (TB) may affect almost any tissue or organ of the body with the lungs being the most common site of infection. The clinical stages of TB are primary or initial infection, latent or dormant infection, and recrudescent or adult-type TB. Ninety to 95% of primary TB infections may go unrecognized. Histopathologically, tissue lesions consist of granulomas which usually undergo central caseation necrosis. Local symptoms of TB vary according to the part affected; acute symptoms include hectic fever, sweats, and emaciation; serious complications include granulomatous erosion of pulmonary bronchi associated with hemoptysis. If untreated, progressive TB may be associated with a high degree of mortality. This infection is frequently observed in immunocompromised individuals with AIDS or a history of illicit IV drug use.
A better definition
  • Tuberculosis
  • Definition:
  • A chronic, recurrent infection caused by the bacterium Mycobacterium tuberculosis.
  • IS THIS CORRECT? (An infection is not a disease)
the use-mention confusion
  • Conceptual Entities =Def.
  • An organizational header for concepts representing mostly abstract entities.
  • Confuses use and mention (swimming is healthy and has eight letters)
Principle
  • Don’t confuse an entity with the name of an entity
Duratec, Lactobutyrin, Stilbene Aldehyde
  • are classified by the NCIT as Unclassified Drugs and Chemicals
Problematic synonyms
  • Anatomic Structure, System, or Substance ~ Anatomic Structures and Systems
  • Does ‘anatomic’ apply only to structure or also to system and substance?
  • Biological Function ~ Biological Process
  • some biological processes are the exercises of biological functions
  • others (e.g. pathological processes, side effects) not
  • Genetic Abnormality ~ Molecular Abnormality (with subtype: Molecular Genetic Abnormality) (definitions not supplied)
Three disjoint classes of plants
  • Vascular Plant
  • Non-vascular Plant
  • Other Plant
Three kinds of cells
  • Abnormal Cell is a top-level class (thus not subsumed by Cell)
  • Normal Cell is a subclass of Microanatomy.
  • Cell is a subclass of Other Anatomic Concept (so that cells themselves are concepts)
NCIT as now constituted will block automatic reasoning
  • Neither Normal Cells nor Abnormal Cells are Cells within the context of the NCIT
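
The failure can be reproduced with a small subsumption check. The three subclass assertions below are the ones stated on the "Three kinds of cells" slide. The parents of Microanatomy and Other Anatomic Concept are not given there, so this sketch treats them as roots, which is an assumption.

```python
# Direct subclass assertions as described on the slide (term -> parents).
SUBCLASS_OF = {
    "Abnormal Cell": [],                    # top-level class
    "Normal Cell": ["Microanatomy"],
    "Cell": ["Other Anatomic Concept"],
}


def ancestors(term, subclass_of):
    """Collect every class reachable from `term` by following parent links."""
    seen = set()
    todo = list(subclass_of.get(term, []))
    while todo:
        parent = todo.pop()
        if parent not in seen:
            seen.add(parent)
            todo.extend(subclass_of.get(parent, []))
    return seen


def is_subsumed_by(child, parent, subclass_of):
    """True if `parent` is a direct or inherited superclass of `child`."""
    return parent in ancestors(child, subclass_of)


for kind in ("Normal Cell", "Abnormal Cell"):
    print(kind, "is_a Cell:", is_subsumed_by(kind, "Cell", SUBCLASS_OF))
```

Under these assertions a reasoner cannot carry anything asserted of Cell over to Normal Cell or Abnormal Cell.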
Some consolations
  • NCIT is open source
  • NCIT has broad coverage
  • NCIT has some formal structure (OWL-DL)
  • NCIT is much, much better than (for example) the HL7-RIM
  • NCIT has realized the errors of its ways
What might have been
  • http://www.cbd-net.com/index.php/search/show/938464
  • = “Review of NCI Thesaurus and Development of Plan to Achieve OBO Compliance”

Welcome to the Pre-NCIT:

http://nciterms.nci.nih.gov/NCIBrowser/Dictionary.do

Fragment of Pre-NCIT Hierarchy
  • Murine Tissue Type
    • Body Fluids and Substances (MMHCC)
    • Cardiovascular System (MMHCC)
    • Blood Vessel (MMHCC)
    • Heart (MMHCC)
    • Digestive System (MMHCC)
MeSH
  • MeSH Descriptors > Index Medicus Descriptor > Anthropology, Education, Sociology and Social Phenomena (MeSH Category) > Social Sciences > Political Systems > National Socialism
  • National Socialism is_a Political Systems
  • National Socialism is_a Anthropology ...
MeSH
  • National Socialism is_a MeSH Descriptors
  • The Bodenreider Defence:
  • MeSH is not an ontology
BIRNLex
  • The eye =def.
  • The eyeball and its constituent parts, e.g. retina
  • mouse =def.
  • common name for the species mus musculus
Principle
  • Avoid circular definitions
  • (The term defined should not appear in its own definition)
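
The most literal form of this check can be automated. The sketch below flags a definition that contains the defined term as a whole word or phrase. It is purely lexical and the function name is an assumption. It catches the old GO definition of hemolysis quoted earlier, but not a case like "eye" defined via "eyeball", which needs human review.

```python
import re


def is_circular(term, definition):
    """True if `term` occurs as a whole word or phrase in its own definition."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


examples = [
    ("hemolysis", "the causes of hemolysis"),
    ("plasma membrane", "a cell part that surrounds the cytoplasm"),
    ("eye", "The eyeball and its constituent parts, e.g. retina"),
]
for term, definition in examples:
    print(f"{term}: circular = {is_circular(term, definition)}")
```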
More Ugly: UMLS Semantic Network
  • Pros
  • Broad coverage; no multiple inheritance
  • Cons
  • Incoherent use of ‘conceptual entities’
  • (e.g. the digestive system as a conceptual part of the organism)
  • Full of errors
UMLS Semantic Network
  • Edges in the graph represent merely “possible significant (= some-some) relations”:
    • Bacterium causes Experimental Model of Disease
    • Experimental Model of Disease affects Fungus
    • Experimental Model of Disease is_a Pathologic Function
UMLS Semantic Network
  • Unclear what the nodes of the graph are:
  • Drug Delivery Device contains Clinical Drug
  • Drug Delivery Device narrower_in_meaning_than Manufactured Object
  • The use-mention confusion:
  • “Swimming is healthy and has 8 letters”
location_of
  • Fungus location_of Vitamin
  • Tissue location_of Mental or Behavioral Dysfunction
Fungus location_of Vitamin
  • Every instance of vitamin is located in some fungus?
  • Some instances of vitamin are located in some fungi?
  • Some instances of fungi have instances of vitamin located in them?
  • Every instance of vitamin is located in every instance of fungus?
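
The candidate readings differ in truth value as soon as they are evaluated against instance data. The sketch below does this for a tiny invented population. The instance names and the located_in pairs are hypothetical and serve only to show that the readings come apart.

```python
# Hypothetical instances; nothing here is taken from the UMLS.
vitamins = ["v1", "v2"]
fungi = ["f1", "f2"]
located_in = {("v1", "f1")}          # (vitamin instance, fungus instance)


def holds(vitamin, fungus):
    """True if this vitamin instance is located in this fungus instance."""
    return (vitamin, fungus) in located_in


readings = {
    "every vitamin in some fungus":
        all(any(holds(v, f) for f in fungi) for v in vitamins),
    "some vitamin in some fungus":
        any(holds(v, f) for v in vitamins for f in fungi),
    "some fungus has some vitamin in it":
        any(holds(v, f) for f in fungi for v in vitamins),
    "every vitamin in every fungus":
        all(holds(v, f) for v in vitamins for f in fungi),
}
for reading, value in readings.items():
    print(f"{reading}: {value}")
```

On this data only the two some-some readings come out true, and they are logically equivalent to each other. The all-some and all-all readings are false, so an edge that does not say which reading is intended supports almost no inference.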
NCIT inherits this ontological and terminological incoherence from source vocabularies in UMLS
  • Conceptual Entities =def
  • An organizational header for concepts representing mostly abstract entities.
  • Includes as subtypes:
  • action, change, color, death, event, fluid, injection, temperature
The UMLS
  • Unified Medical Language System
  • Metathesaurus
  • Semantic Network (SN)
BIRNLex and UMLS-SN
  • Rest =SN Daily or Recreational Activity
  • Principal Investigator =SN Professional or Occupational Group
  • Left handedness =SN Organism Attribute
  • Ambidextrous =SN Finding
  • Brain Imaging =SN Diagnostic Procedure
  • Brain Mapping =SN Diagnostic Procedure & Research Activity
  • Healthy Adult =SN Finding

To build a high quality shared ontology requires hard work and staying power
You cannot cheat by borrowing from UMLS
UMLS (= the UMLS Metathesaurus) is not an ontology

is_a (sensu UMLS)
  • A is_a B =def
  • ‘A’ is narrower in meaning than ‘B’
  • grows out of the heritage of dictionaries, which reflect meanings, not biological reality
Concepts, Concept Names, and their Identifiers in the UMLS
  • The Metathesaurus is organized by concept. One of its primary purposes is to connect different names for the same concept from many different vocabularies.
The desperate search for ‘mappings’
  • A concept is a meaning. A meaning can have many different names. A key goal of Metathesaurus construction is to understand the intended meaning of each name in each source vocabulary and to link all the names from all of the source vocabularies that mean the same thing (the synonyms).
The desperate search for ‘mappings’
  • This is not an exact science. ... Metathesaurus editors decide what view of synonymy to represent in the Metathesaurus concept structure. Please note that each source vocabulary’s view of synonymy is also present in the Metathesaurus, irrespective of whether it agrees or disagrees with the Metathesaurus view.
These strange mappings
  • between names as they appear in different source vocabularies created for widely different purposes can still be very useful
  • but the source vocabularies themselves are of variable quality
  • (not all mappings are created equal)
  • and the sorts of search which the UMLS supports reflect an already outmoded technology
is_a (sensu UMLS)
  • congenital absent nipple is_a nipple
  • surgical procedure not carried out because of patient’s decision is_a surgical procedure
  • cancer documentation is_a cancer
  • disease prevention is_a disease
  • living subject is_a information object representing an animal or complex organism
  • individual allele is_a act of observation
  • limb is_a tissue
is_a (sensu UMLS)
  • both testes is_a testis
  • plant leaves is_a plant
  • smoking is_a individual behavior
  • walking is_a social behavior
Advantages of the methodology of shared coherently defined ontologies
  • once the interoperable gold standard reference ontologies are there, it will make sense to reformulate parts of existing incompatible terminologies (e.g. in UMLS) in terms of the standard ontologies in order to achieve greater domain coverage and alignment of different but veridical views. Thus not everything that was done in the past turns out to be a waste.
is_a (sensu UMLS)
  • A is_a B =def
  • ‘A’ is narrower in meaning than ‘B’
  • grows out of the heritage of dictionaries
  • (which ignore the basic distinction between universals and instances)
HL7 Marketing
  • HL7 V3 claims to be:
  • “The foundation of healthcare interoperability”
  • “The data standard for biomedical informatics”
  • from blood banks to Electronic Health Records to clinical genomics
HL7 Incredibly Successful
  • adopted by Oracle as basis for its Electronic Health Record technology; supported by IBM, GE, Sun ...
  • embraced as US federal standard
  • central part of $35 billion program to integrate all UK hospital information systems
Problem V3 of HL7 is designed to address
  • in HL7 V2 the realization of the messaging task allows ad hoc interpretations of the standard by each sending or receiving institution.
  • Result: vendor products never properly interoperable, and always require mapping software.
The solution to this problem (V3) is the HL7 RIM
  • or Reference Information Model
  • = a world standard for exchange of information between clinical information systems
The V3 solution
  • Remove optionality by having the RIM serve as a master model of all health information, from blood banks to Electronic Health Records to clinical genomics
The hype
  • “HL7 V3 is the standard of choice for countries and their initiatives to create national EHR and EHR data exchange standards as it provides a level of semantic interoperability unavailable with previous versions and other standards. Significant V3 national implementations exist in many countries, e.g. in the UK (e.g. the English NHS), the Netherlands, Canada, Mexico, Germany and Croatia.”
The reality (I asked them)
  • “None of the implementations have a national scope” (e.g. Stockholm City Council)
  • The paradigm Dutch national HL7 V3 EHR implementation uses HL7 technology exclusively for exchanging data (i.e. messaging). The EHR architectures themselves are HL7-free.
The Oracle Healthcare Transaction Base (HTB)
  • Oracle itself refers (April 2006) to three implementations of HTB described as being 'live for EHR projects':
    1) Byrraju Foundation (BSRF) in India (Live)
    2) Stockholm County (planned to go live by May 2006)
    3) Louisiana (planned to go live by May 2006)
Regarding the Byrraju case, I am told that there is no V3 application running in India today and that the Byrraju Foundation is presently not using any telemedicine application that utilizes HL7.
As to the Stockholm case, the HTB was purchased and deployed in late 2004. An attempt to port a pilot system was made during the spring of 2005. This attempt was abandoned, as I understand from my Swedish colleagues, partly because of poor performance (the new application performed significantly less well than the system it was designed to replace, even though it was being run on considerably more expensive hardware), and partly because of a lack of fault tolerance, which made it inadequate as a mechanism for integrating legacy systems marked by a high degree of variation in data quality. During the spring of 2006, it seems, an attempt will be made to construct a new pilot application, this time with the more modest goal of handling referrals.
The hype
  • The RIM is “credible, clear, comprehensive, concise, and consistent”
  • It is “universally applicable” and “extremely stable”
The reality
  • HL7 V3 documentation is 542,458 KB, divided into 7,573 files
  • It remains subject to frequent revisions
  • It is very difficult to understand
The reality
  • The decision to adopt the RIM was made as early as 1996, yet after 10 years the promised benefits of interoperability remain elusive.
  • HL7 has bet the farm on the RIM, while technology has moved on in these 10 years.
Too many combinations
  • As the traffic on HL7’s own vocabulary mailing list reveals, there is no adequate mechanism for ensuring that the vast number of combinations of coded terms within actual messages can be controlled in such a way that messages will be understood in the same way by designers, senders and receivers.
These pre-defined attributes
  • code, class_code, mood_code,
  • status_code, etc.
  • yield a combinatorial explosion:
  • class_code (61 values) × mood_code (13 values) × code (estimated 200 values) × status_code (10 values) = 1,586,000 (about 1.59 million) combinations.
  • Adding in the other codes this becomes 810 billion.
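The arithmetic behind the first figure can be checked directly. The sketch below uses only the value counts quoted on the slide (the 200 for `code` is the slide's own estimate). The slide does not say which "other codes" lead to 810 billion, so the script only derives the factor those unnamed attributes would have to contribute; that factor is an implication of the two quoted numbers, not a figure from HL7.

```python
from math import prod

# Value counts as quoted on the slide; `code` is an estimate there too.
value_counts = {
    "class_code": 61,
    "mood_code": 13,
    "code": 200,
    "status_code": 10,
}

# Every choice of one value per attribute is a distinct combination.
combinations = prod(value_counts.values())
print(f"{combinations:,}")  # 1,586,000

# The slide quotes 810 billion once further (unnamed) codes are added.
# Dividing gives the multiplier those extra attributes must jointly supply.
quoted_total = 810_000_000_000
implied_factor = quoted_total / combinations
print(f"implied factor from the other codes: about {implied_factor:,.0f}")
```

On these numbers the remaining attributes would need to multiply the space by roughly half a million, which is the sense in which the combinations cannot realistically be policed by hand.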
Why does the RIM embody so many combinations?
  • To ensure in advance that everything can be said in conformity to the standard
The RIM methodology
  • defines a set of ‘normative’ classes (Act, Role, and so on), with which are associated a rich stock of attributes from which one must make a selection when applying the RIM to each new domain (pharmacy, clinical genomics ...).
  • Compare: attempting to create manufacturing software by drawing from a store containing pre-established parts (so that the store would need to have the bits needed for making every conceivable manufacturable thing, be it a lawnmower, a refrigerator, a hunting bow, and so on).
The RIM methodology
  • Are there examples where a methodology of this sort has been made to work? Does the RIM yield a coherent basis for constructing well-designed software artifacts for functions like the EHR or computerized decision support?
This methodology does not impede the formation of local dialects
  • Different teams produce different message designs for the very same topic.
  • In the UK, the £35 bn NHS National Program “Connecting for Health” has applied the RIM rigorously, using all the normative elements, and it discovered that it needed to create dialects of its own to make the V3-based system work for its purposes (it still does not work).
The RIM documentation
  • is subject to multiple and systematic internal inconsistencies and unclarities:
  • is marked by sloppy and unexplained use of terms such as ‘act’, ‘Act’, ‘Acts’, ‘action’, ‘ActClass’, ‘Act-instance’, ‘Act-object’
  • and uncertain cross-referencing to other HL7 documents
  • no publicly available teaching materials (no HL7 for Dummies)
from HL7 email forum (do not circulate)
  • “I am ... frightened when I contemplate the number of potential V3ers who ... simply are turned away by the difficulty of accessing the product.
  •   “Some of them attend V3 tutorials which explain V3 as the hugely complex process of creating a message and are turned off. [They] simply do not have the stamina, patience, endurance, time, or brain-cells to understand enough for them to feel comfortable contributing to debates / listserves, etc., so they remain silent.”
Problems of scope
  • Only two main classes in the RIM
  • Act = roughly: intentional action
  • Entity = persons, places, organizations, material
  • How can the RIM deal transparently with information about, say, disease processes, drug interactions, wounds, accidents, bodily organs, documents?
Diseases in the RIM
  • ... are not Acts
  • ... are not Entities
  • ... are not Roles, Participations ...
  • So what are they?
  • At best: a case of pneumonia is identified as the Act of Observation of a case of pneumonia
  • Note: RIM’s treatment of SNOMED codes
HL7 Clinical Document Architecture
  • defines a document as an Act
  • HL7’s Clinical Genomics Standard Specifications
  • defines an individual allele as an Act of Observation
Why the centrality of ‘Act’
  • because of HL7’s roots in US hospital messaging – and thus in US hospital billing:
  • intentional actions are what can be billed
Mayo RIM discussion of the meaning of ‘Act’ as “intentional action”
  • Is a snake bite or bee sting an "intentional action"?
  • Is a knife stabbing an intentional action?
  • Is a car accident an intentional action?
  • When a child swallows the contents of a bottle of poison is that an intentional action?
The RIM has no coherent criteria for deciding
  • For this reason, too, dialects are formed – and the RIM does not do its job. One health information system might conceive snakebites and gunshots as Procedures. Another might classify them with diseases, and so treat them as Observations.
  • If basic categories cannot be agreed upon for common phenomena like snakebites, then the RIM is in serious trouble.
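The dialect problem described above can be made concrete with a small sketch. The two mappings are hypothetical: they simply encode the slide's example of one system filing snakebites and gunshots as Procedures and another filing them with diseases as Observations. `class_conflicts` is an illustrative helper, not part of any HL7 tooling.

```python
# Hypothetical class assignments in two systems that both claim RIM conformance.
system_a = {"snakebite": "Procedure", "gunshot": "Procedure", "pneumonia": "Observation"}
system_b = {"snakebite": "Observation", "gunshot": "Observation", "pneumonia": "Observation"}


def class_conflicts(a, b):
    """Return the phenomena both systems cover but file under different classes."""
    shared = sorted(a.keys() & b.keys())
    return {term: (a[term], b[term]) for term in shared if a[term] != b[term]}


conflicts = class_conflicts(system_a, system_b)
for term, (class_a, class_b) in conflicts.items():
    print(f"{term}: system A sends {class_a}, system B expects {class_b}")
```

Both systems are formally conformant, yet a query for Procedures in one and Observations in the other would return different answers about the same patients.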
Are definitions like the following a good basis for achieving semantic interoperability in the biomedical domain?
  • LivingSubject
  • Definition: A subtype of Entity representing an organism or complex animal, alive or not.
Person (from HL7 Glossary)
  • Definition: A Living Subject representing single human being [sic] who is uniquely identifiable through one or more legal documents
The Problem of Circularity
  • A Person =def. A person with documents
  • ‘An A is an A which is B’
  • – useless in practical terms, since neither we nor the machine can use it to find out what ‘A’ means
  • – incorporates a vicious infinite regress
  • – has the effect of making it impossible to refer to As which are not Bs, for example to undocumented persons
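The regress in ‘An A is an A which is B’ can be shown mechanically. The sketch below is a toy: it treats a definition as a plain string and repeatedly substitutes it for the term it defines. The function names and the non-circular ‘bachelor’ comparison are illustrative assumptions, not taken from the HL7 glossary.

```python
import re


def is_circular(term, definition):
    """True if the defining phrase uses the very term being defined."""
    return re.search(rf"\b{re.escape(term)}\b", definition) is not None


def unfold(term, definition, steps):
    """Replace the first occurrence of `term` by its definition, `steps` times."""
    text = term
    for _ in range(steps):
        # A callable replacement avoids any backslash interpretation in `definition`.
        text = re.sub(rf"\b{re.escape(term)}\b", lambda _m: definition, text, count=1)
    return text


# The slide's paraphrase: a Person =def. a person with documents.
print(unfold("person", "person with documents", 3))
# A non-circular definition stops changing after one step.
print(unfold("bachelor", "unmarried man", 3))
```

In the circular case the term to be explained is still present after every substitution, so neither a reader nor a machine ever reaches a phrase that says what ‘person’ means.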
What is the RIM about?
  • blood pressure measurement = an information item
  • blood pressure = something in reality which exists independently of any recording of information, and which the measurement measures
  • Q: Is the RIM about information, or about the reality to which such information relates?
  • A: There is no difference between the two
RIM Philosophy
  • “The truth about the real world is constructed through a combination and arbitration of attributed statements ...
  • “As such, there is no distinction between an activity and its documentation.”
The RIM as an Information Model
  • ‘a static (UML) model of health and health care information’
  • The scope of the RIM’s class hierarchy consists in packets of information:
  • the information content of invoices, statements of observations, lab reports, …
A good, general constraint on a theory of meaning
  • For each linguistic expression ‘E’
  • ‘E’ means E
  • ‘snow’ means snow
  • ‘pneumonia’ means pneumonia
From the perspective of the RIM on the Information Model conception
  • ‘medication’ does not mean: medication
  • rather it means:
  • the record of medication in an information system
  • ‘stopping a medication’ does not mean: stopping a medication
  • rather it means:
  • change of state in the record of a Substance Administration Act from Active to Aborted
The RIM’s Entity class
  • persons, places, organizations, material
States of Entity
  • active: The state representing the fact that the Entity is currently active.
  • nullified: The state representing the termination of an Entity instance that was created in error.
  • inactive: The state representing the fact that an entity can no longer be an active participant in events.
  • normal: The “typical” state. Excludes “nullified”, which represents the termination state of an Entity instance that was created in error.
Persons are Entities
  • What do ‘active’ and ‘nullified’ mean as applied to Person?
  • Is there a special kind of death-through-nullification in the case of those instances of Person who were created in error?
HL7 Glossary
  • Definition of Animal: A subtype of Living Subject representing any animal-of-interest to the Personnel Management domain.
  • An Animal is not an animal. Rather (an) Animal represents an animal: it is an information item which represents a certain highly specific kind of animal-of-interest, namely an animal that is of interest to the Personnel Management domain.
Double Standards
  • The RIM is a confusion of two separate artifacts:
  • 1. an “information model”, relating to names of persons, records of observations, social security numbers, etc.
  • 2. a reference ontology, relating to persons, observations, documents, acts, etc.
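One way to picture the two artifacts being kept apart is to give them separate types. The sketch below is only an illustration under stated assumptions: the class names `Person`, `PersonRecord` and `RecordStatus` are hypothetical, and the three statuses are borrowed from the Entity states quoted earlier. It is not how the RIM is defined.

```python
from dataclasses import dataclass
from enum import Enum


class RecordStatus(Enum):
    """States that make sense for an information item, not for a human being."""
    ACTIVE = "active"
    INACTIVE = "inactive"
    NULLIFIED = "nullified"


@dataclass(frozen=True)
class Person:
    """Reference-ontology side: a human being in reality."""
    name: str


@dataclass
class PersonRecord:
    """Information-model side: a record that is about some person."""
    about: Person
    status: RecordStatus = RecordStatus.ACTIVE

    def nullify(self):
        """Mark the record as created in error; the person is untouched."""
        self.status = RecordStatus.NULLIFIED


john = Person("John Smith")
record = PersonRecord(about=john)
record.nullify()
print(record.status.value, "|", john.name)
```

With the types separated, ‘nullified’ applies to the record alone, and the question of what it would mean to nullify a person does not arise.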
The examples provided to illustrate the RIM’s classes
  • are almost always in conformity with the Reference Ontology Conception of the RIM
  • They involve the familiar kinds of things and processes in reality (medication, patients, devices, paper documents, surgery, diet, supply of bedding) with which healthcare messages are concerned.
HL7 Glossary:
  • Instances of Person include: John Smith, RN, Mary Jones, MD, etc.
  • not: information about John Smith ...
HL7’s backbone ‘Act’ class
  • Definition of Act:
  • A record of something that is being done, has been done, can be done, or is intended or requested to be done
  • An Act is the record of an Act
  • “There is no difference between an activity and its documentation”
Acts are records: but the examples of Act given by the RIM are as follows:
  • “The kinds of acts that are common in health care are (1) a clinical observation, (2) an assessment of health condition (such as problems and diagnoses), (3) healthcare goals, (4) treatment services (such as medication, surgery, physical and psychological therapy), ...
The class Procedure (a subclass of Act)
  • Definition of Procedure: An Act whose immediate and primary outcome (post-condition) is the alteration of the physical condition of the subject
  • Examples:
  • chiropractic treatment, acupuncture, straightening rivers, draining swamps.
What is an information model?
  • Is it a model of entities in reality (an ontology)?
  • Or of information about entities in reality (an ontology)?
  • The RIM is an incoherent mixture of the two
  • Does this matter?
What’s gone wrong? 
  • People of good will are making mistakes because of insufficient concern for clarity and consistency
  • Even large ontologies are built in the spirit of the amateur hobbyist
  • Money is wasted on megasystems that cannot be used
Lessons for Semantic Interoperability
  • Clear and easily accessible documentation – based on an intuitive ontology (understandable to all classes of users)
  • Business model should be such that those responsible for creating documentation do not have an incentive for it to be unclear
  • Centralized control of documentation, to ensure consistency (too much democracy is a bad thing)
Lessons for Standards for Semantic Interoperability
  • Create standards on the basis of thorough pilot testing
  • (Avoid systems like the RIM, which is imposed from the top down, on a wing and a prayer)
What should take the place of the RIM?
  • A Reference Ontology of the types of biomedical entity such as thing, process, person, disease, infection, molecule, procedure, etc.
  • A Reference Ontology of the types of biomedical information entity such as message, document, record, image, diagnosis, interpretation, etc.
  • 1. provides a high-level framework in terms of which the lower-level types captured in vocabularies like SNOMED CT could be coherently organized
  • 2. helps to specify how information can be combined into meaningful units and used for further processing.