Battling Scylla and Charybdis: The Search for Redundancy and Ambiguity in the 2001 UMLS Metathesaurus

James J. Cimino

Department of Medical Informatics

Columbia University

2001 Metathesaurus
  • 99 sources (92 in 2000)
  • 1,734,707 strings (1,598,176 in 2000)
  • 797,360 concepts (730,155 in 2000)
Lumping vs. Splitting

Cold (temperature)

COLD (temperature)

Cold (infection)

COLD (COPD)

Redundancy!

Lumping vs. Splitting

Cold (temperature)

COLD (temperature)

Cold (infection)

COLD (COPD)

Ambiguity!

Three Auditing Methods
  • Ambiguity through multiple semantic types
  • Redundancy through semantic string matching
  • Inconsistency in parent-child semantic types
Previous Results: 1995*

Possible ambiguity 1,817

Possible redundancy 5,031

Actually redundant 3,274

Parent-Child problems 544

* Cimino JJ. Auditing the Unified Medical Language System with semantic methods. Journal of the American Medical Informatics Association. 1998;5:41-51.

Tools and Rules
  • Simple Metathesaurus data model
  • Normalized word index
  • “Mutually exclusive semantic types”
  • “Mutual concept subsumption”
Simple Metathesaurus Data Model

C0024117: Chronic Obstructive Airway Disease

L0486186:

S0837575: “Chronic Obstructive Airway Disease”

L0486186:

S0837576: “Chronic Obstructive Lung Disease”

L0009264:

S0829315: “COLD <3>”

S0474508: “COLD”

Semantic type: T04: Disease or Syndrome

Simple Metathesaurus Data Model

C0024117: Chronic Obstructive Airway Disease

S0837575: “Chronic Obstructive Airway Disease”

S0837576: “Chronic Obstructive Lung Disease”

S0829315: “COLD <3>”

S0474508: “COLD”

Semantic type: T04: Disease or Syndrome

Simple Metathesaurus Data Model

C0024117: Chronic Obstructive Airway Disease

“Chronic Obstructive Airway Disease”

“Chronic Obstructive Lung Disease”

“COLD <3>”

“COLD”

Semantic type: T04: Disease or Syndrome

Simple Metathesaurus Data Model

C0035242: Respiratory Tract Diseases

Semantic type: T04: Disease or Syndrome

Parent-Child

(is-a)

C0024117: Chronic Obstructive Airway Disease

Chronic Obstructive Airway Disease

Chronic Obstructive Lung Disease

COLD <3>

COLD

Semantic type: T04: Disease or Syndrome
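
The simplified model on these slides (a concept with its strings, one or more semantic types, and is-a links to parent concepts) can be sketched as a small record type. This is only an illustration: the class name `Concept` and its field names are assumptions made here, not part of the UMLS distribution or of the author's tools.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical in-memory record for one Metathesaurus concept."""
    cui: str                     # concept identifier, e.g. "C0024117"
    name: str                    # preferred name
    strings: list[str]           # all strings attached to the concept
    semantic_types: set[str]     # one or more semantic type names
    parents: set[str] = field(default_factory=set)  # CUIs of is-a parents


respiratory = Concept(
    cui="C0035242",
    name="Respiratory Tract Diseases",
    strings=["Respiratory Tract Diseases"],
    semantic_types={"Disease or Syndrome"},
)

copd = Concept(
    cui="C0024117",
    name="Chronic Obstructive Airway Disease",
    strings=[
        "Chronic Obstructive Airway Disease",
        "Chronic Obstructive Lung Disease",
        "COLD <3>",
        "COLD",
    ],
    semantic_types={"Disease or Syndrome"},
    parents={respiratory.cui},
)
```

The lexical (L...) and string (S...) identifiers from the first version of the slide are deliberately left out, mirroring the way the slides themselves drop them.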
UMLS Semantic Types

Physical Object
  Organism
    Animal
      Invertebrate
    Plant
      Alga
  Substance
    Food

Mutually Inclusive Semantic Types

Physical Object

Organism

Substance

Animal

Plant

Invertebrate

Food

Alga

Mutually Exclusive Semantic Types

Physical Object

Organism

Substance

Animal

Plant

Food

Invertebrate

Alga

Rules for Multiple Semantic Types

3. Concepts can have two Substance types, except:

a) Element, Ion, or Isotope and Chemical Viewed Structurally

b) Inorganic Chemical and Organic Chemical

5. Concepts can have two Conceptual Entity types, except:

a) Molecular Sequence and Geographic Area

b) Molecular Sequence and Body Location or Region

c) Geographic Area and Body Location or Region

7. Concepts can have two Event types, except:

a) Diagnostic Procedure and Laboratory Procedure

8. Concepts can have two types that are ancestors/descendants of each other

Detection of Ambiguity by Mutually Exclusive Semantic Types

If a concept has multiple semantic types

And if any pair of the types are mutually exclusive

Then the concept may have multiple meanings (ambiguity)

Or the semantic type assignment is incorrect
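
The rule above can be sketched in a few lines of Python. The sketch assumes the exclusivity rules have already been compiled into a set of unordered type pairs; the table `EXCLUSIVE_PAIRS` and the function name are hypothetical, and the three pairs listed are taken only from flagged concepts shown elsewhere in this talk, not from a complete rule set.

```python
from itertools import combinations

# Hypothetical fragment of an exclusivity table. A real table would be
# derived from the full "Rules for Multiple Semantic Types".
EXCLUSIVE_PAIRS = {
    frozenset({"Alga", "Invertebrate"}),
    frozenset({"Plant", "Disease or Syndrome"}),
    frozenset({"Invertebrate", "Disease or Syndrome"}),
}


def exclusive_type_pairs(semantic_types, exclusive_pairs=EXCLUSIVE_PAIRS):
    """Return every pair of a concept's types that is listed as exclusive."""
    return [
        (a, b)
        for a, b in combinations(sorted(semantic_types), 2)
        if frozenset({a, b}) in exclusive_pairs
    ]


concepts = {
    "C0015155": {"Alga", "Invertebrate"},          # Euglena gracilis
    "C0035510": {"Plant", "Disease or Syndrome"},  # Toxicodendron
    "C0024117": {"Disease or Syndrome"},           # single type, never flagged
}

# Concepts with at least one exclusive pair are candidates for manual review.
flagged = {
    cui: pairs
    for cui, types in concepts.items()
    if (pairs := exclusive_type_pairs(types))
}
```

A flagged concept is only a candidate: as the rule says, the cause may be a genuinely ambiguous concept or simply an incorrect type assignment.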

Ambiguity Examples

C0015155: Euglena gracilis

Alga and Invertebrate

C0223537: Fourth lumbar vertebra

Body Part, Organ, or Organ Component and Disease or Syndrome

C0035510: Toxicodendron

Plant and Disease or Syndrome

C0242789: Crown-Rump Length

Organism Attribute and Diagnostic Procedure

C0007608: Cell Movement

Cell Function and Biomedical Occupation or Discipline

C0030756: Lice Infestations

Invertebrate and Disease or Syndrome

C0008715: Chronically Ill

Disease or Syndrome and Patient or Disabled Group

Normalized Word Index
  • UMLS Normalized Word Index
    • e.g., “lungs” → “lung”
    • 293,004 words
  • Keyword synonyms
    • e.g., “lung” → “pulmonary”
    • 9,650 mappings
  • Translated strings
  • Built word index
Word Normalization

C0035242: Respiratory Tract Diseases

Semantic type: T04: Disease or Syndrome

Parent-Child

(is-a)

C0024117: Chronic Obstructive Airway Disease

Chronic Obstructive Airway Disease

Chronic Obstructive Lung Disease

COLD <3>

COLD

Semantic type: T04: Disease or Syndrome
Word Normalization

C0035242: Respiratory Tract Diseases

Semantic type: T04: Disease or Syndrome

Parent-Child

(is-a)

C0024117: Chronic Obstructive Airway Disease

chronic obstructive airway disease

chronic obstructive lung disease

cold 3

cold

Semantic type: T04: Disease or Syndrome

Word Normalization

C0035242: Respiratory Tract Diseases

Semantic type: T04: Disease or Syndrome

Parent-Child

(is-a)

C0024117: Chronic Obstructive Airway Disease

chronic obstructive airway disease

chronic obstructive pulmonary disease

cold 3

cold

Semantic type: T04: Disease or Syndrome

Word Normalization

C0035242: Respiratory Tract Diseases

Semantic type: T04: Disease or Syndrome

Parent-Child

(is-a)

C0024117: Chronic Obstructive Airway Disease

chronic obstructive airway disorder

chronic obstructive pulmonary disorder

cold 3

cold

Semantic type: T04: Disease or Syndrome

Word Normalization

C0035242: Respiratory Tract Diseases

Semantic type: T04: Disease or Syndrome

Parent-Child

(is-a)

C0024117: Chronic Obstructive Airway Disease

chronic obstructive airway disorder

chronic obstructive pulmonary disorder

cold three

cold

Semantic type: T04: Disease or Syndrome

Word Index

C0035242: Respiratory Tract Diseases

Semantic type: T04: Disease or Syndrome

Parent-Child

(is-a)

C0024117: Chronic Obstructive Airway Disease

Strings:

chronic obstructive airway disorder

chronic obstructive pulmonary disorder

cold three

cold

Word index: airway, chronic, cold, disorder, obstructive, pulmonary, three

Semantic type: T04: Disease or Syndrome
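
The normalization steps walked through on the preceding slides (lowercasing, dropping punctuation, mapping words to a canonical synonym, then collecting the distinct words) can be sketched as follows. The mapping table `WORD_MAP` is a toy assumption containing only the substitutions visible on the slides; the actual work used the UMLS normalized word index and 9,650 keyword synonym mappings, which are not reproduced here.

```python
import re

# Toy mapping built from the slide examples only (an assumption, not the
# real UMLS normalized word index or synonym table).
WORD_MAP = {
    "lungs": "lung",
    "lung": "pulmonary",
    "disease": "disorder",
    "3": "three",
}


def normalize(string, word_map=WORD_MAP):
    """Lowercase, strip punctuation, and map each word to its canonical form."""
    words = re.findall(r"[a-z0-9]+", string.lower())
    result = []
    for word in words:
        seen = set()
        # Follow chains such as lungs -> lung -> pulmonary, guarding against cycles.
        while word in word_map and word not in seen:
            seen.add(word)
            word = word_map[word]
        result.append(word)
    return " ".join(result)


def word_index(strings, word_map=WORD_MAP):
    """Set of distinct normalized words across all strings of one concept."""
    return {w for s in strings for w in normalize(s, word_map).split()}


copd_strings = [
    "Chronic Obstructive Airway Disease",
    "Chronic Obstructive Lung Disease",
    "COLD <3>",
    "COLD",
]
copd_words = word_index(copd_strings)
```

With this toy table, `copd_words` contains the same seven words listed on the Word Index slide.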

Mutual String Subsumption

1) If Concept A has String A1

And all words in A1 are in Concept B’s word list

Then B subsumes A1

2) If B subsumes any string in A

And A subsumes any string in B

Then A and B are mutually subsumptive
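
A minimal sketch of the two rules above, assuming each concept is represented by its list of already-normalized strings. The function names are illustrative only.

```python
def words_of(strings):
    """Word list of a concept: every distinct word in any of its strings."""
    return {word for s in strings for word in s.split()}


def subsumes(word_list, string):
    """Rule 1: a concept subsumes a string when all of its words are in the word list."""
    return set(string.split()) <= word_list


def mutually_subsumptive(strings_a, strings_b):
    """Rule 2: each concept subsumes at least one string of the other."""
    words_a, words_b = words_of(strings_a), words_of(strings_b)
    b_subsumes_a = any(subsumes(words_b, s) for s in strings_a)
    a_subsumes_b = any(subsumes(words_a, s) for s in strings_b)
    return b_subsumes_a and a_subsumes_b


# Normalized strings for the three "cold" concepts used in this talk.
common_cold = ["common cold", "cold two", "cold"]               # C0009443
cold_temperature = ["cold temperature", "cold one", "cold"]     # C0009264
copd = [                                                        # C0024117
    "chronic obstructive airway disorder",
    "chronic obstructive pulmonary disorder",
    "cold three",
    "cold",
]
```

Because all three concepts carry the bare string "cold", every pair among them is mutually subsumptive. String matching alone therefore over-generates, which is why the redundancy test also looks at semantic types.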

Mutual String Subsumption

C0009443: Common Cold
  Strings: common cold; cold two; cold
  Words: cold, common, two
  Semantic type: T04: Disease or Syndrome

C0009264: cold temperature
  Strings: cold temperature; cold one; cold
  Words: cold, one, temperature
  Semantic type: T070: Natural Phenomenon or Process

C0024117: Chronic Obstructive Airway Disease
  Strings: chronic obstructive airway disorder; chronic obstructive pulmonary disorder; cold three; cold
  Words: airway, chronic, cold, disorder, obstructive, pulmonary, three
  Semantic type: T04: Disease or Syndrome
Detection of Redundancy by String Subsumption

If A and B are mutually subsumptive

And semantic types of A and B are mutually inclusive

Then A and B may be redundant
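
The redundancy test combines string subsumption with a type check. The sketch below is self-contained and makes two assumptions that the slides do not spell out: two type sets count as "mutually inclusive" when every cross pair is either identical or listed in a caller-supplied table (`inclusive_pairs`, hypothetical), and strings are already normalized. The two "toxicodendron" records are invented for contrast and are not real concepts.

```python
def words_of(strings):
    return {word for s in strings for word in s.split()}


def mutually_subsumptive(strings_a, strings_b):
    """Each concept's word list covers at least one string of the other."""
    words_a, words_b = words_of(strings_a), words_of(strings_b)
    return (any(set(s.split()) <= words_b for s in strings_a)
            and any(set(s.split()) <= words_a for s in strings_b))


def types_inclusive(types_a, types_b, inclusive_pairs=frozenset()):
    """Assumed reading: every cross pair is the same type or a listed pair."""
    return all(x == y or frozenset({x, y}) in inclusive_pairs
               for x in types_a for y in types_b)


def possibly_redundant(a, b, inclusive_pairs=frozenset()):
    return (mutually_subsumptive(a["strings"], b["strings"])
            and types_inclusive(a["types"], b["types"], inclusive_pairs))


# Assumed normalized forms of "NPS-R-467" and "NPS R-467".
nps_1 = {"strings": ["nps r 467"], "types": {"Organic Chemical"}}  # C0673603
nps_2 = {"strings": ["nps r 467"], "types": {"Organic Chemical"}}  # C0673604

# Invented records: same string, types that are not listed as inclusive.
plant = {"strings": ["toxicodendron"], "types": {"Plant"}}
rash = {"strings": ["toxicodendron"], "types": {"Disease or Syndrome"}}

candidates = [
    (x, y)
    for x, y in [("nps_1", "nps_2"), ("plant", "rash")]
    if possibly_redundant(globals()[x], globals()[y])
]
```

Only the NPS pair survives both tests here. A pair that passes is still just a candidate for manual review, as the "may be redundant" wording of the rule indicates.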

Detection of Redundancy by String Subsumption

C0009443: Common Cold
  Strings: common cold; cold two; cold
  Words: cold, common, two
  Semantic type: T04: Disease or Syndrome

C0009264: cold temperature
  Strings: cold temperature; cold one; cold
  Words: cold, one, temperature
  Semantic type: T070: Natural Phenomenon or Process

C0024117: Chronic Obstructive Airway Disease
  Strings: chronic obstructive airway disorder; chronic obstructive pulmonary disorder; cold three; cold
  Words: airway, chronic, cold, disorder, obstructive, pulmonary, three
  Semantic type: T04: Disease or Syndrome

Redundancy Examples

C0673603: NPS-R-467 (Organic Chemical)

C0673604: NPS R-467 (Organic Chemical)

C0673769: des-Arg(10)-(Leu(9))kallidin (Amino Acid, Peptide or Protein)

C0673771: kallidin, des-Arg(10)-(Leu(9))-) (Amino Acid, Peptide or Protein)

C0266133: Congenital diverticulum of esophagus (Congenital Abnormality)

C0555218: Congenital esophageal pouch (Congenital Abnormality)

Redundancy False Positives
  • Partial names as synonyms:

C0687720: Central Diabetes Insipidus

has “Diabetes Insipidus” as synonym

so it is mutually subsumptive with

C0011848: Diabetes Insipidus

  • Incorrect synonymy (MeSH translations)

C0013005: Dolphins

has synonyms “ORCA” (Span.) and

“FALSA BALEIA ASSASSINA” (Port.)

so it is mutually subsumptive with

C0325138: Whale, False Killer

which has synonym "FALSA ORCA" (Span.)

Detecting Semantic Type Problems through Parent-Child Relations

If Concept A is Parent of Concept B

And Concept A has semantic type X

And Concept B has semantic type Y

And if X and Y are different

And X is not an ancestor of Y (in Semantic Net)

Then one (or both) semantic types are wrong

Or the parent-child relation is wrong
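
A minimal sketch of this check, assuming the Semantic Net is available as a child-to-parent map. The map `TYPE_PARENT` is a small hand-written fragment for illustration, and the three outcome labels are this sketch's own wording. The slide rule flags any case where X differs from Y and X is not an ancestor of Y; the sketch additionally separates the case where the child's type is an ancestor of the parent's type, since that usually means the child's type is merely too general.

```python
# Hand-written fragment of the semantic type hierarchy (child -> parent),
# assumed here for illustration only.
TYPE_PARENT = {
    "Fish": "Vertebrate",
    "Vertebrate": "Animal",
    "Animal": "Organism",
    "Organism": "Physical Object",
    "Manufactured Object": "Physical Object",
}


def is_ancestor(x, y, type_parent=TYPE_PARENT):
    """True when type x lies strictly above type y in the hierarchy."""
    current = type_parent.get(y)
    while current is not None:
        if current == x:
            return True
        current = type_parent.get(current)
    return False


def check_link(parent_type, child_type):
    """Classify one parent-child concept link by the two concepts' types."""
    if parent_type == child_type or is_ancestor(parent_type, child_type):
        return "OK"
    if is_ancestor(child_type, parent_type):
        return "nonspecific child type"
    return "wrong type or wrong relation"


# Children of a concept typed Vertebrate (the Cartilaginous Fish example).
children = {
    "Skate": "Manufactured Object",
    "Shark": "Vertebrate",
    "Stingray": "Animal",
    "Dogfish": "Fish",
}
report = {name: check_link("Vertebrate", t) for name, t in children.items()}
```

In `report`, Shark and Dogfish pass, Stingray is flagged as nonspecific, and Skate is flagged as a wrong type or wrong relation.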

Detecting Semantic Type Problems through Parent-Child Relations

Parent: Cartilaginous Fish (vertebrate)

Parent-Child Relations

Skate (manufactured object): Wrong Type or Wrong Concept

Shark (vertebrate): OK

Stingray (animal): Nonspecific Semantic Type

Dogfish (fish): OK

Parent-Child Examples

C0013769: Elbow

has type Body Location or Region

which is in the Conceptual Entity hierarchy

Is parent of:

C0230353: Right elbow

has type Body Part, Organ, or Organ Component which is in the Physical Object hierarchy

Results: 1995 VS. 2001

                         1995      2001
Possible ambiguity       1,817     8,082
Possible redundancy      5,031     38,140
Actually redundant       3,274     not done
Parent-Child problems    544       2,868

Number of concepts       222,927   797,359 (3.6x)
Parent-Child relations   100,586   607,043 (6.0x)

Results: 1995 VS. 2001

                         1995              2001
Possible ambiguity       1,817 (0.82%)     8,082 (1.01%)
Possible redundancy      5,031 (2.26%)     38,140 (4.78%)
Actually redundant       3,274 (1.47%)     not done
Parent-Child problems    544 (0.54%)       2,868 (0.47%)

Number of concepts       222,927           797,359 (3.6x)
Parent-Child relations   100,586           607,043 (6.0x)
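
The percentages in this table can be reproduced from the raw counts. The short check below uses only figures from the slides; it also makes the denominators explicit: ambiguity and redundancy rates are relative to the number of concepts, while the parent-child rate is relative to the number of parent-child relations.

```python
def pct(count, total):
    """Percentage rounded to two decimals, as shown on the slide."""
    return round(100 * count / total, 2)


concepts = {1995: 222_927, 2001: 797_359}
relations = {1995: 100_586, 2001: 607_043}

rates = {
    "ambiguity": (pct(1_817, concepts[1995]), pct(8_082, concepts[2001])),
    "redundancy": (pct(5_031, concepts[1995]), pct(38_140, concepts[2001])),
    "actually redundant (1995 only)": (pct(3_274, concepts[1995]), None),
    "parent-child": (pct(544, relations[1995]), pct(2_868, relations[2001])),
}

# Growth factors between the two releases.
concept_growth = round(concepts[2001] / concepts[1995], 1)
relation_growth = round(relations[2001] / relations[1995], 1)
```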

Discussion: Ambiguity Detection
  • Small number (1.01%) is a good sign
  • Allows focusing manual review
  • Semantic type definitions need to be clarified
  • Semantic type assignment rules need to be clarified
Discussion: Redundancy Detection
  • Specificity is worse, without improved sensitivity
  • Normalized string index is part of the reason
  • “Incomplete” names are a bigger part of the reason
  • Manual review will be relatively inefficient
  • Incorrect mappings detected, especially foreign language
Discussion: Parent-Child Relations
  • Mostly detects errors in semantic type assignment
  • Strict hierarchy in Semantic Net causes problems
Conclusions
  • Specific “answers” not possible
    • Domain expertise needed for assessment of chemical names
    • Assessments are necessarily subjective
    • NLM gets to make the rules
    • NLM hasn’t finished making the rules
  • Methods provide focus for manual review
  • Methods highlight where clearer definitions are needed
  • The results show the UMLS is doing well at a difficult task
Acknowledgments
  • NLM: Bill Hole, Alexa McCray and Betsy Humphreys
  • Home: Rachel and Rebecca Cimino