
Basic Introduction to Ontology-based Language Technology (LT) (2nd year Ms in Social Medicine, UG, Belgium)

Basic Introduction to Ontology-based Language Technology (LT) (2nd year Ms in Social Medicine, UG, Belgium). Werner Ceusters European Centre for Ontological Research Universität des Saarlandes Saarbrücken, Germany. Lecture overview. Problem description: patient eligibility for clinical trial


Presentation Transcript


  1. Basic Introduction to Ontology-based Language Technology (LT) (2nd year Ms in Social Medicine, UG, Belgium) Werner Ceusters, European Centre for Ontological Research, Universität des Saarlandes, Saarbrücken, Germany

  2. Lecture overview • Problem description: patient eligibility for clinical trial • Meaning theories • Medical Language and terminologies • realist ontology for medical natural language understanding • Natural language understanding today

  3. The Medical Informatics Dogma Everything should be structured • Fact: computers can only deal with structured representations of reality: • structured data: • relational databases, spreadsheets • structured information: • XML simulates context • structured knowledge: • rule-based knowledge systems • Typical conclusion (Dogma?): • there is a need for structured data, hence … • … there is a need for structured data entry

  4. Structured data entry • Current technical solutions: • rigid data entry forms • coding and classification systems • But: • the description of biological variability requires the flexibility of natural language and it is generally desirable not to interfere with the traditional manner of medical recording (Wiederhold, 1980) • Initiatives to facilitate the entry of narrative data have focused on the control rather than the ease of data entry (Tanghe, 1997)

  5. Drawbacks of structured data entry • Loss of information • qualitatively • limited expressiveness and inherent defects of coding and classification systems, controlled vocabularies, and “traditional” medical terminologies • use of purpose oriented systems • don’t use data for another purpose than originally foreseen (J VDL) • quantitatively • too time-consuming to code all information manually • Speech recognition and forms for structured data entry are not best friends

  6. Areas for application of medical natural language understanding • Coding patient data • Structured information extraction from unstructured clinical notes • Clinical protocols and guidelines • Assessing patient eligibility for clinical trial entry • Triggering and alerts • Linking case descriptions to scientific literature • Easy access to content • ... towards a medical semantic web

  7. Clinical history description • Mr. Kovács is an 83-year-old man with a past medical history of hypertension, congestive heart failure, atrial fibrillation, hypercholesterolemia, and a history of CVA who presented himself to Budapest Emergency Room on April 25 with primary complaint of right-sided chest pain since April 24. The patient was in his usual state of health until April 24 when he experienced right-sided chest pain after 10 minutes of bicycling exercise at the YMCA. He described the chest pain as a dull ache in the right side of his chest radiating posteriorly to the right scapular area. He rated the intensity as 7 out of 10. The chest pain lasted about 3 minutes and resolved with rest. That same night, the patient once again experienced right-sided chest pain while lying in bed just before he went to sleep. He describes the pain as right-sided chest pain with same radiation to posterior at an intensity of 6-7 out of 10. The chest pain lasted about 10 minutes and resolved spontaneously.

  8. Inclusion criteria of the INVEST study • 1. Male or female • 2. Age 50 to no upper limit • 3. a) Hypertension documented as according to the 6th report of the Joint National Committee on Detection and Evaluation of the treatment of high BP (JNC VI) , b) and the need for drug therapy (previously documented hypertension in patients currently taking antihypertensive agents is acceptable) • 4. Documented CAD (e.g., classic angina pectoris; stable angina pectoris; Heberden angina pectoris), myocardial infarction three or more months ago, abnormal coronary angiography, or concordant abnormalities on two different types of stress tests • 5. Willingness to sign informed consent

  9. Do they match ? • Mr. Kovács is … an 83-year-old man with past medical history of hypertension, congestive heart failure, atrial fibrillation, hypercholesterolemia, history of CVA who presented to Budapest Emergency Room on April 25 with chief complaint of right-sided chest pain since April 24. The patient was in his usual state of health until April 24 when he experienced right-sided chest pain after 10 minutes of bicycling exercise at YMCA. He described the chest pain as a dull ache in the right side of his chest radiating posteriorly to the right scapular area. He rated the intensity as 7 out of 10. The chest pain lasted about 3 minutes and resolved with rest. That same night, the patient once again experienced right-sided chest pain while lying in bed right before he went to sleep. He describes the pain as right-sided chest pain with same radiation to posterior at an intensity of 6-7 out of 10. The chest pain lasted about 10 minutes and resolved spontaneously. • 1. Male or female • 2. Age 50 to no upper limit • 3. Hypertension documented according to the 6th report of the Joint National Committee on Detection and Evaluation of the treatment of high BP (JNC VI) and the need for drug therapy (previously documented hypertension in patients currently taking antihypertensive agents is acceptable) • 4. Documented CAD (e.g., classic angina pectoris (stable angina pectoris; Heberden angina pectoris), myocardial infarction three or more months ago, abnormal coronary angiography, or concordant abnormalities on two different types of stress tests) • 5. Willingness to sign informed consent 

  10. If the computer is to make this deduction ... • 1. Male or female • 2. Age 50 to no upper limit • 3. Hypertension documented according to the 6th report of the Joint National Committee on Detection and Evaluation of the treatment of high BP (JNC VI) and the need for drug therapy (previously documented hypertension in patients currently taking antihypertensive agents is acceptable) • 4. Documented CAD (e.g., classic angina pectoris (stable angina pectoris; Heberden angina pectoris), myocardial infarction three or more months ago, abnormal coronary angiography, or concordant abnormalities on two different types of stress tests) • 5. Willingness to sign informed consent • Mr. Kovács is …an 83-year-old man with past medical history of hypertension, congestive heart failure, atrial fibrillation, hypercholesterolemia, history of CVA who presented to Budapest Emergency Room on April 25 with chief complaint of right-sided chest pain since April 24. The patient was in his usual state of health until April 24 when he experienced right-sided chest pain after 10 minutes of bicycling exercise at YMCA. He described the chest pain as a dull ache in the right side of his chest radiating posteriorly to the right scapular area. He rated the intensity as 7 out of 10. The chest pain lasted about 3 minutes and resolved with rest. That same night, the patient once again experienced right-sided chest pain while lying in bed right before he went to sleep. He describes the pain as right-sided chest pain with same radiation to posterior at an intensity of 6-7 out of 10. The chest pain lasted about 10 minutes and resolved spontaneously. ... it must be able to understand !
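Once the relevant facts have been extracted from the narrative, the comparison against the inclusion criteria is the easy part. The following is a minimal sketch of that last step only, assuming a hypothetical `PatientFacts` record filled in by hand; the field names, the three-valued logic (True / False / None for "not stated") and the simplification of criteria 3 and 4 to single flags are illustrative assumptions, not part of the INVEST protocol.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PatientFacts:
    """Hypothetical record of facts extracted from a clinical narrative.

    None means "the narrative does not say", which is different from False.
    """
    sex: Optional[str] = None
    age: Optional[int] = None
    hypertension_documented: Optional[bool] = None  # per JNC VI (criterion 3a)
    needs_drug_therapy: Optional[bool] = None       # criterion 3b
    cad_documented: Optional[bool] = None           # criterion 4
    consent: Optional[bool] = None                  # criterion 5


def both(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued AND: False wins, then unknown, then True."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def check_criteria(p: PatientFacts) -> Dict[str, Optional[bool]]:
    """Evaluate each inclusion criterion as True, False or None (unknown)."""
    return {
        "1 male or female": None if p.sex is None else p.sex in ("male", "female"),
        "2 age 50 or older": None if p.age is None else p.age >= 50,
        "3 hypertension needing drug therapy": both(
            p.hypertension_documented, p.needs_drug_therapy),
        "4 documented CAD": p.cad_documented,
        "5 informed consent": p.consent,
    }


def eligible(results: Dict[str, Optional[bool]]) -> Optional[bool]:
    """False if any criterion fails, None if any is unknown, else True."""
    values = list(results.values())
    if any(v is False for v in values):
        return False
    if any(v is None for v in values):
        return None
    return True


# Only sex and age are stated unambiguously in the narrative; everything
# else is left as None here rather than guessed.
kovacs = PatientFacts(sex="male", age=83)
results = check_criteria(kovacs)
print(results)
print("eligible:", eligible(results))
```

With only the hand-entered facts above, the overall answer stays undetermined: deciding criteria 3 and 4 requires interpreting phrases such as "past medical history of hypertension" and the description of the chest pain, which is exactly the understanding step the slide refers to.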

  11. What is understanding ? • To understand something is to know what its significance is. • What 'knowing significance' amounts to may be very different in different contexts: thus understanding a piece of music requires different things of us than understanding a sentence in a language we are learning, for instance. It would be useful, then, for theorists to look at the different kinds of understanding that there are, and examine them in detail and without prejudice, rather than looking for the essence of understanding. (Tim Crane, philosopher of mind) • The significance of a single sentence, too, can vary from context to context.

  12. The etymology of “understanding” • “understanding”: cf. Latin “substare” • literally: “to stand under” • Webster’s Dictionary (1961): understanding = the power to render experience intelligible by bringing perceived particulars under appropriate concepts. • “particulars” = what is NOT SAID of a subject (Aristotle) • substances: this patient, that tumor, ... • qualities: the red of that patient’s skin, his body temperature, blood pressure, ... • processes: that incision made by that surgeon, the rise of that patient’s temperature,... • “concepts”: may be taken in the above definition as Aristotle’s “universals” = what is SAID OF a subject • Substantial concepts: patient, tumor, ... • Quality concepts: white, temperature • ...

  13. What is natural language understanding? • NLU is the construction of meaning from “written” language; the degree of understanding depends on a multifaceted meaning-making process that draws on knowledge about language and knowledge about the world (cf. “reading comprehension” by humans). • But then: what is “meaning”?

  14. Dyadic models of “meaning” • Saussure (language philosopher): • signifiant / signifié (sign/concept) • Ron Stamper (information scientist): • thing-A STANDS-FOR thing-B • Major drawback: • excludes the “referent” from the model, i.e. that which the sign/symbol/word/... denotes

  15. Current “state of the art” on meaning in healthcare informatics • A pervasive bias towards “concepts” • Content wise: • Work based on ISO/TC37 that advocates the Ogden-Richards theory of meaning • Corresponds with a linguistic reading of “concept” • Architecture wise: • In Europe: work based on CEN/TC251 WG1 & WG2 that follow ISO/TC37 • In the US: HL7, inspired by Speech Act Theory • “Concepts” used as elements of information models, hence mixing a linguistic and engineering reading.

  16. Triadic models of meaning: The Semiotic/Semantic triangle Reference: Concept / Sense / Model / View / Partition Sign: Language/ Term/ Symbol Referent: Reality/ Object

  17. Aristotle’s triadic meaning model Words spoken are signs or symbols (symbola) of affections or impressions (pathemata) of the soul (psyche); written words (graphomena) are the signs of words spoken (phoné). As writing (grammata), so also is speech not the same for all races of men. But the mental affections themselves, of which these words are primarily signs (semeia), are the same for the whole of mankind, as are also the objects (pragmata) of which those affections are representations or likenesses, images, copies (homoiomata). Aristotle, 'On Interpretation', 1.16.a.4-9, Translated by Cooke & Tredennick, Loeb Classical Library, William Heinemann, London, UK, 1938. [Diagram labels: pathema; semeia (gramma / phoné); pragma]

  18. Richards’ semantic triangle • Reference (“concept”): “indicates the realm of memory where recollections of past experiences and contexts occur”. • Hence: as with Aristotle, the reference is “mind-related”: thought. • But: not “the same for all”, rather individual mind-related [Diagram labels: my understanding, your understanding; reference, symbol, referent]

  19. Don’t confuse with homonymy! [Diagram labels: R1, R2, R3; mole (skin lesion), mole (unit), mole (animal); symbol “mole”]

  20. One concept; different thoughts; homonymy [Diagram labels: understanding of x, understanding of y; referent, symbol; R1, R2, R3; mole “skin lesion”, mole “unit”, mole “animal”; “mole”]

  21. And by the way, synonymy... • the Aristotelian view • Richards’ view [Diagram labels, once per view: “sweat”, “perspiration”]

  22. Frege’s view • “sense” is an objective feature of how words are used and not a thought or concept in somebody’s head • 2 names with the same reference can have different senses • 2 names with the same sense have the same reference (synonyms) • a name with a sense does not need to have a reference (“Beethoven’s 10th symphony”) [Diagram labels: sense, name, reference (= referent)]

  23. Tetrahedric extensions • CEN/TC251 ENV 12264 • FRISCO model (information science) [Diagram labels: conception, concept, actor, definition, representation, referent, term, referent]

  24. Requirements for NLU • Knowledge about terms and how they are used in valid constructions within natural language; • Knowledge about the world, i.e. how the referents denoted by the terms interrelate in reality and in given types of context; • An algorithm that: • is able to calculate a language user’s representation of that part of the world described in the utterances that are the subject of the analysis. • can track the ways in which people express what does NOT represent anything in reality (e.g. for medico-legal reasons)

  25. The medical language

  26. Some figures about the estimated size of “clinical language” • number of unique medical expressions: 10^7 • In one domain (AIDS): 150,000 candidate term phrases of 1 to 5 words found • 100-200 subdomains in medicine • estimated 2-word expressions: 4 × 10^6 • assumes 20,000 meaningful single words • assumes 10% combination rate (Evans & Patel ‘91)

  27. Some figures about the estimated size of “clinical language” • 0.5 × 10^6 entries in Oxford Dictionary of English • 0.3 × 10^6 word occurrences in Snomed 3.1 • 0.15 × 10^6 meanings in Meta-1.3 • 0.10 × 10^6 entries in Dorland’s Medical Dictionary • 0.05 × 10^6 entries in Webster’s Collegiate Dict. • 0.01 × 10^6 words in average human recognition voc. • 0.005 × 10^6 words in “basic English” (Tuttle & Nelson ‘94)

  28. Specificities of the medical sublanguage • Extensive use of acronyms • reasons • consequence of sublanguage shaping and use by a relatively closed community • efficient and economical in use • forms • simple: NIDDM: non insulin-dependent diabetes mellitus • compound: GABAuria: GABA in the urine • Combined use of numerals and letters • for: types, stage, severity, position, measures • examples: IgG, IQ 50-70, type A1, ...

  29. Specificities of the medical sublanguage • compounding and complex nouns • extensive use of affixing • embedded affixes: -pathy, -osis, … • linked affixes: -related, -induced, -linked, … • also outside the medical domain: pseudo-, -like, … • foreign language importation • words/expressions in Latin: kyphosis dorsalis juvenilis • Latin/Greek based words with English lexicalisation: • headache, cephalgia, cephalgic • tooth, dens, dentis, dental, dente

  30. Specificities of the medical sublanguage • abundance of synonyms (and pseudo-synonyms) • abundance of proper nouns • toponyms: Thogoto virus, Rio Bravo Fever • eponyms: Laennec’s cirrhosis, de Quervain’s disease • use of ellipsis • Otto’s fever • parachute mitral valve • abundance of uncountable nouns • substances: paracetamol, antibiotic • mass nouns: acne, prurigo, air, materia alba • process describing nouns: calcification, amelogenesis • state describing nouns: hypoglycemia, anemia

  31. Specificities of the medical sublanguage • large noun phrase structures • congenital absence of auricle with stenosis of auditory canal • acute necrotising cutaneous leishmaniasis • explicit use of prepositions • density of information • multiple (pseudo-)synonymous entries (e.g. ICD): • 487.1: Influenza, NOS • 487.1: Flu • 487.1: Grippe • CAVE! Same category does not imply same semantics

  32. The sublanguage of the “clinical narrative”: syntactic incompleteness. • Deleted verb and object / subject: • stiff neck and fever • Deleted tense and verb “be”: • brain scan negative • Deleted subject, tense, and verb be: • positive for heart disease and diabetes • Deleted subject: • was seen by local doctor (Sager 1982)

  33. Taming medical language ... • Classification systems • Clinical vocabularies • Coding systems • Nomenclatures • Thesauri • ...

  34. About nomenclatures and other strange animals (1) • nomenclature: system of terms which is elaborated according to pre-established naming rules. In principle, there is a one-to-one relationship with the concepts of the subject field. • terminology: set of terms representing the concept system of a particular subject field • vocabulary: list of terms in a specific subject field, with their definitions • terminological system: system that includes at least one concept set and one or more terminologies and / or coding schemes • thesaurus: set of terms formally organised so that relationships between concepts (for example as 'broader' and 'narrower') are made explicit.

  35. About nomenclatures and other strange animals (2) • coding scheme: collection of rules to represent items of one set with the elements of another set • coding system: terminological system consisting in a combination of a concept system, a terminology, a set of code values, and a coding scheme to relate the codes to the concepts and/ or the terms. • classification: terminological system whose concept system is connected by generic relations

  36. Coding systems and nomenclatures in healthcare • Main purpose: to stabilise the terminology • Mechanism: assign a code to every single term • Uses: • EDI • data storage and archiving • NLP • Disadvantages: • no internal structure • difficulties in finding specific terms • does not account for synonyms

  37. Characteristics of an ideal medical knowledge system? • a unique code for each term (word, phrase) ? • each code-term being defined • each term independent, not defined as the result of other terms in the system ? • synonyms recognisable through the codes • to each code could be attached the codes of related terms ? • the system would encompass all of medicine • the system would be in the public domain • the format of the KB should be functionally described, independent from hard- or software (C. Bishop, 1989)

  38. Main problems associated with Bishop’s view • A unique code for each term: unaware of the difference between terms and concepts • each term independent: he probably ran into problems with compositionality due to misperception of the real issues • attachment of related codes: this approach misses a formal ground

  39. Requirements for clinical vocabularies (1) • Domain completeness: coverage of all possible terms that lie within a vocabulary’s domain • Non-vagueness: the term should represent the concept behind it as closely as possible • Non-ambiguity: the same term cannot refer to more than one concept • Non-redundancy: each concept must be represented by one unique identifier (Cimino, 1989)

  40. Requirements for clinical vocabularies (2) • Synonymy: multiple ways for expressing a word (or concept) must be allowed • Multiple classification: concepts must be allowed to be classified in multiple hierarchies • Consistency of view: concepts must have the same relationships in all views • Explicit relationships: all relationships (e.g. class, synonymy,…) must be explicitly labelled.
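Two of these requirements, non-ambiguity and synonymy, can be checked mechanically once a vocabulary is stored as (concept identifier, term) pairs. The sketch below illustrates this on a toy table; the identifiers C1 to C4 and the table itself are made up for illustration, reusing the “mole” and “sweat” / “perspiration” examples from the earlier slides. Non-redundancy is not covered, since detecting two identifiers for one concept requires formal definitions, not just term lists.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Toy vocabulary: (concept identifier, term). Identifiers are invented.
ENTRIES: List[Tuple[str, str]] = [
    ("C1", "mole"),          # skin lesion
    ("C2", "mole"),          # unit
    ("C3", "mole"),          # animal
    ("C4", "sweat"),
    ("C4", "perspiration"),
]


def ambiguous_terms(entries: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Terms that refer to more than one concept (violates non-ambiguity)."""
    concepts_by_term: Dict[str, Set[str]] = defaultdict(set)
    for concept, term in entries:
        concepts_by_term[term].add(concept)
    return {t: c for t, c in concepts_by_term.items() if len(c) > 1}


def synonym_sets(entries: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Concepts expressed by more than one term (synonymy, which is allowed)."""
    terms_by_concept: Dict[str, Set[str]] = defaultdict(set)
    for concept, term in entries:
        terms_by_concept[concept].add(term)
    return {c: sorted(t) for c, t in terms_by_concept.items() if len(t) > 1}


print("ambiguous:", ambiguous_terms(ENTRIES))
print("synonyms:", synonym_sets(ENTRIES))
```

The asymmetry is the point: several terms per concept is a feature the vocabulary must support, while several concepts per term is a defect it must avoid.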

  41. MeSH: Medical Subject Headings • Designed for bibliographic indexing, e.g. Index Medicus • Basis for MEDLINE • focuses on biomedicine and other basic healthcare sciences • clinically very impoverished • Consistency amongst indexers: • 60% for headings • 30% for sub-headings

  42. MeSH Tree Structures - 2004 • Anatomy [A] • Organisms [B] • Diseases [C] • Chemicals and Drugs [D] • Analytical, Diagnostic and Therapeutic Techniques and Equipment [E] • Psychiatry and Psychology [F] • Biological Sciences [G] • Physical Sciences [H] • Anthropology, Education, Sociology and Social Phenomena [I] • Technology and Food and Beverages [J] • Humanities [K] • Information Science [L] • Persons [M] • Health Care [N] • Geographic Locations [Z]

  43. MeSH Tree Structures - 2004 • Cardiovascular Diseases [C14] • Heart Diseases [C14.280] • Arrhythmia [C14.280.067] + • Carcinoid Heart Disease [C14.280.129] • Cardiomegaly [C14.280.195] + • Endocarditis [C14.280.282] + • Heart Aneurysm [C14.280.358] • Heart Arrest [C14.280.383] + • Heart Defects, Congenital [C14.280.400] • Aortic Coarctation [C14.280.400.090] • Arrhythmogenic Right Ventricular Dysplasia [C14.280.400.145] • Cor Triatriatum [C14.280.400.200] • Coronary Vessel Anomalies [C14.280.400.210] • Crisscross Heart [C14.280.400.220] • Dextrocardia [C14.280.400.280] +

  44. MeSH Tree Structures - 2004 • Body Regions [A01] • Extremities [A01.378] • Lower Extremity [A01.378.610] • Buttocks [A01.378.610.100] • Foot [A01.378.610.250] • Ankle [A01.378.610.250.149] • Forefoot, Human [A01.378.610.250.300] + • Heel [A01.378.610.250.510] • Hip [A01.378.610.400] • Knee [A01.378.610.450] • Leg [A01.378.610.500] • Thigh [A01.378.610.750]

  45. MeSH Tree Structures - 2004 • Body Regions [A01] • Abdomen [A01.047] + • Back [A01.176] + • Breast [A01.236] + • Extremities [A01.378] • Amputation Stumps [A01.378.100] • Lower Extremity [A01.378.610] + • Upper Extremity [A01.378.800] + • Head [A01.456] + • Neck [A01.598] • Pelvis [A01.673] + • Perineum [A01.719] • Thorax [A01.911] + • Viscera [A01.960]
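The tree numbers shown on these slides encode the hierarchy directly: each dot-separated segment adds one level, so the broader headings of an entry can be read off its number. A minimal sketch, using only tree numbers quoted on the slides (the helper names `ancestors` and `is_descendant` are illustrative and not part of any MeSH tooling):

```python
from typing import Dict, List

# Tree numbers copied from the 2004 MeSH slides above.
MESH: Dict[str, str] = {
    "A01": "Body Regions",
    "A01.378": "Extremities",
    "A01.378.610": "Lower Extremity",
    "A01.378.610.250": "Foot",
    "A01.378.610.250.149": "Ankle",
    "A01.378.610.400": "Hip",
}


def ancestors(tree_number: str) -> List[str]:
    """Return the tree numbers of all broader headings, from top to bottom."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def is_descendant(child: str, parent: str) -> bool:
    """True if `child` lies strictly below `parent` in the tree."""
    return child.startswith(parent + ".")


for number in ancestors("A01.378.610.250.149"):
    print(number, MESH[number])
print(is_descendant("A01.378.610.400", "A01.378.610"))      # Hip under Lower Extremity
print(is_descendant("A01.378.610.400", "A01.378.610.250"))  # Hip under Foot
```

Note that the hierarchy mixes different kinds of links: on the slide, Ankle is placed under Foot and Hip under Lower Extremity, which are part-of rather than is-a relations, and the tree number itself does not say which is meant.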

  46. SNOMED International (1995) • Multi-axial coding system: • morphology, disease, function, procedure, ... • Each axis has a hierarchical structure • Translations into languages other than English only for older versions • Informal internal structuring • Being translated into CG formalism, but with only internal consistency • Possibility to generate meaningless concepts • Mixing of hierarchies: • Bone • Long Bone • Periosteum • Shaft

  47. Snomed International: Number of records (V3.1) • T Topography 12,385 • M Morphology 4,991 • F Function 16,352 • L Living Organisms 24,265 • C Drugs & Biological Products 14,075 • A Physical Agents, Forces and Activities 1,355 • D Disease/ Diagnosis 28,623 • P Procedures 27,033 • S Social Context 433 • J Occupations 1,886 • G General Modifiers 1,176 • TOTAL RECORDS 132,641

  48. Snomed International: knowledge in the codes. [Diagram: the code T-35322 annotated with the labels posterior, anatomic leaflet, mitral, cardiac valve, cardiovascular] • CAVE! This scheme is not consistently used throughout the system.

  49. Snomed International: multiple ways to express the same thing • D5-46210 Acute appendicitis, NOS • D5-46100 Appendicitis, NOS + G-A231 Acute • M-41000 Acute inflammation, NOS + G-C006 In + T-59200 Appendix, NOS • G-A231 Acute + M-40000 Inflammation, NOS + G-C006 In + T-59200 Appendix, NOS
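Recognising that these code combinations say the same thing requires definitions that relate the pre-coordinated codes to their parts. The sketch below shows one way this could work: a small, hand-written expansion table (an assumption for illustration, inferred from the equivalences on this slide and not shipped with SNOMED International) is applied recursively until only elementary codes remain, and the resulting sets are compared. The four expressions are read from the slide as (1) D5-46210 alone, (2) D5-46100 with G-A231, (3) M-41000 with G-C006 and T-59200, and (4) G-A231 with M-40000, G-C006 and T-59200.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical definitions: each pre-coordinated code is expanded into parts.
EXPANSIONS: Dict[str, List[str]] = {
    "D5-46210": ["D5-46100", "G-A231"],           # acute appendicitis
    "D5-46100": ["M-40000", "G-C006", "T-59200"],  # appendicitis
    "M-41000": ["M-40000", "G-A231"],              # acute inflammation
}


def normalise(codes: Iterable[str]) -> FrozenSet[str]:
    """Expand every code recursively until only elementary codes remain."""
    result = set()
    for code in codes:
        if code in EXPANSIONS:
            result |= normalise(EXPANSIONS[code])
        else:
            result.add(code)
    return frozenset(result)


EXPRESSIONS: List[List[str]] = [
    ["D5-46210"],
    ["D5-46100", "G-A231"],
    ["M-41000", "G-C006", "T-59200"],
    ["G-A231", "M-40000", "G-C006", "T-59200"],
]

normal_forms = {normalise(e) for e in EXPRESSIONS}
print(len(normal_forms), sorted(next(iter(normal_forms))))
```

A flat set of codes ignores how the parts relate to each other (what is acute, what is located where), so this only works for tiny examples; a real solution needs the formal, explicitly labelled relationships called for in the requirements for clinical vocabularies.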

  50. The International Classification of diseases (WHO). • ... • Chapter II: Neoplasms (C00-D48) • Chapter III: Diseases of the Blood and Blood-forming organs and certain disorders involving the immune mechanism (D50-D89) • Excludes : auto-immune disease (systemic) NOS (M35.9) • .... • Nutritional Anemias (D50-D53) • D50 Iron deficiency anaemia • Includes: ... • D50.0 Iron deficiency anaemia secondary to blood loss (chronic) • Excludes : ... • D50.1 ... • D51 Vit B12 deficiency anaemia • Haemolytic Anemias (D55-D59) • ... • Chapter IV: ...
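Because the chapters and blocks shown here are defined as code ranges, placing a code in the classification reduces to a range lookup on its three-character category. A minimal sketch, restricted to the ranges quoted on the slide (the function names are illustrative; plain string comparison is used, which assumes categories of the form one letter plus two digits):

```python
from typing import List, Optional, Tuple

Range = Tuple[str, str, str]  # (first category, last category, label)

# Ranges copied from the slide; everything elided there is omitted here too.
CHAPTERS: List[Range] = [
    ("C00", "D48", "Chapter II: Neoplasms"),
    ("D50", "D89", "Chapter III: Diseases of the Blood and Blood-forming organs "
                   "and certain disorders involving the immune mechanism"),
]
BLOCKS: List[Range] = [
    ("D50", "D53", "Nutritional Anemias"),
    ("D55", "D59", "Haemolytic Anemias"),
]


def category(code: str) -> str:
    """Three-character category of a code, e.g. 'D50.0' -> 'D50'."""
    return code.split(".")[0]


def find_range(code: str, ranges: List[Range]) -> Optional[str]:
    """Label of the first range containing the code's category, else None."""
    cat = category(code)
    for first, last, label in ranges:
        if first <= cat <= last:
            return label
    return None


print(find_range("D50.0", CHAPTERS))
print(find_range("D50.0", BLOCKS))
print(find_range("D54", BLOCKS))
```

The lookup captures only the range structure; the "Includes" and "Excludes" notes on the slide, such as the exclusion of systemic auto-immune disease NOS (M35.9) from Chapter III, carry meaning that the codes alone do not.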
