Medical FactNet

Barry Smith

University at Buffalo and IFOMIS, Leipzig

Christiane Fellbaum

Princeton University and Berlin Academy





A New Methodology for the Construction and Validation of Information Resources for Consumer Health


MWN: SPECIFIC AIMS

  • to extend and validate WordNet 2.0’s medical coverage in light of recent advances in medical terminology research

  • focusing initially on the English-language single word expressions used and understood by non-experts

  • provision of a mapping to UMLS, MeSH, and other expert terminologies

  • use as interlingua for MWNs in other languages


WordNet (Miller, Fellbaum)

  • Large lexical database; ubiquitous tool of NLP

  • coverage comparable to collegiate dictionary, over 130,000 word forms

  • 40 wordnets in different languages

  • WordNet: rich medical coverage, but poorly validated and poor formal architecture

  • How to create a validated Medical WordNet (MWN)?


Building blocks of WordNet = ‘synsets’ = ‘concepts’ in medical terminology

  • terms in same synset = they are interchangeable in some sentential contexts without altering truth-value:

  • {car, automobile}, {shut, close}

  • synsets linked via small number of binary relations:

  • is-a

  • part-of

  • verb entailments: (walk-limp, forget-know).
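
The synset structure described above can be sketched in a few lines of Python. This is only a toy illustration, not WordNet's actual storage format: the `Synset` class, the `link` method and the `hypernyms` helper are hypothetical names, and the `vehicle` synset is added here purely as an example target for an is-a link.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of word forms that are interchangeable in some contexts."""
    words: frozenset
    # relation name ("is-a", "part-of", "entails") -> list of target synsets
    relations: dict = field(default_factory=dict)

    def link(self, relation, target):
        """Record a binary relation from this synset to another one."""
        self.relations.setdefault(relation, []).append(target)


def hypernyms(synset):
    """Follow is-a links upward and collect every ancestor synset."""
    found = []
    for parent in synset.relations.get("is-a", []):
        found.append(parent)
        found.extend(hypernyms(parent))
    return found


# Hypothetical example data built from the word pairs on the slide.
vehicle = Synset(frozenset({"vehicle"}))
car = Synset(frozenset({"car", "automobile"}))
car.link("is-a", vehicle)

walk = Synset(frozenset({"walk"}))
limp = Synset(frozenset({"limp"}))
limp.link("entails", walk)  # limping entails walking
```

Keeping the relations in a plain dictionary mirrors the point made on the slide: a small, fixed set of binary relations is enough to connect the synsets.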


Strengths of WordNet 2.0

  • Open source

  • Very broad coverage

  • Is-a / part-of architecture

  • Tool for automatic sense disambiguation


13 senses for ‘feel’ as a verb

  • experience – She felt resentful

  • find – I feel that he doesn't like me

  • feel – She felt small and insignificant

  • feel – We felt the effects of inflation

  • feel – The sheets feel soft

  • grope – He felt for his wallet

  • finger – Feel this soft cloth!

  • explore – He felt his way around the dark room

  • feel – It feels nice to be home again

  • feel – He felt the girl in the movie theater


Medical senses of ‘feel’

  • palpate – examine a body part by palpation:

  • The nurse palpated the patient's stomach; The runner felt her pulse.

  • sense – perceive by a physical sensation, e.g. coming from the skin or muscles:

  • He felt his flesh crawl; She felt the heat when she got out of the car; He feels pain when he puts pressure on his knee.

  • feel – seem with respect to a given sensation:

  • My cold is gone – I feel fine today; She felt tired after the long hike.


MWN

  • many word units are monosemic (clinician, stethoscope)

  • most common words are polysemic

  • lexicon of the order of 4000 word units

  • with some 3,000 distinct word senses.

  • tested by incorporation in NLP applications used for purposes of information retrieval, machine translation, question-answer systems, text summarization


How to validate Medical WordNet? How to fix the scope of ‘non-expert’?


Answer: Medical FactNet (MFN)

  • a large corpus of natural-language sentences providing medically validated contexts for MWN terms.

  • pilot corpus: 40,000 sentences

  • full MFN (for common diseases): ~250,000 sentences

  • accredited as intelligible by non-experts

  • and as true by experts


Medical BeliefNet (MBN)

  • = totality of sentences about medical phenomena to which non-experts assent

  • comes for free, given our methodology for creating MFN


Sources for MFN

  • WordNet glosses and arcs

  • Online health information services targeted to consumers

  • NetDoctor, MEDLINEplus

  • (factsheets on common diseases)


Constructing MBN and MFN

  • sources (WordNet, MEDLINEplus …)

  • filtering for intelligibility by non-experts

  • pool of natural language sentences

  • filtering for non-expert assent → Medical BeliefNet

  • filtering for validation by experts → Medical FactNet


MFN: SPECIFIC AIMS

  • To create a pilot open-source corpus of sentences about medical phenomena in the English language

  • restricted to natural language

  • grammatically complete

  • logically and syntactically simple sentences

  • rated as understandable by non-expert human subjects in controlled questionnaire-based experiments


MFN: SPECIFIC AIMS

  • = sentences must be self-contained

  • make no reference to any prior context

  • not contain any proper names, indexical expressions or other linguistic devices that need to be interpreted with respect to other sentences.


Constructing MFN

  • Sentences in MFN must receive high marks for correctness on being assessed by medical experts.

  • MFN designed to constitute a representative fraction of the true beliefs about medical phenomena which are intelligible to non-expert English-speakers.


Constructing MBN

  • Sentences in MBN must receive high marks for assent on being assessed by non-experts.

  • MBN designed to constitute a representative fraction of the beliefs about medical phenomena (both true and false beliefs) distributed through the population of English speakers.


Compiling MFN and MBN in tandem

  • will allow systematic assessment of the disparity between lay beliefs and vocabulary as concerns medical phenomena and the exactly corresponding expert medical knowledge.

  • will allow us to establish automatically, for any given sub-population, in which areas its beliefs about medical phenomena differ most significantly from validated medical knowledge
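
One way such a comparison might be automated is sketched below. It is a minimal illustration under stated assumptions, not the project's method: the area labels, the `disparity_by_area` function and the example sentences are hypothetical, and the sketch assumes that every sentence in the sub-population's beliefs has already been submitted to expert validation, so that absence from the validated set means it failed validation.

```python
from collections import defaultdict


def disparity_by_area(beliefs, validated):
    """Rank topic areas by the share of assented sentences that failed validation.

    beliefs:   iterable of (area, sentence) pairs a sub-population assents to
    validated: set of sentences accepted by the expert raters
    Returns a list of (share, area) tuples, largest share first.
    """
    totals = defaultdict(int)
    misses = defaultdict(int)
    for area, sentence in beliefs:
        totals[area] += 1
        if sentence not in validated:
            misses[area] += 1
    return sorted(((misses[a] / totals[a], a) for a in totals), reverse=True)


# Example sentences invented for illustration; they have not been medically reviewed.
beliefs = [
    ("sleep", "People can only snore when they are asleep."),
    ("sleep", "Snoring is more common in obese people."),
    ("allergy", "Hay fever is caused by pollen."),
]
validated = {
    "Snoring is more common in obese people.",
    "Hay fever is caused by pollen.",
}
ranking = disparity_by_area(beliefs, validated)
```

With this toy data the "sleep" area ranks first, because half of the sentences assented to there were not validated.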


USES OF MFN

  • for quality assurance of MWN

  • to support the population of MWN by yielding new families of words and word senses

  • medical education

  • consumer health information

  • (in conjunction with MBN) allow new sorts of experiments in the linguistics, psychology and anthropology of consumer health


Evaluation of MFN

  • measure the benefits it brings when incorporated into an existing on-line consumer health portal based on term-search technology.

  • test whether exploiting the resources of MFN can lead to improved results in the retrieval of expert information


Differences between expert and non-expert medical language

  • mismatch between expert and non-expert language

  • taxonomies reflecting popular lexicalizations have small coverage relative to technical vocabularies

  • and shallow hierarchies:

  • no popular terms linking infectious disease and mumps


Differences between expert and non-expert medical language

  • popular medical terms (flu) often fuzzier than technical terms

  • extension of non-expert term used also by experts sometimes smaller, sometimes larger

  • hypothesis: with few exceptions the focal meanings coincide in their extensions


Mismatches in Doctor-Patient Communication

  • Practical skills of physician in acquiring and conveying relevant and reliable information by using non-expert language tailored to individual patient

  • The physician, too, is a human being, thus ex officio a member of the wider community of non-experts

  • continues to use non-expert language for everyday purposes


But there are problems


  • Question: My seven-year-old son developed a rash today … a friend of mine had her 10-day-old baby at my home last evening before we were aware of the illness. … I have read that chickenpox is contagious up to two days prior to the actual rash. Is there cause for concern at this point?

  • Answer: Chickenpox is the common name for varicella infection. ...

  • You are correct in that a person with chickenpox can be contagious for 48 hours before the first vesicle is seen. ...

  • Of concern, though, is the fact that newborns are at higher risk of complications of varicella, including pneumonia. ...

  • There is a very effective means to prevent infection after exposure. A form of antibody to varicella called varicella-zoster immune globulin (VZIG) can be given up to 48 hours after exposure and still prevent disease. ...

  • (from Slaughter)


Lexical mismatches

  • rooted in legal concerns?

  • both primary care physician and online information system must respond primarily with generic, or case- or context-independent, information

  • most requests relate to specific and episodic phenomena (occurrences of pain, fever, reactions to drugs, etc.).

  • Hence focus of MFN on generic sentences = context-independent statements about causality, about types of persons or diseases or about typical or possible courses of a disease.


MFN

  • designed to map the generic medical information which non-experts are able to understand


Corpus- and fact-based approaches to information retrieval

  • meanings of highly polysemous terms cannot be discriminated without consideration of their contexts.

  • People do this without apparent difficulties

  • New NLP methodologies to harness computers to manipulate large text corpora

  • Train automatic systems on large numbers of semantically annotated sentences, exploit standard pattern-recognition and statistical techniques for purposes of disambiguation.


Use of WordNet in medical informatics

  • e.g. as tool for simplifying information extraction from the corpus of MEDLINE abstracts:

  • by replacing verbs with corresponding synsets and so reducing the number of relations that need to be taken account of in the analysis of texts
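
The idea of collapsing verbs into synsets can be shown with a small sketch. The verb-to-synset table below is invented for illustration and does not reproduce WordNet's actual groupings; a real system would look the verbs up in WordNet itself. The function name `normalize_relations` and the example triples are likewise hypothetical.

```python
# Hypothetical verb-to-synset table (illustrative grouping only).
VERB_SYNSET = {
    "cure": "cure.v",
    "heal": "cure.v",
    "alleviate": "alleviate.v",
    "ease": "alleviate.v",
    "palliate": "alleviate.v",
}


def normalize_relations(triples):
    """Replace each verb by its synset label so synonymous verbs collapse.

    triples: iterable of (subject, verb, object) tuples extracted from text.
    Verbs missing from the table are kept unchanged.
    """
    return {(subj, VERB_SYNSET.get(verb, verb), obj) for subj, verb, obj in triples}


extracted = [
    ("drug A", "cure", "disease B"),
    ("drug A", "heal", "disease B"),
    ("drug C", "ease", "pain"),
]
normalized = normalize_relations(extracted)
```

Three extracted statements using three different verbs reduce to two relations, which is the kind of reduction the slide describes.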


Example: FrameNet

  • 500 Frames, each with a plurality of Frame Elements

  • Medical Frames:

  • Addiction, Birth, Biological Urge, Body Mark, Cure, Death, Health Response, Medical Conditions, Medical Instruments, Medical Professional, Medical Specialties and Observable Body Parts.


Frame: Cure

  • Frame Elements:

  • alleviate.v, alleviation.n, curable.a, curative.a, curative.n, cure.n, cure.v, ease.v, heal.v, healer.n, incurable.a, palliate.v, palliation.n, palliative.a, palliative.n, rehabilitate.v, rehabilitation.n, rehabilitative.a, remedy.n, resuscitate.v, therapeutic.a, therapist.n, therapy.n, treat.v, treatment.n.


Example: Penn Proposition Bank

  • designed as a corpus of coherent texts. The intention is to train an automatic system to ‘learn’ the contexts for words and their context-specific meanings.

  • corpus characterized by a specific logical (function-argument-based) architecture.


Both FrameNet and Proposition Bank

  • have poor medical coverage

  • Both focus on word usage in general, rather than on domain-specific contexts.

  • Neither concerned with the questions of factuality or validation of statements


Example: CYC knowledge base

  • collection of hundreds of thousands of statements mostly about the external world:

  • The earth is round

  • Mountains are one kind of landform

  • Albany is the capital of New York

  • parcelled into micro-theories


In contrast to CYC,

  • MFN focuses on one single (albeit very large) domain

  • MFN stores English sentences (CYC is language non-specific);

  • MFN discriminates folk beliefs and expert knowledge (designed to be consistent with the body of established science);

  • MFN will be publicly available.


Existing Princeton WordNet 2.0

  • labels 504 word-forms ‘medicine’:

  • infection#1 {(the pathological state resulting from the invasion of the body by pathogenic microorganisms)}

  • infection#3 {(the invasion of the body by pathogenic microorganisms and their multiplication which can lead to tissue damage and disease)}

  • infection#4 {infection, contagion, transmission – (an incident in which an infectious disease is transmitted)}


Maturation

  • maturation#2 {growth, growing, maturation, development, ontogeny, ontogenesis – ((biology) the process of an individual organism growing organically; a purely biological unfolding of events involved in an organism changing gradually from a simple to a more complex level; he proposed an indicator of osseous development in children)}

  • maturation#3 {festering, suppuration, maturation – (the formation of morbific matter in an abscess or a vesicle and the discharge of pus)}


But it mixes up expert and non-expert vocabulary,

  • both current and medieval:

  • suppuration#2 {pus, purulence, suppuration, ichor, sanies, festering – (a fluid product of inflammation)}


And it contains medically relevant errors:

  • snore-sleep linked via verb entailment: “if someone snores, then he necessarily also sleeps.”

  • In medicine: quite possible to snore while awake, since snoring is the respiration-induced vibration of glottal tissues, associated most usually with sleep but also with relaxation or obesity.

  • Methodology for constructing MFN will provide us with a systematic means to detect such errors.


snore-sleep

  • Constructing MBN will give us the resources to do justice to the reason why such cases were included in the first place:

  • ‘People can only snore when they are asleep’ and similar sentences belong precisely to the folk beliefs about medicine which MBN will document


Extracting sentences from online consumer health information sources

  • In one experiment sentences were derived by researchers in medical informatics from factsheets on Airborne allergens in NIAID’s Health Information Publications and on Hay fever and perennial allergic rhinitis in the UK NetDoctor’s Diseases Encyclopedia.


Output sentences

  • use simple syntax and draw on natural-language terms used in original sources

  • Sentences containing anaphora, instructions, warnings, … are replaced by complete statements constructed via simple syntactic modifications – or ignored.


Output sentences

  • 1644 sentences produced (= 20 person hours of effort)

  • 500 sentences were subjected to a preliminary evaluation by pairs of medical students (on a score of 1-5 …)

  • 58% received a score of 5 from both raters (2 x 5)

  • but: measures for inter-rater agreement too low for these results to be statistically significant.


Validation methods

  • sources

  • A: filtering for intelligibility by non-experts

  • pool

  • B: filtering for non-expert assent → Medical BeliefNet

  • C: filtering for validation by experts → Medical FactNet


Validation methods

  • sources

  • filtering for intelligibility by non-experts

  • pool

  • filtering for non-expert assent

  • filtering for validation by experts

This will provide an empirical delineation of the scope of ‘natural language’ (non-expert language)

  • Natural language = language (typical) non-experts (think they) can understand

  • Does ‘depilation’ belong to natural language? ‘suppuration’? ‘auto-immune’? ‘tomograph’? ‘hypertension’? ‘radiologist’?


Method

  • 400 x 250 statements will be rated for understandability by two participants, making for a total of 200,000 ratings in response to the question:

  • on a scale from 1-5, would you describe this sentence as hard to understand or easy to understand?

  • Raters will be encouraged not to reflect on successive statements

  • Only those statements which receive a score of at least 4 from each of 2 subjects will pass on to the pool
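
The workload arithmetic and the pass rule on this slide can be checked with a short sketch. The function names, the dictionary layout and the example statements with their scores are assumptions made for illustration; only the numbers 400, 250, 2 and the threshold of 4 come from the slide.

```python
def passes_threshold(scores, minimum=4, raters_required=2):
    """True when the required number of scores is present and each reaches the minimum."""
    return len(scores) == raters_required and all(s >= minimum for s in scores)


def build_pool(ratings):
    """Keep the statements whose understandability scores (1-5) all pass.

    ratings: dict mapping a statement to the list of scores it received.
    """
    return [stmt for stmt, scores in ratings.items() if passes_threshold(scores)]


# Workload stated on the slide: 400 x 250 statements, two ratings each.
statements = 400 * 250
total_ratings = statements * 2

# Invented example statements and scores, for illustration only.
example_ratings = {
    "Hay fever is an allergy to pollen.": [5, 4],
    "Allergic rhinitis is IgE-mediated.": [2, 4],
    "Pollen is released by plants.": [4, 4],
}
pool = build_pool(example_ratings)
```

Requiring every rater to reach the threshold, rather than averaging, means that a single low score is enough to keep a statement out of the pool.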


Validation methods

  • sources

  • filtering for intelligibility by non-experts

  • pool

  • filtering for non-expert assent

  • filtering for validation by experts


Method

  • Collections of 200 statements from the pool will be rated for assent by each of 250 participants.

  • on a scale from 1-5, would you describe this sentence with the words do not agree at all … agree completely?

  • Raters will be encouraged to reflect upon their answers if necessary

  • Statements receiving a score of at least 4 from each of two raters will be stored as components of Medical BeliefNet (MBN).


Validation methods

  • sources

  • filtering for intelligibility by non-experts

  • pool

  • filtering for non-expert assent

  • filtering for validation by experts


Method

  • Raters, selected from medical faculty and advanced medical students, will be subject to a pre-evaluation as follows.

  • A set of 40 sentences in the pool will be validated as true or false by the relevant specialists

  • Only those candidate participants with very high scores in matching these validations will be selected to serve as raters in the validations of sentences for MFN.
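
The pre-evaluation step could be scored roughly as follows. This is a sketch under assumptions: the slide says only that candidates need "very high scores", so the 0.95 cut-off is a placeholder chosen here, and the function names and candidate data are hypothetical.

```python
def agreement(candidate_answers, reference_answers):
    """Fraction of sentences on which the candidate matches the specialists."""
    matches = sum(c == r for c, r in zip(candidate_answers, reference_answers))
    return matches / len(reference_answers)


def select_raters(candidates, reference, threshold=0.95):
    """Return the candidates whose agreement reaches the threshold.

    candidates: dict mapping a candidate name to a list of true/false answers
    reference:  the specialists' true/false validations of the same sentences
    threshold:  assumed cut-off; the slide does not state a number
    """
    return [
        name
        for name, answers in candidates.items()
        if agreement(answers, reference) >= threshold
    ]


# 40 reference validations, as on the slide; the values themselves are invented.
reference = [True, False] * 20
candidates = {
    "candidate_a": list(reference),                               # 40 of 40 match
    "candidate_b": [not r for r in reference[:5]] + reference[5:],  # 35 of 40 match
    "candidate_c": [not reference[0]] + reference[1:],            # 39 of 40 match
}
selected = select_raters(candidates, reference)
```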


Method

  • Rating for MFN will involve no time constraints

  • raters will be encouraged to use reference works

  • On a scale from 1-5, how strongly do you believe this statement?

  • Only sentences receiving scores of 5 from each of two raters will be added to the MFN database.

  • Thus in relation to those sentences which receive a score of less than 5, raters will be encouraged to propose alternative statements, which will be used as new input to the non-expert phase for assessment.
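
The acceptance rule and the feedback loop described above can be sketched as a single triage step. The function name, the data layout and the placeholder statements are hypothetical; the rule itself (a score of 5 from each of two raters, with proposed alternatives sent back to the non-expert phase) is taken from the slide.

```python
def triage_expert_ratings(ratings, alternatives):
    """Split expert-rated statements into MFN entries and recycled rewrites.

    ratings:      dict mapping a statement to its two expert scores (1-5)
    alternatives: dict mapping a statement to rewrites proposed by the raters
    Returns (accepted, recycled): statements added to MFN, and alternative
    statements to feed back into the non-expert phase.
    """
    accepted, recycled = [], []
    for statement, scores in ratings.items():
        if len(scores) == 2 and all(score == 5 for score in scores):
            accepted.append(statement)
        else:
            # Rejected statements contribute only their proposed rewrites, if any.
            recycled.extend(alternatives.get(statement, []))
    return accepted, recycled


# Placeholder statements and scores, for illustration only.
ratings = {"s1": (5, 5), "s2": (5, 4), "s3": (3, 5)}
alternatives = {"s2": ["s2 revised"]}
accepted, recycled = triage_expert_ratings(ratings, alternatives)
```

A statement that fails and has no proposed alternative simply drops out, which matches the slide's wording that raters are encouraged, not required, to propose rewrites.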


Training of expert raters for MFN

  • will include e.g. guidance as to the treatment of statements which relate only to what holds for the most part or in most cases.

  • people with a cold sometimes sneeze

  • could mean either: not all people with a cold sneeze, contradicting the fact that sneezing is a mandatory symptom for a cold,

  • or all people with a cold sneeze, but not all the time, which would be rated as correct.


Evaluation of MWN and MFN

  • users of a consumer health information portal will be randomly assigned to one of four groups: 1. access to the unsupplemented portal; 2. access also to MWN; 3. access also to MFN; 4. access also to both MWN and MFN

  • then apply the Saracevic-Kantor method for evaluating user satisfaction with internet query services
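
A minimal sketch of the random assignment follows. The slide says only that users are randomly assigned to one of four groups; shuffling and then dealing users out in turn, so that group sizes stay balanced, is a design choice made here, and the group labels, function name and seed are hypothetical.

```python
import random

GROUPS = (
    "portal only",
    "portal + MWN",
    "portal + MFN",
    "portal + MWN + MFN",
)


def assign_groups(user_ids, seed=None):
    """Shuffle the users, then deal them out to the four groups in turn.

    Returns a dict mapping each user id to a group label. Passing a seed
    makes the assignment reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    return {user: GROUPS[i % len(GROUPS)] for i, user in enumerate(shuffled)}


assignment = assign_groups(range(20), seed=1)
```

Independent per-user draws would also satisfy the slide's description, at the cost of uneven group sizes in small samples.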


Future work

  • application of MBN/MFN methodology to evaluate the reliability of the medical knowledge of different non-expert communities

  • by preserving data pertaining to the sources of entries in MBN it will be possible to keep track of specific kinds of false beliefs as originating in specific kinds of informants. This may prove a valuable source of information in targeting specific groups for specific types of remedial medical education.


Future work

  • experiments in the tradition of E. Rosch to investigate how the domain of medical phenomena is conceptualized by non-expert human subjects

  • Basic level words: tomato, cabbage, bean vs. vegetable (too general) / cherry tomato (too specific)

  • what is the basic level of lexical specification in the domain of medical phenomena?

  • what are the basic kinds in the ontology of medicine of natural-language-using subjects?


Different roles of MFN and MBN

  • MFN associated with constructing practical tools designed to assist users in coming to believe what is true

  • MBN associated with research regarding what people believe about medical phenomena.


Towards a comprehensive assay of consumer health knowledge

  • Ultimate goal: to document in an ontologically coherent fashion the entirety of the medical knowledge that is capable of being understood by average adult consumers of healthcare services in the United States today.


Just as English WordNet

  • serves as an interlingual index between wordnets in different languages,

  • so MWN and MFN can function as an inter-ontology index between different expert factnets prepared for different parts of technical biomedical knowledge

  • NLM goal of expert medical factnet


ARistOTLE

  • Aggregative Realist Ontology of Total Language


The End

