VUB Leerstoel 2009-2010 Theme: Ontology for Ontologies, theory and applications Ontologies and Natural Language Understanding May 20, 2010; 17h00-19h00 Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels Room D2.01 . Prof. Werner CEUSTERS, MD
VUB Leerstoel 2009-2010 Theme: Ontology for Ontologies, theory and applications Ontologies and Natural Language Understanding May 20, 2010; 17h00-19h00 Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels Room D2.01 Prof. Werner CEUSTERS, MD Ontology Research Group, Center of Excellence in Bioinformatics and Life Sciences and Department of Psychiatry, University at Buffalo, NY, USA
Context of this lecture series • Knowledge Representation, Informatics, Linguistics, Ontology, Philosophy, Computational Linguistics, Realism-Based Ontology, Medical Natural Language Understanding, Electronic Health Records, Translational Research, Referent Tracking, Medicine, Pharmacogenomics, Performing Arts, Defense & Intelligence, Biology, Pharmacology
Today’s topic • May 20: ontologies and Natural Language Understanding • Informatics, Linguistics, Computational Linguistics, Realism-Based Ontology, Medical Natural Language Understanding, Electronic Health Records, Medicine
Amazing technology • A human being with function-enhancing electronic implants • A ‘doctor’ who is in fact some sort of computer program capable of making medical diagnoses • A tiny scanner capable of detecting bodily anomalies • Flawless communication between a human and a computer
Or not amazing? … towards a bionic eye http://bionicvision.org.au/
Or not ? … mobile diagnostics SilhouetteMobile™ scans and stores information about a wound's width and depth, which helps nurses track healing over time as new tissue fills in the injury GlucoPack™ reads and transmits glucose readings
Or not ? … Transhumanism • "Philosophies of life that seek the continuation and acceleration of the evolution of intelligent life beyond its currently human form and human limitations by means of science and technology, guided by life-promoting principles and values." Max More
to … mind uploading ? Ray Kurzweil receives National Medal of Technology (1999).
But for today: How to communicate with computers naturally ? The supercomputer HAL from 2001: A Space Odyssey.
Michael Scott’s solution http://aboulet.files.wordpress.com/2007/05/traveling-salesmen1.jpg
My interest in NLU: the medical informatics dogma • Fact: computers can only deal with a structured representation of reality: • structured data: relational databases, spreadsheets • structured information: XML simulates context • structured knowledge: rule-based knowledge systems • Conclusion: a need for structured data entry
Structured data entry • Current technical solutions: • rigid data entry forms • coding and classification systems • But: • the description of biological variability requires the flexibility of natural language and it is generally desirable not to interfere with the traditional manner of medical recording (Wiederhold, 1980) • initiatives to facilitate the entry of narrative data have focused on the control rather than the ease of data entry (Tanghe, 1997)
Drawbacks of structured data entry • Loss of information • qualitatively: • limited expressiveness of coding and classification systems, controlled vocabularies, and “traditional” medical terminologies • use of purpose-oriented systems • don’t use data for another purpose than originally foreseen (J VDL) • quantitatively: • too time-consuming to code all information manually • Speech recognition and structured data entry forms are not best friends
The pillars of healthcare informatics • Clinical language • medical narrative • Clinical terminologies • coding and classification systems • nomenclatures • formal ontologies • Electronic Healthcare Record Systems
The possibilities • A text-based EHCRS able to generate structured data • An EHCR exclusively built around a collection of coded data generated out of free text • A multimedia EHCRS with clinical narrative registration and structured data generation • A multimedia EHCRS with structured data entry and text generation • An EHCR exclusively built around texts generated out of controlled vocabularies • An EHCR exclusively built around a collection of structured data able to generate text
Main issues of MNLU • Medical natural language understanding is: • Making computers understand medical language • Allowing computers to turn unstructured texts into structured information • Medical NLU is NOT: • medical reasoning performed by computers • reducing the richness of clinical language to a closed set of codes
Typical examples of MNLU • contextual spell checking • information retrieval • topic selection • relevance ranking • coding and classification • software agents for clinical studies • unstructured data registration for structured reporting
Areas for application of MNLU • Coding patient data • Structured information extraction from unstructured clinical notes • Clinical protocols and guidelines • Assessing patient eligibility for clinical trial entry • Triggering and alerts • Linking case descriptions to scientific literature • Easy access to content • ... towards a medical semantic web
A wealth of communication related applications (1) • Speech as input: • voice recognition: • who is the sender? • speech recognition: • dictation: what is the corresponding text? • irrespective of meaning • command and control • language learning (pronunciation checking) • question answering • spoken natural language understanding
A wealth of communication related applications (2) • Text as input: • speech generation (text-to-speech) • spell checking • grammar checking • plagiarism detection • indexing – semantic indexing – topic detection • document retrieval • return documents that tell me when Bonaparte was born • information retrieval • find in documents the date Bonaparte was born and return only the date • clinical coding
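The difference between document retrieval and information retrieval in the Bonaparte example can be sketched in a few lines. This is only an illustration under stated assumptions: the three sample documents, the keyword match and the date pattern are all made up for the sketch and do not come from the lecture.

```python
import re

# Hypothetical mini-collection, invented for this sketch.
DOCS = [
    "Napoleon Bonaparte was born on 15 August 1769 in Ajaccio.",
    "Bonaparte crowned himself emperor in 1804.",
    "The bionic eye is still under development.",
]

# Assumed surface pattern for a birth date; real text needs far more than this.
BIRTH_DATE = re.compile(r"born on (\d{1,2} [A-Z][a-z]+ \d{4})")


def retrieve_documents(docs, keywords):
    """Document retrieval: return whole documents that mention every keyword."""
    return [d for d in docs if all(k.lower() in d.lower() for k in keywords)]


def retrieve_information(docs):
    """Information retrieval in the slide's sense: return only the date."""
    for doc in retrieve_documents(docs, ["Bonaparte", "born"]):
        match = BIRTH_DATE.search(doc)
        if match:
            return match.group(1)
    return None


print(retrieve_documents(DOCS, ["Bonaparte", "born"]))
print(retrieve_information(DOCS))
```

The first call hands back a full sentence that the reader still has to scan; the second hands back just the requested fact, which is the harder task because it needs some grasp of what in the text expresses a birth date.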
Speech generation (1) She lives near the highway where three lives were lost.
Speech generation (2) Chapter III is about Henry III.
Text-to-speech basics http://upload.wikimedia.org/wikipedia/en/a/af/Festival_TTS_Telugu.jpg
Simple speech recognition algorithm (diagram labels: raw speech, signal analysis, acoustic models, sequential constraints, train, speech frames, word sequence, frame scores, acoustic analysis, time alignment, segmentation) From the INRIA Parole project
Dialogue systems with automatic translation http://www.oxygen.lcs.mit.edu/images/Speech.jpg
The disambiguation problem • Some examples: • ‘lives’: from ‘to live’ or plural of ‘life’ • ‘III’: as ‘three’ or ‘the third’ • ‘bow’: the weapon or from ‘to bow’ • Statistical models (n-grams): • most often sufficient • quite fast analysis • Syntactic analysis • Semantic analysis (deep or shallow)
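A minimal sketch of the statistical (n-gram) route to disambiguation, applied to the ‘lives’ sentence from the speech generation slide. The training pairs below are invented for illustration, and the model is a bare bigram count on the preceding word; it is an assumption-laden toy, not a description of any particular system.

```python
from collections import Counter

# Hypothetical hand-tagged data: (previous word, reading of 'lives').
TRAINING = [
    ("she", "verb"), ("he", "verb"), ("who", "verb"), ("she", "verb"),
    ("three", "noun"), ("many", "noun"), ("their", "noun"), ("three", "noun"),
]


def train(pairs):
    """Count how often each reading follows each left-context word."""
    counts = {}
    for prev, reading in pairs:
        counts.setdefault(prev, Counter())[reading] += 1
    return counts


def disambiguate(tokens, target, counts, default="noun"):
    """Return one reading per occurrence of `target`, chosen from the previous word."""
    readings = []
    for i, tok in enumerate(tokens):
        if tok.lower() != target:
            continue
        prev = tokens[i - 1].lower() if i > 0 else "<s>"
        seen = counts.get(prev)
        # Fall back to an arbitrary default when the context was never observed.
        readings.append(seen.most_common(1)[0][0] if seen else default)
    return readings


counts = train(TRAINING)
sentence = "She lives near the highway where three lives were lost".split()
result = disambiguate(sentence, "lives", counts)
print(result)  # ['verb', 'noun']
```

A text-to-speech front end could use the chosen reading to pick the pronunciation. When the left context was never seen in training, the sketch can only guess, which is where syntactic or semantic analysis would have to take over.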
A toy ontology for communication (1) • Patterned particular (PP): • piece of text: combination of characters • sound wave • series of signs in sign language, smoke • combination and sequence of smells ? • Some sender which generated a PP with the intention to provoke something in some receiver, the PP thus becoming a linguistic patterned particular (LPP) • standard messages, questions, commands • carry meaning directly encoded in the message • poems, lies, deceptions, nonsense: • no or partial directly encoded information • Being a PP is not sufficient to be an LPP. There has to be a sender! • a bird or insect flying in a pattern that looks like an LPP in some language
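The PP / LPP distinction can be written down as a small data structure. The class and field names are hypothetical, and reducing ‘intention to provoke something in some receiver’ to two optional fields is a simplifying assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatternedParticular:
    """A patterned particular (PP): text, sound wave, signs, smoke, ..."""
    pattern: str
    medium: str
    sender: Optional[str] = None             # who produced it with an intention, if anyone
    intended_receiver: Optional[str] = None  # whom the sender wanted to affect

    def is_linguistic(self) -> bool:
        """A PP counts as an LPP only if a sender aimed it at some receiver."""
        return self.sender is not None and self.intended_receiver is not None


signal = PatternedParticular("HELP", "smoke", sender="hiker",
                             intended_receiver="rescue team")
bird = PatternedParticular("HELP", "flight path")  # looks like text, but nobody sent it

print(signal.is_linguistic(), bird.is_linguistic())  # True False
```

Both objects carry the same pattern, so the pattern alone cannot decide whether something is an LPP.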
A toy ontology for communication (2) • Aboutness relation from certain elementary LPPs to real world entities when created under certain circumstances • ‘me’, ‘I’, ‘mine’ • ‘current’, ‘president’, ‘United States’, ‘king’, ‘France’ • Pattern types • morphologic, syntactic, semantic and discourse conventions • ‘current President of the United States’ • ‘current king of France’
A toy ontology for communication (3) • Questionable entities: • ‘propositions’ • sort of factual, linguistically undressed statements about the world • ‘bare meanings’
Text analysis ‘The doctor checks Seven of Nine’s blood pressure’
Syntactic analysis (parse tree figure) • The doctor checks the blood pressure of Seven of Nine • word categories: det, noun, verb, det, compound noun, prep, person name • phrase nodes: sentence, verb phrase, noun phrase (3x), prepositional phrase
Semantic analysis (parse tree figure with semantic annotations) • The doctor checks the blood pressure of Seven of Nine • checking has-agent doctor (person) • checking has-object blood pressure (clinical sign) • blood pressure belongs-to Seven of Nine (person)
The doctor uses an instrument (parse tree figure) • The doctor examines the patient with a hammer • checking: agent = the doctor, object = the patient, instrument = a hammer
Here the patient has the hammer ! (parse tree figure) • The doctor examines the patient with a hammer • checking: agent = the doctor, object = the patient with a hammer • the prepositional phrase ‘with a hammer’ is part of the object noun phrase
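The two readings of ‘The doctor examines the patient with a hammer’ differ only in where the prepositional phrase attaches. A small sketch makes this explicit; the Node class, the node labels and the helper names are assumptions of this example, not the notation used in the figures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    word: str = ""


def leaf(label, word):
    return Node(label, word=word)


def words(node):
    """Read the sentence back off the leaves, left to right."""
    if node.word:
        return [node.word]
    out = []
    for child in node.children:
        out.extend(words(child))
    return out


def parent_of(node, label):
    """Label of the first node that directly dominates a child called `label`."""
    for child in node.children:
        if child.label == label:
            return node.label
        found = parent_of(child, label)
        if found:
            return found
    return None


def build(attach_to_verb):
    subject = Node("NP", [leaf("det", "The"), leaf("noun", "doctor")])
    patient = Node("NP", [leaf("det", "the"), leaf("noun", "patient")])
    hammer = Node("NP", [leaf("det", "a"), leaf("noun", "hammer")])
    pp = Node("PP", [leaf("prep", "with"), hammer])
    if attach_to_verb:   # instrument reading: the doctor uses the hammer
        vp = Node("VP", [leaf("verb", "examines"), patient, pp])
    else:                # the patient has the hammer
        vp = Node("VP", [leaf("verb", "examines"), Node("NP", [patient, pp])])
    return Node("S", [subject, vp])


instrument, possession = build(True), build(False)
print(" ".join(words(instrument)))
print(parent_of(instrument, "PP"), parent_of(possession, "PP"))  # VP NP
```

Both trees yield exactly the same word string, so syntax alone cannot choose between them; knowing what doctors typically do with reflex hammers is the kind of knowledge that can.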
The problem of reference • ‘The surgeon examined Maria. She found a small tumor on the left side of her liver. She had it removed three weeks later.’ • Ambiguities: • to whom does the first ‘she’ refer: the surgeon or Maria ? • on whose liver was the tumor found ? • to whom does the second ‘she’ refer: the surgeon or Maria ? • what was removed: the tumor or the liver ? • Here ontology can come to our aid.
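One way an ontology could help with these ambiguities is by typing the candidate referents and filtering them against what each role expects. The sketch below is a toy under loudly stated assumptions: the is-a hierarchy, the typing of Maria as a patient, and the role constraints (in particular the preference that the removed thing is a pathological structure) are all invented for illustration.

```python
# Hypothetical is-a hierarchy, child -> parent.
IS_A = {
    "surgeon": "clinician",
    "clinician": "person",
    "patient": "person",
    "tumor": "pathological structure",
    "liver": "organ",
}

# Assumed types of the entities mentioned in the example text.
ENTITIES = {
    "the surgeon": "surgeon",
    "Maria": "patient",
    "the tumor": "tumor",
    "the liver": "liver",
}

# Assumed expectations: role in the text -> type the referent should have.
CONSTRAINTS = {
    "finder of the tumor": "clinician",
    "person who had it removed": "person",
    "thing removed": "pathological structure",  # a preference, not a hard rule
}


def is_a(kind, ancestor):
    """Walk up the hierarchy from `kind` looking for `ancestor`."""
    while kind is not None:
        if kind == ancestor:
            return True
        kind = IS_A.get(kind)
    return False


def candidates(role, mentions):
    """Keep only the mentions whose type satisfies the role's constraint."""
    wanted = CONSTRAINTS[role]
    return [m for m in mentions if is_a(ENTITIES[m], wanted)]


people = ["the surgeon", "Maria"]
things = ["the tumor", "the liver"]
print(candidates("finder of the tumor", people))        # ['the surgeon']
print(candidates("person who had it removed", people))  # ['the surgeon', 'Maria']
print(candidates("thing removed", things))              # ['the tumor']
```

The first ‘she’ is resolved, but the second is not: both the surgeon and Maria are persons who can have something removed, so type constraints narrow the options without always settling them.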
Ontologies and NLP • A two-way collaboration: • using NLP techniques to assist the development of ontologies, • using ontologies to make better NLP applications, • bootstrapping: NLP applications that require ontologies at some stage and intend to make these ontologies better.
C-Tex: corpus-based term extraction • Based on Deniz Yuret’s PhD thesis • good news: (a particular) language independent automatic linguistic knowledge extractor • relationships between words • grammar generation • term extraction • synonym / homonym detector (???) • bad news: • large corpora required (occ > 500 * different tokens) • big PC required (3.000.000 words/day, DOS, PII-350)
C-Tex: term extraction TERM Occurrences (5679 reps) • magnetic resonance 100 • san francisco 12 • invasive fungal sinusitis 7 • rhinosinusitis disability index 3 • intensive care unit 178 • food allergy 31 • th1 and th2 32 • positron emission 29
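To give a feel for how a table of terms and occurrence counts can come out of raw text, here is a naive repeated n-gram counter. It is explicitly not C-Tex's method (which builds on word-link statistics); the function name, the thresholds and the two-sentence corpus are assumptions made for this sketch.

```python
from collections import Counter


def repeated_ngrams(tokens, max_n=3, min_count=2):
    """Count word sequences of length 2..max_n and keep those seen at least min_count times."""
    counts = Counter()
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Hypothetical toy corpus; real term extraction needs a very large one.
corpus = ("magnetic resonance imaging showed invasive fungal sinusitis . "
          "magnetic resonance confirmed invasive fungal sinusitis").split()

terms = repeated_ngrams(corpus)
for term, count in sorted(terms.items()):
    print(term, count)
```

On this toy corpus the counter also keeps fragments such as ‘fungal sinusitis’ alongside ‘invasive fungal sinusitis’, which shows why raw frequency alone is not enough and some measure of how strongly words belong together is needed.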
C-Tex grammar induction • Sentence encountered: • Sentence analyzed:
C-Tex’s linguistic principles • Words in natural language sentences: • tend to collocate with a certain strength, • are not linked in circular ways, • have links that don’t cross.
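The last two principles (no circular links, no crossing links) can be checked mechanically. The sketch below represents a linkage as pairs of word positions and tests both constraints; link strength is not modelled, and the example linkage for ‘I saw a man carry a telescope’ is a hypothetical one, not the result shown on the slides.

```python
def crosses(a, b):
    """True if the two links overlap without one being nested inside the other."""
    (i, j), (k, l) = sorted(a), sorted(b)
    return i < k < j < l or k < i < l < j


def is_acyclic(links, n):
    """Union-find check: adding a link between already connected words closes a cycle."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in links:
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            return False
        parent[root_i] = root_j
    return True


def is_valid_linkage(links, n):
    no_crossing = not any(
        crosses(a, b) for idx, a in enumerate(links) for b in links[idx + 1:]
    )
    return no_crossing and is_acyclic(links, n)


sentence = "I saw a man carry a telescope".split()
# Hypothetical linkage: I-saw, saw-man, a-man, man-carry, carry-telescope, a-telescope
linkage = [(0, 1), (1, 3), (2, 3), (3, 4), (4, 6), (5, 6)]

print(is_valid_linkage(linkage, len(sentence)))                   # True
print(is_valid_linkage([(0, 2), (1, 3)], len(sentence)))          # False: links cross
print(is_valid_linkage([(0, 1), (1, 2), (0, 2)], len(sentence)))  # False: circular
```

A linker built on these principles would search, among all linkages that pass this check, for the one with the greatest total collocation strength.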
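C-Tex processing (step 1) • I saw a man carry a telescope • link labels in figure: s6 s5 s4 s3 s2 s1
C-Tex processing (step 2) • I saw a man carry a telescope • link labels in figure: s6 s5 s4 s3 s2 s7 s1
C-Tex processing (step 3) • I saw a man carry a telescope • link labels in figure: s6 s5 s4 s3 s8 s7 s1
C-Tex processing (step 4) • I saw a man carry a telescope • link labels in figure: s11 s10 s9 s8 s7 s1
C-Tex processing (step 5) • I saw a man carry a telescope • link labels in figure: s11 s10 s9 s8 s12 s7 s1
C-Tex processing (step 6) • I saw a man carry a telescope • link labels in figure: s11 s10 s9 s12 s7 s1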