

Knowledge Engineering and Semi-Automatic Population of Medical Ontologies Using NLP Methodologies

Munich 11.06.2007

Pinar Oezden Wennerberg

[email protected]


Agenda

  • Knowledge Engineering and Ontology

    • Definitions, methodologies, guidelines

  • Medical Terminology and Natural Language Processing (NLP)

    • The problem of medical terminology

    • The context: users, tasks, types of information in the medical domain

    • The role of NLP and knowledge engineering

  • Motivation for Semi-Automatic Ontology Population

    • The knowledge acquisition bottleneck

    • Vast amount of knowledge available in (un- / semi-)structured text, WWW, databases etc.

  • One example approach

    • Ontology population via Supervised Machine Learning (ML)

  • Challenges


Knowledge Engineering and Ontologies

  • Some Definitions:

    • Humans and software agents need knowledge about the world in order to reach good decisions

    • Such knowledge is typically stored in knowledge bases

    • „Knowledge engineering is the process of building a knowledge base“

    • „A knowledge engineer is someone who

      • investigates a particular domain,

      • determines what concepts and relations are important in that domain,

      • and creates a formal representation of objects and relations in that domain“.

        (Russell & Norvig, 1995)


Knowledge Engineering and Ontologies

  • An ontology specifies a finite, controlled, extensible and machine-processable vocabulary for a given knowledge base

    • Consists of concepts, properties, relations, axioms…

  • Knowledge engineering guidelines

    • Decide what to talk about and decide on a vocabulary

    • Encode general knowledge and a specific problem case

    • Execute queries and verify inference

      (Russell & Norvig, 1995)
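
The three guideline steps can be illustrated with a minimal Python sketch. The `KnowledgeBase` class, its method names and the tiny vocabulary are assumptions made for illustration only; they are not part of the presentation or of any particular ontology toolkit. For brevity, Smoking and Cancer are modelled as concepts rather than as instances, and the hierarchy is assumed to be acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    # Maps each concept to the set of its direct parent concepts.
    subclass_of: dict = field(default_factory=dict)
    # Holds (subject, relation, object) triples.
    facts: set = field(default_factory=set)

    def add_concept(self, concept, parents=()):
        self.subclass_of.setdefault(concept, set()).update(parents)
        for parent in parents:
            self.subclass_of.setdefault(parent, set())

    def add_fact(self, subject, relation, obj):
        self.facts.add((subject, relation, obj))

    def is_a(self, concept, ancestor):
        """True if concept equals ancestor or inherits from it."""
        if concept == ancestor:
            return True
        return any(self.is_a(p, ancestor)
                   for p in self.subclass_of.get(concept, ()))

    def query(self, relation, object_type):
        """Subjects s with a fact (s, relation, o) where o is_a object_type."""
        return sorted(s for (s, r, o) in self.facts
                      if r == relation and self.is_a(o, object_type))


kb = KnowledgeBase()

# Step 1: decide on the vocabulary (hypothetical concepts).
kb.add_concept("Disease")
kb.add_concept("Cancer", parents=["Disease"])
kb.add_concept("DiseaseCause")
kb.add_concept("Smoking", parents=["DiseaseCause"])

# Step 2: encode a specific problem case.
kb.add_fact("Smoking", "causes", "Cancer")

# Step 3: execute a query and verify the inference
# (Cancer is found via the inherited is_a relation to Disease).
causes_of_disease = kb.query("causes", "Disease")
```

Here `causes_of_disease` contains Smoking even though the stored fact mentions Cancer rather than Disease, which is the kind of inference the last guideline asks to verify.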


Medical Terminologies and Natural Language Processing (NLP)

  • Problem statement:

    • Numerous heterogeneous medical terminologies and coding schemes exist that need to interoperate

      • e.g. Systematized Nomenclature of Medicine (SNOMED) for coding patient notes, ICD (International Classification of Diseases), ICD-9-CM for billing purposes, RIZIV, IDEWE, ICPC-2, ATC etc.

    • Existing efforts: UMLS, GALEN, MeSH, etc.
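
One common way to let coding schemes interoperate is to map each local code to a shared concept identifier and to translate through that identifier. The Python sketch below illustrates the idea only. The scheme names, codes and concept identifiers are placeholders invented for this example; they are not real SNOMED, ICD or UMLS entries.

```python
# scheme -> local code -> shared concept identifier.
# All names and codes below are placeholders, not real terminology entries.
TO_CONCEPT = {
    "SCHEME_A": {"A-001": "CONCEPT:lung_cancer"},
    "SCHEME_B": {"B-17": "CONCEPT:lung_cancer", "B-18": "CONCEPT:asthma"},
}


def translate(code, source, target):
    """Return the codes in `target` that share a concept with `code` in `source`."""
    concept = TO_CONCEPT.get(source, {}).get(code)
    if concept is None:
        return []  # unknown scheme or unknown code
    return sorted(target_code
                  for target_code, concept_id in TO_CONCEPT.get(target, {}).items()
                  if concept_id == concept)


a_to_b = translate("A-001", "SCHEME_A", "SCHEME_B")
b_to_a = translate("B-18", "SCHEME_B", "SCHEME_A")
```

In this toy table `a_to_b` finds one matching code, while `b_to_a` is empty because the second scheme covers a concept the first one lacks, which is a typical interoperability gap.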


Medical Terminologies and Natural Language Processing (NLP)

  • Definition of context:

    • Information types to be collected are about

      • Individuals (e.g. medical records)

      • Groups (e.g. data about epidemiology, public health…)

      • Institutions (e.g. planning, management in hospitals, clinics)

      • Domain specific knowledge (e.g. state-of-the-art publications, proceedings)

    • Domain relevant tasks

      • Data entry, query and retrieval about patients

      • Information sharing and integration from different applications and medical records


Medical Terminologies and Natural Language Processing (NLP)

[Figure: Natural Language Processing shown together with related fields: Question Answering; Information Extraction; Knowledge Representation and Reasoning; Machine Learning; Information Retrieval; Knowledge Discovery, Text Mining; Ontology Engineering. Adapted from Jena University, www.julielab.de]


Motivation for Semi-Automatic Ontology Population

  • The knowledge acquisition bottleneck

    • Ideally, the knowledge engineer interviews the knowledge expert to learn about the domain, i.e. to acquire knowledge

      → expensive in time and resources

      → domain experts not always available

  • Availability of a vast amount of knowledge

    • In resources such as medical databases, journals, publications, conference proceedings, medical reports etc.

    • World Wide Web


Ontology Population via Supervised Machine Learning

  • Problem statement

    • Identify and extract relevant knowledge (terms, phrases, relations, facts) in text e.g.

      • Terms: “health disorder”, “malfunction”, “sickness”, “illness”, “maladie”, “Krankheit” → Disease

      • Smoking causes cancer → <Smoking, Cancer>

  • Goal

    • Assign them to the appropriate concepts of the ontology as instances

      • Concept: Disease

      • Relation: causes
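
The two examples from the problem statement can be sketched in a few lines of Python. The lookup table and the simple string split are assumptions for illustration; a real system would rely on NLP components rather than on exact string matching.

```python
# Surface terms (lower-cased) mapped to the ontology concept they denote.
# The term list is taken from the slide; the table itself is illustrative.
TERM_TO_CONCEPT = {
    "health disorder": "Disease",
    "malfunction": "Disease",
    "sickness": "Disease",
    "illness": "Disease",
    "maladie": "Disease",
    "krankheit": "Disease",
}


def normalize(term):
    """Return the concept for a surface term, or None if the term is unknown."""
    return TERM_TO_CONCEPT.get(term.strip().lower())


def extract_causal_pair(sentence):
    """Turn a sentence of the form 'X causes Y' into the pair (X, Y)."""
    left, separator, right = sentence.partition(" causes ")
    if not separator:
        return None  # the sentence does not contain the relation
    return (left.strip().capitalize(), right.strip().rstrip(".").capitalize())


concept = normalize("Krankheit")
pair = extract_causal_pair("Smoking causes cancer")
```

The pair is what would be stored as an instance of the relation causes, while `normalize` shows how multilingual synonyms collapse onto the single concept Disease.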


Ontology Population via Supervised Machine Learning

  • Processes

    • Annotate (i.e. supervised)

      • <CAU>Smoking</CAU> <CAU-R>causes</CAU-R> <DIS>cancer</DIS>

      • CAU: DiseaseCause, CAU-R: causalRelation, DIS: Disease

    • Learn and extract from a training set (i.e. ideal world)

    • Extract from the test set (i.e. unknown world)

      • Apply the learned rules on new documents to discover and extract new knowledge
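
As a sketch of the annotation step, the following Python snippet reads inline annotations of this kind and turns them into labelled training examples. It assumes well-formed closing tags of the form `</TAG>`; the function name and the regular expression are illustrative choices, not the tooling used in the presentation.

```python
import re

# Tag names from the slide mapped to the ontology elements they stand for.
LABELS = {"CAU": "DiseaseCause", "CAU-R": "causalRelation", "DIS": "Disease"}

# Matches <TAG>text</TAG>; the backreference forces the same tag to close.
TAG_PATTERN = re.compile(r"<(?P<tag>[A-Z-]+)>(?P<text>.*?)</(?P=tag)>")


def parse_annotations(annotated):
    """Return (ontology label, annotated text) pairs in order of appearance."""
    return [(LABELS[m.group("tag")], m.group("text"))
            for m in TAG_PATTERN.finditer(annotated)]


training_example = "<CAU>Smoking</CAU> <CAU-R>causes</CAU-R> <DIS>cancer</DIS>"
labelled = parse_annotations(training_example)
```

Pairs such as these form the training set from which extraction rules are learned; unannotated documents play the role of the test set.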


Ontology Population via Supervised Machine Learning

  • Learn and extract from a training set (i.e. ideal world)

    • Recognize syntactic constructs such as NPs, VPs, PPs

    • Generate extraction rules

      • Rule for concept Disease

        • Disease:- <NP “smoking”><VP “causes”><NP DIS>

      • Rule for concept DiseaseCause

        • DiseaseCause:- <NP CAU><VP “causes”><NP “cancer”>

      • Rule for relation causalRelation

        • causalRelation:- <NP “smoking”><VP CAU-R><NP “cancer”>

    • Classify

      • Disease: cancer

      • DiseaseCause: smoking

      • causalRelation: causes
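
The rule format can be made concrete with a small Python sketch. It assumes the sentence has already been split into (chunk type, text) pairs by some chunker, and it encodes the three rules from the slide by hand instead of learning them. The `Slot` type and the function names are hypothetical.

```python
from typing import NamedTuple


class Slot(NamedTuple):
    """Marks the position in a rule whose text gets extracted."""
    concept: str


# Hand-written encodings of the three rules on the slide.
# Each element is (chunk type, literal word or Slot).
RULES = {
    "Disease": [("NP", "smoking"), ("VP", "causes"), ("NP", Slot("Disease"))],
    "DiseaseCause": [("NP", Slot("DiseaseCause")), ("VP", "causes"), ("NP", "cancer")],
    "causalRelation": [("NP", "smoking"), ("VP", Slot("causalRelation")), ("NP", "cancer")],
}


def apply_rule(rule, chunks):
    """Return the text filling the rule's slot, or None if the rule does not match."""
    if len(rule) != len(chunks):
        return None
    filler = None
    for (rule_type, rule_item), (chunk_type, chunk_text) in zip(rule, chunks):
        if rule_type != chunk_type:
            return None
        if isinstance(rule_item, Slot):
            filler = chunk_text          # slot position: extract the text
        elif rule_item != chunk_text.lower():
            return None                  # literal position: words must agree
    return filler


def classify(chunks):
    """Apply every rule and collect the extracted value per concept."""
    results = {}
    for concept, rule in RULES.items():
        filler = apply_rule(rule, chunks)
        if filler is not None:
            results[concept] = filler
    return results


seen = classify([("NP", "smoking"), ("VP", "causes"), ("NP", "cancer")])
unseen = classify([("NP", "asbestos"), ("VP", "causes"), ("NP", "cancer")])
```

On the training sentence all three rules fire and reproduce the classification listed above. On the unseen sentence only the DiseaseCause rule matches, so asbestos is proposed as a new DiseaseCause instance.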


Ontology Population via Supervised Machine Learning

  • Possible problems

    • More than one value was extracted for a given relation

    • Entities from different classes were extracted (multiple concept assignment i.e. ambiguity)

    • Nothing was extracted

  • Possible solutions

    • Present all possible values to the user and let the user decide

    • Assist the user in the decision process by assigning confidence scores to the possible values

      • i.e. how strongly the system believes that what it suggests is relevant/true

      • Provide context information via text highlighting to justify the system’s confidence

    • Provide empty data entry slots for users to enter their knowledge
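
The proposed solutions can be combined in a short Python sketch: candidates are ranked by confidence, the supporting text is highlighted, and an empty slot is offered when nothing was extracted. The `Candidate` structure, the bracket-style highlighting and the confidence values are assumptions made for illustration; how a real extractor computes its scores is not specified in the presentation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    value: str         # extracted text, e.g. "cancer"
    concept: str       # concept the system proposes for it
    confidence: float  # score in [0, 1], assumed to come from the extractor
    context: str       # sentence the value was found in


def highlight(candidate):
    """Mark the extracted value inside its context to justify the suggestion."""
    return candidate.context.replace(candidate.value, f"[[{candidate.value}]]")


def suggestions_for_user(candidates):
    """Rank candidates by confidence; offer one empty slot if there are none."""
    if not candidates:
        return [Candidate(value="", concept="", confidence=0.0, context="")]
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)


# Made-up candidates for an ambiguous extraction (multiple concept assignment).
sentence = "Smoking causes cancer"
ranked = suggestions_for_user([
    Candidate("cancer", "DiseaseCause", 0.20, sentence),
    Candidate("cancer", "Disease", 0.90, sentence),
])
empty_slot = suggestions_for_user([])
```

The user still makes the final decision; the ranking and the highlighted context only reduce the effort needed to make it.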


Challenges

  • General challenges

    • It is difficult to eliminate the knowledge acquisition problem entirely

      • Due to the sensitivity of the domain (human health), knowledge experts cannot be bypassed completely

      • Computer scientists need to work together with domain experts to a certain extent

    • Systems should be usable by non-technicians

    • Multilinguality

      • Healthcare workers, patients and administrators should have access to information in their own language


Challenges

  • Knowledge/ontology engineering specific challenges

    • Implicit information (typical of natural language) is not explicit and therefore not machine-processable

    • Different levels of detail (granularity) are required to meet different expectations

      • i.e. provide sufficient detail but abstract away irrelevancies

    • Poly-hierarchies to support multiple views

      • may lead to ambiguities, contradictions

    • Adaptability, extensibility for changing user demands and for standards

    • Expressivity vs. computational tractability

    • Achieving consensus between practitioners


Questions?

  • Evaluation

    • How do we know if we have a good system?

    • Should practitioners evaluate the efficiency and reliability of the developed systems?
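
One widely used way to approach the first question for extraction systems is to compare the system output against a gold standard annotated by domain experts and to compute precision, recall and F1. The presentation does not prescribe a metric, so the Python sketch below is only one possible answer, and the example pairs are made up.

```python
def precision_recall_f1(extracted, gold):
    """Compare extracted items against an expert-annotated gold standard."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up (concept, value) pairs for illustration.
system_output = [("Disease", "cancer"), ("Disease", "smoking"),
                 ("DiseaseCause", "smoking"), ("DiseaseCause", "asbestos")]
gold_standard = [("Disease", "cancer"), ("DiseaseCause", "smoking"),
                 ("Disease", "asthma")]

precision, recall, f1 = precision_recall_f1(system_output, gold_standard)
```

Precision measures how much of what the system suggests is correct, and recall measures how much of the expert-annotated knowledge the system finds. Scores of this kind address reliability only; usability for practitioners needs a separate assessment.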


