
Intelligent Information Directory System for Clinical Documents

Qinghua Zou

6/3/2005

Dr. Wesley W. Chu (Advisor)


When searching clinical reports

Keyword search

Problems

  • Hard to compose good keywords

  • No overview of the content

  • Interchangeable words


Intelligent Directory System

  • 1. Overview

  • 2. Extracting Key Concepts

  • 3. Mining Topics

  • 4. Building Directories

  • 5. Searching

  • 6. Conclusion



2. Concept Extraction

  • 2.1 Introduction

  • 2.2 Our approach: IndexFinder

    • Index Phase (Offline)

    • Search Phase (Real Time)

  • 2.3 Experiments

  • 2.4 Summary


2.1 Motivation

(Diagram labels: Clinical Texts; Extract key info.; Standard terms)

  • Clinical texts are valuable in medical practice

    • Search relevant reports

    • Search similar patients

  • What is key information?

  • UMLS provides

    • key medical concepts

  • Our Goal

    • Extract UMLS concepts from clinical texts


2.1 Previous Approaches

(Diagram: Free text → NLP Parser → Noun phrases → Mapping against UMLS → UMLS Concepts. The example sentence “lambs will eat oats” is shown as a parse tree with nodes ip, dp, i1, i0, vp, v0, dp; the noun phrases “lambs” and “oats” are mapped.)


2.1 Problems of Previous Approaches

  • Concepts cannot be discovered if they are not in a single noun phrase.

    • E.g. in “second, third, and fourth ribs”, “second rib” cannot be discovered.

  • Difficult to scale to large volumes of text.

    • Natural language processing requires significant computing resources.


2.2 Our Approach: IndexFinder

  • Previous: free text → UMLS

  • Our approach: UMLS → free text

(Diagram. Previous: Free text → NLP Parser → Noun phrases → Mapping against UMLS → Concepts. IndexFinder index phase (offline): UMLS (2GB) → Indexing → Index Data (~80MB). IndexFinder search phase (real time): Free text → Extracting → Filtering → concepts.)

Suppose UMLS contains only “Lung cancer”. We would discard all words in the text except “lung” and “cancer”.


2.2 Our Approach: What’s New?

  • Knowledge-based approach

    • Using the compact index data without using any database system

    • Permuting words in a sentence to generate UMLS concept candidates.

    • Using filters to eliminate irrelevant concepts.


2.2 Concept Candidates Generation

Assumptions

  • Knowledge base provides a phrase table.

  • Each phrase (concept) is a set of words.

  • An input text T is represented as a set of words.

Goal

  • Combining words in T to generate concept candidates

Example

  • T={D,E,F}

    • Answer: 5
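A minimal sketch of candidate generation under the assumptions above: each phrase is a set of words, the text is a set of words, and a phrase is a candidate when all of its words occur in the text. The phrase table, the inverted index layout, and the function names are hypothetical illustrations, not IndexFinder's actual index format.

```python
# Hypothetical phrase table: concept name -> set of words (not from the slides).
PHRASE_TABLE = {
    "lung cancer": {"lung", "cancer"},
    "lung": {"lung"},
    "cancer": {"cancer"},
    "breast cancer": {"breast", "cancer"},
    "small cell lung cancer": {"small", "cell", "lung", "cancer"},
}


def build_index(phrase_table):
    """Offline step: inverted index from word to the phrases containing it."""
    index = {}
    for name, words in phrase_table.items():
        for word in words:
            index.setdefault(word, set()).add(name)
    return index


def candidates(text_words, phrase_table, index):
    """Search step: keep phrases whose words all occur in the text."""
    hits = {}
    for word in text_words:  # text_words is a set, so each word counts once
        for name in index.get(word, ()):
            hits[name] = hits.get(name, 0) + 1
    return {name for name, k in hits.items() if k == len(phrase_table[name])}


INDEX = build_index(PHRASE_TABLE)
TEXT = set("the patient has cancer in the left lung".split())
RESULT = candidates(TEXT, PHRASE_TABLE, INDEX)
print(sorted(RESULT))
```

Words that appear in no phrase are never looked at beyond a single dictionary miss, which mirrors the "discard all words except those in UMLS" idea from the previous slide.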


2.2 Search Phase: Filtering

Use filters to eliminate irrelevant concepts

  • Syntactic filter:

    • Word combination is limited within a sentence.

  • Semantic filter:

    • Filter out irrelevant concepts using semantic types (e.g. body part, disease, treatment, diagnosis).

    • Filter out general concepts using the ISA relationship and keep the more specific ones.
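The two semantic filters can be sketched as follows. The semantic-type table, the ISA edges, and the function names are made-up illustrations; the syntactic filter (restricting word combination to one sentence) is not shown and would amount to running candidate generation once per sentence.

```python
# Hypothetical semantic types and ISA (child -> parent) edges for illustration.
SEMANTIC_TYPE = {
    "cancer": "disease",
    "lung cancer": "disease",
    "lung": "body part",
    "left": "spatial concept",
}
ISA_PARENT = {"lung cancer": "cancer"}
RELEVANT_TYPES = {"body part", "disease", "treatment", "diagnosis"}


def ancestors(concept):
    """All concepts reachable by following ISA edges upward."""
    found = set()
    while concept in ISA_PARENT:
        concept = ISA_PARENT[concept]
        found.add(concept)
    return found


def semantic_type_filter(cands):
    """Keep only candidates whose semantic type is of interest."""
    return {c for c in cands if SEMANTIC_TYPE.get(c) in RELEVANT_TYPES}


def specific_filter(cands):
    """Drop a candidate if a more specific candidate (a descendant) is present."""
    general = set()
    for c in cands:
        general |= ancestors(c)
    return cands - general


CANDS = {"cancer", "lung cancer", "lung", "left"}
KEPT = specific_filter(semantic_type_filter(CANDS))
print(sorted(KEPT))
```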


2.3 Experiment: Comparison with MetaMap [3]

Input: A small mass was found in the left hilum of the lung.

(Result columns: MetaMap; IndexFinder)


2.4 Summary

  • An efficient method that maps from UMLS to free text for extracting concepts without using any database system.

  • Syntactic and semantic filters are used to eliminate irrelevant candidates.

  • IndexFinder is able to find more specific concepts than NLP approaches.

  • IndexFinder is scalable and can be operated in real time.


3. Mining Topics: SmartMiner

  • 3.1 Introduction

  • 3.2 Search Space

  • 3.3 SmartMiner

  • 3.4 Experiment

  • 3.5 Summary


3.1 Introduction

  • A Topic (assumption)

    • a set of concepts

    • a frequent pattern

  • Finding topics by data mining

    • Frequent patterns, or

    • Maximal frequent patterns

  • Require efficient data mining


3.1 Data Mining Problem

Dataset (id: item set), MinSup=2

1: a b c d e

2: a b c d

3: b c d

4: b e

5: c d e

What itemsets are frequent itemsets (FI)?

a, b, c, d, e,

ab, ac, ad, bc, bd, be, cd, ce, de,

abc, abd, acd, bcd, cde,

abcd

Maximal frequent itemset (MFI): no superset is frequent.

MFI: abcd, be, cde
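The FI and MFI lists can be checked with a brute-force sketch over the five transactions. This enumerates every itemset, so it only illustrates the definitions; it is not SmartMiner and would not scale. Function names are illustrative.

```python
from itertools import combinations

# The dataset from the slide: transaction id -> items.
DATASET = {1: "abcde", 2: "abcd", 3: "bcd", 4: "be", 5: "cde"}
MIN_SUP = 2


def support(itemset, dataset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for items in dataset.values() if itemset <= set(items))


def frequent_itemsets(dataset, min_sup):
    """Brute force: test every non-empty combination of items."""
    items = sorted(set().union(*map(set, dataset.values())))
    found = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            if support(candidate, dataset) >= min_sup:
                found.add(candidate)
    return found


def maximal(itemsets):
    """Keep itemsets that have no proper superset in the collection."""
    return {s for s in itemsets if not any(s < t for t in itemsets)}


FI = frequent_itemsets(DATASET, MIN_SUP)
MFI = maximal(FI)
print(len(FI), sorted("".join(sorted(s)) for s in MFI))
```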


3.1 Why MFI not FI?

  • Mining FI is infeasible when long FI exist.

    • E.g., suppose we have a 20-item frequent set a1 a2 … a20. All of its subsets are frequent, i.e., 2^20 = 1,048,576 itemsets.

  • Mining MFI is fast, and all the FI can be generated from the MFI.
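A short sketch of why the MFI are enough: every frequent itemset is a subset of some maximal one, so expanding each MFI into its non-empty subsets recovers the full FI. The MFI used here are the ones from the five-transaction dataset (abcd, be, cde); the function name is illustrative.

```python
from itertools import combinations


def expand(mfi):
    """All non-empty subsets of the given maximal frequent itemsets."""
    result = set()
    for itemset in mfi:
        items = sorted(itemset)
        for k in range(1, len(items) + 1):
            result.update(frozenset(c) for c in combinations(items, k))
    return result


MFI = ["abcd", "be", "cde"]
FI = expand(MFI)
print(len(FI))   # number of distinct frequent itemsets recovered
print(2 ** 20)   # subsets of a 20-item set, counting the empty set
```

Expansion recovers which itemsets are frequent, not their support counts; those would need a further pass over the data.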


3.1 Previous work

  • Superset checking.

    • A study shows that the CPU spends 40% of its time on superset checking.

  • The search tree is too large.

  • A large number of support counts.

  • A more efficient method is needed.


3 2 search space

simplify Documents

Ø:abcde

:abcde

What is the space of ?

ab:cd

3.2 Search space

Given 5 items: a, b, c, d, e.

What is the search space?

Ø, a, b, c, d, e, ab, ac, ad, ae, bc, …, abcde

We use “head:tail” to denote the space as:

ab, abc, abd, abcd
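The “head:tail” notation can be made concrete with a small sketch: the space is the head joined with every subset of the tail. The function name `space` is an illustrative choice.

```python
from itertools import combinations


def space(head, tail):
    """Itemsets in head:tail, i.e. head plus each subset of tail."""
    result = []
    for k in range(len(tail) + 1):
        for extra in combinations(tail, k):
            result.append("".join(sorted(head + "".join(extra))))
    return result


AB_CD = space("ab", "cd")
print(AB_CD)
print(len(space("", "abcde")))  # the full space :abcde, including the empty set
```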


3.2 Space decomposition

For a space :abcde, if abcg is frequent,

  • Then the known space is:

    • any subset of abc is frequent

    • known space is :abc

  • The unknown space is:

    • any itemset that contains d or e

    • d:abce and e:abc

  • :abcde = d:abce + e:abc + :abc
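The decomposition :abcde = d:abce + e:abc + :abc can be checked mechanically. This sketch enumerates each space as a set of itemsets and confirms the three parts are disjoint and together cover the whole space; `space` is an illustrative helper, not part of SmartMiner.

```python
from itertools import combinations


def space(head, tail):
    """Set of itemsets in head:tail (head plus each subset of tail)."""
    return {
        frozenset(head) | frozenset(extra)
        for k in range(len(tail) + 1)
        for extra in combinations(tail, k)
    }


WHOLE = space("", "abcde")
PARTS = [space("d", "abce"), space("e", "abc"), space("", "abc")]

UNION = set().union(*PARTS)
SIZES = [len(p) for p in PARTS]
print(SIZES, len(WHOLE), UNION == WHOLE)
```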


3.3 The basic idea

(Figure, two panels. (a) Previous approach: node A1 with children B1, B2, …, Bn; creating B2 before exploring B1. (b) SmartMiner strategy: node A1 with children B1 and B’; creating B’ after exploring B1.)

Using information from B1 to prune the space at B’

SmartMiner takes advantage of the information from previous steps.


3.3 The tail information

  • For the space :abcde, if we know abcf, abcg and abfg are frequent, then we project them to the space.

    • abcf → abc

    • abcg → abc

    • abfg → ab

  • Thus

    • Tinf(abcf, abcg, abfg | :abcde) = {abc}
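A sketch of the projection step for a space with an empty head, as in the example: each known frequent itemset is intersected with the tail, and only the maximal projections are kept (ab is dropped because it is contained in abc). How a non-empty head is handled is not stated on the slide, so it is left out; `tail_info` is an illustrative name.

```python
def tail_info(known_frequent, tail):
    """Project known frequent itemsets onto the tail and keep the maximal ones."""
    tail_items = frozenset(tail)
    projected = {frozenset(s) & tail_items for s in known_frequent}
    projected.discard(frozenset())  # an empty projection carries no information
    return {p for p in projected if not any(p < q for q in projected)}


TINF = tail_info(["abcf", "abcg", "abfg"], "abcde")
print(["".join(sorted(s)) for s in TINF])
```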



3.5 Summary

  • SmartMiner uses tail information to guide the mining. It is efficient because of:

    • A smaller search tree.

    • No superset checking.

    • Fewer support counts.


4. Building Directories

  • 4.1 Introduction

  • 4.2 Knowledge Hierarchies

  • 4.3 User Specification

  • 4.4 Directory Generation

  • 4.5 Integrating various directories

  • 4.6 Summary


4 1 introduction

Three Inputs Documents

Topics

Key Content

Knowledge trees

Meaningful

User specs

Customized

4.1 Introduction


4.2 Knowledge Hierarchies

  • UMLS concept hierarchies

    • PA: parent-child relationship

    • RA: rather-than relationship

  • Problems

    • A concept: several parents, different granularity

      • [lung cancer] [Neoplasms, Respiratory Tract]

      • [lung cancer] [Neoplasms, Respiratory System]

    • A concept: hundreds of paths to roots

      • [lung cancer]: 233 different paths in UMLS by PA


4.2 Select Proper Hierarchies

  • Set source preference order, e.g

    • [disease]: ICD9>SNOMED>MeSH

    • [body part]: SNOMED>ICD9

  • Select proper granularity

    • C: a set of concepts; n: a path node

    • Score function for selecting the node n

      • S(n)=|{ci| cin, ci in C}|

  • Expert review
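A sketch of the two selection steps. The preference lists come from the slide; the small hierarchy, the concept set, and the function names are hypothetical. The score is read here as "how many concepts of C lie at or below node n", which is an assumption about the formula, since the relation symbol did not survive in the transcript.

```python
# Source preference orders from the slide.
PREFERENCE = {
    "disease": ["ICD9", "SNOMED", "MeSH"],
    "body part": ["SNOMED", "ICD9"],
}

# Hypothetical child -> parent edges of one selected hierarchy.
PARENT = {
    "lung cancer": "respiratory neoplasm",
    "tracheal cancer": "respiratory neoplasm",
    "respiratory neoplasm": "neoplasm",
    "skin cancer": "neoplasm",
}


def pick_source(knowledge, available):
    """First source in the preference order that offers a hierarchy."""
    for source in PREFERENCE[knowledge]:
        if source in available:
            return source
    return None


def path_to_root(concept):
    """The concept followed by its ancestors, up to the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def score(node, concepts):
    """Assumed reading of S(n): concepts in C at or below the node."""
    return sum(1 for c in concepts if node in path_to_root(c))


C = {"lung cancer", "tracheal cancer", "skin cancer"}
SCORES = {n: score(n, C) for n in path_to_root("lung cancer")}
print(pick_source("disease", {"SNOMED", "MeSH"}), SCORES)
```

A node that is too specific scores low and one that is too general covers everything, so the score is only one input to choosing the granularity, with expert review as the final step.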


4.3 User Specifications

  • A good directory ~ usage pattern

  • User spec → usage pattern

  • User may have different specs

  • A spec: a series of knowledge names

    • [disease] + [body part], or

    • [body part] + [disease]

  • Build a directory for a spec by the ordering


4.4 Directory Generation: An example

User spec 1: d + p

[disease] + [body part]

User spec 2: p + d

[body part] + [disease]
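The effect of the spec ordering can be sketched with nested dictionaries: the first knowledge name in the spec becomes the top directory level, the second becomes the level below. The documents and concept labels are invented, and the sketch assumes one concept per knowledge type per document, ignoring the knowledge hierarchies.

```python
# Hypothetical documents: one disease concept and one body-part concept each.
DOCS = {
    "doc1": {"disease": "d1", "body part": "p1"},
    "doc2": {"disease": "d1", "body part": "p2"},
    "doc3": {"disease": "d2", "body part": "p1"},
}


def build_directory(docs, spec):
    """Nest documents by the knowledge names in spec, in order."""
    root = {}
    for doc_id, concepts in docs.items():
        node = root
        for knowledge in spec:
            node = node.setdefault(concepts[knowledge], {})
        node.setdefault("_docs", []).append(doc_id)
    return root


D_THEN_P = build_directory(DOCS, ["disease", "body part"])   # spec 1: d + p
P_THEN_D = build_directory(DOCS, ["body part", "disease"])   # spec 2: p + d
print(D_THEN_P)
print(P_THEN_D)
```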


4.4 An example (continued)

(Figure: the directories for d + p and for p + d.)



4.5 Integrating various directories

  • For each document Di, get all directory paths to Di

  • A Di is a tree: XML

  • Keywords can be associated with tree nodes

  • Query: XPath

  • Redundant information exists


4.5 Simplified model

  • Keep only the first level knowledge trees

  • For //d6//p6, we use XPath query

    //doc[//d6 and //p6]

  • Smaller size, but requires some computation
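A sketch of the simplified-model query using Python's standard library. The XML layout and the element names other than doc, d6 and p6 are invented. Two caveats: `xml.etree.ElementTree` supports only a subset of XPath and does not accept `and` inside a predicate, so the conjunction is done in Python; and in standard XPath a predicate path starting with `//` is evaluated from the document root, so the sketch uses the relative form `.//d6` to test each document's own subtree.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection: each doc holds its first-level knowledge trees.
XML = """
<docs>
  <doc id="1"><d1><d6/></d1><p1><p6/></p1></doc>
  <doc id="2"><d1><d6/></d1><p2/></doc>
  <doc id="3"><d2/><p1><p6/></p1></doc>
</docs>
"""

ROOT = ET.fromstring(XML.strip())


def docs_with(root, *tags):
    """Ids of doc elements that contain every given tag somewhere below them."""
    return [
        doc.get("id")
        for doc in root.iter("doc")
        if all(doc.find(f".//{tag}") is not None for tag in tags)
    ]


MATCHES = docs_with(ROOT, "d6", "p6")
print(MATCHES)
```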


4.6 Summary

  • Build directory by

    • Topics

    • Knowledge hierarchies

    • User specifications

  • Mapping directories to XML

    • By collecting directory paths for each document

    • Leveraging existing XML technologies

