Intelligent Information Directory System for Clinical Documents

Qinghua Zou

6/3/2005

Dr. Wesley W. Chu (Advisor)


Keyword Search

Problems when searching clinical reports:

  • Hard to compose good keywords

  • No overview of the content

  • Interchangeable words (synonyms)


Intelligent Directory System

  • 1. Overview

  • 2. Extracting Key Concepts

  • 3. Mining Topics

  • 4. Building Directories

  • 5. Searching

  • 6. Conclusion


1. System Overview


2. Concept Extraction

  • 2.1 Introduction

  • 2.2 Our approach: IndexFinder

    • Index Phase (Offline)

    • Search Phase (Real Time)

  • 2.3 Experiments

  • 2.4 Summary


Figure: Clinical texts → extract key information → standard terms

2.1 Motivation

  • Clinical texts are valuable in medical practice

    • Search relevant reports

    • Search similar patients

  • What is key information?

  • UMLS provides

    • key medical concepts

  • Our Goal

    • Extract UMLS concepts from clinical texts


2.1 Previous Approaches

Figure: Free text → NLP parser → noun phrases → mapping → UMLS concepts. Example: parsing "lambs will eat oats" yields the noun phrases "lambs" and "oats", which are then mapped to UMLS concepts.


2.1 Problems of Previous Approaches

  • Concepts cannot be discovered if they are not in a single noun phrase.

    • E.g., in "second, third, and fourth ribs", the concept "second rib" cannot be discovered.

  • Difficult to scale to large text collections.

    • Natural language processing requires significant computing resources.


Our approach: UMLSfree text

Free text

NLP Parser

Index phase

(offline)

UMLS

2GB

Noun phrases

Indexing

Mapping

Concepts

Index Data

~80MB

UMLS

Extracting

Filtering

concepts

Search phase

(real time)

Free text

2.2 Our Approach: IndexFinder

Previous: free textUMLS

Suppose UMLS contains only

“Lung cancer”

We would discard all words in the text except “lung” and “cancer”.
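A minimal sketch of the UMLS → free text idea, assuming a hypothetical one-concept vocabulary (`CONCEPTS`) in place of the real UMLS index data: every word of the text that occurs in no concept name is dropped before any matching is attempted. The function names are illustrative, not IndexFinder's.

```python
import re

# Hypothetical toy vocabulary standing in for the UMLS index data (assumption).
CONCEPTS = {"C1": "lung cancer"}

def vocabulary(concepts: dict[str, str]) -> set[str]:
    """Collect every word that occurs in at least one concept name."""
    return {w for name in concepts.values() for w in name.lower().split()}

def keep_known_words(text: str, vocab: set[str]) -> list[str]:
    """Tokenize the text and discard words that no concept contains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t in vocab]

vocab = vocabulary(CONCEPTS)
print(keep_known_words("A small mass suggests cancer of the left lung.", vocab))
# ['cancer', 'lung']
```

Because the vocabulary is small relative to the text, this pruning step discards most words cheaply before candidate generation.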


2.2 Our Approach: What’s New?

  • Knowledge-based approach

    • Using the compact index data without using any database system

    • Combining words in a sentence (in any order) to generate UMLS concept candidates.

    • Using filters to eliminate irrelevant concepts.


2.2 Concept Candidates Generation

Assumptions

  • Knowledge base provides a phrase table.

  • Each phrase (concept) is a set of words.

  • An input text T is represented as a set of words.

Goal

  • Combining words in T to generate concept candidates

Example

  • T={D,E,F}

    • Answer: 5
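Under the assumptions above (a phrase is a set of words, T is a set of words), a phrase is a candidate exactly when all of its words occur in T. The sketch below, which is one plausible way to do this and not necessarily IndexFinder's exact procedure, uses an inverted index from words to phrases and counts hits per phrase instead of enumerating all subsets of T. The phrase table `PHRASES` is hypothetical and is not the table from the slide, so it yields 4 candidates rather than 5.

```python
from collections import defaultdict

# Hypothetical phrase table: phrase id -> set of words (not the slide's table).
PHRASES = {
    "p1": {"D"},
    "p2": {"D", "E"},
    "p3": {"E", "F", "G"},
    "p4": {"F"},
    "p5": {"D", "E", "F"},
    "p6": {"A", "B"},
}

def build_index(phrases):
    """Inverted index: word -> ids of the phrases containing that word."""
    index = defaultdict(set)
    for pid, words in phrases.items():
        for w in words:
            index[w].add(pid)
    return index

def candidates(text_words, phrases, index):
    """Return ids of phrases whose words all occur in text_words."""
    hits = defaultdict(int)
    for w in set(text_words):
        for pid in index.get(w, ()):
            hits[pid] += 1
    # A phrase is a candidate when every one of its words was hit.
    return sorted(pid for pid, n in hits.items() if n == len(phrases[pid]))

index = build_index(PHRASES)
print(candidates({"D", "E", "F"}, PHRASES, index))  # ['p1', 'p2', 'p4', 'p5']
```

Counting hits keeps the work proportional to the postings of the words in T, rather than to the $2^{|T|}$ possible word combinations.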


2.2 Search Phase: Filtering

Use filters to eliminate irrelevant concepts

  • Syntactic filter:

    • Words are combined only within a single sentence.

  • Semantic filter:

    • Filter out irrelevant concepts using semantic types (e.g., body part, disease, treatment, diagnosis).

    • Filter out general concepts using the ISA relationship and keep the more specific ones.
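The two filters can be sketched as follows. All data here is hypothetical: the semantic types, the single ISA link, and the candidate list are invented for illustration, and a real system would take them from UMLS. The syntactic filter simply splits the text into sentences so that candidates are generated per sentence; the semantic filter keeps allowed types and then drops any concept that is an ISA ancestor of another kept candidate.

```python
import re

# Hypothetical semantic types and ISA links (assumptions, not UMLS data).
SEMANTIC_TYPE = {
    "lung": "body part",
    "cancer": "disease",
    "lung cancer": "disease",
    "small": "qualifier",
}
ISA = {"lung cancer": {"cancer"}}  # child -> more general parents
ALLOWED_TYPES = {"body part", "disease", "treatment", "diagnosis"}

def sentences(text):
    """Syntactic filter: candidates are generated one sentence at a time."""
    return [s for s in re.split(r"[.!?]+\s*", text) if s]

def ancestors(concept, isa):
    """All concepts reachable from concept by following ISA links upward."""
    seen, stack = set(), list(isa.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(isa.get(parent, ()))
    return seen

def semantic_filter(cands, sem_type, isa, allowed):
    """Keep allowed semantic types, then drop any concept that is an ISA
    ancestor of another kept candidate (keep the more specific one)."""
    typed = {c for c in cands if sem_type.get(c) in allowed}
    general = set().union(*(ancestors(c, isa) for c in typed))
    return sorted(typed - general)

cands = ["small", "lung", "cancer", "lung cancer"]
print(semantic_filter(cands, SEMANTIC_TYPE, ISA, ALLOWED_TYPES))  # ['lung', 'lung cancer']
print(sentences("A small mass was found. The lung is clear."))
```

"small" is removed by its type, and "cancer" is removed because the more specific "lung cancer" was also found.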


2.3 Experiment: Comparison with MetaMap [3]

Input: "A small mass was found in the left hilum of the lung."

Figure: concepts extracted by MetaMap vs. IndexFinder


2.4 Summary

  • An efficient method that maps from UMLS to free text for extracting concepts without using any database system.

  • Syntactic and semantic filters are used to eliminate irrelevant candidates.

  • IndexFinder is able to find more specific concepts than NLP approaches.

  • IndexFinder is scalable and can be operated in real time.


3. Mining Topics: SmartMiner

  • 3.1 Introduction

  • 3.2 Search Space

  • 3.3 SmartMiner

  • 3.4 Experiment

  • 3.5 Summary


3.1 Introduction

  • A Topic (assumption)

    • a set of concepts

    • a frequent pattern

  • Finding topics by data mining

    • Frequent patterns, or

    • Maximal frequent patterns

  • Requires efficient data mining


3.1 Data Mining Problem

Dataset (MinSup = 2):

id: item set

1: a b c d e

2: a b c d

3: b c d

4: b e

5: c d e

What itemsets are frequent itemsets (FI)?

  • a, b, c, d, e

  • ab, ac, ad, bc, bd, be, cd, ce, de

  • abc, abd, acd, bcd, cde

  • abcd

Maximal frequent itemset (MFI): a frequent itemset none of whose supersets is frequent.

MFI: abcd, be, cde
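To check the example, here is a brute-force sketch that counts the support of every subset of the items and then keeps the maximal ones. It is exponential in the number of items, so it is only a way to verify the small dataset above, not a mining algorithm such as SmartMiner.

```python
from itertools import combinations

# The example dataset from the slide, with MinSup = 2.
DATASET = ["abcde", "abcd", "bcd", "be", "cde"]
MIN_SUP = 2

def support(itemset, dataset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in dataset if set(itemset) <= set(t))

def frequent_itemsets(dataset, min_sup):
    """Brute force: test every non-empty subset of the item universe."""
    items = sorted(set("".join(dataset)))
    return {
        "".join(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if support(c, dataset) >= min_sup
    }

def maximal(fis):
    """Keep itemsets that have no frequent proper superset."""
    return {x for x in fis if not any(set(x) < set(y) for y in fis)}

fi = frequent_itemsets(DATASET, MIN_SUP)
print(len(fi), sorted(maximal(fi)))  # 20 ['abcd', 'be', 'cde']
```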


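3.1 Why MFI, not FI?

  • Mining FI is infeasible when long frequent itemsets exist.

    • E.g., suppose a1 a2 … a20 is a 20-item frequent set. All of its subsets are frequent, i.e., $2^{20} = 1{,}048{,}576$ itemsets.

  • Mining MFI is fast, and all FIs can be generated from the MFIs, since every subset of an MFI is frequent (their exact supports need one extra counting pass).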
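The claim that the MFIs determine all FIs can be sketched directly: take the union of all non-empty subsets of the MFIs. Applied to the MFIs of the example dataset (abcd, be, cde), this recovers the same 20 frequent itemsets; supports are not recovered and would need a separate count.

```python
from itertools import combinations

def all_subsets(mfis):
    """Every non-empty subset of an MFI is frequent, so the union of
    their subsets is exactly the set of frequent itemsets."""
    fis = set()
    for m in mfis:
        for k in range(1, len(m) + 1):
            fis.update("".join(c) for c in combinations(sorted(m), k))
    return fis

fi = all_subsets(["abcd", "be", "cde"])
print(len(fi))  # 20
print(2 ** 20)  # 1048576 subsets of a single 20-item frequent set
```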


3.1 Previous work

  • Superset checking

    • A study shows that the CPU spends 40% of its time on superset checking.

  • The search tree is too large

  • Too many support counting operations

  • A more efficient method is needed
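To make the superset-checking cost concrete, here is a generic sketch of the check a depth-first MFI miner performs when it keeps a list of already-found MFIs (not the code of any specific previous algorithm). Itemsets are encoded as bit masks over a fixed, hypothetical item order; every new candidate must be compared with every known MFI, which is why the check becomes expensive as the MFI list grows.

```python
def to_mask(itemset, items="abcde"):
    """Encode an itemset as a bit mask over a fixed item order."""
    return sum(1 << items.index(i) for i in itemset)

def is_subsumed(candidate, found_masks):
    """Superset check: is candidate a subset of an already-found MFI?
    This scans every known MFI, which is what makes it expensive."""
    c = to_mask(candidate)
    return any(c & m == c for m in found_masks)

found = [to_mask("abcd")]
print(is_subsumed("bcd", found), is_subsumed("be", found))  # True False
```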


3.2 Search space

Given 5 items: a, b, c, d, e.

What is the search space?

Ø, a, b, c, d, e, ab, ac, ad, ae, bc, …, abcde

We use "head:tail" to denote the space as Ø:abcde, or simply :abcde.

What is the space of ab:cd?

ab, abc, abd, abcd
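The "head:tail" notation can be sketched as an enumeration: the space contains the head joined with every subset of the tail, including the empty subset. This reproduces the ab:cd example and the 32 itemsets (including Ø) of :abcde.

```python
from itertools import combinations

def space(head: str, tail: str) -> list[str]:
    """Enumerate the search space head:tail, i.e. head plus every
    subset of tail (including the empty subset)."""
    out = []
    for k in range(len(tail) + 1):
        for extra in combinations(tail, k):
            out.append("".join(sorted(head + "".join(extra))))
    return out

print(space("ab", "cd"))  # ['ab', 'abc', 'abd', 'abcd']
print(len(space("", "abcde")))  # 32
```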


3.2 Space decomposition

For a space :abcde, if abcg is frequent (its projection onto the space is abc),

  • Then the known space:

    • Any subset of abc is frequent

    • The known space is :abc

  • The unknown spaces are:

    • Any itemset containing d or e

    • d:abce and e:abc

  • :abcde = d:abce + e:abc + :abc
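The decomposition can be checked mechanically. This sketch represents each itemset as a frozenset, builds the three subspaces, and asserts that they are pairwise disjoint and together cover :abcde. Only d:abce and e:abc still need to be explored, since :abc is already known to be frequent.

```python
from itertools import combinations

def space(head, tail):
    """Set of itemsets in head:tail (head plus any subset of tail)."""
    return {
        frozenset(head) | frozenset(extra)
        for k in range(len(tail) + 1)
        for extra in combinations(tail, k)
    }

whole = space("", "abcde")
parts = [space("d", "abce"), space("e", "abc"), space("", "abc")]

# Sizes add up to the whole and the union is the whole, so the parts are disjoint.
assert sum(len(p) for p in parts) == len(whole) == 32
assert set().union(*parts) == whole
print([len(p) for p in parts])  # [16, 8, 8]
```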


3.3 The basic idea

Figure: (a) Previous approach: node A1 creates children B1, B2, …, Bn, so B2 is created before B1 is explored. (b) SmartMiner strategy: B' is created after B1 is explored.

Using information from B1 to prune the space at B'

SmartMiner takes advantage of the information from previous steps.


3.3 The tail information

  • For the space :abcde, if we know abcf, abcg and abfg are frequent, then we project them onto the space:

    • abcf → abc

    • abcg → abc

    • abfg → ab

  • Thus

    • Tinf(abcf, abcg, abfg | :abcde) = {abc}, since ab ⊂ abc is not maximal and adds no information.
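A minimal sketch of computing tail information, assuming the definition suggested by the example: keep the known frequent itemsets that contain the head, intersect each with the tail, and keep only the maximal projections. The function name `tail_info` is hypothetical.

```python
def tail_info(frequent, head, tail):
    """Project known frequent itemsets onto head:tail and keep only the
    maximal projections (assumed definition, following the slide)."""
    h, t = set(head), set(tail)
    projected = {frozenset(set(x) & t) for x in frequent if h <= set(x)}
    maximal = {p for p in projected if not any(p < q for q in projected)}
    return sorted("".join(sorted(p)) for p in maximal)

print(tail_info(["abcf", "abcg", "abfg"], head="", tail="abcde"))  # ['abc']
```

Since every subset of abc is then known to be frequent, the miner can skip :abc and only explore the remaining subspaces.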


3.4 Running time on Mushroom


3.5 Summary

  • SmartMiner uses tail information to guide the mining. It is efficient because it has:

    • A smaller search tree

    • No superset checking

    • Fewer support counting operations


4. Building Directories

  • 4.1 Introduction

  • 4.2 Knowledge Hierarchies

  • 4.3 User Specification

  • 4.4 Directory Generation

  • 4.5 Integrating various directories

  • 4.6 Summary


4.1 Introduction

Three inputs:

  • Topics (key content)

  • Knowledge trees (meaningful)

  • User specs (customized)


4.2 Knowledge Hierarchies

  • UMLS concept hierarchies

    • PA: parent-child relationship

    • RA: rather-than relationship

  • Problems

    • A concept may have several parents at different granularities:

      • [lung cancer] → [Neoplasms, Respiratory Tract]

      • [lung cancer] → [Neoplasms, Respiratory System]

    • A concept may have hundreds of paths to the roots:

      • [lung cancer]: 233 different paths in UMLS via PA
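Why a single concept can have hundreds of root paths: in a multi-parent hierarchy the path counts of the parents add up, so they grow multiplicatively with depth. The sketch below counts upward paths with memoized recursion over a small hypothetical hierarchy (`PARENTS`, with invented node names, not UMLS data).

```python
from functools import lru_cache

# Hypothetical hierarchy (child -> parents); roots have no parents.
PARENTS = {
    "C": ["P1", "P2"],
    "P1": ["G1", "G2"],
    "P2": ["G2"],
    "G1": ["ROOT"],
    "G2": ["ROOT", "ROOT2"],
    "ROOT": [],
    "ROOT2": [],
}

@lru_cache(maxsize=None)
def paths_to_roots(concept: str) -> int:
    """Number of distinct upward paths from concept to any root."""
    parents = PARENTS.get(concept, [])
    if not parents:
        return 1
    return sum(paths_to_roots(p) for p in parents)

print(paths_to_roots("C"))  # 5
```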


4.2 Select Proper Hierarchies

  • Set a source preference order, e.g.:

    • [disease]: ICD9 > SNOMED > MeSH

    • [body part]: SNOMED > ICD9

  • Select proper granularity

    • C: a set of concepts; n: a path node

    • Score function for selecting the node n:

      • $S(n) = |\{c_i \mid c_i \preceq n,\ c_i \in C\}|$, where $c_i \preceq n$ means $c_i$ lies at or under $n$ in the hierarchy

  • Expert review
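A sketch of the two selection steps, under stated assumptions. The preference orders are the slide's examples. For the score, each concept in C is assumed to come with the set of nodes on its selected path (including itself), and `NODE_SETS` is invented data. Note that S(n) alone is largest at the root, so the slides presumably combine it with other criteria and expert review; that part is not sketched here.

```python
from collections import Counter

# Source preference per semantic type (the slide's examples).
PREFERENCE = {"disease": ["ICD9", "SNOMED", "MeSH"], "body part": ["SNOMED", "ICD9"]}

def pick_source(sem_type, available):
    """Return the most preferred source vocabulary that is available."""
    for source in PREFERENCE.get(sem_type, []):
        if source in available:
            return source
    return None

def scores(node_sets):
    """S(n): how many concepts in C have node n at or above them."""
    s = Counter()
    for nodes in node_sets.values():
        s.update(nodes)
    return s

# Hypothetical concepts in C -> nodes on their selected path (incl. itself).
NODE_SETS = {
    "c1": {"c1", "n_lung", "n_chest", "n_root"},
    "c2": {"c2", "n_lung", "n_chest", "n_root"},
    "c3": {"c3", "n_heart", "n_chest", "n_root"},
}
s = scores(NODE_SETS)
print(s["n_lung"], s["n_chest"], s["n_root"])  # 2 3 3
print(pick_source("disease", {"SNOMED", "MeSH"}))  # SNOMED
```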


4.3 User Specifications

  • A good directory reflects the usage pattern

  • User specs capture usage patterns

  • Users may have different specs

  • A spec: an ordered series of knowledge names

    • [disease] + [body part], or

    • [body part] + [disease]

  • Build a directory for a spec following its ordering


4.4 Directory Generation: An example

User spec 1: d + p

[disease] + [body part]

User spec 2: p + d

[body part] + [disease]


4.4 ~ An example

Figure: the directories generated for spec d + p and for spec p + d
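The slides do not spell out the generation algorithm, so this is only a minimal sketch under the assumption that each document has one path per knowledge tree (d = disease, p = body part). A spec such as d + p concatenates those paths in spec order, and documents sharing a full path end up in the same directory leaf. `DOCS` and its parent nodes ("neoplasm", "chest", ...) are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents: knowledge type -> path in that knowledge tree.
DOCS = {
    "doc1": {"d": ("neoplasm", "lung cancer"), "p": ("chest", "lung")},
    "doc2": {"d": ("infection", "pneumonia"), "p": ("chest", "lung")},
    "doc3": {"d": ("neoplasm", "lung cancer"), "p": ("chest", "bronchus")},
}

def directory_paths(docs, spec):
    """Concatenate each document's knowledge-tree paths in spec order,
    e.g. spec ("d", "p") for d + p, and group documents by full path."""
    paths = defaultdict(list)
    for doc_id, trees in docs.items():
        path = tuple(node for kind in spec for node in trees[kind])
        paths[path].append(doc_id)
    return dict(paths)

def print_directory(paths):
    """Print the merged directory tree with documents under each leaf."""
    shown = set()
    for path in sorted(paths):
        for depth, label in enumerate(path):
            if path[: depth + 1] not in shown:
                shown.add(path[: depth + 1])
                print("  " * depth + label)
        print("  " * len(path) + ", ".join(paths[path]))

print_directory(directory_paths(DOCS, ("d", "p")))
print_directory(directory_paths(DOCS, ("p", "d")))
```

The same documents appear under both specs, only in a different nesting order.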


4.4 ~ Algorithm


4.5 Integrating various directories

  • For each document Di, get all directory paths to Di

  • Each Di is a tree: XML

  • Keywords can be associated with tree nodes

  • Query: XPath

  • Redundant information exists
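A sketch of mapping one document's directory paths to an XML tree with Python's standard `xml.etree.ElementTree`. Hypothetical node ids (d1, d6, p2, p6) are used as tag names, because concept names with spaces are not valid XML tags. Paths sharing a prefix are merged, but the same concepts still appear once per spec, which is the redundancy noted above.

```python
import xml.etree.ElementTree as ET

def doc_to_xml(doc_id: str, paths: list[tuple[str, ...]]) -> ET.Element:
    """Merge all directory paths leading to one document into one tree.
    Node ids are used as tag names; shared prefixes are merged."""
    doc = ET.Element("doc", id=doc_id)
    for path in paths:
        node = doc
        for tag in path:
            child = node.find(tag)  # direct child with this tag, if any
            if child is None:
                child = ET.SubElement(node, tag)
            node = child
    return doc

# Hypothetical paths to one document under specs d + p and p + d.
doc = doc_to_xml("doc1", [("d1", "d6", "p2", "p6"), ("p2", "p6", "d1", "d6")])
print(ET.tostring(doc, encoding="unicode"))
```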


4.5 Simplified model

  • Keep only the first-level knowledge trees

  • For //d6//p6, we use the XPath query

    //doc[.//d6 and .//p6]

  • Smaller size, but queries require some computation
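In the simplified model each document keeps one subtree per knowledge tree, so a directory query for d6 under p6 (or vice versa) becomes "documents containing both d6 and p6". Inside an XPath predicate, a path starting with // is evaluated from the document root, so testing descendants of each doc needs the relative form .//d6. Python's `xml.etree.ElementTree` supports only a limited XPath subset without `and` in predicates, so this sketch does the test in Python; the XML collection is hypothetical. A full XPath 1.0 engine such as lxml's `xpath()` method could run the query directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection in the simplified model: each document keeps
# one path per knowledge tree (disease, body part), not one per spec.
XML = """
<docs>
  <doc id="r1"><d1><d6/></d1><p2><p6/></p2></doc>
  <doc id="r2"><d1><d6/></d1><p2><p5/></p2></doc>
  <doc id="r3"><d2><d7/></d2><p2><p6/></p2></doc>
</docs>
"""

def docs_with(root, *tags):
    """Same result as //doc[.//t1 and .//t2 ...], computed in Python
    because ElementTree's XPath subset cannot express this predicate."""
    return [
        doc.get("id")
        for doc in root.iter("doc")
        if all(doc.find(f".//{t}") is not None for t in tags)
    ]

root = ET.fromstring(XML.strip())
print(docs_with(root, "d6", "p6"))  # ['r1']
```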


4.6 Summary

  • Build directory by

    • Topics

    • Knowledge hierarchies

    • User specifications

  • Mapping directories to XML

    • By collecting directory paths for each document

    • Leverage existing XML technologies

