slide1

Lexical Tools Briefing

The Lexical Systems Group

NLM. LHNCBC. CGSB

June, 2006

slide2

Table of Contents

  • Introduction
  • Lexical Tools
  • Lvg
  • Norm
  • Text Categorization
  • Questions
slide16

Lexical Tools

[Diagram: Input → Lexical Tools → Output.1, Output.2, Output.3, …]

  • A suite of text utilities that generate, mutate, and filter out lexical variants from the given input
slide17

Four Tools

[Diagram: Input → Lvg, Norm, LuiNorm, WordIndex → Output.1, Output.2, Output.3, …]

slide18

Tool Types

  • Command line tools
    • lvg (Lexical Variants Generation)
    • norm
    • luiNorm
    • wordInd
  • Lexical GUI Tool (lgt)
  • Web Tools
  • Java APIs
slide19

Functions

  • Used in natural language processing for
    • aggressive text pattern matching
    • creating normalized and expanded terms
    • making word, term, phrase indexes
    • matching queries with indexed entries
    • increasing recall and/or precision
slide20

Facts

  • Released annually
  • 100% Java (since 2002)
  • Freely distributed with open source code
  • Runs on different platforms
  • One complete package
  • Documents & support
slide21

Lexical Variants Generation

slide22

LVG, 2006

  • 58 flow components
  • 37 options
    • input filter options (3)
    • global behavior options (13)
    • flow specific options (2)
    • output filter options (19)
slide23

Flow Components

[Diagram: input term "leave" → inflect → leave, leaves, leaving, left]

slide24

Command Line Tool

> lvg -f:i

leave

leave|leave|128|1|i|1|

leave|leave|128|512|i|1|

leave|leaves|128|8|i|1|

leave|left|1024|64|i|1|

leave|left|1024|32|i|1|

leave|leave|1024|1|i|1|

leave|leave|1024|262144|i|1|

leave|leave|1024|1024|i|1|

leave|leaves|1024|128|i|1|

leave|leaving|1024|16|i|1|

slide25

Fielded Output

> lvg -f:i

leave

leave|leave|128|1|i|1|

Fields, in order:

  • Input Term (leave)
  • Output Term (leave)
  • Categories (128)
  • Inflections (1)
  • Flow history (i)
  • Flow Number (1)
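The fielded output above can be read programmatically. The following is a minimal Python sketch, not part of the Lexical Tools distribution: the names `LvgRecord`, `parse_lvg_line`, and `set_bits` are hypothetical. It assumes the default "|" separator and numeric category and inflection fields (that is, no name-display options), and it treats those two numbers as bit masks, which is how values such as 513 and 2047 read in the examples.

```python
from typing import List, NamedTuple


class LvgRecord(NamedTuple):
    """One line of fielded lvg output (hypothetical helper type)."""
    input_term: str
    output_term: str
    categories: int    # assumed to be a bit mask
    inflections: int   # assumed to be a bit mask
    flow_history: str
    flow_number: int


def parse_lvg_line(line: str, sep: str = "|") -> LvgRecord:
    """Split one output line into its six leading fields."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 fields, got {len(fields)}")
    return LvgRecord(
        fields[0], fields[1], int(fields[2]), int(fields[3]),
        fields[4], int(fields[5]),
    )


def set_bits(mask: int) -> List[int]:
    """Return the individual powers of two that make up a bit mask."""
    return [1 << i for i in range(mask.bit_length()) if (mask >> i) & 1]


record = parse_lvg_line("leave|leaves|128|8|i|1|")
print(record.output_term, record.flow_history, set_bits(record.categories))
```

The sketch deliberately does not map bit values to category or inflection names, since the slides do not list that table.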

slide26

A Serial Flow

[Diagram: Input term → lowercase → strip diacritics → remove possessive → remove stop words → strip punctuation → word order sort → Output term]

  • Flow components can be arranged so that the output of one is the input to another.
slide27

A Serial Flow - Example

> lvg -f:l:q:g:t:p:w

The Gougerot-Sjögren's Syndrome

The Gougerot-Sjögren's Syndrome|gougerotsjogren syndrome|2047|16777215|l+q+g+t+p+w|1|
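The idea of a serial flow, where each component's output feeds the next, can be sketched in Python as plain function composition. This is an illustration only and not how lvg is implemented: every function below is a simplified stand-in written for this example, and `TOY_STOP_WORDS` is a made-up list rather than the tool's real stop word table.

```python
import re
import string
import unicodedata
from functools import reduce
from typing import Callable, List

Component = Callable[[str], str]

TOY_STOP_WORDS = {"the", "of", "and", "with"}  # hypothetical, not lvg's list


def lowercase(term: str) -> str:
    return term.lower()


def strip_diacritics(term: str) -> str:
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_possessive(term: str) -> str:
    return re.sub(r"'s\b", "", term)


def remove_stop_words(term: str) -> str:
    return " ".join(w for w in term.split() if w not in TOY_STOP_WORDS)


def strip_punctuation(term: str) -> str:
    return term.translate(str.maketrans("", "", string.punctuation))


def sort_words(term: str) -> str:
    return " ".join(sorted(term.split()))


def run_serial_flow(term: str, flow: List[Component]) -> str:
    """Feed the output of each component into the next one."""
    return reduce(lambda current, component: component(current), flow, term)


# Same order as the l:q:g:t:p:w flow in the example above.
FLOW = [lowercase, strip_diacritics, remove_possessive,
        remove_stop_words, strip_punctuation, sort_words]

print(run_serial_flow("The Gougerot-Sj\u00f6gren's Syndrome", FLOW))
# prints: gougerotsjogren syndrome
```

Because each component has the same string-in, string-out shape, reordering or dropping a step only means editing the `FLOW` list.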

slide28

Parallel Flows

[Diagram: Input term → noOperation → Output term; Input term → Uninflect → synonyms → Output terms]

  • Multiple flows can be defined
slide29

Parallel Flows - Example

> lvg -f:n -f:B:y

ear

ear|ear|2047|1048575|n|1|

ear|aural|1|1|B+y|2|

ear|auricularis|1|1|B+y|2|

ear|otic|1|1|B+y|2|

ear|otor|1|1|B+y|2|
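A rough Python sketch of parallel flows follows, assuming each component maps one term to a list of terms and each flow is numbered in the order given. The lookup tables are toy data built from the example output above, not the real lexicon, and the category and inflection fields are left out for brevity. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Component = Callable[[str], List[str]]
Flow = List[Tuple[str, Component]]  # (flag, component) pairs

# Toy data for illustration only.
TOY_BASE_FORMS: Dict[str, str] = {"ears": "ear"}
TOY_SYNONYMS: Dict[str, List[str]] = {
    "ear": ["aural", "auricularis", "otic", "otor"],
}


def no_operation(term: str) -> List[str]:
    return [term]


def uninflect(term: str) -> List[str]:
    return [TOY_BASE_FORMS.get(term, term)]


def synonyms(term: str) -> List[str]:
    return list(TOY_SYNONYMS.get(term, []))


def run_flow(term: str, flow: Flow) -> List[str]:
    """Run one serial flow; each step may fan out to several terms."""
    terms = [term]
    for _flag, component in flow:
        terms = [out for t in terms for out in component(t)]
    return terms


def run_parallel(term: str, flows: List[Flow]) -> List[str]:
    """Run every flow on the same input and tag rows with history and number."""
    rows = []
    for number, flow in enumerate(flows, start=1):
        history = "+".join(flag for flag, _ in flow)
        for out in run_flow(term, flow):
            rows.append(f"{term}|{out}|{history}|{number}")
    return rows


FLOWS: List[Flow] = [
    [("n", no_operation)],
    [("B", uninflect), ("y", synonyms)],
]

for row in run_parallel("ear", FLOWS):
    print(row)
```

The flow number in the last field is what lets a caller tell which flow produced each variant.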

slide30

Input Filter Options

Take field 7 from the input

> lvg -f:u -t:7 -F:8:6

C0035440|ENG|S|L0035434|VW|S0003894|Rheumatic carditis, acute

acute Rheumatic carditis|S0003894
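The field handling in this example can be mimicked with a short Python sketch. It rests on assumptions read off the example rather than the lvg documentation: the term is taken from a 1-based field number, the result is appended after the input fields (so it becomes field 8 here), and the requested output fields are printed in the order given. The `uninvert` function is a deliberately naive stand-in that only reverses comma-separated parts.

```python
from typing import Sequence


def uninvert(term: str) -> str:
    """Naive stand-in: 'B, A' becomes 'A B'."""
    parts = [part.strip() for part in term.split(",")]
    return " ".join(reversed(parts))


def process_record(line: str, term_field: int = 7,
                   out_fields: Sequence[int] = (8, 6),
                   sep: str = "|") -> str:
    """Pick the term from one field, transform it, print selected fields."""
    fields = line.rstrip("\n").split(sep)
    fields.append(uninvert(fields[term_field - 1]))  # becomes the next field
    return sep.join(fields[i - 1] for i in out_fields)


record = "C0035440|ENG|S|L0035434|VW|S0003894|Rheumatic carditis, acute"
print(process_record(record))
# prints: acute Rheumatic carditis|S0003894
```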

slide31

Global Behavior Options

Change separator to "\"

> lvg -f:L -f:E -s:"\"

otitis

otitis\otitis\128\513\L\1

otitis\E0044452\128\513\E\2

slide32

Output Filter Options

Show the category and inflection names

> lvg -f:L -SC -SI

hot

hot|hot|<adj+verb>|<base+positive+infinitive+pres1p23p>|L|1|

slide33

Norm

  • Composed of 11 Lvg flow components to abstract away from:
    • case
    • punctuation
    • possessive forms
    • inflections
    • spelling variants
    • stop words
    • diacritics & ligatures
    • word order
slide34

Norm

g: remove genitives

rs: remove parenthetic plural forms

o: replace punctuation with spaces

t: strip stop words

q: strip diacritics

q2: split ligature

l: lowercase

B: uninflect each word in a term

Ct: retrieve citations

w: sort words in alphabetical order

q4: get symbol names synonymy

slide46

Norm

Hodgkin's Diseases, NOS

  • g: remove genitives → Hodgkin Diseases, NOS
  • rs: remove parenthetic plural forms → Hodgkin Diseases, NOS
  • o: replace punctuation with spaces → Hodgkin Diseases NOS
  • t: strip stop words → Hodgkin Diseases
  • q: strip diacritics → Hodgkin Diseases
  • q2: split ligature → Hodgkin Diseases
  • l: lowercase → hodgkin diseases
  • B: uninflect each word in a term → hodgkin disease
  • Ct: retrieve citations → hodgkin disease
  • w: sort words in alphabetical order → disease hodgkin
  • q4: get symbol names synonymy → disease hodgkin

slide47

Norm: Example

  • Hodgkin Disease
  • HODGKINS DISEASE
  • Hodgkin's Disease
  • Disease, Hodgkin's
  • HODGKIN'S DISEASE
  • Hodgkin's disease
  • Hodgkins Disease
  • Hodgkin's disease NOS
  • Hodgkin's disease, NOS
  • Disease, Hodgkins
  • Diseases, Hodgkins
  • Hodgkins Diseases
  • Hodgkins disease
  • hodgkin's disease
  • Disease;Hodgkins
  • Disease, Hodgkin

All of these normalize to: disease hodgkin
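The many-to-one effect shown here can be imitated with a toy normalizer in Python. This is a sketch under loud assumptions and not the norm program: it covers only the g, o, t, l, B, and w steps, and both `TOY_STOP_WORDS` and `TOY_BASE_FORMS` are tiny hypothetical tables that stand in for the real stop word list and lexicon.

```python
import re
import string
from collections import defaultdict
from typing import Dict, List

TOY_STOP_WORDS = {"nos", "of", "and", "the"}                      # hypothetical
TOY_BASE_FORMS = {"diseases": "disease", "hodgkins": "hodgkin"}   # hypothetical

# Map every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation,
                               " " * len(string.punctuation))


def toy_norm(term: str) -> str:
    term = re.sub(r"'s\b", "", term, flags=re.IGNORECASE)  # g: genitives
    term = term.translate(PUNCT_TO_SPACE)                   # o: punctuation
    words = [w for w in term.split()
             if w.lower() not in TOY_STOP_WORDS]            # t: stop words
    words = [w.lower() for w in words]                      # l: lowercase
    words = [TOY_BASE_FORMS.get(w, w) for w in words]       # B: uninflect
    return " ".join(sorted(words))                          # w: sort words


variants = [
    "Hodgkin Disease",
    "HODGKIN'S DISEASE",
    "Disease, Hodgkin's",
    "Hodgkin's disease, NOS",
    "Diseases, Hodgkins",
    "Disease;Hodgkins",
]

# Group the surface strings by their normalized key.
index: Dict[str, List[str]] = defaultdict(list)
for variant in variants:
    index[toy_norm(variant)].append(variant)

print(dict(index))
```

With these toy tables, all six variants land under the single key "disease hodgkin", which is the property that makes a normalized index useful for matching.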

slide48

Text Categorization

  • Based on Journal Descriptor Indexing (JDI) methodology
  • Uses a small set of high-level descriptors, such as Journal Descriptors (JDs), Semantic Types (STs), and MeSH subcategories
  • Used to categorize text, index content, retrieve records, and disambiguate word senses
slide49

Text Categorization

  • Freely distributed with open source code
  • 100% Java
  • Runs on different platforms
  • One complete package
  • Documents & support
  • Provides Java APIs, command line tools, GUI tools, and Web tools
  • Planned first release, TC 2007
slide50

Text Categorization

  • Word sense disambiguation (WSD)

[Diagram: Free Text → MetaMap (MMTX) → Metathesaurus Concept]

slide51

Text Categorization

  • Word sense disambiguation (WSD)

[Diagram: Free Text → MetaMap (MMTX) → Concept 1, Concept 2, …, Concept n]

slide52

Text Categorization

  • Word sense disambiguation (WSD)

[Diagram: Free Text → MetaMap (MMTX) → Concept 1, Concept 2, …, Concept n → TC → Best Concept]

slide53

Text Categorization

  • Word sense disambiguation (WSD)

[Diagram: "….. transport..." → MetaMap (MMTX) → Patient Transport (ST: Health Care Activity) or Biological Transport (ST: Cell Function) → TC → Best Concept]
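One way to picture the final selection step is as a ranking over candidate concepts by the score of their Semantic Type. The Python sketch below assumes that the categorizer yields a per-ST relevance score for the surrounding text. The slides do not say how TC scores or selects, so `pick_best_concept` is a hypothetical helper and the numbers are invented for illustration.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, str]  # (concept name, semantic type)


def pick_best_concept(candidates: List[Candidate],
                      st_scores: Dict[str, float]) -> str:
    """Return the concept whose semantic type scores highest for the text."""
    best = max(candidates, key=lambda c: st_scores.get(c[1], 0.0))
    return best[0]


candidates = [
    ("Patient Transport", "Health Care Activity"),
    ("Biological Transport", "Cell Function"),
]

# Made-up scores, as if the surrounding text were about cell biology.
made_up_scores = {"Health Care Activity": 0.12, "Cell Function": 0.61}

print(pick_best_concept(candidates, made_up_scores))
# prints: Biological Transport
```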

slide54

Questions

  • Lexical Systems Group: http://umlslex.nlm.nih.gov
  • Lexical Tools: http://umlslex.nlm.nih.gov/lvg
slide55

Application

[Diagram: Metathesaurus English strings → norm → normalized string index (MRXNS.ENG); WordInd → normalized word index (MRXNW.ENG)]

slide56

Application

[Diagram: Query → norm → normed term → normalized string index and normalized word index → SUIs → Metathesaurus concepts that match the normalized query]

slide57

Example

[Diagram: Query "Dry Eyes Syndrome" → norm → normed term "dry eye syndrome"]

slide58

Example (Cont.)

[Diagram: normed term → SUIs]

ENG|dry eye syndrome|C0013238|L0013238|S0004019|

ENG|dry eye syndrome|C0013238|L0013238|S0035652|

ENG|dry eye syndrome|C0013238|L0013238|S0090228|

ENG|dry eye syndrome|C0013238|L0013238|S0090454|

ENG|dry eye syndrome|C0013238|L0013238|S0220550|

ENG|dry eye syndrome|C0013238|L0013238|S0368350|

ENG|dry eye syndrome|C0013238|L0013238|S1459074|

slide59

Example (Cont.)

[Diagram: SUIs → MRCON]

C0013238|ENG|P|L0013238|VS |S0004019|Dry eye syndrome

C0013238|ENG|P|L0013238|VS |S0368350|Dry Eye Syndrome

C0013238|ENG|P|L0013238|VS |S1459074|dry eye syndrome

C0013238|ENG|P|L0013238|VWS|S0090228|Syndrome, Dry Eye

C0013238|ENG|P|L0013238|VWS|S0220550|Dry, eye syndrome

C0013238|ENG|P|L0013238|VW |S0090454|Syndromes, Dry Eye

C0013238|ENG|P|L0013238|PF |S0035652|Dry Eye Syndromes
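The two-step lookup in this example (normed term to SUIs, then SUIs to Metathesaurus strings) can be sketched in Python with a few of the rows shown above. The column positions are read off those rows: the normalized string is field 2 and the SUI is field 5 of the index rows, while the CUI, SUI, and string are fields 1, 6, and 7 of the MRCON rows. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A subset of the rows from the slides above.
MRXNS_ROWS = [
    "ENG|dry eye syndrome|C0013238|L0013238|S0004019|",
    "ENG|dry eye syndrome|C0013238|L0013238|S0090228|",
    "ENG|dry eye syndrome|C0013238|L0013238|S0090454|",
]
MRCON_ROWS = [
    "C0013238|ENG|P|L0013238|VS |S0004019|Dry eye syndrome",
    "C0013238|ENG|P|L0013238|VWS|S0090228|Syndrome, Dry Eye",
    "C0013238|ENG|P|L0013238|VW |S0090454|Syndromes, Dry Eye",
]


def build_string_index(rows: List[str]) -> Dict[str, List[str]]:
    """Map each normalized string to the SUIs listed for it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        fields = row.split("|")
        index[fields[1]].append(fields[4])
    return index


def build_sui_table(rows: List[str]) -> Dict[str, Tuple[str, str]]:
    """Map each SUI to its (CUI, original string) pair."""
    table = {}
    for row in rows:
        fields = row.split("|")
        table[fields[5]] = (fields[0], fields[6].strip())
    return table


def lookup(normed_query: str,
           string_index: Dict[str, List[str]],
           sui_table: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Resolve a normalized query to (CUI, string) pairs via its SUIs."""
    suis = string_index.get(normed_query, [])
    return [sui_table[sui] for sui in suis if sui in sui_table]


string_index = build_string_index(MRXNS_ROWS)
sui_table = build_sui_table(MRCON_ROWS)
for cui, text in lookup("dry eye syndrome", string_index, sui_table):
    print(cui, text)
```

The query passed to `lookup` is expected to be already normalized, as in the "Dry Eyes Syndrome" to "dry eye syndrome" step shown earlier.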
