
Automatic Acquisition of Lexical Classes and Extraction Patterns for Information Extraction

Kiyoshi Sudo

Ph.D. Research Proposal

New York University

Committee:

Ralph Grishman

Satoshi Sekine

I. Dan Melamed

Outline
  • Introduction
  • Research Proposal
    • Problem Setting
    • Approach
    • Application to Information Extraction
  • Discussion

Kiyoshi Sudo Thesis Proposal Presentation

MUC Scenario Template Task

MURREE, Pakistan (AP) -- Masked gunmen firing Kalashnikov rifles burst through the front gates of a Christian school Monday, killing six people and wounding three in the latest attack against Western interests since Pakistan joined the war against terrorism.

Phrases marked in the passage above: Monday; Masked gunmen; six people; Kalashnikov rifles; a Christian school; three

High Cost for Acquiring Knowledge-Base
  • Find extraction patterns
    • Find relevant documents
    • Find relevant events
    • Analyze sentences
  • Find domain-specific lexicon
    • Find existing KB (e.g. thesaurus, gazetteers)

Prior Work

Automatic Knowledge Acquisition: Lexical Acquisition and Pattern Acquisition
  • Mutual Bootstrapping (Riloff and Jones 1999)
  • Pattern Discovery with Document Re-ranking (Yangarber et al. 2000)
  • Simultaneous Multi-Semantic Class (Thelen and Riloff 2002; Yangarber et al. 2002)
  • Pattern Acquisition for QA (Ravichandran and Hovy 2002)


Challenge

[Diagram: MUC-3: Terrorism Event; User; Seed Lexicon; Seed Pattern; Expanded Lexicon; Expanded Pattern Set; Knowledge Base]


Meeting the Challenge

[Diagram: Scenario Description; Semantic Clustering; Semantic Cluster; User; Seed Lexicon; Seed Pattern; Expanded Lexicon; Expanded Pattern Set; Knowledge Base]


Semantic Clustering

[Diagram: Scenario Description; Semantic Clustering; Semantic Cluster (Semantic Lexicon, Extraction Patterns)]

  • Input: a description specific enough to define the scenario (terrorism, bombing, kidnapping)
    • e.g. “Tell me about the terrorism action, such as bombing and kidnapping.”
  • Goal: find scenario-specific Semantic Clusters, each of which consists of
    • Semantic Lexicon
    • Extraction Patterns
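
One way to picture a semantic cluster as a data structure is sketched below in Python. This is only an illustration: the class name `SemanticCluster`, its field names, the cluster label "weapon", and the sample entries are assumptions made for this sketch and are not taken from the proposal.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# A pattern is written here as a tuple of context elements, with "x"
# standing for the slot to be filled (an assumption of this sketch).
Pattern = Tuple[str, ...]


@dataclass
class SemanticCluster:
    """A scenario-specific cluster: a semantic lexicon plus extraction patterns."""
    name: str                                            # hypothetical label
    lexicon: Set[str] = field(default_factory=set)       # lexical items in the class
    patterns: Set[Pattern] = field(default_factory=set)  # contexts that select the class


# Illustrative entries only; they are not results from the proposal.
weapon = SemanticCluster(
    name="weapon",
    lexicon={"rifles"},
    patterns={("terrorist", "fire", "x")},
)
```

Keeping the lexicon and the patterns in one object reflects the goal stated above, where each cluster consists of both parts.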


Benefit for User

[Diagram: Scenario Description; Semantic Clustering; Semantic Cluster]

  • Simplify Domain Analysis
  • Low-cost Knowledge-base Acquisition for IE systems


Extraction Patterns
  • Definition: [formula missing in source], where c unifies with the context that is defined by semantic class L
  • context =
    • Sequential: (x, bombs, himself)
    • Case Frame: (bomb (v), x (subj), himself (obj))
    • Dependency: bomb, with x as V:subj and himself as V:obj

(cf. Sudo et al. 2001)
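
The case-frame form above can be made concrete with a small Python sketch. It assumes a clause has already been parsed into a dictionary of grammatical roles; the dictionary layout, the function name `match_case_frame`, and the use of the literal string "x" for the extracted slot are assumptions of this sketch, not the representation used in the proposal.

```python
from typing import Dict, Optional

# Case-frame pattern from the slide: (bomb (v), x (subj), himself (obj)).
# "x" marks the slot whose filler is extracted (assumption of this sketch).
PATTERN: Dict[str, str] = {"verb": "bomb", "subj": "x", "obj": "himself"}


def match_case_frame(pattern: Dict[str, str], clause: Dict[str, str]) -> Optional[str]:
    """Return the filler of the x slot if the clause matches, else None."""
    if pattern["verb"] != clause.get("verb"):
        return None
    extracted = None
    for role, filler in pattern.items():
        if role == "verb":
            continue
        value = clause.get(role)
        if value is None:
            return None          # the clause lacks a role the pattern requires
        if filler == "x":
            extracted = value    # variable slot: capture whatever fills it
        elif filler != value:
            return None          # fixed slot: the word must agree
    return extracted


# Hypothetical parsed clause, with the verb already lemmatized.
clause = {"verb": "bomb", "subj": "attacker", "obj": "himself"}
result = match_case_frame(PATTERN, clause)
```

Under these assumptions `result` holds the subject of the matching clause, while a clause with a different verb or object yields `None`.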

Outline
  • Introduction
  • Research Proposal
    • Problem Setting
    • Approach
    • Information Extraction
  • Evaluation


Overview

[Diagram of Semantic Clustering: Scenario Description; Source; Information Retrieval; Bootstrapping; Query Expansion; Semantic Cluster]

Information Retrieval
  • Get Relevant Document set
  • Get list of lexical items and extraction patterns ordered by relevance to the scenario
    • TF/IDF scoring
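
A minimal Python sketch of ranking terms by a TF/IDF score follows. The slide does not give the exact formula, so the weighting used here (term frequency in the relevant documents times the log inverse document frequency over the whole collection) is an assumption, as are the function name `tfidf_rank` and the toy documents. Extraction patterns could be ranked the same way by treating each pattern as a term.

```python
import math
from collections import Counter
from typing import List


def tfidf_rank(relevant_docs: List[List[str]], all_docs: List[List[str]]) -> List[str]:
    """Rank terms seen in the relevant documents, most relevant first.

    Assumes relevant_docs is a subset of all_docs, so every term has df >= 1.
    """
    n = len(all_docs)
    df = Counter()                      # number of documents containing each term
    for doc in all_docs:
        df.update(set(doc))
    tf = Counter()                      # term frequency within the relevant set
    for doc in relevant_docs:
        tf.update(doc)
    scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Toy collection (illustrative only, not data from the proposal).
d1 = ["company", "name", "president"]
d2 = ["president", "resign", "company"]
d3 = ["company", "report", "profit"]
d4 = ["market", "rise"]
d5 = ["company", "market"]
ranked = tfidf_rank([d1, d2], [d1, d2, d3, d4, d5])
```

In this toy collection "company" occurs in most documents, so it ranks below terms that are concentrated in the relevant set.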

R

Example of TF/IDF Scoring (Management Succession: Business)

300 documents retrieved

From WSJ (7/94 - 8/94)

Extracted by MINIPAR (Lin 1998)


Overview

[Diagram of Semantic Clustering: Scenario Description; Source; Information Retrieval; extraction patterns; lexicon; Bootstrapping; Query Expansion; Semantic Cluster]

Bootstrapping

Assumption:

  • Patterns provide Lexical Classes.
  • Lexicon provides contextual information.
  • Find one cluster that consists of Lexicon and Extraction Patterns

Riloff and Jones 1999

Agichtein and Gravano 2000

Bootstrapping (Cont.)
  • Algorithm (cf. Riloff and Jones 1999)
    • Given
      • the ordered list of terms
      • the ordered list of extraction patterns
      • Lexicon = (), Pattern = ()
    • w ← the most relevant term in the list; add it into Lexicon
    • p ← the most relevant pattern among those that extract w
    • Add p into Pattern
    • w ← the most relevant term among those that are extracted by p
    • Add w into Lexicon
    • Go to 1
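
The loop above can be sketched in Python under several stated assumptions. The step numbering is lost in the transcript, so "Go to 1" is read here as returning to the pattern-selection step. The mapping `extracts` (pattern to the set of terms it extracts), the rule of skipping items already chosen, the stopping conditions, and the toy data are all additions of this sketch rather than details from the proposal.

```python
from typing import Dict, List, Set, Tuple


def bootstrap(terms: List[str], patterns: List[str],
              extracts: Dict[str, Set[str]],
              max_rounds: int = 10) -> Tuple[List[str], List[str]]:
    """Grow one cluster (Lexicon, Pattern) from two relevance-ordered lists."""
    lexicon: List[str] = []
    pattern_set: List[str] = []
    if not terms:
        return lexicon, pattern_set
    w = terms[0]                         # most relevant term seeds the lexicon
    lexicon.append(w)
    for _ in range(max_rounds):
        # Most relevant unused pattern among those that extract w.
        p = next((q for q in patterns
                  if q not in pattern_set and w in extracts.get(q, set())), None)
        if p is None:
            break
        pattern_set.append(p)
        # Most relevant new term among those extracted by p.
        w_next = next((t for t in terms
                       if t not in lexicon and t in extracts[p]), None)
        if w_next is None:
            break
        lexicon.append(w_next)
        w = w_next
    return lexicon, pattern_set


# Toy inputs (illustrative only), both lists ordered most relevant first.
terms = ["president", "chairman", "company", "executive"]
patterns = ["(x, resign)", "(company, name, x)", "(x, Inc.)"]
extracts = {
    "(x, resign)": {"president", "chairman"},
    "(company, name, x)": {"chairman", "executive"},
    "(x, Inc.)": {"company"},
}
lexicon, pattern_set = bootstrap(terms, patterns, extracts)
```

On this toy input the cluster grows to three person-like terms and two patterns, and "(x, Inc.)" is never selected because it extracts none of the accepted terms.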

Example of Bootstrapping (Management Succession: Business)

From WSJ (7/94 - 8/94)

Extracted by MINIPAR (Lin 1998)

Problem: Polysemous Lexicon, Pattern
  • Lexicon can be ambiguous
    • e.g. Clinton (Person, Organization, Location … )
  • Extraction patterns can be ambiguous
    • e.g. be killed in <x> (x: Location, Date … )
  • Needs more study
    • more restriction
    • Probabilistic Model ??


Overview

[Diagram of Semantic Clustering: Scenario Description; Source; Information Retrieval; Bootstrapping; Query Expansion; Semantic Cluster; pt; lex; pattern; lexicon]

Query Expansion
  • Generalize terms in a query with a newly discovered cluster
    • cf. Rocchio 1971 (Vector model)
    • Zhai and Lafferty 2001 (Language-modeling)
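
As an illustration of the vector-model option, here is a minimal Python sketch of Rocchio-style query expansion over sparse term-weight dictionaries. How the proposal would feed a discovered cluster into the query is not specified, so treating cluster-derived term vectors as the positive feedback set is an assumption, as are the function name `rocchio`, the default weights, and the toy vectors.

```python
from collections import defaultdict
from typing import Dict, Sequence

Vector = Dict[str, float]


def rocchio(query: Vector, relevant: Sequence[Vector],
            nonrelevant: Sequence[Vector] = (),
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Move the query toward the relevant centroid and away from the non-relevant one."""
    new_q = defaultdict(float)
    for term, weight in query.items():
        new_q[term] += alpha * weight
    for vec in relevant:
        for term, weight in vec.items():
            new_q[term] += beta * weight / len(relevant)
    for vec in nonrelevant:
        for term, weight in vec.items():
            new_q[term] -= gamma * weight / len(nonrelevant)
    # Keep only terms with positive weight, as is common practice.
    return {t: w for t, w in new_q.items() if w > 0}


# Toy vectors (illustrative only).
query = {"bombing": 1.0}
feedback = [{"bombing": 1.0, "explosion": 1.0},
            {"bombing": 1.0, "kidnapping": 1.0}]
expanded = rocchio(query, feedback)
```

The expanded query keeps its original term with a higher weight and gains the terms that occur in the feedback vectors.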

Outline
  • Introduction
  • Research Proposal
    • Problem Setting
    • Approach
    • Application to Information Extraction
  • Discussion


Application to Information Extraction

[Diagram: Scenario Description; Semantic Clustering; Semantic Cluster (Semantic Lexicon, Extraction Patterns); Preprocessing; Entity Recognition; Event Recognition; Role Assignment; Pattern Matching; Merging]

Human Intervention
  • Extraction patterns
    • Event pattern
      • Context contains a verb or nominalization of verb
      • Used for event extraction and role assignment
      • e.g. (terrorist, fire, x)
    • Local pattern
      • Context contains only enough information to recognize semantic class
      • Used for entity recognition only
      • e.g. (x, Inc.)
  • Association of Event Pattern to Role
    • e.g. (company, hire, x) → PersonIn and (company, fire, x) → PersonOut

Outline
  • Introduction
  • Research Proposal
    • Problem Setting
    • Approach
    • Application to Information Extraction
  • Discussion

Discussion
  • Domain Portability
    • User only needs to specify the scenario
  • Language Portability
    • Language-dependent Tools
      • Segmentation (Lemmatization)
      • Dependency Parsing

Evaluation
  • MUC-style (Scenario-Template task)
    • Slot-based
      • Precision, Recall, F-measure
    • Domain Portability
      • Several pre-defined tasks that differ in difficulty
    • Language Portability
      • Japanese
      • English
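
The slot-based precision, recall, and F-measure named above can be sketched in Python as follows. This is a simplified exact-match version: official MUC scoring also handles partial matches and template alignment, which are omitted here. The slot names and sample fills are hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

Slot = Tuple[str, str]  # (slot name, filler)


def slot_prf(system: Set[Slot], gold: Set[Slot]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and balanced F-measure over slot fills."""
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical answer key and system output for one document.
gold = {("Perpetrator", "masked gunmen"), ("Weapon", "Kalashnikov rifles"),
        ("Target", "a Christian school"), ("Date", "Monday")}
system = {("Perpetrator", "masked gunmen"), ("Weapon", "Kalashnikov rifles"),
          ("Target", "front gates")}
p, r, f = slot_prf(system, gold)
```

Here two of the three system fills are correct and two of the four key fills are found, so precision is 2/3 and recall is 1/2.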

Contribution
  • Tool for Domain Analysis
  • Low-cost Knowledge-base Acquisition
  • Towards Open-domain Information Extraction

Conclusion
  • Proposed New Approach for Knowledge-base Acquisition (Semantic Clustering)
  • Discussed Application of Acquired KB to Information Extraction (Human Intervention and Local vs. Event patterns)
  • Discussed Evaluation with several predefined MUC-style tasks different in difficulty and across languages (Domain portability and Language portability)

ToDo
  • Implementation
  • Preparation for Evaluation
  • Evaluation

Time for Questions (Conclusion)
  • Proposed New Approach for Knowledge-base Acquisition (Semantic Clustering)
  • Discussed Application of Acquired KB to Information Extraction (Human Intervention and Local vs. Event patterns)
  • Discussed Evaluation with several predefined MUC-style tasks different in difficulty and across languages (Domain portability and Language portability)
