
Automatic Acquisition of Lexical Classes and Extraction Patterns for Information Extraction

Kiyoshi Sudo

Ph.D. Research Proposal

New York University

Committee:

Ralph Grishman

Satoshi Sekine

I. Dan Melamed



Outline

  • Introduction

  • Research Proposal

    • Problem Setting

    • Approach

    • Application to Information Extraction

  • Discussion

Kiyoshi Sudo Thesis Proposal Presentation



MUC Scenario Template Task

MURREE, Pakistan (AP) -- Masked gunmen firing Kalashnikov rifles burst through the front gates of a Christian school Monday, killing six people and wounding three in the latest attack against Western interests since Pakistan joined the war against terrorism.


MUC Scenario Template Task (slot fillers highlighted)

  • Masked gunmen

  • Kalashnikov rifles

  • a Christian school

  • Monday

  • six people

  • three
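To make the task concrete, the highlighted phrases can be recorded as one filled template. This is a minimal Python illustration only: the class `AttackTemplate` and its slot names (`perpetrator`, `instrument`, `target`, `date`, `killed`, `wounded`) are hypothetical and are not the official MUC-3 template slots, and the assignment of each phrase to a slot is one reading of the sentence.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical slot names chosen for illustration; they are NOT the
# official MUC-3 scenario template slot labels.
@dataclass
class AttackTemplate:
    perpetrator: Optional[str] = None
    instrument: Optional[str] = None
    target: Optional[str] = None
    date: Optional[str] = None
    killed: Optional[str] = None
    wounded: Optional[str] = None

    def filled_slots(self) -> dict:
        """Return only the slots that have a filler, in field order."""
        return {name: value for name, value in vars(self).items() if value is not None}


# Fillers taken from the highlighted phrases of the example sentence.
template = AttackTemplate(
    perpetrator="Masked gunmen",
    instrument="Kalashnikov rifles",
    target="a Christian school",
    date="Monday",
    killed="six people",
    wounded="three",
)
print(template.filled_slots())
```

An empty template reports no filled slots, so the same structure can represent a partially extracted event.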


High Cost of Acquiring the Knowledge Base

  • Find extraction patterns

    • Find relevant documents

    • Find relevant events

    • Analyze sentences

  • Find domain-specific lexicon

    • Find existing KB (e.g. thesaurus, gazetteers)


Prior Work

Automatic Knowledge Acquisition (Lexical Acquisition and Pattern Acquisition)

  • Mutual Bootstrapping (Riloff and Jones 1999)

  • Pattern Discovery with Document Re-ranking (Yangarber et al. 2000)

  • Simultaneous Multi-Semantic Class (Thelen and Riloff 2002; Yangarber et al. 2002)

  • Pattern Acquisition for QA (Ravichandran and Hovy 2002)


Challenge

Diagram (MUC-3: Terrorism Event): User → Seed Lexicon, Seed Pattern → Expanded Lexicon, Expanded Pattern Set → Knowledge Base


Meeting the Challenge

Diagram: User → Scenario Description → Semantic Clustering → Semantic Cluster, in place of User → Seed Lexicon, Seed Pattern → Expanded Lexicon, Expanded Pattern Set, to build the Knowledge Base


Semantic Clustering

Diagram: Scenario Description → Semantic Clustering → Semantic Cluster (Semantic Lexicon, Extraction Patterns)

  • Input: a description specific enough to define the scenario

    • e.g. (terrorism, bombing, kidnapping)

    • e.g. “Tell me about terrorist actions, such as bombing and kidnapping.”

  • Goal: find Scenario-specific Semantic Clusters, each of which consists of

    • a Semantic Lexicon

    • Extraction Patterns


Benefit for User

Diagram: Scenario Description → Semantic Clustering → Semantic Cluster

  • Simplify Domain Analysis

  • Low-cost Knowledge-base Acquisition for IE systems


Extraction Patterns

  • Definition: … where c unifies with the context that is defined by semantic class L

  • Context of the slot x in "x bombs himself" (cf. Sudo et al. 2001)

    • Sequential: context = (x, bombs, himself)

    • Case Frame: (bomb (v), x (subj), himself (obj))

    • Dependency: bomb → x (V:subj), bomb → himself (V:obj)
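A minimal sketch of how a case-frame pattern might unify with the case frame of a parsed sentence, assuming frames are already lemmatized and stored as role-to-word dictionaries. The function `match_case_frame`, the `SLOT` placeholder, and the optional `lexicon` check (a stand-in for the semantic class L) are hypothetical illustrations, not the matcher of Sudo et al. (2001).

```python
from typing import Dict, Optional, Set

SLOT = "x"  # placeholder marking the role whose filler is extracted


def match_case_frame(pattern: Dict[str, str],
                     frame: Dict[str, str],
                     lexicon: Optional[Set[str]] = None) -> Optional[str]:
    """Unify a case-frame pattern with a sentence's (lemmatized) case frame.

    Every non-slot role in the pattern must appear in the frame with the
    same value; the slot role binds to whatever fills it in the frame.
    If a lexicon (assumed stand-in for semantic class L) is given, the
    filler must belong to it. Returns the filler, or None on failure.
    """
    filler = None
    for role, value in pattern.items():
        if role not in frame:
            return None
        if value == SLOT:
            filler = frame[role]
        elif frame[role] != value:
            return None
    if filler is None:
        return None
    if lexicon is not None and filler not in lexicon:
        return None
    return filler


pattern = {"verb": "bomb", "subj": SLOT, "obj": "himself"}
frame = {"verb": "bomb", "subj": "militant", "obj": "himself"}
print(match_case_frame(pattern, frame))                      # militant
print(match_case_frame(pattern, frame, lexicon={"bomber"}))  # None
```

The sequential and dependency forms could be matched the same way by changing the key set (token positions or labeled dependency edges instead of case roles).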


Outline

  • Introduction

  • Research Proposal

    • Problem Setting

    • Approach

    • Information Extraction

  • Evaluation


Overview

Diagram (Semantic Clustering): Scenario Description → Information Retrieval over the Source documents → Bootstrapping → Semantic Cluster → Query Expansion → Information Retrieval


Information Retrieval

  • Get the relevant document set (R)

  • Get lists of lexical items and extraction patterns, ordered by relevance to the scenario

    • TF/IDF scoring
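The slide names TF/IDF scoring but not the exact formula. The sketch below assumes one common variant: an item's frequency in the relevant set R multiplied by log(N / df), where N is the number of documents in the whole collection and df is the number of collection documents containing the item. The function `tfidf_rank` and the toy documents are hypothetical; items may be words or pattern strings alike.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_rank(relevant: List[List[str]],
               collection: List[List[str]]) -> List[Tuple[str, float]]:
    """Rank items by an assumed TF/IDF variant, highest score first.

    relevant:   the retrieved document set R, each document a list of items
    collection: every document in the corpus (R is assumed to be a subset)
    """
    n_docs = len(collection)
    df = Counter()  # number of collection documents containing each item
    for doc in collection:
        df.update(set(doc))
    tf = Counter(item for doc in relevant for item in doc)  # frequency within R
    scores = {item: count * math.log(n_docs / df[item]) for item, count in tf.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


collection = [
    ["president", "resign", "company"],
    ["president", "appoint", "company"],
    ["company", "profit"],
    ["company", "market"],
]
relevant = collection[:2]
for item, score in tfidf_rank(relevant, collection):
    print(f"{item}\t{score:.3f}")
```

Because R is drawn from the collection, every item in R has df of at least 1, so the logarithm is always defined; an item found in every document (here "company") scores zero.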


Example of TF/IDF Scoring (Management Succession: Business)

300 documents retrieved from WSJ (7/94 - 8/94)

Extracted by MINIPAR (Lin 1998)


Overview

Diagram (Semantic Clustering): Scenario Description → Information Retrieval over the Source documents → ranked extraction patterns and lexicon → Bootstrapping → Semantic Cluster → Query Expansion → Information Retrieval


Bootstrapping

  • Assumptions (cf. Riloff and Jones 1999; Agichtein and Gravano 2000):

    • Patterns provide lexical classes.

    • The lexicon provides contextual information.

  • Find one cluster that consists of a lexicon and extraction patterns


Bootstrapping (Cont.)

  • Algorithm (cf. Riloff and Jones 1999)

    • Given

      • the ordered list of terms

      • the ordered list of extraction patterns

      • Lexicon = (), Pattern = ()

    • Initialize: w ← the most relevant term in the list; add w to Lexicon

    1. p ← the most relevant pattern among those that extract w

    2. Add p to Pattern

    3. w ← the most relevant term among those that are extracted by p

    4. Add w to Lexicon

    5. Go to 1
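A minimal Python sketch of this loop. It assumes two things the slide leaves open: items already in Lexicon or Pattern are skipped when choosing the next one, and the loop stops after `max_iter` rounds or when no new pattern or term is available. The function `bootstrap`, its `extracts` mapping, and the toy management-succession data below are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def bootstrap(terms: List[str],
              patterns: List[str],
              extracts: Dict[str, Set[str]],
              max_iter: int = 10) -> Tuple[List[str], List[str]]:
    """Alternate between choosing patterns and terms, most relevant first.

    terms, patterns: lists ordered from most to least relevant
    extracts:        maps each pattern to the set of terms it extracts
    """
    lexicon: List[str] = []
    pattern_set: List[str] = []
    if not terms:
        return lexicon, pattern_set
    w = terms[0]  # the most relevant term starts the lexicon
    lexicon.append(w)
    for _ in range(max_iter):
        # most relevant pattern, not yet chosen, that extracts w
        p = next((q for q in patterns
                  if q not in pattern_set and w in extracts.get(q, set())), None)
        if p is None:
            break  # assumed stopping rule
        pattern_set.append(p)
        # most relevant term, not yet chosen, that p extracts
        w = next((t for t in terms
                  if t not in lexicon and t in extracts.get(p, set())), None)
        if w is None:
            break  # assumed stopping rule
        lexicon.append(w)
    return lexicon, pattern_set


terms = ["president", "chairman", "executive", "director"]
patterns = ["(x, resign)", "(company, name, x)", "(x, succeed)"]
extracts = {
    "(x, resign)": {"president", "chairman"},
    "(company, name, x)": {"chairman", "executive"},
    "(x, succeed)": {"executive", "director"},
}
lexicon, pattern_set = bootstrap(terms, patterns, extracts)
print(lexicon)      # ['president', 'chairman', 'executive', 'director']
print(pattern_set)  # ['(x, resign)', '(company, name, x)', '(x, succeed)']
```

Skipping already-chosen items is what keeps "Go to 1" from selecting the same pattern forever; without it the loop would never grow the cluster.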


Example of Bootstrapping (Management Succession: Business)

From WSJ (7/94 - 8/94)

Extracted by MINIPAR (Lin 1998)


Problem: Polysemous Lexicon and Patterns

  • Lexical items can be ambiguous

    • e.g. Clinton (Person, Organization, Location, …)

  • Extraction patterns can be ambiguous

    • e.g. be killed in <x> (x: Location, Date, …)

  • Needs more study

    • More restrictions

    • A probabilistic model?


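Overview

Diagram (Semantic Clustering): Scenario Description → Information Retrieval over the Source documents → Bootstrapping → Semantic Cluster (lexicon, pattern) → Query Expansion with the cluster's lexicon (lex) and patterns (pt) → Information Retrieval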


Query Expansion

  • Generalize the terms in a query with a newly discovered cluster

    • cf. Rocchio 1971 (vector model)

    • cf. Zhai and Lafferty 2001 (language modeling)
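Of the two cited approaches, the vector-model one is the simpler to sketch. The code below is a minimal positive-feedback-only Rocchio update (no negative term), under the assumption that each item of the new cluster is treated as a unit vector, so the cluster centroid gives every item weight 1/|cluster|. The function `rocchio_expand`, the default weights `alpha` and `beta`, and the toy cluster are hypothetical choices, not values from the proposal.

```python
from collections import defaultdict
from typing import Dict, List


def rocchio_expand(query: Dict[str, float],
                   cluster: List[str],
                   alpha: float = 1.0,
                   beta: float = 0.5) -> Dict[str, float]:
    """Expand a weighted query with the items of a newly found cluster.

    Positive-feedback-only Rocchio update (no negative-feedback term):
        q' = alpha * q + beta * centroid(cluster)
    where each cluster item is treated as a unit vector, so the centroid
    assigns every item weight 1 / len(cluster).
    """
    expanded: Dict[str, float] = defaultdict(float)
    for term, weight in query.items():
        expanded[term] += alpha * weight
    if cluster:
        share = beta / len(cluster)
        for item in cluster:
            expanded[item] += share
    return dict(expanded)


query = {"terrorism": 1.0, "bombing": 1.0, "kidnapping": 1.0}
cluster = ["bombing", "gunmen", "explosive", "assassination"]
print(rocchio_expand(query, cluster))
```

Terms already in the query are reinforced, while new cluster items enter with a small weight, so the expanded query generalizes the original one without drifting far from it.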


Outline

  • Introduction

  • Research Proposal

    • Problem Setting

    • Approach

    • Application to Information Extraction

  • Discussion


Application to Information Extraction

Diagram: Scenario Description → Semantic Clustering → Semantic Cluster (Semantic Lexicon, Extraction Patterns), which supplies the IE stages Preprocessing, Pattern Matching (Entity Recognition, Event Recognition, Role Assignment) and Merging


Human Intervention

  • Extraction patterns

    • Event pattern

      • Context contains a verb or a nominalization of a verb

      • Used for event extraction and role assignment

      • e.g. (terrorist, fire, x)

    • Local pattern

      • Context contains only enough information to recognize the semantic class

      • Used for entity recognition only

      • e.g. (x, Inc.)

  • Association of event patterns with roles

    • e.g. (company, hire, x) → PersonIn and (company, fire, x) → PersonOut


Outline

  • Introduction

  • Research Proposal

    • Problem Setting

    • Approach

    • Application to Information Extraction

  • Discussion


Discussion

  • Domain Portability

    • The user only needs to specify the scenario

  • Language Portability

    • Language-dependent Tools

      • Segmentation (Lemmatization)

      • Dependency Parsing


Evaluation

  • MUC-style (Scenario-Template task)

    • Slot-based

      • Precision, Recall, F-measure

    • Domain Portability

      • Several pre-defined tasks that differ in difficulty

    • Language Portability

      • Japanese

      • English
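Slot-based scoring compares the slot fillers a system produces against an answer key. The sketch below counts exact matches only, which is simpler than the official MUC scorer; representing a template as a set of (slot, filler) pairs, and the names `slot_scores` and `Slot`, are assumptions made for illustration.

```python
from typing import Set, Tuple

Slot = Tuple[str, str]  # (slot name, filler); hypothetical representation


def slot_scores(response: Set[Slot], key: Set[Slot]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F-measure over slot fillers."""
    correct = len(response & key)
    precision = correct / len(response) if response else 0.0
    recall = correct / len(key) if key else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


key = {("perpetrator", "masked gunmen"), ("instrument", "kalashnikov rifles"),
       ("target", "a christian school"), ("date", "monday")}
response = {("perpetrator", "masked gunmen"), ("date", "monday"),
            ("target", "front gates")}
p, r, f = slot_scores(response, key)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

Here two of three responses are correct (P = 2/3) and two of four key slots are found (R = 1/2), giving F = 4/7.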


Contribution

  • Tool for Domain Analysis

  • Low-cost Knowledge-base Acquisition

  • Towards Open-domain Information Extraction


Conclusion

  • Proposed New Approach for Knowledge-base Acquisition (Semantic Clustering)

  • Discussed Application of Acquired KB to Information Extraction (Human Intervention and Local vs. Event patterns)

  • Discussed Evaluation with several predefined MUC-style tasks of differing difficulty and in multiple languages (Domain portability and Language portability)


ToDo

  • Implementation

  • Preparation for Evaluation

  • Evaluation


Time for Questions
