Automatic Acquisition of Lexical Classes and Extraction Patterns for Information Extraction

    Presentation Transcript
    1. Automatic Acquisition of Lexical Classes and Extraction Patterns for Information Extraction • Kiyoshi Sudo • Ph.D. Research Proposal • New York University • Committee: Ralph Grishman, Satoshi Sekine, I. Dan Melamed

    2. Outline • Introduction • Research Proposal • Problem Setting • Approach • Application to Information Extraction • Discussion Kiyoshi Sudo Thesis Proposal Presentation

    3. MUC Scenario Template Task • MURREE, Pakistan (AP) -- Masked gunmen firing Kalashnikov rifles burst through the front gates of a Christian school Monday, killing six people and wounding three in the latest attack against Western interests since Pakistan joined the war against terrorism.

    4. MUC Scenario Template Task • MURREE, Pakistan (AP) -- Masked gunmen firing Kalashnikov rifles burst through the front gates of a Christian school Monday, killing six people and wounding three in the latest attack against Western interests since Pakistan joined the war against terrorism. • Items marked in the text: Monday; Masked gunmen; six people; Kalashnikov rifles; a Christian school; three

    5. High Cost for Acquiring Knowledge-Base • Find extraction patterns • Find relevant documents • Find relevant events • Analyze sentences • Find domain-specific lexicon • Find existing KB (e.g. thesaurus, gazetteers)

    6. Prior Work • Automatic Knowledge Acquisition: Lexical Acquisition, Pattern Acquisition • Mutual Bootstrapping (Riloff and Jones 1999) • Pattern Discovery with Document Re-ranking (Yangarber et al. 2000) • Simultaneous Multi-Semantic Class (Thelen and Riloff 2002; Yangarber et al. 2002) • Pattern Acquisition for QA (Ravichandran and Hovy 2002)

    7. Challenge • MUC-3: Terrorism Event • Diagram labels: User; Seed Lexicon; Seed Pattern; Expanded Lexicon; Expanded Pattern Set; Knowledge Base

    8. Meeting the Challenge • Diagram labels: Semantic Clustering; Scenario Description; Semantic Cluster; User; Seed Lexicon; Seed Pattern; Expanded Lexicon; Expanded Pattern Set; Knowledge Base

    9. Semantic Clustering • Input: a description specific enough to define the scenario (terrorism, bombing, kidnapping), e.g. “Tell me about the terrorism action, such as bombing and kidnapping.” • Goal: find scenario-specific Semantic Clusters, each of which consists of a Semantic Lexicon and Extraction Patterns • Diagram labels: Semantic Clustering; Scenario Description; Semantic Cluster; Semantic Lexicon; Extraction Patterns

    10. Benefit for User • Simplify Domain Analysis • Low-cost Knowledge-base Acquisition for IE systems • Diagram labels: Semantic Clustering; Scenario Description; Semantic Cluster

    11. Extraction Patterns • Definition [formula not preserved in the transcript], where c unifies with the context that is defined by semantic class L • context = Sequential: (x, bombs, himself) • Case Frame: (bomb (v), x (subj), himself (obj)) • Dependency: x bomb himself, with arcs labeled V:subj and V:obj (cf. Sudo et al. 2001)

    12. Outline • Introduction • Research Proposal • Problem Setting • Approach • Information Extraction • Evaluation

    13. Overview • Diagram labels: Semantic Clustering; Source; Information Retrieval; Scenario Description; Bootstrapping; Query Expansion; Semantic Cluster

    14. Overview • Diagram labels: Semantic Clustering; Source; Information Retrieval; Scenario Description; Bootstrapping; Query Expansion; Semantic Cluster

    15. Information Retrieval • Get Relevant Document set • Get list of lexical items and extraction patterns ordered by relevance to the scenario • TF/IDF scoring • Diagram label: R
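
The slide names TF/IDF scoring but does not give the exact weighting. As a minimal sketch, assuming the score is the item's frequency in the relevant document set times the log inverse document frequency over the whole collection, the ranking could look like this. The function name `tfidf_rank` and the toy documents are hypothetical, and the relevant set is assumed to be a subset of the collection.

```python
import math
from collections import Counter

def tfidf_rank(relevant_docs, all_docs):
    """Rank items by TF (in the relevant set) times IDF (over the whole collection)."""
    n_docs = len(all_docs)
    # Document frequency: number of collection documents containing each item.
    df = Counter(item for doc in all_docs for item in set(doc))
    # Term frequency: total occurrences of each item in the relevant set.
    tf = Counter(item for doc in relevant_docs for item in doc)
    scores = {item: count * math.log(n_docs / df[item]) for item, count in tf.items()}
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy data: each document is a list of lexical items.
collection = [
    ["company", "appoint", "president", "president"],
    ["president", "resign", "company"],
    ["team", "win", "game"],
    ["company", "report", "profit"],
    ["company", "win", "contract"],
]
relevant = collection[:2]  # assumed output of the retrieval step
ranking = tfidf_rank(relevant, collection)
```

In this toy collection "company" is frequent in the relevant set but also common everywhere else, so it ranks below "president", which is the behavior the IDF factor is meant to provide. The same function applies to extraction patterns if each document is represented as a list of pattern strings.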

    16. Example of TF/IDF scoring (Management Succession: Business) • 300 documents retrieved from WSJ (7/94 - 8/94) • Extracted by MINIPAR (Lin 1998)

    17. Overview • Diagram labels: Semantic Clustering; Source; Information Retrieval; Scenario Description; extraction patterns; lexicon; Bootstrapping; Query Expansion; Semantic Cluster

    18. Bootstrapping • Assumption: Patterns provide Lexical Classes. Lexicon provides contextual information. • Find one cluster that consists of Lexicon and Extraction Patterns (Riloff and Jones 1999; Agichtein and Gravano 2000)

    19. Bootstrapping (Cont.) • Algorithm (cf. Riloff and Jones 1999) • Given: the ordered list of terms and the ordered list of extraction patterns • Lexicon = (), Pattern = () • w ← the most relevant term in the list; add it into Lexicon • p ← the most relevant pattern among those that extract w • Add p into Pattern • w ← the most relevant term among those that are extracted by p • Add w into Lexicon • Go to 1
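
The loop on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's implementation: the slide does not say how already selected items are handled or when the loop stops, so the sketch skips patterns and terms that were chosen before, returns to the pattern-selection step each round, and stops when no new pattern or term is found (or after `max_rounds`). The `extracts` mapping from pattern to extracted terms and all toy data are hypothetical.

```python
def bootstrap(ranked_terms, ranked_patterns, extracts, max_rounds=10):
    """Alternately grow a lexicon and a pattern set from relevance-ordered lists.

    ranked_terms / ranked_patterns are ordered from most to least relevant.
    extracts maps each pattern to the set of terms it extracts (assumed input).
    """
    term_rank = {t: i for i, t in enumerate(ranked_terms)}
    lexicon, patterns = [], []
    w = ranked_terms[0]  # the most relevant term seeds the lexicon
    lexicon.append(w)
    for _ in range(max_rounds):
        # Most relevant unused pattern among those that extract w.
        p = next((q for q in ranked_patterns
                  if q not in patterns and w in extracts.get(q, set())), None)
        if p is None:
            break
        patterns.append(p)
        # Most relevant new term among those extracted by p.
        candidates = [t for t in extracts[p] if t not in lexicon and t in term_rank]
        if not candidates:
            break
        w = min(candidates, key=term_rank.get)
        lexicon.append(w)
    return lexicon, patterns

# Hypothetical toy input for a management-succession scenario.
ranked_terms = ["president", "chairman", "Monday", "executive"]
ranked_patterns = ["(x, resign)", "(company, appoint, x)", "(on, x)"]
extracts = {
    "(x, resign)": {"president", "chairman"},
    "(company, appoint, x)": {"chairman", "executive", "president"},
    "(on, x)": {"Monday"},
}
lexicon, patterns = bootstrap(ranked_terms, ranked_patterns, extracts)
```

On the toy input the cluster grows to the three person titles and the two patterns that connect them, while "Monday" and "(on, x)" stay out even though "Monday" is ranked above "executive", because no selected pattern extracts it.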

    20. Example of Bootstrapping (Management Succession: Business) • From WSJ (7/94 - 8/94) • Extracted by MINIPAR (Lin 1998)

    21. Example of Bootstrapping (Management Succession: Business) • From WSJ (7/94 - 8/94) • Extracted by MINIPAR (Lin 1998)

    22. Problem: Polysemous Lexicon, Pattern • Lexicon can be ambiguous • e.g. Clinton (Person, Organization, Location …) • Extraction patterns can be ambiguous • e.g. be killed in <x> (x: Location, Date …) • Needs more study • more restriction • Probabilistic Model ??

    23. Overview • Diagram labels: Semantic Clustering; Source; Information Retrieval; Scenario Description; pt; lex; pattern; lexicon; Bootstrapping; Query Expansion; Semantic Cluster

    24. Query Expansion • Generalize terms in a query with a newly discovered cluster • cf. Rocchio 1971 (vector model); Zhai and Lafferty 2001 (language modeling)
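
For reference, the vector-model expansion cited here (Rocchio) moves the query toward the centroid of relevant vectors and away from the centroid of non-relevant ones. The sketch below implements only that standard update on sparse dictionaries. How the discovered cluster is turned into feedback vectors is not specified on the slide, so treating cluster members as "relevant" vectors is an assumption, and the coefficient values are illustrative defaults.

```python
from collections import defaultdict

def rocchio(query, relevant, nonrelevant=(), alpha=1.0, beta=0.75, gamma=0.15):
    """Return alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant) as a sparse vector."""
    new_query = defaultdict(float)
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                # Dividing by len(docs) averages the vectors into a centroid.
                new_query[term] += coeff * weight / len(docs)
    # Terms whose weight ends up non-positive are dropped from the query.
    return {t: w for t, w in new_query.items() if w > 0}

# Hypothetical example: the cluster contributes two feedback vectors.
query = {"terrorism": 1.0}
cluster_vectors = [{"bombing": 1.0, "terrorism": 1.0}, {"kidnapping": 1.0}]
expanded = rocchio(query, cluster_vectors)
```

The expanded query keeps the original term with the highest weight and adds the cluster's terms with smaller weights, which is one way to read "generalize terms in a query with a newly discovered cluster".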

    25. Overview • Diagram labels: Semantic Clustering; Source; Information Retrieval; Scenario Description; pt; lex; pattern; lexicon; Bootstrapping; Query Expansion; Semantic Cluster

    26. Outline • Introduction • Research Proposal • Problem Setting • Approach • Application to Information Extraction • Discussion

    27. Application to Information Extraction • Diagram labels: Semantic Clustering; Scenario Description; Semantic Cluster; Semantic Lexicon; Extraction Patterns; Preprocessing; Entity Recognition; Event Recognition; Role Assignment; Pattern Matching; Merging

    28. Human Intervention • Extraction patterns • Event pattern • Context contains a verb or nominalization of a verb • Used for event extraction and role assignment • e.g. (terrorist, fire, x) • Local pattern • Context contains only enough information to recognize the semantic class • Used for entity recognition only • e.g. (x, Inc.) • Association of Event Pattern to Role • e.g. (company, hire, x) → PersonIn and (company, fire, x) → PersonOut
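
The pattern-to-role association on this slide amounts to a small lookup table that a person fills in by hand. A minimal sketch, assuming patterns are stored as (subject class, verb, slot) triples as in the slide's examples; the names `ROLE_OF_PATTERN` and `assign_role` and the filler "John Smith" are hypothetical.

```python
# Hand-made association of event patterns to template roles (from the slide's examples).
ROLE_OF_PATTERN = {
    ("company", "hire", "x"): "PersonIn",
    ("company", "fire", "x"): "PersonOut",
}

def assign_role(subject_class, verb, filler):
    """Return (role, filler) if an event pattern matches, otherwise None."""
    role = ROLE_OF_PATTERN.get((subject_class, verb, "x"))
    return (role, filler) if role is not None else None

hired = assign_role("company", "hire", "John Smith")
unknown = assign_role("company", "buy", "John Smith")
```

Local patterns such as (x, Inc.) would not appear in this table, since the slide restricts them to entity recognition.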

    29. Outline • Introduction • Research Proposal • Problem Setting • Approach • Application to Information Extraction • Discussion

    30. Discussion • Domain Portability • User only needs to specify the scenario • Language Portability • Language-dependent Tools • Segmentation (Lemmatization) • Dependency Parsing

    31. Evaluation • MUC-style (Scenario-Template task) • Slot-based • Precision, Recall, F-measure • Domain Portability • Several pre-defined tasks that differ in difficulty • Language Portability • Japanese • English
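
Slot-based precision, recall and F-measure can be illustrated with exact-match counting over (slot, filler) pairs. This is a simplified sketch: official MUC scoring is more elaborate than exact set matching, and the slot names and the system output below are hypothetical, with fillers borrowed from the news example on slide 3.

```python
def slot_scores(system_slots, gold_slots):
    """Exact-match precision, recall and balanced F-measure over (slot, filler) pairs."""
    correct = len(system_slots & gold_slots)
    precision = correct / len(system_slots) if system_slots else 0.0
    recall = correct / len(gold_slots) if gold_slots else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical gold template and system output.
gold = {
    ("perpetrator", "Masked gunmen"),
    ("weapon", "Kalashnikov rifles"),
    ("target", "a Christian school"),
    ("date", "Monday"),
}
system = {
    ("perpetrator", "Masked gunmen"),
    ("weapon", "Kalashnikov rifles"),
    ("date", "Tuesday"),
}
precision, recall, f_measure = slot_scores(system, gold)
```

Here two of the three system slots are correct and two of the four gold slots are found, so precision is 2/3 and recall is 1/2.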

    32. Contribution • Tool for Domain Analysis • Low-cost Knowledge-base Acquisition • Towards Open-domain Information Extraction

    33. Conclusion • Proposed a new approach for Knowledge-base Acquisition (Semantic Clustering) • Discussed application of the acquired KB to Information Extraction (Human Intervention and Local vs. Event patterns) • Discussed evaluation with several predefined MUC-style tasks that differ in difficulty and across languages (Domain portability and Language portability)

    34. ToDo • Implementation • Preparation for Evaluation • Evaluation

    35. Time for Questions (Conclusion) • Proposed a new approach for Knowledge-base Acquisition (Semantic Clustering) • Discussed application of the acquired KB to Information Extraction (Human Intervention and Local vs. Event patterns) • Discussed evaluation with several predefined MUC-style tasks that differ in difficulty and across languages (Domain portability and Language portability)