
Center for Computing Research
National Polytechnic Institute
Mexico

Acquiring Selectional Preferences from Untagged Text for Prepositional Phrase Attachment Disambiguation

Hiram Calvo and Alexander Gelbukh

Presented by Igor A. Bolshakov


Introduction

  • Entities must be identified adequately for database representation:

    • See the cat with a telescope

    • See [the cat] [with a telescope]: 2 entities

    • See [the cat with a telescope]: 1 entity

  • This problem is known as Prepositional Phrase (PP) attachment disambiguation (the two readings are sketched below).
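A minimal sketch of why the distinction matters for a database-style representation; the entity groupings below are purely illustrative, not the authors' data model:

# Illustrative only: the two readings yield different entity groupings.
# (Hypothetical structures; the paper does not prescribe this representation.)
verb_attachment = {                        # see [the cat] [with a telescope]
    "entities": ["the cat", "a telescope"],    # 2 entities; the telescope is the instrument of seeing
}
noun_attachment = {                        # see [the cat with a telescope]
    "entities": ["the cat with a telescope"],  # 1 entity; the cat carries the telescope
}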


Existing methods - 1

  • Accuracy when using treebank statistics:

    • Ratnaparkhi et al., Brill and Resnik: up to 84%

    • Kudo and Matsumoto: 95.8%

      • Needed weeks for training

    • Lüdtke and Sato: 94.9%

      • Only 3 hours for training

  • But there are no treebanks for many languages!


Existing methods - 2

  • Based on Untagged text:

    • Calvo and Gelbukh, 2003: 82.3% accuracy

    • Uses the web as corpus:

      • Slow (up to 18 queries for each PP attachment ambiguity)

  • Does this method work with very big local corpora?


Using a big local corpus

  • Corpus

    • 3 years of publication of 4 newspapers

    • 161 million words

    • 61 million sentences

  • Results:

    • Recall: 36%, Precision: 67%

    • Disappointing!


What do we want?

  • To solve PP attachment disambiguation with

    • Local corpora, not the web

    • No treebanks

    • No supervision

    • High precision and recall

  • Solution proposed:

    • Selectional Preferences


Selectional Preferences

  • The problem of

    I see a cat with a telescope

    turns into

    I see {animal} with {instrument}


Sources for noun semantic classification

  • Machine-Readable dictionaries

  • WordNet ontology

    • We use the top 25 unique beginner concepts of WordNet

  • Examples: mouse is-a {animal}, ranch is-a {place}, root is-a {part}, reality is-a {attribute}, race is-a {grouping}, etc. (a sketch of this mapping follows below)
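As a rough illustration of the noun-to-class mapping, the sketch below uses NLTK's WordNet interface; its lexicographer file names (noun.animal, noun.artifact, ...) only approximate the 25 unique beginner concepts used here, so this is a stand-in, not the authors' exact resource:

# Sketch only: approximating WordNet's top-level noun classes with NLTK's
# lexicographer file names. Requires nltk and nltk.download('wordnet').
from nltk.corpus import wordnet as wn

def semantic_classes(noun):
    """All coarse WordNet classes a noun can belong to (no disambiguation)."""
    return {s.lexname() for s in wn.synsets(noun, pos=wn.NOUN)}

print(semantic_classes("mouse"))      # includes 'noun.animal' (and 'noun.artifact' for the device)
print(semantic_classes("telescope"))  # 'noun.artifact', i.e. an instrument-like class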


Extracting Selectional Preferences

  • Text is shallow-parsed

  • Subordinate sentences are separated

  • Patterns are searched

    1. Verb NEAR Preposition NEXT_TO Noun

    2. Verb NEAR Noun

    3. Noun NEAR Verb

    4. Noun NEXT_TO Preposition NEXT_TO Noun

  • All nouns are mapped to their semantic classes (a sketch of the extraction follows below)
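A minimal sketch of this extraction step, assuming the shallow parser yields one (tag, head word) pair per chunk; the tag set, the NEAR window size, and the helper itself are illustrative assumptions, not the authors' implementation:

def extract_patterns(chunks, window=3):
    """chunks: list of (tag, head) pairs, e.g. [('V', 'see'), ('N', 'cat'), ...]."""
    patterns = []
    for i, (tag, head) in enumerate(chunks):
        # Patterns 1 and 4: a preposition immediately followed by a noun,
        # attached to a verb seen nearby or to the noun right before it.
        if tag == "P" and i + 1 < len(chunks) and chunks[i + 1][0] == "N":
            prep, noun2 = head, chunks[i + 1][1]
            for j in range(max(0, i - window), i):
                t, h = chunks[j]
                if t == "V":
                    patterns.append((h, prep, noun2))   # Verb NEAR Prep NEXT_TO Noun
                elif t == "N" and j == i - 1:
                    patterns.append((h, prep, noun2))   # Noun NEXT_TO Prep NEXT_TO Noun
        # Patterns 2 and 3: a verb and a nearby noun that is not itself
        # governed by a preposition.
        if tag == "V":
            for j in range(max(0, i - window), min(len(chunks), i + window + 1)):
                if chunks[j][0] == "N" and (j == 0 or chunks[j - 1][0] != "P"):
                    patterns.append((head, chunks[j][1]))
    return patterns

# "I see a cat with a telescope", shallow-parsed (pronoun dropped):
print(extract_patterns([("V", "see"), ("N", "cat"), ("P", "with"), ("N", "telescope")]))
# [('see', 'cat'), ('see', 'with', 'telescope'), ('cat', 'with', 'telescope')]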


Example

  • Consider this toy-corpus:

    • I see a cat with a telescope

    • I see a ship in the sea with a spyglass

      The following patterns are extracted:

    • see,cat → see,{animal}

    • see,with,telescope → see,with,{instrument}

    • cat,with,telescope → {animal},with,{instrument}

    • see,ship → see,{thing}

    • see,in,sea → see,in,{place}

    • see,with,spyglass → see,with,{instrument}

    • ship,in,sea → {thing},in,{place}


Example

  • see, with, {instrument} has two occurrences

  • {animal}, with, {instrument} has one occurrence

  • Thus,

    • see, with, {instrument} is more probable than {animal}, with, {instrument} (a counting sketch follows below)
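A self-contained sketch of the counting step, run on the toy corpus above; the tiny noun-to-class table is hard-coded here only so the example runs without WordNet:

from collections import Counter

noun_class = {"cat": "{animal}", "telescope": "{instrument}", "ship": "{thing}",
              "sea": "{place}", "spyglass": "{instrument}"}

# word-level patterns extracted from the two toy sentences
triples = [("see", "with", "telescope"), ("cat", "with", "telescope"),
           ("see", "in", "sea"), ("see", "with", "spyglass"), ("ship", "in", "sea")]

counts = Counter()
for head, prep, noun2 in triples:
    x = noun_class.get(head, head)        # verbs stay as words, nouns become classes
    counts[(x, prep, noun_class[noun2])] += 1

print(counts[("see", "with", "{instrument}")])       # 2
print(counts[("{animal}", "with", "{instrument}")])  # 1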


Experiment

  • Now, with a real corpus, we apply a frequency-based formula over triples (X, P, C2), where:

  • X can be a specific verb or a noun’s semantic class (e.g., see or {animal})

  • P is a preposition (e.g., with)

  • C2 is the class of the second noun (e.g., {instrument}); a sketch of such a score is given below
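A hedged stand-in for the formula: the sketch below simply compares the pattern frequencies freq(X, P, C2) for the two candidate heads, which is consistent with the toy example, though the actual formula may normalize or smooth these counts:

from collections import Counter

def attach(counts, verb, noun1_class, prep, noun2_class):
    """Attach the PP to the verb or to the first noun, or abstain without evidence."""
    v = counts[(verb, prep, noun2_class)]         # freq(X = verb, P, C2)
    n = counts[(noun1_class, prep, noun2_class)]  # freq(X = class of first noun, P, C2)
    if v == 0 and n == 0:
        return None                               # abstain: no selectional preference known
    return "verb" if v >= n else "noun"

# With the toy counts from the previous sketch:
counts = Counter({("see", "with", "{instrument}"): 2, ("{animal}", "with", "{instrument}"): 1})
print(attach(counts, "see", "{animal}", "with", "{instrument}"))  # "verb"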


Experiment

  • From the corpus of 161 million words of Mexican Spanish newspaper text, the system obtained:

  • 893,278 selectional preferences for 5,387 verbs, and

  • 55,469 noun patterns (like {animal} with {instrument})


Evaluation

  • We tested the obtained Selectional Preferences by performing PP attachment disambiguation on 546 sentences from the LEXESP corpus (in Spanish).

  • Then we compared the results manually against the correct PP attachments.

  • Results: precision 78.2%, recall 76.0% (see the sketch below)
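Because the method abstains when neither attachment has supporting counts, precision and recall differ; the sketch below assumes the usual convention of precision over answered cases and recall over all test cases, which is an assumption about the evaluation protocol:

def precision_recall(predicted, gold):
    """predicted: attachment decision or None (abstained); gold: correct attachments."""
    answered = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(1 for p, g in answered if p == g)
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Under this convention recall can never exceed precision, which matches the
# reported 78.2% precision vs. 76.0% recall.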


Conclusions

  • Results are not as good as those obtained by other methods (up to 95%)

  • But we don’t need any costly resources, such as:

    • Treebanks

    • Manually annotated corpora

    • Web as corpus


Future Work

  • To use not only the 25 fixed semantic classes (top concepts) but the whole WordNet hierarchy

  • To use a word sense disambiguation (WSD) module

    • Currently, if a word belongs to more than one class, all classes are taken into account



Thank you!

[email protected]

[email protected]

