Unsupervised Word Sense Disambiguation Rivaling Supervised Methods
David Yarowsky

G22.2591 Presentation, Sonjia Waxmonsky


Introduction

  • Presents unsupervised learning algorithm for word sense disambiguation that can be applied to completely untagged text

  • Based on supervised machine learning algorithm that uses decision lists

  • Performance matches that of supervised system


Properties of Language

One sense per collocation:

Nearby words provide strong and consistent clues as to the sense of a target word

One sense per discourse:

The sense of a target word is highly consistent within a single document


Decision List Algorithm

  • Supervised algorithm

  • Based on ‘One sense per collocation’ property

  • Start with large set of possible collocations

  • Calculate log-likelihood ratio of word-sense probability for each collocation:

    Log( Pr(Sense-A | Collocation_i) / Pr(Sense-B | Collocation_i) )

  • Higher log-likelihood = more predictive evidence

  • Collocations are ordered in a decision list, with most predictive collocations ranked highest
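
A minimal sketch of this training step in Python (the smoothing constant alpha and the (collocation, sense) pair format are assumptions, not from the paper):

    import math
    from collections import Counter

    def build_decision_list(tagged_pairs, alpha=0.1):
        """tagged_pairs: (collocation, sense) pairs, sense in {"A", "B"}.
        Returns (collocation, sense, strength) rules, most predictive first."""
        counts = Counter(tagged_pairs)
        rules = []
        for c in {c for c, _ in counts}:
            # Smoothed counts: alpha keeps the ratio finite when a sense is unseen
            a = counts[(c, "A")] + alpha
            b = counts[(c, "B")] + alpha
            llr = math.log(a / b)        # Log( Pr(A|c) / Pr(B|c) ), after smoothing
            rules.append((abs(llr), c, "A" if llr > 0 else "B"))
        rules.sort(reverse=True)         # higher log-likelihood ranks higher
        return [(c, s, strength) for strength, c, s in rules]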


Decision List Algorithm

Decision list is used to classify instances of the target word:

  “the loss of animal and plant species through extinction …”

Classification is based on the highest-ranking rule that matches the target context
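
A sketch of this classification step, assuming the (collocation, sense, strength) rule format from the previous sketch:

    def classify(rules, context_words, default="A"):
        """Apply the single highest-ranking rule that matches the context."""
        context = set(context_words)
        for collocation, sense, _strength in rules:
            if collocation in context:   # first match wins; lower rules are ignored
                return sense
        return default                   # no rule fired; fall back to a default sense

    # e.g. classify(rules, ["loss", "animal", "species", "extinction"])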


Advantage of Decision Lists

  • Multiple collocations may match a single context

  • But only the single most predictive piece of evidence is used to classify the target word

  • Result: The classification procedure combines a large amount of non-independent information without complex modeling


Bootstrapping Algorithm

Running example, target word plant:

  Sense-A: life

  Sense-B: factory

  • All occurrences of the target word are identified

  • A small training set of seed data is tagged with word sense


Selecting Training Seeds

  • Initial training set should accurately distinguish among possible senses

  • Strategies:

    • Select a single, defining seed collocation for each possible sense (see the sketch after this list).

      Ex: “life” and “manufacturing” for the target word plant

    • Use words from dictionary definitions

    • Hand-label most frequent collocates
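
A sketch of the first strategy, tagging only contexts that match exactly one seed sense (the dictionary format and helper name are illustrative, not from the paper):

    def tag_seeds(contexts, seeds):
        """contexts: lists of words near the target; seeds: e.g.
        {"life": "A", "manufacturing": "B"}. Returns (seed_set, residual)."""
        seed_set, residual = [], []
        for words in contexts:
            senses = {seeds[w] for w in words if w in seeds}
            if len(senses) == 1:         # exactly one seed sense matched
                seed_set.append((words, senses.pop()))
            else:
                residual.append(words)   # stays untagged for bootstrapping
        return seed_set, residual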


Bootstrapping Algorithm

  • Iterative procedure (sketched in code after this list):

    • Train decision list algorithm on seed set

    • Classify residual data with decision list

    • Create new seed set by identifying samples that are tagged with a probability above a certain threshold

    • Retrain classifier on new seed set
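
The loop might look like this sketch, reusing build_decision_list from the earlier sketch; using rule strength as the confidence threshold (and its value) is an assumption:

    import math

    def bootstrap(seed_set, residual, min_strength=math.log(19), max_iters=100):
        """seed_set: (context_words, sense) pairs; residual: untagged contexts."""
        for _ in range(max_iters):
            pairs = [(w, s) for words, s in seed_set for w in words]
            rules = build_decision_list(pairs)        # 1. train on current seeds
            grown, kept = [], []
            for words in residual:
                context = set(words)
                hit = next(((s, st) for c, s, st in rules if c in context), None)
                if hit and hit[1] >= min_strength:    # 2./3. confident tags join seeds
                    grown.append((words, hit[0]))
                else:
                    kept.append(words)
            if not grown:                             # residual stabilized: converged
                break
            seed_set, residual = seed_set + grown, kept   # 4. retrain on larger seed set
        return build_decision_list([(w, s) for words, s in seed_set for w in words])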


Bootstrapping Algorithm

The seed set grows and the residual set shrinks …


Bootstrapping Algorithm

Convergence: stop when the residual set stabilizes


Final Decision List

  • Original seed collocations may not necessarily be at the top of the list

  • Possible for a sample in the original seed data to be reclassified

  • Initial misclassifications in seed data can be corrected


One Sense per Discourse

The algorithm can be improved by applying the “One Sense per Discourse” constraint (see the sketch after this list):

  • After the algorithm has converged:

    Identify tokens tagged with low confidence and relabel them with the dominant tag of that document

  • After each iteration:

    Extend a sense tag to all examples in a document once enough examples in that document are tagged with that sense
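
A sketch of the post-convergence variant; the (doc_id, sense, confident) token layout is an assumption:

    from collections import Counter, defaultdict

    def one_sense_per_discourse(tagged):
        """tagged: (doc_id, sense, confident) triples, one per target-word token.
        Relabels low-confidence tokens with their document's dominant sense."""
        votes = defaultdict(Counter)
        for doc, sense, confident in tagged:
            if confident:
                votes[doc][sense] += 1               # confident tokens vote
        out = []
        for doc, sense, confident in tagged:
            if not confident and votes[doc]:
                sense = votes[doc].most_common(1)[0][0]   # document's dominant tag
            out.append((doc, sense, confident))
        return out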


Evaluation

  • Test corpus: extracted from a 460-million-word corpus of multiple sources (news articles, transcripts, novels, etc.)

  • Performance of multiple models compared with:

    • supervised decision lists

    • unsupervised learning algorithm of Schütze (1992), based on alignment of clusters with word senses


Results

Applying the “One sense per discourse” constraint improves performance:

[Chart: accuracy (%) with and without the discourse constraint]


Results

Accuracy exceeds that of the Schütze algorithm for all target words and matches that of the supervised algorithm:

[Chart: accuracy (%) per target word, compared with Schütze (1992) and the supervised decision list]

