Unsupervised Word Sense Disambiguation Rivaling Supervised Methods
David Yarowsky

G22.2591 Presentation, Sonjia Waxmonsky


Introduction

  • Presents an unsupervised learning algorithm for word sense disambiguation that can be applied to completely untagged text

  • Based on a supervised machine learning algorithm that uses decision lists

  • Performance matches that of the supervised system

Properties of Language

One sense per collocation:

Nearby words provide strong and consistent clues as to the sense of a target word

One sense per discourse:

The sense of a target word is highly consistent within a single document

Decision List Algorithm

  • Supervised algorithm

  • Based on ‘One sense per collocation’ property

  • Start with large set of possible collocations

  • Calculate log-likelihood ratio of word-sense probability for each collocation (see the sketch after this list):

    Log( Pr(Sense-A | Collocation_i) / Pr(Sense-B | Collocation_i) )

  • Higher log-likelihood ratio = more predictive evidence

  • Collocations are ordered in a decision list, with the most predictive collocations ranked highest
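
A minimal sketch of how such a decision list might be built (Python; the smoothing constant and data layout are illustrative assumptions, not taken from the paper):

```python
import math
from collections import defaultdict

def build_decision_list(tagged_examples, alpha=0.1):
    """Rank collocations by smoothed log-likelihood ratio.

    tagged_examples: iterable of (collocation, sense) pairs drawn from
    sense-tagged contexts, where sense is 'A' or 'B'.  alpha is an
    illustrative smoothing constant to avoid log(0) for collocations
    seen with only one sense.
    """
    counts = defaultdict(lambda: {'A': 0, 'B': 0})
    for collocation, sense in tagged_examples:
        counts[collocation][sense] += 1

    rules = []
    for collocation, c in counts.items():
        # Log( Pr(Sense-A | Collocation_i) / Pr(Sense-B | Collocation_i) )
        llr = math.log((c['A'] + alpha) / (c['B'] + alpha))
        sense = 'A' if llr > 0 else 'B'
        rules.append((abs(llr), collocation, sense))

    rules.sort(reverse=True)  # most predictive evidence first
    return [(coll, sense, strength) for strength, coll, sense in rules]
```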

Decision List Algorithm

The decision list is used to classify instances of the target word:

  • “the loss of animal and plant species through extinction …”

Classification is based on the highest-ranking rule that matches the target context, as sketched below.
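
A sketch of the matching step, assuming a context is represented as a set of nearby words and that `decision_list` comes from the `build_decision_list` sketch above:

```python
def classify(decision_list, context):
    """Return the sense of the highest-ranked rule that matches.

    decision_list: rules ordered with the most predictive first.
    context: set of collocations observed around the target word.
    """
    for collocation, sense, _strength in decision_list:
        if collocation in context:
            return sense  # only the single best piece of evidence is used
    return None           # no rule fired; leave the instance untagged

# Hypothetical usage for the example above:
# classify(rules, {'animal', 'plant', 'species', 'extinction'})
```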

Advantage of Decision Lists

  • Multiple collocations may match a single context

  • But, only the single most predictive piece of evidence is used to classify the target word

  • Result: The classification procedure combines a large amount of non-independent information without complex modeling

Bootstrapping Algorithm

Example for the target word plant: Sense-A = “life”, Sense-B = “factory”

  • All occurrences of the target word are identified

  • A small training set of seed data is tagged with word sense

Selecting Training Seeds

  • Initial training set should accurately distinguish among possible senses

  • Strategies:

    • Select a single, defining seed collocation for each possible sense.

      Ex: “life” and “manufacturing” for the target plant (see the sketch after this list)

    • Use words from dictionary definitions

    • Hand-label most frequent collocates
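
A sketch of the first strategy, using the paper's seeds for plant; the `contexts` layout (occurrence id mapped to the set of nearby words) is an assumption for illustration:

```python
def tag_seeds(contexts, seed_a='life', seed_b='manufacturing'):
    """Tag only the occurrences that contain a seed collocation.

    contexts: dict mapping an occurrence id to the set of words seen
    near that occurrence of the target word.  Returns (seed_set,
    residual): seed_set maps id -> sense; residual lists the ids left
    untagged for the bootstrapping loop.
    """
    seed_set, residual = {}, []
    for occ_id, words in contexts.items():
        if seed_a in words and seed_b not in words:
            seed_set[occ_id] = 'A'
        elif seed_b in words and seed_a not in words:
            seed_set[occ_id] = 'B'
        else:
            residual.append(occ_id)  # stays in the residual set
    return seed_set, residual
```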

Bootstrapping Algorithm

  • Iterative procedure:

    • Train decision list algorithm on seed set

    • Classify residual data with decision list

    • Create new seed set by identifying samples that are tagged with a probability above a certain threshold

    • Retrain classifier on new seed set (the full loop is sketched below)
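
Putting the sketches above together, one possible reading of the loop; the strength threshold and iteration cap are illustrative choices, not values from the paper:

```python
def bootstrap(contexts, seed_set, threshold=2.0, max_iter=20):
    """Grow the seed set iteratively until the residual stabilizes.

    contexts: occurrence id -> set of nearby collocations.
    seed_set: occurrence id -> sense, from the initial seeding step.
    threshold: minimum log-likelihood strength a rule needs before its
    tag is trusted enough to move an example into the next seed set.
    """
    tagged = dict(seed_set)
    for _ in range(max_iter):
        # 1. Train the decision list on the current seed set.
        examples = [(coll, sense)
                    for occ_id, sense in tagged.items()
                    for coll in contexts[occ_id]]
        rules = build_decision_list(examples)

        # 2. Classify residual data; keep only confident tags.
        grew = False
        for occ_id, words in contexts.items():
            if occ_id in tagged:
                continue
            for coll, sense, strength in rules:
                if coll in words:
                    if strength >= threshold:
                        tagged[occ_id] = sense  # joins the new seed set
                        grew = True
                    break  # only the best matching rule counts

        # 3. Convergence: stop when the residual set stabilizes.
        if not grew:
            break
    return tagged
```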

Bootstrapping Algorithm

Seed set grows and residual set shrinks …

Bootstrapping Algorithm

Convergence: Stop when residual set stabilizes

Final Decision List

  • Original seed collocations are not necessarily at the top of the final list

  • Possible for samples in the original seed data to be reclassified

  • Initial misclassifications in seed data can be corrected

One Sense per Discourse

The algorithm can be improved by applying the “One Sense per Discourse” constraint:

  • After the algorithm has converged:

    Identify tokens tagged with low confidence and relabel them with the dominant tag of that document (as sketched after this list)

  • After each iteration:

    Extend tag to all examples in a single document after enough examples are tagged with a single sense
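
A sketch of the post-convergence step; the plain majority vote and the `doc_of` mapping are simplifying assumptions standing in for the paper's confidence-based relabeling:

```python
from collections import Counter, defaultdict

def one_sense_per_discourse(tagged, doc_of):
    """Relabel each occurrence with its document's dominant sense.

    tagged: occurrence id -> sense, after the algorithm converges.
    doc_of: occurrence id -> id of the document it appears in.
    """
    votes = defaultdict(Counter)
    for occ_id, sense in tagged.items():
        votes[doc_of[occ_id]][sense] += 1

    return {occ_id: votes[doc_of[occ_id]].most_common(1)[0][0]
            for occ_id in tagged}
```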


Evaluation

  • Test corpus: extracted from a 460 million word corpus drawn from multiple sources (news articles, transcripts, novels, etc.)

  • Performance of multiple models compared with:

    • supervised decision lists

    • unsupervised learning algorithm of Schütze (1992), based on alignment of clusters with word senses


Applying the “One sense per discourse” constraint improves performance:

[Accuracy (%) table omitted]


Accuracy exceeds that of the Schütze algorithm for all target words and matches that of the supervised algorithm:

[Accuracy (%) table omitted]