Word sense disambiguation
This presentation is the property of its rightful owner.
Sponsored Links
1 / 23

Word Sense Disambiguation PowerPoint PPT Presentation

  • Uploaded on
  • Presentation posted in: General

Word Sense Disambiguation. Foundations of Statistical NLP Chapter 7. 2002. 1. 18. Kyung-Hee Sung. Contents. Methodological Preliminaries Supervised Disambiguation Bayesian classification / An information-theoretic approach

Download Presentation

Word Sense Disambiguation

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Word sense disambiguation

Word Sense Disambiguation

Foundations of Statistical NLP

Chapter 7

2002. 1. 18.

Kyung-Hee Sung



  • Methodological Preliminaries

  • Supervised Disambiguation

    • Bayesian classification / An information-theoretic approach

    • Disambiguation based on dictionaries, thesauri and bilingual dictionaries

    • One sense per discourse, one sense per collocation

  • Unsupervised Disambiguation



  • Word sense disambiguation

    • Many words have several meanings or senses.

    • Many words have different usages.

      Ex) Butter may be used as noun, or as a verb.

    • The task of disambiguation is done by looking at the context of the word’s use.

Methodological preliminaries 1 2

Methodological Preliminaries (1/2)

Methodological preliminaries 2 2

Methodological Preliminaries (2/2)

  • Pseudowards : artificial ambiguous words

    • In order to test the performance of disambiguation algorithms. Ex) banana-door

  • Performance estimation

    • Upper bound : human performance

    • Lower bound (baseline) : the performance of the simplest possible algorithm, usually the assignment of all contexts to the most frequent sense.

Supervised disambiguation

Supervised Disambiguation



Bayesian classification 1 2

Bayesian classification (1/2)

  • Assumption : we have a corpus where each use of ambiguous words is labeled with its correct sense.

  • Bayes decision rule : minimizes the probability of errors

    • Decide s´ if P(s´| c) > P(sk| c)

← using Bayes’s rule

← P(c) is constant for all senses

Bayesian classification 2 2

Bayesian classification (2/2)

  • Naive Bayes independence assumption

    • All the structure and linear ordering of words within the context is ignored. → bag of words model

    • The presence of one word in the bag is independent of another.

  • Decision rule for Naive Bayes

    • Decide s´ if

← computed by MLE

An information theoretic approach 1 2

An Information-theoretic approach (1/2)

  • The Flip-Flop algorithm applied to finding indicators for disambiguation between two senses.

An information theoretic approach 2 2

An Information-theoretic approach (2/2)

  • To translate prendre (French) based on its object

    • Translation, {t1, … tm} = { take, make, rise, speak }

    • Values of Indicator, {x1, … xm} = { mesure, note, exemple, décision, parole }

Dictionary based disambiguation 1 2

Dictionary-based disambiguation (1/2)

  • A word’s dictionary definitions are good indicators for the senses they define.

Dictionary based disambiguation 2 2

Dictionary-based disambiguation (2/2)

  • Simplified example : ash

    • The score is number of words that are shared by the sense definition and the context.

Thesaurus based disambiguation 1 2

Thesaurus-based disambiguation (1/2)

  • Semantic categories of the words in a context determine the semantic categories of the context as a whole, and that this category in turn determines which word senses are used.

  • Each word is assigned one or more subject codes in the dictionary.

    • t(sk) : subject code of sense sk.

    • The score is the number of words that compatible with the subject code of sense sk.

Thesaurus based disambiguation 2 2

Thesaurus-based disambiguation (2/2)

  • Self-interest (advantage) is not topic-specific.

  • When a sense is spread out over several topics, the topic-based classification algorithm fails.

Disambiguation based on translations in a second language corpus

Disambiguation based on translations in a second-language corpus

  • In order to disambiguate an occurrence of interest in English (first language), we identify the phrase it occurs in and search a German (second language) corpus for instances of the phrase.

    • The English phrase showed interest: show(E) → ‘zeigen’(G)

    • ‘zeigen’(G) will only occur with Interesse(G) since ‘legal shares’ are usually not shown.

    • We can conclude that interest in the phrase to show interest belongs to the sense attention, concern.

One sense per discourse constraint

One sense per discourse constraint

  • The sense of a target word is highly consistent within any given document.

    • If the first occurrence of plant is a use of the sense ‘living being’, then later occurrences are likely to refer to living beings too.

One sense per collocation constraint

One sense per collocation constraint

  • Word senses are strongly correlated with certain contextual features like other words in the same phrasal unit.

    • Collocational features are ranked according to the ratio:

    • Relying on only the strongest feature has the advantage that no integration of different sources of evidence is necessary.

← The number of occurrences of sense sk

with collocation fm

Unsupervised disambiguation 1 3

Unsupervised Disambiguation (1/3)

  • There are situations in which even such a small amount of information is not available.

  • Sense tagging requires that some characterization of the senses be provided. However, sense discrimination can be performed in a completely unsupervised fashion.

  • Context-group discrimination : a completely unsupervised algorithm that clusters unlabeled occurrences.

An em algorithm 1 2

An EM algorithm (1/2)

1.Initialize the parameters of the model μ randomly.

Compute the log likelihood of the corpus C

2. While l(C|μ) is improving repeat:

(a) E-step. Estimatehik

← Naive Bayes assumption

An em algorithm 2 2

An EM algorithm (2/2)

(b) M-step. Re-estimate the parameters P(vj|sk) and P(sk) by way of MLE

Unsupervised disambiguation 2 3

Unsupervised Disambiguation (2/3)

  • Unsupervised disambiguation can be easily adapted to produce distinctions between usage types.

    • Ex) The distinction between physical bank ( in the context of bank robberies ) banks as abstract corporations ( in the context of corporate mergers )

  • The unsupervised algorithm splits dictionary senses into fine-grained contextual variants.

    • Usually, the induced clusters do not line up well with dictionary senses. Ex) ‘lawsuit’ → ‘civil suit’, ‘criminal suit’

Unsupervised disambiguation 3 3

Unsupervised Disambiguation (3/3)

  • Infrequent senses and senses that have few collocations are hard to isolate in unsupervised disambiguation.

  • Results of the EM algorithm

    • The algorithm fails for words whose senses are topic-independent such as ‘to teach’ for train.

← for ten experiments

with different initial


  • Login