Unsupervised Word Sense Disambiguation Rivaling Supervised Methods


Presentation Transcript


  1. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods
     1998. 12. 10.
     Oh-Woog Kwon, KLE Lab., CSE, POSTECH

  2. Introduction
  • An unsupervised algorithm for WSD
    • Avoids the need for costly hand-tagged training data
    • Exploits two powerful properties of human language:
      1. One sense per collocation (collocation in its dictionary sense): nearby words give strong, consistent clues to the sense of a target word.
         ex) 동물의 눈은 물체를 보는 기관이다. ("An animal's 눈 is the organ that sees objects.")
             Here 눈 has only one sense (eye), not two senses (eye or snow).
      2. One sense per discourse: all occurrences of a word within one discourse share the same sense.
         ex) every occurrence of bank in a given text (Text 101) carries the same sense.

  3. One Sense Per Discourse
  • A test of one sense per discourse
    • Table on p. 189 (using 37,232 hand-tagged examples)
    • Accuracy: is the same word used with the same sense throughout a discourse? (99.8%)
    • Applicability: does the word appear more than once in a discourse? (50.1%)
  • Advantage of one sense per discourse
    • Can be used in conjunction with separate models of the local context of each word
    • Diagram: the local-context model of bank in Text 101 combines (+) the local contexts of all of its occurrences in the discourse
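As a rough illustration of how this constraint can be applied mechanically, here is a minimal Python sketch (not from the paper) that relabels every occurrence of the target word in a discourse with the dominant sense when one exists; the function name and the 50% cutoff are illustrative assumptions, since the slides only say that a threshold is used:

```python
from collections import Counter

def apply_one_sense_per_discourse(tags, threshold=0.5):
    """Relabel all occurrences in one discourse with the dominant sense.

    tags: one sense label per occurrence of the target word in a single
    discourse; None marks a still-untagged occurrence.
    threshold: illustrative assumption for what counts as "dominant".
    """
    counts = Counter(t for t in tags if t is not None)
    if not counts:
        return tags
    sense, n = counts.most_common(1)[0]
    if n / len(tags) > threshold:
        return [sense] * len(tags)  # dominant sense found: relabel everything
    return tags                     # substantial disagreement: leave as-is

# e.g. apply_one_sense_per_discourse(['eye', 'eye', 'eye', None])
# -> ['eye', 'eye', 'eye', 'eye']
```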

  4. One Sense Per Collocation
  • Types of collocation, by predictive power:
    • Immediately adjacent collocations > collocations at a distance
    • At equivalent distance, predicate-argument relationships > arbitrary associations
    • Collocations with content words > collocations with function words
    => Adjacent content words can disambiguate word sense.
  • A supervised algorithm based on this property
    • Decision list algorithm [Yarowsky, ACL94]
    • Originally applied to accent restoration in Spanish and French
    • Used as a component of the proposed unsupervised algorithm

  5. Decision List Algorithm
  Step 1: Identify the ambiguities of the target word
    ex) 눈 : eye, snow
  Step 2: Collect training contexts for each sense
    ex) eye : … 사람의 눈은 좋은 … ("a person's 눈 is good"), 곤충의 눈은 머리에 … ("an insect's 눈 is on its head"), …
        snow: … 하늘에서 눈이 내리고 … ("눈 falls from the sky"), … 어제 눈이 내려 … ("yesterday 눈 fell"), …
  Step 3: Measure the collocational distributions
    ex) -1 w [사람 눈] ("person" immediately preceding 눈) : eye (1,000), snow (0)
        k w [하늘 within k words] ("sky" anywhere nearby) : eye (2), snow (10,000)
  Step 4: Sort the collocations by log-likelihood ratio into a decision list
  Step 5: Optional pruning and interpolation
  Step 6: Train decision lists for general classes of ambiguity
  Step 7: Classify using the decision list, with only the single most reliable collocation matched in the target context (a sketch follows)
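The following is a minimal Python sketch of Steps 3, 4, and 7, not the paper's implementation: collocations are simplified to bare context words (ignoring the positional and syntactic distinctions of the previous slide), the smoothing constant alpha stands in for the optional interpolation of Step 5, and a binary ambiguity is assumed:

```python
import math
from collections import defaultdict

def train_decision_list(tagged_contexts, alpha=0.1):
    """Steps 3-4: score each collocation by the smoothed log-likelihood ratio
    abs(log(P(sense_a | collocation) / P(sense_b | collocation)))
    and sort the evidence from strongest to weakest."""
    counts = defaultdict(lambda: defaultdict(float))  # word -> sense -> count
    senses = set()
    for sense, context in tagged_contexts:
        senses.add(sense)
        for word in context:
            counts[word][sense] += 1
    a, b = sorted(senses)  # assumes a binary ambiguity, as in the paper
    rules = []
    for word, by_sense in counts.items():
        ll = abs(math.log((by_sense[a] + alpha) / (by_sense[b] + alpha)))
        rules.append((ll, word, a if by_sense[a] > by_sense[b] else b))
    rules.sort(key=lambda r: r[0], reverse=True)
    return rules

def classify(rules, context):
    """Step 7: answer with the single most reliable matching collocation."""
    for ll, word, sense in rules:
        if word in context:
            return sense
    return None

# e.g. rules = train_decision_list([('eye', {'사람', '좋은'}),
#                                   ('snow', {'하늘', '내리고'})])
#      classify(rules, {'어제', '하늘'})  # -> 'snow'
```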

  6. Unsupervised Learning Algorithm - 1
  • Illustrated by the disambiguation of 7,538 instances of plant
  • STEP 1:
    • Collect all contexts of the target word in the untagged training set (right column of p. 190)
  • STEP 2: (sketched below)
    a) Choose a small number of seed collocations for each sense
    b) Tag all training examples containing a seed collocate with that seed's sense label => two seed sets (left column of p. 191, Figure 1)
  • Options for choosing the training seeds
    • Use words from dictionary definitions
    • Use a single defining collocate for each class (using a thesaurus, e.g. WordNet)
    • Label salient corpus collocates (not fully automatic):
      • rank the words that co-occur with the target word
      • a human judge decides which sense each belongs to
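A minimal sketch of STEP 2 (b), assuming each context is represented as a set of words; the seeds life and manufacturing are the ones the paper uses for plant, while the function name and return convention are illustrative:

```python
def tag_with_seeds(contexts, seed_collocates):
    """STEP 2 (b): label every training context that contains a seed
    collocate with that seed's sense; everything else stays residual."""
    tagged, residual = [], []
    for context in contexts:            # each context is a set of words
        hits = {sense for word, sense in seed_collocates.items()
                if word in context}
        if len(hits) == 1:              # exactly one seed sense matched
            tagged.append((hits.pop(), context))
        else:                           # no seed, or conflicting seeds
            residual.append(context)
    return tagged, residual

# e.g. for 'plant', with the paper's seeds:
# tag_with_seeds(contexts, {'life': 'A', 'manufacturing': 'B'})
```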

  7. Unsupervised Learning Algorithm - 2
  • STEP 3: (p. 192, Figure 2; a sketch follows this slide)
    a) Train the supervised classification algorithm (the decision list) on the two seed sets
    b) Classify the entire sample set with the resulting classifier of (a); add the examples classified with probability above a threshold to the seed sets
    c) Optionally apply the one-sense-per-discourse constraint:
      • Detect the dominant sense of each discourse (using a threshold)
      • Augmentation: if a dominant sense exists, add the previously untagged contexts of that discourse to the seed set of the dominant sense
      • Filtering: otherwise (where there is substantial disagreement about the dominant sense), return all instances in the discourse to the residual set
    d) Repeat STEP 3
  • The iteration can escape from initial misclassifications
  • Two techniques to avoid local minima:
    • periodically and incrementally increasing the width of the context window
    • randomly perturbing the class-inclusion threshold, similar to simulated annealing
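A condensed Python sketch of the STEP 3 loop, building on tag_with_seeds() and train_decision_list() above. Taking a rule's log-likelihood as the classification confidence, the min_ll cutoff, and the fixed iteration cap are illustrative assumptions; the optional discourse constraint (c) and the window-widening / threshold-perturbation tricks are omitted for brevity:

```python
def bootstrap(contexts, seed_collocates, min_ll=2.0, max_iters=20):
    """STEP 3 (a), (b), (d) sketch: grow the seed sets by retagging
    the residual sample until nothing new clears the confidence bar."""
    tagged, residual = tag_with_seeds(contexts, seed_collocates)
    for _ in range(max_iters):
        rules = train_decision_list(tagged)        # (a) train on seed sets
        newly_tagged, still_residual = [], []
        for context in residual:                   # (b) classify the rest
            hit = next(((ll, s) for ll, w, s in rules if w in context), None)
            if hit and hit[0] >= min_ll:           # confident enough to keep
                newly_tagged.append((hit[1], context))
            else:
                still_residual.append(context)
        if not newly_tagged:                       # STEP 4: stable residual
            break
        tagged += newly_tagged                     # (d) grow and repeat
        residual = still_residual
    return train_decision_list(tagged), residual
```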

  8. Unsupervised Learning Algorithm - 3
  • STEP 4: Stop when the algorithm converges on a stable residual set
  • STEP 5: Classify new data using the final decision list
    • For error correction, optionally apply the one-sense-per-discourse constraint (combined in the sketch below)
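Combining the earlier sketches, STEP 5 with the optional discourse-level error correction might look like this (hypothetical glue code, not from the paper):

```python
def classify_document(rules, doc_contexts):
    """STEP 5: tag each occurrence in a new document, then correct errors
    with the one-sense-per-discourse constraint sketched earlier."""
    tags = [classify(rules, ctx) for ctx in doc_contexts]
    return apply_one_sense_per_discourse(tags)
```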

  9. Evaluation
  • Test data
    • extracted from a 460-million-word corpus
    • text types: news articles, scientific abstracts, spoken transcripts, and the novels used in previous research
  • Systems compared (see the table on p. 194)
    • (5): the supervised decision list algorithm
    • (6): using only two words as seeds
    • (7): using the salient words of a dictionary definition as seeds
    • (8): using quick hand tagging of a list of algorithmically identified salient collocates
    • (9): (7) + one-sense-per-discourse used only in the classification procedure
    • (10): (9) + one-sense-per-discourse also used during learning

  10. Conclusion
  • An unsupervised WSD algorithm that bootstraps from small seed sets and exploits one sense per collocation and one sense per discourse can rival supervised methods, without hand-tagged training data.
