Supervised, Semi-supervised and Unsupervised Approaches for Word Sense Disambiguation

Slides by Arindam Chatterjee & Salil Joshi, under the guidance of Prof. Pushpak Bhattacharyya. May 01, 2010.

Presentation Transcript


1. Supervised, Semi-supervised and Unsupervised Approaches for Word Sense Disambiguation. Slides by Arindam Chatterjee & Salil Joshi. Under the guidance of Prof. Pushpak Bhattacharyya. May 01, 2010

2. Roadmap: Bird's Eye View. Supervised Approaches. Semi-supervised Approaches. Unsupervised Approaches. Summary.

3. Bird's Eye View. (Diagram: taxonomy of WSD approaches, covering the hybrid, supervised, semi-supervised and unsupervised families.)

4. Supervised Approaches

5. Supervised Approaches. WSD is cast as classification: the classes are senses, and each word is classified based on its feature vector. (Diagram: five training instances (words) with contexts such as "money, finance", "blood, plasma" and "water, river" are used to train a model in the training phase; in the testing phase the model assigns a new instance to Class 1 (sense 1), Class 2 (sense 2) or Class 3 (sense 3).)

6. Feature Vector for WSD. In supervised WSD, the feature vector of a word w consists of four features:
• Part of speech (POS) of w
• Semantic & syntactic features of w
• Collocation vector (set of words around it): typically the next word (+1), the next-to-next word (+2), the words at -2 and -1, and their POS tags
• Co-occurrence vector (number of times w occurs in the bag of words around it)

7. Supervised Approaches. Unifying thread of operation: use of annotated corpora; they are all target-word WSD approaches; representation of words as feature vectors. Algorithms: Decision List. Decision Tree. Naïve Bayes. Exemplar Based Approach. Support Vector Machines. Neural Networks. Ensemble Methods.

8. 1. Decision Lists
• Based on the 'one sense per collocation' property: nearby words provide strong and consistent clues to the sense of a target word.
• A decision list is an ordered set of if-then-else rules: if (feature X) then sense Si.
• Each rule is weighted by a score.
• In the training phase, the decision list is built from evidence in the corpus.
• In the testing phase, the first matching rule fires, so the matching sense with the highest score wins.

9. 1. Decision Lists (contd.) For a particular word, in the training phase features are extracted from the corpus and an ordered decision list of the form {feature-value, sense, score} is created. The score of a feature f for sense Si is the log-likelihood ratio of the sense given the feature: score(Si, f) = log( Pr(Si | f) / Σj≠i Pr(Sj | f) ).

10. 1. Decision Lists (contd.) The decision list for the word bank (courtesy Navigli, 2009). (Table: ranked rules of the form {feature-value, sense, score} for the senses of bank.) Test sentence: "I went for a walk along the river bank"; the nearby word river matches the highest-scoring applicable rule and selects the river-bank sense. A runnable sketch of training and applying such a list follows.
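A minimal sketch of how such a decision list could be trained and applied. The toy corpus, the smoothing constant eps, and the bag-of-words feature extractor are illustrative assumptions, not part of the original slides:

```python
import math
from collections import defaultdict

def train_decision_list(tagged_examples, eps=0.1):
    """tagged_examples: list of (context_words, sense) pairs.
    Returns rules (score, feature, sense) sorted by descending log-likelihood score."""
    counts = defaultdict(lambda: defaultdict(float))  # feature -> sense -> count
    senses = set(sense for _, sense in tagged_examples)
    for context, sense in tagged_examples:
        for feature in context:
            counts[feature][sense] += 1
    rules = []
    for feature, per_sense in counts.items():
        total = sum(per_sense.values())
        for sense in senses:
            # Smoothed Pr(sense | feature) vs. the mass of all other senses.
            p_sense = (per_sense[sense] + eps) / (total + eps * len(senses))
            rules.append((math.log(p_sense / (1.0 - p_sense)), feature, sense))
    rules.sort(reverse=True)  # highest-scoring rule first
    return rules

def classify(rules, context_words, default_sense):
    # The first matching rule (i.e., the highest-scoring one) wins.
    words = set(context_words)
    for score, feature, sense in rules:
        if feature in words:
            return sense
    return default_sense

# Toy usage mirroring the slide's example for "bank":
corpus = [(["money", "account", "deposit"], "FINANCE"),
          (["river", "walk", "water"], "RIVER"),
          (["loan", "interest", "account"], "FINANCE")]
rules = train_decision_list(corpus)
print(classify(rules, ["walk", "along", "river"], "FINANCE"))  # -> RIVER
```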

11. 2. Support Vector Machines. Since SVMs are binary classifiers, one SVM is trained per sense; e.g., if a word has 4 senses, 4 SVMs are built. (Diagram: the distance of a test point from each SVM's separating hyperplane gives the confidence score for that SVM; the SVM with the highest confidence score yields the winner sense.)

12. 3. Ensemble Methods. A collection of classifiers (C1, C2, ..., Cn) is combined to improve the overall accuracy of the WSD system. (Diagram: ensemble components C1, C2, C3 each score the candidate senses; a score function combines them into Total_Score(S1), Total_Score(S2), and the best-scoring sense wins.) The score function varies from approach to approach.

13. A. Majority Voting. Each ensemble component votes for one sense of the target word. (Diagram: C1, C2, C3 vote among senses S1 and S2.) Here the score function is a vote function, and the sense with the largest number of votes is selected as the winner sense, as in the sketch below.
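A minimal sketch of majority voting over per-component sense predictions; the hard-coded predictions stand in for real classifier outputs:

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one predicted sense per ensemble component."""
    votes = Counter(predictions)
    winner, _ = votes.most_common(1)[0]  # sense with the largest number of votes
    return winner

# Two of three components vote for S1, so S1 wins.
print(majority_vote(["S1", "S1", "S2"]))  # -> S1
```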

14. B. Probability Mixture. The scoring function is a confidence score. Each component's confidence scores are normalized (here, by dividing by that component's maximum score, so its top sense gets 1.0). The normalized scores are then summed, and the sense with the maximum sum is selected as the winner sense. With the scores on the next slide: Total_Score(S1) = 1.0 + 1.0 + 1.0 = 3.0; Total_Score(S2) = 0.7 + 0.4 + 0.3 = 1.4.

15. B. Probability Mixture (contd.) (Diagram: confidence score / normalized score per component. C1: 0.6/1.0 for S1, 0.4/0.7 for S2. C2: 0.7/1.0 for S1, 0.3/0.4 for S2. C3: 0.8/1.0 for S1, 0.2/0.3 for S2. Totals: S1 = 3.0, S2 = 1.4; S1 is the winner sense.)
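A minimal sketch of the probability mixture, assuming each component reports a raw confidence per sense; normalizing by the per-component maximum reproduces the slide's totals (up to rounding):

```python
def probability_mixture(confidences):
    """confidences: list of dicts, one per component, mapping sense -> raw confidence."""
    totals = {}
    for comp in confidences:
        top = max(comp.values())
        for sense, score in comp.items():
            totals[sense] = totals.get(sense, 0.0) + score / top  # best sense normalizes to 1.0
    return max(totals, key=totals.get), totals

winner, totals = probability_mixture([
    {"S1": 0.6, "S2": 0.4},   # C1
    {"S1": 0.7, "S2": 0.3},   # C2
    {"S1": 0.8, "S2": 0.2},   # C3
])
print(winner, totals)  # S1 wins: S1 = 3.0, S2 = 1.35 (the slide rounds the parts to 0.7 + 0.4 + 0.3 = 1.4)
```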

16. C. Rank-Based Combination. The score function is the rank of each sense within each component. The ranks are negated and summed, and the sense with the highest sum wins. With the ranks on the next slide: Total_Score(S1) = (-1) + (-2) + (-1) = -4; Total_Score(S2) = (-2) + (-1) + (-2) = -5, so S1 wins.

17. C. Rank-Based Combination (contd.) (Diagram: rank / negated rank per component. C1: S1 = 1/-1, S2 = 2/-2. C2: S1 = 2/-2, S2 = 1/-1. C3: S1 = 1/-1, S2 = 2/-2. Totals: S1 = -4, S2 = -5; S1 is the winner sense.)
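A minimal sketch of rank-based combination, using the per-component ranks from the slide (rank 1 = most preferred):

```python
def rank_combination(rankings):
    """rankings: list of dicts, one per component, mapping sense -> rank (1 = best)."""
    totals = {}
    for comp in rankings:
        for sense, rank in comp.items():
            totals[sense] = totals.get(sense, 0) - rank  # negate and sum
    return max(totals, key=totals.get), totals

winner, totals = rank_combination([
    {"S1": 1, "S2": 2},   # C1
    {"S1": 2, "S2": 1},   # C2
    {"S1": 1, "S2": 2},   # C3
])
print(winner, totals)  # -> S1 {'S1': -4, 'S2': -5}
```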

18. Semi-Supervised Approaches

19. Semi-Supervised Approaches. Supervised approaches use large amounts of annotated data; semi-supervised approaches use minimal annotated data, so the data requirement is greatly reduced.

20. Semi-Supervised Approaches. Unifying thread of operation: use of minimal annotated corpora; use of unannotated data for tuning. Algorithms: Bootstrapping. Monosemous Relatives.

21. 1. Bootstrapping

22. 1. Bootstrapping (contd.) An example of Yarowsky's algorithm: at each iteration, new examples are labeled with class a or b and added to the set A of sense-tagged examples (courtesy Navigli, 2009). A sketch of the loop follows.
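A minimal sketch of a Yarowsky-style bootstrapping loop, reusing the decision-list trainer and classifier sketched earlier; the confidence threshold, iteration cap, and stopping rule are illustrative assumptions:

```python
def bootstrap(seed_tagged, untagged, max_iters=10, threshold=2.0):
    """seed_tagged: small list of (context, sense) pairs; untagged: list of contexts."""
    tagged = list(seed_tagged)
    for _ in range(max_iters):
        rules = train_decision_list(tagged)  # from the decision-list sketch above
        newly_tagged, still_untagged = [], []
        for context in untagged:
            words = set(context)
            # Best-scoring rule whose feature occurs in this context.
            match = next(((score, sense) for score, f, sense in rules if f in words), None)
            if match and match[0] >= threshold:
                newly_tagged.append((context, match[1]))  # confident: add to tagged set A
            else:
                still_untagged.append(context)
        if not newly_tagged:   # nothing confidently labeled this round; stop
            break
        tagged.extend(newly_tagged)
        untagged = still_untagged
    return tagged
```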

23. Unsupervised Approaches

24. Unsupervised Approaches. Input data: circles of different sizes and colors, with no associated background knowledge; the implicit features are the size and color of the balls. (Diagram: Unsupervised Approach I clusters the balls by size; Unsupervised Approach II clusters the same balls by color.)

25. HyperLex (1/2). (Diagram: co-occurrence graph for contexts of the word वीज (electricity/lightning), with nodes such as मुक्तता (discharge), उष्णता (heat), चमक (shine), वाफ (steam), उर्जा (energy), धन (positive), निर्माण (produce), प्रभार (charge), ऋण (negative), ज्वलन (combustion), जनित्र (turbine), इंधन (fuel), वादळ (thunder).) For each high-density component, the highest-degree node is selected as a hub; the procedure is iterated after removing the hub along with its neighbors. For this example, the hubs are ज्वलन (combustion) and चमक (shine). A sketch of the hub-selection loop follows.
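A minimal sketch of the hub-selection step, assuming the co-occurrence graph is given as a symmetric adjacency dict; real HyperLex also weights edges and applies density thresholds, which this sketch reduces to a simple minimum degree:

```python
def select_hubs(graph, min_degree=2):
    """graph: dict mapping word -> set of co-occurring words. Returns the chosen hubs."""
    g = {w: set(nbrs) for w, nbrs in graph.items()}
    hubs = []
    while g:
        # Pick the highest-degree node as the next hub.
        hub = max(g, key=lambda w: len(g[w]))
        if len(g[hub]) < min_degree:
            break  # remaining components are too sparse to represent senses
        hubs.append(hub)
        # Remove the hub together with its neighbors, then iterate.
        removed = g[hub] | {hub}
        g = {w: nbrs - removed for w, nbrs in g.items() if w not in removed}
    return hubs
```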

26. HyperLex (2/2). Example: जनित्रे वाफ वापरून वीज प्रभार निर्माण करतात. (Gloss: turbines steam use-to electricity produce; i.e., "Turbines use steam to produce electricity.")

27. Summary. Supervised algorithms: based on human supervision, hence the name; use corpus evidence instead of relying on knowledge bases; build classifiers to classify words, where senses are classes. Semi-supervised algorithms: use less information than supervised approaches; create the required information as part of the algorithm. Unsupervised algorithms: cluster instances based on inherent features.

28. Summary (contd.) Supervised algorithms: perform better than all other approaches, especially knowledge-based ones, e.g. they can pick up clues from components like proper nouns, unlike knowledge-based approaches; however, they depend heavily on large amounts of tagged data and suffer from data sparsity. Semi-supervised algorithms: tend to partially eradicate the knowledge acquisition bottleneck; work on par with supervised approaches. Unsupervised algorithms: performance is good only for a limited set of target words.

29. References
• AGIRRE, E., AND MARTINEZ, D. Exploring automatic word sense disambiguation with decision lists and the web. In Proc. of COLING-2000 (2000).
• BOSER, B. E., GUYON, I. M., AND VAPNIK, V. N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory (1992), pp. 144-152.
• COST, S., AND SALZBERG, S. A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning 10, 1 (1993), 57-78.
• ESCUDERO, G., MARQUEZ, L., AND RIGAU, G. Naive Bayes and exemplar-based approaches to word sense disambiguation revisited. ArXiv preprint cs/0007011 (2000).
• FELLBAUM, C., ET AL. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, 1998.
• FREUND, Y., SCHAPIRE, R., AND ABE, N. A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence 14 (1999), 771-780.
• KHAPRA, M. M., BHATTACHARYYA, P., CHAUHAN, S., NAIR, S., AND SHARMA, A. Domain specific iterative word sense disambiguation in a multilingual setting.
• KILGARRIFF, A., AND GREFENSTETTE, G. Introduction to the special issue on the web as corpus. Computational Linguistics 29, 3 (2003), 333-347.

30. References (contd.)
• KILGARRIFF, A., AND YALLOP, C. What's in a thesaurus? In Proceedings of the Second International Conference on Language Resources and Evaluation (2000), pp. 1371-1379.
• LITTLESTONE, N. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning 2, 4 (1988), 285-318.
• MALLERY, J. C. Thinking about foreign policy: Finding an appropriate role for artificially intelligent computers. Master's Thesis, MIT Political Science Department, Cambridge (1988).
• MCCULLOCH, W. S., AND PITTS, W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biology 5, 4 (1943), 115-133.
• MILLER, G., BECKWITH, R., FELLBAUM, C., GROSS, D., AND MILLER, K. J. WordNet: An on-line lexical database. International Journal of Lexicography 3, 4 (1990), 235-312.
• NAVIGLI, R. Word sense disambiguation: A survey. ACM Computing Surveys 41, 2 (2009).
• NAVIGLI, R., AND VELARDI, P. Learning domain ontologies from document warehouses and dedicated web sites. Computational Linguistics 30, 2 (2004), 151-179.

31. References (contd.)
• NG, H. T., ET AL. Exemplar-based word sense disambiguation: Some recent improvements. In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing (1997), pp. 208-213.
• PEDERSEN, T. A simple approach to building ensembles of naive Bayesian classifiers for word sense disambiguation. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference (2000), pp. 63-69.
• QUINLAN, J. R. Induction of decision trees. Machine Learning 1, 1 (1986), 81-106.
• QUINLAN, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
• ROGET, P. M. Roget's International Thesaurus, 1st ed. Crowell, New York, 1911.
• ROTH, D., YANG, M., AND AHUJA, N. A SNoW-based face detector. In Neural Information Processing (2000), vol. 12.
• SCHAPIRE, R. E., AND SINGER, Y. Improved boosting algorithms using confidence-rated predictions. Machine Learning 37, 3 (1999), 297-336.
• YAROWSKY, D. Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (1994), pp. 88-95.
• YAROWSKY, D. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (1995), pp. 189-196.

32. Thank You. Questions?

33. Appendix

34. 1. WSD: Variants
• Lexical sample (targeted WSD): the system is required to disambiguate a restricted set of target words, usually occurring one per sentence. Employs supervised techniques, using hand-labeled instances as the training set and an unlabeled test set.
• All-words WSD: the system is expected to disambiguate all open-class words in a text (i.e., nouns, verbs, adjectives, and adverbs). Requires wide-coverage systems, and suffers from the data sparseness problem, as large knowledge sources are not available.

35. 2. Collocation Vector. The set of words around the target word, typically the next word (+1), the next-to-next word (+2), the words at -2 and -1, and their POS tags: [w(i-2), POS(i-2), w(i-1), POS(i-1), w(i+1), POS(i+1), w(i+2), POS(i+2)]. For example, the sentence "I usually have grilled bass on Sunday" with the target word bass yields the vector: [have, VB, grilled, ADJ, on, PREP, Sunday, NN].
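A minimal sketch of extracting this vector, assuming tokens are already POS-tagged; the padding token at sentence edges is an illustrative assumption:

```python
def collocation_vector(tagged_tokens, target_index):
    """tagged_tokens: list of (word, pos) pairs.
    Returns [w-2, POS-2, w-1, POS-1, w+1, POS+1, w+2, POS+2]."""
    vector = []
    for offset in (-2, -1, 1, 2):
        j = target_index + offset
        if 0 <= j < len(tagged_tokens):
            vector.extend(tagged_tokens[j])
        else:
            vector.extend(("<pad>", "<pad>"))  # assumption: pad at sentence edges
    return vector

sentence = [("I", "PRP"), ("usually", "RB"), ("have", "VB"), ("grilled", "ADJ"),
            ("bass", "NN"), ("on", "PREP"), ("Sunday", "NN")]
print(collocation_vector(sentence, 4))
# -> ['have', 'VB', 'grilled', 'ADJ', 'on', 'PREP', 'Sunday', 'NN']
```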

36. 3. Decision Trees. Feature vectors are represented in the form of a tree, built using the ID3 (C4.5) algorithm. For an input sentence the tree is traversed, and the sense at the leaf node reached is the winner sense.
4. Naïve Bayes. Applying Bayes' rule and the naive independence assumption on the features: ŝ = argmax(s ∈ senses) Pr(s) · Πi=1..n Pr(vi | s).
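A minimal sketch of the Naïve Bayes sense scorer over the argmax above, assuming feature counts from a sense-tagged corpus; add-one smoothing is an illustrative assumption:

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(tagged_examples):
    """tagged_examples: list of (features, sense). Returns priors, per-sense counts, vocab."""
    priors = Counter(sense for _, sense in tagged_examples)
    feature_counts = defaultdict(Counter)
    vocab = set()
    for features, sense in tagged_examples:
        feature_counts[sense].update(features)
        vocab.update(features)
    return priors, feature_counts, vocab

def best_sense(features, priors, feature_counts, vocab):
    total = sum(priors.values())
    scores = {}
    for sense, prior in priors.items():
        n = sum(feature_counts[sense].values())
        # log Pr(s) + sum_i log Pr(v_i | s), with add-one smoothing
        score = math.log(prior / total)
        for f in features:
            score += math.log((feature_counts[sense][f] + 1) / (n + len(vocab)))
        scores[sense] = score
    return max(scores, key=scores.get)
```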

37. 5. Exemplar-Based Approach
• Also known as memory-based or instance-based learning.
• Unlike other supervised approaches, it builds the classification model by keeping all training instances in memory.
• Typically implemented using the kNN algorithm: instances are represented as points in feature space, and a new example is classified by computing its distance to all training examples.
• The k nearest neighbors are found, and the class contributing the largest number of neighbors is selected as the winner sense.

38. Exemplar-Based Approach (contd.) The weighted Hamming distance between the points is calculated as: Δ(x, xi) = Σj wj · δ(xj, xij), where x is the instance to be classified, xi is the i-th training example, wj is the weight of the j-th feature, calculated using the gain ratio measure [Quinlan, 1993] or the modified value difference metric [Cost & Salzberg, 1993], and δ(xj, xij) is zero if xj = xij and 1 otherwise.
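A minimal sketch of this distance and the kNN vote built on it; uniform feature weights stand in here for the gain-ratio weights:

```python
from collections import Counter

def hamming_distance(x, xi, weights):
    # delta(x_j, x_ij) is 0 on a match and 1 otherwise, scaled by the feature weight w_j.
    return sum(w * (a != b) for w, a, b in zip(weights, x, xi))

def knn_sense(x, training, weights, k=3):
    """training: list of (feature_vector, sense) pairs."""
    neighbors = sorted(training, key=lambda ex: hamming_distance(x, ex[0], weights))[:k]
    votes = Counter(sense for _, sense in neighbors)
    return votes.most_common(1)[0][0]  # class with the most of the k nearest neighbors
```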

39. 6. Neural Networks
• WSD is treated as a sequence labeling task.
• The class space is reduced by using WordNet's supersenses instead of actual senses.
• A discriminative HMM is trained using the following features: POS of w as well as POS of neighboring words; local collocations; shape of the word and neighboring words, e.g. for s = "Merrill Lynch & Co", shape(s) = Xx*Xx*&Xx.
• Lends itself well to NER, as labels like "person", "location", "time" etc. are included in the supersense tag set.
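A minimal sketch of one plausible word-shape function matching the slide's example; the exact shape rules of the original work are not given in the slides, so this mapping is an assumption:

```python
import re

def shape(s):
    """Map 'Merrill Lynch & Co' -> 'Xx*Xx*&Xx': uppercase -> X, lowercase -> x,
    runs of lowercase collapse to 'x*', whitespace is dropped, other chars kept."""
    out = []
    for ch in s:
        if ch.isspace():
            continue
        out.append("X" if ch.isupper() else "x" if ch.islower() else ch)
    return re.sub(r"xx+", "x*", "".join(out))

print(shape("Merrill Lynch & Co"))  # -> Xx*Xx*&Xx
```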

40. 7. Monosemous Relatives
• Uses the web as corpus.
• Selects a seed of data from the web; the seed data is minimal.
• Then bootstraps and builds a large annotated data set.

41. 8. An Iterative Approach to WSD
• Uses semantic relations (synonymy and hypernymy) from WordNet.
• Extracts collocational and contextual information from WordNet (gloss) and a small amount of tagged data.
• Monosemous words in the context serve as a seed set of disambiguated words.
• In each iteration, new words are disambiguated based on their semantic distance from already disambiguated words.
• It would be interesting to exploit other semantic relations available in WordNet.

42. 9. Results: Supervised

43. 10. Results: Semi-Supervised

44. 11. Results: Hybrid

45. Introduction. Q: What is Word Sense Disambiguation (WSD)? (Diagram: in "John has a bank account", the target word is bank and the context word is account; among the senses of "bank" (Domain 1: FINANCE, Domain 2: GEOGRAPHY, Domain 3: SUPPLY), FINANCE is the winner sense.) WSD: Definitions.
• Generally: WSD is the ability to identify the sense (meaning) of words in context in a computational manner.
• Formally: WSD is a mapping A from words to senses, such that A(i) ⊆ SensesD(wi), where SensesD(wi) is the set of senses encoded in a dictionary D for word wi, and A(i) is the subset of the senses of wi which are appropriate in the context T.
• As a classification problem: senses are the classes.

46. Motivation. WSD is at the heart of NLP, feeding applications such as: SRL (Semantic Role Labeling), TE (Text Entailment), CLIR (Cross Lingual Information Retrieval), NER (Named Entity Recognition), MT (Machine Translation), SP (Shallow Parsing), SA (Sentiment Analysis). WSD is an AI-complete problem: it is as hard as the hardest problems in AI, like the representation of common sense.

47. D. AdaBoost
• Constructs a strong classifier as a linear combination of two or more weak classifiers.
• The method is adaptive because it adjusts the weak classifiers so that previously misclassified instances are correctly classified.
• The algorithm iterates m times if there are m classifiers.
Steps:
• Each instance is initially assigned equal weight.
• In each pass of the iteration, the weights of misclassified instances are increased.
• A value αj is calculated for each classifier, which is a function of the classification error of classifier Cj.

48. D. AdaBoost. Steps (contd.)
• The classifiers are then combined by the function H for instance x: H(x) = sign( Σj αj · Cj(x) ).
• H is the strong classifier: a sign function applied to the linear combination of the weak classifiers.
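A minimal sketch of the AdaBoost loop and the final combination H(x), assuming weak classifiers that return ±1 over labels in {-1, +1}; the error floor and round count are illustrative assumptions:

```python
import math

def adaboost(examples, labels, candidates, rounds):
    """examples/labels: training data with labels in {-1, +1};
    candidates: pool of weak classifiers (callables returning +1 or -1)."""
    n = len(examples)
    weights = [1.0 / n] * n                    # each instance starts with equal weight
    chosen, alphas = [], []
    for _ in range(rounds):
        # Pick the weak classifier with the lowest weighted error.
        def weighted_error(c):
            return sum(w for w, x, y in zip(weights, examples, labels) if c(x) != y)
        c = min(candidates, key=weighted_error)
        err = max(weighted_error(c), 1e-10)    # floor avoids log-of-zero
        alpha = 0.5 * math.log((1 - err) / err)  # alpha_j as a function of the error of C_j
        chosen.append(c)
        alphas.append(alpha)
        # Increase the weights of misclassified instances, then renormalize.
        weights = [w * math.exp(-alpha * y * c(x)) for w, x, y in zip(weights, examples, labels)]
        z = sum(weights)
        weights = [w / z for w in weights]
    # H(x) = sign(sum_j alpha_j * C_j(x))
    return lambda x: 1 if sum(a * c(x) for a, c in zip(alphas, chosen)) >= 0 else -1
```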

49. Future Directions
• Development of better sense recognition systems.
• Eradication of the knowledge acquisition bottleneck.
• More attention needs to be paid to domain-specific approaches in WSD.
• If larger annotated corpora can be built, the accuracy of supervised approaches will climb higher.

50. 2. Support Vector Machines. An SVM is a binary classifier which finds the hyperplane with the largest margin separating the training examples into two classes. As SVMs are binary classifiers, a separate classifier is built for each sense of the word. Training phase: using a tagged corpus, an SVM is trained for every sense of the word using the features. Testing phase: given a test sentence, a test example is constructed from the features and fed to each binary classifier; the correct sense is selected based on the labels returned by the classifiers, and in case of a clash, the SVM with the higher confidence score wins. A sketch follows.
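A minimal sketch of the one-SVM-per-sense scheme using scikit-learn (an assumption: the slides do not name a library); LinearSVC's decision_function supplies the confidence score, i.e. the signed distance from the separating hyperplane:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_per_sense_svms(X, senses):
    """X: feature matrix (n_samples x n_features); senses: sense label per row.
    Trains one binary (this-sense vs. rest) SVM per sense."""
    svms = {}
    for sense in set(senses):
        y = np.array([1 if s == sense else 0 for s in senses])
        svms[sense] = LinearSVC().fit(X, y)
    return svms

def disambiguate(svms, x):
    # The SVM with the highest confidence (distance from its hyperplane) wins.
    scores = {sense: float(clf.decision_function(x.reshape(1, -1))[0])
              for sense, clf in svms.items()}
    return max(scores, key=scores.get)
```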
