WordNet: An Overview

  1. WordNet: An Overview Anubhav Madan anubhavm@comp.nus.edu.sg - WordNet - Anubhav Madan

  2. Today’s Discussion • WordNet: A Lexical Database • WordNet::Similarity • Some More Applications • Limitations • Tutorial

  3. WordNet: A Lexical Database • Started in 1985 • Basic unit: the synset • Synsets arranged hierarchically according to their meanings • Contains compounds, phrasal verbs, collocations, and idiomatic phrases • Example of a hypernym ('@') link: {bad person} @ {offender, libertine}, i.e. offenders and libertines are kinds of bad person • Forms a rich, dense network of relations that can be used to establish text coherence

  4. WordNet: The Facts • The basic lexical unit is a word or phrase • Words are grouped into synsets: sets of synonymous words or phrases that share one sense • A gloss is a textual definition of a synset • Synsets are organized into hierarchies: • Hypernym/hyponym: {concept} IS-A {concept} • Meronym/holonym: {concept} HAS-PART {concept} • Parts of speech: nouns, verbs, adjectives, and adverbs • 80,000 nouns organized into 60,000 concepts
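
The organization described above (synsets with glosses, linked by IS-A and HAS-PART pointers) can be sketched as a small data structure. This is a hypothetical illustration, not WordNet's actual storage format: the class `Synset` and the helpers `hypernym_chain` and `hyponyms` are invented names, and the example links simply reuse the {bad person} example from slide 3.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # compare synsets by identity, not by contents
class Synset:
    """A set of synonymous word forms sharing one sense (hypothetical sketch)."""
    words: List[str]
    gloss: str = ""
    # IS-A pointers: this synset is a kind of each hypernym.
    hypernyms: List["Synset"] = field(default_factory=list)
    # HAS-PART pointers: each meronym is a part of this synset.
    meronyms: List["Synset"] = field(default_factory=list)


def hypernym_chain(synset: Synset) -> List[Synset]:
    """Follow the first hypernym pointer upward until a top concept is reached."""
    chain = [synset]
    while synset.hypernyms:
        synset = synset.hypernyms[0]
        chain.append(synset)
    return chain


def hyponyms(target: Synset, lexicon: List[Synset]) -> List[Synset]:
    """Hyponymy is the inverse of hypernymy, so it can be derived from the pointers."""
    return [s for s in lexicon if any(h is target for h in s.hypernyms)]


# Illustrative data taken from the slide 3 example.
bad_person = Synset(["bad person"])
offender = Synset(["offender"], hypernyms=[bad_person])
libertine = Synset(["libertine"], hypernyms=[bad_person])
lexicon = [bad_person, offender, libertine]

print([s.words[0] for s in hypernym_chain(offender)])       # ['offender', 'bad person']
print([s.words[0] for s in hyponyms(bad_person, lexicon)])  # ['offender', 'libertine']
```

Storing only the upward (hypernym) pointers and deriving hyponyms on demand keeps the two directions of the relation consistent by construction.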

  5. WordNet: Architecture [Figure: lexicographers write the Lexical Source Files, which the Grinder converts into The WordNet Database; the database serves Applications 1 to N; an X-Windows interface is also shown]

  6. WordNet: Architecture • Word/synset pairs are stored in the WordNet database • Synset entry: {word or list of word forms, pointer to lexical file, frames (verbs only), list of elements, optional gloss, adjective cluster} • Example: {apple, edible_fruit,@ (fruit with red or yellow or green skin and crisp whitish flesh)}, i.e. the word apple, a hypernym pointer ('@') to edible_fruit, and a gloss • Indexes: senses are ordered • Familiarity index: how well known a word is • Index and data files • Sense index • The Grinder as a converter: takes the lexical source files written by lexicographers and converts them into a database format that WordNet can read and update
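
The Grinder's job of turning a lexicographer's source entry into structured data can be illustrated with a toy parser for the single entry shown on this slide. This is a hypothetical sketch, not the real Grinder: `parse_synset_entry` is an invented name, and it handles only a simplified subset (comma-terminated word forms, `word,symbol` pointers such as `edible_fruit,@`, and one parenthesized gloss), ignoring verb frames, adjective clusters, and the rest of the real file syntax.

```python
import re
from typing import List, Tuple


def parse_synset_entry(entry: str) -> Tuple[List[str], List[Tuple[str, str]], str]:
    """Split a simplified synset entry into (words, pointers, gloss)."""
    body = entry.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("a synset entry must be wrapped in braces")
    body = body[1:-1]

    # The gloss is the parenthesized text, if any; remove it before tokenizing.
    gloss = ""
    match = re.search(r"\((.*)\)", body)
    if match:
        gloss = match.group(1).strip()
        body = body[: match.start()] + body[match.end():]

    words, pointers = [], []
    for token in body.split():
        name, _, symbol = token.partition(",")
        if symbol:
            # "edible_fruit,@" -> pointer to edible_fruit with symbol "@" (hypernym).
            pointers.append((name, symbol))
        else:
            # "apple," -> a word form belonging to this synset.
            words.append(name)
    return words, pointers, gloss


entry = "{apple, edible_fruit,@ (fruit with red or yellow or green skin and crisp whitish flesh) }"
words, pointers, gloss = parse_synset_entry(entry)
print(words)     # ['apple']
print(pointers)  # [('edible_fruit', '@')]
```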

  7. Today’s Discussion • WordNet: A Lexical Database • WordNet::Similarity • Some More Applications • Limitations • Tutorial

  8. WordNet::Similarity • A Perl package that measures the “closeness” (similarity or relatedness) of concepts using WordNet's hierarchy and glosses • Main categories of measures: • Path based • Depth based • Information content based • Gloss based

  9. WordNet::Similarity Measures • Path Finder / Depth Finder: • Wup (Wu and Palmer): twice the depth of the least common subsumer (LCS), scaled by the sum of the depths of the two concepts • Lch (Leacock and Chodorow): negative log of the shortest path length, scaled by twice the maximum depth of the taxonomy • Path: inverse of the shortest path length • Information Content Finder: • Resnik: information content (IC) of the LCS • Jcn (Jiang and Conrath): inverse of the distance IC(c1) + IC(c2) - 2 × IC(LCS) • Lin: twice the IC of the LCS, scaled by the sum of the ICs of the two concepts • Gloss Finder: • Lesk (Banerjee and Pedersen): finds and scores overlaps between glosses • Vector (Patwardhan): builds co-occurrence vectors from glosses and compares the resulting gloss vectors • Hso (Hirst and St-Onge): classifies WordNet relations by direction and scores the path between two words • Demo
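
For reference, the structure- and IC-based measures above correspond to the standard definitions below, with len(c1, c2) the shortest path length counted in edges (as in the worked examples on slides 10 to 12), D the maximum depth of the taxonomy, LCS the least common subsumer, and P(c) the probability of encountering concept c in a corpus. The logarithm base is a convention; the worked examples use base 10.

```latex
\begin{aligned}
\mathrm{sim}_{\mathrm{path}}(c_1,c_2) &= \frac{1}{\mathrm{len}(c_1,c_2) + 1} \\
\mathrm{sim}_{\mathrm{lch}}(c_1,c_2) &= -\log\frac{\mathrm{len}(c_1,c_2)}{2D} \\
\mathrm{sim}_{\mathrm{wup}}(c_1,c_2) &= \frac{2\,\mathrm{depth}(\mathrm{LCS})}{\mathrm{depth}(c_1)+\mathrm{depth}(c_2)} \\
\mathrm{IC}(c) &= -\log P(c) \\
\mathrm{sim}_{\mathrm{res}}(c_1,c_2) &= \mathrm{IC}(\mathrm{LCS}) \\
\mathrm{sim}_{\mathrm{lin}}(c_1,c_2) &= \frac{2\,\mathrm{IC}(\mathrm{LCS})}{\mathrm{IC}(c_1)+\mathrm{IC}(c_2)} \\
\mathrm{dist}_{\mathrm{jcn}}(c_1,c_2) &= \mathrm{IC}(c_1)+\mathrm{IC}(c_2)-2\,\mathrm{IC}(\mathrm{LCS}),
\qquad \mathrm{sim}_{\mathrm{jcn}}(c_1,c_2) = \frac{1}{\mathrm{dist}_{\mathrm{jcn}}(c_1,c_2)}
\end{aligned}
```

The "+ 1" in the path measure turns an edge count into a node count, so two directly linked concepts score 1/2 and a concept compared with itself scores 1.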

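  10. Lch [Figure: toy taxonomy. Root → Medium of Exchange (edge length 2); Medium of Exchange → Money, Credit; Money → Cash → Coin; Credit → Credit Card; all other edges have length 1; maximum depth D = 5] • Lch Related (Money-Credit) = -log(len / 2D) = -log(2/10) = 0.70, where len = 2 is the shortest path Money → Medium of Exchange → Credit and logarithms are base 10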
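
The Lch computation can be reproduced on the slide's toy taxonomy. This is a sketch under stated assumptions: the edge lengths and depths are read off the figure (including the length-2 link from Root), path length is counted in edges, and base-10 logarithms are used to match the slide's arithmetic. The names `PARENT`, `DEPTH`, `lcs`, and `lch` are illustrative; WordNet::Similarity works on the real WordNet hierarchy with its own counting conventions, so its numbers will differ.

```python
import math

# Toy taxonomy from the slide: child -> parent.
PARENT = {
    "Medium of Exchange": "Root",
    "Money": "Medium of Exchange",
    "Credit": "Medium of Exchange",
    "Cash": "Money",
    "Credit Card": "Credit",
    "Coin": "Cash",
}
# Depth in edges; the Root -> Medium of Exchange link has length 2 on the slide.
DEPTH = {"Root": 0, "Medium of Exchange": 2, "Money": 3, "Credit": 3,
         "Cash": 4, "Credit Card": 4, "Coin": 5}
MAX_DEPTH = 5  # D on the slide


def ancestors(concept):
    """The concept itself followed by every concept above it, nearest first."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def lcs(a, b):
    """Least common subsumer: the lowest concept that subsumes both a and b."""
    above_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in above_b)


def path_length(a, b):
    """Shortest path length in edges, routed through the LCS."""
    return DEPTH[a] + DEPTH[b] - 2 * DEPTH[lcs(a, b)]


def lch(a, b):
    length = path_length(a, b)
    if length == 0:
        raise ValueError("identical concepts are not handled in this sketch")
    return -math.log10(length / (2 * MAX_DEPTH))


print(round(lch("Money", "Credit"), 2))  # 0.7
```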

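  11. Wup [Figure: same toy taxonomy as slide 10, with depths Medium of Exchange 2, Money 3, Credit 3, Cash 4, Credit Card 4, Coin 5] • Wup ConSim (Money-Credit) = 2 × depth(LCS) / (depth(Money) + depth(Credit)) = (2 × 2) / (3 + 3) = 4/6 = 0.67, where LCS = Medium of Exchange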
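
A minimal sketch of the Wup arithmetic on the same toy taxonomy follows, assuming the edge depths read off the figure (Medium of Exchange at depth 2 because of the length-2 link from Root). The helper names are illustrative, not WordNet::Similarity's API.

```python
# Toy taxonomy from the slide: child -> parent, with depths in edges.
PARENT = {
    "Medium of Exchange": "Root",
    "Money": "Medium of Exchange",
    "Credit": "Medium of Exchange",
    "Cash": "Money",
    "Credit Card": "Credit",
    "Coin": "Cash",
}
DEPTH = {"Root": 0, "Medium of Exchange": 2, "Money": 3, "Credit": 3,
         "Cash": 4, "Credit Card": 4, "Coin": 5}


def ancestors(concept):
    """The concept itself followed by every concept above it, nearest first."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def lcs(a, b):
    """Least common subsumer: the lowest concept that subsumes both a and b."""
    above_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in above_b)


def wup(a, b):
    """Twice the depth of the LCS, scaled by the sum of the two concepts' depths."""
    return 2 * DEPTH[lcs(a, b)] / (DEPTH[a] + DEPTH[b])


print(round(wup("Money", "Credit"), 2))  # 0.67
print(wup("Money", "Coin"))              # 0.75, since Money subsumes Coin
```

Because depths appear in both numerator and denominator, Wup rewards pairs whose common ancestor sits deep in the hierarchy relative to the pair itself.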

  12. Path [Figure: same toy taxonomy as slide 10, D = 5] • Inverse of the shortest path length • Path (Money-Credit) = 1/3 = 0.33, since the shortest path Money → Medium of Exchange → Credit has 2 edges, i.e. 3 nodes
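
The path measure can be sketched the same way. This assumes the convention of dividing 1 by the number of nodes on the shortest path (edges + 1), so directly linked concepts score 1/2 and identical concepts score 1. The helper names are illustrative, and the toy data is read off the figure. Every pair below Medium of Exchange is routed through that node or lower, so the length-2 link from Root never lies on these paths.

```python
# Toy taxonomy from the slide: child -> parent, with depths in edges.
PARENT = {
    "Medium of Exchange": "Root",
    "Money": "Medium of Exchange",
    "Credit": "Medium of Exchange",
    "Cash": "Money",
    "Credit Card": "Credit",
    "Coin": "Cash",
}
DEPTH = {"Root": 0, "Medium of Exchange": 2, "Money": 3, "Credit": 3,
         "Cash": 4, "Credit Card": 4, "Coin": 5}


def ancestors(concept):
    """The concept itself followed by every concept above it, nearest first."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def lcs(a, b):
    """Least common subsumer: the lowest concept that subsumes both a and b."""
    above_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in above_b)


def path_sim(a, b):
    """1 / (number of nodes on the shortest path), i.e. 1 / (edges + 1)."""
    edges = DEPTH[a] + DEPTH[b] - 2 * DEPTH[lcs(a, b)]
    return 1 / (edges + 1)


print(round(path_sim("Money", "Credit"), 2))  # 0.33
print(path_sim("Cash", "Coin"))               # 0.5
```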

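  13. Resnik [Figure: Medium of Exchange → Money, Credit; Money → Cash → Coin; Credit → Credit Card; concept probabilities: Medium of Exchange 6/6, Money 3/6, Credit 2/6, Cash 2/6, Credit Card 1/6, Coin 1/6] • Resnik Sim (Money-Coin) = IC(LCS) = IC(Money) = -log(3/6) = 0.30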
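
Resnik similarity is just the information content of the least common subsumer, which the sketch below computes on the slide's toy taxonomy. The probabilities are copied from the figure; in a real setting they would come from corpus counts, where each concept's probability covers occurrences of it and of everything below it. Base-10 logarithms and the names `PROB`, `ic`, and `resnik` are assumptions made to match the slide.

```python
import math

# Toy taxonomy from the slide (child -> parent); Medium of Exchange is the top.
PARENT = {"Money": "Medium of Exchange", "Credit": "Medium of Exchange",
          "Cash": "Money", "Credit Card": "Credit", "Coin": "Cash"}
# Concept probabilities read off the slide.
PROB = {"Medium of Exchange": 6 / 6, "Money": 3 / 6, "Credit": 2 / 6,
        "Cash": 2 / 6, "Credit Card": 1 / 6, "Coin": 1 / 6}


def ancestors(concept):
    """The concept itself followed by every concept above it, nearest first."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def lcs(a, b):
    """Least common subsumer: the lowest concept that subsumes both a and b."""
    above_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in above_b)


def ic(concept):
    """Information content: -log P(c), base 10 to match the slide."""
    return -math.log10(PROB[concept])


def resnik(a, b):
    return ic(lcs(a, b))


print(round(resnik("Money", "Coin"), 2))  # 0.3
```

For Money-Credit the LCS is Medium of Exchange, whose probability is 1, so the Resnik score is 0: the measure cannot separate pairs whose only shared ancestor is the top concept.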

  14. Lin [Figure: same taxonomy and concept probabilities as slide 13] • Lin Sim (Money-Coin) = 2 × IC(LCS) / (IC(Money) + IC(Coin)) = 2 × 0.301 / (0.301 + 0.778) = 0.56, where LCS = Money
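
Lin's measure rescales the Resnik value by the information content of the two concepts themselves. The sketch below reuses the toy taxonomy and the probabilities read off the figure, with base-10 logarithms; the helper names are illustrative. The ratio does not depend on the log base, so any base gives the same Lin score.

```python
import math

# Toy taxonomy from the slide (child -> parent); Medium of Exchange is the top.
PARENT = {"Money": "Medium of Exchange", "Credit": "Medium of Exchange",
          "Cash": "Money", "Credit Card": "Credit", "Coin": "Cash"}
# Concept probabilities read off the slide.
PROB = {"Medium of Exchange": 6 / 6, "Money": 3 / 6, "Credit": 2 / 6,
        "Cash": 2 / 6, "Credit Card": 1 / 6, "Coin": 1 / 6}


def ancestors(concept):
    """The concept itself followed by every concept above it, nearest first."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def lcs(a, b):
    """Least common subsumer: the lowest concept that subsumes both a and b."""
    above_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in above_b)


def ic(concept):
    """Information content: -log P(c)."""
    return -math.log10(PROB[concept])


def lin(a, b):
    denominator = ic(a) + ic(b)
    if denominator == 0:
        # Only happens when both concepts are the top node; left undefined here.
        raise ValueError("Lin similarity is undefined for this pair in this sketch")
    return 2 * ic(lcs(a, b)) / denominator


print(round(lin("Money", "Coin"), 2))  # 0.56
```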

  15. Jcn [Figure: same taxonomy and concept probabilities as slide 13] • Jcn Dist (Money-Coin) = IC(Money) + IC(Coin) - 2 × IC(LCS) = -log(3/6) - log(1/6) + 2 × log(3/6) = 0.301 + 0.778 - 0.602 = 0.477, where LCS = Money • Jcn similarity = 1/0.477 = 2.10
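
The Jcn distance and its inverse can be checked with a short sketch on the same toy data, again assuming base-10 logarithms and the probabilities read off the figure. The names `jcn_distance` and `jcn_similarity` are illustrative. Identical concepts have distance 0, which this sketch rejects rather than guessing a convention.

```python
import math

# Toy taxonomy from the slide (child -> parent); Medium of Exchange is the top.
PARENT = {"Money": "Medium of Exchange", "Credit": "Medium of Exchange",
          "Cash": "Money", "Credit Card": "Credit", "Coin": "Cash"}
# Concept probabilities read off the slide.
PROB = {"Medium of Exchange": 6 / 6, "Money": 3 / 6, "Credit": 2 / 6,
        "Cash": 2 / 6, "Credit Card": 1 / 6, "Coin": 1 / 6}


def ancestors(concept):
    """The concept itself followed by every concept above it, nearest first."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def lcs(a, b):
    """Least common subsumer: the lowest concept that subsumes both a and b."""
    above_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in above_b)


def ic(concept):
    """Information content: -log P(c), base 10 to match the slide."""
    return -math.log10(PROB[concept])


def jcn_distance(a, b):
    return ic(a) + ic(b) - 2 * ic(lcs(a, b))


def jcn_similarity(a, b):
    distance = jcn_distance(a, b)
    if distance == 0:
        raise ValueError("identical concepts are not handled in this sketch")
    return 1 / distance


print(round(jcn_distance("Money", "Coin"), 3))    # 0.477
print(round(jcn_similarity("Money", "Coin"), 2))  # 2.1
```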

  16. Lesk
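
Slide 9 describes Lesk as scoring overlaps between glosses. Banerjee and Pedersen score each overlap of n consecutive words as n², so one shared phrase counts for more than the same words scattered. Below is a minimal sketch of that scoring step for a single pair of glosses. It is an illustration, not WordNet::Similarity's code: `overlap_score` is an invented name, it skips stopword filtering and the comparison of glosses of related synsets that the extended measure also performs, and the example glosses are made up.

```python
from typing import List, Optional


def overlap_score(gloss_a: str, gloss_b: str) -> int:
    """Sum n**2 over word overlaps between two glosses, longest overlap first."""
    a: List[Optional[str]] = list(gloss_a.lower().split())
    b: List[Optional[str]] = list(gloss_b.lower().split())
    score = 0
    while True:
        # Find the longest run of consecutive words shared by both glosses.
        best_len, best_i, best_j = 0, 0, 0
        for i in range(len(a)):
            for j in range(len(b)):
                n = 0
                while (i + n < len(a) and j + n < len(b)
                       and a[i + n] is not None and a[i + n] == b[j + n]):
                    n += 1
                if n > best_len:
                    best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return score
        score += best_len ** 2
        # Blank out the matched words so they cannot be counted again.
        for n in range(best_len):
            a[best_i + n] = None
            b[best_j + n] = None


print(overlap_score("a card used to buy goods on credit",
                    "money used to buy goods and services"))  # 16
```

Here the single four-word overlap "used to buy goods" contributes 4² = 16, whereas four separate one-word overlaps would contribute only 4.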

  17. Vector
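
Patwardhan's vector measure represents glosses as vectors built from word co-occurrences. The sketch below is a simplified, hypothetical version: each word's vector counts the other words it shares a gloss with in a tiny made-up corpus, a gloss vector is the sum of its words' vectors, and relatedness is the cosine of two gloss vectors. The function names and data are illustrative; the real measure builds its vectors from a much larger corpus and differs in detail.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable


def cooccurrence_vectors(corpus: Iterable[str]) -> Dict[str, Counter]:
    """Map each word to counts of the other words appearing in the same gloss."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for gloss in corpus:
        words = gloss.lower().split()
        for word in words:
            for other in words:
                if other != word:
                    vectors[word][other] += 1
    return vectors


def gloss_vector(gloss: str, vectors: Dict[str, Counter]) -> Counter:
    """Sum the co-occurrence vectors of a gloss's words (unknown words add nothing)."""
    total: Counter = Counter()
    for word in gloss.lower().split():
        total.update(vectors.get(word, Counter()))
    return total


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(count * v[key] for key, count in u.items())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


# Hypothetical mini-corpus of glosses, used only to build the word vectors.
corpus = [
    "coins and bills used as money",
    "money used to buy goods",
    "a card used to buy goods on credit",
]
vectors = cooccurrence_vectors(corpus)
cash = gloss_vector("money in the form of coins and bills", vectors)
card = gloss_vector("a card used to buy goods on credit", vectors)
print(round(cosine(cash, card), 2))
```

Comparing second-order vectors rather than raw words lets two glosses score as related even when they share no words, as long as their words keep similar company in the corpus.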

  18. HSO • Classifies the relations in WordNet as having directions • Is-a relations point upward; has-part relations are horizontal • Relates two words through a path that is neither too long nor changes direction too often
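
The rule of thumb on this slide (short paths with few changes of direction) is usually written as weight = C - path length - k × (number of direction changes). The sketch below assumes the constants C = 8 and k = 1 and a maximum path length of 5, values commonly reported for this measure; treat them, the direction labels, and the name `hso_relatedness` as assumptions. It also omits any restrictions Hirst and St-Onge place on which sequences of directions are allowed.

```python
from typing import Sequence


def hso_relatedness(directions: Sequence[str], C: int = 8, k: int = 1,
                    max_length: int = 5) -> int:
    """Score a path given as one direction ('up', 'down', 'horizontal') per link.

    C, k, and max_length are assumed defaults for this sketch.
    """
    if not directions or len(directions) > max_length:
        return 0  # no usable path: treat the words as unrelated
    changes = sum(1 for prev, cur in zip(directions, directions[1:]) if prev != cur)
    return C - len(directions) - k * changes


print(hso_relatedness(["up", "up", "down"]))          # 8 - 3 - 1 = 4
print(hso_relatedness(["up", "horizontal", "down"]))  # 8 - 3 - 2 = 3
```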

  19. Demo

  20. Today’s Discussion • WordNet: A Lexical Database • WordNet::Similarity • Some More Applications • Limitations • Tutorial

  21. Applications • Building Semantic Concordances • Performance and Confidence in a Semantic Annotation • Resnik Similarity Measure in Class-Based Probabilities • Lch WordNet Similarity Measure in Word Sense Identification • Text Retrieval using WordNet

  22. Applications • Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms • Temporal Indexing through Lexical Chaining • COLOR-X • Knowledge Processing on an Extended WordNet

  23. Further Speculation • Sense Disambiguation • Information Retrieval • Semantic Relations and Textual Coherence • Knowledge engineering

  24. The Limitations • The relation IS-NOT (NOT-A-KIND-OF) cannot be expressed • The relation IS-USED-AS-A-KIND-OF cannot be expressed either • No explicit distinction between proper and common nouns: this information was too difficult to include • No attempt to identify “basic-level” or “generic” categories: concepts in the middle of the lexical hierarchy are distinguished from one another by many features, but WordNet does not record these features • Not enough semantic relations in WordNet

  25. Tutorial • What is WordNet? • Why is WordNet unique? • What is the difference between WordNet and WordNet::Similarity? • What are some of its limiting features? • Give an example of a human scenario where WordNet would be instrumental

  26. Tutorial • What Similarity measure would you use if you had only the following information: • Path [linkages between words in an ontology] • Information Content of the Words • Gloss of the Words • An ontology with direction

  27. References • Overview: Pedersen, T., Patwardhan, S., and Michelizzi, J. 2004. WordNet::Similarity: Measuring the relatedness of concepts. In Proceedings of the Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-04), 38–41. Boston, May 2004. • Lch: Leacock, C., and Chodorow, M. 1998. Combining local context and WordNet similarity for word sense identification. In Fellbaum, C., ed., WordNet: An Electronic Lexical Database. MIT Press. 265–283. • Wup: Wu, Z., and Palmer, M. 1994. Verb semantics and lexical selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 133–138. • Res: Resnik, P. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, 448–453. • Lin: Lin, D. 1998. An information-theoretic definition of similarity. In Proceedings of the International Conference on Machine Learning. • Jcn: Jiang, J., and Conrath, D. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics, 19–33. • Hso: Hirst, G., and St-Onge, D. 1998. Lexical chains as representations of context for the detection and correction of malapropisms. In Fellbaum, C., ed., WordNet: An Electronic Lexical Database. MIT Press. 305–332. • Lesk: Banerjee, S., and Pedersen, T. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, 805–810. • Vector: Patwardhan, S. 2003. Incorporating dictionary and corpus information into a context vector measure of semantic relatedness. Master’s thesis, University of Minnesota, Duluth. • Links available at: http://www.comp.nus.edu.sg/~anubhavm/reading.htm

  28. Thank You Anubhav Madan anubhavm@comp.nus.edu.sg
