
WordNet: An Overview

Anubhav Madan

anubhavm@comp.nus.edu.sg

- WordNet - Anubhav Madan

Today’s Discussion
  • WordNet: A Lexical Database
  • WordNet::Similarity
  • Some More Applications
  • Limitations
  • Tutorial


WordNet: A Lexical Database
  • Started in 1985
  • Basic Unit: Synset
  • Synsets arranged hierarchically by meaning
  • Contains compounds, phrasal verbs, collocations, and idiomatic phrases
  • {Bad Person} @ {offender, libertine}
  • Forms a rich, dense network of relations that can be used to establish text coherence


WordNet: The Facts
  • A word or phrase is the basic unit
  • Words are organized into synsets: groups of words that share the same sense
  • A gloss is a textual definition of the synset
  • Words organized into hierarchies
    • hypernym/hyponym {concept} IS-A {concept}
    • meronym/holonym {concept} HAS-PART {concept}
  • Types: Nouns, Verbs, Adjectives, Adverbs
  • 80,000 Nouns organized into 60,000 concepts
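The hierarchy on this slide can be pictured as synsets linked by pointers. Below is a toy in-memory model, not WordNet's actual storage format. The `Synset` class and the `hypernym_chain` helper are hypothetical names, and the small apple hierarchy is illustrative data rather than a copy of WordNet.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """Hypothetical toy synset: synonymous word forms that share one sense."""
    words: list
    gloss: str = ""
    hypernyms: list = field(default_factory=list)  # IS-A links, pointing upward
    meronyms: list = field(default_factory=list)   # HAS-PART links


def hypernym_chain(synset):
    """Follow the first hypernym pointer until reaching a synset with none."""
    chain = [synset.words[0]]
    while synset.hypernyms:
        synset = synset.hypernyms[0]
        chain.append(synset.words[0])
    return chain


# Illustrative data only, not copied from WordNet.
fruit = Synset(["fruit"])
edible_fruit = Synset(["edible_fruit"], hypernyms=[fruit])
apple = Synset(
    ["apple"],
    gloss="fruit with red or yellow or green skin and crisp whitish flesh",
    hypernyms=[edible_fruit],
    meronyms=[Synset(["skin"]), Synset(["flesh"])],  # apple HAS-PART skin, flesh
)

print(hypernym_chain(apple))                 # ['apple', 'edible_fruit', 'fruit']
print([m.words[0] for m in apple.meronyms])  # ['skin', 'flesh']
```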


WordNet: Architecture

[Figure: WordNet architecture. Lexicographers write the Lexical Source Files; the Grinder converts them into The WordNet Database, which is used by X-Windows and by Application 1 to Application N.]


WordNet: Architecture
  • Word/synset pairs stored in the WordNet DB.
  • {Word/List of Word Forms, Pointer to Lexical File, frames (for verbs), list of elements, (optional gloss), adjective cluster}
  • {apple, edible_fruit,@ (fruit with red or yellow or green skin and crisp whitish flesh) }
  • Indexes: Senses are Ordered
    • Index of Familiarity: how well known the word is
    • Index and Data Files
    • Sense Index
  • The Grinder as a Converter: takes the Lexical Source Files written by Lexicographers and converts them into a format that WordNet can read and update.
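The Grinder's first job is to read entries like the apple example above. The following is a deliberately simplified parser for that one entry shape, a sketch rather than the real Grinder. `parse_synset` is a hypothetical name, and only the `@` (hypernym) and `~` (hyponym) pointer symbols are handled; real lexicographer files support many more.

```python
import re

# Subset of pointer symbols, assumed for this sketch.
POINTERS = {"@": "hypernym", "~": "hyponym"}


def parse_synset(line):
    """Split a '{word, target,@ (gloss) }' entry into words, pointers and gloss."""
    body = line.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("expected a {...} synset entry")
    body = body[1:-1]
    # The optional gloss is the parenthesised text.
    m = re.search(r"\((.*)\)", body)
    gloss = m.group(1).strip() if m else ""
    if m:
        body = body[:m.start()] + body[m.end():]
    words, pointers = [], []
    for token in body.split():
        # 'apple,' is a word form; 'edible_fruit,@' is a pointer to edible_fruit.
        target, _, symbol = token.partition(",")
        if symbol in POINTERS:
            pointers.append((POINTERS[symbol], target))
        elif target:
            words.append(target)
    return {"words": words, "pointers": pointers, "gloss": gloss}


entry = "{apple, edible_fruit,@ (fruit with red or yellow or green skin and crisp whitish flesh) }"
print(parse_synset(entry))
```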


Today’s Discussion
  • WordNet: A Lexical Database
  • WordNet::Similarity
  • Some More Applications
  • Limitations
  • Tutorial


WordNet::Similarity
  • A Perl package that measures the “closeness” (similarity or relatedness) of concepts using WordNet's hierarchy and glosses
  • Main categories of measures:
    • Path based
      • Depth based
      • Information Content Based
    • Gloss Based


WordNet: Similarity Measures
  • Path Finder
    • Depth Finder
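      • Wup (Wu and Palmer): Scales the depth of the LCS (least common subsumer) by the depths of the two concepts: 2 × depth(LCS) / (depth(c1) + depth(c2))
      • Lch (Leacock and Chodorow): Scales the shortest path length by twice the maximum depth D of the taxonomy: -log (len / 2D)
    • Path: Inverse of the shortest path length
    • Information Content Finder
      • Resnik: Information content (IC) of the LCS of the two concepts
      • Jcn (Jiang and Conrath): Inverse of the distance IC(c1) + IC(c2) - 2 × IC(LCS)
      • Lin: Scales the IC of the LCS by the ICs of the two concepts: 2 × IC(LCS) / (IC(c1) + IC(c2))
  • Gloss Finder
    • Lesk (Banerjee and Pedersen): Finds and scores overlaps between glosses
    • Vector (Patwardhan): Builds word co-occurrence vectors, represents each gloss as a vector, and scores relatedness as the cosine between gloss vectors
  • Hso (Hirst and St-Onge): Looks for a path of WordNet relations that is short and rarely changes direction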

Demo


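LCH

[Figure: toy taxonomy. Root → Medium of Exchange (path length 2); Medium of Exchange → Money and Credit; Money → Cash → Coin; Credit → Credit Card; all other edges have length 1; maximum depth D = 5.]

Lch Related (Money-Credit) = -log (len / 2D) = -log (2/10) = 0.70

(len = 2 edges: Money → Medium of Exchange → Credit; logarithms are base 10)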
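The Lch arithmetic above can be checked with a short script over the same toy taxonomy. This is a sketch assuming the slide's conventions: path length counted in edges, the Root to Medium of Exchange link counted as length 2, and base-10 logarithms. The helper names are hypothetical. Library implementations differ; NLTK's `lch_similarity`, for instance, adds one to the edge count and uses the natural logarithm, so its numbers will not match the slide.

```python
import math

# Toy taxonomy from the slide: child -> (parent, edge length).
# The Root -> Medium of Exchange link is drawn with length 2 on the slide.
PARENT = {
    "Medium of Exchange": ("Root", 2),
    "Money": ("Medium of Exchange", 1),
    "Credit": ("Medium of Exchange", 1),
    "Cash": ("Money", 1),
    "Credit Card": ("Credit", 1),
    "Coin": ("Cash", 1),
}


def ancestors(c):
    """Return c followed by each of its ancestors up to Root."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c][0]
        chain.append(c)
    return chain


def depth(c):
    """Sum of edge lengths from c up to Root."""
    d = 0
    while c in PARENT:
        c, length = PARENT[c]
        d += length
    return d


def lcs(a, b):
    """Least common subsumer: the lowest ancestor shared by a and b."""
    shared = set(ancestors(b))
    return next(x for x in ancestors(a) if x in shared)


def path_len(a, b):
    """Shortest path length in edges, going through the LCS."""
    return depth(a) + depth(b) - 2 * depth(lcs(a, b))


D = max(depth(c) for c in PARENT)  # maximum depth of the taxonomy


def lch(a, b):
    """Leacock-Chodorow: -log10(len / 2D); undefined when a == b (len = 0)."""
    return -math.log10(path_len(a, b) / (2 * D))


print(D, path_len("Money", "Credit"))    # 5 2
print(round(lch("Money", "Credit"), 2))  # 0.7
```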


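WUP

[Figure: toy taxonomy. Root → Medium of Exchange (path length 2); Medium of Exchange → Money and Credit; Money → Cash → Coin; Credit → Credit Card; all other edges have length 1; maximum depth D = 5.]

Wup ConSim (Money-Credit) = 2 × depth(LCS) / (depth(Money) + depth(Credit)) = 2×2 / (3+3) = 4/6 = 0.67

(LCS = Medium of Exchange, at depth 2; Money and Credit are at depth 3)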
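The Wu-Palmer computation can be sketched the same way. Assumptions as for the slide: the toy taxonomy from the figure, with depths summed from the drawn edge lengths (so Medium of Exchange sits at depth 2). The helper names are hypothetical.

```python
# Toy taxonomy from the slide: child -> (parent, edge length).
PARENT = {
    "Medium of Exchange": ("Root", 2),
    "Money": ("Medium of Exchange", 1),
    "Credit": ("Medium of Exchange", 1),
    "Cash": ("Money", 1),
    "Credit Card": ("Credit", 1),
    "Coin": ("Cash", 1),
}


def ancestors(c):
    """Return c followed by each of its ancestors up to Root."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c][0]
        chain.append(c)
    return chain


def depth(c):
    """Sum of edge lengths from c up to Root."""
    d = 0
    while c in PARENT:
        c, length = PARENT[c]
        d += length
    return d


def lcs(a, b):
    """Least common subsumer: the lowest ancestor shared by a and b."""
    shared = set(ancestors(b))
    return next(x for x in ancestors(a) if x in shared)


def wup(a, b):
    """Wu-Palmer: 2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


print(lcs("Money", "Credit"))            # Medium of Exchange
print(round(wup("Money", "Credit"), 2))  # 0.67
```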


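Path

  • Inverse of the shortest path length

[Figure: toy taxonomy. Root → Medium of Exchange (path length 2); Medium of Exchange → Money and Credit; Money → Cash → Coin; Credit → Credit Card; all other edges have length 1; maximum depth D = 5.]

Path (Money-Credit) = 1 / len(Money, Credit) = 1/2 = 0.5

(len = 2 edges: Money → Medium of Exchange → Credit)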
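A sketch of the path measure on the same toy taxonomy, counting edges as in the Lch example. Conventions vary: many WordNet tools count nodes instead (NLTK's `path_similarity` returns 1 / (edges + 1)), which would give 1/3 for Money-Credit. The helper names are hypothetical.

```python
# Toy taxonomy from the slide: child -> (parent, edge length).
PARENT = {
    "Medium of Exchange": ("Root", 2),
    "Money": ("Medium of Exchange", 1),
    "Credit": ("Medium of Exchange", 1),
    "Cash": ("Money", 1),
    "Credit Card": ("Credit", 1),
    "Coin": ("Cash", 1),
}


def ancestors(c):
    """Return c followed by each of its ancestors up to Root."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c][0]
        chain.append(c)
    return chain


def depth(c):
    """Sum of edge lengths from c up to Root."""
    d = 0
    while c in PARENT:
        c, length = PARENT[c]
        d += length
    return d


def lcs(a, b):
    """Least common subsumer: the lowest ancestor shared by a and b."""
    shared = set(ancestors(b))
    return next(x for x in ancestors(a) if x in shared)


def path_len(a, b):
    """Shortest path length in edges, going through the LCS."""
    return depth(a) + depth(b) - 2 * depth(lcs(a, b))


def path_sim(a, b):
    """Inverse of the shortest path length in edges; undefined when a == b."""
    return 1 / path_len(a, b)


print(path_sim("Money", "Credit"))     # 0.5
print(path_sim("Coin", "Credit Card"))  # 0.2
```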


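Resnik

[Figure: the toy taxonomy (Medium of Exchange → Money, Credit; Money → Cash → Coin; Credit → Credit Card) annotated with concept probabilities p(c), each counting the concept and everything below it: Medium of Exchange 6/6, Money 3/6, Credit 2/6, Cash 2/6, Credit Card 1/6, Coin 1/6.]

Resnik Sim (Money-Credit) = IC(LCS) = -log p(Medium of Exchange) = -log (6/6) = 0

(The LCS subsumes every concept, so it carries no information. For Money-Coin the LCS is Money, giving -log (3/6) = 0.30.)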
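The probabilities in the figure can be reproduced by counting, for each concept, how many of the six concepts it subsumes (itself included) and dividing by six. This sketch assumes exactly that toy counting scheme; real implementations estimate p(c) from corpus frequencies. The names are hypothetical, and logs are base 10 as on the slides.

```python
import math

# Toy taxonomy from the figure: child -> parent.
PARENT = {
    "Medium of Exchange": "Root",
    "Money": "Medium of Exchange",
    "Credit": "Medium of Exchange",
    "Cash": "Money",
    "Credit Card": "Credit",
    "Coin": "Cash",
}
CONCEPTS = list(PARENT)  # the six concepts counted in the figure (Root excluded)


def ancestors(c):
    """Return c followed by each of its ancestors up to Root."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain


def p(c):
    """Toy probability: share of the six concepts that c subsumes (itself included)."""
    return sum(1 for x in CONCEPTS if c in ancestors(x)) / len(CONCEPTS)


def ic(c):
    """Information content -log10 p(c), written as log10(1/p) so p = 1 gives 0.0."""
    return math.log10(1 / p(c))


def lcs(a, b):
    """Least common subsumer: the lowest ancestor shared by a and b."""
    shared = set(ancestors(b))
    return next(x for x in ancestors(a) if x in shared)


def resnik(a, b):
    """Resnik similarity: the information content of the least common subsumer."""
    return ic(lcs(a, b))


print(round(resnik("Money", "Credit"), 2))  # 0.0  (LCS = Medium of Exchange)
print(round(resnik("Money", "Coin"), 2))    # 0.3  (LCS = Money)
```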


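Lin

[Figure: the toy taxonomy (Medium of Exchange → Money, Credit; Money → Cash → Coin; Credit → Credit Card) annotated with concept probabilities p(c), each counting the concept and everything below it: Medium of Exchange 6/6, Money 3/6, Credit 2/6, Cash 2/6, Credit Card 1/6, Coin 1/6.]

Lin Sim (Money-Credit) = 2 × IC(LCS) / (IC(Money) + IC(Credit)) = 2 × 0 / (0.301 + 0.477) = 0

(For Money-Coin, whose LCS is Money: 2 × 0.301 / (0.301 + 0.778) = 0.56)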
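A sketch of Lin's measure under the same assumptions as the figure: toy probabilities from counting subsumed concepts, base-10 logs, hypothetical helper names. Only Medium of Exchange has IC 0 here, so the denominator is non-zero for any two distinct concepts.

```python
import math

# Toy taxonomy from the figure: child -> parent.
PARENT = {
    "Medium of Exchange": "Root",
    "Money": "Medium of Exchange",
    "Credit": "Medium of Exchange",
    "Cash": "Money",
    "Credit Card": "Credit",
    "Coin": "Cash",
}
CONCEPTS = list(PARENT)  # the six concepts counted in the figure (Root excluded)


def ancestors(c):
    """Return c followed by each of its ancestors up to Root."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain


def p(c):
    """Toy probability: share of the six concepts that c subsumes (itself included)."""
    return sum(1 for x in CONCEPTS if c in ancestors(x)) / len(CONCEPTS)


def ic(c):
    """Information content -log10 p(c), written as log10(1/p) so p = 1 gives 0.0."""
    return math.log10(1 / p(c))


def lcs(a, b):
    """Least common subsumer: the lowest ancestor shared by a and b."""
    shared = set(ancestors(b))
    return next(x for x in ancestors(a) if x in shared)


def lin(a, b):
    """Lin similarity: 2 * IC(LCS) / (IC(a) + IC(b)); identical concepts score 1."""
    if a == b:
        return 1.0
    return 2 * ic(lcs(a, b)) / (ic(a) + ic(b))


print(round(lin("Money", "Credit"), 2))  # 0.0
print(round(lin("Money", "Coin"), 2))    # 0.56
```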


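JCN

[Figure: the toy taxonomy (Medium of Exchange → Money, Credit; Money → Cash → Coin; Credit → Credit Card) annotated with concept probabilities p(c), each counting the concept and everything below it: Medium of Exchange 6/6, Money 3/6, Credit 2/6, Cash 2/6, Credit Card 1/6, Coin 1/6.]

Jcn Dist (Money-Credit) = IC(Money) + IC(Credit) - 2 × IC(LCS) = -log (3/6) - log (2/6) + 2 × log (6/6)

= 0.301 + 0.477 - 0 = 0.778

(Jcn similarity is the inverse of this distance: 1/0.778 = 1.29)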
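A sketch of the Jiang-Conrath distance and its inverse on the toy probabilities from the figure. The counting scheme, base-10 logs and helper names are assumptions for illustration, not WordNet::Similarity's implementation.

```python
import math

# Toy taxonomy from the figure: child -> parent.
PARENT = {
    "Medium of Exchange": "Root",
    "Money": "Medium of Exchange",
    "Credit": "Medium of Exchange",
    "Cash": "Money",
    "Credit Card": "Credit",
    "Coin": "Cash",
}
CONCEPTS = list(PARENT)  # the six concepts counted in the figure (Root excluded)


def ancestors(c):
    """Return c followed by each of its ancestors up to Root."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain


def p(c):
    """Toy probability: share of the six concepts that c subsumes (itself included)."""
    return sum(1 for x in CONCEPTS if c in ancestors(x)) / len(CONCEPTS)


def ic(c):
    """Information content -log10 p(c), written as log10(1/p) so p = 1 gives 0.0."""
    return math.log10(1 / p(c))


def lcs(a, b):
    """Least common subsumer: the lowest ancestor shared by a and b."""
    shared = set(ancestors(b))
    return next(x for x in ancestors(a) if x in shared)


def jcn_dist(a, b):
    """Jiang-Conrath distance: IC(a) + IC(b) - 2 * IC(LCS)."""
    return ic(a) + ic(b) - 2 * ic(lcs(a, b))


def jcn_sim(a, b):
    """Jcn similarity is the inverse of the distance; undefined when the distance is 0."""
    return 1 / jcn_dist(a, b)


print(round(jcn_dist("Money", "Credit"), 3))  # 0.778
print(round(jcn_sim("Money", "Credit"), 2))   # 1.29
```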


Lesk
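A simplified sketch of the gloss-overlap scoring behind Banerjee and Pedersen's measure: repeatedly take the longest shared run of consecutive words, score a run of n words as n², and ignore runs made only of function words. The extended measure also compares the glosses of related synsets (hypernyms, hyponyms and so on), which this sketch omits. The stop-word list and the function names are assumptions.

```python
# Assumed, deliberately small list of function words.
STOP_WORDS = frozenset({"a", "an", "the", "of", "or", "and", "with", "in", "to", "is"})


def longest_common_run(a, b):
    """Return (length, start_in_a, start_in_b) of the longest shared word run.

    Tokens set to None have already been used and never match.
    """
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def overlap_score(gloss1, gloss2, stop_words=STOP_WORDS):
    """Sum n**2 over shared runs of n consecutive words, longest runs first."""
    a, b = gloss1.lower().split(), gloss2.lower().split()
    score = 0
    while True:
        n, i, j = longest_common_run(a, b)
        if n == 0:
            return score
        run = a[i:i + n]
        a[i:i + n] = [None] * n  # consume the matched words in both glosses
        b[j:j + n] = [None] * n
        if not all(word in stop_words for word in run):
            score += n * n  # runs made only of function words score nothing


print(overlap_score("fruit with red or yellow or green skin",
                    "a green skin fruit"))  # 2**2 + 1**2 = 5
```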


Vector
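A hedged sketch of the vector idea: build co-occurrence vectors for words from a corpus, represent a gloss by combining the vectors of its words, and score relatedness as the cosine of two gloss vectors. Patwardhan's measure averages the word vectors; this sketch sums them, which gives the same cosine. Here "co-occurrence" simply means appearing in the same short text, and the three-sentence corpus, one-word glosses and function names are toy assumptions.

```python
import math
from collections import Counter, defaultdict


def word_vectors(texts):
    """Count, for each word, the other words that appear in the same text."""
    vecs = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()
        for w in words:
            for v in words:
                if v != w:
                    vecs[w][v] += 1
    return vecs


def gloss_vector(gloss, vecs):
    """Sum the co-occurrence vectors of the gloss words (unknown words add nothing)."""
    total = Counter()
    for w in gloss.lower().split():
        total.update(vecs.get(w, Counter()))
    return total


def cosine(u, v):
    """Cosine of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * v[k] for k, count in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


CORPUS = ["coins and notes are money", "money is used to pay", "a river has water"]
vecs = word_vectors(CORPUS)
money, pay, water = (gloss_vector(g, vecs) for g in ("money", "pay", "water"))
print(round(cosine(money, pay), 2))  # 0.53
print(cosine(money, water))          # 0.0
```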


HSO
  • Classifies the relations in WordNet as having directions.
  • The IS-A relations are upward; the HAS-PART relations are horizontal.
  • Establishes a relationship between words through a path that is neither too long nor changes direction too often.
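The HSO idea can be sketched as a weight that starts from a constant and is reduced by the path length and by the number of direction changes: weight = C - len - k × changes. The values C = 8 and k = 1 and the five-link limit are assumptions here (settings commonly cited for WordNet::Similarity's hso). The sketch omits the original rules that forbid certain patterns of direction changes, as well as the separate score for "strong" relations; `hso_weight` is a hypothetical name.

```python
def hso_weight(directions, C=8, k=1, max_len=5):
    """Medium-strong HSO-style weight for a path given as link directions.

    directions: e.g. ["up", "down"] for Money -> Medium of Exchange -> Credit.
    C, k and max_len are assumed constants, not taken from the slides.
    """
    if not directions or len(directions) > max_len:
        return 0  # no acceptable path
    changes = sum(1 for prev, cur in zip(directions, directions[1:]) if prev != cur)
    return max(0, C - len(directions) - k * changes)


print(hso_weight(["up", "down"]))      # 8 - 2 - 1 = 5
print(hso_weight(["up", "up", "up"]))  # 8 - 3 - 0 = 5
print(hso_weight(["up"] * 6))          # 0 (longer than max_len)
```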



Demo


Today’s Discussion
  • WordNet: A Lexical Database
  • WordNet::Similarity
  • Some More Applications
  • Limitations
  • Tutorial


Applications
  • Building Semantic Concordances
  • Performance and Confidence in a Semantic Annotation Task
  • Resnik Similarity Measure in Class-Based Probabilities
  • Lch WordNet Similarity Measure in Word Sense Identification
  • Text Retrieval using WordNet


Applications
  • Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms
  • Temporal Indexing through Lexical Chaining
  • COLOR-X
  • Knowledge Processing on an Extended WordNet


Further Speculation
  • Sense Disambiguation
  • Information Retrieval
  • Semantic Relations and Textual Coherence
  • Knowledge engineering


The Limitations
  • Relation IS-NOT or NOT-A-KIND-OF is inexpressible
  • Relation IS-USED-AS-A-KIND-OF is also inexpressible
  • No Explicit Distinction between Proper and Common Nouns – It was too difficult to include this information
  • Does not attempt to identify “basic-level” or “generic” categories. Concepts in the middle of the hierarchy can carry many features that distinguish one word from another, and WordNet does not capture them.
  • Not enough semantic relations in WordNet.


Tutorial
  • What is WordNet?
  • Why is WordNet unique?
  • What is the difference between WordNet and WordNet::Similarity?
  • What are some of the limiting features?
  • Give an example of a human scenario where WordNet would be instrumental.


Tutorial
  • What Similarity measure would you use if you had only the following information:
    • Path [linkages between words in an ontology]
    • Information Content of the Words
    • Gloss of the Words
    • An ontology with direction


References
  • Overview: Pedersen, T., Patwardhan, S., and Michelizzi, J. 2004. WordNet::Similarity: Measuring the relatedness of concepts. In Proceedings of the Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-04), Boston, May 2004, 38–41.
  • Lch: Leacock, C., and Chodorow, M. 1998. Combining local context and WordNet similarity for word sense identification. In Fellbaum, C., ed., WordNet: An electronic lexical database. MIT Press. 265–283.
  • Wup: Wu, Z., and Palmer, M. 1994. Verb semantics and lexical selection. In 32nd Annual Meeting of the Association for Computational Linguistics, 133–138.
  • Res: Resnik, P. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, 448–453.
  • Lin: Lin, D. 1998. An information-theoretic definition of similarity. In Proceedings of the International Conference on Machine Learning.
  • Jcn: Jiang, J., and Conrath, D. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics, 19–33.
  • Hso: Hirst, G., and St-Onge, D. 1998. Lexical chains as representations of context for the detection and correction of malapropisms. In Fellbaum, C., ed., WordNet: An electronic lexical database. MIT Press. 305–332.
  • Lesk: Banerjee, S., and Pedersen, T. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, 805–810.
  • Vector: Patwardhan, S. 2003. Incorporating dictionary and corpus information into a context vector measure of semantic relatedness. Master’s thesis, Univ. of Minnesota, Duluth.
  • Links available at: http://www.comp.nus.edu.sg/~anubhavm/reading.htm



Thank You

Anubhav Madan

anubhavm@comp.nus.edu.sg
