Extended Gloss Overlaps as a Measure of Semantic Relatedness

Presentation Transcript

  1. Extended Gloss Overlaps as a Measure of Semantic Relatedness • Satanjeev Banerjee (Carnegie Mellon University) • Ted Pedersen (University of Minnesota Duluth) • Supported by NSF Grants: #0092784, REC-9979894

  2. Semantic Relatedness • Some pairs of words are closer in meaning than others • E.g. car – tire are strongly related; car – tree are not strongly related • Relatedness between words can consist of • Synonymy [e.g. car – automobile] • Is-a/has-a relationships [e.g. car – tire] • Co-occurrence [e.g. car – insurance]

  3. Goal of this Paper • Create a measure to quantify semantic relatedness • Most existing work measures noun-noun only. • Resnik (1995), Lin (1997), Jiang-Conrath (1997), Leacock-Chodorow (1998) • We can measure across parts of speech. • Based on WordNet definitions and relations. • Evaluate • Using word sense disambiguation. • Compare to human relatedness judgments (in paper)

  4. Description of WordNet • Online English lexical database. • Like dictionaries, contains word senses and their definitions or glosses • E.g.: sentence: “the penalty meted out to one adjudged guilty” • Word senses that mean the same are grouped into synonym sets or synsets • E.g.: {sentence, conviction, condemnation}

  5. Semantic Relations in WordNet • Synsets are connected to other synsets through “semantic relations” • sentence: “the penalty meted out to one adjudged guilty”

  6. Semantic Relations in WordNet • Synsets are connected to other synsets through “semantic relations” • final judgment: “a judgment disposing of the case before the court of law” • a “sentence” is a … • sentence: “the penalty meted out to one adjudged guilty”

  7. Semantic Relations in WordNet • Synsets are connected to other synsets through “semantic relations” • final judgment: “a judgment disposing of the case before the court of law” • a “sentence” is a … [hypernym] • sentence: “the penalty meted out to one adjudged guilty”

  8. Semantic Relations in WordNet • Synsets are connected to other synsets through “semantic relations” • final judgment: “a judgment disposing of the case before the court of law” • a “sentence” is a … [hypernym] • sentence: “the penalty meted out to one adjudged guilty” • … is a “sentence” • hard time: “term served in a maximum security prison” • … is a “sentence” • death penalty: “punishment by death via execution”

  9. Semantic Relations in WordNet • Synsets are connected to other synsets through “semantic relations” • final judgment: “a judgment disposing of the case before the court of law” • a “sentence” is a … [hypernym] • sentence: “the penalty meted out to one adjudged guilty” • … is a “sentence” [hyponym] • hard time: “term served in a maximum security prison” • … is a “sentence” [hyponym] • death penalty: “punishment by death via execution”

  10. Gloss Overlaps ≈ Relatedness • Lesk’s (1986) idea: Related word senses are (often) defined using the same words. E.g.: • bank(1): “a financial institution” • bank(2): “sloping land beside a body of water” • lake: “a body of water surrounded by land”

  11. Gloss Overlaps ≈ Relatedness • Lesk’s (1986) idea: Related word senses are (often) defined using the same words. E.g.: • bank(1): “a financial institution” • bank(2): “sloping land beside a body of water” • lake: “a body of water surrounded by land”

  12. Gloss Overlaps ≈ Relatedness • Lesk’s (1986) idea: Related word senses are (often) defined using the same words. E.g.: • bank(1): “a financial institution” • bank(2): “sloping land beside a body of water” • lake: “a body of water surrounded by land” • Gloss overlaps = # content words common to two glosses ≈ relatedness • Thus, relatedness (bank(2), lake) = 3 • And, relatedness (bank(1), lake) = 0
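The content-word count on slide 12 can be sketched in a few lines of Python. This is only an illustration: the stop-word list and the function names `content_words` and `gloss_overlap` are assumptions made for the example, not part of the original slides.

```python
# Hypothetical, deliberately tiny stop-word list (an assumption of this sketch).
STOP_WORDS = {"a", "an", "the", "of", "by", "beside", "to", "in", "on"}

def content_words(gloss):
    """Lower-case a gloss and return its set of non-stop words."""
    return {word for word in gloss.lower().split() if word not in STOP_WORDS}

def gloss_overlap(gloss_a, gloss_b):
    """Count the distinct content words shared by two glosses."""
    return len(content_words(gloss_a) & content_words(gloss_b))

bank_1 = "a financial institution"
bank_2 = "sloping land beside a body of water"
lake = "a body of water surrounded by land"

print(gloss_overlap(bank_2, lake))  # 3 (body, water, land)
print(gloss_overlap(bank_1, lake))  # 0
```

With these three glosses the sketch reproduces the slide's values of 3 and 0.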

  13. Limitations of (Lesk’s) Gloss Overlaps • Most glosses are very short. • So not enough words to find overlaps with. • Solution: Extended gloss overlaps • Add glosses of synsets connected to the input synsets.

  14. Extending a Gloss • sentence: “the penalty meted out to one adjudged guilty” • bench: “persons who hear cases in a court of law” • # overlapped words = 0

  15. Extending a Gloss • final judgment: “a judgment disposing of the case before the court of law” [hypernym of sentence] • sentence: “the penalty meted out to one adjudged guilty” • bench: “persons who hear cases in a court of law” • # overlapped words = 0

  16. Extending a Gloss • final judgment: “a judgment disposing of the case before the court of law” [hypernym of sentence] • sentence: “the penalty meted out to one adjudged guilty” • bench: “persons who hear cases in a court of law” • # overlapped words = 2

  17. Creating the Extended Gloss Overlap Measure • How to measure overlaps? • Which relations to use for gloss extension?

  18. How to Score Overlaps? • Lesk simply summed up overlapped words. • But matches involving phrases (phrasal matches) are rarer and more informative • E.g. “court of law” • Aim: Score of n words in a phrase > sum of scores of n words in shorter phrases • Solution: Give a phrase of n words a score of n² • E.g. “court of law” gets a score of 3² = 9.
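One way to realise the phrasal scoring idea is to repeatedly find the longest run of words shared by two glosses, add the square of its length, and blank that run out before looking again. The Python below is a sketch under assumptions: the names `longest_common_phrase` and `phrasal_score`, the small stop-word list, and the rule of ignoring matches made up only of stop words are choices made for this example and are not taken from the slides.

```python
# Hypothetical stop-word list; matches made only of these words score nothing.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "by"}

def longest_common_phrase(a, b):
    """Return (length, start_in_a, start_in_b) of the longest shared token run."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)  # run lengths ending at the previous token of a
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # None marks an already-consumed span and never matches anything.
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best

def phrasal_score(gloss_a, gloss_b):
    """Sum n**2 over shared phrases, consuming the longest phrase first."""
    a, b = gloss_a.lower().split(), gloss_b.lower().split()
    total = 0
    while True:
        n, i, j = longest_common_phrase(a, b)
        if n == 0:
            return total
        if not all(word in STOP_WORDS for word in a[i:i + n]):
            total += n * n
        # Replace the matched span with one marker so neighbours cannot join up.
        a[i:i + n] = [None]
        b[j:j + n] = [None]

final_judgment = "a judgment disposing of the case before the court of law"
bench = "persons who hear cases in a court of law"
print(phrasal_score(final_judgment, bench))  # 9, from the phrase "court of law"
```

Because the score grows as n², one three-word match (9) outweighs three unrelated single-word matches (1 + 1 + 1 = 3), which is the aim stated on the slide.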

  19. Which Relations to Use? • Hypernyms [ “car” → “vehicle” ] • Hyponyms [ “car” → “convertible” ] • Meronyms [ “car” → “accelerator” ] • Holonyms [ “car” → “train” ] • Also-see relation [ “enter” → “move in” ] • Attribute [ “measure” → “standard” ] • Pertainym [ “centennial” → “century” ]

  20. Extended Gloss Overlap Measure • Input two synsets A and B • Find phrasal gloss overlaps between A and B • Next, find phrasal gloss overlaps between every synset connected to A, and every synset connected to B • Compute phrasal scores for all such overlaps • Add phrasal scores to get relatedness of A and B • A and B can be from different parts of speech.
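The pairing-and-summing step of the measure can be sketched as follows. Everything here is a toy: `GLOSSES` and `RELATED` are a hand-built stand-in for WordNet using only the glosses quoted on the slides, the pairing of every gloss in A's group with every gloss in B's group is an assumption of this sketch, and the default scorer is a plain content-word count standing in for the phrasal score (any function of two glosses can be passed as `score`).

```python
from itertools import product

# Hypothetical stop-word list for the stand-in scorer.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "by"}

# Toy stand-in for WordNet: synset name -> gloss, synset name -> connected synsets.
GLOSSES = {
    "sentence": "the penalty meted out to one adjudged guilty",
    "final judgment": "a judgment disposing of the case before the court of law",
    "bench": "persons who hear cases in a court of law",
}
RELATED = {"sentence": ["final judgment"], "bench": []}

def word_overlap(gloss_a, gloss_b):
    """Stand-in scorer: number of shared content words (not the phrasal score)."""
    words_a = set(gloss_a.lower().split()) - STOP_WORDS
    words_b = set(gloss_b.lower().split()) - STOP_WORDS
    return len(words_a & words_b)

def extended_overlap(a, b, score=word_overlap):
    """Sum overlap scores over the glosses of a, b and their connected synsets."""
    group_a = [a] + RELATED.get(a, [])
    group_b = [b] + RELATED.get(b, [])
    return sum(score(GLOSSES[x], GLOSSES[y]) for x, y in product(group_a, group_b))

print(word_overlap(GLOSSES["sentence"], GLOSSES["bench"]))  # 0 without extension
print(extended_overlap("sentence", "bench"))                # 2 (court, law)
```

The two printed values correspond to the "# overlapped words" counts on slides 14 and 16: nothing is shared until the hypernym gloss is added.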

  21. Evaluation: On WSD • Test semantic relatedness measures on Word Sense Disambiguation (WSD) task. • WSD = determine the intended sense of a multi-sense word in a sentence • E.g.: I sat on the bank of the lake. • Our WSD algorithm: Pick the sense of the target word that is most strongly related to its neighboring words. (based on Lesk ’86)
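The sense-selection rule can be illustrated on the slide's own example, "I sat on the bank of the lake." The sketch below is not the authors' implementation: the sense labels such as `bank#1`, the `SENSES` table, and the choice to sum relatedness over all senses of each neighbor are assumptions, and relatedness is approximated by a simple content-word overlap of the glosses quoted earlier.

```python
# Hypothetical stop-word list for the stand-in relatedness function.
STOP_WORDS = {"a", "an", "the", "of", "by", "beside", "to", "in", "on"}

# Toy sense inventory: word -> {sense label: gloss}. Labels are illustrative.
SENSES = {
    "bank": {
        "bank#1": "a financial institution",
        "bank#2": "sloping land beside a body of water",
    },
    "lake": {"lake#1": "a body of water surrounded by land"},
}

def relatedness(gloss_a, gloss_b):
    """Stand-in relatedness: number of shared content words."""
    words_a = set(gloss_a.lower().split()) - STOP_WORDS
    words_b = set(gloss_b.lower().split()) - STOP_WORDS
    return len(words_a & words_b)

def disambiguate(target, neighbors):
    """Return the sense of target with the highest total relatedness to neighbors."""
    def total(sense):
        gloss = SENSES[target][sense]
        return sum(
            relatedness(gloss, neighbor_gloss)
            for neighbor in neighbors
            for neighbor_gloss in SENSES[neighbor].values()
        )
    return max(SENSES[target], key=total)

print(disambiguate("bank", ["lake"]))  # bank#2
```

Swapping `relatedness` for an extended gloss overlap score would leave `disambiguate` unchanged, since it only needs a function of two glosses.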

  22. Word sense disambiguation using a relatedness measure • the bench pronounced the sentence

  23. the bench pronounced the sentence • bench: “a long seat for more than one person” • bench: “persons who hear cases in a court of law”

  24. the bench pronounced the sentence • bench: “a long seat for more than one person” • bench: “persons who hear cases in a court of law” • pronounce: “speak or utter in a certain way” • pronounce: “pronounce judgment on”

  25. the bench pronounced the sentence • bench: “a long seat for more than one person” • bench: “persons who hear cases in a court of law” • pronounce: “speak or utter in a certain way” • pronounce: “pronounce judgment on” • sentence: “a string of words that satisfies grammar rules” • sentence: “the penalty meted out to one adjudged guilty”

  35. Evaluation Data • Data from SENSEVAL-2 WSD exercise. • 4,328 passages, each 2-3 sentences long and containing 1 multi-sense target word. • Each target word labeled by humans with its most appropriate WordNet sense. • WSD algorithm’s output senses compared against these human labels. • Precision, recall, and f-measure reported.
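For reference, the three reported metrics can be computed as below, assuming the usual definitions for this kind of exercise: precision is correct answers over attempted answers, recall is correct answers over all instances, and f-measure is their harmonic mean. The instance ids and sense labels are made up for illustration and are not SENSEVAL-2 data.

```python
def evaluate(predicted, gold):
    """Return (precision, recall, f_measure) for sense predictions.

    predicted maps instance id -> sense label, or None when no answer is given.
    gold maps instance id -> the human-assigned sense label.
    """
    attempted = [i for i in gold if predicted.get(i) is not None]
    correct = sum(predicted[i] == gold[i] for i in attempted)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical toy data: four instances, three attempted, two correct.
gold = {"d1": "bank#2", "d2": "bench#2", "d3": "sentence#2", "d4": "bank#1"}
predicted = {"d1": "bank#2", "d2": "bench#1", "d3": "sentence#2", "d4": None}

precision, recall, f_measure = evaluate(predicted, gold)
print(f"P={precision:.3f} R={recall:.3f} F={f_measure:.3f}")
```

Leaving an instance unanswered lowers recall but not precision, which is why the two numbers can differ for the same system.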

  36. Evaluation Results

  37. Which WN Relations Help? • Evaluation with a single relation at a time • E.g., comparing only hypernyms, only hyponyms, etc. • Result: No single comparison is a big source of information. • No pair exceeded f-measure of 0.136, as compared to overall f-measure of 0.346

  38. Which WN Relations Help? • Most helpful were: • Hyponym relation • kinds of “car” → “compact”, “SUV”, “coupe”, etc. • Meronym relation • parts of “car” → “accelerator”, “wheel”, “hood”, etc. • These relations are usually one-many. • Thus they give access to many glosses. • Implies: more glosses → more useful.

  39. Conclusions • We presented a new measure of semantic relatedness • Can operate across parts of speech. • We evaluated on the task of WSD. • Performed much better than the Lesk baseline • Performance comparable to other systems. • Future work: • Augment using corpus statistics. • Evaluate on different task.

  40. Resources • WordNet::Similarity (relatedness measures) (http://search.cpan.org/dist/WordNet-Similarity) • Extended gloss overlaps • Resnik, Lin, Jiang-Conrath • Leacock-Chodorow, Hirst-St. Onge • Edge Counting, Random • SenseRelate (WSD using relatedness) (http://www.d.umn.edu/~tpederse/senserelate.html)