Extended Gloss Overlaps as a Measure of Semantic Relatedness

Satanjeev Banerjee (Carnegie Mellon University) and Ted Pedersen (University of Minnesota Duluth). Supported by NSF Grants #0092784 and REC-9979894.

Presentation Transcript


  1. Extended Gloss Overlaps as a Measure of Semantic Relatedness • Satanjeev Banerjee (Carnegie Mellon University) • Ted Pedersen (University of Minnesota Duluth) • Supported by NSF Grants #0092784 and REC-9979894

  2. Semantic Relatedness • Some pairs of words are closer in meaning than others • E.g., car – tire are strongly related; car – tree are not strongly related • Relatedness between words can consist of: • Synonymy [e.g. car – automobile] • Is-a/has-a relationships [e.g. car – tire] • Co-occurrence [e.g. car – insurance]

  3. Goal of this Paper • Create a measure to quantify semantic relatedness • Most existing work measures noun-noun only. • Resnik (1995), Lin (1997), Jiang-Conrath (1997), Leacock-Chodorow (1998) • We can measure across parts of speech. • Based on WordNet definitions and relations. • Evaluate • Using word sense disambiguation. • Compare to human relatedness judgments (in paper)

  4. Description of WordNet • Online English lexical database. • Like dictionaries, contains word senses and their definitions or glosses • E.g.: sentence: “the penalty meted out to one adjudged guilty” • Word senses that mean the same are grouped into synonym sets or synsets • E.g.: {sentence, conviction, condemnation}

  5. Semantic Relations in WordNet Synsets are connected to other synsets through “semantic relations” sentence: “the penalty meted out to one adjudged guilty”

  6. Semantic Relations in WordNet Synsets are connected to other synsets through “semantic relations” final judgment: “a judgment disposing of the case before the court of law” a “sentence” is a … sentence: “the penalty meted out to one adjudged guilty”

  7. Semantic Relations in WordNet Synsets are connected to other synsets through “semantic relations” final judgment: “a judgment disposing of the case before the court of law” a “sentence” is a … [hypernym] sentence: “the penalty meted out to one adjudged guilty”

  8. Semantic Relations in WordNet Synsets are connected to other synsets through “semantic relations” final judgment: “a judgment disposing of the case before the court of law” a “sentence” is a … [hypernym] sentence: “the penalty meted out to one adjudged guilty” … is a “sentence” … is a “sentence” hard time: “term served in a maximum security prison” death penalty: “punishment by death via execution”

  9. Semantic Relations in WordNet Synsets are connected to other synsets through “semantic relations” final judgment: “a judgment disposing of the case before the court of law” a “sentence” is a … [hypernym] sentence: “the penalty meted out to one adjudged guilty” … is a “sentence” [hyponym] … is a “sentence” [hyponym] hard time: “term served in a maximum security prison” death penalty: “punishment by death via execution”

  10. Gloss Overlaps ≈ Relatedness • Lesk’s (1986) idea: Related word senses are (often) defined using the same words. E.g.: • bank(1): “a financial institution” • bank(2): “sloping land beside a body of water” • lake: “a body of water surrounded by land”

  11. Gloss Overlaps ≈ Relatedness • Lesk’s (1986) idea: Related word senses are (often) defined using the same words. E.g.: • bank(1): “a financial institution” • bank(2): “sloping land beside a body of water” • lake: “a body of water surrounded by land”

  12. Gloss Overlaps ≈ Relatedness • Lesk’s (1986) idea: Related word senses are (often) defined using the same words. E.g.: • bank(1): “a financial institution” • bank(2): “sloping land beside a body of water” • lake: “a body of water surrounded by land” • Gloss overlaps = # content words common to two glosses ≈ relatedness • Thus, relatedness (bank(2), lake) = 3 • And, relatedness (bank(1), lake) = 0
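
A minimal Python sketch of the content-word overlap count on slide 12. The stop-word list and the function names are assumptions made for illustration; they are not taken from the presentation.

```python
# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "by", "in", "on", "to", "beside"}


def content_words(gloss):
    """Return the set of non-stop-words in a gloss."""
    return {w for w in gloss.lower().split() if w not in STOP_WORDS}


def gloss_overlap(gloss_a, gloss_b):
    """Count the content words that two glosses have in common."""
    return len(content_words(gloss_a) & content_words(gloss_b))


bank_1 = "a financial institution"
bank_2 = "sloping land beside a body of water"
lake = "a body of water surrounded by land"

print(gloss_overlap(bank_2, lake))  # 3 (land, body, water)
print(gloss_overlap(bank_1, lake))  # 0
```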

  13. Limitations of (Lesk’s) Gloss Overlaps • Most glosses are very short. • So not enough words to find overlaps with. • Solution: Extended gloss overlaps • Add glosses of synsets connected to the input synsets.

  14. Extending a Gloss sentence: “the penalty meted out to one adjudged guilty” bench: “persons who hear cases in a court of law” # overlapped words = 0

  15. Extending a Gloss final judgment: “a judgment disposing of the case before the court of law” [hypernym] sentence: “the penalty meted out to one adjudged guilty” bench: “persons who hear cases in a court of law” # overlapped words = 0

  16. Extending a Gloss final judgment: “a judgment disposing of the case before the court of law” [hypernym] sentence: “the penalty meted out to one adjudged guilty” bench: “persons who hear cases in a court of law” # overlapped words = 2

  17. Creating the Extended Gloss Overlap Measure • How to measure overlaps? • Which relations to use for gloss extension?

  18. How to Score Overlaps? • Lesk simply summed up overlapped words. • But matches involving phrases – phrasal matches – are rarer, and more informative • E.g. “court of law” • Aim: Score of n words in a phrase > sum of scores of n words in shorter phrases • Solution: Give a phrase of n words a score of n² • E.g. “court of law” (3 words) gets a score of 9.
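
The phrasal scoring idea can be sketched in Python as follows. This assumes the score of an n-word phrase is n², which is consistent with “court of law” scoring 9. The greedy longest-match-first loop, the trimming of function words at phrase edges, and all names below are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical function-word list used to trim phrase edges.
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "to", "by", "and", "or"}


def longest_common_phrase(a, b):
    """Return (length, start_a, start_b) of the longest shared run of words.

    Words already consumed are stored as None and never match.
    """
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def phrasal_overlap_score(gloss_a, gloss_b):
    """Sum n**2 over shared phrases, taking the longest phrase first."""
    a = gloss_a.lower().split()
    b = gloss_b.lower().split()
    score = 0
    while True:
        length, start_a, start_b = longest_common_phrase(a, b)
        if length == 0:
            return score
        phrase = a[start_a:start_a + length]
        # Trim function words from both ends; a phrase made up only of
        # function words therefore contributes nothing.
        while phrase and phrase[0] in FUNCTION_WORDS:
            phrase.pop(0)
        while phrase and phrase[-1] in FUNCTION_WORDS:
            phrase.pop()
        score += len(phrase) ** 2
        # Mark the matched words as consumed in both glosses.
        a[start_a:start_a + length] = [None] * length
        b[start_b:start_b + length] = [None] * length


final_judgment = "a judgment disposing of the case before the court of law"
bench = "persons who hear cases in a court of law"
print(phrasal_overlap_score(final_judgment, bench))  # 9
```

Scoring the three words separately would give 1 + 1 + 1 = 3, so squaring rewards the rarer phrasal match.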

  19. Which Relations to Use? • Hypernyms [ “car” → “vehicle” ] • Hyponyms [ “car” → “convertible” ] • Meronyms [ “car” → “accelerator” ] • Holonyms [ “car” → “train” ] • Also-see relation [ “enter” → “move in” ] • Attribute [ “measure” → “standard” ] • Pertainym [ “centennial” → “century” ]

  20. Extended Gloss Overlap Measure • Input two synsets A and B • Find phrasal gloss overlaps between A and B • Next, find phrasal gloss overlaps between every synset connected to A, and every synset connected to B • Compute phrasal scores for all such overlaps • Add phrasal scores to get relatedness of A and B • A and B can be from different parts of speech.
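
The steps on slide 20 can be sketched in Python over a toy lexicon. The dictionaries below stand in for WordNet and hold only the glosses quoted in the slides. For brevity the default scorer counts shared content words instead of computing phrasal scores, and the sketch compares every gloss in A's extended set with every gloss in B's. All names are hypothetical.

```python
from itertools import product

# Toy stand-in for WordNet: one gloss per synset, plus connected synsets.
GLOSSES = {
    "sentence": "the penalty meted out to one adjudged guilty",
    "final judgment": "a judgment disposing of the case before the court of law",
    "bench": "persons who hear cases in a court of law",
}
CONNECTED = {
    "sentence": ["final judgment"],  # hypernym
    "final judgment": [],
    "bench": [],
}

STOP_WORDS = {"a", "the", "of", "in", "to"}  # hypothetical list


def word_overlap(gloss_a, gloss_b):
    """Simplified scorer: number of shared content words."""
    words_a = set(gloss_a.split()) - STOP_WORDS
    words_b = set(gloss_b.split()) - STOP_WORDS
    return len(words_a & words_b)


def extended_relatedness(a, b, score=word_overlap):
    """Add up overlap scores between the extended glosses of a and b."""
    glosses_a = [GLOSSES[s] for s in [a] + CONNECTED[a]]
    glosses_b = [GLOSSES[s] for s in [b] + CONNECTED[b]]
    return sum(score(ga, gb) for ga, gb in product(glosses_a, glosses_b))


print(word_overlap(GLOSSES["sentence"], GLOSSES["bench"]))  # 0 without extension
print(extended_relatedness("sentence", "bench"))            # 2 with the hypernym gloss
```

Passing a phrasal scorer as the `score` argument would turn the word count into the phrase-weighted sum the slide describes.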

  21. Evaluation: On WSD • Test semantic relatedness measures on the Word Sense Disambiguation (WSD) task. • WSD = determine the intended sense of a multi-sense word in a sentence • E.g.: I sat on the bank of the lake. • Our WSD algorithm: Pick the sense of the target word that is most strongly related to its neighboring words (based on Lesk ’86).
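
A rough Python sketch of this selection rule, applied to the bank/lake example. The relatedness function is pluggable; the simple content-word overlap used here, the stop-word list, and the flat list of context glosses are simplifying assumptions, not the authors' code.

```python
STOP_WORDS = {"a", "the", "of", "by", "beside"}  # hypothetical list


def overlap(gloss_a, gloss_b):
    """Stand-in relatedness measure: number of shared content words."""
    words_a = set(gloss_a.split()) - STOP_WORDS
    words_b = set(gloss_b.split()) - STOP_WORDS
    return len(words_a & words_b)


def disambiguate(target_senses, context_glosses, relatedness=overlap):
    """Pick the target sense with the highest total relatedness to the context.

    target_senses maps a sense label to its gloss; context_glosses lists the
    glosses of all senses of the neighboring words.
    """
    def total(sense):
        return sum(relatedness(target_senses[sense], g) for g in context_glosses)

    return max(target_senses, key=total)


bank_senses = {
    "bank(1)": "a financial institution",
    "bank(2)": "sloping land beside a body of water",
}
context = ["a body of water surrounded by land"]  # gloss of "lake"

print(disambiguate(bank_senses, context))  # bank(2)
```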

  22. Word sense disambiguation using a relatedness measure the bench pronounced the sentence

  23. bench: “a long seat for more than one person” the bench pronounced the sentence bench: “persons who hear cases in a court of law”

  24. pronounce: “speak or utter in a certain way” bench: “a long seat for more than one person” the bench pronounced the sentence bench: “persons who hear cases in a court of law” pronounce: “pronounce judgment on”

  25. pronounce: “speak or utter in a certain way” bench: “a long seat for more than one person” sentence: “a string of words that satisfies grammar rules” the bench pronounced the sentence bench: “persons who hear cases in a court of law” sentence: “the penalty meted out to one adjudged guilty” pronounce: “pronounce judgment on”

  35. Evaluation Data • Data from SENSEVAL-2 WSD exercise. • 4,328 passages, each 2-3 sentences long and containing 1 multi-sense target word. • Each target word labeled by humans with its most appropriate WordNet sense. • WSD algorithm’s output senses compared against these human labels. • Precision, recall, and f-measure reported.
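
For reference, the three reported metrics can be computed as sketched below. This assumes the usual WSD convention that precision is measured over attempted instances and recall over all instances; the function name and the toy labels are hypothetical.

```python
def wsd_scores(predictions, gold):
    """Return (precision, recall, f_measure) for sense predictions.

    predictions maps an instance id to a sense label, or to None when the
    system gives no answer; gold maps every instance id to the human label.
    """
    attempted = [i for i, sense in predictions.items() if sense is not None]
    correct = sum(1 for i in attempted if predictions[i] == gold[i])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy data: four instances, three attempted, two correct.
gold = {1: "bank(2)", 2: "bench(2)", 3: "sentence(2)", 4: "bank(1)"}
predictions = {1: "bank(2)", 2: "bench(1)", 3: "sentence(2)", 4: None}

precision, recall, f_measure = wsd_scores(predictions, gold)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))  # 0.667 0.5 0.571
```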

  36. Evaluation Results

  37. Which WN Relations Help? • Evaluation with a single relation at a time • E.g., comparing only hypernyms, only hyponyms, etc. • Result: No single comparison is a big source of information. • No pair exceeded f-measure of 0.136, as compared to overall f-measure of 0.346

  38. Which WN Relations Help? • Most helpful were: • Hyponym relation • kinds of “car” → “compact”, “SUV”, “coupe”, etc. • Meronym relation • parts of “car” → “accelerator”, “wheel”, “hood”, etc. • These relations are usually one-many. • Thus they give access to many glosses. • Implies: more glosses → more useful.

  39. Conclusions • We presented a new measure of semantic relatedness • Can operate across parts of speech. • We evaluated on the task of WSD. • Performed much better than the Lesk baseline • Performance comparable to other systems. • Future work: • Augment using corpus statistics. • Evaluate on different task.

  40. Resources • WordNet::Similarity (relatedness measures) (http://search.cpan.org/dist/WordNet-Similarity) • Extended gloss overlaps • Resnik, Lin, Jiang-Conrath • Leacock-Chodorow, Hirst-St. Onge • Edge Counting, Random • SenseRelate (WSD using relatedness) (http://www.d.umn.edu/~tpederse/senserelate.html)
