Extended Gloss Overlaps as a Measure of Semantic Relatedness. Satanjeev Banerjee (Carnegie Mellon University), Ted Pedersen (University of Minnesota Duluth). Supported by NSF Grants: #0092784, REC-9979894.
Semantic Relatedness • Some pairs of words are closer in meaning than others • E.g. car – tire are strongly related; car – tree are not strongly related • Relatedness between words can consist of • Synonymy [e.g. car – automobile] • Is-a/has-a relationships [e.g. car – tire] • Co-occurrence [e.g. car – insurance]
Goal of this Paper • Create a measure to quantify semantic relatedness • Most existing work measures noun-noun only. • Resnik (1995), Lin (1997), Jiang-Conrath (1997), Leacock-Chodorow (1998) • We can measure across parts of speech. • Based on WordNet definitions and relations. • Evaluate • Using word sense disambiguation. • Compare to human relatedness judgments (in paper)
Description of WordNet • Online English lexical database. • Like dictionaries, contains word senses and their definitions or glosses • E.g.: sentence: “the penalty meted out to one adjudged guilty” • Word senses that mean the same are grouped into synonym sets or synsets • E.g.: {sentence, conviction, condemnation}
Semantic Relations in WordNet • Synsets are connected to other synsets through “semantic relations” • sentence: “the penalty meted out to one adjudged guilty” • a “sentence” is a … [hypernym] final judgment: “a judgment disposing of the case before the court of law” • … is a “sentence” [hyponym] hard time: “term served in a maximum security prison” • … is a “sentence” [hyponym] death penalty: “punishment by death via execution”
Gloss Overlaps ≈ Relatedness • Lesk’s (1986) idea: Related word senses are (often) defined using the same words. E.g.: • bank(1): “a financial institution” • bank(2): “sloping land beside a body of water” • lake: “a body of water surrounded by land” • Gloss overlaps = # content words common to two glosses ≈ relatedness • Thus, relatedness (bank(2), lake) = 3 • And, relatedness (bank(1), lake) = 0
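The count on this slide can be reproduced with a few lines of Python. This is only a minimal sketch: the stop list, the whitespace tokenizer, and the function names are assumptions made for illustration, not something the slides specify.

```python
# Hypothetical stop list for this illustration; the slides do not define one.
STOP_WORDS = {"a", "an", "the", "of", "by", "beside", "to", "in", "on"}


def content_words(gloss):
    """Lowercase the gloss, split on whitespace, and drop stop words."""
    return {w for w in gloss.lower().split() if w not in STOP_WORDS}


def gloss_overlap(gloss_a, gloss_b):
    """Count the content words shared by two glosses (basic Lesk-style overlap)."""
    return len(content_words(gloss_a) & content_words(gloss_b))


# Glosses copied from the slide.
GLOSSES = {
    "bank(1)": "a financial institution",
    "bank(2)": "sloping land beside a body of water",
    "lake": "a body of water surrounded by land",
}

print(gloss_overlap(GLOSSES["bank(2)"], GLOSSES["lake"]))  # 3 (land, body, water)
print(gloss_overlap(GLOSSES["bank(1)"], GLOSSES["lake"]))  # 0
```

With a different stop list or tokenizer the counts could change, which is one reason real implementations fix these choices explicitly.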
Limitations of (Lesk’s) Gloss Overlaps • Most glosses are very short. • So not enough words to find overlaps with. • Solution: Extended gloss overlaps • Add glosses of synsets connected to the input synsets.
Extending a Gloss • sentence: “the penalty meted out to one adjudged guilty” • bench: “persons who hear cases in a court of law” • # overlapped words = 0
Extending a Gloss • final judgment [hypernym of sentence]: “a judgment disposing of the case before the court of law” • sentence: “the penalty meted out to one adjudged guilty” • bench: “persons who hear cases in a court of law” • # overlapped words between the extended gloss of sentence and bench = 2 (court, law)
Creating the Extended Gloss Overlap Measure • How to measure overlaps? • Which relations to use for gloss extension?
How to Score Overlaps? • Lesk simply summed up overlapped words. • But matches involving phrases – phrasal matches – are rarer, and more informative • E.g. “court of law” • Aim: Score of n words in a phrase > sum of scores of n words in shorter phrases • Solution: Give a phrase of n words a score of n² • “court of law” (n = 3) gets score of 9.
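One way to turn the phrasal scoring idea into code is sketched below: repeatedly find the longest run of consecutive words shared by the two glosses, add the square of its length, blank it out, and continue. The stop list, the rule that a match must contain at least one non-stop word, and all function names are assumptions of this sketch rather than details given on the slides.

```python
# Hypothetical stop list; a match made only of these words is ignored.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "by", "before"}


def longest_common_phrase(a, b):
    """Return (length, start_in_a, start_in_b) of the longest shared run of
    consecutive words that contains at least one non-stop word."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)  # prev[j]: shared run length ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # None marks words already consumed by an earlier match.
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                run = a[i - cur[j]:i]
                has_content = any(w not in STOP_WORDS for w in run)
                if has_content and cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def phrasal_overlap_score(gloss_a, gloss_b):
    """Sum n**2 over shared phrases, longest first, using each word at most once."""
    a = gloss_a.lower().split()
    b = gloss_b.lower().split()
    score = 0
    while True:
        n, i, j = longest_common_phrase(a, b)
        if n == 0:
            return score
        score += n * n
        a[i:i + n] = [None] * n  # blank out the matched words in both glosses
        b[j:j + n] = [None] * n


final_judgment = "a judgment disposing of the case before the court of law"
bench = "persons who hear cases in a court of law"
print(phrasal_overlap_score(final_judgment, bench))  # 9, from "court of law"
```

Squaring the length is what makes one three-word match (9) outweigh three separate single-word matches (1 + 1 + 1 = 3).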
Which Relations to Use? • Hypernyms [ “car” → “vehicle” ] • Hyponyms [ “car” → “convertible” ] • Meronyms [ “car” → “accelerator” ] • Holonyms [ “car” → “train” ] • Also-see relation [ “enter” → “move in” ] • Attribute [ “measure” → “standard” ] • Pertainym [ “centennial” → “century” ]
Extended Gloss Overlap Measure • Input two synsets A and B • Find phrasal gloss overlaps between A and B • Next, find phrasal gloss overlaps between every synset connected to A, and every synset connected to B • Compute phrasal scores for all such overlaps • Add phrasal scores to get relatedness of A and B • A and B can be from different parts of speech.
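The steps on this slide can be sketched on a toy graph built from the glosses shown earlier. Everything structural here is an assumption for illustration: the `SYNSETS` dictionary, the choice to compare each synset's own gloss together with its neighbors' glosses (all pairs), the stop list, and the function names. It is not the authors' implementation.

```python
from itertools import product

STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "by", "before"}  # hypothetical

# Toy graph: gloss text plus directly connected synsets (relations not distinguished).
SYNSETS = {
    "sentence": {"gloss": "the penalty meted out to one adjudged guilty",
                 "related": ["final judgment", "hard time", "death penalty"]},
    "final judgment": {"gloss": "a judgment disposing of the case before the court of law",
                       "related": []},
    "hard time": {"gloss": "term served in a maximum security prison", "related": []},
    "death penalty": {"gloss": "punishment by death via execution", "related": []},
    "bench": {"gloss": "persons who hear cases in a court of law", "related": []},
}


def longest_common_phrase(a, b):
    """Longest shared run of consecutive words with at least one non-stop word."""
    best, prev = (0, 0, 0), [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                run = a[i - cur[j]:i]
                if cur[j] > best[0] and any(w not in STOP_WORDS for w in run):
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def phrasal_score(gloss_a, gloss_b):
    """Sum n**2 over shared phrases, longest first, using each word once."""
    a, b, score = gloss_a.lower().split(), gloss_b.lower().split(), 0
    while True:
        n, i, j = longest_common_phrase(a, b)
        if n == 0:
            return score
        score += n * n
        a[i:i + n], b[j:j + n] = [None] * n, [None] * n


def extended_glosses(name):
    """Gloss of the synset itself plus glosses of its directly connected synsets."""
    node = SYNSETS[name]
    return [node["gloss"]] + [SYNSETS[r]["gloss"] for r in node["related"]]


def relatedness(a, b):
    """Add phrasal scores over every pairing of the two extended gloss sets."""
    return sum(phrasal_score(ga, gb)
               for ga, gb in product(extended_glosses(a), extended_glosses(b)))


print(phrasal_score(SYNSETS["sentence"]["gloss"], SYNSETS["bench"]["gloss"]))  # 0
print(relatedness("sentence", "bench"))  # 9, via the hypernym "final judgment"
```

Because the score depends only on gloss text, the two input synsets need not share a part of speech.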
Evaluation: On WSD • Test semantic relatedness measures on Word Sense Disambiguation (WSD) task. • WSD = determine the intended sense of a multi-sense word in a sentence • E.g.: I sat on the bank of the lake. • Our WSD algorithm: Pick the sense of the target word that is most strongly related to its neighboring words. (based on Lesk ’86)
Word sense disambiguation using a relatedness measure the bench pronounced the sentence
Candidate senses in “the bench pronounced the sentence” • bench: “a long seat for more than one person” or “persons who hear cases in a court of law” • pronounce: “speak or utter in a certain way” or “pronounce judgment on” • sentence: “a string of words that satisfies grammar rules” or “the penalty meted out to one adjudged guilty”
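The sense-picking rule for “the bench pronounced the sentence” can be sketched as follows. Several things are assumptions of this sketch: the sense labels, the stop list, scoring a candidate by summing its relatedness to every sense of every neighboring word, and a simplified relatedness function that just counts shared content words. The penalty sense of “sentence” is given its hypernym gloss from the earlier slide as a stand-in for gloss extension.

```python
# Hypothetical stop list for this illustration.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "or", "for", "than",
              "more", "out", "one", "who", "that", "before"}

# Candidate senses; labels such as "bench#court" are made up for readability.
SENSES = {
    "bench": {
        "bench#seat": "a long seat for more than one person",
        "bench#court": "persons who hear cases in a court of law",
    },
    "pronounce": {
        "pronounce#speak": "speak or utter in a certain way",
        "pronounce#judge": "pronounce judgment on",
    },
    "sentence": {
        "sentence#grammar": "a string of words that satisfies grammar rules",
        # Own gloss followed by the gloss of its hypernym "final judgment".
        "sentence#penalty": "the penalty meted out to one adjudged guilty "
                            "a judgment disposing of the case before the court of law",
    },
}


def relatedness(gloss_a, gloss_b):
    """Simplified stand-in measure: number of shared content words."""
    words_a = {w for w in gloss_a.lower().split() if w not in STOP_WORDS}
    words_b = {w for w in gloss_b.lower().split() if w not in STOP_WORDS}
    return len(words_a & words_b)


def disambiguate(target, context):
    """Pick the sense of `target` with the highest total relatedness to all
    senses of the other words in `context`."""
    best_sense, best_score = None, -1
    for sense, gloss in SENSES[target].items():
        score = sum(relatedness(gloss, other_gloss)
                    for word in context if word != target
                    for other_gloss in SENSES[word].values())
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


context = ["bench", "pronounce", "sentence"]
for word in context:
    print(word, "->", disambiguate(word, context))
```

On this toy data the court-related senses win for all three words; ties are broken by dictionary order, which a real system would need to handle more carefully.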
Evaluation Data • Data from SENSEVAL-2 WSD exercise. • 4,328 passages, each 2-3 sentences long and containing 1 multi-sense target word. • Each target word labeled by humans with its most appropriate WordNet sense. • WSD algorithm’s output senses compared against these human labels. • Precision, recall, and f-measure reported.
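For reference, the three reported scores can be computed as below. This sketch assumes the usual SENSEVAL-style definitions (precision over attempted instances, recall over all instances); the function name and the tiny label lists are made up for illustration and are not results from the evaluation.

```python
def score_wsd(gold, predicted):
    """Return (precision, recall, f_measure).

    gold: human-assigned sense label for each instance.
    predicted: system label for each instance, or None if no answer was given.
    """
    attempted = sum(1 for p in predicted if p is not None)
    correct = sum(1 for g, p in zip(gold, predicted) if p is not None and p == g)
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up toy labels: four instances, one left unanswered, one answered wrongly.
gold = ["s1", "s2", "s3", "s4"]
predicted = ["s1", None, "s3", "s9"]
precision, recall, f_measure = score_wsd(gold, predicted)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))  # 0.667 0.5 0.571
```

Precision and recall differ only when the system declines to answer some instances; if every instance is attempted, all three scores coincide.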
Which WN Relations Help? • Evaluation with a single relation at a time • E.g., comparing only hypernyms, only hyponyms, etc. • Result: No single comparison is a big source of information. • No pair exceeded f-measure of 0.136, as compared to overall f-measure of 0.346
Which WN Relations Help? • Most helpful were: • Hyponym relation • kinds of “car” → “compact”, “SUV”, “coupe”, etc. • Meronym relation • parts of “car” → “accelerator”, “wheel”, “hood”, etc. • These relations are usually one-to-many. • Thus they give access to many glosses. • Implies: more glosses → more useful.
Conclusions • We presented a new measure of semantic relatedness • Can operate across parts of speech. • We evaluated on the task of WSD. • Performed much better than the Lesk baseline • Performance comparable to other systems. • Future work: • Augment using corpus statistics. • Evaluate on different task.
Resources • WordNet::Similarity (relatedness measures) (http://search.cpan.org/dist/WordNet-Similarity), includes: • Extended gloss overlaps • Resnik, Lin, Jiang-Conrath • Leacock-Chodorow, Hirst-St. Onge • Edge Counting, Random • SenseRelate (WSD using relatedness) (http://www.d.umn.edu/~tpederse/senserelate.html)