Survey on WSD and IR

Survey on WSD and IR Apex@SJTU

WSD: Introduction • Problems in online news retrieval system: query: “major” Articles retrieved: • about “Prime Minister John Major MP” • “major” appears as an adjective • “major” appears as a military rank

WSD: Introduction • Gale, Church and Yarowsky (1992) cite work dating back to 1950. • For many years, WSD was applied only to limited domains and a small vocabulary. • In recent years, disambiguators are applied to resolve the senses of words in a large heterogeneous corpus. • With a more accurate representation and a query also marked up with word sense, researchers believe that the accuracy of retrieval would have to improve.

Approaches to disambiguation • Disambiguation based on manually generated rules • Disambiguation using evidence from existing corpora.

Disambiguation based on manually generated rules • Weiss (1973): • general context rule: If the word “type” appears near to “print”, it most likely meant a small block of metal bearing a raised character on one end. • template rule: If “of” appears immediately after “type”, it most likely meant a subdivision of a particular kind of thing.

Weiss (1973): • Template rules performed better, so they were applied first. • To create rules: • Examine 20 occurrences of an ambiguous word. • Test these manually created rules on a further 30 occurrences. • Accuracy: 90% • Cause of errors: idiomatic uses.
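The two rule types and their ordering can be sketched in a few lines of Python. This is only an illustration of the idea on the slides, not Weiss's actual system: the sense labels, the five-word window, and the function name `disambiguate_type` are assumptions made for this example.

```python
# Hypothetical sense labels for the two senses of "type" described above.
PRINTING = "printing block"
KIND = "kind of thing"


def disambiguate_type(tokens, index, window=5):
    """Return a sense for tokens[index] == "type", or None if no rule fires."""
    # Template rule, tried first: "of" immediately after "type".
    if index + 1 < len(tokens) and tokens[index + 1] == "of":
        return KIND
    # General context rule: "print" anywhere within the window (size assumed).
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    if "print" in tokens[lo:index] + tokens[index + 1:hi]:
        return PRINTING
    return None


print(disambiguate_type("this type of print is rare".split(), 1))
print(disambiguate_type("print the type by hand".split(), 2))
```

In the first sentence both rules could fire, and the template rule wins because it is checked first.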

Disambiguation based on manually generated rules • Kelly and Stone (1975): • created a set of rules for 6,000 words • consisted of contextual rules similar to those of Weiss • in addition, used grammatical category of a word as a strong indicator of sense: • “the train” and “to train”

Kelly and Stone (1975): • The grammar and context rules were grouped into sets so that only certain rules were applied in certain situations. • Conditional statements controlled the application of rule sets. • Unlike Weiss’s system, this disambiguator was designed to process a whole sentence at the same time. • Accuracy: not a success

Disambiguation based on manually generated rules • Small and Rieger (1982) came to similar conclusions. • When this type of disambiguator was extended to work on larger vocabulary, the effort involved in building it became too great. • Since 1980s, WSD research has concentrated on automatically generated rules based on sense evidence derived from a machine readable corpus.

Disambiguation using evidence from existing corpora • Lesk (1988): • Resolve the sense of “ash” in: There was ash from the coal fire. • Dictionary definitions looked up: • ash(1): The soft grey powder that remains after something has been burnt. • ash(2): A forest tree common in Britain. • Definitions of context words looked up: • coal(1): A black mineral which is dug from the earth, which can be burnt to give heat. • fire(1): The condition of burning; flames, light and great heat. • fire(2): The act of firing weapons or artillery at an enemy.

Lesk (1988): • Sense definitions are ranked by a scoring function based on the number of words they share with the definitions of the context words. • Questionable: how often the word overlap needed for disambiguation actually occurs. • Accuracy: “very brief experimentation”, 50%--70% • No analysis of the failures, although definition length is recognized as a possible factor in deciding which dictionary to use.
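A minimal sketch of definition-overlap scoring on the “ash” example follows. It is a simplified reading of the approach, not Lesk's implementation: the stopword list, the tokenisation, and the names `content_words` and `rank_senses` are assumptions, and no stemming is applied.

```python
# Small hand-picked stopword list (an assumption for this example).
STOPWORDS = {"a", "an", "the", "of", "that", "which", "is", "in", "at", "to",
             "or", "and", "after", "has", "been", "can", "be", "from",
             "something"}

SENSES = {
    "ash(1)": "The soft grey powder that remains after something has been burnt.",
    "ash(2)": "A forest tree common in Britain.",
}
CONTEXT_DEFINITIONS = [
    "A black mineral which is dug from the earth, which can be burnt to give heat.",
    "The condition of burning; flames, light and great heat.",
    "The act of firing weapons or artillery at an enemy.",
]


def content_words(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def rank_senses(senses, context_definitions):
    """Rank senses by how many content words their definition shares
    with the definitions of the context words."""
    context = set()
    for definition in context_definitions:
        context |= content_words(definition)
    scores = {name: len(content_words(gloss) & context)
              for name, gloss in senses.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(rank_senses(SENSES, CONTEXT_DEFINITIONS))
# [('ash(1)', 1), ('ash(2)', 0)]
```

The only overlapping word is “burnt”; “burning” does not match without stemming, which shows how thin the overlap evidence can be.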

Disambiguation using evidence from existing corpora • Wilks et al. (1990): • addressed this word overlap problem by using a technique of expanding a dictionary definition with words that commonly co-occurred with the text of that definition. • Co-occurrence information was derived from all definition texts in the dictionary.

Wilks et al. (1990): • Longman’s Dictionary of Contemporary English (LDOCE): all its definitions were written using a simplified vocabulary of around 2,000 words. • Few synonyms, a distracting element in the co-occurrence calculation. • “bank”: • for economic sense: “money”, “check”, “rob” • for geographical sense: “river”, “flood”, “bridge” • Accuracy: “bank” in 200 sentences, judged correct if it coincides with one manually chosen, 53% at fine-grained level (13 senses) and 85% at coarse-grained level (5 senses). • They suggested using simulated annealing to disambiguate a whole sentence simultaneously.

Disambiguating simultaneously • Cowie et al. (1992): • Accuracy: tested on 67 sentences, 47% for fine-grained senses and 72% for coarse-grained ones. • No comparison with Wilks et al.’s results. • No baseline. • A possible baseline: senses chosen at random. • A better one: select the most common sense.
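The two baselines mentioned above can be computed as follows. This is a sketch on invented toy data: the sense labels, the counts, and the function names are assumptions for illustration and are not taken from any of the cited experiments.

```python
import random
from collections import Counter


def most_common_sense_baseline(train_tags, test_tags):
    """Accuracy when every test occurrence gets the most frequent training sense."""
    top_sense, _ = Counter(train_tags).most_common(1)[0]
    return sum(tag == top_sense for tag in test_tags) / len(test_tags)


def random_baseline(sense_inventory, test_tags, seed=0):
    """Accuracy when each test occurrence gets a uniformly random sense."""
    rng = random.Random(seed)
    hits = sum(rng.choice(sense_inventory) == tag for tag in test_tags)
    return hits / len(test_tags)


# Invented sense tags for occurrences of "bank" (toy data only).
TRAIN = ["money"] * 8 + ["river"] * 2
TEST = ["money"] * 7 + ["river"] * 3

print(most_common_sense_baseline(TRAIN, TEST))  # 0.7
print(random_baseline(["money", "river"], TEST))
```

With a skewed sense distribution the most-common-sense baseline is much harder to beat than random choice, whose expected accuracy is one over the number of senses.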

Manually tagging a corpus • A technique in POS tagging: • manually mark up a large text corpus with POS tag, and then train a statistical classifier to associate features with occurrences of the tags. • Ng and Lee (1996): • disambiguate 192,000 occurrences of 191 words. • examine the following features: • POS and morphological form of the sense tagged word • unordered set of its surrounding words • local collocations relative to it • and if the sense tagged word was a noun, the presence of a verb was noted also.

Ng and Lee (1996): • Experiments: • separated their corpus into training and test sets on an 89%--11% split • accuracy: 63.7% (baseline: 58.1%) • sense definitions used were from WordNet, 7.8 senses per word for nouns and 12.0 senses for verbs • no comparison possible between WordNet and LDOCE definitions

Using thesauri: Yarowsky (1992) • Roget’s thesaurus: 1,042 semantic categories • Grolier Multimedia Encyclopedia To decide which semantic category an ambiguous word occurrence should be assigned: • a set of clue words, one set for each category, was derived from a POS tagged corpus • the context of each occurrence was gathered • a term selection process similar to relevance feedback was used to derive clue words

Yarowsky (1992) • E.g. clue words for animal/insects: species, family, bird, fish, cm, animal, tail, egg, wild, common, coat, female, inhabit, eat, nest • Comparison between words in the context and the clue word sets • Accuracy: 12 ambiguous words, several hundred occurrences, 92% accuracy on average • Comparisons were suspect.
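The comparison between context words and clue word sets can be sketched as a weighted vote per category. The numeric weights, the second category, and its clue words are invented for this example; only some of the animal/insect clue words come from the slide, and `best_category` is a hypothetical name.

```python
# Clue words per category with made-up weights (assumptions for illustration).
CLUE_WEIGHTS = {
    "ANIMAL/INSECT": {"species": 2.0, "bird": 1.5, "nest": 1.5,
                      "egg": 1.0, "wild": 1.0},
    "TOOLS/MACHINERY": {"lift": 2.0, "steel": 1.5, "load": 1.0, "engine": 1.0},
}


def best_category(context_words, clue_weights):
    """Sum the weights of clue words found in the context, per category,
    and return the highest-scoring category together with all scores."""
    scores = {
        category: sum(weights.get(word, 0.0) for word in context_words)
        for category, weights in clue_weights.items()
    }
    return max(scores, key=scores.get), scores


context = "the crane built its nest near the wild river".split()
print(best_category(context, CLUE_WEIGHTS))
# ('ANIMAL/INSECT', {'ANIMAL/INSECT': 2.5, 'TOOLS/MACHINERY': 0.0})
```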

Testing disambiguators • Few “pre-disambiguated” test corpora are publicly available. • A sense-tagged version of the Brown corpus, called SEMCOR, is available. • A TREC-like effort, called SENSEVAL, is underway.

WSD and IR experiments • Voorhees (1993): based on WordNet: • Each of 90,000 words and phrases is assigned to one or more synsets. • A synset is a set of words that are synonyms of each other; the words of a synset define it and its meaning. • All synsets are linked together to form a mostly hierarchical semantic network based on hypernymy and hyponymy. • Other relations: meronymy, holonymy, antonymy.

Voorhees (1993): • the hood of a word sense contained in synset s: • largest connected subgraph; • contains s; • contains only descendants of an ancestor of s; • contains no synset that has a descendant that includes another instance of a member of s. • Results: consistently worse; senses were tagged inaccurately.

The hood of the first sense of “house” would include the words: housing, lodging, apartment, flat, cabin, gatehouse, bungalow, cottage.

Wallis (1993) • replace words with definitions from LDOCE. • “ocean” and “sea”: ocean: The great mass of salt water that covers most of the earth; sea: the great body of salty water that covers much of the earth’s surface. • disappointing results. • no analysis of the cause.

Sussna (1993) • Assign a weight to all relations and calculate the semantic distance between two synsets. • Calculate the semantic distance between context words and each of the synsets to rank the synsets. • Parameters: size of context (41 as optimal), the number of words disambiguated simultaneously (only 10 because of computational cost). • Accuracy: 56%
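One way to picture semantic distance over weighted relations is a shortest-path computation on a small graph. The sketch below uses Dijkstra's algorithm on an invented toy network; the node names, edge weights, and function names are assumptions, and Sussna's actual weighting scheme is not reproduced here.

```python
import heapq

# Toy undirected network with made-up relation weights.
GRAPH = {
    "bank#finance": {"institution": 1.0},
    "institution": {"bank#finance": 1.0, "money": 1.5},
    "money": {"institution": 1.5},
    "bank#river": {"slope": 1.0},
    "slope": {"bank#river": 1.0, "river": 1.0},
    "river": {"slope": 1.0, "water": 0.5},
    "water": {"river": 0.5},
}


def shortest_distance(graph, source, target):
    """Dijkstra's algorithm; returns infinity if target is unreachable."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(queue, (nd, neighbour))
    return float("inf")


def rank_synsets(graph, candidates, context_synsets):
    """Order candidate synsets by total distance to the context synsets."""
    totals = {c: sum(shortest_distance(graph, c, s) for s in context_synsets)
              for c in candidates}
    return sorted(candidates, key=totals.get)


print(rank_synsets(GRAPH, ["bank#finance", "bank#river"], ["river", "water"]))
# ['bank#river', 'bank#finance']
```

Each ranking needs one shortest-path search per candidate and context synset, which hints at why the number of words disambiguated together had to be kept small.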

Analyses of WSD & IR • Krovetz & Croft: sense mismatches were significantly more likely to occur in non-relevant documents. • word collocation • skewed frequency distribution • Situations under which WSD may prove useful: • where collocation is less prevalent • where query words were used in a minority sense

Analyses of WSD & IR • Sanderson (1994, 1997): • pseudo-words: banana/kalashnikov/anecdote • experiments on the effect of query length: retrieval effectiveness for short queries was greatly affected by the introduction of ambiguity, but much less so for longer queries.
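Pseudo-words introduce a controlled amount of ambiguity: every occurrence of any member word is replaced by one combined token, while the original word is kept as the known answer. The sketch below illustrates this; the function name `make_pseudo_words` and the example sentence are assumptions.

```python
def make_pseudo_words(tokens, groups):
    """Replace each word that belongs to a group by the group's pseudo-word.

    Returns the ambiguous token list and (position, original word) pairs
    that serve as the answer key.
    """
    mapping = {word: "/".join(group) for group in groups for word in group}
    ambiguous = [mapping.get(token, token) for token in tokens]
    answers = [(i, token) for i, token in enumerate(tokens) if token in mapping]
    return ambiguous, answers


GROUPS = [("banana", "kalashnikov", "anecdote")]
tokens = "he told an anecdote about a banana".split()
ambiguous, answers = make_pseudo_words(tokens, GROUPS)
print(ambiguous)
print(answers)  # [(3, 'anecdote'), (6, 'banana')]
```

Because the answer key is known exactly, retrieval runs on the original and the ambiguous text can be compared directly.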

Analyses of WSD & IR • Gonzalo et al. (1998): experiments based on SEMCOR; a summary was written for each document and used as a query, so each query has exactly one relevant document. • Cause of error: a sense may be too specific, e.g. newspaper as a business concern as opposed to the physical object.

Gonzalo et al. (1998): • synset-based representation: retrieval based on synsets seems to be the best • erroneous disambiguation and its impact on retrieval effectiveness: • baseline precision: 52.6% • at 30% error: precision 54.4% • at 60% error: precision 49.1%

Sanderson (1997): • output word sense in a list ranked by a confidence score • accuracy: worse than the one without sense, better than the one tagged with one sense. • possible cause: errors.

Disambiguation without sense definition • Zernik (1991): • generate clusters for an ambiguous word by three criteria: context words, grammatical category and derivational morphology. • associate each cluster with a dictionary sense. • E.g. “train”: 95% accuracy (grammatical category); “office”: full of errors

Disambiguation without sense definition • Schütze and Pedersen (1995): • One of the very few results that show an improvement (14%). • Clusters are based on context words only: words with similar contexts are put into the same cluster, but a cluster is recognized only if its context appears more than fifty times in the corpus. • Similar contexts of “ball”: tennis, football, cricket. Thus this method breaks up a word’s commonest sense into a number of uses (e.g. the sporting sense of ball).

Schütze and Pedersen (1995): • score each use of a word • represent a word occurrence by: • just the word • the word with its commonest use • the word with n of its uses

WSD in IR Revisited (SIGIR’03) • Skewed frequency distributions coupled with the query term co-occurrence effect are the reasons why traditional IR techniques that don’t take sense into account are not penalized severely. • Inaccurate fine-grained WSD has an extremely negative effect on the performance of an IR system. • To achieve increases in performance, it is imperative to minimize the impact of inaccurate disambiguation. • The need for 90% accurate disambiguation in order to see performance increases remains questionable.

The WSD methods applied • A number of experiments were tried, but nothing better than the following was found: applying each knowledge source (collocations, co-occurrence, and sense frequency) in a stepwise fashion: • use a context window consisting of the sentence surrounding the target word to identify the sense of the word • examine whether the surrounding sentence contains any collocates observed in SEMCOR • specific sense data
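A stepwise combination of knowledge sources can be pictured as a chain of fallbacks. The sketch below is one possible reading, not the method from the paper: the order of the steps, the lookup tables, the treatment of a collocation as an adjacent word pair, and every name in the code are assumptions for illustration.

```python
from collections import Counter


def stepwise_sense(target, sentence, collocations, cooccurrence, sense_frequency):
    """Try collocations, then co-occurrence votes, then the most frequent sense.

    Returns (sense, name of the knowledge source that decided).
    """
    # Step 1: an adjacent word forms a known collocation with the target.
    for i, token in enumerate(sentence):
        if token != target:
            continue
        for neighbour in sentence[max(0, i - 1):i] + sentence[i + 1:i + 2]:
            sense = collocations.get((target, neighbour))
            if sense is not None:
                return sense, "collocation"
    # Step 2: words anywhere in the sentence vote for a sense.
    votes = Counter()
    for token in sentence:
        sense = cooccurrence.get((target, token))
        if sense is not None:
            votes[sense] += 1
    if votes:
        return votes.most_common(1)[0][0], "co-occurrence"
    # Step 3: fall back to the most frequent sense of the target.
    return sense_frequency[target], "sense frequency"


# Hypothetical lookup tables, as might be collected from a sense-tagged corpus.
COLLOCATIONS = {("bank", "river"): "bank#shore"}
COOCCURRENCE = {("bank", "loan"): "bank#finance"}
SENSE_FREQUENCY = {"bank": "bank#finance"}

for text in ("the river bank was muddy",
             "the bank approved my loan",
             "the bank was closed"):
    print(stepwise_sense("bank", text.split(),
                         COLLOCATIONS, COOCCURRENCE, SENSE_FREQUENCY))
```

Ordering the sources from most to least specific means the frequency fallback is used only when no contextual evidence is available.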

WSD in IR Revisited: Conclusions • Reasons for success: • high-precision WSD technique • sense frequency statistics • Resilience of the vector space model • Analysis of Schütze and Pedersen’s success: added tolerance

“A highly accurate bootstrapping algorithm for word sense disambiguation”, Rada Mihalcea, 2000 • Disambiguate all nouns and verbs: • step 1: complex nominals • step 2: named entities • step 3: word pairs, based on SEMCOR: (previous word, word) pairs and (word, successive word) pairs • step 4: context, based on SEMCOR and WordNet; in WordNet, a word’s hypernyms also count as its context

“A highly accurate bootstrapping algorithm for word sense disambiguation” (cont’d) • step 5: words at semantic distance 0 from a word that has already been disambiguated • step 6: words at semantic distance 1 from a word that has already been disambiguated • step 7: words at semantic distance 0 from each other among the ambiguous words • step 8: words at semantic distance 1 from each other among the ambiguous words

“An Effective Approach to Document Retrieval via Utilizing WordNet and Recognizing Phrases” (SIGIR’04) • Significant increase for short queries • WSD applied only to the query, plus query expansion • Phrase-based and term-based • Pseudo-relevance feedback

Phrase identification • 4 types of phrases: proper names (named entities), dictionary phrases (from WordNet), simple phrases, complex phrases • The window size of simple/complex phrases is decided by calculating correlation

Correlation

WSD • Unlike Rada Mihalcea’s WSD, Liu did not use SEMCOR, only WordNet • 6 steps; basic ideas: use hypernyms, hyponyms, cross-references, etc.

Query Expansion • Add synonyms (conditional) • Add definition words (only the first shortest noun phrase; conditional on it being highly globally correlated) • Add hyponyms (conditional) • Add compound words (conditional)

PSEUDO RELEVANCE FEEDBACK • Using Global Correlations and WordNet • Global_cor > 1 and one of the following conditions: • 1: the term is monosemous • 2: its definition contains some other query terms • 3: it is in the top 10 ranked documents • Combining Local and Global Correlations:

Results • SO: standard Okapi (term-similarity) • NO: enhanced SO • NO+P: +phrase-similarity • NO+P+D: +WSD • NO+P+D+F: +Pseudo-feedback

Results:

Model conclusion • WSD on the query only • WSD using only WordNet, no SEMCOR • Complex query expansion • Pseudo-relevance feedback • Phrase-based and term-based

Thank you!
