Survey on WSD and IR
Apex@SJTU
WSD: Introduction
  • Problems in an online news retrieval system:

query: “major”

Articles retrieved:

        • about “Prime Minister John Major MP”
        • “major” appears as an adjective
        • “major” appears as a military rank
WSD: Introduction
  • Gale, Church and Yarowsky (1992) cite work dating back to 1950.
  • For many years, WSD was applied only to limited domains and small vocabularies.
  • In recent years, disambiguators have been applied to resolve the senses of words in large heterogeneous corpora.
  • With a more accurate document representation and queries also marked up with word senses, researchers believed that retrieval accuracy would improve.
Approaches to disambiguation
  • Disambiguation based on manually generated rules
  • Disambiguation using evidence from existing corpora.
Disambiguation based on manually generated rules
  • Weiss (1973):
      • general context rule:

If the word “type” appears near to “print”, it most likely meant a small block of metal bearing a raised character on one end.

      • template rule:

If “of” appears immediately after “type”, it most likely meant a subdivision of a particular kind of thing.
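
Weiss's two rule types can be sketched as a tiny Python function (a toy encoding, not Weiss's actual system; the sense labels and window size are illustrative). The template rule is checked before the general context rule, since Weiss found template rules more reliable:

```python
def disambiguate_type(tokens, position, window=5):
    """Toy Weiss-style disambiguator for the word "type".

    Template rules (exact position) are tried before general
    context rules, mirroring Weiss's ordering.
    """
    # Template rule: "type of" -> the 'subdivision of a kind' sense.
    if position + 1 < len(tokens) and tokens[position + 1] == "of":
        return "subdivision"
    # Context rule: "print" nearby -> the printing-block sense.
    lo, hi = max(0, position - window), position + window + 1
    if "print" in tokens[lo:hi]:
        return "metal-block"
    return "unknown"
```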

Weiss (1973):
  • Template rules were more reliable, so they were applied first.
  • To create rules:
      • Examine 20 occurrences of an ambiguous word.
      • Test these manually created rules on a further 30 occurrences.
  • Accuracy: 90%
  • Cause for errors: idiomatic uses.
Disambiguation based on manually generated rules
  • Kelly and Stone (1975):
      • created a set of rules for 6,000 words
      • consisted of contextual rules similar to those of Weiss
      • in addition, used grammatical category of a word as a strong indicator of sense:
        • “the train” and “to train”
Kelly and Stone (1975):
  • The grammar and context rules were grouped into sets so that only certain rules were applied in certain situations.
  • Conditional statements controlled the application of rule sets.
  • Unlike Weiss’s system, this disambiguator was designed to process a whole sentence at the same time.
  • Accuracy: the system was not considered a success.
Disambiguation based on manually generated rules
  • Small and Rieger (1982) came to similar conclusions.
  • When this type of disambiguator was extended to work on larger vocabulary, the effort involved in building it became too great.
  • Since the 1980s, WSD research has concentrated on automatically generated rules based on sense evidence derived from machine-readable corpora.
Disambiguation using evidence from existing corpora
  • Lesk (1988):
      • Resolve the sense of “ash” in:

There was ash from the coal fire.

      • Dictionary definition looked up:
        • ash(1): The soft grey powder that remains after something has been burnt.
        • ash(2): A forest tree common in Britain.
      • Definition of context words looked up:
        • coal(1): A black mineral which is dug from the earth, which can be burnt to give heat.
        • fire(1): The condition of burning; flames, light and great heat.
        • fire(2): The act of firing weapons or artillery at an enemy.
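
Lesk's overlap idea can be sketched directly with the definitions above (a simplified version; the stop-word list and punctuation stripping are my own shortcuts, not Lesk's exact procedure):

```python
STOP = {"the", "a", "an", "of", "that", "which", "to", "and", "in",
        "has", "been", "is", "or", "at", "can", "be", "from", "after"}

def content_words(definition):
    """Lower-case, split on whitespace, strip simple punctuation,
    and drop stop words."""
    words = (w.strip(".,;").lower() for w in definition.split())
    return {w for w in words if w and w not in STOP}

def lesk_score(sense_definition, context_definitions):
    """Count how many content words a sense definition shares with
    the definitions of the context words (simplified Lesk)."""
    sense = content_words(sense_definition)
    return sum(len(sense & content_words(d)) for d in context_definitions)

ash1 = "The soft grey powder that remains after something has been burnt"
ash2 = "A forest tree common in Britain"
context_defs = [
    "A black mineral which is dug from the earth, which can be burnt to give heat",
    "The condition of burning; flames, light and great heat",
    "The act of firing weapons or artillery at an enemy",
]
```

Here ash(1) outscores ash(2) only because “burnt” happens to recur in the definition of “coal”, which illustrates the overlap-sparsity concern raised on the next slide.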
Lesk (1988):
  • Sense definitions are ranked by a scoring function based on the number of words they share with the definitions of the context words.
  • Questionable: how often does the word overlap necessary for disambiguation actually occur?
  • Accuracy: assessed only through “very brief experimentation”.
  • No analysis of the failures, although definition length was recognized as a possible factor in deciding which dictionary to use.
Disambiguation using evidence from existing corpora
  • Wilks et al. (1990):
  • addressed the word overlap problem by expanding a dictionary definition with words that commonly co-occur with the text of that definition.
  • Co-occurrence information was derived from all definition texts in the dictionary.
Wilks et al. (1990):
  • Longman’s Dictionary of Contemporary English (LDOCE): all its definitions were written using a simplified vocabulary of around 2,000 words.
  • The restricted vocabulary admits few synonyms, which are a distracting element in the co-occurrence calculation.
  • “bank”:
      • economic sense: “money”, “check”, “rob”
      • geographical sense: “river”, “flood”, “bridge”
  • Accuracy: “bank” in 200 sentences, judged correct if the chosen sense coincided with a manually chosen one; 53% at the fine-grained level (13 senses) and 85% at the coarse-grained level (5 senses).
  • They suggested using simulated annealing to disambiguate a whole sentence simultaneously.
Disambiguating simultaneously
  • Cowie et al. (1992):
      • Accuracy: tested on 67 sentences, 47% for fine-grained senses while 72% for coarse-grained ones.
      • No comparison with Wilks et al.’s.
      • No baseline.

A possible baseline: senses randomly chosen

A better one: select the most common sense
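
Both baselines are trivial to state in code (a sketch; `tagged_sample` stands in for any sense-tagged sample of occurrences of the word):

```python
import random
from collections import Counter

def random_sense_baseline(senses, rng=None):
    """Baseline 1: pick one of the word's senses at random."""
    return (rng or random).choice(senses)

def most_common_sense_baseline(tagged_sample):
    """Baseline 2: always answer with the sense that occurs most
    often in a sense-tagged sample of the word."""
    return Counter(tagged_sample).most_common(1)[0][0]
```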

Manually tagging a corpus
  • A technique in POS tagging:
    • manually mark up a large text corpus with POS tags, then train a statistical classifier to associate features with occurrences of the tags.
  • Ng and Lee (1996):
    • disambiguate 192,000 occurrences of 191 words.
    • examine the following features:
      • POS and morphological form of the sense tagged word
      • unordered set of its surrounding words
      • local collocations relative to it
      • and if the sense tagged word was a noun, the presence of a verb was noted also.
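
The feature set above can be sketched as follows (the field names and the crude nearby-verb heuristic are illustrative, not Ng and Lee's exact formulation):

```python
def extract_features(tokens, pos_tags, i, window=3):
    """Toy version of the feature set Ng and Lee (1996) examined
    for the target word at index i."""
    lo, hi = max(0, i - window), i + window + 1
    features = {
        "pos": pos_tags[i],                 # POS of the target word
        "form": tokens[i].lower(),          # surface/morphological form
        # unordered set of surrounding words
        "surrounding": {t.lower() for t in tokens[lo:hi]} - {tokens[i].lower()},
        # local collocations: immediate left and right neighbours
        "collocations": (
            tokens[i - 1].lower() if i > 0 else None,
            tokens[i + 1].lower() if i + 1 < len(tokens) else None,
        ),
    }
    # For nouns, Ng and Lee also noted the presence of a verb;
    # a crude stand-in: the nearest preceding verb in the window.
    if pos_tags[i].startswith("NN"):
        verbs = [t for t, p in zip(tokens[lo:i], pos_tags[lo:i]) if p.startswith("VB")]
        features["nearby_verb"] = verbs[-1].lower() if verbs else None
    return features
```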
Ng and Lee (1996):
  • Experiments:
      • separated their corpus into training and test sets on an 89%/11% split
      • accuracy: 63.7% (baseline: 58.1%)
      • sense definitions were taken from WordNet: 7.8 senses per word for nouns and 12.0 for verbs
      • no comparison was possible between WordNet and LDOCE sense definitions
Using thesauri: Yarowsky (1992)
  • Roget’s thesaurus: 1,042 semantic categories
  • Grolier Multimedia Encyclopedia

To decide which semantic category an ambiguous word occurrence should be assigned:

    • a set of clue words, one set for each category, was derived from a POS tagged corpus
    • the context of each occurrence was gathered
    • a term selection process similar to relevance feedback was used to derive clue words
Yarowsky (1992)
  • E.g. clue words for animal/insect:

species, family, bird, fish, cm, animal, tail, egg, wild, common, coat, female, inhabit, eat, nest

  • Words in the context of an occurrence are compared with the clue word sets.
  • Accuracy: 12 ambiguous words, several hundred occurrences, 92% accuracy on average
  • The comparisons were suspect.
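
A bare-bones version of the category assignment (raw overlap counting here, where Yarowsky actually used weighted scores; the "music" clue set is invented for contrast):

```python
def classify_by_clues(context_words, clue_sets):
    """Assign the semantic category whose clue-word set overlaps
    the occurrence's context the most."""
    context = {w.lower().strip(".,") for w in context_words}
    return max(clue_sets, key=lambda cat: len(context & clue_sets[cat]))

clue_sets = {
    "animal/insect": {"species", "family", "bird", "fish", "cm", "animal",
                      "tail", "egg", "wild", "common", "coat", "female",
                      "inhabit", "eat", "nest"},
    # invented second category, purely for contrast
    "music": {"concert", "orchestra", "instrument", "melody", "note"},
}
```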
Testing disambiguators
  • Few “pre-disambiguated” test corpora are publicly available.
  • A sense-tagged version of the Brown corpus, called SEMCOR, is available; a TREC-like evaluation effort, called SENSEVAL, is underway.
WSD and IR experiments
  • Voorhees (1993):

based on WordNet:

      • Each of 90,000 words and phrases is assigned to one or more synsets.
      • A synset is a set of words that are synonyms of each other; the words of a synset define it and its meaning.
      • All synsets are linked together to form a mostly hierarchical semantic network based on hypernymy and hyponymy.
      • Other relations: meronymy, holonymy, antonymy.
Voorhees (1993):
  • The hood of a word sense contained in synset s is the largest connected subgraph that:
      • contains s;
      • contains only descendants of an ancestor of s;
      • contains no synset that has a descendant that includes another instance of a member of s.
  • Results: consistently worse than retrieval without WSD; senses were tagged inaccurately.
The hood of the first sense of “house” would include the words: housing, lodging, apartment, flat, cabin, gatehouse, bungalow, cottage.
Wallis (1993)
  • Replace words with their definitions from LDOCE.
  • “ocean” and “sea”:

ocean: The great mass of salt water that covers most of the earth;

sea: the great body of salty water that covers much of the earth’s surface.

  • disappointing results.
  • no analysis of the cause.
Sussna (1993)
  • Assign a weight to all relations and calculate the semantic distance between two synsets.
  • Calculate the semantic distance between context words and each of the synsets to rank the synsets.
  • Parameters: size of the context window (41 was optimal) and the number of words disambiguated simultaneously (only 10, for computational reasons).
  • Accuracy: 56%
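
The distance computation can be sketched as a cheapest-path search over a relation graph whose edge weights depend on relation type (the toy graph and uniform weights below are illustrative, not Sussna's actual weighting scheme):

```python
import heapq

def semantic_distance(graph, a, b):
    """Dijkstra over a weighted relation graph: the semantic
    distance is the cheapest path between two synsets, where each
    edge carries the weight assigned to its relation type."""
    dist = {a: 0.0}
    heap = [(0.0, a)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == b:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return float("inf")

# Toy hypernym chain with uniform edge weights.
graph = {
    "dog":       [("canine", 1.0)],
    "canine":    [("dog", 1.0), ("carnivore", 1.0)],
    "carnivore": [("canine", 1.0), ("animal", 1.0)],
    "animal":    [("carnivore", 1.0)],
}
```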
Analyses of WSD & IR
  • Krovetz & Croft: sense mismatches were significantly more likely to occur in non-relevant documents.
    • word collocation
    • skewed frequency distribution
  • Situations under which WSD may prove useful:
    • where collocation is less prevalent
    • where query words were used in a minority sense
Analyses of WSD & IR
  • Sanderson (1994,1997):
    • pseudo-words: banana/kalashnikov/anecdote
    • experiments on the factor of query length:

effectiveness of retrieval based on short queries was greatly affected by the introduction of ambiguity, but much less so for longer queries.

Analyses of WSD & IR
  • Gonzalo et al. (1998): experiments based on SEMCOR; a summary was written for each document and used as a query, so each query has exactly one relevant document.
  • Cause for error: sense may be too specific

newspaper as a business concern as opposed to the physical object

Gonzalo et al. (1998):
  • synset based representation:

retrieval based on synsets appeared to be the best

  • erroneous disambiguation and its impact on retrieval effectiveness:

baseline precision: 52.6%

with 30% disambiguation error: precision 54.4%

with 60% disambiguation error: precision 49.1%

Sanderson (1997):
  • output word senses in a list ranked by a confidence score
  • effectiveness: worse than retrieval without senses, but better than retrieval with each word tagged with a single sense.
  • possible cause: disambiguation errors.
Disambiguation without sense definition
  • Zernik (1991):
    • generate clusters for an ambiguous word using three criteria: context words, grammatical category, and derivational morphology.
    • associate each cluster with a dictionary sense.


“train”: 95% accuracy, driven mainly by grammatical category

“office”: full of errors

Disambiguation without sense definition

Schutze and Pederson (1995): one of the very few studies to show an improvement (14%) from sense-based retrieval

  • Clusters are based on context words only: words with similar contexts are put into the same cluster, but a cluster is only recognized if its context appears more than fifty times in the corpus.
  • Similar contexts of “ball”: tennis, football, cricket. Thus this method breaks up a word’s commonest sense (e.g. the sporting sense of “ball”) into a number of uses.
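
A toy stand-in for this clustering of word occurrences by context similarity (greedy single-pass clustering with cosine similarity; the threshold is arbitrary, and the fifty-occurrence frequency cut-off is omitted for brevity):

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v.get(w, 0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_uses(contexts, threshold=0.3):
    """Group word occurrences whose context-word vectors are
    similar; each occurrence joins the first cluster whose
    centroid it resembles, else starts a new one."""
    clusters = []  # list of (centroid Counter, member indices)
    for i, ctx in enumerate(contexts):
        vec = Counter(ctx)
        for centroid, members in clusters:
            if cosine(vec, centroid) >= threshold:
                centroid.update(ctx)
                members.append(i)
                break
        else:
            clusters.append((Counter(ctx), [i]))
    return [members for _, members in clusters]
```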
Schutze and Pederson (1995):
  • score each use of a word
  • represent a word occurrence by
    • just the word
    • the word with its commonest use
    • the word with n of its uses
WSD in IR Revisited (SIGIR ’03)
  • Skewed frequency distributions coupled with the query term co-occurrence effect are the reasons why traditional IR techniques that don’t take sense into account are not penalized severely.
  • The impact of inaccurate fine grained WSD has an extreme negative effect on the performance of an IR system.
  • To achieve increases in performance, it is imperative to minimize the impact of the inaccurate disambiguation.
  • The need for 90% accurate disambiguation in order to see performance increases remains questionable.
The WSD methods applied
  • A number of experiments were tried, but nothing better than the following was found: applying each knowledge source (collocations, co-occurrence, and sense frequency) in a stepwise fashion:
    • use a context window consisting of the sentence surrounding the target word to identify the sense of the word
    • examine the surrounding sentence for any collocates observed in SemCor
    • fall back on sense-specific frequency data
WSD in IR Revisited: Conclusions
  • Reasons for success:

high precision WSD technique

sense frequency statistics

  • Resilience of vector space model
  • Analysis for Schutze and Pederson’s success: added tolerance
“A highly accurate bootstrapping algorithm for word sense disambiguation”, Rada Mihalcea (2000)

Disambiguate all nouns and verbs:

  • step 1: complex nominals
  • step 2: named entities
  • step 3: word pairs, based on SEMCOR

(previous word, word) pairs and (word, successive word) pairs

  • step 4: context, based on SEMCOR and WordNet

in WordNet, a word’s hypernyms are also treated as part of its context

“A highly accurate bootstrapping algorithm for word sense disambiguation” (cont’d)
  • step 5: words at semantic distance 0 from words that have already been disambiguated
  • step 6: words at semantic distance 1 from words that have already been disambiguated
  • step 7: words at semantic distance 0 among the remaining ambiguous words
  • step 8: words at semantic distance 1 among the remaining ambiguous words
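
The stepwise, bootstrapping structure of the algorithm (each step may tag more words using the tags fixed by earlier steps) can be sketched as a driver loop; the two step functions here are toy stand-ins, not Mihalcea's actual heuristics:

```python
def bootstrap_disambiguate(words, steps):
    """Apply disambiguation steps in a fixed order.  Each step sees
    the senses fixed by earlier steps and may tag further words."""
    senses = {}  # word index -> chosen sense
    for step in steps:
        for i, word in enumerate(words):
            if i not in senses:
                tag = step(word, senses)
                if tag is not None:
                    senses[i] = tag
    return senses

def tag_monosemous(word, tagged_so_far):
    """Toy step: tag words that have only one sense."""
    return {"oxygen": "oxygen#1"}.get(word)

def tag_from_neighbours(word, tagged_so_far):
    """Toy step: tag a word only once some other words are tagged."""
    return word + "#ctx" if tagged_so_far else None
```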
“An Effective Approach to Document Retrieval via Utilizing WordNet and Recognizing Phrases” (SIGIR ’04)
  • Significant improvement for short queries
  • WSD applied only to the query, plus query expansion
  • Both phrase-based and term-based similarity
Phrase identification
  • 4 types of phrases: proper names (named entities), dictionary phrases (from WordNet), simple phrases, and complex phrases
  • Window sizes for simple/complex phrases are decided by calculating correlations
  • Unlike Mihalcea’s WSD, Liu did not use SemCor, only WordNet
  • A 6-step disambiguation procedure based on hypernyms, hyponyms, cross-references, etc.
Query Expansion
  • Add synonyms (conditional)
  • Add definition words (only the first, shortest noun phrase), conditional on being highly globally correlated
  • Add hyponyms (conditional)
  • Add compound words (conditional)
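
The conditional additions above can be sketched as a thresholded expansion loop (the lexicon, correlation table, and threshold value are all hypothetical toy data, not the paper's actual resources):

```python
def expand_query(query_terms, lexicon, global_corr, corr_threshold=1.0):
    """Conditionally expand a query with synonyms and hyponyms,
    keeping a candidate only if its global correlation with the
    source term exceeds a threshold."""
    expanded = list(query_terms)
    for term in query_terms:
        entry = lexicon.get(term, {})
        for candidate in entry.get("synonyms", []) + entry.get("hyponyms", []):
            if (candidate not in expanded
                    and global_corr.get((term, candidate), 0) > corr_threshold):
                expanded.append(candidate)
    return expanded

# Toy lexicon and correlation table.
lexicon = {"car": {"synonyms": ["automobile"], "hyponyms": ["sedan"]}}
global_corr = {("car", "automobile"): 2.0, ("car", "sedan"): 0.5}
```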
Pseudo-relevance feedback
  • Using global correlations and WordNet
  • A candidate term is added if Global_cor > 1 and one of two conditions holds:
      • 1: it is monosemous
      • 2: its definition contains some other query terms
  • and it occurs in the top-10 ranked documents
  • Combining local and global correlations
  • Runs compared:
      • SO: standard Okapi (term similarity)
      • NO: enhanced SO
      • NO+P: + phrase similarity
      • NO+P+D: + WSD
      • NO+P+D+F: + pseudo-relevance feedback
Model conclusion
  • WSD applied to the query only
  • WSD based only on WordNet, no SemCor
  • Complicated query expansion
  • Pseudo-relevance feedback
  • Both phrase-based and term-based similarity