Survey on WSD and IR
Apex@SJTU
WSD: Introduction
Problems in an online news retrieval system:
query: “major”
Articles retrieved:
about “Prime Minister John Major MP”
“major” appears as an adjective
“major” appears as a military rank
If the word “type” appears near “print”, it most likely means a small block of metal bearing a raised character on one end.
If “of” appears immediately after “type”, it most likely means a subdivision of a particular kind of thing.
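The two cues for “type” can be sketched as a tiny rule-based disambiguator. This is only an illustration: the function name, the sense labels, the five-token window used for "near", and the choice to test the “of” cue first are all assumptions, not part of the slides.

```python
import re

def guess_type_sense(sentence, window=5):
    """Guess a sense of "type" from two hand-written collocation cues."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    for i, tok in enumerate(tokens):
        if tok != "type":
            continue
        # Cue 1: "of" immediately after "type" -> the "kind of thing" sense.
        if i + 1 < len(tokens) and tokens[i + 1] == "of":
            return "kind"
        # Cue 2: a "print..." word within `window` tokens -> the printing sense.
        nearby = tokens[max(0, i - window): i + window + 1]
        if any(t.startswith("print") for t in nearby):
            return "printing"
    # No occurrence of "type" matched either cue.
    return "unknown"
```

For example, `guess_type_sense("This type of bird is rare")` returns `"kind"`, while a sentence with no occurrence of “type” returns `"unknown"`.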
There was ash from the coal fire.
A possible baseline: senses randomly chosen
A better one: select the most common sense
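The two baselines can be compared on a toy example. The sense labels and the train/test split below are invented for illustration; only the two baseline strategies come from the slides.

```python
import random
from collections import Counter

def most_common_sense(labelled_senses):
    """Return the sense that occurs most often in labelled training data."""
    return Counter(labelled_senses).most_common(1)[0][0]

def random_sense(senses, rng):
    """Pick one sense uniformly at random (the weaker baseline)."""
    return rng.choice(senses)

def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# Toy sense labels for occurrences of "type" (invented for illustration).
train = ["kind", "kind", "kind", "printing"]
test = ["kind", "printing", "kind", "kind"]

# Most-common-sense baseline: always predict the majority sense from training.
mfs = most_common_sense(train)
mfs_accuracy = accuracy([mfs] * len(test), test)

# Random baseline: guess uniformly among the known senses.
rng = random.Random(0)
senses = sorted(set(train))
random_accuracy = accuracy([random_sense(senses, rng) for _ in test], test)
```

On this toy data the most-common-sense baseline scores 0.75. A uniform random guess over k senses has an expected accuracy of 1/k, which is why the most-common-sense baseline is the harder one to beat when sense frequencies are skewed.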
To decide which semantic category an ambiguous word occurrence should be assigned:
species, family, bird, fish, cm, animal, tail, egg, wild, common, coat, female, inhabit, eat, nest
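One simple way to use such a word list is to count how many context words are salient for each category and pick the category with the most hits. The sketch below is a simplification under stated assumptions: the ANIMAL list is the one above, the MACHINE list and the example sentences are invented for illustration, and plain counting stands in for whatever weighting the original method uses.

```python
import re

SALIENT_WORDS = {
    # Word list taken from the slide.
    "ANIMAL": {"species", "family", "bird", "fish", "cm", "animal", "tail",
               "egg", "wild", "common", "coat", "female", "inhabit", "eat",
               "nest"},
    # Hypothetical list, invented only to give a second category.
    "MACHINE": {"lift", "engine", "steel", "operator", "load", "tower"},
}

def pick_category(context, salient=SALIENT_WORDS):
    """Return the category with the most salient words in the context."""
    tokens = re.findall(r"[a-z]+", context.lower())
    scores = {cat: sum(tok in words for tok in tokens)
              for cat, words in salient.items()}
    # On a tie, max() keeps the category listed first in the dictionary.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

With these lists, `pick_category("The crane is a wild bird that builds its nest near water")` returns `"ANIMAL"`, and a context with no salient words returns `None`.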
based on WordNet:
ocean: The great mass of salt water that covers most of the earth;
sea: the great body of salty water that covers much of the earth’s surface.
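The two definitions share several content words, which is one way to see that the words are closely related. The sketch below measures that overlap; it assumes the glosses are meant to illustrate definition overlap, and the stopword list and function names are illustrative choices.

```python
import re

# Tiny illustrative stopword list; "s" catches the possessive in "earth's".
STOPWORDS = {"the", "of", "that", "s"}

def content_words(gloss):
    """Lower-case a gloss and keep its non-stopword tokens as a set."""
    return {t for t in re.findall(r"[a-z]+", gloss.lower())
            if t not in STOPWORDS}

def gloss_overlap(gloss_a, gloss_b):
    """Content words that appear in both glosses."""
    return content_words(gloss_a) & content_words(gloss_b)

ocean = "The great mass of salt water that covers most of the earth"
sea = "the great body of salty water that covers much of the earth's surface"

shared = gloss_overlap(ocean, sea)
print(sorted(shared))  # ['covers', 'earth', 'great', 'water']
```

Note that “salt” and “salty” do not match here because the sketch does no stemming.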
The effectiveness of retrieval based on short queries was greatly affected by the introduction of ambiguity, but much less so for longer queries.
newspaper as a business concern as opposed to the physical object
retrieval based on synsets seems to work best
baseline precision: 52.6%
at 30% error: precision 54.4%
at 60% error: precision 49.1%
“train”: 95% accuracy, grammatical category
“office”: full of errors
Schütze and Pedersen (1995): one of the very few results that show an improvement (14%)
high precision WSD technique
sense frequency statistics
Disambiguate all nouns and verbs:
(previous word, word) pair, (word, successive word) pair
in WordNet, a word's hypernyms are also part of its context
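The two word pairs used as context can be extracted as follows. This is a minimal sketch: the function name and the dictionary keys are assumptions, and hypernym lookup is left out because it needs a WordNet interface.

```python
def neighbour_pairs(tokens, index):
    """Return the (previous word, word) and (word, successive word) pairs
    around tokens[index]; a pair is omitted at a sentence boundary."""
    word = tokens[index]
    pairs = {}
    if index > 0:
        pairs["prev"] = (tokens[index - 1], word)
    if index + 1 < len(tokens):
        pairs["next"] = (word, tokens[index + 1])
    return pairs

tokens = "there was ash from the coal fire".split()
ash_pairs = neighbour_pairs(tokens, tokens.index("ash"))
# ash_pairs == {"prev": ("was", "ash"), "next": ("ash", "from")}
```

Counting how often each pair co-occurs with each sense in tagged text would give the kind of sense frequency statistics mentioned above.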