Word Association Thesaurus as a Resource for Building Wordnet
Anna Sinopalnikova
Masaryk University, Brno, Czech Republic
Saint-Petersburg State University, Russia
Overview
Types of LRs used
What is Word Association?
Information to be extracted from WAT
WAT vs. Corpus
Primary resources Wordnet
e.g. text corpora
present (more or less) ‘raw’ data on the language in use
information is given implicitly
Secondary resources
e.g. explanatory dictionaries, Roget-type thesauri
present explications of internal knowledge of language
based on primary resources + intuition
information is given explicitly
What kind of language resources are used to build wordnets?
(of 100 people asked)
E.g. Kent & Rosanoff (1910) 100 stimuli - 1000 subjects
Palermo & Jenkins (1964) 200 stimuli - 1000 subjects
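A minimal sketch of how association norms of this kind might be tabulated: each response to a stimulus is counted, and its strength is taken as the share of subjects who gave it. The function name `association_strengths` and the sample responses are hypothetical and do not come from any of the studies cited above.

```python
from collections import Counter


def association_strengths(responses):
    """Map each response to the share of subjects who gave it."""
    # Normalise case and whitespace; skip empty answers.
    counts = Counter(r.strip().lower() for r in responses if r.strip())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.most_common()}


# Hypothetical responses from 10 subjects to the stimulus "cat".
responses = ["dog", "dog", "dog", "dog", "mouse", "mouse",
             "black", "pet", "Dog", "purr"]
strengths = association_strengths(responses)

for word, share in strengths.items():
    print(f"cat -> {word}: {share:.2f}")
```

With 100 subjects per stimulus, the same shares can be read directly as "N of 100 people asked".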
Russian: ‘man’, ‘house’, ‘love’, ‘life’, ‘be/eat’, ‘think’, ‘live’, ‘go’, ‘big/large’, ‘good’, ‘bad’, ‘no/not’...
295 words with more than 100 relations
English: man, sex, no (not), love, house; work, eat, think, go, live; good, old, small…
586 words with more than 100 relations
E.g. Cat -> black, Cheshire, pussy;
Cat -> mat, nip, purr
Cf. text corpora
E.g. Cat -> dog, mouse, animal, pet;
Cat -> eyes, claw
E.g. buy CONVERSIVE sell, while cry INVOLVED_AGENT baby.
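One possible way to store such labelled relations is as (source, label, target) triples indexed by source word. This is only an illustrative sketch: the `Relation` type and `index_by_source` helper are hypothetical names, and only the two example relations are taken from the slide.

```python
from collections import defaultdict
from typing import NamedTuple


class Relation(NamedTuple):
    source: str
    label: str
    target: str


# The two examples given above.
relations = [
    Relation("buy", "CONVERSIVE", "sell"),
    Relation("cry", "INVOLVED_AGENT", "baby"),
]


def index_by_source(rels):
    """Group (label, target) pairs under their source word."""
    index = defaultdict(list)
    for rel in rels:
        index[rel.source].append((rel.label, rel.target))
    return dict(index)


index = index_by_source(relations)
print(index["buy"])
```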
Wetter & Rapp (1996), Willners (2001): the frequency with which word X and word Y co-occur in a corpus correlates with the strength of the X-Y association in WAT.
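A rough sketch of how such a correlation could be checked. The slide does not say which coefficient the cited studies used, so Pearson's r is an assumption here, and the co-occurrence counts and association strengths below are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: (corpus co-occurrence count, WAT association strength).
pairs = {
    ("cat", "dog"): (120, 0.40),
    ("cat", "mouse"): (60, 0.25),
    ("cat", "pet"): (30, 0.10),
    ("cat", "claw"): (15, 0.05),
}

cooccurrence = [freq for freq, _ in pairs.values()]
strength = [s for _, s in pairs.values()]
r = pearson(cooccurrence, strength)
print(f"r = {r:.3f}")
```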
Coverage: 64% of word associations do not occur in the corpus
Table 1. Distribution of word associations that do not occur in the corpus.
NB! It is mostly syntagmatic word associations that are missing, not paradigmatic ones
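The coverage figure can be illustrated with a small sketch that tests whether each stimulus-response pair co-occurs in a corpus within a fixed token window. The window size, the helper names `cooccurs` and `missing_share`, and the toy corpus are all assumptions; the slide does not state how co-occurrence was defined.

```python
def cooccurs(tokens, a, b, window=5):
    """True if a and b appear within `window` tokens of each other."""
    positions_b = [i for i, t in enumerate(tokens) if t == b]
    for i, t in enumerate(tokens):
        if t == a and any(0 < abs(i - j) <= window for j in positions_b):
            return True
    return False


def missing_share(associations, tokens, window=5):
    """Share of association pairs never found in the corpus, plus the pairs."""
    missing = [p for p in associations
               if not cooccurs(tokens, p[0], p[1], window)]
    return len(missing) / len(associations), missing


# Toy corpus and hypothetical association pairs.
tokens = "the black cat chased the mouse while the dog slept".split()
associations = [("cat", "mouse"), ("cat", "dog"),
                ("cat", "purr"), ("cat", "black")]

share, missing = missing_share(associations, tokens, window=3)
print(f"{share:.0%} of associations missing: {missing}")
```

Tagging each pair as syntagmatic or paradigmatic before counting would give a breakdown of the kind Table 1 reports.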
WAT equals or surpasses other LRs in several respects.
WAT is comparable to a balanced text corpus and could supply all the necessary empirical information when no such corpus is available.
Thank you…