
The WordNet Lexical Database

Bernardo Magnini, ITC-irst (Istituto per la Ricerca Scientifica e Tecnologica), Trento, Italy



  1. The WordNet Lexical Database. Bernardo Magnini, ITC-irst (Istituto per la Ricerca Scientifica e Tecnologica), Trento, Italy

  2. Outline
     • WordNet: introduction
     • Extending WordNet
        • Languages other than English
        • New information
        • WordNet as a (linguistic) ontology
     • Using WordNet
        • Word sense disambiguation
        • Information Retrieval / Question Answering
        • Semantic Web

  3. WordNet
     • Electronic lexical database for the English language, developed at Princeton University by George Miller’s team
     • Based on psycholinguistic theories
     • Several releases: from version 1.0 in 1991 to version 1.7.1 in 2001
     • WordNet 2 (??)
     • WordNet is a public domain resource: http://www.cogsci.princeton.edu/~wn/
     • Reference: Fellbaum, C. (Ed.) (1998). WordNet: An Electronic Lexical Database. MIT Press.
     • Global WordNet Association (GWA): conferences, workshops

  4. Lexical Matrix
                       Word Forms
     Word Meanings     F1     F2     F3     …      Fn
     M1                E1,1   E1,2
     M2                       E2,2
     M3                              E3,3
     …                                      …
     Mm                                            Em,n
     • Mappings between word forms and meanings are many-to-many
     • F1 and F2 are synonyms (both express M1)
     • F2 is polysemous (it expresses both M1 and M2)
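The lexical matrix can be sketched as a set of (form, meaning) pairs, each pair standing for one non-empty cell Ei,j. This is a minimal Python sketch. It assumes the toy forms and meanings of the slide (F1, F2, F3, M1, M2, M3), and the function names are hypothetical. Synonymy and polysemy then become simple row and column queries.

```python
# Each (form, meaning) pair is a non-empty cell E(i,j) of the lexical matrix.
# Toy data mirroring the slide: F1 and F2 share M1; F2 also expresses M2.
CELLS = {("F1", "M1"), ("F2", "M1"), ("F2", "M2"), ("F3", "M3")}

def meanings_of(form):
    """Column view: all meanings a word form can express."""
    return {m for f, m in CELLS if f == form}

def forms_for(meaning):
    """Row view: all word forms that can express a meaning (i.e. a synset)."""
    return {f for f, m in CELLS if m == meaning}

def are_synonyms(f1, f2):
    """Two distinct forms are synonyms if they share at least one meaning."""
    return f1 != f2 and bool(meanings_of(f1) & meanings_of(f2))

def is_polysemous(form):
    """A form is polysemous if it expresses more than one meaning."""
    return len(meanings_of(form)) > 1

print(are_synonyms("F1", "F2"))   # True
print(is_polysemous("F2"))        # True
print(sorted(forms_for("M1")))    # ['F1', 'F2']
```

Reading a row gives a synset, and reading a column gives the senses of one word form. This is the many-to-many mapping the slide describes.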

  5. Basic Primitives
     • Word forms: lexical items in a language (i.e. no artificial concepts), including collocations
     • Sense: a meaning of a word form
     • Synset: a set of synonymous senses
     • Relations:
        • Lexical: among senses
        • Semantic: among synsets

  6. Lexical Relations
     • Synonymy
        • Two expressions are synonymous if substituting one for the other does not alter the truth value of a sentence (Leibniz)
        • Substitution only works between words of the same syntactic category, so WordNet is partitioned into nouns, verbs, adjectives, and adverbs
     • Antonymy, e.g. rich/poor, rise/fall
        • The antonym of a word x is sometimes not-x, but not always: "not rich" does not imply "poor"
        • Main organizing principle for adjectives

  7. Semantic relations (1)
     • Hyponymy/Hyperonymy (the ISA relation)
        • A synset {x1, x2, …} is a hyponym of the synset {y1, y2, …} if native speakers accept sentences such as "An x is a (kind of) y"
        • Transitive and asymmetrical
        • WordNet is a graph rather than a tree, although most synsets have a single hyperonym
        • Main organizing principle for nouns
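Because ISA is transitive, the set of all hyperonyms of a synset is the transitive closure of its direct hyperonym links. Because a synset may have more than one hyperonym, the traversal must handle a graph, not just a tree. This is a minimal sketch. The synset names and edges are illustrative assumptions, not data read from WordNet.

```python
# Toy ISA graph: synset -> list of direct hyperonyms (illustrative data only).
HYPERONYMS = {
    "dog": ["canine", "domestic_animal"],   # two hyperonyms: a graph, not a tree
    "canine": ["carnivore"],
    "carnivore": ["animal"],
    "domestic_animal": ["animal"],
    "animal": ["entity"],
    "entity": [],
}

def all_hyperonyms(synset):
    """Transitive closure of the ISA relation, visiting each node once."""
    seen = set()
    stack = list(HYPERONYMS.get(synset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(HYPERONYMS.get(node, []))
    return seen

def is_hyponym_of(x, y):
    """x ISA y holds if y is reachable from x by following hyperonym links."""
    return y in all_hyperonyms(x)

print(sorted(all_hyperonyms("dog")))
# ['animal', 'canine', 'carnivore', 'domestic_animal', 'entity']
print(is_hyponym_of("dog", "animal"), is_hyponym_of("animal", "dog"))  # True False
```

The `seen` set keeps the shared ancestor "animal" from being expanded twice when it is reached along both paths. The second print shows the asymmetry of the relation.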

  8. Semantic relations (2)
     • Meronymy/Holonymy (the Part-Of relation)
        • A synset {x1, x2, …} is a meronym of the synset {y1, y2, …} if native speakers accept sentences such as "An x is a part of a y" or "A y has an x (as a part)"
        • Meronymy is transitive and asymmetrical and can be used to construct a part hierarchy

  9. Semantic relations (3)
     • Semantic relations specific to the verb hierarchy
        • Troponym: a verb expressing a specific manner elaboration of another verb (e.g. walk → move). X is a troponym of Y if to X is to Y in some manner, i.e. X is a particular way to Y
        • Entailment: a verb X entails Y if X cannot be done unless Y is, or has been, done (e.g. snore → sleep)

  10. An Example

  11. WordNet 1.7.1

  12. SemCor
     • English; a subset of the Brown Corpus
     • 700,000 running words annotated with part of speech
     • 200,000 words annotated with WordNet senses (and lemmas)

  13. WordNet Extensions
     • Computational needs:
        • WordNets for languages other than English
        • New semantic relations
        • WordNet as an ontology
        • Domain-specific wordnets
        • Automatic acquisition of information
        • Interchange formats

  14. Languages other than English
     • EuroWordNet project: monolingual wordnets are connected through an Interlingual Index (ILI); distributed by ELDA/ELRA
        • Italian, Spanish, Catalan, Basque, French, Estonian, Portuguese, Swedish, Dutch, German
     • Balkanet project: Bulgarian, Greek, Romanian, Slovenian
     • Danish, Hebrew
     • Chinese, some Indian languages
     • Lexical gaps
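The role of the Interlingual Index can be sketched as a table of language-neutral records. Each record points to the synset, if any, that lexicalizes the concept in each monolingual wordnet. A lexical gap is then a record with no synset in a given language. This sketch assumes hypothetical record ids ("ili-001", …) and a toy two-language layout, not the actual EuroWordNet data format. The gap example is Italian "dopodomani" ("the day after tomorrow"), which English expresses only with a phrase.

```python
# Hypothetical Interlingual Index: record id -> {language: synset members}.
# Ids and layout are illustrative, not the EuroWordNet distribution format.
ILI = {
    "ili-001": {"en": ["dog"], "it": ["cane"]},
    "ili-002": {"en": ["house"], "it": ["casa"]},
    # "the day after tomorrow": one word in Italian, no single English word
    "ili-003": {"it": ["dopodomani"]},
}

def translate(word, src, tgt):
    """Return target-language synset members reachable through the ILI."""
    results = []
    for record in ILI.values():
        if word in record.get(src, []):
            results.extend(record.get(tgt, []))
    return results

def lexical_gaps(lang):
    """ILI records with no synset in the given language."""
    return sorted(rid for rid, record in ILI.items() if not record.get(lang))

print(translate("cane", "it", "en"))   # ['dog']
print(lexical_gaps("en"))              # ['ili-003']
```

Routing every cross-language link through the index means each wordnet only needs links to the ILI, not to every other wordnet.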

  15. New Relations (1)
     • Derivation relations (Princeton, WordNet 2)
        • invent → inventor (requires disambiguation)
     • Gloss disambiguation (Extended WordNet; Moldovan, 2000)
        • Glosses are parsed, disambiguated, and converted into a logical form
     • WordNet Domains (Magnini and Cavaglia, 2000) (ITC-irst)
        • Synsets are labeled with domains, such as Medicine, Architecture, Sport, …

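  16. WordNet Domains
     • Integrates taxonomic and domain-oriented information
     • Cross-hierarchy relations: synsets under different top-level nodes share a domain label
        • doctor#2 [Medicine] --> person#1
        • hospital#1 [Medicine] --> location#1
     • Cross-category relations: synsets of different parts of speech share a label, e.g. the verb operate#3 [Medicine]
     • Cross-language information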

  17. New Relations (2)
     • Classes versus instances:
        • Bush <belong-to-class> person
     • Role relations for verbs:
        • singer <role-agent> song
     • Implicit knowledge (Peters, 2002)
        • Discover regular polysemy relations in WordNet: bank#1 (an institution) / bank#2 (a building)
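One simple way to surface candidate regular polysemy is to count pairs of top-level classes that co-occur among the senses of the same word. Pairs shared by many words, such as institution/building, are candidates. This is a hypothetical sketch of the idea, not Peters’ actual procedure. The word-to-class data and the `min_words` threshold are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: word -> top-level class of each of its senses.
SENSE_CLASSES = {
    "bank": ["institution", "building", "slope"],
    "school": ["institution", "building"],
    "church": ["institution", "building"],
    "chicken": ["animal", "food"],
    "lamb": ["animal", "food"],
    "bat": ["animal", "artifact"],
}

def polysemy_patterns(sense_classes, min_words=2):
    """Count class pairs co-occurring among the senses of one word; pairs
    shared by at least `min_words` words are candidate regular patterns."""
    counts = Counter()
    for classes in sense_classes.values():
        # sorted(set(...)) so each unordered pair is counted once per word
        for pair in combinations(sorted(set(classes)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_words}

print(polysemy_patterns(SENSE_CLASSES))
# {('building', 'institution'): 3, ('animal', 'food'): 2}
```

Pairs seen only once (e.g. animal/artifact for "bat") are filtered out as likely accidental homonymy.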

  18. Automatic Acquisition
     • MEANING project (IST-2001-34460)
        • Topic signatures (Agirre, 2001): words related to a synset, automatically extracted from the Web
        • Automatic collection of sense examples (Leacock et al., 1998; Mihalcea and Moldovan, 1999)
        • Synset selectional preferences (Carroll, 2001), acquired from the BNC
     • WordNet-annotated corpora
        • Open Mind Word Expert (Mihalcea, 2002)
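A topic signature is a weighted list of words that are over-represented in texts about one sense, compared with other texts. This sketch ranks words by a smoothed ratio of relative frequencies between a "sense" collection and a background collection. That weighting is a simplification chosen here. The original work used other weighting functions and documents retrieved from the Web, whereas the tiny collections below are invented.

```python
from collections import Counter

def topic_signature(sense_docs, background_docs, top_n=3):
    """Rank words over-represented in documents for one sense, relative to a
    background collection, using an add-one-smoothed frequency ratio."""
    sense_counts = Counter(w for doc in sense_docs for w in doc.lower().split())
    back_counts = Counter(w for doc in background_docs for w in doc.lower().split())
    sense_total = sum(sense_counts.values())
    back_total = sum(back_counts.values())
    vocab_size = len(set(sense_counts) | set(back_counts))

    def weight(word):
        # Add-one smoothing avoids division by zero for words unseen in the background.
        p_sense = (sense_counts[word] + 1) / (sense_total + vocab_size)
        p_back = (back_counts[word] + 1) / (back_total + vocab_size)
        return p_sense / p_back

    # Highest weight first; ties broken alphabetically for a stable result.
    ranked = sorted(sense_counts, key=lambda w: (-weight(w), w))
    return ranked[:top_n]

# Invented mini-collections for the "financial institution" sense of bank.
SENSE_DOCS = ["the bank lends money", "the bank holds deposit money"]
BACKGROUND_DOCS = ["the river bank was muddy", "the water of the river"]

print(topic_signature(SENSE_DOCS, BACKGROUND_DOCS))   # ['money', 'deposit', 'holds']
```

Frequent but unspecific words such as "the" get a low weight because they are just as common in the background.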

  19. WordNet as an Ontology
     • Some relations contradict ontological principles
     • OntoClean approach (Guarino, 2002):
        • Confusion between concepts and individuals (e.g. Palestine and Trust_Territories at the same level)
        • Role/Type: a role cannot subsume a type (e.g. Person <isa> Causal_agent)
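The Role/Type constraint can be checked mechanically once every node carries a label saying whether it is a type or a role. This sketch assumes hand-assigned, hypothetical labels and a tiny ISA fragment that reproduces the slide’s Person <isa> Causal_agent example. OntoClean itself works with richer meta-properties, and only this one constraint is checked here.

```python
# Hypothetical meta-property labels (assigned by hand, not derived from WordNet).
KIND = {"person": "type", "organism": "type", "causal_agent": "role", "entity": "type"}

# ISA edges as (child, parent); person -> causal_agent is the slide's example.
ISA = [
    ("person", "organism"),
    ("person", "causal_agent"),
    ("organism", "entity"),
    ("causal_agent", "entity"),
]

def role_type_violations(isa, kind):
    """Flag edges where a role subsumes a type (child is a type, parent a role)."""
    return [(child, parent) for child, parent in isa
            if kind.get(child) == "type" and kind.get(parent) == "role"]

print(role_type_violations(ISA, KIND))   # [('person', 'causal_agent')]
```

The check is cheap. The real effort lies in assigning the labels, which OntoClean does through an explicit ontological analysis.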

  20. Domain-Specific WordNets
     • Extension of WordNet hierarchies using domain-specific document collections (Vossen, 2001; Buitelaar, 2001; Velardi, 2001)
     • Tuning of WordNet synsets (Turcato, 2000)
     • Merging generic and specialized wordnets (Magnini et al., 2002):
        • Overlaps and inconsistencies among synsets
        • Precedence rules for inheritance

  21. Interchange Formats
     • XML:
        • Implementation-independent
        • Easily extensible to new relations
        • At least three different XML formats exist; none of them is widely used yet
     • Mappings among different wordnet versions:
        • 1.5 → 1.6
        • 1.6 → 1.7
        • May contain errors
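Mapping a sense across several releases means chaining the pairwise mappings, for example 1.5 → 1.6 → 1.7. This sketch composes two such mappings. It assumes a simple dictionary from an old sense id to a list of new sense ids, and all ids are invented placeholders rather than real WordNet offsets. It also shows why errors compound: a sense left unmapped in one step is lost in every later version.

```python
# Hypothetical sense mappings: old sense id -> list of new sense ids.
MAP_15_16 = {"s1": ["t1"], "s2": ["t2", "t3"], "s3": []}   # s3 has no counterpart
MAP_16_17 = {"t1": ["u1"], "t2": ["u2"], "t3": ["u2"]}

def compose(first, second):
    """Chain two version mappings; senses that drop out yield an empty list."""
    result = {}
    for old, mids in first.items():
        targets = []
        for mid in mids:
            for new in second.get(mid, []):
                if new not in targets:        # keep order, drop duplicates
                    targets.append(new)
        result[old] = targets
    return result

print(compose(MAP_15_16, MAP_16_17))   # {'s1': ['u1'], 's2': ['u2'], 's3': []}
```

Here the two 1.6 senses of s2 merge again in 1.7, and any mistake in either table propagates into the composed 1.5 → 1.7 mapping.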

  22. Using WordNet
     • Widely used within the Natural Language Processing community
     • Suitable for open-domain, content-based tasks that require interpretation based on lexical semantics
     • Algorithms exploit WordNet’s semantic relations
     • Issues: fine-grained sense distinctions
     • Application areas: query expansion in IR, word sense disambiguation, question answering

  23. Distance/Similarity Algorithms
     • Conceptual distance (Agirre and Rigau, 1995)
        • Takes into account the density of the taxonomy
     • Semantic similarity (Resnik, 1995)
        • Similarity is the information content of the most informative node that subsumes both concepts:
          $$\mathrm{sim}(c_1, c_2) = \max_{c \in S(c_1, c_2)} \left[ -\log p(c) \right]$$
          where S(c1, c2) is the set of nodes on the ISA paths above both c1 and c2, and p(c) is the probability of encountering c, or any concept it subsumes, in a corpus.
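The Resnik measure can be sketched in a few lines. First, propagate corpus counts up the taxonomy so that each node’s frequency includes everything it subsumes. Then take the maximum information content over the common subsumers. The taxonomy and counts below are invented, and each count is attached directly to a leaf concept. Resnik’s original estimate distributes each word’s count across all of that word’s senses.

```python
import math

# Toy taxonomy (child -> parent) and corpus counts; both are illustrative.
PARENT = {"dog": "canine", "wolf": "canine", "canine": "carnivore",
          "cat": "feline", "feline": "carnivore", "carnivore": "animal",
          "bird": "animal", "animal": None}
COUNTS = {"dog": 40, "wolf": 10, "cat": 30, "bird": 20}

def ancestors(c):
    """c itself plus every node on its ISA path up to the root."""
    path = []
    while c is not None:
        path.append(c)
        c = PARENT[c]
    return path

# Propagate counts upwards: freq(c) covers c and all concepts it subsumes.
FREQ = {c: 0 for c in PARENT}
for concept, n in COUNTS.items():
    for a in ancestors(concept):
        FREQ[a] += n
TOTAL = FREQ["animal"]   # the root subsumes every counted concept

def information_content(c):
    """-log p(c), written as log(TOTAL / freq(c)) so the root gives exactly 0.0."""
    return math.log(TOTAL / FREQ[c])

def resnik_similarity(c1, c2):
    """Maximum information content over the nodes subsuming both c1 and c2."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(information_content(c) for c in common)

print(round(resnik_similarity("dog", "wolf"), 3))   # 0.693 (via canine)
print(round(resnik_similarity("dog", "cat"), 3))    # 0.223 (via carnivore)
print(round(resnik_similarity("dog", "bird"), 3))   # 0.0 (only the root is shared)
```

The more specific the shared subsumer, the lower its probability and the higher the similarity. Concepts that meet only at the root score zero.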

  24. Sense Distinctions
     • Some WordNet sense distinctions are difficult to understand
     • Many applications would benefit from polysemy reduction
     • Sense clustering methodologies:
        • Based on domain information
        • Based on aligned corpora in different languages
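Domain-based clustering can be sketched as merging all senses of a word that carry the same domain label into one coarse-grained sense. This sketch assumes hypothetical WordNet-Domains-style labels for four invented senses of "bank". Real sense numbers and labels may differ.

```python
from collections import defaultdict

# Hypothetical domain labels for the senses of one word (illustrative only).
SENSE_DOMAINS = {
    "bank#1": "Economy",
    "bank#2": "Economy",
    "bank#3": "Geography",
    "bank#4": "Economy",
}

def cluster_by_domain(sense_domains):
    """Merge senses sharing a domain label into one coarse-grained cluster."""
    clusters = defaultdict(list)
    for sense, domain in sense_domains.items():
        clusters[domain].append(sense)
    return dict(clusters)

print(cluster_by_domain(SENSE_DOMAINS))
# {'Economy': ['bank#1', 'bank#2', 'bank#4'], 'Geography': ['bank#3']}
```

Four fine-grained senses collapse into two clusters. An application that only needs the economic/geographic distinction can work at this coarser level.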

  25. WordNet and Word Sense Disambiguation
     • As a sense repository
        • For the SENSEVAL competitions
        • Manually annotated data are required for training systems based on machine learning algorithms
     • As an information source for knowledge-based algorithms
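A well-known knowledge-based algorithm of this kind is the simplified Lesk method. It picks the sense whose dictionary gloss shares the most content words with the context. Lesk is named here as an example technique and is not one the slides single out. This minimal sketch uses paraphrased glosses and a tiny hand-picked stopword list, both of which are assumptions.

```python
# Toy sense inventory with paraphrased glosses (not copied from WordNet).
GLOSSES = {
    "bank#1": "a financial institution that accepts deposits and lends money",
    "bank#2": "sloping land beside a body of water such as a river",
}

# Minimal, hand-picked stopword list for this example.
STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as", "to", "he", "on", "in"}

def content_words(text):
    """Lower-cased tokens minus stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def simplified_lesk(senses, context):
    """Pick the sense whose gloss shares the most content words with the context."""
    ctx = content_words(context)
    return max(senses, key=lambda s: len(content_words(GLOSSES[s]) & ctx))

senses = ["bank#1", "bank#2"]
print(simplified_lesk(senses, "he sat on the bank of the river and watched the water"))  # bank#2
print(simplified_lesk(senses, "the bank lends money to small firms"))                     # bank#1
```

Short glosses often share no words with the context at all. This is one reason knowledge-based methods also exploit WordNet’s relations, for example by adding the glosses of related synsets.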

  26. IR: Query Expansion
     • Open debate:
        • Semantic information is not useful (Voorhees, 1994)
        • WSD with accuracy below 90% degrades IR results (Sanderson, 1994); current WSD systems achieve less than 80%
        • Semantic information significantly improves IR performance (by up to 30%) (Gonzalo, 1998)
        • Recent experiments (de Loupy, 2002) show that using synonyms and WSD (72% accuracy) in query expansion improves performance slightly (2-3%)
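Sense-based query expansion can be sketched as follows. For each query term, add the other members of the synset chosen by a WSD step, and leave terms with no chosen sense unexpanded. The synset contents below are hypothetical. The `chosen_senses` dictionary stands in for the output of whatever WSD system is used, which is why WSD accuracy matters: a wrong sense injects unrelated synonyms.

```python
# Hypothetical synsets: sense id -> member words.
SYNSETS = {
    "car#1": ["car", "auto", "automobile", "motorcar"],
    "car#2": ["car", "railcar", "railway_car"],
}

def expand_query(terms, chosen_senses):
    """Add synonyms of the sense chosen (e.g. by a WSD step) for each term;
    terms with no chosen sense are kept unexpanded."""
    expanded = []
    for term in terms:
        sense = chosen_senses.get(term)
        synonyms = SYNSETS.get(sense, []) if sense else []
        for word in [term] + synonyms:
            if word not in expanded:   # keep order, avoid duplicates
                expanded.append(word)
    return expanded

print(expand_query(["car", "rental"], {"car": "car#1"}))
# ['car', 'auto', 'automobile', 'motorcar', 'rental']
```

If the WSD step wrongly picked car#2, the query would be expanded with "railcar" instead. This is the kind of error behind the concern about WSD accuracy raised above.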

  27. WordNet in Question Answering
     • Answer type identification (Harabagiu, 2001: top score at TREC-QA 2000)
        • Answer types defined on the WordNet taxonomy
     • Answer extraction
        • Named entity recognition based on WordNet
     • Question/answer relation discovery in passage retrieval (Pasca, 2001)
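Defining answer types on the taxonomy means anchoring each type at a few nodes. The answer type of a question is then found by climbing the hyperonym chain of its focus word (e.g. "capital" in "What is the capital of Italy?") until an anchored node is reached. This is a hypothetical sketch: the taxonomy fragment, the anchor table, and the type names are invented, and it is not the system of Harabagiu et al.

```python
# Hypothetical taxonomy fragment (word -> hyperonym) and answer-type anchors.
HYPERONYM = {
    "capital": "city", "city": "municipality", "municipality": "location",
    "writer": "person", "author": "person", "year": "time_period",
    "person": None, "location": None, "time_period": None,
}
ANSWER_TYPES = {"location": "LOCATION", "person": "PERSON", "time_period": "DATE"}

def answer_type(focus_word):
    """Climb the hyperonym chain of the question focus until an anchored node."""
    node = focus_word
    while node is not None:
        if node in ANSWER_TYPES:
            return ANSWER_TYPES[node]
        node = HYPERONYM.get(node)
    return "UNKNOWN"

print(answer_type("capital"))   # LOCATION
print(answer_type("writer"))    # PERSON
```

Anchoring types high in the hierarchy lets a small table cover every word below those nodes.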

  28. Semantic Web
     • Interpreting semi-structured knowledge sources
        • Directories, file systems, catalogues
     • Implicit knowledge
        • Linguistic analysis of labels based on WordNet

  29. Conclusions
     • WordNet as a linguistic ontology
     • Using WordNet as-is in applications is not easy: “the art of using WordNet”
     • Extensions, such as domains and multilingual wordnets, are required
     • Results in IR, QA, and WSD are still preliminary
     • Good news: a growing community is using WordNet
