
Wordnet, EuroWordNet, Global Wordnet

Piek Vossen

[email protected]

http://www.globalwordnet.org



Overview

  • Princeton WordNet (1980 - ongoing)

  • EuroWordNet (1996 - 1999)

    • The database design

    • The general building strategy

    • Towards a universal index of meaning

  • Global WordNet Association (2001 - ongoing)

    • Other wordnets

    • BalkaNet (2001 - 2004)

    • IndoWordnet (2002 - ongoing)

    • Meaning (2002 - 2005)



WordNet1.5

  • Developed at Princeton by George Miller and his team as a model of the mental lexicon.

  • Semantic network in which concepts are defined in terms of relations to other concepts.

  • Structure:

    • organized around the notion of synsets (sets of synonymous words)

    • basic semantic relations between these synsets

  • Initially no glosses

  • Main revision after tagging the Brown corpus with word meanings: SemCor.

  • http://www.cogsci.princeton.edu/~wn/w3wn.html



    Structure of WordNet1.5



    EuroWordNet

    • The development of a multilingual database with wordnets for several European languages

    • Funded by the European Commission, DG XIII, Luxembourg as projects LE2-4003 and LE4-8328

    • March 1996 - September 1999

    • 2.5 million euros.

    • URL: http://www.hum.uva.nl/~ewn



    Objectives of EuroWordNet

    • Languages covered:

      • EuroWordNet-1 (LE2-4003): English, Dutch, Spanish, Italian

      • EuroWordNet-2 (LE4-8328): German, French, Czech, Estonian.

    • Size of vocabulary:

      • EuroWordNet-1: 30,000 concepts - 50,000 word meanings.

      • EuroWordNet-2: 15,000 concepts - 25,000 word meanings.

    • Type of vocabulary:

      • the most frequent words of the languages

      • all concepts needed to relate more specific concepts



    Consortium



    The basic principles of EuroWordNet

    • the structure of the Princeton WordNet

    • the design of the EuroWordNet database

    • wordnets as language-specific structures

    • the language-internal relations

    • the multilingual relations



    Specific features of EuroWordNet

    • it contains semantic lexicons for languages other than English.

    • each wordnet reflects the relations as a language-internal system, maintaining cultural and linguistic differences in the wordnets.

    • it contains multilingual relations from each wordnet to English meanings, which makes it possible to compare the wordnets and to track down inconsistencies and cross-linguistic differences.

    • each wordnet is linked to a language-independent top-ontology and to domain labels.


    Autonomous & Language-Specific

    [Figure: the WordNet1.5 hierarchy compared with the Dutch Wordnet hierarchy]

    • WordNet1.5: object; artifact, artefact (a man-made object); natural object (an object occurring naturally); instrumentality; block; body; box; spoon; bag; device; implement; container; tool; instrument

    • Dutch Wordnet: voorwerp {object}; blok {block}; lichaam {body}; werktuig {tool}; bak {box}; lepel {spoon}; tas {bag}



    Differences in structure

    • Artificial Classes versus Lexicalized Classes:

      • instrumentality; natural object

    • Lexicalization differences of classes:

      • container and artifact (object) are not lexicalized in Dutch

    • What is the purpose of different hierarchies?

    • Should we include all lexicalized classes from all (8) languages?



    Linguistic versus Conceptual Ontologies

    • Conceptual ontology:

    • A particular level or structuring may be required to achieve a better control or performance, or a more compact and coherent structure.

      • introduce artificial levels for concepts which are not lexicalized in a language (e.g. instrumentality, hand tool),

      • neglect levels which are lexicalized but not relevant for the purpose of the ontology (e.g. tableware, silverware, merchandise).

    • What properties can we infer for spoons?

    • spoon -> container; artifact; hand tool; object; made of metal or plastic; for eating, pouring or cooking
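The inference question above can be sketched as property inheritance over a small conceptual ontology. This is a minimal Python sketch, assuming a toy class hierarchy in which spoon inherits from both container and hand tool; the class names come from the slide, but the ONTOLOGY table and the helper inherited_properties are hypothetical and not part of EuroWordNet.

```python
# Toy conceptual ontology (assumed for illustration): each class maps to
# (parent classes, properties the class itself introduces).
ONTOLOGY = {
    "object": ([], {"object"}),
    "artifact": (["object"], {"artifact"}),
    "container": (["artifact"], {"container"}),
    "hand tool": (["artifact"], {"hand tool"}),
    "spoon": (["container", "hand tool"],
              {"made of metal or plastic", "for eating, pouring or cooking"}),
}


def inherited_properties(cls: str) -> set[str]:
    """Return the properties of cls plus those of all its ancestors."""
    parents, own = ONTOLOGY[cls]
    result = set(own)
    for parent in parents:  # multiple inheritance: follow every parent
        result |= inherited_properties(parent)
    return result


print(sorted(inherited_properties("spoon")))
```

Every property listed for spoon on the slide is recovered by walking both parent branches, which is the kind of inference a conceptual ontology is built to support.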



    Linguistic versus Conceptual Ontologies

    Linguistic ontology:

    Exactly reflects the relations between all the lexicalized words and expressions in a language. It therefore captures valuable information about the lexical capacity of languages: what is the available fund of words and expressions in a language.

    What words can be used to name spoons?

    spoon -> object, tableware, silverware, merchandise, cutlery


    Separate Wordnets and Ontologies

    [Figure: language-specific wordnets linked to a language-neutral ontology]

    Language-Specific Wordnets:

    • WordNet1.5: object; container; box

    • Dutch Wordnet: voorwerp; doos

    Language-Neutral Ontology:

    • ReferenceOntologyClasses: BOX; ContainerProduct; SolidTangibleThing

    • EuroWordNet Top-Ontology: Form: Cubic; Function: Contain; Origin: Artifact; Composition: Whole



    Wordnets versus ontologies

    Wordnets:

    autonomous language-specific lexicalization patterns in a relational network.

    Usage: to predict substitution in text for information retrieval, text generation, machine translation and word-sense disambiguation.

    Ontologies:

    data structure with formally defined concepts.

    Usage: making semantic inferences.


    Wordnets as Linguistic Ontologies

    Classical Substitution Principle:

    Any word that is used to refer to something can be replaced by its synonyms, hyperonyms and hyponyms:

    horse -> stallion, mare, pony, mammal, animal, being.

    It cannot be referred to by its co-hyponyms or by co-hyponyms of its hyperonyms:

    horse -/-> cat, dog, camel, fish, plant, person, object.

    Conceptual Distance Measurement:

    The number of hierarchical nodes between words is a measure of their closeness; the level in the hierarchy and the local density of nodes are additional factors.
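A minimal Python sketch of the substitution principle and of a plain path-length distance, assuming a toy single-inheritance hierarchy. HYPERONYM, ancestors, can_substitute and distance are illustrative names, not an existing API; the sketch ignores synsets (a word stands in for its synonyms) and does not weight by hierarchy level or local density.

```python
# Toy hyperonym links (child -> parent), assumed for illustration and
# loosely following the horse example above.
HYPERONYM = {
    "stallion": "horse", "mare": "horse", "pony": "horse",
    "horse": "mammal", "cat": "mammal", "dog": "mammal", "camel": "mammal",
    "mammal": "animal", "fish": "animal",
    "animal": "being", "plant": "being", "person": "being",
}


def ancestors(word):
    """Hyperonym chain of word, nearest first."""
    chain = []
    while word in HYPERONYM:
        word = HYPERONYM[word]
        chain.append(word)
    return chain


def can_substitute(word, other):
    """Substitution principle: the word itself, a hyperonym or a hyponym may replace it."""
    return other == word or other in ancestors(word) or word in ancestors(other)


def distance(a, b):
    """Number of links between a and b via their lowest common hyperonym (None if unrelated)."""
    chain_a = [a] + ancestors(a)
    chain_b = [b] + ancestors(b)
    for steps_a, node in enumerate(chain_a):
        if node in chain_b:
            return steps_a + chain_b.index(node)
    return None


print(can_substitute("horse", "mare"), can_substitute("horse", "cat"))
print(distance("horse", "cat"), distance("horse", "fish"))
```

With this toy data, mare can replace horse while cat cannot, and horse is closer to cat (2 links, via mammal) than to fish (3 links, via animal).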



    Linguistic Principles for deriving relations

    • 1. Substitution tests (Cruse 1986):

      • 1a. It is a fiddle therefore it is a violin.

        • b. It is a violin therefore it is a fiddle.

      • 2a. It is a dog therefore it is an animal.

        • b. *It is an animal therefore it is a dog.

      • 3a. to kill (/ a murder) causes to die (/ death)

      • to kill (/ a murder) has to die (/ death) as a consequence

      • b. *to die / death causes to kill

      • *to die / death has to kill as a consequence



    Linguistic Principles for deriving relations

    • 2. Principle of Economy (Dik 1978):

      • If a word W1 (animal) is the hyperonym of W2 (mammal) and W2 is the hyperonym of W3 (dog) then W3 (dog) should not be linked to W1 (animal) but to W2 (mammal).

    • 3. Principle of Compatibility

      • If a word W1 is related to W2 via relation R1, W1 and W2 cannot be related via relation Rn, where Rn is defined as a distinct relation from R1.
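The two principles can be turned into simple consistency checks on encoded relations. This is a sketch under stated assumptions: the relations are hypothetical (source, relation, target) triples, economy is checked only for HAS_HYPERONYM, and any two distinct relation types on the same word pair are treated as incompatible, which simplifies the principle.

```python
from collections import defaultdict

# Hypothetical encoded relations (source, relation, target).
RELATIONS = [
    ("mammal", "HAS_HYPERONYM", "animal"),
    ("dog", "HAS_HYPERONYM", "mammal"),
    ("dog", "HAS_HYPERONYM", "animal"),     # redundant: animal is reachable via mammal
    ("fiddle", "NEAR_SYNONYM", "violin"),
    ("fiddle", "HAS_HYPERONYM", "violin"),  # a second, distinct relation for the same pair
]


def economy_violations(relations):
    """Direct hyperonym links that are already implied by a longer hyperonym path."""
    graph = defaultdict(set)
    for src, rel, tgt in relations:
        if rel == "HAS_HYPERONYM":
            graph[src].add(tgt)

    def reachable(start, goal):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))  # .get avoids adding keys
        return False

    return [(src, tgt)
            for src, targets in graph.items()
            for tgt in targets
            if any(reachable(other, tgt) for other in targets if other != tgt)]


def compatibility_violations(relations):
    """Word pairs that are linked by more than one distinct relation."""
    by_pair = defaultdict(set)
    for src, rel, tgt in relations:
        by_pair[(src, tgt)].add(rel)
    return {pair: rels for pair, rels in by_pair.items() if len(rels) > 1}


print(economy_violations(RELATIONS))
print(compatibility_violations(RELATIONS))
```

The dog-animal link is flagged by the Principle of Economy, and the fiddle-violin pair by the Principle of Compatibility.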


    Architecture of the EuroWordNet Data Base

    [Figure: language-specific wordnets connected through the Inter-Lingual-Index to a language-independent Ontology and Domains]

    • Language-independent modules: Ontology (2OrderEntity; Location; Dynamic) and Domains (Traffic; Air; Road)

    • Inter-Lingual-Index: ILI-records such as {drive}

    • Lexical Items Tables with language-specific synsets, e.g.:

      • English: move, go, ride, drive

      • Dutch: bewegen, gaan, rijden, berijden

      • Spanish: mover, transitar, conducir, cabalgar, jinetear

      • Italian: andare, muoversi, cavalcare, guidare

    • I = Language Independent link

    • II = Link from Language Specific to Inter lingual Index

    • III = Language Dependent Link



    The mono-lingual design of EuroWordNet



    Language Internal Relations

    • WN 1.5 starting point

    • The ‘synset’ as a weak notion of synonymy:

      • “two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not alter the truth value.” (Miller et al. 1993)

    • Relations between synsets (relation: POS-combination, example):

      • ANTONYMY: adjective-to-adjective; verb-to-verb (open / close)

      • HYPONYMY: noun-to-noun (car / vehicle); verb-to-verb (walk / move)

      • MERONYMY: noun-to-noun (head / nose)

      • ENTAILMENT: verb-to-verb (buy / pay)

      • CAUSE: verb-to-verb (kill / die)



    Differences EuroWordNet/WordNet1.5

    • Added Features to relations

    • Cross-Part-Of-Speech relations

    • New relations to differentiate shallow hierarchies

    • New interpretations of relations



    EWN Relationship Labels

    • Disjunction/Conjunction of multiple relations of the same type

    • WordNet1.5

      • door 1 -- (a swinging or sliding barrier that will close the entrance to a room or building; "he knocked on the door"; "he slammed the door as he left") PART OF: doorway, door, entree, entry, portal, room access

      • door 6 -- (a swinging or sliding barrier that will close off access into a car; "she forgot to lock the doors of her car") PART OF: car, auto, automobile, machine, motorcar.



    EWN Relationship Labels

    {airplane} HAS_MERO_PART: conj1 {door}

    HAS_MERO_PART: conj2 disj1 {jet engine}

    HAS_MERO_PART: conj2 disj2 {propeller}

    {door} HAS_HOLO_PART: disj1 {car}

    HAS_HOLO_PART: disj2 {room}

    HAS_HOLO_PART: disj3 {entrance}

    {dog} HAS_HYPERONYM: conj1 {mammal}

    HAS_HYPERONYM: conj2 {pet}

    {albino} HAS_HYPERONYM: disj1 {plant}

    HAS_HYPERONYM: disj2 {animal}

    Default Interpretation: non-exclusive disjunction
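One way to read the conj/disj labels is as a conjunction of disjunctions: an airplane has a door and a jet engine or a propeller (or both, since the default is non-exclusive). The Python below is a hypothetical sketch of that reading; the tuple layout and the function names are assumptions, not the EuroWordNet database format.

```python
from collections import defaultdict

# Each relation: (conj label or None, disj label or None, target synset).
AIRPLANE_PARTS = [(1, None, "door"), (2, 1, "jet engine"), (2, 2, "propeller")]
DOOR_WHOLES = [(None, 1, "car"), (None, 2, "room"), (None, 3, "entrance")]


def build_formula(relations):
    """Turn labelled relations into AND-of-OR groups.

    Relations sharing a conj label form one OR group (the disj numbers only
    enumerate the alternatives); unlabelled relations fall into a single
    default group, read as a non-exclusive disjunction.
    """
    groups = defaultdict(list)
    for conj, _disj, target in relations:
        groups[conj].append(target)
    ordered = sorted(groups.items(), key=lambda kv: (kv[0] is None, kv[0] or 0))
    return [sorted(targets) for _conj, targets in ordered]


def satisfied(formula, present):
    """True if every AND group has at least one member in `present`."""
    return all(any(t in present for t in group) for group in formula)


airplane = build_formula(AIRPLANE_PARTS)
print(airplane)
print(satisfied(airplane, {"door", "propeller"}), satisfied(airplane, {"propeller"}))
```

Using `any` rather than "exactly one" keeps the disjunction non-exclusive, so an airplane with both a jet engine and a propeller still satisfies the description.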



    EWN Relationship Labels

    • Disjunction/Conjunction of multiple relations of the same type

    • {dog}

    • HAS_HYPONYM: disj1 {poodle}

    • HAS_HYPONYM: disj1 {labrador}

    • HAS_HYPONYM: {sheep dog} (Orthogonal)

    • HAS_HYPONYM: {watch dog} (Orthogonal)

    • Default Interpretation: non-exclusive disjunction



    EWN Relationship Labels

    • Factive/Non-factive CAUSES (Lyons 1977)

      • factive (default interpretation):

        • “to kill causes to die”:

        • {kill} CAUSES {die}

      • non-factive: E1 probably or likely causes event E2, or E1 is intended to cause some event E2:

        • “to search may cause to find”.

        • {search} CAUSES {find} non-factive



    EWN Relationship Labels

    Reversed

    In the database every relation must have a reverse counterpart, but there is a difference between relations that are explicitly coded as reverse and automatically reversed relations:

    {finger} HAS_HOLONYM {hand}

    {hand} HAS_MERONYM {finger}

    {paper-clip} HAS_MER_MADE_OF {metal}

    {metal} HAS_HOL_MADE_OF {paper-clip} reversed

    Negation

    {monkey} HAS_MERO_PART {tail}

    {ape} HAS_MERO_PART {tail} not
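A minimal sketch of how missing reverse links could be generated automatically and marked reversed, while links coded explicitly in both directions are left as they are. The REVERSE table pairs only relation names shown on these slides, and the (source, relation, target, labels) layout is an assumption for illustration.

```python
# Reverse counterparts, using the relation names from the slides.
REVERSE = {
    "HAS_HOLONYM": "HAS_MERONYM", "HAS_MERONYM": "HAS_HOLONYM",
    "HAS_MER_MADE_OF": "HAS_HOL_MADE_OF", "HAS_HOL_MADE_OF": "HAS_MER_MADE_OF",
    "HAS_MERO_PART": "HAS_HOLO_PART", "HAS_HOLO_PART": "HAS_MERO_PART",
}


def complete_reverses(relations):
    """Add every missing reverse link, labelled 'reversed'.

    relations: list of (source, relation, target, labels) with labels a frozenset.
    Links explicitly coded in both directions are left untouched; other labels
    (such as 'not') are copied to the generated reverse link.
    """
    existing = {(src, rel, tgt) for src, rel, tgt, _labels in relations}
    result = list(relations)
    for src, rel, tgt, labels in relations:
        back = (tgt, REVERSE[rel], src)
        if back not in existing:
            existing.add(back)
            result.append((*back, labels | {"reversed"}))
    return result


coded = [
    ("finger", "HAS_HOLONYM", "hand", frozenset()),
    ("hand", "HAS_MERONYM", "finger", frozenset()),
    ("paper-clip", "HAS_MER_MADE_OF", "metal", frozenset()),
    ("ape", "HAS_MERO_PART", "tail", frozenset({"not"})),
]
for link in complete_reverses(coded)[len(coded):]:
    print(link)
```

Only the paper-clip and ape links receive generated reverses; the finger/hand pair was already coded in both directions, and the negation label travels with the reversed ape link.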



    Cross-Part-Of-Speech relations

    • WordNet1.5: nouns and verbs are not interrelated by basic semantic relations such as hyponymy and synonymy:

      • adornment 2change of state-- (the act of changing something)

      • adorn 1change, alter-- (cause to change; make different)

    • EuroWordNet: words of different parts of speech can be inter-linked with explicit xpos-synonymy, xpos-antonymy and xpos-hyponymy relations:

      • {adorn V}XPOS_NEAR_SYNONYM{adornment N}



    Cross-Part-Of-Speech relations

    The advantages of such explicit cross-part-of-speech relations are:

    • similar words with different parts of speech are grouped together.

    • the same information can be coded in an NP or in a sentence. By unifying higher-order nouns and verbs in the same ontology, it will be possible to match expressions with very different syntactic structures but comparable content.

    • by merging verbs and abstract nouns we can more easily link mismatches across languages that involve a part-of-speech shift. Dutch nouns such as “afsluiting”, “gehuil” are translated with the English verbs “close” and “cry”, respectively.



    Entailment in WordNet

    WordNet1.5: Entailment indicates the direction of the implication or entailment:

    a. + Temporal Inclusion (the two situations partially or totally overlap)

    a.1 co-extensiveness (e.g., to limp/to walk): hyponymy/troponymy

    a.2 proper inclusion (e.g., to snore/to sleep): entailment

    b. - Temporal Exclusion (the two situations are temporally disjoint)

    b.1 backward presupposition (e.g., to succeed/to try): entailment

    b.2 cause (e.g., to give/to have)



    Subevents in EuroWordNet

    EuroWordNet

    Direction of the entailment is expressed by the labels factive and reversed:

    {to succeed} is_caused_by {to try} factive

    {to try} causes {to succeed} non-factive

    Proper inclusion is described by the has_subevent / is_subevent_of relation in combination with the label reversed:

    {to snore} is_subevent_of {to sleep}

    {to sleep} has_subevent {to snore} reversed

    {to buy} has_subevent {to pay}

    {to pay} is_subevent_of {to buy} reversed



    The interpretation of the CAUSE relation

    • WordNet1.5: the causal relation only holds between verbs, and it should only apply to temporally disjoint situations.

    • EuroWordNet: the causal relation will also be applied across different parts of speech:

      • {to kill} V causes {death} N

      • {death} N is_caused_by {to kill} V reversed

      • {to kill} V causes {dead} A

      • {dead} A is_caused_by {to kill} V reversed

      • {murder} N causes {death} N

      • {death} N is_caused_by {murder} N reversed



    The interpretation of the CAUSE relation

    • Various temporal relationships between the (dynamic/non-dynamic) situations may hold:

      • Temporally disjoint: there is no time point when dS1 takes place and also S2 (which is caused by dS1) (e.g. to shoot/to hit);

      • Temporally overlapping: there is at least one time point when both dS1 and S2 take place, and there is at least one time point when dS1 takes place and S2 (which is caused by dS1) does not yet take place (e.g. to teach/to learn);

      • Temporally co-extensive: whenever dS1 takes place also S2 (which is caused by dS1) takes place and there is no time point when dS1 takes place and S2 does not take place, and vice versa (e.g. to feed/to eat).
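The three temporal cases can be checked mechanically if situations are modelled as sets of discrete time points, an assumption made only for this sketch. temporal_relation is a hypothetical helper; it returns a fourth value when none of the three definitions applies (for instance when S2 starts before dS1).

```python
def temporal_relation(ds1: set[int], s2: set[int]) -> str:
    """Classify how a causing situation dS1 relates in time to the caused situation S2."""
    if not ds1 & s2:
        return "temporally disjoint"
    if ds1 == s2:
        return "temporally co-extensive"
    if any(t < min(s2) for t in ds1):  # dS1 holds while S2 has not yet started
        return "temporally overlapping"
    return "none of the three"


# Illustrative time points (invented), mirroring the slide's examples.
print("shoot/hit:", temporal_relation({1, 2}, {3, 4}))
print("teach/learn:", temporal_relation(set(range(1, 5)), set(range(2, 6))))
print("feed/eat:", temporal_relation({1, 2, 3}, {1, 2, 3}))
```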



    Role relations

    In the case of many verbs and nouns the most salient relation is not the hyperonym but the relation between the event and the participants involved. These relations are expressed as follows:

    {hammer} ROLE_INSTRUMENT {to hammer}

    {to hammer} INVOLVED_INSTRUMENT {hammer} reversed

    {school} ROLE_LOCATION {to teach}

    {to teach} INVOLVED_LOCATION {school} reversed

    These relations are typically used when other relations, mainly hyponymy, do not clarify the position of the concept in the network, but the word is still closely related to another word.



    Co_Role relations

    {guitar player} HAS_HYPERONYM {player}

    CO_AGENT_INSTRUMENT {guitar}

    {player} HAS_HYPERONYM {person}

    ROLE_AGENT {to play music}

    CO_AGENT_INSTRUMENT {musical instrument}

    {to play music} HAS_HYPERONYM {to make}

    ROLE_INSTRUMENT {musical instrument}

    {guitar} HAS_HYPERONYM {musical instrument}

    CO_INSTRUMENT_AGENT {guitar player}

    {ice saw} HAS_HYPERONYM {saw}

    CO_INSTRUMENT_PATIENT {ice}

    {saw} HAS_HYPERONYM {saw}

    ROLE_INSTRUMENT {to saw}

    {ice} CO_PATIENT_INSTRUMENT {ice saw} reversed



    Co_Role relations

    Examples of the other relations are:

    {criminal} CO_AGENT_PATIENT {victim}

    {novel writer} / {poet} CO_AGENT_RESULT {novel} / {poem}

    {dough} CO_PATIENT_RESULT {pastry} / {bread}

    {photographic camera} CO_INSTRUMENT_RESULT {photo}



    BE_IN_STATE and STATE_OF

    Example: the poor are the ones to whom the state poor applies

    Effect: poor N HAS_HYPERONYM person N

    poor N BE_IN_STATE poor A

    poor A STATE_OF poor N reversed

    IN_MANNER and MANNER_OF

    Example: to slurp is to eat in a noisy manner

    Effect: slurp V HAS_HYPERONYM eat V

    slurp V IN_MANNER noisily Adverb

    noisily Adverb MANNER_OF slurp V reversed


    Overview of the Language Internal relations in EuroWordNet

    • Same Part of Speech relations:

      • NEAR_SYNONYMY: apparatus - machine

      • HYPERONYMY/HYPONYMY: car - vehicle

      • ANTONYMY: open - close

      • HOLONYMY/MERONYMY: head - nose

    • Cross-Part-of-Speech relations:

      • XPOS_NEAR_SYNONYMY: dead - death; to adorn - adornment

      • XPOS_HYPERONYMY/HYPONYMY: to love - emotion

      • XPOS_ANTONYMY: to live - dead

      • CAUSE: die - death

      • SUBEVENT: buy - pay; sleep - snore

      • ROLE/INVOLVED: write - pencil; hammer - hammer

      • STATE: the poor - poor

      • MANNER: to slurp - noisily

      • BELONG_TO_CLASS: Rome - city


    Thematic networks

    [Figure: a thematic network of Dutch word meanings around disease and treatment]

    • Word meanings: organisme (organism); wezen (being); persoon (person); ziekte (disease); genezen (to get well); orgaan (organ); behandelen (treat); arts (doctor); scalpel; opereren (operate); zieke (sick person, patient); maagaandoening (stomach disease); maag (stomach)

    • Relations shown: Causes; Patient; Part of; Agent; Instrument; Involves



    The multi-lingual design of EuroWordNet



    The Multilingual Design

    • Inter-Lingual-Index: an unstructured fund of concepts that provides an efficient mapping across the languages;

    • Index-records are mainly based on WordNet1.5 synsets and consist of synonyms, glosses and source references;

    • Various types of complex equivalence relations are distinguished;

    • Equivalence relations go from synsets to index records, not on a word-to-word basis;

    • Synsets linked to the same index items are matched indirectly.
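Indirect matching through the Inter-Lingual-Index can be sketched as grouping synsets by the ILI-records they are linked to. The Spanish and Italian synsets and the record identifiers (clean-1, clean-2) below are hypothetical; only the Dutch schoonmaken example appears in this presentation.

```python
from collections import defaultdict

# Hypothetical equivalence links: (language, synset, equivalence relation, ILI-record).
EQUIVALENCES = [
    ("nl", "schoonmaken", "eq_near_synonym", "clean-1"),
    ("nl", "schoonmaken", "eq_near_synonym", "clean-2"),
    ("es", "limpiar", "eq_synonym", "clean-1"),
    ("it", "pulire", "eq_synonym", "clean-2"),
]


def indirect_matches(lang, synset, target_lang, equivalences=EQUIVALENCES):
    """Synsets in target_lang that share at least one ILI-record with (lang, synset)."""
    by_record = defaultdict(set)
    for l, s, _rel, record in equivalences:
        by_record[record].add((l, s))
    records = {record for l, s, _rel, record in equivalences if (l, s) == (lang, synset)}
    return sorted({s for record in records for l, s in by_record[record] if l == target_lang})


print(indirect_matches("nl", "schoonmaken", "es"))
print(indirect_matches("es", "limpiar", "nl"))
```

No Dutch-Spanish link is stored anywhere; the match follows only from both synsets pointing at the same index record.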



    EWN Interlingual Relations

    • EQ_SYNONYM: there is a direct match between a synset and an ILI-record

    • EQ_NEAR_SYNONYM: a synset matches multiple ILI-records simultaneously.

    • HAS_EQ_HYPERONYM: a synset is more specific than any available ILI-record.

    • HAS_EQ_HYPONYM: a synset can only be linked to more specific ILI-records.

    • other relations:

      CAUSES/IS_CAUSED_BY, EQ_SUBEVENT/EQ_ROLE, EQ_IS_STATE_OF/EQ_BE_IN_STATE



    Equivalent Near Synonym

    • 1. Multiple Targets

      • One sense of Dutch schoonmaken (to clean) simultaneously matches at least 4 senses of clean in WordNet1.5:

      • {make clean by removing dirt, filth, or unwanted substances from}

      • {remove unwanted substances from, such as feathers or pits, as of chickens or fruit}

      • (remove in making clean; "Clean the spots off the rug")

      • {remove unwanted substances from - (as in chemistry)}

      • The Dutch synset schoonmaken will thus be linked with an eq_near_synonym relation to all these senses of clean.



    Equivalent Near Synonym

    • 2. Multiple Source meanings

      • Synsets inter-linked by a near_synonym relation can be linked to the same target ILI-record(s), either with an eq_synonym or an eq_near_synonym relation:

      • Dutch wordnet:

      • toestel near_synonym apparaat

      • ILI-records:{machine}; {device}; {apparatus}; {tool}



    Equivalent Hyponymy

    has_eq_hyperonym

    Typically used for gaps in WordNet1.5 or in English:

    • genuine, cultural gaps for things not known in English culture, e.g. citroenjenever, which is a kind of gin made out of lemon skin;

    • pragmatic, in the sense that the concept is known but is not expressed by a single lexicalized form in English, e.g.: Dutch hoofd only refers to human head and Dutch kop only refers to animal head, English uses head for both.

    has_eq_hyponym

    Used when WordNet1.5 only provides narrower terms. In this case there can only be a pragmatic difference, not a genuine cultural gap, e.g.: Spanish dedo can be used to refer to both finger and toe.


    Complex mappings across languages

    [Figure: complex mappings between the English (GB-Net), Italian (IT-Net), Dutch (NL-Net) and Spanish (ES-Net) wordnets through the Inter-Lingual-Index]

    • Link types: = normal equivalence; eq_has_hyponym; eq_has_hyperonym

    • GB-Net: toe, finger, head

    • IT-Net: dito

    • NL-Net: hoofd, kop

    • ES-Net: dedo

    • ILI-records: {toe: part of foot}; {finger: part of hand}; {dedo, dito: finger or toe}; {head: part of body}; {hoofd: human head}; {kop: animal head}
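The finger/toe and head examples can be sketched as typed links from local synsets to English ILI-records, looked up in both directions. This is a simplified, hypothetical model: record names are plain strings, and only has_eq_hyponym, has_eq_hyperonym and eq_synonym are handled.

```python
# Links from local synsets to English ILI-records, following the dedo/dito
# and hoofd/kop examples (record names are assumed for illustration).
LINKS = {
    ("es", "dedo"): [("has_eq_hyponym", "finger"), ("has_eq_hyponym", "toe")],
    ("it", "dito"): [("has_eq_hyponym", "finger"), ("has_eq_hyponym", "toe")],
    ("nl", "hoofd"): [("has_eq_hyperonym", "head")],
    ("nl", "kop"): [("has_eq_hyperonym", "head")],
}

# How the English record relates to the local synset for each link type.
READING = {
    "eq_synonym": "equivalent",
    "has_eq_hyponym": "narrower in English",
    "has_eq_hyperonym": "broader in English",
}


def to_english(lang, word):
    """English ILI-records for a local synset, with how the English side relates."""
    return [(record, READING[rel]) for rel, record in LINKS.get((lang, word), [])]


def from_english(record, lang):
    """Local synsets in `lang` linked to an English ILI-record, with the link type."""
    return sorted((word, rel)
                  for (l, word), links in LINKS.items() if l == lang
                  for rel, target in links if target == record)


print(to_english("es", "dedo"))
print(from_english("head", "nl"))
```

Looking up English head in Dutch returns both hoofd and kop, each marked as having head as its equivalent hyperonym, which is exactly the pragmatic gap described for head.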



    The methodologies for building wordnets


    Overall Building Process

    [Figure: flowchart of the building process, with phases labelled Ia, Ib, Ic, II and III]

    • Resources: Machine Readable Dictionaries; Wordnets, Taxonomies; Corpora (loaded in local databases)

    • Specification of selection criteria -> subset of word meanings

    • Encoding of language internal and equivalence relations -> wordnet fragment with links to WordNet1.5 in local database

    • Improve and extend the wordnet fragments; adjust coverage, improve encoding

    • Load wordnet in the EuroWordNet Database -> wordnet fragment in EuroWordNet database

    • Verification by users; demonstration in Information Retrieval

    • Comparing and restructuring the wordnet; verification report



    Main Methods

    • Expand approach: translate WordNet1.5 synsets to another language and take over the structure

      • easier and more efficient method

      • compatible structure with WordNet1.5

      • structure is close to WordNet1.5 but also biased by it

    • Merge approach: create an independent wordnet in another language and align the separate hierarchies by generating the appropriate translations

      • more complex and labour intensive

      • different structure from WordNet1.5

      • language-specific patterns can be maintained



    Methods for extracting language-internal relations

    • editors and database for manually encoding relations;

    • comparison with WordNet1.5 structure;

    • definition patterns in monolingual dictionaries;

    • co-occurrences in corpora;

    • morphology;

    • bilingual dictionaries;

    • lexical semantic substitution tests


    Methods for extracting equivalence relations

    • extract monosemous translations of English synsets, e.g. a Spanish word has only 1 translation to an English word which has only one sense, and vice versa;

    • disambiguation of multiple ambiguous translations by measuring the conceptual distance between the senses of these translations in the WordNet1.5 hierarchy (Rigau and Aguirre, 95);

    • disambiguation of ambiguous translations by measuring the conceptual distance directly in the WordNet1.5 hierarchy between alternative translations and the translations of the direct semantic context in the source wordnet;

    • disambiguation of ambiguous translations by measuring the overlap in top-concepts inherited in the source wordnet and inherited for the different senses of translations in WordNet1.5.
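The first method, extracting monosemous translations, can be sketched as a filter over two bilingual dictionaries and English sense counts. The dictionaries and sense counts below are invented for illustration and are not WordNet1.5 data; monosemous_pairs is a hypothetical helper.

```python
# Toy bilingual dictionaries and English sense counts (invented, not from WordNet1.5).
ES_EN = {"perro": ["dog"], "banco": ["bank", "bench"], "cuchara": ["spoon"],
         "coche": ["car"], "auto": ["car"]}
EN_ES = {"dog": ["perro"], "bank": ["banco"], "bench": ["banco"],
         "spoon": ["cuchara"], "car": ["coche", "auto"]}
EN_SENSES = {"dog": 1, "bank": 2, "bench": 2, "spoon": 1, "car": 1}


def monosemous_pairs(es_en, en_es, en_senses):
    """Spanish-English pairs where both directions give a single translation
    and the English word has only one sense."""
    pairs = []
    for es, translations in es_en.items():
        if len(translations) != 1:
            continue  # ambiguous in the Spanish-to-English direction
        en = translations[0]
        if en_senses.get(en) == 1 and en_es.get(en) == [es]:
            pairs.append((es, en))
    return pairs


print(monosemous_pairs(ES_EN, EN_ES, EN_SENSES))
```

Here banco is rejected for having two translations, and coche/auto because car translates back to two Spanish words; the remaining pairs can be linked without further disambiguation.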


    Aligning wordnets

    [Figure: aligning a Dutch hierarchy with the WordNet1.5 hierarchy]

    • object; artifact object; natural object; instrument

    • muziekinstrument - musical instrument

    • orgel - organ; ?; organ; organ

    • hammond orgel - hammond organ



    Inheriting Semantic Features

    hart 1: orgaan 1 (Living Part) -> deel 2 (Part) -> iets 1 -> LEAF

    -----------------------------------------------------------------------------------------------------

    heart 1: playing card 1 -> card 1 (Artifact Function Object) -> paper 6 (Artifact Solid) -> material 5 (Substance) -> matter 1 -> inanimate object 1 -> entity 1 -> LEAF

    heart 2: disposition 2 (Dynamic Experience Mental) -> nature 1 -> trait 1 (Property) -> attribute 1 (Property) -> abstraction 1 -> LEAF

    heart 3: bravery 1 -> spirit 1 -> character 1 -> trait 1 (Property) -> attribute 1 (Property) -> abstraction 1 -> LEAF

    heart 4: internal organ 1 -> organ 4 (Living Part) -> body part 1 (Living Part) -> part 10 -> entity 1 -> LEAF
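The overlap method from the equivalence-extraction slide can be sketched with the hyperonym chains above: collect the top-ontology features inherited by the Dutch synset and by each English sense, then rank the senses by the size of the overlap. Attaching each parenthesized feature list to the synset it follows is an assumption about the slide's layout, and the score is a simple count rather than a published measure.

```python
# Hyperonym chains transcribed from the slide; each feature set is attached to
# the synset it follows there (an assumption). "LEAF" is omitted.
CHAINS = {
    "hart 1": [("orgaan 1", {"Living", "Part"}), ("deel 2", {"Part"}), ("iets 1", set())],
    "heart 1": [("playing card 1", set()), ("card 1", {"Artifact", "Function", "Object"}),
                ("paper 6", {"Artifact", "Solid"}), ("material 5", {"Substance"}),
                ("matter 1", set()), ("inanimate object 1", set()), ("entity 1", set())],
    "heart 2": [("disposition 2", {"Dynamic", "Experience", "Mental"}), ("nature 1", set()),
                ("trait 1", {"Property"}), ("attribute 1", {"Property"}),
                ("abstraction 1", set())],
    "heart 3": [("bravery 1", set()), ("spirit 1", set()), ("character 1", set()),
                ("trait 1", {"Property"}), ("attribute 1", {"Property"}),
                ("abstraction 1", set())],
    "heart 4": [("internal organ 1", set()), ("organ 4", {"Living", "Part"}),
                ("body part 1", {"Living", "Part"}), ("part 10", set()),
                ("entity 1", set())],
}


def inherited_features(synset):
    """Union of the top-ontology features found along the synset's hyperonym chain."""
    features = set()
    for _hyperonym, feats in CHAINS[synset]:
        features |= feats
    return features


def rank_senses(source, candidates):
    """Rank candidate senses by the number of features shared with the source synset."""
    src = inherited_features(source)
    scores = {c: len(src & inherited_features(c)) for c in candidates}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(rank_senses("hart 1", ["heart 1", "heart 2", "heart 3", "heart 4"]))
```

Only heart 4 shares features (Living, Part) with hart 1, so it ranks first, while the card, disposition and bravery senses score zero.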



    Reliability of Equivalence Relations



    Reliability of Equivalence Relations


    Conflicting Starting Points

    • 1. There should be a maximum of flexibility:

      • the wordnets should be able to reflect language-specific relations and patterns

      • the wordnets should be built relatively independently because each site has different starting points:

        • different tools, databases and resources (Machine Readable Dictionaries)

        • differences in the languages

    • 2. The wordnets have to be compatible in terms of coverage and relations to be useful for multilingual information retrieval and translation tools, and to make it possible to compare the wordnets.



    Measures to achieve maximal compatibility

    • The results are loaded into a common Multilingual Database (Polaris):

      • consistency checks and types of incompatibility

      • specific comparison options to measure consistency and overlap in coverage

    • User-guides for building wordnets in each language:

      • the steps to encode the relations for a word meaning.

      • common tests and criteria for all the relations.

      • overview of problems and solutions.

    • A set of common Base-Concepts which are shared by all the sites, having:

      • the most relations and the most important positions in the wordnets

      • the most meanings, and badly defined

    • Classification of the common Base Concepts in terms of a Top-Ontology of 63 basic Semantic Distinctions

    • Top-Down Approach, where first the Base Concepts and their direct context are (manually) encoded and then the wordnets are (semi-automatically) extended top-down to include more specific concepts that depend on these Base Concepts.



    Top-Ontology and Base Concepts

    • Top-Ontology with 63 higher-level concepts

    • Existing Ontologies:

      • WordNet1.5 top-levels

      • Aktions-Art models (Vendler, Verkuyl)

      • Acquilex and Sift ontologies (EC-projects)

      • Qualia-structure (Pustejovsky)

      • Upper-Model, MikroKosmos, Cyc, Ad Hoc ANSI-Committee on ontologies

    • The ontology was adapted to represent the variety of concepts in the set of Common Base Concepts across the 4 languages:

      • homogeneous Base-Concept Clusters

      • average size of Base Concept Cluster

      • apply to both nouns and verbs

    • Set of 1024 common Base Concepts making up the core of the separate wordnets.



    Base Concepts

    • Procedure:

      • Each site determined the set of word meanings with the most relations (up to 15% of all relations) and high positions in the hierarchy.

      • This set was extended with all meanings used to define the first selection.

      • The local selection was translated to WordNet1.5 equivalences: 4 lists of WordNet1.5 synsets (between 450 and 2000 synsets per selection).

      • These sets of WordNet1.5 translations have been compared.

    • Concepts selected by all sites:

      • 30 synsets (24 noun synsets, 6 verb synsets).

    • Explanations:

      • The individual selections are not representative enough.

      • There are major differences in the way meanings are classified, which affect the frequency of the relations.

      • The translations of the selection to WordNet1.5 synsets are not reliable.

      • The resources cover very different vocabularies.


    Concepts selected by at least two sites: intersections of pairs

    NOUNS       NL      ES      IT      GB/WN
    NL          1027    103     182     333
    ES          103     523     45      284
    IT          182     45      334     167
    GB/WN       333     284     167     1296

    VERBS       NL      ES      IT      GB/WN
    NL          323     36      42      86
    ES          36      128     18      43
    IT          42      18      104     39
    GB/WN       86      43      39      236

    Total Set of shared Base Concepts: Union of intersection pairs

                        Nouns   Verbs   Total
    1stOrderEntities    491             491
    2ndOrderEntities    272     228     500
    3rdOrderEntities    33              33
    Total               796     228     1024
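The "union of intersection pairs" can be sketched in a few lines and contrasted with the much smaller set selected by all sites. The site selections below are hypothetical synset identifiers, not the project's data, and the helper names are illustrative.

```python
from itertools import combinations

# Hypothetical per-site selections of WordNet1.5 synset identifiers.
SELECTIONS = {
    "NL": {"s1", "s2", "s3"},
    "ES": {"s2", "s3", "s4"},
    "IT": {"s3", "s5"},
    "GB/WN": {"s1", "s5", "s6"},
}


def shared_by_at_least_two(selections):
    """Union of the pairwise intersections of the site selections."""
    shared = set()
    for a, b in combinations(selections.values(), 2):
        shared |= a & b
    return shared


def shared_by_all(selections):
    """Concepts that every site selected."""
    return set.intersection(*selections.values())


print(sorted(shared_by_at_least_two(SELECTIONS)), shared_by_all(SELECTIONS))
```

Even in this toy case the pairwise union keeps four concepts while the all-site intersection is empty, mirroring why the project used the union of pairs (1024 concepts) instead of the 30 synsets chosen by every site.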



    Table 4: Number of Common BCs represented in the local wordnets

    Columns: Related to CBCs | Eq_synonym relations | Eq_near_synonym relations | CBCs without direct equivalent

    AMS99272526997

    FUE10121009015

    PSA8787591919

    Table 5: BC4 Gaps in at least two wordnets (10 synsets)

    • body covering#1

    • body substance#1

    • social control#1

    • change of magnitude#1

    • contractile organ#1

    • psychological feature#1

    • mental object#1; cognitive content#1; content#2

    • natural object#1

    • place of business#1; business establishment#1

    • plant organ#1

    • Plant part#1

    • spatial property#1; spatiality#1



    • Table 6: Local senses with complex equivalence relations to CBCs

    • NLESIT

    • Eq_has_hyperonym61404

    • eq_has_hyponym341420

    • Eq_has_holonym20

    • Eq_has_meronym32

    • Eq_involved3

    • Eq_is_caused_by3

    • Eq_is_state_of1

    • Example of complex relation:

      • CBC: cause to feel unwell#1, Verb

      • Closest Dutch concept: {onwel#1}, Adjective (sick)

      • Equivalence relation: eq_is_caused_by



    Adaptation of Base Concepts in EuroWordNet-2

    • A similar selection of fundamental concepts has been made in EuroWordNet-2

    • The selected concepts have been compared among German, French, Czech and Estonian and with the EuroWordNet-1 selection

    • The EuroWordNet-1 set has been extended to 1310 Base Concepts

    • A distinction has been made between Hard and Soft Base Concepts

      • Hard: represented by only a single Index-record

      • Soft: represented by several close Index-records

    • The final set has been used as starting point in EuroWordNet-2



    Comparison of Base Concept Selections



    Revised Set of Base Concepts



    Starting points for the Top-Ontology

    • The ontology should support the building and encoding of semantic networks as linguistic ontologies: networks of lexicalized words and expressions in a language.

    • The classification of the Base Concepts in terms of the Top Ontology should apply to all the involved languages.

    • Enforce uniformity and compatibility of the different wordnets by providing a common framework, and divide the Base Concepts (BCs) into coherent clusters to enable contrastive analysis and discussion of closely related word meanings.

    • Customize the database by assigning features to the top-concepts, irrespective of language-specific structures.

    • Provide an anchor point for connecting other ontologies to the Inter-Lingual-Index, such as CYC, MikroKosmos, the Upper-Model, by linking them to the corresponding ILI-records.



    Principles for deciding on the distinctions

    • Starting point is that the wordnets are linguistic ontologies:

    • Semantic classifications common in linguistic paradigms: Aktionsart models [Vendler 1967, Verkuyl 1972, Verkuyl 1989, Pustejovsky 1991], entity-orders [Lyons 1977], Aristotle’s Qualia-structure [Pustejovsky 1995].

    • Ontologies developed in previous EC projects, which had a similar basis and are well known in the project consortium: Acquilex (BRA 3030, 7315) and Sift (LE-62030) [Vossen and Bon 1996].

    • The ontology should be capable of reflecting the diversity of the set of common BCs, across the 4 languages. In this sense the classification of the common BCs in terms of the top-concepts should result in:

      • Homogeneous Base Concept Clusters: clusters that are consistent with the classifications in WordNet1.5 and the other wordnets.

      • Average-sized Base Concept Clusters: not extremely large or small.



    Other important characteristics:

    • The distinctions apply to nouns, verbs and adjectives alike, because these can be related in the language-specific wordnets via an xpos_synonymy relation, and the ILI-records can be related to any part of speech.

    • The top-concepts are hierarchically ordered by means of a subsumption relation but there can only be one super-type linked to each top-concept: multiple inheritance between top-concepts is not allowed.

    • In addition to the subsumption relation, top-concepts can have an opposition-relation to indicate that certain distinctions are disjoint, whereas others may overlap.

    • There may be multiple relations from ILI-records to top-concepts: the Base Concepts can be cross-classified in terms of multiple top-concepts (as long as these have no opposition-relation between them), i.e. multiple inheritance from Top-Concept to Base Concept is allowed.

    • Result: the TCs function as cross-classifying features rather than conceptual classes.

      • Meanings for bodyparts are not linked to a single class BodyPart but to two features: Living and Part.
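The cross-classification rule can be checked mechanically: a Base Concept may carry any set of Top Concepts as features as long as no two of them are in an opposition relation. In this sketch the opposition pairs (Natural vs Artifact, and Solid, Liquid and Gas among each other) are an illustrative assumption, not a copy of the EuroWordNet specification:

```python
from itertools import combinations

# Illustrative opposition pairs (assumed for this sketch). Frozensets make the
# check independent of the order in which the two Top Concepts are given.
OPPOSITIONS = {
    frozenset({"Natural", "Artifact"}),
    frozenset({"Solid", "Liquid"}),
    frozenset({"Solid", "Gas"}),
    frozenset({"Liquid", "Gas"}),
}


def conflicting_pairs(features: set[str]) -> list[tuple[str, str]]:
    """Return every pair of assigned Top Concepts that stand in opposition."""
    return [
        (a, b)
        for a, b in combinations(sorted(features), 2)
        if frozenset({a, b}) in OPPOSITIONS
    ]


def is_valid_classification(features: set[str]) -> bool:
    """A cross-classification is valid if it contains no opposed pair."""
    return not conflicting_pairs(features)


# Body parts are cross-classified by two features instead of one BodyPart class.
print(is_valid_classification({"Living", "Part"}))
print(conflicting_pairs({"Solid", "Liquid", "Part"}))
```

Because the features are checked pairwise, adding a new Top Concept to a Base Concept only requires testing it against the features already assigned.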



    The EuroWordNet Top-Ontology: 63 concepts (excluding the top)

    • First Level [Lyons 1977]:

    • 1stOrderEntity (491 BC synsets, all nouns)

      • Any concrete entity (publicly) perceivable by the senses and located at any point in time, in a three-dimensional space.

    • 2ndOrderEntity (500 BC synsets, 272 nouns and 228 verbs)

      • Any Static Situation (property, relation) or Dynamic Situation, which cannot be grasped, heard, seen or felt as an independent physical thing. They can be located in time and occur or take place rather than exist; e.g. continue, occur, apply

    • 3rdOrderEntity (33 BC synsets, all nouns)

      • An unobservable proposition that exists independently of time and space. They can be true or false rather than real. They can be asserted or denied, remembered or forgotten. E.g. idea, thought, information, theory, plan.



    Test to distinguish 1st, 2nd and 3rd OrderEntities

    • Third-order entities cannot occur, have no temporal duration and therefore fail on both tests:

    • a. The same person was here again to-day

    • b. The same thing happened/occurred again to-day

    • *? The idea, fact, expectation, etc. was here/occurred/took place

    • A positive test for a 3rdOrderEntity is based on the properties that can be predicated:

    • ok: The idea, fact, expectation, etc. is true, is denied, is forgotten

    • The first division of the ontology is disjoint: BCs cannot be classified as combinations of these TCs. This distinction cuts across the different parts of speech in that:

    • 1stOrderEntities are always (concrete) nouns.

    • 2ndOrderEntities can be nouns, verbs and adjectives, where adjectives are always non-dynamic (refer to states and situations not involving a change of state).

    • 3rdOrderEntities are always (abstract) nouns.
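These first-level constraints can be expressed as a small validation function. The sketch below is hypothetical (the function name and the `dynamic` flag are assumptions); it encodes only the rules stated above: one order per Base Concept, the allowed parts of speech per order, and non-dynamic adjectives.

```python
# Parts of speech allowed for each first-level Top Concept, as listed above.
ALLOWED_POS = {
    "1stOrderEntity": {"noun"},
    "2ndOrderEntity": {"noun", "verb", "adjective"},
    "3rdOrderEntity": {"noun"},
}


def check_first_level(orders: set[str], pos: str, dynamic: bool = False) -> list[str]:
    """Return the violations of the first-level constraints (empty list if none)."""
    problems = []
    # The first division is disjoint: a Base Concept gets exactly one order.
    if len(orders) != 1:
        problems.append("exactly one first-level order is allowed")
    for order in sorted(orders):
        if pos not in ALLOWED_POS.get(order, set()):
            problems.append(f"{order} cannot be realized as a {pos}")
    # Adjectives refer to states and situations without a change of state.
    if pos == "adjective" and dynamic:
        problems.append("adjectives only refer to non-dynamic situations")
    return problems


print(check_first_level({"2ndOrderEntity"}, "verb"))
print(check_first_level({"3rdOrderEntity"}, "verb"))
```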



    Base Concepts classified as 3rdOrderEntities

    • theory; idea; structure; evidence; procedure; doctrine; policy; data point; content; plan of action; concept; plan; communication; knowledge base; cognitive content; know-how; category; information; abstract; info



    1stOrderEntity (1)

    Origin (0): the way in which an entity has come about
        Natural (21)
            Living (30)
                Plant (18)
                Human (106)
                Creature (2)
                Animal (123)
        Artifact (144)

    Function (0): the typical activity or role that is associated with an entity
        Vehicle (8); Occupation (23); Covering (8); Garment (3); Software (4); Furniture (6);
        Place (45); Container (12); Comestible (32); Instrument (18); Building (13);
        Representation (12): MoneyRepresentation (10); LanguageRepresentation (34); ImageRepresentation (9)

    Form (0): amorphous or fixed shape
        Substance (32)
            Solid (63)
            Liquid (13)
            Gas (1)
        Object (62)

    Composition (0): a group of self-contained wholes or a part of such a whole
        Part (86)
        Group (63)


    Conjunctive classes of 1stOrderEntities

    Frequent combinations

    5    Comestible;Solid;Artifact
    5    Container;Part;Solid;Living
    5    Furniture;Object;Artifact
    5    Instrument;Artifact
    5    Living
    5    Plant
    6    Liquid
    6    Object;Artifact
    6    Part;Living
    6    Place;Part;Solid
    7    Building;Object;Artifact
    7    Group
    7    LanguageRepresentation
    7    Vehicle;Object;Artifact
    10   Instrument;Object;Artifact
    12   Part
    14   Place
    14   Place;Part
    15   Substance
    19   LanguageRepresentation;Artifact
    20   Occupation;Object;Human
    22   Object;Animal;Function
    38   Group;Human
    42   Object;Human


    Conjunctive classes of 1stOrderEntities

    • Low-frequency combinations

      • fruit: Comestible (Function); Object (Form); Part (Composition); Plant (Natural, Origin)

      • life: Group (Composition); Living (Natural, Origin)

      • cell: Part (Composition); Living (Natural, Origin)

      • skin: Covering (Function); Solid (Form); Part (Composition); Living (Natural, Origin)

      • arms: Instrument (Function); Group (Composition); Object (Form); Artifact (Origin)
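Frequency lists like the ones above can be produced by treating each Base Concept's set of Top Concepts as one conjunctive class and counting identical sets. A sketch under that assumption; fruit#1 and cell#1 follow the examples above, while hand#1 and teacher#1 are hypothetical additions:

```python
from collections import Counter


def conjunctive_classes(classification: dict[str, set[str]]) -> Counter:
    """Count Base Concepts that carry exactly the same combination of Top Concepts."""
    # A frozenset is hashable and ignores the order in which features were assigned.
    return Counter(frozenset(tcs) for tcs in classification.values())


def label(tcs: frozenset) -> str:
    """Render a combination in the 'A;B;C' style used in the lists above."""
    return ";".join(sorted(tcs))


# Hypothetical sample classification of four Base Concepts.
sample = {
    "fruit#1": {"Comestible", "Object", "Part", "Plant"},
    "cell#1": {"Part", "Living"},
    "hand#1": {"Living", "Part"},
    "teacher#1": {"Occupation", "Object", "Human"},
}
for tcs, n in conjunctive_classes(sample).most_common():
    print(n, label(tcs))
```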



    1stOrderEntities classified as Function only

    barrier 1; belonging 2; building material 1; causal agency 1; commodity 1; consumer goods 1; creation 3; curative 1; decoration 2; device 4; fastener 1; force 6; force 7; form 5; impediment 1; medicament 1; piece of work 1; possession 1; protection 4; remains 2; restraint 2; support 6; support 7; supporting structure 1; thing 3


    2ndOrderEntity (0)

    SituationType (6): the event-structure in terms of which a situation can be characterized as a conceptual unit over time; disjoint features
        Dynamic (134), e.g. "he sat down quickly", "a quick meeting"
            BoundedEvent (183)
            UnboundedEvent (48)
        Static (28), e.g. "?he sits quickly"
            Property (61)
            Relation (38)

    SituationComponent (0): the most salient semantic component(s) that characterize(s) a situation; conjoined features
        Cause (67)
            Agentive (170)
            Phenomenal (17)
            Stimulating (25)
        Communication (50); Condition (62); Physical (140); Existence (27); Experience (43);
        Possession (23); Location (76); Manner (21); Purpose (137); Mental (90); Modal (10);
        Quantity (39); Social (102); Time (24); Usage (8)


    Conjunctive classes of 2ndOrderEntities

    Static

    5    Property;Physical;Condition
    5    Property;Stimulating;Physical
    5    Relation
    5    Relation;Social
    6    Static;Quantity
    7    Property;Condition
    8    Relation;Location
    9    Property
    10   Relation;Physical;Location: adjoin 1; aim 4; blank space 1; course 7; direction 8; distance 1; elbow room 1; path 3; spatial property 1; spatial relation 1


    Conjunctive classes of 2ndOrderEntities

    Dynamic

    5    BoundedEvent;Cause;Physical
    5    BoundedEvent;Cause;Physical;Location
    5    BoundedEvent;Time
    5    Dynamic
    5    Dynamic;Location
    5    Dynamic;Phenomenal
    5    Dynamic;Phenomenal;Physical
    6    BoundedEvent;Agentive
    6    BoundedEvent;Location
    6    BoundedEvent;Physical;Location
    6    Dynamic;Agentive;Communication
    6    Dynamic;Cause
    8    BoundedEvent;Agentive;Mental;Purpose
    8    BoundedEvent;Quantity;Time
    9    BoundedEvent;Cause
    9    Dynamic;Experience;Mental: experience 7; find 3; affect 5; arouse 5; excite 2; cognition 1; desire 2; disposition 2; disposition 4; disturbance 7; emotion 1; feeling 1; humor 3; pleasance 1; process 4; look 8; phenomenon 1; cause to appear 1; perception 2; sensation 1; feel 12; experience 8; trouble 3; reality 1



    Top-Down Building Procedure

    • 1) Construction of a core wordnet from the common set of Base Concepts

      • Find representatives in the local language for the Common Base Concepts (1310 synsets)

      • Add local Base Concepts that are not selected as Common Base Concepts

      • Specify the hyperonyms of the local and common Base Concepts

    • 2) Extend the core wordnets

      • Add the first level of hyponyms to the core wordnets

      • Add other hyponyms which have many sub-hyponyms

      • Add other types of relations: XPOS, roles, meronymy, subevents, causes.

    • 3) Verify the selection

      • Corpus frequency: Parole lexicons and corpora

      • Top-Concept clustering

      • Intersection of ILI-records

      • Overlap in ILI-chains
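Steps 1 and 2 can be sketched as operations on a hypernym/hyponym graph. This is a simplified, hypothetical illustration: it assumes hypernyms are followed transitively up to the top, takes first-level hyponyms only below the Base Concepts, and leaves out the other relation types and the verification step. The toy graph is invented:

```python
def add_hypernyms(base_concepts: set[str], hypernyms: dict[str, list[str]]) -> set[str]:
    """Step 1 (sketch): the Base Concepts plus all their hypernyms, up to the top."""
    core = set(base_concepts)
    stack = list(base_concepts)
    while stack:
        for parent in hypernyms.get(stack.pop(), []):
            if parent not in core:  # avoid revisiting shared ancestors
                core.add(parent)
                stack.append(parent)
    return core


def add_first_level_hyponyms(core: set[str], base_concepts: set[str],
                             hyponyms: dict[str, list[str]]) -> set[str]:
    """Step 2 (sketch): add the direct hyponyms of each Base Concept."""
    extended = set(core)
    for synset in base_concepts:
        extended.update(hyponyms.get(synset, []))
    return extended


# Hypothetical toy fragment of a local wordnet.
hypernyms = {"vehicle": ["conveyance"], "conveyance": ["artifact"], "car": ["vehicle"]}
hyponyms = {"vehicle": ["car", "bicycle"], "car": ["taxi"]}
base = {"vehicle"}

core = add_hypernyms(base, hypernyms)
print(sorted(core))
print(sorted(add_first_level_hyponyms(core, base, hyponyms)))
```

Deeper hyponyms such as "taxi" are left for the later "other hyponyms" step.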


    Top-Down Building

    [Figure: top-down building. Labels: Top-Ontology (63 TCs); 1310 CBCs; 149 new ILIs; CBC representatives; local BCs; hyperonyms; first-level hyponyms; WMs related via non-hyponymy; remaining hyponyms; remaining WordNet1.5 synsets; Inter-Lingual-Index]



    The current wordnets



    Comparison of wordnets

    • In-depth comparison of major semantic fields

    • Comparison of the intersection of the associated ILI-records

    • Distribution of the associated ILI-records over the different top ontology clusters

    • Comparison of the hyponymy relations in the wordnets, projected on the associated ILI-records


    Intersection of the associated ILI-records

                      Nouns                                          Verbs
                      Freq.    % of (WN,IT,NL,ES)  % of (IT,NL,ES)   Freq.   % of (WN,IT,NL,ES)  % of (IT,NL,ES)
    Total                      62780               32520                     12215               7455
    ES                24596    39.2%               75.6%             4654    38.1%               62.4%
    IT                14272    22.7%               43.9%             4673    38.3%               62.7%
    NL                21259    33.9%               65.4%             6416    52.5%               86.1%
    ∩ (ES, IT)        10907    17.4%               33.5%             3272    26.8%               43.9%
    ∩ (ES, NL)        14773    23.5%               45.4%             3870    31.7%               51.9%
    ∩ (IT, NL)        9862     15.7%               30.3%             3950    32.3%               53.0%
    ∩ (ES, IT, NL)    8183     13.0%               25.2%             3051    25.0%               40.9%
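The two percentage columns can be reproduced by dividing each frequency by the corresponding figure in the Total row: all WordNet1.5 synsets (62780 nouns, 12215 verbs) or the union of the ILI-records linked from the Spanish, Italian and Dutch wordnets (32520 nouns, 7455 verbs). A quick arithmetic check in Python using the table's own numbers:

```python
def share(count: int, total: int) -> str:
    """Percentage with one decimal, as printed in the table."""
    return f"{100 * count / total:.1f}%"


# Denominators from the Total row.
NOUNS_WN, NOUNS_UNION = 62780, 32520
VERBS_WN, VERBS_UNION = 12215, 7455

# Spanish row, then the three-way intersection row, for nouns and verbs.
print(share(24596, NOUNS_WN), share(24596, NOUNS_UNION))
print(share(4654, VERBS_WN), share(4654, VERBS_UNION))
print(share(8183, NOUNS_WN), share(8183, NOUNS_UNION))
print(share(3051, VERBS_WN), share(3051, VERBS_UNION))
```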



    Distribution over the top ontology clusters



    Distribution over the top ontology clusters



    Comparison of the hyponymy relations, projected on the associated ILI-records

    To compare hyponymy chains, each word sense in a chain has been replaced by the ILI-record linked to its synset, which gives the following result:

    veranderen (change)                        00064108
    bewegen (move intransitive)                01046072
    bewegen (move reflexive)                   01046072
    voortbewegen (move location)               01046072
    verplaatsen (move from A to B)             01055491
    stijgen (move to a higher position)        01094615
    opstijgen (take off)                       00257753
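The projection step can be sketched as a simple mapping from each synset in the chain to its ILI-record. The merge of adjacent identical records (the two bewegen senses and voortbewegen all map to 01046072) is an assumption of this sketch, added so that the projected chain can be compared step by step with a WordNet1.5 hypernym chain:

```python
# Dutch hyponymy chain with the ILI-record linked to each synset (from the example above).
chain = [
    ("veranderen", "00064108"),
    ("bewegen (move intransitive)", "01046072"),
    ("bewegen (move reflexive)", "01046072"),
    ("voortbewegen", "01046072"),
    ("verplaatsen", "01055491"),
    ("stijgen", "01094615"),
    ("opstijgen", "00257753"),
]


def project(chain: list[tuple[str, str]]) -> list[str]:
    """Replace each synset in the chain by its ILI-record."""
    return [ili for _, ili in chain]


def collapse(ili_chain: list[str]) -> list[str]:
    """Merge adjacent steps that project onto the same ILI-record."""
    out: list[str] = []
    for ili in ili_chain:
        if not out or out[-1] != ili:
            out.append(ili)
    return out


print(collapse(project(chain)))
```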



    Coverage of complete noun chains projected over WN1.5 structure



    Partial noun chains projected over WN1.5



    Partial noun chains with 1 gap projected over WN1.5



    Towards an efficient, condensed and universal index of sense-distinctions

    • Independently of the wordnet structures in each language, we can manipulate the mapping across languages via the ILI.

    • We can use the information of all the languages to correct incompleteness and inconsistencies of the individual resources

    • Ultimately, we should try to find a minimal and sufficient set of concepts to provide an efficient mapping.



    Characteristics of the Inter-Lingual-Index

    • The Inter-lingual-Index (ILI) is an unstructured fund of concepts with the sole purpose of providing an efficient mapping of senses across languages.

    • Requirements:

    • 1. efficient level of granularity

      ILI                                       Wordnets
      {break} “He broke the glass”              breken (Dutch)
      {break; cause to break}                   breken (Dutch)
      {break; damage} inflict damage upon.      romper (Spanish)
                                                rompere (Italian)

    • 2. superset of concepts that occur across languages

      ILI                                       Wordnets
      {cashier}                                 eq_hyperonym: cassière (Dutch)
                                                eq_hyperonym: cajera (Spanish)
      {female cashier}                          eq_synonym: cassière (Dutch)
                                                eq_synonym: cajera (Spanish)
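A minimal sketch of how an unstructured index supports cross-language mapping: every local synset points to ILI-records through equivalence relations, and a translation is any synset in the other language that reaches the same record through the same relation. The synset and record identifiers are hypothetical, and `translate` is an illustrative helper, not part of any EuroWordNet tool:

```python
from collections import defaultdict

# (language, local synset) -> [(equivalence relation, ILI-record)]; all IDs hypothetical.
LINKS = {
    ("NL", "breken-1"): [("eq_synonym", "ili-break")],
    ("ES", "romper-1"): [("eq_synonym", "ili-break")],
    ("IT", "rompere-1"): [("eq_synonym", "ili-break")],
    ("NL", "cassiere-1"): [("eq_has_hyperonym", "ili-cashier")],
    ("ES", "cajera-1"): [("eq_has_hyperonym", "ili-cashier")],
}


def build_index(links):
    """Invert the links: ILI-record -> {(language, synset, relation)}."""
    index = defaultdict(set)
    for (lang, synset), targets in links.items():
        for relation, ili in targets:
            index[ili].add((lang, synset, relation))
    return index


def translate(lang, synset, target_lang, links, index, relation="eq_synonym"):
    """Target-language synsets linked to the same ILI-record by the same relation."""
    found = set()
    for rel, ili in links.get((lang, synset), []):
        if rel != relation:
            continue
        for other_lang, other_synset, other_rel in index.get(ili, ()):
            if other_lang == target_lang and other_rel == relation:
                found.add(other_synset)
    return found


index = build_index(LINKS)
print(translate("NL", "breken-1", "ES", LINKS, index))
```

With only a {cashier} record, the Dutch and Spanish words meet solely through eq_has_hyperonym, so an eq_synonym lookup returns nothing for them; a {female cashier} record, as in the second requirement above, would make the match a direct eq_synonym one.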



    A Minimal and Efficient set of concepts

    • Globalizing the sense-differentiation:

      • create metonymic clusters

      • abstract from contextual specialization and grammatical perspectives

      • abstract from part-of-speech realization

      • abstract from productive and predictable meanings

    • Extending the Inter-Lingual-Index to become the superset of concepts occurring in two or more wordnets only if:

      • concepts are unpredictable and unproductive

      • concepts cannot be linked exhaustively and uniquely to the ILI


    Under-specified concepts: Metonymic clusters

    [Diagram: ILI metonymy cluster "club" (metonym#: club: organization; club: building), linked via eq_metonym and eq_synonym to {vereniging}NL, {club}EN and {club; verenigingsgebouw}NL]


    Under-specified concepts: Generalization and Diathesis clusters

    [Diagram: ILI diathesis cluster "break" (diathesis#: break: inchoative; break: causative), linked via eq_diathesis and eq_synonym to {breken; kapotgaan}NL, {rompersi}IT, {breken; kapotmaken}NL and {rompere}IT]


    Under-specified for POS

    [Diagram: ILI cross-POS cluster "depart" (xpos#: depart; departure), linked via eq_xpos_synonym and eq_synonym to {vertrekkenV}NL, {departV}EN, {departureN}EN and {vertrekN}NL]



    Overview of equivalence relations to the ILI

    Relation             POS    Sources : Targets     Example (source : ILI target)
    eq_synonym           same   1 : 1                 auto / voiture : car
    eq_near_synonym      any    many : many           apparaat, machine, toestel : apparatus, machine, device
    eq_hyperonym         same   many : 1 (usually)    citroenjenever : gin
    eq_hyponym           same   (usually) 1 : many    dedo : toe, finger
    eq_metonymy          same   many/1 : 1            universiteit, universiteitsgebouw : university
    eq_diathesis         same   many/1 : 1            raken (cause), raken : hit
    eq_generalization    same   many/1 : 1            schoonmaken : clean
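Two constraints from this table can be encoded directly: only eq_near_synonym may cross parts of speech, and eq_synonym is 1 : 1. The sketch below reads the 1 : 1 column as a hard per-wordnet constraint, which is an interpretation; the link data are hypothetical:

```python
from collections import Counter
from enum import Enum


class Eq(Enum):
    """Equivalence relations from the overview table."""
    SYNONYM = "eq_synonym"
    NEAR_SYNONYM = "eq_near_synonym"
    HYPERONYM = "eq_hyperonym"
    HYPONYM = "eq_hyponym"
    METONYMY = "eq_metonymy"
    DIATHESIS = "eq_diathesis"
    GENERALIZATION = "eq_generalization"


# POS column: only eq_near_synonym may link different parts of speech.
CROSS_POS = {Eq.NEAR_SYNONYM}


def pos_compatible(relation: Eq, source_pos: str, target_pos: str) -> bool:
    return relation in CROSS_POS or source_pos == target_pos


def one_to_one_violations(links: list[tuple[str, Eq, str]]) -> set[str]:
    """Synsets or ILI-records involved in more than one eq_synonym link in one wordnet."""
    pairs = [(src, tgt) for src, rel, tgt in links if rel is Eq.SYNONYM]
    sources = Counter(src for src, _ in pairs)
    targets = Counter(tgt for _, tgt in pairs)
    return {s for s, n in sources.items() if n > 1} | {t for t, n in targets.items() if n > 1}


# Hypothetical Dutch links: two different synsets both claim eq_synonym to "car".
links = [
    ("auto-1", Eq.SYNONYM, "car"),
    ("wagen-2", Eq.SYNONYM, "car"),
    ("apparaat-1", Eq.NEAR_SYNONYM, "device"),
]
print(one_to_one_violations(links))
print(pos_compatible(Eq.SYNONYM, "noun", "verb"))
```

In such a case the sketch flags "car"; one way to resolve it would be to relink one of the synsets with eq_near_synonym, which permits many : many.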



    Progress on restructuring the ILI

    Clusters added manually and automatically based on:

    • structural properties of WN1.5

    • mapping to other sources: Levin’s classes, WN1.6

    • cross-lingual mapping

                  Clusters   Words   Word senses   Synsets
      Nouns       1703       1398    3205          2895
      Verbs       2905       1799    5134          3839

      New ILIs from other wordnets have not yet been added. We estimate that hardly any new ILIs are needed for verbs, whereas for nouns they are needed for about 30% of the non-translated concepts (2,000 synsets, based on Dutch).



    Effects of ILI-clusters

    Intersection of ILI-references for Dutch, Spanish, Italian and English

    Nouns: 2895 clustered synsets (4.6% of 62780 WN1.5 noun synsets)

    intersection increased from 7736 (23.8%) to 8183 (25.2%) out of the union of 32520 synsets

    Verbs: 3839 clustered synsets (31.4% of 12215 WN1.5 verb synsets)

    intersection increased from 1632 (21.9%) to 3051 (40.9%) out of the union of 7455 synsets
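Why clustering raises the intersection can be seen on a toy case: when two wordnets link to different members of the same ILI cluster, their references only coincide after each member is replaced by its cluster. The references and cluster names below are hypothetical:

```python
def normalize(refs: set[str], cluster_of: dict[str, str]) -> set[str]:
    """Replace every ILI-reference that belongs to a cluster by the cluster name."""
    return {cluster_of.get(ref, ref) for ref in refs}


def overlap(*ref_sets: set[str]) -> set[str]:
    """ILI-references shared by all wordnets."""
    return set.intersection(*ref_sets)


# Hypothetical references: Dutch links the causative sense of "break",
# Italian the inchoative one; both link "walk" directly.
nl = {"break-causative", "walk"}
it = {"break-inchoative", "walk"}
clusters = {"break-causative": "break", "break-inchoative": "break"}

print(overlap(nl, it))
print(sorted(overlap(normalize(nl, clusters), normalize(it, clusters))))
```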



    Superset of all concepts.

    • Procedure:

      • Initially, the ILI will only contain WordNet1.5 synsets.

      • A site that cannot find a proper equivalent among the available ILI-concepts will link the meaning to another ILI-record using a so-called complex-equivalence relation and will generate a potential new ILI-record:

        Dutch meaning   Definition           Complex equivalence   Target concept
        klunen          to walk on skates    eq_has_hyperonym      walk

      • After a building phase, all potentially new ILI-records are collected and verified for overlap by one site.

      • A proposal for updating the ILI is distributed to all sites and has to be verified.

      • The ILI is updated and all sites have to reconsider the equivalence relations for all meanings that can potentially be linked to the new ILI-records.



    Filling gaps in the ILI

    Types of GAPS

    • genuine, cultural gaps for things not known in English culture, e.g. citroenjenever, which is a kind of gin made out of lemon skin:

      • Non-productive

      • Non-compositional

    • pragmatic gaps, in the sense that the concept is known but is not expressed by a single lexicalized form in English, e.g. container, borrower, cajera (female cashier):

      • Productive

      • Compositional

    • Universality of gaps: Concepts occurring in at least 2 languages
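The universality criterion can be sketched as grouping the potentially new ILI-records by the complex equivalence they were proposed with and keeping those proposed from at least two languages. Representing a candidate by its set of (relation, target) pairs is an assumption of this sketch; the procedure above assigns the overlap check to one site, and this only illustrates the grouping:

```python
from collections import defaultdict


def universal_gaps(candidates, min_languages=2):
    """Group candidate ILI-records by their complex equivalence; keep the shared ones.

    candidates: iterable of (language, local word, frozenset of (relation, ILI target)).
    """
    by_signature = defaultdict(set)
    for lang, word, signature in candidates:
        by_signature[signature].add((lang, word))
    return {
        sig: members
        for sig, members in by_signature.items()
        if len({lang for lang, _ in members}) >= min_languages
    }


FEMALE_CASHIER = frozenset({("eq_has_hyperonym", "cashier"), ("eq_in_state", "female")})
candidates = [
    ("NL", "cassière", FEMALE_CASHIER),
    ("ES", "cajera", FEMALE_CASHIER),
    ("NL", "citroenjenever", frozenset({("eq_has_hyperonym", "gin")})),
]
for sig, members in universal_gaps(candidates).items():
    print(sorted(sig), sorted(members))
```

The pragmatic gap shared by Dutch and Spanish survives, while the Dutch-only cultural gap citroenjenever is filtered out.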


    Productive and Predictable Lexicalizations exhaustively linked to the ILI

    [Diagram: {doodslaanV}NL and {totschlagenV}DE linked via eq_has_hyperonym to kill and beat; {doodstampenV}NL and {tottrampelnV}DE to kill and stamp; {doodschoppenV}NL to kill and kick; {cassière}NL and {cajeraN}ES via eq_has_hyperonym to cashier and eq_in_state to female; {alevínN}ES via eq_has_hyperonym to fish and eq_in_state to young]



    WordNet gaps across languages


    Towards an efficient, condensed and universal index of sense-distinctions

    [Diagram labels: Universal Core meanings (WordNet1.5, 90,000 concepts; POS independent; non-predictable); Metonymy/Generalization clusters; Productive derivations and compounds linked exhaustively; Universal systematic polysemy and level of granularity; Language and domain specific lexicalizations that do not occur in a large variety of languages; Language specific realizations in grammatical forms]



    The EuroWordNet database

    1. The actual wordnets, stored in the Flaim database format, an indexing and compression format from Novell.

    2. Polaris (Louw 1997): a re-implementation of the Novell ConceptNet toolkit (Díez-Orzas et al. 1995) adapted to the EuroWordNet architecture. It can:

    • import and export wordnets or wordnet selections from/to ASCII files;

    • resolve links for imported concepts;

    • edit and add concepts, variants and relations in the wordnets;

    • access the ILI and the ontologies, and switch between the wordnets and the ontologies via the ILI;

    • extract, import and export clusters of senses based on relations;

    • project synsets or clusters from one wordnet to another wordnet;

    • compare clusters of synsets;

    • import new or adapted ILI-records;

    • update ILI-references to an updated ILI.

    3. Periscope (Cuypers and Adriaens 1997): a graphical interface for viewing the EuroWordNet database.



    Global Wordnet Association: http://www.globalwordnet.org

    • provide a standardized framework to link, compare and build complete wordnets for all the European languages and dialects.

    • initialize the development of wordnets in non-European languages

    • develop more specific definitions, tests and procedures for evaluating and developing wordnets.

    • extend the specification of EuroWordNet to lexical units which are not yet covered (adjectives/adverbs, lexicalized phrases and multi-words).

    • develop (axiomatized) ontologies for Domains and World-Knowledge that can be shared by all languages via the ILI.

    • develop an efficient ILI for linking, sharing, consistency checking and cross-language technology applications. This ILI could function as a gold-standard of sense-distinctions.

    • organize an (annual or biannual) workshop or conference.



    2nd Global Wordnet Conference

    • Location: Masaryk University, Brno (Czech Republic)

    • January 20-23, 2004

    • http://www.fi.muni.cz/gwc2004/



    Other wordnet initiatives

    • Welsh

    • Basque, Catalan

    • Chinese

    • BalkaNet

    • IndoWordnet

    • Meaning

    • Danish

    • Norwegian

    • Swedish

    • Portuguese

    • Arabic

    • Korean

    • Russian



    BalkaNet

    • Funded by the European Union as project IST-2000-29388.

    • 3-year project: 2001 - 2004

    • Follows a strict EuroWordNet approach:

      • Expanded set of base concepts

      • Top-down building approach

    • EWN database extended with:

      • Greek, Romanian, Serbian, Turkish, Bulgarian, Czech

    • Development of new wordnet database system: VisDic

    • http://www.ceid.upatras.gr/Balkanet/.



    IndoWordnet

    • Current Wordnet development in India:

      • Hindi and Marathi at IIT Bombay,

      • Tamil at Anna University-K.B Chandrashekhar Research Centre (AU-KBC) Chennai and Tamil University Tanjavur,

      • Gujarati at MS University Baroda, Oriya at Utkal University Bhubaneswar and Bengali at IIT Kharagpur.

    • The Hindi WordNet is at an advanced stage of development, with about 11,000 semantically linked synsets and associated software and user interface.



    IndoWordnet

    • By the end of 2003, a WordNet of 5,000 synsets will be created for each Indian language, covering about the 2,000 most frequent content words in that language. It will be based on the frequency-sorted word list available from the CIIL.

    • Language specific WordNets developed by the following institutions:

      • CIIL, Mysore: Kannada, Kashmiri, Punjabi, Urdu, Himachali, Malayalam.

      • IIT Bombay: Hindi, Marathi and Konkani

      • AU-KBC Chennai and Tamil University Tanjavur: Tamil and Malayalam

      • University of Hyderabad: Telugu

      • University of Baroda: Gujarati

      • Utkal University Bhubaneswar: Oriya

      • IIT Kharagpur: Bengali

    • Research groups still have to be identified for building the WordNets of Assamese, Nepali and the languages of the North East.



    Meaning

    Developing Multilingual Web-scale Language Technologies

    http://www.lsi.upc.es/~nlp/meaning/



    Meaning Objectives

    • Funded by the European Union as project IST-2001-34460

    • 3-year project: April 2002 - April 2005

    • Large-scale (Lexical) Knowledge Bases

      • Automatic enrichment of EWN

      • Mixed approach (KB + ML)

      • Applied to Q/A, CLIR

    • Problem

      • structural and lexical ambiguity



    Meaning Approach

    • automatic collection of sense examples (Leacock et al. 98, Mihalcea and Moldovan 99)

    • Large-scale WSD (Boosting, SVM, transductive learning)

    • Large-scale Knowledge Acquisition (McCarthy 01, Agirre & Martinez 02)


    [Diagram: Meaning architecture. English, Italian, Spanish, Basque and Catalan web corpora and EWN wordnets connected to a Multilingual Central Repository via WSD, ACQ, UPLOAD and PORT processes]



    Meaning

    WP6: Word Sense Disambiguation

    • A combination of unsupervised Knowledge-based and supervised Machine Learning techniques that will provide a high-precision system that is able to tag running text with word senses

    • A system that acquires a huge number of examples per word from the web

    • The use of sophisticated linguistic information, such as syntactic relations, semantic classes, selectional restrictions, subcategorization information, domain, etc.

    • Efficient margin-based Machine Learning algorithms.

    • Novel algorithms that combine tagged examples with huge amounts of untagged examples in order to increase the precision of the system.



    THE END...

