Ontologies. German Rigau i Claramunt http://www.lsi.upc.es/~rigau TALP Research Center Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya. WordNet (Miller et al. 90, Fellbaum 98) EuroWordNet (Vossen et al. 98) Spanish WordNet


Ontologies

German Rigau i Claramunt

http://www.lsi.upc.es/~rigau

TALP Research Center

Departament de Llenguatges i Sistemes Informàtics

Universitat Politècnica de Catalunya

Ontologies: Outline
WordNet (Miller et al. 90, Fellbaum 98)

EuroWordNet (Vossen et al. 98)

Spanish WordNet

Combining Methods (Atserias et al. 97)

Mapping hierarchies (Daudé et al. 01)

Mikrokosmos (Viegas et al. 96)

Cyc (Mahesh et al. 96)

WordNet 2 (Harabagiu 98)

MindNet (Richardson et al. 97)

ThoughtTreasure (Mueller 00)

Meaning ...

WordNet & EuroWordNet

WordNet & EuroWordNet: WordNet
Princeton University (Miller et al. 1990)

Lexicalized concepts (words, lexical units)

Related to each other by semantic relations

synonymy

antonymy

hypernymy-hyponymy

meronymy

entailment

cause

...
WordNet & EuroWordNet: Semantic Relations in WN1.5
Synonymy

Lexicalized concepts (SYNSETS)

Weak notion of synonymy: synonymy in context

Synset: set of words or lexical units that express a concept in a given context

Hypernymy / Hyponymy

Class-to-subclass relation
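A minimal sketch of the two notions above (a synset as a set of words, hypernymy as a class-to-subclass link). The class name `Synset`, the helper `hypernym_chain` and the toy hierarchy are illustrative assumptions, not WordNet's actual data format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Synset:
    """A lexicalized concept: the words that express it in some context."""
    words: List[str]
    hypernym: Optional["Synset"] = None  # link from subclass to its class


def hypernym_chain(synset):
    """Return the synset followed by all its ancestors, most specific first."""
    chain = []
    while synset is not None:
        chain.append(synset)
        synset = synset.hypernym
    return chain


# Toy hierarchy (assumed for illustration only)
conveyance = Synset(["conveyance"])
vehicle = Synset(["vehicle"], hypernym=conveyance)
motor_vehicle = Synset(["motor vehicle", "automobile"], hypernym=vehicle)
taxi = Synset(["cab", "taxi", "hack"], hypernym=motor_vehicle)

print([s.words[0] for s in hypernym_chain(taxi)])
# ['cab', 'motor vehicle', 'vehicle', 'conveyance']
```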
WordNet & EuroWordNet: Semantic Relations in WN1.5
Meronymy

Component part

{mano}{brazo} (hand, arm)

Member of a collection

{persona}{gente} (person, people)

Substance

{periódico}{papel} (newspaper, paper)

WordNet & EuroWordNet: Semantic Relations in WN1.5
Antonymy

{grande}{pequeño} (big, small)

Cause

{matar}{morir} (kill, die)

Entailment

{divorciarse}{casarse} (get divorced, get married)

Derivation

{presidencial}{presidente} (presidential, president)

Similarity

{bueno}{positivo} (good, positive)

WordNet & EuroWordNet: WordNet Example

[Figure: fragment of the WordNet noun network. Synsets shown: <conveyance>, <vehicle>, <motor vehicle, automobile, ...>, <cruiser, squad car, patrol car, ...>, <cab, taxi, hack, ...>, <car door>, <doorlock>]

WordNet & EuroWordNet: EuroWordNet
Project LE-2 4003

EU Telematics Application Programme

Semantic networks for several languages

Integrated and interconnected

English: University of Sheffield

Dutch: Univ. of Amsterdam

Italian: I.L.C., Pisa

Spanish: UB, UPC, UNED

Computers and the Humanities

(special issue, 1998)

http://www.hum.uva.nl/~ewn/

WordNet & EuroWordNet: EuroWordNet Extensions
EWN2

German, French, Czech, Swedish, Estonian

ITEM project

Spanish, Catalan, Basque

CREL (Centre de Referència d’Enginyeria Lingüística)

Catalan (UB, UPC)

WordNet & EuroWordNet: Applications
Development of basic resources

Cross-lingual treatment of information

- Multilingual information retrieval systems (e.g., Internet)

- Lexical-semantic module of language engineering systems

Information extraction

Machine translation

WordNet & EuroWordNet: Design Requirements
Preservation of the semantic relations specific to each language

Maximum compatibility among the different resources

Relative independence of the WordNets

in the building process

in the final result

WordNet & EuroWordNet: Components of EuroWordNet
Core

The ILI

The Top Concept Ontology (TCO)

Domain Ontology (DO)

Periphery

Language-specific WordNets

WordNet & EuroWordNet: Interlingual Index of EuroWordNet
Unstructured collection of elements

Linked to

at least one synset of an EWN

an element of the TCO or DO

Associated with WN 1.5 synsets

WordNet & EuroWordNet: Top Concept Ontology of EuroWordNet
Hierarchy of language-independent concepts

semantic distinctions: object, place, dynamic, …

abstract (not lexical)

Superimposed on the ILI

Three types of entities:

First order: concrete entities

Second order: static or dynamic situations

Third order: abstract propositions

WordNet & EuroWordNet: Domain Ontology of EuroWordNet
Hierarchy of domain labels

Reduction of polysemy

Domains:

Traffic:

Road traffic, air traffic

International news

Mycology

Medicine

WordNet & EuroWordNet: Relations of EuroWordNet
Richer than WN

Between:

synsets (monolingual modules)

ILI records (multilingual):

{actuar-1} EQ-SYNONYM {‘behave in a certain manner’}

ILI records and the TCO or DO
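A small sketch of how an unstructured index of ILI records can connect synsets across languages. The record identifier `ili-behave`, the English synset name `behave-1` and both helper functions are hypothetical; only {actuar-1} and EQ-SYNONYM come from the slide.

```python
from collections import defaultdict

# ILI record id -> {language: (relation, synset)}; toy data, assumed
ili_links = defaultdict(dict)


def link(ili_id, language, synset, relation="EQ-SYNONYM"):
    """Attach a language-specific synset to an ILI record."""
    ili_links[ili_id][language] = (relation, synset)


def translate(synset, source, target):
    """Target-language synsets that share an ILI record with `synset`."""
    return [
        langs[target][1]
        for langs in ili_links.values()
        if source in langs and target in langs and langs[source][1] == synset
    ]


link("ili-behave", "es", "actuar-1")
link("ili-behave", "en", "behave-1")

print(translate("actuar-1", "es", "en"))  # ['behave-1']
```

Because the wordnets only meet through the index, each one can keep its own internal relations, which is the independence requirement stated earlier.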

Spanish WordNet: Building Process

Spanish WordNet: General Methodology
1) Mapping to WN1.5

manual work

automatic derivation of equivalents, using bi-lingual dictionaries

2) Manual correction

3) Re-structuring

Spanish WordNet: Main Steps: First Core (Manual Translation)
Nouns:

A) WN1.5’s Tops File plus first level of hyponyms (about 800 synsets).

B) The rest of EWN’s Common Base Concepts (which were not in our set).

C) Manual translation of synsets intermediate between (A) and (B) following the WN1.5 hierarchy, thus building a compact taxonomy equivalent to WN1.5 without gaps

Verbs:

Manual translation of EWN’s Base Concepts (about 150 synsets)

Spanish WordNet: Main Steps: Subset 1 (Semi-automatic)
Nouns:

Applying automatic methods using bilingual dictionaries

Manual validation of several subsets to check whether each link is correct

Deriving a Confidence Score (CS) for every automatic method (heuristic)

Selecting synset-word pairs with a CS above 85%

Some manual correction of this Subset 1 (mainly, filling gaps)

Verbs:

3600 English verbs connected to WN1.5 senses and ambiguously translated to Spanish are manually inspected and disambiguated
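The noun procedure above (validate a sample per method, derive a CS, keep pairs above 85%) could be sketched as follows. The method names, the validation data and the reading of CS as the fraction of validated links judged correct are assumptions for illustration.

```python
def confidence_score(validated):
    """Fraction of manually validated links of one method judged correct."""
    return sum(validated) / len(validated)


def select_pairs(candidates, scores, threshold=0.85):
    """Keep (synset, word) pairs proposed by a method whose CS is above the threshold."""
    return sorted({pair for method, pair in candidates if scores[method] > threshold})


# Toy manual validation results per method (True = link judged correct)
validation = {
    "method_a": [True] * 9 + [False],          # CS = 0.90
    "method_b": [True, False, True, False],    # CS = 0.50
}
scores = {method: confidence_score(v) for method, v in validation.items()}

# Toy candidate links proposed by each method
candidates = [
    ("method_a", ("synset-1", "casa")),
    ("method_b", ("synset-2", "perro")),
    ("method_a", ("synset-3", "gato")),
]
selected = select_pairs(candidates, scores)
print(selected)  # [('synset-1', 'casa'), ('synset-3', 'gato')]
```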

Spanish WordNet: Main Steps: Subset 2
Main goals

enhance the quality of the Subset 1 by manual revision

extend it by manual building of synsets

4 Sub-tasks

1) Covering manually those gaps in the hyponymy chains covered by other languages

2) Manual cleaning of some automatically-generated variants.

(a) pairs of synsets which are adjacent in the hyponymy chain and share at least one variant.

deleting redundant variants

re-locating to either pre-existing or newly created synsets

(b) multi-word expressions present in synsets.

deleting non-lexicalized expressions

3) Manual addition of new vocabulary which has been considered relevant.

It mainly comes from the Catalan WordNet: since we are building both wordnets in parallel, we detected those synsets which were built for Catalan and not for Spanish

4) Manual addition of cross-part-of-speech relations between nominal and verbal synsets.

This work has been based mainly on noun-verb pairs obtained by means of morphological criteria. (Work carried out by UNED, Madrid.)

Spanish WordNet: Main Steps: Beyond Subset 2
Massive Manual Checking (from Nov’98)

Using WEI

Variants automatically generated

Filling gaps in the hierarchy

New vocabulary

New Adjectives

Combining Multiple Methods for the Automatic Construction of Multilingual WordNets

Combining Multiple Methods: Outline
Ten class methods

Four monosemic criteria

Four polysemic criteria

Two hybrid criteria

Three conceptual distance methods

CD1: using pairwise word co-occurrences

CD2: using headword and genus

CD3: using bilingual Spanish entries with multiple translations
Combining Multiple Methods: Ten class methods
Four monosemic criteria

[Figure: one diagram per criterion, each linking Spanish words (SW) and English words (EW) to a single Synset]
Combining Multiple Methods: Ten class methods
Four polysemic criteria

[Figure: one diagram per criterion, each linking Spanish words (SW) and English words (EW) to nodes labelled Synset+]
Combining Multiple Methods: Ten class methods
Variant criterion

<..., EW, ..., EW, ...> SW

Field criterion

<..., headword-EW, ..., Ind-EW, ...> SW
Combining Multiple Methods: Conceptual Distance methods
Conceptual Distance (Agirre et al. 94)

length of the shortest path

specificity of the concepts

  • using WordNet
  • Bilingual dictionary
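The slide names only the two ingredients of conceptual distance (path length and concept specificity). A sketch of one formulation that combines them, assumed here, sums 1/depth over the concepts on the shortest path, so that a path of the same length counts as shorter when it lies deeper in the hierarchy. The toy hierarchy and function names are illustrative.

```python
# Toy hypernym tree (child -> parent); assumed, not the real WordNet structure
parent = {
    "entity": None,
    "object": "entity",
    "artifact": "object",
    "structure": "artifact",
    "building": "structure",
    "house": "building",
    "church": "building",
}


def ancestors(concept):
    """The concept followed by its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = parent[concept]
    return chain


def depth(concept):
    """Specificity proxy: the root has depth 1."""
    return len(ancestors(concept))


def path(c1, c2):
    """Concepts on the shortest path between c1 and c2 in the tree."""
    up1, up2 = ancestors(c1), ancestors(c2)
    common = next(c for c in up1 if c in up2)  # lowest common ancestor
    return up1[: up1.index(common) + 1] + up2[: up2.index(common)][::-1]


def conceptual_distance(c1, c2):
    """Sum of 1/depth over the path: deeper (more specific) paths are shorter."""
    return sum(1.0 / depth(c) for c in path(c1, c2))


print(path("house", "church"))  # ['house', 'building', 'church']
print(conceptual_distance("house", "church") < conceptual_distance("object", "structure"))  # True
```

Both paths in the usage lines contain three concepts, but the deeper one gets the smaller distance.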
Combining Multiple Methods: Conceptual Distance methods (Example CD2)

abadía_1_2 Iglesia o monasterio regido por un abad o abadesa

(abbey, a church or a monastery ruled by an abbot or an abbess)

[Figure: WordNet fragment with the candidate synsets. Nodes shown: <entity>, <object, ...>, <artifact, artefact>, <structure, construction>, <building, edifice>, <place of worship, ...>, <church, church building>, <house, lodging>, <religious residence, cloister>, <monastery>, <convent> and three <abbey> senses]

[Figure: the same WordNet fragment as on the previous slide, now with one sense labelled <abbey> 06 ARTIFACT]

Mapping Conceptual Hierarchies Using Relaxation Labelling

Mapping Conceptual Hierarchies using Relaxation Labelling: Outline
Setting

Relaxation Labelling Algorithm

Constraints

Experiments & Results I (multilingual)

Experiments & Results II (monolingual)

Further work

Mapping Conceptual Hierarchies using Relaxation Labelling: Setting
Connecting already existing hierarchies

Relaxation Labelling Algorithm

Constraints

Between

Spanish taxonomy automatically derived from an MRD (Rigau et al. 98)

WordNet

using a bilingual MRD


animal

(Tops <animal, animate_being, ...>)

(person <beast, brute, ...>)

(person <dunce, blockhead, ...>)

ave

(animal <bird>)

(artifact <bird, shuttle, ...>)

(food <fowl, poultry, ...>)

(person <dame, doll, ...>)

faisán

(animal <pheasant>)

(food <pheasant>)

rapaz

(animal <bird>)

(artifact <bird, shuttle, ...>)

(food <fowl, poultry, ...>)

(person <dame, doll, ...>)

Mapping Conceptual Hierarchies using Relaxation Labelling: Relaxation Labelling Algorithm
Iterative algorithm for function optimization based on local information

it can deal with any kind of constraints

variables (senses of the taxonomy)

labels (synsets)

Finds a weight assignment for each possible label for each variable

weights for the labels of the same variable add up to one

the weight assignment satisfies, to the maximum possible extent, the set of constraints

1) Start with a random weight assignment

2) Compute the support value for each label of each variable (according to the constraints)

3) Increase the weights of the labels more compatible with the context and decrease those of the less compatible labels

4) If a stopping/convergence criterion is satisfied, stop; otherwise go to step 2
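The four steps can be sketched as below. The update rule (multiply each weight by 1 + support, then renormalize) is one common choice and is an assumption here, as are the toy taxonomies, the `support` function and the convergence threshold.

```python
import random

# Toy data (assumed): candidate synsets per Spanish sense
labels = {
    "animal": ["Tops:animal", "person:beast"],
    "ave": ["animal:bird", "artifact:bird"],
}
spanish_hypernym = {"ave": "animal"}                  # Spanish sense -> its hypernym
english_hypernym = {("animal:bird", "Tops:animal")}   # (hyponym synset, hypernym synset)


def support(var, label, weights):
    """Weight mass of neighbouring labels that keep the hypernymy link in both taxonomies."""
    total = 0.0
    for child, par in spanish_hypernym.items():
        if var == child:
            total += sum(w for m, w in weights[par].items() if (label, m) in english_hypernym)
        if var == par:
            total += sum(w for m, w in weights[child].items() if (m, label) in english_hypernym)
    return total


def relax(labels, support, iterations=50, seed=0):
    rng = random.Random(seed)
    # 1) random initial weights, normalized so each variable's weights add up to one
    weights = {}
    for var, labs in labels.items():
        raw = [rng.random() + 0.1 for _ in labs]
        weights[var] = {l: r / sum(raw) for l, r in zip(labs, raw)}
    for _ in range(iterations):
        # 2) support of every label under the current weights
        s = {v: {l: support(v, l, weights) for l in labs} for v, labs in labels.items()}
        # 3) raise supported labels relative to the others, then renormalize
        new = {}
        for v, labs in labels.items():
            scaled = {l: weights[v][l] * (1.0 + s[v][l]) for l in labs}
            total = sum(scaled.values())
            new[v] = {l: x / total for l, x in scaled.items()}
        # 4) stop once no weight changes noticeably
        change = max(abs(new[v][l] - weights[v][l]) for v in labels for l in labels[v])
        weights = new
        if change < 1e-6:
            break
    return weights


result = relax(labels, support)
best = {v: max(w, key=w.get) for v, w in result.items()}
print(best)
```

Each decision uses only the neighbouring senses, yet the mutually consistent pair of labels reinforces itself across iterations.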

Mapping Conceptual Hierarchies using Relaxation Labelling: Constraints
Rely on the taxonomy structure

Coded with three characters XYZ

X: Spanish taxonomy, I (immediate) or A (ancestor)

Y: English taxonomy, I (immediate) or A (ancestor)

Z: Relation, E (hypernym), O (hyponym), B (both)

Examples: IIE, AAB

[Figure: diagrams of the IIE and AAB constraints]
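A short sketch of the three-character coding, assuming the first two characters range over I/A (Spanish and English taxonomy) and the third over E/O/B. The function and dictionary names are illustrative.

```python
SCOPE = {"I": "immediate", "A": "ancestor"}
RELATION = {"E": "hypernym", "O": "hyponym", "B": "both"}


def decode(code):
    """Expand a constraint code such as 'IIE' into its three components."""
    x, y, z = code
    return {"spanish": SCOPE[x], "english": SCOPE[y], "relation": RELATION[z]}


# Every combination of the three characters
all_codes = [x + y + z for x in "IA" for y in "IA" for z in "EOB"]

print(decode("IIE"))
# {'spanish': 'immediate', 'english': 'immediate', 'relation': 'hypernym'}
print(len(all_codes))  # 12
```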

Mapping Conceptual Hierarchies using Relaxation Labelling: Hierarchical Constraints (NAACL’2001)
II Constraints: IIE, IIO, IIB

AI Constraints: AIE, AIO, AIB

IA Constraints: IAE, IAO, IAB

AA Constraints: AAE, AAO, AAB

[Figure: one diagram per constraint]

Combining Multiple Methods (RANLP’97): Eight class methods
Four monosemic criteria

[Figure: the four monosemic-criterion diagrams (SW, EW, Synset), annotated with Prec. and Cov. values, in order of appearance: 85% 4%, 92% 5%, 89% 1%, 89% 2%]
Combining Multiple Methods (RANLP’97): Eight class methods
Four polysemic criteria

[Figure: the four polysemic-criterion diagrams (SW, EW, Synset+), annotated with Prec. and Cov. values, in order of appearance: 80% 8%, 75% 2%, 58% 17%, 61% 60%]
Combining Multiple Methods (RANLP’97): Experiments & Results

Poly             TOK, FOK     TOK, FNOK    total
animal           279 (90%)    30 (91%)     209 (90%)
food             166 (94%)    3 (100%)     169 (94%)
cognition        198 (67%)    27 (90%)     225 (69%)
communication    533 (77%)    40 (97%)     573 (78%)

All              TOK, FOK     TOK, FNOK    total
animal           424 (93%)    62 (95%)     486 (90%)
food             166 (94%)    83 (100%)    249 (96%)
cognition        200 (67%)    245 (90%)    445 (82%)
communication    536 (77%)    234 (97%)    760 (81%)

piel

(substance <skin, fur, peel>)

marta

(substance <sable, marte, coal_back>)

visón

(substance <mink, mink_coat>)

A Complete WN1.5 to WN1.6 Mapping ... (ACL’00, NAACL’01): Generalized Constraints
All Relationships

also-see, similar-to, attribute, antonym, etc.

[Figure: constraint diagrams for a generic relation R]

A Complete WN1.5 to WN1.6 Mapping ... (ACL’00, NAACL’01): Generalized Constraints
Non-structural constraints

W: number of word coincidences

G: word coincidences in glosses

F: number of frame coincidences (verbs)

A Complete WN1.5 to WN1.6 Mapping ... (ACL’00, NAACL’01): POS mapping dependences

[Figure: dependences among the mappings of Nouns, Adjectives, Adverbs and Verbs]

A Complete WN1.5 to WN1.6 Mapping ... (ACL’00, NAACL’01): Constraints for Verbs
Structural constraints

hyper/hyponymy

antonymy

also-see

Non-structural constraints

W, G and F

A Complete WN1.5 to WN1.6 Mapping ... (ACL’00, NAACL’01): Constraints for Adjectives
Structural constraints

Adj-to-Adj

antonymy, similar-to and also-see

Adj-to-Verb

participle-of

Adj-to-Noun

pertains and attribute

Non-structural constraints

W and G

A Complete WN1.5 to WN1.6 Mapping ... (ACL’00, NAACL’01): Constraints for Adverbs
Structural constraints

Adv-to-Adv

antonymy

Adv-to-Adj

derived

Non-structural constraints

W and G

A Complete WN1.5 to WN1.6 Mapping ... (ACL’00, NAACL’01): Example extra-POS

[Figure: adjective example with WN1.5 and WN1.6 columns. Synsets shown: 00843344a <evangelical evangelistic>, 02025107a <evangelical evangelistic>, 00842521a <enthusiastic>, 02025107a <evangelical>, 04237485n <Gospel Gospels evangel>, 04853575n <Gospel Gospels evangel>. Relations shown: similar to, pertainym (twice)]

A Complete WN1.5 to WN1.6 Mapping ... (ACL’00, NAACL’01): Example extra-POS

[Figure: adverb example with WN1.5 and WN1.6 columns. Synsets shown: 00057615r <impossibly absurdly>, 00294844r <impossibly>, 01393725a <impossible>, 01752468a <impossible>, 00294658a <possibly>. Relations shown: derived from (twice), antonym]

A Complete WN1.5 to WN1.6 Mapping ... (ACL’00, NAACL’01): Results

  • Basic constraint set: structural constraints
    • Nouns: AA hyper/hyponym
    • Verbs: AA hyper/hyponym, II also-see
    • Adjectives: II antonymy, similar-to, also-see
    • Adverbs: II antonymy
A Complete WN1.5 to WN1.6 Mapping ... (ACL’00, NAACL’01): Results

  • Basic constraint set: structural constraints

Ranges are precision - recall.

     Coverage   Ambiguous        Overall
N    99.7%      94.9% - 99.6%    97.6% - 99.8%
V    96.9%      93.5% - 99.2%    94.6% - 99.2%
A    94.1%      82.8% - 98.9%    89.5% - 99.4%
R    80.8%      97.5% - 100%     99.0% - 100%

A Complete WN1.5 to WN1.6 Mapping ... (ACL’00, NAACL’01): Results

  • Basic constraint set + W, G and F for verbs

Ranges are precision - recall.

     Coverage   Ambiguous        Overall
N    99.9%      97.5% - 97.7%    98.8% - 98.9%
V    99.8%      99.4% - 99.7%    99.3% - 99.6%
A    98.9%      96.5% - 98.8%    97.9% - 99.3%
R    99.5%      97.5% - 100%     99.0% - 100%

A Complete WN1.5 to WN1.6 Mapping ... (ACL’00, NAACL’01): Results

  • Basic + extra-POS relationships

Ranges are precision - recall.

     Coverage   Ambiguous        Overall
N    -          -                -
V    -          -                -
A    95.8%      95.8% - 98.9%    90.9% - 99.4%
R    88.0%      69.2% - 94.2%    97.9% - 98.1%

A Complete WN1.5 to WN1.6 Mapping ... (ACL’00, NAACL’01): Results

  • Basic + extra-POS relationships + WGF

Ranges are precision - recall.

     Coverage   Ambiguous        Overall
N    99.9%      97.5% - 97.7%    98.8% - 98.9%
V    99.8%      99.4% - 99.7%    99.3% - 99.6%
A    99.0%      96.5% - 99.1%    97.9% - 99.5%
R    99.6%      98.3% - 100%     99.3% - 100%

Mapping Conceptual Hierarchies using Relaxation Labelling: Conclusions
First complete mapping between WordNet versions

Combining structural and non-structural information

Robust approach based on local information, but with global effects

Incremental POS approach

http://www.lsi.upc.es/~nlp

90 downloads (since November 2000)

Mapping Conceptual Hierarchies using Relaxation Labelling: Further Work
mapping other structures

WN-EDR, WN-LDOCE, etc.

Other language taxonomies to EuroWordNet

SpanishEWN to WN1.6

symmetrical philosophy rather than source-target

Mikrokosmos

Mikrokosmos: Outline
Introduction

Representational Issues

The Lexicon

The Ontology

Acquisition Process

Lexicon Acquisition

Guidelines

Ontology/Lexicon Trade-off

Semantics in Action

Mikrokosmos: Introduction
Knowledge-Based Machine Translation (KBMT)

CRL, NMSU

5,000 concepts

Events

Objects

Properties

7,000 Spanish word senses

40,000 word senses

after expansion with productive Lexical Rules

comprar -> comprador, comprable, ...

Text Meaning Representation

Mikrokosmos: Representational Issues: The Lexicon
Typed Feature Structures (Pollard and Sag 87)

language-dependent

10 zones

phonology

orthography

morphology

Syntactic (subcategorization)

Semantic (Lexical Semantic Representation)

syntax-semantic linking

stylistics

paradigmatic

syntagmatic
Adquirir-V1

syn: subj: cat: NP

obj: cat: NP

sem: acquire

agent: HUMAN

theme: OBJECT

Adquirir-V2

syn: subj: cat: NP

obj: cat: NP

sem: acquire

agent: HUMAN

theme: INFORMATION
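The two entries above can be read as nested feature structures. A rough sketch with plain dictionaries follows; the key names mirror the slide, while the `concept` key and the helper `senses_for_theme` are hypothetical.

```python
LEXICON = {
    "adquirir-V1": {
        "syn": {"subj": {"cat": "NP"}, "obj": {"cat": "NP"}},
        "sem": {"concept": "acquire", "agent": "HUMAN", "theme": "OBJECT"},
    },
    "adquirir-V2": {
        "syn": {"subj": {"cat": "NP"}, "obj": {"cat": "NP"}},
        "sem": {"concept": "acquire", "agent": "HUMAN", "theme": "INFORMATION"},
    },
}


def senses_for_theme(theme_type):
    """Entries whose semantic zone accepts a theme of the given type."""
    return [name for name, entry in LEXICON.items() if entry["sem"]["theme"] == theme_type]


print(senses_for_theme("INFORMATION"))  # ['adquirir-V2']
```

The syntactic zones are identical, so only the selectional restriction on the theme separates the two senses.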

Mikrokosmos: Representational Issues: The Ontology
Taxonomic, multi-hierarchical

14 local or inherited links on average

language-impartial

EVENTS, OBJECTS, PROPERTIES

Methodology & Guidelines

ACQUIRE

DEFINITION “The transfer of possession event where the agent transfers an object to its possession”

IS-A TRANSFER-POSSESSION

SOURCE HUMAN PLACE

THEME OBJECT (NOT HUMAN)

AGENT ANIMAL (DEFAULT HUMAN)

DESTINATION ANIMAL PLACE (DEFAULT HUMAN)

INHERITED

BENEFICIARY HUMAN

Mikrokosmos: Acquisition Process: The Lexicon
Multi-lingual

French, English, Japanese, Russian, Spanish, etc.

Multi-media

Multi-process

Analysis

Generation (mono and multilingual)

MT

Summarization

IE

Speech Processing

Tools

corpus-search, lookup dictionary, ontology browser

Mikrokosmos: Acquisition Process: The Ontology
Guidelines

1) Do not add instances as concepts

Instances do not have their own instances

Concepts do not have fixed position in space/time

2) Do not decompose concepts further

3) Use close concepts

4) Do not add EVENTs with particular arguments

5) Do not add concepts with instance-specific aspects,

temporal relations

6) Do not add language-specific concepts

7) Do not add ontological concepts for collections

Mikrokosmos: Acquisition Process: Ontology/Lexicon Trade-off
Daily negotiations

lexicon acquirers

ontology acquirers

Possibilities

one-to-one mapping

lexicon underspecification

lexicon ontology balance

One-to-one mapping

Problems

Lexical: every word in a language is a concept

Conceptual: cuire in French is not ambiguous

PREPARE-FOOD

INST: COOKING-EQUIPMENT

COOK

INST: STOVE

BAKE

INST: OVEN

cook : cuire sur le feu

bake : cuire au four

Lexicon Underspecification

Problems

BAKE is not in the ontology

PREPARE-FOOD

INST: COOKING-EQUIPMENT

bake : cuire au four

INST: OVEN

cook : cuire sur le feu

Lexicon-Ontology Balance

PREPARE-FOOD

INST: COOKING-EQUIPMENT

BAKE

INST: OVEN

FRY

INST: STOVE

INST: FRYING-PAN

cook : cuire

bake

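Mikrokosmos: Semantics in Action
El grupo Roche, a través de su compañía en España, adquirió Doctor Andreu. (The Roche group, through its company in Spain, acquired Doctor Andreu.)

El grupo Roche adquirió Doctor Andreu a través de su compañía en España. (The Roche group acquired Doctor Andreu through its company in Spain.)

La adquisición de Doctor Andreu por el grupo Roche fue hecha a través de su compañía en España. (The acquisition of Doctor Andreu by the Roche group was made through its company in Spain.)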

ACQUIRE-1 Agent: ORGANIZATION-1

Theme: ORGANIZATION-2

Instrument: ORGANIZATION-3

ORGANIZATION-1 Object-Name: Grupo Roche

ORGANIZATION-2 Object-Name: Doctor Andreu

ORGANIZATION-3 Location: España

Onto-Search: Ontological search mechanism to check constraints

check-onto(ACQUIRE, EVENT) = 1

since ACQUIRE is a type of EVENT

check-onto(ORGANIZATION, HUMAN) = 0.9

since ORGANIZATION HAS-MEMBER HUMAN
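A sketch of a constraint check in the spirit of the two examples above. Only the scores 1 and 0.9 and the ACQUIRE IS-A TRANSFER-POSSESSION link come from the slides; the rest of the toy ontology, the score 0.0 for unrelated concepts and the function names are assumptions.

```python
# Toy ontology links (partly assumed)
IS_A = {
    "ACQUIRE": "TRANSFER-POSSESSION",
    "TRANSFER-POSSESSION": "EVENT",  # assumed direct link
}
HAS_MEMBER = {"ORGANIZATION": "HUMAN"}


def is_a(concept, target):
    """True if `target` is reachable from `concept` through IS-A links."""
    while concept is not None:
        if concept == target:
            return True
        concept = IS_A.get(concept)
    return False


def check_onto(concept, constraint):
    """Score how well `concept` satisfies the selectional `constraint`."""
    if is_a(concept, constraint):
        return 1.0
    member = HAS_MEMBER.get(concept)
    if member is not None and is_a(member, constraint):
        return 0.9  # satisfied indirectly through HAS-MEMBER
    return 0.0      # assumed score when no link is found


print(check_onto("ACQUIRE", "EVENT"))       # 1.0
print(check_onto("ORGANIZATION", "HUMAN"))  # 0.9
```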

1) a-través-de -> INSTRUMENT, LOCATION

adquirir requires PHYSICAL-OBJECT

2) en -> LOCATION, TEMPORAL

España is not a TEMPORAL-OBJECT

3) adquirir -> ACQUIRE, LEARN

Doctor Andreu is not an INFORMATION

4) Doctor Andreu -> ORGANIZATION, HUMAN

the Theme of ACQUIRE is not HUMAN

5) compañía -> CORPORATION, SOCIAL-EVENT

ORGANIZATIONs typically fill the INSTRUMENT slot of ACQUIRE acts

Mikrokosmos: Experiment: WSD

Text                1      2      3      4      Mean
words               347    385    370    353    364
words/sentence      16.5   24.0   26.4   20.8   21.4
open-class words    183    167    177    177    176
ambiguous words     57     42     57     35     48
syntax              21     19     20     12     18
correct             51     41     45     34     43
%                   97     99     93     99     97
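The slide does not define the % row. One reading that reproduces the figures, assumed here, is the share of open-class words that end up with the right sense: unambiguous words count as correct, plus the ambiguous words resolved correctly.

```python
# (open-class words, ambiguous words, correct) per text, copied from the table
texts = {1: (183, 57, 51), 2: (167, 42, 41), 3: (177, 57, 45), 4: (177, 35, 34)}


def accuracy(open_class, ambiguous, correct):
    """Percentage of open-class words with the right sense (assumed definition)."""
    return 100 * (open_class - ambiguous + correct) / open_class


percentages = [round(accuracy(*counts)) for counts in texts.values()]
print(percentages)  # [97, 99, 93, 99]
```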

Text                Mean   Mean Unseen
words               364    390
words/sentence      21.4   26
open-class words    176    104
ambiguous words     48     26
syntax              18     9
correct             43     23
%                   97     97

WordNet2

WordNet2: Outline
Introduction

Text Inferences

Defining Features

Plausible inferences

Inference Rules

Semantic Paths

What WordNet cannot do

WordNet2: Introduction
(Harabagiu 98)

Commonsense reasoning requires extensive knowledge

~100 million concepts and relations

WordNet

represents almost all English words

100,000 synsets

linked by semantic relations

WordNet2

each synset has a gloss that, when disambiguated, may increase the number of relations

WordNet glosses into semantic networks

NEW RELATIONS

WordNet2: Text Inferences
German was hungry

He opened the refrigerator

hungry (feeling a need or desire to eat)

eat (take in solid food)

refrigerator (an appliance in which foods can be stored at low temperature)

WordNet2: Defining Features
Transform each concept’s gloss into a graph where concepts are nodes and lexical relations are links

<culture> (all the knowledge shared by society)

<share> --AGENT--> <society>

<doctor> (licensed medical practitioner)

<medical practitioner> --ATTRIBUTE--> <licensed>

[Figure: gloss graph for <pilot>. Nodes shown: pilot, person, qualified, guide, ship, water, difficult. Link labels shown: GLOSS, ATTRIBUTE (twice), OBJECT, PURPOSE, LOCATION]

WordNet2: Inference Rules
Rule 1

VC1 IS-A VC2
VC2 IS-A VC3
-------------------------
VC1 IS-A VC3

Rule 2

VC1 IS-A VC2
VC2 ENTAIL VC3
-------------------------
VC1 ENTAIL VC3

Rule 3

VC1 IS-A VC2
VC2 R_IS-A VC3
-------------------------
VC1 PLAUSIBLE (not VC3)

Rule 4

VC1 IS-A VC2
VC2 R_ENTAIL VC3
-------------------------
VC1 EXPLAINS VC3

16 + 1 rules
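Each rule combines two chained relations into a new one, so a table keyed by the pair of relation names is enough for a sketch. The facts below are toy examples, not taken from WordNet, and the label `PLAUSIBLE-NOT` is an assumed encoding of "PLAUSIBLE (not VC3)".

```python
# (relation of premise 1, relation of premise 2) -> relation of the conclusion
RULES = {
    ("IS-A", "IS-A"): "IS-A",
    ("IS-A", "ENTAIL"): "ENTAIL",
    ("IS-A", "R_IS-A"): "PLAUSIBLE-NOT",
    ("IS-A", "R_ENTAIL"): "EXPLAINS",
}


def infer(facts):
    """One round of rule application over (VC1, relation, VC2) triples."""
    derived = set()
    for a, r1, b in facts:
        for b2, r2, c in facts:
            if b == b2 and (r1, r2) in RULES:
                derived.add((a, RULES[(r1, r2)], c))
    return derived - set(facts)


# Toy facts (assumed)
facts = {
    ("devour", "IS-A", "eat"),
    ("eat", "IS-A", "consume"),
    ("eat", "ENTAIL", "swallow"),
}
new_facts = infer(facts)
print(sorted(new_facts))
# [('devour', 'ENTAIL', 'swallow'), ('devour', 'IS-A', 'consume')]
```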

WordNet2: Semantic Paths
0) Create and load the KB

1) Place markers on KB concepts

2) Propagate markers

The algorithm avoids cycles

3) Detect collisions

Each marker collision corresponds to a path

4) Extract Inferences
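Steps 1 to 3 can be sketched as two breadth-first propagations that meet at shared concepts. The toy knowledge base below loosely follows the hungry/refrigerator glosses; its links and the function names are assumptions.

```python
from collections import deque

# Toy KB: concept -> concepts reachable through its gloss (assumed links)
graph = {
    "hungry": ["eat"],
    "eat": ["food"],
    "refrigerator": ["appliance", "food"],
}


def propagate(graph, source):
    """Spread a marker from `source`; returns concept -> path from the source."""
    paths = {source: [source]}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in paths:  # already marked: skip, which avoids cycles
                paths[nxt] = paths[node] + [nxt]
                queue.append(nxt)
    return paths


def collisions(graph, a, b):
    """Concepts reached by both markers, each with the path joining a and b."""
    pa, pb = propagate(graph, a), propagate(graph, b)
    return {c: pa[c] + pb[c][::-1][1:] for c in pa if c in pb}


found = collisions(graph, "hungry", "refrigerator")
print(found)  # {'food': ['hungry', 'eat', 'food', 'refrigerator']}
```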

Inference sequence

German was hungry

German felt a desire to eat

German felt a desire to take in food

COLLISION: German=he felt a desire to take food, stored in an appliance, which he opened

He opened an appliance where food is stored

He opened the refrigerator

WordNet2: What WordNet cannot do
Major WordNet limitations:

1) The lack of compound concepts

2) The small number of causation and entailment relations

3) the lack of preconditions for verbs

4) the absence of case relations

ThoughtTreasure

ThoughtTreasure: Overview
A comprehensive platform for

NLP (English, French)

commonsense reasoning

A hotel room has a bed, night table, ...

People have fingernails

soda is a drink

one hangs up at the end of a phone call

the sky is blue

dogs bark

someone who is 16 years old is a teenager

25,000 concepts organized into a hierarchy

EVIAN -> FLAT-WATER -> DRINKING-WATER

55,000 words (English, French)

food <-> aliment <-> FOOD

50,000 assertions about concepts

green-pea is green

100 scripts

Text Agents for recognizing names, phones, etc

mechanisms for learning new words

X-phile is someone who likes X

a syntactic parser

a NL generator

a semantic parser

an anaphoric parser

planning agents for achieving goals

understanding agents

ThoughtTreasure: Example
Who created Bugs Bunny?

1.0 (create human-interrogative-pronoun Bugs-Bunny)

0.9 (create rock-group-the-Who Bugs-Bunny)

1.0 (create Tex-Avery Bugs-Bunny)

0.1 (not (create rock-group-the-Who Bugs-Bunny))

Meaning

Meaning: Overview
Knowledge bases

Automatic enrichment of EWN (verbal models, etc.)

Mixed approach (KB + ML)

Q/A

Problem

structural and lexical ambiguity

Approach

automatically locating sense examples (Leacock et al. 98, Mihalcea and Moldovan 99)

large-scale WSD (Boosting, SVM, transductive methods …)

Knowledge acquisition (Ribas 95, McCarthy 01)
Meaning: Exploiting EWN Semantic Relations

[Figure: fragment of the Spanish EWN hierarchy. Synsets shown: <evento>, <evento social>, <competición, concurso>, <partido_1>, <semifinal>, <cuartos_de_final>, <agrupación grupo colectivo>, <grupo_social>, <organización>, <partido_2, partido_político>, <partido_laborista>]
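partido 1

Todos los partidos piden reformas legales para TV3. (All parties call for legal reforms for TV3.)

La derecha planea agruparse en un partido. (The right plans to group itself into one party.)

El diputado reiteró que ni él ni UDC, “como partido”, han recibido dinero de Pellerols. (The deputy reiterated that neither he nor UDC, “as a party”, has received money from Pellerols.)

partido 2

Pero España puso al partido intensidad, ritmo y coraje. (But Spain brought intensity, rhythm and courage to the match.)

El seleccionador cree que el partido de hoy contra Italia dará la medida de España. (The coach believes that today's match against Italy will show Spain's true level.)

El Racing no gana en su campo desde hace seis partidos. (Racing has not won at home for six matches.)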

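partido 1

No negociaremos nunca con un partido político que sea partidario de la independencia de Taiwan. (We will never negotiate with a political party that favours the independence of Taiwan.)

Una vez más es noticia la desviación de fondos destinados a la formación ocupacional hacia la financiación de un partido político. (Once again the diversion of funds intended for occupational training towards the financing of a political party is in the news.)

Estas leyes fueron votadas gracias a un consenso general de los partidos políticos. (These laws were passed thanks to a general consensus among the political parties.)

partido 2

Rivera pide el soporte de la afición para encarrilar las semifinales. (Rivera asks for the fans' support to get the semifinals on track.)

Sólo el equipo de Valero Ribera puede sentenciar una semifinal como lo hizo ayer en un Palau Blaugrana completamente entregado. (Only Valero Ribera's team can settle a semifinal the way it did yesterday in a completely devoted Palau Blaugrana.)

El Racing ganó los cuartos de final en su campo. (Racing won the quarter-finals at home.)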
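The second group of sentences contains no bare occurrence of the ambiguous word; each one contains a related, unambiguous expression instead. A sketch of harvesting sense examples that way is shown below. The relative lists, the sense keys and the substring matching are simplifying assumptions.

```python
# Unambiguous relatives of each sense (assumed, following the examples above)
relatives = {
    "partido 1": ["partido político"],
    "partido 2": ["semifinal", "cuartos de final"],
}


def harvest(corpus, relatives):
    """Assign each sentence to the senses whose relatives it mentions."""
    examples = {sense: [] for sense in relatives}
    for sentence in corpus:
        lowered = sentence.lower()
        for sense, phrases in relatives.items():
            if any(phrase in lowered for phrase in phrases):
                examples[sense].append(sentence)
    return examples


corpus = [
    "No negociaremos nunca con un partido político que sea partidario de la independencia de Taiwan.",
    "El Racing ganó los cuartos de final en su campo.",
]
examples = harvest(corpus, relatives)
print(examples["partido 2"])  # ['El Racing ganó los cuartos de final en su campo.']
```

Sentences collected this way can then serve as training material for the large-scale WSD step listed in the overview.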

Meaning: Architecture

[Figure: architecture diagram. Components shown: a Web Corpus, a WSD module, an ACQ module and an EWN for each of English, Italian, Spanish, Catalan and Basque, connected through UPLOAD and PORT arrows to a Multilingual Central Repository]