Lexicons …

and Complex Expressions:

towards Multilingual Linking

Nicoletta Calzolari

Copenhagen, October 2001

Copenhagen, Oct. 2001

What is SIMPLE?

A set of 12 harmonised computational lexicons for HLT applications, geared for multilingual links

A common

    • rich model
    • representation language
    • methodology of building the lexicon

Common Template Types, with default obligatory info (type-defining) and an indication of optional info

  • First time on a large scale, for so many languages:
    • Lexical meaning represented in terms of integrated combinations of different sorts of information (semantic type, argument structure, relations, features, etc.)
    • Ontology-based information comes together with predicative representation and syntactic linking
  • A shared set of about 700 SemUs (from EWN), cross-lingually related across the 12 lexicons

PAROLE/SIMPLE Architecture + CLIPS Italian National Project

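[Diagram: morphological units (MuS), syntactic units (SynU) and semantic units (SemU); each SemU carries semantic information (Sem Info) organised by a TEMPLATE, with Sem. Rel, Sem. Feat and Lexical Rel. Figures shown: 60,000 lemmas; 55,000 lemmas; 55,000 SemUs.]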

Semantic information in SIMPLE

Word senses are encoded as Semantic Units (SemUs), containing the following info:

  • Semantic type *
  • Domain *
  • Lexicographic gloss *
  • Extended Qualia structure
  • Regular polysemy alternations
  • Event type
  • Derivation relations
  • Synonymy
  • Collocations
  • Argument structure for predicative SemUs *
  • Selection restrictions on the arguments *
  • Link of the arguments to the syntactic subcategorization frames (represented in the PAROLE lexicons) *
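
The information listed for a SemU can be sketched as a small record type. This is only an illustration, not the SIMPLE encoding: the class name, field names and the `look_1` identifier are hypothetical, and treating the starred items as the obligatory ones is an assumption, since the slide does not say what the asterisk means. The sample values are taken from the "Perception" template in this deck.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SemU:
    """One word sense. Field names are illustrative, not the SIMPLE DTD."""
    semu_id: str
    semantic_type: str   # starred on the slide
    domain: str          # starred on the slide
    gloss: str           # starred on the slide
    qualia: Dict[str, List[str]] = field(default_factory=dict)
    event_type: Optional[str] = None
    synonyms: List[str] = field(default_factory=list)
    collocations: List[str] = field(default_factory=list)
    predicate: Optional[str] = None  # only for predicative SemUs
    selection_restrictions: Dict[str, str] = field(default_factory=dict)

    def missing_starred(self) -> List[str]:
        """Return the starred fields left empty (assumed to be obligatory)."""
        starred = {
            "semantic_type": self.semantic_type,
            "domain": self.domain,
            "gloss": self.gloss,
        }
        return [name for name, value in starred.items() if not value]


look = SemU(
    semu_id="look_1",  # hypothetical identifier
    semantic_type="Perception",
    domain="General",
    gloss="osservare con attenzione",
    event_type="process",
    selection_restrictions={"arg0": "Animate", "arg1": "Entity"},
)
```

Calling `look.missing_starred()` gives an empty list here; an entry with an empty gloss would be reported by name.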

Semantic Multidimensionality and NLP

NLP tasks (IE, WSD, NP Recognition, etc.) need to access multidimensional aspects of word meaning, represented in SIMPLE with the Extended Qualia Relations:

  • Is_a_part_of: la pagina del libro (the page of the book)
  • Member_of: il difensore della Juventus (Juventus fullback)
  • Telic: il suonatore di liuto (the lute player)
  • Made_of: il tavolo di legno (the wooden table)

Overall Organization

[Diagram: a Type Ontology of 150 types; Template; Instantiation; language-specific lexicons (Italian, Catalan, Danish, Greek, ...). Each SemU holds a Pred. Layer (predicate, arguments, selection restrictions), Qualia, Derivation, Polysemy and Event Type.]

Semantic Information

The SIMPLE Way

The Core Ontology represents a first level of organization of the semantic type system

Each type is associated with a Template consisting of a cluster of information (relations, features, argument structure, event type, etc.) that defines the type

The information characterizing a Semantic Unit includes:

a. The type-defining information (associated with the template the SemU instantiates)

b. Additional information (other relations or features, selectional restrictions, terminology, cross-part-of-speech relations, polysemy, etc.)
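
A rough sketch of the two-part description above: the template supplies the type-defining information and the SemU adds its own. The `Template` class, the `instantiate` helper and the flat dictionary layout are assumptions made for illustration; the values are loosely based on the "Perception" template in this deck.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Template:
    """Type-defining information shared by every SemU of a semantic type."""
    name: str
    supertype: str
    defining: Dict[str, str] = field(default_factory=dict)


def instantiate(template: Template, lemma: str,
                additional: Dict[str, str]) -> Dict[str, str]:
    """Build a SemU as template information plus SemU-specific information.

    The additional part is applied last, so it can fix a value the
    template leaves optional.
    """
    semu = {
        "lemma": lemma,
        "template_type": template.name,
        "template_supertype": template.supertype,
    }
    semu.update(template.defining)  # a. type-defining information
    semu.update(additional)         # b. additional information
    return semu


perception = Template(
    name="Perception",
    supertype="Psychological_event",
    defining={"event_type": "process", "arg0": "Animate", "arg1": "Entity"},
)

look = instantiate(perception, "look",
                   {"intentionality": "yes", "domain": "General"})
```

Keeping the template separate from the entry means a change to the type-defining cluster is made once and reaches every SemU that instantiates it.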

Template

[Diagram: a Template combines Type System Coordinates, the Predicative Layer, Qualia Structure and Contextual/Polysemy Information; annotated with “redundancy”.]

Template for “Perception”

      • Verb Examples: hear, smell, etc.
      • Noun Examples: sight, look, etc.
      • Linguistic Tests: ….
      • Levin Class: 30.1 (See verb, e.g. detect, see, notice), 30.4 (Stimulus subject, e.g. look, smell)
      • Comments: Processes involving an experiencing relation, ….
  • SemU: 1 (look)
  • Usyn:
  • BC Number: 105
  • Template_Type: [Perception]
  • Template_Supertype: [Psychological_event]
  • Domain: General
  • Semantic Class: Perception
  • Gloss: //free// osservare con attenzione (to observe attentively)
  • Event type: process
  • Pred_Rep.: Lex_Pred (,)
  • Derivation:
  • Selectional Restr.: arg0 = Animate //concept// arg1:default = [Entity]
  • Formal: isa (1,:[Perception]>):[Psych_ev]
  • Agentive:
  • Constitutive: instrument(1, :[Body_part])
  • intentionality={yes,no} //optional//={yes}
  • Telic:
  • Collocates: Collocates (,...)
  • Complex:

Modular Representation of a SemU

SemU

  • Pred. Layer: predicate, arguments, selection restrictions, ...
  • Rel. Layer: relations between SemUs (Semantic Relations)
    • Qualia: multiple meaning dimensions in a sense
    • Derivation: cross-PoS relations
    • Polysemy: regular polysemous classes
    • Collocation: collocational information
  • Features

Flexibility: an extendable framework to allow coherent future extensions & tuning for specific applications/text types

Semantic Relations

[Diagram: a hierarchy of about 100 relations. Under Top sit the four qualia roles Formal, Constitutive, Agentive and Telic; the relations shown include Is_a, Is_a_part_of, Property, Contains, Created_by, Agentive_cause, Indirect_telic, Purpose, Activity, Instrumental, Is_the_habit_of, Used_for, Used_as, ...]

  • The targets of relations identify:
    • prototypical semantic information associated with a SemU
    • elements of dictionary definitions of SemUs
    • typical corpus collocates of the SemU

Semantic Relations

Ala (wing)

  • SemU: 3232, Type: [Part], Part of an airplane
  • SemU: 3268, Type: [Part], Part of a building
  • SemU: D358, Type: [Body_part], Organ of birds for flying
  • SemU: 3467, Type: [Role], Role in football

[Diagram: relation arcs from the four SemUs to their targets: Agentive (make), Used_for (fly), Is_a_part_of (airplane, building, bird), Isa (part, player).]
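
One way to picture how such relations could help separate the senses of "ala" is to store them as triples and count which sense's relation targets occur in a context. This is a toy sketch: the SemU ids and relation labels come from the slide, but the assignment of each arc to a sense is an assumption read off the glosses, and `senses_related_to` is a hypothetical helper, not a SIMPLE tool.

```python
from typing import Dict, List, Set, Tuple

# (SemU id, relation, target). The arc-to-sense assignment is assumed.
TRIPLES: List[Tuple[str, str, str]] = [
    ("3232", "Isa", "part"),
    ("3232", "Is_a_part_of", "airplane"),
    ("3232", "Used_for", "fly"),
    ("3232", "Agentive", "make"),
    ("3268", "Isa", "part"),
    ("3268", "Is_a_part_of", "building"),
    ("D358", "Is_a_part_of", "bird"),
    ("D358", "Used_for", "fly"),
    ("3467", "Isa", "player"),
]


def senses_related_to(context_words: Set[str]) -> List[str]:
    """Return the senses whose relation targets overlap most with the context."""
    scores: Dict[str, int] = {}
    for semu, _relation, target in TRIPLES:
        scores.setdefault(semu, 0)
        if target in context_words:
            scores[semu] += 1
    best = max(scores.values())
    # Senses with no overlap at all are never returned.
    return sorted(s for s, n in scores.items() if n == best and n > 0)
```

A context containing both "bird" and "fly" selects only the body-part sense, while "fly" alone leaves the airplane and bird senses tied.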

Relations and Predicates

[Diagram: the predicate Pred_SELL linked to three SemUs: Sell (V); Seller (N), via Is_the_agent_of; Sale (N), an Event_noun.]

Argument Structure

  • Comprendere (V), SemU: 61725, Type: [Cognitive_event], To understand
  • Comprendere (V), SemU: 6962, Type: [Constitutive_state], To include
  • Comprensione (N), SemU: 61726, Type: [Cognitive_event], Understanding

[Diagram: the SemUs are linked to the predicates Comprendere#1 and Comprendere#2 through “master” and verb_nominalization arcs.]

Problems with selection restrictions!

SIMPLE/CLIPS figures (now)

(11,000 Lex. Units) 16,903 SemUs

  • Nouns: 12161
  • Verbs: 3476
  • Adjectives: 1266
  • Predicates: 4368
  • Templates
    • Instrument 734
    • Human 712
    • PsychologicalProperty 586
    • Profession 541
    • Purpose_Act 535
    • Part 503
    • Human_Group 502
    • Relational_Act 521
    • AgentTemporaryActivity 320
    • Domain 303
  • Features & Relations
    • Agentive 1945
    • EventTypeProcess 1846
    • EventTypeTransition 1463
    • AgentiveCause 1175
    • Usedfor 1488
    • Synonym 1258
    • ResultingState 1197
    • Isapartof 909
    • Hasaspart 800
    • Istheactivityof 611
    • Objectoftheactivity 598
    • AntonymGrad 575
    • Createdby 525
    • Agentverb 454
    • Concerns 421
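
A quick consistency check on the figures: the noun, verb and adjective counts add up to the stated SemU total. The variable names are illustrative only.

```python
# Part-of-speech counts as given on the slide.
counts = {"Nouns": 12161, "Verbs": 3476, "Adjectives": 1266}

# The sum should match the 16,903 SemUs quoted in the heading line.
total_semus = sum(counts.values())

# Share of each part of speech, as a percentage rounded to one decimal.
shares = {pos: round(100 * n / total_semus, 1) for pos, n in counts.items()}
```

Nouns account for roughly 72% of the encoded senses, which fits the weight the rest of the deck gives to nominal phenomena.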

Core Lexicons enlarged in National Projects

PAROLE/SIMPLE/EWN start providing the common platform

  • Following the subsidiarity concept, the process started at the EU level is continued at the national level:

extended in (at least) 9 National Projects

(Danish, Greek, Italian, Portuguese, Swedish, ...)

(to be) used in applications

True Infrastructure of harmonised LRs in EU

Basis for Multilingual LR

ENABLER (coord. A. Zampolli)

Harmonisation: Need for a Global View
  • Interaction/sharing of data & software/tools
  • Need for compatibility among various components
  • An “exemplary cycle”:

[Diagram: Lexicon and Corpora (Representation, Annotation) in a cycle with Formalisms, Grammars, Software (Taggers, Chunkers, Parsers), Software (Acquisition Systems), I/O Interfaces and Languages.]

SIMPLE wrt EAGLES/ISLE: Standards for Multilingual Lexical Resources

[Diagram: EAGLES guidelines for syntactic and semantic lexicons; PAROLE/SIMPLE Lexicons; MT systems; ISLE recommendations for multilingual lexicons; Multilingual Lexicons.]

Mission (http://lingue.ilc.pi.cnr.it/EAGLES96/isle/ISLE_Home_Page.htm)
  • MT and multilingual HLT need to enhance production, maintenance & extension of computational lexical resources
  • ISLE goals
    • provide a common environment for the development, integration, interchange & sharing of lexical resources with various types of linguistic information
    • establish a virtuous circle betw. research, applications, & standardization process: lay down a bridge betw. the worlds of research and application
    • mark the boundary between well-consolidated practice and theoretical achievements in multilingual HLT, and areas still open to research but critical for future technological improvements
  • Crucial role of intercontinental cooperation for preparing ISLE recommendations and for their validation

ISLE and MT
  • Academic and industrial members of the MT community actively involved in the ISLE group
    • Microsoft, NMSU, Sail Labs, Systran, UMIACS, UPenn, ISI, etc.
  • Survey phase:
    • a number of lexical resources for MT systems surveyed by ISLE
  • MT systems requirements provide the main reference points for ISLE work, to determine:
    • types of lexical information critical to SL → TL mapping
    • criteria to create bilingual resources from existing monolingual ones
    • common data structures to develop reusable multilingual resources
    • critical areas of the lexicon: MWEs, complex transfer cases, collocational/example-based information, etc.

MWE parenthesis

MWE in ISLE & XMELLT - 2 types of MWE:

1st: (Deverbal) nominalisations + support (light) verbs

  • make an acquisition1 (noun.act; verb.possession)
  • complete an acquisition1
  • undertake an acquisition1
  • make an application1 (noun/verb.communication)
  • have an application1 in
  • decide on an application1 (consider, hear)
  • get an application1 (receive, take)
  • submit an application1 (file)

2nd: Noun(/Adj/Poss)+Noun MW (Ital.: N+PP/N+Adj/N+Vinf/...)

  • air pollution
  • job application
  • murder suspect
  • police action; police scandal
    • coltello da macellaio: butcher's knife
    • carta di credito: credit card
    • carta telefonica (adj): phone card
    • agenzia di viaggi: travel agency
    • film per adulti: adult movie (adj)
    • macchina da scrivere: typewriter (comp.)

No equivalent structures

1st: The Boundaries
  • Support Verbs: more than Light Verbs?
  • Nominalisations: … to a broader set

Both are verbs, combined with an event noun, whose subjects are:

  • participants in the event identified by the noun
  • related to some scenario associated with the event
    • Type 1: take an exam, give an exam
    • Type 2: pass an exam, fail an exam, grade (evaluate) an exam
    • Type 1: perform an operation, undergo an operation
    • Type 2: survive an operation

But also … enlarge the concept of nominalisation to

    • event/result/abstract nouns not morphologically derived
      • dare un ceffone (to slap)
      • provare rancore (to bear sb. a grudge)
      • fare una festa (to have a party)
      • fare festa (to have a holiday)
      • fare festa a qno (to give sb. a warm welcome)
      • prestare attenzione (to pay attention)
      • fare la guerra (to wage war)
  • fare una cessione (cedere) vs. make? a cession (…)
  • avere una cessazione (cessare) delle ostilità vs. have? a cessation of hostilities (…)

No verb (for diachronic reasons)

1st: Hypothesis for encoding: “Mel'čuk type” Lexical Functions (LF)
  • to record semantic contribution and/or aspectual properties conveyed by the V
  • to express argument-sharing between 2 argument structures
      • Oper1: perform an operation; made an apology
      • Oper2: undergo an operation; merits discussion; had a visit
      • Func0: silence reigns
      • Laborij: take into consideration
      • Incep: start the attack
      • Cont: maintain influence
      • Fin: complete the acquisition
      • Liqu: eradicate the disease
      • Real: keep the promise, approve the application
      • AntiReal: turn down, withdraw the application
      • ….
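
The lexical functions listed here can be read as a two-way table between (verb, noun) pairs and function labels. The sketch below is a simplification under stated assumptions: verbs are stored in their base form, only some of the slide's examples are included, and the names `LF_EXAMPLES`, `functions_for` and `verbs_for` are hypothetical.

```python
from typing import Dict, List, Tuple

# Lexical function -> (support verb, noun) pairs from the slide's examples.
LF_EXAMPLES: Dict[str, List[Tuple[str, str]]] = {
    "Oper1": [("perform", "operation"), ("make", "apology")],
    "Oper2": [("undergo", "operation"), ("have", "visit")],
    "Incep": [("start", "attack")],
    "Cont": [("maintain", "influence")],
    "Fin": [("complete", "acquisition")],
    "Liqu": [("eradicate", "disease")],
    "Real": [("keep", "promise"), ("approve", "application")],
    "AntiReal": [("turn down", "application"), ("withdraw", "application")],
}


def functions_for(verb: str, noun: str) -> List[str]:
    """Analysis direction: which functions does this verb + noun pair express?"""
    return [lf for lf, pairs in LF_EXAMPLES.items() if (verb, noun) in pairs]


def verbs_for(lf: str, noun: str) -> List[str]:
    """Generation direction: which verbs realise this function for this noun?"""
    return [v for v, n in LF_EXAMPLES.get(lf, []) if n == noun]
```

The same table serves both decoding (`functions_for`) and encoding (`verbs_for`), which is the property a multilingual lexicon would need from such an encoding.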

1st: Nominalisations: examples from Corpus

accusa (accusation)

  • supp-v: formulare, lanciare, muovere, rivolgere, ... (Oper1)
  • subire [default], beccarsi, attirarsi, rischiare, ... (Oper2)
  • mettere, porre, ... sotto a. (Laborij)
  • rintuzzare, rigettare, smontare, ... (Liqu)
  • Problematic?:
    • ritorcere, rovesciare ... (...)
    • sostenere, ... (...)
    • ripetere, ... (...)
    • .....

acquisizione (acquisition)

  • supp-v: (fare) [default], condurre, curare, effettuare, ... (Oper1)
  • varare, ... (Incep)
  • perfezionare, completare, concludere, ... (Fin)
  • evitare, compromettere, ... (Liqu)
  • sfumare, ... (LiquFunc0)
  • Problematic?:
    • annunciare, dichiarare, ... (say)
    • decidere, proporre, promuovere, stimolare, ... (...)
    • consentire, permettere, proporre, garantire, ... (...)
    • .....

Automatic acquisition

1st: Support Verbs: what to list for multilingual lexicons?
  • Decide whether to include/list, for a noun:
    • all the verbs usable for a Mel'čukian LF
      • INCEP: cominciare [default] vs. varare, intraprendere, …
      • INCEP: begin [default] vs. open (an investigation), …
      • OPER1: say a prayer (not make, as with other speech act nouns)
      • OPER1: pay attention
    • only those lexically dedicated to that noun (needed for generation), not the general ones available by default for an LF
      • begin an exam/operation or finish an exam/operation
  • similar words preferentially select different verbs to express similar meanings (same lexical functions): lexical preference
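
The second option (listing only the lexically dedicated verbs and falling back on a default per lexical function) could look like the sketch below. The table contents follow the slide's examples, but treating "make" as the general Oper1 default is an assumption, and `DEFAULT_VERB`, `DEDICATED_VERB` and `support_verb` are hypothetical names.

```python
from typing import Dict, Tuple

# General verbs available by default for a lexical function (LF).
# "make" as the Oper1 default is an assumption made for this sketch.
DEFAULT_VERB: Dict[str, str] = {"Incep": "begin", "Oper1": "make"}

# Only lexically dedicated choices are listed, keyed by (LF, noun).
DEDICATED_VERB: Dict[Tuple[str, str], str] = {
    ("Incep", "investigation"): "open",
    ("Oper1", "prayer"): "say",
    ("Oper1", "attention"): "pay",
}


def support_verb(lf: str, noun: str) -> str:
    """Pick the dedicated verb if one is listed, otherwise the LF's default."""
    return DEDICATED_VERB.get((lf, noun), DEFAULT_VERB[lf])
```

With this layout "begin an exam" needs no entry at all, while "say a prayer" is listed because the default would generate the wrong verb.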

2nd: Complex nominals in a multilingual framework
  • Different syntactic patterns in L1 & L2
    • N+Nh (= head noun) in English is usually Nh+PP in Italian
      • tooth brush: spazzolino da denti
    • & the syntactic pattern is not predictable
      • hair/clothes brush: spazzola per capelli/abiti
      • nail brush: spazzola per le unghie
      • travel agency: agenzia di viaggi
      • real estate agency: agenzia immobiliare
      • marriage bureau: agenzia matrimoniale
  • A MWE in L1 corresponding to a fully compositional phrase
      • cucchiaino da caffè: coffee spoon???
  • For MT this implies some conceptual (interlingual?) representation
  • but the “encoding” process must find an appropriate MWE if it is called for
      • analogous to blocking/pre-emption: a regular/compositional process is not carried out (dispreferred) because the semantic space occupied by the concept associated with that formation is already claimed by some ready-made expression

Fillmore

2nd: Broader scope: extension to non-MWE?

If we look at the devices in grammar that allow new MWEs to be produced, we find a continuum:

N+PP > collocation > multi-word > idiom

      • productive mechanisms in the language
      • but idiosyncratic

information at the borderline between grammar & lexicon

This amounts to:

  • describing the productive modification relations of N in general
  • in particular those lexically selected/preferred by a N (its semantic paradigm)

MWEs are a subset of these

(they give good hints to discover the most prominent relations??)

  • looking at the semantic structure of Nouns: i.e. at the variety of modifiers they can select by virtue of their meaning

Fillmore

2nd: Noun Compounds/Complex Nominals… are pervasive

Fillmore, Busa

  • There is a motivation in most N+N constructions:
    • the context provides it
  • The FrameNet (SIMPLE) way
    • appeal to specific frame structures (qualia structures) associated with the head noun,
    • determine from corpus attestations which frame elements (qualia) can get instantiated as a modifier word
  • “container”: complex nominals can specify:
    • material (aluminium c., glass c., …)
    • contents (food c., trash c., …)
    • size (3 quart c., …)
    • function (shipping c., storage c., …)
    • ...

2nd: Noun Compounds/Complex Nominals & multidimensional semantic approaches

a. FrameNet

Container Frame: Frame Elements: Material, Contents, Size, Function

  • Material: aluminum container, glass c., metal c., tin c.
  • Contents: food container, beverage c., trash c., water c., milk c., fuel c.
  • Size: 3 quart container
  • Function: shipping container, storage c.

b. SIMPLE

Qualia Relations of "container" used in compounds:

  • Constitutive: made_of [MATERIAL]

aluminum container, glass c., metal c., tin c.

  • Telic: contains [ENTITY]

food container, beverage c., trash c., water c., milk c., fuel c.

  • Constitutive: size [QUANTITY]

3 quart container

  • Telic: is_used_for [EVENT]

shipping container, storage c.
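
The SIMPLE reading of "container" compounds can be sketched as a lookup from the semantic type of the modifier to a qualia relation of the head. The relation table follows the slide; the small modifier lexicon and the names `CONTAINER_QUALIA`, `MODIFIER_TYPE` and `interpret` are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Qualia relations of "container" usable in compounds, keyed by the
# semantic type required of the modifier (from the slide).
CONTAINER_QUALIA: Dict[str, Tuple[str, str]] = {
    "MATERIAL": ("Constitutive", "made_of"),
    "ENTITY": ("Telic", "contains"),
    "QUANTITY": ("Constitutive", "size"),
    "EVENT": ("Telic", "is_used_for"),
}

# Toy type lexicon for modifiers; these assignments are assumptions.
MODIFIER_TYPE: Dict[str, str] = {
    "aluminum": "MATERIAL",
    "glass": "MATERIAL",
    "food": "ENTITY",
    "trash": "ENTITY",
    "3 quart": "QUANTITY",
    "shipping": "EVENT",
    "storage": "EVENT",
}


def interpret(modifier: str) -> Optional[Tuple[str, str]]:
    """Return (qualia role, relation) linking the modifier to 'container'."""
    sem_type = MODIFIER_TYPE.get(modifier)
    if sem_type is None:
        return None  # modifier not in the toy lexicon
    return CONTAINER_QUALIA.get(sem_type)
```

A real lexicon would need a type hierarchy rather than a flat lookup, since a word such as "glass" can denote a material or an object.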

2nd: Complex Nominals/Lexical Constructions in a multilingual context…

describe vs. list?

  • if a compound noun is clearly lexicalized, it's simply one of the words in L1
  • but if it is an instance of some productive word-formation rule, we should describe it

both describe & list:

  • list explicitly in the lexical entry
    • what is idiomatic/idiosyncratic wrt generation for
      • lexical selection
        • mucca pazza vs. matta
        • prestare attenzione vs. pay attention
      • structural pattern
        • travel agency: agenzia di viaggi
        • marriage bureau: agenzia matrimoniale (*di matrimonio)
        • real estate agency: agenzia immobiliare
  • but also an apparatus to describe how the word semantics of Ns interact when they co-occur (co-selection, co-composition, ...)

2nd: In a multilingual context…

...regularities in each language, but they don’t match

  • Both for decoding & encoding, we need both:
    • a linguistic apparatus for interpretation (e.g. to go to a language where it is not a MWE: for cucchiaino da caffè, a Japanese speaker finds it useful to know … “used for”)
    • lists for idioms…, for the unpredictable/idiosyncratic
  • Same apparatus to interpret both MWE & regular N constructions (similar power of expressiveness): general principles of semantic constitution of lex. items & their combinatorics in terms e.g. of frames/qualia/…:
    • basic sem. notions &
    • a general schema to characterise the problem, e.g.
      • frame (qualia) structure of the head N
      • semantic Type of the modifier N
      • allow the head N to impose its interpretation on the modification rel.
      • ...

2nd: Complex nominals, e.g. knife (coltello) triggers
  • a “cutting frame” (FrameNet)
  • specific SIMPLE dimensions of meaning
      • extensively evaluate whether qualia roles (already) encoded in SIMPLE correspond to what is necessary to interpret N-N modification relations

SIMPLE Extended Qualia structure for the interpretation of the semantic relation between Ns (internal relational structure of MWE)

  • butcher’s knife (coltello da macellaio): TELIC (used_by) Y [Human], PPda
  • plastic knife (coltello di plastica): CONST (made_of) X [Material], PPdi
  • table knife (coltello da tavola): TELIC (used_in) Z [Location], PPda
  • hunting knife (coltello da caccia): TELIC (used_in_activity) E [Activity], PPda
      • piatto di legno: CONST (made_of) X [Material], PPdi
      • piatto di pasta: CONST (contains) X [Food], PPdi

PP disambig.
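
The PP disambiguation suggested by these examples can be sketched as a lookup on the preposition together with the semantic type of the modifier noun: "di" alone is ambiguous, but "di" plus [Material] or [Food] is not. The relation table mirrors the slide's coltello/piatto examples; the noun type lexicon and the names `PP_RELATIONS`, `NOUN_TYPE` and `interpret_pp` are assumptions.

```python
from typing import Dict, Optional, Tuple

# (preposition, semantic type of the modifier) -> (qualia role, relation),
# following the coltello/piatto examples on the slide.
PP_RELATIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("da", "Human"): ("TELIC", "used_by"),
    ("da", "Location"): ("TELIC", "used_in"),
    ("da", "Activity"): ("TELIC", "used_in_activity"),
    ("di", "Material"): ("CONST", "made_of"),
    ("di", "Food"): ("CONST", "contains"),
}

# Toy semantic types for the modifier nouns (assumed lexicon entries).
NOUN_TYPE: Dict[str, str] = {
    "macellaio": "Human",
    "tavola": "Location",
    "caccia": "Activity",
    "plastica": "Material",
    "legno": "Material",
    "pasta": "Food",
}


def interpret_pp(preposition: str, modifier: str) -> Optional[Tuple[str, str]]:
    """Disambiguate a head + PP nominal from the preposition and modifier type."""
    sem_type = NOUN_TYPE.get(modifier)
    if sem_type is None:
        return None  # unknown modifier noun
    return PP_RELATIONS.get((preposition, sem_type))
```

For "piatto di legno" and "piatto di pasta" the preposition is the same, and only the modifier type separates made_of from contains.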

2nd: In SIMPLE: possible extension
  • Deverbal nominalisation:
  • noun murder (uccisione, delitto, omicidio; different sem. pref.): PPdi; PPda_parte_di, di
  • verb murder (uccidere): subj: NP; obj: NP

PRED: MURDER (uccidere)

  • ARG1: agent [Hum/Anim?]
  • ARG2: patient [Hum/Anim?]
  • MOD1: instr [Weapon]
  • MOD2: means [Action]
  • MOD3: ... [...]

  • :instr: PPcon [Weapon] (knife m., con coltello)
  • :means: PPper [Action] (strangulation m., per strangolamento)
  • :loc: Ppploc|di [Location] (Kent State murders, nel ...)
  • :time: Ppptime|di [Time] (1983 murders, del 1983)

As if it were a Situation

… Monolingual Linguistic Representation

Strategy:

  • consider as the starting point for MILE the edited union of the basic notions represented in the existing syntactic/semantic lexicons (their models)
  • evaluate their notions wrt EAGLES recommendations for syntax and semantics
  • evaluate their usefulness & adequacy for multilingual tasks
  • evaluate integrability of their notions in a unitary MILE
  • look for deficient areas, e.g. MWE
  • ...

To be decided: should ISLE reach a consensus at the level of the “types” of information only, or also at the level of their “token” values? …. different answers for diff. notions

… the Multilingual ISLE Lexical Entry (MILE)
  • General methodological principles (from EAGLES):
  • Basic requirements for the MILE:
    • Discover and list the (maximal) set of basic notions needed to describe the MILE (up to which level is standardisation feasible?)
    • Granularity
    • The leading principle for the design of the MILE: the edited union of existing lexicons/models (redundancy is not a problem)
    • Modular and layered: various degrees of specification possible
    • Allow for underspecification (& hierarchical structure)

The MILE
  • Main features
    • factor out primitive units of lexical information
    • explicit representation of information to be targeted by multilingual NLP tools
    • rely on lexical analyses with the highest degree of inter-theoretical agreement
    • avoid framework-specific representational solutions
    • open to different paradigms of multilinguality
    • oriented to the creation of large-scale lexical databases

MILE
  • Objective: definition of the MILE
  • as a meta-entry to act as a common format for resource sharing and integration/architecture for lexical data encoding
    • its basic notions
    • general architecture
        • formalized as an entity-rel. model (XML, RDF, etc.)
        • with a tool to support it

open to task- & system-dependent parameterisation

Agreed Principles
  • MILE builds on the monolingual entry & expands it
  • MILE incorporates previous EAGLES recommendations
  • is the “complete” entry
        • adopt as starting point the PAROLE/SIMPLE DTD
        • to be revised, augmented, ...

We consider 2 broad categories of applications:

        • MT
        • CLIR (linking module may be simpler/ontology based)
          • (label info types wrt application)

Modularity in MILE
  • Advantages:
    • Flexibility of representation
    • Easy to customise and update
    • Easy integration of existing resources
    • High versatility towards different applications

Modularity in at least three respects:

    • in the macrostructure and general architecture of the MILE
    • in the microstructure of the MILE
      • monolingual linguistic representation (previous EAGLES revised/updated)
      • collocational/corpus-driven information (new)
      • multilingual apparatus (e.g. transfer conditions and actions; interlingua) (new)
    • in the specific microstructure of the MILE word-sense

Modularity in MILE

A. MILE Macrostructure
  • Meta-information
  • Architecture

B. MILE Microstructure
  1. Monolingual
  2. Collocational
  3. Multilingual

C. Word-Sense Microstructure
  1. Coarse-grained
  2. Fine-grained

The MILE Architecture: Monolingual Lexical Description
  • three independent and yet linked layers characterising the MILE in a source language
  • possibly corresponds to the typology of information contained in major existing lexicons, such as PAROLE-SIMPLE, (Euro)WordNet, COMLEX, FrameNet, etc.
  • simple and complex lexical units (to account for MWEs)
  • various degrees of granularity of lexical unit representation

[Diagram: semantic layer, syntactic layer and morphological layer, linked by correspondence conditions.]

The MILE Architecture: Multilingual Layer
  • acts as an (independent) interface layer between monolingual lexicons

[Diagram: the multilingual layer connects Lexicon 1 and Lexicon 2; labels: semantic layer, correspondence conditions, syntactic layer, morphological layer.]

The MILE Multilingual Layer (NEW)
  • Correspondences can be established between different types of linguistic objects (strings, syntactic descriptions, semantic elements, predicates, etc.)
  • Transfer tests and actions to target various types of lexical information in the monolingual layers
    • constrain syntactic positions and their fillers
    • lexicalize syntactic positions
    • add positions or arguments
    • add new features to define more fine-grained sense distinctions relevant at the multilingual level
    • restructure argument configurations
    • collocational information
    • ...
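
A transfer test paired with an action might be represented as below. This is purely a sketch of the idea, not the MILE formalism: the `TransferRule` class, the flat `Entry` dictionary with `head` and `modifier` keys, and the `transfer` function are all hypothetical, and the bare "agenzia" fallback is an assumption. The agency examples come from the complex-nominal slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Entry = Dict[str, str]  # flattened view of a source-language analysis


@dataclass
class TransferRule:
    """A correspondence guarded by a test, with an action on the target side."""
    source: str
    test: Callable[[Entry], bool]
    action: Callable[[Entry], str]


RULES: List[TransferRule] = [
    # The test constrains the modifier filler; the action lexicalizes the target.
    TransferRule("agency", lambda e: e.get("modifier") == "travel",
                 lambda e: "agenzia di viaggi"),
    TransferRule("agency", lambda e: e.get("modifier") == "real estate",
                 lambda e: "agenzia immobiliare"),
    # Fallback correspondence with no constraint on the modifier (assumed).
    TransferRule("agency", lambda e: True, lambda e: "agenzia"),
]


def transfer(entry: Entry) -> Optional[str]:
    """Apply the first rule whose source matches and whose test succeeds."""
    for rule in RULES:
        if rule.source == entry.get("head") and rule.test(entry):
            return rule.action(entry)
    return None
```

Rule order matters in this sketch: the constrained correspondences are tried before the unconstrained one.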

Paths to Discover the Basic Notions of MILE

a list of critical information types that will compose each module of the MILE
  • clues in dictionaries to decide on target equivalent
  • guidelines for lexicographers
  • clues (to disambiguate/translate) in corpus concordances
  • lexical requirements from various types of transfer conditions and actions in MT systems
  • lexical requirements from interlingua-based systems

Organisational Proposal: division of labour

  • Highlighted some hot issues & assigned tasks:
    • sense indicators (EU)
    • selection preferences (EU)
    • lexicographic relevance (EU)
    • argument structure (US)
    • MWE (EU & US)
    • collocations & parallel corpora (US)
    • modifiers (EU)
    • semantic relations (EU)
    • transfer conditions (EU & US)
    • collocational patterns (US)
    • ontology (US)
    • metaphors (EU)
    • interlingua requirements (US)
    • spoken lexicon (EU)
    • meta-representation (US & EU)
    • ...

Organisational Proposal: The tasks will lead to:
  • an in-depth analysis of each area aiming at identifying:
    • the most stable solutions adopted in the community
    • linguistic specifications and criteria
    • possible representational solutions, their compatibility, etc.
    • evaluation of their respective weight/importance in a multilingual lexicon (towards a layered approach to recommendations)
    • open issues and current boundaries of the state-of-the-art (which cannot be standardised yet)
    • model limitations through creation of a sample dictionary
  • see how the various pieces fit together & can be merged in a unified proposal
  • evaluate if we can combine in a “hybrid super-model” the transfer & interlingua approaches

Information Types: examples

Selectional preferences

  • How to represent them (e.g. features, reference to an ontology, word-senses, etc.)
  • Different status of the preferences
  • Criteria to identify them
  • Expressive limits of existing formal resources

Ontology

  • Architectural issues (types of ontologies: e.g. taxonomies, “Qualia”-based type systems, etc.)
  • Inheritance
  • Which roles for ontologies in the MILE
  • Representational issues
  • Customisation and development criteria

Transfer conditions and actions

  • Identification of categories of transfer phenomena
  • Ranking of hard cases
  • Possible parameterisation wrt language types
  • How to formalise them
  • Types of actions

CLWG Ongoing Activities

… to prepare a preliminary proposal of the MILE:

  • existing models for lexical representation and data interchange (Genelex, Olif, etc.) are explored
  • model limitations and expressive power are tested through creation of sample entries in a few languages
  • groups at work
    • lexical description and information: types of relevant info
    • lexicographic exploration: systematic summary & classification of types of transfer tests (also extracted from MRDs)
    • multilingual correspondences
    • lexical data modeling: format & representation issues
    • tool development

Representation issues
  • Working with GENELEX, lexicon development work is (can be) affected by:
    • impossibility (or difficulty) of defining abstract and general classes or types of objects
    • lack of inheritance mechanisms
    • lack of default expression and default rewriting mechanisms

Cf. Lexical templates in SIMPLE:

      • not included in the GENELEX data-structure
      • implemented in the editing sw. tool
      • very useful to capture relevant lexical generalizations, enhance consistency in encoding, speed-up lexicographers’ work, etc.

CLWG Ongoing Activity

[Diagram: the MILE Lexical Objects Formal Specifications and the MILE Lexical Entry Formal Specifications; MILE Shared Lexical Objects and User Defined Lexical Objects; Monolingual & Multilingual Lexicons.]

MILE Shared Lexical Objects

MILE Repository of Shared Lexical Objects:
  • Basic syntactic constructions (e.g. transitive, etc.)
  • (Micro-)semantic objects (e.g. features, relations)
  • (Macro-)semantic objects (e.g. lexical templates)
  • Multilingual constructions (e.g. basic transfer conditions and actions)

User-Defined Lexical Objects

  • New lexical objects defined by the user according to the common MILE formal data-structure specification
  • Sub-types of the Shared MILE Objects
  • Possibly enriched with metadata defining their “semantics” and “usage”

Monolingual & Multilingual Lexicons

  • Lexical entries obtained by referring to various lexical objects (both Shared and User-defined)
  • The MILE lexical entry model specifies how lexical objects can be combined to achieve the proper lexical representation

Simplify using MILE
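
The relationship between shared objects, user-defined sub-types and lexical entries can be sketched with a small repository. None of this is the MILE specification: `LexicalObject`, `Repository`, its methods and the `VisualPerception` sub-type are hypothetical, chosen only to show entries that refer to objects instead of copying them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexicalObject:
    """A reusable building block; `parent` names the object it specialises."""
    name: str
    kind: str            # e.g. "syntactic", "semantic", "multilingual"
    parent: str = ""     # empty for shared objects
    metadata: Dict[str, str] = field(default_factory=dict)


class Repository:
    def __init__(self) -> None:
        self.objects: Dict[str, LexicalObject] = {}

    def add_shared(self, obj: LexicalObject) -> None:
        self.objects[obj.name] = obj

    def add_user_defined(self, obj: LexicalObject) -> None:
        # A user-defined object must specialise an object already registered.
        if obj.parent not in self.objects:
            raise ValueError(f"unknown parent object: {obj.parent!r}")
        self.objects[obj.name] = obj

    def build_entry(self, lemma: str, object_names: List[str]) -> Dict[str, object]:
        """A lexical entry refers to objects by name; it does not copy them."""
        missing = [n for n in object_names if n not in self.objects]
        if missing:
            raise KeyError(f"undefined lexical objects: {missing}")
        return {"lemma": lemma, "refers_to": list(object_names)}


repo = Repository()
repo.add_shared(LexicalObject("transitive", "syntactic"))
repo.add_shared(LexicalObject("Perception", "semantic"))
repo.add_user_defined(
    LexicalObject("VisualPerception", "semantic", parent="Perception",
                  metadata={"usage": "project-specific subtype"})
)
entry = repo.build_entry("look", ["transitive", "VisualPerception"])
```

Because entries hold references, two lexicons that use the same shared object stay comparable even when each adds its own sub-types.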

Involvement of Asian Languages

  • participation in last meetings
  • some input from Asia
  • formal cooperation EU-ASIA: steps to put in motion

Impact & synergies
  • real impact… to be evaluated later, through the use in applications
  • already its being a US/EU project &
  • the Asian interest
  • synergies now, e.g.:
    • PAROLE/SIMPLE (also instantiated in 9 national projects): main input
    • EuroWordNet: provides input
    • XMELLT (NSF): provides input
    • OLIF: expects (& provides) input
    • SALT: complementary
    • ENABLER: validation (& expects input)
    • ELSNET: validation
    • SENSEVAL: validation
    • NIMM WG for Metadata for CL (also with the US OLAC)
    • ...

Target: Multilingual Content Management, the Resources viewpoint

The relevance/impact of (good vs. less good) LRs for high-quality Cross/Multilingual systems is high, even if not easily measurable.

Different applications, component technologies - & approaches within - need different info types (e.g. CLIR or content access systems wrt MT)

For each, need to specify (not an easy task):

  • clear lexical/linguistic/conceptual requirements
  • priority info types (which, how encoded, etc.)
  • the respective role of e.g. annotated corpora, mono- bi- multilingual lexicons (with different info types), ontologies, KBs

Economic Feasibility: in which (Multilingual) Resources to invest?

Wrt short- vs. medium-term impact:

  • Basic, general purpose bi-/multilingual lexicons, but to be tuned, adapted to different applications

Need for robust systems able to acquire/tune (multilingual) lexical/linguistic/conceptual knowledge, to accompany static basic resources

We shouldn’t rely only on parallel corpora. More advisable to aim at reliable methods for acquisition & use of ‘comparable corpora’, accompanied by robust technologies for annotation (at different levels: morphosyntactic, syntactic/functional, semantic, …), and by a shared set of (text) annotation schemata

Target: Multilingual Knowledge Management. Technical Feasibility:
  • Prerequisite: is a commonly agreed text/lexicon annotation protocol, also for the semantic/conceptual level, an achievable goal (to be able to automatically establish links among different languages)?

Yes, at the lexical level

More complex, for corpus annotation?

EAGLES/ISLE

Content for practical use: Gap between Resources and Systems?
  • If we had real-size lexicons with very fine-grained semantic/conceptual info, would there be systems (non ad-hoc toy systems) able to use them?
  • A vicious circle between
    • i) lack of suitable, large-size and knowledge-intensive resources (lexicons and corpora, with many different types of syntactic and semantic information encoded), and
    • ii) systems’ ability to use them effectively
  • The two targets should be pursued in parallel,
  • should closely interact with each other, and
  • be gradually integrated
