
UNL Lexical Selection with Conceptual Vectors

This paper presents a method for disambiguation in UNL-French deconversion using conceptual vectors, with a focus on lexical selection and finding the best French lemma for a given Universal Word (UW).


Presentation Transcript


  1. UNL Lexical Selection with Conceptual Vectors • LREC-2002, Las Palmas, May 2002 • Mathieu Lafourcade (LIRMM, Montpellier), Mathieu.Lafourcade@lirmm.fr, http://www.lirmm.fr/~lafourca • Christian Boitet (GETA, CLIPS, IMAG, Grenoble), Christian.Boitet@imag.fr, http://www-clips.imag.fr/geta

  2. Outline • The problem: disambiguation in UNL-French deconversion • Finding the known UW nearest to an unknown UW • Finding the best French lemma for a given UW • Conceptual vectors • Nature & example on French (873 dimensions) • Building (Dec. 2001: 64,000 terms, 210,000 CVs) • CVD (CV Disambiguation) running for French • Recooking the vectors attached to a document tree • Placing each recooked vector in the word sense tree • Using CVD in UNL-French deconversion: ongoing

  3. The UNL-FR deconversion process [Diagram] • Processing steps shown: Validation & Localization, Graph to tree conversion, Lexical transfer, Structural transfer, Paraphrase choice, Syntactic generation, Morphological generation, Conceptual vectors computations • Structures shown: UNL-L1 graph, UNL-FRA graph (UW), UNL-FRA graph (French LU), “UNL tree”, GMA structure, UMA structure, UMC structure, French utterance

  4. The problem: disambiguation in UNL-French deconversion • Find the known UW nearest to an unknown UW • known UWs (in KB context): • obj(open(icl>occur),door): a door opens • obj(open(icl>do),door): one opens a door • input graph: obj(open(icl>occur,ins>concrete thing),door), ins(open(icl>occur,ins>concrete thing),key…): a key opens a door / a door opens with a key • ==> choose nearest open(icl>occur) for correct result • Find best French lemma for a UW in a given context • meeting(icl>event) ==> réunion [ACTION, DURATION…] or rencontre [EVENT, MOMENT…]

  5. How to solve them? • unknown UW → best known UW • Accessing the KB in real time is impractical (web server) • The KB is not enough: still many possible candidates • known UW → best LU • Often no clear symbolic conditions for selection • Possibility to transform the UNLLUfr dictionary into a kind of neural net (cf. MSR MindNet) • A possible unifying solution: • Lexical selection through DCV (Disambiguation using Conceptual Vectors) • which works quite well for French in large-scale experiments

  6. Conceptual vectors • CV = vector in concept space (4th level in Larousse) • V(to tidy up) = CHANGE [0.84], VARIATION [0.83], EVOLUTION [0.82], ORDER [0.77], SITUATION [0.76], STRUCTURE [0.76], RANK [0.76] … • V(to cut) = GAME [0.8], LIQUID [0.8], CROSS [0.79], PART [0.78], MIXTURE [0.78], FRACTION [0.75], TORTURE [0.75], WOUND [0.75], DRINK [0.74] … • Global vector of a term = normalized sum of the CVs of its meanings/senses • V(head) = HEAD [0.83], BEGINNING [0.75], ANTERIORITY [0.74], PERSON [0.74], INTELLIGENCE [0.68], HIERARCHY [0.65], …
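The "normalized sum" rule on slide 6 can be illustrated with a minimal sketch. It assumes Euclidean (L2) normalization, which the slide does not specify, and uses toy 3-dimensional vectors instead of the 873 Larousse concepts; the function names are hypothetical.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length (assumed norm; the slide does not say which)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)  # a null vector stays null
    return [x / norm for x in v]

def global_vector(sense_vectors):
    """Global vector of a term: normalized sum of the CVs of its senses."""
    dim = len(sense_vectors[0])
    summed = [sum(v[i] for v in sense_vectors) for i in range(dim)]
    return normalize(summed)

# Toy example: a term with two senses over three made-up concepts.
senses = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
g = global_vector(senses)
```

Here `g` points halfway between the two sense vectors, so a term's global vector blends the themes of all its senses.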

  7. Conceptual vectors and sense space • Conceptual vector model • Reminiscent of Vector Models (Salton et al.) & Sowa • Applied on preselected concepts (not terms) • Concepts are not independent • Set of k basic concepts • Thesaurus Larousse = 873 concepts (translation of Roget’s) • A vector = an 873-tuple of reals in [0..1] • Encoding for each dimension C = 2^15: [0..32767] • Sense space = vector space + vector set
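As a rough illustration of the per-dimension encoding on slide 7, the sketch below maps a real in [0..1] to an integer in [0..32767]. The slide gives only the integer range; the linear mapping and the names `encode` and `decode` are assumptions made for this example.

```python
LEVELS = 2 ** 15  # 32768 distinct values, i.e. integers 0..32767

def encode(x):
    """Map a real in [0, 1] to an integer in [0, 32767] (assumed linear scaling)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("component must lie in [0, 1]")
    return round(x * (LEVELS - 1))

def decode(q):
    """Inverse mapping, back to a real in [0, 1]."""
    return q / (LEVELS - 1)

q = encode(0.84)
x = decode(q)
```

With this scheme the round-trip error per component stays below 1/32767.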

  8. Thematic relatedness • Conceptual vector distance • Angular distance DA(x, y) = angle(x, y) • 0 ≤ DA(x, y) ≤ π • Interpretation • if DA(x, y) = 0: x // y (colinear): same idea • if DA(x, y) = π/2: x ⊥ y (orthogonal): nothing in common • if DA(x, y) = π: DA(x, y) = DA(x, -x): -x is the anti-idea of x
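The angular distance of slide 8 can be sketched as the arccosine of the cosine similarity, which is the usual way to compute the angle between two vectors; the slide itself only states DA(x, y) = angle(x, y), and the function name is hypothetical.

```python
import math

def angular_distance(x, y):
    """Angle between x and y in radians, in [0, pi]."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("angle is undefined for a null vector")
    # Clamp to [-1, 1] to absorb floating-point rounding before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine)

same_idea = angular_distance([1.0, 0.0], [2.0, 0.0])    # colinear
unrelated = angular_distance([1.0, 0.0], [0.0, 1.0])    # orthogonal
anti_idea = angular_distance([1.0, 0.0], [-1.0, 0.0])   # opposite
```

The three calls correspond to the three interpretation cases listed on the slide.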

  9. Collection process • Start from a few handcrafted terms/meanings/vectors • <do forever> // running constantly on Lafourcade’s Mac • <choose> a word at random (with or without a CV) • find NL definitions of its senses (mainly on the Web) • for each sense definition SD: • analyze SD into a linguistic tree TreeDef • attach existing or null CVs to the lexical nodes of TreeDef • iterate propagation of CVs in TreeDef (linguistic rules used here) until CV(root) converges or the cycle limit is reached • CV(sense) ← CV(root(TreeDef)) • use vector distance to arrange the CVs of the senses into a binary « discrimination tree » • </choose> </do>

  10. An example discrimination tree

  11. Status on French CVs • By Dec. 2001 • 64,000 terms • 210,000 CVs • Average of 3.3 senses/term • Method • robot to access web lexicon servers • large coverage French analyzer by J.Chauché in Sigmart • See more details on • http://www.lirmm.fr/~lafourca

  12. Disambiguation in French • Recook the vectors attached to a document tree • Take a document • Analyze it with Sigmart analyzer into ONE possibly big tree (30 pages OK as a unit) • Use the same process as for processing definitions • Final CV(root) usable as thematic classifier of document • Final CV (lexemes) used as « sense in context » • Place each recooked vector in the discrimination tree • Walk down the discrimination tree, using vector distance • Stop at nearest node: • If leaf node, full disambiguation (relative to available sense set) • If internal node, partial disambiguation (subset of senses)
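The walk down the discrimination tree might look like the sketch below. The `Node` layout and the stopping rule (descend only while some child is strictly closer than the current node) are assumptions made for illustration; the slide says only to walk down using vector distance and stop at the nearest node.

```python
import math
from dataclasses import dataclass, field
from typing import List

def angular_distance(x, y):
    """Angle between two non-null vectors, in radians."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norms)))

@dataclass
class Node:
    vector: List[float]   # CV stored at this node (hypothetical layout)
    senses: List[str]     # senses covered by this subtree
    children: List["Node"] = field(default_factory=list)

def place(root, cv):
    """Return the senses of the node nearest to the contextual vector cv."""
    node = root
    while node.children:
        best = min(node.children, key=lambda c: angular_distance(c.vector, cv))
        if angular_distance(best.vector, cv) >= angular_distance(node.vector, cv):
            break  # no child is closer: stop at an internal node
        node = best
    return node.senses

# Toy tree: two senses over two made-up concepts.
leaf_a = Node([1.0, 0.0], ["sense A"])
leaf_b = Node([0.0, 1.0], ["sense B"])
root = Node([1.0, 1.0], ["sense A", "sense B"], [leaf_a, leaf_b])

full = place(root, [0.9, 0.1])     # reaches a leaf
partial = place(root, [1.0, 1.0])  # stays at the root
```

A one-element result corresponds to full disambiguation; a longer list is the partial case, where a subset of senses remains.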

  13. Example with some ambiguities • The white ants strike rapidly the trusses of the roof

  14. Initialize: attach CVs to lexemes • The white ants strike rapidly the trusses of the roof

  15. Up / Down propagation of the CVs

  16. Result: sense selection • The white ants strike rapidly the trusses of the roof

  17. Disambiguation in UNL-French deconversion • Our set-up • Example input UNL-graph • Outline of the process • Two usages of DCV (disambiguation with CV) • Finding the known UW nearest to an unknown UW • Finding the best French lemma for a given UW

  18. A UNL input graph • “Ronaldo has headed the ball into the left corner of the goal”

  19. Corresponding UNL-tree with CVs attached: localization DCV [Tree diagram; node labels and attached vectors as extracted] • score(icl>event,agt>human,fld>sport).@entry.@past.@complete: V = Vevent(score) + Vhuman(score) + Vsport(score) • head(pof>body): ins: Vbody(head) • 1- Ronaldo: agt: V(human) • 2- Ronaldo: pos: V(human) • corner: plt: Vplace(corner) • 1- goal(icl>thing): obj: Vthing(goal) (shown twice) • left: mod: V(left)

  20. Result of first step: the « best » UWs • The vector contextualization generalizes both kinds of localization (lexical and cultural). • On each node, the selected UW is the one in the UNL-French database whose vector is the closest to the contextualized vector. • Formulas used for up and down propagation:

  21. Second step: select the « best » LUs • Depending on the strategy of the generator, a lexical unit (LU) may be • a lemma • a whole derivational family • (pay, payment, payable…) • Dictionary: <UW, CVdict> {<LUi, CVi>} • Input: <UW, CVcontext> • Output: LUi with nearest CVi
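The selection rule on slide 21 amounts to a nearest-neighbour search over the LUs listed for a UW. In the sketch below the candidate pair réunion/rencontre comes from slide 4, but the 4-dimensional vectors over [ACTION, DURATION, EVENT, MOMENT] are invented for illustration, and the use of angular distance as "nearest" is an assumption based on slide 8.

```python
import math

def angular_distance(x, y):
    """Angle between two non-null vectors, in radians."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def select_lu(candidates, cv_context):
    """Pick the LU whose CV is nearest to the contextual CV of the UW.

    candidates: list of (LU, CV) pairs, i.e. the {<LUi, CVi>} set of the slide.
    """
    lu, _ = min(candidates, key=lambda pair: angular_distance(pair[1], cv_context))
    return lu

# Toy dictionary entry for meeting(icl>event); vector values are made up.
# Dimensions: [ACTION, DURATION, EVENT, MOMENT]
entry = [
    ("réunion", [1.0, 1.0, 0.0, 0.0]),
    ("rencontre", [0.0, 0.0, 1.0, 1.0]),
]
chosen = select_lu(entry, [0.2, 0.1, 0.9, 0.8])
```

The same search, run over candidate UWs instead of LUs, would cover the first step (choosing the nearest known UW).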

  22. Conclusion • Another case of fruitful integration of symbolic & numerical methods • Further work planned • integration into the running UNL-FR server • work on feedback (Pr SU’s line of thought) • if the user corrects the choice of LU for the chosen UW • or worse, if the user chooses an LU corresponding to another UW! ==> then recompute vectors by giving more weight to the chosen CVs
