
Project co-financed by the European Regional Development Fund

Sectoral Operational Programme "Increase of Economic Competitiveness" "Investments for your future". General Word Sense Disambiguation System applied to Romanian and English Languages - SenDiS -. Project co-financed by the European Regional Development Fund. Word sense disambiguation

kolya

Presentation Transcript


  1. Sectoral Operational Programme "Increase of Economic Competitiveness" • "Investments for your future" • General Word Sense Disambiguation System applied to Romanian and English Languages - SenDiS - • Project co-financed by the European Regional Development Fund • Word sense disambiguation using lexicon nets • Alin Ştefănescu, Oana Șoica, Andrei Mincă & SenDiS team • June 27, 2013

  2. Introduction Alin Ştefănescu

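  3. The ambiguous hen • „Găina cea nouă ne ouă nouă nouă ouă.“ (roughly: "The new hen lays nine eggs for us"; "nouă" can mean "new", "nine" or "to us", and "ouă" can mean "eggs" or "lays") • Image from aliexpress.com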

  4. Natural Language Processing (NLP) • NLP develops systems that allow computers to communicate with people using everyday language. • An important area: natural language understanding • Subproblem: word sense disambiguation

  5. NLP @ SOFTWIN Research • NLP is an active research area at Softwin Research • biometrics is the other active area • previously, antivirus research in the same R&D department led to the creation of an award-winning, internationally certified internet security and antivirus software

  6. NLP @ Softwin Research – SenDiS project • SenDiS project at Softwin Research • "A general Word Sense Disambiguation System applied to Romanian and English languages" • 2010-2013 • co-financed through Sectoral Operational Programme "Increase of Economic Competitiveness" (POS-CCE) • team of 7-10 computer scientists and linguists • method: use of structured linguistic knowledge encoded with Softwin's GRAALAN formalism • previous projects: PALIROM & LINCOR (with collaborators from UB, ILIR, UPB etc.)

  7. NLP system - GRAALAN • SenDiS builds upon and further develops the NLP system GRAALAN at Softwin Research • 1. Linguistic theoretical background • 2. GRAALAN Grammar Abstract Language • 3. Linguistic tools • 4. Linguistic knowledge bases • 5. Linguistic applications • SenDiS

  8. Word Sense Disambiguation (WSD) • identify the meaning of words in context • in a computational manner • very difficult problem • three main approaches: • supervised disambiguation • unsupervised disambiguation • knowledge-based disambiguation • Image: "Tower of Babel" by Brueghel

  9. Dealing with ambiguity • GRAALAN knowledge bases can encode several types of ambiguities: • multiword expression (MWE) ambiguity • morphologic ambiguity (synthetic & analytic) • lexical ambiguity (synthetic & analytic) • morphemic ambiguity • syntactic ambiguity

  10. Lesk Algorithm - basic idea • a simple and intuitive knowledge-based WSD approach • computes the word overlap between the sense definitions (glosses) of the target words in a context • for a two-word context (w1, w2), with S1 in Senses(w1) and S2 in Senses(w2): scoreLesk(S1, S2) = | gloss(S1) ∩ gloss(S2) | • another variant, less computationally intensive, computes the word overlap between a word sense definition and the other context words: scoreLeskVar(S) = | context(w) ∩ gloss(S) |
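The two scores on the Lesk slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy English glosses for "pine" and "cone", the tiny stopword list, and all function names (`bag`, `score_lesk`, `score_lesk_var`, `best_pair`, `best_sense`) are hypothetical and do not come from the SenDiS system.

```python
from itertools import product

# Tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"a", "of", "or", "to", "with", "which", "this"}

# Hypothetical toy sense inventory: word -> {sense id: gloss}.
SENSES = {
    "pine": {
        "pine#1": "kinds of evergreen tree with needle-shaped leaves",
        "pine#2": "waste away through sorrow or illness",
    },
    "cone": {
        "cone#1": "solid body which narrows to a point",
        "cone#2": "something of this shape whether solid or hollow",
        "cone#3": "fruit of certain evergreen trees",
    },
}


def bag(text):
    """Turn a gloss or context string into a set of lowercase content words."""
    words = {w.strip(".,;:()").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def score_lesk(gloss1, gloss2):
    """scoreLesk(S1, S2) = |gloss(S1) ∩ gloss(S2)|"""
    return len(bag(gloss1) & bag(gloss2))


def score_lesk_var(context, gloss):
    """scoreLeskVar(S) = |context(w) ∩ gloss(S)|"""
    return len(bag(context) & bag(gloss))


def best_pair(w1, w2):
    """Try every sense pair of (w1, w2) and return the highest-scoring one."""
    pairs = product(SENSES[w1].items(), SENSES[w2].items())
    (s1, g1), (s2, g2) = max(pairs, key=lambda p: score_lesk(p[0][1], p[1][1]))
    return s1, s2, score_lesk(g1, g2)


def best_sense(word, context):
    """Pick the sense of `word` whose gloss overlaps most with the context."""
    return max(SENSES[word], key=lambda s: score_lesk_var(context, SENSES[word][s]))


print(best_pair("pine", "cone"))
print(best_sense("cone", "the evergreen forest was full of fallen fruit"))
```

The pairwise score has to visit every combination of senses, so its cost grows with the product of the sense counts of all context words. The variant scores each sense of one word against a fixed bag of context words, which is why it is the cheaper of the two.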

  11. Our approach: Lesk Algorithm extended • Our approach extends the reasoning of the Lesk algorithm. • Every annotated sense is extended with its definition, whose words are themselves annotated with disambiguated senses, and so on recursively.
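One way to read the extension described on this slide is as a bounded expansion over sense-tagged glosses: a sense is represented by itself, plus the senses in its gloss, plus the senses in their glosses, and so on. The sketch below assumes this reading. The `GLOSS_SENSES` table, the sense ids, the `depth` cut-off, and the function names are all hypothetical, and the actual SenDiS scoring may differ.

```python
# Hypothetical sense-tagged lexicon: sense id -> sense ids of the words in its gloss.
GLOSS_SENSES = {
    "radio#0": ["device#1", "reception#2"],
    "device#1": ["system#0", "energy#0"],
    "reception#1": ["hotel#0", "room#0"],
    "reception#2": ["energy#0", "signal#0"],
    "antenna#0": ["device#1", "signal#0"],
    "hotel#0": ["building#0", "room#0"],
}


def expand(sense, depth):
    """Collect the senses reachable from `sense` in at most `depth` gloss steps."""
    seen = {sense}
    frontier = {sense}
    for _ in range(depth):
        # Follow the gloss of every sense on the current frontier.
        frontier = {t for s in frontier for t in GLOSS_SENSES.get(s, [])} - seen
        if not frontier:
            break
        seen |= frontier
    return seen


def score_extended(s1, s2, depth=2):
    """Overlap of the two expanded sense sets (Lesk overlap on senses, not words)."""
    return len(expand(s1, depth) & expand(s2, depth))


def pick_sense(candidates, context_sense, depth=2):
    """Choose the candidate sense with the largest extended overlap."""
    return max(candidates, key=lambda s: score_extended(s, context_sense, depth))


print(score_extended("reception#2", "antenna#0", depth=1))
print(score_extended("reception#2", "antenna#0", depth=2))
print(pick_sense(["reception#1", "reception#2"], "antenna#0"))
```

In this toy data, deepening the expansion from one to two steps raises the overlap between "reception#2" and "antenna#0", because the gloss of "device#1" contributes "energy#0". That is the intended effect of extending a sense with the definitions of the words in its definition.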

  12. Lesk Algorithm extended - example
      • Generic example (principle):
      • <lemma>… =
          • Sense 1 : <word> <word> <word> <word>
          • Sense 2 : <word> <word> <word> <word>
          • Sense 3 : <word> <word> <word> <word>
      • <lemma>… =
          • Sense 1 : <word> <word> <word> <word>
          • Sense 2 : <word> <word> <word> <word>
          • Sense 3 : <word> <word> <word> <word>
      • <lemma>… =
          • Sense 1 : <word> <word> <word> <word>
          • Sense 2 : <word> <word> <word> <word>
          • Sense 3 : <word> <word> <word> <word>

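  13. Lesk Algorithm extended - example
      • Romanian example (English translations in square brackets):
      • "radio" =
          • "0" : "Aparat de receptie radiofonica; radioreceptor." [Radio reception device; radio receiver.]
          • "1" : "Instalatie de transmitere a sunetelor prin unde electromagnetice, cuprinzând aparatele de emisiune şi pe cele de receptie." [Installation for transmitting sound by electromagnetic waves, comprising both the transmitting and the receiving devices.]
      • "aparat" =
          • "0" : "Sistem de piese care serveste pentru o operatie mecanica, tehnica, stiintifica etc." [System of parts used for a mechanical, technical, scientific etc. operation.]
          • "1" : "Sistem tehnic care transforma o forma de energie în alta." [Technical system that converts one form of energy into another.]
          • "2" : "Ansamblu de organe anatomice care servesc la îndeplinirea unei functiuni fundamentale." [Set of anatomical organs that serve to perform a fundamental function.]
          • "3" : "Totalitatea serviciilor sau a personalului care asigura bunul mers al unei institutii sau al unui domeniu de activitate." [All the services or staff that ensure the smooth running of an institution or a field of activity.]
          • "4" : "Ansamblul mijloacelor care servesc pentru un anumit scop." [Set of means that serve a particular purpose.]
      • "receptie" =
          • "0" : "Operatie de luare în primire a unui material sau a unei lucrari, pe baza verificarii lor cantitative şi calitative." [Operation of taking delivery of a material or a piece of work, based on its quantitative and qualitative verification.]
          • "1" : "Serviciu într-o întreprindere hoteliera care are evidenta persoanelor aflate în hotel, face repartizarea în camere a solicitatorilor etc." [Service in a hotel business that keeps records of the people staying in the hotel, assigns rooms to applicants etc.]
          • "2" : "(Tehn) Primire a unei anumite forme de energie pentru a o transforma în alta forma de energie." [(Tech.) Reception of a certain form of energy in order to convert it into another form of energy.]
          • "3" : "Reuniune, banchet cu caracter festiv (în cercurile oficiale)." [Gathering or banquet of a festive nature (in official circles).]
          • "4" : "Primire, întâmpinare (cu caracter ceremonios) a unui oaspete." [Reception, welcoming (of a ceremonious nature) of a guest.]
      • "radiofonic" =
          • "0" : "Care aparţine radiofoniei, privitor la radiofonie, care utilizeaza radiofonia." [Belonging to radio broadcasting, relating to radio broadcasting, using radio broadcasting.]
      • "radioreceptor" =
          • "0" : "Aparat folosit pentru receptionarea undelor radiofonice (prin antene), pentru transformarea lor în semnale sonore şi transmiterea lor prin intermediul difuzoarelor; radio." [Device used for receiving radio waves (via antennas), converting them into sound signals and transmitting them through loudspeakers; radio.]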

  14. WSD using a specific lexicon network • a LARGE lexicon net • "gloss tagged" relation • [Diagram: two "Word Sense" nodes connected by edges labelled "defines" and "defined by"]
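A lexicon net of this kind can be pictured as a directed graph over word senses. The sketch below is one possible reading of the slide, not the SenDiS data model: it assumes that "defined by" links a sense to the senses tagged in its gloss and that "defines" is the inverse direction. The `LexiconNet` class, its method names, and the sample sense ids are all hypothetical.

```python
from collections import defaultdict, deque


class LexiconNet:
    """Toy sense graph built from sense-tagged glosses (illustrative only)."""

    def __init__(self):
        self.defined_by = defaultdict(set)  # sense -> senses used in its gloss
        self.defines = defaultdict(set)     # sense -> senses whose gloss uses it

    def add_gloss(self, sense, gloss_senses):
        """Record one gloss: store each edge together with its inverse."""
        for g in gloss_senses:
            self.defined_by[sense].add(g)
            self.defines[g].add(sense)

    def neighbours(self, sense):
        """Senses one step away in either direction."""
        return self.defined_by.get(sense, set()) | self.defines.get(sense, set())

    def distance(self, start, goal):
        """Edges on a shortest path, ignoring direction; None if unreachable."""
        queue = deque([(start, 0)])
        seen = {start}
        while queue:
            node, dist = queue.popleft()
            if node == goal:
                return dist
            for nxt in self.neighbours(node) - seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
        return None


net = LexiconNet()
net.add_gloss("radio#0", ["device#1", "reception#2"])
net.add_gloss("antenna#0", ["device#1", "signal#0"])
net.add_gloss("reception#1", ["hotel#0"])

print(sorted(net.defines["device#1"]))
print(net.distance("radio#0", "antenna#0"))
print(net.distance("radio#0", "hotel#0"))
```

Storing both directions makes it cheap to ask which senses a given sense helps define, and graph distance is one simple relatedness signal that such a net could support.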

  15. SenDiS - workflow
