
German Rigau i Claramunt lsi.upc.es/~rigau TALP Research Center Departament de Llenguatges i Sistemes Informàtics



Presentation Transcript


  1. Ontologies. German Rigau i Claramunt, http://www.lsi.upc.es/~rigau, TALP Research Center, Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya

  2. Ontologies: Outline. WordNet (Miller et al. 90, Fellbaum 98); EuroWordNet (Vossen et al. 98); Spanish WordNet; Combining Methods (Atserias et al. 97); Mapping hierarchies (Daudé et al. 01); Mikrokosmos (Viegas et al. 96); Cyc (Malesh et al. 96); WordNet 2 (Harabagiu 98); MindNet (Richardson et al. 97); ThoughtTreasure (Mueller 00); Meaning ...

  3. WordNet & EuroWordNet

  4. WordNet & EuroWordNet: WordNet. Princeton University (Miller et al. 1990). Lexicalized concepts (words, lexias), related to one another by semantic relations: synonymy, antonymy, hypernymy-hyponymy, meronymy, entailment, cause, ...

  5. WordNet & EuroWordNet: Semantic Relations of WN1.5. Synonymy: lexicalized concepts (SYNSETS). Weak notion of synonymy: synonymy in context. Synset: a set of words or lexias that express a concept in a given context. Hypernymy / hyponymy: class-to-subclass relation.

  6. WordNet & EuroWordNet: Semantic Relations of WN1.5. Meronymy: component part {hand}{arm}; member of a collection {person}{people}; substance {newspaper}{paper}.

  7. WordNet & EuroWordNet: Semantic Relations of WN1.5. Antonymy {big}{small}; cause {kill}{die}; entailment {divorce}{marry}; derivation {presidential}{president}; similarity {good}{positive}.

  8. WordNet & EuroWordNet: WordNet Example. [Figure: fragment of the WordNet graph with the nodes <conveyance>, <vehicle>, <motor vehicle, automobile, ...>, <cruiser, squad car, patrol car, ...>, <cab, taxi, hack, ...>, <car door>, <doorlock>]
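The relations on slides 5 to 8 can be pictured as a small labelled graph. The following is a minimal Python sketch, not WordNet's actual storage format: the `Synset` class, the relation names `hypernym` and `part_of`, and the exact edges between the example nodes are assumptions made for illustration, since the transcript keeps the node labels of the figure but not its arrows.

```python
class Synset:
    """A lexicalized concept: the words that express it in some context."""

    def __init__(self, *words):
        self.words = words
        self.relations = {}  # relation name -> list of target synsets

    def link(self, relation, target):
        self.relations.setdefault(relation, []).append(target)


def hypernym_chain(synset):
    """Follow the first hypernym link upwards and collect the head words."""
    chain = []
    while synset.relations.get("hypernym"):
        synset = synset.relations["hypernym"][0]
        chain.append(synset.words[0])
    return chain


# Nodes taken from the slide; the edges below are assumed.
conveyance = Synset("conveyance")
vehicle = Synset("vehicle")
motor_vehicle = Synset("motor vehicle", "automobile")
cab = Synset("cab", "taxi", "hack")
cruiser = Synset("cruiser", "squad car", "patrol car")
car_door = Synset("car door")
doorlock = Synset("doorlock")

vehicle.link("hypernym", conveyance)       # class-to-subclass links
motor_vehicle.link("hypernym", vehicle)
cab.link("hypernym", motor_vehicle)
cruiser.link("hypernym", motor_vehicle)
car_door.link("part_of", motor_vehicle)    # meronymy (component part)
doorlock.link("part_of", car_door)

cab_chain = hypernym_chain(cab)
print(cab_chain)
```

Keeping relations in a dictionary keyed by relation name means new relation types (cause, entailment, antonymy) need no change to the class.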

  9. WordNet & EuroWordNet: EuroWordNet. Project LE-2 4003 of the EU Telematics Application Programme. Semantic networks for several languages, integrated and interconnected. English: University of Sheffield. Dutch: University of Amsterdam. Italian: I.L.C., Pisa. Spanish: UB, UPC, UNED. Computers and the Humanities (special issue, 1998). http://www.hum.uva.nl/~ewn/

  10. WordNet & EuroWordNet: EuroWordNet Extensions. EWN2: German, French, Czech, Swedish, Estonian. ITEM project: Spanish, Catalan, Basque. CREL (Centre de Referència d'Enginyeria Lingüística): Catalan (UB, UPC).

  11. WordNet & EuroWordNet: Applications. Development of basic resources. Cross-lingual information processing: multilingual information retrieval systems (e.g., the Internet); lexical-semantic module of language engineering systems (information extraction, machine translation).

  12. WordNet & EuroWordNet: Design Requirements. Preservation of the semantic relations specific to each language. Maximum compatibility between the different resources. Relative independence of the WordNets, both in the construction process and in the final result.

  13. WordNet & EuroWordNet: Components of EuroWordNet. Core: the ILI, the Top Concept Ontology (TCO), the Domain Ontology (DO). Periphery: the specific WordNets.

  14. WordNet & EuroWordNet: Interlingual Index of EuroWordNet. Unstructured collection of records. Linked to at least: a synset of an EWN wordnet; an element of the TCO or DO. Associated with WN 1.5 synsets.

  15. WordNet & EuroWordNet: Top Concept Ontology of EuroWordNet. Hierarchy of language-independent concepts. Semantic distinctions: object, place, dynamic, ... Abstract (not lexical). Superimposed on the ILI. Three types of entities: first order, concrete entities; second order, static or dynamic situations; third order, abstract propositions.

  16. WordNet & EuroWordNet: Top Concept Ontology of EuroWordNet

  17. WordNet & EuroWordNet: Domain Ontology of EuroWordNet. Hierarchy of domain labels. Reduction of polysemy. Domains: Traffic (road traffic, air traffic); International Information; Mycology; Medicine.

  18. WordNet & EuroWordNet: Relations of EuroWordNet. Richer than in WN. Between: synsets (monolingual modules); ILI records (multilingual), e.g. {actuar-1} EQ-SYNONYM {'behave in a certain manner'}; ILI records and the TCO or DO.

  19. WordNet & EuroWordNet: Interlingual Relations of EuroWordNet

  20. WordNet & EuroWordNet: Relations of EuroWordNet
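A rough sketch of how the Interlingual Index connects otherwise independent wordnets: each synset points to ILI records, and cross-language equivalents are found by meeting at a shared record. The record identifiers (`ili-behave`, `ili-cab`), the synset names `behave-1` and `taxi-1`, and the helper names are hypothetical; only `{actuar-1} EQ-SYNONYM {'behave in a certain manner'}` comes from the slides.

```python
from collections import defaultdict

# (language, synset, relation, ILI record); all identifiers are illustrative.
links = [
    ("es", "actuar-1", "EQ-SYNONYM", "ili-behave"),
    ("en", "behave-1", "EQ-SYNONYM", "ili-behave"),
    ("es", "taxi-1", "EQ-SYNONYM", "ili-cab"),
]


def build_index(links):
    """Group synsets of all languages under the ILI record they are equivalent to."""
    by_ili = defaultdict(set)
    for lang, synset, relation, ili in links:
        if relation == "EQ-SYNONYM":
            by_ili[ili].add((lang, synset))
    return by_ili


def equivalents(by_ili, lang, synset, target_lang):
    """Synsets of target_lang that share an ILI record with the given synset."""
    found = set()
    for members in by_ili.values():
        if (lang, synset) in members:
            found |= {s for l, s in members if l == target_lang}
    return found


index = build_index(links)
print(equivalents(index, "es", "actuar-1", "en"))
```

Because the wordnets only meet at the index, a language can be added or restructured without touching the others, which matches the design requirement of relative independence.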

  21. Spanish WordNet: Building Process

  22. Spanish WordNet: General Methodology. 1) Mapping to WN1.5: manual work; automatic derivation of equivalents using bilingual dictionaries. 2) Manual correction. 3) Restructuring.

  23. Spanish WordNet: Main Steps: First Core (Manual Translation). Nouns: A) WN1.5's Tops File plus the first level of hyponyms (about 800 synsets). B) The rest of EWN's Common Base Concepts (those not already in our set). C) Manual translation of the synsets intermediate between (A) and (B), following the WN1.5 hierarchy, thus building a compact taxonomy equivalent to WN1.5 without gaps. Verbs: manual translation of EWN's Base Concepts (about 150 synsets).

  24. Spanish WordNet: Main Steps: Subset 1 (Semi-automatic). Nouns: applying automatic methods using bilingual dictionaries; manual validation of several subsets to check whether each link is correct; deriving a Confidence Score (CS) for every automatic method (heuristic); selecting synset-word pairs above 85% CS; some manual correction of this Subset 1 (mainly filling gaps). Verbs: 3600 English verbs connected to WN1.5 senses and ambiguously translated into Spanish are manually inspected and disambiguated.
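The selection step for nouns can be sketched as follows, assuming that the Confidence Score of a method is simply the fraction of its manually validated links judged correct; the slides do not give the exact definition. The method names, verdicts and candidate pairs are made up for illustration.

```python
def confidence_scores(validated):
    """validated: method name -> list of manual verdicts (True = link correct)."""
    return {method: sum(verdicts) / len(verdicts)
            for method, verdicts in validated.items() if verdicts}


def select_pairs(candidates, scores, threshold=0.85):
    """Keep synset-word pairs proposed by at least one method above the threshold."""
    return sorted({(synset, word)
                   for synset, word, method in candidates
                   if scores.get(method, 0.0) > threshold})


# Hypothetical validation samples for two heuristics.
validated = {
    "method_a": [True] * 9 + [False],      # 9 of 10 links correct
    "method_b": [True] * 6 + [False] * 4,  # 6 of 10 links correct
}
# Hypothetical candidate links: (WN1.5 synset, Spanish word, proposing method).
candidates = [
    ("abbey#1", "abadía", "method_a"),
    ("church#2", "iglesia", "method_b"),
    ("convent#1", "convento", "method_a"),
]

scores = confidence_scores(validated)
subset1 = select_pairs(candidates, scores)
print(scores, subset1)
```

Scoring the method rather than the individual link lets a small validated sample decide the fate of every pair that method produces.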

  25. Spanish WordNet: Main Steps: Subset 1 (Results 1)

  26. Spanish WordNet: Main Steps: Subset 1 (Results 2)

  27. Spanish WordNet: Main Steps: Subset 2. Main goals: enhance the quality of Subset 1 by manual revision; extend it by manually building synsets. Four sub-tasks.

  28. Spanish WordNet: Main Steps: Subset 2. 1) Manually covering those gaps in the hyponymy chains that are covered by other languages. 2) Manual cleaning of some automatically generated variants: (a) pairs of synsets that are adjacent in the hyponymy chain and share at least one variant: deleting redundant variants, or relocating them to either pre-existing or newly created synsets; (b) multi-word expressions present in synsets: deleting the non-lexicalized ones.

  29. Spanish WordNet: Main Steps: Subset 2. 3) Manual addition of new vocabulary considered relevant. It comes mainly from the Catalan WordNet: since we are building both wordnets in parallel, we detected those synsets that had been built for Catalan but not for Spanish. 4) Manual addition of cross-part-of-speech relations between nominal and verbal synsets. This work has been based mainly on noun-verb pairs obtained by morphological criteria (work carried out by UNED, Madrid).
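Two of the sub-tasks on slides 28 and 29 start from candidates that can be found automatically before the manual work begins. The sketch below is only an illustration of that idea: it flags adjacent synsets that share a variant (sub-task 2a) and lists concepts covered for Catalan but not for Spanish (sub-task 3). The synset and ILI identifiers are invented, each synset is assumed to have a single hypernym, and comparing the two wordnets through ILI records is an assumption rather than something the slides state.

```python
def shared_variant_pairs(hypernym_of, variants):
    """Return (child, parent, shared words) for adjacent synsets sharing a variant."""
    flagged = []
    for child, parent in hypernym_of.items():
        shared = variants.get(child, set()) & variants.get(parent, set())
        if shared:
            flagged.append((child, parent, sorted(shared)))
    return flagged


def built_only_for(source_ili, target_ili):
    """ILI records that have a synset in the source wordnet but not in the target."""
    return sorted(set(source_ili) - set(target_ili))


# Hypothetical Spanish synsets and their variants.
variants = {
    "es-001": {"vehículo"},
    "es-002": {"coche", "vehículo"},
    "es-003": {"taxi"},
}
hypernym_of = {"es-002": "es-001", "es-003": "es-002"}

# Hypothetical ILI coverage of the two wordnets.
catalan_ili = {"ili-10", "ili-11", "ili-12"}
spanish_ili = {"ili-10", "ili-12"}

to_review = shared_variant_pairs(hypernym_of, variants)
missing_in_spanish = built_only_for(catalan_ili, spanish_ili)
print(to_review, missing_in_spanish)
```

The output is a work list for human editors; the decision to delete or relocate a variant stays manual, as on the slide.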

  30. Spanish WordNet: Main Steps: Subset 2 (Results)

  31. Spanish WordNet: Main Steps: Subset 2 (Results)

  32. Spanish WordNet: Main Steps: Beyond Subset 2. Massive manual checking (from Nov '98) using WEI: automatically generated variants; filling gaps in the hierarchy; new vocabulary; new adjectives.

  33. Spanish WordNet: Main Steps: Beyond Subset 2

  34. Spanish WordNet: Main Steps: Beyond Subset 2

  35. Spanish WordNet: Main Steps: Parole Coverage

  36. Spanish WordNet: Current Figures. Spanish, Catalan, Basque, (English). http://nipadio.lsi.upc.es/wei2.html

  37. Combining Multiple Methods for the Automatic Construction of Multilingual WordNets

  38. Combining Multiple Methods ...: Outline. Ten class methods: four monosemic criteria, four polysemic criteria, two hybrid criteria. Three conceptual distance methods: CD1, using pairwise word co-occurrences; CD2, using headword and genus; CD3, using bilingual Spanish entries with multiple translations.

  39. Combining Multiple Methods ...: Ten class methods. Four classes. [Diagrams linking SW and EW nodes; layout not recoverable from the transcript]

  40. Combining Multiple Methods ...: Ten class methods. Four monosemic criteria. [Diagrams linking SW, EW and Synset nodes; layout not recoverable from the transcript]

  41. Combining Multiple Methods ...: Ten class methods. Four polysemic criteria. [Diagrams linking SW, EW and Synset+ nodes; layout not recoverable from the transcript]

  42. Combining Multiple Methods ...: Ten class methods. Variant criterion: <..., EW, ..., EW, ...> SW. Field criterion: <..., headword-EW, ..., Ind-EW, ...> SW.

  43. Combining Multiple Methods ...: Ten class methods. Results
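The diagrams for the class methods did not survive in the transcript, so the following is a sketch under stated assumptions rather than a reconstruction of the slides. It assumes SW and EW stand for Spanish and English words, that the four classes correspond to the cardinality of translation links in the bilingual dictionary (1:1, 1:N, N:1, M:N), and that "monosemic" versus "polysemic" refers to whether the English word has one or several WordNet senses. The per-pair classification rule, the dictionary entries and the sense inventories (other than the three senses of abbey shown on slide 46) are illustrative.

```python
from collections import defaultdict

SHAPES = {
    (False, False): "1:1",  # one SW, one EW
    (True, False): "1:N",   # one SW with several EWs
    (False, True): "N:1",   # several SWs sharing one EW
    (True, True): "M:N",    # several on both sides
}


def classify_pairs(pairs, senses):
    """Label each (SW, EW) pair with its link shape and the EW's sense status."""
    sw_to_ew, ew_to_sw = defaultdict(set), defaultdict(set)
    for sw, ew in pairs:
        sw_to_ew[sw].add(ew)
        ew_to_sw[ew].add(sw)
    labels = {}
    for sw, ew in pairs:
        shape = SHAPES[(len(sw_to_ew[sw]) > 1, len(ew_to_sw[ew]) > 1)]
        kind = "monosemic" if len(senses.get(ew, ())) == 1 else "polysemic"
        labels[(sw, ew)] = (shape, kind)
    return labels


# Hypothetical bilingual dictionary entries (Spanish word, English word).
pairs = [
    ("abadía", "abbey"),
    ("coche", "car"), ("coche", "automobile"),
    ("banco", "bank"), ("banco", "bench"),
    ("orilla", "bank"),
]
# Hypothetical sense inventories for the English words.
senses = {
    "abbey": ["abbey#1", "abbey#2", "abbey#3"],
    "car": ["car#1"],
    "automobile": ["automobile#1"],
    "bank": ["bank#1", "bank#2"],
    "bench": ["bench#1", "bench#2"],
}

labels = classify_pairs(pairs, senses)
print(labels)
```

Crossing four link shapes with two sense statuses gives eight combinations, which is consistent with the four monosemic plus four polysemic criteria listed in the outline.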

  44. Combining Multiple Methods ...: Conceptual Distance methods. Conceptual Distance (Agirre et al. 94): length of the shortest path; specificity of the concepts. Using WordNet; bilingual dictionary.
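The slide names the two ingredients of conceptual distance without giving a formula. One formulation commonly associated with this line of work, recalled here rather than taken from the slides, sums the inverse depth of every concept on the shortest path, so that a step between specific (deep) concepts costs less than a step between general ones:

```latex
\[
\operatorname{dist}(w_1, w_2) =
  \min_{c_{1i} \in w_1,\; c_{2j} \in w_2}
  \sum_{c_k \in \operatorname{path}(c_{1i},\, c_{2j})}
  \frac{1}{\operatorname{depth}(c_k)}
\]
```

Here $c_{1i}$ and $c_{2j}$ range over the senses of the words $w_1$ and $w_2$, $\operatorname{path}$ is the shortest path between two concepts in the hierarchy, and $\operatorname{depth}(c_k)$ is the depth of concept $c_k$ in that hierarchy.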

  45. Combining Multiple Methods ...: Conceptual Distance methods. Three conceptual distance methods: CD1, using pairwise word co-occurrences; CD2, using headword and genus; CD3, using bilingual Spanish entries with multiple translations.

  46. Combining Multiple Methods ...: Conceptual Distance methods (Example CD2). abadía_1_2: Iglesia o monasterio regido por un abad o abadesa (abbey, a church or a monastery ruled by an abbot or an abbess). [Figure: WordNet hierarchy with the nodes <entity>, <object, ...>, <artifact, artefact>, <structure, construction>, <building, edifice>, <house, lodging>, <place of worship, ...>, <religious residence, cloister>, <church, church building>, <monastery>, <convent> and three <abbey> synsets]

  47. Combining Multiple Methods ...: Conceptual Distance methods (Example CD2). abadía_1_2: Iglesia o monasterio regido por un abad o abadesa (abbey, a church or a monastery ruled by an abbot or an abbess). Result: 06 ARTIFACT. [Figure: the same WordNet hierarchy, with the nodes <entity>, <object, ...>, <artifact, artefact>, <structure, construction>, <building, edifice>, <house, lodging>, <place of worship, ...>, <religious residence, cloister>, <church, church building>, <monastery>, <convent> and three <abbey> synsets]
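The CD2 example can be tried out on a toy version of this hierarchy. The sketch below is illustrative only. It assumes that conceptual distance sums 1/depth over every concept on the shortest path (endpoints included), that the root has depth 1, and that the hierarchy is a tree. The parent links, and which of the three abbey senses hangs under church, convent or monastery, are guesses based on the node labels in the figure, not data read from WordNet.

```python
from fractions import Fraction

# Assumed child -> parent links for the nodes shown on the slide.
PARENT = {
    "object": "entity", "artifact": "object", "structure": "artifact",
    "building": "structure", "house": "structure",
    "place_of_worship": "building", "church": "place_of_worship",
    "religious_residence": "house",
    "monastery": "religious_residence", "convent": "religious_residence",
    "abbey#1": "church", "abbey#2": "convent", "abbey#3": "monastery",
}


def ancestors(concept):
    """The concept followed by its hypernyms up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(concept):
    return len(ancestors(concept))  # the root has depth 1


def path(c1, c2):
    """Shortest path in the tree, through the lowest common ancestor."""
    up1, up2 = ancestors(c1), ancestors(c2)
    common = next(c for c in up1 if c in up2)
    return up1[:up1.index(common) + 1] + up2[:up2.index(common)][::-1]


def distance(c1, c2):
    return sum(Fraction(1, depth(c)) for c in path(c1, c2))


def closest_sense(senses, genus):
    """Pick the headword sense nearest to a genus concept."""
    return min(senses, key=lambda sense: distance(sense, genus))


abbey_senses = ["abbey#1", "abbey#2", "abbey#3"]
print(closest_sense(abbey_senses, "church"))
print(closest_sense(abbey_senses, "monastery"))
```

With the genus terms of the definition translated as church and monastery, the headword abadía is drawn towards the abbey senses that sit directly under those concepts, and every candidate lies below <artifact, artefact>.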

  48. Combining Multiple Methods ...: Three CD methods. Results
