Presentation Transcript

  1. The Italian CLIPS Lexicon and its reuse in a bilingual environment. Nilda Ruimy, ILC-CNR, Pisa, September 2004

  2. Outline
     Part I
     • The origin of the CLIPS lexicon
     • The PAROLE-SIMPLE model
     • General encoding criteria
     • Phonological and morphological levels
     • Syntactic level: information content
     • The semantic lexicon
     • Theoretical background: GL theory
     • The original Qualia Structure
     • The SIMPLE ontology
     • The Extended Qualia Structure
     • Semantic level: information content
     • Predicative structure
     • Syntax-semantics mapping
     • Encoding methodology
     • CLIPS essential features & applications
     Part II
     • Creating a bilingual resource
     • The two scenarios
     • Scenario I
     • Drawbacks
     • Scenario II
     • The cognate approach
     • The sense indicator approach
     • Results
     • Concluding remarks

  3. CLIPS: a bit of genealogy
     • PAROLE European project: PAROLE lexicons, 12 harmonized lexicons (morphology: 20,000 entries; syntax: 20,000 lemmas)
     • SIMPLE European project (Semantic Information for Multifunctional Plurilingual Lexica): SIMPLE lexicons, 12 harmonized lexicons (semantics: 10,000 senses)
     • Italy: enlargement of these core lexicons in a national follow-up project, yielding the CLIPS lexicon in XML format (phonology: 374,000 entries; morphology: 49,000 entries; syntax: 55,000 lemmas; semantics: 55,000 senses)
     • Further sources shown in the diagram: PAROLE Corpus (lexical units), DMI (phonology)

  4. The PAROLE-SIMPLE Model
     • PAROLE-SIMPLE theoretical model
     • GENELEX-PAROLE representational model
     • EAGLES recommendations
     • Extended GENELEX model
     • Results from EU projects: EUROWORDNET, ACQUILEX, DELIS
     • GENERATIVE LEXICON

  5. The Linguistic Model
     • PAROLE-SIMPLE lexicons share:
       • a common EAGLES-conformant model
       • a common representation language
       • a common building methodology
     • Innovative
     • Tackles misrepresented areas of knowledge
     • Extendible and multifunctional
     • Multilingual perspective
     • Key property: REUSABILITY

  6. Representational Model (1)
     Entity/Relationship Model:
     • implemented through a DTD that defines:
       • the structure of every descriptive element
       • the relationships holding among the various descriptive elements, as well as their co-occurrence restrictions
     • non-redundant data representation

  7. Representational Model (2)
     • specific representational structures for every level of linguistic description
     • links among the different levels, although the information encoded at each level is perfectly autonomous

  8. General encoding criteria
     • Reduce the lexicographer's margin of subjectivity by setting precise guidelines for the treatment of particular phenomena
     • Base the encoding on corpus data as much as possible
     • Find a balance between encoding attested structures / senses only and an exhaustive encoding that also includes rare structures / senses

  9. Splitting entries
     • Avoid both redundancy and overly broad groupings
     • Use criteria strictly relevant to the description level, e.g. at the syntactic level, syntactically driven criteria:
       • arity
       • syntactic function: disporre i libri negli scaffali / disporre di due auto
       • complement optionality: attraversare (la strada) (lit. sense) / attraversare un momento difficile
       • different (non-alternative) realization of complements: Leo evita Lia / L. ha evitato di guardare L., che L. si ferisse
     • Encode, at the semantic level, the most common senses distinguished in average-size dictionaries (ca. 150,000 words)

  10. The four-level architecture: the first three levels
      • Phonological Unit: stress position, vowel openness, consonant pronunciation
      • Corresp. PhnU-MrphU
      • Morphological Unit: PoS & subcat., inflectional paradigm
      • Corresp. MrphU-SynU
      • Syntactic Unit:
        • syntactic structure 1: a. head properties; b. subcat. frame (position, synt. restr.)
        • syntactic structure 2: a. head properties; b. subcat. frame (position, synt. restr.)
        • Frameset

  11. Syntactic entry information content
      Aumentare: Il governo ha aumentato i prezzi del 3%. I prezzi sono aumentati del 3%. ('to increase': The government has increased the prices by 3%. Prices have increased by 3%.)
      • Specific properties of the entry in the syntactic context described: main verb; aux.: avere
      • Subcategorization frame (complex synt. entry):
        • MAIN syntactic frame: P0: optional, subject, NP; P1: oblig., object, NP; P2: optional, adverbial, di_PP
        • RELATED syntactic frame: P0: optional, subject, NP; P1: optional, adverbial, di_PP
      • Link between syntactic structures: FRAMESET relating systematic frame alternations (decausativization, locative alternation, reciprocal altern., symmetrical altern.):
        • relates the main syntactic frame to the alternating one
        • relates the respective frame positions
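The frameset for aumentare can be sketched as plain data. This is a minimal Python sketch, not the CLIPS encoding: the dictionary layout and the helper `check_frameset` are hypothetical, and the position correspondence (main P1 to related P0, main P2 to related P1) is an assumption read off the two example sentences, where the object "i prezzi" of the transitive frame surfaces as the subject of the intransitive one.

```python
# Hypothetical encoding of the two syntactic frames of "aumentare".
# Each position maps to (function, realization, optionality) as listed on the slide.
MAIN_FRAME = {
    "P0": ("subject", "NP", "optional"),
    "P1": ("object", "NP", "obligatory"),
    "P2": ("adverbial", "di_PP", "optional"),
}
RELATED_FRAME = {
    "P0": ("subject", "NP", "optional"),
    "P1": ("adverbial", "di_PP", "optional"),
}

# Assumed frameset links (decausativization): main position -> related position.
FRAMESET = {"P1": "P0", "P2": "P1"}


def check_frameset(main, related, links):
    """Return the linked position pairs whose realizations (NP, di_PP, ...) agree."""
    consistent = []
    for main_pos, related_pos in links.items():
        if main[main_pos][1] == related[related_pos][1]:
            consistent.append((main_pos, related_pos))
    return consistent


print(check_frameset(MAIN_FRAME, RELATED_FRAME, FRAMESET))
```

Keeping the links separate from the frames mirrors the slide's point that a frameset relates frames and their positions rather than redefining them.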

  12. The semantic lexicon
      Theoretical linguistic background: extended version of Pustejovsky's Generative Lexicon (GL) theory

  13. Generative Lexicon theory
      • Lexical meanings have various levels of complexity:
        • simplest ones: definable by a taxonomic relation
        • more complex ones: hypernymic relation not sufficient
      • Examples:
        • bambino: HUMAN, age (childhood), sex (male)
        • dottore: HUMAN, age (adult), sex (male), function
        • giornale: 1. printed paper, 2. location, 3. institution, 4. human group (polysemy)
      • Qualia Structure makes it possible:
        • to coherently model the pluridimensionality of meaning
        • to capture the relationships holding between semantic units
        • to represent uniformly semantic units of different degrees of complexity

  14. The Original Qualia Structure
      Consists of four roles (Qualia):
      • formal role (what is X?): distinguishes the denoted entity from others
      • constitutive role (what is X made of?): expresses its components
      • agentive role (how does X come about?): expresses its coming about
      • telic role (what is X's function?): specifies its function

  15. The SIMPLE ontology (1)
      Lexicon structured on the basis of a type ontology:
      • Core Ontology:
        • top-level, general types
        • large consensus
        • provide essential information
        • mappable onto the EuroWordNet ontology
      • Recommended Ontology:
        • hierarchically lower and more specific types
        • provide finer-grained information
      • Possible creation of language-specific / application-specific types

  16. The SIMPLE ontology (2)
      157 language-independent semantic types
      • simple types (one-dimensional): can be fully characterized in terms of a hypernymic relation, e.g. Entity > Concrete_entity > Living_entity > Animal > Earth_Animal
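A simple type is fully determined by its chain of supertypes, which can be illustrated with a small lookup. The sketch below is hypothetical: it assumes the chain on the slide runs from the most general type (Entity) down to the most specific one (Earth_Animal), and the names `SUPERTYPE`, `ancestors` and `is_subtype` are illustrative, not part of the SIMPLE specification.

```python
# Assumed child -> supertype links for the chain shown on the slide.
SUPERTYPE = {
    "Earth_Animal": "Animal",
    "Animal": "Living_entity",
    "Living_entity": "Concrete_entity",
    "Concrete_entity": "Entity",
}


def ancestors(sem_type):
    """Return the supertypes of sem_type, nearest first."""
    chain = []
    while sem_type in SUPERTYPE:
        sem_type = SUPERTYPE[sem_type]
        chain.append(sem_type)
    return chain


def is_subtype(sem_type, candidate):
    """True if sem_type equals candidate or inherits from it."""
    return sem_type == candidate or candidate in ancestors(sem_type)


print(ancestors("Earth_Animal"))
print(is_subtype("Animal", "Concrete_entity"))
```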

  17. The SIMPLE ontology (3)
      • unified types (multi-dimensional): can only be defined through the combination of:
        • the relation to their supertype
        • the reference to orthogonal dimensions of meaning
      • Diagram nodes: Entity, Abstract_Entity, Agentive, Telic, Institution

  18. The SIMPLE ontology (4)
      SIMPLE ontology: a multidimensional type hierarchy based on both hierarchical and non-hierarchical conceptual relations

  19. Semantic types
      • In the SIMPLE ontology, types are not mere labels but the repository of a specific set of structured semantic information

  20. Some semantic types for abstract & concrete entities
      • TOP: TELIC, ENTITY, CONSTITUTIVE, AGENTIVE, ...
      • ENTITY: Event, Concrete_entity, Representation, Property, Abstract_entity, ...
      • Concrete_entity: Living_entity (Human, Animal, Vegetal_entity), Artifact, Substance, Location, Food, Material
      • Artifact: Furniture, Instrument, Clothing, Artwork
      • Property: Quality, Quantity, Physical_prop, Psychol_prop, ...
      • Representation: Sign, Language, Information, ...
      • Abstract_entity: Convention, Cognitive_fact, ...
      • Also shown: Artifactual_material

  21. Some semantic types for events
      • EVENT types shown in the tree diagram: Phenomenon, Aspectual, Cause_change, Psych_event, State, Act, Change, Creation, Relational_state, Relational_change, Acquire_knowledge, Non_relational_act, Move, Change_possession, Natural_transition, Cause_act, Relational_act, Speech_act, Change_location, ...

  22. Some semantic types for adjectives
      • TOP: Intensional, Extensional
      • Types shown in the tree diagram: Temporal, Psychological_prop, Relational_prop, Social_prop, Modal, Emphasizer, Physical_prop, Intensifying_prop, Emotive, Manner, Temporal_prop, Object_related

  23. Descriptive elements
      • Features: PlusHuman, PlusCollective, ...
      • Relations between semantic units: R (<SemU1>, <SemU2>)
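A relation R (<SemU1>, <SemU2>) is naturally stored as a named triple. The following is a minimal sketch under that assumption; the relation instances are the ones given for vaporizzatore later in this presentation, while the `Relation` tuple and the `targets` helper are hypothetical names introduced only for illustration.

```python
from collections import namedtuple

# One relation instance: R(source semantic unit, target semantic unit).
Relation = namedtuple("Relation", ["name", "source", "target"])

# Relation instances quoted from the vaporizzatore entry in this presentation.
RELATIONS = [
    Relation("isa", "vaporizzatore", "apparecchio"),
    Relation("has_as_part", "vaporizzatore", "pulsante"),
    Relation("created_by", "vaporizzatore", "fabbricare"),
    Relation("used_for", "vaporizzatore", "atomizzare"),
]


def targets(source, name, relations=RELATIONS):
    """Return every target linked to source by the relation called name."""
    return [r.target for r in relations if r.source == source and r.name == name]


print(targets("vaporizzatore", "used_for"))
```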

  24. Extended Qualia Structure: extended roles
      • Formal: isa, antonym_comp, antonym_grad, mult_opposition
      • Constitutive:
        • CONSTITUTIVE: made_of, is_a_follower_of, has_as_member, is_a_member_of, has_as_part, instrument, kinship, is_a_part_of, resulting_state, relates, uses
        • PROPERTY: causes, concerns, affects, constitutive_activity, contains, has_as_colour, has_as_effect, has_as_property, measured_by, measures, produces, produced_by, property_of, quantifies, related_to, successor_of, precedes, typical_of, feeling
        • LOCATION: is_in, lives_in, typical_location
      • Agentive:
        • AGENTIVE: result_of, agentive_prog, agentive_cause, agentive_experience, caused_by, source
        • ARTIFACTUAL AGENTIVE: created_by, derived_from
      • Telic:
        • TELIC: indirect_telic, purpose
        • DIRECT TELIC: object_of_activity
        • INSTRUMENTAL: used_for, used_as, used_by, used_against
        • ACTIVITY: is_the_activity_of, is_the_ability_of, is_the_habit_of

  25. Extended Qualia Structure: example word pairs illustrating the relations
      • disgusto, provare (disgust, feel)
      • casa, costruire (house, build)
      • pane, farina (bread, flour)
      • mohair, capra (mohair, goat)
      • senato, senatore (senate, senator)
      • proiettile, colpire (projectile, hit)
      • manubrio, bicicletta (handlebar, bicycle)
      • metano, combustibile (methane, fuel)
      • bisturi, chirurgo (lancet, surgeon)
      • arancio, arancia (orange tree, orange)
      • medico, curare (doctor, cure)
      • antitarmico, tarma (moth balls, moth)
      • abbaiare, cane (bark, dog)
      • fumatore, fumare (smoker, smoke)

  26. Orthogonal dimensions of meaning
      Central node: instrument
      • Formal role: is_a
      • Constitutive role: is_made_of
      • Telic role: used_for
      • Agentive role: created_by

  27. Orthogonal dimensions of meaning: violin
      • Formal role: is_a musical_instrument
      • Constitutive role: has_as_part strings; is_made_of wood
      • Telic role: used_for playing
      • Agentive role: created_by make
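The violin example translates directly into a four-role record. This is a minimal sketch, not the CLIPS data format: the `QualiaStructure` class is hypothetical, and the pairing of each relation with its filler (for instance is_made_of with wood) is an assumption, since the transcript lists relations and fillers without the connecting arrows of the original diagram.

```python
from dataclasses import dataclass, field

ROLE_ORDER = ("formal", "constitutive", "telic", "agentive")


@dataclass
class QualiaStructure:
    """Four qualia roles, each mapping a relation name to its filler."""
    formal: dict = field(default_factory=dict)
    constitutive: dict = field(default_factory=dict)
    telic: dict = field(default_factory=dict)
    agentive: dict = field(default_factory=dict)

    def triples(self):
        """Flatten the structure into (role, relation, filler) triples."""
        out = []
        for role in ROLE_ORDER:
            for relation, filler in getattr(self, role).items():
                out.append((role, relation, filler))
        return out


# Assumed relation/filler pairing for the violin slide.
violin = QualiaStructure(
    formal={"is_a": "musical_instrument"},
    constitutive={"has_as_part": "strings", "is_made_of": "wood"},
    telic={"used_for": "playing"},
    agentive={"created_by": "make"},
)

for triple in violin.triples():
    print(triple)
```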

  28. Meaning dimensions expressed by Qualia relations
      • botte (barrel), traditional dictionary definition: "recipiente di legno fatto di doghe arcuate tenute unite da cerchi di ferro che serve per la conservazione e il trasporto di liquidi, specialmente vino" (wooden container made of curved staves held together by iron hoops, used for storing and transporting liquids, especially wine)
      • Qualia relations marked on the definition: Formal: isa; Constitutive: made_of (twice); Agentive: created_by; Telic: used_for; Constitutive: contains

  29. Qualia informative power (1)
      Within a semantic type population, further clusterings can be made through the is-a relation.

  30. Qualia informative power (2)
      • INSTRUMENT: is-a utensile: graticola, colabrodo, frusta; is-a posata: forchetta, coltello
      • CONTAINER: is-a contenitore: pentola, tegame, padella
      • used for links point to cucinare and mangiare

  31. Semantic level: information content
      • Phonological Unit: stress position, vowel openness, consonant pronunciation
      • Corresp. PhnU-MrphU
      • Morphological Unit: PoS & subcat., inflectional paradigm
      • Corresp. MrphU-SynU
      • Syntactic Unit: syntactic structure 1 and syntactic structure 2 (each with a. head properties; b. subcat. frame: position, synt. restr.), Frameset
      • Corresp. SynU-SemU
      • Semantic Unit:
        • ontological type, event type, semant. class, domain
        • derivation, semant. features, synonymy, semant. relations, regular polysemy
        • Extended Qualia Structure: formal role, constitutive role, agentive role, telic role
        • predicative represent.: predicate, type of link, arguments, sem. restr.

  32. Predicative Representation
      • Describes the semantic scenario a word sense is involved in
      • Assigned to predicative semantic units:
        • assignment of a lexical predicate
        • type of link holding between entry and predicate
        • predicate argument structure
        • semantic role of arguments
        • selection restrictions of arguments
        • link between semantic arguments and syntactic complements

  33. Assignment of a lexical predicate
      • verbs
      • predicative nouns: deverbals (costruzione) and collective simple nouns (gruppo), nouns denoting a relation (madre), quantity (bottiglia), part (fetta), unit of measurement (metro), property (bellezza)
      • adjectives
      • some adverbs (indipendentemente da)

  34. Predicate-semantic unit link
      Links to PRED_ACCUSARE:
      • accusare (to accuse): master
      • accusa (accusation): process nominalisation
      • accusato (accused): patient nominalisation
      • accusatore (accuser): agent nominalisation

  35. Semantic arguments: thematic roles
      • ProtoAgent: volitional subject of verb: ARG0 of kill
      • ProtoPatient: object undergoing an action: ARG1 of kill
      • 2ndParticipant: indirect object: ARG2 of give
      • SoA (State of Affairs): sentential complement: ARG2 of ask
      • Location: ARG2 of put
      • Direction: ARG2 of move
      • Origin: ARG1 of move
      • Kinship: ARG0 of father
      • HeadQuantified: ARG0 of metre, bottle

  36. Semantic arguments: selectional restrictions
      • Features, used transversely across semantic types (e.g. PlusEdible), make it possible to capture broader preferences than single semantic types: ARG1 of eat: [PlusEdible] vs. ARG1 of eat: [FOOD]
      • Not proper restrictions, but rather preferences of combination in prototypical situations
      • Expressible through:
        • semantic types
        • notions (combination of types, or type + feature, ...)
        • features
        • semantic units
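The contrast between a type-based and a feature-based preference for ARG1 of eat can be sketched as follows. Everything in this example is illustrative: the three toy senses, their type and feature assignments, and the `satisfies` helper are assumptions made for the sketch, not entries taken from CLIPS.

```python
# Hypothetical sense inventory: semantic type plus transversal features.
SENSES = {
    "pane": {"type": "Food", "features": {"PlusEdible"}},
    "fungo": {"type": "Vegetal_entity", "features": {"PlusEdible"}},
    "ferro": {"type": "Material", "features": set()},
}


def satisfies(sense, preference):
    """Check a (kind, value) preference, where kind is 'type' or 'feature'."""
    kind, value = preference
    entry = SENSES[sense]
    if kind == "type":
        return entry["type"] == value
    if kind == "feature":
        return value in entry["features"]
    raise ValueError(f"unknown preference kind: {kind}")


# ARG1 of eat expressed as a type vs. as a feature.
for sense in SENSES:
    by_type = satisfies(sense, ("type", "Food"))
    by_feature = satisfies(sense, ("feature", "PlusEdible"))
    print(sense, by_type, by_feature)
```

In this toy data the feature accepts fungo although its type is not Food, which is the wider coverage the slide attributes to features.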

  37. Semantic entry information content (1)
      Aumento: L'aumento dei prezzi da parte del governo (increase: the increase of prices by the government)
      ONTOLOGICAL INFO.
      • Semantic type: Cause_change_of_value
      • Supertype: Cause_relational_change
      • Event type: transition
      • Domain: general, economics
      • Gloss: accrescimento in dimensione o quantità (increase in size or quantity)
      EXTENDED QUALIA INFO.
      • aumento isa cambiamento
      • aumento resulting_state maggiore
      • Agentivecause: yes
      • Direction: up
      • Morphological derivation: Eventverb aumentare
      PREDICATIVE REPRESENTATION
      • Lexical semantic predicate: PRED_aumentare
      • Type of link: event nominalization
      • Predicate arg. struct. (range, semantic role & selectional restrictions of args.):
        • Arg0: Protoagent; Human / Institution
        • Arg1: ProtoPatient; Entity
        • Arg2: Quantifier; Amount
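The predicative representation of aumento can be written down as a small record. This is a hedged sketch: the class names `Argument` and `PredicativeRepresentation` and the `render` method are hypothetical, and the alignment of roles and restrictions with Arg0, Arg1 and Arg2 assumes the three flattened columns of the slide are read left to right.

```python
from dataclasses import dataclass


@dataclass
class Argument:
    name: str         # Arg0, Arg1, ...
    role: str         # semantic role
    restriction: str  # selectional preference


@dataclass
class PredicativeRepresentation:
    predicate: str
    link_type: str
    arguments: list

    def render(self):
        """Return a one-line, human-readable view of the argument structure."""
        args = ", ".join(
            f"{a.name}: {a.role} [{a.restriction}]" for a in self.arguments
        )
        return f"{self.predicate}({args})"


# Values quoted from the aumento slide; the column alignment is assumed.
aumento = PredicativeRepresentation(
    predicate="PRED_aumentare",
    link_type="event nominalization",
    arguments=[
        Argument("Arg0", "Protoagent", "Human / Institution"),
        Argument("Arg1", "ProtoPatient", "Entity"),
        Argument("Arg2", "Quantifier", "Amount"),
    ],
)

print(aumento.render())
```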

  38. Semantic entry information content (2)
      vaporizzatore: spruzzare acqua con un vaporizzatore (spray: to spray water with a spray)
      ONTOLOGICAL INFO.
      • Semantic type: Instrument
      • Supertype: Artifact
      • Event type: ===
      • Domain: general, cleaning, gardening, cosmetics
      • Gloss: apparecchio usato per ridurre in minuscole particelle un liquido (device used to reduce a liquid to tiny particles)
      EXTENDED QUALIA INFO.
      • vaporizzatore isa apparecchio
      • vaporizzatore has_as_part pulsante
      • vaporizzatore created_by fabbricare
      • vaporizzatore used_for atomizzare
      • Synonymy: nebulizzatore
      • Morphological derivation: Eventverb vaporizzare
      PREDICATIVE REPRESENTATION
      • Lexical semantic predicate: PRED_vaporizzare
      • Type of link: instrument nominalization
      • Predicate arg. struct. (range, semantic role & selectional restrictions of args.):
        • Arg0: Protoagent; Human / Instrument
        • Arg1: ProtoPatient; +liquid
        • Arg2: Location; Concrete_entity

  39. Syntax-semantics mapping (1)
      • Syntactic Unit: syntactic structure 1 and syntactic structure 2 (each with a. head properties; b. subcat. frame: position, synt. restr.), Frameset
      • Corresp. SynU-SemU
      • Semantic Unit: ontological type, event type, semant. class, domain, derivation, semant. features, synonymy, semant. relations, regular polysemy
      • Extended Qualia Structure: formal role, constitutive role, agentive role, telic role
      • predicative represent.: predicate, type of link, arguments, sem. restr.
      • Corresp. Syntax-Semantics

  40. Syntax-semantics mapping (2)
      • SYNTACTIC LEVEL: SynU_migliorare ('to improve'), with a Frameset relating the transitive structure (P0, P1) and the intransitive structure (P0)
      • SEMANTIC LEVEL: SemU1_migliorare (CAUSE_CHANGE_OF_STATE) and SemU2_migliorare (CHANGE_OF_STATE)
      • LINK PREDICATE-SEMANTIC UNIT
      • SEMANTIC PREDICATE: PRED_migliorare (ARG0: Agent, ARG1: Patient)

  41. Syntax-semantics mapping (2)
      • SynU_migliorare ('to improve'): Frameset relating the transitive structure (P0, P1) and the intransitive structure (P0)
      • CORRESPONDENCE SYNTACTIC-SEMANTIC FRAME: isomorphic or non-isomorphic
      • SemU1_migliorare (CAUSE_CHANGE_OF_STATE), SemU2_migliorare (CHANGE_OF_STATE)
      • PRED_migliorare: ARG0: Agent, ARG1: Patient
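The isomorphic vs. non-isomorphic distinction can be made concrete with a position-to-argument table. The sketch below rests on two assumptions that the transcript does not state: that the transitive frame pairs with SemU1 and maps P0 to ARG0 and P1 to ARG1, and that the intransitive frame pairs with SemU2 and maps its subject P0 to ARG1 (the Patient). The test used for isomorphism, matching position and argument indices, is likewise an illustrative definition.

```python
# Assumed correspondences between syntactic positions and predicate arguments.
CORRESPONDENCES = {
    "SemU1_migliorare": {
        "type": "CAUSE_CHANGE_OF_STATE",
        "frame": "transitive",
        "links": {"P0": "ARG0", "P1": "ARG1"},
    },
    "SemU2_migliorare": {
        "type": "CHANGE_OF_STATE",
        "frame": "intransitive",
        "links": {"P0": "ARG1"},  # assumption: the subject realizes the Patient
    },
}


def is_isomorphic(links):
    """True when every position Pn is linked to the argument ARGn."""
    return all(pos[1:] == arg[3:] for pos, arg in links.items())


for semu, info in CORRESPONDENCES.items():
    label = "isomorphic" if is_isomorphic(info["links"]) else "non-isomorphic"
    print(semu, info["frame"], label)
```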

  42. Template-driven encoding methodology
      • a template is a schema providing, for each semantic type, a set of structured information that is deemed crucial to its definition
      • twofold function:
        • interface between ontology and lexicon
        • guide for the lexicographer
      • ensures systematicity, consistency and uniformity of representation of the lexical meaning

  43. A template

  44. CLIPS' key features
      • The largest electronic, multilevel lexical resource of the Italian language
      • 55,000 words encoded
      • 4 description levels: phonology, morphology, syntax, semantics
      • Based on a rich and multifunctional linguistic and representational model shared by 11 other European lexica
      • Lexical description conformant to international standards
      • Respect of the principles of uniformity, consistency and exhaustivity
      • Generic lexicon: large coverage (vocabulary and synt. structures)
      • Fine-grained information, highly structured, innovative, most useful for HLT applications
      • High level of reusability

  45. Application fields
      • surface and deep analysis of texts
      • information retrieval
      • machine translation
      • natural language understanding, etc.
      The wealth of information the lexicon contains allows:
      • building semantic networks
      • extracting the vocabulary of a specific domain
      • NP recognition: disambiguating the semantic contribution of some PPs in complex nominals

  46. To lend itself to further uses, a lexicon must have:
      • a flexible model
      • a generic database
      • uniformly structured data
      • a precise and explicit linguistic description
      Like the PAROLE and SIMPLE lexicons, CLIPS does meet these requirements.

  47. Creating a bilingual electronic lexical resource
      Strategy I:
      1) Use CLIPS and the PAROLE-SIMPLE French lexicon
      2) Perform a semi-automatic linking of their respective entries

  48. Creating a bilingual electronic lexical resource
      Strategy II:
      1) Derive, in a semi-automatic way, a semantically annotated French lexicon from CLIPS
      2) Use the source and derived lexicons as a basis for building a bilingual resource

  49. Strategy I
      • Inputs: CLIPS (e.g. capo_1 with phon, morph, syn and sem information; capo_2; ufficio_1), a bilingual dictionary IT-FR & FR-IT, and the PAROLE-SIMPLE French lexicon (e.g. tête_1 with morph, syn and sem information; tête_2; tête_3; bureau_1)
      • Bilingual dictionary, IT-FR: capo: tête, chef, bout; ufficio: bureau, charge, ...
      • Bilingual dictionary, FR-IT: tête: testa, capo, faccia, cima; bureau: ufficio, scrivania, ...
      • ALGORITHM: which CLIPS sense links to which French sense? (left open in the diagram)
      • Sample Italian lemmas: capo, ufficio, gentile, residenza, tessere, pompa, scrivere, tessuto, vestibolo, testo, amministratore, vincere

  50. Analysis of the inherent properties of the SL & TL senses:
      • identity of ontological classification, or subsumption relation between the semantic types of the SL & TL senses
      • identity of semantic class, or subsumption relation between their semantic classes
      • identity of domain, or subsumption relation between their domain info.
      • identity / correspondence of semantic features
      • identity / correspondence of semantic relations
      Analysis of their contextual properties:
      • compatibility of syntactic valency: function and grammatical instantiation of complements
      • compatibility of semantic valency: semantic role and semantic restrictions of arguments
      cf. Villegas et al., LREC 2000, Athens
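Some of the inherent-property checks above can be combined into a simple compatibility score for candidate sense pairs. This is a toy sketch under stated assumptions, not the linking algorithm used for CLIPS: the additive scoring, the small type hierarchy, the sense entries (including a second French sense bureau_2) and all function names are hypothetical.

```python
# Hypothetical fragment of a type hierarchy: child -> supertype.
SUPERTYPE = {
    "Furniture": "Artifact",
    "Artifact": "Concrete_entity",
    "Location": "Concrete_entity",
    "Concrete_entity": "Entity",
}

# Hypothetical source-language (IT) and target-language (FR) senses.
IT_SENSES = {
    "ufficio_1": {"type": "Location", "domain": "general", "features": set()},
}
FR_SENSES = {
    "bureau_1": {"type": "Furniture", "domain": "general", "features": set()},
    "bureau_2": {"type": "Location", "domain": "general", "features": set()},
}


def subsumes(general, specific):
    """True if general is specific itself or one of its supertypes."""
    while specific is not None:
        if specific == general:
            return True
        specific = SUPERTYPE.get(specific)
    return False


def score(sl, tl):
    """Count satisfied criteria: type identity/subsumption, domain, shared features."""
    points = 0
    if subsumes(sl["type"], tl["type"]) or subsumes(tl["type"], sl["type"]):
        points += 1
    if sl["domain"] == tl["domain"]:
        points += 1
    points += len(sl["features"] & tl["features"])
    return points


def best_link(sl_id, candidate_ids):
    """Pick the target sense with the highest score (first one wins on ties)."""
    return max(candidate_ids, key=lambda tl_id: score(IT_SENSES[sl_id], FR_SENSES[tl_id]))


print(best_link("ufficio_1", ["bureau_1", "bureau_2"]))
```

A semi-automatic setting would presumably pass low-scoring or tied pairs to a lexicographer rather than accept the top candidate blindly; the contextual checks (syntactic and semantic valency) are left out of this sketch.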