Division of semantic labor in the Global WordNet Grid



    1. Division of semantic labor in the Global WordNet Grid. Piek Vossen (VU University Amsterdam) and German Rigau (University of the Basque Country). 5th Global Wordnet Conference, Mumbai, India, Jan 30 – Feb 5, 2010

    2. Overview: KYOTO as a domain implementation of the Global Wordnet Grid. Scope of knowledge integration. Division of linguistic labor. How to integrate resources? How to make inferences?

    3. KYOTO – some statistics: European-Asian project, March 2008 – March 2011; 7 countries (The Netherlands, Italy, Germany, Spain, Taiwan, Japan, Czech Republic); 12 sites; universities & research institutes: VUA, CNR-ILC, CNR-IIT, BBAW, EHU, AS, NICT, Masaryk; companies: Synthema, Irion; user organizations: ECNC, WWF; 7 languages (English, Italian, Japanese, Dutch, Spanish, Basque, Chinese).

    4. Overview of the KYOTO process

    5. GWC2010, Mumbai 5 Applying ontology mappings

    6. Global Wordnet Grid

    7. Available repositories in KYOTO. Environment domain term database: 500,000 terms per 1,000 documents per language. Open data projects: DBPedia, with 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films and 20,000 companies, in a knowledge base of 274 million pieces of information (RDF triples); GeoNames, with 8 million geographical names for 6.5 million unique features, including 2.2 million populated places, and 1.8 million alternate names. Domain thesauri and taxonomies: Species 2000, with 2.1 million species. Wordnets for 7 languages: about 50,000 to 120,000 synsets per language. Ontologies: SUMO, DOLCE, SIMPLE.

    8. Here is the architecture of the KYOTO grid, or knowledge base, in which the wordnets are connected to the ontology. The nebula in the center represents the sense axis, which groups together the corresponding synsets of the different wordnets and, via them, points to the ontology. This is the initial situation. At the end, we will have the lexical and ontological bases extended for the environmental domain.

    9. Species in the ontology

    10. Should all knowledge be stored in the central ontology? Vocabularies are too large for full inferencing with current reasoners; they are linguistically too diverse to be represented in an ontology; and the inferencing capabilities of formal ontologies are not needed for all levels of knowledge.

    11. Modeling knowledge in a domain: knowledge needs to be divided over different lexical and ontological layers. This requires us to precisely define the relations between the lexical and ontological layers, and to precisely define the inferencing based on the distributed knowledge layers.

    12. Division of linguistic labor principle (Putnam 1975): speakers do not need to know all the necessary and sufficient properties to determine whether something is "gold". They assume that there is a way to determine these properties and that domain experts know how to recognize instances of these concepts. Speakers can still use the word "gold" and communicate useful information.

    13. Division of semantic labor principle, a digital version of Putnam (1975): the computer does not need to have all the necessary and sufficient properties to determine whether something is a "European tree frog". The computer assumes that there is a way to determine this and that domain experts (people) know how to recognize instances of these concepts. Computers can still reason with semantics and do useful things with textual data.

    14. What does the computer need to know? The distinction between rigid and non-rigid concepts (Welty & Guarino 2002): being a "cat" is essential to an individual's existence and therefore rigid; being a "pet" is a temporary role and therefore non-rigid, since a cat can become a pet and stop being a pet without ceasing to exist. Felix is born a cat and will always be a cat, but during some period Felix can become a pet and later stop being one, while he continues to exist as a cat. All 2.1 million species are rigid concepts.

    15. What does the computer need to know? Roles and processes in documents have more information value than the defining properties of species. Species are defined in terms of physical properties already known to experts, whereas roles such as "invasive species", "migration species" and "threatened species" express THE important properties of instances of species. Roles, not species, are typically the terms we learn from the text!

    16. Wordnet-ontology relations. Rigid synsets (Synset:Endurant(Object); Synset:Perdurant(Event); Synset:Quality) relate to the ontology through sc_equivalenceOf (the = relation in WN-SUMO) or sc_subclassOf (the + relation in WN-SUMO). Non-rigid synsets (Synset:Role; Synset:Endurant(Object); Synset:Perdurant(Event)) relate to the ontology through three relations. sc_domainOf gives the range of ontology types that restricts a role. sc_playRole gives the role that is being played. sc_participantOf gives the process in which the role is played. Rigidity can be detected automatically (Rudify, 80% precision, IAG 80%) and is stored in wordnets as attributes of synsets.
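The two mapping regimes on this slide can be expressed as a simple consistency check. This is a minimal sketch, not KYOTO code. The class `SynsetMapping` and the function `check_mapping` are hypothetical. The sketch also assumes, as in the examples on the following slides, that a non-rigid synset carries all three of sc_domainOf, sc_playRole and sc_participantOf.

```python
from dataclasses import dataclass, field

# Relations used for rigid synsets (an equivalence or subclass link to an ontology type).
RIGID_RELATIONS = {"sc_equivalenceOf", "sc_subclassOf"}
# Relations used together for non-rigid synsets (role-based mappings).
NON_RIGID_RELATIONS = {"sc_domainOf", "sc_playRole", "sc_participantOf"}


@dataclass
class SynsetMapping:
    """Hypothetical container: a synset, its rigidity attribute and its ontology links."""
    synset: str
    rigid: bool
    relations: list[tuple[str, str]] = field(default_factory=list)  # (relation, ontology type)


def check_mapping(m: SynsetMapping) -> list[str]:
    """Return the problems found; an empty list means the mapping fits the scheme."""
    used = {relation for relation, _ in m.relations}
    if m.rigid:
        if used and used <= RIGID_RELATIONS:
            return []
        return ["rigid synsets map with sc_equivalenceOf or sc_subclassOf only"]
    problems = []
    if used - NON_RIGID_RELATIONS:
        problems.append(f"unexpected relations: {sorted(used - NON_RIGID_RELATIONS)}")
    if NON_RIGID_RELATIONS - used:
        # Assumption: every non-rigid synset needs all three role relations.
        problems.append(f"missing relations: {sorted(NON_RIGID_RELATIONS - used)}")
    return problems


teacher = SynsetMapping("teacher", rigid=False, relations=[
    ("sc_domainOf", "human"), ("sc_playRole", "done-by"), ("sc_participantOf", "teach")])
create = SynsetMapping("create", rigid=True, relations=[("sc_equivalenceOf", "construction")])
print(check_mapping(teacher), check_mapping(create))  # [] []
```

The sketch keeps rigidity as an explicit attribute instead of inferring it from the relations. This mirrors the slide's point that rigidity is detected separately (by Rudify) and stored on the synset.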

    17. Global Wordnet Grid Model

    18. Global Wordnet Grid Model

    19. Wordnet to ontology mappings
        {create, produce, make}Verb, English -> sc_equivalenceOf construction
        {artifact, artefact}Noun, English -> sc_domainOf physical_object -> sc_playRole result-existence -> sc_participantOf construction
        {kunststof}Noun, Dutch -> sc_domainOf amount_of_matter -> sc_playRole result-existence -> sc_participantOf construction   // lit. artifact substance

    20. Wordnet to ontology mappings
        {teacher}Noun, English -> sc_domainOf human -> sc_playRole done-by -> sc_participantOf teach
        {leraar}Noun, Dutch -> sc_domainOf man -> sc_playRole done-by -> sc_participantOf teach   // lit. male teacher
        {lerares}Noun, Dutch -> sc_domainOf woman -> sc_playRole done-by -> sc_participantOf teach   // lit. female teacher
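The role-based mappings on slides 19 and 20 become concrete once they are stored and queried. The sketch keeps the six synsets in plain dictionaries. It is illustrative only. The dictionary layout, the language codes `eng`/`nld` and the helper names `role_players` and `related_to` are assumptions, not part of WN-LMF or KYOTO. The key `create` stands for the whole synset {create, produce, make}.

```python
# Mappings from slides 19 and 20, one entry per synset (hypothetical in-memory layout).
MAPPINGS = {
    ("eng", "create"): {"sc_equivalenceOf": "construction"},
    ("eng", "artifact"): {"sc_domainOf": "physical_object",
                          "sc_playRole": "result-existence",
                          "sc_participantOf": "construction"},
    ("nld", "kunststof"): {"sc_domainOf": "amount_of_matter",
                           "sc_playRole": "result-existence",
                           "sc_participantOf": "construction"},
    ("eng", "teacher"): {"sc_domainOf": "human", "sc_playRole": "done-by",
                         "sc_participantOf": "teach"},
    ("nld", "leraar"): {"sc_domainOf": "man", "sc_playRole": "done-by",
                        "sc_participantOf": "teach"},
    ("nld", "lerares"): {"sc_domainOf": "woman", "sc_playRole": "done-by",
                         "sc_participantOf": "teach"},
}


def role_players(role: str, process: str) -> list[tuple[str, str]]:
    """Synsets, in any language, that play `role` in `process`."""
    return sorted(key for key, rels in MAPPINGS.items()
                  if rels.get("sc_playRole") == role
                  and rels.get("sc_participantOf") == process)


def related_to(process: str) -> list[tuple[str, str]]:
    """Synsets linked to `process`, either as equivalents or as role players."""
    return sorted(key for key, rels in MAPPINGS.items()
                  if process in (rels.get("sc_equivalenceOf"), rels.get("sc_participantOf")))


print(role_players("done-by", "teach"))
# [('eng', 'teacher'), ('nld', 'leraar'), ('nld', 'lerares')]
```

The Dutch gendered words are returned together with the English one. Their difference lives only in the sc_domainOf restriction (man or woman instead of human), so the shared role and process still align them across languages.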

    21. Wordnet-LMF
        <LexicalEntry id="footmark">
          <Lemma writtenForm="footmark" partOfSpeech="n"/>
          <Sense id="footmark_1" synset="eng-30-06645039-n">
            <MonolingualExternalRefs>
              <MonolingualExternalRef externalSystem="Wordnet3.0" externalReference=""/>
            </MonolingualExternalRefs>
          </Sense>
        </LexicalEntry>
        <Synset/>
        <SenseAxis/>
        <SenseAxis id="sa_ita16-eng30_001" relType="eq_synonym">
          <Target ID="ita-16-1251-n"/>
          <Target ID="eng-30-13480848-n"/>
        </SenseAxis>
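The sense axis is what ties synsets of different wordnets together. The minimal sketch below uses Python's standard `xml.etree.ElementTree` and builds an index from each synset ID to its `eq_synonym` equivalents in other wordnets. It assumes the SenseAxis elements have been collected under a hypothetical `<SenseAxes>` root element so that they parse as one document.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The SenseAxis from this slide, wrapped in a hypothetical <SenseAxes> root.
LMF = """
<SenseAxes>
  <SenseAxis id="sa_ita16-eng30_001" relType="eq_synonym">
    <Target ID="ita-16-1251-n"/>
    <Target ID="eng-30-13480848-n"/>
  </SenseAxis>
</SenseAxes>
"""


def equivalents(xml_text: str) -> dict[str, set[str]]:
    """Map each synset ID to the synsets it shares an eq_synonym sense axis with."""
    root = ET.fromstring(xml_text.strip())
    links: dict[str, set[str]] = defaultdict(set)
    for axis in root.iter("SenseAxis"):
        if axis.get("relType") != "eq_synonym":
            continue  # other relation types would need their own handling
        targets = [t.get("ID") for t in axis.findall("Target")]
        for t in targets:
            links[t].update(x for x in targets if x != t)
    return dict(links)


print(equivalents(LMF)["ita-16-1251-n"])  # {'eng-30-13480848-n'}
```

Because every target is linked to every other target on the same axis, an axis that groups synsets from several wordnets yields all pairwise equivalences at once.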

    22. WN-LMF Synset relations
        <Synset id="eng-30-06645039-n" baseConcept="0"> <!-- footprint -->
          <Definition gloss="mark of a foot or shoe on a surface">
            <Statement example="the police made casts of the footprints in the soft earth outside the window"/>
          </Definition>
          <OntologicalMetaProperties rigidValue="1">
            <rigid score="0.57" author="Rudify1.0" date="2008-07-01"/>
            <non-rigid score="0.09" author="Rudify1.0" date="2008-07-01"/>
          </OntologicalMetaProperties>
          <SynsetRelations/>
          <MonolingualExternalRefs>
            <MonolingualExternalRef externalSystem="SUMO" reference="superficialPart" relType="at"/>
            <MonolingualExternalRef externalSystem="KYO" reference="mark" relType="sc_subclassOf"/>
          </MonolingualExternalRefs>
        </Synset>

    23. WN-LMF Synset relations
        <Synset id="eng-30-02356039-n" baseConcept="0"> <!-- migration bird -->
          <Definition gloss="birds that migrate in winter to warmer regions"/>
          <OntologicalMetaProperties rigidValue="0">
            <rigid score="0.00" author="Rudify1.0" date="2008-07-01"/>
            <non-rigid score="0.69" author="Rudify1.0" date="2008-07-01"/>
          </OntologicalMetaProperties>
          <SynsetRelations/>
          <MonolingualExternalRefs>
            <Statement>
              <MonolingualExternalRef externalSystem="KYO" reference="bird" relType="sc_domainOf"/>
              <MonolingualExternalRef externalSystem="KYO" reference="done-by" relType="sc_playRole"/>
              <MonolingualExternalRef externalSystem="KYO" reference="migration" relType="sc_participantOf"/>
            </Statement>
          </MonolingualExternalRefs>
        </Synset>
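The rigidity attributes and the KYOTO links in a WN-LMF synset can be read back with the standard library. The sketch below is hypothetical. It uses the "migration bird" synset, written with straight quotes and self-closing `<rigid/>` and `<non-rigid/>` elements so that it is well-formed XML. The check that `rigidValue` agrees with the higher Rudify score is an assumption drawn from the two examples shown, not a documented rule.

```python
import xml.etree.ElementTree as ET

# The "migration bird" synset, as well-formed XML (definition and relations omitted).
SYNSET = """
<Synset id="eng-30-02356039-n" baseConcept="0">
  <OntologicalMetaProperties rigidValue="0">
    <rigid score="0.00" author="Rudify1.0" date="2008-07-01"/>
    <non-rigid score="0.69" author="Rudify1.0" date="2008-07-01"/>
  </OntologicalMetaProperties>
  <MonolingualExternalRefs>
    <Statement>
      <MonolingualExternalRef externalSystem="KYO" reference="bird" relType="sc_domainOf"/>
      <MonolingualExternalRef externalSystem="KYO" reference="done-by" relType="sc_playRole"/>
      <MonolingualExternalRef externalSystem="KYO" reference="migration" relType="sc_participantOf"/>
    </Statement>
  </MonolingualExternalRefs>
</Synset>
"""


def read_synset(xml_text: str) -> dict:
    """Extract rigidity information and KYOTO ontology links from one WN-LMF synset."""
    root = ET.fromstring(xml_text.strip())
    meta = root.find("OntologicalMetaProperties")
    rigid_score = float(meta.find("rigid").get("score"))
    non_rigid_score = float(meta.find("non-rigid").get("score"))
    # Assumes at most one KYO link per relation type, as in this synset.
    links = {ref.get("relType"): ref.get("reference")
             for ref in root.iter("MonolingualExternalRef")
             if ref.get("externalSystem") == "KYO"}
    return {
        "id": root.get("id"),
        "rigid": meta.get("rigidValue") == "1",
        # Assumption: rigidValue agrees with whichever Rudify score is higher.
        "rigid_by_score": rigid_score > non_rigid_score,
        "links": links,
    }


print(read_synset(SYNSET)["links"])
# {'sc_domainOf': 'bird', 'sc_playRole': 'done-by', 'sc_participantOf': 'migration'}
```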

    24. Division of labor in knowledge sources

    25. How to make inferences? SPARQL queries to large Virtuoso databases (aligned Species 2000, DBPedia); SQL queries to the term database; graph matching on wordnets stored in DebVisDic; reasoning on a small ontology.
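One of the mechanisms listed is SQL queries to the term database. The minimal sketch below assumes a hypothetical `term` table in SQLite, since the actual KYOTO term database schema is not given here. A recursive common table expression climbs parent relations until it reaches the first term that has a wordnet mapping. The leaf term "brownfield urban land" is invented for illustration.

```python
import sqlite3

# Hypothetical schema: each term has an optional parent term and an optional
# wordnet synset mapping. Neither the layout nor the data comes from KYOTO.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (
    lemma  TEXT PRIMARY KEY,
    parent TEXT REFERENCES term(lemma),
    synset TEXT
);
INSERT INTO term VALUES
    ('land', NULL, 'land:5'),
    ('urban land', 'land', NULL),
    ('brownfield urban land', 'urban land', NULL);
""")

# Climb parent relations, stopping at the first term that has a wordnet mapping.
FIRST_MAPPING = """
WITH RECURSIVE chain(lemma, parent, synset, depth) AS (
    SELECT lemma, parent, synset, 0 FROM term WHERE lemma = ?
    UNION ALL
    SELECT t.lemma, t.parent, t.synset, c.depth + 1
    FROM term AS t JOIN chain AS c ON t.lemma = c.parent
    WHERE c.synset IS NULL
)
SELECT lemma, synset FROM chain WHERE synset IS NOT NULL ORDER BY depth LIMIT 1
"""

print(conn.execute(FIRST_MAPPING, ("brownfield urban land",)).fetchone())
# ('land', 'land:5')
```

The `WHERE c.synset IS NULL` condition stops the recursion as soon as a mapped ancestor is reached. The query therefore never walks further up the hierarchy than it needs to.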

    26. KYOTO Project meeting, Jan 13-14th 2010, PolyU Hong Kong 26 Ontotagger applied to KAF
        Apply WSD to every term in the KAF representation of a text.
        For each term in the KAF representation of a text:
          a.) If WSD assigns a wordnet synset, check it for ontology mappings; if there are none, traverse the wordnet hierarchy to find the first mapping.
          b.) Else check the SKOS database for a wordnet mapping; if necessary, traverse broader relations up to the first wordnet mapping and go to a.).
          c.) Else check the term database for a wordnet mapping; if necessary, traverse parent relations up to the first wordnet mapping and go to a.).
        Collect all mappings from the ontology and all (relevant) ontological implications, and insert them into the KAF representation of the text.
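The three-way fallback of the Ontotagger can be sketched as follows. This is a hypothetical illustration, not the Ontotagger itself. In-memory dictionaries stand in for the wordnet, the SKOS database and the term database. The hypernym links, ontology labels and example data are invented for the example. Only the lookup cascade is shown; the final insertion of mappings and implications into KAF is left out.

```python
from typing import Optional

# Hypothetical lookup tables; links and ontology labels are invented for illustration.
WN_HYPERNYM = {"grassland:1": "biome:1", "barking frog:1": "amphibian:3"}
WN_TO_ONTOLOGY = {"biome:1": ["sc_subclassOf biome"],
                  "amphibian:3": ["sc_subclassOf amphibian"]}
SKOS_BROADER = {"Eleutherodactylus augusti latrans": "Eleutherodactylus augusti"}
SKOS_TO_WN = {"Eleutherodactylus augusti": "barking frog:1"}
TERM_PARENT = {"wet grassland": "grassland"}
TERM_TO_WN = {"grassland": "grassland:1"}


def climb(start: str, parent: dict, mapped: dict):
    """Follow parent links from start; return the first mapping found, else None."""
    node: Optional[str] = start
    while node is not None:
        if node in mapped:
            return mapped[node]
        node = parent.get(node)
    return None


def ontology_mappings(term: str, synset: Optional[str]) -> list[str]:
    """Cascade of steps a.), b.) and c.) for one KAF term."""
    if synset is not None:
        # a.) Synset from WSD: its own mappings, else those of the first mapped hypernym.
        return climb(synset, WN_HYPERNYM, WN_TO_ONTOLOGY) or []
    # b.) SKOS vocabulary: broader relations up to the first wordnet mapping.
    synset = climb(term, SKOS_BROADER, SKOS_TO_WN)
    if synset is None:
        # c.) Term database: parent relations up to the first wordnet mapping.
        synset = climb(term, TERM_PARENT, TERM_TO_WN)
    # Go to a.) with the synset found, if any.
    return ontology_mappings(term, synset) if synset is not None else []


print(ontology_mappings("Eleutherodactylus augusti latrans", None))
# ['sc_subclassOf amphibian']
```

A single `climb` helper serves all three resources. Each step differs only in which parent relation (hypernym, broader or parent) is followed and which table counts as a hit.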

    27. Examples: "Migration birds in the Humber Estuary"; "The migration of birds to the Humber Estuary"; "Bird migration in the Humber Estuary"; "Birds that migrate to the Humber Estuary".

    28. Annotation of ontological implications in KAF
        <!-- Migration birds in the Humber Estuary -->
        <term lemma="migration bird" pos="N.pl">
          <externalReference>
            <!-- Tagging terms with ontological implications based on wordnet mappings -->
            <externalRef resource="ontology" relation="sc_domainOf" reference="bird"/>
            <externalRef resource="ontology" relation="sc_participantOf" reference="migration"/>
            <externalRef resource="ontology" relation="sc_playRole" reference="done-by"/>
            <externalRef resource="ontology" relation="implied" reference="done-by" some="physical-plurality"/>
            <externalRef resource="ontology" relation="implied" reference="has-destination" some="particular"/>
            <externalRef resource="ontology" relation="implied" reference="has-source" some="particular"/>
            <externalRef resource="ontology" relation="implied" reference="has-path" some="particular"/>
          </externalReference>
        </term>
        <term lemma="in" pos="P"/>
        <term lemma="Humber Estuary">
          <externalReference>
            <externalRef resource="ontology" reference="location"/>
          </externalReference>
        </term>

    29. Annotation of ontological implications in KAF
        <!-- Bird migration in the Humber Estuary -->
        <term lemma="bird" pos="N.pl">
          <externalReference>
            <!-- Tagging terms with ontological implications based on wordnet mappings -->
            <externalRef resource="ontology" relation="sc_equivalentOf" reference="bird"/>
          </externalReference>
        </term>
        <term lemma="migration" pos="N">
          <externalReference>
            <!-- Tagging terms with ontological implications based on wordnet mappings -->
            <externalRef resource="ontology" relation="sc_equivalentOf" reference="migration"/>
            <externalRef resource="ontology" relation="implied" reference="done-by" some="physical-plurality"/>
            <externalRef resource="ontology" relation="implied" reference="has-destination" some="particular"/>
            <externalRef resource="ontology" relation="implied" reference="has-source" some="particular"/>
            <externalRef resource="ontology" relation="implied" reference="has-path" some="particular"/>
          </externalReference>
        </term>
        <term lemma="in"/>
        <term lemma="Humber Estuary">
          <externalReference>
            <externalRef resource="ontology" reference="location"/>
          </externalReference>
        </term>

    30. Annotation of ontological implications in KAF
        <!-- Birds that migrate to the Humber Estuary -->
        <term lemma="bird" pos="N.pl">
          <externalReference>
            <!-- Tagging terms with ontological implications based on wordnet mappings -->
            <externalRef resource="ontology" relation="sc_equivalentOf" reference="bird"/>
          </externalReference>
        </term>
        <term lemma="migrate" pos="V">
          <externalReference>
            <!-- Tagging terms with ontological implications based on wordnet mappings -->
            <externalRef resource="ontology" relation="sc_equivalentOf" reference="migration"/>
            <externalRef resource="ontology" relation="implied" reference="done-by" some="physical-plurality"/>
            <externalRef resource="ontology" relation="implied" reference="has-destination" some="particular"/>
            <externalRef resource="ontology" relation="implied" reference="has-source" some="particular"/>
            <externalRef resource="ontology" relation="implied" reference="has-path" some="particular"/>
          </externalReference>
        </term>
        <term lemma="to"/>
        <term lemma="Humber Estuary">
          <externalReference>
            <externalRef resource="ontology" reference="location"/>
          </externalReference>
        </term>
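The three annotations share one pattern: each term gets its wordnet-based mappings, plus the implications of every ontology type it is mapped to. The sketch below builds such `<term>` elements with `xml.etree.ElementTree`. The `IMPLICATIONS` table and the function `kaf_term` are hypothetical. The four implications of "migration" are copied from the slides, but in KYOTO they would come from the ontology rather than a hand-written table.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of ontological implications per ontology type.
IMPLICATIONS = {
    "migration": [("done-by", "physical-plurality"), ("has-destination", "particular"),
                  ("has-source", "particular"), ("has-path", "particular")],
}


def kaf_term(lemma: str, pos: str, mappings: list[tuple[str, str]]) -> ET.Element:
    """Build a KAF <term> with its wordnet-based mappings plus the implied relations."""
    term = ET.Element("term", lemma=lemma, pos=pos)
    refs = ET.SubElement(term, "externalReference")
    for relation, reference in mappings:
        ET.SubElement(refs, "externalRef", resource="ontology",
                      relation=relation, reference=reference)
    # Add the implications of every ontology type the term is mapped to.
    for _, reference in mappings:
        for role, some in IMPLICATIONS.get(reference, []):
            ET.SubElement(refs, "externalRef", resource="ontology",
                          relation="implied", reference=role, some=some)
    return term


# "migration bird": a non-rigid synset mapped through domain, participant and role.
birds = kaf_term("migration bird", "N.pl", [("sc_domainOf", "bird"),
                                            ("sc_participantOf", "migration"),
                                            ("sc_playRole", "done-by")])
# "migrate": a verb whose synset is equivalent to the ontology type.
migrate = kaf_term("migrate", "V", [("sc_equivalentOf", "migration")])
print(ET.tostring(birds, encoding="unicode"))
```

The implications are keyed on the ontology type, not on the word. The compound noun, the noun and the verb therefore all receive the same implied relations, which is what makes the three paraphrases comparable downstream.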

    31. Kybot profiles
        IF <! bird migration to HE> T1 + to + T2 & T1.impliedType="change_of_location" & T1.impliedRole="has-target" & T2.Type="location"
        THEN <location-target, T1, T2>
        IF <! species migration from HE> T1 + from + T2 & T1.impliedType="change_of_location" & T1.impliedRole="has-source" & T2.Type="location"
        THEN <location-source, T1, T2>
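The two profiles can be read as a pattern over windows of three consecutive terms. The sketch below is hypothetical. The `Term` class stands in for an ontotagged KAF term. The sketch assumes that "migrate" implies the type change_of_location with the roles has-target and has-source, and role names follow the profiles as written. It is not the Kybot engine.

```python
from dataclasses import dataclass


@dataclass
class Term:
    """Hypothetical view of a KAF term after ontotagging."""
    lemma: str
    implied_type: str = ""              # e.g. "change_of_location"
    implied_roles: tuple[str, ...] = ()
    ontology_type: str = ""             # e.g. "location"


# The two profiles: (connecting word, role T1 must imply, output relation).
PROFILES = [("to", "has-target", "location-target"),
            ("from", "has-source", "location-source")]


def apply_profiles(terms: list[Term]) -> list[tuple[str, str, str]]:
    """Match the pattern T1 + word + T2 over every window of three consecutive terms."""
    facts = []
    for t1, word, t2 in zip(terms, terms[1:], terms[2:]):
        for connector, role, relation in PROFILES:
            if (word.lemma == connector
                    and t1.implied_type == "change_of_location"
                    and role in t1.implied_roles
                    and t2.ontology_type == "location"):
                facts.append((relation, t1.lemma, t2.lemma))
    return facts


migrate = Term("migrate", "change_of_location", ("has-target", "has-source", "has-path"))
sentence = [Term("birds"), migrate, Term("to"), Term("Humber Estuary", ontology_type="location")]
print(apply_profiles(sentence))  # [('location-target', 'migrate', 'Humber Estuary')]
```

Testing `role in t1.implied_roles` instead of equality lets one term satisfy several profiles. Ontotagging attaches all implied roles of a process to the term at once.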

    32. Kybot Knowledge Patterns
        <events>
          <event eid="e1" target="t2" lemma="feed" pos="V" tense="PAST" aspect="NONE" polarity="POS"/>
          <event eid="e2" target="t20" lemma="migrate" pos="V" tense="PRESENT" aspect="NONE" polarity="POS"/>
          <role rid="r1" event="e1" target="t1" rtype="agent"/>
          <role rid="r2" event="e1" target="t3" rtype="patient"/>
          <role rid="r3" event="e1" target="t9" rtype="theme"/>
          <role rid="r4" event="e2" target="t21" rtype="agent"/>
          <role rid="r5" event="e2" target="t22" rtype="source"/>
          <role rid="r6" event="e2" target="t24" rtype="goal"/>
        </events>
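The event layer is easiest to use once roles are grouped under their events. The hypothetical sketch below does that grouping and also flags role ids that occur more than once, since duplicate ids would make role references ambiguous. In the copy used here the role ids are numbered r1 to r6 so that each is unique. The function names are illustrative, not part of KAF or Kybot.

```python
import xml.etree.ElementTree as ET
from collections import Counter

EVENTS = """<events>
  <event eid="e1" target="t2" lemma="feed" pos="V" tense="PAST" aspect="NONE" polarity="POS"/>
  <event eid="e2" target="t20" lemma="migrate" pos="V" tense="PRESENT" aspect="NONE" polarity="POS"/>
  <role rid="r1" event="e1" target="t1" rtype="agent"/>
  <role rid="r2" event="e1" target="t3" rtype="patient"/>
  <role rid="r3" event="e1" target="t9" rtype="theme"/>
  <role rid="r4" event="e2" target="t21" rtype="agent"/>
  <role rid="r5" event="e2" target="t22" rtype="source"/>
  <role rid="r6" event="e2" target="t24" rtype="goal"/>
</events>"""


def duplicate_ids(root: ET.Element) -> list[str]:
    """Role ids that occur more than once."""
    counts = Counter(role.get("rid") for role in root.iter("role"))
    return [rid for rid, n in counts.items() if n > 1]


def frames(root: ET.Element) -> dict[str, tuple[str, dict[str, str]]]:
    """Group roles by event: event id -> (lemma, {role type: target term id})."""
    result = {e.get("eid"): (e.get("lemma"), {}) for e in root.iter("event")}
    for role in root.iter("role"):
        result[role.get("event")][1][role.get("rtype")] = role.get("target")
    return result


root = ET.fromstring(EVENTS)
print(duplicate_ids(root))  # []
print(frames(root)["e2"])   # ('migrate', {'agent': 't21', 'source': 't22', 'goal': 't24'})
```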

    33. Conclusion: Should all knowledge be stored in the central ontology? Vocabularies are too large for full inferencing; they are linguistically too diverse to be represented in an ontology; and the inferencing capabilities of formal ontologies are not needed for all levels of knowledge. A model of division of labor (along the lines of Putnam 1975) stores knowledge in three layers: (1) SKOS vocabularies and term databases, (2) wordnets (WN-LMF), and (3) an ontology (OWL-DL). Each layer supports different types of inferencing, ranging from SPARQL queries and graph algorithms to reasoning. Mapping relations support the division of labor and the different types of inferencing, and allow for the encoding of language-specific lexicalizations and restrictions.

    34. Conclusions. Ontologies are abstract and minimal, whereas lexicons are large and rich. Semantic relations in lexicons are complementary to ontological relations. Semantic relations expressed in a language system should be compatible with ontologies. Large vocabularies of types (rigid things in the world) can be mapped to the ontology through combinations of lexical relations and basic ontological mappings. Lexicalizations of contextual and subjective concepts need to be expressed through more complex relations. Equivalences across languages are captured partially through ontological expressions and partially across lexicons.

    35. Applying WSD to terms

    36. How to integrate the data? Species 2000 vocabulary: 2,171,281 concepts in a MySQL database with parent relations: Kingdom -> Phylum -> Class -> Order -> Family -> Genus -> Species -> Infra species; for example, Animalia -> Chordata -> Amphibia -> Anura -> Leptodactylidae -> Eleutherodactylus -> Eleutherodactylus augusti. The vocabulary was converted to SKOS format, aligned with DBPedia for language labels, aligned with Wordnet using vocabulary and relation mappings, and published in Virtuoso, where it is accessed with SPARQL queries.
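The SKOS conversion can be sketched as a direct mapping from (name, parent) rows to triples. It uses the standard SKOS vocabulary (`skos:Concept`, `skos:prefLabel`, `skos:broader`) and writes N-Triples, a format that triple stores such as Virtuoso can load. The base namespace and the row format are assumptions for illustration, not the URIs or schema used in KYOTO.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical namespace for the converted concepts; not the URIs used in KYOTO.
BASE = "http://example.org/species2000/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(name: str) -> str:
    """Concept URI for a scientific name (spaces are percent-encoded)."""
    return f"<{BASE}{quote(name)}>"


def to_skos(rows: list[tuple[str, Optional[str]]]) -> list[str]:
    """Turn (name, parent name) rows into N-Triples: one skos:Concept per row,
    its name as skos:prefLabel, and the parent as skos:broader."""
    triples = []
    for name, parent in rows:
        triples.append(f"{uri(name)} <{RDF_TYPE}> <{SKOS}Concept> .")
        triples.append(f'{uri(name)} <{SKOS}prefLabel> "{name}" .')  # assumes no quotes in names
        if parent is not None:
            triples.append(f"{uri(name)} <{SKOS}broader> {uri(parent)} .")
    return triples


CHAIN = ["Animalia", "Chordata", "Amphibia", "Anura", "Leptodactylidae",
         "Eleutherodactylus", "Eleutherodactylus augusti"]
rows = list(zip(CHAIN, [None] + CHAIN[:-1]))  # (child, parent) pairs along the chain
for line in to_skos(rows)[-3:]:
    print(line)
```

Language labels from DBPedia could later be added as extra `skos:altLabel` or language-tagged `skos:prefLabel` triples on the same concept URIs.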

    37. How to integrate data? Extending language labels using DBPedia

    38. How to integrate data? Alignment of Species 2000 with wordnet. Names are matched to Wordnet synsets; if a name is polysemous, its senses are weighted with SSI-Dijkstra based on the hyperonym chain. Example (results still to be evaluated): Animalia (animal:1) -> Chordata (chordate:1) -> Amphibia (amphibian:3) -> Anura -> Leptodactylidae -> Eleutherodactylus -> Eleutherodactylus augusti (barking frog:1)
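The slide only names SSI-Dijkstra, so the sketch below is a simplified illustration of the general idea, not the KYOTO implementation. Monosemous names seed an interpretation set. Each polysemous name then receives the sense with the smallest summed shortest-path distance (Dijkstra) to that set. The toy graph is invented and does not reflect WordNet's actual structure. Sense numbers other than those on the slide (animal:1, chordate:1, amphibian:3) are placeholders.

```python
import heapq
from collections import defaultdict

INF = float("inf")


def build_graph(edges: list[tuple[str, str]]) -> dict[str, list[tuple[str, int]]]:
    """Undirected graph with unit weights between synsets."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append((b, 1))
        graph[b].append((a, 1))
    return graph


def dijkstra(graph, source: str) -> dict[str, float]:
    """Shortest-path distances from source to every reachable synset."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, w in graph.get(node, []):
            if d + w < dist.get(nxt, INF):
                dist[nxt] = d + w
                heapq.heappush(heap, (d + w, nxt))
    return dist


def disambiguate(graph, candidates: dict[str, list[str]]) -> dict[str, str]:
    """Monosemous words seed the interpretation; each polysemous word then gets
    the sense with the smallest summed distance to the senses chosen so far."""
    chosen = {w: senses[0] for w, senses in candidates.items() if len(senses) == 1}
    for word, senses in candidates.items():
        if word not in chosen:
            context = list(chosen.values())
            chosen[word] = min(senses, key=lambda s: sum(
                dijkstra(graph, s).get(c, INF) for c in context))
    return chosen


# Toy graph, invented for illustration only.
EDGES = [("animal:1", "chordate:1"), ("chordate:1", "vertebrate:1"),
         ("vertebrate:1", "amphibian:3"), ("animal:1", "organism:1"),
         ("organism:1", "entity:1"), ("entity:1", "artifact:1"),
         ("artifact:1", "vehicle:1"), ("vehicle:1", "amphibian:1"),
         ("vehicle:1", "aircraft:1"), ("aircraft:1", "amphibian:2")]
candidates = {"Animalia": ["animal:1"], "Chordata": ["chordate:1"],
              "Amphibia": ["amphibian:1", "amphibian:2", "amphibian:3"]}
print(disambiguate(build_graph(EDGES), candidates)["Amphibia"])  # amphibian:3
```

Using the already-aligned ancestors in the hyperonym chain as context is what lets the animal sense of "Amphibia" win over the vehicle and aircraft senses.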

    39. How to integrate data? Alignment of terms with wordnet. Word-sense disambiguation is applied to terms in KAF (Kyoto Annotation Format), and a term hierarchy is extracted from KAF:
        land:5
          grassland:1 -> biome:1
          woodland:1 -> biome:1
          cropland
          urban land
        Results are still to be evaluated (SemEval2010).
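The slide does not say how the term hierarchy is extracted. One simple possibility, shown here as a hypothetical sketch, is head matching: each term is attached to the longest other term it ends with. This handles both multiword terms ("urban land") and closed compounds ("grassland"). The function name and the matching rule are assumptions.

```python
def build_hierarchy(terms: list[str]) -> dict[str, list[str]]:
    """Attach each term to the longest other term it ends with (its assumed head)."""
    children: dict[str, list[str]] = {t: [] for t in terms}
    for term in terms:
        heads = [h for h in terms if h != term and term.endswith(h)]
        if heads:
            children[max(heads, key=len)].append(term)
    return children


TERMS = ["land", "grassland", "woodland", "cropland", "urban land"]
print(build_hierarchy(TERMS)["land"])  # ['grassland', 'woodland', 'cropland', 'urban land']
```

A plain suffix test is naive: it would also place "island" under "land". A real extractor would need morphological or syntactic head analysis. The wordnet senses (land:5, grassland:1) would then be attached to the nodes by WSD.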
