Asian Language Resources Summit, Phuket, March, 2009

KYOTO (ICT-211423)
Knowledge Yielding Ontologies for Transition-Based Organization
FP7: Intelligent Content and Semantics
http://www.kyoto-project.eu/

Piek Vossen, VU University Amsterdam

Asian Language Resources Summit, Phuket, March, 2009

Overview
  • Background information
  • Baseline for retrieval in environment domain
  • System architecture
  • Knowledge mining
  • Conclusions

KYOTO (ICT-211423) Overview
  • Title: Knowledge Yielding Ontologies for Transition-Based Organization
  • Funding:
    • 7th Framework Programme (ICT) of the European Union: Intelligent Content and Semantics
    • Partners in Taiwan and Japan are funded by national grants
  • Goal:
    • Open and free platform for knowledge sharing across languages and cultures
    • Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills
    • Bootstrap through open text mining & concept learning
    • Enables knowledge transition and information search across different target groups, crossing linguistic, cultural and geographic boundaries
    • Enables deep semantic search for facts and knowledge
  • URL: http://www.kyoto-project.eu/
  • Duration:
    • March 2008 – March 2011
  • Effort:
    • 364 person-months of work

Consortium
  • Vrije Universiteit Amsterdam (Amsterdam, The Netherlands),
  • Consiglio Nazionale delle Ricerche (Pisa, Italy),
  • Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany),
  • Euskal Herriko Unibertsitatea (San Sebastian, Spain),
  • Academia Sinica (Taipei, Taiwan),
  • National Institute of Information and Communications Technology (Kyoto, Japan),
  • Irion Technologies (Delft, The Netherlands),
  • Synthema (Rome, Italy),
  • European Centre for Nature Conservation (Tilburg, The Netherlands),
  • Subcontractors:
    • World Wide Fund for Nature (Zeist, The Netherlands),
    • Masaryk University (Brno, Czech Republic)

KYOTO (ICT-211423) Overview
  • Languages:
    • English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
  • Domain:
    • Environmental domain, but usable in any domain
  • Global:
    • Both European and non-European languages
  • Available:
    • Free: as open source system and data (GPL)
  • Future perspective:
    • Content standardization that supports worldwide communication

Baseline for environment domain
  • Current practice: mainly Google, first 10 hits, no advanced options
  • Textual search with linguistic enhancements but no real semantic search:
    • polluted water….
    • polluting water….
  • Growing time & information pressure:
    • deliver up-to-date information from diverse & dynamic sources
    • regional, local situations ► no general source
    • various subdomains ► government, legal, biology, health, industry
    • difficult access ► scientific publications
    • no time to read ► too much information and work pressure
    • dependent on trust: scientists ► environmentalists ► government ► general public

High-level targets & low-level questions
  • High-level targets (about 300 questions collected), for example:
    • Are there huge negative effects with regard to ecological networks and alien invasive species?
  • Low-level facts that support answering the high-level targets:
    • cases of alien invasion
    • number of species
    • causal relations associated with these (increments of) invasions
    • causes related to ecological networks
    • limited to the same time and location boundaries

Baseline retrieval results: 6 persons, 30 high-level questions

KYOTO's Solution
  • Text mining:
    • Massive and accurate indexing of facts from vast amounts of text;
    • In any language/culture from scattered sources;
    • Again and again to detect trends and changes;
    • Direct relation between knowledge modeling effort and text mining
  • Knowledge modeling:
    • automatic learning of terms and concepts from text in any language;
    • formalization of knowledge in a computer-usable format -> wordnets & ontologies
  • Community software:
    • For experts in the field, not knowledge engineers
    • Continuous and collaborative effort:
      • adapt to the changing domain;
      • consensus in the field;
      • consensus across languages and cultures
    • Produce interoperable, formal, standardized knowledge structures;
    • Relate knowledge structure to expressions in languages

Distributed, diverse & dynamic data

[Diagram: KYOTO workflow with numbered steps 1 to 6]
  • Users: citizens, governments, companies, environmental organizations; they maintain terms & concepts in Wikyoto
  • Capture text: "Sudden increase of CO2 emissions in 2008 in Europe"
  • Tybot (term yielding robot): extracts terms, e.g. "CO2 emission"
  • Wordnets and ontology with Top, Middle and Domain layers (Abstract, Physical, Process, Substance, H2O, CO2, Pollution, Emission, Greenhouse Gas)
  • Kybot (knowledge yielding robot): indexes facts
    • Process: Emission
    • Involves: CO2
    • Property: increase, sudden
    • When: 2008
    • Where: Europe
  • Text & Fact Index, used for semantic search
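
The indexed fact on this slide (Process, Involves, Property, When, Where) can be pictured as a small record. The following is a minimal sketch only, not the project's fact format: the class name `Fact`, its field names and the helper `matches` are assumptions that simply mirror the slide labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical record mirroring the slide's 'Index facts' labels."""
    process: str            # ontology process, e.g. "Emission"
    involves: str           # participant, e.g. "CO2"
    properties: tuple = ()  # qualifiers such as "increase", "sudden"
    when: str = ""
    where: str = ""


def matches(fact, **constraints):
    """Return True if every named field equals the requested value."""
    return all(getattr(fact, name) == value for name, value in constraints.items())


# The example text from the slide:
# "Sudden increase of CO2 emissions in 2008 in Europe"
fact = Fact(process="Emission", involves="CO2",
            properties=("increase", "sudden"), when="2008", where="Europe")

# A toy fact index and a field-based query against it.
index = [fact]
hits = [f for f in index if matches(f, process="Emission", where="Europe")]
print(hits)
```

Searching on fields such as `process` and `where`, rather than on the words of the sentence, is one way to read the contrast with plain keyword search drawn earlier in the talk.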

Data Flow Diagram of Kyoto System

[Diagram: components, with numbered steps 1 to 3]
  • Original Document Base
  • Linguistic Processor, producing the Semantic & Syntactic Base in Kyoto Annotation Format (KAF)
  • Tybot: Term Extractor, Term Base
  • Kybot: Fact Extractor, Fact Base
  • Wikyoto: Wiki Term Editor
  • Multilingual Knowledge Base: wordnets and ontologies, interlinked
  • Keyword Search and Semantic Search
  • Users: Concept User, Fact User, End User

Kyoto Annotation Format (KAF)

Identifier shown in the slide's example: ENG-3.0-107695012-N

  • Kyoto Annotation Format (Level 1): a multi-layered annotation format for:
    • Tokenization and word form segmentation
    • POS tagging
    • Lemmatization and term extraction
    • Constituency tagging
    • Dependency tagging
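
To make the idea of a multi-layered annotation concrete, here is a hedged sketch of layers that refer to one another by identifier. It is not the KAF schema: the dictionary keys, identifiers and relation labels below are all illustrative assumptions, and the sentence is invented for the example.

```python
# Hypothetical in-memory picture of layered, stand-off annotation.
# The keys and ids are illustrative; they are not the KAF element names.
sentence = "CO2 emissions increased"

layers = {
    # Tokens with character offsets into the raw text.
    "tokens": [
        {"id": "w1", "text": "CO2", "offset": 0},
        {"id": "w2", "text": "emissions", "offset": 4},
        {"id": "w3", "text": "increased", "offset": 14},
    ],
    # Terms point back to token ids and add POS and lemma.
    "terms": [
        {"id": "t1", "tokens": ["w1"], "pos": "N", "lemma": "CO2"},
        {"id": "t2", "tokens": ["w2"], "pos": "N", "lemma": "emission"},
        {"id": "t3", "tokens": ["w3"], "pos": "V", "lemma": "increase"},
    ],
    # Constituents (chunks) point back to term ids.
    "chunks": [
        {"id": "c1", "label": "NP", "terms": ["t1", "t2"]},
        {"id": "c2", "label": "VP", "terms": ["t3"]},
    ],
    # Dependencies relate a head term to a dependent term.
    "deps": [
        {"head": "t2", "dep": "t1", "rel": "mod"},
        {"head": "t3", "dep": "t2", "rel": "subj"},
    ],
}


def chunk_text(layers, chunk_id):
    """Resolve a chunk to its surface text by following the layer references."""
    tokens = {tok["id"]: tok for tok in layers["tokens"]}
    terms = {term["id"]: term for term in layers["terms"]}
    chunk = next(c for c in layers["chunks"] if c["id"] == chunk_id)
    words = [tokens[w]["text"] for t in chunk["terms"] for w in terms[t]["tokens"]]
    return " ".join(words)


print(chunk_text(layers, "c1"))
```

Because each layer only references the layer below it, a new layer can be added without rewriting the existing ones.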

Semantic Annotation

  • Semantic annotation format for:
    • Named Entity Recognition (time, events, quant. …)
    • Word Sense Disambiguation (D-WSD)
    • Semantic Role Labeling (SRL)
  • KAF level 2 (SemKAF)
  • Labels shown in the slide's example: "no synsets", ENG-3.0-107630294-N

KAF annotation: WSD

Data formats

Level of annotation and its standard format:

  • Morpho-syntax annotation and semantic annotation: KAF <= (MAF, SYNAF, SEMAF)
  • Terms representation: TMF
  • Facts annotation: KAF
  • Wordnets: Wordnet-LMF
  • Ontologies: OWL

Knowledge mining
  • Concept mining (Tybots):
    • Extract terms and relations in a language
    • Map the terms to an existing wordnet
    • Ontologize terms to concepts and axioms
  • Fact mining (Kybots)
    • Define logical patterns
    • Define expression rules in a language

What Tybots do...
  • Input: text documents
  • Linguistic processors generate KAF annotation (sequential):
    • morpho-syntactic analysis
    • semantic roles
    • named entities
    • wordnet and ontology mappings
  • Output: term hierarchies in TMF (generic):
    • structural parent relations
    • quantified structural and semantic relations
    • statistical data
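
As an illustration of "structural parent relations", the sketch below links each multiword term to a shorter term that forms its tail. This is a simplified stand-in, not the Tybot algorithm: the function name, the head-final assumption for English compounds and the frequency counts are all invented for the example.

```python
from collections import Counter


def structural_parents(term_counts):
    """Map each multiword term to its longest known suffix term.

    Assumes head-final compounds, i.e. the head noun is the last word.
    """
    parents = {}
    for term in term_counts:
        words = term.split()
        # Try progressively shorter suffixes: "a b c" -> "b c" -> "c".
        for start in range(1, len(words)):
            candidate = " ".join(words[start:])
            if candidate in term_counts:
                parents[term] = candidate
                break
    return parents


# Made-up term frequencies standing in for the "statistical data".
term_counts = Counter({
    "gas": 12,
    "greenhouse gas": 7,
    "emission": 9,
    "CO2 emission": 4,
    "area": 6,
    "agricultural area": 2,
})

parents = structural_parents(term_counts)
for term, parent in sorted(parents.items()):
    print(f"{term} -> {parent} (term frequency {term_counts[term]})")
```

Single-word terms such as "gas" get no structural parent here; in the talk's pipeline those would be candidates for mapping to an existing wordnet instead.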

[Diagram: from source documents to conceptual modeling]
  • Source documents are analyzed by linguistic processors (morpho-syntactic analysis):
    • [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP
  • TYBOT concept miners produce a term hierarchy: area, agricultural area, emission, gas, greenhouse gas, CO2
  • Synthesize: English Wordnet (location:3, region:3, geographical area:1, area:1, rural area:1, farmland:2, substance:1, gas:1, greenhouse gas:1, naturalprocess:1, emission:3, emission:2)
  • Ontologize: Ontology (Abstract, Physical, Process, Substance, Chemical Reaction, H2O, CO2, GreenhouseGas, GlobalWarming, CO2Emission, WaterPollution)
  • Axiomatize: (instance s1 Substance) (instance e1 Warming) (katalyist s1 e1)

What Kybots do
  • Input:
    • KAF annotations of text: sequential & encoded by language
    • Conceptual frame from the ontology
    • Expression rules for frame to language mapping:
      • Wordnet in a language
      • Morpho-syntactic mapping rules
  • Output: a database of facts in FactAF (generic):
    • aggregated facts
    • inferred facts
    • language neutral
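
A rough sketch of what "aggregated facts" could look like: mentions of the same fact in different documents are merged into one entry that remembers its sources. The field names, the grouping key and the sample mentions are assumptions made for illustration, not the FactAF structure.

```python
from collections import defaultdict


def aggregate(mentions):
    """Group fact mentions by (process, patient, location) and collect their sources."""
    grouped = defaultdict(set)
    for mention in mentions:
        key = (mention["process"], mention["patient"], mention["location"])
        grouped[key].add(mention["source"])
    return {key: sorted(sources) for key, sources in grouped.items()}


# Invented mentions, as they might come out of per-document fact extraction.
mentions = [
    {"process": "Emission", "patient": "CO2", "location": "Europe", "source": "doc1"},
    {"process": "Emission", "patient": "CO2", "location": "Europe", "source": "doc2"},
    {"process": "Pollution", "patient": "water", "location": "Europe", "source": "doc2"},
]

facts = aggregate(mentions)
for key, sources in facts.items():
    print(key, "reported in", sources)
```

Keeping the list of sources per fact is also a natural place to attach the trust and reliability information mentioned elsewhere in the talk.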

Fact mining
  • KYBOT = Knowledge Yielding Robot
  • Logical expression
    • (instance, e1, Burn) (instance, e2, Warming) (cause, e1, e2)
    • (instance, s1, CO2) (instance, e1, GlobalWarming) (katalyist, s1,e1)
  • Expression rules per language:
    • [N[s1]V[e1]]S e.g. "CO2 is emitted", "fine dust blocks sun-light"
    • [N[s1]N[e1]]N e.g. "CO2 emission", "sun-light blocking"
    • [[N[e1]][prep][N[s2]]]NP e.g. "emission of CO2", "sun-light blocking by fine dust"
  • Ontology * Wordnets
    • Capabilities: WNT -> adjectives ("explosive", "toxic"), WNT -> nouns ("explosive", "poison")
    • Causes: WNT -> verbs ("eat") , WNT -> nouns ("consumption")
    • Process: DamageProcess, ProduceProcess
  • Kybot compiler
    • kybots = logical pattern + ontology + WN[Lx] + ER[Lx]
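
The combination "logical pattern + ontology + wordnet + expression rule" can be sketched for the noun-compound rule, using the slide's own example "CO2 emission". Everything in the code is a toy assumption: the lexicon stands in for wordnet-to-ontology mappings, and the relation name `involves` is borrowed from the earlier "Index facts" example rather than taken from an actual Kybot profile.

```python
# Toy lexicon standing in for wordnet-to-ontology mappings (illustrative only):
# word -> (part of speech, ontology concept).
LEXICON = {
    "CO2": ("N", "CO2"),
    "emission": ("N", "Emission"),
    "emitted": ("V", "Emission"),
}

# Ontology concepts that count as processes in this toy example.
PROCESSES = {"Emission"}


def match_noun_compound(tokens):
    """Apply a rule shaped like [N[s1] N[e1]]N: substance noun followed by process noun."""
    if len(tokens) != 2:
        return None
    (pos1, concept1), (pos2, concept2) = (LEXICON.get(t, (None, None)) for t in tokens)
    if pos1 == "N" and pos2 == "N" and concept2 in PROCESSES and concept1 not in PROCESSES:
        # Logical expression in the style used on the slide.
        return [
            ("instance", "s1", concept1),
            ("instance", "e1", concept2),
            ("involves", "e1", "s1"),
        ]
    return None


print(match_noun_compound(["CO2", "emission"]))
print(match_noun_compound(["emission", "CO2"]))
```

Only the lexicon and the surface rule are language-specific in this sketch; the tuples it returns are the same whichever language the phrase came from, which is the point of separating expression rules from logical patterns.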

Fact mining by Kybots

[Diagram]
  • Source documents are analyzed by linguistic processors; morpho-syntactic analysis (KAF):
    • [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP
  • Fact analysis with logical expressions (generic and domain):
    • [the emission]NP: Process: e1
    • [of greenhouse gases]PP: Patient: s2
    • [in agricultural areas]PP: Location: a3
  • Resources: ontology (Abstract, Physical, Process, Substance, Patient, Chemical Reaction, H2O, CO2); wordnets & linguistic expressions (CO2 emission, water pollution)
  • Processing:
    • semantic role labelling
    • time & place
    • aggregation from all relevant phrases and documents
    • inferencing
    • adding trust and reliability

[Diagram: Wikyoto interview and the Kyoto server]
  • Formats: facts in RDF, wordnets in LMF, ontologies in OWL-DL
  • Resources: G-WN, G-KON, DE-WN, DE-KON, DE-TN, SUMO, DOLCE, GEO (plugin), WIKIPEDIA, FRAMENET
  • Kyoto Server: pdf documents, Tybots and Kybots, KAF, FactAF
  • Smart Kytext, example fragments:
    • ".... populations such as terrestrial and marine species ....."
    • ".... populations declined ....."
    • ".....terrestrial and marine species.. in forests .....declined"
  • Simplified term fragment: population, marine species, terrestrial species
  • Simplified ontology fragment: Group, ?Population
  • Interview questions:
    • Do populations consist of marine species?
    • Are terrestrial species a type of populations?
    • Do populations always consist of marine species?
    • Are terrestrial species never marine species?
  • Other labels: Hidden, Shown; index A.. ... decline ... population ... ..Z

Kyoto Knowledge Base

[Diagram: an ontology with a domain ontology, linked to wordnets with domain extensions per language: WnJP, WnIT, WnNL, WnES, WnEN, WnEU, WnCH]

Ultimate goal
  • Global standardization and anchoring of meaning such that:
    • Machines can start to approach text understanding -> semantic web connects to the current web
    • Communities can dynamically maintain knowledge, concepts and their terms in an easy to use system
    • Cross-linguistic and cross-cultural sharing and communication of knowledge is enabled
  • Establish a Global-Wordnet-Grid: formalization of Wikipedia for humans AND machines across languages

Global WordNet Grid

[Diagram: wordnets of individual languages linked through an inter-lingual ontology (Object, Device, TransportDevice)]
  • German words: Fahrzeug; Auto, Zug
  • English words: vehicle; car, train
  • Spanish words: vehículo; auto, tren
  • Italian words: veicolo; auto, treno
  • Dutch words: voertuig; auto, trein
  • Estonian words: liiklusvahend; auto, killavoor
  • French words: véhicule; voiture, train
  • Czech words: dopravní prostředník; auto, vlak
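
The grid idea, words in each language anchored to shared concepts, can be sketched as a lookup that goes through the concept instead of translating word to word. The words come from the slide; the concept labels `Vehicle`, `Car` and `Train`, the language codes and the function name are assumptions for this example.

```python
# Word-to-concept links per language, using words from the slide.
# The concept labels are hypothetical stand-ins for inter-lingual ontology nodes.
GRID = {
    "de": {"Fahrzeug": "Vehicle", "Auto": "Car", "Zug": "Train"},
    "en": {"vehicle": "Vehicle", "car": "Car", "train": "Train"},
    "nl": {"voertuig": "Vehicle", "auto": "Car", "trein": "Train"},
    "fr": {"véhicule": "Vehicle", "voiture": "Car", "train": "Train"},
}


def translate(word, source, target):
    """Find target-language words that share the source word's concept."""
    concept = GRID[source].get(word)
    if concept is None:
        return []
    return sorted(w for w, c in GRID[target].items() if c == concept)


print(translate("Zug", "de", "en"))
print(translate("voiture", "fr", "nl"))
```

With this layout, adding a language means linking its words to the shared concepts once, rather than building a dictionary for every language pair.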

Linking Open Data dataset cloud

http://richard.cyganiak.de/2007/10/lod/

[Diagram: domain resources connected to the cloud]
  • Wordnet terms: environment (several wordnets), sailing, legal, medical
  • Ontology concepts: environment, legal, medical, sailing
  • Facts: legal, environment, medical

Kyoto main assets
  • Wiki platform (WIKYOTO) for connecting, transferring and controlling knowledge and information across people and computers
  • Term yielding robots (TYBOT): software that extracts terms and concepts from documents
  • Knowledge yielding robots (KYBOT): fact extraction software that generates a comprehensive list of facts from a collection of sources
  • Fact repositories & fact alert: reports changes in facts on a collection of sources
  • Domain WORDNETS and domain ONTOLOGIES
  • Create the backbone for the Global Wordnet Grid

What makes KYOTO unique?
  • Integrates & combines all ► knowledge engineering, language engineering, wikis, term & concept learning, fact mining from text in and across languages, & standardization
  • Direct relation between concept modeling and text mining ► make it worth the effort
  • Wikyoto community tool ► hides technology and complex knowledge and language representation
  • Operated by community people and not by knowledge engineers and language technology people ► exploits massive labor force of communities all over the world

What makes KYOTO unique?
  • Text mining and ontology learning are usually developed for separate languages
    • ► KYOTO is multilingual, cross-lingual & cross-cultural
    • ► cross-lingual and cross-cultural semantic interoperability
  • Text mining and ontology learning are often limited to a specific domain and/or application ► KYOTO works for any domain and application
  • Text mining and ontology learning usually do not relate terms and concepts to generic language and knowledge resources ► KYOTO anchors knowledge from a community to general vocabulary and likewise to other communities

[Diagram: environment facts, Wordnet environment terms (several wordnets) and ontology environment concepts]

Contribution of KYOTO
  • KYOTO delivers a Web 2.0 environment for community-based control
    • Connects people across languages and cultures
    • Establishes consensus and knowledge transition
  • KYOTO learns terms and concepts from text documents
    • Stored as structures that people and computers understand
    • Hundreds of thousands of sources in the environment domain
    • In many different languages
    • Spread all over the world
    • Changing every day
  • KYOTO enables semantic search and fact extraction
    • Software can partially understand language and exploit Web 1.0 data
    • Understanding is helped by the terms and concepts defined for each language

[Diagram: html, pdf and xls sources; KYBOT, WIKYOTO, TYBOT]
