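KYOTO (ICT-211423): Knowledge Yielding Ontologies for Transition-Based Organization
FP7: Intelligent Content and Semantics
http://www.kyoto-project.eu/

Chu-Ren Huang (黃居仁), Academia Sinica

Call for Proposals Briefing on the Humanities and Social Sciences Theme of the EU Framework Programme (EU-FP7 SSH), 2009.12.30

National Sun Yat-sen University, National Contact Point for the Humanities and Social Sciences in the EU Framework Programme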

December 30 2009, NSYSU, Kaohsiung

Overview
  • History: Background information
  • What is KYOTO
  • Personal Journey: Building an internationally recognized career on Taiwan-based research
  • Key Perspectives:
    • Global View
    • Integrative Thinking/Opposable Mind


History: Background information

Pre-History
  • Pioneering Chinese Language Resources and Language Processing: since 1988
  • Construction of WordNet – Since 2000
  • Organized COLING: 2002
  • ISLE: International Standards in Language Engineering 2000-2002
    • EC (ISLE – IST-1999-10647)+NSF+Asia

Brief History of KYOTO
  • January 2006: Concept of Global WordNet Grid
  • 2006: Discussion of possibilities
  • January 2007: Meeting in Kyoto (Amsterdam, Princeton/Berlin, Pisa, Kyoto, Taipei)
    • Identify the FP7 call to submit to
    • Identify ecology/environment as the domain

Application Timeline I (2007)
  • Feb 15: General comments
  • Feb 15: Contact end-users
  • Feb 22: Find out the possibilities for non-European partners
  • Feb 22: Determine the final consortium based, among other things, on the outcome of step 2
  • Feb 28: Determine the details (part A of the proposal) that the EU requires from each partner

Application Timeline III (2007)
  • Mar-Apr: Revising and finalizing the proposal
  • May 10: Formal submission acknowledged
  • July 15: Review result
    • 8 out of 45 projects passed review
    • Call ID: FP7-ICT-2007-1
    • Instrument: CP-FP-INFSO
    • Title: Knowledge Yielding Ontologies for Transition-based Organization

Application Timeline III (2007)
  • Apr 13: Collect all forms (part A of the proposal) and signatures from the partners (PISA, AMSTERDAM)
  • Apr 13: Finalize the proposal part B (PISA, AMSTERDAM)
  • May 02: Submit proposal parts A and B (PISA, AMSTERDAM)


What is KYOTO

KYOTO (ICT-211423) Overview
  • Title: Knowledge Yielding Ontologies for Transition-Based Organization
  • Funded:
    • 7th Framework Program-ICT of the European Union: Intelligent Content and Semantics
    • Taiwan and Japan funded by national grants
  • Goal:
    • Open and free platform for knowledge sharing across languages and cultures
    • Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills
    • Bootstrap through open text mining & concept learning
    • Enables knowledge transition and information search across different target groups, transgressing linguistic, cultural and geographic boundaries.
    • Enables deep semantic search for facts and knowledge
  • URL: http://www.kyoto-project.eu/
  • Duration:
    • March 2008 – March 2011
  • Effort:
    • 364 person months of work.

Consortium
  • Vrije Universiteit Amsterdam (Amsterdam, The Netherlands),
  • Consiglio Nazionale delle Ricerche (Pisa, Italy),
  • Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany),
  • Euskal Herriko Unibertsitatea (San Sebastian, Spain),
  • Academia Sinica (Taipei, Taiwan),
  • National Institute of Information and Communications Technology (Kyoto, Japan),
  • Irion Technologies (Delft, The Netherlands),
  • Synthema (Rome, Italy),
  • European Centre for Nature Conservation (Tilburg, The Netherlands),
  • Subcontractors:
    • World Wide Fund for Nature (Zeist, The Netherlands),
    • Masaryk University (Brno, Czech Republic)

KYOTO (ICT-211423) Overview
  • Languages:
    • English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
  • Domain:
    • Environmental domain, BUT usable in any domain
  • Global:
    • Both European and non-European languages
  • Available:
    • Free: as open source system and data (GPL)
  • Future perspective:
    • Content standardization that supports world wide communication

The Taiwan Team
  • PI: Chu-Ren Huang
  • Co-I: Jason S. Chang (NTHU), Shu-Kai Hsieh (NTNU), Sue-jin Ker (SCU)
  • Other Participants: Kathleen Ahrens (NTU), Ya-min Chou (MCU), Shu-chuan Tseng (AS)
  • Funded: by NSC

Background: Multilingualism’s Challenges to HLT

The scaling up of language resources in a complex and distributed environment

  • Language resources are inherently distributed
  • Language resources are best created and updated where the language is spoken and by the people who speak it: this provides human expertise and keeps up with linguistic changes
  • Impractical to maintain all language resources at the same site: huge quantity, rights

Multilingualism: Challenges to HLT II

The scaling up of language resources in a complex and distributed environment

  • To overcome linguistic diversity to support shared tasks and applications: web search etc.
  • To create synergy of information from different languages
  • To function as a foundation of inter-cultural collaboration

Proposed Answer to the Challenge

Wordnet as shared language resource

  • Wordnet: a concept-driven and relation-based lexical knowledgebase
    • About 40 language wordnets have been built
    • Sharing basic representation of meaning (synset indexes), which is mapped to an upper ontology (SUMO, among others)
    • Sharing a (universal) set of lexical semantic relations

Information can be exchanged using the same format regardless of source language

Proposed Answer to the Challenge

Wordnets as Web Services

  • Wordnets are distributed, just like grid nodes
    • Each wordnet site will be a grid node
    • Each will be a natural host for language-related information services based on wordnet
    • Including any meta-NLP task: bootstrapping wordnets, harmonizing ontologies, building bilingual lexica, supporting cross-lingual alignments, etc.
    • And applications: multilingual query expansion, second language e-learning, machine translation, etc.

The Global Wordnet Grid
  • First discussed at the 3rd GWA at Jeju, Korea in January 2006, by Chu-Ren Huang, Adam Pease, and Piek Vossen, among others
  • A call for contributions can be found on the GWA website:

http://www.globalwordnet.org/gwa/gwa_grid.htm

  • Small scale experiment being carried out by ILC-CNR (Italy) and Academia Sinica (Taiwan) teams
    • Soria et al. (2006)
  • Planned strategic session in January 2007 in Kyoto

Baseline retrieval results: 6 persons, 30 high-level questions

KYOTO's Solution
  • Text mining:
    • Massive and accurate indexing of facts from vast amounts of text;
    • In any language/culture from scattered sources;
    • Again and again to detect trends and changes;
    • Direct relation between knowledge modeling effort and text mining
  • Knowledge modeling:
    • automatic learning of terms and concepts from text in any language;
    • formalization of knowledge in computer usable format -> wordnets & ontologies
  • Community software:
    • For experts in the field and not knowledge engineers
    • Continuous and collaborative effort:
      • adapt to the changing domain;
      • consensus in the field;
      • consensus across languages and cultures
    • Produce interoperable, formal, standardized knowledge structures;
    • Relate knowledge structure to expressions in languages


[Figure: KYOTO system overview, with numbered steps 1 to 6. Labels recovered from the diagram:]
  • Sources: distributed, diverse & dynamic data
  • Users: citizens, governments, companies, environmental organizations
  • Capture text: "Sudden increase of CO2 emissions in 2008 in Europe"
  • Tybot: term yielding robot
  • Wikyoto: maintain terms & concepts
  • Wordnets and ontology (Top, Middle, Domain levels); node labels include Abstract, Physical, Process, Substance, CO2 emission, H2O, CO2, Pollution, Emission, Greenhouse Gas
  • Kybot: knowledge yielding robot
  • Index facts: Process: Emission; Involves: CO2; Property: increase, sudden; When: 2008; Where: Europe
  • Text & fact index, semantic search
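
The "Index facts" box on this slide can be read as a small structured record. The following is a minimal sketch of that idea in Python, not the KYOTO implementation: the `Fact` class, the `search` helper, and the second sample row are hypothetical and exist only for illustration. The first row uses the slot values shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One indexed fact, using the slots shown on the slide (hypothetical class)."""
    process: str
    involves: str
    properties: tuple
    when: str
    where: str


def search(facts, **criteria):
    """Return the facts whose slots equal every given criterion."""
    return [
        fact for fact in facts
        if all(getattr(fact, slot) == value for slot, value in criteria.items())
    ]


FACTS = [
    # Slot values from the slide: "Sudden increase of CO2 emissions in 2008 in Europe"
    Fact("Emission", "CO2", ("increase", "sudden"), "2008", "Europe"),
    # Invented row, present only so that the search has something to filter out
    Fact("Emission", "CO2", ("decrease",), "2009", "Asia"),
]

hits = search(FACTS, process="Emission", where="Europe")
print(hits[0].when)
```

Searching on slots rather than on the raw sentence is what lets a query such as "emissions in Europe" match text that never uses those exact words.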

Available data repositories
  • Open data project:
    • DBPedia: 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples).
    • GeoNames
  • Domain database: Species 2000, 2.1 million species
  • Term database: 500,000 terms per 10,000 documents per language
  • Wordnets for 7 languages: about 50,000 to 120,000 synsets per language
  • Ontologies: EuroWordNet top ontology, SUMO, DOLCE

How to integrate the data?
  • Species 2000 vocabulary: 2,171,281 concepts in a MySQL database with parent relations:
    • Kingdom -> Phylum -> Class -> Order -> Family -> Genus -> Species -> Infraspecies
    • Animalia -> Chordata -> Amphibia -> Anura -> Leptodactylidae -> Eleutherodactylus -> Eleutherodactylus augusti
  • Converted to SKOS format
  • Aligned with DBPedia for language labels
  • Aligned with Wordnet using vocabulary and relation mappings
  • Published in Virtuoso, accessed with SPARQL queries
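
The conversion from parent relations to SKOS can be sketched as follows. This is an illustrative sketch only, assuming the lineage from the slide as input: the base URI `http://example.org/species/` and the helper names are hypothetical, and the actual KYOTO conversion scripts are not shown in the slides. `skos:broader` and `skos:prefLabel` are standard SKOS properties.

```python
# Lineage taken from the slide, ordered from the most general concept downward.
LINEAGE = [
    "Animalia", "Chordata", "Amphibia", "Anura",
    "Leptodactylidae", "Eleutherodactylus", "Eleutherodactylus augusti",
]


def parent_map(lineage):
    """Map each concept to its direct parent, as a parent-relation table would."""
    return {child: parent for parent, child in zip(lineage, lineage[1:])}


def to_skos_triples(parents, base="http://example.org/species/"):
    """Emit (subject, predicate, object) triples; the base URI is a placeholder."""
    def uri(name):
        return "<" + base + name.replace(" ", "_") + ">"

    triples = []
    for child, parent in parents.items():
        triples.append((uri(child), "skos:prefLabel", '"' + child + '"'))
        triples.append((uri(child), "skos:broader", uri(parent)))
    return triples


def ancestors(parents, concept):
    """Follow parent links upward, nearest ancestor first."""
    chain = []
    while concept in parents:
        concept = parents[concept]
        chain.append(concept)
    return chain


PARENTS = parent_map(LINEAGE)
TRIPLES = to_skos_triples(PARENTS)
print(ancestors(PARENTS, "Anura"))
```

Once the triples are loaded into a triple store, the same upward walk can be expressed as a query over `skos:broader` instead of a Python loop.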

How to integrate data? Extending language labels using DBPedia

Kyoto Knowledge Base

[Figure: Kyoto Knowledge Base. Labels recovered from the diagram:]
  • Center: Ontology (base concepts, DOLCE/OntoWordnet) and DBPedia
  • Around the center: several Domain clusters, each combining wordnets (Wn), terms (T), and vocabularies (V)
  • Sizes shown: 500K terms, 2,100K vocabularies

Should all knowledge be stored in the central ontology?
  • Vocabularies are too large for full inferencing
  • Vocabularies are linguistically too diverse to be represented in an ontology
  • The inferencing capabilities of formal ontologies are not needed for all levels of knowledge
  • A model of division of labor (along the lines of Putnam 1975) in which knowledge is stored in 3 layers:
    • SKOS vocabularies and term databases
    • wordnet (WN-LMF)
    • ontology (OWL-DL),
  • Each layer supports a different type of inferencing, ranging from SPARQL queries through graph algorithms to reasoning.
  • Mapping relations support the division of labor and the different types of inferencing, and allow for the encoding of language-specific lexicalizations and restrictions.

What does the computer need to know?
  • Distinction between rigid and non-rigid (Welty & Guarino 2002):
    • being a "cat" is essential to an individual's existence and therefore rigid
    • being a "pet" is a temporary role and therefore non-rigid; a cat can become a pet and stop being a pet without ceasing to exist
    • Felix is born as a cat and will always be a cat, but during some period Felix can become a pet and stop being a pet while it continues to exist
  • All 2.1 million species are rigid concepts

What does the computer need to know?
  • Roles and processes in documents have more information value than the defining properties of species:
    • Species are defined in terms of physical properties already known to experts;
    • Roles such as "invasive species", "migration species", "threatened species" express THE important properties of instances of species
  • Telicity: Roles are typically the terms we learn from the text, not the species!

Division of labor in knowledge sources

[Figure: three knowledge layers plus the term database. Labels recovered from the diagram:]
  • SKOS database (2.1 million species):
    • Animalia -> Chordata -> Amphibia -> Anura -> Leptodactylidae -> Eleutherodactylus -> Eleutherodactylus atrabracus, Eleutherodactylus augusti
  • Wordnet (100,000 synsets):
    • animal:1 (Base Concept) -> chordate:1 -> {vertebrate:1, craniate:1} -> amphibian:3 -> {frog:1, toad:1, toad frog:1, anuran:1, batrachian:1, salientian:1}
    • barking frog (shown beside Eleutherodactylus augusti)
  • Ontology (1,000 types):
    • endurant -> physical-endurant -> physical-object
  • Term database (500,000 terms):
    • endemic frog, endangered frog, poisonous frog, alien frog

Wordnet-ontology-relations
  • Rigid synsets:
    • Synset:Endurant; Synset:Perdurant; Synset:Quality:
    • sc_equivalenceOf or sc_subclassOf
  • Non-rigid synsets:
    • Synset: Role
    • sc_domainOf: range of ontology types that restricts a role
    • sc_playRole: role that is being played

Lexicalization of process-related concepts

{create, produce, make} Verb, English
    -> sc_equivalenceOf ConstructionProcess

{artifact, artefact} Noun, English
    -> sc_domainOf PhysicalObject
    -> sc_playRole ConstructedRole

{kunststof} Noun, Dutch // lit. artifact substance
    -> sc_domainOf AmountOfMatter
    -> sc_playRole ConstructedRole

{meat} Noun, English
    -> sc_domainOf Cow, Sheep, Pig
    -> sc_playRole EatenRole

{名 肉, 食物, 餐} Noun, Chinese
    -> sc_domainOf Cow, Sheep, Pig, Rat, Mole
    -> sc_playRole EatenRole

{غذاء, لحم, طعام} Noun, Arabic
    -> sc_domainOf Cow, Sheep
    -> sc_playRole EatenRole
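
The role mappings on this slide can be sketched as plain data, which makes the language-specific restrictions easy to query. This is a hedged illustration, not the WN-LMF encoding used by the project: the `RoleSynset` class and both helper functions are hypothetical, and only the three "meat" entries from the slide are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleSynset:
    """A non-rigid synset with its role mappings (hypothetical structure)."""
    words: tuple
    language: str
    domain_of: frozenset  # sc_domainOf: ontology types that may play the role
    play_role: str        # sc_playRole: the role being played


MEAT = [
    RoleSynset(("meat",), "English",
               frozenset({"Cow", "Sheep", "Pig"}), "EatenRole"),
    RoleSynset(("肉", "食物", "餐"), "Chinese",
               frozenset({"Cow", "Sheep", "Pig", "Rat", "Mole"}), "EatenRole"),
    RoleSynset(("غذاء", "لحم", "طعام"), "Arabic",
               frozenset({"Cow", "Sheep"}), "EatenRole"),
]


def can_play(synset, ontology_type):
    """True if the ontology type falls inside the synset's sc_domainOf range."""
    return ontology_type in synset.domain_of


def languages_allowing(synsets, ontology_type, role):
    """List the languages whose synset lets the given type play the given role."""
    return [
        s.language for s in synsets
        if s.play_role == role and can_play(s, ontology_type)
    ]


print(languages_allowing(MEAT, "Pig", "EatenRole"))
```

The role (`EatenRole`) is shared across languages, while the set of types allowed to play it differs per language, which is the point the slide makes.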

How to make inferences?
  • SPARQL queries to large Virtuoso databases: Aligned Species 2000, DBPedia
  • SQL queries to term database
  • Graph matching on wordnets
  • Reasoning on a small ontology


Personal Journey: Building an internationally recognized career on Taiwan-based research

Pre-History
  • Pioneering Chinese Language Resources and Language Processing: since 1988
  • Construction of WordNet – Since 2000
  • Organized COLING: 2002
  • ISLE: International Standards in Language Engineering 2000-2002
    • EC (ISLE – IST-1999-10647)+NSF+Asia


Key Perspectives: Global View; Integrative Thinking/Opposable Mind

Global View
  • Think and Act Globally
    • Put what is good for the world before what is good for Taiwan
    • What is good for the world must be good for Taiwan, but what is good for Taiwan (thinking parochially) may not be good for the world
    • Hence it cannot be supported by other partners
    • CANNOT be done -> NOT GOOD for Taiwan

Think Globally
  • Research Direction: Think of Global Impact
    • Not of local ranking
    • Find your own niche: better to be the head of a rooster than the tail of an ox (寧為雞首,不為牛後)
  • Think of the scale of Taiwan
    • And act strategically
    • Contributing Team Partner vs. Team Leader : Choose the RIGHT team, NOT my team

Integrative Thinking/Opposable Mind
  • Create a Win-Win Situation out of a Zero-Sum Game
  • The Opposable Mind (Roger Martin 2007)
  • The Design of Business: Why Design Thinking is the Next Competitive Advantage (Martin 2009)

In Sum: Befriend the well-informed (友多聞)
