Towards Mutual Understanding: Ontologies, Ontology Matching, and their Applications

Towards Mutual Understanding: Ontologies, Ontology Matching, and their Applications. Jingshan Huang Assistant Professor School of Computer and Information Sciences University of South Alabama http://cis.usouthal.edu/~huang/. CIS Department @ UO Eugene, OR May 21, 2010.

Towards Mutual Understanding: Ontologies, Ontology Matching, and their Applications

Jingshan Huang

Assistant Professor

School of Computer and Information Sciences

University of South Alabama

http://cis.usouthal.edu/~huang/

CIS Department @ UO Eugene, OR May 21, 2010

Presentation Outline

Research Motivation

Learning-Based Ontology Matching – SOCCER

Ongoing Research

Summary

Research Motivation – Overview
  • Information from heterogeneous sources has different semantics

Long (English)

Long (Chinese Pinyin) -> 龙 (dragon)

  • Integrating the information from heterogeneous sources must make use of all available clues, including syntax, semantics, context, and pragmatics
  • Ontologies are a formal model to encode semantics
  • Ontological techniques are critical in semantic integration
Quick Facts

What is Ontology?

a computational model of some domain of the world

describes the semantics of the terms used in the domain

often captured in the form of DAG (directed acyclic graph)

a finite set of concepts + properties + relationships

What is Ontology Heterogeneity?

an inherent characteristic of ontologies developed by different parties for the same (or similar) domains

the heterogeneous semantics may occur in different ways

(1) different terms could be used for the same concept;

(2) an identical term could be adopted for different concepts;

(3) properties and relationships could be different

“Translation” is far from good enough, not even close…

What is Ontology Matching?

a.k.a. “Ontology Alignment” or “Ontology Mapping”

the process of determining correspondences between concepts from heterogeneous ontologies

involving many different relationships, e.g., equivalentWith, subClassOf, superClassOf, and siblings


Heterogeneity in Ontologies – A Simple Example

  • Formal definition of ontologies

A knowledge representation model of some portion of the world

It reflects its designers’ conceptual views

  • Ontology = Concepts + Properties + Relationships + Constraints
  • Concept – a category

“President”

  • Property – maps between concepts and data types

“gender” of “President”

  • Relationship – maps between concepts

“President” is a subClassOf “People”

  • Constraint – on properties or relationships

“gender”: range = “male”

Concept semantics: name + properties + relationships

[Figure: example concept diagram; surviving labels: “President”, “sex”, “Person”, “female or male”]
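
The definitions on this slide can be made concrete with a small data structure. The sketch below is only an illustration, not the speaker's implementation: the `Concept` class, the `differing_aspects` helper, and the second ontology (built from the surviving figure labels “Person” and “sex”) are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """Concept semantics: name + properties + relationships."""
    name: str
    # property name -> constraint on its range (hypothetical encoding)
    properties: Dict[str, str] = field(default_factory=dict)
    # names of the concepts this one is a subClassOf
    superclasses: List[str] = field(default_factory=list)


def differing_aspects(a: Concept, b: Concept) -> List[str]:
    """List which of the three semantic aspects differ between two concepts."""
    diffs = []
    if a.name != b.name:
        diffs.append("name")
    if a.properties != b.properties:
        diffs.append("properties")
    if a.superclasses != b.superclasses:
        diffs.append("relationships")
    return diffs


# Two hypothetical designers model the same real-world notion differently.
president_a = Concept("President", {"gender": "male"}, ["People"])
president_b = Concept("President", {"sex": "female or male"}, ["Person"])

print(differing_aspects(president_a, president_b))
```

Even with identical names, the two concepts disagree in properties and relationships, which is the kind of heterogeneity a matcher has to bridge.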

Heterogeneity in Ontologies – Running Example (cont.)
  • Typing “professor university” into Swoogle returns 129 different results
  • All created and maintained by ontology professionals
Research Motivation – Summary

Semantic integration is important in Computer Science and Information Technology

Ontologies are the foundation for semantic integration; at the same time, they are inherently heterogeneous

The only way out – match/align ontologies so that their different semantics can be understood

Ontology matching is far from being solved despite its importance and the number of researchers that have investigated it
Classification For Current Algorithms

Rule-Based Matching

Consider schema information alone

Specify a set of rules

Apply them to schema information

Learning-Based Matching

Consider both schema and instances

Apply different machine learning techniques

Pros and Cons for Current Approaches
  • Rule-Based Matching
    • Is relatively fast (pro)
    • Ignores instance information (con)
    • Uses ad hoc predefined weights (con)

Concept semantics: name + properties + relationships

  • Learning-Based Matching
    • Obtains extra clues from instances (pro)
    • Runs longer (con)
    • Has difficulty in getting sufficient instances (con)
Presentation Outline

Research Motivation

Learning-Based Ontology Matching – SOCCER

Ongoing Research

Summary

SOCCER (Similar Ontology Concept ClustERing) – a learning-based algorithm

Challenges and main idea

Details

Evaluation

Problems with Existing Matching Algorithms

Rule-Based Matching

Ignores instance information

Requires ad hoc predefined weights

Learning-Based Matching

Runs longer

Has difficulty in getting sufficient instances

Try to:

Adopt machine learning techniques to avoid ad hoc predefined weights

Base learning on schema information alone to avoid the difficulty in getting sufficient instances

The goal:

To find equivalent concept pairs among different ontologies, which is the first, and the most critical step in semantic integration

Challenges

Very difficult for machines to learn how to match ontology schemas by providing schema information alone

Diversities in terminology

Diversities in relationships

Current learning-based algorithms make use of instances, more or less

Anecdotally, instances usually have much less variety than schemas

Main Idea of SOCCER

Equivalent concepts from different ontologies tend to stay “closer” to each other in a clustering space with structural dimensions

Each cluster contains a number of concepts that are from different ontologies and are equivalent to each other

SOCCER aims at finding such clusters by exploiting ontology schemas alone

Details – Overview

Build a three-dimensional vector for each concept, corresponding to name, properties, and relationships

Calculate the similarity between pairwise concepts

Apply an agglomerative algorithm to generate clusters

Therefore, SOCCER has two phases:

Phase I – weight learning

Phase II – clustering

SOCCER Phase I – learn weights (1)

Learning problem’s formal description

Task T: match two ontologies

Performance measure P: Precision, Recall, F-Measure, and Overall with regard to manual matching

Training experience E: a set of equivalent concept pairs by manual matching

Target function V: maps a pair of concepts to a similarity value

Target function representation: $s = \sum_{i=1}^{3} w_i s_i$, where $s_1$, $s_2$, and $s_3$ are the similarities in name, properties, and relationships

SOCCER Phase I – learn weights (2)

Hypothesis space: weight vector (w1, w2, w3)

Learning objective: find the weight vector that best fits the training examples

Training rule: delta rule

Searching strategy: minimize the training error

SOCCER Phase I – learn weights (3)

Similarity in concept names: $s_1 = 1 - d/l$

d: edit distance between two strings

l: length of the longer string

Similarity in concept properties: $s_2 = n/m$

n: number of pairs of matched properties

m: smaller cardinality of lists p1 and p2

Similarity in concept relationships (super/subClassOf): $s_3$

calculate the similarity values for pairwise concepts in ancestor lists and choose the maximum value
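
The three similarity measures can be sketched as follows. This is a hedged illustration rather than SOCCER's actual code: the formulas `1 - d/l` and `n/m` are assumed from the symbol definitions on this slide, "matched properties" is simplified to identical property names, and ancestor similarity is assumed to reuse the name similarity. All function names are hypothetical.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def name_similarity(n1: str, n2: str) -> float:
    """Assumed form: 1 - d/l, with l the length of the longer string."""
    l = max(len(n1), len(n2))
    if l == 0:
        return 1.0
    return 1.0 - edit_distance(n1, n2) / l


def property_similarity(p1: List[str], p2: List[str]) -> float:
    """Assumed form: n/m, with n matched pairs and m the smaller list size."""
    m = min(len(p1), len(p2))
    if m == 0:
        return 0.0
    n = len(set(p1) & set(p2))  # simplification: a match means identical names
    return n / m


def relationship_similarity(anc1: List[str], anc2: List[str]) -> float:
    """Maximum pairwise similarity between the two ancestor lists."""
    return max((name_similarity(a, b) for a in anc1 for b in anc2), default=0.0)
```

A real matcher would likely compare properties more loosely than by exact name; exact matching is used here only to keep the sketch short.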

SOCCER Phase I – learn weights (4)

Overall similarity: $s = \sum_{i=1}^{3} w_i s_i$

Create a matrix M between O1 and O2 ($n_1 \times n_2$)

cell[i, j] stores the similarity between the i-th concept in O1 and the j-th concept in O2

The $w_i$’s are randomly initialized, and then updated by the learning process
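
As a rough sketch of this step, the matrix can be filled by combining the three per-aspect similarities with the current weights. The weighted-sum form is an assumption consistent with the weight vector (w1, w2, w3) and the delta rule named earlier; normalizing the random initial weights to sum to 1 is a choice made here, not something the slides state.

```python
import random
from typing import List, Sequence, Tuple

Components = Tuple[float, float, float]  # (name, properties, relationships)


def overall_similarity(components: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Weighted sum of the per-aspect similarities."""
    return sum(w * s for w, s in zip(weights, components))


def similarity_matrix(component_sims: List[List[Components]],
                      weights: Sequence[float]) -> List[List[float]]:
    """cell[i][j] = overall similarity between concept i of O1 and concept j of O2."""
    return [[overall_similarity(c, weights) for c in row]
            for row in component_sims]


def random_weights(rng: random.Random, k: int = 3) -> List[float]:
    """Random initial weights, normalized to sum to 1 (an assumption)."""
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical 1 x 2 example: one concept in O1, two concepts in O2.
sims = [[(1.0, 0.5, 0.0), (0.0, 0.0, 0.0)]]
matrix = similarity_matrix(sims, [0.5, 0.25, 0.25])
print(matrix)
```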

SOCCER Phase I – learn weights (5)

Training error

Weight update rule

$D$: training example set

$t_r$: maximum value for row $i$

$t_c$: maximum value for column $j$

$o_d$: network output for a specific training example $d$

$\eta$: the learning rate

$s_{id}$: the $s_i$ value for $d$
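
The formulas on this slide did not survive in the transcript, so the following is only a generic delta-rule sketch, not a reconstruction of SOCCER's exact rule. It assumes the textbook squared error over the training set and the standard batch update, and it takes the target for each example as an input because the way the slides derive it from the row and column maxima is not recoverable here. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (per-aspect similarities for one concept pair, assumed target value)
Example = Tuple[Sequence[float], float]


def output(s: Sequence[float], weights: Sequence[float]) -> float:
    """Linear unit: weighted sum of the per-aspect similarities."""
    return sum(w * x for w, x in zip(weights, s))


def training_error(examples: List[Example], weights: Sequence[float]) -> float:
    """Textbook squared error: half the sum of squared residuals."""
    return 0.5 * sum((t - output(s, weights)) ** 2 for s, t in examples)


def delta_rule_epoch(examples: List[Example], weights: Sequence[float],
                     eta: float) -> List[float]:
    """One batch update: each weight moves by eta * sum((t - o) * s_i)."""
    grads = [0.0] * len(weights)
    for s, t in examples:
        o = output(s, weights)
        for i, s_i in enumerate(s):
            grads[i] += (t - o) * s_i
    return [w + eta * g for w, g in zip(weights, grads)]


# Toy data: the first aspect predicts the target, the second does not.
examples = [((1.0, 0.0, 0.0), 1.0), ((0.0, 1.0, 0.0), 0.0)]
weights = [0.0, 0.0, 0.0]
initial_error = training_error(examples, weights)
for _ in range(50):
    weights = delta_rule_epoch(examples, weights, eta=0.5)
final_error = training_error(examples, weights)
```

On this toy data the first weight approaches 1 while the others stay at 0, which illustrates how learned weights can replace ad hoc predefined ones.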

SOCCER Phase II – clustering (1)
  • Apply the learned weights to recalculate similarity matrices for pairwise ontologies
  • Cluster similar concepts among a set of ontologies

Input: A set of ontologies and the corresponding matrices

    1. Each concept forms a singleton cluster
    2. Find two clusters, (a) and (b), with maximum similarity
    3. If s[(a), (b)] > threshold, go to step 4; else go to step 7
    4. Merge (a) and (b) into (a, b)
    5. Update matrix: s[(a, b), (c)] = (s[(a), (c)] + s[(b), (c)])/2
    6. Repeat steps 2 and 3
    7. Output current clusters

The key is then to determine the threshold
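
The seven steps above can be sketched as a short agglomerative loop. This is a minimal illustration under stated assumptions, not the SOCCER implementation: concepts are unique string identifiers, pairwise similarities arrive in a dictionary keyed by `frozenset` pairs, and missing pairs default to 0. The function name and the example identifiers are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Cluster = Tuple[str, ...]


def cluster_concepts(concepts: List[str],
                     sim: Dict[FrozenSet[str], float],
                     threshold: float) -> List[Cluster]:
    # Step 1: each concept forms a singleton cluster.
    clusters: List[Cluster] = [(c,) for c in concepts]
    s: Dict[FrozenSet[Cluster], float] = {}
    for i, a in enumerate(clusters):
        for b in clusters[i + 1:]:
            s[frozenset((a, b))] = sim.get(frozenset((a[0], b[0])), 0.0)

    while len(clusters) > 1:
        # Step 2: find the two clusters with maximum similarity.
        a, b = max(((x, y) for i, x in enumerate(clusters)
                    for y in clusters[i + 1:]),
                   key=lambda pair: s[frozenset(pair)])
        # Step 3: stop when the best pair does not exceed the threshold.
        if s[frozenset((a, b))] <= threshold:
            break
        # Step 4: merge (a) and (b) into (a, b).
        merged = a + b
        rest = [c for c in clusters if c not in (a, b)]
        # Step 5: average the similarities of the merged parts.
        for c in rest:
            s[frozenset((merged, c))] = (s[frozenset((a, c))]
                                         + s[frozenset((b, c))]) / 2
        clusters = rest + [merged]  # Step 6: repeat from step 2.
    return clusters  # Step 7: output current clusters.


concepts = ["o1:Professor", "o2:Prof", "o1:Student", "o2:Course"]
sim = {frozenset(("o1:Professor", "o2:Prof")): 0.9,
       frozenset(("o1:Student", "o2:Course")): 0.2}
result = cluster_concepts(concepts, sim, threshold=0.5)
```

With a threshold of 0.5, only the two professor concepts merge; lowering the threshold below 0.2 would also merge the unrelated pair, which is why the choice of threshold matters.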

SOCCER Phase II – clustering (2)
  • Let the number of concepts in $O_i$ be $n_i$ ($i \in [1, k]$)
  • WLOG, suppose $n_1$ is the largest among the $n_i$’s
  • Total number of clusters should be in $[n_1, \sum_{i=1}^{k} n_i]$
Evaluation Strategy
  • The hypothesis: a set of clusters exist across different ontologies
  • Need to show:
    • Weight learning is correct
    • Resultant clusters are meaningful
Evaluation – test ontologies (1)

The test ontologies are eight independently developed, real-world ontologies:

http://www.csd.abdn.ac.uk/~cmckenzi/playpen/rdf/akt_ontology_LITE.owl

http://www.mindswap.org/2004/SSSW04/aktive-portal-ontology-latest.owl

http://annotation.semanticweb.org/iswc/iswc.owl

http://www.mondeca.com/owl/moses/ita.owl

http://protege.stanford.edu/plugins/owl/owl-library/ka.owl

http://ontoware.org/frs/download.php/18/semiport.owl

http://www.mondeca.com/owl/moses/univ.owl

http://reliant.teknowledge.com/DAML/Mid-level-ontology.owl

Evaluation – test ontologies (2)

Characteristics of test ontologies

Evaluation – result (1)

Weight convergence

Evaluation – result (2)

Clustering result

Evaluation – Four Measures

Precision p – percentage of correct predictions over all predictions

Recall r – percentage of correct predictions over all correct matches

F-Measure f (= $2pr/(p + r)$) – a.k.a. Harmonic Mean, avoids the bias from adopting Precision or Recall alone

Overall o (= $r(2 - 1/p)$) – Post-Match Effort, i.e., how much human effort is needed to remove false matches and add missed ones
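
The four measures can be computed from a set of predicted matches and a manually built reference set. The sketch below assumes the usual harmonic-mean F-Measure and the Overall measure as it is commonly defined in the schema-matching literature, r(2 - 1/p). The function name is hypothetical, and returning 0.0 for the undefined cases is a convention chosen here.

```python
from typing import Iterable, Tuple

Match = Tuple[str, str]  # (concept in O1, concept in O2)


def evaluate(predicted: Iterable[Match],
             reference: Iterable[Match]) -> Tuple[float, float, float, float]:
    """Return (precision, recall, f_measure, overall)."""
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(reference) if reference else 0.0
    # Harmonic mean of precision and recall.
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    # Overall becomes negative when precision falls below 0.5;
    # 0.0 for p == 0 is an arbitrary convention for the undefined case.
    o = r * (2 - 1 / p) if p > 0 else 0.0
    return p, r, f, o


predicted = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
reference = {("a", "x"), ("b", "y"), ("c", "z"), ("e", "v")}
p, r, f, o = evaluate(predicted, reference)
```

In this example three of four predictions are correct and three of four reference matches are found, so precision, recall, and F-Measure all equal 0.75, while Overall is lower because each false match also costs effort to remove.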

SOCCER Summary

SOCCER: A learning-based ontology matching algorithm, and the first one based on ontology schemas alone

Our contributions:

1. An artificial neural network (ANN) technique was integrated so that the weights for different semantic aspects can be learned instead of being specified by a human in advance

2. Moreover, the learning technique was carried out based on the ontology schemas alone, which distinguishes it from most other learning-based algorithms.

Presentation Outline

Research Motivation

Learning-Based Ontology Matching – SOCCER

Ongoing Research

Summary

Ongoing Research: Bioinformatics/Medical Informatics (1)

An abundance of medical/biological digital data promises a profound impact on both the quality and rate of discovery and innovation

Worldwide health scientists are producing, accessing, analyzing, integrating, and storing massive amounts of digital medical data daily

If we were able to effectively transfer and integrate data from all possible resources, then the following would be granted:

A deeper understanding of all these data sets

Better exposed knowledge

Appropriate insights and actions that follow

But…in many cases, the data users are not the data producers, and they thus face challenges in harnessing data in unforeseen/unplanned ways

Fortunately, ontological techniques can render help in this regard!

Ongoing Research: Bioinformatics/Medical Informatics (2)

Ontological techniques have been widely applied to medical and biological research

The most successful example is the Gene Ontology (GO) project

The GO’s aim: to standardize the representation of gene and gene product attributes across species and databases

Three ontologies in the GO: Cellular Component, Molecular Function, and Biological Process

The GO provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data

It also provides tools to access and process such data

The focus of the GO is to describe how gene products behave in a cellular context

Ontologies constructed under the auspices of the OBO (Open Biomedical Ontologies) group exhibit great variety

Semantic integration becomes an indispensable step in biological and biomedical data mining


Ongoing Research: Bioinformatics/Medical Informatics (3)

An Experiment in Bio Data Mining

  • The characteristics of many biomedical ontologies: i) a rich set of super/subClassOf relationships; ii) numeric strings adopted as concept names; and iii) little, if any, instance data
  • SOCCER suitably serves the goal of integrating semantics in computational biology

Ongoing Research: Digital Forensics (1)

  • Challenges exist in Digital Forensics
    • maintaining the integrity of evidence found by different parties (usually from distributed geographic areas, or even with cultural barriers)
    • interpreting the evidence accurately
    • drawing trustworthy conclusions thereafter
  • Different parties are likely to adopt different formats and metadata for storing evidence’s contents, due to their specific needs
  • Seamless communication among different parties, along with the knowledge sharing and reuse that follow, becomes a non-trivial problem

Ongoing Research: Digital Forensics (2)

  • Being a formal knowledge representation model, ontologies may help us to handle the aforementioned challenges in Digital Forensics
  • But …

There is no such central ontology that is large enough to include all concepts of interest to every individual criminal investigator

  • Anyone can design ontologies according to his/her own conceptual view; ontological heterogeneity is thus an inherent feature
  • That is, each party that needs a conceptual model will have to provide its own particular extensions, different from and incompatible with extensions added by other parties

Ongoing Research: Digital Forensics (3)

  • An agreed-upon, global, and “all-in-one” ontology is not a feasible solution
  • Different groups should maintain their own conceptual models, while utilizing ontological techniques to synthesize their data with others’ models
  • This way, it is possible to effectively decouple the evidence semantics from its logical description and organization

Digital Investigation Evidence Acquisition Model Based on Ontology Matching (DIEAOM) to facilitate:

(1) knowledge collection from disparate, heterogeneous evidence sources

(2) knowledge sharing and reuse

(3) decision support for criminal investigators

  • The DIEAOM aims to synthesize vast amounts of evidence from different parties by matching conceptual models
  • Our goal is to benefit the current criminal investigation procedure with higher automation, enhanced effectiveness, and better knowledge sharing and reuse
Other Research Opportunities (1) Heterogeneous Knowledge Acquisition/Management

Increasing growth in the scale, complexity, and diversity of data has been witnessed in recent years

In addition, the data are often used in ways not envisioned by those who created them

New techniques are thus needed to repurpose, transform, and integrate multiple and uncoordinated data sources; interoperability is the fundamental goal

In order to better achieve interoperability among distributed knowledge sources, accurate and effective semantic integration is the first, critical step to handle the heterogeneity in data

Other Research Opportunities (2) Component-Based Software Engineering

Engineered software is decomposed into functional or logical components, with well-defined interfaces for communication across components

Reusability is an important feature of a high quality component

(Semi)automated methodology to annotate, discover, compose, and execute the software components

Semantic integration techniques are important and fundamental in such automation processes

Other Research Opportunities (3) Semantics-Enriched Image Knowledge Bases

Create image knowledge bases by using ontologies to semantically encode image features

Semantic search allows users to make use of concept search, instead of traditional keyword search

It also paves the way for more advanced search strategies

Users can specialize or generalize a query with the help of a concept hierarchy

Queries can be formed using information from ontologies
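
Specializing and generalizing a query along a concept hierarchy can be illustrated with a few lines of code. The hierarchy below, its concept names, and both helper functions are hypothetical; a real image knowledge base would read the hierarchy from an ontology rather than a hard-coded dictionary.

```python
from typing import Dict, List

# Hypothetical concept hierarchy: parent concept -> direct sub-concepts.
HIERARCHY: Dict[str, List[str]] = {
    "Animal": ["Dog", "Cat"],
    "Dog": ["Puppy"],
    "Cat": [],
    "Puppy": [],
}


def specialize(concept: str, hierarchy: Dict[str, List[str]]) -> List[str]:
    """Expand a query concept to itself plus all of its descendants."""
    result = [concept]
    for child in hierarchy.get(concept, []):
        result.extend(specialize(child, hierarchy))
    return result


def generalize(concept: str, hierarchy: Dict[str, List[str]]) -> List[str]:
    """Return the direct parents of a query concept."""
    return [parent for parent, children in hierarchy.items()
            if concept in children]


print(specialize("Animal", HIERARCHY))
print(generalize("Puppy", HIERARCHY))
```

A search for “Animal” can then also retrieve images annotated only with “Puppy”, which a plain keyword search would miss.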

Presentation Outline

Research Motivation

Learning-Based Ontology Matching – SOCCER

Ongoing Research

Summary

Summary

Information from heterogeneous sources has different semantics, and semantic integration is necessary to make better use of all available information

As a formal knowledge representation model, ontologies can render help in this regard

SOCCER, the first learning-based approach that relies on schemas alone, was developed to tackle the ontology-matching problem, which is a critical component of semantic integration

Ontological techniques can be applied to many areas to generate challenging interdisciplinary research topics

Thank you!!!
  • Suggestions?
  • Comments?
  • Questions?