An Empirical Study of Instance-Based Ontology Mapping

Antoine Isaac, Lourens van der Meij, Stefan Schlobach, Shenghui Wang

[email protected] funded by NWO

Vrije Universiteit Amsterdam

Koninklijke Bibliotheek Den Haag

Max Planck Institute Nijmegen


Metamotivation

  • Ontology mapping in practice

    • Based on real problems in the host institution at the Dutch Royal Library

    • Task-driven

      • Annotation support

      • Merging of thesauri

    • Real thesauri (100 years of tradition)

      • Really messy

      • Conceptually difficult

      • Inexpressive

  • Generic Solutions to Specific Questions & Tasks

  • Using Semantic Web Standards (SKOSification)


Overview

  • Use-case

  • Instance-based mapping

  • Evaluation

  • Experiments

  • Results

  • Conclusions


The Alignment Task: Context

  • National Library of the Netherlands (KB)

  • 2 main collections

    • Legal Deposit: all Dutch printed books

    • Scientific Collections: history, language…

  • Each described (indexed) by its own thesaurus


A need for thesaurus mapping

  • The KB wants

    • (Scenario 1) Possibly discontinue one of the two annotation and retrieval methods.

    • (Scenario 2) Possibly merge the thesauri

  • We try to explore mapping

    • (Task 1) With a single, new, or merged retrieval system, use mappings to find books annotated with the old system

    • (Task 2) Candidate terms for merged thesaurus

  • We make use of the doubly annotated corpus to calculate Instance-Based mappings


Calculating mappings using Concept Extensions

[Figure: concept extensions. How much are they related?]


Standard approach (Jaccard)

  • Use a co-occurrence measure to calculate the similarity between 2 concepts, e.g. the Jaccard overlap of their extensions

[Figure: the set of books in the library, showing the elements of B, the elements of G, and their joint elements.]

  • Similarity = 5/9 ≈ 56 % (overlap, e.g. degree of greenness)

  • Similarity = 1/7 ≈ 14 % (overlap, e.g. degree of greenness)
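The overlap computation can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `jaccard`, the toy book identifiers, and the choice to score two empty extensions as 0 are all assumptions. The toy sets are picked so that they reproduce the 5/9 example.

```python
def jaccard(books_b: set, books_g: set) -> float:
    """Jaccard overlap of two concept extensions (sets of book identifiers)."""
    union = books_b | books_g
    if not union:
        # Assumption: two concepts without any books count as unrelated.
        return 0.0
    return len(books_b & books_g) / len(union)


# Toy extensions: 5 joint books, 9 books in total.
books_b = {1, 2, 3, 4, 5, 6, 7}
books_g = {3, 4, 5, 6, 7, 8, 9}

print(f"Jacc = {jaccard(books_b, books_g):.3f}")
```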


Issues with this measure (sparse data)

  • What is more reliable?

    • Jacc = 1/1 = 100 %

    • or Jacc = 18/21 = 86 %?

  • The 100 % score is the worse mapping: it rests on a single book, with bB = {MemberOfParliament} and bG = {Cricket}

  • We need

    • more reliable measures

    • or thresholds (at least n doubly annotated books)
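A threshold of this kind can be sketched as a guard in front of the Jaccard computation. The sketch assumes that "at least n doubly annotated books" means at least n books annotated with both concepts; the function name `jaccard_with_threshold` and the parameter `min_joint` are hypothetical, and the toy sets are built only to reproduce the 1/1 and 18/21 cases.

```python
def jaccard_with_threshold(books_b, books_g, min_joint=10):
    """Return the Jaccard overlap, or None when fewer than `min_joint`
    books carry both annotations (too little evidence to score)."""
    joint = books_b & books_g
    if len(joint) < min_joint:
        return None
    union = books_b | books_g
    if not union:
        return None  # assumption: no books at all means no score
    return len(joint) / len(union)


# One shared book: Jaccard would be 1/1, but the evidence is a single book.
single_b, single_g = {1}, {1}

# 18 shared books out of 21 in total.
large_b = set(range(19))     # books 0..18
large_g = set(range(1, 21))  # books 1..20

print(jaccard_with_threshold(single_b, single_g))  # filtered out
print(jaccard_with_threshold(large_b, large_g))    # kept
```

With `min_joint=0` the guard is effectively off and the single-book pair scores 1.0 again, which is the behaviour the threshold is meant to suppress.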


Issue with measure (hierarchy):

Consider a hierarchy

[Figure: concepts B', B and G over the set of books in the library, distinguishing non-hierarchical from hierarchical elements.]

  • Jacc(B',G) = 1/2 = 50%

  • Jacc(B',G) = 2/6 = 33%
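One way to read the hierarchical variant is that a concept's extension also includes the books annotated with its narrower concepts. The Python sketch below follows that reading. It assumes that B is narrower than B', and that the 1/2 value is the non-hierarchical score and the 2/6 value the hierarchical one; neither is stated explicitly in the transcript. All names and the toy data are hypothetical.

```python
def jaccard(books_a, books_b):
    union = books_a | books_b
    return len(books_a & books_b) / len(union) if union else 0.0


def hierarchical_extension(concept, narrower, direct_books):
    """Books annotated with `concept` or with any concept below it."""
    seen, stack, books = set(), [concept], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in a messy thesaurus
        seen.add(current)
        books |= direct_books.get(current, set())
        stack.extend(narrower.get(current, ()))
    return books


# Toy data: B is a narrower concept of B'.
narrower = {"B'": ["B"]}
direct_books = {"B'": {1}, "B": {2, 3, 4, 5, 6}, "G": {1, 2}}

flat = jaccard(direct_books["B'"], direct_books["G"])
hier = jaccard(hierarchical_extension("B'", narrower, direct_books),
               direct_books["G"])
print(f"non-hierarchical: {flat:.2f}, hierarchical: {hier:.2f}")
```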


An empirical study of instance-based OM

  • We experimented with three dimensions

    • Similarity measure: Jaccard, Corrected Jaccard, Pointwise Mutual Information, Log Likelihood Ratio, Information Gain

    • Threshold: 0 or 10

    • Hierarchy: Yes or No

  • Why only 2 thresholds? Because of evaluation costs!


Evaluation: building a gold standard

Possible thesaurus relations (~ SKOS)

[Figure: GTT and Brinkman]


User Evaluation Statistics

  • 3 evaluators with 1500 evaluations

  • 90% agreement under ONLYEQ

  • If some evaluator says "equivalent", 73% of other evaluators say the same

  • Comparing two evaluators, correspondence in assignment is best for equivalence, followed by "No Link", "Narrower than", and "Broader than", all at or above 50% agreement; "Related To" has 35% agreement.

  • There are correlations between evaluators.

    • For example, Ev1 and Ev2 agreed with each other on "no link" much more often than either did with Ev3.
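Pairwise agreement figures of this kind can be computed as the share of mappings on which two evaluators picked the same relation. The sketch below is an assumption about how such a figure might be obtained, not a description of the study's procedure. The judgements are invented toy data, and the label `NOLINK` and the function name `agreement` are hypothetical.

```python
from itertools import combinations


def agreement(labels_a, labels_b):
    """Fraction of mappings on which two evaluators chose the same relation."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return same / len(labels_a)


# Invented toy judgements (not the study's data): one label per candidate mapping.
judgements = {
    "Ev1": ["EQ", "NOLINK", "BT", "RT", "EQ"],
    "Ev2": ["EQ", "NOLINK", "NT", "RT", "EQ"],
    "Ev3": ["EQ", "RT", "BT", "NOLINK", "EQ"],
}

for first, second in combinations(judgements, 2):
    score = agreement(judgements[first], judgements[second])
    print(f"{first} vs {second}: {score:.0%}")
```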


Evaluation Interpretation: What is a good mapping?

  • This is use-case specific. We considered:

    • ONLYEQ: only an Equivalent answer → correct

    • NOTREL: EQ, BT, NT → correct

    • ALL: EQ, BT, NT, RT → correct

  • The question is obviously: do they produce the same results?
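The three interpretations differ only in which evaluator answers count as correct, so they can be expressed as three sets of accepted relations. In the Python sketch below, the scheme names and relation codes come from the slide, while the `NOLINK` label, the function name `precision`, and the sample judgements are hypothetical.

```python
# Relations counted as correct under each interpretation.
CORRECT_RELATIONS = {
    "ONLYEQ": {"EQ"},
    "NOTREL": {"EQ", "BT", "NT"},
    "ALL": {"EQ", "BT", "NT", "RT"},
}


def precision(judgements, scheme):
    """Share of evaluated mappings that count as correct under `scheme`."""
    if not judgements:
        raise ValueError("no judgements to score")
    accepted = CORRECT_RELATIONS[scheme]
    return sum(label in accepted for label in judgements) / len(judgements)


# Invented sample of evaluator answers, one per candidate mapping.
sample = ["EQ", "BT", "RT", "NOLINK", "EQ", "NT"]

for scheme in CORRECT_RELATIONS:
    print(f"{scheme}: {precision(sample, scheme):.2f}")
```

Because each scheme accepts a superset of the previous one, precision can only stay equal or rise from ONLYEQ to NOTREL to ALL.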


Evaluation: validity of the (different) methods

Answer is: yes

All evaluations produce the same results (on different scales)


A remark about Evaluation

  • Use of mappings is strongly task-dependent

    • Scenario 1 (legacy data/annotation support) and Scenario 2 (thesaurus merging) require different mappings.

    • Our evaluation is useful (correct) for Scenario 2 (intensional)

    • Scenario 1 can be evaluated differently (e.g. cross-validation on test-data)

  • See our paper at the Cultural Heritage Workshop.


Experiments: Setup, Data and Thesauri

  • We calculated

    • 5 different similarity measures with

    • Threshold: 0 and 10

    • Hierarchy: yes or no.

    • Based on

      • 24,061 GTT concepts

      • 4,990 Brinkman concepts

      • 243,886 books with double annotations
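Assuming the three dimensions were fully crossed, the experimental grid can be enumerated as below. The sketch only lists configurations; the variable names are hypothetical and no measure is computed.

```python
from itertools import product

MEASURES = [
    "Jaccard",
    "Corrected Jaccard",
    "Pointwise Mutual Information",
    "Log Likelihood Ratio",
    "Information Gain",
]
THRESHOLDS = [0, 10]
USE_HIERARCHY = [True, False]

# Every combination of measure, threshold, and hierarchy setting.
configurations = list(product(MEASURES, THRESHOLDS, USE_HIERARCHY))

for measure, threshold, hierarchy in configurations:
    print(f"{measure:<30} threshold={threshold:<3} hierarchy={hierarchy}")
print(len(configurations), "configurations")
```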


Experiments: Result calculation

  • Average precision at similarity position i:

    • $P_i = N_{\mathrm{good},i} / N_i$

      (take the first i mappings, and return the percentage of correct ones)

      Example: a precision of 86 % at the 798th mapping means that of the first 798 mappings, 86% were correct

  • Recall is estimated based on lexical mappings

  • F-measure is calculated as usual

[Figure: precision curve on a scale up to 100 %, marking 86 % at the 798th mapping.]
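The precision at position i and the F-measure can be sketched as follows. The sketch assumes that mappings are already sorted by decreasing similarity and that "as usual" refers to the balanced F-measure (harmonic mean of precision and recall). The function names and the toy ranking are hypothetical.

```python
def precision_at(ranked_correct, i):
    """Share of correct mappings among the first i of a ranked list.

    `ranked_correct` holds one boolean per mapping, best-scored first.
    """
    if i <= 0 or i > len(ranked_correct):
        raise ValueError("i must lie between 1 and the number of mappings")
    return sum(ranked_correct[:i]) / i


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy ranking: True marks a mapping judged correct.
ranked = [True, True, False, True]

print(precision_at(ranked, 2))
print(precision_at(ranked, 4))
print(round(f_measure(0.8, 0.8), 3))
```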


Results: Three research questions

  • What is the influence of the choice of threshold?

  • What is the influence of hierarchical information?

  • What is the best measure and setting for instance-based mapping?


What is the influence of the choice of threshold?

Threshold needed for Jaccard

Threshold NOT needed for LLR


What is the influence of hierarchical information?

Results are inconclusive!


Best measure and setting for instance-based mapping?

We have two winners!

[Figure labels: threshold 10; the corrected Jaccard measures]


Conclusion

  • Summary

    • About 80% precision at estimated 80% recall

    • Simple measures perform better if a statistical correction is applied (a threshold or an explicit statistical correction)

    • Hierarchical aspects unresolved

    • Some measures are really unsuited

  • Future work:

    • Generalize results

      • Other use cases, web directories, …

    • Study other measures


Thank you.


Similarity measures Formulae

  • Jaccard: $Jacc(B, G) = |B \cap G| / |B \cup G|$

  • Corrected Jaccard: assign a smaller score to less frequently co-occurring annotations.


Information Theoretic Measures

  • Pointwise Mutual Information:

    • Measures the reduction of uncertainty that the annotation of one concept yields for the annotation with another concept.

    • -> disadvantage: inadequate for sparse data

  • LogLikelihoodRatio:

  • Information Gain:

    • Information gain is the difference in entropy

    • determines the attribute that distinguishes best between positive and negative examples
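Two of these measures can be sketched from simple book counts. The formulas on the slide did not survive in the transcript, so the Python code below uses the textbook definitions: PMI as log2 of the joint probability over the product of the marginals, and information gain as H(G) minus H(G | B) for binary annotation variables. Whether the authors used exactly these formulations is an assumption, and all names and counts are hypothetical.

```python
import math


def pmi(n_b, n_g, n_joint, n_total):
    """Pointwise mutual information of two annotations, in bits."""
    if n_joint == 0:
        return float("-inf")  # the concepts never co-occur
    return math.log2(n_joint * n_total / (n_b * n_g))


def entropy(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def information_gain(n_b, n_g, n_joint, n_total):
    """Entropy of 'annotated with G' minus its entropy once
    'annotated with B' is known."""
    p_g = n_g / n_total
    gain = entropy([p_g, 1 - p_g])
    # Books with B, then books without B: (group size, of which have G).
    for size, with_g in ((n_b, n_joint), (n_total - n_b, n_g - n_joint)):
        if size:
            p = with_g / size
            gain -= (size / n_total) * entropy([p, 1 - p])
    return gain


# Toy counts in a corpus of 100 books.
print(pmi(1, 1, 1, 100))     # one shared book
print(pmi(10, 10, 10, 100))  # ten shared books
print(information_gain(50, 50, 25, 100))  # independent annotations
```

In this toy corpus a pair of concepts sharing a single book gets a higher PMI than a pair sharing ten, which illustrates the sparse-data weakness noted above.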

