Evaluating Semantic Metadata without the Presence of a Gold Standard

Yuangui Lei, Andriy Nikolov, Victoria Uren, Enrico Motta

Knowledge Media Institute, The Open University

{y.lei,a.nikolov,v.s.uren,e.motta}@open.ac.uk

Focuses
  • A quality model which characterizes quality problems in semantic metadata
  • An automatic detection algorithm
  • Experiments
[Diagram: an ontology, expressed as RDF triples, and semantic metadata, also expressed as RDF triples, annotating the underlying data.]

Semantic Metadata Generation

[Diagram: an acquisition process produces semantic metadata, which is stored in semantic metadata repositories.]

A number of problems can arise during acquisition that decrease the quality of the metadata.

Quality Evaluation
  • Metadata providers: ensuring high quality
  • Users: assessing the trustworthiness of the data
  • Applications: filtering out poor-quality data
Our Quality Evaluation Framework
  • A quality model
  • Assessment metrics
  • An automatic evaluation algorithm
The Quality Model

[Diagram: the quality model. Ontologies model the real world; data sources describe and represent it; semantic metadata instantiates the ontologies and annotates the data sources.]

Quality Problems

[Diagram, built up over several slides: data objects from the sources are annotated with semantic entities (classes C1, C2, C3, instances I1 to I4, relations R1, R2), illustrating six kinds of quality problem (a small code rendering of this taxonomy follows the list):]

  • (a) Incomplete Annotation
  • (b) Duplicate Annotation
  • (c) Ambiguous Annotation
  • (d) Spurious Annotation
  • (e) Inaccurate Annotation
  • (f) Inconsistent Annotation
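To make the taxonomy easier to refer to later, here is a minimal sketch of how the six problem types might be represented in code. The names and comments are illustrative only, not taken from the authors' implementation:

```python
from dataclasses import dataclass
from enum import Enum, auto


class QualityProblem(Enum):
    """The six annotation quality problems of the quality model."""
    INCOMPLETE = auto()    # a data object is not annotated at all
    DUPLICATE = auto()     # the same thing is annotated by several equivalent entities
    AMBIGUOUS = auto()     # one data object is mapped to more than one semantic entity
    SPURIOUS = auto()      # the annotation does not correspond to a real entity of interest
    INACCURATE = auto()    # the entity is real but classified under the wrong class
    INCONSISTENT = auto()  # the annotation violates ontology constraints (e.g. disjointness)


@dataclass
class ProblemReport:
    """One detected problem, pointing at the offending metadata statement."""
    problem: QualityProblem
    subject: str       # URI of the annotated instance
    detail: str        # the statement or value that triggered the report
    explanation: str   # human-readable justification
```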

Current Support for Evaluation
  • Gold-standard based:
    • Examples: GATE [1], Learning Accuracy (LA) [2], the Balanced Distance Metric (BDM) [3]
  • Feature: these assess the performance of the information extraction techniques used
  • Not suitable for evaluating semantic metadata
    • Gold-standard annotations are often not available
The Semantic Metadata Acquisition Scenario

[Diagram: KMi news stories are processed by an information extraction engine (ESpotter), and departmental databases by a semantic data transformation engine; the resulting raw metadata is evaluated to yield high-quality metadata.]

  • Evaluation needs to take place dynamically, whenever a new entry is generated.
  • In this context, a gold standard is NOT available.
Our Approach
  • Use available knowledge instead of asking for gold-standard annotations
    • Domain-specific knowledge sources:
      • Domain ontologies, data repositories, domain-specific lexicons
    • Background knowledge:
      • The Semantic Web, the Web, and general lexical resources
  • Advantages:
    • Enables fully automatic operation
    • Enables large-scale data evaluation
Using Domain Knowledge
  • 1. Domain ontologies: constraints and restrictions are used to detect Inconsistent Annotations
    • Example: one person classified as both KMi-Member and None-KMi-Member when they are disjoint classes (see the sketch after this slide)
  • 2. Domain lexicons: lexicon-instance mappings are used to detect Duplicate Annotations
    • Example: OU and Open-University both appear as values of the same property of the same instance
  • 3. Domain data repositories: used to detect Ambiguous Annotations and Inaccurate Annotations
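The ontology-based check can be illustrated with a small sketch. This is not the authors' implementation (the architecture slide later shows Pellet plus Reiter's diagnosis doing the consistency checking); it is a minimal rdflib-based approximation that only catches direct violations of owl:disjointWith:

```python
from rdflib import Graph
from rdflib.namespace import OWL, RDF


def find_disjointness_violations(ontology: Graph, metadata: Graph):
    """Report instances typed with two classes that the ontology declares disjoint.

    A rough approximation of the inconsistency check: it ignores inferred types
    and all other OWL constraints, which a full reasoner such as Pellet handles.
    """
    violations = []
    for class_a, _, class_b in ontology.triples((None, OWL.disjointWith, None)):
        for instance in metadata.subjects(RDF.type, class_a):
            if (instance, RDF.type, class_b) in metadata:
                # e.g. a person asserted to be both KMi-Member and None-KMi-Member
                violations.append((instance, class_a, class_b))
    return violations
```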

When nothing can be found in the domain knowledge, the data can be:
    • Correct but outside the domain (e.g., IBM in the KMi domain)
    • An inaccurate annotation, i.e. a mis-classification (e.g., Sun Micro-systems as a person)
    • Spurious (e.g., a workshop chair as an organization)
  • Background knowledge is then used to investigate these cases further
Investigating the Semantic Web

[Flowchart: the entity is looked up on the Semantic Web via Watson. If no matches are found, the check falls back to examining the Web. If matches are found, the classes of the matched entities are compared with the asserted class using WordNet: if they are similar, the data is added to the repositories; if not, the entry is reported as an inaccurate annotation.]
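Rendered as code, this decision procedure might look like the sketch below. The two helpers are stand-ins for a Watson query and a WordNet similarity test; they are assumptions for illustration, not real API calls:

```python
from typing import Callable, Iterable


def check_against_semantic_web(entity_label: str,
                               asserted_class: str,
                               search_semantic_web: Callable[[str], Iterable[str]],
                               classes_are_similar: Callable[[str, str], bool]) -> str:
    """Check one unverified annotation, following the flowchart above.

    `search_semantic_web` stands in for a Watson lookup returning the classes of
    matching Semantic Web entities; `classes_are_similar` stands in for a
    WordNet-based similarity test between class labels.
    """
    matched_classes = list(search_semantic_web(entity_label))
    if not matched_classes:
        return "examine_the_web"        # nothing found: fall back to the Web step
    if any(classes_are_similar(c, asserted_class) for c in matched_classes):
        return "add_to_repositories"    # verified: keep it as domain knowledge
    return "inaccurate_annotation"      # the entity exists but its class disagrees
```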

Examining the Web

[Flowchart: the entity is checked on the Web using PANKOW. If no classification can be found, the entry is reported as a spurious annotation. If a classification is found, it is compared with the asserted class using WordNet: if they are not similar, the entry is reported as an inaccurate annotation.]
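The Web-based step differs in that failing to find any classification marks the annotation as spurious rather than passing it on. Again a hedged sketch, with hypothetical helpers standing in for PANKOW and WordNet:

```python
from typing import Callable, Optional


def check_against_web(entity_label: str,
                      asserted_class: str,
                      classify_with_pankow: Callable[[str], Optional[str]],
                      classes_are_similar: Callable[[str, str], bool]) -> str:
    """Web-based check for an annotation that domain and Semantic Web knowledge could not verify.

    `classify_with_pankow` stands in for a PANKOW-style classification of the label
    from Web evidence; `classes_are_similar` stands in for a WordNet similarity test.
    """
    web_class = classify_with_pankow(entity_label)
    if web_class is None:
        return "spurious_annotation"      # no classification at all on the Web
    if classes_are_similar(web_class, asserted_class):
        return "ok"                       # the Web agrees with the asserted class
    return "inaccurate_annotation"        # the Web suggests a different class
```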

The Overall Picture

[Architecture diagram: semantic metadata flows into the evaluation engine, which produces evaluation results. Step 1 uses domain knowledge (ontologies and domain repositories) through SemSearch and Pellet + Reiter; Step 2 uses background knowledge: the Semantic Web via WATSON, the Web via PANKOW, and WordNet as a lexical resource.]
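Tying the pieces together, the two-step flow reads roughly as follows. The three check functions are the illustrative placeholders sketched earlier (standing in for SemSearch/Pellet + Reiter, WATSON and PANKOW respectively), not components of the actual system:

```python
from typing import Callable, Iterable, List, Tuple


def evaluate(entries: Iterable,
             step1_domain_check: Callable,
             step2_semantic_web_check: Callable,
             step2_web_check: Callable) -> List[Tuple]:
    """Run the two-step evaluation over a batch of metadata entries."""
    reports = []
    for entry in entries:
        verdict = step1_domain_check(entry)            # ontologies, lexicons, repositories
        if verdict == "unknown":                       # nothing found in domain knowledge
            verdict = step2_semantic_web_check(entry)  # Watson + WordNet
            if verdict == "examine_the_web":
                verdict = step2_web_check(entry)       # PANKOW + WordNet
        reports.append((entry, verdict))
    return reports
```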

Addressed Quality Problems

[The quality-problem diagram from earlier, repeated to indicate which of the six problems the approach addresses.]

Experiments
  • Data setting: gathered in our previous work [4] on the KMi semantic web portal
    • Randomly chose 36 news stories from the KMi news archive
    • Collected a metadata set using ASDI
    • Constructed a gold-standard annotation
  • Method (a simple scoring sketch follows this slide):
    • A gold-standard based evaluation as a comparison baseline
    • Evaluating the data set using domain knowledge only
    • Evaluating the data set using both domain knowledge and background knowledge
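The comparison against the gold standard presumably comes down to precision and recall of the detected problems; a generic sketch of that scoring, not the authors' exact metric:

```python
def precision_recall(detected: set, gold: set) -> tuple:
    """Precision and recall of detected problem reports against gold-standard labels."""
    true_positives = len(detected & gold)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Example: if 8 of 10 reported problems are genuine and the gold standard lists 12,
# precision_recall(detected, gold) gives precision 0.8 and recall 8/12 ≈ 0.67.
```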
Discussion
  • The performance of the approach largely depends on:
    • The availability of a good domain-specific knowledge source
    • The entities in the data set being well publicised (on the Web and the Semantic Web); otherwise there would be many false alarms
References
  • H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL02), 2002.
  • P. Cimiano, S. Staab, and J. Tane. Acquisition of Taxonomies from Text: FCA meets NLP. In Proceedings of the ECML/PKDD Workshop on Adaptive Text Extraction and Mining, pages 10–17, 2003.
  • D. Maynard, W. Peters, and Y. Li. Metrics for Evaluation of Ontology-based Information Extraction. In Proceedings of the 4th International Workshop on Evaluation of Ontologies on the Web, Edinburgh, UK, May 2006.
  • Y. Lei, M. Sabou, V. Lopez, J. Zhu, V. S. Uren, and E. Motta. An Infrastructure for Acquiring High Quality Semantic Metadata. In Proceedings of the 3rd European Semantic Web Conference, 2006.