Performing Object Consolidation on the Semantic Web Data Graph

Aidan Hogan

Andreas Harth

Stefan Decker


Introduction

  • Aim: to merge equivalent RDF instances in large-scale RDF datasets, i.e., to perform object consolidation

  • Background:

  • RDF (Resource Description Framework) is the data model used in Semantic Web technologies

  • It is well suited to entity-centric applications in which structured descriptions of entities are provided in RDF (e.g., SWSE, the Semantic Web Search Engine); anything can be described in RDF

  • URIs are used as identifiers for entities

  • Ideally, URIs are used consistently across data sources, so that information about an entity can be collected and merged from different sources


Motivation

  • Problem:

  • URIs are often not agreed upon (or not provided at all) for entities across sources, especially for real-world entities (e.g., there is no agreed-upon URI for a given person). As a result, one entity may be split across many instances.

  • Entity-centric applications will treat these multiple instances as multiple entities, which is problematic! Example later…


Towards a Solution

  • Towards a solution:

  • RDF data is backed by ontologies, in which certain properties may be declared inverse functional (owl:InverseFunctionalProperty)

  • Inverse functional properties have values unique to an entity (e.g., chat usernames are unique to people, ISBN codes are unique to books).

  • Therefore, if two instances have the same value for the same inverse functional property, they are equivalent and can be merged.


Example

  • Three sources provide data on one person, each using a different identifier

  • Two different Inverse Functional Properties:

    • foaf:mbox referring to a person’s email address

    • foaf:homepage referring to a person’s homepage
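To make the example concrete, here is a minimal Python sketch with invented data. The slide's actual identifiers and values are not in the transcript. It uses three identifiers for one person: two of them share a foaf:mbox value, and two share a foaf:homepage value. `direct_equivalences` is a hypothetical helper that applies the inverse-functional rule pairwise.

```python
from collections import defaultdict
from itertools import combinations
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

FOAF = "http://xmlns.com/foaf/0.1/"
IFPS = {FOAF + "mbox", FOAF + "homepage"}

# Hypothetical data: three sources, three identifiers, one person.
triples: List[Triple] = [
    ("_:src1person", FOAF + "mbox", "mailto:person@example.org"),
    ("_:src2person", FOAF + "mbox", "mailto:person@example.org"),
    ("_:src2person", FOAF + "homepage", "http://example.org/~person"),
    ("http://example.org/person#me", FOAF + "homepage", "http://example.org/~person"),
]


def direct_equivalences(triples: List[Triple], ifps: Set[str]) -> Set[Tuple[str, str]]:
    """Pairs of subjects that share a value for the same inverse functional property."""
    by_value = defaultdict(set)  # (property, value) -> subjects with that value
    for s, p, o in triples:
        if p in ifps:
            by_value[(p, o)].add(s)
    pairs: Set[Tuple[str, str]] = set()
    for subjects in by_value.values():
        pairs.update(combinations(sorted(subjects), 2))
    return pairs


print(sorted(direct_equivalences(triples, IFPS)))
```

No single shared value links `_:src1person` directly to `http://example.org/person#me`. The two are joined only through `_:src2person`, which is why equivalence has to be closed transitively.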


Benefit

  • Before consolidation: three instances for one entity. For example, an entity-centric search engine would return three results for the one person.

  • After consolidation: one instance for one entity.


Our Dataset

  • We want to perform object consolidation on the entire RDF Semantic Web data graph…

  • 470M statements from multiple schemas, describing 72M instances from over 3M data sources

  • 84% of instances have no URI identifier (i.e., they are identified only by blank nodes)

  • The majority of the data is FOAF (Friend of a Friend) descriptions of people (78%), 99.9% of which have no URI identifier

  • => We need a scalable algorithm for performing object consolidation


Step 1

  • Identify the inverse functional properties in the dataset

  • Inverse functional properties are defined in ontologies

  • So we need to retrieve the ontologies describing the properties used in the dataset

  • We can dereference the property URIs to find the pertinent ontologies

  • Examples of inverse functional properties found:

    • foaf:mbox (email property), foaf:homepage, foaf:weblog, foaf:aimChatID and other chat ID properties, doap:homepage
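Once the ontologies have been dereferenced and parsed, finding the inverse functional properties amounts to selecting the properties typed as `owl:InverseFunctionalProperty`. The sketch below is minimal and makes two assumptions: the ontology triples are already available as (subject, predicate, object) string tuples, and only explicit `rdf:type` declarations count, with no further reasoning. The ontology excerpt is hand-written for illustration and is not the full FOAF ontology.

```python
from typing import Iterable, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
OWL_IFP = "http://www.w3.org/2002/07/owl#InverseFunctionalProperty"
FOAF = "http://xmlns.com/foaf/0.1/"

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def inverse_functional_properties(ontology: Iterable[Triple]) -> Set[str]:
    """Return every property explicitly declared as owl:InverseFunctionalProperty."""
    return {s for s, p, o in ontology if p == RDF_TYPE and o == OWL_IFP}


# Hand-written excerpt for illustration only.
ontology = [
    (FOAF + "mbox", RDF_TYPE, OWL_IFP),
    (FOAF + "homepage", RDF_TYPE, OWL_IFP),
    (FOAF + "name", RDF_TYPE, RDF_PROPERTY),
]

print(sorted(inverse_functional_properties(ontology)))
```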


Step 2

  • Re-order the data on disk

  • Initially, the data is in N-Quads format, unsorted, in SPOC order

  • Subject = identifier of entity being described

  • Predicate = property of entity being described

  • Object = value of property

  • Context = data source of the SPO triple

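  • The data is re-ordered to POCS order…

    • e.g., the quad (S, P, O, C) = (http://andreasharth.org#me, foaf:name, "Andreas Harth", http://andreasharth.org/foaf.rdf) becomes (P, O, C, S) = (foaf:name, "Andreas Harth", http://andreasharth.org/foaf.rdf, http://andreasharth.org#me)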

  • …and sorted, so that the data is grouped first by predicate and then by object.
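The re-ordering and sorting can be sketched in a few lines of Python. This is an in-memory illustration only. It assumes quads are (subject, predicate, object, context) string tuples with abbreviated prefixes. At the scale of this dataset the sort would need to be an external (on-disk) sort, which is not shown here.

```python
from typing import List, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, context)


def to_sorted_pocs(spoc_quads: List[Quad]) -> List[Quad]:
    """Reorder each SPOC quad to POCS and sort, so equal (P, O) pairs become adjacent."""
    return sorted((p, o, c, s) for s, p, o, c in spoc_quads)


quads = [
    ("http://andreasharth.org#me", "foaf:name", '"Andreas Harth"', "http://andreasharth.org/foaf.rdf"),
    ("_:b2", "foaf:mbox", "mailto:x@example.org", "http://example.org/b.rdf"),
    ("_:b1", "foaf:mbox", "mailto:x@example.org", "http://example.org/a.rdf"),
]

for quad in to_sorted_pocs(quads):
    print(quad)
```

After sorting, the two foaf:mbox statements with the same value sit next to each other. This adjacency is what lets the next step find equivalent instances in a single linear scan.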


Step 3

  • Scan data for equivalent instances

  • Scan the sorted POCS data looking for equivalent instances

  • If the predicate is an inverse functional property and two statements share the same object value, their subjects are equivalent identifiers describing the same entity

  • Equivalence is transitive, so a “same-as table” is used to store equivalences and compute their transitive closure:

    • each row of the table contains equivalent identifiers

    • no identifier can appear in more than one row
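Below is a minimal sketch of the scan, assuming sorted POCS tuples as produced in Step 2. As a stand-in for the same-as table it uses a union-find (disjoint-set) structure. This is a substituted technique, not necessarily the authors' data structure. It keeps the same invariant (each identifier belongs to exactly one row) and computes the transitive closure incrementally. `SameAsTable` and `scan_for_equivalences` are hypothetical names.

```python
from itertools import groupby
from typing import Dict, Iterable, List, Set, Tuple

PocsQuad = Tuple[str, str, str, str]  # (predicate, object, context, subject)


class SameAsTable:
    """Union-find over identifiers: every identifier belongs to exactly one row."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def rows(self) -> List[Set[str]]:
        """Rows containing at least two equivalent identifiers."""
        groups: Dict[str, Set[str]] = {}
        for x in list(self.parent):
            groups.setdefault(self.find(x), set()).add(x)
        return [row for row in groups.values() if len(row) > 1]


def scan_for_equivalences(sorted_pocs: Iterable[PocsQuad], ifps: Set[str]) -> SameAsTable:
    """Single pass: subjects sharing an (IFP, value) pair end up in the same row."""
    table = SameAsTable()
    for (p, _o), group in groupby(sorted_pocs, key=lambda q: (q[0], q[1])):
        if p not in ifps:
            continue
        subjects = [q[3] for q in group]
        for s in subjects[1:]:
            table.union(subjects[0], s)
    return table


data = [
    ("foaf:homepage", "http://example.org/~p", "ctx2", "_:b2"),
    ("foaf:homepage", "http://example.org/~p", "ctx3", "http://example.org/p#me"),
    ("foaf:mbox", "mailto:p@example.org", "ctx1", "_:b1"),
    ("foaf:mbox", "mailto:p@example.org", "ctx2", "_:b2"),
    ("foaf:name", '"P"', "ctx1", "_:b1"),
    ("foaf:name", '"P"', "ctx4", "_:b9"),
]
table = scan_for_equivalences(data, {"foaf:mbox", "foaf:homepage"})
print(table.rows())
```

In this toy data, `_:b9` shares a foaf:name value with `_:b1` but is not merged, because foaf:name is not inverse functional.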


Step 4

  • Pick identifiers

  • Now we have lists of equivalent instance identifiers… we need to pick one identifier from each list to use for the consolidated instance

  • We…

    • prefer URIs over blank nodes

    • then prefer more commonly used identifiers

  • Another scan of the data is performed to count the number of statements each identifier appears in (for identifiers in the same-as table)

  • The chosen identifiers are called pivot identifiers
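The selection rule can be sketched as a sort key: URIs rank above blank nodes, and within each kind, identifiers that appear in more statements rank higher. The sketch makes three assumptions. Blank nodes are written with the N-Triples `_:` prefix. An identifier is counted at most once per statement, whether it appears as subject or object. Remaining ties are broken lexicographically for determinism; the slide does not state this tie-break. All function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, context)


def is_blank(identifier: str) -> bool:
    # Assumption: blank nodes use the N-Triples "_:" prefix.
    return identifier.startswith("_:")


def count_occurrences(quads: Iterable[Quad], ids: Set[str]) -> Counter:
    """Count the statements each same-as identifier appears in (subject or object)."""
    counts: Counter = Counter()
    for s, _p, o, _c in quads:
        for term in {s, o} & ids:
            counts[term] += 1
    return counts


def pick_pivot(row: Set[str], counts: Counter) -> str:
    """URIs before blank nodes, then the most used; ties go to the smallest string."""
    return max(sorted(row), key=lambda i: (not is_blank(i), counts[i]))


quads = [
    ("_:b1", "foaf:mbox", "mailto:p@example.org", "ctx1"),
    ("_:b1", "foaf:name", '"P"', "ctx1"),
    ("_:b2", "foaf:mbox", "mailto:p@example.org", "ctx2"),
    ("_:b2", "foaf:homepage", "http://example.org/~p", "ctx2"),
    ("_:b2", "foaf:knows", "_:b1", "ctx2"),
    ("http://example.org/p#me", "foaf:homepage", "http://example.org/~p", "ctx3"),
]
row = {"_:b1", "_:b2", "http://example.org/p#me"}
counts = count_occurrences(quads, row)
print(pick_pivot(row, counts))
```

The URI wins here even though it appears in fewer statements than either blank node, because the URI-first rule is applied before the usage count.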


Step 5

  • Rewrite identifiers

  • The data is scanned and identifiers in subject and object positions are rewritten to their pivot identifiers

  • …one iteration is complete

  • More than one iteration may be required: if a value of an inverse functional property is rewritten in one iteration, further equivalences may be found in the next
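Here is a compact, self-contained sketch of the rewrite step wrapped in an iteration loop, assuming in-memory SPOC tuples. `find_pivots` is a hypothetical helper that folds the equivalence scan and pivot choice into one function. It groups by (property, value) in a dictionary rather than performing a sorted scan. `consolidate` repeats scan-and-rewrite until no new equivalences appear. The property `ex:primaryAccountOf` is invented so that an inverse functional property has identifiers as its values, which is the situation in which a second iteration is needed.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, context)


def find_pivots(quads: List[Quad], ifps: Set[str]) -> Dict[str, str]:
    """Map each non-pivot identifier to its pivot (URIs first, then most used)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Subjects sharing a value for the same inverse functional property.
    by_value: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for s, p, o, _c in quads:
        if p in ifps:
            by_value[(p, o)].append(s)
    for subjects in by_value.values():
        for s in subjects[1:]:
            parent[find(s)] = find(subjects[0])

    groups: Dict[str, Set[str]] = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)

    # Number of statements each identifier appears in (subject or object).
    counts = Counter(t for s, _p, o, _c in quads for t in {s, o})
    pivots: Dict[str, str] = {}
    for members in groups.values():
        if len(members) < 2:
            continue
        chosen = max(sorted(members), key=lambda i: (not i.startswith("_:"), counts[i]))
        for m in members:
            if m != chosen:
                pivots[m] = chosen
    return pivots


def rewrite(quads: List[Quad], pivots: Dict[str, str]) -> List[Quad]:
    """Rewrite identifiers in subject and object position to their pivots."""
    return [(pivots.get(s, s), p, pivots.get(o, o), c) for s, p, o, c in quads]


def consolidate(quads: List[Quad], ifps: Set[str], max_iterations: int = 10) -> Tuple[List[Quad], int]:
    """Repeat scan-and-rewrite until no new equivalences are found."""
    for iteration in range(max_iterations):
        pivots = find_pivots(quads, ifps)
        if not pivots:
            return quads, iteration  # number of rewriting iterations performed
        quads = rewrite(quads, pivots)
    return quads, max_iterations


quads = [
    ("_:a", "foaf:mbox", "mailto:p@example.org", "ctx1"),
    ("_:b", "foaf:mbox", "mailto:p@example.org", "ctx2"),
    ("_:acc1", "ex:primaryAccountOf", "_:a", "ctx1"),
    ("_:acc2", "ex:primaryAccountOf", "_:b", "ctx2"),
]
result, iterations = consolidate(quads, {"foaf:mbox", "ex:primaryAccountOf"})
print(iterations)
```

In the first iteration, the shared foaf:mbox merges `_:b` into `_:a`. Only after `_:b` has been rewritten do `_:acc1` and `_:acc2` share an `ex:primaryAccountOf` value, so a second iteration merges them. Duplicate statements produced by rewriting are kept here; removing them is a separate concern.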


Evaluation

  • We encountered issues when applying the algorithm to the 480M dataset

  • foaf:weblog is defined as inverse functional, but is given values that are communal or shared weblogs (not unique to a person)

    • we removed foaf:weblog from the list of inverse functional properties

  • Many people give common arbitrary values for properties such as chat IDs, e.g., “ask”, “none”

    • we defined a blacklist for such values
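Both fixes can be expressed as a filter applied before the scan for equivalences. The sketch below is minimal. `EXCLUDED_IFPS` and `VALUE_BLACKLIST` are hypothetical names, and the blacklist holds only the two example values from the slide. The case-insensitive comparison on plain (unquoted) literal values is an assumption.

```python
from typing import Iterable, Iterator, Set, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, context)

FOAF = "http://xmlns.com/foaf/0.1/"

# Declared inverse functional, but used for shared weblogs in practice.
EXCLUDED_IFPS = {FOAF + "weblog"}
# Placeholder values that must not trigger merges (examples from the slide).
VALUE_BLACKLIST = {"ask", "none"}


def usable_ifp_quads(quads: Iterable[Quad], ifps: Set[str]) -> Iterator[Quad]:
    """Yield only statements whose property and value may be used for consolidation."""
    active = ifps - EXCLUDED_IFPS
    for s, p, o, c in quads:
        # Assumption: values are plain strings, compared case-insensitively.
        if p in active and o.strip().lower() not in VALUE_BLACKLIST:
            yield (s, p, o, c)


quads = [
    ("_:a", FOAF + "aimChatID", "none", "ctx1"),
    ("_:b", FOAF + "aimChatID", "none", "ctx2"),
    ("_:c", FOAF + "weblog", "http://example.org/shared-blog", "ctx3"),
    ("_:d", FOAF + "mbox", "mailto:d@example.org", "ctx4"),
]
ifps = {FOAF + "aimChatID", FOAF + "weblog", FOAF + "mbox"}
print(list(usable_ifp_quads(quads, ifps)))
```

Without the filter, `_:a` and `_:b` would be merged simply because both give "none" as a chat ID.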


Evaluation

  • 2,443,939 instances were consolidated into 401,385

  • Only 1 iteration was required

  • The following table shows the number of atomic equivalences found through the main inverse functional properties


Thanks!

