Performing Object Consolidation on the Semantic Web Data Graph

Aidan Hogan

Andreas Harth

Stefan Decker

Introduction

  • Aim: To merge equivalent RDF instances for large scale RDF datasets; a.k.a. perform object consolidation

  • Background:

  • RDF (Resource Description Framework) is the data model used in Semantic Web technologies

  • Ideal for entity centric applications where structured descriptions of entities are provided in RDF (e.g. SWSE); anything can be described in RDF

  • URIs are used as identifiers for entities

  • Ideally, URIs are used consistently across data sources to describe entities; information on entities can be collected and merged from different sources

Motivation

  • Problem:

  • URIs are often not agreed upon (or not provided) for entities across sources, especially for real-world entities (e.g. there is no agreed URI for a person). As a result, a single entity may be split across many instances.

  • Entity centric applications will see multiple instances as multiple entities – problematic! Example later…

Towards a Solution

  • Towards a solution:

  • RDF data is backed by ontologies, in which certain properties may be declared as being Inverse Functional

  • Inverse Functional Properties have values unique to an entity (e.g., chat usernames unique to people, ISBN code unique to books, etc.).

  • Therefore, if two instances have the same value for the same Inverse Functional Property, they are equivalent and can be merged.

Example

  • Three sources provide data on one person – different identifiers used

  • Two different Inverse Functional Properties:

    • foaf:mbox referring to a person’s email

    • foaf:homepage referring to a person’s homepage

Benefit

  • Before consolidation: three instances, one entity. For example, an entity centric search engine would return three results for the one person.

  • After consolidation: one instance, one entity.

Our Dataset

  • Want to perform object consolidation on entire RDF Semantic Web data graph…

  • 470M statements from multiple schemas describing 72M instances from over 3M data sources

  • 84% of instances have no URI identifier

  • The majority of the data is FOAF (Friend of a Friend) descriptions of people (78%), 99.9% of which have no identifiers

  • => We need a scalable algorithm for performing object consolidation

Step 1

  • Need to identify Inverse Functional Properties in dataset

  • Inverse functional properties are defined in ontologies

  • Need to retrieve ontologies describing properties in the dataset

  • Can dereference the property URIs to find the pertinent ontologies

  • Examples of inverse functional properties found were

    • foaf:mbox (email property), foaf:homepage, foaf:weblog, foaf:aimChatID and other chat ID properties, doap:homepage
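Step 1 can be sketched as follows. This is a minimal illustration, assuming the ontologies have already been retrieved and parsed into (subject, predicate, object) tuples; the function name and the sample statements are hypothetical and are not taken from the presentation. The dereferencing and parsing steps are omitted.

```python
# Standard RDF/OWL terms used to declare an inverse functional property.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_IFP = "http://www.w3.org/2002/07/owl#InverseFunctionalProperty"
FOAF = "http://xmlns.com/foaf/0.1/"


def inverse_functional_properties(ontology_triples):
    """Return the property URIs typed as owl:InverseFunctionalProperty."""
    return {s for (s, p, o) in ontology_triples
            if p == RDF_TYPE and o == OWL_IFP}


# Hypothetical ontology statements, as they might look after
# dereferencing the property URIs found in the dataset.
ontology_triples = [
    (FOAF + "mbox", RDF_TYPE, OWL_IFP),
    (FOAF + "homepage", RDF_TYPE, OWL_IFP),
    (FOAF + "name", RDF_TYPE, "http://www.w3.org/2002/07/owl#DatatypeProperty"),
]

ifps = inverse_functional_properties(ontology_triples)
```

Here `ifps` contains only the two properties declared inverse functional; `foaf:name` is left out because it carries no such declaration.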

Step 2

  • Need to re-order data on-disk

  • initially, the data is in N-Quads format, in unsorted SPOC order

  • Subject = identifier of entity being described

  • Predicate = property of entity being described

  • Object = value of property

  • Context = data-source of SPO triple

  • data is re-ordered to POCS order and sorted. Now data is grouped by same predicates and then objects.
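The re-ordering can be sketched in memory as below. The slides describe an on-disk sort, which at this scale would need an external sort; this sketch only shows the ordering itself. The quads, source names, and abbreviated identifiers (e.g. `foaf:mbox`) are made up for illustration.

```python
def to_pocs(quad):
    """Re-order one quad from (S, P, O, C) to (P, O, C, S)."""
    s, p, o, c = quad
    return (p, o, c, s)


# Hypothetical quads in unsorted SPOC order.
quads_spoc = [
    ("_:b1", "foaf:mbox", "mailto:a@example.org", "http://src1.example/"),
    ("ex:alice", "foaf:name", "Alice", "http://src2.example/"),
    ("ex:alice", "foaf:mbox", "mailto:a@example.org", "http://src2.example/"),
]

# Tuples sort lexicographically, so sorting POCS tuples groups
# statements by predicate first and then by object.
quads_pocs = sorted(to_pocs(q) for q in quads_spoc)
```

After sorting, the two `foaf:mbox` statements with the same email value sit next to each other, which is what makes the single-pass scan in the next step possible.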

Step 3

  • Scan data for equivalent instances

  • scan sorted POCS data looking for equivalent instances

  • if two statements share a predicate that is an inverse functional property and have identical object values, the instances identified by their subjects are equivalent and describe the same entity

  • equivalence is transitive and so a “same-as table” is used to store and perform transitive closure.

    • each row of the table contains equivalent identifiers

    • no identifier can appear in more than one row
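The scan and the same-as table can be sketched as below. The slides do not say how the table is implemented; this sketch substitutes a union-find (disjoint-set) structure, which gives the same two guarantees: each row holds equivalent identifiers, and no identifier appears in more than one row. The class name, function name, and sample data are hypothetical.

```python
from itertools import groupby


class SameAs:
    """Union-find stand-in for the same-as table (an assumption, see above)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def rows(self):
        """Return rows of equivalent identifiers (singletons omitted)."""
        groups = {}
        for x in self.parent:
            groups.setdefault(self.find(x), set()).add(x)
        return [g for g in groups.values() if len(g) > 1]


def find_equivalences(quads_pocs, ifps):
    """Scan sorted POCS quads; merge subjects sharing an IFP value."""
    same_as = SameAs()
    for (p, _o), group in groupby(quads_pocs, key=lambda q: (q[0], q[1])):
        if p not in ifps:
            continue
        subjects = [q[3] for q in group]
        for other in subjects[1:]:
            same_as.union(subjects[0], other)
    return same_as


# Hypothetical sorted POCS data: three identifiers for one person.
quads_pocs = sorted([
    ("foaf:homepage", "http://example.org/~a", "src2", "ex:alice"),
    ("foaf:homepage", "http://example.org/~a", "src3", "_:b2"),
    ("foaf:mbox", "mailto:a@example.org", "src1", "_:b1"),
    ("foaf:mbox", "mailto:a@example.org", "src2", "ex:alice"),
    ("foaf:name", "Alice", "src1", "_:b1"),
    ("foaf:name", "Alice", "src4", "ex:other"),
])
same_as = find_equivalences(quads_pocs, {"foaf:mbox", "foaf:homepage"})
rows = same_as.rows()
```

In this example `_:b1` and `_:b2` never share a value directly, but both are equivalent to `ex:alice`, so transitivity places all three in one row. `ex:other` shares only a `foaf:name` value, which is not inverse functional, so it is not merged.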

Step 4

  • Pick identifiers

  • Now we have a list of equivalent instance identifiers… we need to pick one and use it for consolidated instance

  • We…

    • Pick URIs before blank nodes

    • Among the remaining candidates, pick the more commonly used identifier

  • Another scan of data is performed to count the number of statements identifiers appear in (if they appear in same-as list).

  • The chosen identifiers are called pivot identifiers
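The pivot selection can be sketched as below. Several details are assumptions made for illustration and are not stated in the slides: blank nodes are recognised by a `_:` prefix, an identifier is counted once per statement whether it appears as subject or object, and remaining ties are broken alphabetically. The function names and data are hypothetical.

```python
from collections import Counter


def is_blank_node(identifier):
    # Assumption: blank nodes are written with a "_:" prefix.
    return identifier.startswith("_:")


def count_statements(quads_spoc, identifiers):
    """Count the statements in which each same-as identifier appears."""
    counts = Counter()
    for s, _p, o, _c in quads_spoc:
        for term in {s, o}:  # set: count a statement once per identifier
            if term in identifiers:
                counts[term] += 1
    return counts


def pick_pivot(row, counts):
    """Prefer URIs over blank nodes, then the more commonly used identifier."""
    return min(row, key=lambda i: (is_blank_node(i), -counts.get(i, 0), i))


# Hypothetical data.
quads_spoc = [
    ("ex:alice", "foaf:knows", "ex:bob", "src1"),
    ("ex:alice", "foaf:name", "Alice", "src1"),
    ("_:b1", "foaf:name", "Alice", "src2"),
    ("_:b1", "foaf:nick", "ally", "src2"),
    ("_:b1", "foaf:mbox", "mailto:a@example.org", "src2"),
]
row = {"ex:alice", "_:b1"}
counts = count_statements(quads_spoc, row)
pivot = pick_pivot(row, counts)
```

Here `_:b1` appears in more statements than `ex:alice`, but the URI still wins because the URI-before-blank-node rule is applied first.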

Step 5

  • Rewrite identifiers

  • Data is scanned and identifiers in subject and object position are rewritten to pivot identifiers

  • …one iteration complete

  • It’s possible that more than one iteration may be required… If a value of an inverse functional property is changed in one iteration, more equivalences may be found by another iteration
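The rewriting pass can be sketched as below, assuming a mapping from each non-pivot identifier to its pivot has been built from the same-as table. The mapping `pivot_of`, the function name, and the quads are hypothetical.

```python
def rewrite(quads_spoc, pivot_of):
    """Replace identifiers in subject and object position with their pivot."""
    return [
        (pivot_of.get(s, s), p, pivot_of.get(o, o), c)
        for s, p, o, c in quads_spoc
    ]


# Hypothetical mapping from non-pivot identifiers to the chosen pivot.
pivot_of = {"_:b1": "ex:alice", "_:b2": "ex:alice"}

quads_spoc = [
    ("_:b1", "foaf:name", "Alice", "src1"),
    ("ex:bob", "foaf:knows", "_:b2", "src3"),
]
consolidated = rewrite(quads_spoc, pivot_of)
```

Predicates and contexts are left untouched. Because object values can change in this pass, a rewritten object may now match another value of an inverse functional property, which is why a further iteration of the scan can uncover new equivalences.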

Evaluation

  • We encountered issues when applying the algorithm to the 480M dataset

  • foaf:weblog is defined as inverse functional, but is often given values which are communal or shared weblogs (not unique to a person)

    • we removed foaf:weblog from list of inverse functional properties

  • many people define common arbitrary values for properties such as chat IDs; e.g., ask, none

    • we define a black-list for such values
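Both fixes amount to a filter applied before two statements are allowed to trigger a merge. A minimal sketch follows; the black-listed values `ask` and `none` come from the slide, while the normalisation (trimming and lower-casing), the function name, and the abbreviated property names are assumptions.

```python
# Values named on the slide as common placeholder entries.
BLACKLISTED_VALUES = {"ask", "none"}

# Property dropped from the inverse functional list after evaluation.
EXCLUDED_PROPERTIES = {"foaf:weblog"}


def usable_for_consolidation(predicate, obj, ifps):
    """Return True if this predicate/value pair may be used to merge instances."""
    if predicate not in ifps or predicate in EXCLUDED_PROPERTIES:
        return False
    # Assumption: compare values case-insensitively, ignoring outer whitespace.
    return obj.strip().lower() not in BLACKLISTED_VALUES


ifps = {"foaf:mbox", "foaf:homepage", "foaf:weblog", "foaf:aimChatID"}

checks = [
    usable_for_consolidation("foaf:aimChatID", "alice_aim", ifps),
    usable_for_consolidation("foaf:aimChatID", "none", ifps),
    usable_for_consolidation("foaf:weblog", "http://blog.example/", ifps),
]
```

Without such a filter, every person who entered `none` as a chat ID would be merged into a single instance.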

Evaluation

  • 2,443,939 instances consolidated to 401,385

  • 1 iteration required

  • The following table shows the number of atomic equivalences found through the main inverse functional properties

Thanks!