
# Linked Data Integration (using reasoning)

Linked Data Integration (using reasoning). Aidan Hogan. Day 3 Session 2. What is reasoning?. Reasoning: Conceptual Overview. (Loosely) Deriving novel conclusions from existing knowledge Deductive reasoning : inferring new facts from existing rules and facts


Aidan Hogan

Day 3

Session 2

Reasoning: Conceptual Overview

(Loosely) Deriving novel conclusions from existing knowledge

Deductive reasoning: inferring new facts from existing rules and facts

Given rule: All Kia cars are made in Korea;

Given premise (fact): Fred’s car is a Kia

Entails fact: Fred’s car was made in Korea

Inductive reasoning: learning new rules from existing facts and entailments (typically what we humans do: build imprecise rules from models)

Given model of existing facts: All Kia cars I’ve seen have four wheels

~Entails rule: All Kias have four wheels

Given fact: Fred’s car is a Kia

Entails (probabilistic fact): Fred’s car likely has four wheels

Abductive reasoning: guess a premise from a conclusion (similar in principle to a form of inductive reasoning)

Given entailment: Fred’s car is Korean and has four wheels?

Rule: Kias and Hyundais are Korean and typically have four wheels

Guessed premise: Fred’s car is a Kia or a Hyundai?

Reasoning: Clearing up terms

• Semantics: formally defined meanings of terms

• KiaCar ⊑ KoreanCar ⊓ FourWheels

• Entailments: the conclusions which follow from formal semantics

• Inference: a procedure to compute entailments

• (or the entailments that can be computed therefrom)

• Include Fred as an answer to “give me friends of mine who own Korean cars”

• Subsumption checking: identify subclass relationships

• Is the class KiaCar a subset of KoreanCar if all Kias are manufactured in Seoul?

• Class-Satisfiability checking: identify if a class can have a membership

• Can there be something which is a KoreanCar and a EuropeanCar?

• Consistency checking: identify formally conflicting information

• Fred tells me his Kia is European; is this correct?

• Instance checking: identify if an individual is a member of a given class

• Is Fred’s Kia an Asian car?

Reasoning: RDFS and OWL (deductive)

• Formal semantics of RDFS and OWL can be leveraged for reasoning.

• :KiaCar rdfs:subClassOf :KoreanCar ,

• [ owl:hasValue :Seoul ; owl:onProperty :manufacturedIn ] .

• :FredsCar a :KiaCar .

• Implies

• :FredsCar a :KoreanCar ; :manufacturedIn :Seoul .

Reasoning: OWL (2)

• Eight sub-languages of OWL!

• Why eight?(^^)

• Direct Semantics (based on Description Logics):

• OWL DL (NExpTime for non-QA tasks), OWL Lite (ExpTime for non-QA)

• OWL 2 EL, OWL 2 QL, OWL 2 RL (All PTime for most tasks for non-QA)

• OWL 2 DL (2NExpTime for non-QA)

• Emphasis on soundness and completeness

• Tableaux-based algorithms

• Based on KB-satisfiability checking

• Syntactic restrictions to preserve complexity

• e.g., no datatype inverse-functional properties

• RDF-Based Semantics (layered directly on top of RDFS)

• OWL Full, OWL 2 Full

• No complete, correct inference procedure can exist for the reasoning tasks

• Incomplete reasoning possible through rules…

Opinion: OWL/OWL 2 useful stuff, but an extremely complex standard!!!

• RDFS entailment rules provide sound, complete RDFS reasoning

• OWL 2 RL/RDF provide partial support for OWL 2 RDF-based semantics

• Monotonic rules which are guarded

• Positive subset of datalog with a fixed ternary predicate

• Rules have cubic complexity (with trivial exceptions aside)

• Due to the arity of triples (3)

IF⇒THEN

Body/Antecedent/Condition

?c1 rdfs:subClassOf ?c2 .

?x rdf:type ?c1 .

⇒ ?x rdf:type ?c2 .

• foaf:Person rdfs:subClassOf foaf:Agent .

• timbl:me rdf:type foaf:Person .

• ⇒ timbl:me rdf:type foaf:Agent .
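A minimal sketch of how the subclass rule above could be applied by forward chaining. Triples are plain Python tuples of prefixed names, and the function name `apply_subclass_rule` is hypothetical; this is an illustration, not the implementation used in the talk.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def apply_subclass_rule(triples):
    """Return the input triples plus every rdf:type triple the rule entails."""
    closed = set(triples)
    changed = True
    while changed:  # repeat until no new triple is produced (fixpoint)
        changed = False
        schema = [(s, o) for (s, p, o) in closed if p == SUBCLASS]
        for (x, p, c1) in list(closed):
            if p != RDF_TYPE:
                continue
            for (sub, sup) in schema:
                # Body: ?c1 rdfs:subClassOf ?c2 . ?x rdf:type ?c1 .
                if sub == c1 and (x, RDF_TYPE, sup) not in closed:
                    closed.add((x, RDF_TYPE, sup))  # Head: ?x rdf:type ?c2 .
                    changed = True
    return closed

data = {
    ("foaf:Person", SUBCLASS, "foaf:Agent"),
    ("timbl:me", RDF_TYPE, "foaf:Person"),
}
inferred = apply_subclass_rule(data) - data
print(inferred)
```

The loop runs to a fixpoint so that chains of subclasses are followed as well.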

Schema/Terminology/Ontological

Instance/Assertional

IF⇒THEN

Body/Antecedent/Condition

?c1 owl:disjointWith ?c2 .

?x rdf:type ?c1 .

?x rdf:type ?c2 .

⇒false

• foaf:Person owl:disjointWith foaf:Organization .

• w3c rdf:type foaf:Organization .

• w3c rdf:type foaf:Person .

• ⇒false
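The constraint rule above can be sketched as a check that reports every individual typed with two disjoint classes. The function name `find_disjointness_violations` and the tuple encoding are assumptions made for illustration.

```python
def find_disjointness_violations(triples):
    """Return (individual, class1, class2) for each match of the rule body."""
    disjoint = {(s, o) for (s, p, o) in triples if p == "owl:disjointWith"}
    types = {}
    for (s, p, o) in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    violations = []
    for x, classes in types.items():
        for (c1, c2) in disjoint:
            if c1 in classes and c2 in classes:
                violations.append((x, c1, c2))  # rule head: false
    return violations

data = [
    ("foaf:Person", "owl:disjointWith", "foaf:Organization"),
    ("w3c", "rdf:type", "foaf:Organization"),
    ("w3c", "rdf:type", "foaf:Person"),
]
violations = find_disjointness_violations(data)
print(violations)
```

Reporting the offending individual, rather than just `false`, is what makes the check useful for finding noise.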

…integration use-case!

explicit data

implicit data

How can consumers query the implicit data?

…so what’s The Problem?…

…heterogeneity

…need to integrate data from different sources

foaf:page

Gimme webpages relating to Tim Berners-Lee

timbl:i

timbl:i foaf:page ?pages .

Heterogeneity in schema…

webpage: properties

= rdfs:subPropertyOf

mo:musicBrainz

= owl:inverseOf

doap:homepage

mo:myspace

foaf:homepage

foaf:weblog

foaf:primaryTopic

foaf:isPrimaryTopicOf

foaf:page

foaf:topic

SKOS

Image from http://blog.dbtune.org/public/.081005_lod_constellation_m.jpg; Giasson, Bergman

Heterogeneity in naming…

Tim Berners-Lee: URIs

dblp:100007

timbl:i

db:Tim-Berners_Lee

identica:45563

= owl:sameAs

fb:en.tim_berners-lee

mo:myspace

foaf:primaryTopic

foaf:page

foaf:topic

SKOS

doap:homepage

foaf:homepage

Gimme webpages relating to Tim Berners-Lee

foaf:isPrimaryTopicOf

identica:45563

db:Tim-Berners_Lee

dblp:100007

fb:en.tim_berners-lee

timbl:i

timbl:i foaf:page ?pages .

...7 x 6 = 42 possible patterns

…what (OWL) reasoning is feasible for Linked Data?

Scalable

Expressive

Domain-Agnostic

Robust

• Scalability

• At least tens of billions of statements (for the moment)

• Near linear scale!!!

• Noisy data

• Inconsistencies galore

• Publishing errors

…need to consider the provenance of Web data

Noisy Data: Omnipotent Being

• Web data is noisy.

• Proof:

• 08445a31a78661b5c746feff39a9db6e4e2cc5cf

• sha1-sum of ‘mailto:’

• common value for foaf:mbox_sha1sum

• An inverse-functional (uniquely identifying) property!!!

• Any person who shares the same value will be considered the same

• Q.E.D.
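A small sketch of why the empty mailbox is a problem: every exporter that hashes a bare `mailto:` produces the same `foaf:mbox_sha1sum`, so an inverse-functional reading merges unrelated people. The helper name and the `example.org` address are hypothetical; the last line recomputes the slide's value rather than taking it on faith.

```python
import hashlib

def mbox_sha1sum(mbox_uri):
    """SHA-1 hex digest of a mailbox URI, as used for foaf:mbox_sha1sum."""
    return hashlib.sha1(mbox_uri.encode("utf-8")).hexdigest()

# Two unrelated (hypothetical) profiles whose exporters left the mailbox empty.
alice_hash = mbox_sha1sum("mailto:")
bob_hash = mbox_sha1sum("mailto:")
# A profile with a real (hypothetical) address.
carol_hash = mbox_sha1sum("mailto:carol@example.org")

print(alice_hash == bob_hash)    # same value, so "same person" under an IFP
print(alice_hash == carol_hash)

# Compare against the value quoted on the slide.
SLIDE_VALUE = "08445a31a78661b5c746feff39a9db6e4e2cc5cf"
print(alice_hash == SLIDE_VALUE)
```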

Noisy Data: Redefining everything

• More proof (courtesy of http://www.eiao.net/rdf/1.0)

• rdf:type rdf:type owl:Property .

• rdf:type rdfs:label “type”@en .

• rdf:type rdfs:comment “Type of resource” .

• rdf:type rdfs:domain eiao:testRun .

• rdf:type rdfs:domain eiao:pageSurvey .

• rdf:type rdfs:domain eiao:siteSurvey .

• rdf:type rdfs:domain eiao:scenario .

• rdf:type rdfs:domain eiao:rangeLocation .

• rdf:type rdfs:domain eiao:startPointer .

• rdf:type rdfs:domain eiao:endPointer .

• rdf:type rdfs:domain eiao:runs .

Noisy Data: Inconsistency

w3c rdf:type foaf:Organization .

w3c rdf:type foaf:Person .

foaf:Person owl:disjointWith foaf:Organization .

Class/property URIs dereference to their authoritative document

FOAF spec authoritative for foaf:Person✓

MY spec not authoritative for foaf:Person✘

Allow “extension” in third-party documents

my:Person rdfs:subClassOf foaf:Person . (MY spec) ✓

BUT: Reduce obscure memberships

foaf:Person rdfs:subClassOf my:Person . (MY spec) ✘

ALSO: Protect specifications

foaf:knows a owl:SymmetricProperty . (MY spec) ✘

Authoritative Reasoning
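A simplified sketch of the authority check, assuming a lookup table in place of real HTTP dereferencing. The table `DEREF`, the document names `foaf-spec` and `my-spec`, and the subject-only test are all assumptions chosen to reproduce the three examples above; the cited papers define the full condition.

```python
# Hypothetical mapping from a class/property URI to the document it dereferences to.
DEREF = {
    "foaf:Person": "foaf-spec",
    "foaf:knows": "foaf-spec",
    "my:Person": "my-spec",
}

def is_authoritative(doc, term):
    """A document is authoritative for a term that dereferences to it."""
    return DEREF.get(term) == doc

def accept_schema_triple(triple, doc):
    """Simplification: accept a schema triple only from a document that is
    authoritative for the triple's subject."""
    subject, _, _ = triple
    return is_authoritative(doc, subject)

extension = ("my:Person", "rdfs:subClassOf", "foaf:Person")
hijack = ("foaf:Person", "rdfs:subClassOf", "my:Person")
redefinition = ("foaf:knows", "rdf:type", "owl:SymmetricProperty")

for t in (extension, hijack, redefinition):
    print(t, accept_schema_triple(t, "my-spec"))
```

Under this check, third parties can still extend FOAF classes, but cannot attach new consequences to FOAF terms.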

Noisy Data: Redefining everything

Not Authoritative

Authoritative Reasoning: read more …w/ essential plugs

Gong Cheng, Yuzhong Qu.

"Integrating Lightweight Reasoning into Class-Based Query Refinement for Object Search." ASWC 2008.

Aidan Hogan, Andreas Harth, Axel Polleres.

"Scalable Authoritative OWL Reasoning for the Web." IJSWIS 2009.

Aidan Hogan, Jeff Z. Pan, Axel Polleres and Stefan Decker.

"SAOR: Template Rule Optimisations for Distributed Reasoning over 1 Billion Linked Data Triples." ISWC 2010.

My thesis: http://aidanhogan.com/docs/thesis/

• Quarantined reasoning!

• Separate and cache hierarchy of schema documents/dependencies…

Quarantined Reasoning [Delbru et al.; 2008]


A-Box / Instance Data

(e.g., a FOAF file)

T-Box / Ontology Data

(e.g., the FOAF ontology and its indirect imports)

Noisy Data: Redefining everything

Not In Here

R. Delbru, A. Polleres, G. Tummarello and S. Decker.

"Context Dependent Reasoning for Semantic Documents in Sindice." 4th International Workshop on Scalable Semantic Web Knowledge Base Systems, 2008.

• Use links-analysis (PageRank) to rank documents and triples

• Use annotated reasoning to rank inferences

• Repair each inconsistency by removing the weakest triple

• Piero A. Bonatti, Aidan Hogan, Axel Polleres and Luigi Sauro. "Robust and Scalable Linked Data Reasoning Incorporating Provenance and Trust Annotations". In the Journal of Web Semantics (in press).

…using positive (monotonic) rules.

Expressive reasoning (also) possible through tableaux, but yet to demonstrate desired scale

• Forward-chaining Materialisation

• Avoid runtime expense

• Users taught impatience by Google

• Pre-compute for quick retrieval

• Web-scale systems should scale well

• More data = more disk-space/machines

Don't materialise

too much!

One size does

not fit all!

• OUTPUT:

• Flat file of (partial) inferred triples (quads)

• INPUT:

• Flat file of triples (quads)

What rules?

• Let’s look at a recent corpus of Linked Data and see what schema’s inside

• (and what the rulesets support)

• Open-domain crawl May 2010

• 3.985 million sources (docs)

• 780 pay-level domains (e.g., dbpedia.org)

• Ran “special” PageRank over documents

• 86 thousand docs contained some RDFS/OWL schema data (2.2% of docs... but <0.2% of triples)

• Summated ranks of docs using each primitive

Survey of Linked Data schema: Top 15 ranks

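| # | Axiom | Rank(Σ) | RDFS | Horst | O2R |
|---|---|---|---|---|---|
| 1 | rdfs:subClassOf | 0.295 | ✓ | ✓ | ✓ |
| 2 | rdfs:range | 0.294 | ✓ | ✓ | ✓ |
| 3 | rdfs:domain | 0.292 | ✓ | ✓ | ✓ |
| 4 | rdfs:subPropertyOf | 0.090 | ✓ | ✓ | ✓ |
| 5 | owl:FunctionalProperty | 0.063 | ✘ | ✓ | ✓ |
| 6 | owl:disjointWith | 0.049 | ✘ | ✘ | ✓ |
| 7 | owl:inverseOf | 0.047 | ✘ | ✓ | ✓ |
| 8 | owl:unionOf | 0.035 | ✘ | ✘ | ✓ |
| 9 | owl:SymmetricProperty | 0.033 | ✘ | ✓ | ✓ |
| 10 | owl:TransitiveProperty | 0.030 | ✘ | ✓ | ✓ |
| 11 | owl:equivalentClass | 0.021 | ✘ | ✓ | ✓ |
| 12 | owl:InverseFunctionalProperty | 0.030 | ✘ | ✓ | ✓ |
| 13 | owl:equivalentProperty | 0.030 | ✘ | ✓ | ✓ |
| 14 | owl:someValuesFrom | 0.030 | ✘ | ✓ | ✓ |
| 15 | owl:hasValue | 0.028 | ✘ | ✓ | ✓ |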

Scalable Reasoning: In-mem T-Box

Main optimisation: Store T-Box in memory

T-Box: (loosely) data describing classes and properties.

Aka. schemata/vocabularies/ontologies/terminologies.

E.g.,

foaf:topic owl:inverseOf foaf:page .

sioc:UserAccount rdfs:subClassOf foaf:OnlineAccount .

Most commonly accessed data for reasoning

Quite small (~0.1% for our Linked Data corpus)

High selectivity (if you prefer)

A-Box (lots): ?s foaf:page ?o .

vs.

T-Box (few): foaf:page ?p ?o . + ?s ?p foaf:page .

Scan 1: Scan the input data, separate out the T-Box statements, and load them into memory

Do T-Box level reasoning if required (semi-naïve)

Scan 2: Scan all on-disk data, join with in-memory T-Box.

Scalable Reasoning: Two Scans
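The two scans can be sketched as follows, assuming the on-disk data is an iterable of triples that can be read twice. The predicate list, the function names, and the `ex:acct` individual are hypothetical; only the subclass join is shown in the second scan to keep the sketch short.

```python
# Predicates treated as T-Box for this sketch (an assumption, not a full list).
T_BOX_PREDICATES = {
    "rdfs:subClassOf", "rdfs:subPropertyOf",
    "rdfs:domain", "rdfs:range", "owl:inverseOf",
}

def scan_for_tbox(statements):
    """Scan 1: keep only schema statements; these are small enough for memory."""
    return {t for t in statements if t[1] in T_BOX_PREDICATES}

def scan_and_join(statements, tbox):
    """Scan 2: stream all statements once, joining each with the in-memory T-Box."""
    superclasses = {}
    for (s, p, o) in tbox:
        if p == "rdfs:subClassOf":
            superclasses.setdefault(s, set()).add(o)
    for (s, p, o) in statements:
        if p == "rdf:type":
            for sup in superclasses.get(o, ()):
                yield (s, "rdf:type", sup)

on_disk = [
    ("sioc:UserAccount", "rdfs:subClassOf", "foaf:OnlineAccount"),
    ("foaf:topic", "owl:inverseOf", "foaf:page"),
    ("ex:acct", "rdf:type", "sioc:UserAccount"),
]
tbox = scan_for_tbox(on_disk)
results = set(scan_and_join(on_disk, tbox))
print(results)
```

Because the second scan only looks up the in-memory T-Box, each A-Box statement is processed independently and the input never needs to be indexed.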

Scalable Reasoning: No A-Box Joins

ON-DISK A-BOX

• Execution of three rules:

OWL 2 RL rule prp-inv1

?p1 owl:inverseOf ?p2 .

?x ?p1 ?y .

⇒ ?y ?p2 ?x .

OWL 2 RL rule prp-rng

?p rdfs:range ?c .

?x ?p ?y.

⇒ ?y a ?c .

OWL 2 RL rule prp-spo1

?p1 rdfs:subPropertyOf ?p2 .

?x ?p1 ?y.

⇒ ?x ?p2 ?y .

...

ex:me foaf:homepage ex:hp .

...

IN-MEM T-BOX

ON-DISK OUTPUT

...

ex:hp rdf:type foaf:Document .

ex:me foaf:page ex:hp .

ex:hp foaf:topic ex:me .

...
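A sketch of how the three rules could fire on a single on-disk triple using only in-memory lookups. The T-Box entries for range and sub-property are assumptions chosen so that the output matches the three triples shown above, and the inverse map is stored in both directions, which folds the companion rule prp-inv2 into the same lookup.

```python
def reason_over_triple(triple, inverse_of, range_of, sub_property_of):
    """Apply prp-inv1, prp-rng and prp-spo1 to one triple and to whatever
    they infer from it, without joining against other A-Box triples."""
    todo = [triple]
    seen = {triple}
    while todo:
        x, p, y = todo.pop()
        inferred = []
        for p2 in inverse_of.get(p, ()):
            inferred.append((y, p2, x))            # prp-inv1
        for c in range_of.get(p, ()):
            inferred.append((y, "rdf:type", c))    # prp-rng
        for p2 in sub_property_of.get(p, ()):
            inferred.append((x, p2, y))            # prp-spo1
        for t in inferred:
            if t not in seen:
                seen.add(t)
                todo.append(t)
    seen.discard(triple)  # report only the new triples
    return seen

# Hypothetical in-memory T-Box, indexed by property.
inverse_of = {"foaf:page": {"foaf:topic"}, "foaf:topic": {"foaf:page"}}
range_of = {"foaf:homepage": {"foaf:Document"}}
sub_property_of = {"foaf:homepage": {"foaf:page"}}

output = reason_over_triple(
    ("ex:me", "foaf:homepage", "ex:hp"), inverse_of, range_of, sub_property_of
)
print(output)
```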

Scalable Reasoning: A-Box joins?

• However: some rules do require A-Box joins

• ?p a owl:TransitiveProperty . ?x ?p ?y . ?y ?p ?z .

⇒ ?x ?p ?z .

• Difficult to engineer a scalable solution (which reaches a fixpoint) for Linked Data(?)

• A lot of useful reasoning still possible without A-Box joins…
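To illustrate why the transitivity rule needs A-Box joins, here is a small in-memory semi-naive sketch for one transitive property. Each round joins only the newly derived pairs against everything known so far, and rounds continue until a fixpoint is reached. The function name is hypothetical, and nothing here addresses the on-disk scale discussed above.

```python
def transitive_closure(pairs):
    """Semi-naive closure of (x, y) pairs for one transitive property."""
    closure = set(pairs)
    delta = set(pairs)  # pairs derived in the previous round
    while delta:
        new = set()
        for (x, y) in delta:
            for (y2, z) in closure:       # new pair on the left of the join
                if y == y2 and (x, z) not in closure:
                    new.add((x, z))
            for (w, x2) in closure:       # new pair on the right of the join
                if x2 == x and (w, y) not in closure:
                    new.add((w, y))
        closure |= new
        delta = new
    return closure

closure = transitive_closure({("a", "b"), ("b", "c"), ("c", "d")})
print(sorted(closure))
```

Every round has to look up other instance triples, which is the join that the T-Box-only rules avoid.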

• rdfs:subClassOf 0.295 ✓

• rdfs:range 0.294 ✓

• rdfs:domain 0.292 ✓

• rdfs:subPropertyOf 0.090 ✓

• owl:FunctionalProperty 0.063 ✘

• owl:disjointWith 0.049 ✘

• owl:inverseOf 0.047 ✓

• owl:unionOf 0.035 ✓

• owl:SymmetricProperty 0.033 ✓

• owl:equivalentClass 0.021 ✓

• owl:InverseFunctionalProperty 0.030 ✘

• owl:equivalentProperty 0.030 ✓

• owl:someValuesFrom 0.030 ✓/✘

Scalable Distributed Reasoning

[Figure: the input is split across machines, each holding a different part of the A-Box (e.g., ex:me ex:presented ex:ThisTalk .). Each machine runs EXTRACT T-BOX; the extracted statements are merged in COLLECT T-BOX; every machine then reasons with the SAME T-BOX over its own DIFF. A-BOX and writes its inferences (e.g., ex:me rdf:type ex:Awesome .) to LOCAL OUTPUT.]

9 machines: Total 3.35 hours

Aidan Hogan, Jeff Z. Pan, Axel Polleres, Stefan Decker: SAOR: Template Rule Optimisations for Distributed Reasoning over 1 Billion Linked Data Triples. International Semantic Web Conference (1) 2010: 337-353

Jesse Weaver, James A. Hendler: Parallel Materialization of the Finite RDFS Closure for Hundreds of Millions of Triples. International Semantic Web Conference 2009: 682-697

Jacopo Urbani, Spyros Kotoulas, Eyal Oren, Frank van Harmelen: Scalable Distributed Reasoning Using MapReduce. International Semantic Web Conference 2009: 634-649

Jacopo Urbani, Spyros Kotoulas, Jason Maassen, Frank van Harmelen, Henri E. Bal: OWL Reasoning with WebPIE: Calculating the Closure of 100 Billion Triples. ESWC (1) 2010: 213-227

A-Box Joins

Consolidation: Baseline

timbl:i

identica:45563

dbpedia:Berners-Lee

• Use provided owl:sameAs mappings in the data

timbl:i owl:sameAs identica:45563 .

dbpedia:Berners-Lee owl:sameAs identica:45563 .

• Store “equivalences” found

timbl:i ->

identica:45563 ->

dbpedia:Berners-Lee ->

Consolidation: Baseline

timbl:i

identica:45563

dbpedia:Berners-Lee

• For each set of equivalent identifiers, choose a canonical term

Afterwards, rewrite identifiers to their canonical version:

timbl:i rdf:type foaf:Person .

identica:48404 foaf:knows identica:45563 .

dbpedia:Berners-Lee dpo:birthDate “1955-06-08”^^xsd:date .

becomes

dbpedia:Berners-Lee rdf:type foaf:Person .

identica:48404 foaf:knows dbpedia:Berners-Lee .

dbpedia:Berners-Lee dpo:birthDate “1955-06-08”^^xsd:date .
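The baseline can be sketched with a union-find structure over the owl:sameAs pairs. Choosing the lexicographically smallest identifier as the canonical term is an assumption made here because it happens to reproduce the example; the function name `consolidate` is hypothetical.

```python
def consolidate(triples):
    """Merge identifiers linked by owl:sameAs and rewrite the remaining triples."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            lo, hi = sorted((ra, rb))
            parent[hi] = lo  # assumption: smallest identifier is canonical

    for (s, p, o) in triples:
        if p == "owl:sameAs":
            union(s, o)

    def canon(term):
        return find(term) if term in parent else term

    return {(canon(s), p, canon(o)) for (s, p, o) in triples if p != "owl:sameAs"}

data = [
    ("timbl:i", "owl:sameAs", "identica:45563"),
    ("dbpedia:Berners-Lee", "owl:sameAs", "identica:45563"),
    ("timbl:i", "rdf:type", "foaf:Person"),
    ("identica:48404", "foaf:knows", "identica:45563"),
    ("dbpedia:Berners-Lee", "dpo:birthDate", "1955-06-08"),
]
rewritten = consolidate(data)
print(rewritten)
```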

Extended Consolidation

• Infer owl:sameAs through reasoning (OWL 2 RL/RDF)

• explicit owl:sameAs (again)

• owl:InverseFunctionalProperty

• owl:FunctionalProperty

• owl:cardinality 1 / owl:maxCardinality 1

foaf:homepage a owl:InverseFunctionalProperty .

timbl:i foaf:homepage w3c:timblhomepage .

…then apply consolidation as before
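A sketch of the inverse-functional case: subjects that share a value for an inverse-functional property are paired with owl:sameAs. The second homepage triple (for dbpedia:Berners-Lee) is a hypothetical addition so that the rule has something to join, and the function name is an assumption.

```python
from collections import defaultdict
from itertools import combinations

def infer_same_as(triples):
    """Emit owl:sameAs triples for subjects sharing an inverse-functional value."""
    ifps = {
        s for (s, p, o) in triples
        if p == "rdf:type" and o == "owl:InverseFunctionalProperty"
    }
    subjects_by_value = defaultdict(set)
    for (s, p, o) in triples:
        if p in ifps:
            subjects_by_value[(p, o)].add(s)
    same = set()
    for subjects in subjects_by_value.values():
        for a, b in combinations(sorted(subjects), 2):
            same.add((a, "owl:sameAs", b))
    return same

data = [
    ("foaf:homepage", "rdf:type", "owl:InverseFunctionalProperty"),
    ("timbl:i", "foaf:homepage", "w3c:timblhomepage"),
    # Hypothetical second identifier with the same homepage:
    ("dbpedia:Berners-Lee", "foaf:homepage", "w3c:timblhomepage"),
]
same_as = infer_same_as(data)
print(same_as)
```

The inferred pairs can then be fed to the same consolidation step as explicit owl:sameAs triples.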

Consolidation: Results

• ~12 million explicit owl:sameAs triples (as before)

• ~8.7 million thru. owl:InverseFunctionalProperty

• ~106 thousand thru. owl:FunctionalProperty

• none thru. owl:cardinality/owl:maxCardinality

In terms of equivalences found (baseline vs. extended):

• ~2.8 million sets of equivalent identifiers

• (1.31x baseline)

• ~14.86 million identifiers involved

• (2.58x baseline)

• ~5.8 million URIs

• !!(1.014x baseline)!!

Heterogeneity poses a significant problem for consuming Linked Data

• Heterogeneity in schema

• Heterogeneity in naming

…but we can use the mappings provided by publishers to integrate heterogeneous Linked Data corpora (with a little caution)

• Lightweight rule-based reasoning can go a long way

• Deceit/Noise ≠ End Of World

• Consider source of data!

• Inconsistency ≠ End Of World

• Useful for finding noise, in fact!

• Explicit owl:sameAs vs. extended consolidation:

• Extended consolidation mostly (but not entirely) for consolidating blank-nodes from older FOAF exporters

Aidan Hogan

Day 3

Session 2

RDF Index Designs (1/4): Horizontal table-per-class

Class: Car

• Pros:

• Fast for certain queries, esp. “star shaped” queries

• Little redundancy in cells

• Cons:

• Becomes very sparse for larger schema

• Lots of nulls needed

• Special handling needed for multi-valued attributes

Class: Person

RDF Index Designs (2/4): Vertical triple table

• Pros:

• No more nulls needed

• Flexible for updates (even to schema)

• Multi-valued attributes no problem

• Cons:

• Lots of self-joins

• Lots of redundancy in the cells
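A small sketch of the vertical triple table and the self-join it forces, using an in-memory SQLite database. The table name, column names, and sample rows (`ex:fred`, `ex:car1`, `Kia`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:fred", "type", "Person"),
        ("ex:fred", "ownsCar", "ex:car1"),
        ("ex:car1", "type", "Car"),
        ("ex:car1", "model", "Kia"),
    ],
)

# "Which model of car does each person own?" needs the table joined with itself:
# one copy for the ownsCar triple, one for the model triple.
rows = conn.execute(
    """
    SELECT t1.s, t2.o
    FROM triples AS t1
    JOIN triples AS t2 ON t1.o = t2.s
    WHERE t1.p = 'ownsCar' AND t2.p = 'model'
    """
).fetchall()
print(rows)
```

Every extra triple pattern in a query adds another copy of the table to the join, while the repeated subject and predicate strings show the redundancy in the cells.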

RDF Index Designs (3/4): Vertical table per prop.

Property: model

Property: ownsCar

Class: car

Property: type

• Pros:

• Less redundancy

• Cons:

• Potentially many tables

RDF Index Designs (4/4): Hybrid

Property: seeAlso

Class: Car

• Pros:

• ~Depends

• Cons:

• Likely to be more costly to manage

Class: Person

Property: img

• RDB-based indexes

• Store data in a relational database

• Typically B+Trees or similar RDB technology

• Sometimes horizontal (RDB-like) schema

• Mostly vertical (RDF-like) tables

• 4store, AllegroGraph, Bigdata, BigOWLIM, Hexastore, Jena SDB, Mulgara, Redland, Virtuoso, etc.

• Native RDF stores

• Custom storage solutions

• HPRD, Jena TDB, RDF3X, SIREn, Voldemort, YARS2

• YARS2: Sparse indexes

• SIREn: IR-style indexes over Lucene

• Distinction not always clear-cut!

• Combination of in-memory and on-disk storage

• Triple stores

• Only service simple RDF triple patterns

• RDF-3X, SIREn, 3store, etc.

• ?s rdf:type foaf:Person .

• aidan ?p galway .

• ?s ?p ?o .

• Also service patterns involving named graphs

• Typical for indexing data from multiple sources

• Needed for SPARQL querying!!

• GRAPH ?g {?s rdf:type foaf:Person}

• GRAPH foaf.rdf {aidan ?p galway }

• FROM graph1.rdf … WHERE { ?s ?p ?o . }

• Virtuoso, BigOWLIM, Jena TDB/SDB, YARS2, 4store, Hexastore, etc.

• (subject, predicate, object, graph)

• graph sometimes called context

• 2^4 = 16 patterns to service!

• Requires six different indexes to service all 16 quad patterns

• assuming prefix lookups
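The count can be checked with a short sketch. A pattern is the set of bound positions among subject, predicate, object and context, and an index order serves a pattern when the bound positions form a prefix of that order. The six orders below are one possible choice (an assumption, not taken from the slides).

```python
from itertools import combinations

POSITIONS = "spoc"  # subject, predicate, object, context (graph)

# One possible set of six index orders (an assumption for illustration).
INDEX_ORDERS = ["spoc", "pocs", "ocsp", "cspo", "cpso", "ospc"]

def all_patterns():
    """Every choice of bound positions: 2^4 = 16 patterns."""
    return {frozenset(c) for r in range(5) for c in combinations(POSITIONS, r)}

def covered_patterns(orders):
    """Patterns answerable by a prefix lookup on at least one index."""
    return {frozenset(order[:k]) for order in orders for k in range(5)}

patterns = all_patterns()
covered = covered_patterns(INDEX_ORDERS)
two_bound = {frozenset(c) for c in combinations(POSITIONS, 2)}

print(len(patterns), patterns == covered, len(two_bound))
```

Six is also a lower bound: there are six patterns with exactly two bound positions, and each index order has only one prefix of length two.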

Data Table

Dictionary

• Pros:

• Can load more data in memory

• Faster to compute joins

• Smaller on-disk footprint

• Cons:

• Maintain a potentially massive dictionary

• Slower to externalise streaming results
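A minimal sketch of dictionary encoding, assuming integer object identifiers assigned in order of first appearance. The class name `Dictionary` and its methods are hypothetical.

```python
class Dictionary:
    """Two-way mapping between RDF terms and compact integer identifiers."""

    def __init__(self):
        self.term_to_id = {}
        self.id_to_term = []

    def encode(self, term):
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def decode(self, oid):
        return self.id_to_term[oid]

dictionary = Dictionary()
quad = ("kmi:tom", "foaf:interest", "wikipedia:Beer", "kmi:tomfoaf.rdf")

# The data table stores only integers; joins compare integers, not strings.
encoded = tuple(dictionary.encode(term) for term in quad)
# Results must be decoded again before they are returned to the user.
decoded = tuple(dictionary.decode(oid) for oid in encoded)
print(encoded, decoded)
```

The decode step on output is the cost listed under the cons: every streamed result needs a dictionary lookup per term.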

x4,000

(2)

x40

(1)

• Equi-joins are commutative

• What ordering to execute them in?

• Choice of various techniques

• Nested-loop join

• Hash join

• Index join

• Use selectivity estimates…

• Other techniques known from databases!

?person foaf:based_near dbpedia:Korea .

aidan foaf:knows ?person .
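A sketch of selectivity-based join ordering over a tiny in-memory dataset. The exact match count stands in for a selectivity estimate, the evaluation is a plain nested-loop join, and the `ex:` identifiers are hypothetical.

```python
def match(pattern, triple, binding):
    """Extend a binding so that the pattern matches the triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def estimate(pattern, triples):
    """Stand-in for a selectivity estimate: count the matching triples."""
    return sum(1 for t in triples if match(pattern, t, {}) is not None)

def evaluate(patterns, triples):
    """Nested-loop join, evaluating the patterns in the given order."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                b2 = match(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings

triples = [
    ("aidan", "foaf:knows", "ex:a"),
    ("aidan", "foaf:knows", "ex:b"),
    ("ex:a", "foaf:based_near", "dbpedia:Korea"),
    ("ex:c", "foaf:based_near", "dbpedia:Korea"),
    ("ex:d", "foaf:based_near", "dbpedia:Korea"),
]
query = [
    ("?person", "foaf:based_near", "dbpedia:Korea"),
    ("aidan", "foaf:knows", "?person"),
]
ordered = sorted(query, key=lambda p: estimate(p, triples))
answers = evaluate(ordered, triples)
print(ordered[0], answers)
```

Starting with the more selective pattern keeps the intermediate binding list small, which matters far more at scale than in this toy dataset.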

• Pros:

• Speed-up response times

• Better fault-tolerance

• Cons:

• Expensive!


• Pros:

• Handle more data

• Commodity hardware ~cheap

• Cons:

• Joins expensive to compute

• More complex architecture and maintenance


kmi:tom ?p ?o ?c

kmi:tom foaf:interest wikipedia:Beer kmi:tomfoaf.rdf

compute hash mod 4

• Pros:

• Can route query directly to the machine

• Cons:

• Load-balancing, esp. for predicates and values of rdf:type
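A sketch of hash-based placement on the subject, with four machines as in the "hash mod 4" example. MD5 is used only to get a hash that is stable across runs (Python's built-in `hash` is salted per process); the function names are hypothetical.

```python
import hashlib

NUM_MACHINES = 4

def machine_for(term):
    """Stable hash of a term, reduced modulo the number of machines."""
    digest = hashlib.md5(term.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_MACHINES

machines = [[] for _ in range(NUM_MACHINES)]

def place(quad):
    """Store a quad on the machine chosen by its subject."""
    machines[machine_for(quad[0])].append(quad)

def lookup_by_subject(subject):
    """Route a subject-bound pattern such as kmi:tom ?p ?o ?c to one machine."""
    target = machine_for(subject)
    return [q for q in machines[target] if q[0] == subject]

quad = ("kmi:tom", "foaf:interest", "wikipedia:Beer", "kmi:tomfoaf.rdf")
place(quad)
found = lookup_by_subject("kmi:tom")
print(machine_for("kmi:tom"), found)
```

Hashing on the predicate or on the value of rdf:type would send a few very common terms to the same machine, which is the load-balancing concern noted above.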

kmi:tom foaf:interest wikipedia:Beer kmi:tomfoaf.rdf

kmi:tom foaf:interest wikipedia:Beer kmi:tomfoaf.rdf

random distribution

• Pros:

• Cons:

• At query-time, don’t know which machine to ask…

?s foaf:interest ?p ?o


RDB-based indexes

• 4store,

• AllegroGraph,

• Bigdata,

• BigOWLIM,

• Hexastore,

• Jena SDB,

• Mulgara,

• Redland,

• Virtuoso, etc.

Native RDF stores

• HPRD,

• Jena TDB,

• RDF3X,

• SIREn,

• Voldemort,

• YARS2, etc.

• Benchmark of common SPARQL engines

• Set of assorted SPARQL queries and fixed data

• Results for query-mixes per hour:

Christian Bizer, Andreas Schultz: The Berlin SPARQL Benchmark.

Int. J. Semantic Web Inf. Syst. 5(2): 1-24 (2009)

• Lots of work in the area!

• Native stores vs. RDB-style stores

• Triple stores vs. Quad stores

• Optimisations

• OIDs

• Replication

• Distribution

• Join Selection/Reordering, etc.

• No definitive solution…

“In previous papers, some of us predicted the end of ‘one size fits all’ as a commercial relational DBMS paradigm.

“These papers presented reasons and experimental evidence that showed that the major RDBMS vendors can be outperformed by 1-2 orders of magnitude by specialized engines in the data warehouse, stream processing, text, and scientific database markets.”

[Stonebraker et al.; 2007]