
RDF Triple Stores

Nipun Bhatia, Department of Computer Science, Stanford University. Contents: Introduction; Different Architectures; Implications; An Example: Jena SDB; Evaluations; Evaluations using LUBM/DBpedia; Open Research Issues.



  1. RDF Triple Stores • Nipun Bhatia • Department of Computer Science, Stanford University

  2. Contents • Introduction • Different Architectures • Implications • An Example: Jena SDB • Evaluations • Evaluations using LUBM/DBpedia • Open Research Issues • Which RDF store to choose for a particular application? • Possible system diagram for Phenotype Annotations

  3. Introduction • What is an RDF store? A system that provides persistent storage of, and access to, RDF graphs. • Potential application areas: plenty, e.g. a backend for Protégé, BioPortal, and Phenotype Annotations.

  4. Different Architectures • Based on their implementation, RDF stores fall into 3 broad categories: in-memory, native, and non-native non-memory. • In-memory: the RDF graph is stored as triples in main memory, e.g. storing an RDF graph using the Jena API or the Sesame API. • Native: persistent storage systems with their own database implementation, e.g. Sesame Native, Virtuoso, AllegroGraph, Oracle 11g. • Non-native non-memory: persistent storage systems set up to run on third-party databases, e.g. Jena SDB.
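To make the in-memory category concrete, here is a minimal sketch of a store that keeps the whole graph as a set of triples in main memory and answers pattern queries by scanning it. The class name `InMemoryStore`, the `match` method, and the `ex:` terms are hypothetical and chosen only for illustration; this is not the Jena or Sesame API.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class InMemoryStore:
    """Toy in-memory triple store: the whole graph is one Python set."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for t in self._triples:
            if ((s is None or t[0] == s) and
                    (p is None or t[1] == p) and
                    (o is None or t[2] == o)):
                yield t


store = InMemoryStore()
store.add("ex:alice", "rdf:type", "ex:Student")
store.add("ex:alice", "ex:memberOf", "ex:Dept0")
store.add("ex:bob", "rdf:type", "ex:Student")

# All subjects typed as ex:Student (hypothetical data).
students = sorted(t[0] for t in store.match(p="rdf:type", o="ex:Student"))
print(students)
```

Nothing here survives a process restart, which is the gap the native and non-native persistent categories are meant to close.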

  5. Implications • Scalability. • Different query languages are supported to varying degrees: Sesame supports SeRQL; Oracle 11g has its own query language. • Different levels of inferencing: Sesame supports RDFS inference; AllegroGraph supports RDFS++; Oracle 11g supports RDFS++ and OWL Prime. • Lack of interoperability and portability, more pronounced in native stores.

  6. Jena SDB • SDB is essentially a Java loader. • Multiple stores are supported: MySQL, PostgreSQL, Oracle, DB2. • Takes incoming triples and breaks them down into components ready for the database. • Multiple layouts. • Integration with the Joseki server. • SPARQL is supported. • (Non) Interest Declaration: I was previously an intern at HP Labs with the Jena team.
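The idea of breaking incoming triples into components ready for a relational database can be sketched as follows. This is an illustration only: the `nodes`/`triples` schema, the `node_id` and `load` helpers, and the use of SQLite as a stand-in for MySQL or PostgreSQL are all assumptions, not SDB's actual layout or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a third-party database
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, lex TEXT UNIQUE NOT NULL);
    CREATE TABLE triples (
        s INTEGER NOT NULL REFERENCES nodes(id),
        p INTEGER NOT NULL REFERENCES nodes(id),
        o INTEGER NOT NULL REFERENCES nodes(id),
        PRIMARY KEY (s, p, o)
    );
""")


def node_id(lex: str) -> int:
    """Return the id for an RDF term, inserting it the first time it is seen."""
    conn.execute("INSERT OR IGNORE INTO nodes (lex) VALUES (?)", (lex,))
    row = conn.execute("SELECT id FROM nodes WHERE lex = ?", (lex,)).fetchone()
    return row[0]


def load(triples):
    """Break each incoming triple into components and store their ids."""
    for s, p, o in triples:
        conn.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)",
                     (node_id(s), node_id(p), node_id(o)))
    conn.commit()


# Hypothetical input data.
load([
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:bob", "rdf:type", "ex:Student"),
])

node_count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
triple_count = conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
print(node_count, triple_count)
```

Storing each distinct term once and referring to it by id keeps the triple table narrow, at the cost of a join back to the term table whenever results are returned.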

  7. Evaluations • Third-party evaluations for Sesame, Jena SDB, and Virtuoso • Company evaluations for Oracle 11g • Methodology: LUBM (Lehigh University Benchmark); DBpedia; multiple queries; load times

  8. Evaluations • DBpedia: a database of structured information extracted from Wikipedia, covering places, persons, music albums, and films [2]. • LUBM: synthetically generated RDF data describing universities, departments, students, etc. [1] • Dataset sizes: DataSet 1: 15,472,624 triples, 2.1 GB; DataSet 2: LUBM 50 (2.75 million) and LUBM 1000 (55.09 million). • 3 queries.

  9. Loading Time: DataSet 1

  10. Results – Query 1 • Simple select query – 2 variables

  11. Query 2 • Unconstrained Select Query – only predicate was specified.

  12. Query 3 • Complex Query – Uses filter

  13. Oracle 11g – DataSet 2

  14. Observations • Native stores perform better than systems using third-party stores. • Optimizations are possible. • Each system uses different database layouts: Virtuoso uses OGPS, POGS, PSOG, SOPG; SDB uses SPO, GSPO. • Hashing on SDB performs very poorly.
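Why the choice of layout matters can be illustrated with a small sketch in which each layout name (such as GSPO or POGS) is read as the order in which the graph, subject, predicate, and object components are sorted. Sorted Python lists stand in for database indexes, and `build_index`, `prefix_scan`, and the sample quads are hypothetical; the real systems' storage structures are not described in the slides.

```python
import bisect
from typing import Dict, List, Tuple

Quad = Dict[str, str]  # keys: "G", "S", "P", "O"

# Hypothetical sample data.
quads: List[Quad] = [
    {"G": "g1", "S": "ex:alice", "P": "rdf:type", "O": "ex:Student"},
    {"G": "g1", "S": "ex:bob", "P": "rdf:type", "O": "ex:Student"},
    {"G": "g2", "S": "ex:alice", "P": "ex:memberOf", "O": "ex:Dept0"},
]


def build_index(order: str) -> List[Tuple[str, ...]]:
    """Return the quads as tuples sorted in the given component order."""
    return sorted(tuple(q[c] for c in order) for q in quads)


def prefix_scan(index: List[Tuple[str, ...]],
                prefix: Tuple[str, ...]) -> List[Tuple[str, ...]]:
    """Return the entries starting with prefix, located by binary search."""
    start = bisect.bisect_left(index, prefix)
    hits = []
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break
        hits.append(entry)
    return hits


indexes = {order: build_index(order) for order in ("GSPO", "POGS")}

# A query that fixes only the predicate is a single range scan on POGS,
# whereas a GSPO-ordered index would have to be scanned in full.
hits = prefix_scan(indexes["POGS"], ("rdf:type",))
subjects = [entry[3] for entry in hits]  # S is the fourth component of POGS
print(subjects)
```

A store with more orderings can answer more query shapes with a range scan, but every extra ordering must be maintained on each load.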

  15. Open Research Issues • Inferencing [4] • Current common implementations make a number of small queries to propagate the effects of rule firing; each of these queries is a separate interaction with the database, which is not very efficient. • Approach 1: snapshot the contents of the database-backed model into RAM for the duration of processing by the inference engine. • Approach 2: perform inferencing in-stream, i.e. precompute the inference closure of the ontology, analyze the incoming data streams, and add triples to them based on that closure. • In-stream inferencing assumes a rigid separation of the RDF data (A-box) and the ontology data (T-box). • Even this may not work for very large ontologies, such as biomedical ontologies.
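The in-stream approach can be sketched for a single rule, propagation of `rdf:type` along `rdfs:subClassOf`. The closure of the T-box is computed once up front, and each incoming A-box triple is then expanded without any round trip to the database. The class hierarchy, the `ex:` terms, and the function names are hypothetical; this is a sketch of the idea under those assumptions, not the pipeline described in [4].

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]

# T-box: hypothetical class hierarchy as (subclass, superclass) pairs.
SUBCLASS_OF: List[Tuple[str, str]] = [
    ("ex:GradStudent", "ex:Student"),
    ("ex:Student", "ex:Person"),
]


def superclass_closure(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Precompute, for every class, all of its transitive superclasses."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    closure: Dict[str, Set[str]] = {}
    for cls in list(parents):
        seen: Set[str] = set()
        stack = list(parents[cls])
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(parents.get(c, ()))
        closure[cls] = seen
    return closure


CLOSURE = superclass_closure(SUBCLASS_OF)


def infer_in_stream(stream: Iterable[Triple]) -> Iterator[Triple]:
    """Pass each A-box triple through and append the entailed type triples."""
    for s, p, o in stream:
        yield (s, p, o)
        if p == "rdf:type":
            for sup in sorted(CLOSURE.get(o, ())):
                yield (s, "rdf:type", sup)


incoming = [
    ("ex:alice", "rdf:type", "ex:GradStudent"),
    ("ex:alice", "ex:memberOf", "ex:Dept0"),
]
expanded = list(infer_in_stream(incoming))
print(expanded)
```

The sketch relies on the T-box being fixed before the stream starts, which is the rigid A-box/T-box separation noted above; the precomputed closure also has to fit in memory, which is where very large ontologies become a problem.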

  16. Open Research Issues • Query optimization • Third-party stores undo any optimization done at the API level. • The better performance of native stores points in that direction. • Some work exists on optimizing SPARQL queries for in-memory stores.

  17. Which RDF store to choose for an app? • Frequency of loads that the application will perform. • Single scaling factor and linear load times. • Level of inferencing required. • Which query languages are supported; W3C recommendations. • Special system needs, e.g. AllegroGraph needs a 64-bit processor.

  18. Possible system diagram for Phenotype Annotations • [Diagram; recoverable labels: Phenotype Annotations; set of ontologies required for Phenotype Annotations, e.g. PATO, Fly, etc.; Jena API; Inferencing; Jena Model; SDB; MySQL / Virtuoso]

  19. References • [1] http://esw.w3.org/topic/RdfStoreBenchmarking • [2] http://www4.wiwiss.fu-berlin.de/benchmarks-200801/ • [3] Kurt Rohloff et al.: An Evaluation of Triple-Store Technologies for Large Data Stores. Comparing Sesame, Jena and AllegroGraph. 2007 • [4] N. Bhatia, A. Seaborne: Ingestion pipeline for RDF
