This document outlines the essential aspects of RDF databases, focusing on the Sesame system, its architecture, and scalability. It addresses the need for efficient tools to store, process, and reason with metadata, as simple availability is insufficient. Challenges related to storage and querying are discussed, highlighting issues like the performance of relational databases compared to native graph structures. Key components include the Repository Abstraction Layer and the RQL query language. Future directions for Sesame are explored, including potential migration paths to boost performance.
RDF Databases By: Chris Halaschek
Outline • Motivation / Requirements • Storage Issues • Sesame • General Introduction • Architecture • Scalability • RQL Introduction • Demo • Future Directions
Motivation • Having metadata available is not enough • Need tools to process, transform, and reason with the information • Need a way to store the metadata and interact with it
Requirements • Scalable • Good performance • Useful query language
Storage Issues • How to store the data? • In a relational database as tables • Querying requires many joins…costly • Triples • Native graph structure • Querying requires graph traversals…need efficient algorithms
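The join cost mentioned above can be made concrete with a minimal sketch. It assumes a single three-column `triples` table in an in-memory SQLite database; the table layout and the toy data are illustrative assumptions, not Sesame's actual schema.

```python
import sqlite3

# Hypothetical layout: one table holding every (subject, predicate, object) triple.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("picasso", "paints", "guernica"),
        ("guernica", "technique", "oil on canvas"),
        ("rodin", "sculpts", "thinker"),
    ],
)

# A two-step path (painter -> painting -> technique) needs one self-join;
# every further step in the path adds another join on the same table.
rows = conn.execute(
    """
    SELECT t1.s, t1.o, t2.o
    FROM triples AS t1
    JOIN triples AS t2 ON t2.s = t1.o
    WHERE t1.p = 'paints' AND t2.p = 'technique'
    """
).fetchall()
print(rows)
```

Longer paths multiply the joins, which is the cost the slide refers to.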
Sesame - Introduction • Open source RDF Schema-based repository and querying facility • Developed as a research prototype by Aidministrator Nederland bv • NLnet Foundation sponsors its further development as open source software
Sesame - Introduction • Can handle RDF data in XML-serialized RDF and N-Triples format • Can extract the contents of a Sesame repository in XML-serialized RDF, N-Triples, and N3 format
Repository • Many options due to Repository Abstraction Layer (RAL) • DBMS – relational, object-relational, etc • Existing RDF stores • RDF files • RDF network services
Repository Abstraction Layer (RAL) • Interface that translates RDF-specific methods to a specific DBMS • Defined by an RDF API • Created their own set of interfaces rather than adopt or extend the existing RDF API proposal • Existing API targeted main memory model • Theirs offers specific operations that support RDF Schema semantics (i.e. subsumption reasoning)
RAL Continued • Several of Sesame’s functional modules are clients of the RAL • Problems: • Must read from repository – performance decrease • Solution – selectively caching data in memory • For small repositories, all data can be cached
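The idea of an abstraction layer with selective caching can be sketched as follows. All class and method names here (`Repository`, `add_statement`, `get_statements`, `CachingRepository`) are hypothetical and chosen for illustration; they are not Sesame's actual interfaces, and the whole-cache invalidation is the simplest possible policy rather than the one Sesame uses.

```python
from abc import ABC, abstractmethod


class Repository(ABC):
    """Hypothetical stand-in for a RAL-style interface (not Sesame's API)."""

    @abstractmethod
    def add_statement(self, s, p, o): ...

    @abstractmethod
    def get_statements(self, s=None, p=None, o=None): ...


class InMemoryRepository(Repository):
    """Toy backend; counts reads so the effect of caching is visible."""

    def __init__(self):
        self._triples = set()
        self.reads = 0

    def add_statement(self, s, p, o):
        self._triples.add((s, p, o))

    def get_statements(self, s=None, p=None, o=None):
        self.reads += 1
        pattern = (s, p, o)
        # None acts as a wildcard for that position.
        return [
            t for t in sorted(self._triples)
            if all(want is None or want == got for want, got in zip(pattern, t))
        ]


class CachingRepository(Repository):
    """Wraps any Repository and remembers the answers to earlier lookups."""

    def __init__(self, backend):
        self._backend = backend
        self._cache = {}

    def add_statement(self, s, p, o):
        self._backend.add_statement(s, p, o)
        self._cache.clear()  # simplest invalidation: drop everything on write

    def get_statements(self, s=None, p=None, o=None):
        key = (s, p, o)
        if key not in self._cache:
            self._cache[key] = self._backend.get_statements(s, p, o)
        return list(self._cache[key])


backend = InMemoryRepository()
repo = CachingRepository(backend)
repo.add_statement("picasso", "paints", "guernica")
first = repo.get_statements(p="paints")
second = repo.get_statements(p="paints")  # served from the cache
print(first, backend.reads)
```

Because the functional modules would only see the `Repository` interface, the cache can be added or removed without changing them.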
Functional Modules • Interact with RAL • RQL query module • Evaluates RQL queries • RDF administration module • Allows uploading RDF data and schema information, as well as deleting information • RDF export module • Allows extraction of schema and/or data from repository
RQL Query Module • Proposed RQL: • Developed within the European IST project C-Web • Follow-up project by ICS at FORTH, in Greece • Adopts the syntax of OQL • Sesame’s implementation of RQL is slightly different from the proposed RQL • Better compliance with W3C specifications • Support for optional domain and range restrictions • Queries are translated into sets of calls to the RAL • Note: Also supports RDQL – based on SquishQL
Admin Module • Main functions: • Add RDF data/schema information • Clear repository • Retrieves information from an RDF(S) source and parses it using the SiRPAC RDF parser • Parser delivers information to admin module in statement form – (S,P,O) • Module checks statements for consistency and then inserts data
RDF Export Module • Exports the contents of a repository formatted in XML-serialized RDF • Supplies a basis for using Sesame in combination with other RDF tools
Communication with Sesame • Multiple options for various contexts • HTTP • RMI • SOAP • Intermediaries between the functional modules and their clients
Sesame - Scalability • Performance Tests • Uploaded and queried collection of nouns from WordNet – 400,000 RDF statements • Performed on Sun UltraSPARC 5, 256 MB RAM • Used Java Servlets running on web server to communicate over HTTP • PostgreSQL version 7.1.2 repository
Scalability Continued • Uploading nouns • 94 minutes • 71 statements per second • Querying was much slower than expected • Due to distributed storage over multiple tables • Retrieving data required doing many joins
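The reported upload rate follows directly from the figures on the two scalability slides, as this short check shows:

```python
statements = 400_000      # WordNet nouns, as RDF statements
minutes = 94              # reported upload time

rate = statements / (minutes * 60)
print(round(rate))  # 71 statements per second
```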
Sesame’s Future • Migration of Sesame to alternate repositories to boost performance • DAML + OIL support
RQL Introduction • Museum schema example
RQL - Syntax • Query typically built upon three clauses • Select • Projection over query results • From • Bind variables to specific locations in graph model • Where • Optional – constraint on values of variables in the from clause
RQL - Example select X, @P from {X} @P {Y} where Y like "Pablo" • X and Y are bound to nodes • @P bound to a connecting edge - @ prefix signifies the variable is bound to properties • $ prefix signifies classes • http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
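What this query asks for can be mimicked over a plain list of triples. The sketch below is not an RQL engine: the data is made up, and `like` is modelled as a simple substring test, which is an assumption about its behaviour.

```python
# Hypothetical toy graph as (subject, property, object) triples.
triples = [
    ("picasso", "first_name", "Pablo"),
    ("picasso", "paints", "guernica"),
    ("rodin", "first_name", "Auguste"),
]

# from {X} @P {Y}        -> iterate over every edge, binding X, @P and Y
# where Y like "Pablo"   -> keep edges whose target value matches
# select X, @P           -> project the source node and the property
result = [(x, p) for (x, p, y) in triples if "Pablo" in y]
print(result)
```

The three clauses map onto iteration, filtering, and projection, in that order.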
RQL - Namespaces • In RDF, nodes and edges are identified by URIs • Can be very long • Namespace abbreviation mechanism • Extra clause • using namespace cult = http://www.icom.com/schema.rdf# • Simply type: cult:paints
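The abbreviation mechanism amounts to a prefix lookup. In this sketch the helper name `expand` is hypothetical; only the `cult` prefix and its URI come from the slide.

```python
namespaces = {"cult": "http://www.icom.com/schema.rdf#"}


def expand(qname, namespaces):
    """Replace a declared prefix with its full namespace URI."""
    prefix, sep, local = qname.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return qname  # no declared prefix: leave the name untouched


full = expand("cult:paints", namespaces)
print(full)
```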
RQL – Path Expressions • Specify a linear path through the graph select PAINTER, PAINTING, TECH from {PAINTER} cult:paints {PAINTING}. cult:technique {TECH} using namespace cult = http://www.icom.com/schema.rdf# • http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
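A path expression of this shape can be read as a walk along two consecutive edges. The following sketch evaluates it by graph traversal over an adjacency index; the data and the function name `two_step_path` are assumptions for illustration, not how Sesame evaluates RQL.

```python
from collections import defaultdict

CULT = "http://www.icom.com/schema.rdf#"

# Hypothetical toy graph; the resource names are made up.
triples = [
    ("picasso", CULT + "paints", "guernica"),
    ("guernica", CULT + "technique", "oil on canvas"),
    ("rembrandt", CULT + "paints", "nightwatch"),
    ("rodin", CULT + "sculpts", "thinker"),
]

# Adjacency index: (subject, predicate) -> list of objects.
outgoing = defaultdict(list)
for s, p, o in triples:
    outgoing[(s, p)].append(o)


def two_step_path(first, second):
    """Follow a `first` edge, then a `second` edge from the node reached."""
    results = []
    for s, p, o in triples:
        if p != first:
            continue
        for end in outgoing.get((o, second), []):
            results.append((s, o, end))
    return results


matches = two_step_path(CULT + "paints", CULT + "technique")
print(matches)
```

Only paintings that also have a technique edge complete the path, so `rembrandt` does not appear in the result.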
RQL – Querying Schema • Retrieving the class of a resource select X, $X, Y from {X : $X} cult:paints {Y} using namespace cult = http://www.icom.com/schema.rdf# • Variable $X is matched to the class of the resource value of X • http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
RQL – Querying Schema • Constraining resources to a schema select X, Y from {X : cult:Cubist } cult:paints {Y} using namespace cult = http://www.icom.com/schema.rdf#
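The two schema-querying forms differ only in whether the class is a variable or a constant. This sketch uses made-up data, gives each resource exactly one class, and ignores subclass inference, all of which are simplifying assumptions.

```python
# Hypothetical instance data: resource -> class, plus paints edges.
types = {"picasso": "Cubist", "braque": "Cubist", "rembrandt": "Flemish"}
paints = [
    ("picasso", "guernica"),
    ("rembrandt", "nightwatch"),
    ("braque", "violin"),
]

# {X : $X} cult:paints {Y}  -> also report the class of each painter
with_class = [(x, types[x], y) for x, y in paints]

# {X : cult:Cubist} cult:paints {Y}  -> keep only painters of that class
cubists = [(x, y) for x, y in paints if types[x] == "Cubist"]

print(with_class)
print(cubists)
```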
RQL – Standard Functions • Class (also Property) • subClassOf (also subPropertyOf) • typeOf • In all of the above, use ^ for only direct descendants (i.e. subClassOf^( cult:Painter ) )
RQL – subClassOf • Example: select X, @P, Y from {X} @P {Y} where X in subClassOf^( cult:Painter ) using namespace cult = http://www.icom.com/schema.rdf#
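The difference between `subClassOf` and `subClassOf^` is the difference between a transitive closure and a single step. The class hierarchy below is a hypothetical example, and the function names are chosen for illustration.

```python
# Hypothetical hierarchy: child class -> direct parent class.
parent_of = {
    "Painter": "Artist",
    "Sculptor": "Artist",
    "Cubist": "Painter",
    "Flemish": "Painter",
}


def direct_subclasses(cls):
    """Analogue of subClassOf^(cls): immediate children only."""
    return sorted(child for child, parent in parent_of.items() if parent == cls)


def all_subclasses(cls):
    """Analogue of subClassOf(cls): children, grandchildren, and so on."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for child in direct_subclasses(current):
            if child not in found:
                found.add(child)
                stack.append(child)
    return sorted(found)


print(direct_subclasses("Artist"))
print(all_subclasses("Artist"))
```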
RQL – Advanced Queries • Set Operators • Union, Intersection, Difference • Logical Operators • Domain and Range Constraints • Comprehensive List: http://sesame.aidministrator.nl/publications/rql-tutorial.html
Future of RDF Databases • Standard query language • Improved storage structures • Native graph model
References / Links • Sesame: http://sesame.aidministrator.nl/ • NLnet Foundation: http://www.nlnet.nl/ • Original Specifications of RQL: http://139.91.183.30:9090/RDF/RQL