
T-SPARQL: a TSQL2-like Temporal Query Language for RDF

First International Workshop on Querying Graph Structured Data – GraphQ 2010 (in conj. with ADBIS 2010 – Novi Sad, Serbia, September 2010). T-SPARQL: a TSQL2-like Temporal Query Language for RDF. Fabio Grandi, Alma Mater Studiorum - Università degli Studi di Bologna.


Presentation Transcript


  1. First International Workshop on Querying Graph Structured Data – GraphQ 2010 (in conj. with ADBIS 2010 – Novi Sad, Serbia, September 2010) • T-SPARQL: a TSQL2-like Temporal Query Language for RDF • Fabio Grandi, Alma Mater Studiorum - Università degli Studi di Bologna

  2. Introduction • Some application fields require the maintenance of past versions of an RDF graph (e.g. encoding a domain ontology) after changes • For instance, in the legal domain: • Ontologies evolve as a natural consequence of the dynamics involved in normative systems • Agents must often deal with a past perspective (e.g. a Court judging today on some fact committed in the past) • Moreover, several time dimensions are usually important for applications in such domains GraphQ 2010 - F. Grandi - T-SPARQL: a TSQL2-like Temporal Query Language for RDF

  3. Multi-temporal versioning • Time dimensions of interest in the legal domain: • Validity time is the time a norm is in force in the real world • Efficacy time is the time a norm can be applied to a concrete case; while such cases exist, the norm continues its efficacy though no longer in force • Transaction time is the time a norm is stored in the computer system • Publication time is the time a norm is published in the Official Journal

  4. Temporal RDF Data Models • Temporal RDF data models have recently been proposed; notable proposals include [Gutierrez, Hurtado & Vaisman, 2007], [Pugliese, Udrea & Subrahmanian, 2008] and [Tappolet & Bernstein, 2009] • Index structures (e.g. tGRIN and keyTree) have been proposed for efficient processing of temporal queries • Interval timestamping of RDF triples is adopted • A single time dimension (valid time) is usually considered

  5. Temporal SPARQL Extensions • Temporal extensions of the SPARQL query language for RDF have been proposed, including: • extensions not based on a temporal data model [Frasincar, Borsje & Levering, 2009] • extensions based on temporal logic [Mateescu, Meriot & Rampacek, 2009] • extensions based on mapping to plain SPARQL [Tappolet & Bernstein, 2009] • Interval timestamping of RDF triples is adopted • A single time dimension (valid time) is usually considered

  6. The TSQL2 Temporal Query Language • A consensual temporal extension of the standard database language SQL-92 • Defined by a design committee of 18 temporal database experts chaired by Richard Snodgrass • It represents the synthesis of more than a decade of work on temporal query languages • It aimed to collect the best features of the previously proposed languages in terms of expressiveness and user-friendliness • Specification published as a book in 1995

  7. The T-SPARQL Proposal • Based on the temporal data model presented in F. Grandi, “Multi-temporal RDF Ontology versioning”, IWOD Workshop, 2009: • multiple time dimensions are considered… • temporal-element timestamping is adopted… • … in order to preserve the scalability property of triple storage technology • Presenting the main features of the TSQL2 language: • TSQL2-like temporal data types and operators • TSQL2-like temporal selection and projection facilities

  8. The Multi-temporal RDF Database Model • N-dimensional time domain: $\mathcal{T} = T_1 \times T_2 \times \dots \times T_N$, with $T_i = [0, \mathrm{UC})_i$ • Multi-temporal RDF triple: $(s, p, o \mid T)$, where $s$ is a subject, $p$ is a predicate, $o$ is an object and $T \subseteq \mathcal{T}$ is a timestamp • Multi-temporal RDF database: $\text{RDF-TDB} = \{\, (s, p, o \mid T) \mid T \subseteq \mathcal{T} \,\}$

  9. Multi-temporal RDF Triples • A temporal triple $(s, p, o \mid T)$ assigns a temporal pertinence to an RDF triple $(s, p, o)$ • The non-temporal triple $(s, p, o)$ is the value (or the contents) of the temporal triple $(s, p, o \mid T)$ • The temporal pertinence $T$ is a subset of the time domain $\mathcal{T}$ represented by a temporal element

  10. Temporal Elements • A temporal element [Gadia 1998] is a disjoint union of temporal intervals • Multi-temporal intervals are obtained as the Cartesian product of one interval for each temporal dimension • $T = \bigcup_{1 \le j \le m} I_j = \bigcup_{1 \le j \le m} [t_j^s, t_j^e)_1 \times [t_j^s, t_j^e)_2 \times \dots \times [t_j^s, t_j^e)_N$ • $I_j \cap I_k = \emptyset$ for all $1 \le j < k \le m$
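
The definition of a temporal element can be illustrated with a minimal Python sketch. It is not taken from the slides: it assumes integer chronons, half-open intervals, and a large integer standing in for UC, and the names `Box`, `boxes_disjoint`, `is_temporal_element` and `contains_point` are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Interval = Tuple[int, int]   # half-open [start, end) over integer chronons (assumption)
Box = Tuple[Interval, ...]   # one interval per time dimension

UC = 10**9                   # stand-in for the moving "UC" bound (assumption)


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one chronon."""
    return a[0] < b[1] and b[0] < a[1]


def boxes_disjoint(a: Box, b: Box) -> bool:
    """Two boxes intersect only if they overlap in every dimension."""
    return not all(intervals_overlap(x, y) for x, y in zip(a, b))


def is_temporal_element(boxes: List[Box]) -> bool:
    """Check the pairwise-disjointness condition on the boxes."""
    return all(boxes_disjoint(a, b) for a, b in combinations(boxes, 2))


def contains_point(boxes: List[Box], point: Tuple[int, ...]) -> bool:
    """True if the time point lies in at least one box of the element."""
    return any(
        all(lo <= t < hi for (lo, hi), t in zip(box, point))
        for box in boxes
    )


# Bitemporal example: [0,10) x [0,UC)  U  [10,UC) x [0,10)
element = [((0, 10), (0, UC)), ((10, UC), (0, 10))]
```

Here `is_temporal_element(element)` holds because the two boxes do not overlap in the first dimension, even though they do overlap in the second.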

  11. Integrity Constraint • No value-equivalent distinct triples exist: $\forall\, (s, p, o \mid T), (s', p', o' \mid T') \in \text{RDF-TDB}: s = s' \wedge p = p' \wedge o = o' \Rightarrow T = T'$ • The constraint is made possible by the adoption of temporal element timestamping • Temporal elements lead to space saving whenever the temporal pertinence of a triple is not a convex interval

  12. Memory Saving with Temporal Elements • For example, even with a monodimensional time domain, the two value-equivalent triples with interval timestamping ($t_2 < t_3$): $(s, p, o \mid [t_1, t_2))$ and $(s, p, o \mid [t_3, t_4))$ can be merged into a single triple with element timestamping: $(s, p, o \mid [t_1, t_2) \cup [t_3, t_4))$, where the same space is required for the timestamps in both cases (i.e. the space needed by 4 time points), but the contents of the triple are stored twice in the former case and only once in the latter • Different triple versions are stored only once with a complex timestamp instead of storing multiple copies (value-equivalent triples) with a simple timestamp
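
The space argument for the monodimensional case can be made concrete with a small back-of-the-envelope sketch. The cost model is an assumption made here for illustration (each interval costs two time points, and the 120-byte triple and 8-byte time point are example values); the function names are hypothetical.

```python
def interval_stamping_bytes(triple_bytes: int, n_intervals: int, point_bytes: int = 8) -> int:
    """One copy of the triple contents per interval, each with 2 time points."""
    return n_intervals * (triple_bytes + 2 * point_bytes)


def element_stamping_bytes(triple_bytes: int, n_intervals: int, point_bytes: int = 8) -> int:
    """A single copy of the triple contents plus 2 time points per interval."""
    return triple_bytes + n_intervals * 2 * point_bytes


# Two value-equivalent triples ([t1,t2) and [t3,t4)) versus one merged triple.
with_intervals = interval_stamping_bytes(120, 2)
with_elements = element_stamping_bytes(120, 2)
saved = with_intervals - with_elements   # exactly one copy of the triple contents
```

Under this model the timestamp space is identical in both cases (4 time points), so the difference is always `(n_intervals - 1)` copies of the triple contents.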

  13. An Example • The memory saving obtained with temporal elements grows with the dimensionality of the time domain! • The memory saving also depends on the triple size relative to the timestamp size • In very large RDF benchmark datasets, the average triple size ranges from 80–140 bytes (DBpedia, UScensus, LUBM, BSBM) to more than 600 bytes (UniProtKB) • The timestamp (date+time) data size in SQL is 6–8 bytes • In the example which follows we assume a bitemporal domain (valid + transaction time)

  14. Representation of the Evolution of a Triple
      [Figure: bitemporal diagram of the versions (s, p, o1), (s, p, o2) and (s, p, o3); axis ticks t0, t1, t2, UC]
      • With temporal elements (3 triples needed):
        ( s, p, o1 | [t0,t1)x[t0,UC) U [t1,UC)x[t0,t1) )
        ( s, p, o2 | [t1,t2)x[t1,UC) U [t2,UC)x[t1,t2) )
        ( s, p, o3 | [t2,UC)x[t2,UC) )
      • With temporal intervals (5 triples needed):
        ( s, p, o1 | [t0,t1)x[t0,UC) )
        ( s, p, o1 | [t1,UC)x[t0,t1) )
        ( s, p, o2 | [t1,t2)x[t1,UC) )
        ( s, p, o2 | [t2,UC)x[t1,t2) )
        ( s, p, o3 | [t2,UC)x[t2,UC) )

  15. Memory Saving Figures • Percentage space saving with temporal element vs interval timestamping. Avg. number of versions per triple in columns, triple size in bytes in rows. We assume 8-byte timestamps. • For instance, with 120-byte triples and 5 versions per triple on average, we have a 39.22% space saving. With 1 billion triples, this means an RDF-TDB size of: • 721 GB with temporal elements • 1.14 TB with temporal intervals

  16. Outline of the T-SPARQL language • Time representation (temporal datatypes) • Temporal projection and selection

  17. Time Representation • Like in TSQL2, time is discrete with a minimal system-dependent unit called chronon • Three base temporal datatypes: • Datetime: instantaneous event without duration, conventionally represented as a chronon • Period: set of consecutive chronons on the time axis, characterized by two datetime-type boundaries • Interval: pure duration, not anchored on the time axis, represented by a multiple of the chronon

  18. Temporal datatypes
      • The datetime datatype corresponds to the xs:dateTime XML Schema primitive datatype; examples:
        "2010-01-01"^^xs:date
        "2010-01-01T00:00:00.000+01:00"^^xs:dateTime
      • The interval datatype corresponds to the xs:duration XML Schema primitive datatype; examples:
        "P2Y"^^xs:duration
        "P1Y2M3DT5H20M30.123S"^^xs:duration

  19. Temporal datatypes
      • The period datatype requires the definition of a new datatype as an XML Schema extension: xs:period
      • with a new constructor:
        fn:period($arg1 as xs:dateTime, $arg2 as xs:dateTime) as xs:period
        example:
        "[2010-01-01,2010-01-31]"^^xs:period
        equiv. to
        fn:period("2010-01-01"^^xs:date, "2010-01-31"^^xs:date)

  20. The xs:period datatype
      • The xs:period datatype is assumed to be compatible with the standard xs:gYearMonth and xs:gYear datatypes:
        "[2010-01-01,2010-01-31]"^^xs:period = "2010-01"^^xs:gYearMonth
        "[2009-01-01,2009-12-31]"^^xs:period = "2009"^^xs:gYear

  21. The xs:period datatype
      • Two predefined functions can be used to extract the left and right boundaries from xs:period data:
        fn:begin($arg1 as xs:period) as xs:dateTime
        fn:end($arg1 as xs:period) as xs:dateTime
        examples:
        fn:begin("[2010-01-01, 2010-01-31]"^^xs:period) = "2010-01-01"^^xs:date
        fn:end("2009"^^xs:gYear) = "2009-12-31"^^xs:date
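
The behaviour expected from xs:period, including its assumed compatibility with xs:gYear and xs:gYearMonth, can be mimicked with a short Python sketch. This is only an illustration at day granularity with closed bounds; the `Period` class and the `fn_begin`/`fn_end` helpers are hypothetical stand-ins for the proposed datatype and functions.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Period:
    """Closed period [begin, end] at day granularity (assumption)."""
    begin: date
    end: date

    def __post_init__(self) -> None:
        if self.end < self.begin:
            raise ValueError("period end precedes period begin")

    @classmethod
    def from_gyear(cls, year: int) -> "Period":
        """Period covered by a calendar year, like "2009"^^xs:gYear."""
        return cls(date(year, 1, 1), date(year, 12, 31))

    @classmethod
    def from_gyearmonth(cls, year: int, month: int) -> "Period":
        """Period covered by a calendar month, like "2010-01"^^xs:gYearMonth."""
        last_day = calendar.monthrange(year, month)[1]
        return cls(date(year, month, 1), date(year, month, last_day))


def fn_begin(p: Period) -> date:
    """Left boundary of a period."""
    return p.begin


def fn_end(p: Period) -> date:
    """Right boundary of a period."""
    return p.end
```

With these definitions, `Period.from_gyearmonth(2010, 1)` compares equal to `Period(date(2010, 1, 1), date(2010, 1, 31))`, mirroring the equivalences listed on the slides.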

  22. The xs:temporalElement datatype
      • We also assume a new primitive xs:temporalElement datatype to be defined to represent temporal elements
      • The constructor has a variable number of xs:period-type arguments, example:
        fn:temporalElement( "[2008-06-01,2009-07-15]"^^xs:period, "[2009-11-01,2010-02-21]"^^xs:period )
        = "[2008-06-01,2009-07-15]+[2009-11-01,2010-02-21]"^^xs:temporalElement

  23. Built-in functions for xs:temporalElement
      • Like in TSQL2, useful functions are available to extract the first (last) period from an element:
        fn:first($arg1 as xs:temporalElement) as xs:period
        fn:last($arg1 as xs:temporalElement) as xs:period
      • In order to extract the first (last) chronon of an element, the fn:begin (fn:end) function can also be applied directly to elements, that is:
        fn:begin(T) = fn:begin(fn:first(T))
        fn:end(T) = fn:end(fn:last(T))
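
A minimal Python sketch of these four functions over a one-dimensional temporal element follows. It assumes an element is a non-empty list of disjoint closed periods given as `(begin, end)` date pairs; the function names simply mirror the proposed `fn:` functions and are not an existing library API.

```python
from datetime import date
from typing import List, Tuple

Period = Tuple[date, date]   # closed [begin, end] (assumption)


def first(element: List[Period]) -> Period:
    """Earliest period of the element."""
    return min(element, key=lambda p: p[0])


def last(element: List[Period]) -> Period:
    """Latest period of the element."""
    return max(element, key=lambda p: p[1])


def begin(element: List[Period]) -> date:
    """First chronon of the element: begin(first(T))."""
    return first(element)[0]


def end(element: List[Period]) -> date:
    """Last chronon of the element: end(last(T))."""
    return last(element)[1]


# "[2008-06-01,2009-07-15]+[2009-11-01,2010-02-21]"
element = [
    (date(2009, 11, 1), date(2010, 2, 21)),
    (date(2008, 6, 1), date(2009, 7, 15)),
]
```

Because the periods of an element are disjoint, ordering by begin and ordering by end agree, so `first` and `last` do not depend on the storage order.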

  24. Temporal Projection • Specifies which temporal pertinence has to be assigned to the results of a T-SPARQL query • The query result can be: • a temporal RDF graph consistent with the underlying data model (timeslice query) • a regular, non-temporal RDF graph (snapshot query) • an arbitrary tuple set • A TSQL2-like INTERSECT clause is available to assign the right temporal pertinence to timeslice query results

  25. Temporal Projection
      • Given a time point $t = (t_1, t_2, \dots, t_N) \in \mathcal{T}$, we define the RDF database snapshot valid at $t$ as
        $\text{RDF-TDB}(t) = \{\, (s, p, o) \mid (s, p, o \mid T) \in \text{RDF-TDB} \wedge t \in T \,\}$
        In T-SPARQL:
        CONSTRUCT { ?s,?p,?o }
        WHERE { TGRAPH < …myURI… > { ?s, ?p, ?o | ?t } .
                FILTER ?t CONTAINS "(t1, t2,…, tN)" . }
      • Given a time period $I = I_1 \times I_2 \times \dots \times I_N \subseteq \mathcal{T}$, we define the RDF database timeslice valid in $I$ as
        $\text{RDF-TDB}(I) = \{\, (s, p, o \mid T') \mid (s, p, o \mid T) \in \text{RDF-TDB} \wedge T' = T \cap I \neq \emptyset \,\}$
        In T-SPARQL:
        TCONSTRUCT { ?s,?p,?o | INTERSECT( ?t, "(I1 x I2 x … x IN)" ) . }
        WHERE { TGRAPH < …myURI… > { ?s, ?p, ?o | ?t } }
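
The snapshot and timeslice definitions can be sketched in Python as follows. This is an illustrative in-memory model, not a T-SPARQL engine: it assumes integer chronons, half-open intervals, a large integer standing in for UC, and a dictionary keyed by triple (which also enforces the "no value-equivalent triples" constraint). The names `snapshot`, `timeslice` and `intersect_box` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Interval = Tuple[int, int]            # half-open [start, end) (assumption)
Box = Tuple[Interval, ...]            # one interval per time dimension
Triple = Tuple[str, str, str]
TDB = Dict[Triple, List[Box]]         # triple -> temporal element

UC = 1000                             # stand-in for "UC" (assumption)


def snapshot(db: TDB, point: Tuple[int, ...]) -> Set[Triple]:
    """Triples whose temporal element contains the given time point."""
    return {
        triple
        for triple, boxes in db.items()
        if any(all(lo <= t < hi for (lo, hi), t in zip(box, point)) for box in boxes)
    }


def intersect_box(a: Box, b: Box) -> Optional[Box]:
    """Intersection of two boxes, or None if it is empty."""
    clipped = []
    for (alo, ahi), (blo, bhi) in zip(a, b):
        lo, hi = max(alo, blo), min(ahi, bhi)
        if lo >= hi:
            return None
        clipped.append((lo, hi))
    return tuple(clipped)


def timeslice(db: TDB, window: Box) -> TDB:
    """Restrict every timestamp to the window, dropping empty results."""
    result: TDB = {}
    for triple, boxes in db.items():
        kept = [c for c in (intersect_box(b, window) for b in boxes) if c is not None]
        if kept:
            result[triple] = kept
    return result


# The three-version example with t0=0, t1=10, t2=20.
db: TDB = {
    ("s", "p", "o1"): [((0, 10), (0, UC)), ((10, UC), (0, 10))],
    ("s", "p", "o2"): [((10, 20), (10, UC)), ((20, UC), (10, 20))],
    ("s", "p", "o3"): [((20, UC), (20, UC))],
}
```

For instance, `snapshot(db, (15, 15))` selects only the `o2` version, while `timeslice(db, ((0, 10), (0, 10)))` keeps only `o1` with its timestamp clipped to the window.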

  26. Timestamp Variables
      • Graph patterns to be used in the WHERE clause of the SELECT statement are augmented with an optional fourth position where matching with triple timestamps can be specified, e.g.
        _:e ex:Dept "Toys" | ?t
        where the variable ?t binds to the timestamp of a temporal triple whose (non-temporal) contents are:
        _:e ex:Dept "Toys"
        i.e. the timestamp variable ?t represents the time during which an employee denoted by the blank node _:e has been working in the Toys department

  27. Temporal Selection
      • In the T-SPARQL FILTER clause, TSQL2-like temporal (binary infix) predicates can be used to specify constraints over timestamp variables, e.g.
        FILTER ( VALID(?t) OVERLAPS "[2010-01-01,2010-01-31]"^^xs:period &&
                 TRANSACTION(?t) CONTAINS "2009-06-01"^^xs:date )
        which only matches timestamps ?t whose valid time component overlaps January 2010 and whose transaction time component contains the June 1, 2009 time point, i.e. the temporal triple whose timestamp is bound to ?t is selected only if it is (even partially) valid in January 2010, as of June 1, 2009.
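
The semantics of such a filter can be approximated with the following Python sketch. It assumes a bitemporal timestamp is a list of (valid period, transaction period) pairs with closed date bounds, uses `date.max` as a stand-in for UC, and evaluates each predicate on the corresponding projection of the timestamp; the window is taken to be January 2010, as in the prose description. All function names are hypothetical.

```python
from datetime import date
from typing import List, Tuple

Period = Tuple[date, date]        # closed [begin, end] (assumption)
Rect = Tuple[Period, Period]      # (valid-time period, transaction-time period)
UC = date.max                     # stand-in for "UC" (assumption)


def valid(ts: List[Rect]) -> List[Period]:
    """Projection of a bitemporal timestamp on valid time."""
    return [v for v, _ in ts]


def transaction(ts: List[Rect]) -> List[Period]:
    """Projection of a bitemporal timestamp on transaction time."""
    return [x for _, x in ts]


def overlaps(periods: List[Period], window: Period) -> bool:
    """True if some period shares at least one day with the window."""
    return any(b <= window[1] and window[0] <= e for b, e in periods)


def contains(periods: List[Period], day: date) -> bool:
    """True if some period contains the given day."""
    return any(b <= day <= e for b, e in periods)


def matches(ts: List[Rect], window: Period, as_of: date) -> bool:
    """VALID(ts) OVERLAPS window && TRANSACTION(ts) CONTAINS as_of."""
    return overlaps(valid(ts), window) and contains(transaction(ts), as_of)


january_2010 = (date(2010, 1, 1), date(2010, 1, 31))
as_of = date(2009, 6, 1)
ts = [((date(2009, 12, 1), date(2010, 1, 15)), (date(2009, 1, 1), UC))]
selected = matches(ts, january_2010, as_of)
```

A timestamp whose transaction time starts after June 1, 2009 would be rejected by the same call, which corresponds to the "as of" reading given above.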

  28. Temporal Selection Operators • The available comparison operators are the same as in TSQL2 • They can be used to compare (monodimensional) temporal elements, periods and time points; operands of different types can also be compared (owing to reducibility to chronon sets) • The user-friendly operators, whose definition is close to their meaning in English, form a non-minimal but complete set, equivalent to Allen’s algebra for intervals and time points

  29. Query Examples
      • We assume ex: is a prefix referencing a namespace involving the definition of employee data:
        @prefix ex: <http://myExample.org/employee/> .
      • Sample employee data (temporal RDF graph):
        _:emp1 rdf:type ex:emp
        _:emp1 ex:Name "Ann"
        _:emp1 ex:Salary "2200"^^xs:integer | "[2009-06-01,2009-09-30]+[2009-06-01,UC]"^^xs:temporalElement
        _:emp2 rdf:type ex:emp
        _:emp2 ex:Name "Tom"
        _:emp2 ex:Salary "2000"^^xs:integer | "[2008-01-01,2008-12-31]"^^xs:temporalElement
        _:emp2 ex:Salary "2200"^^xs:integer | "[2009-01-01,UC]"^^xs:temporalElement

  30. Query Example (1)
      • A query involving both temporal selection and projection (result not organized as a temporal RDF graph)
        SELECT ?salary INTERSECT(?t,"[2007-01-01,2009-12-31]")
        WHERE {
          ?emp rdf:type ex:emp ;
               ex:Name "Tom" ;
               ex:Salary ?salary | ?t .
          FILTER ( VALID(?t) OVERLAPS "[2007-01-01,2009-12-31]"^^xs:period ) .
        }
      • The query retrieves Tom’s salary history from 2007 to 2009
      • An implied conjunct && TRANSACTION(?t) CONTAINS fn:current-date() is assumed in the FILTER clause to retrieve only current data

  31. Query Example (2)
      • A similar query can be used to retrieve the same data after a database rollback to the beginning of 2008
        SELECT ?salary INTERSECT(?t,"[2007-01-01,2009-12-31]")
        WHERE {
          ?emp rdf:type ex:emp ;
               ex:Name "Tom" ;
               ex:Salary ?salary | ?t .
          FILTER ( VALID(?t) OVERLAPS "[2007-01-01,2009-12-31]"^^xs:period &&
                   TRANSACTION(?t) CONTAINS "2008-01-01"^^xs:date ) .
        }
      • The query retrieves Tom’s salary history from 2007 to 2009, as of January 1, 2008

  32. Query Example (3)
      • A query involving both temporal selection and projection (result not organized as a temporal RDF graph)
        SELECT ?salary INTERSECT(?t,"[2007-01-01,2009-12-31]")
        WHERE {
          ?emp rdf:type ex:emp ;
               ex:Name "Tom" ;
               ex:Salary ?salary | ?t .
          FILTER ( VALID(?t) OVERLAPS "[2007-01-01,2009-12-31]"^^xs:period ) .
        }
      • The query retrieves Tom’s salary history from 2007 to 2009
      • An implied conjunct && TRANSACTION(?t) CONTAINS fn:current-date() is assumed in the FILTER clause to retrieve only current data

  33. Query Example (4)
      • A query involving a sort of temporal join, based on a comparison between the durations of two validity periods
        SELECT ?ename
        WHERE {
          ?emp1 rdf:type ex:emp ;
                ex:Name "Ann" ;
                ex:Salary ?salary | ?ts .
          ?emp2 rdf:type ex:emp ;
                ex:Name ?ename ;
                ex:Dept "Toys" | ?tt .
          FILTER ( ?salary > "20000"^^xs:integer &&
                   xs:duration(VALID(?tt)) > xs:duration(VALID(?ts)) ) .
        }
      • The query retrieves the names of the employees (?emp2) who have worked in the Toys department for longer than Ann (?emp1) has earned more than $20,000

  34. Query Example (5)
      • An optional modifier PERIOD can be specified in the declaration of temporal variables
        SELECT ?ename
        WHERE {
          ?emp1 rdf:type ex:emp ;
                ex:Name ?ename ;
                ex:Dept "Sales" | ?t .
          FILTER ( xs:duration(VALID(?t)) > "P2Y"^^xs:duration ) .
        }
      • This first query version (without PERIOD) retrieves the names of the employees who worked in the Sales department for more than two years (altogether)

  35. Query Example (6)
      • An optional modifier PERIOD can be specified in the declaration of temporal variables
        SELECT ?ename
        WHERE {
          ?emp1 rdf:type ex:emp ;
                ex:Name ?ename ;
                ex:Dept "Sales" | ?t PERIOD .
          FILTER ( xs:duration(VALID(?t)) > "P2Y"^^xs:duration ) .
        }
      • This second query version retrieves the names of the employees who worked (continuously) in the Sales department for a period longer than two years
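
The difference between the two query versions (with and without PERIOD) can be illustrated with a short Python sketch. It assumes a valid-time element given as a list of maximal, disjoint closed periods at day granularity, and it approximates the "P2Y" duration as 730 days, which is a simplification made here; the function names and the sample data are hypothetical.

```python
from datetime import date
from typing import List, Tuple

Period = Tuple[date, date]    # closed [begin, end], day granularity (assumption)

TWO_YEARS_DAYS = 2 * 365      # crude stand-in for "P2Y" (assumption)


def days(period: Period) -> int:
    """Number of days in a closed period."""
    return (period[1] - period[0]).days + 1


def total_days(element: List[Period]) -> int:
    """Duration of the whole element (no PERIOD modifier)."""
    return sum(days(p) for p in element)


def longest_period_days(element: List[Period]) -> int:
    """Duration of the longest single period (PERIOD modifier)."""
    return max(days(p) for p in element)


# Hypothetical history: two separate stints in the Sales department.
sales = [
    (date(2005, 1, 1), date(2006, 6, 30)),
    (date(2008, 1, 1), date(2009, 3, 31)),
]
matches_without_period = total_days(sales) > TWO_YEARS_DAYS
matches_with_period = longest_period_days(sales) > TWO_YEARS_DAYS
```

In this example the employee qualifies for the version without PERIOD (more than two years altogether) but not for the version with PERIOD, since neither stint alone exceeds two years.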

  36. Query Example (7)
      • The PERIOD modifier can also be used to reference consecutive periods within the same data history
        SELECT ?ename ?job
        WHERE {
          ?emp rdf:type ex:emp ;
               ex:Name ?ename ;
               ex:Job ?job | ?t1 PERIOD ;
               ex:Job "Director" | ?t2 PERIOD ;
               ex:Job ?job | ?t3 PERIOD .
          FILTER ( VALID(?t1) MEETS VALID(?t2) && VALID(?t2) MEETS VALID(?t3) ) .
        }
      • This query retrieves the names of the employees who returned to their previous job (?job) after having been directors for some time
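
The chained MEETS conditions can be sketched in Python over a single employee's job history. The sketch assumes closed periods at day granularity and reads "a MEETS b" as "b starts on the day after a ends", which is an interpretation adopted here for illustration; the history data and function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Set, Tuple

Period = Tuple[date, date]            # closed [begin, end] (assumption)
JobHistory = List[Tuple[str, Period]]


def meets(a: Period, b: Period) -> bool:
    """True if b starts on the day right after a ends (assumed reading of MEETS)."""
    return a[1] + timedelta(days=1) == b[0]


def jobs_resumed_after_directing(history: JobHistory) -> Set[str]:
    """Jobs held right before and right after a stint as Director."""
    resumed: Set[str] = set()
    for job1, p1 in history:
        for job2, p2 in history:
            if job2 != "Director" or not meets(p1, p2):
                continue
            for job3, p3 in history:
                if job3 == job1 and meets(p2, p3):
                    resumed.add(job1)
    return resumed


history: JobHistory = [
    ("Manager", (date(2005, 1, 1), date(2006, 12, 31))),
    ("Director", (date(2007, 1, 1), date(2008, 12, 31))),
    ("Manager", (date(2009, 1, 1), date(2009, 12, 31))),
]
resumed = jobs_resumed_after_directing(history)
```

The three nested loops play the role of the three timestamp variables ?t1, ?t2 and ?t3, each ranging over the periods of the same history.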

  37. Conclusions and Future Work • We presented T-SPARQL, a temporal SPARQL extension supporting the temporal RDF database model we introduced in [Grandi 2009], which employs triple timestamping with multi-dimensional temporal elements • T-SPARQL is equipped with the basic temporal constructs introduced for the TSQL2 query language and works with an extended set of the temporal datatypes, functions and operators available in the SPARQL specification • Future work will consider the design and implementation of a prototype query engine supporting a T-SPARQL interface and the adoption of suitable index and storage structures for efficiently querying temporal RDF graphs
