
RDF Aggregate Queries and Views

RDF Aggregate Queries and Views. Edward Hung, Yu Deng, V.S. Subrahmanian University of Maryland, College Park ICDE 2005, April 7, Tokyo, Japan. Maintenance of RDF Aggregate Views. Introduction of RDF and RDQL RDQL Extension for Aggregate Views Aggregate View Maintenance Algorithms AMX


Presentation Transcript


  1. RDF Aggregate Queries and Views Edward Hung, Yu Deng, V.S. Subrahmanian University of Maryland, College Park ICDE 2005, April 7, Tokyo, Japan

  2. Maintenance of RDF Aggregate Views • Introduction of RDF and RDQL • RDQL Extension for Aggregate Views • Aggregate View Maintenance Algorithms AMX • Implementation and Experiments • Related Work

  3. Introduction • Resource Description Framework (RDF) • W3C Recommendation • Represents metadata about resources identifiable on the web (by Uniform Resource Identifier (URI)) • Triple: (Resource, Property, Value) • (Artist, rdf:type, rdfs:Class) • (Painter, rdf:type, rdfs:Class) • (Painter, rdfs:subClassOf, Artist)
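
As a minimal sketch of the triple model on this slide, the three example statements can be held as plain tuples. Plain strings stand in for URIs, and the helper name `values_of` is an assumption made for illustration only.

```python
# Each RDF statement is a (resource, property, value) triple.
# Plain strings stand in for full URIs (illustrative assumption).
triples = {
    ("Artist", "rdf:type", "rdfs:Class"),
    ("Painter", "rdf:type", "rdfs:Class"),
    ("Painter", "rdfs:subClassOf", "Artist"),
}


def values_of(graph, resource, prop):
    """Return every value v such that (resource, prop, v) is in the graph."""
    return {v for (r, p, v) in graph if r == resource and p == prop}


# Which classes is Painter a subclass of?
superclasses = values_of(triples, "Painter", "rdfs:subClassOf")
```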

  4. RDF Schema
     <?xml version="1.0"?>
     <!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
     <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
              xml:base="http://www.auctionschema.com/schema1#">
       <rdfs:Class rdf:ID="Artist"/>
       <rdfs:Class rdf:ID="Painter"><rdfs:subClassOf rdf:resource="#Artist"/></rdfs:Class>
       <rdfs:Datatype rdf:about="&xsd;string"/>
       <rdf:Property rdf:ID="fname">
         <rdfs:domain rdf:resource="#Artist"/>
         <rdfs:range rdf:resource="&xsd;string"/>
       </rdf:Property>
     </rdf:RDF>
     RDF Instance
     <?xml version="1.0"?>
     <!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
     <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              xmlns:ns1="http://www.auctionschema.com/schema1#">
       <rdf:Description rdf:about="http://www.artist.net#guyrose">
         <rdf:type rdf:resource="ns1:Painter"/>
         <ns1:fname rdf:datatype="&xsd;string"> Guy </ns1:fname>
       </rdf:Description>
     </rdf:RDF>

  5. RDF graph (figure) for the schema and instance on slide 4 • Painter is a subClassOf Artist • fname links Artist to String • &r1 has fname "Guy" • &r1 = http://www.artist.net#guyrose

  6. RDQL: RDF Query Language SELECT ?highprice WHERE (?artist, <ns1:lname>, "Rose"), (?artist, <ns1:fname>, "Guy"), (?artist, <ns1:creates>, ?artifact), (?artifact, <ns1:estimated>, ?price), (?price, <ns1:high>, ?highprice), (?artifact, <ns1:presented>, ?date) AND 2004-04-01 <= ?date <= 2004-04-30 USING ns1 FOR <http://www.auctionschema.com/schema1#> • The triples in the WHERE clause form the graph pattern

  7. RDQL Extension for Aggregates and Views CREATE VIEW AS SELECT max(?highprice) WHERE (?artist, <ns1:lname>, "Rose"), (?artist, <ns1:fname>, "Guy"), (?artist, <ns1:creates>, ?artifact), (?artifact, <ns1:estimated>, ?price), (?price, <ns1:high>, ?highprice), (?artifact, <ns1:presented>, ?date) AND 2004-04-01 <= ?date <= 2004-04-30 USING ns1 FOR <http://www.auctionschema.com/schema1#>

  8. Aggregate Query • Aggregate operators, e.g. min, max, sum, count, average • GROUP BY clause • Output a table of tuples • Output can be (i) an RDF instance or (ii) a table • Advantage of (i): allows us to further query the result • However, (ii) allows any form of table, including output as an RDF instance when the table consists of a set of RDF triples.

  9. We are extending the syntax of RDQL to allow constants in SELECT clauses, which in effect creates new resources from those constants. • For example, the previous query can be modified as follows: CREATE VIEW AS SELECT <ns1:works_by_guyrose>, <ns1:maxprice>, max(?highprice) WHERE (?artist, <ns1:lname>, "Rose"), (?artist, <ns1:fname>, "Guy"), (?artist, <ns1:creates>, ?artifact), (?artifact, <ns1:estimated>, ?price), (?price, <ns1:high>, ?highprice), (?artifact, <ns1:presented>, ?date) AND 2004-04-01 <= ?date <= 2004-04-30 USING ns1 FOR <http://www.auctionschema.com/schema1#> • The result is a valid RDF statement (<ns1:works_by_guyrose>, <ns1:maxprice>, "800000"^^ns1:USD)

  10. Aggregate View Maintenance • Relational Approach • Store all triples in a relational table with schema (Resource, Property, Value) OR • Store resources and values of the same property in a separate relational table with schema (Resource, Value) • #self-joins = (#triples in where-clause) – 1 • Large number of delta rules during relational view maintenance → expensive
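
To illustrate the self-join count, here is a small sketch using Python's built-in `sqlite3` module and a single (resource, property, value) table. The table name, the sample rows, and the two-pattern query are assumptions chosen for illustration; with two triple patterns in the WHERE clause, one self-join is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (resource TEXT, property TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("a1", "fname", "Guy"),
        ("a1", "creates", "p1"),
        ("a1", "creates", "p2"),
        ("a2", "fname", "Ann"),
        ("a2", "creates", "p3"),
    ],
)

# Pattern: (?artist, fname, "Guy"), (?artist, creates, ?artifact)
# Two triple patterns, so 2 - 1 = 1 self-join on the triples table.
rows = conn.execute(
    """
    SELECT t2.value
    FROM triples AS t1
    JOIN triples AS t2 ON t1.resource = t2.resource
    WHERE t1.property = 'fname' AND t1.value = 'Guy'
      AND t2.property = 'creates'
    ORDER BY t2.value
    """
).fetchall()
artifacts = [row[0] for row in rows]
```

The six-pattern query from the earlier slides would need five such joins under the same layout, which is where the cost of the relational approach comes from.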

  11. Aggregate View Maintenance • Our Approach • Localized search in RDF graphs • Modified version of breadth-first search starting at the inserted/deleted edge • auxiliary data are needed for certain aggregate views • min, max, avg

  12. Distributive Aggregate Function • An aggregate function f is distributive w.r.t. a source update operation if and only if the updated value can be computed from the old value and the update alone, without reference to the source. • Examples: count, sum, average w.r.t. insertion, deletion and update • For average, we need an additional attribute size which stores the size of the intermediate result S in order to compute the correct updated value (or we can use sum and count to calculate it) • max and min are distributive w.r.t. insertion, but not w.r.t. deletion and update • Auxiliary data computed from S help to avoid the need to refer to the source.
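
The distinction can be sketched in a few lines of Python. The function names are hypothetical; the point is that sum and count can be updated from the old value and the update alone, whereas two sources with the same max can disagree after the same deletion, so max cannot be maintained under deletion from its old value only.

```python
def sum_update(old_sum, inserted=(), deleted=()):
    """New sum from the old sum and the update only (no access to the source)."""
    return old_sum + sum(inserted) - sum(deleted)


def count_update(old_count, inserted=(), deleted=()):
    """New count from the old count and the update only."""
    return old_count + len(inserted) - len(deleted)


def max_after_delete(source, x):
    """Recompute max from the full source: needed because max is not
    distributive w.r.t. deletion."""
    rest = list(source)
    rest.remove(x)
    return max(rest)


s1 = [800000, 500000]
s2 = [800000, 60000]
# Both sources have the same old max, yet deleting 800000 gives different results.
same_old_max = max(s1) == max(s2)
new_max_1 = max_after_delete(s1, 800000)
new_max_2 = max_after_delete(s2, 800000)
```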

  13. graph pattern

  14. BAG

  15. BAG 800000

  16. BAG 800000, 500000 • SELECT max(?highprice)

  17. Compute Aggregates Algorithm CAA Algorithm CAA(I, Q) /* Input: RDF graph I, query Q */ /* Output: table T(Q, I) */ • GP ← BuildGP(Q); • X ← aggregate variables of Q; • Y ← GROUP BY variables of Q; • S ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearchAll(GP, Q, I)]; • return T(Q, I) ← TCompute(S, Q);
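
A rough Python sketch of the same flow, under simplifying assumptions: the graph is a set of tuples, the graph pattern is a list of triples whose variables start with "?", matching is plain backtracking, and there is no GROUP BY or date constraint. The names `match_pattern` and `caa` and the shortened pattern (without the intermediate ?price node) are illustrative, not the authors' implementation.

```python
def match_pattern(pattern, graph, binding=None):
    """Yield every variable binding that maps all pattern triples into the graph."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for term, val in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if new.setdefault(term, val) != val:
                    ok = False
                    break
            elif term != val:
                ok = False
                break
        if ok:
            yield from match_pattern(rest, graph, new)


def caa(graph, pattern, var, agg):
    """Collect the bag S of values of `var` over all matches, then aggregate."""
    bag = [theta[var] for theta in match_pattern(pattern, graph)]
    return (agg(bag) if bag else None), bag


graph = {
    ("a1", "lname", "Rose"), ("a1", "fname", "Guy"),
    ("a1", "creates", "p1"), ("a1", "creates", "p2"),
    ("p1", "high", 800000), ("p2", "high", 500000),
}
pattern = [
    ("?artist", "lname", "Rose"),
    ("?artist", "fname", "Guy"),
    ("?artist", "creates", "?artifact"),
    ("?artifact", "high", "?highprice"),
]
result, bag = caa(graph, pattern, "?highprice", max)
```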

  18. Aggregate View Maintenance Algorithms AMX • AMI – Insertion • AMD – Deletion • AMT – Triple Modification • AMR – Resource Modification

  19. BAG 800000, 500000 Update: Insertion paints

  20. BAG 800000, 500000 paints

  21. BAG 800000, 500000, 60000 • SELECT max(?highprice) • paints

  22. AMI for Insertion Algorithm AMI(I, Q, A(Q, I), T(Q, I), t) /* Input: RDF graph I, query Q, auxiliary data A(Q, I), query result T(Q, I), inserted triple t */ /* Output: table T(Q, I ∪ t), auxiliary data A(Q, I ∪ t) */ • GP ← BuildGP(Q); • X ← aggregate variables of Q; • Y ← GROUP BY variables of Q; • If TMatch(GP, t) == TRUE, then • ΔS ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearch(GP, Q, t, I ∪ t)]; • return (T(Q, I ∪ t), A(Q, I ∪ t)) ← TMaintainI(T(Q, I), ΔS, A(Q, I), Q); • else, return (T(Q, I ∪ t), A(Q, I ∪ t)) ← (T(Q, I), A(Q, I));

  23. Algorithm MSearch(GP, Q, t, I) /* Input: graph pattern GP, query Q, triple t, RDF graph I */ /* Output: Θ = {θ | θ is a pattern matching} */ • Θ ← ∅; • for each t’ ∈ GP s.t. ∃ θ’, tθ’ = t’θ’, • for each θ ∈ bSearch(t, t’, GP, I), • if θ satisfies the constraints in Q, then Θ ← Θ ∪ {θ}; • return Θ;
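
The idea of searching only around the updated triple can be sketched as follows. This is not the authors' bSearch: it seeds a binding from the updated triple and then extends it with plain backtracking over the graph rather than a breadth-first traversal, and it ignores query constraints. All function names and the sample data are assumptions.

```python
def unify(pattern_triple, triple, binding):
    """Extend `binding` so that pattern_triple maps onto triple, else return None."""
    new = dict(binding)
    for term, val in zip(pattern_triple, triple):
        if isinstance(term, str) and term.startswith("?"):
            if new.setdefault(term, val) != val:
                return None
        elif term != val:
            return None
    return new


def extend(pattern, graph, binding):
    """Yield all completions of `binding` that match the remaining pattern triples."""
    if not pattern:
        yield binding
        return
    for triple in graph:
        new = unify(pattern[0], triple, binding)
        if new is not None:
            yield from extend(pattern[1:], graph, new)


def msearch(pattern, t, graph):
    """Return the matchings of the whole pattern that map some pattern triple to t."""
    found = []
    for i, pattern_triple in enumerate(pattern):
        seed = unify(pattern_triple, t, {})
        if seed is None:
            continue  # t cannot play the role of this pattern triple
        rest = pattern[:i] + pattern[i + 1:]
        for theta in extend(rest, graph, seed):
            if theta not in found:
                found.append(theta)
    return found


pattern = [
    ("?artist", "fname", "Guy"),
    ("?artist", "creates", "?artifact"),
    ("?artifact", "high", "?highprice"),
]
t = ("a1", "creates", "p3")  # the inserted triple
graph = {
    ("a1", "fname", "Guy"),
    ("a1", "creates", "p1"), ("p1", "high", 800000),
    ("p3", "high", 60000),
    t,  # the graph passed in already contains t, as in AMI
}
delta = msearch(pattern, t, graph)
```

Only matchings that involve the inserted triple are returned, so the existing match through p1 is not recomputed.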

  24. Handling GROUP BY • From the GROUP BY clause, each tuple in ΔS affects a particular group. • TMaintainI only maintains each affected group (and its corresponding auxiliary data) using the affecting tuples. • Delete empty groups and insert new groups.
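
A minimal sketch of per-group maintenance, assuming a count aggregate kept in a dictionary keyed by group. The function name and data layout are hypothetical; deleted tuples are assumed to belong to groups that already exist.

```python
def maintain_group_counts(counts, inserted=(), deleted=()):
    """Update only the groups touched by the update; add new groups, drop empty ones."""
    counts = dict(counts)  # leave the caller's table unchanged
    for group in inserted:
        counts[group] = counts.get(group, 0) + 1  # creates the group if new
    for group in deleted:
        counts[group] -= 1
        if counts[group] == 0:
            del counts[group]  # an empty group disappears from the view
    return counts


view = {"a1": 2}
after_insert = maintain_group_counts(view, inserted=["a2"])
after_delete = maintain_group_counts(after_insert, deleted=["a2"])
```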

  25. TMaintainI • Handling sum, count, min, max • No auxiliary data required • Suppose f(x) is an aggregate function on attribute x, F the original result, F’ the new result • F’ = F + Σ_{v ∈ πx(ΔS)} v if f = sum • F’ = F + |ΔS| if f = count • F’ = min([F] ∪ πx(ΔS)) if f = min • F’ = max([F] ∪ πx(ΔS)) if f = max • πx(ΔS) projects a bag of values of x from ΔS

  26. TMaintainI • Handling average • We need the size of S • size’ = size + |ΔS|
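
The insertion rules for TMaintainI can be sketched in Python as below. The slide for average gives only the size update; the average formula used here, (F · size + sum of the new values) / size’, is the usual weighted recombination and is stated as an assumption. The function name and signature are illustrative.

```python
def tmaintain_insert(f, F, delta, size=None):
    """Return (F', size') after inserting the bag `delta` of new x-values.

    `size` is the auxiliary data |S|; it is only needed when f == "avg".
    """
    if not delta:
        return F, size
    if f == "sum":
        return F + sum(delta), size
    if f == "count":
        return F + len(delta), size
    if f == "min":
        return min([F] + list(delta)), size
    if f == "max":
        return max([F] + list(delta)), size
    if f == "avg":
        new_size = size + len(delta)
        # Assumed update: recombine the old average with the new values.
        return (F * size + sum(delta)) / new_size, new_size
    raise ValueError(f"unsupported aggregate: {f}")


# Bag 800000, 500000 with 60000 inserted, as in the running example.
new_max, _ = tmaintain_insert("max", 800000, [60000])
new_sum, _ = tmaintain_insert("sum", 1300000, [60000])
new_avg, new_size = tmaintain_insert("avg", 650000.0, [60000], size=2)
```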

  27. BAG 800000, 500000, 60000 Update: Deletion paints

  28. BAG 800000, 500000, 60000 paints

  29. BAG 500000, 60000 • SELECT max(?highprice) • paints

  30. AMD for Deletion Algorithm AMD(I, Q, A(Q, I), T(Q, I), t) /* Input: RDF graph I, query Q, auxiliary data A(Q, I), query result T(Q, I), deleted triple t */ /* Output: table T(Q, I - t), auxiliary data A(Q, I - t) */ • GP ← BuildGP(Q); • X ← aggregate variables of Q; • Y ← GROUP BY variables of Q; • If TMatch(GP, t) == TRUE, then • ΔS ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearch(GP, Q, t, I)]; • return (T(Q, I - t), A(Q, I - t)) ← TMaintainD(T(Q, I), ΔS, A(Q, I), Q); • else, return (T(Q, I - t), A(Q, I - t)) ← (T(Q, I), A(Q, I));

  31. TMaintainD • Handling min, max • Min and max are not distributive w.r.t. deletion • We need to store πx(S) which projects a bag of values of x from S • The new aggregate value F’ is obtained by: • F’ = min(πx(S - ΔS)) if f = min • F’ = max(πx(S - ΔS)) if f = max • We need to update πx(S) to become • πx(S) - πx(ΔS)
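
One way to sketch this is to keep the bag πx(S) as a `collections.Counter` and subtract the deleted values before recomputing min or max. The function name and return shape are assumptions; the data follows the running example, where 800000 is deleted from the bag 800000, 500000, 60000.

```python
from collections import Counter


def tmaintain_delete(f, bag, delta):
    """Return (F', new_bag) after removing the bag `delta` from the stored bag."""
    # Counter subtraction keeps only values whose count stays positive.
    new_bag = bag - Counter(delta)
    if not new_bag:
        return None, new_bag  # the group became empty
    agg = max if f == "max" else min
    return agg(new_bag.keys()), new_bag


stored = Counter([800000, 500000, 60000])  # auxiliary data: the bag of x-values in S
new_max, stored_after = tmaintain_delete("max", stored, [800000])
```

Storing the bag costs space proportional to |S|, but it lets the view be repaired without going back to the RDF graph.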

  32. Implementation and Experiment • Implemented in Java • Jena – RDQL Engine of HP • Comparison with Relational Approach (standard view maintenance algorithm on relational tables) • Counting Algorithm in Gupta et al. "Maintaining Views Incrementally", SIGMOD 1993 • Dataset: Chef Moz Project RDF dump • Data stored in memory

  33. Other Related Work • Volz, Oberle, Studer [DBFUSION’02] • the first to introduce a view mechanism for RDF data • Their views require that • the results contain class instances (i.e., a subject or object variable), or • the result itself has the pattern of RDF statement (i.e., a triple containing subject, predicate and object). • Magkanaraki et al [ISWC’03] • proposed RVL, a view definition language that can also create virtual RDF schemas and restructure class and property hierarchies such that new resources, property values, classes and property types can be created. • None of these works specifically address (i) aggregates in RDF or (ii) the problem of maintaining aggregate RDF views.

  34. Summary • Aggregate Views are important for RDF applications • RDQL Extension for Views and Aggregates • Aggregate View Maintenance Algorithms AMX • Localized search in RDF graphs

  35. Thank you very much! Questions and Answers
