
On the Efficiency of Joining Group Patterns in SPARQL Queries

On the Efficiency of Joining Group Patterns in SPARQL Queries. María-Esther Vidal 1, Edna Ruckhaus 1, Tomás Lampo 1, Amadís Martínez 1, Javier Sierra 1 and Axel Polleres 2. 1 Universidad Simón Bolívar, Caracas, Venezuela


Presentation Transcript


  1. On the Efficiency of Joining Group Patterns in SPARQL Queries. María-Esther Vidal (1), Edna Ruckhaus (1), Tomás Lampo (1), Amadís Martínez (1), Javier Sierra (1) and Axel Polleres (2). (1) Universidad Simón Bolívar, Caracas, Venezuela; (2) DERI, National University of Ireland, Galway. On the Efficiency of Joining Group Patterns in SPARQL Queries. Vidal, Ruckhaus, Lampo, Martinez, Sierra and Polleres. ESWC'2010. Greece

  2. Motivation: Cloud of Linked Data • Explosion in the number of: • Linking Open Data datasets. • Controlled vocabularies: MeSH, GO, PO… • Extremely large datasets of linked data: • Open Government Data. • Social networks. • DBpedia. • Geonames. • Sensorpedia. • In October 2007, the Cloud of Linked Data consisted of over two billion RDF triples, interlinked by over two million RDF links. • By May 2009 it had grown to 4.2 billion RDF triples, interlinked by around 142 million RDF links. • Today the Cloud of Linked Data has at least 13,112,409,691 triples. • Techniques to efficiently store and query Linked Data are required!

  3. Query Planning: Example • A left-linear query plan: { ?E vote:winner 'Nay' . ?E dc:title ?T . ?E vote:hasBallot ?I . ?I vote:option ?X . ?J vote:option ?X . ?E vote:hasBallot ?J . ?J vote:voter 'people:L000174' . FILTER (?I != ?J) } • Evaluation cost is the sum of the sizes of all the intermediate results. • Evaluation cost = 82 + 8.200 + 8.200 + 78.631.005 + 394.720 + 3.826 = 79.046.033 • [Figure: left-linear plan tree over the constants 'Nay' and 'L000174', annotated with intermediate result sizes.]

  4. Query Planning: Example • Small star-shaped groups: {?V voter 'L000174' . ?V option ?X . ?E hasBallot ?V} • Main idea: optimize and arrange these separately! • Assumption: such groups are prototypical for identifying/filtering objects in many RDF queries. • [Figure: star-shaped group around ?V, combining the voter, option and hasBallot patterns with njoin.]

  5. Query Planning: Example • A bushy query plan: { { { ?E vote:winner 'Nay' . ?E dc:title ?T } . { ?E vote:hasBallot ?I . ?I vote:option ?X } } . { ?J vote:voter 'people:L000174' . ?J vote:option ?X . ?E vote:hasBallot ?J } . FILTER (?I != ?J) } • Evaluation cost = 216 + 216 + 82 + 21.600 + 8.200 + 3.826 = 34.140 • [Figure: bushy plan tree over the constants 'Nay' and 'L000174', annotated with intermediate result sizes.]

  6. Query Planning: Example • Plans comprised of small star-shaped groups may reduce query evaluation cost by several orders of magnitude! • Left-linear plan: evaluation cost 79.046.033. • Bushy plan: evaluation cost 34.140. • [Figure: the left-linear and bushy plan trees side by side.]
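
The arithmetic behind the two costs can be checked with a few lines of Python. This is only a worked example, assuming the intermediate result sizes quoted on slides 3 and 5 (dots in the slides are thousands separators); the slide 3 total is reproduced when its last term is 3.826, the value shown in the plan figure. The function and variable names are illustrative, not from the paper.

```python
import math

# Intermediate result sizes as quoted on the slides (assumed transcription).
left_linear_sizes = [82, 8_200, 8_200, 78_631_005, 394_720, 3_826]
bushy_sizes = [216, 216, 82, 21_600, 8_200, 3_826]


def evaluation_cost(intermediate_sizes):
    """Cost of a plan: the sum of the sizes of its intermediate results."""
    return sum(intermediate_sizes)


cost_left_linear = evaluation_cost(left_linear_sizes)
cost_bushy = evaluation_cost(bushy_sizes)

# How many orders of magnitude separate the two plans.
orders_of_magnitude = math.log10(cost_left_linear / cost_bushy)

print(cost_left_linear, cost_bushy, round(orders_of_magnitude, 2))
```

Under these numbers the bushy plan is cheaper by a little more than three orders of magnitude.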

  7. Query Planning: Example • Transforming a left-linear query plan into a bushy query plan: • Fold into a star-shaped group: A njoin B → {A njoin B} • Grouping: {A njoin B} njoin {C njoin D} → {A njoin B} gjoin {C njoin D} • Associativity: {A njoin B} njoin C → A njoin {B njoin C} • Symmetry: A njoin B → B njoin A • [Figure: the left-linear plan and the resulting bushy plan.]
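
The four rewrite rules on this slide can be sketched as functions over a small plan tree. This is a minimal illustration, not the authors' optimizer: the `Join` and `Group` classes, the function names, and the `show` helper are all hypothetical, and leaves are plain strings standing for triple patterns or sub-plans.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Join:
    op: str          # "njoin" or "gjoin"
    left: "Plan"
    right: "Plan"


@dataclass(frozen=True)
class Group:
    body: "Plan"     # the braces {...} around a sub-plan


Plan = Union[str, Join, Group]


def fold(p: Join) -> Group:
    # A njoin B -> {A njoin B}
    return Group(p)


def grouping(p: Join) -> Join:
    # {A njoin B} njoin {C njoin D} -> {A njoin B} gjoin {C njoin D}
    assert isinstance(p.left, Group) and isinstance(p.right, Group)
    return Join("gjoin", p.left, p.right)


def associativity(p: Join) -> Join:
    # {A njoin B} njoin C -> A njoin {B njoin C}
    assert isinstance(p.left, Group) and isinstance(p.left.body, Join)
    a, b = p.left.body.left, p.left.body.right
    return Join("njoin", a, Group(Join("njoin", b, p.right)))


def symmetry(p: Join) -> Join:
    # A njoin B -> B njoin A
    return Join(p.op, p.right, p.left)


def show(p: Plan) -> str:
    """Render a plan in the slide's notation."""
    if isinstance(p, str):
        return p
    if isinstance(p, Group):
        return "{" + show(p.body) + "}"
    return f"{show(p.left)} {p.op} {show(p.right)}"


two_groups = Join("njoin", Group(Join("njoin", "A", "B")), Group(Join("njoin", "C", "D")))
print(show(grouping(two_groups)))
```

Keeping each rule as a pure function that returns a new tree makes it easy to apply rules in any order, which is what a randomized optimizer needs.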

  8. Our Challenges • Develop a query engine able to efficiently evaluate SPARQL-based queries: • Provide a set of physical operators able to exploit the properties of small star-shaped groups. • Develop cost models to accurately estimate plan evaluation cost. • Implement query optimization techniques able to explore the space of plans comprised of small star-shaped groups.

  9. Let’s go into details… • Physical Operators • Query Optimizer: • Hybrid Cost Model • Adaptive Sampling • Simulated Annealing Query Optimizer • … and what did we gain? • Experimental Results (Datasets, Results)

  10. Star-Shaped Groups • Groups of triple patterns that share exactly one common variable. • Example: {?A1 vote:voter people:L000174 . ?A1 vote:option ?O1 . ?E1 vote:hasBallot ?A1}
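
A check for this property can be sketched as follows. The representation is an assumption made for illustration: each triple pattern is a tuple of three strings, and a term is a variable when it starts with `?`. The helper names are hypothetical.

```python
def variables(pattern):
    """Variables of a triple pattern (terms starting with '?')."""
    return {term for term in pattern if term.startswith("?")}


def common_variables(group):
    """Variables that occur in every pattern of the group."""
    variable_sets = [variables(p) for p in group]
    return set.intersection(*variable_sets) if variable_sets else set()


def is_star_shaped(group):
    """True when the patterns share exactly one common variable."""
    return len(common_variables(group)) == 1


# The example group from the slide.
slide_group = [
    ("?A1", "vote:voter", "people:L000174"),
    ("?A1", "vote:option", "?O1"),
    ("?E1", "vote:hasBallot", "?A1"),
]
print(common_variables(slide_group), is_star_shaped(slide_group))
```

Note that the shared variable may appear in subject position in some patterns and in object position in others, as `?A1` does here.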

  11. Physical Operators • Njoin: on star-shaped groups we perform (index) nested loop joins: scan the triples of the outer group and, for each of them, look up the matching triples in the inner group. Outer and inner groups can be of any shape. • Gjoin: we join star-shaped groups by evaluating each group independently and then matching their results. Each group can be a star-shaped group or a sub-tree of any shape.
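
The difference between the two operators can be sketched over a toy in-memory triple list. Everything here is an assumption for illustration: the data, the function names, the linear scan standing in for an index lookup, and the use of a hash table for the matching step of gjoin (the slide only says the group results are matched).

```python
from collections import defaultdict

# Toy data: two ballots for one election (made up for this sketch).
TRIPLES = [
    ("b1", "voter", "L000174"), ("b1", "option", "Nay"),
    ("b2", "voter", "L000175"), ("b2", "option", "Nay"),
    ("e1", "hasBallot", "b1"), ("e1", "hasBallot", "b2"),
]


def match(pattern, binding):
    """Yield extensions of `binding` for every triple matching `pattern`.
    A real engine would probe an index instead of scanning TRIPLES."""
    for triple in TRIPLES:
        extended = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if extended.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield extended


def njoin(outer_bindings, inner_pattern):
    """Nested loop join: probe the inner pattern once per outer binding."""
    return [b for outer in outer_bindings for b in match(inner_pattern, outer)]


def evaluate_group(patterns):
    """Evaluate a group of patterns with a chain of njoins."""
    bindings = [{}]
    for pattern in patterns:
        bindings = njoin(bindings, pattern)
    return bindings


def gjoin(left, right):
    """Join two independently evaluated groups on their shared variables."""
    shared = sorted(set().union(*left) & set().union(*right))
    table = defaultdict(list)
    for b in left:
        table[tuple(b[v] for v in shared)].append(b)
    return [{**l, **r} for r in right for l in table[tuple(r[v] for v in shared)]]


voter_group = evaluate_group([("?J", "voter", "L000174"), ("?J", "option", "?X")])
ballot_group = evaluate_group([("?E", "hasBallot", "?I"), ("?I", "option", "?X")])
joined = gjoin(voter_group, ballot_group)
answers = [b for b in joined if b["?I"] != b["?J"]]  # FILTER (?I != ?J)
print(answers)
```

The point of the sketch is that gjoin never feeds bindings from one group into the other, so each group's intermediate results stay as small as the group itself allows.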

  12. Physical Operators - njoin • [Figure: njoin example over the winner, title and hasBallot patterns; sample instantiations of ?E: 2004-53, 2004-49; 8200 combinations.]

  13. Physical Operators - gjoin • [Figure: gjoin example; a gjoin combines two njoin groups over the winner, title, hasBallot and option patterns.]

  14. Hybrid Cost Model • Estimate cost and cardinality using: • Techniques based on adaptive sampling (Lipton et al., 1990) to accurately estimate cost and cardinality for star-shaped groups. • Cost formulas similar to well-known relational database cost models, applied to patterns and sub-trees.

  15. Adaptive Sampling: Example of Estimating Cardinality • P = {?A1 vote:voter people:L000174 . ?A1 vote:option ?O1 . ?E1 vote:hasBallot ?A1} • The population u is the set of all valid instantiations of one of the patterns, ?A1 vote:voter people:L000174. The population is partitioned according to ?A1, where N is the number of different instantiations of ?A1. • A sample M of instantiations of ?A1 is randomly selected, and for each a in M the cardinality of P_a = {a vote:option ?O1 . ?E1 vote:hasBallot a} is computed. • Estimate: $\mathrm{card}(P) = \left( \sum_{a \in M} \mathrm{card}(P_a) \,/\, |M| \right) \times N$
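
The estimator on this slide scales the average cardinality over the sample up to the whole population. The sketch below implements only that formula with a fixed sample size; it does not reproduce the sequential stopping rule of adaptive sampling, and the function name, the toy counts, and the `card_of` callback are assumptions made for illustration.

```python
import random


def estimate_cardinality(instantiations, card_of, sample_size, rng):
    """Estimate card(P) as (sum of card(P_a) for a in M) / |M| * N.

    instantiations: the N distinct instantiations of the partitioning variable
    card_of:        returns card(P_a) for one instantiation a
    sample_size:    desired |M| (capped at N)
    rng:            a random.Random instance, passed in for reproducibility
    """
    n = len(instantiations)
    sample = rng.sample(instantiations, min(sample_size, n))
    return sum(card_of(a) for a in sample) / len(sample) * n


# Toy population: number of results of the remaining patterns per ballot.
results_per_ballot = {"b1": 2, "b2": 4, "b3": 0, "b4": 6}
ballots = list(results_per_ballot)

exact = sum(results_per_ballot.values())
full_sample = estimate_cardinality(ballots, results_per_ballot.get, 4, random.Random(0))
half_sample = estimate_cardinality(ballots, results_per_ballot.get, 2, random.Random(0))
print(exact, full_sample, half_sample)
```

With the sample equal to the whole population the estimate is exact; smaller samples trade accuracy for fewer evaluations of the remaining patterns.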

  16. Cost Model-Formulas Njoin cost model formula: Gjoin cost model formula:

  17. Simulated Annealing Query Optimizer • Random walks over the space of bushy query execution plans. • Performed in stages: • Each stage consists of an initial plan generation step followed by one or more plan transformation steps. • A stage starts with the generation of a random plan. • Successive plan transformations are then applied. • The probability of transforming a plan p into a plan p’ is given by an acceptance probability function P(p,p’,T), where T is the temperature. • Note: this allows “anytime query optimization”.
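
One stage of such a random walk can be sketched as below. The slides do not define P(p,p’,T), so this sketch assumes the standard Metropolis criterion and a geometric cooling schedule; the defaults of 700 and 20 echo the initial temperature and iteration count reported in the experimental setup, while the cooling factor, the function names, and the toy cost and neighbour functions are hypothetical.

```python
import math
import random


def acceptance_probability(cost_p, cost_q, temperature):
    """Assumed Metropolis rule: always accept improvements, accept a worse
    plan with probability exp(-(cost_q - cost_p) / T)."""
    if cost_q <= cost_p:
        return 1.0
    return math.exp(-(cost_q - cost_p) / temperature)


def anneal_stage(initial_plan, cost, neighbour, rng,
                 temperature=700.0, iterations=20, cooling=0.9):
    """Run one stage and return the best plan seen so far."""
    current = best = initial_plan
    for _ in range(iterations):
        candidate = neighbour(current, rng)  # apply one transformation rule
        if rng.random() < acceptance_probability(cost(current), cost(candidate), temperature):
            current = candidate
        if cost(current) < cost(best):
            best = current  # "anytime": a usable plan is always available
        temperature *= cooling
    return best


# Toy search space: a "plan" is an integer and its cost is its distance to 3.
def toy_cost(plan):
    return abs(plan - 3) * 1000


def toy_neighbour(plan, rng):
    return plan + rng.choice([-1, 1])


best_plan = anneal_stage(10, toy_cost, toy_neighbour, random.Random(42))
print(best_plan, toy_cost(best_plan))
```

Because the best plan is tracked separately from the current one, the search can be interrupted after any iteration and still return the cheapest plan visited.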

  18. Simulated Annealing Transformation Rules Rule1: Symmetry: Rule2: Associativity: Rule3: Distributivity (Linear to Bushy):

  19. Simulated Annealing Transformation Rules Rule4: Grouping: Rule5: Fold into a star-shaped group: Rule6: Unfold a star-shaped group:

  20. Related Work & Experiments • RDF-based engines: • Jena-ARQ • Jena TDB • Sesame • YARS2 • RDF-3X • None of them has native optimization strategies tailored to small star-shaped groups!

  21. Experimental Study Datasets: • DS1: US Congress bills 2004 http://www.govtrack.us/data/ • DS2: YAGO http://www.mpi-inf.mpg.de/~suchanek/downloads/yago/

  22. Experimental Study Queries: • Benchmark1 – US Congress Bills 2004 • 9 queries with between 3 and 7 patterns • At least one pattern is instantiated • Benchmark2 – US Congress Bills 2004 • 60 queries, more than 12 triple patterns, between 1 and 7 gjoins between small star-shaped groups • Benchmark3 – US Congress Bills 2004 • 16 queries, between 3 and 9 triple patterns, small star-shaped groups • Benchmark4 – YAGO • 11 queries, between 17 and 25 patterns; only one query has a non-empty answer. • Benchmark5 – YAGO • 9 queries, between 17 and 26 patterns; all the queries have non-empty answers.

  23. Experimental Setup • Metric: evaluation time measured by the time command. • Optimizer setup: initial temperature 700, 20 iterations. • Machine: Linux Ubuntu with an Intel Pentium Core2 Duo 3.0 GHz and 8GB RAM. • Probabilities of transformation rules:

  24. Experiment Goals: Efficiency of the Star-Shaped Group Physical Operators. Njoin and Gjoin were implemented in: • Jena 2.3 (ARQ query engine) • Jena 2.7 and Jena TDB (native storage engine) • RDF-3X 0.3.3 and RDF-3X 0.3.4 • DVLDB • OneQL (our own system) • RDFJoin (new; experiments not in the paper)

  25. Experiment Goals: Effectiveness of the Proposed Optimization Techniques. Star-shaped optimal plans generated by OneQL were evaluated in: • Jena 2.3 and 2.7 • Jena TDB • RDF-3X 0.3.3 and RDF-3X 0.3.4 • DVLDB • OneQL • Sesame • RDFJoin

  26. Experiment I: Efficiency of the Physical Operators in Different RDF Query Engines

  27. Performance of Star-Shaped Physical Operators • Compared the benefits of using our gjoin physical implementation against the njoin implementation provided by Jena 2.3, Jena 2.7 and Jena TDB. • Benchmark2: 60 queries • US Congress Bills 2004 • Our gjoin implementation was able to speed up evaluation time by up to three orders of magnitude.

  28. Efficiency of Star-Shaped Group Physical Operators - RDF-3X • Compared the benefit of using our star-shaped group physical operators against the RDF-3X physical operators: • Benchmark4: 13 queries • Dataset: YAGO • Star-shaped group physical operators are able to speed up evaluation time by up to two orders of magnitude.

  29. Experiment II: Effectiveness of the Proposed Optimization in Different RDF Query Engines

  30. Effectiveness of the Proposed Optimization Techniques in Different RDF Engines • Benchmark1: 9 queries • Dataset: US Congress Bills 2004 • Plans comprised of small star-shaped groups are able to speed up evaluation time by up to two orders of magnitude.

  31. Effectiveness of the Proposed Optimization Techniques in Jena • Comparison of the original and the optimal evaluation times in GJena 2.3 • Benchmark3: 16 queries • US Congress Bills 2004 • The optimal query has significantly lower cost than the original query. • Evaluation time is sped up by up to two orders of magnitude. • The plans with the most significant improvement are bushy trees composed of small star-shaped groups.

  32. Effectiveness of the Proposed Optimization and Evaluation Techniques in RDF-3X (NEW!) • Comparison of Star-Shaped and RDF-3X optimizations versus the optimal plan: • Benchmark4: 11 queries • YAGO • Evaluation times of the optimal plans can be up to six orders of magnitude lower than those of RDF-3X optimal plans. • Star-Shaped plans outperform RDF-3X plans in evaluation time; Star-Shaped plans were comprised of small star-shaped groups. • Results are similar in RDF-3X 0.3.3 and RDF-3X 0.3.4.

  33. Effectiveness of the Proposed Optimization and Evaluation Techniques in RDFJoin (NEW!) • Comparison of Star-Shaped and RDF-3X optimizations versus the optimal plan in RDFJoin: • Benchmark4: 11 queries • YAGO • Evaluation times of the optimal plans can be up to two orders of magnitude lower than those of RDF-3X optimal plans. • Star-Shaped plans outperform RDF-3X plans in evaluation time; Star-Shaped plans were comprised of small star-shaped groups.

  34. Effectiveness of Star-Shaped Optimization Techniques (NEW!) • Comparison of Star-Shaped and RDF-3X optimizations versus the optimal plan: • Benchmark6: 9 queries • YAGO • Evaluation times of the optimal plans can be up to three orders of magnitude lower than those of RDF-3X optimal plans. • Star-Shaped plans outperform RDF-3X plans in evaluation time; Star-Shaped plans were comprised of small star-shaped groups.

  35. What’s done, what’s next • We reported on optimization and evaluation techniques for SPARQL: • Queries comprised of small star-shaped groups may outperform the original queries by several orders of magnitude. • Optimization techniques tailored to identify plans comprised of small star-shaped groups can speed up the evaluation of SPARQL queries. • Future work includes: • Enhancing the hybrid cost model with Bayesian inference capabilities that consider correlations between patterns in a query (cf. our IRLMES WS paper). • Implementing star-shaped group physical operators in other RDF engines. • Thank you! Questions?

  36. Experiment Goals: Efficiency of the Star-Shaped Group Physical Operators. Njoin and Gjoin were implemented in: • Jena 2.3 (ARQ query engine): Gjoin was implemented by extending the method stream in the com.hp.hpl.jena.sparql.engine.main.OpCompiler class to independently evaluate each inner and outer group. • Jena 2.7 and Jena TDB (native storage engine): Gjoin was implemented by extending the function execute of the src.com.hp.hpl.jena.sparql.engine.main.OpExecutor class to independently evaluate each inner and outer group. • RDF-3X 0.3.3 and RDF-3X 0.3.4: Njoin and Gjoin were implemented with RDF-3X hash joins. Joins between patterns in a star-shaped group were implemented with the RDF-3X merge join. • DVLDB: gjoin inner and outer groups were implemented as relational views that only projected out the join variables. • OneQL: njoin and gjoin were implemented on top of structures that provide direct access to the data. njoin was implemented by extending the sideways passing process in Prolog to use indices; gjoin inner and outer groups are independently evaluated and intermediate results are temporarily stored in Prolog main memory to identify matches. • RDFJoin: gjoin inner and outer groups were implemented as relational views that only projected out the join variables.
