G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs



Presentation Transcript


  1. G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs

  2. Example 1: Social Network

  3. Example 2: Bibliographical Network

  4. Contributions • G-SPARQL language • Pattern matching • Reachability • Hybrid execution engine • Graph topology in main memory • Graph data in relational database • Algebraic transformation • Operators • Optimizations • Experimental evaluation

  5. 1. G-SPARQL Query Language • Extends a subset of SPARQL • Based on triple pattern: (subject, predicate, object) • Sub-graph matching patterns on • Graph structure • Node attribute • Edge attribute • Reachability patterns on • Path • Shortest path

  6. G-SPARQL Syntax

  7. G-SPARQL Pattern Matching • Node attribute • ?Person @officeNumber “518” • Edge attribute • ?E @Role “Programmer” • Structural • ?Person worksAt Microsoft • ?Person ?E(worksAt) Microsoft

  8. G-SPARQL Reachability • Path • Subject ??PathVar Object • Shortest path • Subject ?*PathVar Object • Path filters • Path length • All edges • All nodes

  9. Example: G-SPARQL Query • SELECT ?L1 ?L2 • WHERE { ?X ??P ?Y. ?X @Label ?L1. ?Y @Label ?L2. ?X @Age ?Age1. ?Y @Age ?Age2. ?X Affiliated UNSW. ?Y ?E(Affiliated) Microsoft. ?X LivesIn Sydney. ?E @Title "Researcher". FILTER(?Age1 >= 40). FILTER(?Age2 >= 40). FILTERPATH( Length( ??P, <= 3) ). • }

  10. Outline • G-SPARQL language • Pattern matching • Reachability • Hybrid execution engine • Graph topology in main memory • Graph data in relational database • Algebraic transformation • Operators • Optimizations • Experimental evaluation

  11. 2. Hybrid Execution Engine • Reachability queries • Main memory algorithms • Example: BFS and Dijkstra’s algorithm • Pattern matching queries • Relational database • Indexing • Example: B-tree • Query optimizations • Example: selectivity estimation and join ordering • Recursive queries • Not efficient: large intermediate results and multiple joins
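
The reachability half of this split can be illustrated with a small in-memory traversal. The following is a minimal sketch, not the engine's actual code (the slides say the main-memory algorithms were written in C++). It assumes the topology is held as a plain adjacency dictionary, and the names `bounded_bfs_path` and `topology` are hypothetical. It uses BFS to answer a path pattern with a length bound such as `FILTERPATH( Length( ??P, <= 3) )`.

```python
from collections import deque


def bounded_bfs_path(adjacency, source, target, max_length):
    """Return a shortest path from source to target with at most
    max_length edges, or None if no such path exists."""
    if source == target:
        return [source]
    parent = {source: None}          # also serves as the visited set
    frontier = deque([(source, 0)])  # (node, number of edges from source)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_length:
            continue                 # do not expand beyond the length bound
        for neighbour in adjacency.get(node, ()):
            if neighbour in parent:
                continue
            parent[neighbour] = node
            if neighbour == target:
                # Walk the parent links back to the source.
                path = [neighbour]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            frontier.append((neighbour, depth + 1))
    return None


# Hypothetical topology: only node identifiers and edges, no attributes.
topology = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
within_three = bounded_bfs_path(topology, "a", "d", 3)
within_two = bounded_bfs_path(topology, "a", "d", 2)
```

Because BFS reaches each node by a shortest route, a target that is not found within the bound has no qualifying path at all, so the same routine covers both the existence check and the shortest-path case on unweighted edges.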

  12. Graph Representation • Figure labels: Node, Label, age, office, location, keyword, type, authorOf, affiliated, published, citedBy, country, order, month, title, know, supervise, established
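
To make the relational side concrete, here is a minimal sketch of how a pattern-matching query might run against such a representation. The schema is an assumption: the slide only shows names, so the sketch guesses one two-column table per node attribute (`label`, `office`) and one table per edge label (`affiliated`), with hypothetical column names and sample rows. It uses Python's built-in `sqlite3` module rather than the IBM DB2 system used in the evaluation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed layout: one table per node attribute, one table per edge label.
CREATE TABLE label      (node_id INTEGER, value TEXT);
CREATE TABLE office     (node_id INTEGER, value TEXT);
CREATE TABLE affiliated (edge_id INTEGER, src_id INTEGER, dst_id INTEGER);

INSERT INTO label      VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Microsoft');
INSERT INTO office     VALUES (1, '518'), (2, '311');
INSERT INTO affiliated VALUES (10, 1, 3), (11, 2, 3);
""")

# Roughly: ?Person @office "518". ?Person affiliated Microsoft.
# Each triple pattern becomes a table access; shared variables become joins.
rows = conn.execute("""
SELECT person.value
FROM office
JOIN affiliated      ON affiliated.src_id = office.node_id
JOIN label AS org    ON org.node_id = affiliated.dst_id
JOIN label AS person ON person.node_id = office.node_id
WHERE office.value = '518' AND org.value = 'Microsoft'
""").fetchall()
matches = [row[0] for row in rows]
```

With this layout, a pattern only touches the tables for the attributes and edge labels it mentions, and the database's own indexing and join ordering apply to the resulting joins.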

  13. Hybrid Execution Engine: Interfaces • G-SPARQL query • Traversal operations • SQL commands

  14. 3. Intermediate Language & Compilation • G-SPARQL query • Step 1: Front-end compilation • Algebraic query plan • Step 2: Back-end compilation • Physical execution plan • SQL commands • Traversal operations

  15. Intermediate Language • Objective • Generate query plan and chop it • Reachability part -> main-memory algorithms on topology • Pattern matching part -> relational database • Optimizations • Features • Independent of execution engine and graph representation • Algebraic query plan

  16. G-SPARQL Algebra • Variant of “Tuple Algebra” • Algebra details • Data: tuples • Sets of nodes, edges, paths • Operators • Relational: select, project, join • Graph specific: node and edge attributes, adjacency • Path operators

  17. Relational

  18. Relational NOT Relational

  19. Front-end Compilation (Step 1) • Input • G-SPARQL query • Output • Algebraic query plan • Technique • Map • from triple patterns • To G-SPARQL operators • Use inference rules

  20. Front-end Compilation: Inference Rules

  21. Front-end Compilation: Optimizations • Objective • Delay execution of traversal operations • Technique • Order triple patterns, based on restrictiveness • Heuristics • Triple pattern P1 is more restrictive than P2 • P1 has fewer path variables than P2 • P1 has fewer variables than P2 • P1’s variables have more filter statements than P2’s variables
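
The ordering heuristics above can be sketched as a sort key. This is an illustration under stated assumptions, not the compiler's actual rule: the slides list the three heuristics but do not say how they are combined, so the sketch applies them lexicographically in the order given. The tuple encoding of triple patterns and the names `restrictiveness_key`, `order_patterns`, and `filter_counts` are hypothetical.

```python
def is_path_variable(term):
    # Path variables are written ??P (path) or ?*P (shortest path).
    return term.startswith("??") or term.startswith("?*")


def is_variable(term):
    # Any term starting with "?" is a variable, including path variables.
    return term.startswith("?")


def restrictiveness_key(pattern, filter_counts):
    """Smaller key means more restrictive, so the pattern is placed earlier."""
    path_vars = sum(is_path_variable(term) for term in pattern)
    variables = sum(is_variable(term) for term in pattern)
    filters = sum(filter_counts.get(term, 0) for term in pattern if is_variable(term))
    # Assumed tie-breaking order: path variables, then variables, then filters.
    return (path_vars, variables, -filters)


def order_patterns(patterns, filter_counts):
    return sorted(patterns, key=lambda p: restrictiveness_key(p, filter_counts))


patterns = [
    ("?X", "??P", "?Y"),
    ("?X", "@label", "?L1"),
    ("?X", "@age", "?Age1"),
    ("?X", "affiliated", "UNSW"),
]
filter_counts = {"?Age1": 1}  # FILTER(?Age1 >= 40)
ordered = order_patterns(patterns, filter_counts)
```

On this input the path pattern sorts last, which matches the stated objective of delaying traversal operations until the cheaper, more selective patterns have narrowed the candidates.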

  22. Back-end Compilation (Step 2) • Input • G-SPARQL algebraic plan • Output • SQL commands • Traversal operations • Technique • Substitute G-SPARQL relational operators with select-project-join (SPJ) operators • Traverse • Bottom up • Stop when reaching root or reaching non-relational operator • Transform relational algebra to SQL commands • Send non-relational commands to main memory algorithms
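
The bottom-up traversal described above can be sketched as a split of the plan tree. This is a simplified illustration, assuming a plan is a tree of operators each flagged as relational or not. The `Op` class, the `split_plan` function, and the operator names are hypothetical. The sketch finds the maximal all-relational subtrees, which would become SQL commands, and the non-relational operators, which would go to the main-memory algorithms.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    relational: bool
    children: list = field(default_factory=list)


def split_plan(root):
    sql_fragments = []   # names of roots of maximal all-relational subtrees
    traversal_ops = []   # names of non-relational operators

    def visit(op):
        # Post-order: children are classified before their parent (bottom up).
        flags = [visit(child) for child in op.children]
        if op.relational and all(flags):
            return True  # still inside a relational fragment, keep climbing
        # Climbing stops here: close off every relational subtree below.
        for child, flag in zip(op.children, flags):
            if flag:
                sql_fragments.append(child.name)
        if not op.relational:
            traversal_ops.append(op.name)
        return False

    if visit(root):
        sql_fragments.append(root.name)  # reached the root: whole plan is SQL
    return sql_fragments, traversal_ops


plan = Op("project", True, [
    Op("path", False, [
        Op("join_x", True, [Op("select_age_x", True), Op("scan_affiliated", True)]),
        Op("select_age_y", True),
    ]),
])
fragments, traversals = split_plan(plan)
```

Relational operators that sit above a traversal operator, such as `project` here, are deliberately left unassigned: the slides do not say how the engine evaluates them once the traversal result is available.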

  23. Back-end Compilation: Optimizations • Optimize a fragment of query plan • Before generating SQL command • All operators are Select/Project/Join • Apply standard techniques • For example, pushing selections down

  24. Example: G-SPARQL Query • SELECT ?L1 ?L2 • WHERE { ?X ??P ?Y. ?X @label ?L1. ?Y @label ?L2. ?X @age ?Age1. ?Y @age ?Age2. ?X affiliated UNSW. ?Y ?E(affiliated) Microsoft. ?X livesIn Sydney. ?E @title "Researcher". FILTER(?Age1 >= 40). FILTER(?Age2 >= 40). • }

  25. Example: Query Plan

  26. 4. Experimental Evaluation • Objective • Show that the hybrid approach is a good idea • Good performance from combining the DBMS and the main memory topology • Data sets • Real ACM bibliographic network • Synthetic graphs • See technical report

  27. Experimental Environment • Workload • Created Q1 … Q12 • Process • Compare to Neo4j (non-optimized, optimized) • Environment • Implementation • Main memory algorithms in C++ • IBM DB2 • PC Server

  28. Results on Real Dataset

  29. Response time on ACM Bibliographic Network

  30. Conclusions • G-SPARQL Language • Expresses pattern matching and reachability queries on attributed graphs • Hybrid engine • Graph topology in main memory • Graph data in database • Compilation into algebraic plan • Operators and optimizations • Evaluation • Real and synthetic datasets • Good performance • Leveraging database engine and main memory topology