
Efficiently Answering Reachability Queries on Large Directed Graphs

Ruoming Jin, Kent State University. Joint work with Yang Xiang (KSU), Ning Ruan (KSU), and Haixun Wang (IBM T.J. Watson).


Presentation Transcript


  1. Efficiently Answering Reachability Queries on Large Directed Graphs. Ruoming Jin, Kent State University. Joint work with Yang Xiang (KSU), Ning Ruan (KSU), and Haixun Wang (IBM T.J. Watson)

  2. Reachability Query The problem: Given two vertices u and v in a directed graph G, is there a path from u to v? Examples: ?Query(1,11): Yes. ?Query(3,9): No. [Figure: example DAG on vertices 1-15] Directed graph → DAG (directed acyclic graph) by coalescing the strongly connected components
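As a point of reference for the problem statement above, here is a minimal sketch of the index-free baseline: answering a single query by breadth-first search. The function name `reachable` and the adjacency-dict input are assumptions made for illustration, and the tiny graph in the usage note is hypothetical (the edges of the slide's figure are not recoverable from the transcript).

```python
from collections import deque

def reachable(adj, u, v):
    """Return True if there is a directed path from u to v.

    adj maps each vertex to an iterable of its out-neighbors.
    This costs O(n + m) per query, which is what indexing schemes
    such as path-tree labeling aim to avoid.
    """
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False
```

For example, with `adj = {1: [2], 2: [3], 4: [3]}`, `reachable(adj, 1, 3)` holds while `reachable(adj, 3, 1)` does not.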

  3. Applications • XML • Biological networks • Ontology • Knowledge representation (Lattice operation) • Object programming (Class relationship) • Distributed systems (Reachable states) Graph Databases

  4. Prior Work 2-HOP (O(nm^{1/2}) and O(n^4)), HOPI, and heuristic algorithms

  5. Limitation of Tree-based approaches • Finding a good tree cover is expensive • Tree cover cannot represent some common types of DAGs, like Grid • Compression limitations • Chain (1-parent, 1-child) • Tree (1-parent, multiple children) • Most existing methods which utilize the tree cover are greatly affected by how many edges are left uncovered

  6. Overview of Path-Tree • Chain → Tree → Path-Tree (2 parents / multiple children) • A path-tree cover is a spanning subgraph of G in a tree shape (T) • A node in the tree T corresponds to a path in G, and an edge in T corresponds to the edges between two paths in G • A 3-tuple labeling exists for any path-tree that answers reachability queries in O(1)

  7. Path-Tree in a Nutshell [Figure: the example DAG on vertices 1-15 decomposed into paths P1-P4, next to the corresponding path-tree] Path-Graph is not necessarily a planar graph. The reachability between any two nodes can be answered in O(1)

  8. Key Problems • How to construct a path-tree? • Algorithm • How can a path-tree help with reachability queries? • Labeling • Transitive Closure Compression • How does path-tree compare with the existing methods? • Optimality

  9. Constructing Path-Tree • Step 1: Path-Decomposition of DAG • Step 2: Minimal Equivalent Edge Set between any two paths • Step 3: Path-Graph Construction • Step 4: Path-Tree Cover Extraction

  10. Step 1: Path-Decomposition Each vertex gets a label (PID, SID), e.g. (2, 5). For any two nodes (u, v) in the same path, u → v if and only if u.sid ≤ v.sid. A simple linear algorithm based on topological sort can achieve a path-decomposition. [Figure: the example DAG on vertices 1-15 partitioned into paths P1-P4]
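The slide does not spell out the decomposition algorithm, so the following is only a sketch under stated assumptions: vertices are visited in topological order, each still-unassigned vertex starts a new path, and the path is extended greedily through the first unassigned out-neighbor. The names `topological_order` and `path_decomposition` are hypothetical; the returned pairs play the role of the slide's (PID, SID) labels.

```python
from collections import deque

def topological_order(adj):
    """Kahn's algorithm; adj maps a vertex to a list of out-neighbors."""
    nodes = set(adj) | {w for vs in adj.values() for w in vs}
    indeg = {x: 0 for x in nodes}
    for vs in adj.values():
        for w in vs:
            indeg[w] += 1
    queue = deque(sorted(x for x in nodes if indeg[x] == 0))
    order = []
    while queue:
        x = queue.popleft()
        order.append(x)
        for w in adj.get(x, ()):
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return order

def path_decomposition(adj):
    """Return {vertex: (pid, sid)} so that every vertex lies on one path.

    Consecutive vertices of a path are joined by an edge of the DAG, so
    within a path u reaches v exactly when u.sid <= v.sid.
    """
    label = {}
    pid = 0
    for start in topological_order(adj):
        if start in label:
            continue
        pid += 1
        sid = 1
        x = start
        while x is not None:
            label[x] = (pid, sid)
            sid += 1
            # Greedy extension: first out-neighbor not yet on any path.
            x = next((w for w in adj.get(x, ()) if w not in label), None)
    return label
```

On the hypothetical DAG `{1: [3], 2: [3, 4], 3: [5], 4: [5]}` this yields two paths, 1 → 3 → 5 and 2 → 4. A greedy choice like this gives a valid decomposition but makes no claim about minimizing the number of paths.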

  11. Step 2: Minimal equivalent edge set The reachability between any two paths can be captured by a unique minimal set of edges. [Figure: all edges from path P1 to path P2, and the reduced set P1 → P2] The edges in the minimal equivalent edge set do not cross (always parallel)!

  12. Step 3: Path-Graph Construction Weight reflects the cost we have to pay for the transitive closure computation if we exclude this path-tree edge. [Figure: the example DAG with paths P1-P4 and the resulting weighted directed path-graph]

  13. Step 4: Extracting Path-Tree Cover [Figure: weighted directed path-graph over P1-P4 and its maximal directed spanning tree] Chu-Liu/Edmonds algorithm, O(m' + k log k)

  14. Key Problems • How to construct a path-tree? • Algorithm • How can path-tree help with reachability queries? • Labeling • Transitive Closure Compression • How does path-tree compare with the existing methods? • Optimality

  15. 3-Tuple Labeling for Reachability • Interval labeling (2-tuple): high-level description about paths, Pi → Pj? • DFS labeling (1-tuple) [Figure: the example DAG with paths P1-P4 and the path-tree with interval labels P1 [1,1], P2 [1,3], P3 [2,2], P4 [1,4]]
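The interval labels shown on this slide are consistent with the standard postorder scheme for trees, where each node receives [smallest postorder number in its subtree, its own postorder number]. The sketch below illustrates that scheme; the function name `interval_labels` is hypothetical, and the tree shape used in the usage note (P4 as root, P2 below it, P1 and P3 below P2) is inferred from the label values rather than stated in the slides.

```python
def interval_labels(children, root):
    """Postorder interval labeling of a rooted tree.

    children maps a node to the ordered list of its child nodes.
    Node a is an ancestor of (or equal to) node b exactly when
    a's interval contains b's interval.
    """
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        low = None
        for child in children.get(node, ()):
            child_low = visit(child)
            if low is None:
                low = child_low  # leftmost descendant gives the low end
        counter += 1
        if low is None:
            low = counter  # leaf: interval is a single point
        labels[node] = (low, counter)
        return low

    visit(root)
    return labels
```

With `children = {"P4": ["P2"], "P2": ["P1", "P3"]}` and root `"P4"`, the result matches the intervals in the figure: P1 [1,1], P3 [2,2], P2 [1,3], P4 [1,4].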

  16. DFS labeling • Start from the first vertex in the root-path • Always try to visit the next vertex in the same path first • Label a node when all its neighbors have been visited • L(v) = N - x, where x is the number of nodes already labeled [Figure: the path-tree over P1-P4 with the DFS label of each vertex]
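The bullet rules above can be sketched as a depth-first traversal that prefers the same-path successor and numbers vertices in reverse finishing order. This is only an illustration under assumptions: `dfs_labels`, `next_in_path`, and `start_order` are hypothetical names, and restarting the traversal from any still-unvisited vertex is an assumption, since the slide only mentions starting at the first vertex of the root-path.

```python
def dfs_labels(adj, next_in_path, start_order):
    """Assign L(v) = N - x, where x counts the vertices labeled before v.

    adj          : vertex -> list of out-neighbors (edges kept in the cover)
    next_in_path : vertex -> its successor on the same path, if any
    start_order  : all N vertices, in the order used to (re)start the DFS
    """
    n = len(start_order)
    label = {}
    visited = set()

    def visit(v):
        visited.add(v)
        nxt = next_in_path.get(v)
        # Stable sort: the same-path successor (if any) is tried first.
        succs = sorted(adj.get(v, ()), key=lambda w: w != nxt)
        for w in succs:
            if w not in visited:
                visit(w)
        # A vertex is labeled only after all of its neighbors are done.
        label[v] = n - len(label)

    for v in start_order:
        if v not in visited:
            visit(v)
    return label
```

Because labels are handed out in reverse finishing order, every edge (u, w) of the DAG satisfies L(u) < L(w). The recursion is fine for a small illustration; a large graph would need an explicit stack.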

  17. 3-Tuple Labeling for Reachability u → v if and only if 1) Interval label I(u) ⊇ I(v) 2) DFS label L(u) ≤ L(v). Example: ?Query(9,15): P4[1,4] ⊇ P1[1,1] and 5 < 15, so Yes. Also posed: ?Query(9,2) and ?Query(5,9). [Figure: the path-tree with interval labels P1 [1,1], P2 [1,3], P3 [2,2], P4 [1,4] and the DFS label of each vertex]
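The query test on this slide amounts to three comparisons. The sketch below reads the two conditions as "the interval of u's path contains the interval of v's path" and "L(u) ≤ L(v)"; that reading is an assumption, reconstructed from the worked Query(9,15) example because the comparison symbols did not survive extraction. The function name `reaches` and the tuple layout (interval low, interval high, DFS label) are hypothetical.

```python
def reaches(u_label, v_label):
    """O(1) reachability test on 3-tuple labels (low, high, dfs)."""
    u_low, u_high, u_dfs = u_label
    v_low, v_high, v_dfs = v_label
    interval_contains = u_low <= v_low and v_high <= u_high
    return interval_contains and u_dfs <= v_dfs

# Values from the slide's example: vertex 9 lies on P4 (interval [1,4],
# DFS label 5) and vertex 15 lies on P1 (interval [1,1], DFS label 15).
label_9 = (1, 4, 5)
label_15 = (1, 1, 15)
```

With these labels `reaches(label_9, label_15)` holds, matching the slide's "Yes", while the reverse direction fails on both conditions.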

  18. Transitive Closure Compression Path-tree cover (including labeling) can be constructed in O(m + n log n). An efficient procedure can compute and compress the transitive closure in O(mk), where k is the number of paths in the path-tree. [Figure: the example DAG on vertices 1-15]

  19. Key Problems • How to construct a path-tree? • Algorithm • How can path-tree help with reachability queries? • Labeling • Transitive Closure Compression • How does path-tree compare with the existing methods? • Optimality

  20. Theoretical Analysis • Optimal Path-Tree Cover (OPTC) Problem: • Given a path-decomposition, what is the optimal path-tree cover to maximally compress the transitive closure? • OptIndex weight assignment based on computing the predecessor set • Optimal Path-Decomposition (OPD) Problem: • Assuming we only use path-decomposition to compress the transitive closure, what is the optimal path-decomposition to maximally compress the transitive closure? • Minimal-cost flow problem • What is the overall optimal path-decomposition?

  21. Superiority of Path-Tree Cover • The optimal tree cover is a special case of path-tree cover when each vertex corresponds to a single path and the weight is based on OptIndex. • The path-tree cover approach can compress the transitive closure with size being smaller than or equal to the optimal tree cover approach (and consequently optimal chain cover approach).

  22. Experimental Evaluation • Implementation in C++ • 12 Real datasets used in Dual-labeling paper and GRIPP paper • Synthetic datasets • Sparse DAG with edge density = 2 • AMD Opteron 2.0GHz/ 2GB/ Linux • PTree1 (OptIndex) and PTree2 • Mainly compare with Optimal Tree Cover

  23. Real Datasets

  24. Experimental Result (Real Data) On average 10 times better than Tree On average 3 times better than Tree

  25. Experimental Result (Synthetic Data)

  26. Experimental Result (Synthetic Data)

  27. Experimental Result (Synthetic Data)

  28. Conclusion • A novel Path-Tree structure is proposed to assist the compression of transitive closure and answering reachability query • Path-tree has potential to integrate with other existing methods to further improve the efficiency of reachability query processing

  29. Thanks!!

  30. Step 3: Path-Graph Construction Weight reflects the penalty if we exclude this path-tree edge. [Figure: the example DAG with paths P1-P4 and the resulting weighted directed path-graph]

  31. 15 14 11 13 10 6 7 3 4 1 2 P2 P1 P1 P2 Step 2: Constructing Minimal Equivalent Edge Set (PiPj) • Ordering the vertices in Pi and Pj by decreasing order • Finding the first vertex v in P_j that P_i can reach • Finding the last vertex u in P_i that reach v • Removing all the edges cross (u,v) and • repeat 2-4

  32. 3-Tuple Labeling for Reachability • Interval labeling (2-tuple): high-level description about paths, Pi → Pj? • DFS labeling (1-tuple) [Figure: the example DAG with paths P1-P4 and the path-tree with interval labels P1 [1,1], P2 [1,3], P3 [2,2], P4 [1,4]]
