3-HOP: A High-Compression Indexing Scheme for Reachability Query

Presentation Transcript


  1. 3-HOP: A High-Compression Indexing Scheme for Reachability Query. Yang Xiang, Kent State University. Joint work with Ruoming Jin (KSU), Ning Ruan (KSU), and Dave Fuhry (KSU)

  2. Reachability Query. The problem: Given two vertices u and v in a directed graph G, is there a path from u to v? Examples: Query(1, 11): Yes; Query(3, 9): No. [Figure: example directed graph on vertices 1 to 15.] A directed graph is turned into a DAG (directed acyclic graph) by coalescing its strongly connected components.
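
As a baseline (not part of 3-HOP), the sketch below condenses a directed graph into its DAG of strongly connected components with Kosaraju's algorithm and answers Query(u, v) by breadth-first search. The function names `condense` and `reachable` and the 0-based edge-list input are illustrative assumptions. Without an index, each query can cost O(n + m) time, which is what reachability indexes try to avoid.

```python
from collections import deque

# Illustrative sketch; function names are hypothetical.

def condense(n, edges):
    """Kosaraju SCC condensation for vertices 0..n-1; returns (comp, dag)."""
    adj = [[] for _ in range(n)]
    radj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        radj[b].append(a)

    # Pass 1: iterative DFS, recording vertices in order of completion.
    seen, order = [False] * n, []
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            v, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                stack.pop()
                order.append(v)
            elif not seen[nxt]:
                seen[nxt] = True
                stack.append((nxt, iter(adj[nxt])))

    # Pass 2: DFS on the reversed graph in reverse completion order.
    comp, c = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            v = stack.pop()
            for w in radj[v]:
                if comp[w] == -1:
                    comp[w] = c
                    stack.append(w)
        c += 1

    # Edges between different components form the condensed DAG.
    dag = [set() for _ in range(c)]
    for a, b in edges:
        if comp[a] != comp[b]:
            dag[comp[a]].add(comp[b])
    return comp, dag

def reachable(comp, dag, u, v):
    """Query(u, v) by BFS over the condensed DAG (no index)."""
    src, dst = comp[u], comp[v]
    if src == dst:
        return True
    seen, queue = {src}, deque([src])
    while queue:
        x = queue.popleft()
        for y in dag[x]:
            if y == dst:
                return True
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False
```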

  3. Applications • XML • Biological networks • Ontology • Knowledge representation (Lattice operation) • Object programming (Class relationship) • Distributed systems (Reachable states)

  4. Prior Work

  5. Existing work classification and their limitations • Existing work can be classified into two broad categories: • Using spanning structures, such as chains or trees. • Using the 2-hop strategy. • Major limitation: • When graphs are denser, the size of the compressed transitive closure grows very large.

  6. Intuition of 3-Hop [Figure: the same 12-vertex graph (vertices 1 to 12) labeled two ways. The 3-Hop labels shown are single vertices, e.g., Lin:{7}, Lout:{5}, Lin:{8}, Lout:{6}; several 2-Hop labels hold multiple vertices, e.g., Lout:{6,7}, Lout:{5,6,7}, Lin:{7,8}.]
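
For comparison, a 2-hop labeling answers Query(u, v) by intersecting two label sets: u reaches v iff (Lout(u) ∪ {u}) ∩ (Lin(v) ∪ {v}) is non-empty. The sketch below assumes dictionary-based labels; the function name and the tiny example labeling (a path 1 → 2 → 3 plus an isolated vertex 4, with 2 as the only hop vertex) are hypothetical, not taken from the slide's figure.

```python
# Illustrative sketch; function name and labels are hypothetical.

def two_hop_reach(u, v, lout, lin):
    """2-hop query: u reaches v iff some hop vertex w satisfies u -> w -> v.

    lout[u] holds vertices u can reach and lin[v] vertices that can reach v;
    u and v are added to their own sets so direct label hits also count.
    """
    out_set = set(lout.get(u, ())) | {u}
    in_set = set(lin.get(v, ())) | {v}
    return not out_set.isdisjoint(in_set)

# Hypothetical 2-hop labels for the path 1 -> 2 -> 3 and isolated vertex 4.
lout = {1: {2}}
lin = {3: {2}}
print(two_hop_reach(1, 3, lout, lin))  # True
```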

  7. Chain Decomposition for 3-hop [Figure: the 20-vertex example DAG decomposed into four chains: C1 = (1, 2, 3, 4, 5), C2 = (6, 7, 8, 9), C3 = (10, 11, 12, 13, 14, 15), C4 = (16, 17, 18, 19, 20).]
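
A chain decomposition partitions the vertices into chains in which each vertex reaches the next. A minimal sketch, assuming 0-based vertex ids and adjacency lists: scan the vertices in topological order and greedily grow vertex-disjoint paths. This heuristic always yields a valid decomposition but not necessarily one with the fewest chains, and the function names are hypothetical; the slides do not say which decomposition method 3-HOP uses.

```python
from collections import deque

# Illustrative sketch; function names are hypothetical.

def topological_order(n, adj):
    """Kahn's algorithm on a DAG with vertices 0..n-1."""
    indeg = [0] * n
    for v in range(n):
        for w in adj[v]:
            indeg[w] += 1
    queue = deque(v for v in range(n) if indeg[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    if len(order) != n:
        raise ValueError("graph has a cycle; coalesce SCCs first")
    return order

def greedy_chains(n, adj):
    """Cover all vertices with vertex-disjoint paths (each path is a chain).

    Each still-unassigned vertex, taken in topological order, starts a new
    chain that is extended along edges to unassigned successors.
    """
    chain_of = [-1] * n
    chains = []
    for s in topological_order(n, adj):
        if chain_of[s] != -1:
            continue
        chain, v = [s], s
        chain_of[s] = len(chains)
        while True:
            nxt = next((w for w in adj[v] if chain_of[w] == -1), None)
            if nxt is None:
                break
            chain_of[nxt] = len(chains)
            chain.append(nxt)
            v = nxt
        chains.append(chain)
    return chains, chain_of

print(greedy_chains(5, [[1, 2], [3], [3], [4], []]))
# ([[0, 1, 3, 4], [2]], [0, 0, 1, 0, 0])
```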

  8. Overview of 3-HOP • Vertex → Vertex → Vertex (2-hop) • Vertex → Chain(Vertex → Vertex) → Vertex (initial motivation of 3-HOP) • Chain(Vertex → Vertex) → Chain(Vertex → Vertex) → Chain(Vertex → Vertex) (3-hop contour) • A chain decomposition is a spanning structure of G. • Some special vertices in the graph are labeled with Lout (a subset of the vertices it can reach) and/or Lin (a subset of the vertices that can reach it). • The chain decomposition plus the Lout and Lin sets are all we need to design efficient reachability-answering schemes.

  9. Key Problem • Given a chain decomposition {C1, C2, …, Ck} of a DAG, how can we use the 3-hop strategy to maximally compress the transitive closure and answer reachability queries efficiently? • To answer this, we first ask: is it possible to compress the transitive closure with the chain decomposition itself?

  10. Essential Information Between Two Chains: Contour Points [Figure: the 20-vertex example with chains C1 to C4, and the reachability submatrix from C1 (rows 1 to 5) to C3 (columns 10 to 15).] Contour points between C1 and C3: (1→10), (3→12), (5→15).
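
One way to make the contour idea concrete, assuming a contour point is a pair (u, v) with u on C1 and v on C3 such that u reaches v, the vertex after u on C1 does not reach v, and u does not reach the vertex before v on C3. If `first_reach[u]` is the position of the first C3 vertex that u can reach, this value never decreases along C1 (u reaches everything below it on its chain), and the contour points are exactly the corners of that staircase. The function name and the first-reach values are hypothetical, chosen so that the output matches the contour points on the slide.

```python
from math import inf

# Illustrative sketch; function name and data are hypothetical.

def contour_points(ci, cj, first_reach):
    """Contour points from chain ci to chain cj.

    ci, cj: vertex lists in chain order.
    first_reach[u]: position on cj of the first vertex u reaches (inf if none).
    A pair (u, v) is kept where u's successor on ci can no longer reach v.
    """
    points = []
    for p, u in enumerate(ci):
        q = first_reach[u]
        if q == inf:
            break  # non-decreasing: later vertices on ci reach nothing on cj
        nxt = first_reach[ci[p + 1]] if p + 1 < len(ci) else inf
        if nxt > q:
            points.append((u, cj[q]))
    return points

ci = [1, 2, 3, 4, 5]              # C1
cj = [10, 11, 12, 13, 14, 15]     # C3
first_reach = {1: 0, 2: 2, 3: 2, 4: 5, 5: 5}
print(contour_points(ci, cj, first_reach))  # [(1, 10), (3, 12), (5, 15)]
```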

  11. Pseudo-upper Triangular Submatrix and Pseudo-diagonal [Figure: the 20 × 20 transitive-closure matrix of the example DAG (rows x, columns y), with vertices grouped by chains C1 to C4.]

  12. Calculate Contour Points • Computing contour points does not require the full transitive closure, which costs O(mn) time. • Our algorithm computes contour points in O(mk log n) time using a bottom-up approach.
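
The bottom-up idea can be sketched as a dynamic program over a reverse topological order: the first vertex on each chain reachable from u is the minimum over u itself and its out-neighbours. This simplified version (hypothetical function names) keeps a full n × k table and runs in O((n + m)k) time; it is not the authors' O(mk log n) algorithm, whose details the slide does not give. Contour points are then read off the table wherever the first-reach value steps up along a chain.

```python
from math import inf

# Illustrative sketch; function names and the example DAG are hypothetical.

def first_reach_table(order, adj, chain_of, pos):
    """table[u][c] = smallest position on chain c reachable from u (inf if none).

    order: a topological order; adj: out-neighbour lists;
    chain_of[v], pos[v]: chain id and position of v in a chain decomposition.
    Processing in reverse topological order makes every out-neighbour's row
    final before it is merged into u's row.
    """
    k = max(chain_of) + 1
    table = {}
    for u in reversed(order):
        row = [inf] * k
        row[chain_of[u]] = pos[u]          # u reaches itself
        for w in adj[u]:
            for c, q in enumerate(table[w]):
                if q < row[c]:
                    row[c] = q
        table[u] = row
    return table

def contours_between(ci, cj, cj_id, table):
    """Contour points from chain ci to chain cj (vertex lists in chain order)."""
    points = []
    for p, u in enumerate(ci):
        q = table[u][cj_id]
        nxt = table[ci[p + 1]][cj_id] if p + 1 < len(ci) else inf
        if q != inf and nxt > q:
            points.append((u, cj[q]))
    return points

# Chains (0, 1, 2) and (3, 4); cross edges 0 -> 3 and 2 -> 4.
adj = [[1, 3], [2], [4], [4], []]
table = first_reach_table([0, 1, 2, 3, 4], adj, [0, 0, 0, 1, 1], [0, 1, 2, 0, 1])
print(contours_between([0, 1, 2], [3, 4], 1, table))  # [(0, 3), (2, 4)]
```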

  13. 3-Hop Labeling by Contour Points [Figure: a 16-vertex example with chains C1 to C4 and its reachability matrix (From/To). Labeling each contour point directly yields out-labels such as o:{6,11,15}, o:{6,11}, o:{11,15} and in-labels such as i:{2,7,12}, i:{2,7}. Label size: 12.]

  14. 3-Hop Labeling by Covering Contour Points [Figure: the same 16-vertex example; covering the contour points needs only the labels o:{6}, i:{7,12}, and i:{7}. Label size: 4.]

  15. How to find the minimum 3-hop labeling? [Figure: chain-centered bipartite graph for the 16-vertex example, with labels o:{6}, i:{7}, i:{7}.] Contour points: (2,6), (2,11), (2,15), (7,11), (7,15), (12,15).

  16. Finding the minimum 3-hop labeling • Build chain-centered bipartite graphs. • Among all the chain-centered bipartite graphs, find a bipartite subgraph that maximizes the number of uncovered contour points (edges) per unit of label size. • This goal can be converted into finding the maximum densest subgraph. • The key technical issue is then how to find the maximum densest subgraph quickly.

  17. A quick 2-approximation algorithm for finding the maximum densest subgraph [Figure: peeling example with the steps "Removing any vertex with degree 2 or less" and "Removing any vertex with degree 3 or less"; the graph has rank 3.] Neither a vertex's rank nor the graph's rank ever increases in our labeling (indexing) algorithm. This is an important observation for designing a fast labeling algorithm.
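
The peeling idea can be illustrated with the standard greedy peeling algorithm for the densest-subgraph problem (Charikar's 2-approximation for maximizing |E(S)|/|S|): repeatedly delete a minimum-degree vertex and remember the densest intermediate vertex set. This is a generic sketch on an undirected edge list with a hypothetical function name; it is not the authors' rank-based variant, and it does not model how their objective weights label size on chain-centered bipartite graphs.

```python
import heapq
from collections import defaultdict
from itertools import count

# Illustrative sketch; function name is hypothetical.

def densest_subgraph_peel(edges):
    """Greedy peeling: returns (vertex set, density) within 2x of optimal."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:                      # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    alive = set(adj)
    deg = {v: len(adj[v]) for v in alive}
    m = sum(deg.values()) // 2          # edges among alive vertices
    tie = count()                       # tie-breaker so vertices are never compared
    heap = [(d, next(tie), v) for v, d in deg.items()]
    heapq.heapify(heap)
    best_density = m / len(alive) if alive else 0.0
    removed, best_cut = [], 0
    while heap:
        d, _, v = heapq.heappop(heap)
        if v not in alive or d != deg[v]:
            continue                    # stale entry from an earlier degree
        alive.remove(v)
        removed.append(v)
        m -= deg[v]
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
                heapq.heappush(heap, (deg[w], next(tie), w))
        if alive and m / len(alive) > best_density:
            best_density = m / len(alive)
            best_cut = len(removed)
    return set(adj) - set(removed[:best_cut]), best_density

# A 4-clique with a pendant path 4 - 5 - 6 attached.
k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(densest_subgraph_peel(k4 + [(4, 5), (5, 6)]))  # ({1, 2, 3, 4}, 1.5)
```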

  18. 3HOP Contour Algorithm in General, Given a Chain Decomposition • Step 1: Calculate contour points. • Step 2: Construct chain-centered bipartite graphs. • Step 3: Repeat the following until all contour points are covered: • Find the densest subgraph among all chain-centered bipartite graphs (a k-times speedup is possible). • Label vertices according to the selected densest subgraph, delete the subgraph from the corresponding chain-centered bipartite graph, and update the covering information.
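
Step 3 has the shape of greedy set cover: repeatedly pick the option with the best ratio of newly covered contour points to added label size. The sketch below abstracts each candidate labeling as an explicit (label_cost, covered_points) pair instead of searching chain-centered bipartite graphs for densest subgraphs; the function name and candidates are hypothetical, and the contour points are those listed on the "How to find the minimum 3-hop labeling?" slide. This ratio rule is the classic greedy set-cover heuristic, which is known to stay within a logarithmic factor of optimal.

```python
# Illustrative sketch; function name and candidate data are hypothetical.

def greedy_cover(contour, candidates):
    """Pick candidates by best (newly covered points) / (label cost) ratio.

    contour: iterable of contour points (hashable pairs).
    candidates: dict name -> (label_cost > 0, set of contour points covered).
    Returns the chosen candidate names in selection order.
    """
    uncovered = set(contour)
    chosen = []
    while uncovered:
        best, best_ratio = None, 0.0
        for name, (cost, covers) in candidates.items():
            gain = len(covers & uncovered)
            if gain and gain / cost > best_ratio:
                best, best_ratio = name, gain / cost
        if best is None:
            raise ValueError("remaining contour points cannot be covered")
        chosen.append(best)
        uncovered -= candidates[best][1]
    return chosen

points = [(2, 6), (2, 11), (2, 15), (7, 11), (7, 15), (12, 15)]
candidates = {
    "X": (2, {(2, 6), (2, 11), (2, 15), (7, 11), (7, 15)}),
    "Y": (1, {(12, 15)}),
    "Z": (3, {(2, 15), (7, 15), (12, 15)}),
}
print(greedy_cover(points, candidates))  # ['X', 'Y']
```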

  19. Theoretical Analysis • The labeling size returned by 3HOP Contour is at most O(log n) times the minimum 3-hop labeling size. • The optimal 3-hop contour labeling always has a smaller labeling size than the optimal 2-hop labeling. • The worst-case running time of 2-hop is O(n^3 |Tc|) (Tc: the transitive closure), while it is O(k n^2 |Contour|) for 3-hop.

  20. 3-HOP Contour Query. Can 2 reach 20? 2out: {6,15}; 20in: {9,13}. Since 6 → 9, the answer is Yes. Worst-case complexity: O(n). [Figure: the 20-vertex example with chains C1 to C4 and 3-hop labels such as o:{6}, o:{9}, o:{15}, i:{7}, i:{9}, i:{7,13}.]
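
A minimal sketch of the contour query under assumed semantics: u may use the out-labels of itself and every later vertex on its chain, and v may use the in-labels of itself and every earlier vertex on its chain. Keeping, per chain, the earliest vertex reachable from u and the latest vertex that reaches v, the answer is Yes iff on some chain the former does not come after the latter. The chain layout, labels, and function names are hypothetical, chosen so that 2out = {6,15} and 20in = {9,13} as on the slide. Scanning a chain suffix and prefix gives the O(n) worst case.

```python
# Illustrative sketch; function names, chains, and labels are hypothetical.

def locate(chains):
    """Map each vertex to (chain id, position on that chain)."""
    return {v: (c, i) for c, vs in chains.items() for i, v in enumerate(vs)}

def contour_query(u, v, chains, lout, lin):
    """Can u reach v, using a chain decomposition plus Lout/Lin labels?"""
    where = locate(chains)
    cu, pu = where[u]
    cv, pv = where[v]
    earliest = {cu: pu}                  # chain -> earliest position u reaches
    for x in chains[cu][pu:]:            # u reaches x, so x's labels apply
        for y in lout.get(x, ()):
            c, q = where[y]
            earliest[c] = min(q, earliest.get(c, q))
    latest = {cv: pv}                    # chain -> latest position reaching v
    for x in chains[cv][: pv + 1]:       # x reaches v, so x's labels apply
        for y in lin.get(x, ()):
            c, q = where[y]
            latest[c] = max(q, latest.get(c, q))
    return any(c in latest and q <= latest[c] for c, q in earliest.items())

chains = {"C1": [1, 2, 3, 4, 5], "C2": [6, 7, 8, 9],
          "C3": [10, 11, 12, 13, 14, 15], "C4": [16, 17, 18, 19, 20]}
lout = {3: [6], 4: [9], 5: [15]}
lin = {18: [7], 19: [9], 20: [13]}
print(contour_query(2, 20, chains, lout, lin))  # True (6 precedes 9 on C2)
```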

  21. 3-HOP Segment Query. Segments on C1 w.r.t. reaching C2: [1,3] o:{6}; [4,4] o:{9}. Segments on C4 w.r.t. being reached by C2: [18,18] i:{7}; [19,20] i:{9}; … (a total of O(nk) segments). Can 2 reach 20? 2 ∈ [1,3], which can reach 6; 20 ∈ [19,20], which can be reached by 9. Since 6 → 9, the answer is Yes. Worst-case complexity: O(log(nk) + k) = O(log n + k). [Figure: the same labeled 20-vertex example with chains C1 to C4.]
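
The segment idea can be sketched by precomputing, for every ordered pair of chains, the runs of positions that share the same best target vertex, and then answering a query with one binary search (`bisect_right`) per chain. With the hypothetical labels below, the index reproduces the slide's segments: on C1 w.r.t. C2, [1,3] → 6 and [4,4] → 9; on C4 w.r.t. C2, [18,18] ← 7 and [19,20] ← 9. All names are hypothetical, and this sketch costs O(k log n) per query; it does not attempt to match the slide's O(log n + k) bound.

```python
from bisect import bisect_right

# Illustrative sketch; function names, chains, and labels are hypothetical.

def locate(chains):
    """Map each vertex to (chain id, position on that chain)."""
    return {v: (c, i) for c, vs in chains.items() for i, v in enumerate(vs)}

def segments(vs, labels, where, target, outgoing):
    """Split one chain into runs sharing the same best position on `target`.

    outgoing=True: earliest target position reachable via out-labels of
    vertices at or after each position (scanned bottom-up).
    outgoing=False: latest target position reaching each position via
    in-labels of vertices at or before it (scanned top-down).
    Returns (starts, values): run i begins at position starts[i].
    """
    best, vals = None, [None] * len(vs)
    order = range(len(vs) - 1, -1, -1) if outgoing else range(len(vs))
    for p in order:
        for x in labels.get(vs[p], ()):
            c, q = where[x]
            if c == target and (best is None or (q < best if outgoing else q > best)):
                best = q
        vals[p] = best
    starts, values = [], []
    for p, val in enumerate(vals):
        if not values or values[-1] != val:
            starts.append(p)
            values.append(val)
    return starts, values

def build_index(chains, lout, lin):
    """Segments for every ordered pair of distinct chains."""
    where = locate(chains)
    pairs = [(a, b) for a in chains for b in chains if a != b]
    out_idx = {(a, b): segments(chains[a], lout, where, b, True) for a, b in pairs}
    in_idx = {(a, b): segments(chains[a], lin, where, b, False) for a, b in pairs}
    return where, out_idx, in_idx

def lookup(idx, key, p):
    """Binary-search the run containing position p."""
    starts, values = idx[key]
    return values[bisect_right(starts, p) - 1]

def segment_query(u, v, chains, where, out_idx, in_idx):
    cu, pu = where[u]
    cv, pv = where[v]
    if cu == cv:
        return pu <= pv                  # same chain: compare positions
    for c in chains:
        first = pu if c == cu else lookup(out_idx, (cu, c), pu)
        last = pv if c == cv else lookup(in_idx, (cv, c), pv)
        if first is not None and last is not None and first <= last:
            return True
    return False

chains = {"C1": [1, 2, 3, 4, 5], "C2": [6, 7, 8, 9],
          "C3": [10, 11, 12, 13, 14, 15], "C4": [16, 17, 18, 19, 20]}
lout = {3: [6], 4: [9], 5: [15]}
lin = {18: [7], 19: [9], 20: [13]}
where, out_idx, in_idx = build_index(chains, lout, lin)
print(segment_query(2, 20, chains, where, out_idx, in_idx))  # True
```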

  22. Experimental Evaluation • Implemented in C++ • 12 synthetic datasets and 5 publicly available real datasets • Synthetic datasets: • 2k DAGs with edge density 2, 4, 6, 8, 10, 12 • 10k DAGs with edge density 2, 5, 10, 15, 20, 25 • AMD Opteron 2.0 GHz / 8 GB / Linux

  23. Experimental Result (Synthetic Data, 2k)

  24. Experimental Result (Synthetic Data, 2k)

  25. Experimental Result (Synthetic Data, 10k)

  26. Real Datasets

  27. Conclusion • We propose a novel 3-hop indexing scheme that maximally compresses the transitive closure and answers reachability queries efficiently with acceptable indexing time. • In the journal version, we plan to show how to efficiently answer distance queries via 3-hop.

  28. Thanks! Questions?
