

3-HOP: A High-Compression Indexing Scheme for Reachability Query

Yang Xiang

Kent State University

Joint work with Ruoming Jin (KSU), Ning Ruan (KSU), and Dave Fuhry (KSU)


Reachability Query

The problem: Given two vertices u and v in a directed graph G, is there a path from u to v?

  • Query(1,11)? Yes

  • Query(3,9)? No

[Figure: example directed graph on vertices 1–15.]

Directed graph → DAG (directed acyclic graph) by coalescing the strongly connected components.
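A minimal Python sketch of this preprocessing step, not the authors' C++ code: it coalesces strongly connected components with Kosaraju's algorithm and then answers Query(u, v) by a plain BFS on the resulting DAG, which is the index-free baseline that a reachability index tries to beat. The names `condense` and `reachable`, and the 0..n-1 vertex numbering, are assumptions made for illustration.

```python
from collections import defaultdict, deque


def condense(n, edges):
    """Map each vertex 0..n-1 to an SCC representative and return the DAG edges."""
    adj, radj = defaultdict(list), defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    # Pass 1: iterative DFS, recording vertices in order of finishing time.
    seen, order = set(), []
    for s in range(n):
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(adj[s]))]
        while stack:
            u, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(adj[w])))
                    break
            else:  # all successors of u explored
                order.append(u)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finishing time.
    comp = {}
    for s in reversed(order):
        if s in comp:
            continue
        comp[s] = s
        todo = [s]
        while todo:
            u = todo.pop()
            for w in radj[u]:
                if w not in comp:
                    comp[w] = s
                    todo.append(w)

    dag = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    return comp, dag


def reachable(comp, dag, u, v):
    """Answer Query(u, v) by BFS over the condensed DAG (no index)."""
    succ = defaultdict(list)
    for a, b in dag:
        succ[a].append(b)
    src, dst = comp[u], comp[v]
    queue, seen = deque([src]), {src}
    while queue:
        x = queue.popleft()
        if x == dst:
            return True
        for y in succ[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False
```

Each BFS query costs time linear in the size of the DAG, which is why the rest of the talk trades preprocessing and index space for faster queries.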


Applications

  • XML

  • Biological networks

  • Ontology

  • Knowledge representation (Lattice operation)

  • Object programming (Class relationship)

  • Distributed systems (Reachable states)


Prior Work


Classification of existing work and its limitations

  • Existing work can be classified into two broad categories:

    • Using spanning structures, such as chains or trees.

    • Using the 2-hop strategy.

  • Major limitation:

    • When graphs become denser, the size of the compressed transitive closure grows very large.


Intuition of 3-Hop

[Figure: a 12-vertex example graph with its Lin/Lout labels, shown in two panels titled "3-Hop" and "2-Hop".]


Chain Decomposition for 3-hop

[Figure: a DAG on vertices 1–20 decomposed into four chains C1–C4.]


Overview of 3-HOP

  • VertexVertexVertex (2 hop)

  • VertexChain(VertexVertex)Vertex (Initial motivation of 3-HOP)

  • Chain(VertexVertex) Chain(VertexVertex) Chain(VertexVertex) (3 hop contour)

  • Chain decomposition is a spanning structure of G

  • Some special vertices in the graph are labeled by Lout (a subset of vertices it can reach) and/or Lin(a subset of vertices it can be reached from).

  • Chain decomposition plus the set of Lout and Lin are all that we need to design efficient reachability answering schemes.


Key Problem

  • Given a chain decomposition {C1,C2,…,Ck} of a DAG, how can we use the 3-hop strategy to maximally compress the transitive closure and answer reachability queries efficiently?

    • To answer this question, we first ask another one: is it possible to compress the transitive closure using the chain decomposition alone?
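One way to see how a chain decomposition alone can compress the transitive closure is sketched below in Python. It is an illustration under stated assumptions, not the paper's implementation: every vertex stores, for each of the k chains, the earliest position on that chain it can reach, so the index holds n·k entries instead of one bit per vertex pair. The names `chain_compress` and `reach` are hypothetical, and the sketch assumes every vertex lies on exactly one chain and that each vertex on a chain reaches all later vertices of that chain through the given edges.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

INF = float("inf")


def chain_compress(chains, edges):
    """Return (chain_of, pos, first) where first[u][c] is the smallest
    position on chain c reachable from u (INF if none)."""
    chain_of, pos = {}, {}
    for c, chain in enumerate(chains):
        for p, v in enumerate(chain):
            chain_of[v], pos[v] = c, p

    succ = {v: [] for v in chain_of}
    for u, v in edges:
        succ[u].append(v)

    # TopologicalSorter expects node -> predecessors. Feeding it successors
    # instead yields every vertex after all of its successors, i.e. a
    # reverse topological order, which is what the sweep below needs.
    order = TopologicalSorter(succ).static_order()

    k = len(chains)
    first = {}
    for u in order:
        row = [INF] * k
        row[chain_of[u]] = pos[u]          # u reaches itself
        for w in succ[u]:                  # inherit from direct successors
            for c in range(k):
                row[c] = min(row[c], first[w][c])
        first[u] = row
    return chain_of, pos, first


def reach(index, u, v):
    """u reaches v iff u reaches some vertex at or before v on v's chain."""
    chain_of, pos, first = index
    return first[u][chain_of[v]] <= pos[v]
```

A query is a single array lookup, but the index still needs one entry per (vertex, chain) pair; the contour idea that follows aims to store less than that.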


Essential Information Between Two Chains

[Figure: the chain-decomposed DAG (chains C1–C4) and the reachability matrix from chain C1 (vertices 1–5) to chain C3 (vertices 10–15).]

Contour Points: (1→10), (3→12), (5→15)


Pseudo-upper Triangular Submatrix and Pseudo-diagonal

[Figure: the 20 × 20 transitive closure matrix (axes x and y, vertices 1–20 grouped by chains C1–C4), shown next to the chain-decomposed DAG.]


Calculate Contour Points

  • Calculating contour points does not require computing the transitive closure, which costs O(mn) time.

  • Our algorithm calculates contour points in O(mk log n) time using a bottom-up approach.


3-Hop Labeling by Contour Points

[Figure: From/To reachability matrix over vertices 1–16 (chains C1–C4), with out-labels (o:) and in-labels (i:) derived directly from the contour points.]

Label size: 12


3-Hop Labeling by Covering Contour Points

[Figure: the same reachability matrix over vertices 1–16 (chains C1–C4), now labeled by covering the contour points: o:{6}, i:{7,12}, i:{7}.]

Label size: 4


How to find the minimum 3-hop labeling?

Chain centered bipartite graph

[Figure: the 16-vertex DAG with chains C1–C4 next to a chain-centered bipartite graph; labels shown include o:{6} and i:{7}.]

Contour Points: (2,6), (2,11), (2,15), (7,11), (7,15), (12,15)


Finding minimum 3-hop labeling

  • Build chain-centered bipartite graphs.

  • Among all the chain-centered bipartite graphs, find a bipartite subgraph that maximizes the number of uncovered contour points (edges) divided by the label size.

  • This goal can be converted to finding the densest subgraph.

  • The key technical issue is then how to find the densest subgraph quickly.


A quick 2-approximation algorithm for finding the densest subgraph

[Figure: worked example with vertices of rank 2 and rank 3. Panels: "Removing any vertex with degree 2 or less" and "Removing any vertex with degree 3 or less". The graph has rank 3.]

A vertex rank or a graph rank will never increase in our labeling (indexing) algorithm.

This is an important observation for designing a fast labeling algorithm.
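The transcript does not spell out the procedure, so the Python sketch below uses a standard technique as a stand-in: greedy peeling, which repeatedly deletes a minimum-degree vertex and keeps the densest intermediate subgraph, and which is known to give a 2-approximation of the maximum density (edges divided by vertices). The function name `densest_subgraph_peel` is hypothetical, and the simple `min` scan is chosen for brevity rather than speed; the sketch assumes an undirected graph without self-loops.

```python
def densest_subgraph_peel(edges):
    """Greedy peeling: return (density, vertex_set) of the best subgraph
    seen while removing minimum-degree vertices one at a time."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # current edge count
    alive = set(adj)
    best_density, best_set = 0.0, set()

    while alive:
        density = m / len(alive)
        if density > best_density:
            best_density, best_set = density, set(alive)
        # Peel one vertex of minimum degree in the remaining graph.
        u = min(alive, key=lambda x: len(adj[x]))
        for w in adj[u]:
            adj[w].discard(u)
        m -= len(adj[u])
        alive.remove(u)
    return best_density, best_set


# A 4-clique {0,1,2,3} with a two-vertex tail 3-4-5.
K4_TAIL = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
print(densest_subgraph_peel(K4_TAIL))
```

In the 3-hop setting the edges would be the still-uncovered contour points of a chain-centered bipartite graph and the vertices would be candidate label entries, so density corresponds to contour points covered per unit of label size.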


3HOP Contour Algorithm in General

Given a Chain Decomposition

  • Step 1: Calculate Contour Points

  • Step 2: Construct Chain Centered Bipartite Graphs

  • Step 3: Keep doing the following until all Contour Points are covered:

    • Find the densest subgraph among all chain-centered bipartite graphs (a k-fold speedup is possible).

    • Label vertices according to the selected densest subgraph, delete the subgraph from the corresponding chain-centered bipartite graph, and update the covering information.


Theoretical Analysis

  • The labeling size returned by 3HOP Contour is at most O(log n) times larger than the minimum 3-hop labeling size.

  • The optimal 3-hop contour always has a smaller labeling size than the optimal 2-hop.

  • The worst-case running time of 2-hop is O(n^3 |Tc|), while it is O(k n^2 |contour|) for 3-hop.


3-HOP Contour Query

[Figure: the 20-vertex DAG with chains C1–C4 and its 3-hop labels (o: out-labels, i: in-labels).]

Can 2 reach 20?

  • 2 out: {6,15}

  • 20 in: {9,13}

  • Since 6 → 9 (6 precedes 9 on chain C2), the answer is Yes.

Worst case complexity: O(n)
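The contour query can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' code: the out-set of u is taken to be the union of the Lout labels of u and of every later vertex on u's chain, keeping the earliest entry per chain; the in-set of v is the union of the Lin labels of v and of every earlier vertex on v's chain, keeping the latest entry per chain; u and v themselves count as entries on their own chains. The function name `contour_query` is hypothetical, and the assignment of labels to individual vertices below is partly assumed, chosen so that the sets for the slide's question come out as {6,15} and {9,13}.

```python
def contour_query(chains, lout, lin, u, v):
    """Return True if u reaches v under the 3-hop labels lout/lin."""
    where = {x: (c, p) for c, chain in enumerate(chains)
             for p, x in enumerate(chain)}
    (cu, pu), (cv, pv) = where[u], where[v]
    if cu == cv:                       # same chain: order decides
        return pu <= pv

    # Earliest position reached on each chain, scanning u and later vertices.
    out_min = {cu: pu}
    for x in chains[cu][pu:]:
        for y in lout.get(x, ()):
            c, p = where[y]
            out_min[c] = min(out_min.get(c, p), p)

    # Latest position on each chain that reaches v, scanning up to v.
    in_max = {cv: pv}
    for x in chains[cv][:pv + 1]:
        for y in lin.get(x, ()):
            c, p = where[y]
            in_max[c] = max(in_max.get(c, p), p)

    # Some chain must carry an out-entry at or before an in-entry.
    return any(c in in_max and p <= in_max[c] for c, p in out_min.items())


CHAINS = [[1, 2, 3, 4, 5], [6, 7, 8, 9],
          [10, 11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]
LOUT = {3: {6}, 4: {9}, 5: {15}}       # assumed placement of out-labels
LIN = {18: {7, 13}, 19: {9}}           # assumed placement of in-labels
print(contour_query(CHAINS, LOUT, LIN, 2, 20))
```

The two chain scans are what make the worst case linear in n; the segment query on the next slide replaces them with precomputed intervals.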


3-HOP Segment Query

Segments on C1 w.r.t. reaching C2:

  • [1,3] o:{6}

  • [4,4] o:{9}

Segments on C4 w.r.t. being reached by C2:

  • [18,18] i:{7}

  • [19,20] i:{9}

…… (a total of O(nk) segments)

Can 2 reach 20?

  • 2 ∈ [1,3], which can reach 6

  • 20 ∈ [19,20], which can be reached by 9

  • Since 6 → 9, the answer is Yes.

Worst case complexity: O(log(nk) + k) = O(log n + k)

[Figure: the 20-vertex DAG with chains C1–C4 and its 3-hop labels.]
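The segment lookup on this slide can be sketched in Python with a binary search. Only the four segments quoted on the slide are used, so the sketch checks reachability through the single intermediate chain C2; a full query would repeat the check for each of the k chains. The names `lookup` and `reaches_via_c2` are hypothetical, and the comparison `x <= y` assumes that vertex numbers increase along a chain, as they do in the slide's example.

```python
from bisect import bisect_right

# (segment start, segment end, vertex on C2), copied from the slide.
OUT_C1_TO_C2 = [(1, 3, 6), (4, 4, 9)]         # C1 segments reaching C2
IN_C4_FROM_C2 = [(18, 18, 7), (19, 20, 9)]    # C4 segments reached by C2


def lookup(segments, vertex):
    """Binary-search the sorted segments for the one containing vertex."""
    starts = [start for start, _, _ in segments]
    i = bisect_right(starts, vertex) - 1
    if i >= 0 and segments[i][0] <= vertex <= segments[i][1]:
        return segments[i][2]
    return None


def reaches_via_c2(u, v):
    """u on C1 reaches v on C4 through C2 if u's exit point on C2 is at
    or before v's entry point on C2."""
    x = lookup(OUT_C1_TO_C2, u)
    y = lookup(IN_C4_FROM_C2, v)
    return x is not None and y is not None and x <= y


print(reaches_via_c2(2, 20))
```

The binary search accounts for the logarithmic term in the complexity above, and the loop over intermediate chains accounts for the k term.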


Experimental Evaluation

  • Implementation in C++

  • 12 synthetic datasets and 5 publicly available real datasets

  • Synthetic datasets

    • 2k DAG with edge density = 2, 4, 6, 8, 10, 12

    • 10k DAG with edge density = 2, 5, 10, 15, 20, 25

  • AMD Opteron 2.0GHz/ 8GB/ Linux


Experimental Result (Synthetic Data, 2k)


Experimental Result (Synthetic Data, 2k)


Experimental Result (Synthetic Data, 10k)


Real Datasets


Conclusion

  • A novel 3-hop indexing scheme is proposed to maximally compress the transitive closure; it answers reachability queries efficiently with an acceptable indexing time.

  • In the journal version, we plan to show how to efficiently answer distance queries via 3-hop.


Thanks! Questions?

