
3-HOP: A High-Compression Indexing Scheme for Reachability Query

Yang Xiang

Kent State University

Joint work with Ruoming Jin (KSU), Ning Ruan (KSU), and Dave Fuhry (KSU)



Reachability Query

The problem: Given two vertices u and v in a directed graph G, is there a path from u to v?

Query(1,11)? Yes

Query(3,9)? No

[Figure: example directed graph on vertices 1 to 15]

Directed graph → DAG (directed acyclic graph), obtained by coalescing the strongly connected components
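To make the problem statement concrete, here is a minimal Python sketch (not from the slides) of the preprocessing step and of the index-free baseline: strongly connected components are coalesced with Kosaraju's algorithm, and a query is answered by breadth-first search on the resulting DAG in O(n + m) time. The function names `condense` and `reachable` are hypothetical; an index such as 3-HOP exists to avoid this per-query traversal.

```python
from collections import defaultdict, deque


def condense(n, edges):
    """Collapse strongly connected components into single DAG nodes.

    Vertices are 0..n-1 and edges is a list of (u, v) pairs. Uses Kosaraju's
    two-pass algorithm. Returns (comp, dag): comp[v] is the DAG node holding
    v, and dag maps each DAG node to the set of its successors.
    """
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    # Pass 1: iterative DFS that records vertices in order of finishing time.
    seen, finished = [False] * n, []
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(fwd[s]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, iter(fwd[w])))
                    break
            else:  # all successors explored: v is finished
                stack.pop()
                finished.append(v)

    # Pass 2: sweep the reversed graph in decreasing finishing time;
    # each sweep collects exactly one strongly connected component.
    comp, count = [-1] * n, 0
    for s in reversed(finished):
        if comp[s] != -1:
            continue
        comp[s] = count
        stack = [s]
        while stack:
            v = stack.pop()
            for w in rev[v]:
                if comp[w] == -1:
                    comp[w] = count
                    stack.append(w)
        count += 1

    dag = {c: set() for c in range(count)}
    for u, v in edges:
        if comp[u] != comp[v]:
            dag[comp[u]].add(comp[v])
    return comp, dag


def reachable(comp, dag, u, v):
    """Index-free baseline: BFS over the condensed DAG, O(n + m) per query."""
    src, dst = comp[u], comp[v]
    seen, queue = {src}, deque([src])
    while queue:
        x = queue.popleft()
        if x == dst:
            return True
        for y in dag[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False
```

For example, with edges (0,1), (1,2), (2,0), (2,3), (4,3), vertices 0, 1 and 2 collapse into one DAG node, and `reachable(comp, dag, 0, 3)` is True while `reachable(comp, dag, 3, 0)` is False.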



Applications

  • XML

  • Biological networks

  • Ontology

  • Knowledge representation (Lattice operation)

  • Object programming (Class relationship)

  • Distributed systems (Reachable states)



Prior Work



Existing Work: Classification and Limitations

  • Existing work falls into two broad categories:

    • Using spanning structures, such as chains or trees.

    • Using the 2-hop labeling strategy.

  • Major limitation:

    • As graphs get denser, the size of the compressed transitive closure grows very large.
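A minimal sketch of the 2-hop query rule, assuming the usual convention that u reaches v exactly when some hop vertex w appears in both Lout(u) and Lin(v). Treating each vertex as implicitly present in its own labels is an assumption of this sketch, and `two_hop_reach` is a hypothetical name. The index size is the total number of label entries, which is what grows as graphs get denser.

```python
def two_hop_reach(u, v, lout, lin):
    """2-hop query: u reaches v iff some hop w satisfies u -> w and w -> v.

    lout[x] / lin[x] hold Lout(x) / Lin(x). Assumption: every vertex counts as a
    member of its own Lout and Lin, so labels may omit these trivial entries.
    """
    out_u = lout.get(u, set()) | {u}
    in_v = lin.get(v, set()) | {v}
    return not out_u.isdisjoint(in_v)


# Toy labeling of the path 1 -> 2 -> 3 that routes through hop vertex 2.
lout = {1: {2}}
lin = {3: {2}}
print(two_hop_reach(1, 3, lout, lin))  # True
print(two_hop_reach(3, 1, lout, lin))  # False
```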


Intuition of 3-Hop

[Figure: the same DAG on vertices 1 to 12 labeled two ways. 3-Hop: a few vertices carry single-entry labels such as Lout:{5}, Lin:{7}, and Lout:{8}. 2-Hop: many vertices carry labels, several with multiple entries such as Lout:{5,6,7} and Lin:{7,8}.]


Chain Decomposition for 3-hop

[Figure: a DAG on vertices 1 to 20 decomposed into four chains C1, C2, C3, C4.]
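The slides do not say how the chains are built. The sketch below assumes a simple greedy heuristic: walk the DAG in topological order and grow each chain along edges to unassigned successors. It yields a valid decomposition (every vertex on exactly one chain, each chain ordered by reachability) but not necessarily one with the fewest chains k; matching-based methods can reduce k at higher cost. All function names are hypothetical.

```python
from collections import deque


def topological_order(n, adj):
    """Kahn's algorithm; adj[u] lists the successors of vertex u (0..n-1)."""
    indeg = [0] * n
    for u in range(n):
        for v in adj[u]:
            indeg[v] += 1
    queue = deque(u for u in range(n) if indeg[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != n:
        raise ValueError("graph has a cycle; coalesce SCCs first")
    return order


def greedy_chains(n, adj):
    """Split a DAG's vertices into vertex-disjoint chains.

    Each unassigned vertex (taken in topological order) starts a chain that
    follows edges to unassigned successors. Consecutive chain vertices are
    joined by an edge, so each chain is totally ordered by reachability, but
    the number of chains k is not guaranteed to be minimal.
    """
    assigned = [False] * n
    chains = []
    for s in topological_order(n, adj):
        if assigned[s]:
            continue
        chain, v = [s], s
        assigned[s] = True
        while True:
            nxt = next((w for w in adj[v] if not assigned[w]), None)
            if nxt is None:
                break
            chain.append(nxt)
            assigned[nxt] = True
            v = nxt
        chains.append(chain)
    return chains


def chain_positions(chains):
    """Map each vertex to (chain id, position on that chain)."""
    return {v: (c, i) for c, chain in enumerate(chains) for i, v in enumerate(chain)}
```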



Overview of 3-HOP

  • VertexVertexVertex (2 hop)

  • VertexChain(VertexVertex)Vertex (Initial motivation of 3-HOP)

  • Chain(VertexVertex) Chain(VertexVertex) Chain(VertexVertex) (3 hop contour)

  • Chain decomposition is a spanning structure of G

  • Some special vertices in the graph are labeled with Lout (a subset of the vertices they can reach) and/or Lin (a subset of the vertices that can reach them).

  • The chain decomposition plus the Lout and Lin sets are all we need to design efficient reachability query schemes.



Key Problem

  • Given a chain decomposition {C1, C2, …, Ck} of a DAG, how can we use the 3-hop strategy to maximally compress the transitive closure and answer reachability queries efficiently?

    • To answer this, we first ask: can the chain decomposition by itself compress the transitive closure?


Essential Information Between Two Chains

[Figure: chains C1 (vertices 1 to 5) and C3 (vertices 10 to 15) from the 20-vertex example, with the 0/1 reachability matrix from C1 to C3.]

Contour Points: (1,10), (3,12), (5,15)


Pseudo-upper Triangular Submatrix and Pseudo-diagonal

[Figure: the 20 × 20 reachability matrix of the example DAG (axes x and y), with rows and columns grouped by chains C1 to C4, showing pseudo-upper triangular submatrices and their pseudo-diagonals.]



Calculate Contour Points

  • Computing contour points does not require the transitive closure, which costs O(mn) time.

  • Our algorithm computes contour points bottom-up in O(mk log n) time.
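The slides do not give the O(mk log n) algorithm itself. As a simpler stand-in (not the authors' method), the sketch below runs a bottom-up dynamic program. It processes vertices in reverse topological order, and for each vertex it sets the lowest reachable position on every chain to the minimum over its successors, also counting the vertex's own position. This takes O((n + m)k) time and an n × k table, then extracts contour points for every ordered pair of chains. All names are hypothetical.

```python
from collections import deque

INF = float("inf")


def topological_order(n, adj):
    """Kahn's algorithm; adj[u] lists the successors of vertex u (0..n-1)."""
    indeg = [0] * n
    for u in range(n):
        for v in adj[u]:
            indeg[v] += 1
    queue = deque(u for u in range(n) if indeg[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order


def first_reach_table(n, adj, chains):
    """table[v][c] = lowest position on chain c reachable from v (INF if none).

    Reverse topological order guarantees each successor's row is complete
    before it is merged into its predecessors.
    """
    k = len(chains)
    where = {v: (c, i) for c, chain in enumerate(chains) for i, v in enumerate(chain)}
    table = [[INF] * k for _ in range(n)]
    for v in reversed(topological_order(n, adj)):
        row = table[v]
        c, i = where[v]
        row[c] = i  # v reaches itself
        for w in adj[v]:
            for d, p in enumerate(table[w]):
                if p < row[d]:
                    row[d] = p
    return table


def all_contours(table, chains):
    """Contour points for every ordered pair of distinct chains."""
    result = {}
    for a, src in enumerate(chains):
        for b, dst in enumerate(chains):
            if a == b:
                continue
            col = [table[v][b] for v in src]
            result[(a, b)] = [
                (src[i], dst[j])
                for i, j in enumerate(col)
                if j != INF and (i + 1 == len(col) or col[i + 1] != j)
            ]
    return result
```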


3-Hop Labeling by Contour Points

[Figure: From × To reachability matrix (vertices 1 to 16) of a DAG with chains C1 to C4, labeled directly from its contour points with entries such as o:{6,11,15}, o:{11,15}, i:{2,7,12}, and i:{2,7}.]

Label size: 12


3-Hop Labeling by Covering Contour Points

[Figure: the same 16-vertex From × To matrix, now labeled with only o:{6}, i:{7,12}, and i:{7}.]

Label size: 4


How to find the minimum 3-hop labeling?

Chain-centered bipartite graph

[Figure: chain-centered bipartite graph for the 16-vertex example (chains C1 to C4), with labels o:{6} and i:{7}.]

Contour Points: (2,6), (2,11), (2,15), (7,11), (7,15), (12,15)



Finding minimum 3-hop labeling

  • Build chain-centered bipartite graphs.

  • Among all chain-centered bipartite graphs, find a bipartite subgraph that maximizes the ratio of uncovered contour points (edges) to label size.

  • This goal can be converted into finding the maximum densest subgraph.

  • The key technical issue is then how to find the maximum densest subgraph quickly.



A quick 2-approximation algorithm for finding the maximum densest subgraph

[Figure: peeling example on a small graph. Removing any vertex with degree 2 or less, then any vertex with degree 3 or less; vertices are annotated with ranks 2 and 3, and the graph has rank 3.]

A vertex rank or a graph rank will never increase in our labeling (indexing) algorithm.

This is an important observation for designing a fast labeling algorithm.
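A sketch of the peeling idea, reading a vertex's rank as its core number (our interpretation of the figure): repeatedly remove a vertex of minimum remaining degree, and the largest degree seen at removal time is the graph's rank. The standard argument for the factor of 2: every vertex of the densest subgraph has degree at least that subgraph's density, so the graph's rank is at least the optimal density, while the top-rank subgraph has minimum degree equal to the rank and hence density |E|/|V| at least half of it. In a chain-centered bipartite graph, edges would be uncovered contour points and vertices would be label entries. `max_core` is a hypothetical name, and vertex ids must be mutually comparable because they are stored in a heap.

```python
import heapq


def max_core(adj):
    """Peel an undirected graph; return (graph rank, vertices of top rank).

    adj maps each vertex to the set of its neighbours. A vertex's rank is
    the running maximum of minimum degrees at the time it is removed.
    """
    deg = {v: len(ns) for v, ns in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    removed, rank, current = set(), {}, 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue  # stale heap entry: degree has since dropped
        current = max(current, d)
        rank[v] = current
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                deg[w] -= 1
                heapq.heappush(heap, (deg[w], w))
    top = max(rank.values(), default=0)
    return top, {v for v, r in rank.items() if r == top}
```

On a 4-clique {0, 1, 2, 3} with a pendant vertex 4 attached to 0, the pendant is peeled first with rank 1, and the clique survives as the rank-3 core.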



3HOP Contour Algorithm in General

Given a Chain Decomposition

  • Step 1: Calculate Contour Points

  • Step 2: Construct Chain-Centered Bipartite Graphs

  • Step 3: Repeat the following until all Contour Points are covered:

    • Find the densest subgraph among all chain-centered bipartite graphs (a k-fold speedup is possible)

    • Label vertices according to the selected densest subgraph, delete the subgraph from the corresponding chain-centered bipartite graph, and update the covering information



Theoretical Analysis

  • The labeling size returned by 3HOP Contour is within an O(log n) factor of the minimum 3-hop labeling size.

  • The optimal 3-hop contour labeling always has a smaller labeling size than the optimal 2-hop labeling.

  • The worst-case running time of 2-hop is O(n^3|Tc|), while that of 3-hop is O(kn^2|contour|), where |Tc| is the size of the transitive closure and |contour| is the number of contour points.


3-HOP Contour Query

[Figure: the example DAG on vertices 1 to 20, decomposed into chains C1 to C4, with 3-hop labels such as o:{6}, o:{8,14}, o:{15}, i:{7}, i:{7,13}, and i:{9}.]

Can 2 reach 20?

2 out: {6,15}

20 in: {9,13}

Since 6 reaches 9, the answer is Yes.

Worst-case complexity: O(n)
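A sketch of the contour query under stated assumptions: labels are given as hypothetical dicts `lout` and `lin`. The out side gathers Lout labels of u and of every vertex after u on its chain (the "2 out" set), and the in side gathers Lin labels of v and of every vertex before v on its chain (the "20 in" set). Keeping only the earliest hop per chain on the out side and the latest per chain on the in side, the query succeeds when some chain has out position ≤ in position, as with 6 and 9 in the example. Scanning the two chains is what makes the worst case O(n). `three_hop_reach` is a hypothetical name.

```python
INF = float("inf")


def three_hop_reach(u, v, chains, where, lout, lin):
    """Answer "can u reach v?" from a chain decomposition plus 3-hop labels.

    chains: list of vertex lists in chain order; where[x] = (chain, position).
    lout[x] / lin[x]: hypothetical label dicts holding Lout(x) / Lin(x).
    """
    cu, pu = where[u]
    cv, pv = where[v]
    # Earliest position u reaches on each chain: u covers the rest of its own
    # chain and inherits the Lout labels of every vertex after it there.
    first = {cu: pu}
    for x in chains[cu][pu:]:
        for y in lout.get(x, ()):
            c, p = where[y]
            if p < first.get(c, INF):
                first[c] = p
    # Latest position on each chain that reaches v, via the Lin labels of v
    # and of every vertex before v on v's chain.
    last = {cv: pv}
    for x in chains[cv][: pv + 1]:
        for y in lin.get(x, ()):
            c, p = where[y]
            if p > last.get(c, -1):
                last[c] = p
    # Yes if on some chain the first hop-in comes no later than the last hop-out.
    return any(c in last and p <= last[c] for c, p in first.items())
```

As a toy call, with chains [1, 2, 3], [6, 9], [19, 20], Lout(3) = {6} and Lin(20) = {9}, `three_hop_reach(2, 20, ...)` returns True because position 0 (vertex 6) precedes position 1 (vertex 9) on the middle chain.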


3-HOP Segment Query

Segments on C1 w.r.t. reaching C2:

[1,3] o:{6}

[4,4] o:{9}

Segments on C4 w.r.t. being reached by C2:

[18,18] i:{7}

[19,20] i:{9}

… (a total of O(nk) segments)

Can 2 reach 20?

2 ∈ [1,3], which can reach 6

20 ∈ [19,20], which can be reached by 9

Since 6 reaches 9, the answer is Yes. Worst-case complexity: O(log(nk) + k) = O(log n + k)

[Figure: the same labeled chains C1 to C4 as in the 3-HOP Contour Query example.]
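The segment variant precomputes, for each pair of chains, runs of consecutive vertices that share the same hop, so a query uses binary searches instead of chain scans. The sketch below encodes the slide's four segments by chain position; the chain contents, in particular C2 = 6, 7, 8, 9, are assumptions, and all names are hypothetical. It does one binary search per chain, so it runs in O(k log n), a simpler variant than the O(log(nk) + k) bound on the slide.

```python
from bisect import bisect_right

EMPTY = ([], [], [])


def build(segments):
    """Turn [(first_pos, last_pos, value), ...] into parallel sorted lists."""
    segments = sorted(segments)
    return ([s[0] for s in segments], [s[1] for s in segments], [s[2] for s in segments])


def lookup(table, p):
    """Value of the segment that contains position p, or None (binary search)."""
    starts, ends, values = table
    i = bisect_right(starts, p) - 1
    if i >= 0 and p <= ends[i]:
        return values[i]
    return None


def segment_reach(u, v, where, k, out_seg, in_seg):
    """3-hop query over segments.

    out_seg[(a, c)]: segments of chain a, each mapped to the earliest position
    on chain c its vertices reach. in_seg[(b, c)]: segments of chain b, each
    mapped to the latest position on chain c that reaches its vertices.
    """
    cu, pu = where[u]
    cv, pv = where[v]
    for c in range(k):
        first = pu if c == cu else lookup(out_seg.get((cu, c), EMPTY), pu)
        last = pv if c == cv else lookup(in_seg.get((cv, c), EMPTY), pv)
        if first is not None and last is not None and first <= last:
            return True
    return False


# Slide example; chain contents (C2 = 6, 7, 8, 9) are assumed for illustration.
chains = [[1, 2, 3, 4, 5], [6, 7, 8, 9], [18, 19, 20]]  # C1, C2, C4
where = {x: (c, i) for c, ch in enumerate(chains) for i, x in enumerate(ch)}
out_seg = {(0, 1): build([(0, 2, 0), (3, 3, 3)])}  # [1,3] o:{6}, [4,4] o:{9}
in_seg = {(2, 1): build([(0, 0, 1), (1, 2, 3)])}   # [18,18] i:{7}, [19,20] i:{9}
print(segment_reach(2, 20, where, len(chains), out_seg, in_seg))  # True
```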



Experimental Evaluation

  • Implementation in C++

  • 12 synthetic datasets and 5 publicly available real datasets

  • Synthetic datasets

    • 2k-vertex DAGs with edge density 2, 4, 6, 8, 10, 12

    • 10k-vertex DAGs with edge density 2, 5, 10, 15, 20, 25

  • AMD Opteron 2.0 GHz / 8 GB / Linux



Experimental Result (Synthetic Data, 2k)



Experimental Result (Synthetic Data, 2k)



Experimental Result (Synthetic Data, 10k)



Real Datasets



Conclusion

  • We propose a novel 3-hop indexing scheme that maximally compresses the transitive closure and answers reachability queries efficiently with acceptable indexing time.

  • In the journal version, we plan to show how to answer distance queries efficiently via 3-hop.



Thanks! Questions?

