3-HOP:A High-Compression Indexing Scheme for Reachability Query

Yang Xiang

Kent State University

Joint work with Ruoming Jin (KSU), Ning Ruan (KSU), and Dave Fuhry (KSU)


Reachability Query

The problem: Given two vertices u and v in a directed graph G, is there a path from u to v?

Query(1,11)? Yes

Query(3,9)? No

[Figure: example directed graph with vertices numbered 1 to 15.]

Directed graph → DAG (directed acyclic graph) by coalescing the strongly connected components.
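The coalescing step can be sketched as follows. This is a minimal illustration using Kosaraju's two-pass algorithm, which is one standard way to find strongly connected components; the slides do not say which method the authors use. The function name `condense` and the 0-based vertex numbering are assumptions made for the example.

```python
from collections import defaultdict

def condense(n, edges):
    """Collapse each strongly connected component (SCC) into one vertex.

    Returns (comp, dag_edges): comp[v] is the SCC id of vertex v, and
    dag_edges is the set of edges between distinct SCCs.
    Vertices are assumed to be numbered 0..n-1 (an assumption of this sketch).
    """
    adj, radj = defaultdict(list), defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    # Pass 1: iterative DFS, recording vertices in order of finishing time.
    seen, order = set(), []
    for s in range(n):
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(adj[s]))]
        while stack:
            v, it = stack[-1]
            nxt = next((w for w in it if w not in seen), None)
            if nxt is None:
                order.append(v)
                stack.pop()
            else:
                seen.add(nxt)
                stack.append((nxt, iter(adj[nxt])))

    # Pass 2: sweep the reversed graph in decreasing finishing time;
    # each sweep collects exactly one SCC.
    comp, c = {}, 0
    for s in reversed(order):
        if s in comp:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            v = stack.pop()
            for w in radj[v]:
                if w not in comp:
                    comp[w] = c
                    stack.append(w)
        c += 1

    dag_edges = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    return comp, dag_edges


# Vertices 0, 1, 2 form a cycle, so they collapse into a single DAG vertex.
comp, dag_edges = condense(5, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)])
```

After this step, two vertices in the same component trivially reach each other, and every other reachability query can be answered on the smaller acyclic graph.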


Applications

  • XML

  • Biological networks

  • Ontology

  • Knowledge representation (Lattice operation)

  • Object-oriented programming (class relationships)

  • Distributed systems (Reachable states)


Prior Work


Existing Work Classification and Their Limitations

  • Existing work can be classified into two big categories:

    • Using spanning structures, such as chains or trees.

    • Using 2-hop strategy

  • Major limitation:

    • When graphs are denser, the size of the compressed transitive closure grows very large.


Intuition of 3-Hop

[Figure: the same 12-vertex graph labeled two ways. In the 3-Hop panel each label holds a single vertex (for example Lout:{5}, Lin:{7}). In the 2-Hop panel the labels grow to sets such as Lout:{5,6,7} and Lin:{7,8}.]


Chain Decomposition for 3-Hop

[Figure: a 20-vertex DAG decomposed into four chains C1, C2, C3, and C4.]


Overview of 3-HOP

  • Vertex → Vertex → Vertex (2-hop)

  • Vertex → Chain(Vertex → Vertex) → Vertex (initial motivation of 3-HOP)

  • Chain(Vertex → Vertex) → Chain(Vertex → Vertex) → Chain(Vertex → Vertex) (3-hop contour)

  • Chain decomposition is a spanning structure of G.

  • Some special vertices in the graph are labeled with Lout (a subset of the vertices that the labeled vertex can reach) and/or Lin (a subset of the vertices that can reach the labeled vertex).

  • The chain decomposition plus the sets Lout and Lin are all that we need to design efficient reachability answering schemes.


Key Problem

  • Given a chain decomposition {C1,C2,…,Ck} of a DAG, how can we use the 3-hop strategy to maximally compress the transitive closure and answer reachability queries efficiently?

    • To answer this, we first ask a simpler question: is it possible to compress the transitive closure using the chain decomposition alone?
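One well-known way to compress a transitive closure with chains alone is to store, for every vertex and every chain, only the earliest chain position that the vertex can reach. The sketch below illustrates that idea under two assumptions that are not stated on the slides: consecutive vertices on a chain are connected by an edge, and the helper names `build_chain_index` and `reaches` are hypothetical.

```python
import math

def build_chain_index(chains, edges):
    """For each vertex u and chain j, store the smallest position on chain j
    that u can reach (math.inf if none). Assumes the graph is a DAG and that
    each chain is listed in path order."""
    chain_of, pos = {}, {}
    for j, chain in enumerate(chains):
        for p, v in enumerate(chain):
            chain_of[v], pos[v] = j, p
    succ = {v: [] for v in chain_of}
    for u, v in edges:
        succ[u].append(v)

    k = len(chains)
    first = {}

    def visit(u):
        # Memoized DFS: a vertex inherits the best entry of every successor.
        if u in first:
            return first[u]
        row = [math.inf] * k
        row[chain_of[u]] = pos[u]
        for w in succ[u]:
            for j, p in enumerate(visit(w)):
                row[j] = min(row[j], p)
        first[u] = row
        return row

    for v in chain_of:
        visit(v)
    return chain_of, pos, first

def reaches(index, u, v):
    """u reaches v iff u enters v's chain at or before v's position."""
    chain_of, pos, first = index
    return first[u][chain_of[v]] <= pos[v]


# Hypothetical toy DAG with two chains: 1 -> 2 -> 3 and 4 -> 5.
chains = [[1, 2, 3], [4, 5]]
edges = [(1, 2), (2, 3), (4, 5), (2, 5), (4, 3)]
index = build_chain_index(chains, edges)
```

This index stores k entries per vertex, so its size is O(nk). The contour idea on the following slides goes further by keeping only the corner points of each chain-to-chain reachability staircase.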


Essential Information Between Two Chains

[Figure: the four-chain DAG (C1 to C4) next to the reachability submatrix from chain C1 (vertices 1 to 5) to chain C3 (vertices 10 to 15), with the contour points marked.]

Contour points: (1,10), (3,12), (5,15)


Pseudo-upper Triangular Submatrix and Pseudo-diagonal

[Figure: the 20 × 20 reachability matrix of the four-chain DAG, with rows and columns grouped by chains C1 to C4 (axes x and y), shown next to the DAG itself.]


Calculate Contour Points

  • Calculating contour points does not require the transitive closure, which costs O(mn) time to compute.

  • Our algorithm calculates contour points in O(mk log n) time using a bottom-up approach.
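To make the notion of a contour point concrete, here is a naive sketch that reads contour points off a reachability oracle. It is not the authors' O(mk log n) bottom-up algorithm. It assumes one reading of the definition: (x, y) is a contour point for the chain pair (Ci, Cj) when x reaches y, the successor of x on Ci does not reach y, and x does not reach the predecessor of y on Cj. The function name and the `first_reached` values are hypothetical, chosen so that the result matches the three contour points listed for C1 and C3 on the earlier slide.

```python
def contour_points(chain_i, chain_j, reach):
    """Return the contour points from chain_i to chain_j.

    reach(x, y) -> bool is a reachability oracle; both chains are given in
    path order. This brute-force scan is for illustration only.
    """
    points = []
    for a, x in enumerate(chain_i):
        for b, y in enumerate(chain_j):
            if not reach(x, y):
                continue
            # A later vertex on chain_i that also reaches y dominates x.
            later_x_reaches = a + 1 < len(chain_i) and reach(chain_i[a + 1], y)
            # An earlier vertex on chain_j reached by x dominates y.
            earlier_y_reached = b > 0 and reach(x, chain_j[b - 1])
            if not later_x_reaches and not earlier_y_reached:
                points.append((x, y))
    return points


# Hypothetical data: the first vertex of C3 that each vertex of C1 reaches.
first_reached = {1: 10, 2: 12, 3: 12, 4: 15, 5: 15}
c1, c3 = [1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]
points = contour_points(c1, c3, lambda x, y: y >= first_reached[x])
```

Every other reachable pair between the two chains is implied by one of these corner points together with the chain order, which is why only the contour needs to be covered by labels.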


3-Hop Labeling by Contour Points

[Figure: 16 × 16 From/To reachability matrix of a 16-vertex DAG with chains C1 to C4, next to the DAG labeled directly from its contour points. Labels shown include o:{6,11,15}, o:{11,15}, i:{2,7,12}, and i:{2,7}.]

Label size: 12


3-Hop Labeling by Covering Contour Points

[Figure: the same 16 × 16 matrix and 16-vertex DAG, now labeled so that the labels cover the contour points: o:{6}, i:{7,12}, i:{7}.]

Label size: 4


How to find the minimum 3-hop labeling?

Chain-centered bipartite graph

[Figure: the 16-vertex DAG with chains C1 to C4 and labels o:{6}, i:{7}, i:{7}, next to a chain-centered bipartite graph.]

Contour Points: (2,6), (2,11), (2,15), (7,11), (7,15), (12,15)


Finding minimum 3-hop labeling

  • Build chain-centered bipartite graphs.

  • Among all the chain-centered bipartite graphs, find a bipartite subgraph such that the number of uncovered contour points (edges) divided by the label size is maximized.

  • This goal can be converted to finding the maximum densest subgraph.

  • The key technical issue is then how to find the maximum densest subgraph quickly.


A quick 2-approximation algorithm for finding the maximum densest subgraph

[Figure: an example graph peeled in two rounds, first removing any vertex with degree 2 or less, then removing any vertex with degree 3 or less. The graph has rank 3.]

A vertex rank or a graph rank will never increase in our labeling (indexing) algorithm.

This is an important observation for designing a fast labeling algorithm.
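For reference, the classic greedy peeling procedure is a standard 2-approximation for the densest subgraph problem, where density is the number of edges divided by the number of vertices. The sketch below shows that textbook procedure, not necessarily the rank-based variant the slide describes. The function name and the example graph are assumptions, and the simple minimum search makes it quadratic rather than fast.

```python
def densest_subgraph_peeling(edges):
    """Greedy peeling: repeatedly delete a minimum-degree vertex and keep the
    densest intermediate subgraph seen (density = edges / vertices).
    Assumes an undirected simple graph without self-loops."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    m = sum(len(nb) for nb in adj.values()) // 2  # current edge count
    alive = set(adj)
    best_density, best_set = 0.0, set()

    while alive:
        density = m / len(alive)
        if density > best_density:
            best_density, best_set = density, set(alive)
        # Peel one vertex of minimum remaining degree.
        v = min(alive, key=lambda x: len(adj[x]))
        for w in adj[v]:
            adj[w].discard(v)
        m -= len(adj[v])
        adj[v] = set()
        alive.remove(v)

    return best_set, best_density


# Hypothetical example: a 4-clique on {1,2,3,4} with a tail 4 - 5 - 6.
example_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
best_set, best_density = densest_subgraph_peeling(example_edges)
```

In a chain-centered bipartite graph the edges are uncovered contour points and the vertices are candidate labels, so this density corresponds to contour points covered per unit of label size.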


3-HOP Contour Algorithm in General

Given a Chain Decomposition

  • Step 1: Calculate Contour Points

  • Step 2: Construct Chain Centered Bipartite Graphs

  • Step 3: Keep doing the following until all Contour Points are covered:

    • Find the densest subgraph among all chain-centered bipartite graphs (a k-times speedup is possible)

    • Label vertices according to the selected densest subgraph, delete the subgraph from the corresponding chain centered bipartite graph, and update covering information


Theoretical Analysis

  • The labeling size returned by 3HOP Contour is at most O(log n) times larger than the minimum 3-hop labeling size.

  • The optimal 3-hop contour labeling is always smaller than the optimal 2-hop labeling.

  • The worst-case running time of 2-hop is O(n^3 |Tc|), while it is O(k n^2 |contour|) for 3-hop.


3-HOP Contour Query

[Figure: the 20-vertex DAG with chains C1 to C4, annotated with in-labels (i:{...}) and out-labels (o:{...}) on selected vertices.]

Can 2 reach 20?

  • 2 out: {6,15}

  • 20 in: {9,13}

  • Since 6 → 9, the answer is Yes.

Worst-case complexity: O(n)
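The "Can 2 reach 20?" example can be replayed with a short sketch. It assumes the query works as follows: collect the out-labels of u and of the vertices after u on its chain, collect the in-labels of v and of the vertices before v on its chain, and answer Yes when some collected out-entry sits at or before some collected in-entry on the same chain. The function name is hypothetical, and so is the placement of the labels on individual vertices; only the collected sets {6,15} and {9,13} come from the slide.

```python
def contour_query(u, v, chains, out_label, in_label):
    """Answer 'does u reach v?' from chain order plus Lout / Lin labels."""
    chain_of, pos = {}, {}
    for j, chain in enumerate(chains):
        for p, w in enumerate(chain):
            chain_of[w], pos[w] = j, p

    cu, cv = chain_of[u], chain_of[v]
    # Out-entries: u itself plus labels of u and its successors on the chain.
    xs = {u}
    for w in chains[cu][pos[u]:]:
        xs |= out_label.get(w, set())
    # In-entries: v itself plus labels of v and its predecessors on the chain.
    ys = {v}
    for w in chains[cv][:pos[v] + 1]:
        ys |= in_label.get(w, set())

    # Earliest entry position per chain among the out-entries.
    lowest = {}
    for x in xs:
        c = chain_of[x]
        lowest[c] = min(lowest.get(c, pos[x]), pos[x])
    return any(chain_of[y] in lowest and lowest[chain_of[y]] <= pos[y] for y in ys)


chains = [list(range(1, 6)), list(range(6, 10)),
          list(range(10, 16)), list(range(16, 21))]
out_label = {3: {6}, 5: {15}}    # assumed placement; collected set for 2 is {6, 15}
in_label = {18: {13}, 19: {9}}   # assumed placement; collected set for 20 is {9, 13}
answer = contour_query(2, 20, chains, out_label, in_label)
```

Vertex 2 enters C2 at vertex 6, vertex 20 is entered from C2 at vertex 9, and 6 precedes 9 on C2, so the query succeeds. Walking both chains to collect labels is what gives the linear worst case.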


3-HOP Segment Query

[Figure: the same labeled 20-vertex DAG with chains C1 to C4.]

Segments on C1 w.r.t. reaching C2:

  • [1,3] o:{6}

  • [4,4] o:{9}

Segments on C4 w.r.t. being reached by C2:

  • [18,18] i:{7}

  • [19,20] i:{9}

…… (a total of O(nk) segments)

Can 2 reach 20?

  • 2 ∈ [1,3], which can reach 6

  • 20 ∈ [19,20], which can be reached by 9

  • Since 6 → 9, the answer is Yes.

Worst-case complexity: O(log(nk) + k) = O(log n + k)
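The segment lookup for a single intermediate chain can be sketched with binary search. The segment lists below are the four segments given on the slide for the intermediate chain C2. The function names are hypothetical, and the sketch assumes, as in the slide's figure, that vertex ids increase along each chain so that ids can be compared directly as chain positions. A full query would repeat this check for each of the k chains.

```python
import math
from bisect import bisect_right

def lookup(segments, p):
    """segments: sorted list of (start, end, entry) tuples covering disjoint
    ranges of one chain. Return the entry of the segment containing p,
    or None if p lies in no segment."""
    # (p, inf, inf) sorts after every tuple whose start is <= p.
    i = bisect_right(segments, (p, math.inf, math.inf)) - 1
    if i >= 0 and p <= segments[i][1]:
        return segments[i][2]
    return None

def reaches_via_chain(u, v, out_segments, in_segments):
    """True if u enters the intermediate chain at or before the vertex
    from which v is reached."""
    x = lookup(out_segments, u)
    y = lookup(in_segments, v)
    return x is not None and y is not None and x <= y


c1_to_c2 = [(1, 3, 6), (4, 4, 9)]          # segments on C1 w.r.t. reaching C2
c4_from_c2 = [(18, 18, 7), (19, 20, 9)]    # segments on C4 w.r.t. being reached by C2
answer = reaches_via_chain(2, 20, c1_to_c2, c4_from_c2)
```

Each lookup is logarithmic in the number of segments, which is where the logarithmic term in the stated complexity comes from.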


Experimental Evaluation

  • Implementation in C++

  • 12 synthetic datasets and 5 publicly available real datasets

  • Synthetic datasets

    • 2k-vertex DAGs with edge density = 2, 4, 6, 8, 10, 12

    • 10k-vertex DAGs with edge density = 2, 5, 10, 15, 20, 25

  • AMD Opteron 2.0 GHz / 8 GB / Linux





Real Datasets


Conclusion

  • A novel 3-hop indexing scheme is proposed to maximally compress the transitive closure. It answers reachability queries efficiently with an acceptable indexing time.

  • In the journal version, we plan to show how to efficiently answer distance queries via 3-hop.


Thanks! Questions?