3-HOP: A High-Compression Indexing Scheme for Reachability Query


Yang Xiang

Kent State University

Joint work with Ruoming Jin (KSU), Ning Ruan (KSU), and Dave Fuhry (KSU)

The problem: Given two vertices u and v in a directed graph G, is there a path from u to v?
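As a point of reference, the question can be answered with no index at all by a plain graph search, at the cost of visiting up to every vertex and edge per query. The sketch below is only that baseline, not the indexing scheme of the talk; the function name `reachable` and the five-vertex graph are made up for illustration.

```python
from collections import deque

def reachable(adj, u, v):
    """Return True if there is a directed path from u to v (u reaches itself)."""
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False

# Hypothetical graph for illustration only (not the graph from the slides).
example_adj = {1: [2, 3], 2: [4], 3: [4], 4: [], 5: [1]}
```

An index such as the one discussed here trades preprocessing time and storage for avoiding this per-query traversal.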

Example queries on the sample graph (figure with vertices 1-15 omitted):

- Query(1,11)? Yes
- Query(3,9)? No

A directed graph is reduced to a DAG (directed acyclic graph) by coalescing its strongly connected components.
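One way to carry out this reduction is Kosaraju's two-pass algorithm, sketched below. The slides do not say which algorithm is used, so this is an assumption; the function name `condense` and the example graph are hypothetical. The sketch assumes every vertex appears as a key of `adj`.

```python
def condense(adj):
    """Collapse each strongly connected component into one vertex.

    adj maps every vertex to a list of its out-neighbours.
    Returns (comp, dag): comp[v] is the component id of v, and dag maps each
    component id to the set of component ids it has an edge to.
    """
    # Pass 1: record vertices in order of DFS completion (iterative DFS).
    order, seen = [], set()
    for root in adj:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(adj[root]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(adj[w])))
                    break
            else:
                # All out-neighbours of v are finished.
                order.append(v)
                stack.pop()

    # Pass 2: search the reversed graph in decreasing completion order.
    radj = {v: [] for v in adj}
    for v, outs in adj.items():
        for w in outs:
            radj[w].append(v)
    comp = {}
    for root in reversed(order):
        if root in comp:
            continue
        comp[root] = root  # the root doubles as the component id
        stack = [root]
        while stack:
            v = stack.pop()
            for w in radj[v]:
                if w not in comp:
                    comp[w] = root
                    stack.append(w)

    # Build the DAG over components, dropping edges inside a component.
    dag = {c: set() for c in set(comp.values())}
    for v, outs in adj.items():
        for w in outs:
            if comp[v] != comp[w]:
                dag[comp[v]].add(comp[w])
    return comp, dag

# Hypothetical graph: {1,2,3} and {4,5} are cycles, 6 is isolated.
scc_example = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4], 6: []}
```

Two vertices in the same component reach each other, so a reachability query on the original graph reduces to a query between their components in the DAG.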

- XML
- Biological networks
- Ontology
- Knowledge representation (Lattice operation)
- Object programming (Class relationship)
- Distributed systems (Reachable states)

- Existing work can be classified into two big categories:
- Using spanning structures, such as chains or trees.
- Using 2-hop strategy

- Major limitation:
- When graphs are denser, the size of the compressed transitive closure grows very large

[Figure: a 12-vertex example graph labeled two ways, with Lin and Lout sets drawn next to the vertices. Panels: 3-Hop and 2-Hop.]

[Figure: example DAG on vertices 1-20 with a chain decomposition into four chains C1, C2, C3, C4.]

- Vertex → Vertex → Vertex (2-hop)
- Vertex → Chain(Vertex → Vertex) → Vertex (initial motivation of 3-HOP)
- Chain(Vertex → Vertex) → Chain(Vertex → Vertex) → Chain(Vertex → Vertex) (3-hop contour)
- Chain decomposition is a spanning structure of G
- Some special vertices in the graph are labeled with Lout (a subset of the vertices they can reach) and/or Lin (a subset of the vertices they can be reached from).
- The chain decomposition plus the Lout and Lin sets are all we need to design efficient reachability answering schemes.
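To make "chain decomposition" concrete, here is a minimal greedy sketch that splits a DAG into vertex-disjoint paths, each of which is a valid chain. It is a simple heuristic chosen for brevity, not necessarily the decomposition used by the authors, and it does not minimize the number of chains. The names `greedy_chains` and `chain_example` are hypothetical, and every vertex is assumed to be a key of `adj`.

```python
from graphlib import TopologicalSorter

def greedy_chains(adj):
    """Split a DAG into vertex-disjoint paths (each path is a valid chain)."""
    # TopologicalSorter expects predecessor sets, so invert the adjacency lists.
    preds = {v: set() for v in adj}
    for v, outs in adj.items():
        for w in outs:
            preds[w].add(v)
    order = list(TopologicalSorter(preds).static_order())

    chain_of = {}  # vertex -> (chain id, position on that chain)
    chains = []    # chain id -> vertices in chain order
    for v in order:
        # Extend a chain whose current last vertex is a direct predecessor of v.
        for p in sorted(preds[v]):
            cid, pos = chain_of[p]
            if pos == len(chains[cid]) - 1:
                chain_of[v] = (cid, pos + 1)
                chains[cid].append(v)
                break
        else:
            # No chain can be extended: v starts a new chain.
            chain_of[v] = (len(chains), 0)
            chains.append([v])
    return chains, chain_of

# Hypothetical diamond-shaped DAG.
chain_example = {1: [2, 3], 2: [4], 3: [4], 4: []}
```

Because every vertex lands on exactly one chain, the chains together span the graph, which is the property the bullet above relies on.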

- Given a chain decomposition {C1,C2,…,Ck} of a DAG, how can we use the 3-hop strategy to maximally compress the transitive closure and answer reachability queries efficiently?
- To answer the above question, we would first ask the following question: is it possible to compress the transitive closure by chain decomposition itself?
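A standard way to compress the transitive closure with chains alone is to store, for each vertex and each chain, only the first position on that chain the vertex can reach: everything later on the chain is then reachable too. The sketch below assumes that representation (the slides show it only as a figure), and all names (`chain_closure`, `reaches`, `cc_adj`, `cc_chains`) are hypothetical.

```python
from graphlib import TopologicalSorter

def chain_closure(adj, chains):
    """For every vertex u and chain j, find the first position on chain j that u reaches.

    adj: vertex -> list of out-neighbours (a DAG, every vertex is a key).
    chains: list of chains, each a list of vertices in chain order.
    Returns (first, where); first[u][j] is None when u reaches nothing on chain j.
    """
    where = {v: (j, i) for j, c in enumerate(chains) for i, v in enumerate(c)}
    # graphlib treats the mapped values as predecessors; passing out-neighbours
    # therefore yields successors before the vertices that point to them.
    order = TopologicalSorter(adj).static_order()
    first = {}
    for u in order:
        row = [None] * len(chains)
        j, i = where[u]
        row[j] = i  # u reaches itself
        for w in adj[u]:
            for c, pos in enumerate(first[w]):
                if pos is not None and (row[c] is None or pos < row[c]):
                    row[c] = pos
        first[u] = row
    return first, where

def reaches(first, where, u, v):
    """u reaches v iff the first position u reaches on v's chain is at or before v."""
    j, i = where[v]
    return first[u][j] is not None and first[u][j] <= i

# Hypothetical diamond-shaped DAG with two chains.
cc_adj = {1: [2, 3], 2: [4], 3: [4], 4: []}
cc_chains = [[1, 2, 4], [3]]
```

This stores n·k entries instead of up to n² pairs, which already helps when the number of chains k is small.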

[Figure: reachability from chain C1 (vertices 1-5) to chain C3 (vertices 10-15) in the example DAG, drawn as a 0/1 matrix.]

Contour points: (1,10), (3,12), (5,15)

[Figure: transitive closure matrix of the example DAG, with axes x and y listing vertices 1-20 grouped by chains C1-C4, shown together with the example DAG.]

- Calculating contour points does not require the full transitive closure, which costs O(mn) time to compute.
- Our algorithm calculates contour points in O(mk log n) time using a bottom-up approach.
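The following is a minimal sketch of computing contour points without materializing the transitive closure. It assumes a definition the slides show only as a figure: (x, y) with x on chain Ci and y on another chain Cj is a contour point when y is the first vertex of Cj that x reaches and x is the last vertex of Ci that reaches y. It is a simplified bottom-up pass, not the authors' O(mk log n) algorithm. The 16-vertex graph (four chains of four vertices plus cross edges 2→6, 7→11, 12→15) is a hypothetical example, and every name in the code is made up.

```python
from graphlib import TopologicalSorter

INF = float("inf")

def contour_points(adj, chains):
    """Return the sorted contour points (x, y) under the assumed definition."""
    where = {v: (j, i) for j, c in enumerate(chains) for i, v in enumerate(c)}
    k = len(chains)
    first = {}  # first[u][j]: earliest position on chain j reachable from u
    # Bottom-up pass: successors are handled before the vertices pointing to them.
    for u in TopologicalSorter(adj).static_order():
        row = [INF] * k
        row[where[u][0]] = where[u][1]
        for w in adj[u]:
            row = [min(a, b) for a, b in zip(row, first[w])]
        first[u] = row

    points = []
    for i, chain in enumerate(chains):
        for p, x in enumerate(chain):
            nxt = first[chain[p + 1]] if p + 1 < len(chain) else [INF] * k
            for j in range(k):
                # Keep (x, y) only if the next vertex on x's chain cannot reach y.
                if j != i and first[x][j] != INF and nxt[j] > first[x][j]:
                    points.append((x, chains[j][first[x][j]]))
    return sorted(points)

# Hypothetical example: four chains of four vertices each.
cp_chains = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
cp_adj = {v: [] for c in cp_chains for v in c}
for c in cp_chains:
    for a, b in zip(c, c[1:]):
        cp_adj[a].append(b)      # edges along each chain
for a, b in [(2, 6), (7, 11), (12, 15)]:
    cp_adj[a].append(b)          # assumed cross-chain edges
```

On this assumed graph the function returns six points, far fewer than the number of reachable pairs, which is the compression the contour is meant to capture.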

[Figure: two labelings of a 16-vertex example (chains C1-C4, vertices 1-16), each drawn next to its From/To reachability matrix with out-labels o:{...} and in-labels i:{...}. The first labeling has label size 12; the second has label size 4.]

Chain centered bipartite graph

[Figure: the 16-vertex example with chains C1-C4 and one of its chain-centered bipartite graphs.]

Contour Points: (2,6), (2,11), (2,15), (7,11), (7,15), (12,15)

- Build chain-centered bipartite graphs.
- Among all the chain-centered bipartite graphs, find a bipartite subgraph that maximizes the number of uncovered contour points (edges) divided by the label size.
- This goal can be converted to finding the densest subgraph.
- The key technical issue is then how to find the densest subgraph quickly.
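For intuition about the densest-subgraph step, the sketch below uses the well-known greedy peeling heuristic: repeatedly remove a minimum-degree vertex and keep the densest intermediate subgraph, with density measured as edges per vertex. Peeling is a standard 2-approximation and is substituted here for illustration; the slides do not give the authors' exact procedure. The function name, vertex names, and example graph are hypothetical.

```python
def peel_densest(edges):
    """Greedy peeling: drop a minimum-degree vertex at each step and remember
    the densest intermediate subgraph (density = #edges / #vertices).
    Assumes an undirected simple graph given as a list of vertex pairs."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    alive = set(nbrs)
    m = len({frozenset(e) for e in edges})  # number of distinct edges
    best_density, best_set = 0.0, set()
    while alive:
        density = m / len(alive)
        if density > best_density:
            best_density, best_set = density, set(alive)
        v = min(alive, key=lambda x: len(nbrs[x]))
        for w in nbrs[v]:
            nbrs[w].discard(v)
        m -= len(nbrs[v])  # edges removed together with v
        alive.remove(v)
        del nbrs[v]
    return best_density, best_set

# Hypothetical bipartite graph: a complete 2x3 block plus a sparse tail.
peel_edges = [(a, b) for a in ("a1", "a2") for b in ("b1", "b2", "b3")]
peel_edges += [("a3", "b3"), ("a3", "b4")]
```

In the setting of the slides, edges would correspond to uncovered contour points and vertices to label entries, so a dense subgraph covers many contour points per unit of label size.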

[Figure: a small graph with vertices marked 2 or 3, peeled in two stages: removing any vertex with degree 2 or less, then removing any vertex with degree 3 or less.]

The graph has rank 3.

A vertex rank or a graph rank will never increase in our labeling (indexing) algorithm.

This is an important observation for designing a fast labeling algorithm.

Given a Chain Decomposition

- Step 1: Calculate Contour Points
- Step 2: Construct Chain Centered Bipartite Graphs
- Step 3: Keep doing the following until all Contour Points are covered:
- Find the densest subgraph among all Chain Centered Bipartite graphs (k times speed up is possible)
- Label vertices according to the selected densest subgraph, delete the subgraph from the corresponding chain centered bipartite graph, and update covering information
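Step 3 has the shape of a greedy weighted covering loop: at each round, take the candidate with the best ratio of newly covered contour points to label cost. The sketch below shows only that generic skeleton. In the actual algorithm the candidates come from the densest-subgraph search rather than an explicit list, so the candidate list, its costs, and the function name `greedy_cover` are all hypothetical.

```python
def greedy_cover(universe, candidates):
    """candidates: list of (cost, covered_set) pairs.
    Repeatedly take the candidate with the best ratio of newly covered
    elements to cost until everything in the universe is covered."""
    uncovered = set(universe)
    chosen, total_cost = [], 0
    while uncovered:
        best = max(
            range(len(candidates)),
            key=lambda i: len(candidates[i][1] & uncovered) / candidates[i][0],
        )
        cost, covers = candidates[best]
        gain = covers & uncovered
        if not gain:
            raise ValueError("remaining elements cannot be covered")
        uncovered -= gain
        chosen.append(best)
        total_cost += cost
    return chosen, total_cost

# Universe: contour points. Candidates and costs are invented for illustration.
gc_points = {(2, 6), (2, 11), (2, 15), (7, 11), (7, 15), (12, 15)}
gc_candidates = [
    (2, {(2, 11), (2, 15), (7, 11), (7, 15)}),
    (1, {(2, 6)}),
    (1, {(12, 15)}),
    (5, set(gc_points)),
]
```

Here the greedy loop picks the first three candidates for a total cost of 4 rather than the single candidate of cost 5 that covers everything at once.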

- The labeling size returned by 3-HOP Contour is at most O(log n) times larger than the minimum 3-hop labeling size.
- The optimal 3-hop contour labeling is always smaller than the optimal 2-hop labeling.
- The worst-case running time of 2-hop is O(n^3 |Tc|), while it is O(k n^2 |contour|) for 3-hop.

[Figure: example DAG on vertices 1-20 with chains C1-C4, annotated with out-labels o:{...} and in-labels i:{...}.]

Can 2 reach 20?

2 out: {6,15}

20 in: {9,13}

Since 6 → 9, the answer is Yes.

Worst case complexity: O(n)

Segments on C1 w.r.t. reaching C2:

[1,3] o:{6}

[4,4] o:{9}

Segments on C4 w.r.t. being reached by C2:

[18,18] i:{7}

[19,20] i:{9}

…… (a total of O(nk) segments)

Can 2 reach 20?

2 ∈ [1,3], which can reach 6

20 ∈ [19,20], which can be reached by 9

Since 6 → 9, the answer is Yes. Worst case complexity: O(log(nk) + k) = O(log n + k)
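The query procedure can be sketched as follows, in its simple linear-scan form rather than the segment-based one. Collect the out-labels of u and of every vertex after u on its chain, collect the in-labels of v and of every vertex before v on its chain, and answer Yes if some out-label sits at or before some in-label on a common chain. Treating u and v themselves as implicit labels is an assumption made here so that same-chain queries fall out of the same test. The chains, labels, and function name are hypothetical.

```python
def three_hop_query(chains, l_out, l_in, u, v):
    """Answer 'does u reach v?' from a chain decomposition plus Lout/Lin labels."""
    where = {x: (j, i) for j, c in enumerate(chains) for i, x in enumerate(c)}
    ju, iu = where[u]
    jv, iv = where[v]
    # Earliest position reached on each chain, using labels at or after u.
    earliest = {ju: iu}
    for x in chains[ju][iu:]:
        for a in l_out.get(x, ()):
            j, i = where[a]
            earliest[j] = min(i, earliest.get(j, i))
    # Latest position on each chain that reaches v, using labels at or before v.
    latest = {jv: iv}
    for y in chains[jv][: iv + 1]:
        for b in l_in.get(y, ()):
            j, i = where[b]
            latest[j] = max(i, latest.get(j, i))
    # Yes if, on some chain, an entry point lies at or before an exit point.
    return any(j in latest and earliest[j] <= latest[j] for j in earliest)

# Hypothetical index: three chains, one out-label and one in-label.
q_chains = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
q_out = {2: {5}}   # assumed: 2 reaches 5
q_in = {8: {6}}    # assumed: 6 reaches 8
```

With these labels, 1 reaches 9 because out-label 5 precedes in-label 6 on the middle chain, while 3 does not reach 9 because no out-label is available at or after 3. Replacing the two scans with precomputed segments and binary search is what yields the O(log n + k) bound quoted above.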


- Implementation in C++
- 12 synthetic datasets and 5 publicly available real datasets
- Synthetic datasets
- 2k DAG with edge density = 2, 4, 6, 8, 10, 12
- 10k DAG with edge density = 2, 5, 10, 15, 20, 25

- AMD Opteron 2.0GHz/ 8GB/ Linux

- A novel 3-hop indexing scheme is proposed to maximally compress the transitive closure; it answers reachability queries efficiently with an acceptable indexing time.
- In the journal version, we plan to show how to efficiently answer distance queries via 3-hop.