Fast random walk with restart and its applications

Fast Random Walk with Restart and Its Applications. Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan. ICDM 2006, Dec. 18-22, Hong Kong.


Fast Random Walk with Restart and Its Applications

Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan

ICDM 2006, Dec. 18-22, Hong Kong


Motivating Questions

  • Q: How to measure the relevance?

  • A: Random walk with restart

  • Q: How to do it efficiently?

  • A: This talk tries to answer!


Random walk with restart

[Figure: example graph with 12 nodes, numbered 1 to 12]


Random walk with restart

[Figure: the same 12-node graph annotated with relevance scores (11 values between 0.02 and 0.13); together the scores form the ranking vector]


Automatic Image Caption

  • [Pan KDD04]

[Figure: graph linking image, region, and text nodes; a test image is shown alongside text terms including "Jet", "Plane", "Runway", "Candy", "Texture", and "Background"]



Center-Piece Subgraph

  • [Tong KDD06]


Other Applications

  • Content-based Image Retrieval

  • Personalized PageRank

  • Anomaly Detection (for node; link)

  • Link Prediction [Getoor], [Jensen], …

  • Semi-supervised Learning

  • ….

  • [Put Authors]


Roadmap

  • Background

    • RWR: Definitions

    • RWR: Algorithms

  • Basic Idea

  • FastRWR

    • Pre-Compute Stage

    • On-Line Stage

  • Experimental Results

  • Conclusion


Computing RWR

[Figure: the 12-node example graph, and the RWR equation with its terms labeled: ranking vector (n x 1), adjacency matrix (n x n), starting vector (n x 1)]

Q: Given e_i, how to solve?
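
The equation on this slide did not survive as text. A hedged reconstruction, assuming the standard RWR formulation in which the walker follows a link with probability c and restarts at the query node with probability 1-c (consistent with the slide's labels and with the (1-c) and c weights that appear on the on-line stage slides), is:

```latex
\vec{r}_i = c\,\tilde{W}\,\vec{r}_i + (1-c)\,\vec{e}_i
```

Here \vec{r}_i is the n x 1 ranking vector for query node i, \tilde{W} is the n x n normalized adjacency matrix, and \vec{e}_i is the n x 1 starting vector with a 1 in position i and 0 elsewhere. The tilde on W and the assumption that the matrix is normalized are notational choices made here, not taken from the slide.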


OntheFly:

[Figure: the example graph, shown without scores and with the computed RWR scores]

No pre-computation / light storage

Slow on-line response: O(mE)
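
A minimal sketch of the OntheFly approach in plain Python, assuming the standard RWR update r <- c W r + (1-c) e with a column-normalized adjacency matrix. The function name `rwr_on_the_fly`, the adjacency-list input, and the toy graph are illustrative assumptions, not part of the talk.

```python
def rwr_on_the_fly(adj, query, c=0.9, tol=1e-10, max_iter=1000):
    """Iterate r <- c * W r + (1 - c) * e until r stops changing.

    adj   : dict mapping each node to a list of its neighbours (undirected)
    query : the starting node
    c     : probability of following a link (1 - c is the restart probability)
    """
    nodes = sorted(adj)
    # Start from the indicator vector e of the query node.
    r = {v: (1.0 if v == query else 0.0) for v in nodes}
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        # One pass over all edges: each node splits its score evenly among
        # its neighbours (column-normalised adjacency matrix).
        for u in nodes:
            share = c * r[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        nxt[query] += 1.0 - c  # restart mass returns to the query node
        delta = sum(abs(nxt[v] - r[v]) for v in nodes)
        r = nxt
        if delta < tol:
            break
    return r


# Hypothetical 4-node path graph 1 - 2 - 3 - 4, queried at node 1.
graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = rwr_on_the_fly(graph, query=1)
```

Each iteration touches every edge once, so reading m as the number of iterations and E as the number of edges gives the O(mE) on-line cost, paid again for every query.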


PreCompute:

[Figure: the example graph, shown without scores and with the computed RWR scores]

Fast on-line response

Heavy pre-computation/storage cost: O(n^3) to pre-compute, O(n^2) to store
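
For reference, the PreCompute alternative can be written in closed form. This is a sketch that assumes the standard RWR equation r_i = c W r_i + (1-c) e_i with a normalized adjacency matrix (the tilde notation is an assumption made here):

```latex
\vec{r}_i = (1-c)\,\bigl(I - c\,\tilde{W}\bigr)^{-1}\,\vec{e}_i
```

Inverting the n x n matrix once costs O(n^3) with a standard dense method and storing the inverse costs O(n^2); answering a query then amounts to reading one column of the stored inverse and scaling it by (1-c).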


Q: How to Balance?

On-line

Off-line



Basic Idea

[Figure: the example graph processed in three steps, ending with the RWR scores]

Find Community

Fix the remaining

Combine


Basic Idea: Pre-computational stage

  • A few small matrix inversions, instead of ONE BIG one

[Figure: the pre-computed Q-matrices and the link matrices U and V]


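Basic Idea: On-Line Stage

  • A few, instead of MANY, matrix-vector multiplications

[Figure: the query vector is multiplied through the pre-computed matrices, including U and V, and the partial results are added to give the result]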



Pre-compute Stage

  • P1: B_Lin Decomposition

    • P1.1 partition

    • P1.2 low-rank approximation

  • P2: Q matrices

    • P2.1 computing (for each partition)

    • P2.2 computing (for concept space)


P1.1: partition

[Figure: the example graph before and after partitioning, distinguishing within-partition links from cross-partition links]


P1.1: block-diagonal

[Figure: the example graph, illustrating the block-diagonal structure of the within-partition links]
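
A minimal Python sketch of the split that the partition slides illustrate: once every node has a partition label, each link is either within-partition (it falls in a diagonal block when nodes are ordered by partition) or cross-partition. The function name, the edge-map representation, and the toy graph are assumptions for illustration; how the partition labels are obtained is not covered here.

```python
def split_by_partition(weights, part):
    """Split a weighted edge map into within- and cross-partition parts.

    weights : dict mapping (u, v) node pairs to edge weights
    part    : dict mapping each node to its partition label
    Returns (within, cross); together the two dicts contain every edge once.
    """
    within, cross = {}, {}
    for (u, v), w in weights.items():
        if part[u] == part[v]:
            within[(u, v)] = w   # lands in a diagonal block
        else:
            cross[(u, v)] = w    # lands outside the diagonal blocks
    return within, cross


# Hypothetical 6-node graph with two partitions; the labels are assumed to
# come from some graph-partitioning step that is not shown here.
edges = {(1, 2): 1.0, (2, 3): 1.0, (1, 3): 1.0,
         (4, 5): 1.0, (5, 6): 1.0, (3, 4): 1.0}
labels = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
w1, w2 = split_by_partition(edges, labels)
```

In this toy graph only the link between nodes 3 and 4 crosses partitions, so `w2` holds a single entry and `w1` holds the rest.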


P1.2: LRA for

[Figure: the example graph and the low-rank factors U, S, and V; a second panel adds four nodes c1 to c4 to the graph]


Comparing and

  • Computing Time

    • 100,000 nodes; 100 partitions

    • Computing is 10,000x faster!

  • Storage Cost (100x saving!)
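
Both ratios can be checked with a short calculation, assuming dense inversion costs on the order of n^3 operations, dense storage takes n^2 entries, and the k partitions are of equal size (these cost models are assumptions, not stated on the slide):

```latex
\begin{align*}
n &= 10^{5}, \qquad k = 100, \qquad n/k = 10^{3} \\
\text{time ratio} &= \frac{n^{3}}{k\,(n/k)^{3}} = \frac{10^{15}}{100 \cdot 10^{9}} = k^{2} = 10^{4} \\
\text{storage ratio} &= \frac{n^{2}}{k\,(n/k)^{2}} = \frac{10^{10}}{100 \cdot 10^{6}} = k = 10^{2}
\end{align*}
```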


P2.2 Computing:

[Figure: the example graph, and an equation involving an inverse (superscript -1) and the matrices U and V]


We have:

Q-matrices

Link matrices U and V

SM (Sherman-Morrison) Lemma says:
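
The formula itself is missing from the transcript. A hedged reconstruction follows, assuming the normalized adjacency matrix is split as W_1 + U S V, where W_1 is the block-diagonal within-partition part and U S V is the low-rank approximation of the cross-partition part. The symbols W_1, Q_1, and \Lambda are chosen here for illustration. Under that assumption the Sherman-Morrison (Woodbury) identity gives:

```latex
\begin{align*}
Q_1 &= (I - c\,W_1)^{-1}, \qquad \Lambda = \bigl(S^{-1} - c\,V Q_1 U\bigr)^{-1} \\
(I - c\,W_1 - c\,U S V)^{-1} &= Q_1 + c\,Q_1 U \Lambda V Q_1
\end{align*}
```

Because W_1 is block-diagonal, Q_1 can be assembled from one small inverse per partition, and \Lambda is only as large as the rank of the approximation.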



On-Line Stage

  • Q: [Figure: the query is combined with the pre-computed matrices, including U and V; the result is marked "?"]

  • A (SM lemma)


On-Line Query Stage

  • q1: Find the community

  • q2-q5: Compensate out-community links

  • q6: Combine (the formula carries the weights (1-c) and c)
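
One way to read the six steps, sketched in plain Python on a hypothetical 4-node graph. The per-step formulas are not preserved in the transcript, so the breakdown below is an assumption: it applies the Sherman-Morrison identity to a matrix split as W = W1 + U S V, with q1 using only the block-diagonal part, q2 to q5 adding the correction for cross-partition links, and q6 combining the two with the weights (1-c) and c. All names (`Q1`, `Lam`, `query`, and the helper functions) are illustrative.

```python
def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (fine for small dense matrices)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def i_minus_c(m, c):
    """Return I - c * m for a square matrix m."""
    n = len(m)
    return [[(1.0 if i == j else 0.0) - c * m[i][j] for j in range(n)]
            for i in range(n)]

c = 0.9
# Toy path graph 0 - 1 - 2 - 3, column-normalised; partitions {0, 1} and {2, 3}.
W1 = [[0.0, 0.5, 0.0, 0.0],   # within-partition links (block-diagonal)
      [1.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 1.0],
      [0.0, 0.0, 0.5, 0.0]]
# Cross-partition links 1 <-> 2, written exactly as U S V with S = identity.
U = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.0, 0.0]]
V = [[0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
S_inv = [[1.0, 0.0], [0.0, 1.0]]

# Pre-compute stage: Q1 = (I - c W1)^-1 and Lam = (S^-1 - c V Q1 U)^-1.
Q1 = inverse(i_minus_c(W1, c))
VQU = matmul(matmul(V, Q1), U)
Lam = inverse([[S_inv[i][j] - c * VQU[i][j] for j in range(2)] for i in range(2)])

def query(e):
    r0 = matvec(Q1, e)      # q1: scores using within-partition links only
    r1 = matvec(V, r0)      # q2: project onto the cross-partition factors
    r2 = matvec(Lam, r1)    # q3: apply the small pre-computed inverse
    r3 = matvec(U, r2)      # q4: map back to node space
    r4 = matvec(Q1, r3)     # q5: spread the correction through the blocks
    return [(1 - c) * (a + c * b) for a, b in zip(r0, r4)]  # q6: combine

e0 = [1.0, 0.0, 0.0, 0.0]
fast = query(e0)

# Reference: invert the full matrix directly.
W = [[W1[i][j] + sum(U[i][k] * V[k][j] for k in range(2)) for j in range(4)]
     for i in range(4)]
exact = [(1 - c) * x for x in matvec(inverse(i_minus_c(W, c)), e0)]
```

In this toy case the cross-partition part has exact rank 2, so `fast` agrees with `exact` up to rounding; with a truncated low-rank approximation the result would only be approximate. For brevity the sketch inverts I - c W1 as a whole, whereas the block-diagonal structure means each partition's block could be inverted separately.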


Example

[Figure: the 12-node example graph]

  • We have: the pre-computed matrices (the figure shows U and V among them)

  • We want to:


q1: Find Community

[Figure: the example graph with the community of nodes 1 to 4 shown separately]


q2-q5: out-community

[Figure: the community of nodes 1 to 4 and the remaining nodes 5 to 12, with steps q2, q3, and q4 marked]


q6: Combination

[Figure: the example graph and its final RWR scores; the combination formula shown uses the constants 0.9 and 0.1]



Experimental Setup

  • Dataset

    • DBLP/authorship

    • Author-Paper

    • 315k nodes

    • 1,800k edges

  • Quality: Relative Accuracy

  • Application: Center-Piece Subgraph


Query Time vs. Pre-Compute Time

[Plot: log query time against log pre-compute time]


Query Time vs. Pre-Storage

[Plot: log query time against log storage]



Conclusion

  • FastRWR

    • Reasonable quality preservation (90%+)

    • 150x speed-up: query time

    • Orders of magnitude saving: pre-compute & storage

  • More in the paper

    • The variant of FastRWR and theoretical justification

    • Implementation details

      • normalization, low-rank approximation, sparse

    • More experiments

      • Other datasets, other applications


Q&A

Thank you!


www.cs.cmu.edu/~htong


Future work

  • Incremental FastRWR

  • Parallel FastRWR

    • Partition

    • Q-matrices for each partition

  • Hierarchical FastRWR

    • How to compute one Q-matrix for


Possible Q?

  • Why RWR?

