
The Link Prediction Problem for Social Networks
David Liben-Nowell, MIT
Jon Kleinberg, Cornell

Saswat Mishra sxm111131

Summary
  • The “Link Prediction Problem”
  • Given a snapshot of a social network, can we infer which new interactions among its members are likely to occur in the near future?
  • Based on “proximity” of nodes in a network
Introduction
  • Natural examples of social networks:
    • Nodes = people/entities
    • Edges = interaction/collaboration

Motivation
  • Understanding how social networks evolve
  • The link prediction problem
  • Given a snapshot of a social network at time t, we seek to accurately predict the edges that will be added to the network during the interval (t, t’)

[Slide figure: network snapshot with a potential future edge marked "?"]

Why?
  • To suggest interactions or collaborations that haven’t yet been utilized within an organization
  • To monitor terrorist networks - to deduce possible interaction between terrorists (without direct evidence)
  • Used by Facebook and LinkedIn to suggest friends
  • Open Question: How does Facebook do it?

(friends of friends, same school, manually…)

Motivation
  • Co-authorship network for scientists
  • Scientists who are “close” in the network will have common colleagues & circles – likely to collaborate

Caveat: scientists who have never collaborated and are far apart in the network may still collaborate in the future; such links are hard to predict

  • Goal: make that intuitive notion precise; understand which measures of “proximity” lead to accurate predictions

[Slide figure: small co-authorship graph with nodes A, B, C, D]

Goals
  • Present measures of proximity
  • Understand relative effectiveness of network proximity measures (adapted from graph theory, CS, social sciences)
  • Show that prediction based on proximity outperforms random prediction by a factor of 40 to 50
  • Show that subtle measures can outperform more direct measures
Data and Experimental Setup
  • Co-authorship network (G) from “author list” of the physics e-Print arXiv (www.arxiv.org)
  • Five such networks were taken from five sections of the arXiv e-print archive

[Slide figures: collaboration graph during the training interval and during the test interval, over nodes A, B, C, D]

Training interval: [1994, 1996], with κ_training = 3

Test interval: [1997, 1999], with κ_test = 3

Core: the set of authors with at least κ = 3 papers during both the training and the test interval

G[1994, 1996] = G_collab = (A, E_old); E_new = the new collaborations (edges) that appear during the test interval (a rough splitting sketch follows)
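A minimal sketch of how such a split might be constructed from timestamped co-authorship records; the record format, the function name, and the omission of the Core filtering are assumptions for illustration, not the authors' code:

```python
def split_collaborations(records, train_years=(1994, 1996), test_years=(1997, 1999)):
    """records: iterable of (author_a, author_b, year) co-authorship events.

    Returns (E_old, E_new): training-period edges and the genuinely new
    edges that appear only in the test period (Core filtering omitted).
    """
    def in_range(year, interval):
        return interval[0] <= year <= interval[1]

    E_old = {frozenset((a, b)) for a, b, y in records if in_range(y, train_years)}
    E_test = {frozenset((a, b)) for a, b, y in records if in_range(y, test_years)}
    E_new = E_test - E_old   # pairs collaborating for the first time in the test window
    return E_old, E_new
```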

Methods for Link Prediction
  • Take the input graph during training period Gcollab
  • Pick a pair of nodes (x, y)
  • Assign a connection weight score(x, y)
  • Rank all candidate pairs in descending order of score and predict the top-ranked pairs (see the sketch after this list)
  • score is a measure of proximity
  • Any ideas for measures?
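A minimal sketch of this ranking pipeline, assuming score is any proximity function over node pairs; the helper name and data layout are illustrative, not from the paper:

```python
from itertools import combinations

def rank_predictions(nodes, existing_edges, score, top_n=10):
    """Score every unlinked node pair and return the top_n highest-scoring pairs."""
    existing = {frozenset(e) for e in existing_edges}
    candidates = [
        (x, y) for x, y in combinations(nodes, 2)
        if frozenset((x, y)) not in existing       # only pairs with no edge yet
    ]
    ranked = sorted(candidates, key=lambda pair: score(*pair), reverse=True)
    return ranked[:top_n]
```

The remaining slides plug different proximity measures into the score argument.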
Graph distance & Common Neighbors
  • Graph distance: score(x, y) := −(length of the shortest path between x and y)
  • Common neighbors: score(x, y) := |Γ(x) ∩ Γ(y)|; in the example below, A and C have 2 common neighbors, so they are more likely to collaborate (a worked example follows the figure)

[Slide figures: two example graphs over nodes A, B, C, D, E, illustrating shortest paths and common neighbors]
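A small worked version of these two scores on a toy graph; the graph is an illustrative stand-in for the slide's figure, not the paper's data:

```python
import networkx as nx

# Toy graph loosely matching the slide's example nodes.
G = nx.Graph([("A", "B"), ("A", "D"), ("B", "C"), ("C", "D"), ("B", "E")])

def graph_distance(G, x, y):
    # Negated shortest-path length, so that closer pairs get higher scores.
    return -nx.shortest_path_length(G, x, y)

def common_neighbors(G, x, y):
    return len(set(G[x]) & set(G[y]))

print(graph_distance(G, "A", "C"))    # -2 (shortest path A-B-C)
print(common_neighbors(G, "A", "C"))  # 2  (B and D)
```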

Jaccard’s coefficient and Adamic / Adar
  • Jaccard’s coefficient: common neighbors normalized by degree, score(x, y) := |Γ(x) ∩ Γ(y)| / |Γ(x) ∪ Γ(y)|
  • Adamic / Adar: weights rarer common neighbors more heavily, score(x, y) := sum over common neighbors z of 1 / log |Γ(z)| (a sketch follows the figure)

[Slide figure: example graph over nodes A, B, C, D, E]
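A sketch of both scores on the same illustrative toy graph as above:

```python
import math
import networkx as nx

G = nx.Graph([("A", "B"), ("A", "D"), ("B", "C"), ("C", "D"), ("B", "E")])

def jaccard(G, x, y):
    # Fraction of the combined neighborhood of x and y that is shared.
    shared = set(G[x]) & set(G[y])
    combined = set(G[x]) | set(G[y])
    return len(shared) / len(combined)

def adamic_adar(G, x, y):
    # Each shared neighbor z contributes 1/log(deg(z)): rarer neighbors weigh more.
    # (A shared neighbor always has degree >= 2, so log(deg) is never zero.)
    return sum(1.0 / math.log(G.degree(z)) for z in set(G[x]) & set(G[y]))

print(jaccard(G, "A", "C"))       # 1.0: shared neighbors {B, D} out of combined {B, D}
print(adamic_adar(G, "A", "C"))   # 1/ln(3) + 1/ln(2), since deg(B) = 3 and deg(D) = 2
```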

Preferential Attachment
  • Probability that a new collaboration involves x is proportional to |Γ(x)|, the number of current neighbors of x
  • score(x, y) := |Γ(x)| · |Γ(y)| (a one-line sketch follows)
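A one-line sketch, assuming a networkx graph G like the toy example used above:

```python
def preferential_attachment(G, x, y):
    # Product of the two degrees: well-connected nodes are assumed more likely
    # to gain new collaborations.
    return G.degree(x) * G.degree(y)
```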
Considering all paths: Katz
  • Katz: sums over all paths between x and y, exponentially damped by length so that short paths count more heavily: score(x, y) := sum over ℓ ≥ 1 of β^ℓ · (number of length-ℓ paths between x and y)
  • β is chosen to be a very small value (for damping); a matrix-form sketch follows the figure

[Slide figure: example graph over nodes A, B, C, D, E]
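In matrix form the unweighted Katz scores are given by the closed form (I − βA)⁻¹ − I; a minimal sketch, assuming β is smaller than 1 over the largest eigenvalue of the adjacency matrix (toy graph and parameter value are illustrative):

```python
import numpy as np
import networkx as nx

def katz_scores(G, beta=0.05):
    """All-pairs Katz scores: paths of every length, damped by beta**length."""
    A = nx.to_numpy_array(G)
    n = A.shape[0]
    # Closed form of sum_{l >= 1} beta^l * A^l; requires beta < 1 / lambda_max(A).
    return np.linalg.inv(np.eye(n) - beta * A) - np.eye(n)

G = nx.Graph([("A", "B"), ("A", "D"), ("B", "C"), ("C", "D"), ("B", "E")])
nodes = list(G.nodes())
S = katz_scores(G, beta=0.05)
print(S[nodes.index("A"), nodes.index("C")])  # damped count of all A-C paths
```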

Hitting time, PageRank
  • Hitting time Hx,y: expected number of steps for a random walk starting at x to reach y; score(x, y) := −Hx,y
  • Commute time: the symmetrized version, score(x, y) := −(Hx,y + Hy,x)
  • If y has a large stationary probability πy, then Hx,y is small no matter where the walk starts; to counterbalance, we can normalize, e.g. score(x, y) := −Hx,y · πy
  • PageRank (rooted at x): to cut down on long random walks, the walk returns to x with probability α at every step; score(x, y) := the stationary probability of y under this walk (see the sketch below)
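A sketch of the rooted-PageRank score using networkx's personalized PageRank; note that networkx's alpha is the probability of continuing the walk, so the slide's return probability α corresponds to 1 − alpha here (graph and parameter values are illustrative):

```python
import networkx as nx

def rooted_pagerank(G, x, y, return_prob=0.15):
    # Random walk that jumps back to x with probability return_prob at each step;
    # the score is y's stationary probability under this walk.
    pr = nx.pagerank(G, alpha=1.0 - return_prob, personalization={x: 1.0})
    return pr[y]

G = nx.Graph([("A", "B"), ("A", "D"), ("B", "C"), ("C", "D"), ("B", "E")])
print(rooted_pagerank(G, "A", "C"))
```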
SimRank
  • Defined recursively: two nodes are similar to the extent that they are joined to similar neighbors; score(x, y) := γ · (average of score(a, b) over all pairs a ∈ Γ(x), b ∈ Γ(y)), with score(x, x) := 1 (a sketch follows)
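networkx ships a SimRank implementation that can serve as this proximity score; its importance_factor plays the role of the decay constant γ in the recursion (toy graph and parameter value are illustrative):

```python
import networkx as nx

G = nx.Graph([("A", "B"), ("A", "D"), ("B", "C"), ("C", "D"), ("B", "E")])

# score(x, x) = 1; other pairs are scored by the recursion over their neighbors.
score = nx.simrank_similarity(G, source="A", target="C", importance_factor=0.8)
print(score)
```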
Low-rank approximation
  • Treat the graph as an adjacency matrix M
  • Compute the rank-k approximation M_k (a form of noise reduction), e.g. via the singular value decomposition
  • score(x, y) := inner product of the rows of M_k corresponding to x and y (a sketch follows)
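A minimal sketch using a truncated SVD for the rank-k approximation and inner products of its rows as scores (toy graph and k are illustrative):

```python
import numpy as np
import networkx as nx

def low_rank_scores(G, k=2):
    """Inner products of the rows of the rank-k approximation of the adjacency matrix."""
    A = nx.to_numpy_array(G)
    U, s, Vt = np.linalg.svd(A)
    s[k:] = 0.0                      # keep only the k largest singular values
    Mk = U @ np.diag(s) @ Vt         # noise-reduced adjacency matrix M_k
    return Mk @ Mk.T                 # entry (i, j) = <row i of M_k, row j of M_k>

G = nx.Graph([("A", "B"), ("A", "D"), ("B", "C"), ("C", "D"), ("B", "E")])
nodes = list(G.nodes())
S = low_rank_scores(G, k=2)
print(S[nodes.index("A"), nodes.index("C")])
```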
Unseen bigrams and Clustering
  • Unseen bigrams: Derived from language modeling
  • Estimating frequency of unseen bigrams – pairs of words (nodes here) that co-occur in a test corpus but not in the training corpus
  • Clustering: deleting tenuous edges in Gcollab through a clustering procedure and running predictors on the “cleaned-up” subgraph
Results
  • The results are presented as:
  • 1. Factor improvement of proposed predictors over
    • Random predictor
    • Graph distance predictor
    • Common neighbors predictor
  • 2. Relative performance vs. the above predictors
  • 3. Common Predictions
Conclusions
  • No single clear winner
  • Many measures outperform the random predictor, so there is useful information in the network topology alone
  • Katz, the clustering variant, and the low-rank approximation perform notably well
  • Some simple measures, e.g. common neighbors and Adamic/Adar, also perform well
Critique
  • Even the best predictor (Katz on gr-qc) is correct on only 16% of predictions
  • How good is that?
  • All collaborations are treated equally; perhaps treating recent collaborations as more important than older ones would help
References
  • Lada A. Adamic and Eytan Adar. Friends and neighbors on the Web. Social Networks, 25(3):211–230, July 2003.
  • A. L. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek. Evolution of the social network of scientific collaborations. Physica A, 311(3–4):590–614, 2002.
  • Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7):107–117, 1998.
  • Rodrigo De Castro and Jerrold W. Grossman. Famous trails to Paul Erdős. Mathematical Intelligencer, 21(3):51–63, 1999.
Questions?