
Analysis of Social Media MLD 10-802, LTI 11-772

Presentation Transcript


  1. Analysis of Social Media (MLD 10-802, LTI 11-772). William Cohen, 2-15-11

  2. The “force” on nodes in a graph
  • Suppose every node has a value (IQ, income, ...) y(i)
  • Each node i has value yi and neighbors N(i), degree di
  • If i, j are connected, then j exerts a force -K[yi - yj] on i
  • Total force on i: Fi = -K Σj∈N(i) (yi - yj)
  • Matrix notation: F = -K(D-A)y, where D-A is the Laplacian (checked numerically below)
  • Interesting (?) goal: set y so (D-A)y = c·y, i.e., y is an eigenvector
  • Picture: neighbors pull i up or down, but the net force doesn’t change the relative positions of the nodes
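A quick numerical check of the matrix form, as a minimal numpy sketch; the 4-node path graph, the spring constant K, and the node values are illustrative assumptions, not from the slides:

  import numpy as np

  # Illustrative 4-node path graph; K and the node values y are
  # made-up numbers for the example, not from the slides.
  A = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)   # adjacency matrix
  D = np.diag(A.sum(axis=1))                  # diagonal degree matrix
  K = 1.0
  y = np.array([3.0, 1.0, 4.0, 2.0])          # one value per node

  F = -K * ((D - A) @ y)                      # matrix form: F = -K(D-A)y
  # Same thing, node by node: Fi = -K * sum over j in N(i) of (yi - yj)
  F_check = np.array([-K * sum(y[i] - y[j] for j in range(4) if A[i, j])
                      for i in range(4)])
  assert np.allclose(F, F_check)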

  3. Spectral Clustering: Graph = Matrix
  How do I pick y to be an eigenvector for a block-stochastic matrix?

  4. Spectral Clustering: Graph = Matrix
  W*v1 = v2 “propagates weights from neighbors”
  [Figure: points from three blocks (labeled x, y, z) plotted in eigenvector coordinates e1, e2, e3; points from the same block land near each other. Shi & Meila, 2002]

  5. Another way the Laplacian comes up: it defines a cost formula for y, where y assigns nodes to + or − classes so as to keep connected nodes in the same class:
  cost(y) = y^T (D-A) y = Σ(i,j)∈E (yi - yj)^2
  • Turns out: to minimize y^T X y / (y^T y), find the smallest eigenvector of X
  • But: this will not be +1/−1, so it’s a “relaxed” solution (sketched numerically below)
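A minimal numpy sketch of this cost and its relaxation; the 4-node graph and the +1/−1 labeling are illustrative assumptions:

  import numpy as np

  # Illustrative 4-node graph and a +1/-1 labeling (both assumptions).
  A = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
  L = np.diag(A.sum(axis=1)) - A              # Laplacian D - A

  y = np.array([1.0, 1.0, -1.0, -1.0])        # assign nodes to + / - classes
  edges = [(i, j) for i in range(4) for j in range(i + 1, 4) if A[i, j]]
  # The Laplacian's quadratic form is exactly the sum-over-edges cost:
  assert np.isclose(y @ L @ y, sum((y[i] - y[j]) ** 2 for i, j in edges))

  # Relaxed solution: minimizing y^T L y / (y^T y) gives the smallest
  # eigenvector of L, whose entries are real, not +1/-1.  For a connected
  # graph that eigenvector is constant (eigenvalue 0), so the useful split
  # comes from the next-smallest (Fiedler) eigenvector.
  vals, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
  print(vals[0], vecs[:, 0])                  # 0.0 and a constant vector
  print(vals[1], vecs[:, 1])                  # Fiedler value / vector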

  6. Some more terms
  • If A is an adjacency matrix (maybe weighted) and D is a (diagonal) matrix giving the degree of each node,
  • then D-A is the (unnormalized) Laplacian,
  • W = AD^-1 is a probabilistic adjacency matrix,
  • I-W is the (normalized, or random-walk) Laplacian, etc.
  • The largest eigenvectors of W correspond to the smallest eigenvectors of I-W: if Wv = λv, then (I-W)v = (1-λ)v, with the same eigenvector v (see the check below)
  • So sometimes people talk about “bottom eigenvectors of the Laplacian”
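A small numpy sketch of these definitions and of the eigenvector correspondence; the triangle graph is an illustrative assumption:

  import numpy as np

  # Illustrative triangle graph (an assumption for the example).
  A = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]], dtype=float)
  D = np.diag(A.sum(axis=1))
  W = A @ np.linalg.inv(D)                    # probabilistic adjacency A D^-1
  L_rw = np.eye(3) - W                        # random-walk Laplacian I - W

  # If W v = lam * v, then (I - W) v = (1 - lam) * v: the largest
  # eigenvalues of W pair with the smallest eigenvalues of I - W,
  # with the same eigenvectors.
  lam, V = np.linalg.eig(W)
  for i in range(3):
      assert np.allclose(L_rw @ V[:, i], (1 - lam[i]) * V[:, i])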

  7. [Figure: a k-NN graph (easy) with its adjacency matrix A and weight matrix W; a fully connected graph, weighted by distance, with its W]

  8. Spectral Clustering: Graph = Matrix
  W*v1 = v2 “propagates weights from neighbors”
  [Figure repeated from slide 4: points x, y, z plotted in eigenvector coordinates e1, e2, e3; Shi & Meila, 2002]

  9. Spectral Clustering: Graph = Matrix
  W*v1 = v2 “propagates weights from neighbors”
  • If W is connected but roughly block-diagonal with k blocks, then:
  • the top eigenvector is a constant vector
  • the next k eigenvectors are roughly piecewise constant, with “pieces” corresponding to blocks

  10. Spectral Clustering: Graph = Matrix
  W*v1 = v2 “propagates weights from neighbors”
  • If W is connected but roughly block-diagonal with k blocks, then:
  • the “top” eigenvector is a constant vector
  • the next k eigenvectors are roughly piecewise constant, with “pieces” corresponding to blocks
  • Spectral clustering:
  • Find the top k+1 eigenvectors v1, …, vk+1
  • Discard the “top” one
  • Replace every node a with the k-dimensional vector xa = <v2(a), …, vk+1(a)>
  • Cluster with k-means (see the sketch below)
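A minimal sketch of this recipe, assuming numpy and scikit-learn's KMeans; the function name spectral_clustering and the two-clique test graph are illustrative assumptions, not the lecture's code:

  import numpy as np
  from sklearn.cluster import KMeans

  def spectral_clustering(A, k):
      # Embed nodes in the top eigenvectors of W = A D^-1, dropping the
      # "top" one, then run k-means (the recipe on the slide).
      W = A @ np.diag(1.0 / A.sum(axis=1))
      lam, V = np.linalg.eig(W)
      order = np.argsort(-lam.real)           # eigenvalues, largest first
      X = V[:, order[1:k + 1]].real           # xa = <v2(a), …, vk+1(a)>
      return KMeans(n_clusters=k, n_init=10).fit_predict(X)

  # Two 3-node cliques joined by a single edge.
  A = np.zeros((6, 6))
  for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
      A[i, j] = A[j, i] = 1.0
  print(spectral_clustering(A, 2))

On this test graph the printed labels separate nodes 0-2 from nodes 3-5 (up to a permutation of cluster names), matching the block structure of W.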

  11. Experimental results: best-case assignment of class labels to clusters
  [Figure: clustering accuracy using eigenvectors of W vs. eigenvectors of a variant of W]
