Clustering Social Networks

Isabelle Stanton, University of Virginia. Master of Science Thesis Defense.

Presentation Transcript


  1. Clustering Social Networks Isabelle Stanton, University of Virginia Master of Science Thesis Defense

  2. Outline • Motivation • Previous Work • Finding Tightly Knit Clusters • Finding Loosely Knit Clusters • Group Recommendations • Future Work

  3. Motivation • Many large social networks: • A fundamental problem is finding communities automatically • Viral and Targeted Marketing • Recommendation Engines

  4. Previous Work – Spectral Methods • Cuts the graph based on an eigenvector • Spectral Methods: • Kannan, Vempala, Vetta 2000, Spielman and Teng 1996, Shi and Malik 2000, Kempe and McSherry 2004, Karypis and Kumar 1998 and many others • cut = partitioning of all elements

  5. Communities in Social Networks • Disjoint partitionings are not good for social networks

  6. Objective: Internal Density, β • Each vertex in C is adjacent to at least a β fraction of (the rest of) C • Examples: β = 1/2, β = 3/4, β = 1

  7. Objective: External Sparsity, α • Each vertex outside of C is adjacent to at most an α fraction of C • α < β • Examples: α = 1/5, β = 1; β = 1

  8. (α, β)-Clusters • C is an (α, β)-cluster if: • Internally Dense: Every vertex in the cluster neighbors at least a β fraction of the cluster • Externally Sparse: Every vertex outside the cluster neighbors at most an α fraction of the cluster • Examples: (1/4, 2/3) and (1/4, 1)
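
The two conditions in the definition can be checked directly. The following is a minimal sketch, not the thesis code: the function name and the adjacency-dictionary representation are hypothetical, and it assumes each vertex counts as its own neighbor so that a clique is a β = 1 cluster.

```python
def is_alpha_beta_cluster(adj, cluster, alpha, beta):
    """Check the (alpha, beta)-cluster conditions for a vertex set.

    adj: dict mapping each vertex to the set of its neighbors (undirected).
    Assumption: a vertex is treated as adjacent to itself for the
    internal-density test, so a clique satisfies beta = 1.
    """
    cluster = set(cluster)
    size = len(cluster)
    # Internally dense: every member neighbors at least a beta fraction.
    for v in cluster:
        if len((adj[v] | {v}) & cluster) < beta * size:
            return False
    # Externally sparse: every outsider neighbors at most an alpha fraction.
    for u in adj:
        if u not in cluster and len(adj[u] & cluster) > alpha * size:
            return False
    return True


# Triangle {1, 2, 3} with a pendant vertex 4 attached to vertex 3.
example_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
```

With this graph, the triangle passes for α = 0.5, β = 1 because vertex 4 touches only one of its three members, and fails once α drops below 1/3.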

  9. Contributions of this work • Definition of criterion • Combinatorial results • 3 overlap results • Bound on number of (α,1) clusters • Three algorithms for varying cases • Experiments validating assumptions on real social networks • Novel formulation of group recommendation problem with experiments

  10. Previous Work – (α, β)-clusters • Solved Areas: • (1 – ε, 1): Tsukiyama et al., Johnson et al. • α = 0: connected components • Our Contributions: • β > ½ + α/2: Algorithms 1 and 2 • α < β^3: Algorithm 3 • Figure: these regions plotted on axes α and β, each running from 0 to 1

  11. Outline • Motivation • Previous Work • Finding Tightly Knit Clusters • Finding Loosely Knit Clusters • Group Recommendations • Future Work

  12. Too Many Clusters • Figure: n vertices x1, y1, x2, y2, ..., x(n/2), y(n/2), with the MISSING edges drawn • Problem: Every vertex in every cluster has as many neighbors outside the cluster as in it

  13. ρ-Champions • Figure: example graph over Ben Stiller, Gwyneth Paltrow, Will Ferrell, Vince Vaughn, Wes Anderson, Owen Wilson, Steve Martin, Bill Murray, and Anjelica Huston, with one vertex labeled as the ρ-champion

  14. ρ-Champions • Def: A vertex is a ρ-champion of C if it has at most ρ|C| neighbors outside C • Claim: If ρ < 2β – 1 – α, every vertex can ρ-champion at most one cluster
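
The definition translates into a one-line test. This is an illustrative sketch only; the function name and graph representation are hypothetical rather than taken from the thesis.

```python
def is_rho_champion(adj, v, cluster, rho):
    """Return True if v has at most rho * |C| neighbors outside the cluster C.

    adj: dict mapping each vertex to the set of its neighbors (undirected).
    """
    cluster = set(cluster)
    outside_neighbors = adj[v] - cluster
    return len(outside_neighbors) <= rho * len(cluster)


# Triangle {1, 2, 3} with a pendant vertex 4 attached to vertex 3.
champion_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
```

Here vertex 1 has no outside neighbors, so it champions the triangle for any ρ ≥ 0, while vertex 3 needs ρ ≥ 1/3 because of its edge to vertex 4.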

  15. Outline • Motivation • Previous Work • Combinatorial properties • Finding Tightly Knit Clusters • Deterministic Algorithm • Finding Loosely Knit Clusters • Group Recommendations • Future Work

  16. Intuition behind the Algorithm • Let c be a ρ-champion • If v is in C, then v and c share at least (2β – 1)|C| neighbors • If v is outside C, then v and c share at most (ρ + α)|C| neighbors • Figure labels: α|C|, β|C|, ρ|C|, (2β – 1)|C|

  17. Deterministic Algorithm • To find all clusters of size s: • For each c in V do • C ← ∅ • For each v within two steps of c do • If v and c share at least (2β – 1)s neighbors then add v to C • If C is an (α, β)-cluster then output C
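
The loop on this slide can be sketched in Python as follows. This is a hedged illustration rather than the thesis implementation: all function names are hypothetical, the graph is an adjacency dictionary, and each vertex is assumed to count as its own neighbor when shared neighbors are tallied.

```python
def closed_nbrs(adj, v):
    # Assumption: a vertex counts as its own neighbor.
    return adj[v] | {v}


def within_two_steps(adj, c):
    reach = {c} | adj[c]
    for u in adj[c]:
        reach |= adj[u]
    return reach


def is_cluster(adj, C, alpha, beta):
    dense = all(len(closed_nbrs(adj, v) & C) >= beta * len(C) for v in C)
    sparse = all(len(adj[u] & C) <= alpha * len(C) for u in adj if u not in C)
    return dense and sparse


def find_clusters(adj, s, alpha, beta):
    """Try every vertex c as a candidate champion for clusters of size s."""
    found = set()
    threshold = (2 * beta - 1) * s
    for c in adj:
        C = set()
        for v in within_two_steps(adj, c):
            shared = closed_nbrs(adj, v) & closed_nbrs(adj, c)
            if len(shared) >= threshold:
                C.add(v)
        if C and is_cluster(adj, C, alpha, beta):
            found.add(frozenset(C))
    return found


# Two triangles joined by the single edge 3-4.
two_triangles = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
                 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
clusters_found = find_clusters(two_triangles, s=3, alpha=0.5, beta=1.0)
```

On this toy graph every vertex of a triangle recovers its own triangle, so the result contains exactly the two triangles.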

  18. Algorithmic Guarantees • Claim: Our algorithm will find all clusters of size s where β > ½ + (ρ + α)/2 • Runs in O(d^0.7 n^1.9 + n^(2+o(1))) time, where d is the average degree • d is a small constant for social networks, so O(n^2)
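
The condition on β in the claim appears to follow from the two shared-neighbor bounds given for the ρ-champion c: the threshold separates cluster members from outsiders exactly when the lower bound for members exceeds the upper bound for outsiders. A short derivation under that reading:

```latex
\[
(2\beta - 1)\,|C| \;>\; (\rho + \alpha)\,|C|
\;\iff\; 2\beta \;>\; 1 + \rho + \alpha
\;\iff\; \beta \;>\; \tfrac{1}{2} + \frac{\rho + \alpha}{2}
\]
```

The same inequality, rearranged as ρ < 2β – 1 – α, is the condition under which a vertex can ρ-champion at most one cluster.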

  19. Evaluation • Do ρ-champions exist in real graphs? • Tsukiyama’s algorithm finds all maximal cliques ((1-ε, 1)-clusters) in a graph • We compare our algorithm’s output with Tsukiyama’s ground truth

  20. Theory Co-Author Dataset Results • Found 797 of 854 clusters ~ 93%

  21. Outline • Motivation • Previous Work • Combinatorial properties • Finding Tightly Knit Clusters • Finding Loosely Knit Clusters • Technical Challenges • Randomized Algorithm • Group Recommendations • Future Work

  22. Loosely Knit Clusters • β≤ ½ • Technical Problem: (0, 1/2)

  23. Connectivity Assumption • Every subset of a cluster has a vertex that lies outside the subset but inside the cluster and neighbors more than a β-fraction • Figure: one example that does satisfy the assumption (β = 2/7) and one that doesn’t

  24. Loosely Knit Randomized Algorithm • α < β^3 • Two phases • Phase 1: • Draw a sample of the ρ-champion’s neighbors • Sample neighbors to add to the seed • Stop when the seed is “big enough” • Phase 2: • Exploit connectivity assumption to deterministically grow the seed into the cluster

  25. Example • Phase 1: • Sample of the ρ-champion’s neighbors • Sample neighbors to add to the sample • Stop when the sample is “big enough” • Phase 2 • Deterministically grow cluster

  29. Why does this work? • Random sampling guarantees that the expected number of neighbors an outside vertex has in the seed is small • The connectivity assumption guarantees we’ll always make progress • Guarantees: Finds all clusters where α < β^3 with probability 1 – δ • Runs in time O((n^3/δ) log(n/δ) |C|^2)

  30. Outline • Motivation • Previous Work • Combinatorial properties • Finding Tightly Knit Clusters • Finding Loosely Knit Clusters • Group Recommendations • Future Work

  31. Group Recommendations • Clustering isn’t the end goal • What can we do with (α,β)-clusters? • We built a group recommendation engine powered by our clusters • Recommended groups to users of Orkut and LiveJournal

  32. Recommendation Model • Hofmann and Puzicha ‘99 • Figure: worked example with groups of 60 people and 20 people; edge labels include 5/60, 25/60, 2/3, 3/4, 1/3, and 1/2

  33. Previous Work • Kleinberg and Sandler: Given the Groups × Users matrix, use matrix decomposition • Their code handles at most ~100K variables • No one uses the friendship graph or clusters!

  34. Experimental Setup • Hold out 10% of users with group memberships • Cluster the rest • Create recommendations for held out users based on clusters

  35. Results – LiveJournal Dataset Held out: 355,495 users – Succeeded on: 210,455

  36. Results – LiveJournal Dataset

  37. Conclusions • Defined (α, β)-clusters • Focus: Overlapping clusters • Introduced ρ-champions • Developed algorithms for a subset of the problem • Ran experiments to validate assumptions and show utility of the clusters • Introduced new interpretation of the recommendation model

  38. Future Work • Algorithms that reduce the necessary α-β gap • Relaxing ρ-champion restriction • Weighted and directed graphs • Decentralized algorithms • Streaming algorithms • Expanding work on group recommendations

  39. Citations • Clustering Social Networks, N. Mishra, R. Schreiber, I. Stanton and R. E. Tarjan, The 5th Workshop on Algorithms and Models for the Web-Graph, WAW 2007. LNCS, vol. 4863, pp. 56-67. • Clustering Social Networks, N. Mishra, R. Schreiber, I. Stanton and R. E. Tarjan, Journal of Internet Mathematics (under submission)

  40. NewKid Algorithm • Input: Graph, Groups, (α, β)-clusters • For each group g and cluster c: • P(g|c) = |members of c in g| / |members of c| • For each new kid u: • P(c|u) = |friends of u in c| / |friends of u| • Recommend the g that maximizes Σ_c P(g|c) P(c|u)
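
The scoring rule on this slide can be sketched as below. The function name, the dictionary-of-sets inputs, and the toy users and groups are all hypothetical, chosen only to illustrate the two conditional probabilities and the final argmax.

```python
from collections import defaultdict


def recommend_group(friends_of_u, clusters, groups):
    """Return the group g maximizing sum over c of P(g|c) * P(c|u).

    friends_of_u: set of the new user's friends.
    clusters: dict cluster id -> set of member users.
    groups: dict group id -> set of member users.
    """
    if not friends_of_u:
        return None  # P(c|u) is undefined without friends
    scores = defaultdict(float)
    for members in clusters.values():
        # P(c|u) = |friends of u in c| / |friends of u|
        p_c_given_u = len(friends_of_u & members) / len(friends_of_u)
        if p_c_given_u == 0:
            continue
        for gid, g_members in groups.items():
            # P(g|c) = |members of c in g| / |members of c|
            p_g_given_c = len(members & g_members) / len(members)
            scores[gid] += p_g_given_c * p_c_given_u
    return max(scores, key=scores.get) if scores else None


toy_clusters = {"c1": {"a", "b", "c"}, "c2": {"d", "e"}}
toy_groups = {"chess": {"a", "b"}, "hiking": {"c", "d", "e"}}
```

For a new user with friends a, b, and d, the scores are 4/9 for chess and 5/9 for hiking, so hiking is recommended; with friends a and b only, chess wins with 2/3 against 1/3.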

  41. Results – Orkut Dataset

  42. Results – Orkut Dataset

  43. HEP Co-Author Dataset Results • Found 115 of 126 clusters ~ 90%

  44. LiveJournal Dataset Results • Too big to run Tsukiyama. Found 4289 clusters, 876 have large ρ-champions

  45. Datasets • High Energy Physics Co-Authorship Graph • Theory Co-Authorship Graph • A subset of LiveJournal.com • Notation: τ(v) = the neighbors and neighbors’ neighbors of v

  46. Randomized Algorithm • To find all (α, β)-clusters of size s: • For each c in V do: • Repeat k times: • Draw a random sample S of size t from c’s neighbors • C ← S ∪ {c} • For each v within two steps of c do • If v has at least ((2β – 1)/β)·t neighbors in S then add v to C • If C is an (α, β)-cluster then output C
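
A possible Python rendering of the sampling loop is sketched here. It is an illustration under stated assumptions, not the thesis code: the names are hypothetical, the threshold is read as ((2β – 1)/β)·t, each vertex is assumed to count as its own neighbor, and vertices with fewer than t neighbors are simply skipped.

```python
import random


def closed_nbrs(adj, v):
    # Assumption: a vertex counts as its own neighbor.
    return adj[v] | {v}


def within_two_steps(adj, c):
    reach = {c} | adj[c]
    for u in adj[c]:
        reach |= adj[u]
    return reach


def is_cluster(adj, C, alpha, beta):
    dense = all(len(closed_nbrs(adj, v) & C) >= beta * len(C) for v in C)
    sparse = all(len(adj[u] & C) <= alpha * len(C) for u in adj if u not in C)
    return dense and sparse


def randomized_clusters(adj, alpha, beta, t, k, seed=0):
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    found = set()
    threshold = (2 * beta - 1) / beta * t
    for c in adj:
        nbrs = sorted(adj[c])
        if len(nbrs) < t:
            continue  # assumption: skip vertices with too few neighbors
        for _ in range(k):
            S = set(rng.sample(nbrs, t))
            C = S | {c}
            for v in within_two_steps(adj, c):
                if len(closed_nbrs(adj, v) & S) >= threshold:
                    C.add(v)
            if is_cluster(adj, C, alpha, beta):
                found.add(frozenset(C))
    return found


# Two triangles joined by the single edge 3-4.
joined_triangles = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
                    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
sampled = randomized_clusters(joined_triangles, alpha=0.5, beta=1.0, t=2, k=5)
```

Vertices 1 and 5 have exactly two neighbors each, so their samples are forced and both triangles are recovered regardless of the seed; samples that straddle the bridge edge fail the final cluster check and are discarded.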

  47. Randomized Algorithm • t = O(log(n/δ)), k = O(n/δ) • Guarantees: Finds all clusters where α < 2β – 1 with probability 1 – δ • Runs in time O((n^3/δ) log(n/δ) (log(n/δ) + |C|)) • Worst case: O((n^4/δ) log(n/δ)) • Average case: O((n^2/δ) log^2(n/δ) d^2)

  48. Combinatorial Properties - Overlaps • Let A and B be (α, β)-clusters with |A| = |B| • Theorem: A and B overlap by at most (1 – (β – α))|A| vertices
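
As a worked instance of the bound, take the (1/4, 1) parameters used as an example earlier in the talk, together with an assumed cluster size of |A| = |B| = 12 (the size is chosen only for illustration):

```latex
\[
|A \cap B| \;\le\; \bigl(1 - (\beta - \alpha)\bigr)\,|A|
= \Bigl(1 - \bigl(1 - \tfrac{1}{4}\bigr)\Bigr)\cdot 12
= \tfrac{1}{4}\cdot 12 = 3
\]
```

So two equal-sized (1/4, 1)-clusters of 12 vertices can share at most 3 vertices, and the allowed overlap shrinks as the gap β – α widens.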

  49. Combinatorial Properties - |Clusters| • Claim: There are at most (α,1)-clusters of size s in a graph • Bound is tight as α→ 1 and α = 0. Seems loose elsewhere • Proof is from Steiner Systems • 7 points, block size = 3, restriction = 2 • {1,2,4},{2,3,5},{3,4,6},{4,5,7},{1,5,6},{2,6,7},{1,3,7}
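
The seven listed blocks can be checked mechanically. The sketch below reads "restriction = 2" as the Steiner-system property that every pair of points lies in exactly one block; that reading, and the helper name, are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

blocks = [{1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {4, 5, 7},
          {1, 5, 6}, {2, 6, 7}, {1, 3, 7}]


def pair_coverage(block_list):
    """Count how many blocks contain each unordered pair of points."""
    counts = Counter()
    for block in block_list:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts


coverage = pair_coverage(blocks)
all_pairs = list(combinations(range(1, 8), 2))
# Every pair of the 7 points should appear in exactly one block.
is_steiner = all(coverage[pair] == 1 for pair in all_pairs)
# Consequently, two distinct blocks share at most one point.
max_overlap = max(len(a & b) for a, b in combinations(blocks, 2))
```

Seven blocks of three points contribute 21 pairs, which matches the 21 pairs available on 7 points, so each pair is covered once and any two blocks intersect in a single point.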

  50. Outline • Motivation • Previous Work • Combinatorial properties • Finding Tightly Knit Clusters • Finding Loosely Knit Clusters • Experiments and Group Recommendations • Are ρ-champions valid? • What are these clusters good for? • Future Work
