
Analysis of Large Graphs Community Detection


Presentation Transcript


  1. Analysis of Large Graphs Community Detection By: KIM HYEONGCHEOL WALEED ABDULWAHAB YAHYA AL-GOBI MUHAMMAD BURHAN HAFEZ SHANG XINDI HE RUIDAN

  2. Overview • Introduction & Motivation • Graph cut criterion • Min-cut • Normalized-cut • Non-overlapping community detection • Spectral clustering • Deep auto-encoder • Overlapping community detection • BigCLAM algorithm

  3. Introduction • Objective: an intro to the analysis of large graphs KIM HYEONG CHEOL

  4. Introduction • What is a graph? • Definition • An ordered pair G = (V, E) • A set V of vertices • A set E of edges • An edge is a connection between two vertices • 2-element subsets of V • Types • Undirected graph, directed graph, mixed graph, multigraph, weighted graph and so on

  5. Introduction • Undirected graph • Edges have no orientation • Edge (x,y) = Edge (y,x) • The maximum number of edges, n(n-1)/2, is reached when all pairs of vertices are connected to each other • Undirected graph G = (V, E) • V : {1, 2, 3, 4, 5, 6} • E : {E(1,2), E(2,3), E(1,5), E(2,5), E(4,5), E(3,4), E(4,6)}
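
The example graph above can be sketched in plain Python; a minimal sketch (not from the slides) storing the undirected edges as an adjacency structure, so that edge (x, y) and edge (y, x) are the same connection:

```python
# The slide's undirected graph G = (V, E).
V = {1, 2, 3, 4, 5, 6}
E = {(1, 2), (2, 3), (1, 5), (2, 5), (4, 5), (3, 4), (4, 6)}

# Build an undirected adjacency structure: edge (x, y) == edge (y, x).
adj = {v: set() for v in V}
for x, y in E:
    adj[x].add(y)
    adj[y].add(x)

n = len(V)
max_edges = n * (n - 1) // 2  # maximum number of edges in an undirected graph
print(max_edges)              # 15 for n = 6
print(adj[5])                 # {1, 2, 4}
```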

  6. Introduction • Undirected large graphs, e.g. a social graph or a graph of Harry Potter fanfiction. Adapted from http://colah.github.io/posts/2014-07-FFN-Graphs-Vis/ A sampled user email-connectivity graph : http://research.microsoft.com/en-us/projects/S-GPS/

  7. Introduction • Undirected large graphs, e.g. a social graph or a graph of Harry Potter fanfiction. Q : What do these large graphs present? Adapted from http://colah.github.io/posts/2014-07-FFN-Graphs-Vis/ A sampled user email-connectivity graph : http://research.microsoft.com/en-us/projects/S-GPS/

  8. Motivation • Social graph : what impression does it give? VS A sampled user email-connectivity graph : http://research.microsoft.com/en-us/projects/S-GPS/

  9. Motivation • Graph of Harry Potter fanfiction : what impression does it give? VS Adapted from http://colah.github.io/posts/2014-07-FFN-Graphs-Vis/

  10. Motivation • If we can partition a graph, we can use the partitions to analyze it, as below

  11. Motivation • Graph partition & community detection

  12. Motivation • Graph partition & community detection

  13. Motivation • Graph partition & community detection Partition Community

  14. Motivation • Graph partition & community detection Partition Community Q : How can we find the partitions?

  15. Criterion : Graph partitioning • Minimum-cut • Normalized-cut KIM HYEONG CHEOL

  16. Criterion : Basic principle • A Basic principle for graph partitioning • Minimize the number of between-group connections • Maximize the number of within-group connections Graph partitioning : A & B

  17. Criterion : Min-cut VS N-cut • A Basic principle for graph partitioning • Minimize the number of between-group connections • Maximize the number of within-group connections Minimum-Cut vs Normalized-Cut

  18. Mathematical expression : Cut (A,B) • For the between-group connections: cut(A,B) = the number of edges with one endpoint in A and the other in B (for a weighted graph, the total weight of such edges)

  19. Mathematical expression : Vol (A) • For the within-group connections: vol(A) = the total degree of the nodes in A. In the example, vol (A) = 5 and vol (B) = 5

  20. Criterion : Min-cut • Minimize the number of between-group connections • min_{A,B} cut(A,B) A B Cut(A,B) = 1 -> Minimum value

  21. Criterion : Min-cut A B Cut(A,B) = 1 But the other partition looks more balanced… How do we favor it? A B

  22. Criterion : N-cut • Minimize the number of between-group connections • Maximize the number of within-group connections If we define ncut(A,B) = cut(A,B)/vol(A) + cut(A,B)/vol(B), then minimizing ncut(A,B) produces more balanced partitions, because it considers both principles

  23. Methodology A B Cut(A,B) = 1, ncut(A,B) = 1.038… VS A B Cut(A,B) = 2, ncut(A,B) = 0.292… Although the second cut is larger, its ncut is smaller because the partition is balanced, so it is preferred.
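
The cut and normalized-cut quantities can be computed directly from an edge list; a minimal sketch using the standard definitions cut(A,B) = number of crossing edges, vol(S) = total degree of nodes in S, ncut(A,B) = cut/vol(A) + cut/vol(B). The example graph here is illustrative, not the one pictured on the slide:

```python
def cut(edges, A, B):
    # Number of edges with one endpoint in A and the other in B.
    return sum(1 for x, y in edges
               if (x in A and y in B) or (x in B and y in A))

def vol(edges, S):
    # Total degree of the nodes in S (each edge contributes per endpoint in S).
    return sum((x in S) + (y in S) for x, y in edges)

def ncut(edges, A, B):
    c = cut(edges, A, B)
    return c / vol(edges, A) + c / vol(edges, B)

# Two triangles joined by a single edge: a balanced partition with cut = 1.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
A, B = {1, 2, 3}, {4, 5, 6}
print(cut(edges, A, B))              # 1
print(round(ncut(edges, A, B), 3))   # 0.286
```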

  24. Summary • What is an undirected large graph? • How can we get insight from it? • Graph partition & community detection • What are the criteria for a good graph partition? • Min-cut • Normalized-cut

  25. Non-overlapping community detection • Spectral Clustering • Deep GraphEncoder Waleed Abdulwahab Yahya Al-Gobi

  26. Finding Clusters • How to identify such structure? • How to split the graph into two pieces? (Figure: a network and its nodes × nodes adjacency matrix)

  27. Spectral Clustering Algorithm • Three basic stages: • 1) Pre-processing • Construct a matrix representation of the graph • 2) Decomposition • Compute eigenvalues and eigenvectors of the matrix • Focus is on the second smallest eigenvalue λ2 and its corresponding eigenvector x2 • 3) Grouping • Assign points to two or more clusters, based on the new representation

  28. Matrix Representations • Adjacency matrix (A): • n × n binary matrix • A = [aij], aij = 1 if there is an edge between nodes i and j, 0 otherwise

  29. Matrix Representations • Degree matrix (D): • n × n diagonal matrix • D = [dii], dii = degree of node i

  30. Matrix Representations • Laplacian matrix (L): L = D - A • How can we use L to find good partitions of our graph? • What are the eigenvalues and eigenvectors of L? • We know: L · x = λ · x
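
The three matrix representations above can be built in a few lines of numpy; a hedged sketch using the 6-node example graph from the earlier slide, checking that each eigenpair satisfies L · x = λ · x:

```python
import numpy as np

# The 6-node example graph from slide 5.
edges = [(1, 2), (2, 3), (1, 5), (2, 5), (4, 5), (3, 4), (4, 6)]
n = 6

A = np.zeros((n, n))
for x, y in edges:
    A[x - 1, y - 1] = A[y - 1, x - 1] = 1   # undirected: symmetric

D = np.diag(A.sum(axis=1))                  # degree matrix (diagonal)
L = D - A                                   # graph Laplacian

vals, vecs = np.linalg.eigh(L)              # L is symmetric, so use eigh
# Every eigenpair satisfies L @ x = lam * x:
for lam, x in zip(vals, vecs.T):
    assert np.allclose(L @ x, lam * x)
print(np.round(vals, 2))                    # eigenvalues in ascending order
```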

  31. Spectrum of Laplacian Matrix (L) • The Laplacian Matrix (L) has: • Eigenvalues λ1 ≤ λ2 ≤ … ≤ λn • Eigenvectors x1, x2, …, xn • Important properties: • Eigenvalues are non-negative real numbers • Eigenvectors are real and orthogonal • What is the trivial eigenpair? • x = (1, 1, …, 1), then L · x = 0 and so λ = λ1 = 0

  32. Best Eigenvector for partitioning • Second Eigenvector x2 • The eigenvector that gives the best quality of graph partitioning. • Fact: For a symmetric matrix (L): λ2 = min x^T L x, where the minimum is taken under the constraints • x is a unit vector: Σi xi² = 1 • x is orthogonal to the 1st eigenvector (1, 1, …, 1), thus: Σi xi = 0 • Let's check what the components of x2 mean through this minimization

  33. Details! λ2 as optimization problem • Fact: For symmetric matrix (L): λ2 = min x^T L x • What is the meaning of min x^T L x on G? • x^T L x = Σij Lij xi xj = Σij (Dij - Aij) xi xj • = Σi di xi² - 2 Σ(i,j)∈E xi xj • = Σ(i,j)∈E (xi - xj)² Remember : L = D - A

  34. λ2 as optimization problem λ2 = min Σ(i,j)∈E (xi - xj)² over all labelings of nodes so that Σi xi = 0 and Σi xi² = 1. We want to assign values xi to nodes such that few edges cross 0: for each edge (i, j), xi and xj should be close, so that (xi - xj)² is small, while the constraint Σi xi = 0 balances the two sides to minimize the sum

  35. Spectral Partitioning Algorithm: Example • 1) Pre-processing: • Build Laplacian matrix L of the graph • 2) Decomposition: • Find eigenvalues Λ and eigenvectors X of the matrix L • Λ = (0.0, 1.0, 3.0, 3.0, 4.0, 5.0), and the columns of X are the corresponding eigenvectors:
X =
0.4  0.3 -0.5 -0.2 -0.4 -0.5
0.4  0.6  0.4 -0.4  0.4  0.0
0.4  0.3  0.1  0.6 -0.4  0.5
0.4 -0.3  0.1  0.6  0.4 -0.5
0.4 -0.3 -0.5 -0.2  0.4  0.5
0.4 -0.6  0.4 -0.4 -0.4  0.0
• Map vertices to the corresponding components of x2 (the second column): node 1: 0.3, node 2: 0.6, node 3: 0.3, node 4: -0.3, node 5: -0.3, node 6: -0.6. How do we now find the clusters?

  36. Spectral Partitioning Algorithm: Example • 3) Grouping: • Sort components of the reduced 1-dimensional vector x2 • Identify clusters by splitting the sorted vector in two • How to choose a splitting point? • Naïve approaches: • Split at 0 or at the median value • Split at 0: Cluster A (positive points): node 1: 0.3, node 2: 0.6, node 3: 0.3; Cluster B (negative points): node 4: -0.3, node 5: -0.3, node 6: -0.6
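
The three stages above can be run end-to-end in numpy; a minimal sketch on an illustrative graph (two triangles joined by one edge, so the expected split is {1, 2, 3} vs {4, 5, 6} — this is not the exact graph from the slide):

```python
import numpy as np

edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6

# 1) Pre-processing: build the Laplacian L = D - A.
A = np.zeros((n, n))
for x, y in edges:
    A[x - 1, y - 1] = A[y - 1, x - 1] = 1
L = np.diag(A.sum(axis=1)) - A

# 2) Decomposition: the second-smallest eigenvector (Fiedler vector) x2.
vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
x2 = vecs[:, 1]

# 3) Grouping: naive split at 0 (the overall sign of x2 is arbitrary,
# so the two clusters may come out swapped).
cluster_A = {i + 1 for i in range(n) if x2[i] >= 0}
cluster_B = {i + 1 for i in range(n) if x2[i] < 0}
print(sorted(cluster_A), sorted(cluster_B))
```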

  37. Example: Spectral Partitioning (Figure: value of x2 plotted against rank in x2)

  38. Example: Spectral Partitioning (Figure: components of x2, value plotted against rank in x2)

  39. k-Way Spectral Clustering • How do we partition a graph into k clusters? • Two basic approaches: • Recursive bi-partitioning [Hagen et al., '92] • Recursively apply the bi-partitioning algorithm in a hierarchical divisive manner • Disadvantage: inefficient • Cluster multiple eigenvectors [Shi-Malik, '00] • Build a reduced space from multiple eigenvectors • Commonly used in recent papers • A preferable approach
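
The first approach can be sketched as follows: repeatedly take the largest current cluster and split it with the sign of its subgraph's Fiedler vector until there are k clusters. This is a simplified, hedged stand-in for the recursive idea; the actual Hagen et al. algorithm is based on a ratio-cut criterion, which is omitted here:

```python
import numpy as np

def fiedler_split(A, nodes):
    """Split a node subset in two by the sign of its subgraph's Fiedler vector."""
    sub = A[np.ix_(nodes, nodes)]
    L = np.diag(sub.sum(axis=1)) - sub
    _, vecs = np.linalg.eigh(L)
    x2 = vecs[:, 1]
    left = [v for v, c in zip(nodes, x2) if c < 0]
    right = [v for v, c in zip(nodes, x2) if c >= 0]
    return left, right

def recursive_bipartition(A, k):
    clusters = [list(range(len(A)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        big = clusters.pop()            # always split the largest cluster
        clusters += list(fiedler_split(A, big))
    return clusters

# Four triangles joined in a chain -> expect the 4 triangles back (k = 4).
tri = lambda b: [(b, b + 1), (b, b + 2), (b + 1, b + 2)]
edges = tri(0) + tri(3) + tri(6) + tri(9) + [(2, 3), (5, 6), (8, 9)]
A = np.zeros((12, 12))
for x, y in edges:
    A[x, y] = A[y, x] = 1
print(sorted(sorted(c) for c in recursive_bipartition(A, 4)))
```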

  40. Non-overlapping community detection • Spectral Clustering • Deep GraphEncoder [Tian et al., 2014] Muhammad Burhan Hafez

  41. Autoencoder • Architecture: encoder layers E1, E2 and decoder layers D1, D2 • Reconstruction loss: the squared error ||x - x̂||² between the input x and its reconstruction x̂
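
A minimal numeric sketch of the idea (an assumption for illustration, not the architecture from the slide): a one-hidden-layer autoencoder x → h = tanh(W1 x) → x̂ = W2 h, trained on the squared reconstruction loss by plain gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))          # 100 samples, 8 features
W1 = 0.1 * rng.standard_normal((4, 8))     # encoder: 8 -> 4
W2 = 0.1 * rng.standard_normal((8, 4))     # decoder: 4 -> 8

def loss(X, W1, W2):
    H = np.tanh(X @ W1.T)                  # hidden code
    X_hat = H @ W2.T                       # reconstruction
    return ((X - X_hat) ** 2).mean()       # mean squared reconstruction error

lr = 0.1
before = loss(X, W1, W2)
for _ in range(200):
    H = np.tanh(X @ W1.T)                  # forward pass
    X_hat = H @ W2.T
    G = 2 * (X_hat - X) / X.size           # d loss / d X_hat
    W2 -= lr * (G.T @ H)                   # gradient step on decoder
    GH = (G @ W2) * (1 - H ** 2)           # backprop through tanh
    W1 -= lr * (GH.T @ X)                  # gradient step on encoder
print(before, loss(X, W1, W2))             # loss decreases with training
```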

  42. Autoencoder & Spectral Clustering • Simple theorem (Eckart-Young-Mirsky theorem): • Let A be any matrix, with singular value decomposition (SVD) A = U Σ V^T • Let A_k = U Σ_k V^T be the decomposition where we keep only the k largest singular values • Then, A_k is the best rank-k approximation to A: it minimizes ||A - B||_F over all matrices B of rank at most k Note: If A is symmetric, the singular values are the eigenvalues (in absolute value) and U = V = eigenvectors. Result (1): Spectral Clustering ⇔ matrix reconstruction
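
The theorem can be checked numerically; an illustrative sketch on a random matrix, verifying that the truncated SVD achieves the error predicted by the theorem and that a random rank-k matrix never does better:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
U, s, Vt = np.linalg.svd(A)            # s holds singular values, descending

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # keep the k largest singular values

# The theorem predicts ||A - A_k||_F = sqrt(sum of the dropped sigma_i^2).
err = np.linalg.norm(A - A_k, "fro")
assert np.isclose(err, np.sqrt((s[k:] ** 2).sum()))

# No rank-k matrix beats the truncated SVD in Frobenius norm.
B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 6))
assert np.linalg.norm(A - B, "fro") >= err
print(round(err, 4))
```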

  43. Autoencoder & Spectral Clustering (cont’d) • Autoencoder case: based on the previous theorem, an autoencoder whose hidden layer size is K and which minimizes the reconstruction error recovers the best rank-K reconstruction of X, where X = U Σ V^T. Result (2): Autoencoder ⇔ matrix reconstruction

  44. Deep GraphEncoder | Algorithm • Clustering with GraphEncoder: • Learn a nonlinear embedding of the original graph with a deep autoencoder (the embedding plays the role of the eigenvectors corresponding to the K smallest eigenvalues of the graph Laplacian matrix). • Run the k-means algorithm on the embedding to obtain the clustering result.

  45. Deep GraphEncoder | Efficiency • Approx. guarantee: • The cut found by Spectral Clustering and by the Deep GraphEncoder is at most 2 times the optimal cut. • Computational Complexity:

  46. Deep GraphEncoder | Flexibility • Sparsity constraint can be easily added. • Improving the efficiency (storage & data processing). • Improving clustering accuracy. Original objective function Sparsity constraint

  47. Overlapping Community Detection • BigCLAM: Introduction SHANG XINDI

  48. Non-overlapping Communities (Figure: a network and its nodes × nodes adjacency matrix)

  49. Non-overlapping vs Overlapping

  50. Facebook Network • Social communities: High school, Summer internship, Stanford (Basketball), Stanford (Squash) • Nodes: Facebook Users • Edges: Friendships
