
Presentation: Genetic clustering of social networks using random walks

Elsevier, Computational Statistics & Data Analysis, February 2007. Genetic clustering of social networks using random walks. Aykut Firat, Sangit Chatterjee, Mustafa Yilmaz. College of Business Administration, Northeastern University, Boston, MA 02115, USA. Presented by Oleg Kolgushev.


Presentation Transcript


  1. ELSEVIER • Computational Statistics & Data Analysis • February 2007 • Genetic clustering of social networks using random walks • Aykut Firat, Sangit Chatterjee, Mustafa Yilmaz • College of Business Administration, Northeastern University, Boston, MA 02115, USA • Presented by Oleg Kolgushev • Computational Epidemiology Research Lab (CERL) - Department of Computer Science and Engineering - University of North Texas - 2011/03/21

  2. Contents • Introduction to clustering in networks • Random walk based distance measure • Genetic representation • Experiments • Synthetic data creation • Network clustering experiments • Spatial data experiments • Conclusion

  3. Introduction • Social networks are increasingly popular. • An exact mathematical model is unrealistic, so heuristic techniques are used. • Clustering is an NP-hard problem. • Genetic algorithm with a medoid-based representation. • The random walk measure is superior to Euclidean distance.

  4. Background • The network is represented by a weighted graph (V, E, w), where w is a measure of similarity between vertices. • The objective is to find a decomposition into k clusters (non-overlapping subgraphs of highly connected vertices). • A random walker is likely to stay inside a cluster until most of its vertices have been visited. • Calculating “escape probabilities”. • GA fitness function classifies a node based on the sum of its edges inside a cluster versus the sum of its edges leading to other sets.
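
The last bullet above compares, for one node, the edge weight that stays inside its cluster with the edge weight that leaves it. A minimal Python sketch of that comparison follows. The helper names (`build_weights`, `node_edge_sums`) and the toy graph of two triangles joined by one bridge are assumptions made for illustration, not taken from the paper.

```python
def build_weights(n, edges):
    """Build a symmetric weight matrix from (i, j, weight) triples."""
    w = [[0.0] * n for _ in range(n)]
    for i, j, weight in edges:
        w[i][j] = w[j][i] = weight
    return w


def node_edge_sums(w, clusters, node):
    """Return (weight to the node's own cluster, weight to all other clusters)."""
    own = next(c for c in clusters if node in c)
    inside = sum(w[node][j] for j in own if j != node)
    outside = sum(w[node][j] for j in range(len(w)) if j not in own)
    return inside, outside


# Hypothetical toy graph: triangle 0-1-2, triangle 3-4-5, bridge 2-3.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 1.0)]
w = build_weights(6, edges)

good = [{0, 1, 2}, {3, 4, 5}]
bad = [{0, 1}, {2, 3, 4, 5}]
print(node_edge_sums(w, good, 2))  # most of node 2's weight stays inside
print(node_edge_sums(w, bad, 2))   # most of node 2's weight leaves the cluster
```

With node 2 placed in the second triangle, more of its weight points outside its cluster than inside, which is the signal such a fitness function would penalize.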

  5. Random walk based distance • Average first-passage time m(i,j) • Average commute time (ACT) • In matrix-vector form it is represented as [formula missing from transcript] • where u_i = [0 1 0 0 … 0], L = D - A, A is the similarity matrix (w_ij), e is a column vector of ones [1 1 1 … 1], and [remainder missing from transcript]
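
The formulas on this slide did not survive in the transcript. For orientation, the block below gives the form in which the average commute time is commonly written in the literature on random walk distances, using the slide's symbols u_i, L, D, A and e. The symbols c(i,j), V_G, N and L^+ are assumptions introduced here, and the slide's own notation may have differed. The pseudoinverse identity holds for a connected graph.

```latex
% c(i,j): average commute time between nodes i and j
% m(i,j): average first-passage time from i to j
c(i,j) = m(i,j) + m(j,i) = V_G \,(u_i - u_j)^{T} L^{+} (u_i - u_j)

% L^{+}: Moore-Penrose pseudoinverse of the Laplacian L = D - A
L^{+} = \left( L - \frac{e e^{T}}{N} \right)^{-1} + \frac{e e^{T}}{N}

% D: diagonal degree matrix, V_G: volume of the graph, N: number of nodes
D = \operatorname{diag}\Big( \sum_{j} w_{ij} \Big), \qquad
V_G = \sum_{i} \sum_{j} w_{ij}
```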

  6. Random walk based distance • This measure is appealing for social networks because nodes in the same cluster are connected by many short paths, and clusters are neither similar in size nor spherical in shape.
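
To make the "many short paths" argument concrete, here is a small pure-Python sketch that computes average first-passage times m(i,j) by solving the linear system m(i,j) = 1 + Σ_k p_ik m(k,j), and sums them into a commute time. It does not use the matrix form from the presentation. The function names and the toy graph (two triangles joined by one bridge) are assumptions for illustration, and the graph must be connected.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def first_passage_times(w, target):
    """Expected number of steps for a walker starting at each node to first reach target."""
    n = len(w)
    others = [i for i in range(n) if i != target]
    deg = [sum(row) for row in w]
    # For i != target: m(i) - sum over k != target of p_ik * m(k) = 1
    a = [[(1.0 if i == k else 0.0) - w[i][k] / deg[i] for k in others]
         for i in others]
    x = solve(a, [1.0] * len(others))
    times = [0.0] * n
    for i, value in zip(others, x):
        times[i] = value
    return times


def commute_time(w, i, j):
    """Average commute time: first passage i -> j plus first passage j -> i."""
    return first_passage_times(w, j)[i] + first_passage_times(w, i)[j]


# Hypothetical toy graph: triangle 0-1-2, triangle 3-4-5, bridge 2-3.
w = [[0.0] * 6 for _ in range(6)]
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    w[i][j] = w[j][i] = 1.0

print(commute_time(w, 0, 1))  # same triangle
print(commute_time(w, 0, 5))  # opposite triangles, only reachable over the bridge
```

Nodes 0 and 1 are linked both directly and through node 2, so their commute time is much smaller than that of nodes 0 and 5, which every walk must connect through the single bridge.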

  7. Genetic Representation • A GA is a computer simulation of evolutionary processes (inheritance, mutation, selection, and crossover). • The choice of representation is key. • Array of size N (nodes in the graph) with elements restricted to k (clusters) • k bins with elements restricted to N (nodes) • k-medoids: each cluster is represented by one node, and the other nodes are assigned to the nearest medoid • A possible gene is [3,7] with assignment [{1,2,3,4},{5,6,7,8}] • Small genome, tight clustering.
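
The medoid gene [3,7] from the slide can be decoded as sketched below: every node joins the cluster of its nearest medoid. The distance matrix is a made-up stand-in (eight points on a line with a gap in the middle), not a random walk distance, and `decode_medoids` is a hypothetical helper name. Nodes are 0-based in the code and converted to the slide's 1-based labels for printing.

```python
def decode_medoids(dist, medoids):
    """Assign every node to its nearest medoid; returns {medoid: [nodes]}."""
    clusters = {m: [] for m in medoids}
    for node in range(len(dist)):
        nearest = min(medoids, key=lambda m: dist[node][m])
        clusters[nearest].append(node)
    return clusters


# Hypothetical toy distances: eight points on a line, split into two groups.
positions = [0, 1, 2, 3, 10, 11, 12, 13]
dist = [[abs(p - q) for q in positions] for p in positions]

# The slide's gene [3, 7] in 1-based labels is [2, 6] in 0-based indices.
clusters = decode_medoids(dist, [2, 6])
one_based = [[n + 1 for n in nodes] for nodes in clusters.values()]
print(one_based)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

The genome stays small because only k medoid indices are stored, whatever the number of nodes.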

  8. Medoid-based representation with exception bins • An exception bin contains nodes that do not follow the assignment implied by the medoids. • A possible gene [3,7] suggests the allocation [{1,2,3,4},{5,6,7,8}]; with exceptions, [3,7,{5,6},{2}] • Crossover is defined by randomly interchanging genes • Mutation creates exceptions based on proximity • Fitness functions used: inverse of the sum of the distances to the medoids; inverse of the sum of all pairwise distances within a group; min sum of all pairwise distances between nodes.
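
A hedged sketch of how a gene with exception bins might be decoded and scored. It assumes one particular reading of the slide's example: the k-th exception bin lists nodes forced into the k-th medoid's cluster even though another medoid is nearer. The paper may define this differently. The helper names and toy distances (points on a line, not random walk distances) are likewise illustrative. Only the first fitness function listed on the slide, the inverse of the summed distances to the medoids, is shown.

```python
def decode(dist, medoids, exceptions):
    """Nearest-medoid assignment, overridden for nodes listed in an exception bin."""
    clusters = [[] for _ in medoids]
    forced = {node: k for k, nodes in enumerate(exceptions) for node in nodes}
    for node in range(len(dist)):
        if node in forced:
            k = forced[node]
        else:
            k = min(range(len(medoids)), key=lambda c: dist[node][medoids[c]])
        clusters[k].append(node)
    return clusters


def medoid_fitness(dist, medoids, clusters):
    """Inverse of the sum of distances from each node to its cluster's medoid."""
    total = sum(dist[node][medoids[k]]
                for k, nodes in enumerate(clusters) for node in nodes)
    return 1.0 / total


# Hypothetical toy distances: eight points on a line, split into two groups.
positions = [0, 1, 2, 3, 10, 11, 12, 13]
dist = [[abs(p - q) for q in positions] for p in positions]
medoids = [2, 6]  # the slide's [3, 7] in 0-based indices

plain = decode(dist, medoids, [[], []])
with_exceptions = decode(dist, medoids, [[4, 5], [1]])  # slide's {5,6} and {2}

print(plain, medoid_fitness(dist, medoids, plain))
print(with_exceptions, medoid_fitness(dist, medoids, with_exceptions))
```

On this toy data the exceptions lower the fitness, as expected when distances already match the cluster structure. Exceptions pay off only where the nearest-medoid rule misplaces nodes.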

  9. Experiments • How accurate are the clustering results compared with Euclidean distance clustering? • How efficient is this approach, and what is the algorithm's complexity? • Synthetic data creation:

  10. Network clustering experiments • Example of a 50-node network with 6 clusters shown.

  11. Network clustering experiments

  12. Spatial data experiments • Results of the transformation and clustering of 150 iris specimens, 50 from each of three species (Fisher’s Iris data).
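
The transformation of spatial data into a network is not spelled out in the transcript. One common way to do it, shown here only as an assumption, is a k-nearest-neighbor graph: each point is linked to its k closest points, and the graph is made symmetric. The unit edge weights, the symmetrization rule and the six made-up points are illustrative choices, not the paper's procedure or the Iris data.

```python
import math


def knn_graph(points, k):
    """Symmetric 0/1 adjacency: i and j are linked if either is among the other's k nearest."""
    n = len(points)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            w[i][j] = w[j][i] = 1
    return w


# Hypothetical 2-D points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
w = knn_graph(points, k=2)
for row in w:
    print(row)
```

With k=2 the two groups end up as separate components. Random walk distances need a connected graph, so k has to be large enough to link the groups, which is one reason the number of nearest neighbors matters.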

  13. Conclusion • O(n^3) complexity limits the applicability of random walk distances to large networks. • Excellent results when the number of clusters is known. What k is right? • Superior results compared with Euclidean distances, regardless of the clustering algorithm used. • Exceptionally good clustering results when spatial data is represented as a network and the optimum number of nearest neighbors is used.
