
Efficient Identification of Overlapping Communities

This paper proposes a two-stage approach for efficiently identifying overlapping communities in communication networks. The proposed algorithm is compared with existing methods and shows better efficiency and comparable quality. Future work includes strategies for choosing parameters based on network properties.

Presentation Transcript


  1. Efficient Identification of Overlapping Communities • Jeffrey Baumes, Mark Goldberg, Malik Magdon-Ismail • Rensselaer Polytechnic Institute, Troy, NY

  2. Outline • Communities as clusters • What is a cluster? • Cluster seed procedure (LA) • Cluster refinement procedure (IS2) • Experimental results • Conclusions and future work

  3. Communities as clusters • Malicious groups use large communication networks for planning and coordination • Their goal: remain undetected • Our goal: sift through communications for suspicious patterns, using structure only, not content

  4. Communities as clusters • Detecting all social groups (malicious or not) will aid in searching for “hidden” groups • Social groups tend to communicate densely • Approach: Find social groups by finding clusters in the graph of the communication network (figure labels: actor A, actor B; edge: A communicates with B; likely a social group; likely not a social group; add external edges)

  5. What is a cluster? • Many partitioning algorithms exist • Social groups often overlap • Instead, define clusters as locally optimal with respect to density (figure panels: partitioning, overlapping clustering)

  6. Two-stage process • communication network → seed procedure → seed clusters → refinement procedure → final clusters

  7. Original procedures • communication network → Rank Removal (RaRe) → seed clusters → Iterative Scan (IS) → final clusters • Reference: Jeffrey Baumes, Mark Goldberg, Mukkai Krishnamoorthy, Malik Magdon-Ismail, Nathan Preston. "Finding Communities by Clustering a Graph into Overlapping Subgraphs", International Conference on Applied Computing (IADIS 2005), Feb 22-25, Algarve, Portugal.

  8. Proposed new procedures • communication network → Link Aggregate (LA) → seed clusters → Iterative Scan 2 (IS2) → final clusters

  9. Link Aggregate (LA) • Order the nodes (two routines are used) • Pass through the nodes • For each node, add it to the clusters it improves, or start a new cluster
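The single pass described on this slide can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the transcript gives neither the density metric nor the two ordering routines, so the `density` function (internal edges divided by internal plus boundary edges) and the caller-supplied `order` are assumptions made for illustration.

```python
def density(graph, cluster):
    """Assumed metric: internal edges / (internal + boundary edges)."""
    internal = 0
    external = 0
    for u in cluster:
        for v in graph[u]:
            if v in cluster:
                internal += 1  # each internal edge is seen from both endpoints
            else:
                external += 1
    internal //= 2
    total = internal + external
    return internal / total if total else 0.0


def link_aggregate(graph, order):
    """One pass over the nodes: join every cluster the node improves, else start a new one."""
    clusters = []
    for v in order:
        added = False
        for c in clusters:
            if density(graph, c | {v}) > density(graph, c):
                c.add(v)  # a node may join several clusters, so clusters can overlap
                added = True
        if not added:
            clusters.append({v})
    return clusters


# Two triangles joined by the edge 2-3 (adjacency sets, undirected).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
seeds = link_aggregate(graph, [0, 1, 2, 3, 4, 5])
print([sorted(c) for c in seeds])  # [[0, 1, 2], [3, 4, 5]]
```

The output depends on the node order, which is presumably why the slides compare ordering routines.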

  10. LA procedure

  11-16. LA procedure (six figure slides showing a graph with nodes numbered 1 to 35; figures not reproduced)

  17. Iterative Scan (IS) • Old refinement procedure • Traverses the entire node list, adding or removing a node whenever doing so increases the density • Repeats the process until no improvements are possible • May be inefficient in sparse networks • Guaranteed to be locally optimal
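A minimal Python sketch of this refinement loop is given below. The `density` metric is an assumption (internal edges over internal plus boundary edges), because the transcript does not define it; the function names are illustrative only.

```python
def density(graph, cluster):
    """Assumed metric: internal edges / (internal + boundary edges)."""
    internal = 0
    external = 0
    for u in cluster:
        for v in graph[u]:
            if v in cluster:
                internal += 1  # each internal edge is seen from both endpoints
            else:
                external += 1
    internal //= 2
    total = internal + external
    return internal / total if total else 0.0


def iterative_scan(graph, seed):
    """Scan every node of the graph, toggling membership when density improves."""
    cluster = set(seed)
    improved = True
    while improved:
        improved = False
        for v in graph:  # full node list, regardless of where the cluster sits
            candidate = cluster ^ {v}  # add v if absent, remove it if present
            if candidate and density(graph, candidate) > density(graph, cluster):
                cluster = candidate
                improved = True
    return cluster


# Two triangles joined by the edge 2-3 (adjacency sets, undirected).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sorted(iterative_scan(graph, {0, 1})))  # [0, 1, 2]
```

The loop terminates because every accepted change strictly increases the density, and the result is locally optimal in the sense that no single addition or removal improves it.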

  18. Iterative Scan 2 (IS2) • New refinement procedure • Traverses only the neighborhood of the cluster, adding or removing a node whenever doing so increases the density • Repeats the process until no improvements are possible • More efficient in sparse networks in spite of the overhead, less efficient in dense networks
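The neighborhood-restricted scan might look like the following Python sketch. Two things here are assumptions rather than details from the slides: the `density` metric (internal edges over internal plus boundary edges), and the choice to rebuild the neighborhood once per pass instead of maintaining it incrementally, which is the kind of bookkeeping the "overhead" bullet presumably refers to.

```python
def density(graph, cluster):
    """Assumed metric: internal edges / (internal + boundary edges)."""
    internal = 0
    external = 0
    for u in cluster:
        for v in graph[u]:
            if v in cluster:
                internal += 1  # each internal edge is seen from both endpoints
            else:
                external += 1
    internal //= 2
    total = internal + external
    return internal / total if total else 0.0


def iterative_scan_2(graph, seed):
    """Like a full scan, but only cluster members and their neighbors are examined."""
    cluster = set(seed)
    improved = True
    while improved:
        improved = False
        neighborhood = set(cluster)
        for u in cluster:
            neighborhood |= graph[u]  # members plus everything adjacent to them
        for v in sorted(neighborhood):  # sorted only to make the sketch deterministic
            candidate = cluster ^ {v}  # add v if absent, remove it if present
            if candidate and density(graph, candidate) > density(graph, cluster):
                cluster = candidate
                improved = True
    return cluster


# Two triangles joined by the edge 2-3 (adjacency sets, undirected).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sorted(iterative_scan_2(graph, {4, 5})))  # [3, 4, 5]
```

In a sparse network the neighborhood is much smaller than the node list, so each pass touches far fewer candidates than a full scan would.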

  19. IS2 procedure

  20. IS2 procedure

  21. IS2 procedure

  22. IS2 procedure

  23. IS2 procedure

  24. Experimental results • Compare run time of new vs. old • Compare cluster quality of new vs. old • Compare on different network types (random, preferential attachment, real-world) • Compare possible actor orderings for LA

  25. RaRe vs. LA run time (plot legends: Original RaRe, New RaRe, LA; plots not reproduced)

  26. IS vs. IS2 run time • Define IS* = IS for dense graphs, IS2 for sparse graphs
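The IS* rule amounts to a dispatch on how dense the graph is. The sketch below is only an illustration: the slide does not say how "dense" and "sparse" are separated, so the edge-density measure and the `threshold` value of 0.1 are hypothetical.

```python
def edge_density(graph):
    """Fraction of possible undirected edges that are present."""
    n = len(graph)
    if n < 2:
        return 0.0
    m = sum(len(neighbors) for neighbors in graph.values()) / 2
    return 2 * m / (n * (n - 1))


def choose_refinement(graph, threshold=0.1):
    """Return the name of the refinement procedure IS* would use.

    `threshold` is a hypothetical cut-off, not a value from the slides.
    """
    return "IS" if edge_density(graph) >= threshold else "IS2"


# A complete graph on 4 nodes is dense; a 30-node path is sparse.
complete4 = {i: {j for j in range(4) if j != i} for i in range(4)}
path30 = {i: {j for j in (i - 1, i + 1) if 0 <= j < 30} for i in range(30)}
print(choose_refinement(complete4), choose_refinement(path30))
```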

  27. Old vs. new quality (plot legends: New RaRe → IS, LA → IS2; plots not reproduced)

  28. Preferential attachment (plot legends: New RaRe → IS, LA → IS2; plots not reproduced)

  29. Real-World Networks • Ratio = new/old = (LA → IS*)/(RaRe → IS) (table residue: IS* entries IS, IS2, IS2, IS2; table not reproduced)

  30. LA ordering

  31. Conclusions and future work • Overlapping clustering may be used to discover social groups in communication networks • The new algorithm is more efficient in many cases, while keeping the same or better quality • A unified algorithm should choose strategies and parameters based on network properties

  32. Questions

  33. Rank Removal • Existing seed procedure • Removes highly connected nodes until the network is broken into small clusters • Adds each removed node back into the clusters it is well connected to • Two main inefficiencies: PageRank was computed at each iteration, and connected components were computed at each iteration • PageRank could be computed once, but recomputing connected components is crucial
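The remove-then-restore idea can be sketched in Python as below. This is not the authors' RaRe code. To stay self-contained it ranks nodes by their degree in the remaining graph instead of PageRank, and it treats "well connected" as "at least `min_links` edges into the cluster core"; `max_size` and `min_links` are hypothetical parameters. Connected components are recomputed after every removal, as the slide says is necessary.

```python
from collections import deque


def components(graph, alive):
    """Connected components of the subgraph induced by the node set `alive`."""
    seen, comps = set(), []
    for start in sorted(alive):  # sorted for a deterministic component order
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps


def rank_removal(graph, max_size, min_links=1):
    """Remove top-ranked nodes until all components are small, then add them back."""
    alive, removed = set(graph), []
    while True:
        big = [c for c in components(graph, alive) if len(c) > max_size]
        if not big:
            break
        candidates = sorted(set().union(*big))
        # Stand-in ranking: degree within the remaining graph (the slides use PageRank).
        top = max(candidates, key=lambda u: len(graph[u] & alive))
        alive.remove(top)
        removed.append(top)
    cores = components(graph, alive)
    clusters = [set(core) for core in cores]
    for v in removed:
        for core, cluster in zip(cores, clusters):
            if len(graph[v] & core) >= min_links:
                cluster.add(v)  # v may rejoin several clusters, producing overlap
    return clusters


# Two triangles that are linked only through hub node 6.
graph = {0: {1, 2, 6}, 1: {0, 2, 6}, 2: {0, 1, 6},
         3: {4, 5, 6}, 4: {3, 5, 6}, 5: {3, 4, 6},
         6: {0, 1, 2, 3, 4, 5}}
print([sorted(c) for c in rank_removal(graph, max_size=3)])  # [[0, 1, 2, 6], [3, 4, 5, 6]]
```

In this example the hub is removed first, the two triangles become the cluster cores, and the hub is then restored to both, which yields two overlapping seed clusters.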

  34. LA procedure detail

  35. IS2 procedure detail

  36. RaRe vs. LA

  37. RaRe vs. LA

  38. RaRe vs. LA

  39. IS vs. IS2

  40. IS vs. IS2

  41. IS vs. IS2

  42. Run time RaRe vs. LA

  43. Run time IS vs. IS2

  44. Cluster quality

  45. Cluster quality

  46. Preferential attachment run time

  47. Preferential attachment quality

  48. LA ordering run time

  49. LA ordering quality
