
On Finding Fine-Granularity User Communities by Profile Decomposition

The 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 26-29 August, 2012, Kadir Has University, Istanbul, Turkey. On Finding Fine-Granularity User Communities by Profile Decomposition. Seulki Lee, Minsam Ko, Keejun Han, Jae-Gil Lee


Presentation Transcript


  1. The 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 26-29 August, 2012, Kadir Has University, Istanbul, Turkey. On Finding Fine-Granularity User Communities by Profile Decomposition. Seulki Lee, Minsam Ko, Keejun Han, Jae-Gil Lee. Department of Knowledge Service Engineering, KAIST (Korea Advanced Institute of Science and Technology). {seulki15, minsam.ko, brianhan87}@gmail.com, jaegil@kaist.ac.kr

  2. Table of Contents • Introduction • DecompClus Algorithm • Evaluation • Related Work • Conclusion

  3. Community Discovery • Community discovery is one of the most popular tasks in social network analysis. • Community discovery has many real-world applications: • Advertisement to common interest groups • Recommendation of potential collaborators in workplaces

  4. Relationships in Social Networks • A social network is modeled as a huge graph. • A node is a user. • An edge is a relationship between users. • Two types of relationships in a social network • Explicit relationship: follower/following, friend • Implicit relationship: unknown, but similar interest (we focus on this relationship)

  5. Extracting Implicit Relationships • To extract implicit relationships, a user is typically represented by his/her profile, and the similarity between user profiles is measured. • The form of the profile depends on the social network and application. • In DBLP, the profile is a list of papers he/she wrote. • In Twitter, the profile is a list of tweets he/she posted. • [Figure: User A's profile and User B's profile; similarity between the profiles = implicit relationship]

  6. Limitation of a Single Profile • Generally, a user is described by only a single profile, which oversimplifies the multiple characteristics of a user. • This problem results in the loss of meaningful communities. → Though User A and User B share the same interest in photography, the overall similarity between the two users is not very high.
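The effect described on this slide can be reproduced with a small worked example. The sketch below is only illustrative: the tag counts are hypothetical, and plain term counts with cosine similarity stand in for whatever weighting a real system would use.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical tag counts: both users like photography but differ elsewhere.
user_a_photo = Counter({"photo": 3, "lens": 2})
user_a_other = Counter({"outdoor": 4, "hiking": 4})
user_b_photo = Counter({"photo": 2, "lens": 3})
user_b_other = Counter({"art": 4, "museum": 4})

# A single profile mixes every interest of a user into one vector.
whole_a = user_a_photo + user_a_other
whole_b = user_b_photo + user_b_other

overall = cosine(whole_a, whole_b)
photo_only = cosine(user_a_photo, user_b_photo)
print(f"whole profiles: {overall:.2f}, photography parts only: {photo_only:.2f}")
```

With these made-up numbers the shared interest is diluted by the unrelated terms in the whole profiles, while the photography parts compared on their own are highly similar.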

  7. DecompClus • We propose DecompClus, a community discovery method based on profile decomposition, which divides a profile into sub-profiles. • Step 1: Profile decomposition (profiles → sub-profiles) • Step 2: Sub-profile clustering (sub-profiles → communities) • [Figure: example profiles split into sub-profiles such as "outdoor, hiking, …", "photo, lens, …", "photo, color, …", and "art, museum, …", which are then grouped into communities]

  8. Table of Contents • Introduction • DecompClus Algorithm • Evaluation • Related Work • Conclusion

  9. Overall Procedure of DecompClus

  10. Step 1: Profile Decomposition (1/2) • A network of unit items (e.g., papers or tweets) is constructed for each user’s profile. • A node (item) is represented by a term vector (weight: TF-IDF). • An edge is determined by the similarity between two nodes (cosine similarity). • [Figure: User A’s profile as a network of items i1 to i7]
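The item network on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the smoothed IDF formula, the `threshold` parameter, and the sample items are all assumptions, since the slides do not specify them.

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_vectors(items):
    """items: {item_id: [terms]} -> {item_id: {term: tf * idf}}.

    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1 (an assumption).
    """
    n = len(items)
    df = Counter(t for terms in items.values() for t in set(terms))
    vectors = {}
    for item_id, terms in items.items():
        tf = Counter(terms)
        vectors[item_id] = {
            t: count * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, count in tf.items()
        }
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def item_network(items, threshold=0.0):
    """Return weighted edges {(u, v): similarity} for pairs above threshold."""
    vectors = tfidf_vectors(items)
    edges = {}
    for u, v in combinations(sorted(vectors), 2):
        sim = cosine(vectors[u], vectors[v])
        if sim > threshold:
            edges[(u, v)] = sim
    return edges

# Hypothetical profile: three items, each described by a few terms.
profile = {
    "i1": ["photo", "lens"],
    "i2": ["photo", "color"],
    "i3": ["hiking", "outdoor"],
}
edges = item_network(profile)
print(edges)
```

Items that share no terms get no edge, so unrelated interests inside one profile end up weakly connected or disconnected, which is what the clustering step relies on.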

  11. Step 1: Profile Decomposition (2/2) • Clustering is performed on the small network. • We adopted a clustering algorithm based on modularity optimization, which tries to detect high-modularity partitions of networks [V. D. Blondel et al., 2008]. • Each cluster becomes a sub-profile. • [Figure: User A’s profile split into User A’s sub-profiles]
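To make the idea of modularity optimization concrete, here is a deliberately simplified sketch. It is not the multi-level method of Blondel et al.: it only performs greedy single-node moves and recomputes modularity from scratch at each trial, which is fine for tiny networks but slow for large ones. The function names and the sample graph are hypothetical.

```python
def modularity(edges, community):
    """Weighted modularity Q for edges {(u, v): w} and labels {node: label}."""
    m = sum(edges.values())
    if m == 0:
        return 0.0
    inside, total = {}, {}
    for (u, v), w in edges.items():
        cu, cv = community[u], community[v]
        total[cu] = total.get(cu, 0.0) + w   # accumulates node degrees
        total[cv] = total.get(cv, 0.0) + w
        if cu == cv:
            inside[cu] = inside.get(cu, 0.0) + w
    return sum(inside.get(c, 0.0) / m - (total[c] / (2 * m)) ** 2 for c in total)

def local_moving(nodes, edges):
    """Greedy heuristic: move nodes to a neighbour's community while Q grows."""
    community = {n: n for n in nodes}        # start with singleton communities
    neighbours = {n: set() for n in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    improved = True
    while improved:
        improved = False
        for n in nodes:
            best_label = community[n]
            best_q = modularity(edges, community)
            for label in {community[x] for x in neighbours[n]}:
                trial = dict(community)
                trial[n] = label
                q = modularity(edges, trial)
                if q > best_q + 1e-12:       # accept strict improvements only
                    best_label, best_q = label, q
            if best_label != community[n]:
                community[n] = best_label
                improved = True
    return community

# Hypothetical item network: two dense triangles joined by one weak edge.
nodes = ["i1", "i2", "i3", "i4", "i5", "i6"]
edges = {
    ("i1", "i2"): 1.0, ("i1", "i3"): 1.0, ("i2", "i3"): 1.0,
    ("i4", "i5"): 1.0, ("i4", "i6"): 1.0, ("i5", "i6"): 1.0,
    ("i3", "i4"): 0.1,
}
sub_profiles = local_moving(nodes, edges)
print(sub_profiles, modularity(edges, sub_profiles))
```

Each resulting label corresponds to one sub-profile. A production system would more likely call an existing community-detection library than hand-roll this loop.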

  12. Step 2: Sub-Profile Clustering (1/2) • A network of sub-profiles is constructed by accumulating sub-profiles from every user. • A node (sub-profile) is represented by a term vector (weight: TF-IDF). • An edge is weighted by the similarity between two nodes (cosine similarity). • [Figure: network of sub-profiles from Users A (two sub-profiles), B, C, D, and E]

  13. Step 2: Sub-Profile Clustering (2/2) • Clustering is performed on the network of sub-profiles. • The same clustering method is used to group sub-profiles. • Now, each cluster becomes a user community. • A user can belong to multiple communities (e.g., User A is in C1 and C2). • DecompClus is a method for discovering overlapping community structure with a non-overlapping clustering method. • [Figure: sub-profiles of Users A to E grouped into Community C1 and Community C2; User A appears in both]
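The way a non-overlapping clustering of sub-profiles turns into overlapping user communities can be shown directly. In the sketch below the sub-profile ids, their owners, and the cluster labels are hypothetical inputs (the cluster labels would come from the clustering step); only the final mapping is illustrated.

```python
from collections import defaultdict

def user_communities(owner, cluster):
    """Map clustered sub-profiles back to users.

    owner:   {sub_profile_id: user}
    cluster: {sub_profile_id: community label}, one label per sub-profile
    """
    communities = defaultdict(set)
    for sub_profile, label in cluster.items():
        communities[label].add(owner[sub_profile])
    return dict(communities)

def memberships(communities):
    """Invert {label: users} into {user: labels}."""
    result = defaultdict(set)
    for label, users in communities.items():
        for user in users:
            result[user].add(label)
    return dict(result)

# Hypothetical data: User A contributes two sub-profiles, the others one each.
owner = {"A#1": "A", "A#2": "A", "B#1": "B", "C#1": "C", "D#1": "D", "E#1": "E"}
cluster = {"A#1": "C1", "D#1": "C1", "A#2": "C2", "B#1": "C2", "C#1": "C2", "E#1": "C2"}

communities = user_communities(owner, cluster)
print(communities)
print(memberships(communities)["A"])
```

Every sub-profile sits in exactly one cluster, yet User A ends up in two communities because User A owns sub-profiles in both.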

  14. Overall Procedure of DecompClus

  15. Table of Contents • Introduction • DecompClus Algorithm • Evaluation • Related Work • Conclusion

  16. Experimental Set-up (1/3) • Evaluation methods • Quantitative evaluation: verify that DecompClus finds more tightly connected communities • Modularity value • Intra-similarity • Inter-similarity • Qualitative evaluation: explain how the communities found by our method differ semantically from those found by the compared method • Defining the theme of each community • Case studies (see the paper) • Visualization

  17. Experimental Set-up (2/3) • CiteULike • Social bookmarking service for scholarly papers • http://www.citeulike.org/faq/data.adp • Dataset • # of users = 122 • # of articles = 25,089 • # of unique stemmed tags = 16,161 • Half of the users have more than one interest • [Figure: distribution of users according to their tags, using the filters: tag like 'data_mining%' or 'mining%' or 'knowledge_discovery%'; tag like 'social_network%' or 'socialnetwork%'; tag like 'recommend%']

  18. Experimental Set-up (3/3) • Implementation • Gephi Library: open-source software for visualizing and analyzing large network graphs • Baseline • Follows almost the same procedure. • Uses only one overall profile per user • [Figure: Baseline pipeline from whole profiles (e.g., "photo, lens, …, outdoor, hiking, …" and "photo, color, …, art, museum, …") directly to communities]

  19. Discovered Communities • # of communities • DecompClus finds more communities than Baseline does. • # of users in a community • The communities discovered by DecompClus have more members than those discovered by Baseline. • ∵ DecompClus allows a user to belong to multiple communities at the same time. • [Figure: Baseline vs. DecompClus]

  20. Quantitative Evaluation • DecompClus achieves better metrics than Baseline • Modularity value: the strength of division of a network into modules • Intra-similarity: the average value of similarities in a community • Inter-similarity: the average value of similarities between communities → In DecompClus the connections between the members within a community are denser; in contrast, the connections between the members in different communities are sparser.
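The two similarity metrics can be sketched as simple pairwise averages. The slide does not say exactly how the averages are taken, so the definitions below (mean over all member pairs within one community, and mean over all cross pairs between two communities) are assumptions, as are the helper names and the similarity values.

```python
from itertools import combinations

def make_lookup(pair_sims):
    """Wrap {(u, v): similarity} as a symmetric lookup; missing pairs give 0."""
    def sim(u, v):
        return pair_sims.get((u, v), pair_sims.get((v, u), 0.0))
    return sim

def intra_similarity(members, sim):
    """Average similarity over all pairs inside one community."""
    pairs = list(combinations(sorted(members), 2))
    if not pairs:
        return 0.0
    return sum(sim(u, v) for u, v in pairs) / len(pairs)

def inter_similarity(members_a, members_b, sim):
    """Average similarity over all pairs that span two communities."""
    pairs = [(u, v) for u in members_a for v in members_b]
    if not pairs:
        return 0.0
    return sum(sim(u, v) for u, v in pairs) / len(pairs)

# Hypothetical pairwise similarities between five community members.
pair_sims = {
    ("a", "b"): 0.8, ("a", "c"): 0.6, ("b", "c"): 0.7, ("d", "e"): 0.9,
    ("a", "d"): 0.1, ("b", "d"): 0.2, ("c", "d"): 0.0,
    ("a", "e"): 0.1, ("b", "e"): 0.0, ("c", "e"): 0.2,
}
sim = make_lookup(pair_sims)
c1, c2 = {"a", "b", "c"}, {"d", "e"}

intra_c1 = intra_similarity(c1, sim)
inter_c1_c2 = inter_similarity(c1, c2, sim)
print(intra_c1, inter_c1_c2)
```

A good partition in this sense has high intra-similarity and low inter-similarity, matching the slide's reading of denser connections inside communities and sparser ones between them.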

  21. Qualitative Evaluation (1/2) • DecompClus preserves the themes defined by Baseline. • DecompClus finds new communities that are not found by Baseline. • [Figure: community themes for Baseline and DecompClus, with the newly found communities marked]

  22. Qualitative Evaluation (2/2) • In DecompClus, a user’s minor interests are not assimilated into his/her major interests, so new communities that consist of users’ minor interests can be discovered. • [Figure: distribution of articles related to “Semantic web” and to “Bioinformatics”, Baseline vs. DecompClus]

  23. Visualization • The community structure produced by DecompClus is more clearly distinguishable. • [Figure: Baseline vs. DecompClus, drawn with the ForceAtlas2 layout provided by Gephi]

  24. Table of Contents • Introduction • DecompClus Algorithm • Evaluation • Related Work • Conclusion

  25. Related Work (1/2) • Comparison with related areas

  26. Related Work (2/2) • Non-overlapping community discovery • Newman’s method [Newman and Girvan, 2004] • Multi-level graph partitioning method [Karypis and Kumar, 1995] • Attribute augmented graph [Zhou et al., 2006] • Bayesian generative models [Wang, 2006] • Overlapping community discovery • CPM (clique percolation method) [Palla et al., 2005] • Connectedness and local optimality [Goldberg et al., 2010] • Label propagation [Gregory, 2009]

  27. Conclusion • A novel concept of profile decomposition, which enables us to detect fine-granularity user communities with implicit relationships • A new approach to discovering overlapping communities with non-overlapping community discovery algorithms • We demonstrate, using a real data set, that our algorithm effectively discovers user communities from social media data.

  28. Thank You!

  29. Case Studies: Case 1 • Users who become members of multiple communities through profile decomposition • For example, user A’s profile decomposes into three sub-profiles, with the term lists “semantics, semantic web, rdf, ontology, social semantic web …”, “user model, recommender, personalization, user profiling, knn, data mining …”, and “social network analysis, social search, graphs, …” • Baseline communities: Bc1 (data mining & recommendation), Bc2 (social network) • DecompClus communities: Dc1 (data mining & recommendation), Dc2 (semantic web), Dc3 (data mining & bioinformatics), Dc4 (social network) • In our data set, there are 99 users in total (81.1%) like user A.

  30. Case Studies: Case 2 • Users who become members of the communities newly discovered by DecompClus • For example, user B’s sub-profile 1: statistics, cancer, genomics, gene, sequencing, virus, bacteria, database, classification, … • Baseline communities: Bc1 (data mining & recommendation), Bc2 (social network) • DecompClus communities: Dc1 (data mining & recommendation), Dc2 (semantic web), Dc3 (data mining & bioinformatics), Dc4 (social network) • There are 9 users in total (7.3%) like user B.
