Discovering Overlapping Groups in Social Media

Xufei Wang, Lei Tang, Huiji Gao, and Huan Liu

Arizona State University


Social Media

  • Facebook

    • 500 million active users

    • 50% of users log on to Facebook every day

  • Twitter

    • 100 million users

    • 300,000 new users every day

    • 55 million tweets every day

  • Flickr

    • 12 million members

    • 5 billion photos


Activities in Social Media

Connect with others to form “Friends”

Interact with others (comment, discussion, messaging)

Bookmark websites/URLs (StumbleUpon, Delicious)

Join groups where they explicitly exist (Flickr, YouTube)

Write blogs (WordPress, Myspace)

Update status (Twitter, Facebook)

Share content (Flickr, YouTube, Delicious)


Community Structure

  • Studying behavior

    • Individual level? Too many users

    • Site level? Loses too much detail

    • Community level? Yes, it provides information at varying granularity


Overlapping Communities

Neighbors

Colleagues

Family


Related Work

  • Disjoint Community Detection

    • Modularity maximization

    • Based on link structure only (how should the resulting groups be understood?)

  • Overlapping Community Detection

    • Soft clustering (the resulting clustering is dense)

    • CFinder (limited efficiency and scalability)

  • Co-clustering

    • Produces disjoint clusters

    • Groups can be understood through their words (tags)


Problem Statement

[Figure: bipartite user-tag graph with users u1 to u5 and tags t1 to t4]

Given a user-tag subscription matrix M and the number of clusters k, find k overlapping communities, each consisting of both users and tags.
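
To make the input concrete, here is a minimal sketch that builds a small user-tag subscription matrix M as a list of rows. The five users and four tags mirror the labels on the slide, but the specific subscriptions are invented for illustration, since the edges of the slide's figure are not preserved in this transcript.

```python
# Hypothetical subscriptions: the tags each user has used (invented data).
subscriptions = {
    "u1": ["t1", "t2"],
    "u2": ["t1", "t2"],
    "u3": ["t2", "t3"],
    "u4": ["t3", "t4"],
    "u5": ["t3", "t4"],
}

users = sorted(subscriptions)
tags = sorted({t for ts in subscriptions.values() for t in ts})
tag_index = {t: j for j, t in enumerate(tags)}


def build_matrix(subs, users, tag_index):
    """Return M where M[i][j] = 1 if user i subscribes to tag j, else 0."""
    matrix = [[0] * len(tag_index) for _ in users]
    for i, user in enumerate(users):
        for tag in subs[user]:
            matrix[i][tag_index[tag]] = 1
    return matrix


M = build_matrix(subscriptions, users, tag_index)

# The same data as an edge list; each (user, tag) pair is one edge.
edges = [(u, t) for u in users for t in subscriptions[u]]

k = 2  # number of communities to find (an input to the problem)
```

Each nonzero entry of M corresponds to exactly one edge of the bipartite graph, which is the object the edge-centric view clusters.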


Our Contributions

  • Extracting overlapping communities that better reflect reality

  • Clustering on a user-tag graph. Tags are informative in identifying user interests

    • Understanding groups by looking at tags within each group


Edge-centric View

[Figure: the user-tag graph over users u1 to u5 and tags t1 to t4, shown again with its edges partitioned into groups]

  • Cluster edges instead of nodes into disjoint groups

    • One node can belong to multiple groups

    • One edge belongs to one group
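
The two properties above can be illustrated with a short sketch: given a disjoint labeling of edges, the communities are recovered by collecting the endpoints of each edge group. The function name, the edges, and the labels are hypothetical and chosen only to show how a node ends up in more than one group.

```python
from collections import defaultdict


def communities_from_edge_labels(edges, labels):
    """Turn a disjoint partition of edges into (possibly overlapping) node groups."""
    groups = defaultdict(set)
    for (user, tag), label in zip(edges, labels):
        groups[label].update((user, tag))
    return dict(groups)


# Hypothetical edges and a hypothetical edge partition into two groups.
edges = [("u1", "t1"), ("u2", "t1"), ("u3", "t1"),
         ("u3", "t2"), ("u4", "t2"), ("u5", "t2")]
labels = [0, 0, 0, 1, 1, 1]

groups = communities_from_edge_labels(edges, labels)
overlap = groups[0] & groups[1]  # nodes that belong to both communities
```

Here every edge carries exactly one label, yet u3 appears in both communities because its two edges fall into different groups.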


Edge-centric View

In an Edge-centric view


Clustering Edges

  • Any clustering algorithm (e.g., k-means) can be used to group similar edges together

  • Different edge-similarity schemes can be used
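
As a rough illustration of clustering edges with k-means, the sketch below represents each edge as an indicator vector over users and tags and runs a plain k-means on those vectors. The vector representation, the squared Euclidean distance, and the farthest-first initialization are assumptions made here to keep the example small and deterministic; they are not taken from the slides, which define their own edge similarities.

```python
def edge_vector(edge, user_index, tag_index):
    """Indicator vector with a 1 at the edge's user and a 1 at its tag."""
    vec = [0.0] * (len(user_index) + len(tag_index))
    user, tag = edge
    vec[user_index[user]] = 1.0
    vec[len(user_index) + tag_index[tag]] = 1.0
    return vec


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    # Deterministic farthest-first initialization (an assumption of this sketch).
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each edge goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical edges: two groups of users, each sharing one tag.
edges = [("u1", "t1"), ("u2", "t1"), ("u3", "t1"),
         ("u4", "t2"), ("u5", "t2"), ("u6", "t2")]
user_index = {u: i for i, u in enumerate(sorted({u for u, _ in edges}))}
tag_index = {t: j for j, t in enumerate(sorted({t for _, t in edges}))}
points = [edge_vector(e, user_index, tag_index) for e in edges]
labels = kmeans(points, k=2)
```

Because each edge receives exactly one label, the output is a disjoint partition of edges even though the nodes it induces may overlap.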


Defining Edge Similarity

[Figure: two edges e and e’ connecting users ui, uj and tags tp, tq]

The similarity between two edges e and e’ can be defined, among other ways, by combining a user-user similarity and a tag-tag similarity with a weight α

  • α is set to 0.5, which gives equal importance to users and tags

  • Define user-user and tag-tag similarity
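
The slide's formula is not preserved in this transcript. One natural reading of the α bullet is a convex combination, S_e(e, e’) = α · S_u(ui, uj) + (1 − α) · S_t(tp, tq); the sketch below assumes that form. The names `edge_similarity`, `user_sim`, and `tag_sim` are hypothetical, and the exact-match similarity at the end is only a placeholder for whichever user-user and tag-tag similarity is plugged in.

```python
def edge_similarity(e1, e2, user_sim, tag_sim, alpha=0.5):
    """Assumed form: alpha * S_u(ui, uj) + (1 - alpha) * S_t(tp, tq)."""
    (ui, tp), (uj, tq) = e1, e2
    return alpha * user_sim(ui, uj) + (1.0 - alpha) * tag_sim(tp, tq)


def exact_match(a, b):
    """Placeholder node similarity: 1 if the nodes are identical, else 0."""
    return 1.0 if a == b else 0.0


same_user = edge_similarity(("u1", "t1"), ("u1", "t2"), exact_match, exact_match)
same_edge = edge_similarity(("u1", "t1"), ("u1", "t1"), exact_match, exact_match)
disjoint = edge_similarity(("u1", "t1"), ("u2", "t2"), exact_match, exact_match)
```

Keeping the node similarities as parameters makes it easy to swap in a different scheme without touching the combination rule.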


Independent Learning

  • Assume users are mutually independent and tags are mutually independent
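
Under this independence assumption, a simple interpretation is that two different users (or two different tags) have similarity 0 and a node has similarity 1 only with itself. The sketch below computes the resulting pairwise edge similarities for a few hypothetical edges, assuming the weighted combination with α = 0.5; the function names are illustrative.

```python
def delta(a, b):
    """1.0 when both arguments are the same node, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


def independent_similarity(e1, e2, alpha=0.5):
    """Edges are similar only to the extent that they share a user or a tag."""
    return alpha * delta(e1[0], e2[0]) + (1.0 - alpha) * delta(e1[1], e2[1])


# Hypothetical edges.
edges = [("u1", "t1"), ("u1", "t2"), ("u2", "t2"), ("u3", "t3")]
matrix = [[independent_similarity(a, b) for b in edges] for a in edges]
```

With α = 0.5 the only possible values are 0 (nothing shared), 0.5 (one shared endpoint), and 1 (the same edge).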


Normalized Learning

Differentiate nodes of varying degree by normalizing each node by its degree
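
The slide does not give the normalization formula. As one possible reading, the sketch below divides the contribution of a shared node by that node's degree, so that sharing a very popular tag counts for less than sharing a rare one. The formula, the function names, and the edges are assumptions for illustration only.

```python
from collections import Counter


def node_degrees(edges):
    """Count how many edges touch each user and each tag."""
    degree = Counter()
    for user, tag in edges:
        degree[("user", user)] += 1
        degree[("tag", tag)] += 1
    return degree


def normalized_similarity(e1, e2, degree, alpha=0.5):
    """Assumed form: a shared node contributes 1/degree instead of 1."""
    (ui, tp), (uj, tq) = e1, e2
    user_part = 1.0 / degree[("user", ui)] if ui == uj else 0.0
    tag_part = 1.0 / degree[("tag", tp)] if tp == tq else 0.0
    return alpha * user_part + (1.0 - alpha) * tag_part


# Hypothetical edges: t1 is used by three users, t2 by two.
edges = [("u1", "t1"), ("u2", "t1"), ("u3", "t1"), ("u3", "t2"), ("u4", "t2")]
degree = node_degrees(edges)

share_popular_tag = normalized_similarity(("u1", "t1"), ("u2", "t1"), degree)
share_rarer_tag = normalized_similarity(("u3", "t2"), ("u4", "t2"), degree)
```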


Correlational Learning

[Figure: the u × t user-tag matrix reduced to a u × k latent representation]

  • Compute user-user and tag-tag cosine similarity in the latent space

  • Different tags can be semantically close

    • The tags cars, automobile, autos, and car reviews are used to describe a blog written by sid0722 on BlogCatalog
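
In symbols, the latent-space similarity can be written as follows. This is a sketch that assumes the latent space comes from a rank-k truncated SVD of the user-tag matrix M; the symbols \tilde{M}, \tilde{u}_i, and S_u are introduced here for illustration, and the slides do not say whether the latent coordinates are scaled by the singular values.

```latex
M \approx U_k \Sigma_k V_k^{\top},
\qquad
\tilde{M} = M V_k = U_k \Sigma_k \in \mathbb{R}^{u \times k}

S_u(u_i, u_j) =
\frac{\tilde{u}_i \cdot \tilde{u}_j}
     {\lVert \tilde{u}_i \rVert \, \lVert \tilde{u}_j \rVert},
\qquad
\tilde{u}_i = \text{row } i \text{ of } \tilde{M}
```

The tag-tag similarity is defined the same way on the rows of M^{\top} U_k. In such a space, tags like cars and autos can be close even when no single user applies both.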


Spectral Clustering Perspective

  • Graph partitioning can be solved as a generalized eigenvalue problem
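
For a bipartite user-tag graph, the generalized eigenvalue problem is usually written as below. This follows the standard bipartite spectral partitioning formulation (Dhillon, KDD ’01) and is offered as a sketch; the symbols D_u and D_t, the diagonal degree matrices of users and tags, are introduced here and do not appear in the transcript.

```latex
L z = \lambda W z,
\qquad
L = \begin{bmatrix} D_u & -M \\ -M^{\top} & D_t \end{bmatrix},
\qquad
W = \begin{bmatrix} D_u & 0 \\ 0 & D_t \end{bmatrix}
```

Here M is the user-tag matrix, L is the graph Laplacian of the bipartite graph, and the entries of z give one embedding coordinate for every user and every tag.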


Plugging L, W, and Z into the generalized eigenvalue problem yields a solution in terms of singular vectors:

  • U and V are the left and right singular vectors corresponding to the k largest singular values of the user-tag matrix M
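
The resulting equations are not preserved in this transcript. Under the standard bipartite formulation, with L built from the degree matrices and −M and with W the block-diagonal degree matrix, the substitution works out as sketched below. The symbols D_u, D_t, x, y, \hat{u}, and \hat{v} are assumptions introduced here. Note that in this derivation the singular vectors are those of the degree-normalized matrix D_u^{-1/2} M D_t^{-1/2}.

```latex
z = \begin{bmatrix} x \\ y \end{bmatrix}:
\quad
D_u x - M y = \lambda D_u x,
\qquad
-M^{\top} x + D_t y = \lambda D_t y

\hat{u} = D_u^{1/2} x,\ \hat{v} = D_t^{1/2} y
\;\Longrightarrow\;
D_u^{-1/2} M D_t^{-1/2}\, \hat{v} = (1 - \lambda)\, \hat{u},
\qquad
D_t^{-1/2} M^{\top} D_u^{-1/2}\, \hat{u} = (1 - \lambda)\, \hat{v}

Z = \begin{bmatrix} D_u^{-1/2} U \\ D_t^{-1/2} V \end{bmatrix}
```

The small eigenvalues λ of the generalized problem therefore correspond to the large singular values 1 − λ, which is why the top k singular vectors are used.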


Synthetic Data Sets

  • Parameters of the synthetic data sets

    • Number of clusters, users, and tags

    • Inner-cluster density and inter-cluster density (1% of total user-tag links)

  • Evaluation measure: Normalized Mutual Information (NMI)

    • Between 0 and 1

    • The higher, the better
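
For reference, here is a minimal sketch of normalized mutual information between two disjoint labelings of the same items. The slides do not say which NMI variant was used (overlapping communities call for a generalized definition), so the geometric-mean normalization below is an assumption.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((nij / n) * log(nij * n / (count_a[x] * count_b[y]))
               for (x, y), nij in joint.items())


def nmi(a, b):
    """NMI = I(A; B) / sqrt(H(A) * H(B)), which lies between 0 and 1."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 or hb == 0.0:
        # Degenerate case: at least one labeling puts everything in one cluster.
        return 1.0 if ha == hb else 0.0
    return mutual_information(a, b) / sqrt(ha * hb)


same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

The score ignores how clusters are named, so a perfect recovery with permuted labels still scores 1.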


Synthetic Performance

We fix the number of users, the number of tags, and the density, and vary the number of clusters

We fix the number of users, tags, and clusters, and vary the inner-cluster density
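
Experiments like these need a generator whose parameters can be swept. The sketch below plants disjoint clusters of users and tags, links pairs inside a cluster with probability `inner_density`, and then adds cross-cluster links so that they make up roughly `noise_fraction` of all links. Every name and the generation procedure itself are assumptions; the slides do not describe how the synthetic data, or any overlap in it, was produced.

```python
import random


def make_synthetic(n_clusters, users_per_cluster, tags_per_cluster,
                   inner_density, noise_fraction=0.01, seed=0):
    rng = random.Random(seed)
    n_users = n_clusters * users_per_cluster
    n_tags = n_clusters * tags_per_cluster
    user_cluster = [u // users_per_cluster for u in range(n_users)]
    tag_cluster = [t // tags_per_cluster for t in range(n_tags)]

    # Links inside a planted cluster, each kept with probability inner_density.
    inner = {(u, t)
             for u in range(n_users) for t in range(n_tags)
             if user_cluster[u] == tag_cluster[t] and rng.random() < inner_density}

    # Cross-cluster links, sized to be about noise_fraction of all links.
    n_noise = round(len(inner) * noise_fraction / (1.0 - noise_fraction))
    cross = [(u, t) for u in range(n_users) for t in range(n_tags)
             if user_cluster[u] != tag_cluster[t]]
    noise = set(rng.sample(cross, min(n_noise, len(cross))))
    return inner, noise, user_cluster, tag_cluster


inner, noise, user_cluster, tag_cluster = make_synthetic(
    n_clusters=3, users_per_cluster=20, tags_per_cluster=10, inner_density=0.5)
```

Varying `n_clusters` or `inner_density` while holding the other arguments fixed reproduces the two kinds of sweep described above.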


Social Media Data Sets

  • BlogCatalog

    • Tags describing each blog

    • Categories predefined by BlogCatalog for each blog

  • Delicious

    • Tags describing each bookmark

    • The 10 most frequently used tags are selected for each person
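
The tag selection step for Delicious can be sketched with a frequency count. The function name and the sample tags are hypothetical; only the idea of keeping each person's most frequently used tags comes from the slide.

```python
from collections import Counter


def top_tags(tag_uses, n=10):
    """Return the n most frequently used tags, most frequent first."""
    return [tag for tag, _count in Counter(tag_uses).most_common(n)]


# Hypothetical tag history for one person (one entry per bookmark tag use).
history = ["python", "news", "python", "recipes", "python", "news"]
selected = top_tags(history, n=2)
```

Applying `top_tags` with the default n=10 to every person's history yields the user-tag links used to build the matrix.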


Inferring Personal Interests

Category information reveals personal interests. We view group affiliations as features and use them to infer personal interests, evaluated via cross-validation
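
A minimal sketch of the feature construction might look like the following: each user is described by the share of their edges that fall into each group, and the users are then split into folds for cross-validation. The proportional weighting, the round-robin fold assignment, and all names are assumptions; the slides do not specify the classifier or the exact features.

```python
from collections import Counter


def affiliation_features(edges, labels, k):
    """Map each user to a length-k vector of group-affiliation proportions."""
    counts = {}
    for (user, _tag), label in zip(edges, labels):
        counts.setdefault(user, Counter())[label] += 1
    features = {}
    for user, group_counts in counts.items():
        total = sum(group_counts.values())
        features[user] = [group_counts[g] / total for g in range(k)]
    return features


def k_fold_indices(n_items, folds):
    """Round-robin assignment of item indices to folds."""
    return [list(range(i, n_items, folds)) for i in range(folds)]


# Hypothetical edges and edge-cluster labels.
edges = [("u1", "t1"), ("u2", "t1"), ("u3", "t1"), ("u3", "t2"), ("u4", "t2")]
labels = [0, 0, 0, 1, 1]
features = affiliation_features(edges, labels, k=2)
folds = k_fold_indices(len(features), folds=2)
```

A user whose edges are spread over several groups gets a mixed feature vector, which is exactly the overlap information a disjoint clustering would discard.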


Connectivity Study

We study the correlation between the number of affiliations in which two users co-occur and their connectivity in the real network.

The more often two users co-occur, the more likely they are to be connected
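
One way to measure this relationship is to count, for every pair of users, how many groups contain both, and then compute the fraction of connected pairs at each co-occurrence level. The sketch below does this for pairs that co-occur at least once; the groups and friendships are invented, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(groups):
    """Count, for each user pair, the number of groups that contain both."""
    counts = Counter()
    for members in groups:
        for a, b in combinations(sorted(members), 2):
            counts[(a, b)] += 1
    return counts


def connection_rate_by_co_occurrence(groups, friendships):
    """Fraction of connected pairs among pairs with the same co-occurrence count."""
    friends = {tuple(sorted(pair)) for pair in friendships}
    totals, linked = Counter(), Counter()
    for pair, count in co_occurrence(groups).items():
        totals[count] += 1
        if pair in friends:
            linked[count] += 1
    return {count: linked[count] / totals[count] for count in totals}


# Hypothetical user groups and friendship links.
groups = [{"a", "b", "c"}, {"a", "b"}, {"c", "d"}]
friendships = [("b", "a"), ("c", "d")]
rates = connection_rate_by_co_occurrence(groups, friendships)
```

A rate that rises with the co-occurrence count would be consistent with the trend stated above.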


Understanding Groups via Tag Cloud

[Figure: tag cloud for the category Health]

[Figure: tag cloud for the cluster Health]

[Figure: tag cloud for the cluster Nutrition]


Conclusions and Future Work

  • Discover overlapping communities on a user-tag graph

  • Propose an edge-centric view and define edge similarity

    • Independent Learning

    • Normalized Learning

    • Correlational Learning

  • Evaluate results on synthetic and real data sets

  • Future work: applications such as link prediction, and scalability


References

I. S. Dhillon, “Co-clustering documents and words using bipartite spectral graph partitioning,” in KDD ’01, NY, USA.

L. Tang and H. Liu, “Scalable learning of collective behavior based on sparse social dimensions,” in CIKM ’09, NY, USA.

L. Tang and H. Liu, “Community Detection and Mining in Social Media,” Morgan & Claypool Publishers, Synthesis Lectures on Data Mining and Knowledge Discovery, 2010.

G. Palla, I. Derényi, I. Farkas, and T. Vicsek, “Uncovering the overlapping community structure of complex networks in nature and society,” Nature, vol. 435, no. 7043, p. 814, 2005.

K. Yu, S. Yu, and V. Tresp, “Soft clustering on graphs,” in NIPS, 2005.

U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.

M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E, vol. 69, no. 2, p. 026113, Feb. 2004.

S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3-5, pp. 75–174, 2010.


Contact the Authors

  • Xufei Wang

    • Arizona State University

  • Lei Tang

    • Yahoo! Labs