Workshop on Social Computing

Indrajit Bhattacharya, Research Scientist, IBM Research, Bangalore



Workshop on Social Computing IIT Kharagpur, Oct 5-6 2012. Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media *. Indrajit Bhattacharya Research Scientist IBM Research, Bangalore.


Workshop on Social Computing

IIT Kharagpur, Oct 5-6 2012

Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media*

Indrajit Bhattacharya

Research Scientist

IBM Research, Bangalore

*Collaboration w/ Himabindu Lakkaraju & Chiranjib Bhattacharyya


Social Media Analysis: Motivation

  • Microblogs: Twitter, Facebook, MySpace

  • Understanding and analyzing topics & trends

  • Influences on users

  • Variety of stakeholders

    • Business

    • Government

    • Social scientists


Social Media Analysis: Challenges

  • Network and Influences on Users

    • User personality: Personal preferences, global and geographic trends, social circle in the network [Yang WSDM 11]

  • Dynamic nature

    • Topics & user personalities evolve over time

  • Volume of data

  • Existing approaches fall short


Social Media Analysis: State of the Art

  • Content Analysis

    • Ramage ICWSM 2010, Hong SOMA 2010

    • Variants of LDA

  • Inferring User Interests

    • Ahmed KDD 2011, Wen KDD 2010

    • Individual features such as user activity or network

  • Patterns in Temporal Evolution

    • Yang et al WSDM 2011


Bayesian Non-parametric Models

  • Choosing the number of components in a mixture model

  • Particularly severe problem for large data volumes such as for social media data

  • Bayesian solution

  • Infinite dimensional prior

    • Allows the number of mixture components to grow with data size

  • Existing non-parametric models cannot capture the richness of social media data

  • Algorithms often not scalable
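
The growth of the number of mixture components with data size can be illustrated with the standard Chinese Restaurant Process, which the talk uses as background. The following is a minimal sketch, not code from the talk: the function name `sample_crp` and the concentration value `alpha=1.0` are illustrative assumptions. Each new customer joins an existing table with probability proportional to its size, or opens a new table with probability proportional to `alpha`.

```python
import random


def sample_crp(num_customers, alpha, seed=0):
    """Simulate table assignments under a Chinese Restaurant Process.

    Hypothetical helper for illustration. Returns the table index chosen by
    each customer and the final table sizes.
    """
    rng = random.Random(seed)
    table_sizes = []   # number of customers seated at each table
    assignments = []   # table index chosen by each customer, in arrival order
    for _ in range(num_customers):
        # Existing tables are weighted by size; the last slot is a new table.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[k] += 1        # join an existing table
        assignments.append(k)
    return assignments, table_sizes


if __name__ == "__main__":
    for n in (100, 1000, 5000):
        _, sizes = sample_crp(n, alpha=1.0)
        print(n, "customers ->", len(sizes), "tables")
```

The number of occupied tables is not fixed in advance; it is a random quantity that tends to increase slowly as more customers arrive, which is the property that makes the process usable as a prior for an infinite mixture model.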


Talk Outline

  • Background: Chinese Restaurant Processes

  • CRP with multiple relationships: (RelCRP, MRelCRP)

  • Dynamic MRelCRP

  • Multi-threaded Online Inference Algorithm

  • Experimental Results


Influence of Geography

[Figure: panels for China, India and UK]



Aggregating Influences

  • RelCRP is exchangeable like the CRP

  • Useful as a prior for infinite mixture model

  • RelCRP captures influence of one relation on posts

  • Influences act simultaneously on any user

  • Aggregated influence pattern is user specific

    • Different users affected differently by same combination of world-wide and geographic factors
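
One way to picture user-specific aggregation is as a weighted mixture of per-relation CRP predictive distributions. The sketch below is a hypothetical illustration only: the transcript does not give the MRelCRP definition, so the mixture form, the function name `aggregated_topic_probs`, the relation names and the per-user weights are all assumptions made for this example.

```python
def aggregated_topic_probs(relation_counts, user_weights, alpha):
    """Mix per-relation CRP predictives with user-specific weights.

    ASSUMPTION: this mixture form is illustrative, not the model from the talk.
    relation_counts: {relation: {topic: number of posts on that topic}}
    user_weights:    {relation: weight}, weights summing to 1 for one user
    alpha:           CRP concentration parameter (> 0)
    """
    probs = {}
    new_topic = 0.0
    for relation, counts in relation_counts.items():
        w = user_weights[relation]
        total = sum(counts.values())
        for topic, n in counts.items():
            # CRP predictive for an existing topic within this relation.
            probs[topic] = probs.get(topic, 0.0) + w * n / (total + alpha)
        # CRP predictive for starting a brand-new topic.
        new_topic += w * alpha / (total + alpha)
    probs["<new>"] = new_topic
    return probs


if __name__ == "__main__":
    counts = {
        "world": {"sports": 8, "music": 2},
        "geo": {"cricket": 9, "sports": 1},
    }
    globally_minded = {"world": 0.9, "geo": 0.1}
    locally_minded = {"world": 0.1, "geo": 0.9}
    print(aggregated_topic_probs(counts, globally_minded, alpha=1.0))
    print(aggregated_topic_probs(counts, locally_minded, alpha=1.0))
```

With identical world-wide and geographic counts, the two hypothetical users receive different topic distributions purely because their weights differ, which mirrors the point that the same combination of factors affects different users differently.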



Evolving Patterns in Social Media

  • Number of Topics

    • Topics die and new ones are born

  • User Personalities

    • Susceptibility to influence by world-wide, geographic and friends’ preferences

  • Existing Topic Distributions

    • Words go out of fashion, new ones enter vocabulary

  • Topic Characters:

    • Popularity of a topic changes world-wide, in users' preferences, sub-networks and geographies






Online Algorithm

  • Traditional iterative framework does not scale for social media data

  • Sequential Monte Carlo methods [Canini AIStats '09] that rejuvenate some old labels are also infeasible

  • Online sampling [Banerjee SDM '07] does not revisit old labels at all; uses an initial batch phase

  • Adapt for non-parametric setting
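
The idea of online sampling without revisiting old labels, adapted to a non-parametric prior, can be sketched as a single pass over posts in which each post is assigned once to an existing topic or to a new one. This is a simplified illustration under stated assumptions, not the algorithm from the talk: it uses a plain CRP prior, a smoothed bag-of-words likelihood, no initial batch phase, and hypothetical names and parameters (`online_crp_assign`, `alpha`, `beta`, `vocab_size`).

```python
import math
import random
from collections import Counter


def online_crp_assign(posts, alpha=1.0, beta=0.1, vocab_size=1000, seed=0):
    """Assign each post to a topic in one pass; labels are never revisited.

    posts: list of posts, each a list of word strings.
    alpha: CRP concentration; beta: word smoothing; vocab_size: assumed
    vocabulary size. All three are illustrative parameters.
    """
    rng = random.Random(seed)
    topic_sizes = []    # number of posts per topic
    topic_words = []    # word counts per topic
    topic_totals = []   # total word tokens per topic
    labels = []
    for words in posts:
        log_w = []
        for k in range(len(topic_sizes)):
            denom = topic_totals[k] + beta * vocab_size
            log_lik = sum(
                math.log((topic_words[k][w] + beta) / denom) for w in words
            )
            log_w.append(math.log(topic_sizes[k]) + log_lik)
        # New topic: prior weight alpha, uniform smoothed word probabilities.
        log_w.append(math.log(alpha) - len(words) * math.log(vocab_size))
        top = max(log_w)
        weights = [math.exp(x - top) for x in log_w]   # stabilised weights
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(topic_sizes):
            topic_sizes.append(0)
            topic_words.append(Counter())
            topic_totals.append(0)
        topic_sizes[k] += 1
        topic_words[k].update(words)
        topic_totals[k] += len(words)
        labels.append(k)   # fixed from here on: no resampling of old posts
    return labels
```

Because each post is touched exactly once, the cost per post depends only on the current number of topics and the post length, not on how many posts have already been processed.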


Multi-threaded Implementation

  • Sequential online implementation does not scale

  • Iterative Gibbs sampling algorithms parallelized for hierarchical Bayesian models [Asuncion NIPS 08, Smola VLDB 10]

  • Our algorithm is parallel, online and non-parametric

  • Explicit consolidation by master thread at the end of each iteration

    • Only new topics consolidated
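
The worker/master control flow with end-of-iteration consolidation of new topics can be sketched as follows. This is an illustration of the control flow only, under several assumptions: the talk's framework is Java-based and uses sampling, whereas here the per-post step is a deterministic word-overlap rule standing in for a sampling step, the merge criterion for new topics (at least one shared word) is invented for the example, and all function names are hypothetical. Python threads are used purely to show the structure; they do not provide CPU parallelism.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def overlap(words, topic):
    """Number of distinct words in `words` that the topic has already seen."""
    return sum(1 for w in set(words) if topic[w] > 0)


def worker(shard, snapshot):
    """Process one shard against a read-only snapshot of the global topics."""
    updates = [Counter() for _ in snapshot]   # additions to existing topics
    proposed = []                             # topics newly created by this worker
    for words in shard:
        pool = list(snapshot) + proposed
        scores = [overlap(words, t) for t in pool]
        best = max(range(len(pool)), key=scores.__getitem__, default=None)
        if best is None or scores[best] == 0:
            proposed.append(Counter(words))            # start a new local topic
        elif best < len(snapshot):
            updates[best].update(words)                # existing global topic
        else:
            proposed[best - len(snapshot)].update(words)
    return updates, proposed


def consolidate(global_topics, results, min_shared=1):
    """Master step: apply updates, then merge only the newly proposed topics."""
    for updates, _ in results:
        for topic, delta in zip(global_topics, updates):
            topic.update(delta)
    merged_new = []
    for _, proposed in results:
        for cand in proposed:
            for existing in merged_new:
                if overlap(cand, existing) >= min_shared:   # assumed merge rule
                    existing.update(cand)
                    break
            else:
                merged_new.append(Counter(cand))
    global_topics.extend(merged_new)


def run_iteration(global_topics, batch, num_workers=4):
    """One iteration: shard the batch, run workers, consolidate on the master."""
    shards = [batch[i::num_workers] for i in range(num_workers)]
    snapshot = [Counter(t) for t in global_topics]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(lambda s: worker(s, snapshot), shards))
    consolidate(global_topics, results)
    return global_topics
```

Existing topics need no reconciliation beyond adding counts, since every worker refers to them by the same index; only topics created independently by different workers within the same iteration have to be matched up, which is why consolidation is restricted to new topics.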


Datasets and Baselines

  • Twitter: 360 million tweets (Jun-Dec 2009)

  • Facebook: 300,000 posts (public profiles, 3 months)

  • Latent Dirichlet Allocation (LDA)

    • [Hong SOMA 2010]

  • Labeled LDA (L-LDA)

    • Hashtags as topics [Ramage ICWSM 2010]

  • Timeline

    • Dynamic non-parametric topic model [Ahmed UAI 2010]


1 Model Goodness

  • Perplexity: Ability to generalize to unseen data

  • Both network and dynamics are important for modeling social media data
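
For reference, perplexity on held-out data is conventionally computed as the exponential of the negative average log-probability per token, so lower values mean better generalization. The sketch below assumes the model's per-token natural-log probabilities are already available; the function name `perplexity` and its input format are illustrative choices, not details from the talk.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of held-out tokens.

    Computes exp(-(1/N) * sum of log p(token)) over N held-out tokens.
    """
    if not token_log_probs:
        raise ValueError("need at least one held-out token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

As a sanity check, a model that assigns probability 1/4 to every held-out token has perplexity 4, matching the intuition of being uniformly uncertain among four choices.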


2 Quality of Discovered Topics

  • Label assigned to each post indicating category

  • Distribution over words indicating semantics

  • Clustering posts using topic labels

  • Prediction using topic labels

    • Predicting post authorship & user commenting activity

  • Major event detection


2A Post Clustering using Topics

  • Use hashtags as gold standard (for Twitter)

    • 16K posts: #NIPS2009, #ICML2009, #bollywood, etc.

  • DMRelCRP close to L-LDA without using hashtags

  • DMRelCRP produces 'finer-grained' clusters
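
The transcript does not say which clustering metric was used against the hashtag gold standard. As one common possibility, the sketch below computes cluster purity, where each cluster is credited with its most frequent hashtag; the choice of metric and the function name `purity` are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def purity(cluster_labels, gold_labels):
    """Fraction of posts whose gold label is the majority label of their cluster.

    ASSUMPTION: purity is an illustrative metric, not necessarily the one
    used in the talk.
    """
    if len(cluster_labels) != len(gold_labels) or not gold_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    by_cluster = defaultdict(Counter)
    for cluster, gold in zip(cluster_labels, gold_labels):
        by_cluster[cluster][gold] += 1
    majority_total = sum(max(c.values()) for c in by_cluster.values())
    return majority_total / len(gold_labels)
```

Purity does not penalize splitting one hashtag across several clusters, so finer-grained clusterings can score well on it; a complementary measure would be needed to account for that.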


2B Prediction Using Topics

  • Authorship: Given post and user, predict if author

  • Commenting activity: Given post and (non-author) user, predict if user comments on that post

  • DMRelCRP topics lead to more accurate prediction






3A Global Personality Trends

[Figure annotations: FIFA WC, Michael Jackson's death, Google Wave]



3B Geo-specific Personality Trends

  • Personality trends very similar in UK and US

  • Geographic influences high at different epochs


3B Geo-specific Personality Trends

  • India: World-wide and geographic influences weaker

  • China: World-wide influence weak, geographic influence strong; stable pattern





Scaling with Data Size

  • Java-based multi-threaded framework; 7 threads

  • 8-core machine with 32 GB RAM

  • Scales largely because of multi-threading


Summary

  • First attempt at studying user influences in social media data

  • New non-parametric model that captures multiple relationships and temporal evolution

  • Multi-threaded online Gibbs sampling algorithm

  • Extensive evaluation on large real dataset

  • Topics lead to better clustering and prediction

  • Insights on user influence patterns

