Workshop on Social Computing

Workshop on Social Computing IIT Kharagpur, Oct 5-6 2012. Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media *. Indrajit Bhattacharya Research Scientist IBM Research, Bangalore.


Workshop on Social Computing

IIT Kharagpur, Oct 5-6 2012

Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media*

Indrajit Bhattacharya

Research Scientist

IBM Research, Bangalore

*Collaboration w/ Himabindu Lakkaraju & Chiranjib Bhattacharyya


Social Media Analysis: Motivation

  • Microblogs: Twitter, Facebook, MySpace

  • Understanding and analyzing topics & trends

  • Influences on users

  • Variety of stakeholders

    • Business

    • Government

    • Social scientists


Social Media Analysis: Challenges

  • Network and Influences on Users

    • User personality: Personal preferences, global and geographic trends, social circle in the network [Yang WSDM 11]

  • Dynamic nature

    • Topics & user personalities evolve over time

  • Volume of data

  • Existing approaches fall short


Social Media Analysis: State of the Art

  • Content Analysis

    • Ramage ICWSM 2010, Hong SOMA 2010

    • Variants of LDA

  • Inferring User Interests

    • Ahmed KDD 2011, Wen KDD 2010

    • Individual features such as user activity or network

  • Patterns in Temporal Evolution

    • Yang et al WSDM 2011


Bayesian Non-parametric Models

  • Choosing the number of components in a mixture model

  • A particularly severe problem for large data volumes, such as social media data

  • Bayesian solution

  • Infinite dimensional prior

    • Allows the number of mixture components to grow with data size

  • Existing models cannot capture the richness of social media data

  • Algorithms are often not scalable
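
As a rough illustration of how an infinite-dimensional prior lets the number of components grow with the data, the sketch below simulates the standard Chinese Restaurant Process (not the RelCRP or MRelCRP of this talk). The function name `sample_crp` and the concentration value `alpha=2.0` are assumptions chosen for the example.

```python
import random

def sample_crp(n_customers, alpha, seed=0):
    """Seat customers one at a time under a standard CRP with concentration alpha."""
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []   # assignments[i] = table index of customer i
    for _ in range(n_customers):
        # Existing table k is chosen with weight table_sizes[k];
        # a new table is opened with weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)    # open a new table (new mixture component)
        else:
            table_sizes[k] += 1
        assignments.append(k)
    return assignments, table_sizes

# The number of occupied tables tends to grow slowly as more data arrives.
for n in (100, 1000, 10000):
    _, sizes = sample_crp(n, alpha=2.0)
    print(n, len(sizes))
```

Used as a prior over post-to-topic assignments, each table plays the role of a topic, so no fixed topic count has to be chosen in advance.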


Talk Outline

  • Background: Chinese Restaurant Processes

  • CRP with multiple relationships: (RelCRP, MRelCRP)

  • Dynamic MRelCRP

  • Multi-threaded Online Inference Algorithm

  • Experimental Results


Influence of Geography

[Figure labels: China, India, UK]



Aggregating Influences

  • RelCRP is exchangeable like the CRP

  • Useful as a prior for infinite mixture model

  • RelCRP captures influence of one relation on posts

  • Influences act simultaneously on any user

  • Aggregated influence pattern is user specific

    • Different users affected differently by same combination of world-wide and geographic factors
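
The slides do not spell out how the individual influences are combined, so the following is only a toy sketch of the idea that the same influences can affect users differently. It assumes, purely for illustration, that each relation contributes a topic distribution and that each user mixes them with personal weights; `aggregate_influence`, the relation names and all numbers are hypothetical.

```python
def aggregate_influence(relation_dists, user_weights):
    """Mix per-relation topic distributions using one user's weights.

    Assumption for illustration: a convex combination. This is not
    taken from the talk's model definition.
    """
    total = sum(user_weights.values())
    topics = set()
    for dist in relation_dists.values():
        topics.update(dist)
    mixed = {t: 0.0 for t in topics}
    for rel, w in user_weights.items():
        for t, p in relation_dists[rel].items():
            mixed[t] += (w / total) * p
    return mixed

# Hypothetical topic preferences induced by two relations.
relation_dists = {
    "world": {"sports": 0.7, "tech": 0.3},
    "geo":   {"sports": 0.2, "tech": 0.8},
}
# Two users exposed to the same influences but weighting them differently.
user_a = aggregate_influence(relation_dists, {"world": 0.9, "geo": 0.1})
user_b = aggregate_influence(relation_dists, {"world": 0.1, "geo": 0.9})
print(user_a["sports"], user_b["sports"])
```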



Evolving Patterns in Social Media

  • Number of Topics

    • Topics die and new ones are born

  • User Personalities

    • Susceptibility to influence by world-wide, geographic and friends’ preferences

  • Existing Topic Distributions

    • Words go out of fashion, new ones enter vocabulary

  • Topic Characters

    • Topic popularity changes world-wide and in users’ preferences, sub-networks and geographies






Online Algorithm

  • Traditional iterative framework does not scale for social media data

  • Sequential Monte Carlo methods [Canini AISTATS ’09] that rejuvenate some old labels are also infeasible

  • Online sampling [Banerjee SDM ’07] does not revisit old labels at all; uses an initial batch phase

  • Adapt for non-parametric setting
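
To make the "never revisit old labels" idea concrete in a non-parametric setting, here is a minimal single-pass sketch. It is not the algorithm from the talk: it assumes a plain CRP prior, a smoothed per-topic word model, and no initial batch phase. The names `online_assign`, `alpha`, `beta` and `vocab_size` are assumptions.

```python
import math
import random
from collections import Counter

def online_assign(posts, vocab_size, alpha=1.0, beta=0.1, seed=0):
    """Label each post once, in arrival order; earlier labels are never resampled."""
    rng = random.Random(seed)
    topic_sizes = []   # posts per topic
    word_counts = []   # Counter of word occurrences per topic
    word_totals = []   # total words per topic
    labels = []
    for words in posts:
        log_w = []
        for k in range(len(topic_sizes)):
            # CRP prior term times a smoothed word likelihood
            # (counts are not updated within the post: an approximation).
            lp = math.log(topic_sizes[k])
            denom = word_totals[k] + vocab_size * beta
            for w in words:
                lp += math.log((word_counts[k][w] + beta) / denom)
            log_w.append(lp)
        # A brand-new topic: weight alpha, uniform word probabilities.
        log_w.append(math.log(alpha) - len(words) * math.log(vocab_size))
        m = max(log_w)
        weights = [math.exp(lp - m) for lp in log_w]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(topic_sizes):
            topic_sizes.append(0)
            word_counts.append(Counter())
            word_totals.append(0)
        topic_sizes[k] += 1
        word_counts[k].update(words)
        word_totals[k] += len(words)
        labels.append(k)
    return labels

posts = [["goal", "match"], ["goal", "cup"], ["wave", "google"], ["google", "search"]]
print(online_assign(posts, vocab_size=6))
```

Each post costs time proportional to the current number of topics, and no pass over earlier posts is ever needed.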


Multi-threaded Implementation

  • Sequential online implementation does not scale

  • Iterative Gibbs sampling algorithms parallelized for hierarchical Bayesian models [Asuncion NIPS 08, Smola VLDB 10]

  • Our algorithm is parallel, online and non-parametric

  • Explicit consolidation by master thread at the end of each iteration

    • Only new topics consolidated
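
The master/worker pattern described above might look roughly like the sketch below. The talk's framework is Java-based and uses sampling; this Python version only illustrates the control flow, with a deterministic stand-in for the sampler. The word-overlap merge rule and the names `worker`, `consolidate` and `run_iteration` are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def worker(shard, global_topics):
    """Process one shard against a read-only snapshot of the global topics.

    Posts sharing a word with a known topic need no consolidation;
    the rest are grouped into new local topics (sets of words).
    """
    new_topics = []
    for words in shard:
        ws = set(words)
        if any(ws & t for t in global_topics):
            continue
        for t in new_topics:
            if ws & t:
                t |= ws
                break
        else:
            new_topics.append(set(ws))
    return new_topics

def consolidate(global_topics, proposals):
    """Master step: merge overlapping new topics from all workers, then publish."""
    merged = []
    for new_topics in proposals:
        for t in new_topics:
            for m in merged:
                if t & m:
                    m |= t
                    break
            else:
                merged.append(set(t))
    return global_topics + merged   # existing topics are left untouched

def run_iteration(shards, global_topics, n_threads=2):
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        proposals = list(pool.map(lambda s: worker(s, global_topics), shards))
    return consolidate(global_topics, proposals)

shards = [
    [["goal", "cup"], ["wave", "google"]],
    [["cup", "final"], ["nips", "paper"]],
]
print(run_iteration(shards, [{"google"}]))
```

Because workers only read the shared snapshot and the master touches only newly proposed topics, no locking is needed during an iteration in this sketch.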


Datasets and Baselines

  • Twitter: 360 million tweets (Jun-Dec 2009)

  • Facebook: 300,000 posts (public profiles, 3 months)

  • Latent Dirichlet Allocation (LDA)

    • [Hong SOMA 2010]

  • Labeled LDA (L-LDA)

    • Hashtags as topics [Ramage ICWSM 2010]

  • Timeline

    • Dynamic non-parametric topic model [Ahmed UAI 2010]


1 Model Goodness

  • Perplexity: Ability to generalize to unseen data

  • Both network and dynamics are important for modeling social media data
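
For reference, perplexity on held-out text is commonly computed as the exponential of the negative average per-word log-likelihood. The sketch below uses that standard definition; the slides do not state the exact formula used in the experiments, and the function name and toy numbers are assumptions.

```python
import math

def perplexity(log_probs_per_doc, doc_lengths):
    """exp(-(total held-out log-likelihood) / (total number of words)).

    log_probs_per_doc[i] is the natural-log probability the model assigns
    to held-out document i; doc_lengths[i] is its word count.
    """
    total_log_prob = sum(log_probs_per_doc)
    total_words = sum(doc_lengths)
    return math.exp(-total_log_prob / total_words)

# Sanity check: a model that spreads probability uniformly over a
# 50-word vocabulary should have perplexity equal to the vocabulary size.
lengths = [10, 20]
log_probs = [n * math.log(1 / 50) for n in lengths]
print(perplexity(log_probs, lengths))
```

Lower values mean the model is less "surprised" by unseen posts.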


2 Quality of Discovered Topics

  • Label assigned to each post indicating category

  • Distribution over words indicating semantics

  • Clustering posts using topic labels

  • Prediction using topic labels

    • Predicting post authorship & user commenting activity

  • Major event detection


2A Post Clustering using Topics

  • Use hashtags as gold standard (for Twitter)

    • 16K posts: #NIPS2009, #ICML2009, #bollywood, etc.

  • DMRelCRP close to L-LDA without using hashtags

  • DMRelCRP produces ‘finer-grained’ clusters
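
The slides do not name the clustering metric, so as one plausible way to score topic labels against hashtags, the sketch below computes cluster purity. The choice of purity, the function name and the toy labels are all assumptions.

```python
from collections import Counter, defaultdict

def purity(cluster_labels, gold_labels):
    """Fraction of posts whose gold label is the majority label of their cluster."""
    by_cluster = defaultdict(Counter)
    for c, g in zip(cluster_labels, gold_labels):
        by_cluster[c][g] += 1
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(gold_labels)

# Toy example: topic labels for five posts vs. their hashtags.
topics = [0, 0, 1, 1, 2]
hashtags = ["#NIPS2009", "#NIPS2009", "#ICML2009", "#bollywood", "#bollywood"]
print(purity(topics, hashtags))
```

Purity never decreases when a cluster is split, so finer-grained clusterings can score well on it; a measure such as normalized mutual information is often reported alongside for that reason.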


2B Prediction Using Topics

  • Authorship: Given post and user, predict if author

  • Commenting activity: Given post and (non-author) user, predict if user comments on that post

  • DMRelCRP topics lead to more accurate prediction






3A Global Personality Trends

[Figure annotations: FIFA WC, Michael Jackson’s death, Google Wave]



3B Geo-specific Personality Trends

  • Personality trends very similar in UK and US

  • Geographic influences high at different epochs


  • India: World-wide and geographic influences weaker

  • China: World-wide weak, geographic strong; stable pattern





Scaling with Data Size

  • Java-based multi-threaded framework; 7 threads

  • 8-core machine with 32 GB RAM

  • Scales largely because of multi-threading


Summary

  • First attempt at studying user influences in social media data

  • New non-parametric model that captures multiple relationships and temporal evolution

  • Multi-threaded online Gibbs sampling algorithm

  • Extensive evaluation on large real datasets

  • Topics lead to better clustering and prediction

  • Insights on user influence patterns