Workshop on Social Computing
Workshop on Social Computing IIT Kharagpur, Oct 5-6 2012. Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media *. Indrajit Bhattacharya Research Scientist IBM Research, Bangalore.



Workshop on Social Computing

IIT Kharagpur, Oct 5-6 2012

Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media*

Indrajit Bhattacharya

Research Scientist

IBM Research, Bangalore

*Collaboration w/ Himabindu Lakkaraju & Chiranjib Bhattacharyya


Social Media Analysis: Motivation

  • Microblogs: Twitter, Facebook, MySpace

  • Understanding and analyzing topics & trends

  • Influences on users

  • Variety of stakeholders

    • Business

    • Government

    • Social scientists


Social Media Analysis: Challenges

  • Network and Influences on Users

    • User personality: Personal preferences, global and geographic trends, social circle in the network [Yang WSDM 11]

  • Dynamic nature

    • Topics & user personalities evolve over time

  • Volume of data

  • Existing approaches fall short


Social Media Analysis: State of the Art

  • Content Analysis

    • Ramage ICWSM 2010, Hong SOMA 2010

    • Variants of LDA

  • Inferring User Interests

    • Ahmed KDD 2011, Wen KDD 2010

    • Individual features such as user activity or network

  • Patterns in Temporal Evolution

    • Yang et al WSDM 2011


Bayesian Non-parametric Models

  • Choosing the number of components in a mixture model

  • Particularly severe problem for large data volumes, as in social media data

  • Bayesian solution

  • Infinite-dimensional prior

    • Allows the number of mixture components to grow with data size

  • Existing non-parametric models cannot capture the richness of social media data

  • Algorithms are often not scalable
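
To make "grows with data size" concrete: under a Dirichlet process (equivalently, Chinese Restaurant Process) prior, the expected number of occupied components after n observations is the sum of alpha / (alpha + i) for i = 0, ..., n-1, which grows roughly like alpha * log(n). The concentration parameter name `alpha` and the function name below are introduced here for illustration and do not appear on the slides. A minimal sketch:

```python
def expected_num_components(n, alpha):
    """Expected number of distinct components among n draws from a DP/CRP prior.

    n     -- number of observations (e.g. posts)
    alpha -- concentration parameter, must be > 0 (illustrative name)
    """
    # Observation i+1 opens a new component with probability alpha / (alpha + i).
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, round(expected_num_components(n, alpha=1.0), 2))
```

The count keeps increasing with n, but only logarithmically, so the prior does not force a fixed number of topics in advance.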


Talk Outline

  • Background: Chinese Restaurant Processes

  • CRP with multiple relationships: (RelCRP, MRelCRP)

  • Dynamic MRelCRP

  • Multi-threaded Online Inference Algorithm

  • Experimental Results



Dirichlet Process (Informal)


Dirichlet Process: Properties


Chinese Restaurant Process (CRP)
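
In the standard CRP, customer i+1 joins an existing table k with probability n_k / (i + alpha), where n_k is the number of customers already at that table, and opens a new table with probability alpha / (i + alpha). The following is a minimal sketch of that textbook seating rule, not a reproduction of the slide; the function name, `alpha`, and `seed` are illustrative choices.

```python
import random


def sample_crp(n, alpha, seed=0):
    """Seat n customers according to the standard CRP with concentration alpha > 0."""
    rng = random.Random(seed)
    counts = []        # counts[k] = number of customers at table k
    assignments = []   # assignments[i] = table index of customer i
    for _ in range(n):
        # Existing tables are weighted by their size, a new table by alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new table
        else:
            counts[k] += 1     # join an existing table
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    assignments, counts = sample_crp(50, alpha=1.0)
    print("tables used:", len(counts), "sizes:", counts)
```

In a mixture-model setting, tables play the role of topics and customers the role of posts.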


Talk Outline

  • Background: Chinese Restaurant Processes

  • CRP with multiple relationships: (RelCRP, MRelCRP)

  • Dynamic MRelCRP

  • Multi-threaded Online Inference Algorithm

  • Experimental Results


Relational Chinese Restaurant Process (RelCRP)


Influence of World-wide Factors




Influence of Personal Preferences




Influence of Friend Network




Influence of Geography

[Figure: panels labeled China, India, UK]


Aggregating Influences

  • RelCRP is exchangeable like the CRP

  • Useful as a prior for an infinite mixture model

  • RelCRP captures influence of one relation on posts

  • Influences act simultaneously on any user

  • Aggregated influence pattern is user specific

    • Different users affected differently by same combination of world-wide and geographic factors
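
The transcript does not reproduce the formula that combines the relations, so the following is only an illustrative sketch under a loud assumption: each relation contributes a CRP-style distribution over topics, and a user-specific weight vector mixes them. The function names, the relation keys, and the convex-mixture form are all hypothetical and should not be read as the MRelCRP definition.

```python
def relcrp_probs(counts, alpha):
    """CRP-style topic probabilities from one relation's topic counts.

    The last entry is the probability of starting a new topic.
    """
    total = sum(counts) + alpha
    return [c / total for c in counts] + [alpha / total]


def aggregate_influences(counts_by_relation, user_weights, alpha):
    """Mix per-relation topic probabilities with user-specific weights.

    ASSUMPTION: a convex mixture; user_weights must sum to 1 over relations.
    """
    num_topics = len(next(iter(counts_by_relation.values())))
    mixed = [0.0] * (num_topics + 1)
    for relation, counts in counts_by_relation.items():
        for k, p in enumerate(relcrp_probs(counts, alpha)):
            mixed[k] += user_weights[relation] * p
    return mixed


if __name__ == "__main__":
    counts = {"world": [8, 2], "geo": [1, 9]}        # hypothetical topic counts
    globally_driven = {"world": 0.9, "geo": 0.1}      # hypothetical user A
    locally_driven = {"world": 0.1, "geo": 0.9}       # hypothetical user B
    print(aggregate_influences(counts, globally_driven, alpha=1.0))
    print(aggregate_influences(counts, locally_driven, alpha=1.0))
```

The two hypothetical users see the same world-wide and geographic counts yet end up favoring different topics, which is the user-specific aggregation the bullets describe.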


Multi Relational CRP


Talk Outline

  • Background: Chinese Restaurant Processes

  • CRP with multiple relationships: (RelCRP, MRelCRP)

  • Dynamic MRelCRP

  • Multi-threaded Online Inference Algorithm

  • Experimental Results


Evolving Patterns in Social Media

  • Number of Topics

    • Topics die and new ones are born

  • User Personalities

    • Susceptibility to influence by world-wide, geographic and friends’ preferences

  • Existing Topic Distributions

    • Words go out of fashion, new ones enter vocabulary

  • Topic Characters

    • Popularity of a topic changes world-wide, in users' preferences, sub-networks and geographies


Dynamic MultiRelCRP


User Personality Trends


Evolving Topic Distributions


Topic Character Trends


Talk Outline

  • Background: Chinese Restaurant Processes

  • CRP with multiple relationships: (RelCRP, MRelCRP)

  • Dynamic MRelCRP

  • Multi-threaded Online Inference Algorithm

  • Experimental Results


Inference and Estimation Tasks


Online Algorithm

  • Traditional iterative framework does not scale for social media data

  • Sequential Monte Carlo methods [Canini AISTATS ‘09] that rejuvenate some old labels are also infeasible

  • Online sampling [Banerjee SDM ‘07] does not revisit old labels at all; uses an initial batch phase

  • Adapt for non-parametric setting
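
A minimal sketch of the single-pass idea in a non-parametric setting: each post is assigned a topic once, using a CRP prior times a smoothed unigram likelihood, and its label is never revisited. This is a simplification for illustration, not the talk's algorithm: it uses a plain CRP rather than the dynamic multi-relational prior, omits the initial batch phase, and the names `alpha`, `beta`, and `vocab_size` are assumptions.

```python
import math
import random


def online_assign(posts, alpha=1.0, beta=0.1, vocab_size=1000, seed=0):
    """Assign each post (a list of words) to a topic in one pass."""
    rng = random.Random(seed)
    topic_sizes = []   # number of posts per topic
    word_counts = []   # per-topic dict: word -> count
    word_totals = []   # per-topic total word count
    labels = []
    for words in posts:
        log_w = []
        for k in range(len(topic_sizes)):
            denom = word_totals[k] + beta * vocab_size
            ll = sum(math.log((word_counts[k].get(w, 0) + beta) / denom)
                     for w in words)
            log_w.append(math.log(topic_sizes[k]) + ll)
        # New topic: prior weight alpha, likelihood under empty counts.
        log_w.append(math.log(alpha) + len(words) * math.log(1.0 / vocab_size))
        m = max(log_w)
        weights = [math.exp(x - m) for x in log_w]   # stable normalization
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(topic_sizes):
            topic_sizes.append(0)
            word_counts.append({})
            word_totals.append(0)
        topic_sizes[k] += 1
        for w in words:
            word_counts[k][w] = word_counts[k].get(w, 0) + 1
        word_totals[k] += len(words)
        labels.append(k)   # never revisited afterwards
    return labels, topic_sizes


if __name__ == "__main__":
    posts = [["nips", "paper"], ["nips", "deadline"], ["cricket", "score"]]
    print(online_assign(posts))
```

Because old labels are fixed, the cost per post depends only on the current number of topics, not on the number of posts already seen.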


Multi-threaded Implementation

  • Sequential online implementation does not scale

  • Iterative Gibbs sampling algorithms parallelized for hierarchical Bayesian models [Asuncion NIPS 08, Smola VLDB 10]

  • Our algorithm is parallel, online and non-parametric

  • Explicit consolidation by master thread at the end of each iteration

    • Only new topics consolidated
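
The control flow described above (workers process shards in parallel, then a master step consolidates only the newly proposed topics) can be sketched as follows. Everything specific here is hypothetical: the worker's rule for proposing a new topic, the word-overlap rule the master uses to merge proposals, and all function names. The talk's implementation is a Java framework; this Python sketch only illustrates the structure.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def process_shard(shard, known_words):
    """Hypothetical worker: a post sharing no word with existing topics
    proposes a new topic (represented as a word-count Counter)."""
    return [Counter(words) for words in shard
            if not any(w in known_words for w in words)]


def consolidate(global_topics, proposals):
    """Master step: only topics proposed in this iteration are touched.

    ASSUMPTION: two new topics are merged if they share at least one word.
    """
    first_new = len(global_topics)
    for new_topics in proposals:
        for topic in new_topics:
            for existing in global_topics[first_new:]:
                if existing.keys() & topic.keys():
                    existing.update(topic)   # merge counts
                    break
            else:
                global_topics.append(Counter(topic))
    return global_topics


def run_iteration(shards, global_topics, num_threads=4):
    known_words = {w for topic in global_topics for w in topic}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        proposals = list(pool.map(lambda s: process_shard(s, known_words), shards))
    return consolidate(global_topics, proposals)


if __name__ == "__main__":
    topics = [Counter({"nips": 3})]
    shards = [[["nips", "paper"], ["cricket", "india"]],
              [["cricket", "score"], ["movie"]]]
    print(run_iteration(shards, topics, num_threads=2))
```

Restricting consolidation to new topics keeps the master step small, since topics that existed before the iteration are left untouched.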


Talk Outline

  • Background: Chinese Restaurant Processes

  • CRP with multiple relationships: (RelCRP, MRelCRP)

  • Dynamic MRelCRP

  • Multi-threaded Online Inference Algorithm

  • Experimental Results


Datasets and Baselines

  • Twitter: 360 million tweets (Jun-Dec 2009)

  • Facebook: 300,000 posts (public profiles, 3 months)

  • Latent Dirichlet Allocation (LDA)

    • [Hong SOMA 2010]

  • Labeled LDA (L-LDA)

    • Hashtags as topics [Ramage ICWSM 2010]

  • Timeline

    • Dynamic non-parametric topic model [Ahmed UAI 2010]


1 Model Goodness

  • Perplexity: Ability to generalize to unseen data

  • Both network and dynamics are important for modeling social media data
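
Perplexity is the exponential of the negative average per-word log-likelihood on held-out data; lower values mean the model generalizes better. A minimal sketch, assuming the model has already produced a natural-log probability for each held-out word (the function and argument names are illustrative):

```python
import math


def perplexity(word_log_probs):
    """Perplexity from per-word natural-log probabilities of held-out text."""
    n = len(word_log_probs)
    if n == 0:
        raise ValueError("need at least one held-out word")
    return math.exp(-sum(word_log_probs) / n)


if __name__ == "__main__":
    # A model that gives every held-out word probability 1/4.
    print(perplexity([math.log(0.25)] * 8))
```

A model that assigns each word probability 1/V has perplexity V, so perplexity can be read as an effective vocabulary size the model is choosing among.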


2 Quality of Discovered Topics

  • Label assigned to each post indicating category

  • Distribution over words indicating semantics

  • Clustering posts using topic labels

  • Prediction using topic labels

    • Predicting post authorship & user commenting activity

  • Major event detection


2A Post Clustering using Topics

  • Use hashtags as gold standard (for Twitter)

    • 16K posts: #NIPS2009, #ICML2009, #bollywood, etc.

  • DMRelCRP close to L-LDA without using hashtags

  • DMRelCRP produces ‘finer-grained’ clusters
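
The transcript does not say which clustering measure was used. As one common option for scoring topic-based clusters against a hashtag gold standard, here is a sketch of cluster purity; the choice of purity and the function name are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def purity(topic_labels, hashtags):
    """Fraction of posts whose hashtag is the majority hashtag of their cluster."""
    if len(topic_labels) != len(hashtags) or not topic_labels:
        raise ValueError("need two non-empty sequences of equal length")
    clusters = defaultdict(Counter)
    for topic, tag in zip(topic_labels, hashtags):
        clusters[topic][tag] += 1
    majority = sum(c.most_common(1)[0][1] for c in clusters.values())
    return majority / len(topic_labels)


if __name__ == "__main__":
    topics = [0, 0, 1, 1]
    tags = ["#NIPS2009", "#NIPS2009", "#bollywood", "#NIPS2009"]
    print(purity(topics, tags))
```

Purity tends to favor finer-grained clusterings, so it is usually reported alongside a measure that penalizes over-splitting.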


2B Prediction Using Topics

  • Authorship: Given post and user, predict if author

  • Commenting activity: Given post and (non-author) user, predict if user comments on that post

  • DMRelCRP topics lead to more accurate prediction


2C Major Event Detection




3 Analysis of Influences


3A Global Personality Trends

[Figure annotations: FIFA WC, Michael Jackson’s death, Google Wave]


3B Geo-specific Personality Trends

  • Personality trends very similar in UK and US

  • Geographic influences high at different epochs


3B Geo-specific Personality Trends

  • India: World-wide and geographic influences weaker

  • China: World-wide influence weak, geographic influence strong; stable pattern


3C Topic Character Trends




Scaling with Data Size

  • Java-based multi-threaded framework; 7 threads

  • 8-core machine with 32 GB RAM

  • Scales largely because of multi-threading


Summary

  • First attempt at studying user influences in social media data

  • New non-parametric model that captures multiple relationships and temporal evolution

  • Multi-threaded online Gibbs sampling algorithm

  • Extensive evaluation on large real datasets

  • Topics lead to better clustering and prediction

  • Insights on user influence patterns

