Workshop on Social Computing

Indrajit Bhattacharya, Research Scientist, IBM Research, Bangalore



Workshop on Social Computing IIT Kharagpur, Oct 5-6 2012. Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media *. Indrajit Bhattacharya Research Scientist IBM Research, Bangalore.


Workshop on Social Computing

IIT Kharagpur, Oct 5-6 2012

Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media*

Indrajit Bhattacharya

Research Scientist

IBM Research, Bangalore

*Collaboration with Himabindu Lakkaraju & Chiranjib Bhattacharyya


Social Media Analysis: Motivation

  • Microblogs: Twitter, Facebook, MySpace

  • Understanding and analyzing topics & trends

  • Influences on users

  • Variety of stakeholders

    • Business

    • Government

    • Social scientists


Social Media Analysis: Challenges

  • Network and Influences on Users

    • User personality: Personal preferences, global and geographic trends, social circle in the network [Yang WSDM 11]

  • Dynamic nature

    • Topics & user personalities evolve over time

  • Volume of data

  • Existing approaches fall short


Social Media Analysis: State of the Art

  • Content Analysis

    • Ramage ICWSM 2010, Hong SOMA 2010

    • Variants of LDA

  • Inferring User Interests

    • Ahmed KDD 2011, Wen KDD 2010

    • Individual features such as user activity or network

  • Patterns in Temporal Evolution

    • Yang et al WSDM 2011


Bayesian Non-parametric Models

  • Problem: choosing the number of components in a mixture model

  • Particularly severe for large data volumes, such as social media data

  • Bayesian solution

  • Infinite-dimensional prior

    • Allows the number of mixture components to grow with data size

  • Existing non-parametric models cannot capture the richness of social media data

  • Inference algorithms are often not scalable


Talk Outline

  • Background: Chinese Restaurant Processes

  • CRP with multiple relationships: (RelCRP, MRelCRP)

  • Dynamic MRelCRP

  • Multi-threaded Online Inference Algorithm

  • Experimental Results


Dirichlet Process (Informal)


Dirichlet Process: Properties


Chinese Restaurant Process (CRP)
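
The body of this slide did not survive extraction. As a reminder of the standard CRP only (not a reconstruction of the slide), customer n+1 joins an existing table k with probability n_k / (n + alpha) and opens a new table with probability alpha / (n + alpha). A minimal Python sketch, where the function name `sample_crp`, the `seed` argument and the return format are illustrative assumptions:

```python
import random

def sample_crp(num_customers, alpha, seed=0):
    """Seat customers one at a time; return each customer's table and the table sizes."""
    rng = random.Random(seed)
    table_counts = []   # table_counts[k] = number of customers at table k
    assignments = []
    for _ in range(num_customers):
        # Existing table k has weight n_k; a new table has weight alpha (> 0).
        weights = table_counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_counts):
            table_counts.append(1)    # open a new table
        else:
            table_counts[k] += 1      # join an existing table
        assignments.append(k)
    return assignments, table_counts
```

In a mixture model, tables play the role of components (topics), so the number of topics is not fixed in advance and tends to grow as more posts arrive.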


Relational Chinese Restaurant Process (RelCRP)

Influence of World-wide Factors


Influence of Personal Preferences


Influence of Friend Network


Influence of Geography

[Figure labels: China, India, UK]

Aggregating Influences

  • RelCRP is exchangeable like the CRP

  • Useful as a prior for infinite mixture model

  • RelCRP captures influence of one relation on posts

  • Influences act simultaneously on any user

  • Aggregated influence pattern is user specific

    • Different users affected differently by same combination of world-wide and geographic factors
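
The formula on the following slide did not survive extraction, so the sketch below is only one plausible reading of the bullets above, not the authors' definition. It assumes that each relation (world-wide, personal, friends, geography) contributes the proportion of related posts already assigned to each topic, and that a user-specific weight vector mixes these proportions. The names `aggregated_topic_probs`, `relation_counts`, `user_weights` and the `"NEW"` key are hypothetical.

```python
def aggregated_topic_probs(relation_counts, user_weights, alpha):
    """Mix per-relation topic proportions with user-specific weights.

    relation_counts: {relation: {topic: number of related posts on that topic}}
    user_weights:    {relation: non-negative weight}, one entry per relation
    alpha:           unnormalized mass reserved for a brand-new topic
    """
    scores = {}
    for relation, counts in relation_counts.items():
        total = sum(counts.values())
        if total == 0:
            continue  # this relation has no related posts yet
        for topic, count in counts.items():
            share = count / total
            scores[topic] = scores.get(topic, 0.0) + user_weights[relation] * share
    scores["NEW"] = alpha
    norm = sum(scores.values())
    return {topic: score / norm for topic, score in scores.items()}
```

With the same counts, a user weighted towards the world-wide relation and a user weighted towards friends get different topic distributions, which is the user-specific aggregation the slide describes.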


Multi Relational CRP


Evolving Patterns in Social Media

  • Number of Topics

    • Topics die and new ones are born

  • User Personalities

    • Susceptibility to influence by world-wide, geographic and friends’ preferences

  • Existing Topic Distributions

    • Words go out of fashion, new ones enter vocabulary

  • Topic Characters

    • Popularity of a topic changes world-wide, in users' preferences, in sub-networks and across geographies


Dynamic MultiRelCRP


User Personality Trends


Evolving Topic Distributions


Topic Character Trends


Inference and Estimation Tasks


Online Algorithm

  • Traditional iterative framework does not scale for social media data

  • Sequential Monte Carlo methods [Canini AISTATS '09], which rejuvenate some old labels, are also infeasible

  • Online sampling [Banerjee SDM '07] does not revisit old labels at all; uses an initial batch phase

  • We adapt online sampling to the non-parametric setting
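
To make "online" and "non-parametric" concrete, here is a minimal single-pass sketch. It is not the authors' algorithm: it ignores relations, dynamics and the initial batch phase, and it assumes a plain CRP prior with a Dirichlet-multinomial word likelihood. The names `online_crp_topics`, `alpha`, `beta` and `vocab_size` are illustrative; `vocab_size` is assumed to be at least the number of distinct words.

```python
import math
import random
from collections import Counter

def _log_lik(words, counts, beta, vocab_size):
    """Dirichlet-multinomial predictive log-probability of a post under one topic."""
    total = sum(counts.values())
    seen = Counter()
    ll = 0.0
    for i, w in enumerate(words):
        ll += math.log((counts[w] + seen[w] + beta) / (total + i + beta * vocab_size))
        seen[w] += 1
    return ll

def online_crp_topics(posts, alpha=1.0, beta=0.1, vocab_size=1000, seed=0):
    """Label each post exactly once, in arrival order; old labels are never revisited."""
    rng = random.Random(seed)
    topic_sizes, topic_words, labels = [], [], []
    for words in posts:
        # Existing topics: log(size) + likelihood; new topic: log(alpha) + prior likelihood.
        log_w = [math.log(n) + _log_lik(words, c, beta, vocab_size)
                 for n, c in zip(topic_sizes, topic_words)]
        log_w.append(math.log(alpha) + _log_lik(words, Counter(), beta, vocab_size))
        top = max(log_w)
        weights = [math.exp(x - top) for x in log_w]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(topic_sizes):     # a new topic is born
            topic_sizes.append(0)
            topic_words.append(Counter())
        topic_sizes[k] += 1
        topic_words[k].update(words)
        labels.append(k)
    return labels, topic_sizes
```

Each post costs time proportional to the current number of topics, and no pass over earlier posts is needed, which is what makes a scheme of this kind attractive for streams of social media size.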


Multi-threaded Implementation

  • Sequential online implementation does not scale

  • Iterative Gibbs sampling has been parallelized for hierarchical Bayesian models [Asuncion NIPS 08, Smola VLDB 10]

  • Our algorithm is parallel, online and non-parametric

  • Explicit consolidation by master thread at the end of each iteration

    • Only new topics consolidated
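
The consolidation pattern above can be sketched as follows. This is an assumption-laden toy, not the talk's Java implementation: the sampling step is replaced by a crude word-overlap rule to keep the example short, and the merge criterion (Jaccard overlap of vocabularies, threshold `min_overlap`) is hypothetical since the slides do not state one. Posts are assumed to be non-empty word lists.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def worker(shard, known_topics):
    """Process one shard; return only the topics this worker created locally."""
    new_topics = []
    for words in shard:
        if any(set(words) & set(t) for t in known_topics):
            continue  # post matches an existing topic; nothing new to propose
        for t in new_topics:
            if set(words) & set(t):
                t.update(words)
                break
        else:
            new_topics.append(Counter(words))
    return new_topics

def consolidate(proposals, min_overlap=0.5):
    """Master step: merge new topics from different workers that look alike."""
    merged = []
    for topic in (t for worker_topics in proposals for t in worker_topics):
        for m in merged:
            a, b = set(topic), set(m)
            if len(a & b) / len(a | b) >= min_overlap:
                m.update(topic)
                break
        else:
            merged.append(Counter(topic))
    return merged

def run_iteration(shards, known_topics):
    """One iteration: workers run in parallel, then the master consolidates."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        proposals = list(pool.map(lambda s: worker(s, known_topics), shards))
    return known_topics + consolidate(proposals)
```

Existing topics are only read by the workers here, so the master has to reconcile just the topics born during the iteration, which keeps the sequential part of each iteration small.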


Datasets and Baselines

  • Twitter: 360 million tweets (Jun-Dec 2009)

  • Facebook: 300,000 posts (public profiles, 3 months)

  • Latent Dirichlet Allocation (LDA)

    • [Hong SOMA 2010]

  • Labeled LDA (L-LDA)

    • Hashtags as topics [Ramage ICWSM 2010]

  • Timeline

    • Dynamic non-parametric topic model [Ahmed UAI 2010]


1 Model Goodness

  • Perplexity: Ability to generalize to unseen data

  • Both network and dynamics are important for modeling social media data
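
For reference, perplexity on held-out text is the exponential of the negative average per-word log-likelihood, so lower is better. A small sketch of that standard definition, where `word_prob` is a hypothetical callback standing in for whatever predictive distribution a fitted model provides:

```python
import math

def perplexity(held_out_docs, word_prob):
    """exp(-(1/N) * sum of log p(word | doc)) over all N held-out words."""
    log_lik = 0.0
    n_words = 0
    for doc in held_out_docs:
        for w in doc:
            log_lik += math.log(word_prob(doc, w))
            n_words += 1
    return math.exp(-log_lik / n_words)
```

A model that spreads probability uniformly over V words has perplexity V, which gives a useful sanity check when comparing models.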


2 Quality of Discovered Topics

  • Label assigned to each post indicating category

  • Distribution over words indicating semantics

  • Clustering posts using topic labels

  • Prediction using topic labels

    • Predicting post authorship & user commenting activity

  • Major event detection


2A Post Clustering using Topics

  • Use hashtags as gold standard (for Twitter)

    • 16K posts: #NIPS2009, #ICML2009, #bollywood, etc.

  • DMRelCRP close to L-LDA without using hashtags

  • DMRelCRP produces ‘finer-grained’ clusters
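
The slides do not say which clustering measure was used. Purity is one common choice when a gold labelling such as hashtags is available, and it is shown here only as an assumed example; the function name `purity` and its arguments are illustrative.

```python
from collections import Counter, defaultdict

def purity(topic_labels, hashtags):
    """Fraction of posts whose hashtag is the majority hashtag of their topic cluster."""
    clusters = defaultdict(Counter)
    for topic, tag in zip(topic_labels, hashtags):
        clusters[topic][tag] += 1
    majority = sum(c.most_common(1)[0][1] for c in clusters.values())
    return majority / len(topic_labels)
```

Purity never decreases when clusters are split, so a model that produces finer-grained clusters is favoured by it; a measure such as normalized mutual information is often reported alongside for that reason.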


2B Prediction Using Topics

  • Authorship: Given post and user, predict if author

  • Commenting activity: Given post and (non-author) user, predict if user comments on that post

  • DMRelCRP topics lead to more accurate prediction


2C Major Event Detection


3 Analysis of Influences


3A Global Personality Trends


[Figure annotations: FIFA WC, Michael Jackson’s death, Google Wave]

3B Geo-specific Personality Trends

  • Personality trends very similar in UK and US

  • Geographic influences high at different epochs


  • India: world-wide and geographic influences weaker

  • China: world-wide weak, geographic strong; stable pattern


3C Topic Character Trends


Scaling with Data Size

  • Java-based multi-threaded framework; 7 threads

  • 8-core machine with 32 GB RAM

  • Scales largely because of multi-threading


Summary

  • First attempt at studying user influences in social media data

  • New non-parametric model that captures multiple relationships and temporal evolution

  • Multi-threaded online Gibbs sampling algorithm

  • Extensive evaluation on large real datasets (Twitter, Facebook)

  • Topics lead to better clustering and prediction

  • Insights on user influence patterns

