Indrajit Bhattacharya Research Scientist IBM Research, Bangalore

Workshop on Social Computing IIT Kharagpur, Oct 5-6 2012. Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media *. Indrajit Bhattacharya Research Scientist IBM Research, Bangalore.

Workshop on Social Computing

IIT Kharagpur, Oct 5-6 2012

Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media*

Indrajit Bhattacharya

Research Scientist

IBM Research, Bangalore

*Collaboration w/ Himabindu Lakkaraju & Chiranjib Bhattacharyya

Social Media Analysis: Motivation
  • Microblogs: Twitter, Facebook, MySpace
  • Understanding and analyzing topics & trends
  • Influences on users
  • Variety of stakeholders
    • Business
    • Government
    • Social scientists
Social Media Analysis: Challenges
  • Network and Influences on Users
    • User personality: Personal preferences, global and geographic trends, social circle in the network [Yang WSDM 11]
  • Dynamic nature
    • Topics & user personalities evolve over time
  • Volume of data
  • Existing approaches fall short
Social Media Analysis: State of the Art
  • Content Analysis
    • Ramage ICWSM 2010, Hong SOMA 2010
    • Variants of LDA
  • Inferring User Interests
    • Ahmed KDD 2011, Wen KDD 2010
    • Individual features such as user activity or network
  • Patterns in Temporal Evolution
    • Yang et al WSDM 2011
Bayesian Non-parametric Models
  • Choosing the number of components in a mixture model
  • Particularly severe problem for large data volumes, such as social media data
  • Bayesian solution
  • Infinite dimensional prior
    • Allows the number of mixture components to grow with data size
  • Existing non-parametric models cannot capture the richness of social media data
  • Algorithms often not scalable
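As a reminder of how an infinite-dimensional prior lets the number of components grow with the data, here is a minimal sketch of the standard Chinese Restaurant Process. It is not the RelCRP or MRelCRP from this talk; the function name `sample_crp` and the concentration value are assumptions chosen for illustration.

```python
import random

def sample_crp(num_customers, alpha, seed=0):
    """Draw table assignments from a standard Chinese Restaurant Process."""
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []   # assignments[n] = table chosen by customer n
    for _ in range(num_customers):
        # An existing table k is chosen with weight n_k,
        # a brand-new table with weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[k] += 1        # join an existing table
        assignments.append(k)
    return assignments, table_sizes

assignments, table_sizes = sample_crp(1000, alpha=2.0)
print(len(table_sizes), "tables for", len(assignments), "customers")
```

Running it with larger `num_customers` typically yields more tables, which is the "grow with data size" behaviour the slide refers to.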
Talk Outline
  • Background: Chinese Restaurant Processes
  • CRP with multiple relationships: (RelCRP, MRelCRP)
  • Dynamic MRelCRP
  • Multi-threaded Online Inference Algorithm
  • Experimental Results
Aggregating Influences
  • RelCRP is exchangeable like the CRP
  • Useful as a prior for infinite mixture model
  • RelCRP captures influence of one relation on posts
  • Influences act simultaneously on any user
  • Aggregated influence pattern is user specific
    • Different users affected differently by same combination of world-wide and geographic factors
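One way to picture user-specific aggregation is a two-step draw: first pick which relation influences the post, using weights that belong to the user, then pick a topic CRP-style from the posts linked through that relation. The sketch below assumes exactly that; the slides do not give the model's equations, so the relation names, the weights, and the function `sample_topic` are all hypothetical.

```python
import random

def sample_topic(user_weights, relation_topic_counts, alpha, rng):
    """Pick a relation by user-specific weight, then a topic within it.

    Assumed structure (not taken from the slides): within the chosen
    relation, an existing topic is weighted by its post count and a
    new topic by alpha, as in a CRP.
    """
    relations = list(user_weights)
    rel = rng.choices(relations, weights=[user_weights[r] for r in relations])[0]
    counts = relation_topic_counts[rel]      # topic id -> number of related posts
    topics = list(counts) + ["NEW"]
    weights = [counts[t] for t in counts] + [alpha]
    return rel, rng.choices(topics, weights=weights)[0]

rng = random.Random(0)
# Hypothetical user who is mostly influenced by friends.
user_weights = {"world": 0.2, "geo": 0.3, "friends": 0.5}
relation_topic_counts = {
    "world": {"fifa": 40, "mj": 60},
    "geo": {"bollywood": 10},
    "friends": {"google_wave": 5, "fifa": 2},
}
rel, topic = sample_topic(user_weights, relation_topic_counts, alpha=1.0, rng=rng)
print(rel, topic)
```

Two users with different `user_weights` would see different topic probabilities from the same world-wide and geographic counts, which is the point of the last bullet.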
Evolving Patterns in Social Media
  • Number of Topics
    • Topics die and new ones are born
  • User Personalities
    • Susceptibility to influence by world-wide, geographic and friends’ preferences
  • Existing Topic Distributions
    • Words go out of fashion, new ones enter vocabulary
  • Topic Characters:
    • Popularity of a topic changes world-wide, in users’ preferences, sub-networks and geographies
Online Algorithm
  • Traditional iterative framework does not scale for social media data
  • Sequential Monte Carlo methods [Canini AIStats ‘09] that rejuvenate some old labels are also infeasible
  • Online sampling [Banerjee SDM ‘07] does not revisit old labels at all; it uses an initial batch phase
  • Adapt for non-parametric setting
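The idea of online sampling in a non-parametric setting can be sketched as a single pass in which each post is labelled once and never revisited. The code below assumes a plain Dirichlet-process mixture of unigram topics, not the DMRelCRP itself, and omits the initial batch phase; `alpha`, `beta`, `vocab_size`, and the function name are illustrative assumptions.

```python
import math
import random
from collections import Counter

def online_dp_mixture(posts, alpha=1.0, beta=0.1, vocab_size=1000, seed=0):
    """One-pass sampler: each post gets a topic once; old labels are never resampled."""
    rng = random.Random(seed)
    topic_sizes, topic_words, topic_totals, labels = [], [], [], []

    def log_lik(words, counts, total):
        # Dirichlet-multinomial predictive log-probability of the post's words.
        ll, seen = 0.0, Counter()
        for i, w in enumerate(words):
            ll += math.log((counts[w] + seen[w] + beta) / (total + i + beta * vocab_size))
            seen[w] += 1
        return ll

    for words in posts:
        # CRP prior times likelihood for each existing topic, plus a new topic.
        log_w = [math.log(n) + log_lik(words, c, t)
                 for n, c, t in zip(topic_sizes, topic_words, topic_totals)]
        log_w.append(math.log(alpha) + log_lik(words, Counter(), 0))
        top = max(log_w)
        weights = [math.exp(x - top) for x in log_w]   # stabilised before sampling
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(topic_sizes):                      # a new topic was chosen
            topic_sizes.append(0)
            topic_words.append(Counter())
            topic_totals.append(0)
        topic_sizes[k] += 1
        topic_words[k].update(words)
        topic_totals[k] += len(words)
        labels.append(k)
    return labels, topic_sizes

posts = [["fifa", "goal"], ["fifa", "match"], ["google", "wave"], ["google", "wave", "invite"]]
labels, sizes = online_dp_mixture(posts)
print(labels, sizes)
```

Because each post is touched once, the cost is linear in the number of posts, at the price of never correcting an early label.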
Multi-threaded Implementation
  • Sequential online implementation does not scale
  • Iterative Gibbs sampling algorithms parallelized for hierarchical Bayesian models [Asuncion NIPS 08, Smola VLDB 10]
  • Our algorithm is parallel, online and non-parametric
  • Explicit consolidation by master thread at the end of each iteration
    • Only new topics consolidated
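The worker and master pattern described here can be sketched as follows: workers label their shard against a read-only snapshot of the global topics, and the master merges results at the end of the iteration, reconciling only the newly created topics. The assignment rule (word overlap) and the merge rule (`min_shared` shared words) are simple stand-ins assumed for illustration; they are not the Gibbs step or the consolidation rule used in the talk, and all names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def overlap(words, topic):
    """Number of items in `words` that the topic has already seen."""
    return sum(1 for w in words if topic[w] > 0)

def process_shard(shard, snapshot):
    """Worker: label posts against a read-only snapshot of the global topics."""
    existing_updates, new_topics = {}, []
    for words in shard:
        candidates = [(overlap(words, t), ("old", i)) for i, t in enumerate(snapshot)]
        candidates += [(overlap(words, t), ("new", j)) for j, t in enumerate(new_topics)]
        score, (kind, idx) = max(candidates, key=lambda c: c[0], default=(0, ("new", -1)))
        if score == 0:
            new_topics.append(Counter(words))          # open a local new topic
        elif kind == "old":
            existing_updates.setdefault(idx, Counter()).update(words)
        else:
            new_topics[idx].update(words)
    return existing_updates, new_topics

def consolidate(global_topics, results, min_shared=1):
    """Master: apply count updates, then merge only the newly proposed topics."""
    for existing_updates, _ in results:
        for i, counts in existing_updates.items():
            global_topics[i].update(counts)
    first_new = len(global_topics)
    for _, new_topics in results:
        for topic in new_topics:
            for merged in global_topics[first_new:]:
                if overlap(topic, merged) >= min_shared:
                    merged.update(topic)               # same topic found by two workers
                    break
            else:
                global_topics.append(topic)
    return global_topics

def run_iteration(posts, global_topics, num_threads=7):
    shards = [posts[i::num_threads] for i in range(num_threads)]
    snapshot = [Counter(t) for t in global_topics]     # workers never mutate shared state
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(lambda s: process_shard(s, snapshot), shards))
    return consolidate(global_topics, results)

posts = [["fifa", "goal"], ["google", "wave"], ["fifa", "match"], ["wave", "invite"]]
topics = run_iteration(posts, [], num_threads=2)
print(len(topics), "topics after one iteration")
```

In CPython the global interpreter lock prevents threads from speeding up CPU-bound work, so this only illustrates the control flow; the talk reports a Java-based implementation.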
Datasets and Baselines
  • Twitter: 360 million tweets (Jun-Dec 2009)
  • Facebook: 300,000 posts (public profiles, 3 months)
  • Latent Dirichlet Allocation (LDA)
    • [Hong SOMA 2010]
  • Labeled LDA (L-LDA)
    • Hashtags as topics [Ramage ICWSM 2010]
  • Timeline
    • Dynamic non-parametric topic model [Ahmed UAI 2010]
1 Model Goodness
  • Perplexity: Ability to generalize to unseen data
  • Both network and dynamics are important for modeling social media data
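For reference, perplexity is the exponential of the negative average per-token log-likelihood on held-out data, so lower values mean better generalization. A minimal sketch, assuming the model already supplies a log-probability for each held-out token (the function name is illustrative):

```python
import math

def perplexity(token_log_probs):
    """exp of the negative mean log-probability over held-out tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A model that spreads probability uniformly over 50 words
# has a perplexity of about 50 on any held-out text.
uniform = [math.log(1 / 50)] * 10
print(perplexity(uniform))
```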
2 Quality of Discovered Topics
  • Label assigned to each post indicating category
  • Distribution over words indicating semantics
  • Clustering posts using topic labels
  • Prediction using topic labels
    • Predicting post authorship & user commenting activity
  • Major event detection
2A Post Clustering using Topics
  • Use hashtags as gold standard (for Twitter)
    • 16K posts: #NIPS2009, #ICML2009, #bollywood, etc.
  • DMRelCRP close to L-LDA without using hashtags
  • DMRelCRP produces ‘finer-grained’ clusters
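Scoring clusters against a hashtag gold standard can be done in several ways, and the slides do not say which metric was used. The sketch below assumes cluster purity purely as an illustration; the labels in the example are made up.

```python
from collections import Counter, defaultdict

def purity(cluster_labels, gold_labels):
    """Fraction of posts whose cluster's majority hashtag matches their own."""
    clusters = defaultdict(Counter)
    for cluster, gold in zip(cluster_labels, gold_labels):
        clusters[cluster][gold] += 1
    majority_total = sum(max(counts.values()) for counts in clusters.values())
    return majority_total / len(gold_labels)

# Hypothetical topic labels for five posts and their hashtags.
topic_labels = [0, 0, 1, 1, 2]
hashtags = ["#NIPS2009", "#NIPS2009", "#ICML2009", "#bollywood", "#bollywood"]
score = purity(topic_labels, hashtags)
print(score)
```

Purity tends to rise as clusters get smaller, so finer-grained clusterings should be compared with a metric that also penalizes splitting a hashtag across clusters.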
2B Prediction Using Topics
  • Authorship: Given post and user, predict if author
  • Commenting activity: Given post and (non-author) user, predict if user comments on that post
  • DMRelCRP topics lead to more accurate prediction
3A Global Personality Trends

[Figure: global personality trends, annotated with the events FIFA WC, Michael Jackson’s death, and Google Wave]

3B Geo-specific Personality Trends
  • Personality trends very similar in UK and US
  • Geographic influences high at different epochs
  • India: World-wide and geographic influences weaker
  • China: World-wide influence weak, geographic influence strong; stable pattern
Scaling with Data Size
  • Java-based multi-threaded framework; 7 threads
  • 8-core machine, 32 GB RAM
  • Scales largely because of multi-threading
Summary
  • First attempt at studying user influences in social media data
  • New non-parametric model that captures multiple relationships and temporal evolution
  • Multi-threaded online Gibbs sampling algorithm
  • Extensive evaluation on large real datasets
  • Topics lead to better clustering and prediction
  • Insights on user influence patterns