
Indrajit Bhattacharya Research Scientist IBM Research, Bangalore

Workshop on Social Computing, IIT Kharagpur, Oct 5-6 2012. Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media. Indrajit Bhattacharya, Research Scientist, IBM Research, Bangalore.


Presentation Transcript


  1. Workshop on Social Computing, IIT Kharagpur, Oct 5-6 2012 • Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media* • Indrajit Bhattacharya, Research Scientist, IBM Research, Bangalore • *Collaboration with Himabindu Lakkaraju & Chiranjib Bhattacharyya

  2. Social Media Analysis: Motivation • Microblogs: Twitter, Facebook, MySpace • Understanding and analyzing topics & trends • Influences on users • Variety of stakeholders • Business • Government • Social scientists

  3. Social Media Analysis: Challenges • Network and Influences on Users • User personality: Personal preferences, global and geographic trends, social circle in the network [Yang WSDM 11] • Dynamic nature • Topics & user personalities evolve over time • Volume of data • Existing approaches fall short

  4. Soc Med Analysis: State of the Art • Content Analysis • Ramage ICWSM 2010, Hong SOMA 2010 • Variants of LDA • Inferring User Interests • Ahmed KDD 2011, Wen KDD 2010 • Individual features such as user activity or network • Patterns in Temporal Evolution • Yang et al WSDM 2011

  5. Bayesian Non-parametric Models • Choosing the number of components in a mixture model • Particularly severe problem for large data volumes such as social media data • Bayesian solution • Infinite-dimensional prior • Allows the number of mixture components to grow with data size • Existing models cannot capture the richness of social media data • Algorithms often not scalable

  6. Talk Outline • Background: Chinese Restaurant Processes • CRP with multiple relationships: (RelCRP, MRelCRP) • Dynamic MRelCRP • Multi-threaded Online Inference Algorithm • Experimental Results

  7. Talk Outline • Background: Chinese Restaurant Processes • CRP with multiple relationships: (RelCRP, MRelCRP) • Dynamic MRelCRP • Multi-threaded Online Inference Algorithm • Experimental Results

  8. Dirichlet Process (Informal)

  9. Dirichlet Process: Properties

  10. Chinese Restaurant Process (CRP)
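
As a rough illustration of the standard CRP seating rule (a textbook sketch, not taken from the slides): customer n+1 joins an existing table k with probability n_k / (n + alpha) and opens a new table with probability alpha / (n + alpha). The function name `sample_crp` and the parameter names `alpha` and `seed` are illustrative assumptions.

```python
import random


def sample_crp(num_customers, alpha, seed=0):
    """Seat customers one at a time under the standard CRP rule.

    Hypothetical helper for illustration; alpha must be positive.
    Returns (assignments, table_sizes).
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []   # assignments[i] = table index of customer i
    for _ in range(num_customers):
        # Existing table k has weight n_k; a brand-new table has weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[k] += 1        # join an existing table
        assignments.append(k)
    return assignments, table_sizes
```

Because the new-table weight never drops to zero, the number of occupied tables keeps growing (slowly) as more customers arrive, which is the sense in which the number of mixture components can grow with data size.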

  11. Talk Outline • Background: Chinese Restaurant Processes • CRP with multiple relationships: (RelCRP, MRelCRP) • Dynamic MRelCRP • Parallelized Online Inference Algorithm • Experimental Results

  12. Relational Ch. Rest. Pr. (RelCRP)

  13. Relational Ch. Rest. Pr. (RelCRP)

  14. Influence of World-wide Factors

  15. Influence of World-wide Factors

  16. Influence of Personal Preferences

  17. Influence of Personal Preferences

  18. Influence of Friend Network

  19. Influence of Friend Network

  20. Influence of Geography (figure labels: China, India, UK)

  21. Influence of Geography

  22. Aggregating Influences • RelCRP is exchangeable like the CRP • Useful as a prior for infinite mixture model • RelCRP captures influence of one relation on posts • Influences act simultaneously on any user • Aggregated influence pattern is user specific • Different users affected differently by same combination of world-wide and geographic factors

  23. Multi Relational CRP

  24. Talk Outline • Background: Chinese Restaurant Processes • CRP with multiple relationships: (RelCRP, MRelCRP) • Dynamic MRelCRP • Multi-threaded Online Inference Algorithm • Experimental Results

  25. Evolving Patterns in Social Media • Number of Topics • Topics die and new ones are born • User Personalities • Susceptibility to influence by world-wide, geographic and friends’ preferences • Existing Topic Distributions • Words go out of fashion, new ones enter vocabulary • Topic Characters • Popularity of a topic changes world-wide, in users’ preferences, sub-networks and geographies

  26. Dynamic MultiRelCRP

  27. User Personality Trends

  28. Evolving Topic Distributions

  29. Topic Character Trends

  30. Talk Outline • Background: Chinese Restaurant Processes • CRP with multiple relationships: (RelCRP, MRelCRP) • Dynamic MRelCRP • Multi-threaded Online Inference Algorithm • Experimental Results

  31. Inference and Estimation Tasks

  32. Online Algorithm • Traditional iterative framework does not scale for social media data • Sequential Monte Carlo methods [Canini AIStats ’09] that rejuvenate some old labels are also infeasible • Online sampling [Banerjee SDM ’07] does not revisit old labels at all; uses an initial batch phase • Adapted for the non-parametric setting
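
The idea of an online, non-parametric sampler that never revisits old labels can be sketched as follows. This is a minimal single-pass illustration under stated assumptions, not the authors' algorithm: it uses a plain CRP prior (no relations, no dynamics, no initial batch phase) and a Dirichlet-multinomial word likelihood. The names `online_crp_topics` and `log_predictive` and the parameters `alpha`, `beta`, and `vocab_size` are hypothetical.

```python
import math
import random
from collections import Counter


def log_predictive(words, counts, total, beta, vocab_size):
    """Log probability of a word sequence under a Dirichlet-multinomial topic."""
    seen = Counter()
    logp = 0.0
    for i, w in enumerate(words):
        logp += math.log((counts[w] + seen[w] + beta)
                         / (total + i + vocab_size * beta))
        seen[w] += 1
    return logp


def online_crp_topics(posts, alpha=1.0, beta=0.1, vocab_size=1000, seed=0):
    """Assign each post a topic label once, in arrival order (illustrative)."""
    rng = random.Random(seed)
    word_counts, totals, sizes, labels = [], [], [], []
    for words in posts:
        # Score existing topics: CRP prior weight times word likelihood.
        log_w = [math.log(sizes[k])
                 + log_predictive(words, word_counts[k], totals[k], beta, vocab_size)
                 for k in range(len(sizes))]
        # Score a brand-new topic with prior weight alpha.
        log_w.append(math.log(alpha)
                     + log_predictive(words, Counter(), 0, beta, vocab_size))
        top = max(log_w)
        weights = [math.exp(v - top) for v in log_w]  # stabilized weights
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):                # a new topic is born
            word_counts.append(Counter())
            totals.append(0)
            sizes.append(0)
        word_counts[k].update(words)       # update statistics; label is final
        totals[k] += len(words)
        sizes[k] += 1
        labels.append(k)
    return labels
```

Each post is touched exactly once, so the cost per post depends only on the current number of topics, at the price of never correcting an early mistake.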

  33. Multi-threaded Implementation • Sequential online implementation does not scale • Iterative Gibbs sampling algorithms parallelized for hierarchical Bayesian models [Asuncion NIPS 08, Smola VLDB 10] • Our algorithm is parallel, online and non-parametric • Explicit consolidation by master thread at the end of each iteration • Only new topics consolidated

  34. Talk Outline • Background: Chinese Restaurant Processes • CRP with multiple relationships: (RelCRP, MRelCRP) • Dynamic MRelCRP • Multi-threaded Online Inference Algorithm • Experimental Results

  35. Datasets and Baselines • Twitter: 360 million tweets (Jun-Dec 2009) • Facebook: 300,000 posts (public profiles, 3 months) • Latent Dirichlet Allocation (LDA) • [Hong SOMA 2010] • Labeled LDA (L-LDA) • Hashtags as topics [Ramage ICWSM 2010] • Timeline • Dynamic non-parametric topic model [Ahmed UAI 2010]

  36. 1 Model Goodness • Perplexity: Ability to generalize to unseen data • Both network and dynamics are important for modeling social media data
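
For reference, perplexity is conventionally computed as the exponential of the negative average per-word log-likelihood on held-out data; lower is better. The sketch below assumes the per-word log probabilities have already been produced by some fitted model; the function name `perplexity` and its input format are illustrative assumptions, not the evaluation code used in the talk.

```python
import math


def perplexity(word_log_probs):
    """Perplexity from natural-log probabilities of held-out words.

    word_log_probs: non-empty list with one log p(word) per held-out token.
    """
    avg_log_prob = sum(word_log_probs) / len(word_log_probs)
    return math.exp(-avg_log_prob)
```

A model that spreads probability uniformly over V words has perplexity V, so the value can be read as an effective vocabulary size the model is choosing among.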

  37. 2 Quality of Discovered Topics • Label assigned to each post indicating category • Distribution over words indicating semantics • Clustering posts using topic labels • Prediction using topic labels • Predicting post authorship & user commenting activity • Major event detection

  38. 2A Post Clustering using Topics • Use hashtags as gold standard (for Twitter) • 16K posts #NIPS2009, #ICML2009, #bollywood, etc. • DMRelCRP close to L-LDA without using hashtags • DMRelCRP produces ‘finer-grained’ clusters
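
The slides do not state which clustering measure was used. As one common possibility, cluster purity against hashtag labels can be sketched as below; the function name `purity` and its arguments are illustrative assumptions.

```python
from collections import Counter, defaultdict


def purity(cluster_labels, gold_labels):
    """Fraction of posts whose cluster's majority gold label matches their own."""
    by_cluster = defaultdict(Counter)
    for cluster, gold in zip(cluster_labels, gold_labels):
        by_cluster[cluster][gold] += 1
    # Each cluster contributes the count of its most frequent gold label.
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(cluster_labels)
```

Purity does not penalize splitting one hashtag across several clusters, so it is usually reported together with a measure that does.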

  39. 2B Prediction Using Topics • Authorship: Given post and user, predict if author • Commenting activity: Given post and (non-author) user, predict if user comments on that post • DMRelCRP topics lead to more accurate prediction

  40. 2C Major Event Detection

  41. 2C Major Event Detection

  42. 3 Analysis of Influences

  43. 3A Global Personality Trends

  44. 3A Global Personality Trends (figure labels: FIFA WC, Michael Jackson’s death, Google Wave)

  45. 3A Global Personality Trends

  46. 3B Geo-specific Personality Trends • Personality trends very similar in UK and US • Geographic influences high at different epochs

  47. 3B Geo-specific Personality Trends • India: World-wide and geographic influences weaker • China: World-wide influence weak, geographic influence strong; stable pattern

  48. 3C Topic Character Trends

  49. 3C Topic Character Trends

  50. 3C Topic Character Trends
