
Fast and Accurate Inference for Topic Models


Presentation Transcript


  1. Fast and Accurate Inference for Topic Models. James Foulds, University of California, Santa Cruz. Presented at eBay Research Labs.

  2. Motivation • There is an ever-increasing wealth of digital information available • Wikipedia • News articles • Scientific articles • Literature • Debates • Blogs, social media … • We would like automatic methods to help us understand this content

  3. Motivation • Personalized recommender systems • Social network analysis • Exploratory tools for scientists • The digital humanities • …

  4. The Digital Humanities

  5. Dimensionality reduction The quick brown fox jumps over the sly lazy dog

  6. Dimensionality reduction The quick brown fox jumps over the sly lazy dog [5 6 37 1 4 30 9 22 570 12]

  7. Dimensionality reduction The quick brown fox jumps over the sly lazy dog [5 6 37 1 4 30 9 22 570 12] Foxes Dogs Jumping [40% 40% 20%]
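To make the contrast concrete, here is a toy sketch (in Python, with a made-up word-to-id mapping) of the two representations above: the raw sequence of vocabulary ids versus the three topic proportions.

```python
# Toy illustration of the two representations; the vocabulary ids are invented.
sentence = "the quick brown fox jumps over the sly lazy dog".split()

vocab = {w: i for i, w in enumerate(sorted(set(sentence)))}  # hypothetical mapping
word_ids = [vocab[w] for w in sentence]       # high-dimensional: one id per token

topic_names = ["Foxes", "Dogs", "Jumping"]
topic_proportions = [0.40, 0.40, 0.20]        # low-dimensional: one weight per topic

print(word_ids)
print(dict(zip(topic_names, topic_proportions)))
```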

  8. Latent Variable Models. Z: latent variables; X: observed data (data points); Φ: parameters. Dimensionality(X) >> dimensionality(Z). Z is a bottleneck, which finds a compressed, low-dimensional representation of X.

  9. Latent Feature Models for Social Networks: Alice, Bob, Claire

  10. Latent Feature Models for Social Networks: Alice, Bob, Claire, with latent features Tango, Salsa, Cycling, Fishing, Running, Waltz

  11. Latent Feature Models for Social Networks: Alice, Bob, Claire, with latent features Tango, Salsa, Cycling, Fishing, Running, Waltz

  12. Latent Feature Models for Social Networks: Alice, Bob, Claire, with latent features Tango, Salsa, Cycling, Fishing, Running, Waltz

  13. Miller, Griffiths, Jordan (2009): Latent Feature Relational Model. Alice, Bob, Claire with latent features (Tango, Salsa, Cycling, Fishing, Running, Waltz); Z = the binary person-by-feature matrix

  14. Latent Representations • Binary latent feature • Latent class • Mixed membership

  15. Latent Representations • Binary latent feature • Latent class • Mixed membership

  16. Latent Representations • Binary latent feature • Latent class • Mixed membership

  17. Latent Variable Models as Matrix Factorization

  18. Latent Variable Models as Matrix Factorization

  19. Miller, Griffiths, Jordan (2009): Latent Feature Relational Model. Alice, Bob, Claire with latent features (Tango, Salsa, Cycling, Fishing, Running, Waltz); Z = the binary person-by-feature matrix

  20. Miller, Griffiths, Jordan (2009): Latent Feature Relational Model. E[Y] = σ(Z W Zᵀ), where Z is the binary person-by-feature matrix and W is a matrix of feature-interaction weights
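A minimal sketch of how link probabilities are computed in the latent feature relational model, assuming the σ in E[Y] = σ(Z W Zᵀ) is the logistic function; the feature matrix Z and the weights W below are invented toy values, not anything from the talk.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy binary feature matrix Z: rows are people, columns are latent features
# (e.g. Tango, Salsa, Cycling, Fishing, Running, Waltz); values are illustrative.
Z = np.array([[1, 1, 1, 0, 0, 0],   # Alice
              [0, 0, 1, 1, 1, 0],   # Bob
              [0, 0, 0, 0, 1, 1]])  # Claire

W = np.random.randn(Z.shape[1], Z.shape[1])  # feature-interaction weights (random here)

# Expected adjacency matrix: E[Y] = sigmoid(Z W Z^T), applied elementwise.
EY = sigmoid(Z @ W @ Z.T)
print(np.round(EY, 2))
```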

  21. Topics: Topic 1 (reinforcement learning), Topic 2 (learning algorithms), Topic 3 (character recognition). Each topic is a distribution over all words in the dictionary: a vector of discrete probabilities (sums to one).

  22. Topics: Topic 1 (reinforcement learning), Topic 2 (learning algorithms), Topic 3 (character recognition), shown via their top 10 words

  23. Topics: Topic 1 (reinforcement learning), Topic 2 (learning algorithms), Topic 3 (character recognition), shown via their top 10 words
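Since a topic is just a probability vector over the vocabulary, its "top 10 words" are simply the 10 highest-probability entries. A minimal sketch with a randomly drawn, purely hypothetical topic vector:

```python
import numpy as np

# Hypothetical tiny vocabulary and a random topic drawn from a Dirichlet (sums to one).
vocab = ["learning", "reinforcement", "policy", "reward", "state", "action",
         "agent", "value", "control", "robot", "the", "of"]
phi_k = np.random.dirichlet(np.ones(len(vocab)))

top10 = np.argsort(phi_k)[::-1][:10]          # indices of the 10 largest probabilities
for i in top10:
    print(f"{vocab[i]:15s} {phi_k[i]:.3f}")
```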

  24. Latent Dirichlet Allocation (Blei et al., 2003) • For each document d • Draw its topic proportions θ(d) ~ Dirichlet(α) • For each word w_d,n • Draw a topic assignment z_d,n ~ Discrete(θ(d)) • Draw a word from the chosen topic w_d,n ~ Discrete(φ(z_d,n))

  25. Latent Dirichlet Allocation (Blei et al., 2003) • For each topic k • Draw its distribution over words φ(k) ~ Dirichlet(β) • For each word w_d,n • Draw a topic assignment z_d,n ~ Discrete(θ(d)) • Draw a word from the chosen topic w_d,n ~ Discrete(φ(z_d,n))

  26. Latent Dirichlet Allocation (Blei et al., 2003) • For each document d • Draw its topic proportions θ(d) ~ Dirichlet(α) • For each word w_d,n • Draw a topic assignment z_d,n ~ Discrete(θ(d)) • Draw a word from the chosen topic w_d,n ~ Discrete(φ(z_d,n))

  27. Latent Dirichlet Allocation (Blei et al., 2003) • For each document d • Draw its topic proportions θ(d) ~ Dirichlet(α) • For each word w_d,n • Draw a topic assignment z_d,n ~ Discrete(θ(d)) • Draw a word from the chosen topic w_d,n ~ Discrete(φ(z_d,n))

  28. Latent Dirichlet Allocation (Blei et al., 2003) • For each document d • Draw its topic proportions θ(d) ~ Dirichlet(α) • For each word w_d,n • Draw a topic assignment z_d,n ~ Discrete(θ(d)) • Draw a word from the chosen topic w_d,n ~ Discrete(φ(z_d,n))

  29. Latent Dirichlet Allocation (Blei et al., 2003) • For each document d • Draw its topic proportions θ(d) ~ Dirichlet(α) • For each word w_d,n • Draw a topic assignment z_d,n ~ Discrete(θ(d)) • Draw a word from the chosen topic w_d,n ~ Discrete(φ(z_d,n))

  30. Latent Dirichlet Allocation (Blei et al., 2003) • For each document d • Draw its topic proportions θ(d) ~ Dirichlet(α) • For each word w_d,n • Draw a topic assignment z_d,n ~ Discrete(θ(d)) • Draw a word from the chosen topic w_d,n ~ Discrete(φ(z_d,n))
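The generative process repeated on slides 24-30, written out as a short simulation sketch; the topic count, vocabulary size, and document lengths are arbitrary choices for illustration, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, D, N = 5, 1000, 3, 50            # topics, vocabulary size, documents, words per doc
alpha, beta = 0.1, 0.01                # Dirichlet hyperparameters (illustrative)

# For each topic k: draw its distribution over words phi(k) ~ Dirichlet(beta)
phi = rng.dirichlet(beta * np.ones(V), size=K)

docs = []
for d in range(D):
    # For each document d: draw its topic proportions theta(d) ~ Dirichlet(alpha)
    theta = rng.dirichlet(alpha * np.ones(K))
    words = []
    for n in range(N):
        z = rng.choice(K, p=theta)     # topic assignment z_d,n ~ Discrete(theta(d))
        w = rng.choice(V, p=phi[z])    # word w_d,n ~ Discrete(phi(z_d,n))
        words.append(w)
    docs.append(words)
```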

  31. LDA as Matrix Factorization: the document-word probability matrix ≈ θ × φᵀ
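The factorization the slide refers to is the standard LDA identity: the matrix of per-document word probabilities is the product of the document-topic proportions and the topic-word distributions.

```latex
p(w \mid d) \;=\; \sum_{k=1}^{K} \theta_{dk}\, \phi_{kw},
\qquad \text{so the } D \times V \text{ matrix of word probabilities is } \theta\, \phi^{\mathsf{T}}
\text{ with } \theta \in \mathbb{R}^{D \times K},\ \phi^{\mathsf{T}} \in \mathbb{R}^{K \times V}.
```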

  32. Let’s say we want to build an LDA topic model on Wikipedia

  33. LDA on Wikipedia (plot of progress over wall-clock time: 10 mins, 1 hour, 6 hours, 12 hours)

  34. LDA on Wikipedia (plot of progress over wall-clock time: 10 mins, 1 hour, 6 hours, 12 hours)

  35. LDA on Wikipedia: 1 full iteration = 3.5 days!

  36. LDA on Wikipedia: stochastic variational inference

  37. LDA on Wikipedia: stochastic collapsed variational inference

  38. Available tools

  39. Available tools

  40. Collapsed Inference for LDA (Griffiths and Steyvers, 2004) • Marginalize out the parameters, and perform inference on the latent variables Z only

  41. Collapsed Inference for LDA (Griffiths and Steyvers, 2004) • Marginalize out the parameters, and perform inference on the latent variables only • Simpler, faster, and fewer update equations • Better mixing for Gibbs sampling

  42. Collapsed Inference for LDA (Griffiths and Steyvers, 2004) • Collapsed Gibbs sampler

  43. Collapsed Inference for LDA (Griffiths and Steyvers, 2004) • Collapsed Gibbs sampler: word-topic counts

  44. Collapsed Inference for LDA (Griffiths and Steyvers, 2004) • Collapsed Gibbs sampler: document-topic counts

  45. Collapsed Inference for LDA (Griffiths and Steyvers, 2004) • Collapsed Gibbs sampler: topic counts
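A hedged sketch of the collapsed Gibbs update those slides annotate, using the three count tables they name (word-topic counts, document-topic counts, topic counts); this is the standard Griffiths and Steyvers update, but the variable names and array layout here are my own choices, not the speaker's code.

```python
import numpy as np

def resample_token(d, n, w, z, n_wt, n_dt, n_t, alpha, beta, V, rng):
    """Resample the topic assignment z[d][n] of word w in document d."""
    k_old = z[d][n]
    n_wt[w, k_old] -= 1      # word-topic counts
    n_dt[d, k_old] -= 1      # document-topic counts
    n_t[k_old] -= 1          # topic counts

    # p(z_d,n = k | rest) is proportional to
    #   (n_wt[w,k] + beta) / (n_t[k] + V*beta) * (n_dt[d,k] + alpha)
    p = (n_wt[w] + beta) / (n_t + V * beta) * (n_dt[d] + alpha)
    k_new = rng.choice(len(p), p=p / p.sum())

    n_wt[w, k_new] += 1      # add the token back under its new topic
    n_dt[d, k_new] += 1
    n_t[k_new] += 1
    z[d][n] = k_new
```

One sweep of the sampler calls this update once for every token in the corpus.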

  46. Stochastic Optimization for ML • Stochastic algorithms: • While (not converged) • Process a subset of the dataset to estimate the update • Update the parameters
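The loop on this slide, as a minimal generic sketch; the minibatch sampling, the step-size schedule, and the two callbacks (estimate_update, apply_update) are placeholders for whatever a specific algorithm such as SGD, SVI, or online EM plugs in.

```python
import random

def stochastic_fit(data, params, estimate_update, apply_update,
                   batch_size=256, n_steps=10_000, seed=0):
    """Generic stochastic loop: estimate each update from a small random
    subset of the data, then apply it with a decaying step size."""
    rng = random.Random(seed)
    for t in range(1, n_steps + 1):
        minibatch = rng.sample(data, min(batch_size, len(data)))
        step_size = (t + 10) ** -0.7                   # illustrative decaying schedule
        update = estimate_update(params, minibatch)    # e.g. a gradient estimate
        params = apply_update(params, update, step_size)
    return params
```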

  47. Stochastic Optimization for ML • Stochastic gradient descent • Estimate the gradient • Stochastic variational inference (Hoffman et al., 2010, 2013) • Estimate the natural gradient of the variational parameters • Online EM (Cappé and Moulines, 2009) • Estimate the E-step sufficient statistics
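All three methods follow the same stochastic-approximation pattern: compute a noisy estimate from the minibatch and blend it into the running quantity with a decaying step size satisfying the usual Robbins-Monro conditions (this is the standard formulation, not something specific to this talk).

```latex
s^{(t)} \;=\; (1 - \rho_t)\, s^{(t-1)} \;+\; \rho_t\, \hat{s}_t,
\qquad
\sum_{t=1}^{\infty} \rho_t = \infty,
\quad
\sum_{t=1}^{\infty} \rho_t^{2} < \infty,
```

where ŝ_t is the minibatch estimate: a gradient for SGD, a natural gradient of the variational parameters for SVI, or the expected E-step sufficient statistics for online EM.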

  48. Goal: Build a Fast, Accurate, Scalable Algorithm for LDA • Collapsed LDA • Easy to implement • Fast • Accurate • Mixes well / propagates information quickly • Stochastic algorithms • Scalable • Quickly forgets the random initialization • Memory requirements and update time independent of the size of the data set • Can estimate topics before a single pass over the data is complete • Our contribution: an algorithm which gets the best of both worlds

  49. Variational Bayesian Inference • An optimization strategy for performing posterior inference, i.e. estimating Pr(Z|X), by searching a family of tractable distributions Q for the member closest to the true posterior P

  50. Variational Bayesian Inference • An optimization strategy for performing posterior inference, i.e. estimating Pr(Z|X), by minimizing the divergence KL(Q || P) between the approximation Q and the true posterior P
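Written as an optimization problem, variational inference picks the member of the tractable family that is closest in KL divergence to the true posterior, which is equivalent to maximizing the evidence lower bound (the standard formulation):

```latex
q^{*} \;=\; \arg\min_{q \in \mathcal{Q}} \; \mathrm{KL}\!\left( q(Z) \,\|\, p(Z \mid X) \right)
\;=\; \arg\max_{q \in \mathcal{Q}} \; \mathbb{E}_{q}\!\left[ \log p(X, Z) \right] \;-\; \mathbb{E}_{q}\!\left[ \log q(Z) \right].
```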
