Latent Dirichlet Allocation

Presenter: Hsuan-Sheng Chiu

Reference
  • D. M. Blei, A. Y. Ng and M. I. Jordan, “Latent Dirichlet allocation”, Journal of Machine Learning Research, vol. 3, no. 5, pp. 993-1022, 2003.

Outline
  • Introduction
  • Notation and terminology
  • Latent Dirichlet allocation
  • Relationship with other latent variable models
  • Inference and parameter estimation
  • Discussion

Introduction
  • We consider the problem of modeling text corpora and other collections of discrete data
    • The goal is to find short descriptions of the members of a collection
  • Significant progress in IR:
    • tf-idf scheme (Salton and McGill, 1983)
    • Latent Semantic Indexing (LSI, Deerwester et al., 1990)
    • Probabilistic LSI (pLSI, aspect model, Hofmann, 1999)

Introduction (cont.)
  • Problems of pLSI:
    • Incomplete: provides no probabilistic model at the level of documents
    • The number of parameters in the model grows linearly with the size of the corpus
    • It is not clear how to assign probability to a document outside of the training data
  • Exchangeability: bag of words

Notation and terminology
  • A word is the basic unit of discrete data, from a vocabulary indexed by {1, …, V}. The v-th word is represented by a V-vector w such that w^v = 1 and w^u = 0 for u ≠ v
  • A document is a sequence of N words denoted by w = (w1, w2, …, wN)
  • A corpus is a collection of M documents denoted by D = {w1, w2, …, wM}

Latent Dirichlet allocation
  • Latent Dirichlet allocation (LDA) is a generative probabilistic model of a corpus.
  • Generative process for each document w in a corpus D:
    • 1. Choose N ~ Poisson(ξ)
    • 2. Choose θ ~ Dir(α)
    • 3. For each of the N words wn
      • Choose a topic zn ~ Multinomial(θ)
      • Choose a word wn from p(wn|zn, β), a multinomial probability conditioned on the topic zn

β is a k×V matrix of word probabilities, with βij = p(w^j = 1 | z^i = 1)
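
A minimal numpy sketch of this generative process is given below; the values of k, V, α, β, and ξ are illustrative assumptions, not taken from the slides or the paper.

```python
# Sketch of the LDA generative process (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)

k, V = 3, 10                                # number of topics, vocabulary size
alpha = np.full(k, 0.1)                     # Dirichlet parameter (assumed symmetric here)
beta = rng.dirichlet(np.ones(V), size=k)    # k x V topic-word matrix, rows sum to 1
xi = 8                                      # Poisson mean for the document length

def generate_document():
    N = rng.poisson(xi)                     # 1. N ~ Poisson(xi)
    theta = rng.dirichlet(alpha)            # 2. theta ~ Dir(alpha)
    words = []
    for _ in range(N):
        z = rng.choice(k, p=theta)          # 3a. topic z_n ~ Multinomial(theta)
        w = rng.choice(V, p=beta[z])        # 3b. word w_n ~ p(w | z_n, beta)
        words.append(w)
    return words

print(generate_document())                  # e.g. a list of word indices
```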

Latent Dirichlet allocation (cont.)
  • Representation of a document generation:

N ~ Poisson(ξ)   (document length)

θ ~ Dir(α)   →   topics z1, z2, …, zN

β(zn)   →   words w1, w2, …, wN, i.e. the document w

Latent Dirichlet allocation (cont.)
  • Several simplifying assumptions:
    • 1. The dimensionality k of the Dirichlet distribution is known and fixed
    • 2. The word probabilities β are a fixed quantity that is to be estimated
    • 3. The document length N is independent of all the other data-generating variables θ and z
  • A k-dimensional Dirichlet random variable θ takes values in the (k−1)-simplex and has density p(θ | α) = ( Γ(Σi αi) / Πi Γ(αi) ) θ1^(α1−1) ⋯ θk^(αk−1)

http://www.answers.com/topic/dirichlet-distribution

Latent Dirichlet allocation (cont.)
  • Simplex: the slide reproduces figures of the n-simplexes for n = 2 to 7 (from MathWorld, http://mathworld.wolfram.com/Simplex.html)

Latent Dirichlet allocation (cont.)
  • The joint distribution of a topic mixture θ, a set of N topics z, and a set of N words w:
    p(θ, z, w | α, β) = p(θ | α) Πn=1..N p(zn | θ) p(wn | zn, β)
  • Marginal distribution of a document, obtained by integrating over θ and summing over z:
    p(w | α, β) = ∫ p(θ | α) ( Πn=1..N Σzn p(zn | θ) p(wn | zn, β) ) dθ
  • Probability of a corpus:
    p(D | α, β) = Πd=1..M ∫ p(θd | α) ( Πn=1..Nd Σzdn p(zdn | θd) p(wdn | zdn, β) ) dθd
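
As a hedged illustration of the structure of the per-document marginal, the following sketch estimates p(w | α, β) for a single document by naive Monte Carlo, drawing θ ~ Dir(α) and averaging Πn Σi θi βi,wn. The function name and sample count are my own choices; this is only for intuition and is not the inference method used in the paper.

```python
# Naive Monte Carlo estimate of the per-document marginal p(w | alpha, beta).
import numpy as np

def marginal_likelihood_mc(words, alpha, beta, n_samples=5000, seed=0):
    """words: list of word indices; alpha: (k,) Dirichlet parameter; beta: (k, V) matrix."""
    rng = np.random.default_rng(seed)
    thetas = rng.dirichlet(alpha, size=n_samples)   # samples from p(theta | alpha)
    per_word = thetas @ beta[:, words]              # (n_samples, N): sum_i theta_i * beta_{i, w_n}
    return per_word.prod(axis=1).mean()             # average of prod_n p(w_n | theta)
```

Such an estimator has high variance for long documents; the paper instead approximates this quantity with variational inference, discussed later.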


Latent Dirichlet allocation (cont.)

[Graphical model of LDA: word-level variables sit inside a plate over the N words of each document, which is nested in a plate over the M documents of the corpus]
  • There are three levels to the LDA representation:
    • α and β are corpus-level parameters
    • θd are document-level variables
    • zdn, wdn are word-level variables

Models of this form are referred to as conditionally independent hierarchical models, or as parametric empirical Bayes models.

Latent Dirichlet allocation (cont.)
  • LDA and exchangeability
    • A finite set of random variables {z1, …, zN} is said to be exchangeable if the joint distribution is invariant to permutation
    • An infinite sequence of random variables is infinitely exchangeable if every finite subsequence is exchangeable
    • http://en.wikipedia.org/wiki/De_Finetti's_theorem

Latent Dirichlet allocation (cont.)
  • In LDA, we assume that words are generated by topics and that those topics are infinitely exchangeable within a document; by de Finetti's theorem, the probability of a sequence of words and topics then has the form p(w, z) = ∫ p(θ) ( Πn p(zn | θ) p(wn | zn) ) dθ

Latent Dirichlet allocation (cont.)
  • A continuous mixture of unigrams
    • By marginalizing over the hidden topic variable z, we can understand LDA as a two-level model
  • Generative process for a document w:
    • 1. Choose θ ~ Dir(α)
    • 2. For each of the N words wn:
      • Choose a word wn from p(wn | θ, β)
    • Marginal distribution of a document: p(w | α, β) = ∫ p(θ | α) ( Πn=1..N p(wn | θ, β) ) dθ, where p(wn | θ, β) = Σz p(wn | z, β) p(z | θ)

Latent Dirichlet allocation (cont.)
  • This distribution on the (V−1)-simplex is attained with only k + kV parameters: the k Dirichlet parameters α and the kV entries of β (see the sketch below).
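
For intuition, a short sketch (with illustrative sizes of my own choosing) of the point on the word simplex induced by a topic mixture, p(w | θ, β) = θᵀβ:

```python
# The continuous mixture of unigrams maps a topic mixture theta to a point
# theta^T beta on the (V-1)-simplex of word distributions (illustrative sizes).
import numpy as np

rng = np.random.default_rng(0)
k, V = 3, 10
beta = rng.dirichlet(np.ones(V), size=k)    # k x V topic-word matrix
theta = rng.dirichlet(np.full(k, 0.1))      # document-specific topic mixture

p_w = theta @ beta                          # p(w | theta, beta), length-V vector
assert np.isclose(p_w.sum(), 1.0)           # it lies on the word simplex
```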

Relationship with other latent variable models
  • Unigram model: the words of every document are drawn independently from a single multinomial, p(w) = Πn=1..N p(wn)
  • Mixture of unigrams
    • Each document is generated by first choosing a topic z and then generating N words independently from the conditional multinomial p(w | z)
    • The topic weights p(z) account for k − 1 free parameters

Relationship with other latent variable models (cont.)
  • Probabilistic latent semantic indexing
    • Attempts to relax the simplifying assumption made in the mixture of unigrams model
    • In a sense, it does capture the possibility that a document may contain multiple topics
    • kV + kM parameters, i.e. linear growth in the number of documents M

Relationship with other latent variable models (cont.)
  • Problems of pLSI
    • There is no natural way to use it to assign probability to a previously unseen document
    • The linear growth in parameters suggests that the model is prone to overfitting, and empirically, overfitting is indeed a serious problem
  • LDA overcomes both of these problems by treating the topic mixture weights as a k-parameter hidden random variable
  • The k + kV parameters in a k-topic LDA model do not grow with the size of the training corpus (see the comparison below).
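
A quick back-of-the-envelope comparison of the two parameter counts, with hypothetical values of k, V, and M chosen only to show the scaling in M:

```python
# Parameter counts: pLSI needs kV + kM, a k-topic LDA model needs k + kV.
k, V = 100, 10_000
for M in (1_000, 100_000):
    plsi = k * V + k * M
    lda = k + k * V
    print(f"M={M:,}: pLSI has {plsi:,} parameters, LDA has {lda:,}")
```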

Relationship with other latent variable models (cont.)
  • A geometric interpretation: three topics and three words

Relationship with other latent variable models (cont.)
  • The unigram model finds a single point on the word simplex and posits that all words in the corpus come from the corresponding distribution.
  • The mixture of unigrams model posits that for each document, one of k points on the word simplex is chosen randomly and all the words of the document are drawn from the corresponding distribution.
  • The pLSI model posits that each word of a training document comes from a randomly chosen topic. The topics are themselves drawn from a document-specific distribution over topics.
  • LDA posits that each word of both observed and unseen documents is generated by a randomly chosen topic, which is drawn from a distribution with a randomly chosen parameter.

Inference and parameter estimation
  • The key inferential problem is computing the posterior distribution of the hidden variables given a document:

p(θ, z | w, α, β) = p(θ, z, w | α, β) / p(w | α, β)

Unfortunately, this distribution is intractable to compute in general: the normalizing factor p(w | α, β) is intractable due to the coupling between θ and β in the summation over latent topics.

Inference and parameter estimation (cont.)
  • The basic idea of convexity-based variational inference is to make use of Jensen's inequality to obtain an adjustable lower bound on the log likelihood (a small numerical check follows below).
  • Essentially, one considers a family of lower bounds, indexed by a set of variational parameters.
  • A simple way to obtain a tractable family of lower bounds is to consider simple modifications of the original graphical model in which some of the edges and nodes are removed.
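
As a small numerical check of the Jensen's-inequality step (with made-up numbers, not from the paper): for any distribution q over a discrete latent variable, Σz q(z) log( p(w, z) / q(z) ) ≤ log Σz p(w, z), with equality when q equals the true posterior.

```python
# Tiny check of the variational lower bound from Jensen's inequality.
import numpy as np

p_wz = np.array([0.10, 0.03, 0.02])             # joint p(w, z) for a fixed w, z in {0, 1, 2}
log_evidence = np.log(p_wz.sum())               # log p(w)

for q in (np.full(3, 1 / 3),                    # an arbitrary q
          p_wz / p_wz.sum()):                   # q equal to the true posterior p(z | w)
    bound = np.sum(q * (np.log(p_wz) - np.log(q)))
    print(f"{bound:.4f} <= {log_evidence:.4f}") # bound is tight only for the posterior
```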

Inference and parameter estimation (cont.)
  • Drop some edges and the w nodes to obtain a simplified graphical model that defines the variational family

Inference and parameter estimation (cont.)
  • Variational distribution: q(θ, z | γ, φ) = q(θ | γ) Πn=1..N q(zn | φn), where the Dirichlet parameter γ and the multinomial parameters φ1, …, φN are the free variational parameters

Inference and parameter estimation (cont.)
  • Finding a tight lower bound on the log likelihood: log p(w | α, β) = L(γ, φ; α, β) + D( q(θ, z | γ, φ) || p(θ, z | w, α, β) )
  • Maximizing the lower bound L with respect to γ and φ is equivalent to minimizing the KL divergence between the variational posterior probability and the true posterior probability

Inference and parameter estimation (cont.)
  • Expand the lower bound: L(γ, φ; α, β) = Eq[log p(θ | α)] + Eq[log p(z | θ)] + Eq[log p(w | z, β)] − Eq[log q(θ)] − Eq[log q(z)]

Inference and parameter estimation (cont.)
  • Adding Lagrange multipliers and setting the derivatives to zero gives the variational updates φni ∝ βi,wn exp( Ψ(γi) − Ψ(Σj γj) ) and γi = αi + Σn=1..N φni (a code sketch follows below)
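
A sketch of these coordinate-ascent updates for a single document, assuming the conventions used earlier (words as a list of indices, α a length-k vector, β a k×V matrix); this is an illustrative implementation, not the authors' reference code.

```python
# Per-document variational E-step:
#   phi_{n,i} ∝ beta_{i, w_n} * exp(digamma(gamma_i))   (the -digamma(sum_j gamma_j)
#                                                         term cancels when normalizing)
#   gamma_i   = alpha_i + sum_n phi_{n,i}
import numpy as np
from scipy.special import digamma

def variational_e_step(words, alpha, beta, n_iter=50):
    N, k = len(words), len(alpha)
    phi = np.full((N, k), 1.0 / k)              # q(z_n) initialised uniformly
    gamma = alpha + N / k                       # gamma_i initialised to alpha_i + N/k
    for _ in range(n_iter):
        phi = beta[:, words].T * np.exp(digamma(gamma))
        phi /= phi.sum(axis=1, keepdims=True)   # normalise each q(z_n)
        gamma = alpha + phi.sum(axis=0)         # expected topic counts plus the prior
    return gamma, phi
```

In practice the loop is run until γ stops changing by more than a small tolerance rather than for a fixed number of iterations.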

Inference and parameter estimation (cont.)
  • Parameter estimation
    • Maximize the log likelihood of the data: ℓ(α, β) = Σd=1..M log p(wd | α, β)
    • Variational inference provides us with a tractable lower bound on the log likelihood, a bound which we can maximize with respect to α and β
  • Variational EM procedure (a sketch of the β update follows this list):
    • 1. (E-step) For each document, find the optimizing values of the variational parameters {γ, φ}
    • 2. (M-step) Maximize the resulting lower bound on the log likelihood with respect to the model parameters α and β
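
A sketch of the corresponding M-step update for β, accumulating expected counts from the per-document φ matrices; the α update (which the paper performs by Newton-Raphson on the bound) is omitted. Names such as m_step_beta, docs, and phis are my own.

```python
# M-step for beta:  beta_{i,j} ∝ sum_d sum_n phi_{d,n,i} * [w_{d,n} = j]
import numpy as np

def m_step_beta(docs, phis, k, V):
    """docs: list of documents (lists of word indices); phis: matching list of (N_d, k) arrays."""
    beta = np.zeros((k, V))
    for words, phi in zip(docs, phis):
        for n, w in enumerate(words):
            beta[:, w] += phi[n]                # accumulate expected topic-word counts
    return beta / beta.sum(axis=1, keepdims=True)  # normalise each topic's word distribution
```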

Inference and parameter estimation (cont.)
  • Smoothed LDA model: place an exchangeable Dirichlet prior (with parameter η) on each row of β, treat β as a random variable, and extend the variational distribution with Dirichlet parameters λ over the topic-word distributions

Discussion
  • LDA is a flexible generative probabilistic model for collections of discrete data.
  • Exact inference is intractable for LDA, but any of a large suite of approximate inference algorithms can be used for inference and parameter estimation within the LDA framework.
  • LDA is a simple model and is readily extended to continuous data or other non-multinomial data.
