Expectation Maximization Algorithm
Rong Jin
A Mixture Model Problem
  • The dataset appears to consist of two modes
  • How can we automatically identify the two modes?
Gaussian Mixture Model (GMM)
  • Assume that the dataset is generated by a mixture of two Gaussian distributions
    • Gaussian model 1: N(μ₁, σ₁²)
    • Gaussian model 2: N(μ₂, σ₂²)
  • If we knew the membership of each bin, estimating the two Gaussian models would be easy.
  • How can we estimate the two Gaussian models without knowing the memberships of the bins?
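
As a concrete illustration (not part of the original slides), the sketch below generates a dataset of the kind described, assuming illustrative mixing weights, means, and variances:

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration
pi = np.array([0.4, 0.6])       # mixing weights of the two components
mu = np.array([-2.0, 3.0])      # means of Gaussian model 1 and Gaussian model 2
sigma = np.array([1.0, 1.5])    # standard deviations

rng = np.random.default_rng(0)
n = 1000
z = rng.choice(2, size=n, p=pi)     # hidden membership of each point
x = rng.normal(mu[z], sigma[z])     # observed 1-D data with two modes
```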
EM Algorithm for GMM
  • Treat the memberships as hidden variables
  • EM algorithm for the Gaussian mixture model
    • Unknown memberships: which Gaussian each bin was generated from
    • Unknown Gaussian models: (μ₁, σ₁²) and (μ₂, σ₂²)
    • Learn these two sets of parameters iteratively
Start with a Random Guess
  • Randomly assign memberships to each bin
  • Estimate the mean and variance of each Gaussian model
E-Step
  • Fix the two Gaussian models
  • Estimate the posterior membership probability for each data point
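
A sketch of this E-step in code (function and variable names are mine, not from the slides): given the current parameters, compute the posterior probability that each point was generated by each component.

```python
import numpy as np
from scipy.stats import norm

def e_step(x, pi, mu, sigma):
    """Posterior membership probabilities p(z = k | x) for a 2-component 1-D GMM."""
    # Joint density p(z = k) * N(x; mu_k, sigma_k^2) for each component k
    joint = np.stack([pi[k] * norm.pdf(x, mu[k], sigma[k]) for k in range(2)], axis=1)
    # Normalize over the components to obtain the posteriors
    return joint / joint.sum(axis=1, keepdims=True)
```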
EM Algorithm for GMM
  • Re-estimate the memberships for each bin
M-Step
  • Fix the memberships
  • Re-estimate the two Gaussian models, with the means and variances weighted by the posteriors
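
A matching M-step sketch (same illustrative naming): the posteriors from the E-step act as soft membership weights when re-estimating each component.

```python
import numpy as np

def m_step(x, post):
    """Re-estimate mixing weights, means, and variances from the posterior weights."""
    nk = post.sum(axis=0)                                     # effective number of points per component
    pi = nk / len(x)                                          # mixing weights
    mu = (post * x[:, None]).sum(axis=0) / nk                 # posterior-weighted means
    var = (post * (x[:, None] - mu) ** 2).sum(axis=0) / nk    # posterior-weighted variances
    return pi, mu, np.sqrt(var)
```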
EM Algorithm for GMM
  • Re-estimate the memberships for each bin
  • Re-estimate the models
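
Putting the pieces together, a minimal EM loop for this two-component GMM could look like the following, reusing the e_step and m_step sketches above; the random initialization mirrors the earlier slide, and the iteration count is arbitrary:

```python
# Soft random memberships for each bin, as the starting guess
post = rng.dirichlet([1.0, 1.0], size=n)
for it in range(100):
    pi, mu, sigma = m_step(x, post)      # re-estimate the models
    post = e_step(x, pi, mu, sigma)      # re-estimate the memberships
```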
At the 5th Iteration
  • The red Gaussian component slowly shifts toward the left end of the x-axis
At the 10th Iteration
  • The red Gaussian component still shifts slowly toward the left end of the x-axis
At the 20th Iteration
  • The red Gaussian component makes a more noticeable shift toward the left end of the x-axis
At the 50th Iteration
  • The red Gaussian component is close to the desired location
At the 100th Iteration
  • The results are almost identical to those at the 50th iteration
EM as a Bound Optimization
  • The EM algorithm in fact maximizes the log-likelihood of the training data
  • Likelihood of a data point x
  • Log-likelihood of the training data
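
The equations on this slide were lost in the transcript. Under the two-component model defined earlier they presumably take the standard form (a reconstruction in my notation, not the slide's exact formulas):

```latex
% Likelihood of a single data point x
p(x \mid \theta) = \pi_1\,\mathcal{N}(x;\mu_1,\sigma_1^2) + \pi_2\,\mathcal{N}(x;\mu_2,\sigma_2^2)

% Log-likelihood of the training data x_1, \dots, x_n
\ell(\theta) = \sum_{i=1}^{n} \log\Big( \pi_1\,\mathcal{N}(x_i;\mu_1,\sigma_1^2) + \pi_2\,\mathcal{N}(x_i;\mu_2,\sigma_2^2) \Big)
```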
Logarithm Bound Algorithm
  • Start with an initial guess
  • Come up with a lower bound of the log-likelihood that touches it at the current guess (the touch point)
  • Search for the solution that maximizes the lower bound (the optimal point)
  • Repeat the procedure
  • Converge to a local optimum
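
In symbols, one round of this bound-optimization scheme can be written as follows (my notation, with ℓ the log-likelihood and Q the lower bound):

```latex
Q(\theta;\theta_t) \le \ell(\theta) \ \text{for all } \theta, \qquad
Q(\theta_t;\theta_t) = \ell(\theta_t) \ \text{(touch point)}, \qquad
\theta_{t+1} = \arg\max_{\theta} Q(\theta;\theta_t).
% Hence \ell(\theta_{t+1}) \ge Q(\theta_{t+1};\theta_t) \ge Q(\theta_t;\theta_t) = \ell(\theta_t),
% so the log-likelihood never decreases.
```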
EM as a Bound Optimization
  • Parameters from the previous iteration: θ_old
  • Parameters for the current iteration: θ_new
  • Compute the change in log-likelihood ℓ(θ_new) − ℓ(θ_old)
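
The derivation on the slide is not preserved; the standard argument lower-bounds this change via Jensen's inequality, using the posterior under the old parameters:

```latex
\ell(\theta_{\mathrm{new}}) - \ell(\theta_{\mathrm{old}})
  = \sum_{i} \log \sum_{z} p(z \mid x_i, \theta_{\mathrm{old}})\,
      \frac{p(x_i, z \mid \theta_{\mathrm{new}})}
           {p(z \mid x_i, \theta_{\mathrm{old}})\, p(x_i \mid \theta_{\mathrm{old}})}
  \;\ge\; \sum_{i} \sum_{z} p(z \mid x_i, \theta_{\mathrm{old}})\,
      \log \frac{p(x_i, z \mid \theta_{\mathrm{new}})}
                {p(z \mid x_i, \theta_{\mathrm{old}})\, p(x_i \mid \theta_{\mathrm{old}})}
```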
Maximize GMM Model
  • What is the global optimal solution to the GMM objective?
  • Maximizing the objective function of a GMM is an ill-posed problem: the likelihood is unbounded, since centering one Gaussian on a single data point and shrinking its variance toward zero drives the likelihood to infinity.
Identify Hidden Variables
  • For certain learning problems, identifying the hidden variables is not an easy task
  • Consider a simple translation model
    • For a pair of English and Chinese sentences
    • A simple translation model
    • The log-likelihood of the training corpus
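
The formulas on this slide did not survive extraction. Purely for concreteness, one common form of such a model (an IBM Model 1 style assumption of mine, with an English sentence e = e_1 … e_m and a Chinese sentence c = c_1 … c_l) would be:

```latex
\Pr(e \mid c) \;\propto\; \prod_{i=1}^{m} \sum_{j=1}^{l} \Pr(e_i \mid c_j),
\qquad
\ell = \sum_{(e,c) \in \mathcal{D}} \log \Pr(e \mid c)
```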
Identify Hidden Variables
  • Consider a simple case
  • Alignment variable a(i): which source word generates the i-th target word
  • Rewrite the log-likelihood in terms of the alignment variables
EM Algorithm for a Translation Model
  • Introduce an alignment variable for each translation pair
  • EM algorithm for the translation model
    • E-step: compute the posterior for each alignment variable
    • M-step: estimate the translation probability Pr(e|c)
We are lucky here. In general, this step can be extremely difficult and usually requires approximate approaches.
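
A toy sketch of this EM loop for a word-level translation model (the names and data structures are mine, and it assumes the Model 1 style formulation sketched above, where the hidden alignment picks the source word that generates each target word):

```python
from collections import defaultdict

def em_translation(pairs, n_iter=10):
    """pairs: list of (english_words, chinese_words) sentence pairs."""
    # Initialize Pr(e|c) uniformly over the English vocabulary
    e_vocab = {e for es, _ in pairs for e in es}
    t = defaultdict(lambda: 1.0 / len(e_vocab))      # t[(e, c)] approximates Pr(e|c)

    for _ in range(n_iter):
        counts = defaultdict(float)                  # expected counts of (e, c) pairs
        totals = defaultdict(float)                  # expected counts of c
        for es, cs in pairs:
            for e in es:
                # E-step: posterior of the alignment variable for this English word
                norm = sum(t[(e, c)] for c in cs)
                for c in cs:
                    post = t[(e, c)] / norm
                    counts[(e, c)] += post
                    totals[c] += post
        # M-step: re-estimate the translation probabilities Pr(e|c)
        t = defaultdict(float, {(e, c): counts[(e, c)] / totals[c] for (e, c) in counts})
    return t
```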

Compute Pr(e|c)
  • First compute the expected count of each (e, c) pair from the alignment posteriors, then normalize over e to obtain Pr(e|c)
Iterative Scaling
  • Maximum entropy model
  • Iterative scaling
    • All features are non-negative
    • The sum of the features is a constant
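
For concreteness, here is a small sketch of the conditional maximum entropy model referred to above (the representation and names are my own choices):

```python
import numpy as np

def maxent_prob(w, f):
    """p(y|x) = exp(w_y . f(x)) / sum_y' exp(w_y' . f(x)).

    w: (num_classes, num_features) weight matrix
    f: (num_features,) non-negative feature vector for a single input x
    """
    scores = w @ f                  # one score per class
    scores = scores - scores.max()  # stabilize the exponentials
    p = np.exp(scores)
    return p / p.sum()
```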
Iterative Scaling
  • Compute the empirical mean of each feature for every class, i.e., for every j and every class y
  • Start with w1, w2, …, wc = 0
  • Repeat
    • Compute p(y|x) for each training data point (xi, yi) using w from the previous iteration
    • Compute the mean of each feature for every class under the estimated probabilities, i.e., for every j and every y
    • Compute the weight update for every j and every y
    • Update w accordingly
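
The exact update formulas on the slide were lost. Below is a sketch of the classic generalized iterative scaling (GIS) update, which matches the constant-feature-sum assumption above; it reuses the maxent_prob sketch and is the textbook GIS rule, not necessarily the slide's exact variant:

```python
def gis(X, y, num_classes, n_iter=100):
    """X: (n, d) non-negative features whose rows sum to a constant C; y: (n,) integer labels."""
    n, d = X.shape
    C = X.sum(axis=1).max()                     # the constant feature sum

    # Empirical mean of each feature for every class
    emp = np.zeros((num_classes, d))
    for xi, yi in zip(X, y):
        emp[yi] += xi
    emp /= n

    w = np.zeros((num_classes, d))              # start with all weights at zero
    for _ in range(n_iter):
        # Model mean of each feature for every class under the current p(y|x)
        model = np.zeros((num_classes, d))
        for xi in X:
            model += np.outer(maxent_prob(w, xi), xi)
        model /= n
        # GIS update: move each weight by (1/C) * log(empirical mean / model mean)
        w += np.log((emp + 1e-12) / (model + 1e-12)) / C
    return w
```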
Iterative Scaling
  • Can we use the concavity of the logarithm function?
  • No, we can't, because we need a lower bound
Iterative Scaling
  • Wait a minute, this cannot be right! What went wrong?
Logarithm Bound Algorithm
  • Start with an initial guess
  • Come up with a lower bound
  • Search for the solution that maximizes the lower bound
Iterative Scaling
  • Where does it go wrong?
Iterative Scaling
  • How about … ?
  • Is this solution unique?
Iterative Scaling
  • How about negative features?
Faster Iterative Scaling
  • The bound reduces the optimization to univariate functions
  • The lower bound may not be tight, since all the coupling between the weights is removed
  • A tighter bound can be derived by not fully decoupling the correlation between the weights
Bad News
  • You may feel great after the struggle of the derivation.
  • However, is iterative scaling really a great idea?
  • Given that there have been so many studies in optimization, we should try out existing methods.
Comparing Improved Iterative Scaling to Newton's Method
  • Try out the standard numerical methods before you get excited about your algorithm.
  • [Figure: limited-memory quasi-Newton method vs. improved iterative scaling]