
The EM Method

Arthur Pece

aecp@diku.dk

Basic concepts

EM clustering algorithm

EM method and relationship to ML estimation

What is EM?
  • Expectation-Maximization
  • A fairly general optimization method
  • Useful when the model includes 3 kinds of variables:
  • visible variables x
  • intermediate variables h *
  • parameters/state variables s

and we want to optimize only w.r.t. the parameters.

* Here we assume that the intermediate variables are discrete

EM Method
  • A method to obtain ML parameter estimates

-> maximize log-likelihood w.r.t. parameters.

Assuming that the x_i are statistically independent:

log-likelihood for the data set = sum of log-likelihoods for the data points:

L = Σ_i log p(x_i | s)

  = Σ_i log Σ_k p(x_i | h_k, s) p(h_k | s)

(replace 2nd sum with an integral if intermediate variables are continuous rather than discrete)
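As a concrete illustration (not part of the original slides), a minimal NumPy/SciPy sketch that evaluates L from an array of log joint probabilities, assuming the intermediate variables are discrete:

```python
import numpy as np
from scipy.special import logsumexp

def log_likelihood(log_joint):
    """L = Σ_i log Σ_k p(x_i | h_k, s) p(h_k | s).

    log_joint[i, k] = log[ p(x_i | h_k, s) p(h_k | s) ] for data point i
    and discrete intermediate value h_k (an m x n array).
    """
    # logsumexp over k computes log Σ_k p(x_i | h_k, s) p(h_k | s) stably
    return float(np.sum(logsumexp(log_joint, axis=1)))
```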

EM functional

Given a pdf q(h) for the intermediate variables, we define the EM functional:

Q_q = Σ_i Σ_k q(h_k) log [ p(x_i | h_k, s) p(h_k | s) ]

This is usually much simpler than the log-likelihood:

L = Σ_i log Σ_k p(x_i | h_k, s) p(h_k | s)

because there is no logarithm of a sum in Q_q.
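For comparison, a sketch of the EM functional in the same illustrative setting as the log-likelihood sketch above (the array q holds the pdf over h for each data point; the names are assumptions, not from the slides):

```python
import numpy as np

def em_functional(q, log_joint):
    """Q_q = Σ_i Σ_k q(h_k) log[ p(x_i | h_k, s) p(h_k | s) ].

    q[i, k] is the pdf over the intermediate variable for data point i
    (each row sums to 1); log_joint is the m x n array used above.
    """
    # no logarithm of a sum: Q_q is linear in log_joint, so maximizing it
    # w.r.t. the parameters is much easier than maximizing L directly
    return float(np.sum(q * log_joint))
```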

EM iteration

Two steps: E and M

  • E step: q(h) is set equal to the pdf of h conditional on x_i and the current parameter estimate s(t-1):

q(t)(h_k) = p(h_k | x_i, s(t-1))

  • M step: the EM functional is maximized w.r.t. s to obtain s(t).
Example: EM clustering
  • m data points x_i are generated by n generative processes, each process j generating a fraction w_j of the data points with pdf f_j(x_i), parameterized by the parameter set s_j (which includes w_j)
  • We want to estimate the parameters s_j for all processes
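A minimal sketch of this generative model (the numbers and the Gaussian choice below are illustrative assumptions, anticipating the Gaussian example a few slides ahead): each point first draws a process label j with probability w_j, then draws x_i from f_j.

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative parameters for n = 2 one-dimensional Gaussian processes
weights = np.array([0.3, 0.7])   # fractions w_j (sum to 1)
means = np.array([-2.0, 3.0])    # centroids of the two processes
stds = np.array([0.5, 1.0])      # their standard deviations

m = 1000
# which process generated each of the m data points
labels = rng.choice(len(weights), size=m, p=weights)
# x_i drawn from the pdf f_j of the chosen process
X = rng.normal(means[labels], stds[labels])
```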
Example: EM clustering
  • Visible variables: m data points x_i
  • Intermediate variables:

m × n binary labels h_ij, with

Σ_j h_ij = 1

  • State variables: n parameter sets s_j
EM clustering for Gaussian pdf’s
  • The parameters of each cluster j are the weight w_j, the centroid c_j, and the covariance A_j
  • If we knew which data point belongs to which cluster, we could compute fraction, mean and covariance for each cluster:

w_j = Σ_i h_ij / m

c_j = Σ_i h_ij x_i / (m w_j)

A_j = Σ_i h_ij (x_i - c_j)(x_i - c_j)^T / (m w_j)
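A hedged NumPy sketch of these statistics when the binary labels are known (here encoded as an integer cluster label per point; the function name is illustrative):

```python
import numpy as np

def cluster_stats(X, labels, n):
    """w_j, c_j, A_j per cluster from known hard assignments.

    X: (m, d) data points; labels: (m,) integers in {0, ..., n-1},
    i.e. h_ij = 1 exactly when labels[i] == j.
    """
    m, d = X.shape
    weights = np.empty(n)
    centroids = np.empty((n, d))
    covs = np.empty((n, d, d))
    for j in range(n):
        members = X[labels == j]                 # points with h_ij = 1
        weights[j] = len(members) / m            # w_j = Σ_i h_ij / m
        centroids[j] = members.mean(axis=0)      # c_j
        diff = members - centroids[j]
        covs[j] = diff.T @ diff / len(members)   # A_j
    return weights, centroids, covs
```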

EM clustering (continued)
  • Since we do not know which cluster a data point belongs to, we assign each point to all clusters, with different probabilities q_ij, Σ_j q_ij = 1:

w_j = Σ_i q_ij / m

c_j = Σ_i q_ij x_i / (m w_j)

A_j = Σ_i q_ij (x_i - c_j)(x_i - c_j)^T / (m w_j)
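The corresponding weighted-average updates (the M step), again as a hedged NumPy sketch, where q is the m × n matrix of probabilities q_ij:

```python
import numpy as np

def m_step(X, q):
    """Re-estimate (w_j, c_j, A_j) from the soft assignments q_ij.

    X: (m, d) data points; q: (m, n), each row summing to 1.
    """
    m, d = X.shape
    n = q.shape[1]
    Nj = q.sum(axis=0)                    # Σ_i q_ij (effective cluster sizes)
    weights = Nj / m                      # w_j = Σ_i q_ij / m
    centroids = (q.T @ X) / Nj[:, None]   # c_j = Σ_i q_ij x_i / (m w_j)
    covs = np.empty((n, d, d))
    for j in range(n):
        diff = X - centroids[j]
        covs[j] = (q[:, j, None] * diff).T @ diff / Nj[j]   # A_j
    return weights, centroids, covs
```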

EM clustering (continued)
  • The probabilities q_ij can be computed from the cluster parameters
  • Chicken & egg problem: the cluster parameters are needed to compute the probabilities, and the probabilities are needed to compute the cluster parameters
EM clustering (continued)

The solution: iterate to convergence:

  • E step: for each data point and each cluster, compute the probability q_ij that the point belongs to the cluster (from the cluster parameters)
  • M step: re-compute the cluster parameters for all clusters by weighted averages over all points (use the equations given two slides ago).
How to compute the probability that a given data point originates from a given process?
  • Use Bayes’ theorem:

q_ij = w_j f_j(x_i) / Σ_k w_k f_k(x_i)

This is how the cluster parameters are used to compute the q_ij.
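A hedged sketch of this E step for Gaussian f_j, and of the full alternation described on the previous slide; it reuses the m_step function sketched after the weighted-average equations (the initial parameters and the fixed iteration count are illustrative assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, weights, centroids, covs):
    """q_ij = w_j f_j(x_i) / Σ_k w_k f_k(x_i)   (Bayes' theorem)."""
    m, n = X.shape[0], len(weights)
    joint = np.empty((m, n))
    for j in range(n):
        # w_j * f_j(x_i) for every data point i
        joint[:, j] = weights[j] * multivariate_normal.pdf(X, centroids[j], covs[j])
    return joint / joint.sum(axis=1, keepdims=True)

def em_clustering(X, weights, centroids, covs, n_iter=50):
    """Alternate E and M steps starting from some initial cluster parameters."""
    for _ in range(n_iter):
        q = e_step(X, weights, centroids, covs)    # E step
        weights, centroids, covs = m_step(X, q)    # M step (sketched earlier)
    return weights, centroids, covs
```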

Non-decreasing log-likelihood in the EM method

Let’s return to the general EM method:

we want to prove that the log-likelihood does not decrease from one iteration to the next.

To do so we introduce 2 more functionals.

Entropy and Kullback-Leibler divergence

Define the entropy

S(q) = -Σ_i Σ_k q(h_k) log q(h_k)

and the Kullback-Leibler divergence

D_KL[q ; p(h | x, s)] = Σ_i Σ_k q(h_k) log [ q(h_k) / p(h_k | x_i, s) ]
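These two quantities are straightforward to evaluate; a small NumPy sketch, assuming q and p are stored as m × n arrays of strictly positive probabilities over the discrete h_k:

```python
import numpy as np

def entropy(q):
    """S(q) = -Σ_i Σ_k q(h_k) log q(h_k)."""
    return float(-np.sum(q * np.log(q)))

def kl_divergence(q, p):
    """D_KL[q ; p] = Σ_i Σ_k q(h_k) log[ q(h_k) / p(h_k | x_i, s) ]."""
    return float(np.sum(q * np.log(q / p)))
```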

Non-decreasing log-likelihood I

It can be proven that

L = Q_q + S(q) + D_KL[q ; p(h | x, s)]
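This identity follows directly from the definition of conditional probability; a short derivation per data point (in LaTeX, using Σ_k q(h_k) = 1) is:

```latex
\begin{aligned}
\log p(x_i \mid s)
  &= \sum_k q(h_k)\,\log p(x_i \mid s)\\
  &= \sum_k q(h_k)\,\log
     \frac{p(x_i \mid h_k, s)\,p(h_k \mid s)}{p(h_k \mid x_i, s)}\\
  &= \sum_k q(h_k)\,\log\bigl[p(x_i \mid h_k, s)\,p(h_k \mid s)\bigr]
   \;-\; \sum_k q(h_k)\,\log q(h_k)
   \;+\; \sum_k q(h_k)\,\log\frac{q(h_k)}{p(h_k \mid x_i, s)}
\end{aligned}
```

Summing over the data points i, the three sums are exactly the EM functional Q_q, the entropy S(q) and the divergence D_KL[q ; p(h | x, s)] defined on the previous slide.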

Write Q_q(s) for the EM functional as a function of s for a given q. After the E step, q(t)(h) = p(h | x, s(t-1)), and therefore D_KL is zero:

L(t-1) = Q_q(t)(s(t-1)) + S(q(t))

Non-decreasing log-likelihood II

After the M step, Q_q(t)(s) has been maximized w.r.t. s in standard EM

[ Q_q(t)(s) is increased but not maximized in GEM (generalized EM), but the result is the same ] and therefore:

Q_q(t)(s(t)) ≥ Q_q(t)(s(t-1))

In addition we have that:

L(t) ≥ Q_q(t)(s(t)) + S(q(t))

[ This is because, for any two pdf’s q and p:

D_KL[q ; p] ≥ 0 ]

Non-decreasing log-likelihood III

Putting the above results together:

L(t) ≥ Q_q(t)(s(t)) + S(q(t))

     ≥ Q_q(t)(s(t-1)) + S(q(t)) = L(t-1)

which proves that L is non-decreasing.