The EM Method
Arthur Pece

Outline:
- Basic concepts
- EM clustering algorithm
- EM method and relationship to ML estimation

What is EM?
Expectation-Maximization: a fairly general optimization method.
Useful when the model includes 3 kinds of variables:
- visible variables x
- intermediate (hidden) variables h
- parameters θ
EM clustering algorithm
EM method and relationship to ML estimation
and we want to optimize only w.r.t. the parameters.
* Here we assume that the intermediate variables are discrete
-> maximize log-likelihood w.r.t. parameters.
Assuming that the x_i are statistically independent,
the log-likelihood for the data set = sum of log-likelihoods for the data points:

L = Σ_i log p(x_i | θ)
  = Σ_i log Σ_k p(x_i | h_k, θ) p(h_k | θ)
(replace 2nd sum with an integral if intermediate variables are continuous rather than discrete)
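The mixture log-likelihood above can be sketched numerically. The snippet below is a minimal illustration assuming 1-D Gaussian components; the function names, data, and parameter values are invented for the example, not taken from the slides.

```python
import math

def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """L = sum_i log sum_k p(x_i | h_k, theta) p(h_k | theta)."""
    total = 0.0
    for x in data:
        # inner sum over components k: mixture density at x
        mix = sum(w * gauss_pdf(x, m, v)
                  for w, m, v in zip(weights, means, variances))
        total += math.log(mix)
    return total

# toy data drawn near the two component means
data = [0.1, -0.2, 3.9, 4.2]
L = log_likelihood(data, weights=[0.5, 0.5], means=[0.0, 4.0], variances=[1.0, 1.0])
```

Note the logarithm of a sum: this is what makes L awkward to maximize directly, and what the EM functional below avoids.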
Given a pdf q(h) for the intermediate variables, we define the EM functional:

Q(q, θ) = Σ_i Σ_k q(h_k) log [ p(x_i | h_k, θ) p(h_k | θ) ]
This is usually much simpler than the log-likelihood:

L = Σ_i log Σ_k p(x_i | h_k, θ) p(h_k | θ)

because there is no logarithm of a sum in Q.
Two steps: E and M.

E step: set q to the posterior of the intermediate variables given the current parameters:

q^(t)(h_k) = p(h_k | x_i, θ^(t-1))

M step: maximize Q w.r.t. the parameters θ, holding q fixed.
Given m × n binary labels h_ij (one per data point i and cluster j) with Σ_j h_ij = 1 for each i, the cluster parameters are:

w_j = Σ_i h_ij / m
c_j = Σ_i h_ij x_i / (m w_j)
A_j = Σ_i h_ij (x_i − c_j)(x_i − c_j)^T / (m w_j)
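These hard-label updates can be sketched as follows. As an illustrative simplification, the binary matrix h_ij is represented by a list of cluster indices (equivalent, since Σ_j h_ij = 1), and the data are 1-D, so each A_j reduces to a scalar variance; `hard_updates` and the toy data are invented names/values.

```python
def hard_updates(data, labels, n_clusters):
    """Cluster parameters from hard (binary) labels:
    w_j = (1/m) sum_i h_ij
    c_j = sum_i h_ij x_i / (m w_j)
    A_j = sum_i h_ij (x_i - c_j)^2 / (m w_j)   (1-D: A_j is a variance)
    """
    m = len(data)
    w, c, A = [], [], []
    for j in range(n_clusters):
        members = [x for x, lbl in zip(data, labels) if lbl == j]
        n_j = len(members)          # = sum_i h_ij = m * w_j
        w.append(n_j / m)
        c_j = sum(members) / n_j
        c.append(c_j)
        A.append(sum((x - c_j) ** 2 for x in members) / n_j)
    return w, c, A

w, c, A = hard_updates([0.0, 2.0, 10.0, 12.0], labels=[0, 0, 1, 1], n_clusters=2)
# w = [0.5, 0.5], c = [1.0, 11.0], A = [1.0, 1.0]
```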
Replacing the hard labels with soft responsibilities q_ij gives the EM updates:

w_j = Σ_i q_ij / m
c_j = Σ_i q_ij x_i / (m w_j)
A_j = Σ_i q_ij (x_i − c_j)(x_i − c_j)^T / (m w_j)
The solution: iterate the two steps to convergence. E step:

q_ij = w_j f_j(x_i) / Σ_k w_k f_k(x_i)

where f_j is the pdf of cluster j (e.g. a Gaussian with mean c_j and covariance A_j). This is how the cluster parameters are used to compute the q_ij.
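The full loop can be sketched for a two-component 1-D Gaussian mixture. This is a minimal illustration, not a production implementation: `em_1d`, the initial parameters, and the toy data are all invented for the example, and the covariances A_j are scalar variances because the data are 1-D.

```python
import math

def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_1d(data, w, c, A, n_iter=30):
    """EM for a 1-D Gaussian mixture; returns the parameters and the
    log-likelihood recorded after each iteration."""
    m, K = len(data), len(w)
    history = []
    for _ in range(n_iter):
        # E step: q_ij = w_j f_j(x_i) / sum_k w_k f_k(x_i)
        q = []
        for x in data:
            f = [w[j] * gauss_pdf(x, c[j], A[j]) for j in range(K)]
            s = sum(f)
            q.append([fj / s for fj in f])
        # M step: soft-label updates for w_j, c_j, A_j
        for j in range(K):
            n_j = sum(q[i][j] for i in range(m))   # = m * w_j
            w[j] = n_j / m
            c[j] = sum(q[i][j] * data[i] for i in range(m)) / n_j
            A[j] = sum(q[i][j] * (data[i] - c[j]) ** 2 for i in range(m)) / n_j
        # track L = sum_i log sum_j w_j f_j(x_i)
        history.append(sum(math.log(sum(w[j] * gauss_pdf(x, c[j], A[j])
                                        for j in range(K))) for x in data))
    return w, c, A, history

data = [0.1, -0.3, 0.2, 5.1, 4.8, 5.3]
w, c, A, hist = em_1d(data, w=[0.5, 0.5], c=[0.0, 1.0], A=[1.0, 1.0])
```

On this toy data the two means converge near 0 and 5, and the recorded log-likelihoods are non-decreasing, which is exactly the property proved in the remainder of the section.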
Let’s return to the general EM method:
we want to prove that the log-likelihood does not decrease from one iteration to the next.
To do so we introduce 2 more functionals.
Define the entropy

S(q) = − Σ_i Σ_k q(h_k) log q(h_k)

and the Kullback-Leibler divergence

D_KL[q ; p(h | x, θ)] = Σ_i Σ_k q(h_k) log [ q(h_k) / p(h_k | x_i, θ) ]
It can be proven that

L = Q(q, θ) + S(q) + D_KL[q ; p(h | x, θ)]
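This decomposition can be checked numerically. The sketch below uses a single observation and a binary hidden variable with made-up probabilities, so the sum over i has one term; all the numbers are arbitrary illustrative choices.

```python
import math

# Toy discrete model: one observation x, hidden h in {0, 1}.
p_h = [0.3, 0.7]            # prior p(h_k)
p_x_given_h = [0.8, 0.2]    # likelihood p(x | h_k) at the observed x

# evidence p(x) and posterior p(h_k | x) by Bayes' rule
p_x = sum(px * ph for px, ph in zip(p_x_given_h, p_h))
post = [px * ph / p_x for px, ph in zip(p_x_given_h, p_h)]

q = [0.5, 0.5]              # an arbitrary pdf over h

L = math.log(p_x)
Q = sum(qk * math.log(px * ph) for qk, px, ph in zip(q, p_x_given_h, p_h))
S = -sum(qk * math.log(qk) for qk in q)
D = sum(qk * math.log(qk / pk) for qk, pk in zip(q, post))
# L equals Q + S + D up to floating-point error, for any choice of q
```

The identity holds because Q + S + D = Σ_k q(h_k) log [ p(x, h_k) / p(h_k | x) ] = Σ_k q(h_k) log p(x) = log p(x) = L.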
After the E step, q^(t)(h) = p(h | x, θ^(t-1)),
and therefore D_KL is zero:

L(θ^(t-1)) = Q(q^(t), θ^(t-1)) + S(q^(t))
After the M step, Q is maximized w.r.t. θ in standard EM
[ in GEM (generalized EM), Q is increased but not necessarily maximized; the conclusion is the same ], and therefore:

Q(q^(t), θ^(t)) ≥ Q(q^(t), θ^(t-1))
In addition we have that:

L(θ^(t)) ≥ Q(q^(t), θ^(t)) + S(q^(t))

[ This is because, for any two pdfs q and p:
D_KL[q ; p] ≥ 0 ]
Putting the above results together:

L(θ^(t)) ≥ Q(q^(t), θ^(t)) + S(q^(t))
         ≥ Q(q^(t), θ^(t-1)) + S(q^(t)) = L(θ^(t-1))

which proves that L is non-decreasing.