
Lecture 11

braden

Presentation Transcript


  1. Lecture 11 Generalizations of EM

  2. Last Time • Example of a Gaussian mixture model. • E-step: compute the sufficient statistics with respect to the posterior. • M-step: maximize Q. • MoG_demo
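A minimal sketch of the E- and M-steps for a one-dimensional mixture of Gaussians, in plain Python. The function name `em_gmm_1d`, the quantile-based initialization, the small variance floor and the synthetic data are illustrative assumptions, not the lecture's MoG_demo. The E-step computes posterior responsibilities, from which the expected sufficient statistics (counts, sums, sums of squares) are accumulated; the M-step sets the weights, means and variances to the closed-form maximizers of Q.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of the 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(xs, k, n_iter=50):
    """Fit a k-component 1-D Gaussian mixture to xs by EM.

    Returns (weights, means, variances, history), where history holds the
    log-likelihood evaluated at the start of each iteration.
    """
    n = len(xs)
    # Initialization (an assumption): means at evenly spaced sample quantiles,
    # equal weights, and every variance set to the overall sample variance.
    ordered = sorted(xs)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [sum((x - mu) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    history = []
    for _ in range(n_iter):
        # E-step: responsibilities r_nj = p(y_n = j | x_n, theta).
        resp = []
        loglik = 0.0
        for x in xs:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        history.append(loglik)
        for j in range(k):
            # Expected sufficient statistics for component j.
            n_j = sum(r[j] for r in resp)
            s_x = sum(r[j] * x for r, x in zip(resp, xs))
            s_xx = sum(r[j] * x * x for r, x in zip(resp, xs))
            # M-step: closed-form maximizers of Q.
            weights[j] = n_j / n
            means[j] = s_x / n_j
            # The small floor guards against a component collapsing onto one point.
            variances[j] = max(s_xx / n_j - means[j] ** 2, 1e-6)
    return weights, means, variances, history


# Usage on synthetic data: two clusters centred at 0 and 5.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
weights, means, variances, history = em_gmm_1d(data, k=2)
print("weights:", [round(w, 3) for w in weights])
print("means:", [round(m, 3) for m in means])
```

Because an EM iteration cannot decrease the log-likelihood, the recorded `history` should be non-decreasing, which is a cheap sanity check for an implementation.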

  3. Generalizations • MAP-EM: include a prior over the parameters. EM then computes the maximum a posteriori (MAP) estimate of the parameters. • By interchanging the roles of X and the parameters, we can also compute the most likely configuration of x under P(x). • Generalized EM (GEM): we only need partial M-steps, i.e. steps that increase Q without fully maximizing it. • We can apply EM to maximize positive functions of a special form. • We can do partial E-steps as well!
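In MAP-EM the E-step is unchanged and the M-step maximizes Q plus the log-prior. As an illustration (the choice of prior is an assumption, not from the slides), a symmetric Dirichlet(α) prior on the mixture weights, with α ≥ 1, turns the weight update into π_j = (N_j + α - 1) / (N + k(α - 1)), where N_j are the expected counts from the E-step. The helper names and the counts below are hypothetical.

```python
def ml_weights(counts):
    """Maximum-likelihood M-step for mixture weights: pi_j = N_j / N."""
    total = sum(counts)
    return [c / total for c in counts]


def map_weights(counts, alpha):
    """MAP M-step for mixture weights under a symmetric Dirichlet(alpha) prior.

    Maximizes sum_j (N_j + alpha - 1) * log(pi_j) subject to sum_j pi_j = 1;
    this closed form assumes alpha >= 1 so that every numerator is non-negative.
    """
    if alpha < 1:
        raise ValueError("closed form assumes alpha >= 1")
    k = len(counts)
    denom = sum(counts) + k * (alpha - 1)
    return [(c + alpha - 1) / denom for c in counts]


# Hypothetical expected counts N_j = sum_n r_nj from one E-step; component 3 is empty.
counts = [12.0, 8.0, 0.0]
print("ML: ", ml_weights(counts))        # the empty component gets weight 0
print("MAP:", map_weights(counts, 2.0))  # the prior keeps every weight positive
```

With α = 1 (a flat prior) the MAP update reduces to the maximum-likelihood one; larger α pulls the weights toward uniform and stops an empty component from being driven to zero weight.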

  4. Variational EM (VEM) • EM can be viewed as coordinate ascent on Q(θ, q), a lower bound on the log-likelihood, where q(y) ranges over a parameterized family of distributions. • The optimal q is the posterior p(y|x,θ). • However, the allowed family does not even need to contain that optimal solution. In that case we maximize a bound on the log-likelihood, which still makes sense. • This approximate EM algorithm can make an intractable E-step tractable, at the expense of accuracy. • A simple example is k-means, where q(y) is chosen to be a delta peak on a single cluster, i.e. each point is assigned to exactly one mean.
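A sketch of k-means read as variational EM, assuming equal mixing weights and a shared isotropic variance, so that the most probable cluster for a point is simply its nearest mean. Restricting q(y_n) to a delta peak turns the E-step into a hard assignment, and the M-step moves each mean to the average of its assigned points. The function names, the farthest-first initialization and the synthetic data are illustrative choices, not taken from the lecture.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, n_iter=100):
    """k-means as EM with each q(y_n) restricted to a delta peak (hard assignment).

    Returns (centers, assign, cost); cost is the sum of squared distances
    from each point to its assigned center.
    """
    # Farthest-first initialization (an illustrative choice): start at the first
    # point, then repeatedly add the point farthest from all chosen centers.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    assign = None
    for _ in range(n_iter):
        # "E-step": q(y_n) puts all its mass on the nearest center.
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_assign == assign:
            break  # assignments unchanged: coordinate ascent has reached a fixed point
        assign = new_assign
        # "M-step": move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    cost = sum(sq_dist(p, centers[a]) for p, a in zip(points, assign))
    return centers, assign, cost


# Usage on synthetic 2-D data: two tight clusters around (0, 0) and (6, 6).
rng = random.Random(0)
pts = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(100)]
pts += [(rng.gauss(6.0, 0.5), rng.gauss(6.0, 0.5)) for _ in range(100)]
centers, assign, cost = kmeans(pts, k=2)
print("centers:", [[round(c, 2) for c in ctr] for ctr in centers])
print("cost:", round(cost, 2))
```

Because q is restricted to delta peaks, the bound being maximized generally sits below the one full EM attains, and the uncertainty about each assignment is discarded; in exchange, the E-step is a cheap nearest-mean lookup.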
