Presentation Transcript


1. Opinionated Lessons in Statistics by Bill Press, #30: Expectation Maximization (EM) Methods. Professor William H. Press, Department of Computer Science, the University of Texas at Austin.

2. Let's look at the theory behind EM methods.

Preliminary: Jensen's inequality. If a function is concave (downward), then function(interpolation) ≥ interpolation(function). Log is concave (downward). Jensen's inequality is thus: if $\lambda_i \ge 0$ and $\sum_i \lambda_i = 1$, then

$$\ln\Big(\sum_i \lambda_i Q_i\Big) \;\ge\; \sum_i \lambda_i \ln Q_i$$

This gets used a lot when playing with log-likelihoods. The proof of the EM method that we now give is just one example.
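
A minimal numerical sketch of the inequality, in Python (the particular weights and values below are arbitrary choices of mine, not from the lesson):

import numpy as np

# Arbitrary positive values Q_i and weights lambda_i that are nonnegative and sum to 1.
Q = np.array([0.5, 2.0, 3.0, 10.0])
lam = np.array([0.1, 0.2, 0.3, 0.4])

lhs = np.log(np.sum(lam * Q))    # ln of the weighted average
rhs = np.sum(lam * np.log(Q))    # weighted average of the logs
print(lhs, rhs, lhs >= rhs)      # ln is concave, so lhs >= rhs (prints True)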

3. The basic EM theorem (Dempster, Laird, and Rubin): here $x$ are the data, $z$ are missing data or nuisance variables, and $\theta$ are the parameters to be determined. Find the $\theta$ that maximizes the log-likelihood of the data,

$$L(\theta) \equiv \ln P(x \mid \theta)$$

Marginalizing over $z$,

$$\ln P(x \mid \theta) = \ln\Big[\sum_z P(x \mid z\,\theta)\,P(z \mid \theta)\Big]$$

For any other parameter value $\theta'$, multiply and divide inside the sum by $P(z \mid x\,\theta')\,P(x \mid \theta')$ and pull out $\ln P(x \mid \theta') = L(\theta')$:

$$L(\theta) = L(\theta') + \ln \sum_z P(z \mid x\,\theta')\,\frac{P(x \mid z\,\theta)\,P(z \mid \theta)}{P(z \mid x\,\theta')\,P(x \mid \theta')}$$

$$\ge L(\theta') + \sum_z P(z \mid x\,\theta')\,\ln\!\left[\frac{P(x \mid z\,\theta)\,P(z \mid \theta)}{P(z \mid x\,\theta')\,P(x \mid \theta')}\right] \qquad \text{(Jensen, since } \textstyle\sum_z P(z \mid x\,\theta') = 1\text{)}$$

$$= L(\theta') + \sum_z P(z \mid x\,\theta')\,\ln\!\left[\frac{P(x\,z \mid \theta)}{P(x\,z \mid \theta')}\right] \;\equiv\; B(\theta, \theta'), \qquad \text{for any } \theta' \text{ a bound on } L(\theta).$$
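
To see the bound numerically, here is a small sketch; the toy model is my own assumption, not from the lesson: a 1-D two-component Gaussian mixture with equal weights, unit variances, and unknown means, so that the sum over z factorizes over the data points.

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data from two unit-variance components with equal weights.
x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])
w = np.array([0.5, 0.5])

def npdf(x, mu):
    # Unit-variance normal density.
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

def loglike(mu):
    # L(theta) = sum_n ln sum_k w_k N(x_n; mu_k, 1)
    return np.sum(np.log((w * npdf(x[:, None], mu[None, :])).sum(axis=1)))

def bound(mu, mu_prev):
    # B(theta, theta') = L(theta')
    #   + sum_n sum_k P(z_n=k | x_n, theta') ln[ P(x_n, z_n=k | theta) / P(x_n, z_n=k | theta') ]
    dens_prev = w * npdf(x[:, None], mu_prev[None, :])
    p = dens_prev / dens_prev.sum(axis=1, keepdims=True)   # posterior over z at theta'
    dens = w * npdf(x[:, None], mu[None, :])
    return loglike(mu_prev) + np.sum(p * (np.log(dens) - np.log(dens_prev)))

mu_prev = np.array([0.0, 1.0])   # current guess theta'
mu_new = np.array([-1.5, 2.5])   # some other trial theta

print(bound(mu_prev, mu_prev), loglike(mu_prev))   # equal: B(theta', theta') = L(theta')
print(bound(mu_new, mu_prev), loglike(mu_new))     # B(theta, theta') <= L(theta)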

4. Notice that $B(\theta', \theta') = L(\theta')$. So if we maximize $B(\theta, \theta')$ over $\theta$ and move to the new maximizing $\theta$, we are guaranteed that the new $L(\theta) \ge L(\theta')$. This maximizes a lower bound rather than the actual likelihood, so the iteration is guaranteed only to converge to (at least) a local max of $L(\theta)$. (Can you see why?) And it works whether the maximization step is local or global.

5. So the general EM algorithm repeats the maximization

$$\theta'' = \arg\max_\theta \sum_z P(z \mid x\,\theta')\,\ln\!\left[\frac{P(x\,z \mid \theta)}{P(x\,z \mid \theta')}\right]$$

$$= \arg\max_\theta \sum_z P(z \mid x\,\theta')\,\ln\!\left[P(x\,z \mid \theta)\right] \qquad \text{(the dropped denominator has no dependence on } \theta\text{)}$$

$$= \arg\max_\theta \sum_z P(z \mid x\,\theta')\,\ln\!\left[P(x \mid z\,\theta)\,P(z \mid \theta)\right]$$

Each form is sometimes useful: sometimes (missing data) there is no dependence on $z$, sometimes (nuisance parameters) a uniform prior.

Computing the expectation $\sum_z P(z \mid x\,\theta')\,\ln[\cdots]$ is the E-step. This is an expectation that can often be computed in some better way than literally integrating over all possible values of $z$. Maximizing it over $\theta$ is the M-step.

This is a general way of handling missing data or nuisance parameters if you can estimate the probability of what is missing, given what you see (and a guess at the parameters).
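
One concrete instance of this repeated E-step/M-step, as a sketch under assumptions of my own rather than from the lesson: the classic "two biased coins" missing-data toy problem, with invented flip counts. Here z is which coin produced each batch of flips and theta = (p_A, p_B) are the unknown heads probabilities; the printed data log-likelihood never decreases from one iteration to the next, illustrating the guarantee from the previous slide.

import numpy as np

# Hypothetical data: each row is (heads, tails) from 10 flips of one of two coins;
# which coin was used (z) is the missing datum.
counts = np.array([[5, 5], [9, 1], [8, 2], [4, 6], [7, 3]], dtype=float)
theta = np.array([0.6, 0.5])          # initial guess for (p_A, p_B)

def log_binom(counts, p):
    # ln P(heads, tails | coin with heads prob p), dropping the binomial
    # coefficient, which is constant in theta.
    h, t = counts[:, 0], counts[:, 1]
    return h * np.log(p) + t * np.log(1.0 - p)

for it in range(20):
    # E-step: responsibilities P(z_n = coin | data_n, theta'), assuming a uniform prior over coins.
    ll_per_coin = np.stack([log_binom(counts, p) for p in theta], axis=1)   # shape (n_batches, 2)
    w = np.exp(ll_per_coin - ll_per_coin.max(axis=1, keepdims=True))
    resp = w / w.sum(axis=1, keepdims=True)

    # Data log-likelihood L(theta') = sum_n ln sum_k (1/2) P(data_n | coin k); monotonically non-decreasing.
    L = np.sum(np.log(0.5 * np.exp(ll_per_coin).sum(axis=1)))
    print(f"iter {it:2d}  L = {L:.4f}  theta = {theta}")

    # M-step: weighted maximum-likelihood estimate of each coin's heads probability.
    heads = counts[:, 0]
    flips = counts.sum(axis=1)
    theta = (resp * heads[:, None]).sum(axis=0) / (resp * flips[:, None]).sum(axis=0)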

6. It might not be instantly obvious how the GMM fits this paradigm! Here $z$ (the missing data) is the assignment of data points to components, and $\theta$ consists of the $\mu$'s and $\Sigma$'s. Then

$$P(z \mid x\,\theta') \;\to\; p_{nk}$$

and, up to constants and overall factors that do not affect the maximization over the $\mu_k$ and $\Sigma_k$,

$$\sum_z P(z \mid x\,\theta')\,\ln\!\left[P(x\,z \mid \theta)\right] \;\to\; \sum_{n}\sum_{k} p_{nk}\Big[-\ln \det \Sigma_k \;-\; (x_n - \mu_k)\cdot \Sigma_k^{-1}\cdot (x_n - \mu_k)\Big]$$

Showing that this is maximized by the previous re-estimation formulas for $\mu$ and $\Sigma$ is a multidimensional (fancy) form of the theorem that the mean is the measure of central tendency that minimizes the mean square deviation. See Wikipedia: Expectation-Maximization Algorithm for a detailed derivation.

The next time we see an EM method will be when we discuss Hidden Markov Models. The "Baum-Welch re-estimation algorithm" for HMMs is an EM method.
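
A sketch of how those re-estimation formulas look in code (my own minimal illustration, not the lesson's notation or data: the synthetic 2-D points, K = 2 components, and equal, fixed mixture weights are all assumptions here). The E-step computes the responsibilities p_nk and the M-step sets each mu_k and Sigma_k to the responsibility-weighted mean and covariance.

import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 2-D data drawn from two Gaussian blobs.
X = np.vstack([rng.normal([-2, 0], 0.7, size=(150, 2)),
               rng.normal([2, 1], 1.2, size=(150, 2))])
K, (N, D) = 2, X.shape

mu = X[rng.choice(N, K, replace=False)]            # initial means: random data points
Sigma = np.array([np.eye(D) for _ in range(K)])    # initial covariances: identity

def gauss(X, mu_k, Sigma_k):
    # Multivariate normal density N(x; mu_k, Sigma_k) evaluated at each row of X.
    d = X - mu_k
    inv = np.linalg.inv(Sigma_k)
    norm_const = np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma_k))
    return np.exp(-0.5 * np.einsum('ni,ij,nj->n', d, inv, d)) / norm_const

for it in range(30):
    # E-step: responsibilities p_nk = P(z_n = k | x_n, theta'); equal mixture weights assumed.
    dens = np.stack([gauss(X, mu[k], Sigma[k]) for k in range(K)], axis=1)
    p = dens / dens.sum(axis=1, keepdims=True)

    # M-step: responsibility-weighted means and covariances maximize the bound.
    Nk = p.sum(axis=0)
    mu = (p.T @ X) / Nk[:, None]
    for k in range(K):
        d = X - mu[k]
        Sigma[k] = (p[:, k, None] * d).T @ d / Nk[k]

print("means:\n", mu)
print("covariances:\n", Sigma)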
