Presentation Transcript


1. Opinionated Lessons in Statistics by Bill Press, #30: Expectation Maximization (EM) Methods. Professor William H. Press, Department of Computer Science, the University of Texas at Austin.

2. Let's look at the theory behind EM methods.

Preliminary: Jensen's inequality. If a function is concave (downward), then function(interpolation) ≥ interpolation(function). Log is concave (downward). Jensen's inequality is thus: if $\lambda_i \ge 0$ and $\sum_i \lambda_i = 1$, then

$$\ln\Big(\sum_i \lambda_i Q_i\Big) \;\ge\; \sum_i \lambda_i \ln Q_i$$

This gets used a lot when playing with log-likelihoods. The proof of the EM method that we now give is just one example.
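
A minimal numerical sketch of the inequality, in Python (the particular weights and values below are arbitrary choices of mine, not from the lesson):

import numpy as np

# Arbitrary positive values Q_i and weights lambda_i that are nonnegative and sum to 1.
Q = np.array([0.5, 2.0, 3.0, 10.0])
lam = np.array([0.1, 0.2, 0.3, 0.4])

lhs = np.log(np.sum(lam * Q))    # ln of the weighted average
rhs = np.sum(lam * np.log(Q))    # weighted average of the logs
print(lhs, rhs, lhs >= rhs)      # ln is concave, so lhs >= rhs (prints True)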

3. The basic EM theorem (Dempster, Laird, and Rubin): here $x$ are the data, $z$ are missing data or nuisance variables, and $\theta$ are the parameters to be determined. Find the $\theta$ that maximizes the log-likelihood of the data,

$$L(\theta) \equiv \ln P(x \mid \theta)$$

Marginalizing over $z$,

$$\ln P(x \mid \theta) = \ln\Big[\sum_z P(x \mid z\,\theta)\,P(z \mid \theta)\Big]$$

For any other parameter value $\theta'$, multiply and divide inside the sum by $P(z \mid x\,\theta')\,P(x \mid \theta')$ and pull out $\ln P(x \mid \theta') = L(\theta')$:

$$L(\theta) = L(\theta') + \ln \sum_z P(z \mid x\,\theta')\,\frac{P(x \mid z\,\theta)\,P(z \mid \theta)}{P(z \mid x\,\theta')\,P(x \mid \theta')}$$

$$\ge L(\theta') + \sum_z P(z \mid x\,\theta')\,\ln\!\left[\frac{P(x \mid z\,\theta)\,P(z \mid \theta)}{P(z \mid x\,\theta')\,P(x \mid \theta')}\right] \qquad \text{(Jensen, since } \textstyle\sum_z P(z \mid x\,\theta') = 1\text{)}$$

$$= L(\theta') + \sum_z P(z \mid x\,\theta')\,\ln\!\left[\frac{P(x\,z \mid \theta)}{P(x\,z \mid \theta')}\right] \;\equiv\; B(\theta, \theta'), \qquad \text{for any } \theta' \text{ a bound on } L(\theta).$$
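
To see the bound numerically, here is a small sketch; the toy model is my own assumption, not from the lesson: a 1-D two-component Gaussian mixture with equal weights, unit variances, and unknown means, so that the sum over z factorizes over the data points.

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data from two unit-variance components with equal weights.
x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])
w = np.array([0.5, 0.5])

def npdf(x, mu):
    # Unit-variance normal density.
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

def loglike(mu):
    # L(theta) = sum_n ln sum_k w_k N(x_n; mu_k, 1)
    return np.sum(np.log((w * npdf(x[:, None], mu[None, :])).sum(axis=1)))

def bound(mu, mu_prev):
    # B(theta, theta') = L(theta')
    #   + sum_n sum_k P(z_n=k | x_n, theta') ln[ P(x_n, z_n=k | theta) / P(x_n, z_n=k | theta') ]
    dens_prev = w * npdf(x[:, None], mu_prev[None, :])
    p = dens_prev / dens_prev.sum(axis=1, keepdims=True)   # posterior over z at theta'
    dens = w * npdf(x[:, None], mu[None, :])
    return loglike(mu_prev) + np.sum(p * (np.log(dens) - np.log(dens_prev)))

mu_prev = np.array([0.0, 1.0])   # current guess theta'
mu_new = np.array([-1.5, 2.5])   # some other trial theta

print(bound(mu_prev, mu_prev), loglike(mu_prev))   # equal: B(theta', theta') = L(theta')
print(bound(mu_new, mu_prev), loglike(mu_new))     # B(theta, theta') <= L(theta)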

4. Notice that $B(\theta', \theta') = L(\theta')$. So if we maximize $B(\theta, \theta')$ over $\theta$ and move to the new maximizing $\theta$, we are guaranteed that the new $L(\theta) \ge L(\theta')$. This maximizes a lower bound rather than the actual likelihood, so the iteration is guaranteed only to converge to (at least) a local max of $L(\theta)$. (Can you see why?) And it works whether the maximization step is local or global.

5. So the general EM algorithm repeats the maximization

$$\theta'' = \arg\max_\theta \sum_z P(z \mid x\,\theta')\,\ln\!\left[\frac{P(x\,z \mid \theta)}{P(x\,z \mid \theta')}\right]$$

$$= \arg\max_\theta \sum_z P(z \mid x\,\theta')\,\ln\!\left[P(x\,z \mid \theta)\right] \qquad \text{(the dropped denominator has no dependence on } \theta\text{)}$$

$$= \arg\max_\theta \sum_z P(z \mid x\,\theta')\,\ln\!\left[P(x \mid z\,\theta)\,P(z \mid \theta)\right]$$

Each form is sometimes useful: sometimes (missing data) there is no dependence on $z$, sometimes (nuisance parameters) a uniform prior.

Computing the expectation $\sum_z P(z \mid x\,\theta')\,\ln[\cdots]$ is the E-step. This is an expectation that can often be computed in some better way than literally integrating over all possible values of $z$. Maximizing it over $\theta$ is the M-step.

This is a general way of handling missing data or nuisance parameters if you can estimate the probability of what is missing, given what you see (and a guess at the parameters).
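
One concrete instance of this repeated E-step/M-step, as a sketch under assumptions of my own rather than from the lesson: the classic "two biased coins" missing-data toy problem, with invented flip counts. Here z is which coin produced each batch of flips and theta = (p_A, p_B) are the unknown heads probabilities; the printed data log-likelihood never decreases from one iteration to the next, illustrating the guarantee from the previous slide.

import numpy as np

# Hypothetical data: each row is (heads, tails) from 10 flips of one of two coins;
# which coin was used (z) is the missing datum.
counts = np.array([[5, 5], [9, 1], [8, 2], [4, 6], [7, 3]], dtype=float)
theta = np.array([0.6, 0.5])          # initial guess for (p_A, p_B)

def log_binom(counts, p):
    # ln P(heads, tails | coin with heads prob p), dropping the binomial
    # coefficient, which is constant in theta.
    h, t = counts[:, 0], counts[:, 1]
    return h * np.log(p) + t * np.log(1.0 - p)

for it in range(20):
    # E-step: responsibilities P(z_n = coin | data_n, theta'), assuming a uniform prior over coins.
    ll_per_coin = np.stack([log_binom(counts, p) for p in theta], axis=1)   # shape (n_batches, 2)
    w = np.exp(ll_per_coin - ll_per_coin.max(axis=1, keepdims=True))
    resp = w / w.sum(axis=1, keepdims=True)

    # Data log-likelihood L(theta') = sum_n ln sum_k (1/2) P(data_n | coin k); monotonically non-decreasing.
    L = np.sum(np.log(0.5 * np.exp(ll_per_coin).sum(axis=1)))
    print(f"iter {it:2d}  L = {L:.4f}  theta = {theta}")

    # M-step: weighted maximum-likelihood estimate of each coin's heads probability.
    heads = counts[:, 0]
    flips = counts.sum(axis=1)
    theta = (resp * heads[:, None]).sum(axis=0) / (resp * flips[:, None]).sum(axis=0)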

6. It might not be instantly obvious how the GMM fits this paradigm! Here $z$ (the missing data) is the assignment of data points to components, and $\theta$ consists of the $\mu$'s and $\Sigma$'s. Then

$$P(z \mid x\,\theta') \;\to\; p_{nk}$$

and, up to constants and overall factors that do not affect the maximization over the $\mu_k$ and $\Sigma_k$,

$$\sum_z P(z \mid x\,\theta')\,\ln\!\left[P(x\,z \mid \theta)\right] \;\to\; \sum_{n}\sum_{k} p_{nk}\Big[-\ln \det \Sigma_k \;-\; (x_n - \mu_k)\cdot \Sigma_k^{-1}\cdot (x_n - \mu_k)\Big]$$

Showing that this is maximized by the previous re-estimation formulas for $\mu$ and $\Sigma$ is a multidimensional (fancy) form of the theorem that the mean is the measure of central tendency that minimizes the mean square deviation. See Wikipedia: Expectation-Maximization Algorithm for a detailed derivation.

The next time we see an EM method will be when we discuss Hidden Markov Models. The "Baum-Welch re-estimation algorithm" for HMMs is an EM method.
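
A sketch of how those re-estimation formulas look in code (my own minimal illustration, not the lesson's notation or data: the synthetic 2-D points, K = 2 components, and equal, fixed mixture weights are all assumptions here). The E-step computes the responsibilities p_nk and the M-step sets each mu_k and Sigma_k to the responsibility-weighted mean and covariance.

import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 2-D data drawn from two Gaussian blobs.
X = np.vstack([rng.normal([-2, 0], 0.7, size=(150, 2)),
               rng.normal([2, 1], 1.2, size=(150, 2))])
K, (N, D) = 2, X.shape

mu = X[rng.choice(N, K, replace=False)]            # initial means: random data points
Sigma = np.array([np.eye(D) for _ in range(K)])    # initial covariances: identity

def gauss(X, mu_k, Sigma_k):
    # Multivariate normal density N(x; mu_k, Sigma_k) evaluated at each row of X.
    d = X - mu_k
    inv = np.linalg.inv(Sigma_k)
    norm_const = np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma_k))
    return np.exp(-0.5 * np.einsum('ni,ij,nj->n', d, inv, d)) / norm_const

for it in range(30):
    # E-step: responsibilities p_nk = P(z_n = k | x_n, theta'); equal mixture weights assumed.
    dens = np.stack([gauss(X, mu[k], Sigma[k]) for k in range(K)], axis=1)
    p = dens / dens.sum(axis=1, keepdims=True)

    # M-step: responsibility-weighted means and covariances maximize the bound.
    Nk = p.sum(axis=0)
    mu = (p.T @ X) / Nk[:, None]
    for k in range(K):
        d = X - mu[k]
        Sigma[k] = (p[:, k, None] * d).T @ d / Nk[k]

print("means:\n", mu)
print("covariances:\n", Sigma)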
