

  1. Machine Learning, Saarland University, SS 2007
  Lecture 9, Friday, June 15th, 2007 (EM algorithm + convergence)
  Holger Bast, Max-Planck-Institut für Informatik, Saarbrücken, Germany

  2. Overview of this Lecture
  • Quick recap of last lecture
    • maximum likelihood principle / our 3 examples
  • The EM algorithm
    • writing down the formula (very easy)
    • understanding the formula (very hard)
  • Example: mixture of two normal distributions
  • Convergence
    • to local maximum (under mild assumptions)
  • Exercise Sheet
    • explain / discuss / make a start

  3. Maximum Likelihood: Example 1
  • Sequence of coin flips HHTTTTTTHTTTTTHTTHHT
    • say 5 times H and 15 times T
    • which Prob(H) and Prob(T) are most likely?
  • Formalization
    • Data X = (x1, … , xn), xi in {H, T}
    • Parameters Θ = (pH, pT), pH + pT = 1
    • Likelihood L(X, Θ) = pH^h · pT^t, where h = #{i : xi = H}, t = #{i : xi = T}
    • Log likelihood Q(X, Θ) = log L(X, Θ) = h · log pH + t · log pT
    • find Θ* = argmaxΘ L(X, Θ) = argmaxΘ Q(X, Θ)
  • Solution (simple calculus [blackboard])
    • pH = h / (h + t) and pT = t / (h + t)
    • here this looks like Prob(H) = ¼ and Prob(T) = ¾
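  A minimal Python sketch of this closed-form estimator (the function name coin_mle is mine, not from the lecture):

```python
# Maximum likelihood for a coin: the closed-form MLE is pH = h / (h + t), pT = t / (h + t).

def coin_mle(flips):
    """Return the maximum likelihood estimates (pH, pT) for a sequence of 'H'/'T' flips."""
    h = sum(1 for x in flips if x == "H")
    t = len(flips) - h
    return h / (h + t), t / (h + t)

if __name__ == "__main__":
    X = "HHTTTTTTHTTTTTHTTHHT"   # the sequence from the slide
    pH, pT = coin_mle(X)
    print(f"pH = {pH:.2f}, pT = {pT:.2f}")
```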

  4. Maximum Likelihood: Example 2
  • Sequence of reals drawn from N(μ, σ), the normal distribution with mean μ and standard deviation σ
    • which μ and σ are most likely?
  • Formalization
    • Data X = (x1, … , xn), xi a real number
    • Parameters Θ = (μ, σ)
    • Likelihood L(X, Θ) = Πi 1/(√(2π)·σ) · exp( −(xi − μ)² / (2σ²) )
    • Log likelihood Q(X, Θ) = −n/2 · log(2π) − n · log σ − Σi (xi − μ)² / (2σ²)
    • find Θ* = argmaxΘ L(X, Θ) = argmaxΘ Q(X, Θ)
  • Solution (simple calculus [blackboard])
    • μ = 1/n · Σi xi and σ² = 1/n · Σi (xi − μ)²
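  A corresponding sketch for the single-Gaussian case; the demo data is made up, and gaussian_mle is my name for the helper:

```python
# Maximum likelihood for a single Gaussian: the MLEs are the sample mean
# and the (biased) sample variance, exactly as on the slide.
import math

def gaussian_mle(xs):
    """Return (mu, sigma) maximizing the likelihood of xs under N(mu, sigma)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma2 = sum((x - mu) ** 2 for x in xs) / n   # note: divides by n, not n - 1
    return mu, math.sqrt(sigma2)

if __name__ == "__main__":
    xs = [1.2, 0.8, 1.1, 0.9, 1.0, 1.4, 0.6]      # illustrative values only
    mu, sigma = gaussian_mle(xs)
    print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
```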

  5. Maximum Likelihood: Example 3
  • Sequence of real numbers
    • each drawn from either N1(μ1, σ1) or N2(μ2, σ2)
    • from N1 with prob p1, and from N2 with prob p2
    • which μ1, σ1, μ2, σ2, p1, p2 are most likely?
  • Formalization
    • Data X = (x1, … , xn), xi a real number
    • Hidden data Z = (z1, … , zn), zi = j iff xi was drawn from Nj
    • Parameters Θ = (μ1, σ1, μ2, σ2, p1, p2), p1 + p2 = 1
    • Likelihood L(X, Θ) = [blackboard]
    • Log likelihood Q(X, Θ) = [blackboard]
    • find Θ* = argmaxΘ L(X, Θ) = argmaxΘ Q(X, Θ)
  • Standard calculus fails here (derivative of a sum of logs of sums)
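  To make the setup concrete, a small sketch that generates data from such a mixture; the parameter values are made up for illustration, and in the estimation problem the labels z are of course hidden:

```python
# Generating data from the two-Gaussian mixture of Example 3.
import random

def sample_mixture(n, mu1, sigma1, mu2, sigma2, p1):
    """Draw n points; each comes from N1 with prob p1, else from N2.
    Returns the observed data xs and the hidden labels zs."""
    xs, zs = [], []
    for _ in range(n):
        if random.random() < p1:
            zs.append(1)
            xs.append(random.gauss(mu1, sigma1))
        else:
            zs.append(2)
            xs.append(random.gauss(mu2, sigma2))
    return xs, zs

if __name__ == "__main__":
    xs, zs = sample_mixture(1000, mu1=-2.0, sigma1=1.0, mu2=3.0, sigma2=0.5, p1=0.3)
    print(xs[:5], zs[:5])
```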

  6. The EM algorithm — Formula
  • Given
    • Data X = (x1, … , xn)
    • Hidden data Z = (z1, … , zn)
    • Parameters Θ + an initial guess θ1
  • Expectation step:
    • Pr(Z | X; θt) = Pr(X | Z; θt) · Pr(Z | θt) / ΣZ' Pr(X | Z'; θt) · Pr(Z' | θt)
  • Maximization step:
    • θt+1 = argmaxΘ EZ[ log Pr(X, Z | Θ) | X; θt ]
  • What the hell does this mean?
    • it is crucial to understand each of these probabilities / expected values
    • What is fixed? What is random, and how? What do the conditionals mean?
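  Structurally, the algorithm is just this alternation. The following skeleton is a hedged sketch in which e_step and m_step are placeholders to be filled in for a concrete model (such as the two-Gaussian mixture below):

```python
# The shape of the EM iteration; e_step and m_step are model-specific.

def em(x, theta, e_step, m_step, iterations=100):
    """Alternate E- and M-steps starting from the initial guess theta (= theta_1)."""
    for _ in range(iterations):
        posteriors = e_step(x, theta)    # Pr(Z | X; theta_t) for the hidden data
        theta = m_step(x, posteriors)    # argmax_Theta E_Z[ log Pr(X, Z | Theta) | X; theta_t ]
    return theta
```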

  7. Three attempts to maximize the likelihood (consider the mixture of two Gaussians as an example)
  • The direct way …
    • given x1, … , xn
    • find parameters μ1, σ1, μ2, σ2, p1, p2
    • such that log L(x1, … , xn) is maximized
    • but this optimization is too hard (sum of logs of sums)
  • If only we knew …
    • given data x1, … , xn and hidden data z1, … , zn
    • find parameters μ1, σ1, μ2, σ2, p1, p2
    • such that log L(x1, … , xn, z1, … , zn) is maximized
    • this would be feasible [show on blackboard], but we don't know the z1, … , zn
  • The EM way …
    • given x1, … , xn and random variables Z1, … , Zn (the E-step provides the Z1, … , Zn)
    • find parameters μ1, σ1, μ2, σ2, p1, p2
    • such that E log L(x1, … , xn, Z1, … , Zn) is maximized (this is the M-step of the EM algorithm)

  8. E-Step — Formula (consider the mixture of two Gaussians as an example)
  • We have (at the beginning of each iteration)
    • the data x1, … , xn
    • the fully specified distributions N1(μ1, σ1) and N2(μ2, σ2)
    • the probability of choosing between N1 and N2 = a random variable Z with p1 = Pr(Z=1) and p2 = Pr(Z=2)
  • We want
    • for each data point xi a probability of choosing N1 or N2 = random variables Z1, … , Zn
  • Solution (the actual E-step)
    • take Zi as the conditional Z | xi
    • by Bayes' law: Pr(Zi=1) = Pr(Z=1 | xi) = Pr(xi | Z=1) · Pr(Z=1) / Pr(xi), with Pr(xi) = Σz Pr(xi | Z=z) · Pr(Z=z)
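  For the two-Gaussian mixture this E-step is a direct application of Bayes' law; a sketch (function names are mine):

```python
# E-step for the two-Gaussian mixture: compute Pr(Z_i = 1 | x_i) for every data point.
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def e_step(xs, mu1, sigma1, mu2, sigma2, p1, p2):
    """Return, for each x_i, the posterior probability that it came from N1."""
    posteriors = []
    for x in xs:
        a = p1 * normal_pdf(x, mu1, sigma1)   # Pr(x_i | Z=1) * Pr(Z=1)
        b = p2 * normal_pdf(x, mu2, sigma2)   # Pr(x_i | Z=2) * Pr(Z=2)
        posteriors.append(a / (a + b))        # Bayes' law: Pr(Z_i = 1 | x_i)
    return posteriors
```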

  9. E-Step — analogy to a simple example
  • Draw a ball from one of two urns
    • Urn 1 is picked with prob 1/3, Urn 2 with prob 2/3: Pr(Urn 1) = 1/3, Pr(Urn 2) = 2/3
    • Pr(Blue | Urn 1) = 1/2, Pr(Blue | Urn 2) = 1/4
  • Pr(Blue) = Pr(Blue | Urn 1) · Pr(Urn 1) + Pr(Blue | Urn 2) · Pr(Urn 2) = 1/2 · 1/3 + 1/4 · 2/3 = 1/3
  • By Bayes' law: Pr(Urn 1 | Blue) = Pr(Blue | Urn 1) · Pr(Urn 1) / Pr(Blue) = (1/2 · 1/3) / (1/3) = 1/2
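  The same arithmetic, checked with exact fractions:

```python
# Sanity check of the urn example using exact rational arithmetic.
from fractions import Fraction

p_urn1, p_urn2 = Fraction(1, 3), Fraction(2, 3)
p_blue_given_urn1, p_blue_given_urn2 = Fraction(1, 2), Fraction(1, 4)

p_blue = p_blue_given_urn1 * p_urn1 + p_blue_given_urn2 * p_urn2
p_urn1_given_blue = p_blue_given_urn1 * p_urn1 / p_blue

print(p_blue)             # 1/3
print(p_urn1_given_blue)  # 1/2
```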

  10. M-Step — Formula • [Blackboard]
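  The M-step itself is done on the blackboard. For reference only, here is a sketch of the standard M-step updates for the two-Gaussian mixture, using the posteriors g[i] = Pr(Zi = 1 | xi) from the E-step sketch above; this is not a transcript of the lecture's derivation:

```python
# Standard M-step for the two-Gaussian mixture: weighted means, variances, and mixing weights.
import math

def m_step(xs, g):
    """Return updated (mu1, sigma1, mu2, sigma2, p1, p2) from posterior weights g."""
    n = len(xs)
    w1 = sum(g)                       # expected number of points from N1
    w2 = n - w1                       # expected number of points from N2
    mu1 = sum(gi * x for gi, x in zip(g, xs)) / w1
    mu2 = sum((1 - gi) * x for gi, x in zip(g, xs)) / w2
    var1 = sum(gi * (x - mu1) ** 2 for gi, x in zip(g, xs)) / w1
    var2 = sum((1 - gi) * (x - mu2) ** 2 for gi, x in zip(g, xs)) / w2
    return mu1, math.sqrt(var1), mu2, math.sqrt(var2), w1 / n, w2 / n
```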

  11. Convergence of the EM Algorithm
  • Two (log) likelihoods
    • true: log L(x1, …, xn)
    • EM: E log L(x1, …, xn, Z1, …, Zn)
  • Lemma 1 (lower bound) [blackboard]
    • E log L(x1, …, xn, Z1, …, Zn) ≤ log L(x1, …, xn)
  • Lemma 2 (touch) [blackboard]
    • E log L(x1, …, xn, Z1, …, Zn)(θt) = log L(x1, …, xn)(θt)
  • Convergence
    • if the expected likelihood function is well-behaved, e.g., if the first derivative at local maxima exists and the second derivative is < 0,
    • then Lemmas 1 and 2 imply convergence
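  A cheap empirical counterpart to Lemmas 1 and 2 is to log the true (observed-data) log-likelihood after every iteration and check that it never decreases; a self-contained helper for the two-Gaussian mixture (names are mine):

```python
# Observed-data log-likelihood log L(x_1, ..., x_n) for the two-Gaussian mixture:
# a sum of logs of sums, which EM should never decrease from one iteration to the next.
import math

def log_likelihood(xs, mu1, sigma1, mu2, sigma2, p1, p2):
    def pdf(x, mu, sigma):
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    return sum(math.log(p1 * pdf(x, mu1, sigma1) + p2 * pdf(x, mu2, sigma2)) for x in xs)
```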

  12. Attempt Two: Calculations
  • If only we knew …
    • given data x1, … , xn and hidden data z1, … , zn
    • find parameters μ1, σ1, μ2, σ2, p1, p2
    • such that log L(x1, … , xn, z1, … , zn) is maximized
  • Let I1 = {i : zi = 1} and I2 = {i : zi = 2}; then
    • L(x1, … , xn ; z1, … , zn) = Πi in I1 1/(√(2π)·σ1) · exp( −(xi − μ1)² / (2σ1²) ) · Πi in I2 1/(√(2π)·σ2) · exp( −(xi − μ2)² / (2σ2²) )
  • The two products can be maximized separately
    • here μ1 = Σi in I1 xi / |I1| and σ1² = Σi in I1 (xi − μ1)² / |I1|
    • and μ2 = Σi in I2 xi / |I2| and σ2² = Σi in I2 (xi − μ2)² / |I2|
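  A sketch of this complete-data estimate, splitting the data by the known labels exactly as above (the helper name is mine):

```python
# Complete-data maximum likelihood: with the labels z_i known, each Gaussian is
# estimated on its own part of the data, so the two products separate.
import math

def complete_data_mle(xs, zs):
    """Given data xs and labels zs (1 or 2), return (mu1, sigma1, mu2, sigma2)."""
    x1 = [x for x, z in zip(xs, zs) if z == 1]   # I1 = {i : z_i = 1}
    x2 = [x for x, z in zip(xs, zs) if z == 2]   # I2 = {i : z_i = 2}
    mu1 = sum(x1) / len(x1)
    mu2 = sum(x2) / len(x2)
    sigma1 = math.sqrt(sum((x - mu1) ** 2 for x in x1) / len(x1))
    sigma2 = math.sqrt(sum((x - mu2) ** 2 for x in x2) / len(x2))
    return mu1, sigma1, mu2, sigma2
```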
