
EM algorithm



  1. EM algorithm LING 572 Fei Xia 03/02/06

  2. Outline • The EM algorithm • EM for PM models • Three special cases • Inside-outside algorithm • Forward-backward algorithm • IBM models for MT

  3. The EM algorithm

  4. Basic setting in EM • X is a set of data points: observed data • Θ is a parameter vector. • EM is a method to find θML = argmaxθ P(X | θ) when: • Calculating P(X | θ) directly is hard. • Calculating P(X, Y | θ) is much simpler, where Y is “hidden” data (or “missing” data).

  5. The basic EM strategy • Z = (X, Y) • Z: complete data (“augmented data”) • X: observed data (“incomplete” data) • Y: hidden data (“missing” data) • Given a fixed x, there could be many possible y’s. • Ex: given a sentence x, there could be many HMM state sequences that generate x.

  6. Examples of EM

  7. The log-likelihood function • L is a function of θ, while holding X constant: L(θ) = log P(X | θ) = Σi log P(xi | θ) = Σi log Σy P(xi, y | θ)

  8. The iterative approach for MLE In many cases, we cannot find θML directly. An alternative is to find a sequence of estimates θ0, θ1, …, θt, … s.t. L(θ0) ≤ L(θ1) ≤ … ≤ L(θt) ≤ …

  9. Jensen’s inequality

  10. Jensen’s inequality • log is a concave function, so for any weights λy ≥ 0 with Σy λy = 1: log Σy λy zy ≥ Σy λy log zy

  11. Maximizing the lower bound • Applying Jensen’s inequality to L(θ) − L(θt) gives a lower bound that is tight at θ = θt; the θ-dependent part of that bound is the Q function.
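A standard form of this argument, written with a sum over the hidden values y (a sketch; the slide's exact notation may differ):

```latex
\begin{align*}
L(\theta) - L(\theta^t)
  &= \log \sum_y P(X, y \mid \theta) \;-\; \log P(X \mid \theta^t) \\
  &= \log \sum_y P(y \mid X, \theta^t)\,
       \frac{P(X, y \mid \theta)}{P(y \mid X, \theta^t)\, P(X \mid \theta^t)} \\
  &\ge \sum_y P(y \mid X, \theta^t)\,
       \log \frac{P(X, y \mid \theta)}{P(y \mid X, \theta^t)\, P(X \mid \theta^t)}
       \qquad\text{(Jensen's inequality)}
\end{align*}
```

Only the term Σy P(y | X, θt) log P(X, y | θ) on the right-hand side depends on θ; that term is the Q function, so maximizing the Q function maximizes this lower bound.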

  12. The Q-function • Define the Q-function (a function of θ): Q(θ | θt) = E[log P(X, Y | θ) | X, θt] = Σy P(y | X, θt) log P(X, y | θ) • Y is a random vector. • X=(x1, x2, …, xn) is a constant (vector). • Θt is the current parameter estimate and is a constant (vector). • Θ is the variable (vector) that we wish to adjust. • The Q-function is the expected value of the complete-data log-likelihood log P(X, Y | θ) with respect to Y, given X and θt.

  13. The inner loop of the EM algorithm • E-step: calculate Q(θ | θt) • M-step: find θt+1 = argmaxθ Q(θ | θt)

  14. L(θ) is non-decreasing at each iteration • The EM algorithm will produce a sequence θ0, θ1, …, θt, … • It can be proved that L(θ0) ≤ L(θ1) ≤ … ≤ L(θt) ≤ …

  15. The inner loop of the Generalized EM algorithm (GEM) • E-step: calculate Q(θ | θt) • M-step: find a θt+1 such that Q(θt+1 | θt) ≥ Q(θt | θt) (any improvement of the Q function suffices; it need not be the maximum)

  16. Recap of the EM algorithm

  17. Idea #1: find θ that maximizes the likelihood of training data

  18. Idea #2: find the θt sequence No analytical solution → iterative approach: find θ0, θ1, …, θt, … s.t. L(θ0) ≤ L(θ1) ≤ … ≤ L(θt) ≤ …

  19. Idea #3: find θt+1 that maximizes a tight lower bound of L(θ) • The bound (from Jensen’s inequality) touches L(θ) at θ = θt, so raising the bound raises the likelihood.

  20. Idea #4: find θt+1 that maximizes the Q function • The Q function is the θ-dependent part of the lower bound of L(θ), so maximizing it maximizes the bound.

  21. The EM algorithm • Start with initial estimate, θ0 • Repeat until convergence • E-step: calculate Q(θ | θt) • M-step: find θt+1 = argmaxθ Q(θ | θt) • (a code sketch of this loop is given below)
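To make the loop concrete, here is a minimal Python sketch of the generic EM driver. The callables e_step, m_step, and log_likelihood, and the convergence test, are illustrative assumptions rather than anything specified on the slides: the E-step returns whatever statistics define Q(θ | θt), and the M-step returns the θ that maximizes it.

```python
def run_em(x, theta0, e_step, m_step, log_likelihood, max_iters=100, tol=1e-6):
    """Generic EM loop (a sketch; e_step/m_step/log_likelihood are assumed callables).

    e_step(x, theta)          -> sufficient statistics that define Q(theta | theta_t)
    m_step(stats)             -> the theta that maximizes Q(theta | theta_t)
    log_likelihood(x, theta)  -> L(theta), used here only to test convergence
    """
    theta = theta0
    prev_ll = float("-inf")
    for _ in range(max_iters):
        stats = e_step(x, theta)      # E-step: expected counts / posteriors over hidden y
        theta = m_step(stats)         # M-step: maximize the Q function
        ll = log_likelihood(x, theta)
        if ll - prev_ll < tol:        # L(theta) is non-decreasing, so this terminates
            break
        prev_ll = ll
    return theta
```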

  22. Important classes of EM problem • Products of multinomial (PM) models • Exponential families • Gaussian mixture • …

  23. The EM algorithm for PM models

  24. PM models • P(x, y | θ) = Πr θr^Count(x, y, r), where {Θ1, …, Θs} is a partition of all the parameters, and for any j: Σθr∈Θj θr = 1

  25. HMM is a PM • The joint probability P(O, X | θ) is a product of initial-state, transition, and emission probabilities; each factor is a parameter of some multinomial, so P(O, X | θ) = Πr θr^Count(O, X, r) has the PM form.

  26. PCFG • PCFG: each sample point (x,y): • x is a sentence • y is a possible parse tree for that sentence.

  27. PCFG is a PM • P(x, y | θ) = Πr P(r)^Count(x, y, r), where r ranges over the grammar rules and Count(x, y, r) is the number of times rule r is used in the parse tree y; the probabilities of rules with the same left-hand side form one multinomial.

  28. Q-function for PM • Q(θ | θt) = Σy P(y | x, θt) log P(x, y | θ) = Σy P(y | x, θt) Σr Count(x, y, r) log θr

  29. Maximizing the Q function • Maximize Q(θ | θt) • Subject to the constraints Σθr∈Θj θr = 1 for each group Θj • Use Lagrange multipliers

  30. Optimal solution • θr = Σy P(y | x, θt) Count(x, y, r) / Σθr′∈Θj Σy P(y | x, θt) Count(x, y, r′) • The numerator is the expected count of r; the denominator is the normalization factor (the total expected count of r’s multinomial group Θj)

  31. PM Models • θr is the rth parameter in the model. Each parameter is a member of some multinomial distribution. • Count(x, y, r) is the number of times that θr is seen in the expression for P(x, y | θ)

  32. The EM algorithm for PM Models • Calculate expected counts: ct(r) = Σi Σy P(y | xi, θt) Count(xi, y, r) • Update parameters: θr = ct(r) / Σθr′∈Θj ct(r′)

  33. PCFG example • Calculate expected counts: for each rule A → β, ct(A → β) = Σi Σy P(y | xi, θt) · (number of times A → β is used in parse y of sentence xi) • Update parameters: P(A → β) = ct(A → β) / Σβ′ ct(A → β′)

  34. The EM algorithm for PM models
  // for each iteration
  //   for each training example xi
  //     for each possible y
  //       for each parameter θr: expected_count(r) += P(y | xi, θt) · Count(xi, y, r)
  //   for each parameter θr: θr ← expected_count(r) / Σθr′∈Θj expected_count(r′)
  (a brute-force Python sketch of this loop is given below)
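A brute-force Python sketch of one such iteration, following the loop structure above. The helpers candidate_ys, count, and group_of are hypothetical stand-ins for the model-specific pieces (the enumeration of hidden values, the Count(x, y, r) exponents, and the multinomial group of each parameter); in practice the enumeration over y is replaced by dynamic programming, as in the inside-outside and forward-backward algorithms below.

```python
import math
from collections import defaultdict

def em_iteration_pm(data, theta, candidate_ys, count, group_of):
    """One EM iteration for a product-of-multinomials (PM) model (brute-force sketch).

    Hypothetical helpers, not defined on the slides:
      candidate_ys(x) -> iterable of possible hidden values y for observation x
      count(x, y, r)  -> Count(x, y, r), the exponent of theta_r in P(x, y | theta)
      group_of(r)     -> index of the multinomial group that parameter r belongs to
    theta maps r -> theta_r, with P(x, y | theta) = prod_r theta_r ** Count(x, y, r).
    """
    expected = defaultdict(float)

    # E-step: accumulate expected counts, weighting each y by P(y | x, theta)
    for x in data:                                    # for each training example
        ys = list(candidate_ys(x))                    # for each possible y
        joint = [math.prod(theta[r] ** count(x, y, r) for r in theta) for y in ys]
        z = sum(joint)                                # P(x | theta)
        for y, p_xy in zip(ys, joint):
            for r in theta:
                expected[r] += (p_xy / z) * count(x, y, r)

    # M-step: renormalize the expected counts within each multinomial group
    group_total = defaultdict(float)
    for r in theta:
        group_total[group_of(r)] += expected[r]
    return {r: (expected[r] / group_total[group_of(r)]
                if group_total[group_of(r)] > 0 else theta[r])
            for r in theta}
```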

  35. Inside-outside algorithm

  36. Inner loop of the Inside-outside algorithm Given an input sequence w1m and θt: • Calculate inside probability βj(p, q) = P(wpq | Nj spans p..q, θt): • Base case: βj(k, k) = P(Nj → wk) • Recursive case: βj(p, q) = Σr,s Σp≤d<q P(Nj → Nr Ns) βr(p, d) βs(d+1, q) • Calculate outside probability αj(p, q): • Base case: α1(1, m) = 1 and αj(1, m) = 0 for j ≠ 1 (N1 is the start symbol) • Recursive case: αj(p, q) = Σf,g [ Σe>q αf(p, e) P(Nf → Nj Ng) βg(q+1, e) + Σe<p αf(e, q) P(Nf → Ng Nj) βg(e, p−1) ] • (a sketch of the inside pass is given below)
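A Python sketch of just the inside pass for a grammar in Chomsky normal form, using 0-based, inclusive span indices. The dictionaries unary_rules and binary_rules are assumed inputs (lexical rules P(A → w) and binary rules P(A → B C)); the outside pass and the count collection would follow the same table-filling pattern.

```python
from collections import defaultdict

def inside_probabilities(words, unary_rules, binary_rules):
    """Inside pass for a CNF PCFG (a minimal sketch, not the course's reference code).

    unary_rules[(A, w)]     = P(A -> w)     (lexical rules)
    binary_rules[(A, B, C)] = P(A -> B C)
    Returns beta[(A, p, q)] = P(w_p..w_q | A spans p..q), with 0-based, inclusive p, q.
    """
    n = len(words)
    beta = defaultdict(float)

    # Base case: beta_A(k, k) = P(A -> w_k)
    for k, w in enumerate(words):
        for (A, word), prob in unary_rules.items():
            if word == w:
                beta[(A, k, k)] = prob

    # Recursive case: beta_A(p, q) = sum over rules A -> B C and split points d of
    #   P(A -> B C) * beta_B(p, d) * beta_C(d+1, q)
    for span in range(2, n + 1):
        for p in range(n - span + 1):
            q = p + span - 1
            for (A, B, C), prob in binary_rules.items():
                total = 0.0
                for d in range(p, q):
                    total += prob * beta[(B, p, d)] * beta[(C, d + 1, q)]
                if total > 0.0:
                    beta[(A, p, q)] += total
    return beta
```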

  37. Inside-outside algorithm (cont) 3. Collect the counts 4. Normalize and update the parameters

  38. Expected counts for PCFG rules • Expected count of Nj → Nr Ns in sentence w1m: (1 / P(w1m | θt)) Σp≤d<q αj(p, q) P(Nj → Nr Ns) βr(p, d) βs(d+1, q) • This is the formula if we have only one sentence. Add an outside sum if X contains multiple sentences.

  39. Expected counts (cont)

  40. Relation to EM • PCFG is a PM Model • Inside-outside algorithm is a special case of the EM algorithm for PM Models. • X (observed data): each data point is a sentence w1m. • Y (hidden data): a parse tree Tr. • Θ (parameters): the rule probabilities P(Nj → Nr Ns) and P(Nj → wk).

  41. Forward-backward algorithm

  42. The inner loop for forward-backward algorithm Given an input sequence o1…oT and θt = (aij, bijk, πi): • Calculate forward probability αt(i) = P(o1…ot−1, Xt = i): • Base case: α1(i) = πi • Recursive case: αt+1(j) = Σi αt(i) aij bijot • Calculate backward probability βt(i) = P(ot…oT | Xt = i): • Base case: βT+1(i) = 1 • Recursive case: βt(i) = Σj aij bijot βt+1(j) • Calculate expected counts: pt(i, j) = αt(i) aij bijot βt+1(j) / P(O | θt) • Update the parameters: aij = Σt pt(i, j) / Σt Σj′ pt(i, j′), bijk = Σt: ot=k pt(i, j) / Σt pt(i, j), πi = α1(i) β1(i) / P(O | θt) • (a code sketch is given below)
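A Python/NumPy sketch of one such iteration for the arc-emission parameterization (πi, aij, bijk) used on these slides. It is a minimal version, not the course's reference implementation: it re-estimates from a single observation sequence, uses no scaling or log-space arithmetic (so it will underflow on long sequences), and assumes every transition receives nonzero expected count.

```python
import numpy as np

def forward_backward_step(obs, pi, a, b):
    """One EM (Baum-Welch) iteration for an arc-emission HMM (a sketch).

    obs : list of T observation symbol indices (0..M-1)
    pi  : (N,)      initial state probabilities pi_i
    a   : (N, N)    transition probabilities a_ij
    b   : (N, N, M) arc-emission probabilities b_ijk
    Returns updated (pi, a, b).
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T + 1, N))
    beta = np.zeros((T + 1, N))

    # Forward pass: alpha[t, i] = P(o_1..o_t, X_{t+1} = i)
    alpha[0] = pi
    for t in range(T):
        alpha[t + 1] = alpha[t] @ (a * b[:, :, obs[t]])

    # Backward pass: beta[t, i] = P(o_{t+1}..o_T | X_{t+1} = i)
    beta[T] = 1.0
    for t in range(T - 1, -1, -1):
        beta[t] = (a * b[:, :, obs[t]]) @ beta[t + 1]

    likelihood = alpha[T].sum()   # P(O | theta)

    # E-step: expected counts of each (i, j, k) transition-emission event
    exp_b = np.zeros_like(b)
    for t in range(T):
        xi = alpha[t][:, None] * a * b[:, :, obs[t]] * beta[t + 1][None, :]
        exp_b[:, :, obs[t]] += xi / likelihood
    exp_a = exp_b.sum(axis=2)

    # M-step: normalize expected counts within each multinomial
    new_pi = alpha[0] * beta[0] / likelihood
    new_a = exp_a / exp_a.sum(axis=1, keepdims=True)
    new_b = exp_b / exp_a[:, :, None]
    return new_pi, new_a, new_b
```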

  43. Expected counts

  44. Expected counts (cont)

  45. Relation to EM • HMM is a PM Model • Forward-backward algorithm is a special case of the EM algorithm for PM Models. • X (observed data): each data point is an observation sequence O1T. • Y (hidden data): the state sequence X1T. • Θ (parameters): aij, bijk, πi.

  46. IBM models for MT

  47. Expected counts for (f, e) pairs • Let Ct(f, e) be the fractional count of the (f, e) pair in the training data: Ct(f, e) = Σa P(a | E, F, θt) · count(f, e; a), summed over the sentence pairs (E, F), where P(a | E, F, θt) is the alignment probability and count(f, e; a) is the actual count of times e and f are linked in (E, F) by alignment a.
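For IBM Model 1 these fractional counts can be collected without enumerating alignments, because the alignment probability factorizes per foreign word. A minimal Python sketch of one EM iteration (no NULL word, uniform alignment prior; the data layout is an assumption, not the course's code):

```python
from collections import defaultdict

def ibm1_em_iteration(bitext, t):
    """One EM iteration for IBM Model 1 translation probabilities t(f | e) (a sketch).

    bitext : list of (F, E) pairs, each a list of tokens
             (F = foreign sentence, E = English sentence)
    t      : dict mapping (f, e) -> current estimate of t(f | e); it must contain an
             entry for every co-occurring (f, e) pair (e.g. initialized uniformly)
    Returns the re-estimated t(f | e).
    """
    count = defaultdict(float)   # fractional count Ct(f, e)
    total = defaultdict(float)   # total fractional count for each e

    # E-step: each foreign word f distributes one fractional count over the
    # English words e in its sentence pair, in proportion to t(f | e)
    for F, E in bitext:
        for f in F:
            z = sum(t[(f, e)] for e in E)   # normalization for this f
            for e in E:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c

    # M-step: t(f | e) = Ct(f, e) / sum_f' Ct(f', e)
    return {(f, e): c / total[e] for (f, e), c in count.items()}
```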

  48. Relation to EM • IBM models are PM Models. • The EM algorithm used in IBM models is a special case of the EM algorithm for PM Models. • X (observed data): each data point is a sentence pair (F, E). • Y (hidden data): word alignment a. • Θ (parameters): t(f|e), d(i | j, m, n), etc..

  49. Summary • The EM algorithm • An iterative approach • L(θ) is non-decreasing at each iteration • Optimal solution in M-step exists for many classes of problems. • The EM algorithm for PM models • Simpler formulae • Three special cases • Inside-outside algorithm • Forward-backward algorithm • IBM Models for MT

  50. Relations among the algorithms • The inside-outside algorithm, the forward-backward algorithm, and the EM training of the IBM models are special cases of the EM algorithm for PM models; PM models and Gaussian mixtures are in turn instances of the EM algorithm, which is itself a special case of the generalized EM (GEM).
