CS 59000 Statistical Machine Learning, Lecture 24



  1. CS 59000 Statistical Machine Learning, Lecture 24 Yuan (Alan) Qi Purdue CS Nov. 20, 2008

  2. Outline Review of K-medoids, Mixture of Gaussians, Expectation Maximization (EM), Alternative view of EM; Hidden Markov Models, forward-backward algorithm, EM for learning HMM parameters, Viterbi Algorithm; Linear state space models, Kalman filtering and smoothing

  3. K-medoids Algorithm

  4. Mixture of Gaussians Mixture of Gaussians: Introduce latent variables: Marginal distribution:
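
In the standard notation (as in Bishop's PRML, which this lecture appears to follow): introduce a K-dimensional binary latent variable z with 1-of-K coding and prior p(z_k = 1) = \pi_k, take p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k), and the marginal distribution is

p(x) = \sum_z p(z)\, p(x \mid z) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k).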

  5. Conditional Probability Responsibility that component k takes for explaining the observation.
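
In the same notation, the responsibility of component k for data point x_n is

\gamma(z_{nk}) \equiv p(z_{nk} = 1 \mid x_n) = \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}.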

  6. Maximum Likelihood Maximize the log likelihood function
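
That is,

\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}.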

  7. Severe Overfitting by Maximum Likelihood When a cluster has only one data point, its variance goes to 0.

  8. Maximum Likelihood Conditions (1) Setting the derivatives of the log likelihood with respect to the means to zero:
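
This gives (with N_k = \sum_{n=1}^{N} \gamma(z_{nk}), the effective number of points assigned to component k)

\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n.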

  9. Maximum Likelihood Conditions (2) Setting the derivative of the log likelihood with respect to the covariances to zero:
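
This gives

\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, (x_n - \mu_k)(x_n - \mu_k)^{\mathsf{T}}.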

  10. Maximum Likelihood Conditions (3) Lagrange function: Setting its derivative to zero and using the normalization constraint, we obtain:
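
The Lagrangian is \ln p(X \mid \pi, \mu, \Sigma) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right), and setting its derivative to zero together with \sum_k \pi_k = 1 yields \pi_k = N_k / N.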

  11. Expectation Maximization for Mixtures of Gaussians Although the previous conditions do not provide a closed-form solution, we can use them to construct iterative updates: E step: Compute the responsibilities. M step: Compute the new means, covariances, and mixing coefficients. Loop over the E and M steps until the log likelihood stops increasing.
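
For concreteness, here is a minimal NumPy/SciPy sketch of this E/M loop for a Gaussian mixture. The function name em_gmm, the random-data-point initialization, and the convergence tolerance are illustrative assumptions rather than anything specified on the slides, and a practical implementation would also guard against the collapsing-variance problem of slide 7.

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, tol=1e-6):
    """EM for a K-component Gaussian mixture on data X of shape (N, D)."""
    N, D = X.shape
    rng = np.random.default_rng(0)
    mu = X[rng.choice(N, size=K, replace=False)]     # initial means: K random data points
    Sigma = np.stack([np.eye(D) for _ in range(K)])  # initial covariances: identity
    pi = np.full(K, 1.0 / K)                         # initial mixing coefficients: uniform
    prev_ll = -np.inf
    for _ in range(n_iters):
        # E step: responsibilities gamma[n, k] = p(z_k = 1 | x_n).
        dens = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                         for k in range(K)], axis=1)
        ll = np.sum(np.log(dens.sum(axis=1)))        # current log likelihood
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: re-estimate means, covariances, and mixing coefficients.
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        pi = Nk / N
        # Stop when the log likelihood stops increasing.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, Sigma, gamma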

  12. General EM Algorithm

  13. EM and Jensen Inequality Goal: maximize Define: We have From Jensen's inequality, we see that this quantity is a lower bound of the log likelihood.

  14. Lower Bound is a functional of the distribution . Since and , is a lower bound of the log likelihood function . (Another way to see the lower bound without using Jensen’s inequality)
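
Concretely, in PRML's notation, for any distribution q(Z) over the latent variables,

\ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \,\|\, p),
\quad \mathcal{L}(q, \theta) = \sum_Z q(Z) \ln \frac{p(X, Z \mid \theta)}{q(Z)},
\quad \mathrm{KL}(q \,\|\, p) = - \sum_Z q(Z) \ln \frac{p(Z \mid X, \theta)}{q(Z)},

and since \mathrm{KL}(q \,\|\, p) \ge 0, we have \mathcal{L}(q, \theta) \le \ln p(X \mid \theta).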

  15. Lower Bound Perspective of EM Expectation Step: Maximizing the functional lower bound over the distribution q(Z). Maximization Step: Maximizing the lower bound over the parameters θ.

  16. Illustration of EM Updates

  17. Sequential Data There are temporal dependencies between data points.

  18. Markov Models By the chain rule, a joint distribution can be rewritten as: Assuming conditional independence, we have This is known as a first-order Markov chain.
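
In symbols, the chain rule gives p(x_1, \ldots, x_N) = p(x_1) \prod_{n=2}^{N} p(x_n \mid x_1, \ldots, x_{n-1}), and the first-order Markov assumption reduces this to

p(x_1, \ldots, x_N) = p(x_1) \prod_{n=2}^{N} p(x_n \mid x_{n-1}).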

  19. Higher-Order Markov Chains Second-order Markov assumption Can be generalized to higher-order Markov chains. But the number of parameters grows exponentially with the order.

  20. State Space Models Important graphical models for many dynamic models, including Hidden Markov Models (HMMs) and linear dynamical systems. Question: what order should be used for the Markov assumption?

  21. Hidden Markov Models Many applications, e.g., speech recognition, natural language processing, handwriting recognition, bio-sequence analysis

  22. From Mixture Models to HMMs By turning a mixture model into a dynamic model, we obtain the HMM. We model the dependence between two consecutive latent variables by a transition probability:

  23. HMMs Prior on initial latent variable: Emission probabilities: Joint distribution:
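
In the usual HMM notation, with initial distribution \pi, transition matrix A, and emission parameters \phi, the joint distribution is

p(X, Z \mid \theta) = p(z_1 \mid \pi) \left[ \prod_{n=2}^{N} p(z_n \mid z_{n-1}, A) \right] \prod_{n=1}^{N} p(x_n \mid z_n, \phi), \qquad \theta = \{\pi, A, \phi\}.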

  24. Samples from HMM (a) Contours of constant probability density for the emission distributions corresponding to each of the three states of the latent variable. (b) A sample of 50 points drawn from the hidden Markov model, with lines connecting the successive observations.

  25. Inference: Forward-backward Algorithm Goal: compute the marginals of the latent variables. Forward-backward algorithm: exact inference as a special case of the sum-product algorithm on the HMM. Factor graph representation (grouping the emission density and transition probability at each step into a single factor):

  26. Forward-backward Algorithm as Message Passing Method (1) Forward messages:
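
In the standard form, the forward message \alpha(z_n) \equiv p(x_1, \ldots, x_n, z_n) obeys the recursion

\alpha(z_n) = p(x_n \mid z_n) \sum_{z_{n-1}} \alpha(z_{n-1})\, p(z_n \mid z_{n-1}).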

  27. Forward-backward Algorithm as Message Passing Method (2) Backward messages (Q: how do we compute them?): The messages actually involve X. Similarly, we can compute the following (Q: why?)
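
The corresponding backward message \beta(z_n) \equiv p(x_{n+1}, \ldots, x_N \mid z_n) obeys

\beta(z_n) = \sum_{z_{n+1}} \beta(z_{n+1})\, p(x_{n+1} \mid z_{n+1})\, p(z_{n+1} \mid z_n),

and the posterior marginals follow as \gamma(z_n) = p(z_n \mid X) = \alpha(z_n)\, \beta(z_n) / p(X).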

  28. Rescaling to Avoid Numerical Underflow When a sequence is long, the forward message becomes too small to be represented within the dynamic range of the computer. We therefore redefine the forward message in rescaled form, and similarly redefine the backward message. Then we can compute the required marginals from the rescaled messages. See the detailed derivation in the textbook.
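
A minimal NumPy sketch of the scaled forward-backward recursions, assuming a K-state HMM with initial distribution pi, transition matrix A, and precomputed emission likelihoods B; the function name and array layout are illustrative, not from the slides.

import numpy as np

def forward_backward(pi, A, B):
    """Scaled forward-backward pass for an HMM with K discrete states.
    pi : (K,)   initial state distribution p(z_1)
    A  : (K, K) transition matrix, A[i, j] = p(z_n = j | z_{n-1} = i)
    B  : (N, K) emission likelihoods, B[n, k] = p(x_n | z_n = k)
    Returns the posterior marginals gamma[n, k] = p(z_n = k | X) and ln p(X).
    """
    N, K = B.shape
    alpha_hat = np.zeros((N, K))
    c = np.zeros(N)                                  # scaling factors c_n
    a = pi * B[0]
    c[0] = a.sum()
    alpha_hat[0] = a / c[0]
    for n in range(1, N):
        a = B[n] * (alpha_hat[n - 1] @ A)            # equals c_n * alpha_hat_n before scaling
        c[n] = a.sum()
        alpha_hat[n] = a / c[n]
    beta_hat = np.ones((N, K))
    for n in range(N - 2, -1, -1):
        beta_hat[n] = (A @ (B[n + 1] * beta_hat[n + 1])) / c[n + 1]
    gamma = alpha_hat * beta_hat                     # p(z_n | X), already normalized
    loglik = np.log(c).sum()                         # ln p(X) = sum_n ln c_n
    return gamma, loglik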

  29. Viterbi Algorithm Viterbi algorithm: finding the most probable sequence of states. Special case of the max-sum algorithm on the HMM. What if we want to find the most probable individual states instead?
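
A corresponding sketch of the Viterbi recursion, carried out in the log domain and using the same assumed conventions as the forward-backward sketch above.

import numpy as np

def viterbi(pi, A, B):
    """Most probable state sequence for an HMM, computed in the log domain.
    pi : (K,) initial distribution, A : (K, K) transitions, B : (N, K) emission likelihoods.
    """
    N, K = B.shape
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    omega = np.zeros((N, K))                 # omega[n, k]: best log-prob of a path ending in state k
    back = np.zeros((N, K), dtype=int)       # back-pointers for backtracking
    omega[0] = log_pi + log_B[0]
    for n in range(1, N):
        scores = omega[n - 1][:, None] + log_A   # scores[i, j]: come from state i, move to state j
        back[n] = scores.argmax(axis=0)
        omega[n] = scores.max(axis=0) + log_B[n]
    path = np.empty(N, dtype=int)
    path[-1] = omega[-1].argmax()
    for n in range(N - 2, -1, -1):           # follow back-pointers from the best final state
        path[n] = back[n + 1, path[n + 1]]
    return path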

  30. Maximum Likelihood Estimation for HMM Goal: maximize the likelihood with the latent states summed out. Looks familiar? Remember EM for mixtures of Gaussians… Indeed, the updates are similar.

  31. EM for HMM E step: compute the latent-state posteriors from the forward-backward/sum-product algorithm. M step:
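
The M-step updates for the discrete parameters, in the standard (PRML) form, are

\pi_k = \frac{\gamma(z_{1k})}{\sum_{j=1}^{K} \gamma(z_{1j})}, \qquad
A_{jk} = \frac{\sum_{n=2}^{N} \xi(z_{n-1,j}, z_{nk})}{\sum_{l=1}^{K} \sum_{n=2}^{N} \xi(z_{n-1,j}, z_{nl})},

where \gamma and \xi are the single-state and pairwise-state posteriors from the E step; the emission parameters \phi are re-estimated as in the mixture-of-Gaussians M step, with each data point weighted by \gamma(z_{nk}).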

  32. Linear Dynamical Systems Equivalently, we can write the model in the following form, where the noise terms are zero-mean Gaussians:
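
In the usual linear-Gaussian parameterization, the model can be written as

z_n = A z_{n-1} + w_n, \quad w_n \sim \mathcal{N}(0, \Gamma),
x_n = C z_n + v_n, \quad v_n \sim \mathcal{N}(0, \Sigma),
z_1 \sim \mathcal{N}(\mu_0, V_0).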

  33. Kalman Filtering and Smoothing Inference on linear Gaussian systems. Kalman filtering: sequentially update scaled forward message: Kalman smoothing: sequentially update state beliefs based on scaled forward and backward messages:
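
A minimal NumPy sketch of the Kalman filtering recursion under the parameterization sketched above (the names A, C, Gamma, Sigma, mu0, V0 are assumptions following that notation); the smoothing pass is omitted.

import numpy as np

def kalman_filter(xs, A, C, Gamma, Sigma, mu0, V0):
    """Kalman filtering for z_n = A z_{n-1} + w_n, x_n = C z_n + v_n,
    with w_n ~ N(0, Gamma), v_n ~ N(0, Sigma), and z_1 ~ N(mu0, V0).
    Returns the filtered means and covariances of p(z_n | x_1, ..., x_n)."""
    mu, V = mu0, V0
    means, covs = [], []
    for n, x in enumerate(xs):
        if n > 0:
            mu = A @ mu                      # predict: propagate the previous posterior
            V = A @ V @ A.T + Gamma
        S = C @ V @ C.T + Sigma              # innovation covariance
        K = V @ C.T @ np.linalg.inv(S)       # Kalman gain
        mu = mu + K @ (x - C @ mu)           # update with the new observation
        V = (np.eye(len(mu)) - K @ C) @ V
        means.append(mu)
        covs.append(V)
    return means, covs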

  34. Learning in LDS EM again…

  35. Extensions of HMM and LDS Discrete latent variables: factorial HMMs. Continuous latent variables: switching Kalman filtering (switching state space) models.
