
Maximum Likelihood And Expectation Maximization



Presentation Transcript


  1. Maximum Likelihood And Expectation Maximization Lecture Notes for CMPUT 466/551 Nilanjan Ray

  2. MLE and EM
  • Maximum Likelihood Estimation (MLE) and Expectation Maximization (EM) are two very important tools in Machine Learning.
  • Essentially, you use them to estimate probability distributions in a learning algorithm; we have already seen one such example: in logistic regression we used MLE.
  • We will revisit MLE here and see certain difficulties with it.
  • Then Expectation Maximization (EM) will rescue us.

  3. Probability Density Estimation: Quick Points
  Two different routes:
  • Parametric: provide a parametrized class of density functions.
    Tools: maximum likelihood estimation, Expectation Maximization, sampling techniques, …
  • Non-Parametric: the density is modeled by samples.
    Tools: kernel methods, sampling techniques, …

  4. Revisiting Maximum Likelihood
  The data come from a known probability distribution.
  The probability distribution has some parameters that are unknown to you.
  Example: the data are distributed as a Gaussian, yi ~ N(μ, σ²), so the unknown parameters here are θ = (μ, σ²).
  MLE is a tool that estimates the unknown parameters of the probability distribution from the data.
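
  As a quick, concrete illustration of this slide (not part of the original deck), a minimal Python/NumPy sketch of the Gaussian example; for a Gaussian the MLE has a closed form, namely the sample mean and the (1/N) sample variance:

      import numpy as np

      # Simulate observations y_i ~ N(mu, sigma^2); pretend mu and sigma^2 are unknown
      rng = np.random.default_rng(0)
      y = rng.normal(loc=2.0, scale=1.5, size=1000)

      # Gaussian MLE in closed form
      mu_hat = y.mean()                        # sample mean
      sigma2_hat = ((y - mu_hat) ** 2).mean()  # biased (1/N) sample variance, i.e. np.var(y)

      print(f"mu_hat = {mu_hat:.3f}, sigma2_hat = {sigma2_hat:.3f}")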

  5. MLE: Recapitulation
  • Assume the observation data yi are independent.
  • Form the likelihood.
  • Form the log-likelihood.
  • To find the unknown parameter values, maximize the log-likelihood with respect to the unknown parameters.
  (The corresponding formulas are sketched below.)
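
  The equations on this slide were images and did not survive the transcript; a sketch of the standard forms, in LaTeX notation, for i.i.d. observations y_1, …, y_N with density p(y; θ):

      L(\theta) = \prod_{i=1}^{N} p(y_i; \theta)                           % likelihood
      \ell(\theta) = \log L(\theta) = \sum_{i=1}^{N} \log p(y_i; \theta)   % log-likelihood
      \hat{\theta} = \arg\max_{\theta} \, \ell(\theta)                     % maximum likelihood estimate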

  6. MLE: A Challenging Example
  (Source: Department of Statistics, CMU)
  Observation data: shown as a histogram (figure not reproduced here).
  A mixture model with a latent indicator variable:
  π is the probability with which an observation is chosen from density 2, and (1 − π) is the probability with which it is chosen from density 1.
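
  A sketch of the two-component mixture model being described (the slide's own equations were not captured; the notation below follows the standard textbook treatment this example mirrors, with φ_θ denoting a Gaussian density with parameters θ = (μ, σ²)):

      Y_1 \sim N(\mu_1, \sigma_1^2), \qquad Y_2 \sim N(\mu_2, \sigma_2^2)
      \Delta \in \{0, 1\} \text{ with } \Pr(\Delta = 1) = \pi              % latent indicator variable
      Y = (1 - \Delta)\, Y_1 + \Delta\, Y_2
      g_Y(y) = (1 - \pi)\, \phi_{\theta_1}(y) + \pi\, \phi_{\theta_2}(y)   % mixture density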

  7. MLE: A Challenging Example…
  Maximum likelihood fitting of the parameters requires maximizing the mixture log-likelihood directly, which is challenging to solve numerically (and, of course, analytically too)!
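
  The log-likelihood in question (a sketch; note that the sum over components sits inside the logarithm, which is what makes direct maximization awkward):

      \ell(\theta; Z) = \sum_{i=1}^{N} \log\left[ (1 - \pi)\, \phi_{\theta_1}(y_i) + \pi\, \phi_{\theta_2}(y_i) \right],
      \qquad \theta = (\pi, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)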

  8. Expectation Maximization: A Rescuer
  EM augments the data space: it assumes some latent (missing) data.
  (Source: Department of Statistics, CMU)
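
  A sketch of the augmented (complete-data) log-likelihood, with one latent indicator Δ_i per observation (again reconstructed in the standard notation, since the slide's equations were images):

      \ell_0(\theta; Z, Z^m) = \sum_{i=1}^{N} \left[ (1 - \Delta_i)\log \phi_{\theta_1}(y_i) + \Delta_i \log \phi_{\theta_2}(y_i) \right]
                             + \sum_{i=1}^{N} \left[ (1 - \Delta_i)\log(1 - \pi) + \Delta_i \log \pi \right]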

  9. EM: A Rescuer…
  Maximizing this form of the log-likelihood is now tractable.
  (Recall that we cannot analytically maximize the original log-likelihood.)
  (Source: Department of Statistics, CMU)

  10. EM: The Complete Data Likelihood
  By simple differentiation we obtain closed-form estimates of the parameters (sketched below).
  So maximization of the complete-data likelihood is much easier!
  But how do we get the latent variables?
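
  A sketch of the estimates that simple differentiation gives when the Δ_i are treated as known; in EM the Δ_i are later replaced by their expected values γ̂_i (next slide):

      \hat{\mu}_1 = \frac{\sum_i (1 - \Delta_i)\, y_i}{\sum_i (1 - \Delta_i)}, \qquad
      \hat{\sigma}_1^2 = \frac{\sum_i (1 - \Delta_i)(y_i - \hat{\mu}_1)^2}{\sum_i (1 - \Delta_i)}
      \hat{\mu}_2 = \frac{\sum_i \Delta_i\, y_i}{\sum_i \Delta_i}, \qquad
      \hat{\sigma}_2^2 = \frac{\sum_i \Delta_i (y_i - \hat{\mu}_2)^2}{\sum_i \Delta_i}, \qquad
      \hat{\pi} = \frac{1}{N} \sum_i \Delta_i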

  11. Obtaining Latent Variables
  The latent variables are computed as expected values given the data and the current parameters, by applying Bayes' rule (sketched below).
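
  A sketch of that expectation for the two-component mixture; γ̂_i is often called the responsibility of component 2 for observation i:

      \hat{\gamma}_i = E[\Delta_i \mid \theta, Z] = \Pr(\Delta_i = 1 \mid \theta, y_i)
                     = \frac{\pi\, \phi_{\theta_2}(y_i)}{(1 - \pi)\, \phi_{\theta_1}(y_i) + \pi\, \phi_{\theta_2}(y_i)}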

  12. EM for Two-component Gaussian Mixture
  • Initialize μ1, σ1, μ2, σ2, π.
  • Iterate until convergence:
    – Expectation step: compute the expected values of the latent variables.
    – Maximization step: find the parameters.
  (A runnable sketch of this loop follows.)
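
  A minimal, self-contained Python/NumPy sketch of this procedure (not the author's code; the initialization scheme, convergence test, and variable names are illustrative assumptions):

      import numpy as np

      def em_two_gaussians(y, n_iter=200, tol=1e-8, seed=0):
          """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
          rng = np.random.default_rng(seed)
          mu1, mu2 = rng.choice(y, size=2, replace=False)  # random starting means taken from the data
          var1 = var2 = y.var()                            # start both variances at the overall variance
          pi = 0.5                                         # start with equal mixing probability

          def phi(x, mu, var):
              # Gaussian density with mean mu and variance var
              return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

          ll_old = -np.inf
          for _ in range(n_iter):
              # E-step: responsibilities gamma_i = Pr(Delta_i = 1 | y_i, theta)
              p1 = (1 - pi) * phi(y, mu1, var1)
              p2 = pi * phi(y, mu2, var2)
              gamma = p2 / (p1 + p2)

              # M-step: weighted estimates of the means, variances, and mixing probability
              mu1 = np.sum((1 - gamma) * y) / np.sum(1 - gamma)
              mu2 = np.sum(gamma * y) / np.sum(gamma)
              var1 = np.sum((1 - gamma) * (y - mu1) ** 2) / np.sum(1 - gamma)
              var2 = np.sum(gamma * (y - mu2) ** 2) / np.sum(gamma)
              pi = gamma.mean()

              # Stop when the observed-data log-likelihood stops improving
              ll = np.sum(np.log(p1 + p2))
              if ll - ll_old < tol:
                  break
              ll_old = ll
          return mu1, var1, mu2, var2, pi

      # Example: recover the parameters of a simulated two-component mixture
      rng = np.random.default_rng(1)
      y = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(4.0, 0.8, 200)])
      print(em_two_gaussians(y))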

  13. EM for Mixture of K Gaussians
  • Initialize the mean vectors, covariance matrices, and mixing probabilities: μk, Σk, πk, for k = 1, 2, …, K.
  • Expectation Step: compute the responsibilities.
  • Maximization Step: update the parameters.
  • Iterate the Expectation and Maximization Steps until convergence.
  (The standard update equations are sketched below.)
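
  A sketch of the standard update equations for K components (reconstructed in common textbook notation; x_n are the N observations and γ(z_nk) is the responsibility of component k for x_n):

      % Expectation step
      \gamma(z_{nk}) = \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}

      % Maximization step, with N_k = \sum_{n} \gamma(z_{nk})
      \mu_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n, \qquad
      \Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, (x_n - \mu_k^{\text{new}})(x_n - \mu_k^{\text{new}})^{\mathsf{T}}, \qquad
      \pi_k^{\text{new}} = \frac{N_k}{N}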

  14. EM Algorithm in General
  T = (Z, Zm) is the complete data; we only know Z, while Zm is missing.
  Taking logarithms of the resulting factorization splits the observed-data log-likelihood into a complete-data term and a missing-data term.
  Because we have access to the previous parameter values θ, we can do better: take expectations conditional on the observed data Z and θ.
  Let us now consider the resulting expression, Q(θ′, θ) − R(θ′, θ). It can be shown that R(θ′, θ) ≤ R(θ, θ); this is actually done by Jensen's inequality.
  Thus, if θ′ maximizes Q(θ′, θ), then the observed-data log-likelihood does not decrease.
  (The derivation is sketched below.)
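
  The equations on this slide were not captured; the following is a sketch of the standard derivation in the same notation (T = (Z, Z^m) is the complete data, θ the current parameters, θ′ a candidate):

      \Pr(Z; \theta') = \frac{\Pr(T; \theta')}{\Pr(Z^m \mid Z; \theta')}
      % Taking logarithms:
      \ell(\theta'; Z) = \ell_0(\theta'; T) - \ell_1(\theta'; Z^m \mid Z)
      % Take expectations conditional on the observed data Z and the previous parameters \theta:
      \ell(\theta'; Z) = E\!\left[\ell_0(\theta'; T) \mid Z, \theta\right] - E\!\left[\ell_1(\theta'; Z^m \mid Z) \mid Z, \theta\right]
                       \equiv Q(\theta', \theta) - R(\theta', \theta)
      % Jensen's inequality gives R(\theta', \theta) \le R(\theta, \theta), hence
      \ell(\theta'; Z) - \ell(\theta; Z) \;\ge\; Q(\theta', \theta) - Q(\theta, \theta)
      % so if \theta' maximizes Q(\theta', \theta), the observed-data log-likelihood cannot decrease.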

  15. EM Algorithm in General
  • Start with initial parameter values θ(0); set t = 1.
  • Expectation step: compute Q(θ′, θ(t−1)), the expected complete-data log-likelihood given the observed data and θ(t−1).
  • Maximization step: set θ(t) to the maximizer of Q(θ′, θ(t−1)) over θ′.
  • Set t = t + 1 and iterate.

  16. EM Algorithm: Summary
  • Augment the original data space with latent/hidden/missing data.
  • Frame a suitable probability model for the augmented data space.
  • In the EM iterations, first assume initial values for the parameters.
  • Iterate the Expectation and the Maximization steps:
  • In the Expectation step, find the expected values of the latent variables (here you need to use the current parameter values).
  • In the Maximization step, first plug the expected values of the latent variables into the log-likelihood of the augmented data, then maximize this log-likelihood to re-estimate the parameters.
  • Iterate the last two steps until convergence.

  17. Applications of EM
  • Mixture models
  • HMMs
  • PCA
  • Latent variable models
  • Missing data problems
  • Many computer vision problems
  • …

  18. References
  • The EM Algorithm and Extensions by Geoffrey J. McLachlan and Thriyambakam Krishnan.
  • For a non-parametric density estimate by EM, see: http://bioinformatics.uchc.edu/LectureNotes_2006/Tools_EM_SA_2006_files/frame.htm

  19. EM: Important Issues
  • Is the convergence of the algorithm guaranteed?
  • Does the outcome of EM depend on the initial choice of the parameter values?
  • How about the speed of convergence?
  • How easy or difficult could it be to compute the expected values of the latent variables?
