
Lecture 18 Expectation Maximization

Presentation Transcript


  1. Lecture 18: Expectation Maximization (Machine Learning)

  2. Last Time • Expectation Maximization • Gaussian Mixture Models

  3. Term Project • Projects may use existing machine learning software • weka, libsvm, liblinear, mallet, crf++, etc. • But you must experiment with • Type of data • Feature representations • A variety of training styles (amount of data, classifiers) • Evaluation

  4. Gaussian Mixture Model • Mixture Models. • How can we combine many probability density functions to fit a more complicated distribution?

  5. Gaussian Mixture Model • Fitting Multimodal Data • Clustering

  6. Gaussian Mixture Model • Expectation Maximization • E-step: assign points to components (compute responsibilities) • M-step: re-estimate model parameters from those assignments
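The E- and M-steps above can be sketched in code. The following is a minimal, illustrative implementation for a two-component 1-D Gaussian mixture; the function name, quartile initialization, and fixed iteration count are our own choices, not from the slides:

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Minimal EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    # Simple initialization: equal weights, quartile means, pooled variance.
    pi = np.array([0.5, 0.5])
    mu = np.quantile(x, [0.25, 0.75])
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances from the soft assignments.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var
```

On well-separated bimodal data this recovers the two component means; in practice one would also monitor the log-likelihood for convergence rather than run a fixed number of iterations.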

  7. Today • EM Proof • Jensen’s Inequality • Clustering sequential data • EM over HMMs

  8. Gaussian Mixture Models

  9. How can we be sure GMM/EM works? • We’ve already seen that there are multiple clustering solutions for the same data • This is a non-convex optimization problem • Can we prove that we’re approaching some maximum, even if many exist?
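One symptom of this non-convexity is label-swap symmetry: relabeling the mixture components gives a distinct parameter vector with exactly the same likelihood, so any maximum comes in at least two copies. A quick numeric check (the helper `gmm_loglik`, the data, and the parameter values are our own illustration):

```python
import numpy as np

def gmm_loglik(x, pi, mu, var):
    """Log-likelihood of 1-D data x under a Gaussian mixture (hypothetical helper)."""
    dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return np.log(dens.sum(axis=1)).sum()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200)

# The same mixture with its two components' labels swapped:
a = gmm_loglik(x, np.array([0.3, 0.7]), np.array([-1.0, 2.0]), np.array([1.0, 4.0]))
b = gmm_loglik(x, np.array([0.7, 0.3]), np.array([2.0, -1.0]), np.array([4.0, 1.0]))
# a == b: two distinct parameter vectors attain the same likelihood,
# so the likelihood surface cannot have a single isolated optimum.
```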

  10. Bound maximization • Since we can’t optimize the GMM parameters directly, maybe we can find the maximum of a lower bound • Technically: optimize a concave lower bound of the initial non-convex function

  11. EM as a bound maximization problem • Need to define a function Q(x,Θ) such that • Q(x,Θ) ≤ l(x,Θ) for all x,Θ • Q(x,Θ) = l(x,Θ) at a single point (the current estimate Θt) • Q(x,Θ) is concave

  12. EM as bound maximization • Claim: for the GMM likelihood, the expected complete-data log-likelihood Q is a concave lower bound, and maximizing it yields the GMM MLE update

  13. EM Correctness Proof • Prove that l(x,Θ) ≥ Q(x,Θ). Start from the likelihood function, introduce the hidden variable z (the mixture assignments in a GMM), multiply and divide by the posterior at a fixed value Θt, and apply Jensen’s inequality (coming soon…):

$$ l(x,\Theta) = \log p(x \mid \Theta) = \log \sum_z p(x, z \mid \Theta) = \log \sum_z p(z \mid x, \Theta^t)\,\frac{p(x, z \mid \Theta)}{p(z \mid x, \Theta^t)} \ge \sum_z p(z \mid x, \Theta^t) \log \frac{p(x, z \mid \Theta)}{p(z \mid x, \Theta^t)} = Q(x, \Theta) $$

  14. EM Correctness Proof • GMM maximum likelihood estimation: maximizing Q(x,Θ) yields the familiar updates, with responsibilities γ(z_nk) = p(z_nk = 1 | x_n, Θt) and N_k = Σ_n γ(z_nk):

$$ \pi_k = \frac{N_k}{N}, \qquad \mu_k = \frac{1}{N_k}\sum_{n} \gamma(z_{nk})\, x_n, \qquad \Sigma_k = \frac{1}{N_k}\sum_{n} \gamma(z_{nk})\,(x_n - \mu_k)(x_n - \mu_k)^{\mathsf{T}} $$

  15. The missing link: Jensen’s Inequality • If f is concave (or “convex down”), then f(E[x]) ≥ E[f(x)] • Incredibly important tool for dealing with mixture models • With f(x) = log(x) and weights λ_i ≥ 0, Σ_i λ_i = 1: log Σ_i λ_i x_i ≥ Σ_i λ_i log x_i
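The inequality is easy to sanity-check numerically for f(x) = log(x); the particular points and weights below are arbitrary:

```python
import numpy as np

# Jensen's inequality for the concave f(x) = log(x):
# the log of a weighted average is at least the weighted average of the logs.
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 10.0, size=5)    # arbitrary positive points
lam = rng.dirichlet(np.ones(5))       # arbitrary convex weights (sum to 1)
lhs = np.log(np.dot(lam, x))          # f(E[x])
rhs = np.dot(lam, np.log(x))          # E[f(x)]
# lhs >= rhs, with equality only when all x_i are equal
```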

  16. Generalizing EM from GMM • Notice that the EM optimization proof never used the exact form of the GMM • It relied only on the introduction of a hidden variable, z • Thus we can generalize EM to broader types of latent variable models

  17. General form of EM • Given a joint distribution p(X, Z | Θ) over observed variables X and latent variables Z • Want to maximize the likelihood p(X | Θ) = Σ_Z p(X, Z | Θ) • Initialize parameters Θ0 • E-step: evaluate the posterior p(Z | X, Θt) • M-step: re-estimate parameters from the expectation of the complete-data log-likelihood, Θt+1 = argmax_Θ Σ_Z p(Z | X, Θt) log p(X, Z | Θ) • Check for convergence of parameters or likelihood
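As a sketch of this recipe, here is a generic EM driver applied to a toy latent-variable model: a mixture of two biased coins with equal mixing weights, where each trial's coin choice is hidden. The driver, the made-up data, and the 10-flips-per-trial setup are our own illustration, not from the slides:

```python
import numpy as np

def em(loglik, e_step, m_step, theta, tol=1e-8, max_iter=200):
    """Generic EM driver: alternate E and M steps until the
    log-likelihood stops improving."""
    prev = loglik(theta)
    for _ in range(max_iter):
        q = e_step(theta)            # posterior over latent variables, given theta
        theta = m_step(q)            # maximize expected complete-data log-likelihood
        cur = loglik(theta)
        if abs(cur - prev) < tol:    # convergence check on the likelihood
            break
        prev = cur
    return theta

# Toy model: each trial is 10 flips of one of two biased coins; which coin
# was used is the hidden variable. Mixing weights are fixed at 1/2 and the
# binomial coefficient is dropped (it is constant across components).
flips = np.array([9, 8, 1, 2, 9, 1])   # heads per trial (made-up data)
n = 10

def loglik(theta):
    p = np.asarray(theta)
    lik = 0.5 * p ** flips[:, None] * (1 - p) ** (n - flips[:, None])
    return np.log(lik.sum(axis=1)).sum()

def e_step(theta):
    p = np.asarray(theta)
    lik = p ** flips[:, None] * (1 - p) ** (n - flips[:, None])
    return lik / lik.sum(axis=1, keepdims=True)   # responsibilities

def m_step(q):
    # Weighted fraction of heads attributed to each coin.
    return (q * flips[:, None]).sum(axis=0) / (q.sum(axis=0) * n)

theta = em(loglik, e_step, m_step, (0.6, 0.4))   # converges near 26/30 and 4/30
```

Nothing in `em` knows the model is a coin mixture; only `e_step`, `m_step`, and `loglik` do, which is exactly the generality the slide is after.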

  18. Applying EM to Graphical Models • Now we have a general form for learning parameters for latent variables • Take a guess • Expectation: evaluate the likelihood • Maximization: re-estimate parameters • Check for convergence

  19. Clustering over sequential data • HMMs • What if you believe the data is sequential, but you can’t observe the state?

  20. Training latent variables in Graphical Models • Now consider a general Graphical Model with latent variables.

  21. EM on Latent Variable Models • Guess: easy, just assign random values to the parameters • E-step: evaluate the likelihood • We can use the JTA (junction tree algorithm) to evaluate the likelihood • And marginalize to get expected parameter values • M-step: re-estimate parameters • Based on the form of the model, generate new expected parameters • (CPTs or parameters of continuous distributions) • Depending on the topology, this can be slow
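For HMMs specifically, the E-step's likelihood evaluation is done with the forward algorithm, the sequential analogue of the JTA evaluation the slide mentions. A minimal scaled version for a discrete-output HMM (the function name and argument layout are our own):

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Forward algorithm: log p(obs) for a discrete-output HMM.
    pi[i] = initial state probs, A[i, j] = transition probs,
    B[i, k] = prob of emitting symbol k from state i.
    Rescales alpha at every step to avoid numeric underflow."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then weight by emission
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return loglik
```

This runs in O(T·S²) time for T observations and S states, versus the Sᵀ cost of summing over all state paths directly; Baum-Welch (EM for HMMs) adds a matching backward pass to get the expected counts for the M-step.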

  22. Break
