
Gaussian Mixture Models


Presentation Transcript


  1. Speech and Image Processing Unit, Department of Computer Science, University of Joensuu, FINLAND. Gaussian Mixture Models. Clustering Methods: Part 8. Ville Hautamäki.

  2. Preliminaries • We assume that the dataset X has been generated by a parametric distribution p(X). • Estimation of the parameters of p is known as density estimation. • We consider the Gaussian distribution. Figures taken from: http://research.microsoft.com/~cmbishop/PRML/

  3. Typical parameters (1) • Mean (μ): the average value of p(X), also called the expectation. • Variance (σ²): a measure of the variability of p(X) around the mean.
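
For reference, the standard textbook definitions of these two quantities (added here; not part of the original slide) are, in LaTeX notation:

    \mu = \mathbb{E}[x], \qquad \sigma^2 = \mathbb{E}\!\left[(x - \mu)^2\right]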

  4. Typical parameters (2) • Covariance: measures how much two variables vary together. • Covariance matrix: the collection of covariances between all dimensions. • The diagonal of the covariance matrix contains the variance of each attribute.
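
A minimal numpy sketch of this idea (the toy dataset is illustrative, not from the slides): the covariance matrix is computed over all attribute pairs, and its diagonal holds the per-attribute variances.

    import numpy as np

    # Toy 2-D dataset: rows are observations, columns are attributes.
    X = np.array([[1.0, 2.0],
                  [2.0, 4.1],
                  [3.0, 5.9],
                  [4.0, 8.2]])

    cov = np.cov(X, rowvar=False)   # D x D covariance matrix
    variances = np.diag(cov)        # variances of each attribute
    print(cov)
    print(variances)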

  5. One-dimensional Gaussian • The parameters to be estimated are the mean (μ) and the variance (σ²).
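
The density itself appears on the slide as an image; in LaTeX notation the one-dimensional Gaussian is

    \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)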

  6. Multivariate Gaussian (1) • In the multivariate case we have a covariance matrix (Σ) instead of a single variance.
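
For a D-dimensional observation x, the multivariate Gaussian density (shown as an image on the slide) is

    \mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (x-\mu)^{\mathsf T} \Sigma^{-1} (x-\mu) \right)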

  7. Multivariate Gaussian (2) • Covariance structures (figure panels): single (spherical), diagonal, and full covariance. • Complete data log likelihood (equation shown on the slide; a reconstruction follows below).
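
The equation referred to above is an image on the slide; for a single Gaussian the standard complete data log likelihood is

    \ln p(X \mid \mu, \Sigma) = \sum_{n=1}^{N} \ln \mathcal{N}(x_n \mid \mu, \Sigma)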

  8. Maximum Likelihood (ML) parameter estimation • Maximize the log likelihood formulation. • Setting the gradient of the complete data log likelihood to zero, we can find the closed-form solution. • For the mean, this is the sample average.
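
Written out (the standard ML results, matching the bullet about the sample average):

    \mu_{\text{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad
    \Sigma_{\text{ML}} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{\text{ML}})(x_n - \mu_{\text{ML}})^{\mathsf T}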

  9. When one Gaussian is not enough • Real-world datasets are rarely unimodal!

  10. Mixtures of Gaussians

  11. Mixtures of Gaussians (2) • In addition to the mean and covariance parameters (now one set per component, M in total), we have mixing coefficients πk. • The following properties hold for the mixing coefficients (see below). • πk can be seen as the prior probability of component k.
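
The properties referred to above are shown as an image on the slide; the standard mixture-model constraints and the resulting mixture density are

    0 \le \pi_k \le 1, \qquad \sum_{k=1}^{M} \pi_k = 1, \qquad
    p(x) = \sum_{k=1}^{M} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)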

  12. Responsibilities (1) • Figure panels: complete data, incomplete data, responsibilities. • Component labels (red, green and blue) cannot be observed. • We have to calculate approximations (responsibilities).

  13. Responsibilities (2) • The responsibility describes how probable it is that observation vector x was generated by component k. • In clustering, the responsibilities take only the values 0 and 1, and thus define a hard partitioning.

  14. Responsibilities (3) We can express the marginal density p(x) as a sum over the components. From this, we can find the responsibility of the kth component for x using Bayes' theorem (both formulas are reconstructed below).
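
In the standard notation (the slide shows these as images):

    p(x) = \sum_{j=1}^{M} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j), \qquad
    \gamma_k(x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{M} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}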

  15. Expectation Maximization (EM) • Goal: Maximize the log likelihood of the whole data (see below). • Once the responsibilities are calculated, we can maximize individually with respect to the means, covariances and mixing coefficients!
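
The objective referred to above, written out in the standard form:

    \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{M} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)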

  16. Exact update equations • New mean estimates, covariance estimates and mixing coefficient estimates (equations shown on the slide; a standard reconstruction follows below).
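
A standard reconstruction of these M-step updates (Bishop's notation; the originals are images on the slide):

    N_k = \sum_{n=1}^{N} \gamma(z_{nk}), \qquad
    \mu_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n

    \Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, (x_n - \mu_k^{\text{new}})(x_n - \mu_k^{\text{new}})^{\mathsf T}, \qquad
    \pi_k^{\text{new}} = \frac{N_k}{N}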

  17. EM Algorithm • Initialize parameters. • While not converged: • E step: Calculate responsibilities. • M step: Estimate new parameters. • Calculate the log likelihood of the new parameters. (A code sketch of this loop follows below.)
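
A minimal Python/numpy sketch of this loop, assuming full covariance matrices and using scipy.stats.multivariate_normal for the component densities. The function name em_gmm and its initialization scheme are illustrative choices, not the original implementation from the lecture.

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_gmm(X, M, max_iter=100, tol=1e-6, seed=0):
        # X: (N, D) data matrix, M: number of Gaussian components.
        rng = np.random.default_rng(seed)
        N, D = X.shape
        # Initialization: M random data points as means, the global covariance
        # for every component, and uniform mixing coefficients.
        means = X[rng.choice(N, size=M, replace=False)]
        covs = np.array([np.cov(X, rowvar=False) for _ in range(M)])
        pis = np.full(M, 1.0 / M)
        prev_ll = -np.inf
        for _ in range(max_iter):
            # E step: responsibilities gamma[n, k] ~ pi_k * N(x_n | mu_k, Sigma_k).
            dens = np.column_stack([
                pis[k] * multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
                for k in range(M)])
            gamma = dens / dens.sum(axis=1, keepdims=True)
            # M step: re-estimate means, covariances and mixing coefficients.
            Nk = gamma.sum(axis=0)
            means = (gamma.T @ X) / Nk[:, None]
            for k in range(M):
                diff = X - means[k]
                covs[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
                covs[k] += 1e-6 * np.eye(D)  # small ridge to avoid singular covariances
            pis = Nk / N
            # Log likelihood under the current parameters; stop when the gain is small.
            ll = np.log(dens.sum(axis=1)).sum()
            if ll - prev_ll < tol:
                break
            prev_ll = ll
        return pis, means, covs

For example, pis, means, covs = em_gmm(X, M=3) fits a three-component mixture to an (N, D) data matrix X.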

  18. Example of EM

  19. Computational complexity • Hard clustering with the MSE criterion is NP-complete. • Can we find the optimal GMM in polynomial time? • Finding the optimal GMM is in the class NP.

  20. Some insights • In a GMM we need to estimate the parameters, which are all real numbers. • Number of parameters (full covariance matrices): M + M·D + M·D(D+1)/2. • Hard clustering has no real-valued parameters, just a set partitioning (remember the optimality criteria!).
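
As a quick check of this count (the arithmetic is added here, not on the slide): with M = 3 components in D = 2 dimensions, a full-covariance GMM has 3 + 3·2 + 3·(2·3/2) = 3 + 6 + 9 = 18 real-valued parameters.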

  21. Some further insights (2) • Both optimization functions are mathematically rigorous! • Solutions minimizing MSE are always meaningful. • Maximization of the log likelihood might lead to a singularity (a component that collapses onto a single data point drives its variance towards zero and the likelihood towards infinity)!

  22. Example of singularity
