
Bayesian Rule & Gaussian Mixture Models


Presentation Transcript


  1. Bayesian Rule & Gaussian Mixture Models Jianping Fan Dept of Computer Science UNC-Charlotte

  2. Basic Classification: a new mail arrives and must be classified as Spam or Not-spam.

  3. What have we learned from K-Means clustering? (Figure: clustered data with a test sample to be assigned.)

  4. Difference Between Classification and Clustering: classification works with labeled data, while clustering works with unlabeled data.

  5. What are the critical issues for classification? (Example: deciding whether a new mail is Spam or Not-spam.)

  6. Basic Classification. Spam filtering: input is an email (e.g. "!!!!$$$!!!!"), output is binary (Spam vs. Not-Spam). Quantities needed: the spam and not-spam email priors, and the spam and not-spam email distributions/densities.

  7. Basic Classification. Spam filtering: input is an email (e.g. "!!!!$$$!!!!"), output is binary (Spam vs. Not-Spam).

  8. Basic Classification. Character recognition: input is a character image (e.g. "C"), output is multi-class ("C" vs. the other 25 characters).

  9. Structured Classification. Handwriting recognition: input is a handwritten word (e.g. "brace"), output is structured (a sequence of characters, a graph model). 3D object recognition: input is an image (e.g. a building), output is a structured model (a tree).

  10. Overview of Bayesian Decision • Bayesian classification, one example: deciding whether a patient is sick or healthy, based on • a probabilistic model of the observed data (data distributions) • prior knowledge (ratio or importance): the majority group has a bigger voice

  11. Overview of Bayesian Decision. For a test patient: (a) group assignment for the test patient; (b) prior knowledge about the assigned group; (c) properties of the assigned group (sick or healthy).

  12. Overview of Bayesian Decision. Test patient. Observation: the Bayesian decision process is a data modeling process, e.g., estimating the data distributions. Compared with K-means clustering: is there any relationship or difference?

  13. Bayes’ Rule: who is who in Bayes’ rule (posterior, likelihood, prior, and the normalization term).
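
The slide’s formula is not reproduced in the transcript; in the notation of the following slides (d the observed data, h the group/class), Bayes’ rule reads:

  $$ P(h \mid d) = \frac{P(d \mid h)\,P(h)}{P(d)}, \qquad P(d) = \sum_{h'} P(d \mid h')\,P(h') $$

where P(h | d) is the posterior, P(d | h) the likelihood, P(h) the prior, and P(d) the normalization term.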

  14. Classification problem • Training data: examples of the form (d, h(d)) • where d are the data objects to classify (inputs) • and h(d) is the correct class info for d, h(d) ∈ {1, …, K} • Goal: given d_new, provide h(d_new)

  15. Why Bayesian? • Provides practical learning algorithms • E.g. Naïve Bayes • Prior knowledge and observed data can be combined • It is a generative (model based) approach, which offers a useful conceptual framework • E.g. sequences could also be classified, based on a probabilistic model specification • Any kind of objects can be classified, based on a probabilistic model specification

  16. Bayes’ Rule: P(d|h) is the data distribution of group h; P(h) is the importance (prior) of group h; P(d) is the data distribution of the whole data set.

  17. Bayes’ Rule: work needed to support a Bayesian decision: • estimating the data distribution of the whole data set • estimating the data distribution of each specific group • prior knowledge about each specific group

  18. Gaussian Mixture Model (GMM) How to estimate the distribution for a given dataset?

  19. Gaussian Mixture Model (GMM)

  20. Gaussian Mixture Model (GMM) What does a GMM mean? By analogy: 100 = 5*10 + 2*20 + 2*5, or 100 = 100*1, or 100 = 10*10, and so on; the same quantity can be decomposed in many different ways. Likewise, GMM may prefer a larger K with "smaller" Gaussians.

  21. Gaussian Mixture Model (GMM)

  22. Gaussian Mixture Model (GMM) Real Data Distribution Any complex function (distribution) can be approximated by using a limited number of other functions (distributions) such as Gaussian functions.

  23. Gaussian Mixture Model (GMM) Real Data Distribution Approximated Gaussian functions

  24. Gaussian Function. Why do we select the Gaussian function rather than other functions?

  25. When one Gaussian function is used to approximate the data distribution: a likelihood function (often simply the likelihood) is a function of the parameters of a statistical model. Given a dataset, how do we use the likelihood function to determine the Gaussian parameters?

  26. Matching between the Gaussian function and the samples (sampling).

  27. Maximum Likelihood. Sampling: we want to maximize the likelihood; given the data x, it is a function of μ and σ².
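
For i.i.d. samples x_1, …, x_N drawn from a single Gaussian, the likelihood referred to here (its formula is not reproduced in the transcript) is:

  $$ L(\mu, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \sigma^2) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_n-\mu)^2}{2\sigma^2}\right) $$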

  28. Log-Likelihood Function: maximize this instead. Setting its derivatives with respect to μ and σ² to zero gives the maximum-likelihood estimates.
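
The standard derivation the slide alludes to (formulas not reproduced in the transcript):

  $$ \ln L(\mu, \sigma^2) = -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=1}^{N}(x_n-\mu)^2 $$

and setting \(\partial \ln L / \partial \mu = 0\) and \(\partial \ln L / \partial \sigma^2 = 0\) yields

  $$ \hat{\mu} = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{n=1}^{N}(x_n - \hat{\mu})^2 $$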

  29. Max. the Log-Likelihood Function

  30. Max. the Log-Likelihood Function

  31. When multiple Gaussian functions are used to approximate the data distribution: the likelihood is again a function of the parameters of the statistical model. Given a dataset, how do we use the likelihood function to determine the parameters of multiple Gaussian functions?

  32. Gaussian Mixture Model (GMM). Training set: the observed samples. Parameters to be estimated: the mixing weights, means, and covariances of the K Gaussians. We assume K is available (or pre-defined); otherwise, algorithms may always prefer a larger K!

  33. Gaussian Mixture Models • Rather than identifying clusters by “nearest” centroids • fit a set of K Gaussians to the data • via maximum likelihood over a mixture model

  34. GMM example

  35. Mixture Models • Formally, a mixture model is the weighted sum of a number of pdfs, where the weights are determined by a distribution:
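
In standard notation (the slide’s formula is not reproduced in the transcript), with mixing weights \(\pi_k\):

  $$ p(x) = \sum_{k=1}^{K} \pi_k\, p_k(x), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1 $$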

  36. Gaussian Mixture Models • GMM: the weighted sum of a number of Gaussians, where the weights are determined by a distribution:
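
That is (standard form, not reproduced in the transcript):

  $$ p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k) $$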

  37. Maximum Likelihood over a GMM • As usual: identify a likelihood function • and maximize its log-likelihood:
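
For a training set X = {x_1, …, x_N}, the log-likelihood being maximized is:

  $$ \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) $$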

  38. Maximum Likelihood of a GMM • Optimization of the means, k = 1, …, K:
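
Setting the derivative of the log-likelihood with respect to \(\mu_k\) to zero gives the standard result (the slide’s formula is not reproduced in the transcript), where \(\gamma(z_{nk})\) is the responsibility of component k for sample x_n (defined on the E-step slide below) and \(N_k = \sum_n \gamma(z_{nk})\):

  $$ \mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n $$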

  39. Maximum Likelihood of a GMM • Optimization of the covariances, k = 1, …, K:
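
Similarly, for the covariances (standard result):

  $$ \Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\,(x_n - \mu_k)(x_n - \mu_k)^{\mathsf{T}} $$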

  40. Maximum Likelihood of a GMM • Optimization of mixing term
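
Maximizing with respect to the mixing weights, subject to the constraint that they sum to one (e.g. via a Lagrange multiplier), gives the standard result:

  $$ \pi_k = \frac{N_k}{N} $$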

  41. EM for GMMs • Initialize the parameters • Evaluate the log likelihood • Expectation-step: Evaluate the responsibilities • Maximization-step: Re-estimate Parameters • Evaluate the log likelihood • Check for convergence

  42. EM for GMMs • E-step: Evaluate the Responsibilities
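
In standard notation (the slide’s formula is not reproduced in the transcript):

  $$ \gamma(z_{nk}) = \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} $$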

  43. EM for GMMs • M-Step: Re-estimate Parameters

  44. EM for GMMs • Evaluate the log likelihood and check for convergence of either the parameters or the log likelihood. If the convergence criterion is not satisfied, return to E-Step.
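
A minimal Python sketch of the EM loop described on slides 41–44, for illustration only: the function name fit_gmm and the use of NumPy/SciPy are my own choices, not from the slides, and a practical implementation would compute the densities in log space for numerical stability.

import numpy as np
from scipy.stats import multivariate_normal

def fit_gmm(X, K, n_iters=100, tol=1e-6, seed=0):
    """Fit a K-component GMM to X (N x D) with EM. Illustrative sketch."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initialize: random means drawn from the data, identity covariances, uniform weights
    mu = X[rng.choice(N, size=K, replace=False)].astype(float)
    Sigma = np.stack([np.eye(D) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iters):
        # E-step: responsibilities gamma[n, k] = pi_k N(x_n | mu_k, Sigma_k) / sum_j pi_j N(x_n | mu_j, Sigma_j)
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
            for k in range(K)
        ])
        ll = np.sum(np.log(dens.sum(axis=1)))            # evaluate the log-likelihood
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, covariances, and mixing weights
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
            Sigma[k] += 1e-6 * np.eye(D)                 # keep covariances well-conditioned
        pi = Nk / N
        # Check convergence of the log-likelihood; otherwise return to the E-step
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, mu, Sigma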

  45. Visual example of EM

  46. Incorrect Number of Gaussians

  47. Incorrect Number of Gaussians

  48. Too many parameters • For n-dimensional data, we need to estimate at least 3n parameters

  49. Relationship to K-means • K-means makes hard decisions. • Each data point gets assigned to a single cluster. • GMM/EM makes soft decisions. • Each data point can yield a posterior p(z|x) • Soft K-means is a special case of EM.

  50. Soft K-means as GMM/EM • Assume equal covariance matrices ε·I for every mixture component • Likelihood: • Responsibilities: • As ε approaches zero, the responsibility of the nearest mean approaches unity.
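
With the shared covariance ε·I assumed on this slide, the responsibility reduces to (standard result, the slide’s formulas are not reproduced in the transcript):

  $$ \gamma(z_{nk}) = \frac{\pi_k \exp\!\left(-\|x_n - \mu_k\|^2 / 2\epsilon\right)}{\sum_{j} \pi_j \exp\!\left(-\|x_n - \mu_j\|^2 / 2\epsilon\right)} $$

so as ε → 0 the responsibility of the nearest mean goes to one and all others go to zero, recovering the hard assignments of K-means.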
