
Generative classifiers: The Gaussian classifier


Presentation Transcript


  1. Generative classifiers: The Gaussian classifier Ata Kaban School of Computer Science University of Birmingham

  2. Outline • We have already seen how Bayes rule can be turned into a classifier • In all our examples so far we had discrete valued attributes (e.g. in {‘sunny’,’rainy’}, {+,-}) • Today we learn how to do this when the data attributes are continuous valued

  3. Example • Task: predict the gender of individuals based on their heights • Given: 100 height examples of women, 100 height examples of men • [Figure: overlaid histograms of the empirical height data for males and females; x-axis: Height (meters), y-axis: Frequency]
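A minimal sketch of a data set shaped like the slide's example (100 heights per class). The class means and spreads below are made-up placeholders for illustration, not the empirical values behind the slide's histogram.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder values only: these means/spreads are assumptions for this sketch,
# not the empirical data shown on the slide.
heights_female = rng.normal(loc=1.65, scale=0.07, size=100)  # 100 heights in metres
heights_male   = rng.normal(loc=1.78, scale=0.08, size=100)  # 100 heights in metres

x = np.concatenate([heights_female, heights_male])              # measurements
h = np.concatenate([np.zeros(100), np.ones(100)]).astype(int)   # 0 = female, 1 = male
```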

  4. Class priors • We can encode the values of the hypothesis (class) as 1 (male) and 0 (female), so h ∈ {0,1}. • Since in this example we had the same number of males and females, we have P(h=1)=P(h=0)=0.5. These are the prior probabilities of class membership because they can be set before measuring any data. • Note that in cases when the class proportions are imbalanced, we can use the priors to make predictions even before seeing any data.
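A small sketch of how the priors fall out of the class proportions; the label array h (1 = male, 0 = female) is a hypothetical stand-in for the training labels.

```python
import numpy as np

# Hypothetical labels: 100 males and 100 females, as in the lecture's example.
h = np.concatenate([np.ones(100, dtype=int), np.zeros(100, dtype=int)])

prior_male   = np.mean(h == 1)    # estimate of P(h=1)
prior_female = np.mean(h == 0)    # estimate of P(h=0)
print(prior_male, prior_female)   # 0.5 0.5 for balanced classes
```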

  5. Class-conditional likelihood • Our measurements are heights. This is our data, x. • Class-conditional likelihoods: p(x|h=1) is the probability (density) that a male has height x meters; p(x|h=0) is the probability (density) that a female has height x meters.

  6. Class posterior • As before, from Bayes rule we can obtain the class posteriors: P(h=1|x) = p(x|h=1)P(h=1) / p(x), where the denominator p(x) = p(x|h=1)P(h=1) + p(x|h=0)P(h=0) is the probability of measuring the height value x irrespective of the class. • If we can compute this then we can use it for predicting the gender from the height measurement.
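A sketch of the posterior computation from Bayes rule. The functions lik_male and lik_female are placeholder names for the class-conditional densities p(x|h=1) and p(x|h=0), which later slides model as Gaussians; they are not names used in the lecture.

```python
def posterior_male(x, lik_male, lik_female, prior_male=0.5):
    """P(h=1 | x) via Bayes rule for the two-class height example.

    lik_male(x) should return p(x | h=1) and lik_female(x) should return
    p(x | h=0). The denominator is p(x): the probability of measuring the
    height value x irrespective of the class.
    """
    prior_female = 1.0 - prior_male
    numerator = lik_male(x) * prior_male
    evidence = numerator + lik_female(x) * prior_female   # p(x)
    return numerator / evidence
```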

  7. Discriminant function • When does our prediction switch from predicting h=0 to predicting h=1? • … more precisely: at what value of x does P(h=1|x) become equal to P(h=0|x)? • [Figure: the empirical height histograms for male and female again; x-axis: Height (meters), y-axis: Frequency]

  8. Discriminant function • If we make a measurement x, we compute the posteriors P(h=1|x) and P(h=0|x). • If we find P(h=1|x) > P(h=0|x), we decide to predict h=1, i.e. male. • If for another measurement we find P(h=0|x) > P(h=1|x), we predict h=0, i.e. female.

  9. Discriminant function • We can define a discriminant function as the ratio of the two posteriors, f1(x) = P(h=1|x) / P(h=0|x), and compare the function value to 1. • More convenient to have the switching at 0 rather than at 1. Define the discriminant function as the log of f1: f(x) = log f1(x) = log P(h=1|x) − log P(h=0|x). • Then the sign of this function defines the prediction (if f(x)>0 => male, if f(x)<0 => female).
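A minimal sketch of the log discriminant and the sign-based decision it implies. Again, lik_male and lik_female are placeholder likelihood functions, to be filled in with the Gaussian densities of the following slides; the evidence p(x) cancels in the ratio, so only likelihoods and priors are needed.

```python
import numpy as np

def log_discriminant(x, lik_male, lik_female, prior_male=0.5):
    """f(x) = log P(h=1|x) - log P(h=0|x); the evidence p(x) cancels in the ratio."""
    prior_female = 1.0 - prior_male
    return (np.log(lik_male(x)) + np.log(prior_male)
            - np.log(lik_female(x)) - np.log(prior_female))

def predict(x, lik_male, lik_female, prior_male=0.5):
    # f(x) > 0 => predict male (h=1); f(x) < 0 => predict female (h=0)
    return 1 if log_discriminant(x, lik_male, lik_female, prior_male) > 0 else 0
```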

  10. How do we compute it? • Let's write it out using Bayes rule: f(x) = log p(x|h=1) + log P(h=1) − log p(x|h=0) − log P(h=0); the evidence p(x) cancels in the ratio. • Now, we need the class-conditional likelihood terms, p(x|h=1) and p(x|h=0). Note that x now takes continuous real values. • We will model each class by a Gaussian distribution. (Note, there are other ways to do it; this is a generic problem that Density Estimation deals with. Here we consider the specific case of using a Gaussian, which is fairly commonly done in practice.)

  11. Illustration – our 1D example • [Figure: empirical height data for male and female with the fitted Gaussian distribution for each class; x-axis: Height (meters), y-axis: Frequency]

  12. Gaussian - univariate • p(x|h=k) = 1/√(2π σ_k²) · exp( −(x−μ_k)² / (2σ_k²) ), where μ_k is the mean (center) and σ_k² is the variance (spread). These are the parameters that describe the distributions. • We will have a separate Gaussian for each class. So, the female class will have μ_0 as its mean and σ_0² as its variance; the male class will have μ_1 as its mean and σ_1² as its variance. • We need to estimate these parameters from the data.
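A sketch of the univariate Gaussian density and the per-class fit. The height arrays are again illustrative placeholders rather than the lecture's data.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Illustrative per-class data (assumed values, not the lecture's).
rng = np.random.default_rng(0)
heights_female = rng.normal(1.65, 0.07, size=100)
heights_male   = rng.normal(1.78, 0.08, size=100)

# One Gaussian per class: estimate that class's mean and variance from its own data.
mu_f, var_f = heights_female.mean(), heights_female.var()
mu_m, var_m = heights_male.mean(), heights_male.var()

print(gaussian_pdf(1.60, mu_f, var_f))   # p(x = 1.60 | female)
print(gaussian_pdf(1.60, mu_m, var_m))   # p(x = 1.60 | male)
```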

  13. Gaussian - multivariate • Let x ∈ R^d, so x has d attributes, and let k ∈ {0,1}. Then p(x|h=k) = 1/( (2π)^(d/2) |Σ_k|^(1/2) ) · exp( −½ (x−μ_k)ᵀ Σ_k⁻¹ (x−μ_k) ), where μ_k are the mean vectors and Σ_k are the covariance matrices. These are the parameters that describe the distributions, and they are estimated from the data.
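A sketch of evaluating the multivariate density for one class k; the mean vector and covariance matrix below are assumed example values.

```python
import numpy as np

def multivariate_gaussian_pdf(x, mu, Sigma):
    """p(x | h=k) for x in R^d with mean vector mu and covariance matrix Sigma."""
    d = len(mu)
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(Sigma))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

# Example with d = 2 attributes and assumed parameters (e.g. height in m, weight in kg).
mu_k = np.array([1.70, 65.0])
Sigma_k = np.array([[0.01, 0.3],
                    [0.3, 25.0]])          # symmetric, positive definite
print(multivariate_gaussian_pdf(np.array([1.75, 70.0]), mu_k, Sigma_k))
```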

  14. Gaussian - multivariate

  15. 2D example with 2 classes • [Figure: scatter plot of the two classes in two dimensions; x-axis: Attribute 1, y-axis: Attribute 2]
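A sketch of the picture on this slide: two synthetic 2D classes (the cluster parameters are assumptions), a full-covariance Gaussian fitted to each, and a test point classified by comparing p(x|h=k)P(h=k). It uses scipy.stats.multivariate_normal for the density.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)

# Two synthetic classes in (attribute 1, attribute 2); parameters are illustrative.
X0 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=100)
X1 = rng.multivariate_normal([2.5, 2.0], [[1.0, -0.2], [-0.2, 1.5]], size=100)

# Fit one Gaussian per class (full covariance) and assume equal priors.
dens0 = multivariate_normal(mean=X0.mean(axis=0), cov=np.cov(X0, rowvar=False))
dens1 = multivariate_normal(mean=X1.mean(axis=0), cov=np.cov(X1, rowvar=False))

x_test = np.array([1.5, 1.0])
scores = [0.5 * dens0.pdf(x_test), 0.5 * dens1.pdf(x_test)]   # p(x|h=k) * P(h=k)
print("predicted class:", int(np.argmax(scores)))
```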

  16. Naïve Bayes • Notice the full covariance matrices are d×d, so each contains on the order of d² parameters to estimate. • In many situations there is not enough data to estimate the full covariance – e.g. when d is large. • The Naïve Bayes assumption is again an easy simplification that we can make, and it tends to work well in practice. In the Gaussian model it means that the covariance matrix is diagonal. • For the brave: Check this last statement for yourself! – 3% extra credit if you hand in a correct solution to me before next Thursday’s class!
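A sketch of what the Naïve Bayes simplification looks like in the Gaussian model: each attribute gets its own per-class mean and variance, so the class-conditional log density is a sum of univariate Gaussian terms (equivalently, a diagonal covariance). This only illustrates the simplification; it is not the extra-credit derivation.

```python
import numpy as np

def fit_naive_bayes_gaussian(X):
    """Per-attribute mean and variance for one class (i.e. a diagonal covariance)."""
    return X.mean(axis=0), X.var(axis=0)

def log_likelihood_diag(x, mu, var):
    """log p(x | class) with independent Gaussians, one per attribute."""
    return np.sum(-0.5 * np.log(2.0 * np.pi * var) - (x - mu) ** 2 / (2.0 * var))

# Assumed example: 100 training points of one class with d = 5 attributes.
rng = np.random.default_rng(2)
X_class = rng.normal(size=(100, 5))
mu, var = fit_naive_bayes_gaussian(X_class)
print(log_likelihood_diag(np.zeros(5), mu, var))
```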

  17. Are we done? • How do we estimate the parameters, i.e. the means and the variance/covariance? • If we use the Naïve Bayes assumption, we can compute the estimates of the mean and variance in each class separately for each feature. • If d is small and you have many points in your training set, then working with the full covariance is expected to work better. • In MatLab there are built-in functions that you can use: mean, cov, var.
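A sketch of the parameter estimates done per class; the NumPy calls below play the role of the MatLab functions mean, var and cov mentioned on the slide, and the class data is an assumed placeholder.

```python
import numpy as np

rng = np.random.default_rng(3)
X_class = rng.normal(size=(100, 3))        # assumed: one class's points, d = 3 attributes

mu_hat = X_class.mean(axis=0)              # mean vector (cf. MatLab mean)
var_hat = X_class.var(axis=0, ddof=1)      # per-attribute variances (cf. var)
Sigma_hat = np.cov(X_class, rowvar=False)  # full d-by-d covariance matrix (cf. cov)

# Under the Naive Bayes assumption, only the diagonal of the covariance is kept:
Sigma_diag = np.diag(var_hat)
```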

  18. Multi-class classification • We may have more than 2 classes – e.g. ‘healthy’, ‘disease type 1’, ‘disease type 2’. • Our Gaussian classifier is easy to use in multi-class problems. • We compute the posterior probability for each of the classes • We predict the class whose posterior probability is highest.
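A sketch of the multi-class rule: evaluate p(x|h=k)P(h=k) for every class and take the argmax (the evidence p(x) is the same for all classes, so it can be ignored). The three classes and their parameters are assumed placeholders standing in for, e.g., ‘healthy’, ‘disease type 1’, ‘disease type 2’.

```python
import numpy as np
from scipy.stats import multivariate_normal

def predict_class(x, means, covs, priors):
    """Return the class k maximising the posterior P(h=k | x)."""
    scores = [multivariate_normal(mean=m, cov=c).pdf(x) * p
              for m, c, p in zip(means, covs, priors)]
    return int(np.argmax(scores))

# Three assumed classes with illustrative parameters.
means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
covs = [np.eye(2), np.eye(2), 2.0 * np.eye(2)]
priors = [0.5, 0.3, 0.2]
print(predict_class(np.array([2.5, 0.5]), means, covs, priors))
```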

  19. Summing up • This type of classifier is called ‘generative’, because it rests on the assumption that the cloud of points in each class can be seen as generated by some distribution, e.g. a Gaussian, and works out its decisions based on estimating these distributions. • One could instead model the discriminant function directly! That type of classifier is called ‘discriminative’. • For the brave: Try to work out the form of the discriminant function by plugging into it the form of the Gaussian class conditional densities. You will get a quadratic function of x in general. When does it reduce to a linear function? • Recommended reading: Rogers & Girolami, Chapter 5.
