
Elements of Pattern Recognition CNS/EE-148 -- Lecture 5 M. Weber P. Perona


Presentation Transcript


  1. Elements of Pattern Recognition, CNS/EE-148 -- Lecture 5, M. Weber, P. Perona

  2. What is Classification? • We want to assign objects to classes based on a selection of attributes (features). • Examples: • (age, income) → {credit worthy, not credit worthy} • (blood cell count, body temp) → {flu, hepatitis B, hepatitis C} • (pixel vector) → {Bill Clinton, coffee cup} • The feature vector can be continuous, discrete, or mixed.

  3. What is Classification? • We want to find a function from measurements to class labels: a decision boundary in the space of feature vectors. • Statistical methods use the joint pdf p(C, x). • Assume p(C, x) is known for now. [Figure: two classes ("Signal 1", "Signal 2") plus noise in the (x1, x2) feature plane.]

  4. Some Terminology • p(C) is called a prior or a priori probability • p(x|C) is called a class-conditional density or likelihood of C with respect to x • p(C|x) is called a posterior or a posteriori probability
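A minimal numerical illustration of these three quantities; the priors, the two 1-D Gaussians, and the measurement below are made-up placeholders, not values from the lecture:

```python
import numpy as np

# Hypothetical two-class problem with one scalar measurement x.
priors = np.array([0.7, 0.3])                      # p(C1), p(C2)

def likelihoods(x):
    """Class-conditional densities p(x|C1), p(x|C2): two 1-D Gaussians (made-up parameters)."""
    means, sigmas = np.array([0.0, 2.0]), np.array([1.0, 1.0])
    return np.exp(-0.5 * ((x - means) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))

x = 1.2
joint = priors * likelihoods(x)                    # p(C, x) = p(C) * p(x|C)
posteriors = joint / joint.sum()                   # p(C|x) by Bayes' rule
print(posteriors)                                  # posterior probabilities for C1, C2 (sum to 1)
```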

  5. Examples • One measurement, symmetric cost, equal priors: a badly placed decision threshold. [Figure: class-conditional densities p(x|C1) and p(x|C2) over x.]

  6. Examples • One measurement, symmetric cost, equal priors: a well-placed decision threshold (at the point where the two class-conditional densities cross). [Figure: p(x|C1) and p(x|C2) over x.]

  7. How to Make the Best Decision? (Bayes Decision Theory) • Define a cost function (loss) for mistakes. • Minimize the expected loss (risk) over the entire p(C, x). • It is sufficient to ensure the optimal decision for each individual x. • Result: decide according to the maximum posterior probability (a small sketch follows below).
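A sketch of the risk-minimizing rule for two classes; the loss matrix and the posterior values are hypothetical numbers chosen for illustration, not from the slides:

```python
import numpy as np

# loss[i, j] = cost of deciding class i when the true class is j (hypothetical values)
loss = np.array([[0.0, 10.0],
                 [1.0,  0.0]])

def decide(posteriors):
    """Pick the action with minimum conditional risk R(a_i | x) = sum_j loss[i, j] * p(C_j | x)."""
    risks = loss @ posteriors
    return np.argmin(risks)

# With the symmetric 0/1 loss this reduces to the maximum-posterior decision.
posteriors = np.array([0.6, 0.4])
print(decide(posteriors))   # prints 1 (decide C2): the asymmetric loss overrides the larger posterior for C1
```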

  8. Two Classes, C1, C2 • It is helpful to consider the likelihood ratio g(x) (written out below). • Use the known priors p(Ci), or ignore them. • The same form holds for a more elaborate loss function (the proof is easy); only the threshold changes. • g(x) is called a discriminant function.
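The likelihood-ratio test referred to above, in its standard form (reconstructed from the usual Bayes decision theory derivation, since the slide's formulas did not survive the transcript); λij denotes the loss for deciding Ci when Cj is true:

```latex
g(x) = \frac{p(x \mid C_1)}{p(x \mid C_2)}
\;\;\underset{C_2}{\overset{C_1}{\gtrless}}\;\;
\frac{p(C_2)}{p(C_1)}
\quad \text{(0/1 loss)},
\qquad
g(x) \;\;\underset{C_2}{\overset{C_1}{\gtrless}}\;\;
\frac{(\lambda_{12}-\lambda_{22})\,p(C_2)}{(\lambda_{21}-\lambda_{11})\,p(C_1)}
\quad \text{(general loss)} .
```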

  9. Discriminant Functions for Multivariate Gaussian Class-Conditional Densities • Two multivariate Gaussians in d dimensions. • Since log is monotonic, we can look at log g(x): the quadratic forms that appear are squared Mahalanobis distances, and class-independent constants are superfluous (see the sketch below).
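A minimal sketch of log g(x) for two multivariate Gaussians, with the squared Mahalanobis distance made explicit; the parameters are assumed given, and this is not code from the lecture's Matlab demo:

```python
import numpy as np

def log_gaussian(x, mu, Sigma):
    """log N(x; mu, Sigma); the quadratic form is the squared Mahalanobis distance."""
    d = len(mu)
    diff = x - mu
    maha2 = diff @ np.linalg.solve(Sigma, diff)          # (x - mu)^T Sigma^{-1} (x - mu)
    logdet = np.linalg.slogdet(Sigma)[1]
    return -0.5 * (maha2 + logdet + d * np.log(2 * np.pi))

def log_discriminant(x, mu1, Sigma1, mu2, Sigma2, prior1=0.5, prior2=0.5):
    """log g(x) = log p(x|C1) - log p(x|C2) + log p(C1) - log p(C2); positive means decide C1."""
    return (log_gaussian(x, mu1, Sigma1) - log_gaussian(x, mu2, Sigma2)
            + np.log(prior1) - np.log(prior2))
```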

  10. Mahalanobis Distance • [Figure: iso-distance lines (= iso-probability lines) around μ1 and μ2 in the (x1, x2) plane, and the resulting decision surface.]

  11. Case 1: Σi = σ²I • The discriminant functions… • …simplify to a linear form (reconstructed below).
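The simplification referred to above is missing from the transcript; the standard result for spherical, equal-variance Gaussians (as in Duda and Hart) is

```latex
g_i(x) = -\frac{\lVert x - \mu_i \rVert^2}{2\sigma^2} + \ln p(C_i)
\;\;\Longrightarrow\;\;
g_i(x) = \frac{1}{\sigma^2}\,\mu_i^{\top} x
         - \frac{1}{2\sigma^2}\,\lVert \mu_i \rVert^2
         + \ln p(C_i),
```

where the class-independent term xᵀx/(2σ²) has been dropped.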

  12. Decision Boundary • If μ2 = 0, we obtain… the matched filter! Together with an expression for the threshold (sketch below).
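A sketch of the matched-filter decision under that reading of the slide (class 2 is "no signal", i.e. μ2 = 0, and both classes have covariance σ²I); the threshold expression is the standard one, not quoted from the slide:

```python
import numpy as np

def matched_filter_decide(x, mu1, sigma2, prior1=0.5, prior2=0.5):
    """Correlate the measurement x with the template mu1 and compare against a threshold.

    Assumes class 2 is 'no signal' (mu2 = 0) and covariance sigma2 * I for both classes.
    """
    threshold = 0.5 * mu1 @ mu1 + sigma2 * np.log(prior2 / prior1)
    return 1 if mu1 @ x > threshold else 2
```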

  13. Two Signals and Additive White Gaussian Noise • [Figure: the signal means μ1 and μ2 in the (x1, x2) plane, a measurement x, and the vectors μ1 - μ2 and x - μ2.]

  14. Case 2: Σi = Σ • Two classes, 2D measurements, p(x|C) are multivariate Gaussians with equal covariance matrices. • The derivation is similar. • The quadratic term vanishes since it is independent of the class. • We obtain a linear decision surface (sketch below). • Matlab demo.
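A minimal version of the equal-covariance decision rule; the Matlab demo itself is not in the transcript, and μ1, μ2, Σ and the priors are assumed known here:

```python
import numpy as np

def linear_discriminant(x, mu1, mu2, Sigma, prior1=0.5, prior2=0.5):
    """Equal-covariance Gaussian classes give a linear decision surface w.x + b = 0."""
    w = np.linalg.solve(Sigma, mu1 - mu2)                  # w = Sigma^{-1} (mu1 - mu2)
    b = -0.5 * (mu1 + mu2) @ w + np.log(prior1 / prior2)   # offset from means and priors
    return 1 if w @ x + b > 0 else 2
```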

  15. Case 3: General Covariance Matrix • See transparency

  16. Isn't this too simple? • Not at all… • It is true that images form complicated manifolds (from a pixel point of view, translation, rotation and scaling are all highly non-linear operations). • The high dimensionality helps.

  17. Assume Unknown Class Densities • In real life, we do not know the class-conditional densities. • But we do have example data. • This puts us in the typical machine learning scenario: we want to learn a function, c(x), from examples. • Why not just estimate the class densities from the examples and apply the previous ideas? • To learn a Gaussian (a simple density) in N dimensions you need at least N² samples! • 10x10 pixels → 10,000 examples! • Avoid estimating densities whenever you can! (it is too general a problem) • The posterior is generally simpler than the class conditional (see transparency).

  18. Remember PCA? • Principal components are eigenvectors of the covariance matrix. • Use the reconstruction error for recognition (e.g. Eigenfaces). • Good: reduces dimensionality. • Bad: no model within the subspace; linearity may be inappropriate; the covariance is not the appropriate quantity to optimize for discrimination. (Sketch below.) [Figure: data cloud in the (x1, x2) plane with its first principal direction u1.]
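A short sketch of PCA as eigendecomposition of the covariance matrix, together with the reconstruction error used by Eigenfaces-style recognition; the data here is random placeholder data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                     # 200 samples, 10 dimensions (placeholder data)

mean = X.mean(axis=0)
cov = np.cov(X - mean, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
U = eigvecs[:, ::-1][:, :3]                        # top 3 principal components

Z = (X - mean) @ U                                 # project into the subspace
X_hat = Z @ U.T + mean                             # reconstruct from the projection
recon_error = np.linalg.norm(X - X_hat, axis=1)    # per-sample reconstruction error
```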

  19. Fisher's Linear Discriminant • Goal: reduce dimensionality before training classifiers etc. (feature selection). • Similar goal as PCA! • But Fisher has classification in mind… • Find projection directions such that separation is easiest. • Eigenfaces vs. Fisherfaces. [Figure: two class clusters in the (x1, x2) plane.]

  20. Fisher's Linear Discriminant • Assume we have n d-dimensional samples x1, …, xn: n1 from set (class) X1 and n2 from set X2. • We form the linear combinations y = wᵀx and obtain y1, …, yn. • Only the direction of w is important.

  21. Objective for Fisher • Measure the separation as the distance between the class means after projecting (k = 1, 2). • Measure the scatter of each class after projecting. • The objective becomes to maximize the ratio of separation to scatter.

  22. We need to make the dependence on w explicit. • Defining the within-class scatter matrix SW = S1 + S2, we obtain the scatter as a quadratic form in w. • Similarly for the separation (between-class scatter matrix SB). • Finally we can write the objective J(w) as a ratio of the two quadratic forms (written out below).
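The formulas referred to on the last two slides are missing from the transcript; in their standard form (with mk the class means and the sums running over the samples of each class):

```latex
S_k = \sum_{x \in X_k} (x - m_k)(x - m_k)^{\top}, \qquad
S_W = S_1 + S_2, \qquad
S_B = (m_1 - m_2)(m_1 - m_2)^{\top},
\qquad
J(w) = \frac{w^{\top} S_B\, w}{w^{\top} S_W\, w} .
```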

  23. Fisher's Solution • J(w) is called a generalized Rayleigh quotient. Any w that maximizes J must satisfy the generalized eigenvalue problem SB w = λ SW w. • Since SB is very singular (rank 1) and SB w always points in the direction of (m1 - m2), we are done: w ∝ SW⁻¹(m1 - m2) (sketch below).
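A minimal numerical version of this solution; the two sample matrices are placeholders, and, as noted above, w is only defined up to scale:

```python
import numpy as np

def fisher_direction(X1, X2):
    """Return w proportional to S_W^{-1} (m1 - m2) for two sample sets (rows = samples)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)       # scatter matrix of class 1
    S2 = (X2 - m2).T @ (X2 - m2)       # scatter matrix of class 2
    w = np.linalg.solve(S1 + S2, m1 - m2)
    return w / np.linalg.norm(w)       # only the direction matters
```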

  24. Comments on FLD • We did not follow Bayes Decision Theory • FLD is useful for many types of densities • Fisher can be extended (see demo): • more than one projection direction • more than two clusters • Let’s try it out: Matlab Demo

  25. Fisher vs. Bayes • Assume we do have identical (equal-covariance) Gaussian class densities; then Bayes says w = Σ⁻¹(μ1 - μ2), • while Fisher says w ∝ SW⁻¹(m1 - m2). • Since SW is proportional to the covariance matrix, w has the same direction in both cases. • Comforting...

  26. What have we achieved? • Found out that the maximum-posterior strategy is optimal. Always. • Looked at different cases of Gaussian class densities, where we could derive simple decision rules. • Gaussian classifiers do a reasonable job! • Learned about FLD, which is useful and often preferable to PCA.

  27. Just for Fun: Support Vector Machine • Very fashionable… state of the art? • Does not model densities. • Fits the decision surface directly. • Maximizes the margin → reduces “complexity”. • The decision surface depends only on nearby samples (the support vectors). • Matlab Demo (a scikit-learn sketch follows below). [Figure: two classes in the (x1, x2) plane with a maximum-margin boundary.]
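A short sketch of fitting a maximum-margin (linear) SVM with scikit-learn, standing in for the Matlab demo mentioned on the slide; the two-class data is random placeholder data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),     # class 0 cluster
               rng.normal(+1.0, 1.0, size=(50, 2))])    # class 1 cluster
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_.shape)   # the decision surface depends only on these samples
```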

  28. Learning Algorithms • [Diagram: examples (xi, yi) drawn from p(x, y) are fed to a learning algorithm, which picks a learned function y = f(x) from a set of candidate functions.]

  29. Assume Unknown Class Densities • SVM examples. • Densities are hard to estimate -> avoid it (example from Ripley). • Give intuitions on overfitting. • We need to learn from data: the standard machine learning problem. • Training/test sets (sketch below).
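A minimal sketch of the training/test-set protocol and the overfitting intuition mentioned above; the data, labels, and kernel settings are placeholders chosen only to make the train/test gap visible:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)    # noisy labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
clf = SVC(kernel="rbf", gamma=100.0).fit(X_tr, y_tr)          # deliberately over-flexible
print(clf.score(X_tr, y_tr), clf.score(X_te, y_te))           # a large gap indicates overfitting
```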
