Elements of Pattern Recognition CNS/EE-148 -- Lecture 5 M. Weber, P. Perona
What is Classification? • We want to assign objects to classes based on a selection of attributes (features). • Examples: • (age, income) → {credit worthy, not credit worthy} • (blood cell count, body temp) → {flu, hepatitis B, hepatitis C} • (pixel vector) → {Bill Clinton, coffee cup} • Feature vectors can be continuous, discrete, or mixed.
What is Classification? • Want to find a function from measurements to class labels, i.e. a decision boundary in the space of feature vectors. • Statistical methods use the pdf p(C,x). • Assume p(C,x) is known for now. [Figure: Signal 1, Signal 2, and Noise clusters in the (x1, x2) feature space]
Some Terminology • p(C) is called a prior or a priori probability • p(x|C) is called a class-conditional density or likelihood of C with respect to x • p(C|x) is called a posterior or a posteriori probability
Examples • One measurement, symmetric cost, equal priors. [Figure labelled "bad": strongly overlapping class-conditional densities p(x|C1) and p(x|C2) along x]
Examples • One measurement, symmetric cost, equal priors. [Figure labelled "good": well-separated class-conditional densities p(x|C1) and p(x|C2) along x]
How to Make the Best Decision? (Bayes Decision Theory) • Define a cost function for mistakes, e.g. a zero-one loss that counts every error equally. • Minimize the expected loss (risk) over the entire p(C,x). • It is sufficient to assure the optimal decision for each individual x. • Result: decide according to the maximum posterior probability, i.e. choose the class Ci that maximizes p(Ci|x).
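A minimal sketch of the maximum-posterior rule, assuming the class-conditional densities and priors are known; the 1-D Gaussian parameters here are made up for illustration:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D class-conditional densities and priors (illustrative values).
priors = np.array([0.5, 0.5])                 # p(C1), p(C2)
likelihoods = [norm(loc=0.0, scale=1.0),      # p(x|C1)
               norm(loc=2.0, scale=1.0)]      # p(x|C2)

def bayes_decision(x):
    """Return the index of the class with maximum posterior p(C|x)."""
    joint = np.array([p.pdf(x) for p in likelihoods]) * priors  # p(x|Ci) p(Ci)
    return int(np.argmax(joint))  # dividing by p(x) does not change the argmax

print(bayes_decision(0.3))  # -> 0 (class C1)
print(bayes_decision(1.7))  # -> 1 (class C2)
```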
Two Classes, C1, C2 • It is helpful to consider the likelihood ratio g(x) = p(x|C1) / p(x|C2): decide C1 if g(x) exceeds a threshold. • Use the known priors p(Ci), or ignore them (equal priors). • The same form holds for a more elaborate loss function, only the threshold changes (proof is easy). • g(x) is called a discriminant function.
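A standard way to write the two-class rule this slide alludes to (zero-one loss; the general-loss version only changes the threshold):

```latex
% Likelihood-ratio test for two classes under zero-one loss
g(x) \;=\; \frac{p(x \mid C_1)}{p(x \mid C_2)}
\;\;\underset{C_2}{\overset{C_1}{\gtrless}}\;\;
\frac{p(C_2)}{p(C_1)}
% With losses \lambda_{ij} (cost of deciding C_i when C_j is true),
% the threshold becomes
% \frac{(\lambda_{12}-\lambda_{22})\,p(C_2)}{(\lambda_{21}-\lambda_{11})\,p(C_1)}.
```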
Discriminant Functions for Multivariate Gaussian Class-Conditional Densities • Two multivariate Gaussians in d dimensions. • Since log is monotonic, we can look at log g(x) instead of g(x): the quadratic terms are squared Mahalanobis distances, and the class-independent constants are superfluous.
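Written out from the Gaussian densities (means μi, covariances Σi, as in the slides):

```latex
% Log-likelihood ratio for two multivariate Gaussians
\ln g(x) = -\tfrac{1}{2}(x-\mu_1)^{\!\top}\Sigma_1^{-1}(x-\mu_1)
           +\tfrac{1}{2}(x-\mu_2)^{\!\top}\Sigma_2^{-1}(x-\mu_2)
           -\tfrac{1}{2}\ln\frac{|\Sigma_1|}{|\Sigma_2|}
% The quadratic forms (x-\mu_i)^{\top}\Sigma_i^{-1}(x-\mu_i) are squared
% Mahalanobis distances; the (2\pi)^{-d/2} normalization factors cancel.
```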
Mahalanobis Distance • Iso-distance lines are iso-probability lines. [Figure: two Gaussians with means μ1, μ2 in the (x1, x2) plane; the decision surface lies between them]
Case 1: Σi = σ²I • The discriminant functions… • …simplify to linear functions of x.
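For equal, isotropic covariances the log-discriminant reduces to a linear function of x (standard result, consistent with the matched-filter slide that follows):

```latex
% Case \Sigma_i = \sigma^2 I: the quadratic terms in x cancel
\ln g(x) = \frac{(\mu_1-\mu_2)^{\!\top} x}{\sigma^{2}}
           - \frac{\|\mu_1\|^{2}-\|\mu_2\|^{2}}{2\sigma^{2}}
% Decide C_1 when \ln g(x) > \ln\frac{p(C_2)}{p(C_1)}.
```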
Decision Boundary • If μ2 = 0 (signal vs. pure noise), the linear discriminant becomes a correlation of x with μ1: the matched filter! With an expression for the threshold.
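A sketch of that special case, assuming the μ2 = 0 reading of this slide and the Case 1 covariance σ²I:

```latex
% Matched filter: \mu_2 = 0, \Sigma_i = \sigma^2 I
\text{decide } C_1 \quad\text{iff}\quad
\mu_1^{\!\top} x \;>\; \frac{\|\mu_1\|^{2}}{2} + \sigma^{2}\ln\frac{p(C_2)}{p(C_1)}
```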
Two Signals and Additive White Gaussian Noise [Figure: vectors μ1 (Signal 1), μ2 (Signal 2), μ1 - μ2, x, and x - μ2 in the (x1, x2) plane]
Case 2: Σi = Σ • Two classes, 2-D measurements, p(x|C) are multivariate Gaussians with equal covariance matrices. • The derivation is similar to Case 1. • The quadratic term vanishes since it is independent of the class. • We obtain a linear decision surface. • Matlab demo (a Python sketch of the same idea appears below).
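A minimal sketch of the equal-covariance case, not the lecture's Matlab demo; the means, shared covariance, priors, and test points are made up:

```python
import numpy as np

# Assumed parameters for two classes with a shared covariance (illustrative).
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
prior1, prior2 = 0.5, 0.5

# With equal covariances the discriminant is linear: w^T x + b.
Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)
b = -0.5 * (mu1 @ Sigma_inv @ mu1 - mu2 @ Sigma_inv @ mu2) + np.log(prior1 / prior2)

def classify(x):
    """Decide class 1 if the linear discriminant is positive, else class 2."""
    return 1 if w @ x + b > 0 else 2

print(classify(np.array([0.2, -0.1])))  # near mu1 -> 1
print(classify(np.array([2.1, 1.2])))   # near mu2 -> 2
```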
Case 3: General Covariance Matrix • See transparency (the quadratic terms no longer cancel, so the decision surfaces become quadratic).
Isn't this too simple? • Not at all… • It is true that images form complicated manifolds (from a pixel point of view, translation, rotation, and scaling are all highly non-linear operations). • The high dimensionality helps.
Assume Unknown Class Densities • In real life, we do not know the class-conditional densities. • But we do have example data. • This puts us in the typical machine learning scenario: we want to learn a function, c(x), from examples. • Why not just estimate the class densities from examples and apply the previous ideas? • Learning even a Gaussian (a simple density) in N dimensions needs at least on the order of N² samples. • 10x10 pixels → N = 100 → some 10,000 examples! • Avoid estimating densities whenever you can! (too general) • The posterior is generally simpler than the class-conditional density (see transparency).
Remember PCA? • Principal components are eigenvectors of the covariance matrix. • Use the reconstruction error for recognition (e.g. Eigenfaces). • Good: reduces dimensionality. • Bad: no model within the subspace; linearity may be inappropriate; the covariance is not the right quantity to optimize for discrimination. [Figure: data cloud in the (x1, x2) plane with first principal direction u1]
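A minimal PCA sketch, assuming a data matrix X with one sample per row; the toy data and the single retained component are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [0.8, 0.5]])  # correlated toy data

# Principal components = eigenvectors of the sample covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
u1 = eigvecs[:, -1]                             # first principal direction

# Project onto u1 and measure reconstruction error (as in Eigenface-style recognition).
proj = X_centered @ u1
reconstruction = np.outer(proj, u1)
error = np.linalg.norm(X_centered - reconstruction, axis=1)
print(u1, error.mean())
```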
Fisher's Linear Discriminant • Goal: reduce dimensionality before training classifiers etc. (feature selection). • Similar goal as PCA, but Fisher has classification in mind: find projection directions such that separation of the classes is easiest. • Eigenfaces vs. Fisherfaces. [Figure: two class clusters in the (x1, x2) plane and a candidate projection direction]
Fisher's Linear Discriminant • Assume we have n d-dimensional samples x1, …, xn, with n1 from set (class) X1 and n2 from set X2. • We form the linear combinations y = wᵀx and obtain y1, …, yn. • Only the direction of w is important, not its length.
Objective for Fisher • Measure the separation as the distance between the class means after projecting (k = 1, 2). • Measure the scatter of each class after projecting. • The objective is to maximize the ratio of separation to scatter, written out below.
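In the standard notation (which the slide's missing formulas presumably used):

```latex
% Projected means and scatters, k = 1, 2
\tilde m_k = \frac{1}{n_k}\sum_{x\in X_k} w^{\top}x, \qquad
\tilde s_k^{\,2} = \sum_{x\in X_k}\bigl(w^{\top}x - \tilde m_k\bigr)^{2}
% Fisher criterion to maximize over w:
J(w) = \frac{(\tilde m_1 - \tilde m_2)^{2}}{\tilde s_1^{\,2} + \tilde s_2^{\,2}}
```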
We need to make the dependence on w explicit: • defining the within-class scatter matrix SW = S1 + S2, the denominator becomes wᵀ SW w. • Similarly, the separation (between-class scatter matrix SB) gives the numerator wᵀ SB w. • Finally, we can write J(w) as a ratio of these two quadratic forms.
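Spelled out, again in the usual notation with class means mk:

```latex
% Scatter matrices
S_k = \sum_{x\in X_k} (x - m_k)(x - m_k)^{\top}, \qquad S_W = S_1 + S_2, \qquad
S_B = (m_1 - m_2)(m_1 - m_2)^{\top}
% so that
\tilde s_1^{\,2} + \tilde s_2^{\,2} = w^{\top} S_W\, w, \qquad
(\tilde m_1 - \tilde m_2)^{2} = w^{\top} S_B\, w, \qquad
J(w) = \frac{w^{\top} S_B\, w}{w^{\top} S_W\, w}
```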
Fisher's Solution • J(w) is called a generalized Rayleigh quotient. Any w that maximizes J must satisfy the generalized eigenvalue problem SB w = λ SW w. • Since SB is singular (rank 1) and SB w is always in the direction of (m1 - m2), we are done: w ∝ SW⁻¹(m1 - m2).
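A compact sketch of that closed-form solution on made-up two-class data:

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))   # class 1 samples
X2 = rng.normal(loc=[3.0, 1.0], scale=1.0, size=(100, 2))   # class 2 samples

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = (X1 - m1).T @ (X1 - m1)          # within-class scatter of class 1
S2 = (X2 - m2).T @ (X2 - m2)          # within-class scatter of class 2
SW = S1 + S2

w = np.linalg.solve(SW, m1 - m2)      # Fisher direction, w proportional to SW^{-1}(m1 - m2)
w /= np.linalg.norm(w)                # only the direction matters

y1, y2 = X1 @ w, X2 @ w               # 1-D projections of the two classes
print(w, y1.mean(), y2.mean())
```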
Comments on FLD • We did not follow Bayes Decision Theory • FLD is useful for many types of densities • Fisher can be extended (see demo): • more than one projection direction • more than two clusters • Let’s try it out: Matlab Demo
Fisher vs. Bayes • Assume we do have identical Gaussian class densities with shared covariance Σ; then Bayes gives a linear rule with w = Σ⁻¹(μ1 - μ2), • while Fisher gives w ∝ SW⁻¹(m1 - m2). • Since SW is proportional to the (sample estimate of the) covariance matrix, w points in the same direction in both cases. • Comforting...
What have we achieved? • Found out that the maximum-posterior strategy is optimal. Always. • Looked at different cases of Gaussian class densities, where we could derive simple decision rules. • Gaussian classifiers do a reasonable job! • Learned about FLD, which is useful and often preferable to PCA.
Just for Fun: Support Vector Machine • Very fashionable… state of the art? • Does not model densities; fits the decision surface directly. • Maximizes the margin → reduces "complexity". • The decision surface only depends on nearby samples (the support vectors). • Matlab Demo (a Python sketch of the same idea appears below). [Figure: two classes in the (x1, x2) plane separated by a maximum-margin boundary]
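Not the lecture's Matlab demo; a minimal sketch using scikit-learn's SVC (assumed available) on made-up data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=[0.0, 0.0], size=(50, 2)),
               rng.normal(loc=[3.0, 3.0], size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0)   # linear maximum-margin classifier
clf.fit(X, y)

# The decision surface depends only on the support vectors (samples near the margin).
print("number of support vectors:", clf.support_vectors_.shape[0])
print("prediction for [1.5, 1.5]:", clf.predict([[1.5, 1.5]]))
```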
Learning Algorithms [Diagram: examples (xi, yi) drawn from p(x,y) are fed to a learning algorithm, which selects a learned function y = f(x) from a set of candidate functions]
Assume Unknown Class Densities • SVM examples. • Densities are hard to estimate -> avoid estimating them. • Example from Ripley. • Gives intuitions on overfitting. • We need to learn from data: a standard machine learning problem. • Training/test sets.