This lecture reviews the principles of statistical and neural pattern recognition, focusing on classifier performance measures, Bayes' rule, and minimum probability of error classification. Topics include the Maximum A’Posteriori Classifier Derivation, likelihood ratio tests, and examples. Learn how to make optimal classification decisions based on conditional probability density functions and a priori probabilities. Enhance your understanding of pattern recognition and decision-making strategies for maximizing accuracy.
Nanjing University of Science & Technology
Pattern Recognition: Statistical and Neural
Lonnie C. Ludeman
Lecture 5, Sept 21, 2005
Review 2: Classifier Performance Measures
1. A’Posteriori Probability (maximize)
2. Probability of Error (minimize)
3. Bayes Average Cost (minimize)
4. Probability of Detection (maximize with fixed probability of false alarm) (Neyman-Pearson rule)
5. Losses (minimize the maximum)
Review 3: The Maximum A’Posteriori classification rule reduces to a likelihood ratio test as the optimum decision rule.
Topics for Today:
1. Maximum A’Posteriori Classifier Derivation
2. MAP Classifier Example
3. Introduce the Minimum Probability of Error Classifier
Maximum A’Posteriori Classification Rule (2-Class Case) Derivation

A. Basic Assumptions:
Known: conditional probability density functions p(x | C1) and p(x | C2)
Known: a’priori probabilities P(C1) and P(C2)
Performance measure: a’posteriori probability P(Ci | x)
Maximum A’Posteriori Classification Rule (2-Class Case)

B. Decision Rule for an observed vector x:
if P(C1 | x) > P(C2 | x) then decide x from C1
if P(C1 | x) < P(C2 | x) then decide x from C2
if P(C1 | x) = P(C2 | x) then decide x from C1 or C2 randomly

Shorthand notation:
P(C1 | x) ≷ P(C2 | x)   (decide C1 if >, decide C2 if <)
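As a concrete illustration, here is a minimal Python sketch of the rule above, assuming the two posterior probabilities for an observation x have already been computed; the function name and the sample values are illustrative, not from the lecture.

```python
# Minimal sketch of the MAP rule (2-class case), assuming the
# posteriors P(C1 | x) and P(C2 | x) are already available.
import random

def map_decide(post_c1: float, post_c2: float) -> str:
    """Pick the class with the larger a'posteriori probability;
    break exact ties randomly, as the rule above allows."""
    if post_c1 > post_c2:
        return "C1"
    if post_c1 < post_c2:
        return "C2"
    return random.choice(["C1", "C2"])

print(map_decide(0.7, 0.3))  # -> C1
```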
Derivation of MAP Decision Rule

We know:
P(C1 | x) ≷ P(C2 | x)   (decide C1 if >, decide C2 if <)

Use one form of Bayes' theorem:
P(Ci | x) = p(x | Ci) P(Ci) / p(x)

Substitute this expression for P(C1 | x) and P(C2 | x) into the decision rule.
Derivation of MAP Decision Rule

If p(x | C1) P(C1) / p(x) ≷ p(x | C2) P(C2) / p(x), decide C1 if >, C2 if <.

Because p(x) > 0 appears on both sides, this simplifies to the following equivalent rule:
p(x | C1) P(C1) ≷ p(x | C2) P(C2)
(likelihood for C1 on the left, likelihood for C2 on the right)

which can be rearranged into the following:
p(x | C1) / p(x | C2) ≷ P(C2) / P(C1)
The MAP Decision Rule as a Likelihood Ratio Test (LRT)

Define the likelihood ratio
l(x) = p(x | C1) / p(x | C2)
and the threshold
N = P(C2) / P(C1)

Then the Maximum A’Posteriori decision rule is a likelihood ratio test:
l(x) ≷ N   (decide C1 if >, decide C2 if <)
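The test above translates directly into code. The following is a hedged Python sketch: the caller supplies the two class-conditional densities, and the exponential densities in the usage example are stand-in assumptions, not taken from the lecture.

```python
# Sketch of the likelihood ratio test l(x) ≷ N derived above.
import math

def lrt_decide(x, p1, p2, prior_c1: float, prior_c2: float) -> str:
    """Decide C1 when l(x) = p1(x)/p2(x) exceeds N = P(C2)/P(C1)."""
    l = p1(x) / p2(x)          # likelihood ratio l(x)
    N = prior_c2 / prior_c1    # threshold N
    return "C1" if l > N else "C2"

# Usage with two assumed exponential densities:
p1 = lambda x: math.exp(-x)            # p(x | C1) = e^(-x),   x >= 0
p2 = lambda x: 2.0 * math.exp(-2 * x)  # p(x | C2) = 2 e^(-2x), x >= 0
print(lrt_decide(1.5, p1, p2, 0.5, 0.5))  # l(1.5) ≈ 2.24 > 1 -> C1
```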
[Figure: conceptual drawing of the likelihood ratio test. The likelihood ratio l(x) is a scalar function of the vector observation x; the scalar l(x) is compared against the threshold to choose C1 or C2.]
2. Example: Maximum A’Posteriori (MAP) Decision Rule

Classification of the sex (male or female) of a person walking into a department store, using only a height measurement.

[Figure: a doorway fitted with light sources on one side and light sensors on the other, measuring the person's height.]
Assumptions: a’priori probabilities and conditional probability density functions are known.
Find: the Maximum A’Posteriori decision rule.
Assumed a’priori probabilities:
P(female) = 0.5
P(male) = 0.5
Males and females are assumed equally likely (a code sketch of this example follows below).
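A worked sketch of this example in Python is shown below. Since the lecture text above does not spell out the conditional height densities, the Gaussian means and standard deviations here are hypothetical placeholders chosen only for illustration.

```python
# Hypothetical MAP classifier for the height example. The Gaussian
# parameters are assumptions, not the lecture's actual densities.
import math

def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution with the given mean and std."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

P_FEMALE = 0.5  # a'priori probabilities from the slide above
P_MALE = 0.5

def classify_height(height_cm: float) -> str:
    p_f = gaussian_pdf(height_cm, 162.0, 7.0)  # assumed p(height | female)
    p_m = gaussian_pdf(height_cm, 176.0, 7.0)  # assumed p(height | male)
    # MAP reduces to the LRT: l(x) = p_f / p_m versus N = P(male) / P(female)
    return "female" if p_f / p_m > P_MALE / P_FEMALE else "male"

print(classify_height(158.0))  # -> female
print(classify_height(181.0))  # -> male
```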
3. Introduction to Minimum Probability of Error Classification Rule (2-Class Case)

Basic Assumptions:
Known conditional probability density functions p(x | C1) and p(x | C2)
Known a’priori probabilities P(C1) and P(C2)

Performance measure (probability of error):
P(error) = P(error | C1) P(C1) + P(error | C2) P(C2)

Decision rule: the one that minimizes P(error)
Shorthand notation:
C1 : x ~ p(x | C1), P(C1)
C2 : x ~ p(x | C2), P(C2)
Minimize P(error)
Solution: Minimum Probability of Error Classification Rule (2-Class Case)

Select decision regions R1 (decide C1) and R2 (decide C2), partitioning the pattern space X, such that P(error) is minimized:

P(error) = P(error | C1) P(C1) + P(error | C2) P(C2)
         = P(C1) ∫_{R2} p(x | C1) dx + P(C2) ∫_{R1} p(x | C2) dx

(an error under C1 occurs when x falls in R2, and vice versa)
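To make the two error integrals concrete, here is a rough numerical check in Python, reusing the hypothetical Gaussian densities from the height example; the Riemann-sum integration and all parameter values are illustrative assumptions.

```python
# Numerical estimate of P(error) for the minimum-error rule, using
# the same hypothetical Gaussian class densities as the height example.
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

P_C1, P_C2 = 0.5, 0.5  # priors (C1 = female, C2 = male)
dx = 0.01
p_error = 0.0
x = 120.0
while x < 220.0:
    p1 = gaussian_pdf(x, 162.0, 7.0)  # p(x | C1)
    p2 = gaussian_pdf(x, 176.0, 7.0)  # p(x | C2)
    if p1 * P_C1 > p2 * P_C2:
        p_error += p2 * P_C2 * dx     # x lies in R1; error if truly C2
    else:
        p_error += p1 * P_C1 * dx     # x lies in R2; error if truly C1
    x += dx

print(f"estimated P(error) = {p_error:.4f}")  # ≈ 0.1587 for these densities
```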
Summary: We looked at the following:
1. Maximum A’Posteriori Classifier Derivation
2. MAP Classifier Example
3. Introduction to the Minimum Probability of Error Classifier