Nanjing University of Science & Technology
Pattern Recognition: Statistical and Neural
Lonnie C. Ludeman
Lecture 10, Sept 28, 2005
Review 3: MAP, MPE, Bayes, and Neyman-Pearson Classification Rules

Likelihood ratio test: decide C1 if $l(\mathbf{x}) > N$, decide C2 if $l(\mathbf{x}) < N$, where $l(\mathbf{x}) = p(\mathbf{x} \mid C_1)/p(\mathbf{x} \mid C_2)$ is the likelihood ratio and N is the threshold:

$N_{MAP} = N_{MPE} = \dfrac{P(C_2)}{P(C_1)}$,   $N_{BAYES} = \dfrac{(C_{22} - C_{12})\,P(C_2)}{(C_{11} - C_{21})\,P(C_1)}$

$N_{NP}$ is chosen so that the false-alarm probability $\int_{R_1(N_{NP})} p(\mathbf{x} \mid C_2)\,d\mathbf{x}$ meets the specified constraint.
Lecture 10 Topics
1. Gaussian Random Variables and Vectors
2. General Gaussian Problem: 2-Class Case; Special Cases: Quadratic and Linear Classifiers
3. Mahalanobis Distance
4. General Gaussian Problem: M-Class Case; Special Cases: Quadratic and Linear Classifiers
Gaussian (Normal) Random Variable: X ~ N(m, σ²)

X is a Gaussian (Normal) random variable if its probability density function $p_X(x)$ is given by

$p_X(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\dfrac{(x - m)^2}{2\sigma^2}\right\}$

where X is the random variable, m is the mean value, and σ² is the variance.
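A minimal NumPy sketch of this density, assuming illustrative values for m and σ² (the function name gaussian_pdf is not from the lecture):

```python
import numpy as np

def gaussian_pdf(x, m, var):
    """Evaluate the N(m, var) density at x (scalar or array)."""
    return np.exp(-(x - m) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Example: density of X ~ N(1.0, 4.0) at a few points
print(gaussian_pdf(np.array([0.0, 1.0, 3.0]), m=1.0, var=4.0))
```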
General Gaussian Density: X ~ N(M, K)

The random vector X is normal (Gaussian) distributed if its density function p(x) is given by

$p(\mathbf{x}) = \dfrac{1}{(2\pi)^{N/2}\,|K|^{1/2}}\exp\left(-\tfrac{1}{2}(\mathbf{x} - M)^T K^{-1}(\mathbf{x} - M)\right)$

where
$\mathbf{x} = [x_1, x_2, \dots, x_N]^T$  pattern vector
$M = [m_1, m_2, \dots, m_N]^T$  mean vector
$K = \begin{bmatrix} k_{11} & k_{12} & \cdots & k_{1N} \\ k_{21} & k_{22} & \cdots & k_{2N} \\ \vdots & & & \vdots \\ k_{N1} & k_{N2} & \cdots & k_{NN} \end{bmatrix}$  covariance matrix
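This density can be evaluated directly with NumPy; a minimal sketch, assuming an illustrative 2-dimensional mean vector and covariance matrix:

```python
import numpy as np

def multivariate_gaussian_pdf(x, M, K):
    """Evaluate the N(M, K) density p(x) for an N-dimensional pattern vector x."""
    N = len(M)
    diff = x - M
    quad = diff @ np.linalg.solve(K, diff)          # (x - M)^T K^{-1} (x - M)
    norm = (2.0 * np.pi) ** (N / 2.0) * np.sqrt(np.linalg.det(K))
    return np.exp(-0.5 * quad) / norm

# Example with an illustrative 2-D mean vector and covariance matrix
M = np.array([1.0, 2.0])
K = np.array([[2.0, 0.5],
              [0.5, 1.0]])
print(multivariate_gaussian_pdf(np.array([1.5, 1.5]), M, K))
```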
Properties of the Covariance Matrix

For j, k = 1, 2, … , N:
$k_{jk} = E[(x_j - m_j)(x_k - m_k)]$  covariance
$k_{jj} = E[(x_j - m_j)^2]$  variance of component j

K is a positive definite matrix, so K has positive eigenvalues.
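These properties can be checked numerically; a minimal sketch, assuming illustrative random data, that estimates a sample covariance matrix and verifies its eigenvalues are positive:

```python
import numpy as np

# Sample covariance of illustrative data; np.cov returns the matrix K whose
# (j, k) entry estimates E[(x_j - m_j)(x_k - m_k)].
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3))          # 500 samples of a 3-dimensional vector
K = np.cov(data, rowvar=False)

eigenvalues = np.linalg.eigvalsh(K)       # eigenvalues of the symmetric matrix K
print("eigenvalues:", eigenvalues)
print("positive definite:", np.all(eigenvalues > 0))
```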
General Gaussian Problem: 2-Class Case

The random vector X is normally (Gaussian) distributed under both classes:

C1: X ~ N(M₁, K₁),  $p(\mathbf{x} \mid C_1) = \dfrac{1}{(2\pi)^{N/2}\,|K_1|^{1/2}}\exp\left(-\tfrac{1}{2}(\mathbf{x} - M_1)^T K_1^{-1}(\mathbf{x} - M_1)\right)$

C2: X ~ N(M₂, K₂),  $p(\mathbf{x} \mid C_2) = \dfrac{1}{(2\pi)^{N/2}\,|K_2|^{1/2}}\exp\left(-\tfrac{1}{2}(\mathbf{x} - M_2)^T K_2^{-1}(\mathbf{x} - M_2)\right)$
General Gaussian Framework: 2-Class Case

A. Assumptions:
C1: X ~ N(M₁, K₁), P(C1)
C2: X ~ N(M₂, K₂), P(C2)

B. Performance Measure: MAP, P(error), Risk, P_D, MiniMax

C. Optimum Classification: minimize or maximize the chosen performance measure
Optimum Decision Rule: 2-Class Gaussian

Derivation of the optimum decision rule, which is a likelihood ratio test with threshold T determined by the type of performance measure used:

$\dfrac{p(\mathbf{x} \mid C_1)}{p(\mathbf{x} \mid C_2)} = \dfrac{|K_2|^{1/2}\exp\left(-\tfrac{1}{2}(\mathbf{x} - M_1)^T K_1^{-1}(\mathbf{x} - M_1)\right)}{|K_1|^{1/2}\exp\left(-\tfrac{1}{2}(\mathbf{x} - M_2)^T K_2^{-1}(\mathbf{x} - M_2)\right)}$

Decide C1 if this ratio is greater than T and C2 if it is less than T; the $(2\pi)^{N/2}$ factors cancel.
Optimum Decision Rule: 2-Class Gaussian

Decide C1 if

$-(\mathbf{x} - M_1)^T K_1^{-1}(\mathbf{x} - M_1) + (\mathbf{x} - M_2)^T K_2^{-1}(\mathbf{x} - M_2) > T_1$

and decide C2 otherwise (quadratic processing), where

$T_1 = 2\ln T + \ln|K_1| - \ln|K_2|$

and T is the optimum threshold for the type of performance measure used.
T = N_MAP or N_BAYES or N_MPE or N_NP, where

$N_{MAP} = N_{MPE} = \dfrac{P(C_2)}{P(C_1)}$,   $N_{BAYES} = \dfrac{(C_{22} - C_{12})\,P(C_2)}{(C_{11} - C_{21})\,P(C_1)}$

and $N_{NP}$ is chosen so that $\int_{R_1(N_{NP})} p(\mathbf{x} \mid C_2)\,d\mathbf{x}$ meets the false-alarm constraint.
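Putting the quadratic rule and its threshold together, a minimal NumPy sketch of the 2-class classifier, assuming illustrative means, covariances, and an MPE threshold T = 1 (all names and values here are hypothetical, not from the lecture):

```python
import numpy as np

def quadratic_discriminant(x, M1, K1, M2, K2):
    """Left-hand side of the 2-class quadratic rule:
       -(x - M1)^T K1^{-1} (x - M1) + (x - M2)^T K2^{-1} (x - M2)."""
    d1 = x - M1
    d2 = x - M2
    return -(d1 @ np.linalg.solve(K1, d1)) + d2 @ np.linalg.solve(K2, d2)

def classify_2class(x, M1, K1, M2, K2, T):
    """Decide C1 if the statistic exceeds T1 = 2 ln T + ln|K1| - ln|K2|, else C2."""
    T1 = 2.0 * np.log(T) + np.log(np.linalg.det(K1)) - np.log(np.linalg.det(K2))
    return "C1" if quadratic_discriminant(x, M1, K1, M2, K2) > T1 else "C2"

# Illustrative parameters; with equal priors the MPE threshold is T = P(C2)/P(C1) = 1
M1, K1 = np.array([0.0, 0.0]), np.eye(2)
M2, K2 = np.array([2.0, 2.0]), 2.0 * np.eye(2)
print(classify_2class(np.array([0.5, 0.4]), M1, K1, M2, K2, T=1.0))
```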
Mahalanobis Distance: Definition

Given two N-vectors x and y, the Mahalanobis distance $d_{MAH}(\mathbf{x}, \mathbf{y})$ with respect to a positive definite matrix A is defined by

$d_{MAH}(\mathbf{x}, \mathbf{y}) = \left[(\mathbf{x} - \mathbf{y})^T A^{-1}(\mathbf{x} - \mathbf{y})\right]^{1/2}$

If A is the identity matrix, then $d_{MAH}(\mathbf{x}, \mathbf{y}) = d_{EUCLIDEAN}(\mathbf{x}, \mathbf{y}) = \left[(\mathbf{x} - \mathbf{y})^T(\mathbf{x} - \mathbf{y})\right]^{1/2}$.
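A minimal sketch of this distance in NumPy, assuming illustrative vectors and weighting matrix A; setting A = I recovers the Euclidean distance, as stated above:

```python
import numpy as np

def mahalanobis(x, y, A):
    """Mahalanobis distance between x and y with weighting matrix A."""
    d = x - y
    return np.sqrt(d @ np.linalg.solve(A, d))   # sqrt of (x - y)^T A^{-1} (x - y)

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.0])
A = np.array([[4.0, 0.0],
              [0.0, 1.0]])
print(mahalanobis(x, y, A))            # Mahalanobis distance
print(mahalanobis(x, y, np.eye(2)))    # A = I reduces to the Euclidean distance
```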
2-Class Gaussian, Special Case 1: K₁ = K₂ = K (Equal Covariance Matrices)

Decide C1 if $(M_1 - M_2)^T K^{-1}\mathbf{x} > T_2$ and decide C2 otherwise (linear processing), where

$T_2 = \ln T + \tfrac{1}{2}\left(M_1^T K^{-1} M_1 - M_2^T K^{-1} M_2\right)$

and T is the optimum threshold for the type of performance measure used.
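A minimal sketch of this equal-covariance linear rule, assuming illustrative means, a shared covariance K, and threshold T = 1:

```python
import numpy as np

def linear_classify_equal_K(x, M1, M2, K, T):
    """Case 1 (K1 = K2 = K): decide C1 if (M1 - M2)^T K^{-1} x > T2, else C2."""
    w = np.linalg.solve(K, M1 - M2)                     # K^{-1} (M1 - M2)
    T2 = np.log(T) + 0.5 * (M1 @ np.linalg.solve(K, M1)
                            - M2 @ np.linalg.solve(K, M2))
    return "C1" if w @ x > T2 else "C2"

# Illustrative means, shared covariance, and MPE threshold T = P(C2)/P(C1) = 1
M1, M2 = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
K = np.array([[1.0, 0.3],
              [0.3, 1.0]])
print(linear_classify_equal_K(np.array([0.2, 0.5]), M1, M2, K, T=1.0))
```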
2-Class Gaussian, Special Case 2: K₁ = K₂ = K = σ²I (Equal Scaled-Identity Covariance Matrices)

Decide C1 if $(M_1 - M_2)^T \mathbf{x} > T_3$ and decide C2 otherwise (linear processing), where

$T_3 = \sigma^2 \ln T + \tfrac{1}{2}\left(M_1^T M_1 - M_2^T M_2\right)$

and T is the optimum threshold for the type of performance measure used.
2-Class Gaussian, Special Case 3: K₁ = K₂ = K = σ²I, MPE or Bayes with 0-1 costs and P(C1) = P(C2)

Decide C1 if $(M_1 - M_2)^T \mathbf{x} > T_4$ and decide C2 otherwise (linear processing), where

$T_4 = \tfrac{1}{2}\left(M_1^T M_1 - M_2^T M_2\right)$

Here the optimum threshold is T = 1, so the ln T term vanishes.
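A minimal sketch of this special case, assuming illustrative means; as a check, the rule agrees with simply choosing the class whose mean is closest to x in Euclidean distance:

```python
import numpy as np

def nearest_mean_classify(x, M1, M2):
    """Case 3 (K = sigma^2 I, equal priors, 0-1 costs):
       decide C1 if (M1 - M2)^T x > 0.5 * (M1^T M1 - M2^T M2), else C2.
       Equivalent to choosing the class whose mean is closest in Euclidean distance."""
    T4 = 0.5 * (M1 @ M1 - M2 @ M2)
    return "C1" if (M1 - M2) @ x > T4 else "C2"

M1, M2 = np.array([2.0, 0.0]), np.array([0.0, 2.0])
x = np.array([1.5, 0.2])
print(nearest_mean_classify(x, M1, M2))                                   # rule decision
print("C1" if np.linalg.norm(x - M1) < np.linalg.norm(x - M2) else "C2")  # same answer
```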
General Gaussian: M-Class Case

A. Assumptions:

C1: X ~ N(M₁, K₁), P(C1),  $p(\mathbf{x} \mid C_1) = \dfrac{1}{(2\pi)^{N/2}\,|K_1|^{1/2}}\exp\left(-\tfrac{1}{2}(\mathbf{x} - M_1)^T K_1^{-1}(\mathbf{x} - M_1)\right)$

C2: X ~ N(M₂, K₂), P(C2),  $p(\mathbf{x} \mid C_2) = \dfrac{1}{(2\pi)^{N/2}\,|K_2|^{1/2}}\exp\left(-\tfrac{1}{2}(\mathbf{x} - M_2)^T K_2^{-1}(\mathbf{x} - M_2)\right)$

⋮
C_M: X ~ N(M_M, K_M), P(C_M),  $p(\mathbf{x} \mid C_M) = \dfrac{1}{(2\pi)^{N/2}\,|K_M|^{1/2}}\exp\left(-\tfrac{1}{2}(\mathbf{x} - M_M)^T K_M^{-1}(\mathbf{x} - M_M)\right)$

B. Performance Measure: P(error)

C. Decision Rule: Minimum P(error)
General Gaussian: M-Class Case

C. Optimum MPE Decision Rule (Derivation)

Select class C_k if $p(\mathbf{x} \mid C_k)\,P(C_k) > p(\mathbf{x} \mid C_j)\,P(C_j)$ for all j ≠ k, where

$p(\mathbf{x} \mid C_j)\,P(C_j) = \dfrac{P(C_j)}{(2\pi)^{N/2}\,|K_j|^{1/2}}\exp\left\{-\tfrac{1}{2}(\mathbf{x} - M_j)^T K_j^{-1}(\mathbf{x} - M_j)\right\}$
M-Class General Gaussian (Continued)

Define an equivalent statistic $S_j(\mathbf{x})$ for j = 1, 2, … , M:

$S_j(\mathbf{x}) = \dfrac{P(C_j)}{|K_j|^{1/2}}\exp\left\{-\tfrac{1}{2}(\mathbf{x} - M_j)^T K_j^{-1}(\mathbf{x} - M_j)\right\}$

Another equivalent statistic $Q_j(\mathbf{x})$ for j = 1, 2, … , M: select class C_j if $Q_j(\mathbf{x})$ is MINIMUM, where

$Q_j(\mathbf{x}) = (\mathbf{x} - M_j)^T K_j^{-1}(\mathbf{x} - M_j) - 2\ln P(C_j) + \ln|K_j|$

The first term is $d^2_{MAH}(\mathbf{x}, M_j)$ and the remaining terms are a bias; this is a quadratic operation on the observation vector x.
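A minimal NumPy sketch of the quadratic statistic $Q_j(\mathbf{x})$, assuming an illustrative 3-class problem (the means, covariances, and priors are hypothetical):

```python
import numpy as np

def quadratic_scores(x, means, covs, priors):
    """Q_j(x) = (x - M_j)^T K_j^{-1} (x - M_j) - 2 ln P(C_j) + ln|K_j| for each class."""
    scores = []
    for M, K, P in zip(means, covs, priors):
        d = x - M
        q = d @ np.linalg.solve(K, d) - 2.0 * np.log(P) + np.log(np.linalg.det(K))
        scores.append(q)
    return np.array(scores)

# Illustrative 3-class problem; decide the class with the MINIMUM Q_j(x)
means  = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
covs   = [np.eye(2), 2.0 * np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])]
priors = [0.5, 0.3, 0.2]
Q = quadratic_scores(np.array([2.5, 0.5]), means, covs, priors)
print("decide class C%d" % (np.argmin(Q) + 1))
```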
M-Class Gaussian, Case 1: K₁ = K₂ = … = K_M = K

Define an equivalent statistic $S'_j(\mathbf{x})$ for j = 1, 2, … , M:

$S'_j(\mathbf{x}) = P(C_j)\exp\left\{-\tfrac{1}{2}(\mathbf{x} - M_j)^T K^{-1}(\mathbf{x} - M_j)\right\}$

Define another equivalent statistic $S''_j(\mathbf{x})$ for j = 1, 2, … , M:

$S''_j(\mathbf{x}) = (\mathbf{x} - M_j)^T K^{-1}(\mathbf{x} - M_j) - 2\ln P(C_j)$
M-Class Gaussian, Case 1: K₁ = K₂ = … = K_M = K (Equivalent Decision Rule)

Compute $(\mathbf{x} - M_j)^T K^{-1}(\mathbf{x} - M_j) - 2\ln P(C_j)$ for each class, i.e. $d^2_{MAH}(\mathbf{x}, M_j)$ plus a bias term, and select the class C_j with the minimum value.
Case 1a: K₁ = K₂ = … = K_M = K (Continued)

Expanding the quadratic form to be minimized:

$(\mathbf{x} - M_j)^T K^{-1}(\mathbf{x} - M_j) = \mathbf{x}^T K^{-1}\mathbf{x} - \mathbf{x}^T K^{-1} M_j - M_j^T K^{-1}\mathbf{x} + M_j^T K^{-1} M_j$

The term $\mathbf{x}^T K^{-1}\mathbf{x}$ is the same for each class and can be dropped. Select class C_j if the following is minimum:

$-2 M_j^T K^{-1}\mathbf{x} + M_j^T K^{-1} M_j - 2\ln P(C_j)$
M-Class Gaussian, Case 1: K₁ = K₂ = … = K_M = K (Equivalent Rule)

Select class C_j if $L_j(\mathbf{x})$ is MAXIMUM, where

$L_j(\mathbf{x}) = M_j^T K^{-1}\mathbf{x} - \tfrac{1}{2} M_j^T K^{-1} M_j + \ln P(C_j)$

The first term is a dot product and the remaining terms are a bias; this is a linear operation on the observation vector x.
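A minimal NumPy sketch of the linear statistic $L_j(\mathbf{x})$ for the equal-covariance case, assuming the same illustrative 3-class setup with a shared covariance K:

```python
import numpy as np

def linear_scores(x, means, K, priors):
    """L_j(x) = M_j^T K^{-1} x - 0.5 M_j^T K^{-1} M_j + ln P(C_j) for each class."""
    Kinv = np.linalg.inv(K)
    return np.array([M @ Kinv @ x - 0.5 * M @ Kinv @ M + np.log(P)
                     for M, P in zip(means, priors)])

# Illustrative 3-class problem with a shared covariance; decide the MAXIMUM L_j(x)
means  = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
K      = np.array([[1.0, 0.2],
                   [0.2, 1.0]])
priors = [0.5, 0.3, 0.2]
L = linear_scores(np.array([2.5, 0.5]), means, K, priors)
print("decide class C%d" % (np.argmax(L) + 1))
```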
Summary
1. Gaussian Random Variables and Vectors
2. General Gaussian Problem: 2-Class Case; Special Cases: Quadratic and Linear Classifiers
3. Mahalanobis Distance
4. General Gaussian Problem: M-Class Case; Special Cases: Quadratic and Linear Classifiers