
12 Discriminant Analysis




Presentation Transcript


  1. 12 Discriminant Analysis

  2. Discriminant Analysis • Introduction ✔ • Two-group discriminant analysis • Mahalanobis approach • Linear discriminant analysis • Discussions with LDA

  3. Potential Applications One of the main methods for feature extraction in pattern recognition

  4. Introduction

  5. Discriminant Analysis • Introduction ✔ • Two-group discriminant analysis ✔ • Mahalanobis approach • Linear discriminant analysis • Discussions with LDA

  6. Two-group discriminant analysis

  7. Two-group discriminant analysis

  8. Fisher’s approach (1936)

  9. Ronald Aylmer Fisher (1890-1962) • Brilliant biologist • Development of methods suitable for small samples • Discovery of the precise distributions of many sample statistics • Invention of analysis of variance.

  10. Fisher’s approach (1936) Choose k to maximize L = (k′Cb k) / (k′Cw k), where Cb = dd′, d = μ2 − μ1 is a vector describing the difference between the two group means, and Cw is the pooled within-group covariance matrix of X. Solution: k ∝ Cw⁻¹d
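The following is a minimal NumPy sketch of the solution k ∝ Cw⁻¹d above; the function and variable names (fisher_direction, X1, X2) are illustrative, not from the slides.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher's (1936) two-group discriminant direction k proportional to Cw^{-1} d.

    X1, X2 : (n1, p) and (n2, p) arrays of samples from the two groups.
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    d = mu2 - mu1                                    # difference between group means
    # Pooled within-group covariance matrix Cw
    n1, n2 = len(X1), len(X2)
    Cw = ((n1 - 1) * np.cov(X1, rowvar=False) +
          (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    k = np.linalg.solve(Cw, d)                       # k proportional to Cw^{-1} d
    return k / np.linalg.norm(k)                     # normalized for convenience

# Discriminant scores for new observations are then X_new @ k.
```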

  11. Mechanics: Fisher’s approach (1936)

  12. Relationship to Regression

  13. Discriminant Analysis • Introduction ✔ • Two-group discriminant analysis ✔ • Mahalanobis approach ✔ • Linear discriminant analysis • Discussions with LDA

  14. Mahalanobis approach

  15. Mahalanobis approach

  16. Simple example

  17. Mahalanobis in general
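The body of these slides is not reproduced in the transcript; as a hedged sketch of the general idea, the Mahalanobis approach assigns an observation to the group whose mean is closest in Mahalanobis distance under a pooled covariance matrix. All names below (mahalanobis_classify, means, Cw) are illustrative.

```python
import numpy as np

def mahalanobis_classify(x, means, Cw):
    """Assign x to the group with the smallest squared Mahalanobis distance
    (x - mu_i)' Cw^{-1} (x - mu_i), where Cw is the pooled covariance matrix."""
    Cw_inv = np.linalg.inv(Cw)
    d2 = [float((x - mu) @ Cw_inv @ (x - mu)) for mu in means]
    return int(np.argmin(d2)), d2   # chosen group index and all squared distances
```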

  18. Simple example revisited

  19. Discriminant scores, hit/miss table
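Again the slide body is missing from the transcript; the sketch below shows one plausible way to build a two-group hit/miss table from the discriminant scores X @ k, cutting at the midpoint of the projected group means. The midpoint cutoff and all names are my assumptions.

```python
import numpy as np

def hit_miss_table(X1, X2, k):
    """2x2 hit/miss table: rows = actual group, columns = predicted group."""
    s1, s2 = X1 @ k, X2 @ k                          # discriminant scores
    cut = (s1.mean() + s2.mean()) / 2.0              # midpoint cutoff
    sign = 1.0 if s2.mean() > s1.mean() else -1.0    # orient so group 2 scores higher
    pred1 = np.where(sign * (s1 - cut) > 0, 2, 1)    # predictions for group-1 samples
    pred2 = np.where(sign * (s2 - cut) > 0, 2, 1)    # predictions for group-2 samples
    return np.array([[np.sum(pred1 == 1), np.sum(pred1 == 2)],
                     [np.sum(pred2 == 1), np.sum(pred2 == 2)]])
```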

  20. Testing for equality of var/cov matrices

  21. What if var/cov matrices not equal?

  22. Multiple-group discriminant analysis

  23. Predicting new observations

  24. Discriminant Analysis • Introduction ✔ • Two-group discriminant analysis ✔ • Mahalanobis approach ✔ • Linear discriminant analysis ✔ • Discussions with LDA

  25. Linear discriminant analysis • Fisher vector [1936] • Linear discriminant analysis [1962] • Foley-Sammon optimal discriminant vectors [1975] • Uncorrelated optimal discriminant vectors [1999]

  26. Dimensionality problem The accuracy of a statistical pattern classifier increases as the number of features increases, but decreases once the number of features becomes too large; the peak occurs at some optimal dimensionality d_opt (Hughes, 1968).

  27. Feature selection and feature extraction • Reducing the number of features • Increasing the discriminant information of features

  28. Fisher vector In 1936, Fisher proposed constructing a 1D feature space by projecting the high-dimensional feature vector onto a discriminant direction (vector). Fisher criterion: J(φ) = (φ′Sb φ) / (φ′Sw φ)

  29. Fisher vector (cont’d) Fisher criterion: J(φ) = (φ′Sb φ) / (φ′Sw φ), where Sb is the between-class scatter matrix and Sw is the within-class scatter matrix (defined on the next slide).

  30. Fisher vector (cont’d) Let Xij be the jth sample of class ωi (i = 1, …, L; j = 1, …, Ni). With class means mi, overall mean m, and class priors P(ωi), the scatter matrices are Sb = Σi P(ωi)(mi − m)(mi − m)′ and Sw = Σi P(ωi)·(1/Ni) Σj (Xij − mi)(Xij − mi)′.

  31. Fisher vector (cont’d) The Fisher vector is the eigenvector corresponding to the maximum eigenvalue of the eigen-equation Sb φ = λ Sw φ. • Problem • One discriminant vector is not enough.
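A minimal sketch of computing the Fisher vector as the leading solution of Sb φ = λ Sw φ, with the scatter matrices built as on slide 30; estimating the class priors from sample counts is my assumption, and scatter_matrices and fisher_vector are illustrative names.

```python
import numpy as np
from scipy.linalg import eigh

def scatter_matrices(X, y):
    """Between-class (Sb) and within-class (Sw) scatter matrices.
    X : (N, n) samples; y : (N,) integer class labels."""
    n = X.shape[1]
    m = X.mean(axis=0)
    Sb, Sw = np.zeros((n, n)), np.zeros((n, n))
    for c in np.unique(y):
        Xc = X[y == c]
        Pc = len(Xc) / len(X)                    # estimated class prior P(omega_i)
        mc = Xc.mean(axis=0)
        Sb += Pc * np.outer(mc - m, mc - m)
        Sw += Pc * np.cov(Xc, rowvar=False, bias=True)
    return Sb, Sw

def fisher_vector(X, y):
    """Eigenvector with the largest eigenvalue of Sb * phi = lambda * Sw * phi."""
    Sb, Sw = scatter_matrices(X, y)
    evals, evecs = eigh(Sb, Sw)                  # generalized symmetric eigenproblem
    return evecs[:, np.argmax(evals)]
```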

  32. Linear discriminant analysis (LDA) Linear transformation from n dimensions to m dimensions: Y = W′X, where W is an n×m transformation matrix. Discriminant classifiability criterion: a generalization of the Fisher criterion from a single direction φ to the matrix W.

  33. Linear discriminant analysis (LDA) (cont’d) Wilks (1962): • m = L−1 for L-class problems • The optimal transformation matrix is composed of the (L−1) eigenvectors of the matrix (Sw⁻¹Sb) with the largest eigenvalues. Question: can we extract more than (L−1) discriminant vectors for L-class problems?
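Extending the previous sketch to slide 33, the LDA transformation matrix consists of the (L−1) eigenvectors of Sw⁻¹Sb with the largest eigenvalues; this block reuses the hypothetical scatter_matrices helper from the sketch above.

```python
import numpy as np
from scipy.linalg import eigh

def lda_transform(X, y, m=None):
    """LDA transformation matrix W (n x m); by Wilks' result m = L - 1 for L classes.
    Reuses scatter_matrices() from the sketch after slide 31."""
    Sb, Sw = scatter_matrices(X, y)
    L = len(np.unique(y))
    m = (L - 1) if m is None else m
    evals, evecs = eigh(Sb, Sw)                  # eigenvectors of Sw^{-1} Sb
    order = np.argsort(evals)[::-1][:m]          # keep the m largest eigenvalues
    return evecs[:, order]                       # project new data with Y = X_new @ W
```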

  34. Foley-Sammon optimal discriminant vectors [1975] Let φ1 be the Fisher vector. Suppose r directions (φ1, φ2, …, φr) have been obtained. We can obtain the (r+1)th direction φr+1, which maximizes the Fisher criterion function subject to the orthogonality constraints φr+1′φi = 0 (i = 1, …, r).

  35. Uncorrelated optimal discriminant vectors [1999] Let φ1 be the Fisher vector. Suppose r directions (φ1, φ2, …, φr) have been obtained. We can obtain the (r+1)th direction φr+1, which maximizes the Fisher criterion function subject to the conjugate orthogonality constraints φr+1′St φi = 0 (i = 1, …, r), where St = Sb + Sw is the total scatter matrix.
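Slides 34 and 35 differ only in the constraint placed on each new direction: plain orthogonality φr+1′φi = 0 for Foley-Sammon, conjugate orthogonality φr+1′St φi = 0 for UODV. The sketch below shows one way to compute either set by restricting the Fisher criterion to the subspace allowed by the constraints; this greedy construction is my own illustration, not the derivation used in the cited papers.

```python
import numpy as np
from scipy.linalg import eigh, null_space

def constrained_discriminant_vectors(Sb, Sw, num, metric=None):
    """Greedy directions maximizing phi' Sb phi / phi' Sw phi.

    metric=None           -> constraints phi' phi_i    = 0 (Foley-Sammon, 1975)
    metric=St (= Sb + Sw) -> constraints phi' St phi_i = 0 (uncorrelated, 1999)
    """
    n = Sb.shape[0]
    M = np.eye(n) if metric is None else metric
    vecs = []
    for _ in range(num):
        if vecs:
            C = np.vstack([M @ v for v in vecs])   # rows are the constraint normals
            Q = null_space(C)                      # basis of the admissible subspace
        else:
            Q = np.eye(n)
        # Maximize the Fisher criterion inside the admissible subspace
        evals, evecs = eigh(Q.T @ Sb @ Q, Q.T @ Sw @ Q)
        phi = Q @ evecs[:, np.argmax(evals)]
        vecs.append(phi / np.linalg.norm(phi))
    return np.column_stack(vecs)
```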

  36. Uncorrelated optimal discriminant vectors [cont’d] • UODV is shown to be much more powerful than FSODV • Arguments • From 1985 to 1991, Okada, Tomita, Hamamoto, Kanaoka, et al. claimed that an orthogonal set of discriminant vectors is more powerful. • In 1977, Kittler proposed a method based on conjugate orthogonality constraints, which was shown to be more powerful than FSODV [1975].

  37. Uncorrelated optimal discriminant vectors[cont’d] • There are (L-1) uncorrelated optimal discriminant vectors for L-class problems • UODV is equivalent to LDA (Pattern Recognition, 34(10):2041-2047, 2001)

  38. Uncorrelated optimal discriminant vectors [cont’d] Significance • A better understanding of LDA. • A link between the two discriminant criteria (those of UODV and LDA).

  39. Research-impact statistics for face recognition (figure): http://sciencewatch.com/ana/st/face/

  40. Research-impact statistics for face recognition (figure): http://sciencewatch.com/ana/st/face/

  41. Research-impact statistics for face recognition (figure): http://sciencewatch.com/ana/st/face/

  42. Problems with LDA • For a large number of classes L, the useful dimensionality may be much less than (L−1). More work is needed to clarify the relationship between the dimensionality, the size of the database, etc. • LDA fails for simple examples such as the one shown on the slide. More work is needed to combine discriminant analysis with unsupervised clustering techniques.

  43. Key assumptions for LDA • Each class has a mean vector around which the samples are distributed. • All the classes have similar covariance matrices.

  44. PCA mixture model + LDA Problem: estimating a distribution in a high-dimensional space using only a small number of samples.
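As a simplified illustration of the idea on slide 44 (without the mixture-model part), dimensionality is often first reduced with PCA so that the within-class scatter estimated by LDA from few samples stays well conditioned. The scikit-learn pipeline below is a hedged sketch, and the component count 50 is an arbitrary placeholder.

```python
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# PCA first to tame the high-dimensional, small-sample estimation problem,
# then LDA in the reduced space.
pca_lda = make_pipeline(PCA(n_components=50), LinearDiscriminantAnalysis())
# pca_lda.fit(X_train, y_train); pca_lda.predict(X_test)   # hypothetical data
```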
