Review of Statistical Pattern Recognition
Wen-Hung Liao, 10/9/2007
Review Paper
• A.K. Jain, R.P.W. Duin, and J. Mao, "Statistical Pattern Recognition: A Review," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 22, No. 1, pp. 4-37, Jan. 2000.
• More review papers: http://www.ph.tn.tudelft.nl/PRInfo/revpapers.html
Statistical Approach in PR
• Each pattern is represented in terms of d features and is viewed as a point in a d-dimensional feature space.
• Goal: establish decision boundaries that separate patterns belonging to different classes.
• This requires specifying or estimating the probability distribution of the patterns in each class.
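A minimal sketch of this view, assuming Gaussian class-conditional densities (the means, covariances, and priors below are illustrative, not from the slides): each pattern is a point in d-dimensional space, and the decision rule assigns it to the class with the highest posterior.

```python
# Patterns as points in d-dimensional space, classified by a Bayes rule
# under assumed Gaussian class-conditional densities. The means,
# covariances, and priors are illustrative assumptions.
import numpy as np
from scipy.stats import multivariate_normal

d = 2
priors = {"w1": 0.5, "w2": 0.5}
densities = {
    "w1": multivariate_normal(mean=[0.0, 0.0], cov=np.eye(d)),
    "w2": multivariate_normal(mean=[2.0, 2.0], cov=np.eye(d)),
}

def classify(x):
    # Assign x to the class maximizing p(x|wi) * P(wi); the implied
    # decision boundary separates the two class regions.
    return max(priors, key=lambda w: densities[w].pdf(x) * priors[w])

print(classify(np.array([0.2, -0.1])))  # near w1's mean -> "w1"
print(classify(np.array([1.9, 2.3])))   # near w2's mean -> "w2"
```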
Links Between Statistical and Neural Network Methods
• Linear Discriminant Function ↔ Perceptron
• Principal Component Analysis ↔ Auto-Associative Networks
• Nonlinear Discriminant Function ↔ Multilayer Perceptron
• Parzen Window Density-based Classifier ↔ Radial Basis Function Network
Model for Statistical Pattern Recognition
• Training mode: Preprocessing → Feature Extraction/Selection → Learning
• Classification (testing) mode: Preprocessing → Feature Measurement → Classification
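The two modes can be sketched as a pipeline that is fit once in training mode and reused at test time; the specific components below (scaling, PCA, a nearest-centroid classifier) are illustrative choices, not ones prescribed by the paper.

```python
# Sketch of the two operating modes: preprocessing and feature
# extraction are fit in training mode, then reused unchanged in
# classification mode. Component choices are illustrative.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4)) + np.repeat([[0], [3]], 50, axis=0)
y_train = np.repeat([0, 1], 50)

pipe = Pipeline([
    ("preprocess", StandardScaler()),   # preprocessing
    ("extract", PCA(n_components=2)),   # feature extraction/selection
    ("classify", NearestCentroid()),    # learning / classification
])
pipe.fit(X_train, y_train)              # training mode
print(pipe.predict(X_train[:3]))        # classification (testing) mode
```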
The Curse of Dimensionality
• The performance of a classifier depends on the relationship between sample size, number of features, and classifier complexity.
• As a rule of thumb, the number of training samples needed grows exponentially with the dimensionality of the feature space.
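One way to see why, in a short numpy sketch (sample size and dimensions are illustrative): as d grows, uniformly sampled points become nearly equidistant, so a fixed-size sample covers the feature space ever more sparsely.

```python
# Distance concentration: as d grows, the spread between the nearest
# and farthest neighbor shrinks relative to the distances themselves.
# Sample size and the list of dimensions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 500
for d in (1, 2, 10, 100, 1000):
    X = rng.uniform(size=(n, d))
    dists = np.linalg.norm(X[1:] - X[0], axis=1)  # distances to one point
    ratio = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  (max - min) / min = {ratio:.3f}")
```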
Class-Conditional Probability
• Feature vector of length d: x = (x1, x2, …, xd)
• c classes (or categories): w1, w2, …, wc
• Class-conditional probability: the probability density of observing x given that it belongs to class wi, written p(x|wi)
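The class-conditional density enters classification through Bayes' rule (standard, though not written out on the slide), which combines it with the class prior P(wi) to give the posterior that the decision rule maximizes:

```latex
% Bayes' rule: posterior from class-conditional density and class prior
P(w_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid w_i)\, P(w_i)}
                              {\sum_{j=1}^{c} p(\mathbf{x} \mid w_j)\, P(w_j)}
```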
How Many Features are Enough?
• Question: do more features always mean better classification?
• Answer:
  • Yes, if the class-conditional densities are completely known.
  • No, if the class-conditional densities must be estimated from a limited training set: beyond some point, the added estimation error outweighs the added discriminatory information (the peaking phenomenon; see the sketch below).
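A numpy sketch of this peaking effect, loosely in the spirit of Trunk's classic construction (all parameters below are illustrative assumptions): two Gaussian classes with means ±μ, μ_i = 1/√i, and identity covariance, classified by a linear rule whose mean is estimated from a small training set. The error first drops, then climbs back up as features are added.

```python
# Peaking demo: with the class mean estimated from only n_train samples,
# adding features helps at first, then hurts, because estimation error
# eventually dominates the extra discriminatory information.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, trials = 20, 1000, 20

for d in (1, 5, 20, 100, 1000):
    mu = 1.0 / np.sqrt(np.arange(1, d + 1))  # per-feature separation
    errs = []
    for _ in range(trials):
        # estimate the class mean from a small training sample
        mu_hat = (rng.normal(size=(n_train, d)) + mu).mean(axis=0)
        # plug-in linear rule: classify x by the sign of x . mu_hat;
        # test on samples drawn from the +mu class
        X_test = rng.normal(size=(n_test, d)) + mu
        errs.append(np.mean(X_test @ mu_hat < 0))
    print(f"d={d:4d}  estimated error = {np.mean(errs):.3f}")
```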
Dimensionality Reduction
• Keep the number of features as small as possible (but not too small):
  • fewer features mean lower measurement cost;
  • too few features hurt classification accuracy.
• There is always a trade-off between the two.
Feature Extraction/Selection
• Feature extraction: compute new features from the sensed data, typically by transforming the original measurements (a PCA sketch follows this list).
• Feature selection: select (hopefully) the best subset of the input feature set.
• Feature extraction usually precedes feature selection.
• Both are application-domain dependent.
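As one concrete instance of feature extraction, here is a small numpy sketch of principal component analysis (the data and the number of retained components are illustrative): new features are formed as linear combinations of the inputs along the directions of largest variance.

```python
# PCA as feature extraction: project centered data onto the top-k
# eigenvectors of the sample covariance matrix. Data shape and k are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))            # 150 patterns, 4 raw features
Xc = X - X.mean(axis=0)                  # center the data

cov = Xc.T @ Xc / (len(Xc) - 1)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # largest variance first

k = 2                                    # number of extracted features
Z = Xc @ eigvecs[:, order[:k]]           # projected (extracted) features
print(Z.shape)                           # (150, 2)
```

Feature selection, by contrast, would keep a subset of the original four columns rather than forming linear combinations of them.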
Example: Chernoff Faces
• Three classes of faces; each pattern is drawn as a cartoon face whose attributes encode the feature values.
• Feature set: nose length, mouth curvature, eye size, face shape.
• 150 four-dimensional patterns, 50 patterns per class.