Introduction to Pattern Recognition: Chapter 1 (Duda et al.). CS479/679 Pattern Recognition, Dr. George Bebis
What is Pattern Recognition? • Assign a pattern to one of several pre-specified categories (or classes). Male or Female?
What is Pattern Recognition? (cont’d) Photograph or not?
What is Pattern Recognition? (cont’d) Character Recognition Classes: All characters
What is Pattern Recognition? (cont’d) Speech Understanding Classes: phonemes
Classification vs. Clustering • Classification (known categories): supervised classification, i.e., recognition. • Clustering (creation of new categories): unsupervised classification. [Figure: samples assigned to Category “A” and Category “B”]
What is a Pattern? • A pattern could be an object or event (e.g., biometric patterns, hand gesture patterns).
What is a Pattern? (cont’d) • Loan/Credit card applications • Income, # of dependents, mortgage amount, credit worthiness • Dating services • Age, hobbies, income, etc. establish your “desirability” • Web documents • Keyword-based descriptions (e.g., documents containing “terrorism”, “Osama” are different from those containing “football”, “NFL”).
Pattern Class • A collection of “similar” objects. • Challenges in modeling classes: • Intra-class variability (e.g., the letter “T” in different typefaces) • Inter-class similarity (e.g., letters/numbers that look similar)
Modeling Pattern Classes • A description of the class, typically expressed in terms of a statistical model. • e.g., probability density function (Gaussian) [Figure: gender classification, with one density for the male class and one for the female class]
Key PR Objectives • Hypothesize the models that describe each pattern class (e.g., recover the process that generated the patterns). • Given a novel pattern, choose the best-fitting model for it and then assign it to the pattern class associated with the model.
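The two objectives above can be sketched in a few lines of Python. This is a minimal illustration, not the course's method: it assumes a single numeric feature, one Gaussian model per class, and equal class priors. The feature (height in cm), the training values, and the names `fit_gaussian`, `gaussian_pdf`, and `classify` are all hypothetical.

```python
import math
import statistics

def fit_gaussian(samples):
    """Estimate the mean and standard deviation of a 1-D Gaussian class model."""
    return statistics.mean(samples), statistics.stdev(samples)

def gaussian_pdf(x, mean, std):
    """Value of the Gaussian probability density function at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def classify(x, models):
    """Assign x to the class whose model gives it the highest density."""
    return max(models, key=lambda label: gaussian_pdf(x, *models[label]))

# Hypothetical training data: one measurement (height in cm) per person.
training = {
    "female": [158.0, 162.0, 165.0, 160.0, 167.0],
    "male": [175.0, 180.0, 178.0, 172.0, 183.0],
}

# Objective 1: hypothesize a model for each pattern class.
models = {label: fit_gaussian(values) for label, values in training.items()}

# Objective 2: assign a novel pattern to the class of the best-fitting model.
print(classify(161.0, models))
print(classify(179.0, models))
```

With unequal class priors, each density would be weighted by its prior before taking the maximum.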
Face Detection Unbalanced classes: faces vs non-faces
Gender Classification Balanced classes: male vs female
“Hot” Pattern Recognition Applications • Recommendation systems • Amazon, Netflix • Targeted advertising
The Netflix Prize • Movie recommendation system • Predict how much someone is going to enjoy a movie based on their movie preferences • $1M awarded in Sept. 2009 • Can software recommend movies to customers? • Not Rambo to Woody Allen fans • Not Saw VI if you’ve seen all previous Saw movies
Main PR Areas • Template matching • The pattern to be recognized is matched against a stored template while taking into account all allowable pose (translation and rotation) and scale changes. • Statistical pattern recognition • Focuses on the statistical properties of the patterns (i.e., probability densities). • Syntactic pattern recognition • Describe complicated objects in terms of simple primitives; decisions consist of logical rules or grammars. • Artificial Neural Networks • Inspired by biological neural network models.
Template Matching [Figure: a template and the input scene it is matched against]
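A minimal sketch of template matching, assuming grayscale images stored as lists of rows and a sum-of-squared-differences (SSD) score. It searches over translations only; handling rotation and scale changes would require repeating the search over transformed templates. The names `ssd` and `match_template` and the toy data are hypothetical.

```python
def ssd(scene, template, top, left):
    """Sum of squared differences between the template and one scene window."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            diff = scene[top + r][left + c] - value
            total += diff * diff
    return total

def match_template(scene, template):
    """Return (top, left, score) of the window with the lowest SSD."""
    rows = len(scene) - len(template) + 1
    cols = len(scene[0]) - len(template[0]) + 1
    candidates = (
        (ssd(scene, template, top, left), top, left)
        for top in range(rows)
        for left in range(cols)
    )
    score, top, left = min(candidates)
    return top, left, score

# Hypothetical 5x5 scene containing one bright 2x2 patch.
scene = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [9, 8],
    [7, 9],
]

print(match_template(scene, template))  # (2, 2, 0): exact match at row 2, column 2
```

SSD is only one possible score; normalized cross-correlation is a common alternative when brightness varies between the template and the scene.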
Deformable Template (Corpus Callosum Segmentation) [Figure panels: prototype registration to the low-level segmented image; prototype and variation learning; shape training set; prototype warping]
Statistical Pattern Recognition • Patterns are represented in some feature space. • Each class is modeled using a statistical model.
Syntactic Pattern Recognition • Represent patterns in terms of simple primitives. • Describe patterns using deterministic grammars or formal languages.
Artificial Neural Networks (ANNs) • Humans take only a few hundred ms for most cognitive tasks, which suggests that massive parallelism is essential for complex pattern recognition tasks (e.g., speech and image recognition). • Biological networks achieve good performance via dense interconnection of simple computational elements (neurons). • Number of neurons: $10^{10}$ to $10^{12}$ • Number of interconnections per neuron: $10^{3}$ to $10^{4}$ • Total number of interconnections: $10^{14}$
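Artificial Neural Nodes • Nodes in neural networks are nonlinear: $Y = f\left(\sum_{i=1}^{d} w_i x_i - \theta\right)$, where $\theta$ is an internal threshold and $f$ is a nonlinear function. [Figure: inputs $x_1, \dots, x_d$ with weights $w_1, \dots, w_d$ feeding a node with output $Y$]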
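A single node of this kind computes a nonlinear function of a weighted sum of its inputs. The sketch below assumes the common form in which the internal threshold is subtracted from the weighted sum before the nonlinearity is applied. The function names, the sigmoid and step nonlinearities, and the AND-gate weights are illustrative choices, not taken from the slides.

```python
import math

def sigmoid(a):
    """Smooth nonlinearity mapping any real activation into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def step(a):
    """Hard-threshold nonlinearity: fire (1.0) when the activation is >= 0."""
    return 1.0 if a >= 0 else 0.0

def node_output(x, w, theta, f=sigmoid):
    """Output Y of one node: f(weighted sum of inputs minus threshold theta)."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return f(activation)

# Hypothetical example: weights (1, 1) and threshold 1.5 make a step node
# behave like a logical AND of two binary inputs.
and_gate = [
    node_output((a, b), (1.0, 1.0), 1.5, f=step)
    for a in (0, 1)
    for b in (0, 1)
]
print(and_gate)  # [0.0, 0.0, 0.0, 1.0]
```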
Multilayer ANNs • Feed-forward nets with one or more (hidden) layers between the input and output nodes. • A three-layer net can generate arbitrarily complex decision regions. • These nets can be trained by the back-propagation training algorithm. [Figure: $d$ inputs, a first hidden layer with $N_{H1}$ units, a second hidden layer with $N_{H2}$ units, and $c$ outputs]
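The forward pass of such a feed-forward net can be sketched as follows. This shows only how an input propagates through two hidden layers to the outputs; the weights are random and untrained, and back-propagation is not shown. The layer sizes, the sigmoid nonlinearity, and all function names are assumptions made for illustration.

```python
import math
import random

def layer(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid nonlinearity."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
        for row, b in zip(weights, biases)
    ]

def random_layer(n_in, n_out, rng):
    """Random (untrained) weights and biases for a layer with n_out nodes."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases

def forward(x, layers):
    """Propagate x through each layer in turn and return the final outputs."""
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x

rng = random.Random(0)
d, n_h1, n_h2, c = 4, 5, 3, 2  # hypothetical sizes: inputs, hidden 1, hidden 2, outputs
net = [
    random_layer(d, n_h1, rng),
    random_layer(n_h1, n_h2, rng),
    random_layer(n_h2, c, rng),
]
outputs = forward([0.5, -1.0, 0.25, 2.0], net)
print(outputs)  # c values, each strictly between 0 and 1
```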
Comparing Different Pattern Recognition Approaches • Template Matching • Assumes small intra-class variability. • Learning is difficult for deformable templates. • Statistical • Assumption of statistical model for each class. • Syntactic • Primitive extraction is sensitive to noise. • Describing patterns in terms of primitives is difficult. • Artificial Neural Network • Parameter tuning and local minima in learning.
Complexity of PR – An Example Problem: Sorting incoming fish on a conveyor belt. Assumption: Two kinds of fish: (1) sea bass (2) salmon
Pre-processing (1) image enhancement (2) separating touching or occluding fish (3) finding the boundary of each fish
Feature Extraction • Assume a fisherman told us that a sea bass is generally longer than a salmon. • We can use length as a feature and decide between sea bass and salmon according to a threshold on length. • How can we choose this threshold?
Decision Using Length • Histograms of “length” for two types of fish in training samples. • How can we choose the threshold l* to make a reliable decision?
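One simple way to choose $l^*$, sketched below, is to try every midpoint between consecutive training lengths and keep the one with the fewest training errors. This is an illustrative approach under stated assumptions, not necessarily the one intended in the lecture; the length values and function names are hypothetical.

```python
def training_errors(threshold, salmon, sea_bass):
    """Count errors for the rule: length > threshold means sea bass."""
    salmon_errors = sum(1 for x in salmon if x > threshold)
    bass_errors = sum(1 for x in sea_bass if x <= threshold)
    return salmon_errors + bass_errors

def best_threshold(salmon, sea_bass):
    """Try midpoints between consecutive lengths; keep the lowest-error one."""
    values = sorted(set(salmon) | set(sea_bass))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: training_errors(t, salmon, sea_bass))

# Hypothetical training lengths; the two classes overlap.
salmon = [4.0, 5.0, 6.0, 7.0, 9.5]
sea_bass = [8.0, 9.0, 10.0, 11.0, 12.0]

l_star = best_threshold(salmon, sea_bass)
print(l_star, training_errors(l_star, salmon, sea_bass))  # 7.5 1
```

Because the classes overlap, even the best threshold misclassifies one training sample here, which matches the observation that length alone does not separate the two kinds of fish.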
Feature Extraction (cont’d) • Even though sea bass are longer than salmon on average, there are many fish for which this observation does not hold. • Average lightness may be a better feature.
Decision Using Average Lightness • Histograms of the lightness feature for two types of fish in training samples. • It seems easier to choose the threshold x* but we still cannot make a perfect decision.
Cost of Error (Misclassification) • There are two possible classification errors. (1) Deciding the fish was a sea bass when it was a salmon. (2) Deciding the fish was a salmon when it was a sea bass. • Which error is more important?
Cost of Error (Misclassification) • It depends; e.g., if the fish packing company knows that: • Customers who buy salmon will object vigorously if they see sea bass in their cans. • Customers who buy sea bass will not be unhappy if they occasionally see some expensive salmon in their cans. • How does this knowledge affect our decision?
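One way to see the effect is to weight each kind of error by a cost and pick the threshold that minimizes total cost on the training data. The sketch below assumes the rule "lightness above the threshold means sea bass"; that direction, the lightness values, the cost values, and the function names are all hypothetical.

```python
def total_cost(threshold, salmon, sea_bass, cost_bass_as_salmon, cost_salmon_as_bass):
    """Training cost for the rule: lightness > threshold means sea bass."""
    bass_as_salmon = sum(1 for x in sea_bass if x <= threshold)
    salmon_as_bass = sum(1 for x in salmon if x > threshold)
    return cost_bass_as_salmon * bass_as_salmon + cost_salmon_as_bass * salmon_as_bass

def min_cost_threshold(salmon, sea_bass, cost_bass_as_salmon, cost_salmon_as_bass):
    """Try midpoints between consecutive values; keep the lowest-cost one."""
    values = sorted(set(salmon) | set(sea_bass))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(
        candidates,
        key=lambda t: total_cost(
            t, salmon, sea_bass, cost_bass_as_salmon, cost_salmon_as_bass
        ),
    )

# Hypothetical lightness measurements with overlapping classes.
salmon = [1.0, 2.0, 3.0, 4.0, 4.5, 7.0]
sea_bass = [3.5, 6.0, 6.5, 8.0, 9.0, 10.0]

# Equal costs: the threshold sits in the middle of the overlap.
print(min_cost_threshold(salmon, sea_bass, 1, 1))  # 5.25
# Sea bass in a salmon can is 5x as costly: the threshold moves down,
# so fewer sea bass are labeled salmon.
print(min_cost_threshold(salmon, sea_bass, 5, 1))  # 3.25
```

Raising the cost of one error shifts the decision threshold so that the expensive error becomes rarer, at the price of more errors of the cheaper kind.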
Decision Using Multiple Features • To improve recognition, we might have to use more than one feature at a time. • Single features might not yield the best performance. • Combinations of features might yield better performance.
Decision Boundary • Scatter plot of lightness and width features for training samples. • We can partition the feature space into two regions by finding the decision boundary that minimizes the error.
How Many Features and Which? • Does adding more features always improve performance? • It might be difficult and computationally expensive to extract certain features. • Highly correlated features add little new information and may not improve performance. • “Curse” of dimensionality …
Curse of Dimensionality • Adding too many features can, paradoxically, lead to a worsening of performance. • Divide each of the input features into a number of intervals, so that the value of a feature can be specified approximately by saying in which interval it lies. • If each input feature is divided into $M$ divisions, then the total number of cells is $M^d$ ($d$: number of features), which grows exponentially with $d$. • Since each cell must contain at least one point, the amount of training data required also grows exponentially with $d$.
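The growth in the number of cells is easy to check numerically. The sketch below counts the cells for $M = 10$ and several values of $d$, then estimates what fraction of cells a fixed training set of 1000 uniformly distributed points can occupy. The uniform distribution, the sample size, and the function names are assumptions chosen for illustration.

```python
import random

def num_cells(m, d):
    """Number of cells when each of d features is split into m intervals."""
    return m ** d

def occupied_fraction(n_samples, m, d, rng):
    """Fraction of cells that receive at least one of n_samples uniform points."""
    occupied = set()
    for _ in range(n_samples):
        # rng.random() lies in [0, 1), so each index lies in 0..m-1.
        cell = tuple(int(rng.random() * m) for _ in range(d))
        occupied.add(cell)
    return len(occupied) / num_cells(m, d)

rng = random.Random(0)
for d in (1, 2, 3, 6):
    print(d, num_cells(10, d), occupied_fraction(1000, 10, d, rng))
```

With 1000 samples, at most 1000 cells can be occupied, so for $d = 6$ (one million cells) at least 99.9% of the cells are guaranteed to be empty.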
Decision Boundary (Model) Complexity • We can get perfect classification performance on the training data by choosing complex models. • Complex models are tuned to the particular training samples rather than to the characteristics of the true model (overfitting). • How well can we generalize to unknown samples?
Generalization • The ability of a classifier to produce correct results on novel patterns. • How can we improve generalization performance? • More training examples (i.e., better model estimates). • Simpler models usually yield better performance. [Figure: a simpler model vs. a complex model]
More on model complexity Regression example: plot of 10 sample points for the input variable x along with the corresponding target variable t (assuming some noise). Green curve is the true function that generated the data.
More on model complexity (cont’d) Polynomial curve fitting: plots of polynomials having various orders, shown as red curves, fitted to the set of 10 sample points.
More on model complexity (cont’d) Polynomial curve fitting: plots of 9th-order polynomials fitted to 15 and 100 sample points.
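The curve-fitting experiment can be reproduced approximately with NumPy. The sketch below assumes that NumPy is available and that the true function is $\sin(2\pi x)$ with Gaussian noise of standard deviation 0.3; the slides do not state either, so both are illustrative choices, as are the orders tried and the helper names.

```python
import numpy as np

def true_function(x):
    """Assumed generating function (not specified in the slides)."""
    return np.sin(2 * np.pi * x)

def rms_error(coeffs, x, t):
    """Root-mean-square error of the fitted polynomial on the points (x, t)."""
    return float(np.sqrt(np.mean((np.polyval(coeffs, x) - t) ** 2)))

rng = np.random.default_rng(0)

# 10 noisy training points and a larger independent test set.
x_train = np.linspace(0.0, 1.0, 10)
t_train = true_function(x_train) + rng.normal(scale=0.3, size=x_train.shape)
x_test = np.linspace(0.0, 1.0, 100)
t_test = true_function(x_test) + rng.normal(scale=0.3, size=x_test.shape)

errors = {}
for order in (0, 1, 3, 9):
    coeffs = np.polyfit(x_train, t_train, order)  # least-squares polynomial fit
    errors[order] = (
        rms_error(coeffs, x_train, t_train),
        rms_error(coeffs, x_test, t_test),
    )
    print(order, errors[order])  # (training error, test error)
```

The training error can only decrease as the order grows, and the 9th-order polynomial passes through all 10 training points. Its test error is typically much worse than that of a lower-order fit, which is the overfitting effect described above. Increasing the number of training points for the 9th-order fit usually reduces that gap.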
PR System [Figure: block diagram of a PR system, showing the training phase and the test phase]
PR System (cont’d) • Sensing: • Use a sensor (camera or microphone) • PR depends on bandwidth, resolution, sensitivity, distortion of the sensor. • Pre-processing: • Removal of noise in data. • Segmentation (i.e., isolation of patterns of interest from background).