
CS775: Advanced Pattern Recognition Slide set 1


Presentation Transcript


  1. CS775: Advanced Pattern Recognition Slide set 1 Daniel Barbará

  2. Chapter 1 Introduction to Pattern Recognition (Sections 1.1-1.6 DHS)

  3. Machine Perception • Build a machine that can recognize patterns: • Speech recognition • Fingerprint identification • OCR (Optical Character Recognition) • DNA sequence identification

  4. An Example • “Sorting incoming fish on a conveyor according to species using optical sensing” • The two species: sea bass and salmon

  5. Problem Analysis • Set up a camera and take some sample images to extract features • Length • Lightness • Width • Number and shape of fins • Position of the mouth, etc… • This is the set of all suggested features to explore for use in our classifier!

  6. Preprocessing • Use a segmentation operation to isolate fish from one another and from the background • Information from a single fish is sent to a feature extractor whose purpose is to reduce the data by measuring certain features • The features are passed to a classifier
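A minimal sketch of this sense / segment / extract / classify flow in Python. The helper names (segment, extract_features, classify), the thresholding logic, and the random image are illustrative assumptions, not part of the slides:

```python
import numpy as np

def segment(image):
    # Hypothetical segmentation: return one boolean mask per fish,
    # here by simply thresholding the grayscale image.
    mask = image > image.mean()
    return [mask]                       # pretend exactly one fish was found

def extract_features(image, mask):
    # Reduce the pixels of a single fish to a small feature vector
    # [lightness, width], as suggested in the slides.
    lightness = image[mask].mean()
    width = mask.any(axis=0).sum()      # number of columns the fish spans
    return np.array([lightness, width])

def classify(features, threshold=0.5):
    # Toy classifier: decide on lightness (feature 0) alone.
    return "salmon" if features[0] < threshold else "sea bass"

image = np.random.rand(64, 64)          # stand-in for a camera frame
for mask in segment(image):
    x = extract_features(image, mask)
    print(classify(x), x)
```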

  7. Classification • Select the length of the fish as a possible feature for discrimination

  8. The length is a poor feature alone! Select the lightness as a possible feature.

  9. Threshold decision boundary and cost relationship • Move our decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified as salmon!). This is the task of decision theory.
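A minimal sketch of choosing such a threshold, assuming synthetic lightness values and made-up, asymmetric misclassification costs (none of these numbers come from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
salmon   = rng.normal(3.0, 1.0, 500)    # synthetic: salmon tend to be darker
sea_bass = rng.normal(6.0, 1.0, 500)    # synthetic: sea bass tend to be lighter

COST_BASS_AS_SALMON = 5.0               # assumed: selling sea bass as salmon is costly
COST_SALMON_AS_BASS = 1.0

def cost(threshold):
    # Rule: lightness < threshold -> "salmon", otherwise "sea bass"
    bass_as_salmon = np.sum(sea_bass < threshold)
    salmon_as_bass = np.sum(salmon >= threshold)
    return (COST_BASS_AS_SALMON * bass_as_salmon
            + COST_SALMON_AS_BASS * salmon_as_bass)

thresholds = np.linspace(0.0, 10.0, 1001)
best = thresholds[np.argmin([cost(t) for t in thresholds])]
print(f"cost-minimizing threshold: {best:.2f}")
```

Because the two costs are asymmetric, the minimizing threshold sits at a smaller lightness than the value that would merely balance the two error counts, which is exactly the shift described above.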

  10. Adopt the lightness and add the width of the fish as a second feature: xT = [x1, x2], where x1 = lightness and x2 = width

  11. We might add other features that are not correlated with the ones we already have. Care should be taken not to reduce performance by adding such “noisy features” • Ideally, the best decision boundary is the one that provides optimal performance, such as in the following figure:

  12. However, our satisfaction is premature because the central aim of designing a classifier is to correctly classify novel input: the issue of generalization!

  13. Pattern Recognition Systems • Sensing • Use of a transducer (camera or microphone) • The PR system depends on the bandwidth, resolution, sensitivity, and distortion of the transducer • Segmentation and grouping • Patterns should be well separated and should not overlap

  14. Learning and Adaptation • Supervised learning • A teacher provides a category label or cost for each pattern in the training set • Unsupervised learning • The system forms clusters or “natural groupings” of the input patterns
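A brief sketch contrasting the two settings on synthetic two-class data, using scikit-learn (the data, the logistic-regression classifier, and k-means are illustrative choices, not prescribed by the slides):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),       # samples from class 0
               rng.normal(3, 1, (100, 2))])      # samples from class 1
y = np.array([0] * 100 + [1] * 100)              # labels supplied by a "teacher"

# Supervised learning: labels are given, fit a classifier to them
clf = LogisticRegression().fit(X, y)
print("supervised accuracy:", clf.score(X, y))

# Unsupervised learning: labels withheld, form "natural groupings"
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(clusters))
```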

  15. Conclusion • The reader may feel overwhelmed by the number, complexity, and magnitude of the sub-problems of pattern recognition • Many of these sub-problems can indeed be solved • Many fascinating unsolved problems still remain

  16. Chapter 2 DHS (Part 1): Bayesian Decision Theory (Sections 2.1-2.2) Introduction Bayesian Decision Theory – Continuous Features

  17. Basic problem • Produce a machine capable of assigning x to one of many classes (ω1, ω2, …, ωc) • Supervised learning: you are given a series of examples (xi, yi) where yi is one of the classes ωk

  18. Inference and Decision. Three distinct approaches to solve decision problems • Solve the inference problem by determining P(x | ωi) for each class, then apply Bayes' rule to get P(ωi | x). GENERATIVE MODELS • Solve the inference problem by determining P(ωi | x) directly. DISCRIMINATIVE MODELS • Find a function f(x), called a discriminant function, which maps each input directly to a class label. Approaches 1 and 2 are followed by the use of DECISION THEORY
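A compact sketch of the three approaches on one-dimensional synthetic data; the distributions, the logistic-regression model, and the hand-picked threshold are all assumptions made only for illustration:

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
x1 = rng.normal(2.0, 0.7, 300)           # training samples from class w1
x2 = rng.normal(4.0, 0.7, 300)           # training samples from class w2
prior1, prior2 = 0.5, 0.5
x_new = 3.2                              # a new input to classify

# 1) Generative: estimate P(x|wi), then apply Bayes to get P(wi|x)
lik1 = norm.pdf(x_new, x1.mean(), x1.std())
lik2 = norm.pdf(x_new, x2.mean(), x2.std())
post1 = lik1 * prior1 / (lik1 * prior1 + lik2 * prior2)
print("generative:     P(w1|x) =", round(post1, 3))

# 2) Discriminative: model P(wi|x) directly
X = np.concatenate([x1, x2]).reshape(-1, 1)
y = np.array([0] * len(x1) + [1] * len(x2))
clf = LogisticRegression().fit(X, y)
print("discriminative: P(w1|x) =", round(clf.predict_proba([[x_new]])[0, 0], 3))

# 3) Discriminant function: map x straight to a label, no probabilities
f = lambda v: 0 if v < 3.0 else 1        # hand-chosen threshold
print("discriminant:   label   =", f(x_new))
```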

  19. Introduction • The sea bass/salmon example • State of nature, prior • State of nature is a random variable • The catch of salmon and sea bass is equiprobable • P(ω1) = P(ω2) (uniform priors) • P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)

  20. Decision rule with only the prior information • Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2 • Use of the class-conditional information • P(x | ω1) and P(x | ω2) describe the difference in lightness between the populations of sea bass and salmon

  21. Bayes • Posterior, likelihood, evidence • P(ωj | x) = P(x | ωj) P(ωj) / P(x) • Where, in the case of two categories, the evidence is P(x) = P(x | ω1) P(ω1) + P(x | ω2) P(ω2) • Posterior = (Likelihood × Prior) / Evidence
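A tiny worked example of this formula with made-up likelihoods and priors (the numbers are assumptions, not values from the slides):

```python
# Hypothetical values at one observed lightness x
p_x_given_w1 = 0.30        # likelihood P(x | w1), e.g. sea bass
p_x_given_w2 = 0.10        # likelihood P(x | w2), e.g. salmon
p_w1, p_w2 = 0.5, 0.5      # priors

# Evidence P(x): the normalizer summed over the two categories
p_x = p_x_given_w1 * p_w1 + p_x_given_w2 * p_w2

post_w1 = p_x_given_w1 * p_w1 / p_x      # posterior = likelihood * prior / evidence
post_w2 = p_x_given_w2 * p_w2 / p_x
print(post_w1, post_w2, post_w1 + post_w2)   # 0.75 0.25 1.0
```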

  22. Minimum Misclassification Rate

  23. Decision given the posterior probabilities. x is an observation for which: if P(ω1 | x) > P(ω2 | x), true state of nature = ω1; if P(ω1 | x) < P(ω2 | x), true state of nature = ω2. Therefore: whenever we observe a particular x, the probability of error is: P(error | x) = P(ω1 | x) if we decide ω2; P(error | x) = P(ω2 | x) if we decide ω1

  24. Minimizing the probability of error • Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2. Therefore: P(error | x) = min [P(ω1 | x), P(ω2 | x)] (Bayes decision). Cannot do better than this!
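Continuing that two-category example, a short sketch of the Bayes decision and its pointwise error (the posterior values are the assumed ones from the previous sketch):

```python
def bayes_decide(post_w1, post_w2):
    # Decide the category with the larger posterior; the probability of
    # error at this x is then the smaller posterior, and no rule can do better.
    decision = "w1" if post_w1 > post_w2 else "w2"
    p_error = min(post_w1, post_w2)
    return decision, p_error

print(bayes_decide(0.75, 0.25))   # ('w1', 0.25)
```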

  25. Bayesian Decision Theory – Continuous Features • Generalization of the preceding ideas • Use of more than one feature • Use of more than two states of nature • Allowing actions other than merely deciding on the state of nature • Introduce a loss function which is more general than the probability of error

  26. Allowing actions other than classification primarily allows the possibility of rejection • Refusing to make a decision in close or bad cases! • The loss function states how costly each action taken is

  27. Reject Option

  28. Let {ω1, ω2, …, ωc} be the set of c states of nature (or “categories”). Let {α1, α2, …, αa} be the set of possible actions. Let λ(αi | ωj) be the loss incurred for taking action αi when the state of nature is ωj.

  29. Conditional risk: R(αi | x) = Σ_{j=1..c} λ(αi | ωj) P(ωj | x). The overall risk R is the expected loss associated with the decision rule, obtained by averaging the conditional risk over all x. Minimizing R is achieved by minimizing the conditional risk R(αi | x) for i = 1, …, a at every x.
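A minimal sketch of the conditional-risk computation for an assumed loss matrix and assumed posteriors (the numbers, and the inclusion of a reject action, are only illustrative):

```python
import numpy as np

# loss[i, j] = lambda(alpha_i | w_j): cost of action i when the true class is j
# actions: 0 = "decide w1", 1 = "decide w2", 2 = "reject" (refuse to decide)
loss = np.array([[0.0, 4.0],
                 [1.0, 0.0],
                 [0.5, 0.5]])

posteriors = np.array([0.3, 0.7])       # P(w1|x), P(w2|x) at some observed x

cond_risk = loss @ posteriors           # R(alpha_i | x) = sum_j loss[i, j] * P(w_j | x)
best_action = int(np.argmin(cond_risk))
print(cond_risk, "-> take action", best_action)
```

With these numbers the conditional risks are [2.8, 0.3, 0.5], so "decide w2" is chosen; raising its loss entries enough would instead make the reject action the minimum-risk choice.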

  30. Select the action αi for which R(αi | x) is minimum. Then R is minimum, and R in this case is called the Bayes risk: the best performance that can be achieved!

  31. Two-category classification • α1: deciding ω1 • α2: deciding ω2 • λij = λ(αi | ωj): loss incurred for deciding ωi when the true state of nature is ωj • Conditional risk: R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x); R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)

  32. Our rule is the following: if R(α1 | x) < R(α2 | x), action α1 (“decide ω1”) is taken. This results in the equivalent rule: decide ω1 if (λ21 - λ11) P(x | ω1) P(ω1) > (λ12 - λ22) P(x | ω2) P(ω2), and decide ω2 otherwise.

  33. Likelihood ratio • The preceding rule is equivalent to the following rule: if P(x | ω1) / P(x | ω2) > [(λ12 - λ22) / (λ21 - λ11)] · [P(ω2) / P(ω1)], then take action α1 (decide ω1); otherwise take action α2 (decide ω2)
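A short sketch of this likelihood-ratio rule with assumed losses, priors, and Gaussian class-conditional densities (all values are illustrative):

```python
from scipy.stats import norm

# Assumed losses and priors
l11, l12, l21, l22 = 0.0, 2.0, 1.0, 0.0
p_w1, p_w2 = 0.5, 0.5

# Threshold: independent of the input pattern x
theta = (l12 - l22) / (l21 - l11) * (p_w2 / p_w1)     # = 2.0 here

def decide(x):
    ratio = norm.pdf(x, 2.0, 0.5) / norm.pdf(x, 1.5, 0.5)   # P(x|w1) / P(x|w2)
    return "decide w1" if ratio > theta else "decide w2"

print(decide(1.4), decide(2.2))    # decide w2, decide w1
```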

  34. Optimal decision property “If the likelihood ratio exceeds a threshold value independent of the input pattern x, we can take optimal actions”

  35. Exercise. Select the optimal decision where: • Ω = {ω1, ω2} • P(x | ω1) ~ N(2, 0.5) (normal distribution) • P(x | ω2) ~ N(1.5, 0.2) • P(ω1) = 2/3 • P(ω2) = 1/3
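A numerical sketch of this exercise under an assumed zero-one loss, treating the second parameter of each normal as its variance (if it is meant as a standard deviation, drop the square roots):

```python
import numpy as np
from scipy.stats import norm

p_w1, p_w2 = 2 / 3, 1 / 3
# N(mean, variance); scipy's `scale` argument is the standard deviation
pdf1 = lambda x: norm.pdf(x, loc=2.0, scale=np.sqrt(0.5))
pdf2 = lambda x: norm.pdf(x, loc=1.5, scale=np.sqrt(0.2))

xs = np.linspace(0.0, 4.0, 4001)
score1 = pdf1(xs) * p_w1                 # unnormalized posteriors;
score2 = pdf2(xs) * p_w2                 # the evidence P(x) cancels out
decide_w1 = score1 > score2

# Report where the minimum-error decision switches between classes
switches = xs[np.where(np.diff(decide_w1.astype(int)) != 0)]
print("decision changes near x =", np.round(switches, 2))
```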

  36. Chapter 2 (Part 2): Bayesian Decision Theory (Sections 2.3-2.5) Minimum-Error-Rate Classification Classifiers, Discriminant Functions and Decision Surfaces The Normal Density

  37. Minimum-Error-Rate Classification • Actions are decisions on classes. If action αi is taken and the true state of nature is ωj, then the decision is correct if i = j and in error if i ≠ j • Seek a decision rule that minimizes the probability of error, which is the error rate

  38. Introduction of the zero-one loss function: λ(αi | ωj) = 0 if i = j, and 1 if i ≠ j, for i, j = 1, …, c. Therefore, the conditional risk is: R(αi | x) = Σ_{j≠i} P(ωj | x) = 1 - P(ωi | x). “The risk corresponding to this loss function is the average probability of error.”

  39. Minimizing the risk requires maximizing P(ωi | x) (since R(αi | x) = 1 - P(ωi | x)) • For minimum error rate • Decide ωi if P(ωi | x) > P(ωj | x) for all j ≠ i
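A two-line check, with assumed posteriors for three classes, that under the zero-one loss picking the minimum-risk action is the same as picking the maximum-posterior class:

```python
import numpy as np

posteriors = np.array([0.2, 0.5, 0.3])       # assumed P(w_i | x) for three classes
zero_one = 1.0 - np.eye(3)                   # lambda(a_i | w_j) = 0 if i == j else 1
cond_risk = zero_one @ posteriors            # equals 1 - posteriors
print(np.argmin(cond_risk) == np.argmax(posteriors))   # True
```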

  40. Regions of decision and the zero-one loss function: if λ is the zero-one loss function, the likelihood-ratio threshold of the previous rule reduces to θ = P(ω2) / P(ω1), so we decide ω1 whenever P(x | ω1) / P(x | ω2) > P(ω2) / P(ω1).
