
CS775: Advanced Pattern Recognition Slide set 1


Presentation Transcript


  1. CS775: Advanced Pattern Recognition Slide set 1 Daniel Barbará

  2. Chapter 1 Introduction to Pattern Recognition (Sections 1.1-1.6 DHS)

  3. Machine Perception • Build a machine that can recognize patterns: • Speech recognition • Fingerprint identification • OCR (Optical Character Recognition) • DNA sequence identification

  4. An Example • “Sorting incoming fish on a conveyor according to species using optical sensing” • The two species: sea bass and salmon

  5. Problem Analysis • Set up a camera and take some sample images to extract features • Length • Lightness • Width • Number and shape of fins • Position of the mouth, etc… • This is the set of all suggested features to explore for use in our classifier!

  6. Preprocessing • Use a segmentation operation to isolate fish from one another and from the background • Information from a single fish is sent to a feature extractor whose purpose is to reduce the data by measuring certain features • The features are passed to a classifier
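A minimal sketch of this sense / segment / extract / classify flow in Python. The helper names (segment, extract_features, classify), the thresholding logic, and the random image are illustrative assumptions, not part of the slides:

```python
import numpy as np

def segment(image):
    # Hypothetical segmentation: return one boolean mask per fish,
    # here by simply thresholding the grayscale image.
    mask = image > image.mean()
    return [mask]                       # pretend exactly one fish was found

def extract_features(image, mask):
    # Reduce the pixels of a single fish to a small feature vector
    # [lightness, width], as suggested in the slides.
    lightness = image[mask].mean()
    width = mask.any(axis=0).sum()      # number of columns the fish spans
    return np.array([lightness, width])

def classify(features, threshold=0.5):
    # Toy classifier: decide on lightness (feature 0) alone.
    return "salmon" if features[0] < threshold else "sea bass"

image = np.random.rand(64, 64)          # stand-in for a camera frame
for mask in segment(image):
    x = extract_features(image, mask)
    print(classify(x), x)
```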

  7. Classification • Select the length of the fish as a possible feature for discrimination

  8. The length is a poor feature alone! Select the lightness as a possible feature.

  9. Threshold decision boundary and cost relationship • Move our decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified as salmon!). This is the task of decision theory.
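A minimal sketch of choosing such a threshold, assuming synthetic lightness values and made-up, asymmetric misclassification costs (none of these numbers come from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
salmon   = rng.normal(3.0, 1.0, 500)    # synthetic: salmon tend to be darker
sea_bass = rng.normal(6.0, 1.0, 500)    # synthetic: sea bass tend to be lighter

COST_BASS_AS_SALMON = 5.0               # assumed: selling sea bass as salmon is costly
COST_SALMON_AS_BASS = 1.0

def cost(threshold):
    # Rule: lightness < threshold -> "salmon", otherwise "sea bass"
    bass_as_salmon = np.sum(sea_bass < threshold)
    salmon_as_bass = np.sum(salmon >= threshold)
    return (COST_BASS_AS_SALMON * bass_as_salmon
            + COST_SALMON_AS_BASS * salmon_as_bass)

thresholds = np.linspace(0.0, 10.0, 1001)
best = thresholds[np.argmin([cost(t) for t in thresholds])]
print(f"cost-minimizing threshold: {best:.2f}")
```

Because the two costs are asymmetric, the minimizing threshold sits at a smaller lightness than the value that would merely balance the two error counts, which is exactly the shift described above.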

  10. Adopt the lightness and add the width of the fish as a second feature: xT = [x1, x2], where x1 = lightness and x2 = width

  11. We might add other features that are not correlated with the ones we already have. Care should be taken not to reduce performance by adding such “noisy features” • Ideally, the best decision boundary is the one that provides optimal performance, such as in the following figure:

  12. However, our satisfaction is premature because the central aim of designing a classifier is to correctly classify novel input: the issue of generalization!

  13. Pattern Recognition Systems • Sensing • Use of a transducer (camera or microphone) • The PR system depends on the bandwidth, resolution, sensitivity, and distortion of the transducer • Segmentation and grouping • Patterns should be well separated and should not overlap

  14. Learning and Adaptation • Supervised learning • A teacher provides a category label or cost for each pattern in the training set • Unsupervised learning • The system forms clusters or “natural groupings” of the input patterns
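A brief sketch contrasting the two settings on synthetic two-class data, using scikit-learn (the data, the logistic-regression classifier, and k-means are illustrative choices, not prescribed by the slides):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),       # samples from class 0
               rng.normal(3, 1, (100, 2))])      # samples from class 1
y = np.array([0] * 100 + [1] * 100)              # labels supplied by a "teacher"

# Supervised learning: labels are given, fit a classifier to them
clf = LogisticRegression().fit(X, y)
print("supervised accuracy:", clf.score(X, y))

# Unsupervised learning: labels withheld, form "natural groupings"
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(clusters))
```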

  15. Conclusion • The reader may feel overwhelmed by the number, complexity, and magnitude of the sub-problems of pattern recognition • Many of these sub-problems can indeed be solved • Many fascinating unsolved problems still remain

  16. Chapter 2 DHS (Part 1): Bayesian Decision Theory (Sections 2.1-2.2) Introduction Bayesian Decision Theory – Continuous Features

  17. Basic problem • Produce a machine capable of assigning x to one of many classes (ω1, ω2, …, ωc) • Supervised learning: you are given a series of examples (xi, yi) where yi is one of the classes ωk

  18. Inference and Decision. Three distinct approaches to solve decision problems • Solve the inference problem by determining P(x | ωi) for each class, then apply Bayes' rule to get P(ωi | x). GENERATIVE MODELS • Solve the inference problem by determining P(ωi | x) directly. DISCRIMINATIVE MODELS • Find a function f(x), called a discriminant function, which maps each input directly to a class label. Approaches 1 and 2 are followed by the use of DECISION THEORY
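A compact sketch of the three approaches on one-dimensional synthetic data; the distributions, the logistic-regression model, and the hand-picked threshold are all assumptions made only for illustration:

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
x1 = rng.normal(2.0, 0.7, 300)           # training samples from class w1
x2 = rng.normal(4.0, 0.7, 300)           # training samples from class w2
prior1, prior2 = 0.5, 0.5
x_new = 3.2                              # a new input to classify

# 1) Generative: estimate P(x|wi), then apply Bayes to get P(wi|x)
lik1 = norm.pdf(x_new, x1.mean(), x1.std())
lik2 = norm.pdf(x_new, x2.mean(), x2.std())
post1 = lik1 * prior1 / (lik1 * prior1 + lik2 * prior2)
print("generative:     P(w1|x) =", round(post1, 3))

# 2) Discriminative: model P(wi|x) directly
X = np.concatenate([x1, x2]).reshape(-1, 1)
y = np.array([0] * len(x1) + [1] * len(x2))
clf = LogisticRegression().fit(X, y)
print("discriminative: P(w1|x) =", round(clf.predict_proba([[x_new]])[0, 0], 3))

# 3) Discriminant function: map x straight to a label, no probabilities
f = lambda v: 0 if v < 3.0 else 1        # hand-chosen threshold
print("discriminant:   label   =", f(x_new))
```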

  19. Introduction • The sea bass/salmon example • State of nature, prior • State of nature is a random variable • The catch of salmon and sea bass is equiprobable • P(ω1) = P(ω2) (uniform priors) • P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)

  20. Decision rule with only the prior information • Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2 • Use of the class-conditional information • P(x | ω1) and P(x | ω2) describe the difference in lightness between the populations of sea bass and salmon

  21. Bayes • Posterior, likelihood, evidence • P(ωj | x) = P(x | ωj) P(ωj) / P(x) • Where, in the case of two categories, the evidence is P(x) = P(x | ω1) P(ω1) + P(x | ω2) P(ω2) • Posterior = (Likelihood × Prior) / Evidence
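A tiny worked example of this formula with made-up likelihoods and priors (the numbers are assumptions, not values from the slides):

```python
# Hypothetical values at one observed lightness x
p_x_given_w1 = 0.30        # likelihood P(x | w1), e.g. sea bass
p_x_given_w2 = 0.10        # likelihood P(x | w2), e.g. salmon
p_w1, p_w2 = 0.5, 0.5      # priors

# Evidence P(x): the normalizer summed over the two categories
p_x = p_x_given_w1 * p_w1 + p_x_given_w2 * p_w2

post_w1 = p_x_given_w1 * p_w1 / p_x      # posterior = likelihood * prior / evidence
post_w2 = p_x_given_w2 * p_w2 / p_x
print(post_w1, post_w2, post_w1 + post_w2)   # 0.75 0.25 1.0
```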

  22. Minimum Misclassification Rate

  23. Decision given the posterior probabilities. x is an observation for which: if P(ω1 | x) > P(ω2 | x), true state of nature = ω1; if P(ω1 | x) < P(ω2 | x), true state of nature = ω2. Therefore: whenever we observe a particular x, the probability of error is: P(error | x) = P(ω1 | x) if we decide ω2; P(error | x) = P(ω2 | x) if we decide ω1

  24. Minimizing the probability of error • Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2. Therefore: P(error | x) = min [P(ω1 | x), P(ω2 | x)] (Bayes decision). Cannot do better than this!
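Continuing that two-category example, a short sketch of the Bayes decision and its pointwise error (the posterior values are the assumed ones from the previous sketch):

```python
def bayes_decide(post_w1, post_w2):
    # Decide the category with the larger posterior; the probability of
    # error at this x is then the smaller posterior, and no rule can do better.
    decision = "w1" if post_w1 > post_w2 else "w2"
    p_error = min(post_w1, post_w2)
    return decision, p_error

print(bayes_decide(0.75, 0.25))   # ('w1', 0.25)
```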

  25. Bayesian Decision Theory – Continuous Features • Generalization of the preceding ideas • Use of more than one feature • Use of more than two states of nature • Allowing actions other than merely deciding on the state of nature • Introduce a loss function which is more general than the probability of error

  26. Allowing actions other than classification primarily allows the possibility of rejection • Refusing to make a decision in close or bad cases! • The loss function states how costly each action taken is

  27. Reject Option

  28. Let {ω1, ω2, …, ωc} be the set of c states of nature (or “categories”). Let {α1, α2, …, αa} be the set of possible actions. Let λ(αi | ωj) be the loss incurred for taking action αi when the state of nature is ωj.

  29. Conditional risk: R(αi | x) = Σ_{j=1..c} λ(αi | ωj) P(ωj | x). The overall risk R is the expected loss associated with the decision rule, obtained by averaging the conditional risk over all x. Minimizing R is achieved by minimizing the conditional risk R(αi | x) for i = 1, …, a at every x.
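A minimal sketch of the conditional-risk computation for an assumed loss matrix and assumed posteriors (the numbers, and the inclusion of a reject action, are only illustrative):

```python
import numpy as np

# loss[i, j] = lambda(alpha_i | w_j): cost of action i when the true class is j
# actions: 0 = "decide w1", 1 = "decide w2", 2 = "reject" (refuse to decide)
loss = np.array([[0.0, 4.0],
                 [1.0, 0.0],
                 [0.5, 0.5]])

posteriors = np.array([0.3, 0.7])       # P(w1|x), P(w2|x) at some observed x

cond_risk = loss @ posteriors           # R(alpha_i | x) = sum_j loss[i, j] * P(w_j | x)
best_action = int(np.argmin(cond_risk))
print(cond_risk, "-> take action", best_action)
```

With these numbers the conditional risks are [2.8, 0.3, 0.5], so "decide w2" is chosen; raising its loss entries enough would instead make the reject action the minimum-risk choice.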

  30. Select the action αi for which R(αi | x) is minimum. Then R is minimum, and R in this case is called the Bayes risk: the best performance that can be achieved!

  31. Two-category classification • α1: deciding ω1 • α2: deciding ω2 • λij = λ(αi | ωj): loss incurred for deciding ωi when the true state of nature is ωj • Conditional risk: R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x); R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)

  32. Our rule is the following: if R(α1 | x) < R(α2 | x), action α1 (“decide ω1”) is taken. This results in the equivalent rule: decide ω1 if (λ21 - λ11) P(x | ω1) P(ω1) > (λ12 - λ22) P(x | ω2) P(ω2), and decide ω2 otherwise.

  33. Likelihood ratio • The preceding rule is equivalent to the following rule: if P(x | ω1) / P(x | ω2) > [(λ12 - λ22) / (λ21 - λ11)] · [P(ω2) / P(ω1)], then take action α1 (decide ω1); otherwise take action α2 (decide ω2)
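A short sketch of this likelihood-ratio rule with assumed losses, priors, and Gaussian class-conditional densities (all values are illustrative):

```python
from scipy.stats import norm

# Assumed losses and priors
l11, l12, l21, l22 = 0.0, 2.0, 1.0, 0.0
p_w1, p_w2 = 0.5, 0.5

# Threshold: independent of the input pattern x
theta = (l12 - l22) / (l21 - l11) * (p_w2 / p_w1)     # = 2.0 here

def decide(x):
    ratio = norm.pdf(x, 2.0, 0.5) / norm.pdf(x, 1.5, 0.5)   # P(x|w1) / P(x|w2)
    return "decide w1" if ratio > theta else "decide w2"

print(decide(1.4), decide(2.2))    # decide w2, decide w1
```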

  34. Optimal decision property “If the likelihood ratio exceeds a threshold value independent of the input pattern x, we can take optimal actions”

  35. Exercise. Select the optimal decision where: • Ω = {ω1, ω2} • P(x | ω1) ~ N(2, 0.5) (normal distribution) • P(x | ω2) ~ N(1.5, 0.2) • P(ω1) = 2/3 • P(ω2) = 1/3
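A numerical sketch of this exercise under an assumed zero-one loss, treating the second parameter of each normal as its variance (if it is meant as a standard deviation, drop the square roots):

```python
import numpy as np
from scipy.stats import norm

p_w1, p_w2 = 2 / 3, 1 / 3
# N(mean, variance); scipy's `scale` argument is the standard deviation
pdf1 = lambda x: norm.pdf(x, loc=2.0, scale=np.sqrt(0.5))
pdf2 = lambda x: norm.pdf(x, loc=1.5, scale=np.sqrt(0.2))

xs = np.linspace(0.0, 4.0, 4001)
score1 = pdf1(xs) * p_w1                 # unnormalized posteriors;
score2 = pdf2(xs) * p_w2                 # the evidence P(x) cancels out
decide_w1 = score1 > score2

# Report where the minimum-error decision switches between classes
switches = xs[np.where(np.diff(decide_w1.astype(int)) != 0)]
print("decision changes near x =", np.round(switches, 2))
```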

  36. Chapter 2 (Part 2): Bayesian Decision Theory (Sections 2.3-2.5) Minimum-Error-Rate Classification Classifiers, Discriminant Functions and Decision Surfaces The Normal Density

  37. Minimum-Error-Rate Classification • Actions are decisions on classes. If action αi is taken and the true state of nature is ωj, then the decision is correct if i = j and in error if i ≠ j • Seek a decision rule that minimizes the probability of error, which is the error rate

  38. Introduction of the zero-one loss function: λ(αi | ωj) = 0 if i = j, and 1 if i ≠ j, for i, j = 1, …, c. Therefore, the conditional risk is: R(αi | x) = Σ_{j≠i} P(ωj | x) = 1 - P(ωi | x). “The risk corresponding to this loss function is the average probability of error.”

  39. Minimizing the risk requires maximizing P(ωi | x) (since R(αi | x) = 1 - P(ωi | x)) • For minimum error rate • Decide ωi if P(ωi | x) > P(ωj | x) for all j ≠ i
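A two-line check, with assumed posteriors for three classes, that under the zero-one loss picking the minimum-risk action is the same as picking the maximum-posterior class:

```python
import numpy as np

posteriors = np.array([0.2, 0.5, 0.3])       # assumed P(w_i | x) for three classes
zero_one = 1.0 - np.eye(3)                   # lambda(a_i | w_j) = 0 if i == j else 1
cond_risk = zero_one @ posteriors            # equals 1 - posteriors
print(np.argmin(cond_risk) == np.argmax(posteriors))   # True
```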

  40. Regions of decision and the zero-one loss function: if λ is the zero-one loss function, the likelihood-ratio threshold of the previous rule reduces to θ = P(ω2) / P(ω1), so we decide ω1 whenever P(x | ω1) / P(x | ω2) > P(ω2) / P(ω1).
