
240-650 Principles of Pattern Recognition


Presentation Transcript


1. 240-650 Principles of Pattern Recognition Montri Karnjanadecha montri@coe.psu.ac.th http://fivedots.coe.psu.ac.th/~montri

2. Chapter 2 Bayesian Decision Theory

3. Statistical Approach to Pattern Recognition

4. A Simple Example • Suppose that we are given two classes w1 and w2 • P(w1) = 0.7 • P(w2) = 0.3 • No measurement is given • Guessing • What shall we do to recognize a given input? • What is the best we can do statistically? Why?
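
A worked answer to the question above: with only the priors to go on, the best rule is to always decide the more probable class.

\text{Decide } \omega_1 \text{ for every input} \;\Rightarrow\; P(\text{error}) = P(\omega_2) = 0.3

No decision rule that ignores the input can achieve a lower error rate than this.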

5. A More Complicated Example • Suppose that we are given two classes • A single measurement x • P(w1|x) and P(w2|x) are given graphically

6. A Bayesian Example • Suppose that we are given two classes • A single measurement x • We are given p(x|w1) and p(x|w2) this time

7. A Bayesian Example – cont.

8. Bayesian Decision Theory • Bayes formula • In case of two categories • In English, it can be expressed as
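
The formulas the slide refers to, in their standard form (with the evidence expanded for the two-category case):

P(\omega_j \mid x) \;=\; \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}, \qquad
p(x) \;=\; \sum_{j=1}^{2} p(x \mid \omega_j)\, P(\omega_j)

In English: posterior = (likelihood × prior) / evidence.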

9. Bayesian Decision Theory – cont. • A posterior probability • P(wj|x) is the probability of the state of nature being wj given that feature value x has been measured • Likelihood • p(x|wj) is the likelihood of wj with respect to x • Evidence • The evidence factor p(x) can be viewed as a scaling factor that guarantees that the posterior probabilities sum to one.

10. Bayesian Decision Theory – cont. • Whenever we observe a particular x, the prob. of error is • The average prob. of error is given by
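
The error probabilities referred to above have the standard form:

P(\text{error} \mid x) =
\begin{cases}
P(\omega_1 \mid x) & \text{if we decide } \omega_2 \\
P(\omega_2 \mid x) & \text{if we decide } \omega_1
\end{cases}
\qquad
P(\text{error}) = \int_{-\infty}^{\infty} P(\text{error} \mid x)\, p(x)\, dx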

11. Bayesian Decision Theory – cont. • Bayes decision rule Decide w1 if P(w1|x) > P(w2|x); otherwise decide w2 • Prob. of error P(error|x)=min[P(w1|x), P(w2|x)] • If we ignore the “evidence”, the decision rule becomes: Decide w1 if P(x|w1) P(w1) > P(x|w2) P(w2) Otherwise decide w2
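
A minimal sketch of this decision rule in Python. The class-conditional densities, their parameters, and the test point are illustrative assumptions, not values from the slides; only the priors 0.7 and 0.3 come from the earlier example.

```python
# Two-category Bayes decision rule: decide w1 if p(x|w1)P(w1) > p(x|w2)P(w2).
# The evidence p(x) is a common scale factor, so it can be ignored.
from scipy.stats import norm

priors = {"w1": 0.7, "w2": 0.3}                  # P(w1), P(w2) from the earlier example
likelihoods = {"w1": norm(0.0, 1.0).pdf,         # p(x|w1): assumed N(0, 1)
               "w2": norm(2.0, 1.0).pdf}         # p(x|w2): assumed N(2, 1)

def decide(x):
    # Compare the two products p(x|wi) * P(wi) and pick the larger one
    if likelihoods["w1"](x) * priors["w1"] > likelihoods["w2"](x) * priors["w2"]:
        return "w1"
    return "w2"

print(decide(0.5))   # -> "w1" for a point near the assumed w1 mean
```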

12. Bayesian Decision Theory – continuous features • Feature space • In general, an input can be represented by a vector x, a point in a d-dimensional Euclidean space Rd • Loss function • The loss function states exactly how costly each action is and is used to convert a probability determination into a decision • Written as l(ai|wj)

13. Loss Function • Describes the loss incurred for taking action ai when the state of nature is wj

14. Conditional Risk • Suppose we observe a particular x • We take action ai • If the true state of nature is wj • By definition we will incur the loss l(ai|wj) • We can minimize our expected loss by selecting the action that minimizes the conditional risk, R(ai|x)

15. Bayesian Decision Theory • Suppose that there are c categories {w1, w2, ..., wc} • Conditional risk • Risk is the average expected loss
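
In standard form, the conditional risk for action \alpha_i and the overall risk are:

R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x), \qquad
R = \int R(\alpha(x) \mid x)\, p(x)\, dx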

16. Bayesian Decision Theory • Bayes decision rule • For a given x, select the action ai for which the conditional risk is minimum • The resulting minimum overall risk is called the Bayes risk, denoted as R*, which is the best performance that can be achieved

17. Two-Category Classification • Let lij = l(ai|wj) • Conditional risk • Fundamental decision rule: Decide w1 if R(a1|x) < R(a2|x)

18. Two-Category Classification – cont. • The decision rule can be written in several ways • Decide w1 if any one of the following is true • These rules are equivalent; the last is stated in terms of the likelihood ratio
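
In standard notation, the two conditional risks and the equivalent forms of the rule are:

R(\alpha_1 \mid x) = \lambda_{11} P(\omega_1 \mid x) + \lambda_{12} P(\omega_2 \mid x), \qquad
R(\alpha_2 \mid x) = \lambda_{21} P(\omega_1 \mid x) + \lambda_{22} P(\omega_2 \mid x)

Decide \omega_1 if any one of the following holds:

(\lambda_{21} - \lambda_{11})\, P(\omega_1 \mid x) > (\lambda_{12} - \lambda_{22})\, P(\omega_2 \mid x)

(\lambda_{21} - \lambda_{11})\, p(x \mid \omega_1)\, P(\omega_1) > (\lambda_{12} - \lambda_{22})\, p(x \mid \omega_2)\, P(\omega_2)

\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{(\lambda_{12} - \lambda_{22})\, P(\omega_2)}{(\lambda_{21} - \lambda_{11})\, P(\omega_1)}
\quad \text{(likelihood ratio form, assuming } \lambda_{21} > \lambda_{11}\text{)}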

19. Minimum-Error-Rate Classification • A special case of the Bayes decision rule with the following zero-one loss function • Assigns no loss to a correct decision • Assigns unit loss to any error • All errors are equally costly

20. Minimum-Error-Rate Classification • Conditional risk

21. Minimum-Error-Rate Classification • We should select the i that maximizes the posterior probability P(wi|x) • For minimum error rate: Decide wi if P(wi|x) > P(wj|x) for all j ≠ i
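
The zero-one loss, the conditional risk it produces, and the resulting rule, in standard form:

\lambda(\alpha_i \mid \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \ne j \end{cases}
\qquad \Rightarrow \qquad
R(\alpha_i \mid x) = \sum_{j \ne i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)

\text{Decide } \omega_i \text{ if } P(\omega_i \mid x) > P(\omega_j \mid x) \text{ for all } j \ne i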

22. Minimum-Error-Rate Classification

23. Classifiers, Discriminant Functions, and Decision Surfaces • There are many ways to represent pattern classifiers • One of the most useful is in terms of a set of discriminant functions gi(x), i=1,…,c • The classifier assigns a feature vector x to class wi if gi(x) > gj(x) for all j ≠ i

24. The Multicategory Classifier
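
A minimal sketch of such a multicategory classifier, using gi(x) = ln p(x|wi) + ln P(wi) as the discriminant. The Gaussian class models, priors, and the test point are illustrative assumptions, not values from the slides.

```python
# Multicategory classifier: assign x to the class whose discriminant g_i(x) is largest.
import numpy as np
from scipy.stats import multivariate_normal

# Assumed class models (means, covariances, priors) purely for illustration
classes = [
    {"name": "w1", "prior": 0.5, "mean": [0.0, 0.0], "cov": np.eye(2)},
    {"name": "w2", "prior": 0.3, "mean": [2.0, 2.0], "cov": np.eye(2)},
    {"name": "w3", "prior": 0.2, "mean": [0.0, 3.0], "cov": np.eye(2)},
]

def g(x, c):
    # Discriminant g_i(x) = ln p(x|w_i) + ln P(w_i)
    return multivariate_normal(c["mean"], c["cov"]).logpdf(x) + np.log(c["prior"])

def classify(x):
    # Pick the category with the largest discriminant value
    return max(classes, key=lambda c: g(x, c))["name"]

print(classify([0.2, 0.1]))   # -> "w1"
```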

25. Classifiers, Discriminant Functions, and Decision Surfaces • There are many equivalent discriminant functions • i.e., the classification results will be the same even though they are different functions • For example, if f is a monotonically increasing function, then gi(x) and f(gi(x)) yield the same classification

26. Classifiers, Discriminant Functions, and Decision Surfaces • Some discriminant functions are easier to understand or to compute
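
For minimum-error-rate classification, for example, any of these equivalent choices works (the logarithmic form is often the easiest to compute):

g_i(x) = P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{\sum_{j=1}^{c} p(x \mid \omega_j)\, P(\omega_j)}, \qquad
g_i(x) = p(x \mid \omega_i)\, P(\omega_i), \qquad
g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)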

27. Decision Regions • The effect of any decision rule is to divide the feature space into c decision regions, R1, ..., Rc • The regions are separated by decision boundaries, where ties occur among the largest discriminant functions

28. Decision Regions – cont.

29. Two-Category Case (Dichotomizer) • The two-category case is a special case • Instead of two discriminant functions, a single one can be used
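
A common choice for that single discriminant, in standard form:

g(x) \equiv g_1(x) - g_2(x): \quad \text{decide } \omega_1 \text{ if } g(x) > 0, \text{ otherwise decide } \omega_2

g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x)
\quad \text{or} \quad
g(x) = \ln \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}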

30. The Normal Density • Univariate Gaussian Density • Mean • Variance
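
The univariate Gaussian density, its mean, and its variance, in standard form:

p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2} \right], \qquad
\mu = E[x] = \int_{-\infty}^{\infty} x\, p(x)\, dx, \qquad
\sigma^{2} = E[(x - \mu)^{2}] = \int_{-\infty}^{\infty} (x - \mu)^{2}\, p(x)\, dx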

31. The Normal Density

32. The Normal Density • Central Limit Theorem • The aggregate effect of the sum of a large number of small, independent random disturbances will lead to a Gaussian distribution • Gaussian is often a good model for the actual probability distribution

33. The Multivariate Normal Density • Multivariate density (in d dimensions) • Abbreviation: p(x) ~ N(m, S)
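
The multivariate normal density in d dimensions and the usual abbreviation, in standard form:

p(x) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\!\left[ -\frac{1}{2} (x - \mu)^t \Sigma^{-1} (x - \mu) \right], \qquad
p(x) \sim N(\mu, \Sigma)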

34. The Multivariate Normal Density • Mean vector m • Covariance matrix S • The ijth component of S is the covariance of xi and xj
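
The mean vector, the covariance matrix, and its ijth component, in standard notation:

\mu = E[x], \qquad
\Sigma = E[(x - \mu)(x - \mu)^t], \qquad
\sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)]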

35. Statistical Independence • If xi and xj are statistically independent, then sij = 0 • The covariance matrix will become a diagonal matrix where all off-diagonal elements are zero

36. Whitening Transform • Aw = Φ Λ^(-1/2), where Φ is the matrix whose columns are the orthonormal eigenvectors of S and Λ is the diagonal matrix of the corresponding eigenvalues
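
A minimal numerical sketch of the whitening transform. The sample data and their covariance are illustrative assumptions; the point is that the whitened data have (approximately) an identity covariance matrix.

```python
# Whitening transform A_w = Phi * Lambda^(-1/2), built from the eigendecomposition of S.
import numpy as np

rng = np.random.default_rng(0)
x = rng.multivariate_normal(mean=[0, 0], cov=[[4.0, 1.5], [1.5, 1.0]], size=1000)

S = np.cov(x, rowvar=False)                 # estimated covariance matrix S
eigvals, Phi = np.linalg.eigh(S)            # eigenvalues (Lambda) and eigenvectors (columns of Phi)
A_w = Phi @ np.diag(eigvals ** -0.5)        # whitening matrix A_w = Phi Lambda^(-1/2)

y = (x - x.mean(axis=0)) @ A_w              # transformed (whitened) data
print(np.cov(y, rowvar=False).round(2))     # approximately the identity matrix
```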

37. Whitening Transform

38. Squared Mahalanobis Distance from x to m • r2 = (x - m)t S^-1 (x - m) • Contours of constant density are hyperellipsoids of constant Mahalanobis distance • Principal axes of the hyperellipsoids are given by the eigenvectors of S • Lengths of the axes are determined by the eigenvalues of S

39. Discriminant Functions for the Normal Density • Minimum distance classifier • If the densities are multivariate normal, i.e., if p(x|wi) ~ N(mi, Si), then we have:
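
Substituting the normal density into g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i) gives the standard form:

g_i(x) = -\frac{1}{2} (x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)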

40. Discriminant Functions for the Normal Density • Case 1: Si = s2I • Features are statistically independent and each feature has the same variance • where || . || denotes the Euclidean norm

41. Case 1: Si = s2I
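
When \Sigma_i = \sigma^{2} I, the terms of g_i(x) that do not depend on i can be dropped, leaving:

g_i(x) = -\frac{\| x - \mu_i \|^{2}}{2\sigma^{2}} + \ln P(\omega_i)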

42. Linear Discriminant Function • It is not necessary to compute distances • Expanding the form ||x - mi||2 yields • The term xtx is the same for all i and can be dropped • We have the following linear discriminant function

43. Linear Discriminant Function • gi(x) = wit x + wi0, where wi and wi0 are given below • wi0 is called the threshold or bias for the ith category
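
The standard expressions for the weight vector and the bias in this case are:

g_i(x) = w_i^t x + w_{i0}, \qquad
w_i = \frac{1}{\sigma^{2}} \mu_i, \qquad
w_{i0} = -\frac{1}{2\sigma^{2}} \mu_i^t \mu_i + \ln P(\omega_i)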

44. Linear Machine • A classifier that uses linear discriminant functions is called a linear machine • Its decision surfaces are pieces of hyperplanes defined by the linear equations gi(x) = gj(x) for the two categories with the highest posterior probabilities • For our case this equation can be written as wt(x - x0) = 0

45. Linear Machine • wt(x - x0) = 0, with w and x0 given below • If P(wi) = P(wj), then the second term of x0 vanishes and x0 lies midway between the means • The classifier is then called a minimum-distance classifier
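
In standard form, the hyperplane between \omega_i and \omega_j is:

w^t (x - x_0) = 0, \qquad
w = \mu_i - \mu_j, \qquad
x_0 = \frac{1}{2} (\mu_i + \mu_j) - \frac{\sigma^{2}}{\| \mu_i - \mu_j \|^{2}} \ln \frac{P(\omega_i)}{P(\omega_j)}\, (\mu_i - \mu_j)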

46. Priors change -> decision boundaries shift

47. Priors change -> decision boundaries shift

48. Priors change -> decision boundaries shift

49. Case 2: Si = S • Covariance matrices for all of the classes are identical but otherwise arbitrary • The cluster for the ith class is centered about mi • Discriminant function: gi(x) = -(1/2)(x - mi)t S^-1 (x - mi) + ln P(wi) • The ln P(wi) term can be ignored if the prior probabilities are the same for all classes

50. Case 2: Discriminant function • Expanding the quadratic form yields a linear discriminant gi(x) = wit x + wi0, with wi and wi0 given below
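
The standard expressions for this case are:

g_i(x) = w_i^t x + w_{i0}, \qquad
w_i = \Sigma^{-1} \mu_i, \qquad
w_{i0} = -\frac{1}{2} \mu_i^t \Sigma^{-1} \mu_i + \ln P(\omega_i)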
