
Discrimination and Classification


Presentation Transcript


  1. Discrimination and Classification

  2. Discrimination Situation: We have two or more populations π1, π2, etc. (possibly p-variate normal). The populations are known (or we have data from each population). We have data for a new case (population unknown) and we want to identify the population to which the new case belongs.

  3. The Basic Problem Suppose that the data from a new case, x1, … , xp, has joint density function either: π1: g(x1, … , xp) or π2: h(x1, … , xp). We want to make the decision D1: Classify the case in π1 (g is the correct distribution) or D2: Classify the case in π2 (h is the correct distribution).

  4. The Two Types of Errors • Misclassifying the case in π1 when it actually lies in π2. • Let P[1|2] = P[D1|π2] = probability of this type of error. • Misclassifying the case in π2 when it actually lies in π1. • Let P[2|1] = P[D2|π1] = probability of this type of error. This is similar to Type I and Type II errors in hypothesis testing.

  5. Note: A discrimination scheme is defined by splitting p-dimensional space into two regions. • C1 = the region where we make the decision D1 (the decision to classify the case in π1). • C2 = the region where we make the decision D2 (the decision to classify the case in π2).

  6. There can be several approaches to determining the regions C1 and C2, all concerned with taking into account the probabilities of misclassification P[2|1] and P[1|2]. • Set up the regions C1 and C2 so that one of the probabilities of misclassification, P[2|1] say, is at some low acceptable value α. Accept the resulting level of the other probability of misclassification, P[1|2] = β.

  7. Set up the regions C1 and C2 so that the total probability of misclassification: P[Misclassification] = P[1] P[2|1] + P[2] P[1|2] is minimized, where P[1] = P[the case belongs to π1] and P[2] = P[the case belongs to π2].

  8. Set up the regions C1 and C2 so that the total expected cost of misclassification: E[Cost of Misclassification] = ECM = c2|1 P[1] P[2|1] + c1|2 P[2] P[1|2] is minimized, where P[1] = P[the case belongs to π1], P[2] = P[the case belongs to π2], c2|1 = the cost of misclassifying the case in π2 when the case belongs to π1, and c1|2 = the cost of misclassifying the case in π1 when the case belongs to π2.
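As an illustration of how the pieces combine, here is a minimal Python sketch of the ECM computation; the priors, costs, and error rates used are hypothetical values chosen for the example, not figures from the slides.

```python
# Expected Cost of Misclassification for a two-population rule:
# ECM = c_2|1 * P[1] * P[2|1]  +  c_1|2 * P[2] * P[1|2]

def ecm(p1, p2, p21, p12, c21=1.0, c12=1.0):
    """p1, p2   : prior probabilities of populations pi_1 and pi_2
       p21, p12 : misclassification probabilities P[2|1] and P[1|2]
       c21, c12 : costs of the two kinds of misclassification"""
    return c21 * p1 * p21 + c12 * p2 * p12

# Hypothetical numbers: equal priors, P[2|1] = 0.10, P[1|2] = 0.25,
# and misclassifying a pi_2 case assumed twice as costly.
print(ecm(p1=0.5, p2=0.5, p21=0.10, p12=0.25, c21=1.0, c12=2.0))  # 0.30
```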

  9. The Optimal Classification Rule: The Neyman-Pearson Lemma Suppose that the data x1, … , xp has joint density function f(x1, … , xp; θ) where θ is either θ1 or θ2. Let g(x1, … , xp) = f(x1, … , xp; θ1) and h(x1, … , xp) = f(x1, … , xp; θ2). We want to make the decision D1: θ = θ1 (g is the correct distribution) against D2: θ = θ2 (h is the correct distribution).

  10. Then the optimal regions (minimizing ECM, the expected cost of misclassification) for making the decisions D1 and D2 respectively are C1 = { (x1, …, xp) : g(x1, …, xp) / h(x1, …, xp) ≥ (c1|2 P[2]) / (c2|1 P[1]) } and C2 = { (x1, …, xp) : g(x1, …, xp) / h(x1, …, xp) < (c1|2 P[2]) / (c2|1 P[1]) }, where ECM = c2|1 P[1] P[2|1] + c1|2 P[2] P[1|2].
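As a sketch, the minimum-ECM rule can be applied directly whenever the two densities g and h can be evaluated. The helper name and the univariate normal densities, priors, and costs below are illustrative assumptions, not part of the original example.

```python
from scipy.stats import norm

def classify_min_ecm(x, g, h, p1, p2, c21=1.0, c12=1.0):
    """Minimum-ECM rule: decide D1 (classify into pi_1) when
       g(x)/h(x) >= (c_1|2 * P[2]) / (c_2|1 * P[1]), else decide D2."""
    threshold = (c12 * p2) / (c21 * p1)
    return "D1" if g(x) / h(x) >= threshold else "D2"

# Illustrative univariate densities: g ~ N(0, 1), h ~ N(2, 1), equal priors and costs.
g = lambda x: norm.pdf(x, loc=0.0, scale=1.0)
h = lambda x: norm.pdf(x, loc=2.0, scale=1.0)
print(classify_min_ecm(0.3, g, h, p1=0.5, p2=0.5))   # -> "D1"
print(classify_min_ecm(1.8, g, h, p1=0.5, p2=0.5))   # -> "D2"
```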

  11. ECM = E[Cost of Misclassification] = c2|1 P[1] P[2|1] + c1|2 P[2] P[1|2] Proof: P[2|1] = ∫C2 g(x1, …, xp) dx = 1 − ∫C1 g(x1, …, xp) dx and P[1|2] = ∫C1 h(x1, …, xp) dx, so ECM = c2|1 P[1] [1 − ∫C1 g(x1, …, xp) dx] + c1|2 P[2] ∫C1 h(x1, …, xp) dx.

  12. Therefore ECM = c2|1 P[1] + ∫C1 [ c1|2 P[2] h(x1, …, xp) − c2|1 P[1] g(x1, …, xp) ] dx. Thus ECM is minimized if C1 contains all of the points (x1, …, xp) such that the integrand c1|2 P[2] h(x1, …, xp) − c2|1 P[1] g(x1, …, xp) is negative, i.e. all points where g(x1, …, xp) / h(x1, …, xp) ≥ (c1|2 P[2]) / (c2|1 P[1]).

  13. The Neyman-Pearson Lemma (another proof) Suppose that the data x1, … , xn has joint density function f(x1, … , xn; θ) where θ is either θ1 or θ2. Let g(x1, … , xn) = f(x1, … , xn; θ1) and h(x1, … , xn) = f(x1, … , xn; θ2). We want to test H0: θ = θ1 (g is the correct distribution) against HA: θ = θ2 (h is the correct distribution).

  14. The Neyman-Pearson Lemma states that the Uniformly Most Powerful (UMP) test of size α is to reject H0 if h(x1, … , xn) / g(x1, … , xn) ≥ kα and accept H0 if h(x1, … , xn) / g(x1, … , xn) < kα, where kα is chosen so that the test is of size α.
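For a concrete (hypothetical) case, suppose x1, … , xn are iid N(θ, σ²) with σ known and θ2 > θ1. The likelihood ratio h/g is then an increasing function of the sample mean, so "reject when h/g ≥ kα" is the same as "reject when x̄ exceeds a cutoff", and that cutoff can be computed from a normal quantile. The numbers in this sketch are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# Testing H0: theta = theta1 against HA: theta = theta2 (> theta1) for
# x1, ..., xn iid N(theta, sigma^2).  The likelihood ratio h/g increases
# with xbar, so the size-alpha likelihood-ratio test rejects H0 exactly
# when xbar >= c_alpha, with c_alpha chosen so P[reject | H0] = alpha.
theta1, theta2, sigma, n, alpha = 0.0, 1.0, 2.0, 25, 0.05

c_alpha = theta1 + norm.ppf(1 - alpha) * sigma / np.sqrt(n)   # size-alpha cutoff on xbar
power   = 1 - norm.cdf((c_alpha - theta2) / (sigma / np.sqrt(n)))

print(c_alpha, power)   # cutoff for xbar, and the power of the test at theta2
```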

  15. Proof: Let C be the critical region of any test of size α. Let C* = { (x1, …, xn) : h(x1, …, xn) ≥ kα g(x1, …, xn) } be the critical region of the likelihood-ratio test above. We want to show that ∫C* h dx ≥ ∫C h dx, i.e. that the likelihood-ratio test is at least as powerful. Note: both tests have size α, so ∫C* g dx = ∫C g dx = α.

  16. hence ∫C*−C g dx + ∫C*∩C g dx = α and ∫C−C* g dx + ∫C*∩C g dx = α. Thus ∫C*−C g dx = ∫C−C* g dx.

  17. On C* − C we have h ≥ kα g, so ∫C*−C h dx ≥ kα ∫C*−C g dx, and on C − C* we have h < kα g, so ∫C−C* h dx ≤ kα ∫C−C* g dx.

  18. Thus ∫C*−C h dx ≥ kα ∫C*−C g dx = kα ∫C−C* g dx ≥ ∫C−C* h dx, and therefore ∫C* h dx ≥ ∫C h dx when we add the common quantity ∫C*∩C h dx to both sides. Q.E.D.

  19. Fisher's Linear Discriminant Function. Suppose that x1, … , xp is data from a p-variate Normal distribution with mean vector either μ1 (population π1) or μ2 (population π2). The covariance matrix Σ is the same for both populations π1 and π2.

  20. The Neyman-Pearson Lemma states that we should classify into populations π1 and π2 using the likelihood ratio λ = g(x1, … , xp) / h(x1, … , xp) = exp{ −½ (x − μ1)′ Σ⁻¹ (x − μ1) + ½ (x − μ2)′ Σ⁻¹ (x − μ2) }. That is, make the decision D1: population is π1 if λ > k = (c1|2 P[2]) / (c2|1 P[1]).

  21. or, taking logarithms, −½ (x − μ1)′ Σ⁻¹ (x − μ1) + ½ (x − μ2)′ Σ⁻¹ (x − μ2) ≥ ln k, or (μ1 − μ2)′ Σ⁻¹ x − ½ (μ1 − μ2)′ Σ⁻¹ (μ1 + μ2) ≥ ln k, and hence (μ1 − μ2)′ Σ⁻¹ x ≥ ln k + ½ (μ1 − μ2)′ Σ⁻¹ (μ1 + μ2).

  22. Finally we make the decision D1: population is π1 if a′x ≥ b, where a = Σ⁻¹ (μ1 − μ2) and b = ln k + ½ (μ1 − μ2)′ Σ⁻¹ (μ1 + μ2). Note: k = 1 and ln k = 0 if c1|2 = c2|1 and P[1] = P[2].
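A minimal numpy sketch of this rule, using hypothetical values for μ1, μ2, and Σ and taking equal costs and priors (so ln k = 0):

```python
import numpy as np

# Hypothetical parameters for two bivariate normal populations with a
# common covariance matrix Sigma.
mu1   = np.array([2.0, 3.0])
mu2   = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

ln_k = 0.0                                    # c_1|2 = c_2|1 and P[1] = P[2]
a = np.linalg.solve(Sigma, mu1 - mu2)         # a = Sigma^{-1} (mu1 - mu2)
b = ln_k + 0.5 * (mu1 - mu2) @ np.linalg.solve(Sigma, mu1 + mu2)

def decide(x):
    """Decision D1 (pi_1) if a'x >= b, otherwise D2 (pi_2)."""
    return "D1: pi_1" if a @ x >= b else "D2: pi_2"

print(decide(np.array([1.8, 2.5])))   # near mu1 -> D1
print(decide(np.array([0.2, 0.9])))   # near mu2 -> D2
```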

  23. The function ℓ(x) = a′x = (μ1 − μ2)′ Σ⁻¹ x is called Fisher's linear discriminant function.

  24. In the case where the populations are unknown but estimated from data, Fisher's linear discriminant function is obtained by replacing μ1, μ2 and Σ with the sample mean vectors x̄1, x̄2 and the pooled sample covariance matrix S_pooled: ℓ̂(x) = (x̄1 − x̄2)′ S_pooled⁻¹ x.
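A sketch of this sample (plug-in) version, using hypothetical training samples from each population and the pooled covariance estimate; the data are simulated here only so the code is self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: n1 cases from pi_1 and n2 cases from pi_2.
X1 = rng.multivariate_normal([2.0, 3.0], [[2.0, 0.5], [0.5, 1.0]], size=30)
X2 = rng.multivariate_normal([0.0, 1.0], [[2.0, 0.5], [0.5, 1.0]], size=40)

xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = np.cov(X1, rowvar=False)                 # sample covariance, sample 1
S2 = np.cov(X2, rowvar=False)                 # sample covariance, sample 2
n1, n2 = len(X1), len(X2)
S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

a_hat = np.linalg.solve(S_pooled, xbar1 - xbar2)   # estimated coefficient vector
b_hat = 0.5 * (xbar1 - xbar2) @ np.linalg.solve(S_pooled, xbar1 + xbar2)

def classify(x):
    """Estimated Fisher rule: pi_1 if a_hat'x >= b_hat (equal costs and priors)."""
    return 1 if a_hat @ x >= b_hat else 2

print(classify(np.array([2.1, 2.8])), classify(np.array([-0.3, 0.7])))
```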

  25. Example 2 Annual financial data are collected for firms approximately 2 years prior to bankruptcy and for financially sound firms at about the same point in time. The data on the four variables • x1 = CF/TD = (cash flow)/(total debt), • x2 = NI/TA = (net income)/(total assets), • x3 = CA/CL = (current assets)/(current liabilities), and • x4 = CA/NS = (current assets)/(net sales) are given in the following table.

  26. The data are given in the following table:

  27. Examples using SPSS and Minitab

  28. Discriminant Analysis Review

  29. The Optimal Classification Rule Suppose that the data x1, … , xp has joint density function f(x1, … , xp; θ) where θ is either θ1 or θ2. Let g(x1, … , xp) = f(x1, … , xp; θ1) and h(x1, … , xp) = f(x1, … , xp; θ2). We want to make the decision D1: θ = θ1 (g is the correct distribution) against D2: θ = θ2 (h is the correct distribution).

  30. Then the optimal regions (minimizing ECM, the expected cost of misclassification) for making the decisions D1 and D2 respectively are C1 = { (x1, …, xp) : g(x1, …, xp) / h(x1, …, xp) ≥ (c1|2 P[2]) / (c2|1 P[1]) } and C2 = { (x1, …, xp) : g(x1, …, xp) / h(x1, …, xp) < (c1|2 P[2]) / (c2|1 P[1]) }, where ECM = c2|1 P[1] P[2|1] + c1|2 P[2] P[1|2].

  31. Fisher's Linear Discriminant Function. Suppose that x1, … , xp is data from a p-variate Normal distribution with mean vector either μ1 (population π1) or μ2 (population π2). The covariance matrix Σ is the same for both populations π1 and π2.

  32. The optimal rule states that we should classify into populations π1 and π2 using the likelihood ratio λ = g(x1, … , xp) / h(x1, … , xp) = exp{ −½ (x − μ1)′ Σ⁻¹ (x − μ1) + ½ (x − μ2)′ Σ⁻¹ (x − μ2) }. That is, make the decision D1: population is π1 if λ > k = (c1|2 P[2]) / (c2|1 P[1]).

  33. or, taking logarithms, −½ (x − μ1)′ Σ⁻¹ (x − μ1) + ½ (x − μ2)′ Σ⁻¹ (x − μ2) ≥ ln k, or (μ1 − μ2)′ Σ⁻¹ x − ½ (μ1 − μ2)′ Σ⁻¹ (μ1 + μ2) ≥ ln k, and hence (μ1 − μ2)′ Σ⁻¹ x ≥ ln k + ½ (μ1 − μ2)′ Σ⁻¹ (μ1 + μ2).

  34. Finally we make the decision D1: population is π1 if a′x ≥ b, where a′x is Fisher's linear discriminant function, a = Σ⁻¹ (μ1 − μ2) and b = ln k + ½ (μ1 − μ2)′ Σ⁻¹ (μ1 + μ2). Note: k = 1 and ln k = 0 if c1|2 = c2|1 and P[1] = P[2].

  35. Graphical illustration of Fisher’s linear discriminant function

  36. Note: k = 1 and ln k = 0 if c1|2 = c2|1 and P[1] = P[2]. Thus, with b = ½ (μ1 − μ2)′ Σ⁻¹ (μ1 + μ2), the rule a′x ≥ b is equivalent to a′x ≥ ½ a′(μ1 + μ2), or ℓ(x) ≥ ½ [ ℓ(μ1) + ℓ(μ2) ].

  37. Thus we make the decision D1: population is π1 if ℓ(x) = a′x ≥ ½ [ ℓ(μ1) + ℓ(μ2) ], i.e. if the discriminant score of x is at least the midpoint of the two population scores.

  38. Thus we make the decision D2: population is π2 if ℓ(x) = a′x < ½ [ ℓ(μ1) + ℓ(μ2) ].

  39. Thus, in general, we make the decision D1: population is π1 if a′x ≥ b, where a = Σ⁻¹ (μ1 − μ2) and b = ln k + ½ (μ1 − μ2)′ Σ⁻¹ (μ1 + μ2). Note: k = 1 and ln k = 0 if c1|2 = c2|1 and P[1] = P[2].

  40. Discrimination of p-variate Normal distributions (unequal covariance matrices) Suppose that x1, … , xp is data from a p-variate Normal distribution with mean vector either μ1 or μ2 and covariance matrix Σ1 or Σ2 respectively.

  41. The optimal rule states that we should classify into populations π1 and π2 using the likelihood ratio λ = g(x1, … , xp) / h(x1, … , xp) = (|Σ2| / |Σ1|)^½ exp{ −½ (x − μ1)′ Σ1⁻¹ (x − μ1) + ½ (x − μ2)′ Σ2⁻¹ (x − μ2) }. That is, make the decision D1: population is π1 if λ ≥ k = (c1|2 P[2]) / (c2|1 P[1]).

  42. or, taking logarithms, −½ x′ (Σ1⁻¹ − Σ2⁻¹) x + (μ1′ Σ1⁻¹ − μ2′ Σ2⁻¹) x ≥ c, where c = ln k + ½ ln(|Σ1| / |Σ2|) + ½ (μ1′ Σ1⁻¹ μ1 − μ2′ Σ2⁻¹ μ2) and k = (c1|2 P[2]) / (c2|1 P[1]).

  43. Summarizing, we make the decision to classify in population π1 if −½ x′ A x + b′x ≥ c, where A = Σ1⁻¹ − Σ2⁻¹, b = Σ1⁻¹ μ1 − Σ2⁻¹ μ2 and c = ln k + ½ ln(|Σ1| / |Σ2|) + ½ (μ1′ Σ1⁻¹ μ1 − μ2′ Σ2⁻¹ μ2).
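A minimal sketch of this quadratic rule, again with hypothetical parameter values and equal costs and priors (ln k = 0):

```python
import numpy as np

# Hypothetical parameters: two bivariate normal populations with unequal covariances.
mu1, mu2 = np.array([2.0, 3.0]), np.array([0.0, 1.0])
S1 = np.array([[2.0, 0.5], [0.5, 1.0]])
S2 = np.array([[1.0, 0.0], [0.0, 3.0]])

S1_inv, S2_inv = np.linalg.inv(S1), np.linalg.inv(S2)
ln_k = 0.0                                    # equal costs and priors

A = S1_inv - S2_inv                           # quadratic coefficient matrix
b = S1_inv @ mu1 - S2_inv @ mu2               # linear coefficient vector
c = (ln_k
     + 0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2))
     + 0.5 * (mu1 @ S1_inv @ mu1 - mu2 @ S2_inv @ mu2))

def decide(x):
    """Classify into pi_1 if the quadratic score -x'Ax/2 + b'x meets the cutoff c."""
    return "pi_1" if -0.5 * x @ A @ x + b @ x >= c else "pi_2"

print(decide(np.array([1.8, 2.5])), decide(np.array([0.1, 0.5])))
```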

  44. Discrimination of p-variate Normal distributions (unequal covariance matrices)

  45. Discrimination amongst k populations

  46. We want to determine if an observation vector x = (x1, …, xp)′ comes from one of the k populations π1, π2, …, πk. For this purpose we need to partition p-dimensional space into k regions C1, C2, …, Ck.
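As a sketch of how the two-population rule extends: with equal misclassification costs, the minimum-ECM partition assigns x to the population i that maximizes P[i]·f_i(x), and the regions C1, …, Ck are exactly the sets where each product is largest. The priors, means, and covariances below are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical parameters for k = 3 bivariate normal populations.
means  = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
covs   = [np.eye(2), np.eye(2), np.eye(2)]
priors = [0.5, 0.3, 0.2]

def classify_k(x):
    """Assign x to the population i maximizing P[i] * f_i(x)
       (equal-cost minimum-ECM rule for k populations)."""
    scores = [p * multivariate_normal(mean=m, cov=S).pdf(x)
              for p, m, S in zip(priors, means, covs)]
    return int(np.argmax(scores)) + 1          # population label 1..k

print(classify_k(np.array([2.6, 0.4])))   # -> 2 (closest to the second mean)
```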
