COMP5331

Presentation Transcript


  1. COMP5331 Classification Prepared by Raymond Wong The examples used in Decision Tree are borrowed from LW Chan’s notes Presented by Raymond Wong raywong@cse

  2. Classification • Suppose there is a person. [Figure: decision tree. The root splits on Child: Child = yes → 100% Yes, 0% No; Child = no → splits on Income: Income = high → 100% Yes, 0% No; Income = low → 0% Yes, 100% No]

  3. Classification • Suppose there is a person. The tree is built from the training set and then applied to the test set. [Figure: the same decision tree, annotated with "Training set" (the records used to build the tree) and "Test set" (the new person to classify)]

  4. Applications
  • Insurance: according to the attributes of customers, determine which customers will buy an insurance policy.
  • Marketing: according to the attributes of customers, determine which customers will buy a product such as a computer.
  • Bank Loan: according to the attributes of customers, determine which customers are "risky" and which are "safe".

  5. Applications
  • Network: according to the traffic patterns, determine whether the patterns are related to some "security attacks".
  • Software: according to the experience of programmers, determine which programmers can fix certain bugs.

  6. Same/Difference • Classification vs. Clustering: classification is supervised (class labels are given in the training set), while clustering is unsupervised (no labels are given).

  7. Classification Methods • Decision Tree • Bayesian Classifier • Nearest Neighbor Classifier

  8. Decision Trees • ID3 (Iterative Dichotomiser 3) • C4.5 • CART (Classification And Regression Trees)

  9. Entropy • Example 1 • Consider a random variable which has a uniform distribution over 32 outcomes. • To identify an outcome, we need a label that can take 32 different values. • Thus, 5-bit strings suffice as labels (since log2 32 = 5).

  10. Entropy • Entropy is used to measure how informative a node is. • If we are given a probability distribution P = (p1, p2, …, pn), then the information conveyed by this distribution, also called the entropy of P, is: I(P) = –(p1 log p1 + p2 log p2 + … + pn log pn) • All logarithms here are in base 2.

  11. Entropy • For example: • If P is (0.5, 0.5), then I(P) is 1. • If P is (0.67, 0.33), then I(P) is 0.92. • If P is (1, 0), then I(P) is 0. • Entropy measures the amount of information (or impurity). • The smaller the entropy, the more informative the distribution, i.e., the purer the node.
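
To make the definition concrete, here is a minimal Python sketch (added for illustration; it is not from the slides) that reproduces the three values above:

```python
import math

def entropy(probs):
    """I(P) = -(p1 log2 p1 + ... + pn log2 pn), with 0 log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0
print(entropy([2/3, 1/3]))   # ~0.918 (the slide rounds (0.67, 0.33) to 0.92)
print(entropy([1.0, 0.0]))   # 0.0 (Python may print -0.0)
```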

  12. Entropy
  Info(T) = –½ log ½ – ½ log ½ = 1
  For attribute Race:
  Info(T_black) = –¾ log ¾ – ¼ log ¼ = 0.8113
  Info(T_white) = –¾ log ¾ – ¼ log ¼ = 0.8113
  Info(Race, T) = ½ x Info(T_black) + ½ x Info(T_white) = 0.8113
  Gain(Race, T) = Info(T) – Info(Race, T) = 1 – 0.8113 = 0.1887

  13. Entropy
  Info(T) = –½ log ½ – ½ log ½ = 1
  For attribute Income:
  Info(T_high) = –1 log 1 – 0 log 0 = 0
  Info(T_low) = –1/3 log 1/3 – 2/3 log 2/3 = 0.9183
  Info(Income, T) = ¼ x Info(T_high) + ¾ x Info(T_low) = 0.6887
  Gain(Income, T) = Info(T) – Info(Income, T) = 1 – 0.6887 = 0.3113
  So far: Gain(Race, T) = 0.1887; Gain(Income, T) = 0.3113

  14. [Figure: the root splits on Child. Child = yes → records {2, 3, 4}, Insurance: 3 Yes, 0 No (100% Yes); Child = no → records {1, 5, 6, 7, 8}, Insurance: 1 Yes, 4 No (20% Yes, 80% No)]
  Info(T) = –½ log ½ – ½ log ½ = 1
  For attribute Child:
  Info(T_yes) = –1 log 1 – 0 log 0 = 0
  Info(T_no) = –1/5 log 1/5 – 4/5 log 4/5 = 0.7219
  Info(Child, T) = 3/8 x Info(T_yes) + 5/8 x Info(T_no) = 0.4512
  Gain(Child, T) = Info(T) – Info(Child, T) = 1 – 0.4512 = 0.5488
  Summary: Gain(Race, T) = 0.1887; Gain(Income, T) = 0.3113; Gain(Child, T) = 0.5488. Child gives the largest gain, so it is chosen as the split at the root.
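
The root-selection arithmetic can be checked with a short Python sketch (our own illustration, not the lecturer's code; the class counts for the Child split are read off the slide):

```python
import math

def entropy(counts):
    # Entropy of a node given its class counts, e.g., [yes_count, no_count].
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, partitions):
    # Gain(A, T) = Info(T) - sum over v of (|Tv| / |T|) * Info(Tv)
    total = sum(parent_counts)
    weighted = sum(sum(part) / total * entropy(part) for part in partitions)
    return entropy(parent_counts) - weighted

# Root T: 4 Yes, 4 No.  Child = yes -> (3 Yes, 0 No); Child = no -> (1 Yes, 4 No).
print(info_gain([4, 4], [[3, 0], [1, 4]]))   # ~0.5488, matching Gain(Child, T)
```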

  15. [Figure: the tree after splitting on Child, as on slide 14; we now choose a split for the Child = no partition {1, 5, 6, 7, 8}]
  Info(T) = –1/5 log 1/5 – 4/5 log 4/5 = 0.7219 (T is now the Child = no partition)
  For attribute Race:
  Info(T_black) = –¼ log ¼ – ¾ log ¾ = 0.8113
  Info(T_white) = –0 log 0 – 1 log 1 = 0
  Info(Race, T) = 4/5 x Info(T_black) + 1/5 x Info(T_white) = 0.6490
  Gain(Race, T) = Info(T) – Info(Race, T) = 0.7219 – 0.6490 = 0.0729

  16. [Figure: same tree; still splitting the Child = no partition {1, 5, 6, 7, 8}]
  Info(T) = –1/5 log 1/5 – 4/5 log 4/5 = 0.7219
  For attribute Income:
  Info(T_high) = –1 log 1 – 0 log 0 = 0
  Info(T_low) = –0 log 0 – 1 log 1 = 0
  Info(Income, T) = 1/5 x Info(T_high) + 4/5 x Info(T_low) = 0
  Gain(Income, T) = Info(T) – Info(Income, T) = 0.7219 – 0 = 0.7219
  Summary: Gain(Race, T) = 0.0729; Gain(Income, T) = 0.7219. Income gives the larger gain, so it is chosen for this node.

  17. [Figure: the finished decision tree. Root splits on Child: Child = yes → {2, 3, 4}, 100% Yes, 0% No; Child = no → splits on Income: Income = high → {1}, Insurance: 1 Yes, 0 No (100% Yes); Income = low → {5, 6, 7, 8}, Insurance: 0 Yes, 4 No (0% Yes, 100% No)]
  Gain(Race, T) = 0.0729; Gain(Income, T) = 0.7219 (as computed on the previous slide)

  18. Decision tree • Suppose there is a new person. To classify him, follow the tree from the root down to a leaf. [Figure: the final decision tree from slide 17]

  19. Decision tree • Termination criteria? • e.g., the height of the tree • e.g., the accuracy of each node [Figure: the final decision tree from slide 17]

  20. Decision Trees • ID3 • C4.5 • CART

  21. C4.5 • ID3 • Impurity measurement: Gain(A, T) = Info(T) – Info(A, T) • C4.5 • Impurity measurement: Gain(A, T) = (Info(T) – Info(A, T)) / SplitInfo(A), where SplitInfo(A) = –Σ_{v∈A} p(v) log p(v)

  22. Entropy
  Info(T) = –½ log ½ – ½ log ½ = 1
  For attribute Race:
  Info(T_black) = –¾ log ¾ – ¼ log ¼ = 0.8113
  Info(T_white) = –¾ log ¾ – ¼ log ¼ = 0.8113
  Info(Race, T) = ½ x Info(T_black) + ½ x Info(T_white) = 0.8113
  SplitInfo(Race) = –½ log ½ – ½ log ½ = 1
  Gain(Race, T) = (Info(T) – Info(Race, T)) / SplitInfo(Race) = (1 – 0.8113)/1 = 0.1887

  23. Entropy
  Info(T) = –½ log ½ – ½ log ½ = 1
  For attribute Income:
  Info(T_high) = –1 log 1 – 0 log 0 = 0
  Info(T_low) = –1/3 log 1/3 – 2/3 log 2/3 = 0.9183
  Info(Income, T) = ¼ x Info(T_high) + ¾ x Info(T_low) = 0.6887
  SplitInfo(Income) = –2/8 log 2/8 – 6/8 log 6/8 = 0.8113
  Gain(Income, T) = (Info(T) – Info(Income, T)) / SplitInfo(Income) = (1 – 0.6887)/0.8113 = 0.3837
  Summary: Gain(Race, T) = 0.1887; Gain(Income, T) = 0.3837; Gain(Child, T) = ? (left as an exercise)
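
Again as a check, a small Python sketch of the C4.5 measure (an illustration added here; c45_gain is a name we chose, the slide simply calls it Gain):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def c45_gain(parent_counts, partitions):
    # (Info(T) - Info(A, T)) / SplitInfo(A), where SplitInfo(A) is the
    # entropy of the partition sizes themselves.
    total = sum(parent_counts)
    weighted = sum(sum(part) / total * entropy(part) for part in partitions)
    split_info = entropy([sum(part) for part in partitions])
    return (entropy(parent_counts) - weighted) / split_info

# Income = high -> (2 Yes, 0 No); Income = low -> (2 Yes, 4 No).
print(c45_gain([4, 4], [[2, 0], [2, 4]]))   # ~0.3837, matching the slide
```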

  24. Decision Trees • ID3 • C4.5 • CART

  25. CART • Impurity measurement • Gini: I(P) = 1 – Σ_j pj²

  26. Gini
  Info(T) = 1 – (½)² – (½)² = ½
  For attribute Race:
  Info(T_black) = 1 – (¾)² – (¼)² = 0.375
  Info(T_white) = 1 – (¾)² – (¼)² = 0.375
  Info(Race, T) = ½ x Info(T_black) + ½ x Info(T_white) = 0.375
  Gain(Race, T) = Info(T) – Info(Race, T) = ½ – 0.375 = 0.125

  27. Gini
  Info(T) = 1 – (½)² – (½)² = ½
  For attribute Income:
  Info(T_high) = 1 – 1² – 0² = 0
  Info(T_low) = 1 – (1/3)² – (2/3)² = 0.444
  Info(Income, T) = 1/4 x Info(T_high) + 3/4 x Info(T_low) = 0.333
  Gain(Income, T) = Info(T) – Info(Income, T) = ½ – 0.333 = 0.167
  Summary: Gain(Race, T) = 0.125; Gain(Income, T) = 0.167; Gain(Child, T) = ? (left as an exercise)
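
The CART numbers can be reproduced the same way (a Python sketch for illustration; the counts for Income are taken from the slide):

```python
def gini(counts):
    # Gini index of a node: 1 - sum_j pj^2.
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def gini_gain(parent_counts, partitions):
    total = sum(parent_counts)
    weighted = sum(sum(part) / total * gini(part) for part in partitions)
    return gini(parent_counts) - weighted

# Income = high -> (2 Yes, 0 No); Income = low -> (2 Yes, 4 No).
print(gini_gain([4, 4], [[2, 0], [2, 4]]))   # ~0.167, matching the slide
```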

  28. Classification Methods • Decision Tree • Bayesian Classifier • Nearest Neighbor Classifier

  29. Bayesian Classifier • Naïve Bayes Classifier • Bayesian Belief Networks

  30. Naïve Bayes Classifier • Statistical Classifiers • Probabilities • Conditional probabilities

  31. Naïve Bayes Classifier • Conditional probability • A: a random variable • B: a random variable • P(A | B) = P(A, B) / P(B)

  32. Naïve Bayes Classifier • Bayes' rule • A: a random variable • B: a random variable • P(A | B) = P(B | A) x P(A) / P(B)

  33. Naïve Bayes Classifier • Independence assumption • The attributes are assumed conditionally independent given the class • e.g., P(X, Y, Z | A) = P(X | A) x P(Y | A) x P(Z | A)

  34–38. Naïve Bayes Classifier • Suppose there is a new person with Race = white, Income = high and Child = no. Will he buy insurance (Insurance = Yes)?
  From the training set: P(Yes) = ½, P(No) = ½
  For attribute Race: P(Race = black | Yes) = ¼, P(Race = white | Yes) = ¾, P(Race = black | No) = ¾, P(Race = white | No) = ¼
  For attribute Income: P(Income = high | Yes) = ½, P(Income = low | Yes) = ½, P(Income = high | No) = 0, P(Income = low | No) = 1
  For attribute Child: P(Child = yes | Yes) = ¾, P(Child = no | Yes) = ¼, P(Child = yes | No) = 0, P(Child = no | No) = 1
  P(Race = white, Income = high, Child = no | Yes) = P(Race = white | Yes) x P(Income = high | Yes) x P(Child = no | Yes) = ¾ x ½ x ¼ = 0.09375
  P(Race = white, Income = high, Child = no | No) = P(Race = white | No) x P(Income = high | No) x P(Child = no | No) = ¼ x 0 x 1 = 0

  39. P(Yes | Race = white, Income = high, Child = no)
  = P(Race = white, Income = high, Child = no | Yes) x P(Yes) / P(Race = white, Income = high, Child = no)
  = 0.09375 x 0.5 / P(Race = white, Income = high, Child = no)
  = 0.046875 / P(Race = white, Income = high, Child = no)

  40. P(No | Race = white, Income = high, Child = no)
  = P(Race = white, Income = high, Child = no | No) x P(No) / P(Race = white, Income = high, Child = no)
  = 0 x 0.5 / P(Race = white, Income = high, Child = no)
  = 0

  41. Since P(Yes | Race = white, Income = high, Child = no) > P(No | Race = white, Income = high, Child = no), we predict that the new person will buy insurance. (Note that the shared denominator P(Race = white, Income = high, Child = no) never has to be evaluated; comparing the numerators is enough.)
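
The whole naïve Bayes decision fits in a few lines of Python (a sketch added here, not the lecturer's code; the probability tables are copied from slides 34–38, and only the numerators of Bayes' rule are compared since the denominator is shared):

```python
prior = {"Yes": 0.5, "No": 0.5}
cond = {  # P(attribute = value | class), copied from the slides
    "Yes": {("Race", "white"): 3/4, ("Income", "high"): 1/2, ("Child", "no"): 1/4},
    "No":  {("Race", "white"): 1/4, ("Income", "high"): 0.0, ("Child", "no"): 1.0},
}
person = [("Race", "white"), ("Income", "high"), ("Child", "no")]

for cls in ("Yes", "No"):
    likelihood = 1.0
    for attr in person:
        likelihood *= cond[cls][attr]      # independence assumption
    print(cls, likelihood * prior[cls])    # numerator of Bayes' rule
# Yes 0.046875, No 0.0 -> predict Insurance = Yes
```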

  42. Bayesian Classifier • Naïve Bayes Classifier • Bayesian Belief Networks

  43. Bayesian Belief Network • Naïve Bayes Classifier • Makes the independence assumption • Bayesian Belief Network • Does not make the independence assumption; dependencies among attributes are modelled explicitly

  44. Bayesian Belief Network • Some attributes are dependent on other attributes. • e.g., doing exercise may reduce the probability of suffering from heart disease. [Figure: attributes with their domains: Exercise (Yes/No), Diet (Healthy/Unhealthy), Heart Disease (Yes/No), Heartburn (Yes/No), Blood Pressure (High/Low), Chest Pain (Yes/No)]

  45. Bayesian Belief Network [Figure: the network structure over the nodes Exercise (E), Diet (D), Heart Disease (HD), Heartburn (Hb), Blood Pressure (BP), Chest Pain (CP)]

  46. Bayesian Belief Network • Let X, Y, Z be three random variables. X is said to be conditionally independent of Y given Z if the following holds: P(X | Y, Z) = P(X | Z) • Lemma: if X is conditionally independent of Y given Z, then P(X, Y | Z) = P(X | Z) x P(Y | Z). (Why?)
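
A short proof of the lemma (added here as a sketch; it needs only the chain rule and the definition above):

```latex
\begin{align*}
P(X, Y \mid Z) &= P(X \mid Y, Z)\, P(Y \mid Z) && \text{(chain rule)} \\
               &= P(X \mid Z)\, P(Y \mid Z)    && \text{(since } P(X \mid Y, Z) = P(X \mid Z)\text{)}
\end{align*}
```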

  47. Bayesian Belief Network [Figure: the same network over E, D, HD, Hb, BP, CP]
  Let X, Y, Z be three random variables. X is conditionally independent of Y given Z if P(X | Y, Z) = P(X | Z).
  Property: a node is conditionally independent of its non-descendants if its parents are known.
  e.g., P(BP = High | HD = Yes, D = Healthy) = P(BP = High | HD = Yes): "BP = High" is conditionally independent of "D = Healthy" given "HD = Yes"
  e.g., P(BP = High | HD = Yes, CP = Yes) = P(BP = High | HD = Yes): "BP = High" is conditionally independent of "CP = Yes" given "HD = Yes"

  48. Bayesian Belief Network • Suppose there is a new person and I want to know whether he is likely to have heart disease. [Figure: the network with the attribute domains: Exercise (Yes/No), Diet (Healthy/Unhealthy), Heart Disease (Yes/No), Heartburn (Yes/No), Blood Pressure (High/Low), Chest Pain (Yes/No)]

  49. Bayesian Belief Network • Suppose there is a new person and I want to know whether he is likely to have heart disease.
  P(HD = Yes) = Σ_{x∈{Yes, No}} Σ_{y∈{Healthy, Unhealthy}} P(HD = Yes | E = x, D = y) x P(E = x, D = y)
  = Σ_{x∈{Yes, No}} Σ_{y∈{Healthy, Unhealthy}} P(HD = Yes | E = x, D = y) x P(E = x) x P(D = y)   (E and D are independent)
  = 0.25 x 0.7 x 0.25 + 0.45 x 0.7 x 0.75 + 0.55 x 0.3 x 0.25 + 0.75 x 0.3 x 0.75
  = 0.49
  P(HD = No) = 1 – P(HD = Yes) = 1 – 0.49 = 0.51
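
The marginalization can be replayed in Python (a sketch; P(E = Yes) = 0.7, P(D = Healthy) = 0.25 and the P(HD = Yes | E, D) entries are read off the slide's arithmetic, since the transcript does not show the network's probability tables themselves):

```python
p_e = {"Yes": 0.7, "No": 0.3}                # P(E)
p_d = {"Healthy": 0.25, "Unhealthy": 0.75}   # P(D)
p_hd_yes = {("Yes", "Healthy"): 0.25, ("Yes", "Unhealthy"): 0.45,
            ("No", "Healthy"): 0.55, ("No", "Unhealthy"): 0.75}  # P(HD=Yes | E, D)

# E and D are root nodes, hence independent: P(E = x, D = y) = P(E = x) P(D = y).
p_hd = sum(p_hd_yes[(e, d)] * p_e[e] * p_d[d] for e in p_e for d in p_d)
print(p_hd)       # 0.49 (up to float rounding)
print(1 - p_hd)   # 0.51 = P(HD = No)
```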

  50. Bayesian Belief Network • Now suppose we also observe that the person's blood pressure is high.
  P(BP = High) = Σ_{x∈{Yes, No}} P(BP = High | HD = x) x P(HD = x) = 0.85 x 0.49 + 0.2 x 0.51 = 0.5185
  P(HD = Yes | BP = High) = P(BP = High | HD = Yes) x P(HD = Yes) / P(BP = High) = 0.85 x 0.49 / 0.5185 = 0.8033
  P(HD = No | BP = High) = 1 – P(HD = Yes | BP = High) = 1 – 0.8033 = 0.1967
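
And the final posterior, continuing the same sketch (0.85 and 0.2 are the P(BP = High | HD) values implied by the slide):

```python
p_hd = {"Yes": 0.49, "No": 0.51}         # from the previous slide
p_bp_high = {"Yes": 0.85, "No": 0.20}    # P(BP = High | HD)

p_bp = sum(p_bp_high[hd] * p_hd[hd] for hd in p_hd)   # P(BP = High)
posterior = p_bp_high["Yes"] * p_hd["Yes"] / p_bp     # Bayes' rule
print(p_bp)           # 0.5185
print(posterior)      # ~0.8033 = P(HD = Yes | BP = High)
print(1 - posterior)  # ~0.1967 = P(HD = No | BP = High)
```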
