
Classification and Regression Trees Chapter 9



  1. Classification and Regression Trees Chapter 9

  2. Example: Predicting Car Purchase -- Decision Tree [Tree diagram: the root splits on Age (≤30, 30-40, >40); the ≤30 branch tests Student? (no → NO, yes → YES); the 30-40 branch predicts YES; the >40 branch tests Credit Rating? and leads to YES or NO leaves]

  3. Another Example: will the customer default on her/his loan? • Can we classify default based on balance and age?

  4. The Decision Tree • A series of nested tests • Each node represents a test on one attribute (for decisions) • Nominal attribute: # of branches = # of possible values • Numeric attribute: discretized • Each leaf is a class assignment (Default or Not default) [Tree diagram: the root tests Employed? (Yes → NOT DEFAULT); the No branch tests Balance? (≥50K → test Age?; <50K → DEFAULT); Age? >45 → NOT DEFAULT, ≤45 → DEFAULT]
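
To make the node/leaf distinction concrete, here is a minimal sketch of one way to encode such a tree in Python. The structures, branch names, and thresholds are illustrative choices (the deck does not prescribe any code); the attribute names follow the loan-default tree sketched on this slide.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

@dataclass
class Leaf:
    label: str                                 # class assignment, e.g. 'DEFAULT' or 'NOT DEFAULT'

@dataclass
class Node:
    attribute: str                             # attribute tested at this node
    children: Dict[str, Union["Node", "Leaf"]] = field(default_factory=dict)
    threshold: Optional[float] = None          # numeric attributes branch on 'above'/'below' the threshold

# The loan-default tree from this slide, encoded with the structures above
loan_tree = Node('Employed', {
    'Yes': Leaf('NOT DEFAULT'),
    'No': Node('Balance', {
        'above': Node('Age', {'above': Leaf('NOT DEFAULT'),
                              'below': Leaf('DEFAULT')}, threshold=45),
        'below': Leaf('DEFAULT'),
    }, threshold=50_000),
})
```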

  5. Using the decision tree for prediction • Predict for Mark: Age 40, Employment: None, Balance: 88k • Start at the root and traverse down until a leaf is reached [Same loan-default tree as on the previous slide]
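
A minimal traversal sketch using the Node/Leaf structures from the previous sketch (again illustrative rather than the deck's code; the predicted label depends on how the tree diagram is read):

```python
def predict(tree, record):
    """Start at the root and follow one branch per test until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        value = record[node.attribute]
        if node.threshold is None:             # nominal attribute: branch by value
            node = node.children[value]
        else:                                  # numeric attribute: compare to the threshold
            node = node.children['above' if value >= node.threshold else 'below']
    return node.label

# Mark: Age 40, Employment: None, Balance 88k
mark = {'Employed': 'No', 'Balance': 88_000, 'Age': 40}
print(predict(loan_tree, mark))                # 'DEFAULT' under the tree as encoded above
```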

  6. Trees Easily Converted To Rules (for coding) • Reading Rules from a Decision Tree • IF Income = Low AND Debts = Low THEN Non-Responder • IF Income = Low AND Debts = High THEN Responder • IF Income = High AND Gender = Male AND Children = Many THEN Responder • IF Income = High AND Gender = Male AND Children = Few THEN Non-Responder • IF Income = High AND Gender = Female THEN Non-Responder [Tree diagram: Income splits into Low (then Debts: Low/High) and High (then Gender: Male, further split on Children Many/Few; Female)]
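
As a sketch of what the "coding" step might look like, the five rules above can be written directly as a function (illustrative only; the attribute and class names follow the slide):

```python
def classify_prospect(income, debts, gender, children):
    """The responder rules read off the tree on this slide."""
    if income == 'Low':
        return 'Responder' if debts == 'High' else 'Non-Responder'
    if gender == 'Male':                       # income == 'High'
        return 'Responder' if children == 'Many' else 'Non-Responder'
    return 'Non-Responder'                     # income == 'High', gender == 'Female'

print(classify_prospect('High', 'Low', 'Male', 'Many'))   # 'Responder'
```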

  7. Decision Tree Construction • Basic algorithms are greedy • Tree constructed in a top-down recursive partitioning manner • All training examples are at the root initially • Attributes are assumed to be categorical (discretized if necessary) • Examples partitioned recursively based on selected attributes • Which attribute to select? – Using information gain • Some Popular Methods: ID3, C4.5, CART • Similar ideas; they differ in how the tree is grown, splitting criteria, pruning methods, and termination criteria
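
The shared top-down recursive scheme can be sketched roughly as follows (a simplified illustration, not the exact pseudocode of ID3, C4.5, or CART; `choose_attribute` stands in for whatever splitting criterion is used, e.g. the information gain introduced later in the deck):

```python
from collections import Counter, defaultdict

def build_tree(rows, attributes, target, choose_attribute):
    """Greedy top-down recursive partitioning: all rows start at the root,
    the best attribute is chosen greedily, and each resulting subset is
    partitioned recursively until a stopping condition is met."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attributes:        # pure node, or no attributes left
        return Counter(labels).most_common(1)[0][0]    # leaf: (majority) class label
    attr = choose_attribute(rows, attributes, target)  # greedy choice at this node
    groups = defaultdict(list)
    for r in rows:                                     # partition rows by the attribute's (discretized) value
        groups[r[attr]].append(r)
    remaining = [a for a in attributes if a != attr]
    return {attr: {value: build_tree(subset, remaining, target, choose_attribute)
                   for value, subset in groups.items()}}
```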

  8. Decision Tree Construction • Basic idea: partition training examples into purer and purer subgroups • Group A is “purer” than group B if more members in A are similar than members in B • Tree constructed by recursively partitioning instances [Diagram: the entire population is recursively partitioned by Age (≥45 vs <45) and Balance (≥50K vs <50K) into groups labeled Bad risk (Default) or Good risk (Not default)]

  9. Attribute Selection • Which attribute should be used for a split? • Choose the attribute that best partitions the relevant population into purer groups at each decision node • Many measures of impurity exist • (Optional) Gini index • Entropy and Information Gain - most common • Use the Information Gain measure to decide which attribute to use • How informative is the attribute in distinguishing among instances from different classes? • Developed by Shannon (1948) • Ideally, also try to minimize the number of splits (nodes) in the tree • More compact • Often more accurate

  10. Measurement of impurity: Entropy • Entropy(A) = -Σ_k p_k log2(p_k), summed over the m classes, where p_k = proportion of cases in set A that belong to class k • Entropy ranges between 0 (most pure) and log2(m) (equal representation of classes) • Maximum value of Entropy is 1 (= log2(2)) in the binary case

  11. Entropy (Cont’d) • Set S with p elements of class P and n elements of class N • Entropy of set S is E(S) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n)) • Here the number of classes is k = 2, corresponding to P and N • If p = credit worthy = 10 and n = non-credit worthy = 20: E(S) = -(10/(10+20))log2(10/(10+20)) - (20/(10+20))log2(20/(10+20)) = 0.918296
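
A small sketch that computes this measure (illustrative; the helper name `entropy` is mine, not the deck's):

```python
import math

def entropy(counts):
    """Entropy of a set given its per-class counts: -sum of p_k * log2(p_k)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([10, 20]))   # p = 10 credit worthy, n = 20 non-credit worthy -> ~0.918296
print(entropy([15, 15]))   # p = n = 15 -> 1.0, the maximum for two classes
```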

  12. Exercise on Entropy • Calculate the entropy for an entire population with 16 Plus (p) and 14 Green (n) • What is the value when p = n = 15? • For practice: an entire population with 12 Plus (p) and 18 Green (n)?

  13. Information Gain • Information gain: expected reduction in entropy • Suppose node N gets partitioned into m child nodes {c1, c2, …, cm}, given attribute A • Information Gain = Entropy Reduction = Entropy of N minus the weighted sum of the entropies of c1…cm, each child weighted by the fraction of N’s cases it receives • Select the attribute with the highest information gain • What does “highest information gain” mean intuitively?
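
A sketch of the gain computation, reusing the `entropy` helper from the previous sketch (the counts below are made up purely to show the arithmetic, not data from the deck):

```python
def information_gain(parent_counts, children_counts):
    """Entropy of the parent node minus the weighted sum of the child entropies,
    each child weighted by its share of the parent's cases."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted

# A 30-case node (10 vs 20) split into two 15-case children: one mixed, one pure
print(information_gain([10, 20], [[10, 5], [0, 15]]))   # ~0.459
```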

  14. Example

  15. Root Node: Try (Hair Length ≤ 5”) [Diagram: the split Hair Length <= 5”? with yes/no branches]

  16. Exercise: Try (Age ≤ 10) [Diagram: the splits Age <= 10? and Age <= 36?, each with yes/no branches] • Find: the entropies for all the splits and the information gain • Try later: Age ≤ 36

  17. Try (Weight ≤ 160 lb) [Diagram: the split Weight <= 160 lb? with yes/no branches]

  18. Root Node: Try (Weight ≤ 160 lb) [Diagram: the split Weight <= 160 lb? with yes/no branches]

  19. Building The Tree • Splitting on Weight reduces entropy the most • People with weight ≤ 160 lb are not perfectly classified, so recurse on that branch • There, a split on Hair Length ≤ 2” is the best option • Resulting tree: [root Weight <= 160 lb?; the yes branch splits on Hair Length <= 2”?; both splits have yes/no branches]
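
Putting the earlier sketches together, one hypothetical way to realize the greedy choice is to score every remaining attribute with `information_gain` and feed the result to `build_tree`; the column names in the comment follow this slide's example and assume the numeric attributes have been discretized at the chosen thresholds, per slide 7.

```python
from collections import Counter, defaultdict

def choose_attribute(rows, attributes, target):
    """Pick the attribute whose split yields the highest information gain."""
    def class_counts(subset):
        return list(Counter(r[target] for r in subset).values())
    def gain_of(attr):
        groups = defaultdict(list)
        for r in rows:
            groups[r[attr]].append(r)
        return information_gain(class_counts(rows),
                                [class_counts(g) for g in groups.values()])
    return max(attributes, key=gain_of)

# e.g. tree = build_tree(rows, ['Weight <= 160 lb', 'Hair Length <= 2"'],
#                        'Gender', choose_attribute)      # rows not shown here
```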

  20. Trees Easily Converted To Rules • IF (Weight > 160 lb) THEN male • ELSE IF (Hair Length ≤ 2”) THEN male • ELSE female [Same tree as the previous slide: root Weight <= 160 lb?, with the yes branch split on Hair Length <= 2”?]

  21. Stopping • Stopping criteria for splitting: • When additional splits obtain no information gain • When maximum purity is obtained • When all attributes have been used

  22. The Overfitting Problem [Diagram: a split such as “Wears blue?” (yes/no) that happens to separate the training data] • Many possible splitting rules that (near-)perfectly classify the data • May not generalize to future datasets • Particularly a problem with small datasets

  23. Overfitting • Too many branches (think about having only one data point at each leaf …) • End up fitting noise • Effect • Great fit for training data, poor accuracy for unseen samples

  24. Avoid Overfitting • Prepruning • Halt tree construction early • Do not split a node if this would result in the purity measure falling below a threshold • Difficult to choose the threshold • Postpruning • Remove branches from a “fully grown” tree • Get a sequence of progressively pruned trees • Use a different dataset to select the best pruned tree (e.g., via cross-validation)
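
As one concrete, library-specific illustration (not prescribed by the deck), scikit-learn exposes prepruning as growth limits and postpruning as cost-complexity pruning:

```python
from sklearn.tree import DecisionTreeClassifier

# Prepruning: halt construction early via thresholds on depth, leaf size,
# or minimum impurity decrease per split
prepruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=20,
                                   min_impurity_decrease=0.01)

# Postpruning: grow a full tree, then prune back with cost-complexity pruning;
# the candidate alphas give a sequence of progressively pruned trees, and the
# best one is chosen on separate data (e.g. by cross-validation)
full = DecisionTreeClassifier(random_state=0)
# path = full.cost_complexity_pruning_path(X_train, y_train)          # training data not shown
# pruned = DecisionTreeClassifier(ccp_alpha=chosen_alpha).fit(X_train, y_train)
```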

  25. Postpruning

  26. Strengths & Weaknesses of Decision Trees • Strengths • Easy to understand and interpret – the tree structure specifies the entire decision structure • Easy to implement • Running time is low even with large data sets • Very popular method • Weaknesses • Volatile: small changes in the underlying data result in very different models • Cannot capture interactions between variables • Can result in large error • How can we reduce volatility? • “Bagging”
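
Bagging trains many trees on bootstrap resamples of the data and aggregates their votes, which damps the volatility of any single tree. A minimal sketch with scikit-learn (one possible toolkit; the deck only names the technique):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Each of the 100 trees is trained on a bootstrap sample of the training data;
# predictions are made by majority vote across the trees
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100)
# bagged_trees.fit(X_train, y_train)      # training data not shown here
# bagged_trees.predict(X_new)
```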
