
Decision Tree



Presentation Transcript


  1. Decision Tree Rong Jin

  2. Determine Mileage Per Gallon (MPG)

  3. A Decision Tree for Determining MPG. [Figure: a decision tree over the attributes cylinders, displacement, horsepower, weight, acceleration, modelyear, and maker; for example, the record (cylinders = 4, displacement = low, horsepower = low, weight = low, acceleration = high, modelyear = 75to78, maker = asia) is classified as mpg = good.] From slides of Andrew Moore

  4. Decision Tree Learning • Extremely popular method • Credit risk assessment • Medical diagnosis • Market analysis • Good at dealing with symbolic features • Easy to comprehend, compared to logistic regression models and support vector machines

  5. Representational Power • Q: Can trees represent arbitrary Boolean expressions? • Q: How many Boolean functions are there over N binary attributes?
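
The slide leaves these as exercises; for the record: yes, a decision tree can represent any Boolean expression (in the worst case with one leaf per truth assignment), and there are 2^(2^N) distinct Boolean functions over N binary attributes, since each of the 2^N input combinations can independently map to either output. A quick brute-force sanity check of that count (a sketch of my own, not from the slides):

```python
from itertools import product

def count_boolean_functions(n):
    """Count Boolean functions over n binary attributes by enumeration.

    A function assigns 0 or 1 to each of the 2**n input rows,
    so there are 2**(2**n) of them.
    """
    rows = list(product([0, 1], repeat=n))                 # all 2**n inputs
    return sum(1 for _ in product([0, 1], repeat=len(rows)))

for n in (1, 2, 3):
    assert count_boolean_functions(n) == 2 ** (2 ** n)
    print(n, count_boolean_functions(n))                   # 4, 16, 256
```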

  6. How to Generate Trees from Training Data

  7. A Simple Idea • Enumerate all possible trees • Check how well each tree matches the training data • Pick the one that works best. Problems? There are far too many trees, and how do we determine the quality of a decision tree?

  8. Solution: A Greedy Approach • Choose the most informative feature • Split the data set • Recurse until each data item is classified correctly (a code sketch of the full recursion follows slide 16)

  9. How to Determine the Best Feature? • Which feature is more informative about MPG? • What metric should be used? Mutual information! From Andrew Moore's slides

  10. Mutual Information for Selecting Best Features From Andrew Moore’s slides
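
The transcript drops the formulas from this slide, but the standard definitions are: entropy H(Y) = -Σ_y p(y) log2 p(y), and the mutual information (information gain) of feature X is I(X; Y) = H(Y) - H(Y | X). A minimal sketch in Python (the function names are mine, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum_y p(y) * log2 p(y), estimated from label frequencies."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """I(X; Y) = H(Y) - H(Y | X): how much splitting on the feature
    reduces the entropy of the class label."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    h_given_x = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - h_given_x
```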

  11. Another Example: Playing Tennis

  12. Example: Playing Tennis. Two candidate root splits, each starting from (9+, 5-): Humidity: High (3+, 4-), Normal (6+, 1-); Wind: Weak (6+, 2-), Strong (3+, 3-)
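
Assuming the pairing above (which matches Tom Mitchell's standard tennis data, which this slide appears to reproduce), we can check which split is more informative. A self-contained computation from the branch counts alone:

```python
import math

def entropy_from_counts(pos, neg):
    """Binary entropy of a node holding pos positive / neg negative records."""
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c)

def gain(branches, root=(9, 5)):
    """Information gain of a split, given per-branch (pos, neg) counts."""
    total = sum(p + n for p, n in branches)
    remainder = sum((p + n) / total * entropy_from_counts(p, n)
                    for p, n in branches)
    return entropy_from_counts(*root) - remainder

print(gain([(3, 4), (6, 1)]))   # Humidity: ~0.151 bits
print(gain([(6, 2), (3, 3)]))   # Wind:     ~0.048 bits -> Humidity is better
```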

  13. Prediction for Nodes. What is the prediction for each node? From Andrew Moore's slides

  14. Prediction for Nodes

  15. Recursively Growing Trees. Take the original dataset and partition it according to the value of the attribute we split on: cylinders = 4, cylinders = 5, cylinders = 6, cylinders = 8. From Andrew Moore's slides

  16. Recursively Growing Trees. For each partition (cylinders = 4, cylinders = 5, cylinders = 6, cylinders = 8), build a subtree from those records. From Andrew Moore's slides
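
Slides 8, 15, and 16 together describe the whole recursion. A compact ID3-style sketch (the data representation, a list of (features_dict, label) pairs, is my own choice, and only base cases one and two from slide 19 are implemented):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(records, attributes):
    """records: list of (features_dict, label) pairs."""
    labels = [y for _, y in records]
    if len(set(labels)) == 1:                  # base case one: pure node
        return labels[0]
    if not attributes:                         # base case two: majority leaf
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):                            # information gain of one split
        parts = {}
        for x, y in records:
            parts.setdefault(x[attr], []).append(y)
        rem = sum(len(g) / len(labels) * entropy(g) for g in parts.values())
        return entropy(labels) - rem

    best = max(attributes, key=gain)           # most informative feature
    rest = [a for a in attributes if a != best]
    return {best: {v: build_tree([(x, y) for x, y in records if x[best] == v],
                                 rest)
                   for v in {x[best] for x, _ in records}}}

# Toy usage on made-up MPG-style records:
data = [({"cylinders": "4", "maker": "asia"}, "good"),
        ({"cylinders": "8", "maker": "america"}, "bad"),
        ({"cylinders": "4", "maker": "america"}, "good")]
print(build_tree(data, ["cylinders", "maker"]))
# {'cylinders': {'4': 'good', '8': 'bad'}} (key order may vary)
```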

  17. Recursively Growing Trees: A Two-Level Tree

  18. When Should We Stop Growing Trees? Should we split this node?

  19. Base Cases • Base Case One: If all records in the current data subset have the same output, then don't recurse • Base Case Two: If all records have exactly the same set of input attributes, then don't recurse

  20. Base Cases: An Idea • Base Case One: If all records in the current data subset have the same output, then don't recurse • Base Case Two: If all records have exactly the same set of input attributes, then don't recurse • Proposed Base Case Three: If all attributes have zero information gain, then don't recurse. Is this a good idea?
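
The intended answer is no. The classic counterexample is XOR: at the root, each attribute alone has zero information gain, yet a two-level tree classifies every record perfectly, so stopping at zero gain gives up too early. A standalone check (my own construction):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(xs, ys):
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return entropy(ys) - sum(len(g) / len(ys) * entropy(g)
                             for g in groups.values())

a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
y = [ai ^ bi for ai, bi in zip(a, b)]   # y = a XOR b
print(gain(a, y), gain(b, y))           # 0.0 0.0: proposed base case three
                                        # would refuse to split at all
```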

  21. Old Topic: Overfitting

  22. Pruning: What should we do?

  23. Pruning Decision Trees • Option 1: stop growing the tree early • Option 2: build the full decision tree as before, but when you can grow it no more, start to prune: • Reduced error pruning • Rule post-pruning

  24. Reduced Error Pruning • Split the data into a training set and a validation set • Build a full decision tree over the training set • Repeatedly remove the node whose removal maximally increases validation-set accuracy
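
A sketch of slide 24's loop, over the nested-dict trees built earlier. For brevity this version prunes bottom-up and keeps any replacement that does not hurt validation accuracy, a common simplification of the slide's "remove the single best node per pass" greedy loop; all names here are mine:

```python
from collections import Counter

def classify(tree, x, default=None):
    """Leaves are labels; internal nodes look like {attr: {value: subtree}}."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches.get(x[attr], default)
    return tree

def accuracy(tree, records):
    return sum(classify(tree, x) == y for x, y in records) / len(records)

def reduced_error_prune(tree, train, full_tree, validation):
    """Try replacing each subtree with the majority label of the training
    records that reach it; keep the change unless validation accuracy drops.
    (The root itself is never replaced in this simplified sketch.)"""
    if not isinstance(tree, dict):
        return
    attr, branches = next(iter(tree.items()))
    for value in list(branches):
        subset = [(x, y) for x, y in train if x[attr] == value]
        reduced_error_prune(branches[value], subset, full_tree, validation)
        if isinstance(branches[value], dict) and subset:
            before = accuracy(full_tree, validation)
            kept = branches[value]
            branches[value] = Counter(y for _, y in subset).most_common(1)[0][0]
            if accuracy(full_tree, validation) < before:
                branches[value] = kept            # pruning hurt: revert

# usage: reduced_error_prune(tree, train_records, tree, validation_records)
```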

  25. Original Decision Tree

  26. Pruned Decision Tree

  27. Reduced Error Pruning

  28. Rule Post-Pruning • Convert the tree into rules, one per root-to-leaf path • Prune each rule by removing any precondition whose removal improves the rule's estimated accuracy • Sort the final rules by their estimated accuracy. This is the most widely used method (e.g., in C4.5). Other methods: statistical significance tests (chi-square)
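
The first step, turning the tree into rules, is mechanical: each root-to-leaf path becomes one rule whose preconditions are the attribute tests along the path. A sketch over the same nested-dict representation used above, on a made-up toy tree (the precondition-pruning and accuracy-estimation steps are omitted):

```python
def tree_to_rules(tree, preconditions=()):
    """Yield (preconditions, label) pairs, one per root-to-leaf path."""
    if not isinstance(tree, dict):               # leaf: rule is complete
        yield list(preconditions), tree
        return
    attr, branches = next(iter(tree.items()))
    for value, child in branches.items():
        yield from tree_to_rules(child, preconditions + ((attr, value),))

toy = {"cylinders": {"4": "good",
                     "8": "bad",
                     "6": {"maker": {"asia": "good", "america": "bad"}}}}
for conds, label in tree_to_rules(toy):
    print(" AND ".join(f"{a} = {v}" for a, v in conds), "->", label)
# cylinders = 4 -> good
# cylinders = 8 -> bad
# cylinders = 6 AND maker = asia -> good
# cylinders = 6 AND maker = america -> bad
```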

  29. Real-Valued Inputs • How should we deal with real-valued inputs?

  30. Information Gain • x: a real-valued input • t: a split (threshold) value • Find the split value t such that the mutual information I(x, y : t) between the thresholded x and the class label y is maximized
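
A sketch of that search, using the standard trick of testing only thresholds halfway between consecutive distinct sorted values of x (the function names are mine):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(xs, ys):
    """Return (t, gain) maximizing the information gain of the test x < t."""
    pairs = sorted(zip(xs, ys))
    base = entropy(ys)
    best_t, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                              # no threshold fits here
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        g = base - (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(ys)
        if g > best_gain:
            best_t, best_gain = t, g
    return best_t, best_gain

print(best_threshold([1.0, 2.0, 3.0, 4.0], ["bad", "bad", "good", "good"]))
# (2.5, 1.0): the class-separating threshold achieves full gain
```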

  31. Conclusions • Decision trees are the single most popular data mining tool • Easy to understand • Easy to implement • Easy to use • Computationally cheap • It’s possible to get in trouble with overfitting • They do classification: predict a categorical output from categorical and/or real inputs

  32. Software • The most widely used decision tree implementation is C4.5 (or its successor C5.0) • Source code and a tutorial: http://www2.cs.uregina.ca/~hamilton/courses/831/notes/ml/dtrees/c4.5/tutorial.html

  33. The End
