
Decision Trees




  1. Decision Trees

  2. What is a decision tree? • Input = assignment of values for given attributes • Discrete (often Boolean) or continuous • Output = predicted value • Discrete - classification • Continuous - regression • Structure: • Internal node - tests one attribute • Leaf node - output value (or linear function of attributes)
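
As a rough illustration of this structure (not from the slides), here is a minimal Python sketch of the two node kinds and of prediction by walking the tree; the class and field names (Leaf, InternalNode, branches) are hypothetical:

    # Minimal sketch of the tree structure described above (hypothetical names).
    from dataclasses import dataclass
    from typing import Dict, Union

    @dataclass
    class Leaf:
        value: object                      # output: class label or regression value

    @dataclass
    class InternalNode:
        attribute: str                     # attribute tested at this node
        branches: Dict[object, "Node"]     # one child subtree per attribute value

    Node = Union[Leaf, InternalNode]

    def predict(node: Node, example: Dict[str, object]) -> object:
        """Follow the branch matching the example's value until a leaf is reached."""
        while isinstance(node, InternalNode):
            node = node.branches[example[node.attribute]]
        return node.value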

  3. Example Attributes • Alternate - Boolean • Bar - Boolean • Fri/Sat - Boolean • Hungry - Boolean • Patrons - {None, Some, Full} • Price - {$, $$, $$$} • Raining - Boolean • Reservation - Boolean • Type - {French, Italian, Thai, burger} • WaitEstimate - {0-10min, 10-30, 30-60, >60}

  4. Decision Tree for WillWait

  5. Decision Tree Properties • Propositional: attribute-value pairs • Not useful for relationships between objects • Universal: Space of possible decision trees includes every Boolean function • Not efficient for all functions (e.g., parity, majority) • Good for disjunctive functions

  6. Decision Tree Learning • From a training set, construct a tree

  7. Methods for Constructing Decision Trees • One path for each example in training set • Not robust, little predictive value • Rather, look for a “small” tree • Applying Ockham’s Razor • ID3: simple non-backtracking search through the space of decision trees

  8. Algorithm
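
The slide shows the textbook's recursive tree-learning algorithm as a figure. Below is a rough Python sketch of that style of learner, reusing the Leaf/InternalNode classes sketched earlier; plurality_value and the "label" key are assumed names, and choose_attribute is the greedy selection described on the following slides (sketched after slide 12):

    # Rough sketch of the recursive decision-tree learner (ID3-style).
    from collections import Counter

    def plurality_value(examples):
        """Most common output label among the examples."""
        return Counter(e["label"] for e in examples).most_common(1)[0][0]

    def learn_tree(examples, attributes, parent_examples):
        if not examples:
            return Leaf(plurality_value(parent_examples))
        labels = {e["label"] for e in examples}
        if len(labels) == 1:                      # all examples have the same label
            return Leaf(labels.pop())
        if not attributes:                        # no attributes left to test
            return Leaf(plurality_value(examples))
        best = choose_attribute(attributes, examples)   # greedy choice, slides 9-12
        node = InternalNode(best, {})
        for value in {e[best] for e in examples}:
            subset = [e for e in examples if e[best] == value]
            node.branches[value] = learn_tree(subset,
                                              [a for a in attributes if a != best],
                                              examples)
        return node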

  9. Choose-Attribute Greedy algorithm - use attribute that gives the greatest immediate information gain

  10. Information/Entropy • Information provided by knowing an answer: • Possible answers vi with probability P(vi) • I({P(vi)}) = Σi -P(vi) log2 P(vi) • I({0.5, 0.5}) = -0.5*(-1) - 0.5*(-1) = 1 • I({0.01, 0.99}) = -0.01*(-6.6) - 0.99*(-0.014) = 0.08 • Estimate probability from set of examples • Example: • 6 yes, 6 no, estimate P(yes) = P(no) = 0.5 • 1 bit required • After testing an attribute, take weighted sum of information required for subsets of the examples
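
A minimal Python sketch of this entropy formula, reproducing the two worked examples above:

    # Entropy of a probability distribution, matching the formula on the slide.
    import math

    def entropy(probs):
        """I({P(vi)}); terms with zero probability contribute nothing."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    entropy([0.5, 0.5])     # -> 1.0 bit
    entropy([0.01, 0.99])   # -> about 0.08 bits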

  11. After Testing “Type” 2/12*I(1/2,1/2) + 2/12*I(1/2,1/2) + 4/12*I(1/2,1/2) + 4/12*I(1/2,1/2) = 1

  12. After Testing “Patrons” 2/12*I(0,1) + 4/12*I(1,0) + 6/12*I(2/6,4/6) = 0.46
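
The two remainders above can be reproduced with a small helper, assuming the 12-example WillWait set (6 yes, 6 no); choose_attribute then implements the greedy rule from slide 9. The (positives, negatives) encoding and the "label" key are assumptions of this sketch:

    def remainder(subsets):
        """Weighted entropy left after a split; subsets = (positives, negatives)
        per attribute value."""
        total = sum(p + n for p, n in subsets)
        return sum((p + n) / total * entropy([p / (p + n), n / (p + n)])
                   for p, n in subsets if p + n > 0)

    remainder([(1, 1), (1, 1), (2, 2), (2, 2)])   # Type    -> 1.0 bit, zero gain
    remainder([(0, 2), (4, 0), (2, 4)])           # Patrons -> about 0.46 bits

    def choose_attribute(attributes, examples):
        """Greedy choice (slide 9): smallest remainder = largest information gain.
        Assumes Boolean labels stored under the key "label"."""
        def rem(attr):
            values = {e[attr] for e in examples}
            return remainder([(sum(1 for e in examples if e[attr] == v and e["label"]),
                               sum(1 for e in examples if e[attr] == v and not e["label"]))
                              for v in values])
        return min(attributes, key=rem)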

  13. Repeat adding tests • Note: induced tree not the same as “true” function • Best we can do given the examples

  14. Overfitting • Algorithm can find meaningless regularity • E.g., use date & time to predict roll of die • Approaches to fixing: • Stop the tree from growing too far • Allow the tree to overfit, then prune

  15. χ² Pruning • Is a split irrelevant? • Information gain close to zero - how close? • Assume there is no underlying pattern (the null hypothesis) • If a statistical test shows less than a 5% probability of seeing the observed deviation under the null hypothesis, conclude the attribute is relevant and keep the split • Pruning provides noise tolerance
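
A sketch of this significance test under the same assumptions as before (Boolean labels, SciPy available): compute the deviation of the observed per-branch counts from the counts expected under the null hypothesis, then compare against the χ² distribution with (number of branches - 1) degrees of freedom:

    # Sketch of the chi-squared relevance test for a candidate split.
    from scipy.stats import chi2

    def split_is_relevant(subsets, alpha=0.05):
        """subsets: (positives, negatives) observed in each branch of the split.
        Keep the split only if the deviation from the null hypothesis
        (labels independent of this attribute) is significant at level alpha."""
        p = sum(pk for pk, nk in subsets)
        n = sum(nk for pk, nk in subsets)
        delta = 0.0
        for pk, nk in subsets:
            expected_p = p * (pk + nk) / (p + n)   # expected positives if irrelevant
            expected_n = n * (pk + nk) / (p + n)   # expected negatives if irrelevant
            if expected_p > 0:
                delta += (pk - expected_p) ** 2 / expected_p
            if expected_n > 0:
                delta += (nk - expected_n) ** 2 / expected_n
        p_value = chi2.sf(delta, df=len(subsets) - 1)
        return p_value < alpha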

  16. Rule Post-Pruning • Convert the tree into a set of rules (one per root-to-leaf path) • Prune (generalize) each rule by dropping preconditions when doing so improves accuracy • Rules may now overlap • Consider rules in order of accuracy when classifying
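
One hypothetical way to enumerate the per-path rules, reusing the node classes sketched earlier (each rule is a list of attribute tests plus the leaf's conclusion):

    def tree_to_rules(node, conditions=()):
        """One rule per root-to-leaf path: ([(attribute, value), ...], conclusion)."""
        if isinstance(node, Leaf):
            return [(list(conditions), node.value)]
        rules = []
        for value, child in node.branches.items():
            rules.extend(tree_to_rules(child, conditions + ((node.attribute, value),)))
        return rules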

  17. Continuous-valued attributes • Split based on some threshold • X < 97 vs X >= 97 • Many possible split points
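
A sketch of threshold selection under the earlier assumptions (Boolean labels, the remainder helper from slide 12): candidate thresholds are midpoints between consecutive observed values, and the one leaving the least weighted entropy wins:

    def best_threshold(examples, attr):
        """Pick the binary split point X < t vs X >= t with the smallest remainder."""
        values = sorted({e[attr] for e in examples})
        candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
        def rem(t):
            below = [e["label"] for e in examples if e[attr] < t]
            above = [e["label"] for e in examples if e[attr] >= t]
            return remainder([(sum(below), len(below) - sum(below)),
                              (sum(above), len(above) - sum(above))])
        return min(candidates, key=rem)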

  18. Multi-valued attributes • Information gain of an attribute with many values may be huge (e.g., Date) • Rather than absolute information gain, use the ratio of gain to SplitInformation, where SplitInformation(S, A) = -Σi |Si|/|S| log2(|Si|/|S|) over the subsets Si produced by attribute A, and GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)
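
A small sketch of the gain-ratio computation, reusing entropy and remainder from the earlier sketches; the numeric comment is for the Patrons split of slide 12 and is an illustration, not from the slides:

    def split_information(subsets):
        """Entropy of the attribute's own value distribution (how finely it splits S)."""
        total = sum(p + n for p, n in subsets)
        return entropy([(p + n) / total for p, n in subsets])

    def gain_ratio(prior, subsets):
        """Information gain divided by split information."""
        gain = entropy(prior) - remainder(subsets)
        return gain / split_information(subsets)

    # Patrons split with the 6-yes/6-no prior: gain about 0.54, ratio about 0.37
    gain_ratio([0.5, 0.5], [(0, 2), (4, 0), (2, 4)])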

  19. Continuous-valued outputs • Leaf is a linear function of attributes, not value • Regression Tree

  20. Missing Attribute Values • Attribute value in an example may not be known • Assign most common value amongst comparable examples • Split into fractional examples based on the observed distribution of values
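
A sketch of the first option (most common value among comparable examples); the function and argument names are hypothetical:

    from collections import Counter

    def fill_missing(example, attr, comparable_examples):
        """Substitute the most common observed value among comparable examples
        (e.g. those reaching the same node) when the attribute is unknown."""
        if example.get(attr) is None:
            observed = [e[attr] for e in comparable_examples if e.get(attr) is not None]
            example[attr] = Counter(observed).most_common(1)[0][0]
        return example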

  21. Credits • Diagrams from “Artificial Intelligence: A Modern Approach” by Russell and Norvig
