Decision Trees


Presentation Transcript


  1. Decision Trees

  2. General Learning Task
  DEFINE:
  • Set X of instances (of n-tuples x = <x1, ..., xn>)
  • E.g., days described by attributes (or features): Sky, Temp, Humidity, Wind, Water, Forecast
  • Target function y, e.g.:
  • EnjoySport: X → Y = {0, 1} (example of concept learning)
  • WhichSport: X → Y = {Tennis, Soccer, Volleyball}
  • InchesOfRain: X → Y = [0, 10]
  GIVEN:
  • Training examples D
  • positive and negative examples of the target function: <x, y(x)>
  FIND:
  • A hypothesis h such that h(x) approximates y(x).
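
Slide 2's notation, written as a minimal Python sketch (the class and type names are mine, and the two training rows are illustrative, not taken from the slides):

from typing import Callable, NamedTuple

# An instance x = <x1, ..., xn>: a day described by six attributes.
class Day(NamedTuple):
    sky: str        # e.g. "Sunny", "Rainy"
    temp: str       # e.g. "Warm", "Cold"
    humidity: str   # e.g. "Normal", "High"
    wind: str       # e.g. "Strong", "Weak"
    water: str      # e.g. "Warm", "Cool"
    forecast: str   # e.g. "Same", "Change"

# Target functions y: X -> Y for the three example tasks.
EnjoySport   = Callable[[Day], int]    # Y = {0, 1}  (concept learning)
WhichSport   = Callable[[Day], str]    # Y = {"Tennis", "Soccer", "Volleyball"}
InchesOfRain = Callable[[Day], float]  # Y = [0, 10]

# Training examples D: pairs <x, y(x)>, positive and negative.
D: list[tuple[Day, int]] = [
    (Day("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
    (Day("Rainy", "Cold", "High", "Strong", "Warm", "Change"), 0),
]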

  3. Hypothesis Spaces
  • Hypothesis space H is a subset of all y: X → Y, e.g.:
  • MC2, conjunctions of literals: <Sunny, ?, ?, Strong, ?, Same>
  • Decision trees: any function
  • 2-level decision trees: any function of two attributes, some of three
  • Candidate-Elimination Algorithm:
  • Searches H for a hypothesis that matches the training data
  • Exploits the general-to-specific ordering of hypotheses
  • Decision Trees:
  • Incrementally grow the tree by splitting training examples on attribute values
  • Can be thought of as looping over i = 1, ..., n:
  • Search Hi = i-level trees for a hypothesis h that matches the data

  4. Decision Trees represent disjunctions of conjunctions, e.g.: (Sunny ^ Normal) v Overcast v (Rain ^ Weak)
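
Read as code, the formula on slide 4 is just a nested if/else over attribute values; a minimal Python sketch (the attribute names are assumed from the usual PlayTennis-style example):

def play(sky: str, humidity: str, wind: str) -> bool:
    """Decision tree for (Sunny ^ Normal) v Overcast v (Rain ^ Weak)."""
    if sky == "Sunny":
        return humidity == "Normal"   # Sunny branch splits on Humidity
    if sky == "Overcast":
        return True                   # Overcast leaf is always positive
    if sky == "Rain":
        return wind == "Weak"         # Rain branch splits on Wind
    return False

assert play("Sunny", "Normal", "Strong")
assert not play("Rain", "High", "Strong")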

  5. Decision Trees vs. MC2
  • MC2 can't represent (Sunny v Cloudy): an MC2 hypothesis must constrain each attribute to a single value, if at all.
  • A decision tree can: split on Sky, with leaves Yes (Sunny), Yes (Cloudy), and No for the remaining value.

  6. Learning Parity with D-Trees
  • How to solve 2-bit parity:
  • Two-step look-ahead
  • Split on pairs of attributes at once
  • For k attributes, why not just do k-step look-ahead, or split on k attribute values?
  • => Parity functions are the “victims” of the decision tree’s inductive bias.
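
A quick illustration (mine, not from the slides) of why one-step greedy splitting fails here: for 2-bit parity, conditioning on either single bit leaves the labels evenly mixed, so every candidate split looks worthless one step ahead.

# 2-bit parity: y = x1 XOR x2.
parity = [((a, b), a ^ b) for a in (0, 1) for b in (0, 1)]
for attr in (0, 1):
    for v in (0, 1):
        branch = [y for x, y in parity if x[attr] == v]
        print(f"split on x{attr + 1} = {v}: branch labels {branch}")
# Every branch prints one 0 and one 1, i.e. zero information gain for any single split;
# only a two-step look-ahead (or splitting on the pair of attributes) reveals the structure.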

  7. Splitting criterion: information gain, Gain(D, xi) = H(Y) - H(Y | xi) = I(Y; xi), the mutual information between the class Y and attribute xi.
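
A short Python sketch of this criterion (the helper names are mine; examples are (x, y) pairs with x a tuple of attribute values):

from collections import Counter
from math import log2

def entropy(labels):
    """H(Y) for a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(examples, attr):
    """Gain(D, x_attr) = H(Y) - H(Y | x_attr)."""
    labels = [y for _, y in examples]
    cond = 0.0
    for v in {x[attr] for x, _ in examples}:
        branch = [y for x, y in examples if x[attr] == v]
        cond += len(branch) / len(examples) * entropy(branch)
    return entropy(labels) - cond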

  8. Overfitting is due to “noise”
  • Sources of noise:
  • Erroneous training data
  • concept variable incorrect (annotator error)
  • attributes mis-measured
  • Much more significant:
  • Irrelevant attributes
  • Target function not deterministic in the attributes

  9. Irrelevant attributes
  • If many attributes are noisy, information gains can be spurious, e.g.:
  • 20 noisy attributes
  • 10 training examples
  • => Expected number of depth-3 trees that split the training data perfectly using only noisy attributes: 13.4
  • Potential solution: statistical significance tests (e.g., chi-square)
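
One way to implement the suggested significance test (a sketch only; the slide names chi-square, and using SciPy's generic contingency-table test here is my assumption about the intended procedure):

from scipy.stats import chi2_contingency

def split_is_significant(examples, attr, alpha=0.05):
    """Accept a split on `attr` only if the class distribution differs
    significantly across its values (chi-square test on the contingency table)."""
    values = sorted({x[attr] for x, _ in examples})
    classes = sorted({y for _, y in examples})
    table = [[sum(1 for x, y in examples if x[attr] == v and y == c) for c in classes]
             for v in values]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < alpha

A learner can call this before accepting the best-gain attribute, refusing to grow the tree when no split passes the test.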

  10. Non-determinism • In general: • We can’t measure all the variables we need to do perfect prediction. • => Target function is not uniquely determined by attribute values

  11. Non-determinism: Example
  Decent hypothesis:
  • Humidity > 0.70 → No
  • otherwise → Yes
  Overfit hypothesis:
  • Humidity > 0.89 → No
  • Humidity > 0.80 ^ Humidity <= 0.89 → Yes
  • Humidity > 0.70 ^ Humidity <= 0.80 → No
  • Humidity <= 0.70 → Yes
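
The slide's two hypotheses written out as Python predicates (my transcription; True stands for Yes, False for No):

def decent(humidity: float) -> bool:
    """Humidity > 0.70 -> No, otherwise -> Yes."""
    return humidity <= 0.70

def overfit(humidity: float) -> bool:
    """Extra thresholds whose only job is to memorize noisy training examples."""
    if humidity > 0.89:
        return False
    if humidity > 0.80:
        return True      # this band exists only to fit a few noisy points
    if humidity > 0.70:
        return False
    return True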

  12. Rule #2 of Machine Learning The best hypothesis almost never achieves 100% accuracy on the training data. (Rule #1 was: you can’t learn anything without inductive bias)

  13. Hypothesis Space comparisons • Task: concept learning with k binary attributes
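
The comparison table itself did not survive in this transcript; for k binary attributes, the standard sizes such a comparison contrasts are as follows (my reconstruction in LaTeX, not the slide's exact figures):

% Hypothesis-space sizes for concept learning with k binary attributes (reconstruction).
\begin{align*}
|H_{\mathrm{MC2}}| &\le 3^k + 1 && \text{each attribute required true, required false, or ``don't care'', plus the empty hypothesis}\\
|H_{\mathrm{decision\ trees}}| &= 2^{2^k} && \text{every Boolean function of $k$ bits is representable}
\end{align*}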

  14. Decision Trees – Strengths
  • Very popular technique
  • Fast
  • Useful when:
  • instances are attribute-value pairs
  • the target function is discrete
  • concepts are likely to be disjunctions
  • attributes may be noisy

  15. Decision Trees – Weaknesses
  • Less useful for continuous outputs
  • Can have difficulty with continuous input features as well…
  • E.g., what if your target concept is a circle in the x1, x2 plane?
  • Hard to represent with decision trees…
  • Very simple with the instance-based methods we’ll discuss later…
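
A concrete version of the circle example (my own illustration, assuming NumPy and scikit-learn are available): an axis-parallel tree can only approximate the round boundary with a stack of rectangles, while 1-nearest-neighbor, an instance-based method, follows it closely.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 <= 0.5).astype(int)   # target concept: a disk
X_tr, y_tr, X_te, y_te = X[:1000], y[:1000], X[1000:], y[1000:]

tree = DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr)
knn = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
print("shallow tree accuracy:", tree.score(X_te, y_te))   # axis-parallel splits struggle
print("1-NN accuracy:        ", knn.score(X_te, y_te))    # tracks the circular boundary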

  16. A decision tree learning algorithm, along the lines of ID3
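
The transcript carries only this slide's title, so here is a hedged sketch of a recursive, information-gain-based learner along the lines of ID3 (not the slide's exact pseudocode). It reuses the entropy/info_gain helpers sketched after slide 7, and the nine training rows are illustrative, not from the slides.

def id3(examples, attrs):
    """Grow a tree as nested dicts {attr_index: {value: subtree}}, with class labels at the leaves."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:                 # pure node -> leaf
        return labels[0]
    if not attrs:                             # no attributes left -> majority leaf
        return max(set(labels), key=labels.count)
    best = max(attrs, key=lambda a: info_gain(examples, a))
    node = {best: {}}
    for v in {x[best] for x, _ in examples}:
        branch = [(x, y) for x, y in examples if x[best] == v]
        node[best][v] = id3(branch, [a for a in attrs if a != best])
    return node

def classify(tree, x, default=0):
    """Walk the tree for instance x; fall back to `default` on unseen attribute values."""
    while isinstance(tree, dict):
        attr, children = next(iter(tree.items()))
        if x[attr] not in children:
            return default
        tree = children[x[attr]]
    return tree

# Attributes: 0 = Sky, 1 = Humidity, 2 = Wind.  On this data the learner
# recovers exactly the tree of slide 4.
data = [(("Sunny", "Normal", "Weak"), 1),    (("Sunny", "Normal", "Strong"), 1),
        (("Sunny", "High", "Weak"), 0),      (("Sunny", "High", "Strong"), 0),
        (("Overcast", "High", "Strong"), 1), (("Overcast", "Normal", "Weak"), 1),
        (("Rain", "High", "Weak"), 1),       (("Rain", "Normal", "Strong"), 0),
        (("Rain", "High", "Strong"), 0)]
tree = id3(data, attrs=[0, 1, 2])
print(tree)
print(classify(tree, ("Rain", "High", "Weak")))   # -> 1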
