
Machine Learning


Presentation Transcript


  1. Machine Learning Lecture 10 Decision Trees G53MLE Machine Learning Dr Guoping Qiu

  2. Trees • Node • Root • Leaf • Branch • Path • Depth

  3. Decision Trees • A hierarchical data structure that represents data by implementing a divide and conquer strategy • Can be used as a non-parametric classification method. • Given a collection of examples, learn a decision tree that represents it. • Use this representation to classify new examples

  4. Decision Trees • Each node is associated with a feature (one of the elements of the feature vector that represents an object) • Each node tests the value of its associated feature • There is one branch for each value of the feature • Leaves specify the categories (classes) • Can categorize instances into multiple disjoint categories (multi-class) [Diagram: the PlayTennis tree: Outlook → {Sunny: Humidity → {High: No, Normal: Yes}, Overcast: Yes, Rain: Wind → {Strong: No, Weak: Yes}}]

  5. Decision Trees • Play Tennis Example • Feature Vector = (Outlook, Temperature, Humidity, Wind) [Diagram: the same PlayTennis tree as on slide 4]

  6. Decision Trees [Diagram: the PlayTennis tree with each internal node (Outlook, Humidity, Wind) annotated "Node associated with a feature"]

  7. Decision Trees • Play Tennis Example • Feature values: • Outlook = (sunny, overcast, rain) • Temperature = (hot, mild, cool) • Humidity = (high, normal) • Wind = (strong, weak)

  8. Decision Trees • Outlook = (sunny, overcast, rain) [Diagram: the PlayTennis tree with the edges leaving each node annotated "One branch for each value"]

  9. Decision Trees • Class = (Yes, No) [Diagram: the PlayTennis tree with its leaves annotated "Leaf nodes specify classes"]
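
The structure described on slides 4-9 can be made concrete with a small sketch (my own code, not the lecture's): the PlayTennis tree stored as nested Python dicts, where an internal node is {feature: {value: subtree}} and a leaf is simply the class label. The tree literal is the one drawn on the slides; the helper name classify is mine.

```python
# A minimal sketch of the PlayTennis tree from the slides as nested dicts.
play_tennis_tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(tree, example):
    """Follow one root-to-leaf path: test the node's feature, take the branch
    for the example's value, and repeat until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        feature = next(iter(tree))              # the feature this node tests
        tree = tree[feature][example[feature]]  # follow the matching branch
    return tree

print(classify(play_tennis_tree,
               {"Outlook": "Rain", "Temperature": "Mild",
                "Humidity": "High", "Wind": "Weak"}))       # -> Yes
```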

  10. Decision Trees • Design Decision Tree Classifier • Picking the root node • Recursively branching

  11. Decision Trees • Picking the root node • Consider data with two Boolean attributes (A, B) and two classes, + and -: • { (A=0, B=0), - }: 50 examples • { (A=0, B=1), - }: 50 examples • { (A=1, B=0), - }: 3 examples • { (A=1, B=1), + }: 100 examples

  12. Decision Trees • Picking the root node • The two candidate trees (one split first on A, one split first on B) look structurally similar; which attribute should we choose? [Diagram: two small trees, one rooted at A and one rooted at B, each with branches for the values 0 and 1]

  13. Decision Trees • Picking the root node • The goal is to have the resulting decision tree as small as possible (Occam’s Razor) • The main decision in the algorithm is the selection of the next attribute to condition on (starting from the root node). • We want attributes that split the examples into sets that are relatively pure in one label; this way we are closer to a leaf node. • The most popular heuristic is based on information gain, which originated with Quinlan's ID3 system.

  14. Entropy • S is a sample of training examples • p+ is the proportion of positive examples in S • p- is the proportion of negative examples in S • Entropy measures the impurity of S: Entropy(S) = - p+ log2(p+) - p- log2(p-) [Plot: Entropy(S) as a function of p+]
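
As a small illustration (not from the slides), the two-class entropy can be computed directly from the definition above; the function name entropy and the convention 0 * log2(0) = 0 are my own choices.

```python
from math import log2

def entropy(pos, neg):
    """Two-class entropy of a sample with `pos` positive and `neg` negative
    examples: -p+ log2(p+) - p- log2(p-), taking 0 * log2(0) as 0."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h

print(entropy(9, 5))    # ~0.940, the 9+/5- PlayTennis sample used later
print(entropy(7, 7))    # 1.0: maximally impure
print(entropy(14, 0))   # 0.0: pure
```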

  15. [Figure: two scatters of + and - examples] A highly disorganized (mixed) set has high entropy: much information is required to describe it. A highly organized (pure) set has low entropy: little information is required.

  16. Information Gain • Gain(S, A) = expected reduction in entropy due to sorting on A: Gain(S, A) = Entropy(S) - Σ over v in Values(A) of (|Sv| / |S|) * Entropy(Sv) • Values(A) is the set of all possible values for attribute A; Sv is the subset of S for which attribute A has value v; |S| and |Sv| are the number of samples in set S and set Sv respectively • Gain(S, A) is the expected reduction in entropy caused by knowing the value of attribute A.
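
A sketch of this definition in code (mine, not the lecture's). Here entropy_of generalizes the two-class entropy above to any list of class labels, and gain treats each example as a dict mapping attribute names to values.

```python
from collections import Counter
from math import log2

def entropy_of(labels):
    """Entropy of a list of class labels (generalizes the two-class entropy above)."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gain(examples, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum over v in Values(A) of |Sv|/|S| * Entropy(Sv),
    where Sv is the subset of examples with A = v."""
    total = len(examples)
    remainder = 0.0
    for v in set(x[attribute] for x in examples):
        sv_labels = [y for x, y in zip(examples, labels) if x[attribute] == v]
        remainder += len(sv_labels) / total * entropy_of(sv_labels)
    return entropy_of(labels) - remainder
```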

  17. Information Gain • Example: choose A or B? • Split on A: A=1 → 100+, 3-; A=0 → 100- • Split on B: B=1 → 100+, 50-; B=0 → 53-
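
As a hypothetical check (not part of the lecture), the gain() sketch above can be run on the 203 examples from slide 11; it indicates that A should be chosen as the root.

```python
# The 203 examples from slide 11, laid out for the gain() sketch above.
ab_examples = ([{"A": 0, "B": 0}] * 50 + [{"A": 0, "B": 1}] * 50 +
               [{"A": 1, "B": 0}] * 3 + [{"A": 1, "B": 1}] * 100)
ab_labels = ["-"] * 103 + ["+"] * 100            # first 103 negative, last 100 positive

print(round(gain(ab_examples, ab_labels, "A"), 2))  # ~0.90
print(round(gain(ab_examples, ab_labels, "B"), 2))  # ~0.32 -> splitting on A is preferred
```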

  18. Example • Play Tennis example [Table: the 14 PlayTennis training examples, numbered 1-14, with attributes Outlook, Temperature, Humidity, Wind and class Yes/No]
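
The table itself is an image in the original transcript. For reference, the standard PlayTennis training set (Quinlan's data, also Table 3.2 in Mitchell's Machine Learning), whose per-attribute counts match every number on the following slides, can be written out as below; the names FEATURES, rows, examples, labels are mine.

```python
# Each row: Outlook, Temperature, Humidity, Wind, PlayTennis (example numbers 1-14).
FEATURES = ("Outlook", "Temperature", "Humidity", "Wind")
rows = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),    # 1
    ("Sunny",    "Hot",  "High",   "Strong", "No"),    # 2
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),   # 3
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),   # 4
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),   # 5
    ("Rain",     "Cool", "Normal", "Strong", "No"),    # 6
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),   # 7
    ("Sunny",    "Mild", "High",   "Weak",   "No"),    # 8
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),   # 9
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),   # 10
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),   # 11
    ("Overcast", "Mild", "High",   "Strong", "Yes"),   # 12
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),   # 13
    ("Rain",     "Mild", "High",   "Strong", "No"),    # 14
]
examples = [dict(zip(FEATURES, r[:4])) for r in rows]
labels = [r[4] for r in rows]                  # 9 Yes, 5 No
```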

  19. Example • Humidity: High → 3+, 4- (E = 0.985); Normal → 6+, 1- (E = 0.592) • Gain(S, Humidity) = 0.94 - 7/14 * 0.985 - 7/14 * 0.592 = 0.151

  20. Example • Wind: Weak → 6+, 2- (E = 0.811); Strong → 3+, 3- (E = 1.0) • Gain(S, Wind) = 0.94 - 8/14 * 0.811 - 6/14 * 1.0 = 0.048

  21. Example • Outlook: Sunny → examples {1, 2, 8, 9, 11}, 2+, 3- (E = 0.970); Overcast → {3, 7, 12, 13}, 4+, 0- (E = 0.0); Rain → {4, 5, 6, 10, 14}, 3+, 2- (E = 0.970) • Gain(S, Outlook) = 0.246

  22. Example • Gain(S, Outlook) = 0.246, Gain(S, Humidity) = 0.151, Gain(S, Wind) = 0.048, Gain(S, Temperature) = 0.029 • Pick Outlook as the root [Diagram: root node Outlook with branches Sunny, Overcast, Rain]
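
A hypothetical re-check of these four gains (my own, not in the lecture), using the gain() sketch and the data listed under slide 18:

```python
# Re-deriving the four gains from the PlayTennis data listed on slide 18.
for attribute in FEATURES:
    print(attribute, round(gain(examples, labels, attribute), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048
# (the slide's 0.246 and 0.151 differ only because it rounds intermediate values)
```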

  23. Example • Pick Outlook as the root • Overcast: examples {3, 7, 12, 13}, 4+, 0- → leaf Yes • Sunny: examples {1, 2, 8, 9, 11}, 2+, 3- → ? • Rain: examples {4, 5, 6, 10, 14}, 3+, 2- → ? • Continue until every attribute is included in the path, or all examples in the leaf have the same label
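
A minimal sketch of the recursive procedure this slide describes (my own code, not the lecture's): choose the highest-gain attribute, split the examples on its values, and recurse until a node's examples share one label or no attributes remain, answering with the majority label in the latter case.

```python
from collections import Counter

def id3(examples, labels, attributes):
    """Grow a tree in the nested-dict form used by classify() above."""
    counts = Counter(labels)
    if len(counts) == 1 or not attributes:
        return counts.most_common(1)[0][0]                  # leaf: (majority) class label
    best = max(attributes, key=lambda a: gain(examples, labels, a))
    node = {best: {}}
    for v in set(x[best] for x in examples):
        branch = [(x, y) for x, y in zip(examples, labels) if x[best] == v]
        sub_examples, sub_labels = [x for x, _ in branch], [y for _, y in branch]
        node[best][v] = id3(sub_examples, sub_labels,
                            [a for a in attributes if a != best])
    return node

tree = id3(examples, labels, list(FEATURES))   # reproduces the tree on slide 27
```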

  24. Example • Expand the Sunny branch: examples {1, 2, 8, 9, 11}, 2+, 3- (the Overcast branch, {3, 7, 12, 13}, 4+, 0-, is already the leaf Yes) • Gain(Ssunny, Humidity) = 0.97 - (3/5) * 0 - (2/5) * 0 = 0.97 • Gain(Ssunny, Temperature) = 0.97 - 0 - (2/5) * 1 = 0.57 • Gain(Ssunny, Wind) = 0.97 - (2/5) * 1 - (3/5) * 0.92 = 0.02

  25. Example • Pick Humidity for the Sunny branch (highest gain): Gain(Ssunny, Humidity) = 0.97, Gain(Ssunny, Temperature) = 0.57, Gain(Ssunny, Wind) = 0.02 [Diagram: Outlook root; Sunny → Humidity → {High: No, Normal: Yes}; Overcast → Yes; the Rain branch is still unexpanded]

  26. Example • Expand the Rain branch: examples {4, 5, 6, 10, 14}, 3+, 2- • Gain(Srain, Humidity) = ? • Gain(Srain, Temperature) = ? • Gain(Srain, Wind) = ?

  27. Example • The finished tree: [Diagram: Outlook → {Sunny: Humidity → {High: No, Normal: Yes}, Overcast: Yes, Rain: Wind → {Strong: No, Weak: Yes}}]
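
As a quick usage check (hypothetical, reusing the earlier sketches), the grown tree classifies a new day by following one root-to-leaf path:

```python
# Sunny outlook and high humidity reach the "No" leaf of the finished tree.
print(classify(tree, {"Outlook": "Sunny", "Temperature": "Cool",
                      "Humidity": "High", "Wind": "Strong"}))   # -> No
```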

  28. Tutorial/Exercise Questions • An experiment has produced the following 3-D feature vectors X = (x1, x2, x3) belonging to two classes (labelled 1 and 2). Design a decision tree classifier to classify an unknown feature vector X = (1, 2, 1).
  x1 x2 x3  Class
  1  1  1   1
  1  1  1   2
  1  1  1   1
  2  1  1   2
  2  1  2   1
  2  2  2   2
  2  2  2   1
  2  2  1   2
  1  2  2   2
  1  1  2   1
  1  2  1   ?
