
CSC 4510 – Machine Learning

Lecture 4: Decision Trees, continued


Presentation Transcript


  1. Lecture 4: Decision Trees, continued
  CSC 4510 – Machine Learning
  Dr. Mary-Angela Papalaskari, Department of Computing Sciences, Villanova University
  Course website: www.csc.villanova.edu/~map/4510/

  2. Tiny Example of Decision Tree Learning
  [Figure: a decision tree that tests color (green, red, blue); the red branch tests shape, with circle → pos and the remaining leaves → neg]
  • Instance attributes: <size, color, shape>
  • C = {positive, negative}
  • D: <big, red, circle>: +   <small, red, circle>: +   <small, red, square>: –   <big, blue, circle>: –
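A minimal sketch of how the training set D above might be represented in Python (the dictionary layout and the variable name D are illustrative assumptions, not course code); later sketches build on the same four examples:

```python
# Toy training set D: each instance is a <size, color, shape> description
# together with its class, "+" (positive) or "-" (negative).
D = [
    ({"size": "big",   "color": "red",  "shape": "circle"}, "+"),
    ({"size": "small", "color": "red",  "shape": "circle"}, "+"),
    ({"size": "small", "color": "red",  "shape": "square"}, "-"),
    ({"size": "big",   "color": "blue", "shape": "circle"}, "-"),
]
```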

  3. Decision Trees
  [Figures: two decision trees over color and shape, one with pos/neg leaves and one with category leaves A, B, C]
  • Tree-based classifiers for instances represented as feature vectors. Nodes test features, there is one branch for each value of the feature, and leaves specify the category.
  • Can represent arbitrary conjunctions and disjunctions. Can represent any classification function over discrete feature vectors.
  • Can be rewritten as a set of rules, i.e. disjunctive normal form (DNF):
    - red ∧ circle → pos
    - red ∧ circle → A;  blue → B;  red ∧ square → B;  green → C;  red ∧ triangle → C
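To make the tree-to-rules correspondence concrete, here is a small illustrative sketch (assumed Python; the dictionary layout and function name are my own, and since the green and triangle leaves carry no positive rule, they are simply labeled neg to match the slide's single positive rule red ∧ circle → pos):

```python
# The pos/neg tree from the slide: test color first; for red, test shape.
tree = {
    "feature": "color",
    "branches": {
        "green": "neg",   # no positive rule covers green
        "blue": "neg",
        "red": {
            "feature": "shape",
            "branches": {"circle": "pos", "square": "neg", "triangle": "neg"},
        },
    },
}

def classify(node, instance):
    """Walk the tree until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        node = node["branches"][instance[node["feature"]]]
    return node

# Matches the DNF rule red ∧ circle → pos:
print(classify(tree, {"size": "big", "color": "red", "shape": "circle"}))  # pos
```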

  4. Top-Down Decision Tree Creation
  [Figure: the training examples <big, red, circle>: +, <small, red, circle>: +, <small, red, square>: –, <big, blue, circle>: – at the root; after splitting on color, the red branch receives <big, red, circle>: +, <small, red, circle>: +, and <small, red, square>: –]

  5. Top-Down Decision Tree Creation
  [Figure: the completed tree. The blue branch becomes a neg leaf (<big, blue, circle>: –); the red branch (<big, red, circle>: +, <small, red, circle>: +, <small, red, square>: –) is split on shape, giving circle → pos, square → neg, and triangle → neg]

  6. First step in detail: split on color?
  Start (all examples):
    positive: <big, red, circle>, <small, red, circle>   (count 2, probability .5)
    negative: <small, red, square>, <big, blue, circle>  (count 2, probability .5)
  Color = red:
    positive: <big, red, circle>, <small, red, circle>   (count 2, probability .667)
    negative: <small, red, square>                       (count 1, probability .333)
  Color = green:
    positive: none   (count 0, probability 0)
    negative: none   (count 0, probability 0)
  Color = blue:
    positive: none                 (count 0, probability 0)
    negative: <big, blue, circle>  (count 1, probability 1)

  7. First step in detail (continued)
  The same breakdown as slide 6, annotated with entropies:
    Start: Entropy = 1
    Color = red: Entropy = 0.92
    Color = green: Entropy = 0 (no examples)
    Color = blue: Entropy = 0

  8. First step in detail (continued)
  Using the entropies from slide 7 (Start = 1, red = 0.92, green = 0, blue = 0):
    Gain(Color) = Entropy(start) – [Entropy(red)*(3/4) + Entropy(green)*(0/4) + Entropy(blue)*(1/4)]
                = 1 – 0.69
                = 0.31
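The arithmetic on slides 7–8, and the Gain(Size) = 0 result on slides 10–11, can be checked with a few lines of Python; this is an illustrative calculation, not part of the course materials:

```python
from math import log2

def entropy(p_pos, p_neg):
    """Binary entropy in bits; 0*log(0) is taken to be 0."""
    return sum(-p * log2(p) for p in (p_pos, p_neg) if p > 0)

H_start = entropy(2/4, 2/4)   # 1.0
H_red   = entropy(2/3, 1/3)   # about 0.92
H_blue  = entropy(0/1, 1/1)   # 0.0
H_green = 0.0                 # no examples reach the green branch

gain_color = H_start - (3/4 * H_red + 0/4 * H_green + 1/4 * H_blue)
print(round(gain_color, 2))   # 0.31

# Splitting on size instead (slides 9-11): both subsets stay maximally mixed.
gain_size = H_start - (2/4 * entropy(1/2, 1/2) + 2/4 * entropy(1/2, 1/2))
print(gain_size)              # 0.0
```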

  9. First step in detail: split on Size?
  Start (all examples):
    positive: <big, red, circle>, <small, red, circle>   (count 2, probability .5)
    negative: <small, red, square>, <big, blue, circle>  (count 2, probability .5)
  Size = BIG:
    positive: <big, red, circle>   (count 1, probability .5)
    negative: <big, blue, circle>  (count 1, probability .5)
  Size = small:
    positive: <small, red, circle>  (count 1, probability .5)
    negative: <small, red, square>  (count 1, probability .5)

  10. First step in detail: split on Size (continued)
  The same breakdown as slide 9, annotated with entropies:
    Start: Entropy = 1
    Size = BIG: Entropy = 1
    Size = small: Entropy = 1

  11. First step in detail: split on Size (continued)
  Gain(Size) = Entropy(start) – [Entropy(BIG)*(2/4) + Entropy(small)*(2/4)]
             = 1 – 1/2 – 1/2
             = 0

  12. Top-Down Decision Tree Induction
  • Build a tree top-down by divide and conquer (see the sketch below).
  [Figure: the same construction as slides 4–5: split on color first, then split the red branch on shape, giving blue → neg (<big, blue, circle>: –), red ∧ circle → pos, red ∧ square → neg, and red ∧ triangle → neg]
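The divide-and-conquer procedure on this slide can be sketched as a short recursive function. This is an illustrative ID3-style outline in Python (the function names, the nested-dictionary tree format, and the stopping/tie-breaking choices are my own simplifications, not the course's implementation):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(-c / n * log2(c / n) for c in Counter(labels).values()) if n else 0.0

def information_gain(examples, feature):
    """Expected reduction in entropy from splitting `examples` on `feature`."""
    labels = [y for _, y in examples]
    gain = entropy(labels)
    for value in {x[feature] for x, _ in examples}:
        subset = [y for x, y in examples if x[feature] == value]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain

def build_tree(examples, features):
    """Top-down induction: pick the highest-gain feature, split, recurse."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:      # pure node, or nothing left to test
        return majority
    best = max(features, key=lambda f: information_gain(examples, f))
    node = {"feature": best, "branches": {}}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        node["branches"][value] = build_tree(subset, [f for f in features if f != best])
    return node

# On the toy data D from slide 2 this splits on color first and then, within
# the red branch, on shape, mirroring slides 4-5:
# build_tree(D, ["size", "color", "shape"])
```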

  13. Another view (AIspace)

  14. Picking a Good Split Feature
  • The goal is to have the resulting tree be as small as possible, per Occam's razor.
  • Finding a minimal decision tree (in nodes, leaves, or depth) is an NP-hard optimization problem.
  • The top-down divide-and-conquer method does a greedy search for a simple tree but is not guaranteed to find the smallest.
  • General lesson in ML: "Greed is good."
  • We want to pick a feature that creates subsets of examples that are relatively "pure" in a single class, so they are "closer" to being leaf nodes.
  • There are a variety of heuristics for picking a good test; a popular one is based on information gain, which originated with the ID3 system of Quinlan (1979).

  15. Entropy
  • Entropy (disorder, impurity) of a set of examples, S, relative to a binary classification is:
      Entropy(S) = – p1 log2(p1) – p0 log2(p0)
    where p1 is the fraction of positive examples in S and p0 is the fraction of negatives.
  • If all examples are in one category, entropy is zero (we define 0*log(0) = 0).
  • If examples are equally mixed (p1 = p0 = 0.5), entropy is a maximum of 1.
  • Entropy can be viewed as the average number of bits required to encode the class of an example in S when data compression (e.g. Huffman coding) is used to give shorter codes to more likely cases.
  • For multi-class problems with c categories, entropy generalizes to:
      Entropy(S) = – Σ (i = 1..c) pi log2(pi)
    where pi is the proportion of examples in S belonging to category i.
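A small illustrative check of these properties and of the multi-class generalization (assumed Python, not course code):

```python
from math import log2

def entropy(probs):
    """Entropy in bits of a discrete distribution.
    Terms with p = 0 or p = 1 contribute nothing (0*log(0) is taken to be 0)."""
    return sum(-p * log2(p) for p in probs if 0 < p < 1)

print(entropy([1.0, 0.0]))        # 0.0   : all examples in one category
print(entropy([0.5, 0.5]))        # 1.0   : equally mixed binary case, the maximum
print(entropy([2/3, 1/3]))        # 0.918 : the red subset from slide 7
print(entropy([1/3, 1/3, 1/3]))   # 1.585 : three equally likely categories (c = 3)
```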

  16. Entropy Plot for Binary Classification
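The plot itself is not reproduced in the transcript; a short matplotlib sketch (assuming numpy and matplotlib are available; illustrative only, not course code) regenerates the curve:

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.001, 0.999, 500)                # fraction p1 of positive examples
H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)    # binary entropy in bits

plt.plot(p, H)
plt.xlabel("p1 (fraction of positive examples)")
plt.ylabel("Entropy(S)")
plt.title("Binary entropy: 0 for pure sets, maximum of 1 bit at p1 = 0.5")
plt.show()
```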

  17. Information Gain
  • The information gain of a feature F is the expected reduction in entropy resulting from splitting on this feature:
      Gain(S, F) = Entropy(S) – Σ (v ∈ Values(F)) (|Sv| / |S|) * Entropy(Sv)
    where Sv is the subset of S having value v for feature F.
  • The entropy of each resulting subset is weighted by its relative size.
  • Example summary (training data: <big, red, circle>: +, <small, red, circle>: +, <small, red, square>: –, <big, blue, circle>: –):
      size:  2+, 2– (E = 1) splits into big: 1+, 1– (E = 1) and small: 1+, 1– (E = 1)
             Gain = 1 – (0.5*1 + 0.5*1) = 0
      color: 2+, 2– (E = 1) splits into red: 2+, 1– (E = 0.918) and blue: 0+, 1– (E = 0)
             Gain = 1 – (0.75*0.918 + 0.25*0) = 0.311
      shape: 2+, 2– (E = 1) splits into circle: 2+, 1– (E = 0.918) and square: 0+, 1– (E = 0)
             Gain = 1 – (0.75*0.918 + 0.25*0) = 0.311
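The example summary can be reproduced end to end; this is an illustrative, self-contained Python sketch (helper names are my own, not course code) that computes the gain of each feature on the four training examples:

```python
from collections import Counter
from math import log2

examples = [
    ({"size": "big",   "color": "red",  "shape": "circle"}, "+"),
    ({"size": "small", "color": "red",  "shape": "circle"}, "+"),
    ({"size": "small", "color": "red",  "shape": "square"}, "-"),
    ({"size": "big",   "color": "blue", "shape": "circle"}, "-"),
]

def entropy(labels):
    n = len(labels)
    return sum(-c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, feature):
    labels = [y for _, y in examples]
    gain = entropy(labels)
    for value in {x[feature] for x, _ in examples}:
        subset = [y for x, y in examples if x[feature] == value]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain

for feature in ["size", "color", "shape"]:
    print(feature, round(information_gain(examples, feature), 3))
# size 0.0, color 0.311, shape 0.311 -- matching the summary on slide 17
```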

  18. Bias in Decision-Tree Induction
  • Information gain gives a bias for trees with minimal depth.
  • Recall: bias is any criterion other than consistency with the training data that is used to select a hypothesis.

  19. History of Decision-Tree Research
  • Hunt and colleagues used exhaustive-search decision-tree methods (CLS) to model human concept learning in the 1960s.
  • In the late 1970s, Quinlan developed ID3 with the information gain heuristic to learn expert systems from examples.
  • Simultaneously, Breiman, Friedman, and colleagues developed CART (Classification and Regression Trees), similar to ID3.
  • In the 1980s a variety of improvements were introduced to handle noise, continuous features, missing features, and improved splitting criteria. Various expert-system development tools resulted.
  • Quinlan's updated decision-tree package (C4.5) was released in 1993.
  • Weka includes a Java version of C4.5 called J48.

  20. Additional Decision Tree Issues
  • Better splitting criteria: information gain prefers features with many values.
  • Continuous features
  • Predicting a real-valued function (regression trees)
  • Missing feature values
  • Features with costs
  • Misclassification costs
  • Incremental learning (ID4, ID5)
  • Mining large databases that do not fit in main memory

  21. Class Exercise
  • Practice using decision tree learning on some of the sample datasets available in AIspace.
  • Some of the slides in this presentation are adapted from:
    • Prof. Frank Klassner's ML class at Villanova
    • the University of Manchester ML course: http://www.cs.manchester.ac.uk/ugt/COMP24111/
    • the Stanford online ML course: http://www.ml-class.org/

  22. Resources: Datasets
  • UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
  • UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
  • Statlib: http://lib.stat.cmu.edu/
  • Delve: http://www.cs.utoronto.ca/~delve/
