
Data Mining and Machine Learning Decision Trees and ID3

Presentation Transcript


  1. Data Mining and Machine Learning: Decision Trees and ID3 David Corne, dwcorne@gmail.com

  2. Decision Trees

  3. Real world applications of DTs. See here for a list: http://www.cbcb.umd.edu/~salzberg/docs/murthy_thesis/survey/node32.html Includes: Agriculture, Astronomy, Biomedical Engineering, Control Systems, Financial analysis, Manufacturing and Production, Medicine, Molecular biology, Object recognition, Pharmacology, Physics, Plant diseases, Power systems, Remote Sensing, Software development, Text processing.

  4. Field names

  5. Field names Field values

  6. Field names Field values Class values

  7. Why decision trees? Popular, since they are interpretable ... and correspond to human reasoning/thinking about decision-making. Can perform quite well in accuracy when compared with other approaches ... and there are good algorithms to learn decision trees from data.

  8. Figure 1. Binary Strategy as a tree model. Mohammed MA, Rudge G, Wood G, Smith G, et al. (2012) Which Is More Useful in Predicting Hospital Mortality - Dichotomised Blood Test Results or Actual Test Values? A Retrospective Study in Two Hospitals. PLoS ONE 7(10): e46860. doi:10.1371/journal.pone.0046860 http://www.plosone.org/article/info:doi/10.1371/journal.pone.0046860

  9. Figure 1. Binary Strategy as a tree model.

  10. We will learn the ‘classic’ algorithm to learn a DT from categorical data:

  11. We will learn the ‘classic’ algorithm to learn a DT from categorical data: ID3

  12. Suppose we want a tree that helps us predict someone’s politics, given their gender, age, and wealth

  13. Choose a start node (field) at random

  14. Choose a start node (field) at random [diagram: a single node labelled Age]

  15. Add branches for each value of this field [diagram: Age with branches old, young and mid]

  16. Check to see what has filtered down [diagram: Age with branches old, young and mid; the class counts reaching the branches are 1 L, 2 R; 1 L, 1 R; and 0 L, 1 R]

  17. Where possible, assign a class value [diagram: as before, with the pure branch (0 L, 1 R) labelled Right-Wing]

  18. Otherwise, we need to add further nodes [diagram: the two branches that still mix L and R are marked ?]

  19. Repeat this process every time we need a new node [diagram: as before]

  20. Starting with the first new node – choose a field at random [diagram: one ? node is replaced by a wealth node]

  21. Check the classes of the data at this node… [diagram: wealth with branches rich and poor; the class counts reaching them are 1 L, 0 R and 1 L, 1 R]

  22. And so on … [diagram: as before; one wealth branch has now been assigned a class, and the branch still containing 1 L, 1 R remains to be expanded]

  23. But we can do better than randomly chosen fields!

  24. This is the tree we get if the first choice is 'gender'

  25. This is the tree we get if the first choice is 'gender' [diagram: gender with branches female and male, each leading directly to a class leaf, one Right-Wing and one Left-Wing]

  26. Algorithms for building decision trees (of this type). Initialise: tree T contains one 'unexpanded' node. Repeat until there are no unexpanded nodes: remove an unexpanded node U from T; expand U by choosing a field; add the resulting nodes to T.
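As a concrete illustration (a sketch, not the author's code), here is a minimal Python version of that loop, written recursively so that each call plays the role of expanding one 'unexpanded' node. The row format and the choose_field hook are assumptions for the example; if no strategy is supplied the field is picked at random, as in the slides up to this point.

```python
import random

def build_tree(rows, fields, target, choose_field=None):
    """Grow a decision tree by repeatedly expanding 'unexpanded' nodes.

    rows   : list of dicts, e.g. {'age': 'old', 'wealth': 'rich', 'class': 'R'}
    fields : field names still available for splitting
    target : name of the class column
    choose_field : strategy for picking the split field; random if None
    """
    classes = {r[target] for r in rows}
    if len(classes) == 1:                 # pure node: assign the class value
        return classes.pop()
    if not fields:                        # nothing left to split on: majority class
        return max(classes, key=lambda c: sum(r[target] == c for r in rows))

    pick = choose_field or (lambda rs, fs: random.choice(fs))
    field = pick(rows, fields)
    remaining = [f for f in fields if f != field]

    tree = {field: {}}
    for value in {r[field] for r in rows}:          # one branch per field value
        subset = [r for r in rows if r[field] == value]
        tree[field][value] = build_tree(subset, remaining, target, choose_field)
    return tree
```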

  27. Algorithms for building decision trees (of this type) – expanding a node [diagram: a single unexpanded node, marked ?]

  28. Algorithms for building decision trees (of this type) – the essential step [diagram: a Field node with branches Value = X, Value = Y and Value = Z, each leading to an unexpanded ? node]

  29. So, which field? [diagram: as before]

  30. Three choices: gender, age, or wealth

  31. Suppose we choose age (table now sorted by age values). Two of the values have a mixture of classes.

  32. Suppose we choose wealth (table now sorted by wealth values). One of the values has a mixture of classes - this choice is a bit less mixed up than age?

  33. Suppose we choose gender (table now sorted by gender values). The classes are not mixed up at all within the values.

  34. So, at each step where we choose a node to expand, we choose the field for which the relationship between the field values and the class values is least mixed up.

  35. Measuring 'mixed-up'ness: Shannon’s entropy measure. Suppose you have a bag of N discrete things, and there are T different types of things. Where p_t is the proportion of things in the bag that are of type t, the entropy of the bag is: −( p_1 log(p_1) + p_2 log(p_2) + ... + p_T log(p_T) )

  36. Examples: This mixture: { left, left, left, right, right } has entropy: −( 0.6 log(0.6) + 0.4 log(0.4) ) = 0.292. This mixture: { A, A, A, A, A, A, A, A, B, C } has entropy: −( 0.8 log(0.8) + 0.1 log(0.1) + 0.1 log(0.1) ) = 0.278. This mixture: { same, same, same, same, same, same } has entropy: −( 1.0 log(1.0) ) = 0. Lower entropy = less mixed up.
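A quick way to check these numbers (a sketch, not from the slides): the values 0.292 and 0.278 arise if the logarithm is taken to base 10; base 2 is more common in textbooks, but the base only rescales the entropies and does not change which mixture is more mixed up.

```python
from math import log10

def entropy(proportions):
    # Entropy of a mixture, using log base 10 to match the slides' numbers.
    # Terms with p = 0 are skipped, since p*log(p) -> 0 as p -> 0.
    return sum(-p * log10(p) for p in proportions if p > 0)

print(round(entropy([0.6, 0.4]), 3))        # 3 left, 2 right      -> 0.292
print(round(entropy([0.8, 0.1, 0.1]), 3))   # 8 A's, 1 B, 1 C      -> 0.278
print(entropy([1.0]))                       # everything the same  -> 0.0
```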

  37. Information Gain. ID3 chooses the field that maximises Information Gain, defined as: Gain(D, f) = H(D) − H(D|f), where H(D) is the entropy of the dataset at this point in the tree, just before the new choice of node, and H(D|f) is the weighted entropy of the data after choosing field f: the entropy of the instances reaching each value of f, weighted by the proportion of instances reaching that value. E.g. if we choose wealth, and find that the entropy of poor is 0.2, the entropy of rich is 0.1, and the relative proportions of remaining instances that are rich and poor are 0.6 and 0.4 respectively, then H(D|wealth) would be: 0.2 x 0.4 + 0.1 x 0.6 = 0.14.
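The same idea in code (a hedged sketch; the row format and function names are mine, not the slides'): H(D|f) weights the entropy of each value's subset by that subset's share of the data, and the gain is the resulting drop in entropy.

```python
from collections import Counter
from math import log10

def entropy_of(rows, target='class'):
    # Entropy of the class mixture in a set of rows (log base 10, as above).
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return sum(-(c / n) * log10(c / n) for c in counts.values())

def weighted_entropy(rows, field, target='class'):
    # H(D|f): entropy of each value's subset, weighted by the subset's share of D.
    n = len(rows)
    total = 0.0
    for value in {r[field] for r in rows}:
        subset = [r for r in rows if r[field] == value]
        total += (len(subset) / n) * entropy_of(subset, target)
    return total

def information_gain(rows, field, target='class'):
    # Gain(D, f) = H(D) - H(D|f)
    return entropy_of(rows, target) - weighted_entropy(rows, field, target)

# The slide's worked example: H(D|wealth) = 0.4 * 0.2 + 0.6 * 0.1 = 0.14
```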

  38. Information Gain ... but that’s overly complicated. ... Just remember that, in ID3, you calculate the weighted entropy of each field and choose the field that minimises this. (Since H(D) is the same whichever field you consider, minimising H(D|f) is the same as maximising the gain.)
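So the field-selection step can be as small as this (a sketch that reuses the hypothetical weighted_entropy and build_tree helpers from the earlier snippets):

```python
def id3_choose_field(rows, fields, target='class'):
    # ID3's choice: the field with the minimum weighted entropy,
    # which is the same field that gives the maximum information gain.
    return min(fields, key=lambda f: weighted_entropy(rows, f, target))

# Plugged into the earlier build_tree sketch:
# tree = build_tree(rows, ['gender', 'age', 'wealth'], 'class',
#                   choose_field=id3_choose_field)
```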

  39. Back here: gender, age, or wealth

  40. Suppose we choose age (table now sorted by age values). Weighted entropy is: 0.3333 x −( 0.5 x log(0.5) + 0.5 x log(0.5) ) + 0.1666 x −( 1 x log(1) ) + 0.5 x −( 0.33 x log(0.33) + 0.66 x log(0.66) )

  41. Suppose we choose wealth (table now sorted by wealth values). Weighted entropy is: 0.3333 x −( 0.5 x log(0.5) + 0.5 x log(0.5) ) + 0.6666 x −( 1 x log(1) )

  42. Suppose we choose gender (table now sorted by gender values). Weighted entropy is: 0.3333 x −( 1 x log(1) ) + 0.6666 x −( 1 x log(1) ) This is the one we would choose ...
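These three calculations can be reproduced directly from the per-branch weights and class proportions shown on slides 40-42 (a sketch; the numbers below are read off those slides, with 0.33/0.66 written as exact thirds):

```python
from math import log10

def H(proportions):
    # Entropy of a class mixture, log base 10 as in the slides.
    return sum(-p * log10(p) for p in proportions if p > 0)

# (branch weight, class proportions within the branch) for each candidate field
age    = [(0.3333, [0.5, 0.5]), (0.1666, [1.0]), (0.5, [1/3, 2/3])]
wealth = [(0.3333, [0.5, 0.5]), (0.6666, [1.0])]
gender = [(0.3333, [1.0]), (0.6666, [1.0])]

for name, branches in [('age', age), ('wealth', wealth), ('gender', gender)]:
    print(name, round(sum(w * H(p) for w, p in branches), 3))
# prints roughly: age 0.239, wealth 0.1, gender 0.0 -- so gender is chosen first
```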

  43. Alternatives to Information Gain - all, somehow or other, give a measure of mixed-upness and have been used in building DTs: • Chi Square • Gain Ratio • Symmetric Gain Ratio • Gini index • Modified Gini index • Symmetric Gini index • J-Measure • Minimum Description Length • Relevance • RELIEF • Weight of Evidence
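For comparison, the Gini index from that list is probably the most widely used alternative in practice (it is the default impurity in CART-style trees). A minimal sketch, using the same hypothetical row format assumed earlier:

```python
from collections import Counter

def gini(rows, target='class'):
    # Gini impurity: 1 - sum of squared class proportions.
    # Like entropy, it is 0 for a pure node and grows as the node gets more mixed.
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# Used exactly like the entropy measure: weight each value's subset by its share
# of the data and pick the field that minimises the weighted impurity.
```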

  44. Decision Trees. Further reading is on google. Interesting topics in context are: Pruning: close a branch down before you hit 0 entropy (why?). Discretization and regression: trees that deal with real-valued fields. Decision Forests: what do you think these are?
