
Understanding Decision Trees for Effective Data Interpretation

This chapter examines decision trees and their use in interpreting data. Working from a simple weather data example, it covers representational bias, generalization accuracy, and the efficiency of both the learning procedure and the learned classifier. A dumb (random-split) algorithm is contrasted with a heuristic, entropy-guided one, and the properties and limitations of decision trees are analyzed. Shannon entropy, the majority function as a poor fit for the tree representation, and the reduction of real-valued attributes to the already-solved binary case are also discussed. The chapter concludes with the generation of rules from decision trees and their effectiveness in data analysis.



  1. Decision Trees (Chapter 18, From Data to Knowledge)

  2. Concerns
  • Representational bias: hyperrectangles. Does this match the domain?
  • Generalization accuracy: is the learned concept correct?
  • Comprehensibility (e.g. for medical diagnosis)
  • Efficiency of learning
  • Efficiency of the learned procedure

  3. Simple Example: Weather Data
  • Four features: outlook and windy (nominal), temperature and humidity (numeric); the class is play.
  • A learned tree (leaf counts in parentheses):
    outlook = sunny
    |   humidity <= 75: yes (2.0)
    |   humidity > 75: no (3.0)
    outlook = overcast: yes (4.0)
    outlook = rainy
    |   windy = TRUE: no (2.0)
    |   windy = FALSE: yes (3.0)

  4. Dumb DT Algorithm (discrete features only)
  Build tree:
  • If all examples below the node are homogeneous (a single class), stop.
  • Else pick a feature at random, create a node for that feature, and form a subtree for each of its values. Recurse on each subtree.
  Will this work? (A minimal sketch follows.)
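A minimal Python sketch of the dumb algorithm, assuming each example is a dict of discrete feature values plus a "class" label; the data layout and function names are illustrative, not from the chapter.

    from collections import Counter
    import random

    def homogeneous(examples):
        # O(|examples|): one scan to check whether a single class remains.
        return len({e["class"] for e in examples}) == 1

    def dumb_tree(examples, features):
        # Stop when the node is homogeneous or no features are left to split on.
        if homogeneous(examples) or not features:
            majority = Counter(e["class"] for e in examples).most_common(1)[0][0]
            return {"leaf": majority}
        f = random.choice(list(features))          # random, not heuristic, selection
        rest = [g for g in features if g != f]
        # One subtree per observed value of f; forming the split is O(|examples|).
        return {"split": f,
                "branches": {v: dumb_tree([e for e in examples if e[f] == v], rest)
                             for v in {e[f] for e in examples}}}

Called as dumb_tree(examples, ["outlook", "windy", ...]), it fits the training data perfectly but, because feature choice is random, tends to generalize poorly (next slide).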

  5. Properties of the Dumb Algorithm
  • Complexity: the homogeneity test is O(DataSize) and splitting is O(DataSize); multiplied by the number of nodes in the tree, this bounds the total work.
  • Accuracy on the training set: perfect.
  • Accuracy on the test set: not great, almost random.

  6. Many DT Models
  • Random selection worked, but how many trees are there? With n binary features, roughly n * 2(n-1) * 2(n-2) * ... = O(2^n * n!) trees. UGH!
  • Which trees are best? Occam's razor: small ones (is that testable?).
  • Exhaustive search is impossible, so try heuristic search. But what heuristic?
  • Goal: replace random selection with heuristic selection.

  7. Heuristic DT Algorithm
  • Entropy of a set S with mixed classes c1, c2, ..., ck:
    Entropy(S) = - sum_i p_i * lg(p_i), where p_i is the probability of class ci.
  • Score a split by the weighted sum of the entropies of its subtrees, where each weight is the proportion of examples falling into that subtree.
  • This defines a quality measure on features. (A small entropy sketch follows.)
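A small Python sketch of the entropy calculation, assuming class counts are passed as a list; the function name is illustrative.

    from math import log2

    def entropy(counts):
        # Shannon entropy, in bits, of a class distribution given as counts, e.g. [4, 4].
        total = sum(counts)
        probs = [c / total for c in counts if c > 0]   # skip empty classes: 0 * lg 0 = 0
        return sum(-p * log2(p) for p in probs)

    print(entropy([4, 4]))   # 1.0 bit: two equally likely classes
    print(entropy([1, 3]))   # ~0.811 bits
    print(entropy([5, 0]))   # 0.0 bits: a homogeneous set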

  8. Heuristic Score of a Feature
  • Say a split on feature f yields subsets (4+, 4-) and (1+, 3-). Then
    quality of f = 8/12 * E({4+, 4-}) + 4/12 * E({1+, 3-})
                 = 8/12 * 1 + 4/12 * (-1/4 * lg(1/4) - 3/4 * lg(3/4)) ≈ 0.94 bits
  • Do this for every feature and split on the one with the lowest weighted entropy. (See the sketch below.)
  • J48 is roughly the dumb algorithm plus the entropy heuristic.
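A sketch of the weighted score for this example split, reusing the entropy helper sketched after slide 7; the subset representation is illustrative.

    def split_score(subsets):
        # Weighted average entropy of a split; lower means a better feature.
        total = sum(sum(s) for s in subsets)
        return sum(sum(s) / total * entropy(s) for s in subsets)

    # The split from the slide: one branch with (4+, 4-), the other with (1+, 3-).
    print(split_score([[4, 4], [1, 3]]))   # ~0.937 bits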

  9. Shannon Entropy
  • Entropy is the only function that:
    • is 0 when only one class is present,
    • is k when there are 2^k equally likely classes,
    • is additive, i.e. E(X, Y) = E(X) + E(Y) when X and Y are independent.
  • Entropy is sometimes called uncertainty and sometimes information.
  • Uncertainty is defined on a random variable whose "draws" come from the set of classes.

  10. Majority Function
  • Suppose 2n boolean features, with the class defined as: n or more of the features are on.
  • How big is the tree? At least (2n choose n) leaves; already 252 leaves for 2n = 10. (See the quick check below.)
  • Prototype functions ("at least k of n are true") are a common medical concept.
  • Concepts that are prototypical do not match the representational bias of DTs.
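A quick check of how fast this lower bound grows, using Python's standard-library math.comb; the chosen sizes are illustrative.

    from math import comb

    # Lower bound on the number of leaves for "at least n of 2n boolean features on".
    for n in (3, 5, 10, 20):
        print(f"2n = {2 * n:3d}: at least {comb(2 * n, n):,} leaves")
    # 2n =   6: at least 20 leaves ... 2n =  40: at least 137,846,528,820 leaves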

  11. DTs with Real-Valued Attributes
  • Idea: convert to an already-solved problem (binary features).
  • For each real-valued attribute f with sorted values v1, v2, ..., vn, create binary features
    f1: f < (v1+v2)/2, f2: f < (v2+v3)/2, and so on. (See the sketch below.)
  • Other approaches are possible, e.g. fi: f < vj for any vj, so no sorting is needed.
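A minimal sketch of the midpoint-threshold conversion for one numeric attribute; the function name and the sample values are illustrative.

    def midpoint_thresholds(values):
        # Candidate binary tests "f < t" at midpoints of consecutive sorted values.
        vs = sorted(set(values))                   # sort and drop duplicates
        return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

    print(midpoint_thresholds([72, 85, 64, 72, 95, 70]))
    # [67.0, 71.0, 78.5, 90.0] -- each threshold t becomes a binary feature "value < t"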

  12. DTs -> Rules (PART)
  • For each leaf, make a rule by collecting the tests on the path to that leaf.
  • Number of rules = number of leaves.
  • Simplification: test each condition in a rule and drop it if doing so does not harm accuracy.
  • Can we go from rules back to DTs? Not easily. Hint: a rule set has no root. (A rule-extraction sketch follows.)
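A sketch of collecting one rule per leaf, using the nested-dict tree representation from the dumb-algorithm sketch after slide 4; that representation is illustrative, not the chapter's.

    def tree_to_rules(tree, conditions=()):
        # One (conditions, class) rule per leaf, so rule count equals leaf count.
        if "leaf" in tree:
            return [(list(conditions), tree["leaf"])]
        rules = []
        for value, subtree in tree["branches"].items():
            rules += tree_to_rules(subtree, conditions + ((tree["split"], value),))
        return rules

    # Each rule reads: if feature1 = value1 and ... and featurek = valuek then class.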

  13. Summary
  • Comprehensible if the tree is not large.
  • Effective if a small number of features suffices (that is the bias).
  • Handles multi-class problems naturally.
  • Easily generates rules (expert systems) and measures of confidence (leaf counts).
  • Can be extended to regression.
  • Easy to implement, with low complexity.
