Error Analysis and Overfitting Implications in Decision Trees

Assume that a decision tree has been constructed from training data, and it includes a node that tests on V at the frontier of the tree, with it left child yielding a prediction of class C1 (because the only two training data there are C1), and the right child predicting C2 (because the only training datum there is C2). The situation is illustrated here: Suppose that during subsequent use, it is found that i) a large # of items (N > 1000) are classified to the node (with the test on V to the right) ii) 50% of these have V= -1 and 50% of these have V = 1 iii) post classification analysis shows that of the N items reaching the node during usage, 67% were C1 and 33% were C2 iv) of the 0.5 * N items that went to the left leaf during usage, 67% were C1 and 33% were C2 v) of the 0.5 * N items that went to the right leaf during usage, 67% were also C1 and 33% were C2 V -1 1 C1 (2) C2 (0) C1 (0) C2 (1) a) What was the error rate on the sample of N items that went to the sub-tree shown above?

pruned to b) What would the error rate on the same sample of N items have been if the sub-tree on previous page (and reproduced here) had been pruned to not include the final test on V, but to rather be a leaf that predicted C1? V C1 (2) C2 (1) -1 1 C1 (2) C2 (0) C1 (0) C2 (1) c) Suppose that the decision tree were being used to predict whether a pixel in a brain image represented part of tumor tissue or part of healthy brain tissue, for purposes of delineating the portions of the brain that would be removed through surgery (e.g., any tissue predicted to be tumor would be removed, and any predicted to be healthy would be remain). Explain the implications of a decision tree that is overfitting in this scenario?

Error Analysis and Overfitting Implications in Decision Trees

Error Analysis and Overfitting Implications in Decision Trees

Presentation Transcript

Decision tree

It has been…

Data Mining and Decision Tree

It has always been common knowledge,

Decision tree

DATA MINING: DEFINITIONS AND DECISION TREE EXAMPLES

Processing a Decision Tree

It has been a busy time for our athletes!

PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning

It has been seen/noticed/ observed that:

It is a word that has been misunderstood in our modern era.

CAMCC is a registered charity that has been

The Federal Big Data Initiative: Where it has been and where it is going

DECISION TREE

PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning

DECISION TREE

There has been a fire!

All day long (All day long) I've been with Jesus, It has been (it has been)

A Lawn Deterioration Model Constructed from Image Data

Football And Betting Online - Has It Been Lucrative?

PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning

Decision Tree