Assume that a decision tree has been constructed from training data, and it includes


a node that tests on V at the frontier of the tree, with its left child yielding a prediction of class C1 (because the only two training data there are C1), and the right child predicting C2 (because the only training datum there is C2). The situation is illustrated here:

Suppose that during subsequent use, it is found that:

i) a large number of items (N > 1000) are classified to the node with the test on V,

ii) 50% of these have V = -1 and 50% have V = +1,

iii) post-classification analysis shows that of the N items reaching the node during usage, 67% were C1 and 33% were C2,

iv) of the 0.5 * N items that went to the left leaf during usage, 67% were C1 and 33% were C2,

v) of the 0.5 * N items that went to the right leaf during usage, 67% were also C1 and 33% were C2.

[Figure: a node testing V. The V = -1 branch leads to a leaf predicting C1 (training counts C1: 2, C2: 0); the V = +1 branch leads to a leaf predicting C2 (training counts C1: 0, C2: 1).]

a) What was the error rate on the sample of N items that went to the sub-tree shown above?
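One way to check the arithmetic for part (a) is sketched below, assuming the usage fractions in (i)-(v) hold exactly; the variable names are illustrative, not from the original slides.

```python
# Sketch: error rate of the unpruned subtree on the N usage items,
# assuming the fractions in (i)-(v) hold exactly.

frac_left = 0.5    # items with V = -1, routed to the left leaf (predicts C1)
frac_right = 0.5   # items with V = +1, routed to the right leaf (predicts C2)

c2_given_left = 0.33   # left leaf predicts C1, so its C2 items are errors
c1_given_right = 0.67  # right leaf predicts C2, so its C1 items are errors

error_rate = frac_left * c2_given_left + frac_right * c1_given_right
print(round(error_rate, 4))  # 0.5, i.e. a 50% error rate on the N items
```

Each leaf misclassifies the minority-for-that-leaf class: the left leaf errs on its 33% of C2 items, and the right leaf errs on its 67% of C1 items.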


b) What would the error rate on the same sample of N items have been if the sub-tree on the previous page (and reproduced here) had been pruned to not include the final test on V, but to instead be a leaf that predicted C1?

[Figure: the original subtree (node testing V; V = -1 branch to a leaf predicting C1 with counts C1: 2, C2: 0; V = +1 branch to a leaf predicting C2 with counts C1: 0, C2: 1) pruned to a single leaf predicting C1 with training counts C1: 2, C2: 1.]
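The arithmetic for part (b) can be checked the same way, again assuming the usage fractions in (iii) hold exactly; the names below are illustrative.

```python
# Sketch: error rate if the subtree is pruned to a single leaf predicting C1,
# assuming the overall fractions from (iii) hold exactly.

frac_c1 = 0.67  # fraction of the N items that were actually C1 (correctly labeled)
frac_c2 = 0.33  # fraction that were actually C2 (misclassified by the C1 leaf)

error_rate = frac_c2  # the leaf errs on exactly the C2 items
print(error_rate)  # 0.33, i.e. a 33% error rate on the same N items
```

A single leaf predicting C1 is wrong only on the C2 items, so the pruned tree's usage error rate is simply the overall C2 fraction.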

c) Suppose that the decision tree were being used to predict whether a pixel in a brain image represented part of tumor tissue or part of healthy brain tissue, for purposes of delineating the portions of the brain that would be removed through surgery (e.g., any tissue predicted to be tumor would be removed, and any predicted to be healthy would remain). Explain the implications of an overfitting decision tree in this scenario.