Decision Trees with Minimal Test Costs (ICML 2004, Banff, Canada)

### Decision Trees with Minimal Test Costs (ICML 2004, Banff, Canada)

Charles X. Ling, Univ of Western Ontario, Canada

Qiang Yang, HK UST, Hong Kong

Etc.

Goal

- When test examples contain missing values, decide what to do:
- Option 1: do a test to obtain the value
- Option 2: skip the attribute and continue with the rest of the decision making
- Ultimate decision: minimize total cost
- Total cost = misclassification cost + test cost

Assumptions

- Misclassification cost, for one case:
- FP = cost of a false positive
- FN = cost of a false negative
- Test cost = C(i) for attribute A_i, regardless of its value
- Examples in weather domain

Expected results

- Let there be a total of P+N training cases
- If P*FN + N*FP < C*(P+N), then?
- If P*FN + N*FP >> C*(P+N), then?

Leaf Labeling

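- How to label a node?
- P = number of positive examples
- N = number of negative examples
- P*FN = total misclassification cost if the node is labeled negative
- N*FP = total misclassification cost if the node is labeled positive
- Label the node as + if …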

Splitting Criteria on Attribute A

- Total Cost=?
- Say two branches: 1 and 2
- (P1, N1) on branch 1, and (P2,N2) on 2
- Test cost=
- Training data may contain missing values on this attribute
- Ratio = P0/N0

- Total=? (page 3)

Properties

- The relative difference between misclassification cost and test cost affects the tree
- Attributes with small test costs are chosen first
- When an attribute's test cost is increased, the attribute moves down the tree

Test Strategies

- OST = optimal sequential test: always test when a value is missing, following the order given by the tree
- No test: stop whenever a missing value is met, and use the ratio P0/N0 to make the decision at that node

Experiments

- Comparison with C4.5

Costs in Machine Learning

- Most inductive learning algorithms: minimizing classification errors
- Different types of misclassification have different costs, e.g. FP and FN

- In this talk:
- Test costs should also be considered
- Cost sensitive learning considers a variety of costs; see survey by Peter Turney (2000)

Applications

- Medical Practice
- Doctors may ask a patient to go through a number of tests (e.g., Blood tests, X-rays)
- Which of these new tests will bring about higher value?

- Biological Experimental Design
- When testing a new drug, new tests are costly
- Which experiments to perform?

Previous Work

- Many previous works consider the two types of cost separately, an obvious oversight
- (Turney 1995): ICET, uses a genetic algorithm to build trees that minimize the total cost
- (Zubek and Dietterich 2002): formulates the problem as a Markov Decision Process (MDP) and searches a state space for optimal policies
- (Greiner et al. 2002): PAC learning

An Example of The Problem

Training: with ?, cannot obtain values

Goal 1: build a tree that minimizes the total cost

Test: with many ?, may obtain values at a cost

Goal 2: obtain test values at a cost to minimize the total cost

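Example – Medical Diagnosis

Is the patient healthy?

[Figure: patient record with the attributes pressure, essay, temperature, and cardiogram; temperature is 39 °C and the other values are shown as unknown (?).]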

Which test should be taken first?

Which test to perform next?

Concern: cost the patient as little as possible while maintaining low mis-diagnosis risk

What are Total Costs?

- Assumption: binary classes, costs: FP and FN
- Goal: minimize total cost
- Total cost = misclassification cost + test cost

- Previous Work
- Information Gain as an attribute selection criterion

- In this work, need a new attribute selection criterion
- Total cost = Sum_i { Probability(i)*Cost(i)}

Attribute Selection Criterion and How to Label a Leaf Node

- Select the attribute with minimal total cost (C4.5: minimal entropy)
- If growing a tree has a smaller total cost, then choose an attribute with minimal total cost; else stop and form a leaf
- Label the leaf according to minimal total cost
- If (P×FN ≥ N×FP) then class = positive, else class = negative
- Example: {P, P, P, N, N, N, N}, FN = $10, FP = $1
- Information Gain + majority class: predict N, 3 mistakes
- Total cost: if we predict P, 4×FP = 4×$1 = $4; if we predict N, 3×FN = 3×$10 = $30
- Conclusion: predict P
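The leaf-labeling rule can be sketched in a few lines of Python. This is only an illustration of the slide's arithmetic: the function name `label_leaf` and its parameters are made up here, and breaking ties toward the positive class is an assumption.

```python
def label_leaf(p, n, fn_cost, fp_cost):
    """Return (label, misclassification_cost) for a leaf holding
    p positive and n negative training examples.

    Hypothetical helper; ties go to "positive" (an assumption).
    """
    cost_if_negative = p * fn_cost  # every positive becomes a false negative
    cost_if_positive = n * fp_cost  # every negative becomes a false positive
    if cost_if_negative >= cost_if_positive:
        return "positive", cost_if_positive
    return "negative", cost_if_negative


# The slide's example: {P, P, P, N, N, N, N}, FN = $10, FP = $1
label, cost = label_leaf(p=3, n=4, fn_cost=10, fp_cost=1)
print(label, cost)  # positive 4
```

With equal costs (FN = FP) the same function falls back to the majority class, which is what an error-minimizing tree would predict.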

Minimal cost: summary

- Attribute selection criterion: minimal total cost (Ctotal = Cmc + Ctest) instead of minimal entropy in C4.5
- If growing a tree has a smaller total cost, then choose an attribute with minimal total cost. Otherwise, stop and form a leaf.
- Label the leaf also according to minimal total cost:
- Suppose the leaf has P positive examples and N negative examples
- FP denotes the cost of a false positive example and FN the cost of a false negative example
- If (P×FN ≥ N×FP) THEN label = positive ELSE label = negative

A Tree Building Example

Before splitting, a node holds P positive and N negative examples (P:N):

- Cmc = min(P×FN, N×FP)
- Ctest = 0
- Ctotal = Cmc + Ctest

Consider attribute A, with a test cost C, as a potential splitting attribute. Branch A = v1 receives P1:N1 examples and branch A = v2 receives P2:N2 examples:

- C’mc = min(P1×FN, N1×FP) + min(P2×FN, N2×FP)
- C’test = (P1 + N1 + P2 + N2) × C
- C’total = C’mc + C’test

- If C’total < Ctotal, splitting on A would reduce the total cost. Choose the attribute with the minimal total cost for splitting.
- If C’total ≥ Ctotal for all remaining attributes, no further sub-tree will be built, and the set will become a leaf.
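A small numeric sketch may make the comparison between Ctotal and C’total concrete. The helper names and all numbers below are hypothetical; only the formulas come from the slide.

```python
def leaf_cost(p, n, fn_cost, fp_cost):
    """Cmc of a node treated as a leaf: min(P*FN, N*FP)."""
    return min(p * fn_cost, n * fp_cost)


def split_total_cost(branches, test_cost, fn_cost, fp_cost):
    """C'total for a split, where `branches` is a list of (P_i, N_i) pairs.

    C'mc sums the leaf cost of each branch; C'test charges the
    attribute's test cost once per example that reaches the node.
    """
    mc = sum(leaf_cost(p, n, fn_cost, fp_cost) for p, n in branches)
    tests = sum(p + n for p, n in branches) * test_cost
    return mc + tests


# Hypothetical node with 6 positives and 8 negatives, FN = 100, FP = 50.
FN, FP = 100, 50
c_total = leaf_cost(6, 8, FN, FP)      # min(600, 400) = 400
branches = [(6, 1), (0, 7)]            # P1:N1 and P2:N2

cheap = split_total_cost(branches, test_cost=20, fn_cost=FN, fp_cost=FP)
pricey = split_total_cost(branches, test_cost=30, fn_cost=FN, fp_cost=FP)
print(c_total, cheap, pricey)  # 400 330 470
```

With a test cost of 20 the split lowers the total cost (330 < 400), so the node would be split; at 30 the same split costs more than stopping (470 ≥ 400), so the node would become a leaf.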

Missing values: ? values

- First, how to handle ? values in the training data
- Previous work built a ? branch, which is problematic
- We deal with unknown values in the training set differently:
- no branch for ? is built
- examples with ? are “gathered” inside the internal nodes

Desirable Properties

1. Effect of the difference between misclassification costs and test costs

[Figure: three decision trees, built with all test costs set to 300, to 20, and to 0.]

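2. Prefer attributes with smaller test costs

Test costs of attributes A1 to A6 under three settings:

| Setting | A1 | A2 | A3 | A4 | A5 | A6 |
| --- | --- | --- | --- | --- | --- | --- |
| #1 | 20 | 20 | 20 | 20 | 20 | 20 |
| #2 | 200 | 20 | 100 | 100 | 200 | 200 |
| #3 | 200 | 100 | 100 | 100 | 20 | 200 |

[Figure: the decision trees built under these settings.]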

3. If the test cost increases, the attribute tends to be “pushed” down and “falls out” of the tree

[Figure: decision trees built with the cost of A1 set to 20, 50, and 80.]

Cost Reduction

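[Figure: a node with P:N examples is split on attribute A into two branches with P1:N1 and P2:N2 examples.]

- Plays the same role as Information Gain in C4.5
- Cost reduction = Min{P*FN, N*FP} – [cost(A) + Min{P1*FN, N1*FP} + Min{P2*FN, N2*FP}]
- We choose the attribute with the largest such value as the splitting attribute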
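The selection step can be sketched as follows. This sketch assumes that cost(A) means the total test cost at the node, i.e. the per-example test cost times the number of examples, as in C’test on the tree-building slide. The function names, the candidate attributes, and their numbers are all hypothetical.

```python
def leaf_cost(p, n, fn_cost, fp_cost):
    """Misclassification cost of a node treated as a leaf."""
    return min(p * fn_cost, n * fp_cost)


def cost_reduction(parent, branches, test_cost, fn_cost, fp_cost):
    """Cost before the split minus cost after it.

    Assumption: cost(A) = (P + N) * test_cost, the test cost paid
    for every example at the node.
    """
    p, n = parent
    before = leaf_cost(p, n, fn_cost, fp_cost)
    after = (p + n) * test_cost + sum(
        leaf_cost(pi, ni, fn_cost, fp_cost) for pi, ni in branches
    )
    return before - after


def choose_attribute(parent, candidates, fn_cost, fp_cost):
    """Return (name, reduction) of the best attribute, or (None, 0)
    when no candidate reduces the total cost (the node becomes a leaf).

    `candidates` maps an attribute name to (test_cost, branches).
    """
    best_name, best_gain = None, 0
    for name, (test_cost, branches) in candidates.items():
        gain = cost_reduction(parent, branches, test_cost, fn_cost, fp_cost)
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain


# Hypothetical candidates for a node with 6 positives and 8 negatives.
candidates = {
    "A1": (20, [(6, 1), (0, 7)]),
    "A2": (5, [(4, 4), (2, 4)]),
    "A3": (10, [(5, 2), (1, 6)]),
}
print(choose_attribute((6, 8), candidates, fn_cost=100, fp_cost=50))
# ('A1', 70)
```

Here A2 is the cheapest test but separates the classes poorly, so its reduction is negative; A1 wins despite its higher test cost.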

Missing values in test cases

A new patient arrives:

OST: Intuition

- Follow the test-cost-sensitive decision tree
- When reaching a node, make a decision
- If the node is a leaf node…
- If the node is an internal node and the value of its attribute is missing, then obtain the value by performing the test
- Evaluation: calculate the total cost when the ground truth is known
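The OST procedure and its evaluation might look like the sketch below. The nested-dict tree layout, the `oracle` argument standing in for actually performing a test, and the convention that already-known values cost nothing are all assumptions made for illustration.

```python
def ost_classify(node, example, test_costs, oracle):
    """Walk the tree; whenever the needed value is missing (None),
    pay the attribute's test cost and read the value from `oracle`,
    a hypothetical stand-in for performing the test.

    Returns (predicted_label, test_cost_spent).
    """
    spent = 0
    while "label" not in node:
        attr = node["attr"]
        value = example.get(attr)
        if value is None:
            spent += test_costs[attr]
            value = oracle[attr]
        node = node["children"][value]
    return node["label"], spent


def total_cost(predicted, actual, spent, fn_cost, fp_cost):
    """Test cost plus misclassification cost, given the ground truth."""
    if predicted == actual:
        return spent
    return spent + (fp_cost if predicted == "P" else fn_cost)


# Hypothetical two-level tree over attributes A6 and A1.
tree = {
    "attr": "A6",
    "children": {
        1: {"attr": "A1", "children": {1: {"label": "P"}, 2: {"label": "N"}}},
        2: {"label": "N"},
    },
}
example = {"A6": 1, "A1": None}   # A1 is missing in the test case
oracle = {"A6": 1, "A1": 2}       # values a test would reveal
costs = {"A1": 20, "A6": 20}

label, spent = ost_classify(tree, example, costs, oracle)
print(label, spent)                                               # N 20
print(total_cost(label, "P", spent, fn_cost=100, fp_cost=50))     # 120
```

Only attributes on the path actually followed are ever tested, which is why OST avoids paying for tests the tree never consults.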

[Figure: example decision tree with internal nodes A1 and A6 and leaves labeled P or N.]

Four Testing Strategies

- First: Optimal Sequential Test (OST) (simple batch test: do all tests)
- Second (M2): no test will be performed, predict with the internal node
- Third (M3): no test will be performed, predict with a weighted sum of subtrees
- Fourth (M4): a new tree is built dynamically for each test case using only the known attributes
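For contrast with OST, the second strategy (stop at the first missing value and predict at the internal node) could be sketched as below. The slides do not spell out how the node's prediction is formed, so this sketch assumes each internal node stores a positive:negative count pair and applies the leaf-labeling rule to it. The `counts` field and the tree layout are hypothetical.

```python
def predict_without_tests(node, example, fn_cost, fp_cost):
    """Follow the tree with known values only; at the first missing
    value, label the internal node from its stored (P, N) counts
    using the minimal-cost rule (assumed, ties go to "P").
    """
    while "label" not in node:
        value = example.get(node["attr"])
        if value is None:
            p, n = node["counts"]
            return "P" if p * fn_cost >= n * fp_cost else "N"
        node = node["children"][value]
    return node["label"]


# Hypothetical tree; `counts` holds the examples gathered at the node.
tree = {
    "attr": "A6",
    "counts": (3, 4),
    "children": {1: {"label": "P"}, 2: {"label": "N"}},
}

print(predict_without_tests(tree, {"A6": None}, fn_cost=10, fp_cost=1))  # P
print(predict_without_tests(tree, {"A6": None}, fn_cost=1, fp_cost=1))   # N
print(predict_without_tests(tree, {"A6": 2}, fn_cost=10, fp_cost=1))     # N
```

No test cost is ever paid under this strategy, so its total cost is purely the misclassification cost of predicting from less information.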

Experiment - settings

- Five binary-class datasets
- 60/40 split for training/testing, repeated 5 times
- Unknown values in training/test examples are selected randomly with a specified probability
- Also compared to a C4.5 tree, using OST for testing

Results with different % of unknown

[Figure: results plot; surviving legend entries: "No test, lazy tree" and "C4.5 tree, OST".]

- OST is best; M4 and C4.5 are next; M3 is worst
- The total cost of OST does not increase with more ? values; the others do overall

Results with different test costs

[Figure: results plot; surviving legend entries: "No test, distributed", "No test, lazy tree", and "C4.5 tree, OST".]

- With large test costs, OST = M2 = M3 = M4
- C4.5 is much worse (tree building is cost-insensitive)

Results with unbalanced class costs

[Figure: results plot; surviving legend entries: "No test, distributed", "No test, lazy tree", and "C4.5 tree, OST".]

- With large test costs, OST = M2 = M4
- C4.5 is much worse (tree building is cost-insensitive)
- M3 is worse than M2… (M3 is used in C4.5)

Comparing OST/C4.5 across 6 datasets

- OST always outperforms C4.5

Conclusions

- New tree building algorithm for minimal costs
- Desirable properties
- Computationally efficient (similar to C4.5)

- Test strategies (OST and batch) are very effective
- Can solve many real-world diagnosis problems

Future Work

- More intelligent “Batch Test” methods
- Consider cost of additional batch test
- Optimal sequential batch test: batch 1 = (test 1, test 2), batch 2 = (test 3, test 4, test 5), …

- Other learning algorithms with minimal total cost
- A wrapper that works for any “black box”
