
Decision Trees with Minimal Test Costs (ICML 2004, Banff, Canada)

Charles X. Ling, Univ. of Western Ontario, Canada

Qiang Yang, HK UST, Hong Kong

et al.


Goal

  • When test examples contain missing values

    • Decide what to do:

      • Perform a test to obtain a value, or

      • Skip the attribute and continue with the rest of the decision making

    • Ultimate objective: minimize the total cost

      • Total cost = misclassification cost + test cost


Assumptions

  • Misclassification cost, per case

    • FP=false positive

    • FN=false negative

  • Test cost

    • = C(i) for attribute A_i, regardless of values

  • Example: data from the weather domain


Expected results

  • Let there be a total of P+N training cases

    • If P*FN + N*FP < C*(P+N)

      • Then testing costs more than any misclassification could, so expect a very small tree (possibly a single leaf)

    • If P*FN + N*FP >> C*(P+N)

      • Then tests are comparatively cheap, so expect a large tree that uses many attributes
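A quick numeric illustration of the two regimes, with hypothetical numbers not taken from the paper:

```python
# Hypothetical numbers illustrating the two regimes above.
P, N = 40, 60        # positive / negative training cases
FN, FP = 10.0, 1.0   # misclassification costs ($)

for C in (8.0, 0.05):                # per-case test cost ($)
    misclass = P * FN + N * FP       # 40*10 + 60*1 = 460
    testing = C * (P + N)            # cost of testing every case once
    if misclass < testing:
        print(f"C={C}: tests cost more than errors -> expect a single leaf")
    else:
        print(f"C={C}: tests are cheap -> expect a large tree")
```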


Leaf Labeling

  • How to label a node?

    • P = number of positive examples

    • N = number of negative examples

    • P*FN = cost of labeling the node negative

    • N*FP = cost of labeling the node positive

    • Label the node as + if P*FN >= N*FP


Splitting Criteria on Attribute A

  • Total cost = ?

    • Say there are two branches, 1 and 2

    • (P1, N1) on branch 1 and (P2, N2) on branch 2

    • Test cost = (P1 + N1 + P2 + N2) * C

    • Training data may contain missing values on this attribute

      • Ratio = P0/N0 for the examples gathered at this node

    • Total = ? (page 3 of the paper)


Properties

  • The relative difference between misclassification cost and test cost determines the tree!

  • Attributes with smaller test costs are chosen first

  • When its test cost increases, an attribute moves down the tree (and may fall out entirely)


Test Strategies

  • OST = Optimal Sequential Test

    Always perform the test when a value is missing, following the order given by the tree

  • Alternative: stop whenever a missing value is met

    Use the ratio P0/N0 to make the decision at that node


Experiments

  • Comparison with C4.5


Costs in Machine Learning

  • Most inductive learning algorithms aim at minimizing classification errors

    • Different types of misclassification have different costs, e.g. FP and FN

  • In this talk:

    • Test costs should also be considered

    • Cost sensitive learning considers a variety of costs; see survey by Peter Turney (2000)


Applications

  • Medical Practice

    • Doctors may ask a patient to go through a number of tests (e.g., blood tests, X-rays)

    • Which of these tests will bring the most value?

  • Biological Experimental Design

    • When testing a new drug, new tests are costly

    • Which experiments to perform?


Previous Work

  • Much previous work considers the two types of cost separately, an obvious oversight

  • (Turney 1995): ICET, uses a genetic algorithm to build trees that minimize the total cost

  • (Zubek and Dietterich 2002): a Markov Decision Process (MDP), searches a state space for optimal policies

  • (Greiner et al. 2002): PAC learning


An Example of The Problem

Training data: contain ? (unknown) values whose true values cannot be obtained

Goal 1: build a tree that minimizes the total cost

Test cases: contain many ? values that may be obtained at a cost

Goal 2: decide which test values to obtain so as to minimize the total cost


Example – Medical Diagnosis

[Figure: a patient record in which blood test, pressure, assay, and cardiogram are still unknown (?), and temperature = 39°C is known]

Is the patient healthy?

Which test should be taken first?

Which test to perform next?

Concern: cost the patient as little as possible while keeping the risk of misdiagnosis low


What are Total Costs?

  • Assumption: binary classes, costs: FP and FN

  • Goal: minimize total cost

    • Total cost = misclassification cost + test cost

  • Previous work

    • Information Gain as the attribute selection criterion

  • This work needs a new attribute selection criterion

    • Total cost = Sum_i { Probability(i) * Cost(i) }
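As a sketch of this probability-weighted criterion (the probabilities and costs below are hypothetical, chosen only to show the arithmetic):

```python
# Total cost = Sum_i Probability(i) * Cost(i), with made-up numbers:
# 10% of cases end up as false positives, 5% as false negatives,
# and every case pays one $2 test.
FP, FN, TEST = 1.0, 10.0, 2.0
outcomes = [(0.10, FP), (0.05, FN), (1.00, TEST)]
total_cost = sum(prob * cost for prob, cost in outcomes)
print(total_cost)  # 0.10*1 + 0.05*10 + 1.00*2 = 2.6
```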


Attribute Selection Criterion: C4.5

Select the attribute with the minimal total cost

  • (C4.5: minimal entropy)

  • If growing the tree yields a smaller total cost

    • then choose the attribute with the minimal total cost

    • else stop and form a leaf

  • How to label a leaf node?

    • Label leaf according to minimal total cost

    • If (P×FN ≥ N×FP) then class = positive, else class = negative

    • Example: {P, P, P, N, N, N, N}, FN=$10, FP=$1

      • Information Gain+Majority class: Predict N, 3 mistakes.

      • Total cost: if we predict P, 4*FP=4*1=$4;

        • If we predict N, 3*FN = $30. Conclusion: Predict P.
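A minimal Python sketch of this labeling rule, mirroring the slide's example (the function name and numbers are illustrative, not from the paper's code):

```python
# Label a leaf by minimal expected cost rather than by majority class.
# Predicting positive misclassifies the N negatives (cost N*FP);
# predicting negative misclassifies the P positives (cost P*FN).
def label_leaf(P, N, FN, FP):
    return "P" if P * FN >= N * FP else "N"

# The slide's example: {P, P, P, N, N, N, N}, FN = $10, FP = $1.
# Majority class predicts N and pays 3 * $10 = $30;
# minimal cost predicts P and pays only 4 * $1 = $4.
print(label_leaf(P=3, N=4, FN=10.0, FP=1.0))  # -> "P"
```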


    Minimal cost: summary

    • Attribute selection criterion: minimal total cost (Ctotal = Cmc + Ctest) instead of minimal entropy in C4.5

    • If growing a tree has a smaller total cost, then choose an attribute with minimal total cost. Otherwise, stop and form a leaf.

    • Label leaf also according to minimal total cost:

      • Suppose the leaf has P positive examples and N negative examples

      • FP denotes the cost of a false positive and FN the cost of a false negative

      • If (P×FN ≥ N×FP) THEN label = positive ELSE label = negative


    A Tree Building Example

    Before splitting, the node is a leaf with P:N examples:

      Cmc = min(P×FN, N×FP)

      Ctest = 0

      Ctotal = Cmc + Ctest

    Consider attribute A (test cost C) as a potential splitting attribute, with branches A = v1 (P1:N1 examples) and A = v2 (P2:N2 examples):

      C’mc = min(P1×FN, N1×FP) + min(P2×FN, N2×FP)

      C’test = (P1 + N1 + P2 + N2) × C

      C’total = C’mc + C’test

    • If C’total < Ctotal, splitting on A would reduce the total cost. Choose the attribute with the minimal total cost for splitting.

    • If C’total ≥ Ctotal for all remaining attributes, no further sub-tree is built and the set becomes a leaf.


    Missing values: ? values

    • First, how to handle ? values in the training data

    • Previous work

      • built a separate branch for ?

      • problematic

    • We will

      • deal with unknown values in the training set as follows:

      • no branch for ? is built,

      • examples with ? are “gathered” inside the internal nodes


    Desirable Properties

    1. Effect of the difference between misclassification costs and test costs

    [Figure: trees built from the same data with all test costs set to 300, 20, and 0; the cheaper the tests, the larger the tree]


    2. Prefer attributes with smaller test costs

    Three test-cost settings:

          A1    A2    A3    A4    A5    A6
    #1    20    20    20    20    20    20
    #2   200    20   100   100   200   200
    #3   200   100   100   100    20   200

    [Figure: the tree built under each setting; cheaper attributes (e.g., A2 in setting #2, A5 in setting #3) are chosen nearer the root]


    3. If its test cost increases, an attribute tends to be “pushed” down and to “fall out” of the tree

    [Figure: trees with the cost of A1 set to 20, 50, and 80; as the cost rises, A1 moves from the root to a lower level and finally disappears]


    Cost Reduction

    [Figure: a node with P:N examples, split by attribute A into branches with P1:N1 and P2:N2 examples]

    • The cost-sensitive counterpart of Information Gain:

      • Min{P×FN, N×FP} − [cost(A) + Min{P1×FN, N1×FP} + Min{P2×FN, N2×FP}]

    • We choose the attribute with the largest such value (cost reduction) as the splitting attribute
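A sketch of picking the attribute with the largest cost reduction, following this slide's formula (the candidate attributes and counts are made up):

```python
# cost_reduction = Min{P*FN, N*FP} - [cost(A) + sum_i Min{Pi*FN, Ni*FP}]
def cost_reduction(P, N, FN, FP, C, branches):
    before = min(P * FN, N * FP)
    after = C + sum(min(Pi * FN, Ni * FP) for Pi, Ni in branches)
    return before - after

P, N, FN, FP = 6, 6, 10.0, 10.0
candidates = {"A1": (2.0, [(5, 1), (1, 5)]),   # cheap, informative
              "A2": (9.0, [(4, 2), (2, 4)])}   # costly, less informative
best = max(candidates, key=lambda a: cost_reduction(P, N, FN, FP, *candidates[a]))
print(best)  # A1: 60 - (2 + 20) = 38 beats A2: 60 - (9 + 40) = 11
```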


    Missing values in test cases

    A new patient arrives:


    OST: Intuition

    • Follow the test-cost-sensitive decision tree

    • When reaching a node, make a decision

      • If the node is a leaf node, predict its label

      • If the node is an internal node and the value is missing, obtain the value (perform the test)

    • Evaluation: calculate the total cost once the ground truth is known
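A minimal sketch of OST at classification time (the dict-based tree and the perform_test stub are illustrative assumptions, not the paper's implementation):

```python
def perform_test(attr):
    # Stand-in for actually running the (costly) test for this attribute.
    return "v1"

def ost_classify(node, case, tests_done=None):
    """Follow the tree; obtain a value whenever the case is missing it."""
    tests_done = [] if tests_done is None else tests_done
    if "leaf" in node:                    # leaf node: predict its label
        return node["leaf"], tests_done
    attr = node["attr"]
    if case.get(attr) is None:            # missing value: test to obtain it
        case[attr] = perform_test(attr)
        tests_done.append(attr)
    return ost_classify(node["children"][case[attr]], case, tests_done)

tree = {"attr": "A1", "children": {"v1": {"leaf": "P"}, "v2": {"leaf": "N"}}}
print(ost_classify(tree, {"A1": None}))   # -> ('P', ['A1'])
```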


    Four Testing Strategies

    [Figure: an example cost-sensitive decision tree over attributes A1 and A6]

    • First: Optimal Sequential Test (OST). (A simple batch variant: perform all tests at once.)

    • Second: no test is performed; predict at the internal node

    • Third: no test is performed; predict with a weighted sum of the subtrees

    • Fourth: a new tree is built dynamically for each test case using only its known attributes


    Experiment - settings

    • Five binary-class datasets

    • 60/40 split for training/testing, repeated 5 times

    • Unknown values in training/test examples are selected randomly with a specified probability

    • Also compared with a C4.5 tree, using OST for testing


    Results with Different % of Unknown Values

    [Figure: total cost vs. % of unknown values for OST, no test (internal), no test (distributed), no test (lazy tree), and the C4.5 tree with OST]

    • OST is best; M4 (lazy tree) and C4.5 are next; M3 (no test, distributed) is worst

    • OST’s cost does not increase with more ? values; overall, the others’ do


    Results with Different Test Costs

    [Figure: total cost vs. test cost for the four strategies and the C4.5 tree with OST]

    • With large test costs, OST = M2 = M3 = M4 (no tests are worth performing)

    • C4.5 is much worse (its tree building is cost-insensitive)


    Results with Unbalanced Class Costs

    [Figure: total cost vs. unbalanced misclassification costs for the four strategies and the C4.5 tree with OST]

    • With large test costs, OST = M2 = M4

    • C4.5 is much worse (its tree building is cost-insensitive)

    • M3 is worse than M2 (M3 is the missing-value method used in C4.5)


    Comparing OST and C4.5 across 6 datasets

    • OST always outperforms C4.5


    Conclusions

    • New tree building algorithm for minimal costs

      • Desirable properties

      • Computationally efficient (similar to C4.5)

    • Test strategies (OST and batch) are very effective

    • Can solve many real-world diagnosis problems


    Future Work

    • More intelligent “Batch Test” methods

    • Consider cost of additional batch test

      • Optimal sequential batch test: batch 1 = (test 1, test 2), batch 2 = (test 3, test 4, test 5), …

    • Other learning algorithms with minimal total cost

    • A wrapper that works for any “black box”

