
Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Decision Trees with Minimal Costs (ICML 2004, Banff, Canada). Charles X. Ling, Univ of Western Ontario, Canada; Qiang Yang, HK UST, Hong Kong; Jianning Wang, Univ of Western Ontario, Canada; Shichao Zhang, UTS, Australia. Contact: cling@csd.uwo.ca.


Presentation Transcript


  1. Decision Trees with Minimal Costs (ICML 2004, Banff, Canada) • Charles X. Ling, Univ of Western Ontario, Canada • Qiang Yang, HK UST, Hong Kong • Jianning Wang, Univ of Western Ontario, Canada • Shichao Zhang, UTS, Australia • Contact: cling@csd.uwo.ca

  2. Outline • Introduction • Building Trees with Minimal Total Costs • Testing Strategies • Experiments and Results • Conclusions

  3. Costs in Machine Learning • Most inductive learning algorithms minimize classification errors • Different types of misclassification have different costs, e.g., false positives (FP) and false negatives (FN) • In this talk: • Test costs should also be considered • Cost-sensitive learning considers a variety of costs; see the survey by Peter Turney (2000)

  4. Applications • Medical practice • Doctors may ask a patient to undergo a number of tests (e.g., blood tests, X-rays) • Which of these new tests will bring more value? • Biological experimental design • When testing a new drug, new tests are costly • Which experiments should be performed?

  5. Previous Work • Much previous work considers the two types of cost separately, an obvious oversight • (Turney 1995): ICET uses a genetic algorithm to build trees that minimize the total cost • (Zubek and Dietterich 2002): a Markov Decision Process (MDP) that searches a state space for optimal policies • (Greiner et al. 2002): PAC learning

  6. An Example of Our Problem • Training data: contains unknown values (?) that cannot be obtained • Goal 1: build a tree that minimizes the total cost • Test data: contains many unknown values (?), which may be obtained at a cost • Goal 2: decide which test values to obtain, at a cost, so as to minimize the total cost

  7. Outline • Introduction • Building Trees with Minimal Total Costs • Testing Strategies • Experiments and Results • Conclusions

  8. Building Trees with Minimal Total Costs • Assumptions: binary classes; misclassification costs FP and FN • Goal: minimize the total cost • Total cost = misclassification cost + test cost • Previous work • Information gain as the attribute selection criterion • In this work, a new attribute selection criterion is needed
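As a rough illustration of the objective on this slide, the sketch below adds up the two cost components for a set of classified cases. The function names, the "P"/"N" label strings, and the cost values in the example are hypothetical; the slide only fixes the decomposition total cost = misclassification cost + test cost.

```python
from typing import Dict, List, Sequence


def misclassification_cost(predicted: Sequence[str], actual: Sequence[str],
                           fp: float, fn: float) -> float:
    """Charge FP for each false positive and FN for each false negative."""
    cost = 0.0
    for pred, true in zip(predicted, actual):
        if pred == "P" and true == "N":
            cost += fp  # false positive
        elif pred == "N" and true == "P":
            cost += fn  # false negative
    return cost


def total_test_cost(tests_done: List[List[str]],
                    test_costs: Dict[str, float]) -> float:
    """Sum the cost of every test actually performed, over all cases."""
    return sum(test_costs[t] for case_tests in tests_done for t in case_tests)


def total_cost(predicted: Sequence[str], actual: Sequence[str],
               tests_done: List[List[str]], test_costs: Dict[str, float],
               fp: float, fn: float) -> float:
    """Total cost = misclassification cost + test cost."""
    return (misclassification_cost(predicted, actual, fp, fn)
            + total_test_cost(tests_done, test_costs))


# Hypothetical example: case 1 is correct after tests A1 and A2;
# case 2 is a false negative after test A1 only.
cost = total_cost(predicted=["P", "N"], actual=["P", "P"],
                  tests_done=[["A1", "A2"], ["A1"]],
                  test_costs={"A1": 20.0, "A2": 50.0}, fp=200.0, fn=600.0)
print(cost)  # 600 (one FN) + 90 (tests) = 690.0
```

A cheap test sequence that still ends in a false negative can cost more than an expensive one that avoids it, which is why both terms have to be minimized together.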

  9. Attribute Selection Criterion • Minimal total cost (C4.5: minimal entropy) • If growing the tree further yields a smaller total cost, then choose the attribute with minimal total cost; else stop and form a leaf
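The criterion can be sketched as a greedy one-step comparison: the cost of stopping here versus the cost of testing each attribute and turning its branches into leaves. This is an illustrative simplification, not the authors' code; it assumes complete attribute values (no ?) and charges an attribute's test cost once per example reaching the node. The names leaf_cost, split_cost, and choose_attribute are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (attribute values, class label); this representation is an assumption.
Example = Tuple[Dict[str, str], str]


def leaf_cost(examples: List[Example], fp: float, fn: float) -> float:
    """Misclassification cost if this node becomes a leaf with the cheaper label."""
    pos = sum(1 for _, y in examples if y == "P")
    neg = len(examples) - pos
    # Labeling negative misclassifies the positives; labeling positive, the negatives.
    return min(pos * fn, neg * fp)


def split_cost(examples: List[Example], attr: str, test_cost: float,
               fp: float, fn: float) -> float:
    """Total cost of testing `attr` on every example and making each branch a leaf."""
    branches: Dict[str, List[Example]] = defaultdict(list)
    for x, y in examples:
        branches[x[attr]].append((x, y))
    misclassification = sum(leaf_cost(b, fp, fn) for b in branches.values())
    return misclassification + test_cost * len(examples)


def choose_attribute(examples: List[Example], test_costs: Dict[str, float],
                     fp: float, fn: float) -> Optional[str]:
    """Return the attribute with minimal total cost, or None to stop and form a leaf."""
    best_attr: Optional[str] = None
    best_cost = leaf_cost(examples, fp, fn)  # cost of not growing the tree
    for attr, cost in test_costs.items():
        candidate = split_cost(examples, attr, cost, fp, fn)
        if candidate < best_cost:
            best_attr, best_cost = attr, candidate
    return best_attr


data = [({"A1": "a", "A2": "x"}, "P"), ({"A1": "a", "A2": "y"}, "P"),
        ({"A1": "b", "A2": "x"}, "N"), ({"A1": "b", "A2": "y"}, "N")]
print(choose_attribute(data, {"A1": 20.0, "A2": 20.0}, fp=100.0, fn=100.0))  # A1
print(choose_attribute(data, {"A1": 60.0, "A2": 20.0}, fp=100.0, fn=100.0))  # None
```

With A1 at cost 20, splitting costs 80 against 200 for a leaf, so A1 is chosen. Raising A1's cost to 60 makes the split cost 240, more than stopping, so no attribute is chosen; this echoes the third desirable property (slide 14), where an attribute with a rising test cost falls out of the tree.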

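  10. Label leaf according to minimal total cost • With P positive and N negative examples at the leaf, labeling it negative costs P × FN and labeling it positive costs N × FP • If (P × FN ≥ N × FP) then class = positive, else class = negative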
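The labeling rule fits in a one-line function. Here pos and neg are the positive and negative example counts at the leaf; treating a tie as positive is an assumption of this sketch, and the name label_leaf is hypothetical.

```python
def label_leaf(pos: int, neg: int, fp: float, fn: float) -> str:
    """Label a leaf with the class that gives the lower misclassification cost.

    Predicting negative misclassifies the `pos` positives (cost pos * FN);
    predicting positive misclassifies the `neg` negatives (cost neg * FP).
    Ties go to positive in this sketch.
    """
    return "positive" if pos * fn >= neg * fp else "negative"


# One positive and five negatives:
print(label_leaf(pos=1, neg=5, fp=100.0, fn=600.0))  # positive (600 >= 500)
print(label_leaf(pos=1, neg=5, fp=100.0, fn=400.0))  # negative (400 < 500)
```

Even though most examples at this leaf are negative, a large enough FN cost flips the label, which is the difference between cost-based labeling and majority labeling.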

  11. Difference in handling ? values • First, how to handle ? values in the training data • Previous work • builds a ? branch • which is problematic • This work • deals with unknown values in the training set as follows: • no branch is built for ? • examples with unknown values are “gathered” inside the internal nodes
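A minimal sketch of the partitioning step: how a split might route training examples when no ? branch is built. The function name partition and the example representation are hypothetical, and how the gathered examples are later costed or labeled is not modeled here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

UNKNOWN = "?"
# (attribute values, class label); a missing key is treated like "?".
Example = Tuple[Dict[str, str], str]


def partition(examples: List[Example],
              attr: str) -> Tuple[Dict[str, List[Example]], List[Example]]:
    """Split examples on `attr` without building a "?" branch.

    Examples whose value of `attr` is unknown are not sent down any branch;
    they are returned separately, i.e. "gathered" at the internal node.
    """
    branches: Dict[str, List[Example]] = defaultdict(list)
    gathered: List[Example] = []
    for x, y in examples:
        value = x.get(attr, UNKNOWN)
        if value == UNKNOWN:
            gathered.append((x, y))
        else:
            branches[value].append((x, y))
    return dict(branches), gathered


train = [({"A1": "a"}, "P"), ({"A1": "?"}, "N"), ({"A1": "b"}, "N"), ({}, "P")]
branches, gathered = partition(train, "A1")
print(sorted(branches), len(gathered))  # ['a', 'b'] 2
```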

  12. Desirable Properties 1. Effect of the difference between misclassification costs and test costs [Figure: trees built when all test costs are 300, 20, and 0]

  13. 2. Prefer attributes with smaller test costs
      Test-cost settings:
            A1   A2   A3   A4   A5   A6
      #1    20   20   20   20   20   20
      #2   200   20  100  100  200  200
      #3   200  100  100  100   20  200
      [Figure: trees built under each test-cost setting]

  14. 3. If an attribute's test cost increases, the attribute tends to be “pushed” down and eventually “falls out” of the tree [Figure: trees built with the test cost of A1 set to 20, 50, and 80]

  15. Outline • Introduction • Building Trees with Minimal Total Costs • Testing Strategies • Experiments and Results • Conclusions

  16. Missing values in test cases • A new patient arrives: [Figure: example test case with unknown (?) values]

  17. OST: Intuition • Explain the intuition of OST here

  18. Four Testing Strategies [Figure: example tree testing A1 and A6] • First: Optimal Sequential Test (OST) (simple batch test: do all tests) • Second (M2): no tests are performed; predict with the internal node • Third (M3): no tests are performed; predict with a weighted sum of subtrees • Fourth (M4): a new tree is built dynamically for each test case, using only the known attributes
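The first two strategies can be contrasted in a short sketch. The sequential version is one reading of OST as listed on this slide: walk the tree and perform a test, paying its cost, only when the path reaches an attribute whose value is unknown. The second strategy performs no tests and stops at the first unknown, predicting with that internal node's label. The Node class, the function names, and the run_test callback (a stand-in for actually performing a test, such as ordering a blood test) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

UNKNOWN = "?"


@dataclass
class Node:
    label: str                   # class predicted if classification stops here
    attr: Optional[str] = None   # attribute tested at this node; None for a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)


def sequential_test(node: Node, case: Dict[str, str], test_costs: Dict[str, float],
                    run_test: Callable[[str], str]) -> Tuple[str, float]:
    """Follow the tree; when a node's attribute is unknown, perform the test at a cost.

    Returns (predicted label, total test cost spent on this case).
    """
    known = dict(case)  # copy so the caller's case is not modified
    spent = 0.0
    while node.attr is not None:
        if known.get(node.attr, UNKNOWN) == UNKNOWN:
            known[node.attr] = run_test(node.attr)  # obtain the value, paying its cost
            spent += test_costs[node.attr]
        node = node.children[known[node.attr]]
    return node.label, spent


def no_test_internal(node: Node, case: Dict[str, str]) -> str:
    """No tests: stop at the first unknown attribute and use that internal node's label."""
    while node.attr is not None:
        value = case.get(node.attr, UNKNOWN)
        if value == UNKNOWN:
            return node.label
        node = node.children[value]
    return node.label


# Hypothetical tree: test A1; if A1 = "b", also test A6.
tree = Node("P", "A1", {
    "a": Node("P"),
    "b": Node("N", "A6", {"x": Node("P"), "y": Node("N")}),
})
patient = {"A1": "b", "A6": UNKNOWN}
lab = {"A1": "b", "A6": "y"}  # values the tests would reveal (made up)
print(sequential_test(tree, patient, {"A1": 20.0, "A6": 20.0}, lab.__getitem__))  # ('N', 20.0)
print(no_test_internal(tree, patient))  # N
```

The sequential version only pays for tests that lie on the path the case actually follows, so a case with A1 = "a" would never be charged for A6.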

  19. Outline • Introduction • Building Trees with Minimal Total Costs • Testing Strategies • Experiments and Results • Conclusions

  20. Experiment Settings • Five binary-class datasets • 60/40 split for training/testing, repeated 5 times • Unknown values in training/test examples are selected randomly with a specified probability • Also compared with a C4.5 tree, using OST for testing
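The protocol above can be sketched as follows. The slide says only that unknown values are chosen randomly with a specified probability; masking each value independently with probability p is an assumption, as are the names split_60_40 and make_unknown and the toy data.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_60_40(items: Sequence[T], rng: random.Random) -> Tuple[List[T], List[T]]:
    """Shuffle and split into 60% training and 40% testing."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(0.6 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def make_unknown(cases: List[Dict[str, str]], p: float,
                 rng: random.Random) -> List[Dict[str, str]]:
    """Independently replace each attribute value with "?" with probability p."""
    return [{a: ("?" if rng.random() < p else v) for a, v in case.items()}
            for case in cases]


cases = [{"A1": str(i), "A2": str(i % 2)} for i in range(10)]
for run in range(5):  # repeat 5 times with different random splits
    rng = random.Random(run)
    train, test = split_60_40(cases, rng)
    train, test = make_unknown(train, 0.2, rng), make_unknown(test, 0.2, rng)
    # Build the tree on `train` and measure total cost on `test` here.
    print(run, len(train), len(test))
```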

  21. Results with different percentages of unknown values [Figure: curves for No test, internal; No test, distributed; No test, lazy tree; C4.5 tree, OST] • OST is best; M4 and C4.5 come next; M3 is worst • The cost of OST does not increase with more ? values; overall, the costs of the others do

  22. Results with different test costs [Figure: curves for No test, internal; No test, distributed; No test, lazy tree; C4.5 tree, OST] • With large test costs, OST = M2 = M3 = M4 • C4.5 is much worse (its tree building is cost-insensitive)

  23. Results with unbalanced class costs [Figure: curves for No test, internal; No test, distributed; No test, lazy tree; C4.5 tree, OST] • With large test costs, OST = M2 = M4 • C4.5 is much worse (its tree building is cost-insensitive) • M3 is worse than M2 (M3 is the strategy used in C4.5)

  24. Comparing OST and C4.5 across 6 datasets • OST always outperforms C4.5

  25. Outline • Introduction • Building Trees with Minimal Total Costs • Testing Strategies • Experiments and Results • Conclusions

  26. Conclusions • New tree building algorithm for minimal costs • Desirable properties • Computationally efficient (similar to C4.5) • Test strategies (OST and batch) are very effective • Can solve many real-world diagnosis problems

  27. Future Work • More intelligent “batch test” methods • Consider the cost of additional batch tests • Optimal sequential batch test: batch 1 = (test 1, test 2), batch 2 = (test 3, test 4, test 5), … • Other learning algorithms with minimal total cost • A wrapper that works for any “black box”
