
Learning and Making Decisions When Costs and Probabilities are Both Unknown


Presentation Transcript


  1. Learning and Making Decisions When Costs and Probabilities are Both Unknown by B. Zadrozny and C. Elkan

  2. Contents • Introduction • Previous Work • Direct Cost-Sensitive Decision Making • The Dataset • Estimating Class Membership Probabilities • Estimating Costs • Results and Conclusions

  3. Introduction • Costs/benefits are the values assigned to classification decisions • Costs are often different for different examples • In cost-sensitive learning we are often interested in the rare class • Hence the problem of unbalanced data

  4. Cost-Sensitive Decisions • Each training and test example x has an associated cost C(i,j,x): the cost of predicting class i when the true class of x is j • General optimal prediction: i* = argmin_i Σ_j P(j|x) C(i,j,x) (sketched below) • Methods differ in how they estimate P(j|x) and C(i,j,x) • Previous literature has assumed costs are known in advance and independent of examples
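
Here is a minimal sketch of that decision rule in Python; the function name and the probability/cost values are illustrative, not from the paper.

```python
import numpy as np

def optimal_prediction(P, C):
    """Pick the class i minimizing the expected cost sum_j P(j|x) * C(i,j,x).

    P : shape (n_classes,), estimated P(j|x) for one example x.
    C : shape (n_classes, n_classes), C[i, j] = cost of predicting
        class i when the true class is j, for this example.
    """
    expected_cost = C @ P            # expected_cost[i] = sum_j C[i, j] * P[j]
    return int(np.argmin(expected_cost))

# Illustrative values: rare positive class, asymmetric costs.
P = np.array([0.95, 0.05])           # P(j=0|x), P(j=1|x)
C = np.array([[0.0, 25.0],           # predict 0: missing a positive costs 25
              [1.0,  0.0]])          # predict 1: a false alarm costs 1
print(optimal_prediction(P, C))      # costs 1.25 vs 0.95 -> predicts class 1
```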

  5. MetaCost • Estimation of P(j|x): done on training examples only • Estimation of C(i,j): example independent • Training relabels each example with its optimal class • Then learns a classifier on the relabelled data to predict the labels of test examples (sketched below)
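
A rough sketch of MetaCost's relabel-then-learn idea, assuming scikit-learn; the real algorithm uses bagging with out-of-bag votes and other details this compresses.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

def metacost_relabel(X, y, C, n_estimators=10):
    """MetaCost-style sketch: estimate P(j|x) by bagging, relabel each
    training example with its expected-cost-minimizing class, then learn
    a final classifier on the relabelled data. C[i, j] is an
    example-independent cost matrix (classes in sorted label order)."""
    bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=n_estimators)
    P = bag.fit(X, y).predict_proba(X)     # shape (n_samples, n_classes)
    y_new = np.argmin(P @ C.T, axis=1)     # argmin_i sum_j P(j|x) * C(i,j)
    return DecisionTreeClassifier().fit(X, y_new)
```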

  6. Direct Cost-Sensitive Decision Making • Estimation of P(j|x) • Average of naïve Bayes and decision tree estimates • Estimated on both training and test sets • Estimation of C(i,j,x) • Multiple linear regression • Unbiased estimate using the Heckman procedure • Example dependent • Evaluation • Against MetaCost and the KDD competition results • Using a large and difficult dataset, KDD'98

  7. MetaCost Implementation • For evaluation, MetaCost is adapted • Class probability estimates are found by the same simple decision-tree methods • Costs are made example dependent during training • Adapted MetaCost vs. DCSDM • DCSDM applies two models to each test example; MetaCost applies one • DCSDM estimates probabilities on both training and test examples

  8. The Data Mining Task • Data on persons who have donated in the past to a certain charity (KDD'98 competition) • Donor/non-donor labelling is based on the last campaign • The task is to choose which people to solicit for new donations • Training/test sets • 95,412 records labelled donor or non-donor, with donation amounts • 96,367 unlabelled records from the same donation campaign

  9. Data Mining Task cont. • Cost of soliciting: $0.68 • Donations range from $1 to $200 • 5% donors and 95% non-donors • The very low response rate and varying donation amounts make it hard to beat soliciting everyone • The dataset is hard • It has already been filtered down to a reasonable set of prospects • The task is to improve upon the unknown method that produced the set

  10. Applying DCSDM • For the KDD'98 task we change C(i,j,x) to B(i,j,x): costs become benefits • B(1,1,x), the benefit of soliciting an actual donor, is example dependent • Previous literature replaced it with a constant

  11. Optimal Policy • Expected benefit of not soliciting (i = 0): 0 • Expected benefit of soliciting (i = 1): p(x)·y(x) - 0.68, where p(x) is the donation probability and y(x) the estimated donation amount • Optimal policy: solicit exactly when p(x)·y(x) > 0.68 (sketched below)
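
The policy fits in a few lines; the names here are mine, only the $0.68 mailing cost comes from the slides.

```python
def solicit(p_donate, est_amount, cost=0.68):
    """Solicit iff the expected donation exceeds the mailing cost:
    expected benefit of soliciting is p(x)*y(x) - cost, of not soliciting 0."""
    return p_donate * est_amount > cost

# A 5%-likely donor with an expected $20 gift is worth soliciting:
print(solicit(0.05, 20.0))   # 0.05 * 20 = 1.00 > 0.68 -> True
```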

  12. Optimal decisions require accurate estimates of p(x) = P(j=1|x) • Class sizes may be highly unbalanced • Two estimation methods proposed • Decision trees, with smoothing and curtailment • Naïve Bayes, with binning

  13. Problems with Decision Trees • Decision trees assign as the score of each leaf its raw training frequency p = k/n (k positives among the n training examples reaching the leaf) • High bias • Tree-growing methods try to make leaves homogeneous, so the p's tend to be over- or underestimates • High variance • When n is small, p is not to be trusted

  14. Smoothing • Pruning is no good for us • To make the estimates less extreme, replace the raw frequency p = k/n with p' = (k + b·m) / (n + m) • b: base rate, m: heuristic value (smoothing strength) • Effect • Where k, n are small, p' is essentially just the base rate • If k, n are larger, p' is a combination of the base rate and the original score (see the sketch below)
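
A minimal sketch of the smoothed estimate; m = 200 is an illustrative smoothing strength, not necessarily the paper's value.

```python
def smoothed_score(k, n, b, m=200):
    """Smoothed leaf probability p' = (k + b*m) / (n + m), where k is the
    number of positives among the n examples at a leaf and b is the base
    rate; m = 200 here is an illustrative smoothing strength."""
    return (k + b * m) / (n + m)

# A tiny all-positive leaf stays near the 5% base rate instead of 1.0:
print(smoothed_score(k=2, n=2, b=0.05))     # ~0.059
# A larger leaf moves toward its raw frequency k/n = 0.10:
print(smoothed_score(k=40, n=400, b=0.05))  # ~0.083
```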

  15. Smoothed Scores

  16. Curtailment • What if a leaf has enough training examples to be statistically reliable? • Then smoothing seems unnecessary • Curtailment searches through the tree and removes nodes with n < v • v is chosen either through cross-validation or by a heuristic such as b·v = 10 (see the sketch below)
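
A sketch of scoring with a curtailed tree; the Node class and its field names are hypothetical, not the paper's data structure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    """Hypothetical tree node: n training examples, k positives,
    optional split as (feature index, threshold), children at leaves None."""
    n: int
    k: int
    split: Optional[Tuple[int, float]] = None
    left: "Node" = None
    right: "Node" = None

def curtailed_score(node: Node, x, v: int) -> float:
    """Descend the tree as usual, but stop as soon as the child we would
    move to holds fewer than v training examples, and score x with the
    frequency k/n of the last reliable node."""
    while node.split is not None:
        feature, threshold = node.split
        child = node.left if x[feature] <= threshold else node.right
        if child.n < v:            # child too small to be trusted: stop here
            break
        node = child
    return node.k / node.n

# Base rate b = 0.05 and the heuristic b*v = 10 give v = 200:
tree = Node(n=1000, k=50, split=(0, 3.0),
            left=Node(n=900, k=20), right=Node(n=100, k=30))
print(curtailed_score(tree, x=[5.0], v=200))  # right child too small -> 0.05
```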

  17. Curtailed Tree

  18. Curtailed Scores

  19. Naïve Bayes Classifiers • Naïve Bayes assumes that within any class the attribute values are independent • This assumption gives inaccurate probability estimates • Attributes tend to be positively correlated, so naïve Bayes estimates tend to be too extreme, i.e. close to zero or one • Still, the scores rank examples well, so they can be calibrated

  20. Calibrating Naïve Bayes Scores • The histogram method • Sort the training examples by their naïve Bayes scores • Divide the sorted set into b subsets of equal size, called bins • For each bin, compute the lower and upper boundary scores • Given a new data point x • Calculate its naïve Bayes score and find the associated bin • Let p(x) = the fraction of positive training examples in that bin (sketched below)
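
A compact sketch of the histogram (binning) method on synthetic scores; the function names and data are illustrative.

```python
import numpy as np

def fit_bins(scores, labels, b=10):
    """Sort training examples by score, cut into b equal-size bins, and
    record each bin's upper score boundary and fraction of positives."""
    order = np.argsort(scores)
    bins = np.array_split(order, b)
    edges = np.array([scores[idx].max() for idx in bins[:-1]])
    rates = np.array([labels[idx].mean() for idx in bins])
    return edges, rates

def calibrated_probability(score, edges, rates):
    """Map a new score to the positive fraction of its bin."""
    return rates[np.searchsorted(edges, score)]

# Synthetic demo: scores correlate with, but do not equal, P(positive).
rng = np.random.default_rng(0)
scores = rng.random(1000)
labels = (rng.random(1000) < scores).astype(int)
edges, rates = fit_bins(scores, labels)
print(calibrated_probability(0.83, edges, rates))   # roughly 0.8
```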

  21. Averaging Probability Estimates • If probability estimates are partially uncorrelated, averaging them reduces variance • Assuming all estimates have the same variance, the average of N estimates has variance Var = ρ·σ² + ((1 - ρ)/N)·σ² • σ²: the individual classifier variance • N: the number of classifiers • ρ: the correlation factor among the classifiers
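
A quick numeric check of that formula with illustrative numbers (ρ = 1 would mean averaging gains nothing; ρ = 0 would cut variance by a factor of N):

```python
# Variance of an average of N equally correlated estimators:
#   Var = rho * sigma2 + (1 - rho) * sigma2 / N
sigma2, N, rho = 0.04, 2, 0.5        # two classifiers, correlation 0.5
var_avg = rho * sigma2 + (1 - rho) * sigma2 / N
print(var_avg)                       # 0.03 < 0.04: averaging reduced variance
```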

  22. Estimating the Donation Amount • Whether to solicit a person is decided by the policy p(x)·y(x) > 0.68 • So the policy needs y(x), the estimated donation amount

  23. Cost and Probability • Good decisions • Estimating costs well is more important than estimating probabilities • Why? The relative variation of costs across different examples is much greater than the relative variation of probabilities • Probability • Estimating the donation probability is difficult • Estimating the donation amount is easier, because past amounts are excellent predictors of future amounts

  24. Training and Test Data • Two random processes • Whether a person donates • How much they donate: the donation amount • The method used for estimating the donation amount is multiple linear regression (MLR)

  25. Multiple Linear Regression • Two attributes are used • lastgift: dollar amount of the most recent gift • ampergift: average gift amount in response to the last 22 promotions • A linear regression equation estimates the donation amount • Only 46 of the 4,843 recorded donations exceed $50 • Donors who donated at most $50 are used as input for the regression; larger amounts are treated as outliers (see the sketch below)
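
A sketch of that regression on synthetic data, assuming scikit-learn; the two columns stand in for lastgift and ampergift.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Hypothetical donor data: column 0 plays lastgift, column 1 ampergift.
X_donors = rng.uniform(1, 60, size=(500, 2))
amounts = 0.5 * X_donors[:, 0] + 0.4 * X_donors[:, 1] + rng.normal(0, 2, 500)

mask = amounts <= 50.0                        # drop outlier donations over $50
reg = LinearRegression().fit(X_donors[mask], amounts[mask])
y_hat = reg.predict(X_donors)                 # estimated donation amount y(x)
print(y_hat[:3])
```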

  26. The Problem of Sample Selection Bias • Reasoning outside the space the model was learned on • Estimating the donation amount • Any donation-amount estimator is learned from the people who actually donated • But it is applied to a different population, consisting of both donors and non-donors

  27. Donation Amount and Probability Estimates are Negatively Correlated

  28. Solutions to Sample Selection Bias • Heckman's procedure • Estimate the conditional probability p(j=1|x) using a linear probit model • Estimate y(x) on the training examples with j(x) = 1, including for each x a transformation of the estimated probability as an extra regressor • Their own procedure • The conditional probability is learned with a decision tree or naïve Bayes classifier • These probability estimates are added as additional attributes when estimating y(x) (sketched below)
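
A sketch of the authors' decision-tree variant, assuming scikit-learn; the function and parameter names are mine.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

def fit_amount_with_selection(X, donated, amounts):
    """Sketch of the authors' variant of Heckman's correction: learn
    p(j=1|x) with a classifier, then append that estimate as an extra
    attribute when regressing the donation amount on donors only."""
    clf = DecisionTreeClassifier(min_samples_leaf=50).fit(X, donated)
    p = clf.predict_proba(X)[:, 1]             # estimated P(donate | x)
    X_aug = np.column_stack([X, p])            # p(x) as an additional attribute
    donors = donated == 1
    reg = LinearRegression().fit(X_aug[donors], amounts[donors])
    return clf, reg
```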

  29. Experimental Results • [Tables of test-set profits for direct cost-sensitive decision making and for MetaCost]

  30. Experimental Results Interpretation • With Heckman's procedure • Profit on the test set increases by $484 under every probability estimation method • This systematic improvement indicates that Heckman's procedure solves the sample selection bias problem • MetaCost • The best MetaCost result is $14,113; the best direct cost-sensitive result is $15,329 • On average, MetaCost's test-set profit is $1,751 lower than that of direct cost-sensitive decision making

  31. Statistical Significance of Results • 4,872 donors in the fixed test set, with an average donation of $15.62 • In a different test set drawn randomly from the same distribution, the number of donors would fluctuate with standard deviation about sqrt(4872) ≈ 70 • Such a fluctuation changes profit by about sqrt(4872) × $15.62 ≈ $1,090 • A profit difference between two methods of less than $1,090 is therefore not significant (checked below)
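
The arithmetic behind the $1,090 figure, as a one-line check:

```python
import math
# The donor count behaves roughly like a Poisson count, so its standard
# deviation is about sqrt(4872) ~ 70 donors; at $15.62 per donation that
# is roughly $1,090 of profit noise between statistically equal methods.
print(math.sqrt(4872) * 15.62)   # ~1090.3
```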

  32. Conclusions • Direct cost-sensitive decision making performs better than MetaCost • It provides a solution to the fundamental problem of costs being example dependent • It identifies and solves the sample selection bias problem for the KDD'98 dataset

  33. Questions?
