## Learning and Making Decisions When Costs and Probabilities are Both Unknown


by B. Zadrozny and C. Elkan

Contents

- Introduce the Problem
- Previous work
- Direct Cost Sensitive Decision Making
- The Dataset
- Estimating Class Membership Probabilities
- Estimating Costs
- Results and Conclusions

Introduction

- Costs and benefits are the values assigned to classification decisions.
- Costs are often different for different examples.
- In cost-sensitive learning, the class of interest is often the rare class.
- Hence the problem of unbalanced data.

Cost Sensitive Decisions

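- Each training and test example has an associated cost.
- General optimal prediction: for an example x, choose the class i that minimizes the expected cost $\sum_j P(j \mid x)\, C(i, j, x)$, where $C(i, j, x)$ is the cost of predicting class i when the true class is j.
- Methods differ w.r.t. how they estimate $P(j \mid x)$ and $C(i, j, x)$.
- Previous literature has assumed that costs are known in advance and independent of the examples.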
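The expected-cost rule can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `optimal_prediction` and the example probabilities and cost table are assumptions made up for demonstration.

```python
def optimal_prediction(probs, cost):
    """Return (best class, expected costs) for one example x.

    probs[j]   -- estimated P(j | x)
    cost[i][j] -- cost of predicting class i when the true class is j,
                  already evaluated for this particular example
    """
    expected = [sum(p * c for p, c in zip(probs, row)) for row in cost]
    best = min(range(len(expected)), key=expected.__getitem__)
    return best, expected


# Hypothetical numbers: class 1 is rare, but missing it is expensive.
probs = [0.9, 0.1]
cost = [
    [0.0, 20.0],  # predict 0: free if correct, 20 if the true class is 1
    [1.0, 0.0],   # predict 1: costs 1 if the true class is 0
]
best, expected = optimal_prediction(probs, cost)
print(best, expected)
```

With these made-up values the rare class is predicted even though its probability is only 0.1, because the expected cost of predicting class 0 is higher.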

MetaCost

- Estimation of P(j|x)
  - Estimated in training only.
- Estimation of C(i,j)
  - Example independent.
- Training relabels each example with its optimal class.
- Learns a classifier to predict the labelling of test examples.

Direct Cost-Sensitive-Decision-Making

- Estimation of P(j|x)
  - Average of naïve Bayes and decision trees.
  - Estimated on training and test sets.
- Estimation of C(i,j,x)
  - Multiple linear regression.
  - Unbiased estimate using Heckman's procedure.
  - Example dependent.
- Evaluation
  - Evaluated against MetaCost and the KDD competition results.
  - Uses a large and difficult dataset, KDD.

MetaCost Implementation

- For evaluation, MetaCost is adapted:
  - Class probability estimates are found by simple methods using decision trees.
  - Costs are made example dependent during training.
- Adapted MetaCost vs. DCSDM:
  - DCSDM applies two models to each test example; MetaCost applies one.
  - In DCSDM, estimation is done on both training and test examples.

The Data Mining Task

- Data on persons who have donated in the past to a certain charity, from the KDD '98 competition.
- Donor and non-donor labelling is based on the last campaign.
- The task is to choose which donors to ask for new donations.
- Training and test sets:
  - Training: 95,412 records, labelled as donors or non-donors, with donation amounts.
  - Test: 96,367 unlabelled records from the same donation campaign.

Data Mining Task cont.

- The cost of soliciting is $0.68 per person.
- Donations range from $1 to $200.
- 5% are donors and 95% are non-donors.
- The very low response rate and the varying donation amounts make it hard to beat soliciting everyone.
- The dataset is hard:
  - It has already been filtered to be a reasonable set of prospects.
  - The task is to improve upon the unknown method that produced the set.

Applied DCSDM

- For the KDD task, we change C(i,j,x) to B(i,j,x):
  - Costs become benefits.
- B(1,1,x) is example dependent.
  - Previous literature replaced it with a constant.

Optimal policy

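- The expected benefit of not soliciting, i = 0: $P(j=0 \mid x)\,B(0,0,x) + P(j=1 \mid x)\,B(0,1,x)$
- The expected benefit of soliciting, i = 1: $P(j=0 \mid x)\,B(1,0,x) + P(j=1 \mid x)\,B(1,1,x)$
- Optimal policy: solicit x exactly when the expected benefit of soliciting exceeds that of not soliciting. With $B(0,j,x) = 0$, $B(1,0,x) = -0.68$ and $B(1,1,x) = y(x) - 0.68$, where $y(x)$ is the donation amount, this reduces to $P(j=1 \mid x)\,y(x) > 0.68$.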
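As a rough sketch of this policy, assume that not soliciting yields zero benefit and that soliciting costs $0.68 and returns the estimated donation y(x) with probability P(j=1|x). The function names and the sample inputs below are hypothetical.

```python
SOLICIT_COST = 0.68  # mailing cost per person, from the task description


def expected_benefit_of_soliciting(p_donate, amount):
    """Expected benefit of mailing: p * (amount - cost) + (1 - p) * (-cost)."""
    return p_donate * amount - SOLICIT_COST


def should_solicit(p_donate, amount):
    """Solicit only if the expected benefit beats not soliciting (benefit 0)."""
    return expected_benefit_of_soliciting(p_donate, amount) > 0.0


# Hypothetical prospects: (estimated donation probability, estimated amount)
print(should_solicit(0.05, 10.0))  # 0.05 * 10 = 0.50 < 0.68
print(should_solicit(0.02, 50.0))  # 0.02 * 50 = 1.00 > 0.68
```

The second prospect is worth mailing despite the lower probability, which is why both the probability and the amount have to be estimated per example.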

Optimal decisions require accurate probability estimates

- Class sizes may be highly unbalanced.
- Two methods are proposed:
  - Decision trees: smoothing and curtailment.
  - Naïve Bayes: binning.

Problems with Decision Trees

- Decision trees assign to each leaf, as a score, the raw training frequency of that leaf: p = k/n, where k of the n training examples in the leaf are positive.
- High bias
  - Decision tree growing methods try to make leaves homogeneous, so the p's tend to be over- or underestimates.
- High variance
  - When n is small, p is not to be trusted.

Smoothing

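- Pruning is not suitable for our purposes.
- To make the estimates less extreme, replace the raw leaf score p = k/n (k positive examples out of n in the leaf) with $p' = \frac{k + b \cdot m}{n + m}$.
  - b is the base rate; m is a heuristic value (the smoothing strength).
- Effect
  - When k and n are small, p' is essentially just the base rate.
  - When k and n are larger, p' is a combination of the base rate and the original score.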
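A minimal sketch of the smoothing step, assuming it is the standard m-estimate p' = (k + b·m) / (n + m). The values of `b` and `m` below are hypothetical and chosen only to show the effect on small and large leaves.

```python
def smoothed_score(k, n, b, m):
    """Shrink the raw leaf frequency k/n toward the base rate b.

    k -- positive training examples in the leaf
    n -- all training examples in the leaf
    b -- base rate of the positive class
    m -- smoothing strength (larger m pulls harder toward b)
    """
    return (k + b * m) / (n + m)


b, m = 0.05, 200  # hypothetical base rate and smoothing strength

tiny_leaf = smoothed_score(1, 1, b, m)        # raw score 1.0
large_leaf = smoothed_score(300, 1000, b, m)  # raw score 0.3
print(round(tiny_leaf, 4), round(large_leaf, 4))
```

The one-example leaf moves from a raw score of 1.0 to nearly the base rate, while the large leaf stays much closer to its original frequency.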

Curtailment

- What if the leaves have enough training examples to be statistically reliable?
- Then smoothing seems to be unnecessary.
- Curtailment searches through the tree and removes nodes where n < v, where n is the number of training examples at the node.
- The threshold v is chosen either through cross-validation or by a heuristic, such as b · v = 10, where b is the base rate.
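One way to picture curtailment is to apply it lazily at prediction time: walk down the tree and stop as soon as the next node has fewer than v training examples, then use the frequency at the node where you stopped. The sketch below is an illustration under that assumption; the `Node` class, the attribute names and the tiny tree are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    k: int  # positive training examples reaching this node
    n: int  # all training examples reaching this node
    test: Optional[Callable[[dict], bool]] = None  # None marks a leaf
    left: Optional["Node"] = None   # followed when test(x) is False
    right: Optional["Node"] = None  # followed when test(x) is True


def curtailed_score(root, x, v):
    """Score x with the deepest node on its path that has at least v examples."""
    node = root
    while node.test is not None:
        child = node.right if node.test(x) else node.left
        if child.n < v:
            break  # child is too small to trust: stop at the parent
        node = child
    return node.k / node.n


# Hypothetical tree: the right subtree ends in a leaf with only 5 examples.
tree = Node(
    k=50, n=1000, test=lambda x: x["lastgift"] > 20,
    left=Node(k=20, n=700),
    right=Node(
        k=30, n=300, test=lambda x: x["age"] > 60,
        left=Node(k=29, n=295),
        right=Node(k=1, n=5),
    ),
)
print(curtailed_score(tree, {"lastgift": 25, "age": 70}, v=200))
```

The example that would land in the 5-example leaf is scored with its parent's frequency, 30/300, instead of the unreliable 1/5.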

Naïve Bayes Classifiers

- Naïve Bayes
  - Assumes that within any class the attribute values are independent variables.
  - This assumption gives inaccurate probability estimates.
  - Attributes tend to be positively correlated, so naïve Bayes estimates tend to be too extreme, i.e. close to zero or one.
  - Nevertheless, naïve Bayes scores do rank examples well.

Calibrating Naïve Bayes Scores

- The histogram method:
  - Sort the training examples by their naïve Bayes (n.b.) scores.
  - Divide the sorted set into b subsets of equal size, called bins.
  - For each bin, compute the lower and upper boundary n.b. scores.
- Given a new data point x:
  - Calculate its n.b. score and find the associated bin.
  - Let the calibrated estimate of P(j = 1 | x) be the fraction of positive training examples in that bin.
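The binning steps can be sketched as follows. This is a simplified illustration that stores only each bin's upper boundary and assumes there are at least as many training examples as bins; the function names and the toy scores are hypothetical.

```python
from bisect import bisect_left


def fit_bins(scores, labels, num_bins):
    """Sort by score, split into equal-size bins, and record for each bin
    its upper boundary score and its fraction of positive examples."""
    pairs = sorted(zip(scores, labels))
    size = len(pairs) // num_bins
    upper_bounds, fractions = [], []
    for b in range(num_bins):
        start = b * size
        # The last bin absorbs any leftover examples.
        end = (b + 1) * size if b < num_bins - 1 else len(pairs)
        chunk = pairs[start:end]
        upper_bounds.append(chunk[-1][0])
        fractions.append(sum(label for _, label in chunk) / len(chunk))
    return upper_bounds, fractions


def calibrated_probability(score, upper_bounds, fractions):
    """Look up the first bin whose upper boundary is >= score."""
    idx = bisect_left(upper_bounds, score)
    return fractions[min(idx, len(fractions) - 1)]


# Toy data: raw scores are extreme, but half of the high scorers are negative.
scores = [0.01, 0.02, 0.05, 0.10, 0.90, 0.95, 0.97, 0.99]
labels = [0, 0, 0, 1, 0, 1, 0, 1]
bounds, fracs = fit_bins(scores, labels, num_bins=2)
print(calibrated_probability(0.93, bounds, fracs))
```

A raw score of 0.93 is mapped to the positive fraction observed among similarly scored training examples, which is far less extreme.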

Averaging Probability Estimates

- If probability estimates are partially uncorrelated, then averaging them reduces their variance.
- Assuming all estimates have the same variance, the averaged estimate has a variance given by:

$$\sigma^2_{\text{avg}} = \sigma^2 \, \frac{1 + \rho\,(N - 1)}{N}$$

where:

- $\sigma^2$ is the individual classifier variance
- $N$ is the number of classifiers
- $\rho$ is the correlation factor among all classifiers
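To see how correlation limits the benefit of averaging, the sketch below evaluates the variance of the mean of N equally correlated estimates, σ²(1 + ρ(N − 1)) / N. The function name and the sample values are illustrative assumptions.

```python
def averaged_variance(sigma2, n, rho):
    """Variance of the mean of n estimates, each with variance sigma2
    and pairwise correlation rho."""
    return sigma2 * (1.0 + rho * (n - 1)) / n


# Two classifiers (e.g. a decision tree and naive Bayes), unit variance.
for rho in (0.0, 0.5, 1.0):
    print(rho, averaged_variance(1.0, 2, rho))
```

With uncorrelated estimates the variance is halved; with perfectly correlated estimates averaging gains nothing.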

Cost and Probability

- Good decisions
  - Estimating costs well is more important than estimating probabilities well.
- Why?
  - The relative variation of costs across different examples is much greater than the relative variation of probabilities.
- Probability
  - Estimating the donation probability is difficult.
  - Estimating donation amounts is easier because past amounts are excellent predictors of future amounts.

Training and Test data

- Two random processes:
  - Whether to donate or not.
  - How much to donate.

Donation Amount

- Donation amounts are estimated using multiple linear regression (MLR).

Multiple Linear Regression

Two attributes are used:

- lastgift: dollar amount of the most recent gift.
- ampergift: average gift amount in response to the last 22 promotions.
- A linear regression equation is used to estimate the donation amount.
- Only 46 of the 4843 recorded donations are for more than $50.
- Only donors who donated at most $50 are used as input for the linear regression.
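The regression step might look like the sketch below: drop donations above $50, then fit an ordinary least-squares model of the amount on `lastgift` and `ampergift`. The solver (normal equations with Gauss-Jordan elimination), the function names and the donor records are all assumptions for illustration; the records are synthetic, not KDD '98 data.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    d = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(d)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(d)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(d):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][d] / A[i][i] for i in range(d)]


def predict(coeffs, row):
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


# Synthetic donors: (lastgift, ampergift, donation amount).
donors = [
    (10.0, 8.0, 9.2), (20.0, 15.0, 17.0), (5.0, 12.0, 8.3),
    (25.0, 10.0, 17.5), (15.0, 20.0, 16.5),
    (100.0, 90.0, 120.0),  # above $50, excluded from the fit
]
train = [d for d in donors if d[2] <= 50.0]
coeffs = fit_linear_regression([d[:2] for d in train], [d[2] for d in train])
print(predict(coeffs, (12.0, 9.0)))
```

In practice a library routine such as a least-squares solver would replace the hand-written elimination; it is spelled out here only to keep the example dependency-free.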

Problem of Sample Selection Bias

- Sample selection bias arises from reasoning outside your learning space.
- Estimating the donation amount:
  - Any donation estimator is learned on the basis of people who actually donated.
  - This estimator is then applied to a different population consisting of both donors and non-donors.

Solution to Sample Selection Bias

- Heckman's procedure
  - Estimate the conditional probability p(j = 1 | x) using a linear probit model.
  - Estimate y(x) on the training examples for which j(x) = 1, including for each x a transformation of the estimated conditional probability.
- Their own procedure
  - The conditional probability is learned using a decision tree or naïve Bayes classifier.
  - These probability estimates are added as additional attributes when estimating y(x).

Experimental Results Interpretation

- With Heckman
  - Profit on the test set increases by $484 for all probability estimation methods.
  - This systematic improvement indicates that Heckman's procedure addresses the problem of sample selection bias.
- MetaCost
  - The best result of MetaCost is $14113.
  - The best result of the direct cost-sensitive method is $15329.
  - On average, the profit achieved by MetaCost on the test set is $1751 lower than that achieved by direct cost-sensitive decision making.

Statistical Significance of Results

- There are 4872 donors in the fixed test set.
- The average donation is $15.62.
- For a different test set drawn randomly from the same probability distribution, the number of donors would be expected to vary with a standard deviation of about sqrt(4872).
- This fluctuation changes the profit by about $1090:

sqrt(4872) * 15.62 ≈ $1090.

- A profit difference between two methods of less than $1090 is not significant.
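The arithmetic behind this threshold is easy to check. In the sketch below the function names are hypothetical, and the two profits compared are the best MetaCost and direct cost-sensitive results quoted earlier.

```python
import math


def profit_noise(num_donors, avg_donation):
    """Profit fluctuation from one standard deviation in the donor count."""
    return math.sqrt(num_donors) * avg_donation


def is_significant(profit_a, profit_b, num_donors=4872, avg_donation=15.62):
    """True if the profit gap exceeds the expected sampling fluctuation."""
    return abs(profit_a - profit_b) > profit_noise(num_donors, avg_donation)


noise = profit_noise(4872, 15.62)
print(round(noise))                  # about 1090
print(is_significant(15329, 14113))  # gap of 1216
```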

Conclusions

- Direct cost-sensitive decision making performs better than MetaCost.
- It provides a solution to the fundamental problem of costs being example dependent.
- It identifies and solves the problem of sample selection bias for the KDD '98 dataset.
