
# Lecture 10

Lecture 10. MARK2039, Summer 2006, George Brown College, Wednesday 9-12. Assignment 8: Geocoding example. Example: a retailer has the following information: name and address of its customers, address of its stores, and Stats Can information.


### Lecture 10

MARK2039

Summer 2006

George Brown College

Wednesday 9-12

Assignment 8: Geocoding example
• Example:
• A retailer has the following information:
• Name and address of its customers
• Address of its stores
• Stats Can information
• As a marketer, how would you intelligently use this information?
• Get postal codes of customers and stores
• Get geocodes (latitude and longitude of each postal code)
• Calculate the distance between each customer and the nearest store
• Create a trading area around each store to determine the relevant customers for that store
• Identify the best stores and compare the demographics of the best stores vs. the remaining stores
• Use the above learning to promote non-performing stores whose customer demographic makeup is similar to that of the best stores
• Use the above info to determine where to open or perhaps close stores
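
The distance and trading-area steps above can be sketched in a few lines. This is a minimal illustration, not the method used in class: it assumes the haversine great-circle formula, a mean Earth radius of 6371 km, and made-up store and customer geocodes (`store_A`, `c1`, and so on are hypothetical names).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_store(customer, stores):
    """Return (store_id, distance_km) for the store closest to the customer geocode."""
    return min(
        ((store_id, haversine_km(*customer, *geocode)) for store_id, geocode in stores.items()),
        key=lambda pair: pair[1],
    )


def trading_area(customers, store_geocode, radius_km):
    """Return the ids of customers whose geocode lies within radius_km of the store."""
    return [
        cid for cid, geocode in customers.items()
        if haversine_km(*geocode, *store_geocode) <= radius_km
    ]


# Hypothetical geocodes (latitude, longitude) looked up from postal codes.
stores = {"store_A": (43.65, -79.38), "store_B": (43.77, -79.41)}
customers = {"c1": (43.66, -79.39), "c2": (43.90, -79.00)}

print(nearest_store(customers["c1"], stores))
print(trading_area(customers, stores["store_A"], radius_km=5))
```

The trading-area radius (5 km here) is a business choice, not something the formula provides.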
Assignment 8
• Why do we look at correlation analysis as our first statistical exercise in the data mining process?
• It allows us to use statistics as an initial prescreening tool to eliminate variables from the data mining exercise
Assignment 8
• Give me an example of a correlation table of 5 variables where two variables are significant and three variables are not significant. Provide correlation values that support your results.
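
The prescreening idea can be sketched as follows. This is only an illustration under stated assumptions: the variable names and values are made up, and the cutoff `min_abs_r=0.5` is an arbitrary threshold chosen for the sketch rather than a formal significance test.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def prescreen(candidates, target, min_abs_r=0.5):
    """Keep only the variables whose |correlation| with the target meets the threshold."""
    kept = {}
    for name, values in candidates.items():
        r = pearson(values, target)
        if abs(r) >= min_abs_r:
            kept[name] = r
    return kept


# Made-up data: the objective function and three hypothetical candidate variables.
target = [1, 2, 3, 4, 5, 6]
candidates = {
    "tenure": [1, 3, 2, 5, 4, 6],     # strong positive relationship
    "recency": [6, 5, 4, 3, 2, 1],    # strong negative relationship
    "noise": [1, -1, 1, -1, 1, -1],   # weak relationship, screened out
}

for name, r in prescreen(candidates, target).items():
    print(f"{name}: r = {r:.3f}")
```

Negative correlations are kept as well, since a strong negative relationship is just as useful to a model as a strong positive one.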
Recapping from last week
• Geocoding
• What are the key things to think of?
• Look at the answer from two slides ago. Geocoding gives us numbers to calculate the distance between two postal codes.
• More material on correlation analysis
• How do EDA reports tie into the correlation analysis?
• They are trend-like reports which demonstrate why a given variable has a strong relationship with the objective function.
• How should we present the final results of a model?

How is the above derived?

From the partial R² of each variable divided by the total R² of the equation.
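
As a rough sketch of that division, assuming hypothetical partial R² values (the variable names and numbers below are invented for illustration and do not come from the lecture):

```python
def contribution_shares(partial_r2, total_r2):
    """Share of the model's explained variance attributed to each variable."""
    return {name: value / total_r2 for name, value in partial_r2.items()}


# Hypothetical partial R² values from a model whose total R² is 0.20.
partial_r2 = {"recency": 0.12, "tenure": 0.06, "income": 0.02}

for name, share in contribution_shares(partial_r2, total_r2=0.20).items():
    print(f"{name}: {share:.0%} of the model's explanatory power")
```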

Notion of Lift
• What is lift? The performance of a group relative to the performance of the benchmark, expressed as an index where 100 means the group performs the same as the benchmark
• Examples:

| Type of Activity | Untargeted/Benchmark | Targeted/Challenger | Lift |
| --- | --- | --- | --- |
| Acquisition Campaign Response Rate | 1% | 2% | 200 |
| Retention Campaign Churn Rate | 15% | 25% | 166 |
| Credit Card Loss Rate | 5% | 8% | 160 |
| Product Affinity Rate | 10% | 30% | 300 |

The targeted group represents those names as determined by a data mining tool such as a predictive model.
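
A minimal sketch of the lift calculation, using the benchmark and challenger rates from the examples above. The function name `lift` is just a label for this sketch.

```python
def lift(benchmark_rate, challenger_rate):
    """Lift index: challenger performance relative to the benchmark, where 100 = no difference."""
    return 100 * challenger_rate / benchmark_rate


# (benchmark %, challenger %) pairs taken from the lift examples.
examples = {
    "Acquisition campaign response rate": (1, 2),
    "Retention campaign churn rate": (15, 25),
    "Credit card loss rate": (5, 8),
    "Product affinity rate": (10, 30),
}

for activity, (benchmark, challenger) in examples.items():
    print(f"{activity}: lift = {lift(benchmark, challenger):.1f}")
```

The retention example works out to 166.7, which the slide reports as 166.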

Notion of Lift
• Examples of cases where lift is below 100

| Type of Activity | Untargeted/Benchmark | Targeted/Challenger | Lift |
| --- | --- | --- | --- |
| Acquisition Campaign Response Rate | 1% | 0.5% | 50 |
| Retention Campaign Churn Rate | 15% | 10% | 66 |
| Credit Card Loss Rate | 5% | 2% | 40 |
| Product Affinity Rate | 10% | 6% | 60 |

Validating the Model: Example of a Gains Chart
• Revenue per order is \$60.
• Cost of 1 mail piece is \$0.855.
• The benefit of modelling is the foregone promotion cost: a higher response rate means fewer names need to be promoted to achieve a given # of orders.
• Listed below are the hard numbers that might comprise a lift curve.

| % of List (Ranked by Model Score) | Validation Mail Quantity | Cum. Resp. Rate | Cum. % of all Resp. | Cum. Lift | Interval ROI | Benefits |
| --- | --- | --- | --- | --- | --- | --- |
| 0-10% | 20000 | 3.50% | 23.33% | 233 | 145% | \$22799 |
| 10-20% | 40000 | 3.00% | 40% | 200 | 75% | \$34200 |
| 20-30% | 60000 | 2.75% | 55% | 183 | 58% | \$42750 |
| 30-40% | 80000 | 2.50% | 67% | 167 | 23% | \$45600 |
| 40-50% | 100000 | 2.25% | 75% | 150 | -12.2% | \$42750 |
| ... | ... | ... | ... | ... | ... | ... |
| 90-100% | 200000 | 1.50% | 100% | 100 | -58% | \$0 |

How might this be plotted? In class we saw this as a straight, linearly decreasing line when plotting interval response rate against the deciles. If we plot the cum. % of responders instead, the shape is a parabola-type curve, with a larger bow representing a better model. Likewise, a steeper slope in the plot of interval response rate against deciles represents a stronger model.

• Cum. % of Responders in top 10%:
• Total responders: 200000 × 1.5% = 3000
• # of responders in top 10%: 20000 × 3.5% = 700
• Cum. % in top 10%: 700 / 3000 = 23.33%
• Cum. Lift in top 10%:
• Average response rate: 1.5%
• Cum. response rate in top 10%: 3.5%
• Cum. Lift: (3.5% / 1.5%) × 100 = 233
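
The two cumulative metrics can be reproduced with a short sketch, assuming the list size (200000) and overall response rate (1.5%) from the gains chart. The function and argument names are invented for this illustration.

```python
def cumulative_metrics(list_size, overall_rate, cum_mailed, cum_rate):
    """Return (cum. responders, cum. % of all responders, cum. lift) at a given mailing depth."""
    total_responders = list_size * overall_rate
    cum_responders = cum_mailed * cum_rate
    pct_of_responders = 100 * cum_responders / total_responders
    cum_lift = 100 * cum_rate / overall_rate
    return cum_responders, pct_of_responders, cum_lift


# Top 10% of the list: 20000 names mailed at a 3.5% cumulative response rate.
responders, pct, cum_lift = cumulative_metrics(200000, 0.015, 20000, 0.035)
print(f"responders={responders:.0f}, cum. % of responders={pct:.2f}%, cum. lift={cum_lift:.0f}")
```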
Calculating the metrics on the gains charts
• Interval ROI in 10%-20%:
• # of persons mailed: 20000
• # of responders in 10%-20%: (40% - 23.33%) × 3000 = 500
• Net revenue: (500 × \$60) - (\$0.855 × 20000) = \$12900
• Costs: \$0.855 × 20000 = \$17100
• ROI: 12900 / 17100 = 75%
• Calculating the Benefits column at 30%:
• Mail costs to achieve 1650 responders without modelling: ((0.0275 × 60000) / 0.015) × \$0.855 = \$94050
• Mail costs with modelling: 60000 × \$0.855 = \$51300
• Benefits: \$94050 - \$51300 = \$42750
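
The interval ROI and Benefits calculations above can be written as two small functions. This is a sketch using the \$60 revenue per order and \$0.855 cost per piece from the example; the function names are made up for illustration.

```python
REVENUE_PER_ORDER = 60.0   # from the gains chart example
COST_PER_PIECE = 0.855     # from the gains chart example


def interval_roi(interval_mailed, interval_responders):
    """Net revenue of one interval divided by the mail cost of that interval."""
    cost = interval_mailed * COST_PER_PIECE
    net_revenue = interval_responders * REVENUE_PER_ORDER - cost
    return net_revenue / cost


def modelling_benefit(cum_mailed, cum_rate, overall_rate):
    """Mail cost saved by reaching the same responders with fewer names."""
    responders = cum_mailed * cum_rate
    names_needed_without_model = responders / overall_rate
    return (names_needed_without_model - cum_mailed) * COST_PER_PIECE


# Interval 10%-20%: 20000 names mailed, 500 responders.
print(f"Interval ROI: {interval_roi(20000, 500):.0%}")
# Benefits at 30% depth: 60000 names at 2.75% cumulative vs. 1.5% overall.
print(f"Benefits: ${modelling_benefit(60000, 0.0275, 0.015):,.0f}")
```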

Gains Chart Examples

| Cum. # of Names Mailed | Cum. Response Rate | Interval Resp. Rate | Interval Lift | Benefits | Interval ROI |
| --- | --- | --- | --- | --- | --- |
| 10000 | 2.50% | 2.5% | 250 | \$15,000 | 25% |
| 20000 | 2.25% | 2.0% | 200 | \$25,000 | 0% |
| 30000 | 2.10% | 1.8% | 180 | \$33,000 | -10% |
| 40000 | 1.80% | 0.9% | 90 | \$32,000 | -55% |
| ... | ... | | | | |
| 100000 | 1% | | | | |

Assume a mail cost of \$1.00 per piece and a revenue per order of \$50.00.

Interval Resp. Rate, first row: 10,000 × 0.025 = 250 responders, and 250 / 10,000 = 2.5%

20,000*0.

Please fill in the blanks for the first 4 rows.
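
One way to fill in the interval columns is to difference the cumulative responders between consecutive rows. The sketch below assumes the cumulative figures from the exercise and the 1% overall response rate shown on the 100000 row; `interval_table` is a hypothetical helper name.

```python
def interval_table(cum_names, cum_rates, overall_rate):
    """Derive (cum. names, interval response rate, interval lift) from cumulative figures."""
    rows = []
    prev_names, prev_responders = 0, 0.0
    for names, rate in zip(cum_names, cum_rates):
        cum_responders = names * rate
        # Responders gained in this interval divided by names mailed in this interval.
        interval_rate = (cum_responders - prev_responders) / (names - prev_names)
        rows.append((names, interval_rate, 100 * interval_rate / overall_rate))
        prev_names, prev_responders = names, cum_responders
    return rows


cum_names = [10000, 20000, 30000, 40000]
cum_rates = [0.025, 0.0225, 0.021, 0.018]

for names, rate, interval_lift in interval_table(cum_names, cum_rates, overall_rate=0.01):
    print(f"{names}: interval resp. rate = {rate:.1%}, interval lift = {interval_lift:.0f}")
```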

Lift Curve with Zero Model Effectiveness

What does this look like if we plot it on a lift curve?

A straight line rather than a parabola if we plot cum. % of responders.

Gains Chart Examples

What is the best model? Model 1.

What is the worst model? Model 4.

What are the Model 3 results telling you? We have some rank ordering all the way down to 70000 names, and then the model flattens out; we may need a separate strategy for this bottom segment.

Gains Chart Examples
• In each response model case, answer the following questions:
• Where would your cutoff be with a budget of \$80000 and a cost per piece of \$2.00?
• 40000 names (\$80000 / \$2.00)
• Where would your cutoff be if you needed to attain a forecasted order qty of 350?
• Between 10000 and 20000 names for models 1 and 2, between 20000 and 30000 for model 3, and between 30000 and 40000 for model 4
• Where would your optimum cutoff be, presuming that neither budget nor forecasted order quantities were constraints?
• 50000 for models 1 and 2, and 60000 for model 3; it does not matter for model 4
Gains Chart Examples
• Calculate the following: interval names mailed and cum. response rate
• Assuming a cost per name of \$1.50 and revenue per responder of \$75, calculate the interval ROI and the modelling benefits for each interval.
Tracking of Models
• Two models are used in two campaigns. In campaign A, the overall response rate is 3.5%, which is above the breakeven response rate of 2%. In campaign B, the overall response rate is 1.2%, which is below the breakeven response rate of 2%. Yet the model in campaign B is more effective. Explain why.

The model rank-orders names quite well in campaign B (1.2% overall), while campaign A, despite its better overall response rate (3.5%), exhibits no rank ordering of response rate between deciles.

CHAID
• "CHAID" is an acronym for Chi-square Automatic Interaction Detection
• Produces a decision-tree-like report
• Branches and nodes
• Non-parametric approach
• Output of the routine is a segment or group as opposed to a score
• Uses chi-square statistics to determine statistically significant breaks
• Conceptual interpretation: (Observed - Expected)² / Expected, summed over the cells being compared
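
To make the chi-square idea concrete, here is a minimal sketch of the standard Pearson chi-square statistic, which sums (Observed - Expected)² / Expected over every cell of a contingency table. The counts are invented for illustration, and this shows only the statistic for one candidate split, not a full CHAID routine.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if segment and response were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    return statistic


# Hypothetical split into two segments: [responders, non-responders] per segment.
candidate_split = [[30, 70], [10, 90]]
print(chi_square(candidate_split))
```

A larger statistic means the segments differ more in response than chance alone would suggest, which is the kind of break a CHAID routine looks for.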
CHAID

What criteria determine the end nodes?