Lecture 10

Lecture 10. MARK2039 Summer 2006 George Brown College Wednesday 9-12. Assignment 8: Geocoding example. Example: A retailer has the following information: Name and address of its customers Address of its stores Stats Can Information


Lecture 10

MARK2039

Summer 2006

George Brown College

Wednesday 9-12



Assignment 8: Geocoding example

  • Example:

    • A retailer has the following information:

      • Name and address of its customers

      • Address of its stores

      • Stats Can Information

    • As a marketer, how would you intelligently use this information?

      • Get postal codes of customers and stores

      • Get geocodes (latitude and longitude) for each postal code

      • Calculate the distance between each customer and the nearest store

      • Create a trading area around each store to determine the relevant customers for that store

      • Identify the best stores and compare their demographics with those of the remaining stores

      • Use the above learning to promote non-performing stores whose customer demographic makeup is similar to that of the best stores

      • Use the above information to determine where to open or perhaps close stores
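The distance and trading-area steps can be sketched in a few lines. This is a minimal illustration, assuming the haversine (great-circle) formula as the distance measure, which the lecture does not specify; the store names, coordinates, and the helper names `haversine_km`, `nearest_store`, and `in_trading_area` are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_store(customer, stores):
    """Return (store_id, distance_km) for the store closest to the customer."""
    distances = (
        (store_id, haversine_km(customer[0], customer[1], lat, lon))
        for store_id, (lat, lon) in stores.items()
    )
    return min(distances, key=lambda pair: pair[1])


def in_trading_area(customer, store, radius_km):
    """True if the customer lies within radius_km of the store."""
    return haversine_km(customer[0], customer[1], store[0], store[1]) <= radius_km


# Hypothetical geocodes for two stores and one customer (not from the lecture).
stores = {"store_A": (43.65, -79.38), "store_B": (43.85, -79.02)}
customer = (43.70, -79.40)

store_id, dist = nearest_store(customer, stores)
print(store_id, round(dist, 1), in_trading_area(customer, stores[store_id], radius_km=10))
```

In practice the trading-area radius would be chosen per store; a fixed 10 km is used here only to keep the sketch short.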


Assignment 8

  • Why do we look at correlation analysis as our first statistical exercise in the data mining process?

  • It allows us to use statistics as an initial prescreening tool for eliminating variables from the data mining exercise
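As a rough sketch of correlation used as a prescreen, the snippet below computes the Pearson correlation of each candidate variable with the objective and keeps only those above a cutoff. The data, the variable names, and the 0.3 cutoff are all hypothetical; the lecture does not prescribe a threshold.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def prescreen(candidates, target, min_abs_r):
    """Keep variables whose |r| with the target is at least min_abs_r."""
    kept = {}
    for name, values in candidates.items():
        r = pearson_r(values, target)
        if abs(r) >= min_abs_r:
            kept[name] = r
    return kept


# Hypothetical data: 1 = responded, 0 = did not respond.
target = [0, 0, 1, 0, 1, 1, 1, 0]
candidates = {
    "tenure_months": [2, 5, 30, 8, 24, 36, 28, 4],
    "postal_digit": [3, 7, 5, 4, 6, 3, 7, 5],
}

kept = prescreen(candidates, target, min_abs_r=0.3)  # 0.3 is an assumed cutoff
print({name: round(r, 2) for name, r in kept.items()})
```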


Assignment 8

  • Give me an example of a correlation table of 5 variables where two variables are significant and three variables are not significant. Provide correlation values that support your results.


Recapping from last week

  • Geocoding

    • What are the key things to think of?

      • Look at the answer from two slides ago. Geocoding gives us the numbers needed to calculate the distance between two postal codes

  • More Material on correlation analysis

  • How do EDA reports tie into the correlation analysis?

    • They are trend-like reports which demonstrate why a given variable has a strong relationship with the objective function.

  • How should we present the final results of a model?

How is the above derived?

From the partial R² of each variable divided by the total R² of the equation.
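Written as a formula, the ratio just described might look like the following. The symbol names are assumptions introduced here for illustration, and the chart the slide refers to is not part of this transcript.

```latex
\text{contribution}_i \;=\; \frac{R^2_{\text{partial},\,i}}{R^2_{\text{total}}}
```

When the partial R² values sum to the total R², the contributions sum to 100%.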



Notion of Lift

  • What is Lift: the performance of a group relative to the performance of the benchmark

  • Examples:

Type of Activity                      Untargeted/   Targeted/     Lift
                                      Benchmark     Challenger

Acquisition Campaign Response Rate    1%            2%            200
Retention Campaign Churn Rate         15%           25%           167
Credit Card Loss Rate                 5%            8%            160
Product Affinity Rate                 10%           30%           300

The targeted group represents those names as determined by a data mining tool such as a predictive model.
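The lift figures in these examples follow from one ratio: the targeted rate divided by the benchmark rate, multiplied by 100. A minimal sketch, with a hypothetical helper name `lift` and results rounded to the nearest whole number (so 25% against 15% shows as 167):

```python
def lift(targeted_rate, benchmark_rate):
    """Lift index: targeted performance relative to the benchmark, times 100."""
    return round(100 * targeted_rate / benchmark_rate)


# Rates are entered as percentages, taken from the examples above.
examples = {
    "Acquisition response rate": (2, 1),
    "Retention churn rate": (25, 15),
    "Credit card loss rate": (8, 5),
    "Product affinity rate": (30, 10),
}

for activity, (targeted, benchmark) in examples.items():
    print(activity, lift(targeted, benchmark))
```

A lift of 100 means the targeted group performs no differently from the benchmark.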



Notion of Lift

  • Examples of cases where lift is below 100

Type of Activity                      Untargeted/   Targeted/     Lift
                                      Benchmark     Challenger

Acquisition Campaign Response Rate    1%            0.5%          50
Retention Campaign Churn Rate         15%           10%           67
Credit Card Loss Rate                 5%            2%            40
Product Affinity Rate                 10%           6%            60



Validating the Model: Example of a Gains Chart

  • Revenue per order is $60.

  • Cost of 1 mail piece is $0.855.

  • The benefit of modelling is the promotion cost saved: a given number of orders is achieved by promoting fewer names at a higher response rate.

  • Listed below are the hard numbers that might comprise a lift curve

% of List        Validation   Cum.     Cum. %    Cum.   Interval   Benefits
(Ranked by       Mail         Resp.    of all    Lift   ROI
Model Score)     Quantity     Rate     Resp.

0 - 10%          20,000       3.50%    23.33%    233    145%       $22,800
10 - 20%         40,000       3.00%    40%       200    75%        $34,200
20 - 30%         60,000       2.75%    55%       183    58%        $42,750
30 - 40%         80,000       2.50%    67%       167    23%        $45,600
40 - 50%         100,000      2.25%    75%       150    -12.2%     $42,750
...
90 - 100%        200,000      1.50%    100%      100    -58%       $0

How might this be plotted? In class we saw it as a straight, downward-sloping line when plotting interval response rate against the deciles; a steeper slope represents a stronger model. If we instead plot the cum. % of responders, the shape is a parabola-type curve, and a larger (more bowed) curve represents a better model.



Validating the Model: Calculating the metrics on the gains charts.

  • Cum. % of Responders in top 10%:

    • Total responders: 200,000 × 1.5% = 3,000

    • # of responders in top 10%: 20,000 × 3.5% = 700

    • Cum. % in top 10%: 700 / 3,000 = 23.33%

  • Cum. Lift in top 10%:

    • Average response rate: 1.5%

    • Cum. response rate in top 10%: 3.5%

    • Cum. lift: (3.5% / 1.5%) × 100 = 233
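The two calculations above can be reproduced with a short function. This is only a sketch; the function name `cum_metrics` and its argument names are hypothetical, while the inputs are the figures from the slide.

```python
def cum_metrics(list_size, overall_rate, cum_mailed, cum_rate):
    """Return (cum. responders, cum. % of all responders, cum. lift)."""
    total_responders = list_size * overall_rate   # 200,000 x 1.5% = 3,000
    cum_responders = cum_mailed * cum_rate        # 20,000 x 3.5% = 700
    cum_pct = 100 * cum_responders / total_responders
    cum_lift = 100 * cum_rate / overall_rate
    return cum_responders, cum_pct, cum_lift


responders, pct, cum_lift = cum_metrics(
    list_size=200_000, overall_rate=0.015, cum_mailed=20_000, cum_rate=0.035
)
print(round(responders), round(pct, 2), round(cum_lift))
```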


Calculating the metrics on the gains charts

  • Interval ROI in 10%-20%:

    • # of persons mailed: 20,000

    • # of responders in 10%-20%: (40% - 23.33%) × 3,000 = 500

    • Costs: 20,000 × $0.855 = $17,100

    • Net revenue: (500 × $60) - $17,100 = $12,900

    • ROI: 12,900 / 17,100 = 75%

  • Calculating the Benefits column at 30%:

    • Mailing cost to achieve 1,650 responders without modelling:

      • ((0.0275 × 60,000) / 0.015) × $0.855 = $94,050

    • Mailing cost with modelling: 60,000 × $0.855 = $51,300

    • Benefits: $94,050 - $51,300 = $42,750
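The same arithmetic, expressed as two small functions. The function and constant names are hypothetical; the revenue, mail cost, and overall response rate are the figures used in this example.

```python
REVENUE_PER_ORDER = 60.0   # dollars per order
COST_PER_PIECE = 0.855     # dollars per mail piece
OVERALL_RATE = 0.015       # response rate of the whole list (no model)


def interval_roi(interval_mailed, interval_responders):
    """Net revenue of one interval divided by its mailing cost."""
    cost = interval_mailed * COST_PER_PIECE
    net_revenue = interval_responders * REVENUE_PER_ORDER - cost
    return net_revenue / cost


def modelling_benefit(cum_mailed, cum_rate):
    """Mailing cost saved by reaching the same responders with fewer names."""
    responders = cum_mailed * cum_rate
    names_needed_without_model = responders / OVERALL_RATE
    return (names_needed_without_model - cum_mailed) * COST_PER_PIECE


roi = interval_roi(interval_mailed=20_000, interval_responders=500)
benefit = modelling_benefit(cum_mailed=60_000, cum_rate=0.0275)
print(f"ROI {roi:.0%}, benefit ${benefit:,.0f}")
```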


Gains Chart Examples

Cum. # of       Cum. Response   Interval     Interval   Benefits   Interval
Names Mailed    Rate            Resp. Rate   Lift                  ROI

10,000          2.50%           2.5%         250        $15,000    25%
20,000          2.25%           2.0%         200        $25,000    0%
30,000          2.10%           1.8%         180        $33,000    -10%
40,000          1.80%           0.9%         90         $32,000    -55%
...
100,000         1%

Assume a mail cost of $1.00 per piece and a revenue per order of $50.00.

Interval Resp. Rate

10,000 × 0.025 = 250 responders, i.e. 2.5%

20,000*0.

Please fill in the blanks for the first 4 rows.
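One way to fill in the first four rows is to derive every interval figure from the cumulative columns. The sketch below assumes the overall response rate is the 1% shown for the full 100,000 names and uses the stated $1.00 mail cost and $50.00 revenue per order; the function and field names are hypothetical.

```python
MAIL_COST = 1.00   # dollars per piece
REVENUE = 50.00    # dollars per order


def gains_rows(cum_rows, overall_rate):
    """cum_rows: list of (cum. names mailed, cum. response rate) pairs."""
    rows = []
    prev_names, prev_responders = 0, 0.0
    for cum_names, cum_rate in cum_rows:
        cum_responders = cum_names * cum_rate
        interval_names = cum_names - prev_names
        interval_responders = cum_responders - prev_responders
        interval_rate = interval_responders / interval_names
        interval_cost = interval_names * MAIL_COST
        rows.append({
            "cum_names": cum_names,
            "interval_rate": interval_rate,
            "interval_lift": 100 * interval_rate / overall_rate,
            # Cost of mailing enough unmodelled names to get the same responders,
            # minus the cost actually incurred with the model.
            "benefits": (cum_responders / overall_rate - cum_names) * MAIL_COST,
            "interval_roi": (interval_responders * REVENUE - interval_cost) / interval_cost,
        })
        prev_names, prev_responders = cum_names, cum_responders
    return rows


rows = gains_rows(
    [(10_000, 0.025), (20_000, 0.0225), (30_000, 0.021), (40_000, 0.018)],
    overall_rate=0.01,
)
for r in rows:
    print(r["cum_names"], f"{r['interval_rate']:.1%}", round(r["interval_lift"]),
          round(r["benefits"]), round(100 * r["interval_roi"]))
```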



Lift Curve with Zero Model Effectiveness

What does this look like if we plot it on a lift curve?

A straight line rather than a parabola if we plot the cum. % of responders.



Gains Chart Examples

What is the best model? Model 1

What is the worst model? Model 4

What are the Model 3 results telling you? We have some rank ordering all the way down to 70,000 names, and then the model flattens out. We may need a separate strategy for this bottom segment.



Gains Chart Examples

  • In each response model case, answer the following questions:

  • Where would your cutoff be with a budget of $80,000 and a cost per piece of $2.00?

    • 40,000 names

  • Where would your cutoff be if you needed to attain a forecasted order quantity of 350?

    • Between 10,000 and 20,000 names for Models 1 and 2, between 20,000 and 30,000 for Model 3, and between 30,000 and 40,000 for Model 4

  • Where would your optimum cutoff be, presuming that neither budget nor forecasted order quantities were constraints?

    • 50,000 names for Models 1 and 2, and 60,000 for Model 3; it does not matter for Model 4



Gains Chart Examples

  • Calculate the following: interval names mailed and cum. response rate

  • Assuming a cost per name of $1.50 and revenue per responder of $75, calculate the interval ROI and the modelling benefits for each interval.



Tracking of Models

  • Two models are used in two campaigns. In campaign A, the overall response rate is 3.5%, which is above the breakeven response rate of 2%. In campaign B, the overall response rate is 1.2%, which is below the breakeven response rate of 2%. Yet the model in campaign B is more effective. Explain why.

The model rank-orders names quite well in campaign B (1.2% overall), while campaign A, the better campaign overall (3.5%), exhibits no rank ordering of response rate between deciles.



CHAID

  • "CHAID" is an acronym for Chi-square Automatic Interaction Detection

  • Produces a decision-tree-like report

    • Branches and nodes

  • Non-parametric approach

    • Output of the routine is a segment or group, as opposed to a score

  • Uses chi-square statistics to determine statistically significant breaks

  • Conceptual interpretation: (Observed - Expected) / Expected; the chi-square statistic itself sums (Observed - Expected)² / Expected over all cells
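To make the observed-versus-expected idea concrete, the sketch below computes the Pearson chi-square statistic for one candidate split. The counts are hypothetical, and this covers only the test of a single split; a full CHAID routine involves further steps (such as merging categories) that are not shown here.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if segment and response were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    return statistic


# Hypothetical split: rows are two segments, columns are (responders, non-responders).
observed = [[30, 970], [10, 990]]
stat = chi_square(observed)
print(round(stat, 2))
```

With one degree of freedom, a statistic above 3.84 is significant at the 5% level, so a split like this hypothetical one would count as a statistically significant break.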



CHAID

What criteria determine the end nodes?

