Predictive methods

Understanding customer preferences

  • Introduction to predictive analytics
    • Logistic regression
  • Case study: Japanese car manufacturer exporting to the US
  • Modelling interdependent consumer preferences
  • Causality estimation
    • Propensity scores to estimate effectiveness of marketing interventions of a pharmaceutical company
  • Web Analytics
Predictive methods for marketing
  • Predictive methods exploit patterns found in historical data to estimate the probability that a given individual will make a particular decision
  • Three main categories:
    • Scoring models: rank customers by their probability of making a decision
    • Descriptive models: categorize customers by their preferences and lifestyle
    • Decision models: describe the relationship between all the elements of a decision
Predictive methods applications
  • Marketing planning and campaign optimization
  • Customer relationship management
  • Market basket analysis
  • Customer retention
  • Direct marketing
  • Fraud detection
  • Web click stream analysis
Questions to be answered
  • What is the probability that a given customer will purchase a product?
  • How can I categorize the customer base into homogeneous groups?
  • Which potential customer should a promotion be offered to?
  • Which website should I advertise on?
  • Which search keywords should I invest in?
Logistic regression
  • Predicts the probability that an event occurs by fitting data to a logistic curve
  • Uses several explanatory variables, which can be either numerical or categorical
  • Predicts dichotomous outcomes (0 = event doesn’t occur, 1 = event does occur) or categorical outcomes (0 = event a occurs, 1 = event b occurs, 2 = event c occurs)
Logistic regression
  • Logistic regression
    • Linear relation between the predictors and the logit (log-odds) of the event probability
    • P(Yi=1) is the probability that the event occurs
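The formula itself did not survive in the transcript. In its standard textbook form, the model can be written as below; the symbols β0…βk (coefficients) and x_i1…x_ik (predictor values for individual i) are notation assumed here, not taken from the slide.

```latex
\[
\ln\frac{P(Y_i = 1)}{1 - P(Y_i = 1)} = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}
\qquad\Longleftrightarrow\qquad
P(Y_i = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik})}}
\]
```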
Logistic regression
  • Logistic curve
    • Input values: any real number
    • Output values: between 0 and 1
  • A sample of PhD students is asked whether to stop a research project that an animal rights group considers unethical
    • Dependent variable: student’s answer (0 = “Stop research”, 1 = “Continue research”)
    • Explanatory variable: gender (0 = Female, 1 = Male)
Model summary
  • -2 log-likelihood statistic: the smaller the value, the better the model
  • Cox & Snell and Nagelkerke R-squared: the higher the value, the better the model
Model summary
  • The model predicts a 30% probability that a woman decides to continue the research, against 60% for a man
  • The odds of deciding to continue the research are 3.4 times higher for men than for women
  • Classify subjects according to the decision we expect them to make, e.g. predict that men will continue the research and women will stop
    • 66% correct predictions, false positive rate 41%, false negative rate 30%
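The link between the two probabilities and the odds ratio can be checked with a few lines of arithmetic. The sketch below uses the rounded percentages quoted on the slide, which give a ratio of about 3.5; the slide's figure of 3.4 would be consistent with unrounded fitted probabilities, but that is an assumption, since the fitted values are not shown.

```python
def odds(p: float) -> float:
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_a: float, p_b: float) -> float:
    """How many times higher the odds are in group a than in group b."""
    return odds(p_a) / odds(p_b)


# Rounded probabilities of "Continue research" as quoted on the slide.
P_WOMEN = 0.30
P_MEN = 0.60

ratio = odds_ratio(P_MEN, P_WOMEN)  # (0.6 / 0.4) / (0.3 / 0.7), about 3.5
```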
Case study: a Japanese car manufacturer exporting to the USA
  • Modelling consumer preferences
    • What drives US consumers to purchase a Japanese car rather than a non-Japanese one?
    • What is the probability that an individual will buy a Japanese car?
    • Which people should be targeted?
    • Are consumer preferences interdependent?
The data
  • Purchase of mid-sized cars (1 = Japanese, 0 = non-Japanese)
    • Difference in price (k$)
    • Difference in options (k$)
    • Age of buyer (years)
    • Annual income of buyer ($)
    • Ethnic origin (1 = Asian, 0 = non-Asian)
    • Education (1 = College, 0 = below College)
    • Latitude & Longitude
Interpreting the model’s output
  • Young and Asian people from the south-west of the city are more likely to buy a Japanese car
  • Does the price coefficient make sense?
  • People with higher education are less likely to buy a Japanese car
Model diagnostics

The model is very strong at predicting Japanese car purchases but weak at predicting non-Japanese car purchases

Model diagnostics

Area under the curve:


Gini coefficient = 0.5
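Assuming the Gini coefficient here is the one commonly used for scoring models, Gini = 2 × AUC − 1, a Gini of 0.5 corresponds to an area under the curve of 0.75. The sketch below computes both quantities from predicted scores; the scores and labels are made-up illustration data, not the case-study data.

```python
def auc(scores, labels):
    """Area under the ROC curve: the share of (positive, negative) pairs in
    which the positive case gets the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(auc_value: float) -> float:
    """Gini coefficient under the assumed definition 2 * AUC - 1."""
    return 2.0 * auc_value - 1.0


# Hypothetical predicted probabilities and outcomes (1 = Japanese car bought).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
example_auc = auc(scores, labels)
example_gini = gini(example_auc)
```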

Modelling interdependent consumer preferences
  • An individual’s preference can be influenced by the preferences of others
    • Psychological benefits
    • Social identification
  • People who identify with a particular group often adopt the preferences of the group
  • Incorporate these dependencies into the model
Looking at the model’s residuals

The residuals represent what is not explained by the model

[Figure: residuals, Group 1]

Looking at the model’s residuals
  • The presence of interdependent networks creates mutually dependent preferences, resulting in a covariance matrix with non-zero off-diagonal elements
  • Residuals of people belonging to the same group are positively correlated
    • Correlation(residual of person 1, residual of person 2) > 0
Looking at the model’s residuals
  • Create groups by splitting individuals into sets of neighbours
    • Age (16-25, 26-40, over 40)
    • Demographic (combination of age, education and ethnic origin)
    • Geographic influence (postal code)
  • Analyze the average residual for each group
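The grouping step above can be sketched as follows. The residual is taken here as observed outcome minus predicted probability, which is one common choice and an assumption about what the slides used; the records and field names are hypothetical.

```python
from collections import defaultdict


def age_band(age: int) -> str:
    """Assign the age bands listed on the slide (buyers are assumed to be 16+)."""
    if age <= 25:
        return "16-25"
    if age <= 40:
        return "26-40"
    return "over 40"


def mean_residual_by_group(records, key):
    """Average (observed - predicted) residual for each group returned by key()."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        group = key(r)
        sums[group] += r["bought_japanese"] - r["predicted_prob"]
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical buyers with the model's predicted purchase probability.
records = [
    {"age": 22, "bought_japanese": 1, "predicted_prob": 0.40},
    {"age": 24, "bought_japanese": 1, "predicted_prob": 0.50},
    {"age": 35, "bought_japanese": 0, "predicted_prob": 0.30},
    {"age": 38, "bought_japanese": 1, "predicted_prob": 0.70},
    {"age": 55, "bought_japanese": 0, "predicted_prob": 0.20},
]
by_age = mean_residual_by_group(records, lambda r: age_band(r["age"]))
```

A group whose average residual is clearly above or below zero is one where the model systematically under- or over-predicts, which is the signal of a shared group preference.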
Including customer interdependences

Add a group dummy that equals 1 if the individual belongs to the group and 0 otherwise

Example group dummy: Asian, aged 26-40
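One way to build such a group dummy is sketched below for the "Asian, 26-40" group. The field names and the sample people are hypothetical; the resulting 0/1 column would then be added to the logistic regression as an extra explanatory variable.

```python
def group_dummy(person, asian: int, min_age: int, max_age: int) -> int:
    """Return 1 if the person belongs to the group, 0 otherwise."""
    in_group = person["asian"] == asian and min_age <= person["age"] <= max_age
    return 1 if in_group else 0


# Hypothetical individuals (asian: 1 = Asian, 0 = non-Asian).
people = [
    {"asian": 1, "age": 30},
    {"asian": 1, "age": 45},
    {"asian": 0, "age": 30},
    {"asian": 1, "age": 26},
]
asian_26_40 = [group_dummy(p, asian=1, min_age=26, max_age=40) for p in people]
```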

Spurious statistics
  • A high correlation between sales and TV advertising could mean:
    • either media causes sales
    • or sales cause media
    • or a third variable causes both sales and TV advertising

What is the truth?
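The third case can be illustrated with a small simulation. In this sketch a hypothetical third variable, called `season` here, drives both TV spend and sales, and neither affects the other; all numbers are simulated, not real data.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible
season = [rng.gauss(0, 1) for _ in range(2000)]  # hypothetical common cause
tv = [s + rng.gauss(0, 0.5) for s in season]     # TV spend depends only on season
sales = [s + rng.gauss(0, 0.5) for s in season]  # sales depend only on season

# Strong correlation even though TV has no effect on sales in this simulation.
r_raw = pearson(tv, sales)

# Once the common cause is subtracted out, the correlation vanishes.
r_adjusted = pearson(
    [t - s for t, s in zip(tv, season)],
    [y - s for y, s in zip(sales, season)],
)
```

In real data the common cause is usually not known exactly, which is why a correlation alone cannot settle which of the three explanations holds.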

Taking causality seriously
  • Using least squares regression and data mining to estimate causal effects can lead to unreliable results:
    • If Ferraris are polished more often than Jeeps, a naive analysis may conclude that polishing is what makes Ferraris win more races than Jeeps
  • Propensity scores to estimate the causal effects of marketing interventions
Propensity scores
  • A pharmaceutical company wants to promote a lifestyle drug and evaluate the market
  • The aim is to rank a list of doctors according to their likelihood of prescribing a certain drug
  • Marketing interventions:
    • Visiting a doctor to describe the drug
    • Taking the doctor to dinner at a nice restaurant
    • Offering free samples of the drug
Impact of marketing interventions
  • The marketing interventions are designed to increase the number of prescriptions written by the doctors
  • But how can we quantify the number of prescriptions generated by the intervention?
    • Compare the number of prescriptions written after being visited with the number of prescriptions that would have been written without the intervention
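This comparison is often written in potential-outcomes notation. The symbols below are assumed for illustration and do not appear on the slides: Y_i(1) is the number of prescriptions doctor i writes if visited, and Y_i(0) the number the same doctor writes if not visited.

```latex
\[
\tau_i = Y_i(1) - Y_i(0),
\qquad
\bar{\tau} = \frac{1}{N}\sum_{i=1}^{N} \bigl( Y_i(1) - Y_i(0) \bigr)
\]
```

Here \(\tau_i\) is the causal effect of the visit on doctor i and \(\bar{\tau}\) the average effect over N doctors. For any single doctor only one of the two terms is ever observed.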
Prediction vs causal estimation

[Figure: Doctor A, with the values 15 scripts, 10 scripts and 15 scripts, and the label “No visit”]



Prediction vs causal estimation

[Figure: Doctor B, with the values 5 scripts, 1 script and 2 scripts, and the label “No visit”]



Propensity scores
  • We should make investment decisions by comparing the expected returns with and without the investment
  • Stumbling block: lack of data
    • For a doctor who is visited by the salesperson, the number of scripts after the visit is measurable, but we cannot measure the number of scripts that would have been written if the doctor had not been visited
How do we do it?
  • Finding clones: create matched pairs of doctors where one member of the pair has been exposed to the intervention and the other has not. The doctors must be “identical” or very similar before the time of exposure
  • Clones are found through the propensity score method
The data
  • 250,000 doctors, and for each one:
    • Number of prescriptions written at time 1
    • Number of prescriptions written at time 2
    • Was the doctor visited by the salesperson between time 1 and time 2? (Y/N)
    • Doctor’s characteristics: specialty, region, date of degree and more than a hundred other such factors
The data

[Table: records for Doctors 1 to 4]

Cloning by propensity score
  • Propensity score (Doctor A) = predicted probability from a logistic regression
  • Dependent variable: 1 = “Visited by salesperson”, 0 = “Not visited”
  • Factors: the doctor’s characteristics
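Once every doctor has a propensity score, "clones" can be paired up. The sketch below is one simple variant, greedy one-to-one nearest-neighbour matching with a caliper, and is not necessarily the procedure used in the study. The scores are assumed to be already estimated, the effect is measured as the difference in the change in scripts between time 1 and time 2 (an assumption), and all records, field names and the `caliper` value are hypothetical.

```python
def match_pairs(doctors, caliper=0.05):
    """Pair each visited doctor with the closest unused non-visited doctor
    whose propensity score lies within `caliper`."""
    treated = [d for d in doctors if d["visited"] == 1]
    controls = [d for d in doctors if d["visited"] == 0]
    used = set()
    pairs = []
    for t in sorted(treated, key=lambda d: d["score"]):
        best = None  # (score gap, index into controls)
        for i, c in enumerate(controls):
            if i in used:
                continue
            gap = abs(t["score"] - c["score"])
            if gap <= caliper and (best is None or gap < best[0]):
                best = (gap, i)
        if best is not None:
            used.add(best[1])
            pairs.append((t, controls[best[1]]))
    return pairs


def average_effect(pairs):
    """Mean difference in script growth between visited doctors and their clones."""
    if not pairs:
        raise ValueError("no matched pairs")
    diffs = [
        (t["scripts_t2"] - t["scripts_t1"]) - (c["scripts_t2"] - c["scripts_t1"])
        for t, c in pairs
    ]
    return sum(diffs) / len(diffs)


# Hypothetical doctors with pre-computed propensity scores.
doctors = [
    {"id": 1, "visited": 1, "score": 0.80, "scripts_t1": 10, "scripts_t2": 15},
    {"id": 2, "visited": 0, "score": 0.78, "scripts_t1": 10, "scripts_t2": 14},
    {"id": 3, "visited": 1, "score": 0.30, "scripts_t1": 1, "scripts_t2": 5},
    {"id": 4, "visited": 0, "score": 0.32, "scripts_t1": 1, "scripts_t2": 2},
    {"id": 5, "visited": 0, "score": 0.55, "scripts_t1": 4, "scripts_t2": 4},
]
pairs = match_pairs(doctors)
effect = average_effect(pairs)
```

Doctors without a sufficiently close clone (doctor 5 here) are left unmatched and do not contribute to the estimate.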
The data

[Table: records for Doctors 1 to 4]

What’s the causal effect?

Campaign execution
  • Doctors are ranked according to their estimated causal increase in prescriptions
  • The top percentage of doctors on the list are then contacted and/or offered free samples
    • Priority is given to the doctors who have not been contacted yet
  • The percentage is chosen according to the company’s budget
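The selection rule above can be sketched as follows. This is one possible reading of the slide: doctors are ranked by estimated uplift, the top fraction is kept, and within that shortlist the doctors not yet contacted come first. The field names, the `fraction` parameter and the sample records are hypothetical.

```python
import math


def select_doctors(doctors, fraction):
    """Return the contact list: top `fraction` of doctors by estimated uplift,
    with doctors who have not been contacted yet placed first."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(doctors, key=lambda d: d["uplift"], reverse=True)
    shortlist = ranked[: math.floor(len(ranked) * fraction)]
    # sorted() is stable, so the uplift order is preserved within each group;
    # False (not contacted) sorts before True (already contacted).
    return sorted(shortlist, key=lambda d: d["already_contacted"])


# Hypothetical estimated causal increases in prescriptions per doctor.
candidates = [
    {"id": "D1", "uplift": 0.0, "already_contacted": False},
    {"id": "D2", "uplift": 3.0, "already_contacted": True},
    {"id": "D3", "uplift": 2.0, "already_contacted": False},
    {"id": "D4", "uplift": 1.0, "already_contacted": False},
]
contact_list = select_doctors(candidates, fraction=0.5)
```

In practice `fraction` would be set from the budget, for example the budget divided by the cost per contact and by the number of doctors.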
Propensity scores in e-commerce
  • Online store with a membership database
    • Estimate the effectiveness of promotional mailings and identify people to be targeted
  • Calculate the ROI of a free shipping initiative
    • Individuals receive free shipping but pay an annual fee for it
  • A pharmaceutical company offering a coupon on its website to encourage trial use of a drug
Purchase path

[Figure: attribution approaches]
  • Custom Attribution Algorithms
  • Mathematical Attribution Models
  • Rules Based Attribution
  • Purchase Path™
  • Even Attribution
  • Last Click
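Two of the approaches named above have simple, widely used definitions: last click gives all the credit for a purchase to the final touchpoint, and even attribution splits it equally across every touchpoint on the path. The sketch below implements those generic definitions; it is not the Purchase Path™ product's algorithm, and the example path is hypothetical.

```python
def last_click(path):
    """Give 100% of the credit to the last touchpoint before the purchase."""
    if not path:
        raise ValueError("path must contain at least one touchpoint")
    credit = {channel: 0.0 for channel in path}
    credit[path[-1]] = 1.0
    return credit


def even_attribution(path):
    """Split the credit equally across all touchpoints on the path."""
    if not path:
        raise ValueError("path must contain at least one touchpoint")
    share = 1.0 / len(path)
    credit = {}
    for channel in path:
        credit[channel] = credit.get(channel, 0.0) + share
    return credit


# Hypothetical path of touchpoints leading to one purchase.
path = ["Banner Ads", "Paid Search", "e-mail", "Paid Search"]
credit_last = last_click(path)
credit_even = even_attribution(path)
```

On this path last click credits only Paid Search, while even attribution gives Paid Search half the credit and a quarter each to Banner Ads and e-mail.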

Purchase path

In this example we drilled into the AdWords > AdWords path to see the specific ads that were clicked on en route to purchase.


To further increase the accuracy of attribution, an advertiser is able to choose the maximum log window.

Consumer decision
  • Build a model to predict consumer decisions
    • Using data on influencers that we are able to track and measure
    • Representing influencers that we can’t yet track and measure (our uncertainty) through a statistical distribution
  • Calibrate the model on observed consumer decisions
    • Purchase (yes/no), purchase size (dollar volume, number of units)
    • Repeat purchases, word of mouth
Decision model

The consumer’s decision is a function of our communications, consumer search, competitor communications and other sources
  • Paid search, banner ads, e-mail, on-site promotions, comparison shopping, affiliate ads
  • Site visits to us


Data availability

With attribution models, we are barely scratching the surface of the potential of path data!
