DSCI 4520/5240 (DATA MINING)

DSCI 4520/5240 Lecture 6

Regression Modeling

Some slide material taken from: SAS Education

Objectives
  • Overview of Linear Regression Models
  • The Stepwise Procedure
  • Overview of Logistic Regression Models
  • Interpretation of Logistic Regression coefficients
DATA MINING AT WORK: Telstra Mobile Combats Churn with SAS®

As Australia's largest mobile service provider, Telstra Mobile is reliant on highly effective churn management.

In most industries the cost of retaining a customer, subscriber or client is substantially less than the initial cost of obtaining that customer. Protecting this investment is the essence of churn management. It really boils down to understanding customers -- what they want now and what they're likely to want in the future, according to SAS.

"With SAS Enterprise Miner we can examine customer behaviour on historical and predictive levels, which can then show us what 'group' of customers are likely to churn and the causes," says Trish Berendsen, Telstra Mobile's head of Customer Relationship Management (CRM).

Data Mining in the telecom industry: RingaLing Telecom

Until recently, RingaLing, a large public telecommunications company, held the monopoly for the entire telecommunications market. Now privatized and without the advantages of the monopolistic situation, competition is coming from consortiums of foreign denationalized companies, new entrants, and cable companies who are offering very tempting proposals to consumers.

RingaLing is losing 40,000 customers every month, and only winning a few of those customers back. They are painfully aware that the cost of keeping an existing customer can be up to ten times lower than the cost of acquiring a new one. They desperately need a cost-effective way of decreasing their customer churn rate.

Data Mining in the telecom industry: RingaLing Telecom

The CEO of RingaLing is worried by his company's falling share price and the rate at which customers are leaving. Despite substantial general price reductions, loyal customers are leaving by the thousands. The CEO gives the marketing director six months to bring the situation under control.

Subsequently, Martin Miner, a young and promising marketing analyst, is summoned to the Marketing Director’s office and asked to solve this problem. Working with the IT department, Martin explains that he needs a way to access and analyze all the company data.

Getting the lines crossed -- the difficulty of data access

RingaLing has over 50 million customer files in addition to billions of call records, and data from both the customer service and the billing departments. They also have some competitive information, including competitor pricing policies and market share by area. The data resides in different offices across the globe, in 12 different file formats, and on seven different platforms.

Martin decides the first step is the development of a SAS data warehouse, enabling him to have access to all the data he needs in one place. Using SAS Institute's Rapid Warehousing Methodology, this takes only a matter of weeks. Thanks to the data warehouse, the quality and consistency of the data is much better. All the data is summarized and grouped in a way that makes it easy to get a singular view of individual RingaLing customers. Even if a customer has multiple accounts, for example a mobile phone as well as a fixed phone, the database is smart enough to know that this is one customer instead of two.

The Data Mining Process

Now, Martin is ready to start mining.

Initially, he is interested in the probability that a given customer will cancel their contract within the next year and the controllable variables that might influence them. If he knows this, he will be able to manage the churn rate (the rate at which customers cancel their subscriptions).

A sample of the data is taken using the sampling capabilities of SAS software. This ensures that the 10% sample accurately represents the customer base as a whole.

Initially, Martin plots the probability that a customer will leave over the next year, and finds it to be fairly consistent and somewhat depressing. Using geographical visualization he is able to highlight certain areas where the customers are most likely to leave, which appear to be around certain major cities. He then decides to explore the data using a 3D scatter plot to see the relation between the size of bill, area they live in, and likelihood to leave. He notes that most of the customers at risk for leaving tend to be those who have the highest and the lowest bills.

The Data Mining Process

Martin decides to integrate some more data. First, he looks at his company's competitors and areas in which they provide services, as well as the range of services provided (e.g. business, domestic, international). He then looks at their pricing policy and product details.

Martin now uses several data mining techniques to model this information so he can predict whether a customer is likely to leave or not. He uses decision trees to eliminate variables which are not important, and surprisingly finds that income plays a much less important role than he would have expected. Having identified several key variables, he then uses neural networks to build a model which will predict whether a person is likely to leave or not, given their characteristics.

Following this, he identifies that the people who are likely to leave are typically either those who have very large bills or very low ones. It appears that those who make international calls are more likely to leave. Another point is that people who make frequent calls to the same numbers are more prone to leaving.

At this stage he decides to review his findings and concludes that he needs to introduce more data on the pricing policies of the competitors for different types of products.

The Data Mining Process
  • Martin then creates a new model and suggests the following strategies:
  • Special tariff for frequent international callers, enabling them to pay a lower cost per time unit.
  • Low-usage tariffs, giving a lower fixed price for the line rental and then possibly higher costs for the actual calls.
  • High-usage tariffs, giving a higher fixed price for line rental and lower costs for the actual calls.
  • Special prices for frequently called numbers.

Following the implementation of these strategies, there was a drastic reduction in the number of customers leaving. After only three months, customer churn fell from 40,000 to only 10,000. On top of this, they were able to target products to customers who fit certain optimal profiles. This resulted in a gain of another 20,000 customers a month.

Martin Miner has played a key role in this process and is rewarded with a large bonus and pay raise. He is subsequently promoted, and it becomes obvious that the marketing director is grooming him to become his replacement.

Simple Linear Regression

Model with one predictor variable:

Y = b0 + b1X + residual

Here b0 + b1X is the explained part of Y, and the residual is the unexplained part of Y (error). The fitted line is a straight line, since the model is of 1st order:

Ŷ = b0 + b1X
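As a quick sketch (not from the slides; NumPy and made-up data are assumed), b0 and b1 can be estimated by ordinary least squares:

```python
import numpy as np

# Hypothetical toy data: Y roughly follows 2 + 3*X plus noise
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)
Y = 2 + 3 * X + rng.normal(scale=1.0, size=X.size)

# Design matrix [1, X] so that lstsq returns [b0, b1]
A = np.column_stack([np.ones_like(X), X])
(b0, b1), *_ = np.linalg.lstsq(A, Y, rcond=None)

Y_hat = b0 + b1 * X      # explained part of Y
residuals = Y - Y_hat    # unexplained part of Y (error)
print(round(b0, 2), round(b1, 2))
```

With this setup the estimates land close to the assumed true values 2 and 3.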

Quadratic Regression

Quadratic Regression model:

Y = b0 + b1X + b2X²

Polynomial Regression

3rd-order Regression model:

Y = b0 + b1X + b2X² + b3X³
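The same least-squares machinery handles polynomial terms; a minimal sketch with NumPy's polyfit on hypothetical, noiseless cubic data:

```python
import numpy as np

# Hypothetical cubic data: Y = 1 - 2X + 0.5X^2 + 0.1X^3 (no noise)
X = np.linspace(-3, 3, 30)
Y = 1 - 2 * X + 0.5 * X**2 + 0.1 * X**3

# polyfit returns coefficients highest order first: [b3, b2, b1, b0]
b3, b2, b1, b0 = np.polyfit(X, Y, deg=3)
print(float(round(b0, 3)), float(round(b1, 3)),
      float(round(b2, 3)), float(round(b3, 3)))
```

Because the data are noiseless, the fit recovers the assumed coefficients essentially exactly.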

Indicator (Dummy) Variables

Dummy, or indicator, variables allow for the inclusion of qualitative variables in the model. For example:

I1 = 1 if female, 0 if male

Indicator (Dummy) Variables

Model with Indicator variable:

Y = b0 + b1X + b2I

Rewrite the model as:
  • For I = 0, Y = b0 + b1X
  • For I = 1, Y = (b0 + b2) + b1X

Indicator Variables with Interaction

Model with Indicator variable and interaction term:

Y = b0 + b1X + b2I + b3XI

Rewrite the model as:
  • For I = 0, Y = b0 + b1X
  • For I = 1, Y = (b0 + b2) + (b1 + b3)X
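A sketch of how the indicator and its interaction enter the design matrix (hypothetical data and NumPy assumed; not from the slides):

```python
import numpy as np

# Hypothetical data: X plus a 0/1 indicator I (e.g. 1 if female, 0 if male)
rng = np.random.default_rng(1)
n = 200
X = rng.uniform(0, 10, n)
I = rng.integers(0, 2, n)

# Assumed true model: different intercept and slope per group
Y = 1 + 2 * X + 3 * I + 0.5 * X * I + rng.normal(scale=0.5, size=n)

# Design matrix with intercept, X, indicator I, and interaction X*I
A = np.column_stack([np.ones(n), X, I, X * I])
b0, b1, b2, b3 = np.linalg.lstsq(A, Y, rcond=None)[0]

# For I = 0: intercept b0, slope b1; for I = 1: intercept b0+b2, slope b1+b3
print(round(b0 + b2, 2), round(b1 + b3, 2))
```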

Hypothesis Test on the Slope of the Regression Line

Two-Tailed Test

H0: β1 = 0 (X provides no information)
Ha: β1 ≠ 0 (X does provide information)

Test Statistic: t = b1 / s_b1, where s_b1 is the standard error of the slope estimate b1.

For large data sets, reject H0 if |t| > 2
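The test statistic can be computed directly; a minimal sketch with NumPy on simulated data (the formula for s_b1 below is the usual OLS standard error, an assumption not spelled out on the slide):

```python
import numpy as np

# Hypothetical simple regression with a genuinely nonzero slope
rng = np.random.default_rng(2)
n = 100
X = rng.uniform(0, 10, n)
Y = 4 + 1.5 * X + rng.normal(scale=2.0, size=n)

A = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(A, Y, rcond=None)
resid = Y - A @ b
s2 = resid @ resid / (n - 2)           # residual variance estimate
cov = s2 * np.linalg.inv(A.T @ A)      # covariance matrix of [b0, b1]
t = b[1] / np.sqrt(cov[1, 1])          # t statistic for the slope

print(abs(t) > 2)  # large-sample rule of thumb: reject H0 when |t| > 2
```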

Model Assumptions and Residual Analysis

The residuals Y − Ŷ should have:
  • Randomness
  • Constant Variance
  • Normal Distribution

These assumptions are checked with a residual plot of Y − Ŷ against Ŷ.

Residual Analysis
  • Violation of the constant variance assumption
  • How to fix it: Transformation

(Residual plot: Y − Ŷ versus Ŷ.)

Residual Analysis
  • Violation of the randomness assumption
  • How to fix it: Add more predictor variables to explain patterns.
    • In time series data, add lags of Y or X as predictors: Yt−1, X1,t−1, X1,t−2, X2,t−1, etc.

(Sequence plot of the residuals Y − Ŷ.)
Residual Analysis

Frequency Histogram of the residuals

  • Violation of the normality assumption
  • How to fix it: Transformation (start with easy transformations, such as Log(Y), then continue with bucket transformations, etc.)
Linear versus Logistic Regression

Linear Regression:
  • Target is an interval variable.
  • Input variables have any measurement level.
  • Predicted values are the mean of the target variable at the given values of the input variables.

Logistic Regression:
  • Target is a discrete (binary or ordinal) variable.
  • Input variables have any measurement level.
  • Predicted values are the probability of a particular level(s) of the target variable at the given values of the input variables.

Parametric Models

Generalized Linear Model, fit on training data:

E(Y | X = x) = g(x; w) = w0 + w1x1 + … + wpxp

For a binary target, the expected value is a probability p(x), modeled through the inverse of a link function g:

E(Y | X = x) = p(x) = g⁻¹(w0 + w1x1 + … + wpxp)

Logistic Regression Models: the logit transformation

logit(p) = log( p / (1 − p) ) = log(odds) = w0 + w1x1 + … + wpxp

Inverting gives the probability p = g⁻¹(w0 + w1x1 + … + wpxp), which always lies between 0.0 and 1.0. (Fit on training data.)
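A minimal sketch of the logit and its inverse in plain Python (the function names are my own):

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Inverse logit (logistic sigmoid): maps any real z into (0, 1)."""
    return 1 / (1 + math.exp(-z))

# The two functions are inverses of each other
p = 0.8
z = logit(p)  # log(0.8 / 0.2) = log(4) ≈ 1.386
print(round(z, 3), round(inv_logit(z), 3))
```

Note that logit(0.5) = 0: even odds correspond to a log-odds of zero.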

Changing the Odds

Starting from

log( p / (1 − p) ) = w0 + w1x1 + … + wpxp,

increase x1 by one unit:

log( p / (1 − p) ) = w0 + w1(x1 + 1) + … + wpxp = w1 + w0 + w1x1 + … + wpxp

so the odds get multiplied by exp(w1), the odds ratio. (Fit on training data.)

Logistic Regression Assumption

Assumption: The logit transformation of the probabilities of the target value results in a linear relationship with the input variables.



Interpretation of Parameter Estimates

The logit link function provides the most natural interpretation of the estimated coefficients:

  • The odds of a reference event is the ratio of P(event) to P(not event). The estimated coefficient of a predictor is the estimated change in the log of P(event)/P(not event) for each unit change in the predictor, assuming the other predictors remain constant.
  • Therefore, the odds ratio coefficients are multipliers that modify the odds P(event)/P(not event) when a certain predictor variable is increased by one unit.
Stepwise Procedures

These procedures either choose or eliminate variables, one at a time, in an effort to avoid including variables that either have no predictive ability or are highly correlated with other predictor variables.

  • Forward selection: Add one variable at a time until the next contribution is insignificant
  • Backward elimination: Remove one variable at a time, starting with the “worst,” until R2 drops significantly
  • Stepwise selection: Forward selection with the ability to remove variables that become insignificant
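A toy sketch of forward selection (my own simplified implementation, not SAS's; it uses a normal approximation in place of the exact t distribution for the p-values):

```python
import numpy as np

def forward_select(X, y, p_enter=0.05):
    """Greedy forward selection: at each step, add the candidate predictor
    whose slope has the smallest p-value, until none is significant."""
    from math import erf, sqrt
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining:
        best_p, best_j = 1.0, None
        for j in remaining:
            # Fit a model with the already-selected columns plus candidate j
            A = np.column_stack([np.ones(len(y))] + [X[:, k] for k in selected + [j]])
            b, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ b
            s2 = resid @ resid / (len(y) - A.shape[1])
            se = np.sqrt(s2 * np.linalg.inv(A.T @ A)[-1, -1])
            t = b[-1] / se
            p_val = 2 * (1 - 0.5 * (1 + erf(abs(t) / sqrt(2))))  # normal approx
            if p_val < best_p:
                best_p, best_j = p_val, j
        if best_j is None or best_p > p_enter:
            break  # next contribution is insignificant: stop
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Toy check: y depends only on columns 0 and 2 of X
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(scale=0.5, size=200)
sel = forward_select(X, y)
print(sorted(sel))
```

The two truly predictive columns are picked up; noise columns usually fail the entry cutoff.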
Stepwise Regression

An example run of the stepwise procedure:

  • Include X3
  • Include X2
  • Include X7
  • Remove X7 (it is insignificant)
  • Include X5
  • Remove X2 (when X5 was inserted into the model, X2 became unnecessary)
  • Include X6
  • Stop

Final model includes X3, X5 and X6.

Forward Selection

(Figure: input p-value at each step versus the entry cutoff, with training and validation profit plotted by step.)

Backward Selection

(Figure: input p-value at each step versus the stay cutoff, with training and validation profit plotted by step.)

Stepwise Selection

(Figure: input p-value at each step versus the entry and stay cutoffs, with training and validation profit plotted by step.)

Logistic Regression in Enterprise Miner

Refer to our DONOR_RAW data. A Logistic Regression model for TARGET_B was fit (see the PR3 assignment for details). Stepwise selection was applied, and the data set was split into Training and Validation sets.

24.99% of validation data were misclassified, i.e. classified as a donor when in fact a non-donor, or vice versa.

Average profit in the validation data was $0.26 per person included in that set. This yielded a total profit of $2,280.76 from all persons in the validation set.

Logistic Regression in Enterprise Miner

The Lift and Response charts compare the model’s performance using (1) Training and (2) Validation data. The baseline (Lift = 1) corresponds to a 5% response.

Interpretation of Regression Coefficients: Linear Regression

When the target is continuous (Linear Regression), the standard interpretation of a slope coefficient is as follows:

The slope tells you the change in Y when a particular input X increases by one unit, while all other inputs are kept constant.

Example: If a linear regression coefficient is equal to -2.5, that indicates that when the predictor increases, the target variable decreases (since the coefficient is negative). Specifically, for each unit increase in the predictor variable, the target variable decreases by 2.5 units.

Interpretation of Regression Coefficients: Logistic Regression

When the target is binary (Logistic Regression), the standard interpretation of an odds ratio coefficient is as follows:

The odds ratio coefficient is a multiplier on the odds for the target event T (= probability of T / probability of non-T) when a particular input X increases by one unit, while all other inputs are kept constant.

Example: If an odds ratio coefficient is equal to 1.05, that indicates that when the predictor increases, the probability for the target event also increases. Specifically, for each unit increase in the predictor variable, the odds P(event)/P(nonevent) get multiplied by 1.05.
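The arithmetic of that multiplier can be checked directly (hypothetical numbers; a starting odds of 0.20 is assumed):

```python
import math

# Hypothetical: a coefficient w1 = log(1.05), so exp(w1) = 1.05 is the odds ratio
w1 = math.log(1.05)
odds_before = 0.20                         # assumed odds P(event)/P(nonevent)
odds_after = odds_before * math.exp(w1)    # one-unit increase in the predictor

# Convert odds back to probabilities: p = odds / (1 + odds)
p_before = odds_before / (1 + odds_before)
p_after = odds_after / (1 + odds_after)
print(round(odds_after, 3), round(p_before, 4), round(p_after, 4))
```

Note the multiplier acts on the odds, not on the probability itself: here the odds rise by 5% while the probability moves from about 0.167 to about 0.174.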

Suggested readings
  • Read the SAS GSEM 5.3 text, chapter 5 (pp. 103-134)
  • Read the Sarma text, chapter 6 (pp. 235-304)