DSCI 4520/5240 (DATA MINING). Lecture 6: Regression Modeling. Some slide material taken from: SAS Education.
Objectives:
Overview of Linear Regression Models
The Stepwise Procedure
Overview of Logistic Regression Models
Interpretation of Logistic Regression coefficients
As Australia's largest mobile service provider, Telstra Mobile is reliant on highly effective churn management.
In most industries the cost of retaining a customer, subscriber or client is substantially less than the initial cost of obtaining that customer. Protecting this investment is the essence of churn management. It really boils down to understanding customers -- what they want now and what they're likely to want in the future, according to SAS.
"With SAS Enterprise Miner we can examine customer behaviour on historical and predictive levels, which can then show us what 'group' of customers are likely to churn and the causes," says Trish Berendsen, Telstra Mobile's head of Customer Relationship Management (CRM).
Until recently, RingaLing, a large public telecommunications company, held the monopoly for the entire telecommunications market. Now privatized and without the advantages of the monopolistic situation, competition is coming from consortiums of foreign denationalized companies, new entrants, and cable companies who are offering very tempting proposals to consumers.
RingaLing is losing 40,000 customers every month, and only winning a few of those customers back. They are painfully aware that the cost of keeping an existing customer can be up to ten times lower than the cost of acquiring a new one. They desperately need a cost-effective way of decreasing the customer churn rate.
The CEO of RingaLing is worried by his company's falling share price and the rate at which customers are leaving. Despite substantial general price reductions, loyal customers are leaving by the thousands. The CEO gives the marketing director six months to bring the situation under control.
Subsequently, Martin Miner, a young and promising marketing analyst, is summoned to the Marketing Director’s office and asked to solve this problem. Working with the IT department, Martin explains that he needs a way to access and analyze all the company data.
RingaLing has over 50 million customer files in addition to billions of call records, and data from both the customer service and the billing departments. They also have some competitive information, including competitor pricing policies and market share by area. The data resides in different offices across the globe, in 12 different file formats, and on seven different platforms.
Martin decides the first step is the development of a SAS data warehouse, enabling him to have access to all the data he needs in one place. Using SAS Institute's Rapid Warehousing Methodology, this takes only a matter of weeks. Thanks to the data warehouse, the quality and consistency of the data are much better. All the data is summarized and grouped in a way that makes it easy to get a single view of individual RingaLing customers. Even if a customer has multiple accounts, for example a mobile phone as well as a fixed phone, the database is smart enough to know that this is one customer instead of two.
Now, Martin is ready to start mining.
Initially, he is interested in the probability that a given customer will cancel their contract within the next year and the controllable variables that might influence them. If he knows this, he will be able to manage the churn rate (the rate at which customers cancel and subscribe).
A sample of the data is taken using the sampling capabilities of SAS software. This ensures that the 10% sample accurately represents the customer base as a whole.
Initially, Martin plots the probability that a customer will leave over the next year, and finds it to be fairly consistent and somewhat depressing. Using geographical visualization he is able to highlight certain areas where the customers are most likely to leave, which appear to be around certain major cities. He then decides to explore the data using a 3D scatter plot to see the relation between the size of bill, area they live in, and likelihood to leave. He notes that most of the customers at risk for leaving tend to be those who have the highest and the lowest bills.
Martin decides to integrate some more data. First, he looks at his company's competitors and areas in which they provide services, as well as the range of services provided (e.g. business, domestic, international). He then looks at their pricing policy and product details.
Martin now uses several data mining techniques to model this information so he can predict whether a customer is likely to leave or not. He uses decision trees to eliminate variables which are not important, and surprisingly finds that income plays a much less important role than he would have expected. Having identified several key variables, he then uses neural networks to build a model which will predict whether a person is likely to leave or not, given their characteristics.
Following this, he identifies that the people who are likely to leave are typically either those who have very large bills or very low ones. It appears that those who make international calls are more likely to leave. Another point is that people who make frequent calls to the same numbers are more prone to leaving.
At this stage he decides to review his findings and concludes that he needs to introduce more data on the pricing policies of the competitors for different types of products.
Following the implementation of these strategies, there was a drastic reduction in the number of customers leaving. After only three months, customer churn fell from 40,000 to only 10,000 customers a month. On top of this, they were able to target products to customers who fit certain optimal profiles. This resulted in a gain of another 20,000 customers a month.
Martin Miner has played a key role in this process and is rewarded with a large bonus and pay raise. He is subsequently promoted, and it becomes obvious that the marketing director is grooming him to become his replacement.
Model with one predictor variable:
Y = b0 + b1X + residual
where b0 + b1X is the explained part of Y and the residual is the unexplained part of Y (error).
The fitted line is a straight line, since the model is of 1st order:
Ŷ = b0 + b1X
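As an illustration of the one-predictor model, the following is a minimal sketch of an ordinary least-squares fit, assuming NumPy is available. The function name fit_simple_regression and the data values are hypothetical and were made up for this example; they do not come from the course data.

```python
import numpy as np

def fit_simple_regression(x, y):
    """Return (b0, b1) for the least-squares line Y-hat = b0 + b1*X."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of X and Y divided by the variation of X.
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the line through the point of means.
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b0, b1 = fit_simple_regression(x, y)
y_hat = b0 + b1 * x      # explained part of Y
residuals = y - y_hat    # unexplained part of Y (error)
```

With an intercept in the model, the least-squares residuals sum to zero (up to rounding), which is one quick sanity check on a fit.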
Hypothesis Test on the Slope of the Regression Line
H0: β1 = 0 (X provides no information)
Ha: β1 ≠ 0 (X does provide information)
For large data sets, reject H0 if |t| > 2
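The rule of thumb above can be sketched in code. This example assumes the usual t statistic for a simple regression slope, t = b1 / SE(b1), with SE(b1) computed from the residual variance on n − 2 degrees of freedom. The function name slope_t_statistic and the simulated data are hypothetical.

```python
import numpy as np

def slope_t_statistic(x, y):
    """t statistic for testing that the slope of a one-predictor regression is zero."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    residuals = y - (b0 + b1 * x)
    s2 = np.sum(residuals ** 2) / (n - 2)   # residual variance estimate
    se_b1 = np.sqrt(s2 / sxx)               # standard error of the slope
    return b1 / se_b1

# Simulated data (an assumption for illustration): Y truly depends on X.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y_related = 1.0 + 0.5 * x + rng.normal(size=500)

t_related = slope_t_statistic(x, y_related)
reject_h0 = abs(t_related) > 2   # large-sample rule of thumb from the slide
```

The cutoff of 2 approximates the two-sided 5% critical value of the t distribution, which approaches 1.96 as the sample grows.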
Residuals should have…
[Figure: frequency histogram of the residuals, Y − Ŷ]
Linear Regression Models:
Target is an interval variable.
Input variables have any measurement level.
Predicted values are the mean of the target variable at the given values of the input variables.
Logistic Regression Models:
Target is a discrete (binary or ordinal) variable.
Input variables have any measurement level.
Predicted values are the probability of a particular level(s) of the target variable at the given values of the input variables.
Logistic Regression Models: the logit transformation
logit(p) = ln(p / (1 − p)) = w0 + w1x1 + … + wpxp
The logit link function provides the most natural interpretation of the estimated coefficients.
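A small sketch of the logit transformation and its inverse may help here. The weights and input values below are hypothetical and chosen only to show how a linear predictor w0 + w1x1 + … + wpxp maps to a probability between 0 and 1.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map a linear predictor z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predicted_probability(weights, inputs):
    """weights = [w0, w1, ..., wp]; inputs = [x1, ..., xp]."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return inverse_logit(z)

# Hypothetical coefficients and inputs, for illustration only.
weights = [-1.0, 0.8, -0.3]
inputs = [2.0, 1.0]

p = predicted_probability(weights, inputs)   # linear predictor z = 0.3
```

Applying logit to the predicted probability recovers the linear predictor, which is why the coefficients are read on the log-odds scale.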
Variable selection procedures, such as stepwise selection, either choose or eliminate variables, one at a time, in an effort to avoid including variables that either have no predictive ability or are highly correlated with other predictor variables.
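The one-variable-at-a-time idea can be sketched as follows. This is a simplified forward selection only, not the SAS stepwise procedure: it assumes a linear regression fitted by least squares and uses squared error on a validation set as the entry criterion, where stepwise methods typically use significance tests and can also remove variables. All names and the simulated data are hypothetical.

```python
import numpy as np

def validation_sse(X_train, y_train, X_valid, y_valid, columns):
    """Fit least squares on the chosen columns; return validation squared error."""
    def design(X):
        # Intercept column plus the currently chosen input columns.
        return np.column_stack([np.ones(len(X))] + [X[:, j] for j in columns])
    coef, *_ = np.linalg.lstsq(design(X_train), y_train, rcond=None)
    return float(np.sum((y_valid - design(X_valid) @ coef) ** 2))

def forward_selection(X_train, y_train, X_valid, y_valid):
    """Add one variable at a time while the validation error keeps improving."""
    selected, remaining = [], list(range(X_train.shape[1]))
    best = validation_sse(X_train, y_train, X_valid, y_valid, selected)
    while remaining:
        scores = {j: validation_sse(X_train, y_train, X_valid, y_valid, selected + [j])
                  for j in remaining}
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best:
            break  # no remaining variable improves the validation fit
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected

# Simulated data (an assumption): columns 0 and 3 drive y, column 1 is noise,
# and column 2 is almost a copy of column 0 (highly correlated).
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 4))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=400)
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(size=400)

selected = forward_selection(X[:200], y[:200], X[200:], y[200:])
```

Once one of the two nearly identical columns has entered, the other adds almost no new information, which illustrates why such procedures tend to leave out highly correlated predictors.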
Refer to our DONOR_RAW data. A Logistic Regression model for TARGET_B was fit (see PR3 assignment for details). Stepwise selection was applied. The data set was split into Training and Validation sets
24.99% of validation data were misclassified, i.e. classified as donor when in fact a non-donor, or vice-versa
Average profit in validation data was $0.26 per person included in that set. This yielded a total profit of $2,280.76 from all persons in the validation set
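The two validation measures above can be computed as in this minimal sketch. The labels and per-person profit values are hypothetical and are not taken from DONOR_RAW; the function names are illustrative as well.

```python
def misclassification_rate(actual, predicted):
    """Share of cases whose predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = sum(1 for a, p in zip(actual, predicted) if a != p)
    return errors / len(actual)

def average_profit(profits):
    """Total profit divided by the number of persons in the set."""
    return sum(profits) / len(profits)

# Hypothetical validation cases: 1 = donor, 0 = non-donor.
actual    = [1, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 0, 1, 0]
rate = misclassification_rate(actual, predicted)   # 2 errors out of 8 cases

# Hypothetical profit per person under the model's decisions.
profits = [0.0, 1.5, -0.5, 0.0]
avg = average_profit(profits)
total = avg * len(profits)   # total profit = average profit x number of persons
```

A misclassification counts both kinds of error equally, while the profit measure weights each decision by its monetary consequence.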
The Lift and Response charts below compare the model’s performance using (1) Training and (2) Validation data. The baseline (Lift = 1) corresponds to a 5% response.
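As a sketch of how a single point on a lift chart is obtained, the function below ranks cases by model score and compares the response rate among the top-scored cases with the overall (baseline) rate. The function name and the data are hypothetical; the example is constructed so that the overall response rate is 5%, matching the baseline mentioned above.

```python
def lift_at_depth(scores, responses, depth):
    """Lift for the top `depth` fraction of cases, ranked by model score."""
    n = len(scores)
    n_top = max(1, int(round(depth * n)))
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(r for _, r in ranked[:n_top]) / n_top   # response rate in the top group
    overall_rate = sum(responses) / n                      # baseline response rate
    return top_rate / overall_rate

# Hypothetical data: 40 cases, 2 responders (5% baseline response).
scores = [i / 40 for i in range(40)]
responses = [0] * 40
responses[39] = 1   # a responder the model scores highest
responses[10] = 1   # a responder the model scores low

lift_top_decile = lift_at_depth(scores, responses, 0.10)   # 25% / 5% = 5.0
lift_everyone = lift_at_depth(scores, responses, 1.00)     # baseline, 1.0
```

Selecting everyone always gives a lift of 1, which is why the baseline sits at Lift = 1 on the chart.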
When the target is continuous (Linear Regression), the standard interpretation of a slope coefficient is as follows:
The slope tells you the change in Y when a particular input X increases by one unit, while all other inputs are kept constant.
Example: If a linear regression coefficient is equal to -2.5, that indicates that when the predictor increases, the target variable decreases (since the coefficient is negative). Specifically, for each unit increase in the predictor variable, the target variable decreases by 2.5 units
When the target is binary (Logistic Regression), the standard interpretation of an odds ratio is as follows:
The odds ratio is a multiplier on the odds of the target event T (= probability of T / probability of non-T) when a particular input X increases by one unit, while all other inputs are kept constant.
Example: If an odds ratio is equal to 1.05, that indicates that when the predictor increases, the probability of the target event also increases. Specifically, for each unit increase in the predictor variable, the odds P(event) / P(non-event) get multiplied by 1.05.
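The arithmetic behind this example can be worked through in a short sketch. It relies on the standard relationship that the odds ratio for an input equals exp(w), where w is that input's coefficient on the logit scale. The starting probability of 0.20 is a hypothetical value chosen for illustration.

```python
import math

def odds(p):
    """Odds of an event: probability of event / probability of non-event."""
    return p / (1.0 - p)

def probability_from_odds(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)

odds_ratio = 1.05
coefficient = math.log(odds_ratio)   # logit-scale coefficient w, since odds ratio = exp(w)

p_before = 0.20                               # hypothetical starting probability
odds_before = odds(p_before)                  # 0.20 / 0.80 = 0.25
odds_after = odds_before * odds_ratio         # 0.25 * 1.05 = 0.2625
p_after = probability_from_odds(odds_after)   # roughly 0.208
```

The multiplier applies to the odds, not to the probability itself: here the odds grow by 5% while the probability moves from 0.20 to roughly 0.208.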