Logistic Regression

Remember?

Applications: Prediction vs. Explanatory Analysis

• Prediction: the focus is on producing the model that best predicts future values of Y as a function of the Xs; the terms in the model, the values of their coefficients, and their statistical significance are of secondary importance.
• Explanatory analysis: the focus is on understanding the relationship between the dependent variable and the independent variables; consequently, the statistical significance of the coefficients matters, as do the magnitudes and signs of the coefficients.

Target Marketing

Attrition Prediction

Credit Scoring

Fraud Detection

Example Problems

Regression and Other Models

Types of Logistic Regression

Supervised (binary) Classification

[Table: the modeling data layout, with cases 1 … n as rows, input variables x1, x2, …, xk as columns, and a binary target y]

Problem and Data

Target: did the customer purchase the variable annuity product? (1 = yes, 0 = no)

Inputs: other product usage in a three-month period, plus demographics

Size: ~32,000 observations, 47 variables

Analytical Challenges

Opportunistic Data

Operational / Observational

Massive

• The analytical data preparation step dominates the effort:
• benchmark: 80/20 (preparation vs. modeling)
• [my] experience: 99/1

Errors and Outliers

2+2=5

Missing Values

Mixed Measurement Scales

- Nominal: sales, executive, homemaker, ...
- Interval: 88.60, 3.92, 34890.50, 45.01, ...
- Ordinal: F, D, C, B, A
- Counts: 0, 1, 2, 3, 4, 5, 6, ...
- Binary: M, F
- ZIP codes: 27513, 21737, 92614, 10043, ...

High Dimensionality

Rare Target Event

| Event | No Event |
|---|---|
| respond | not respond |
| churn | stay |
| default | pay off |
| fraud | legitimate |

Nonlinearities and Interactions

[Figure: E(y) plotted as a surface over x1 and x2, comparing a linear surface with a nonlinear one]

Model Selection

[Figure: three fits to the same data, labeled Underfitting, Overfitting, and Just Right]


Why not linear?

• OLS Reg: Yi=0+1X1i+i

Linear Prob. Model: pi=0+1X1i

• Probabilities are bounded, but linear functions can take on any value. (Once again, how do you interpret a predicted value of -0.4 or 1.1?)
• Given the bounded nature of probabilities, can you assume a linear relationship between X and p throughout the possible range of X?
• Can you assume a random error with constant variance?
• What is the observed probability for an observation?
• If the response variable is categorical, then how do you code the response numerically?
• If the response is coded (1=Yes and 0=No) and your regression equation predicts 0.5 or 1.1 or -0.4, what does that mean practically?
• If there are only two (or a few) possible response levels, is it reasonable to assume constant variance and normality?
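These objections can be made concrete: fit a straight line to a 0/1 response by ordinary least squares and the fitted "probabilities" escape [0, 1]. A minimal sketch with made-up data:

```python
# Fit a linear probability model p = b0 + b1*x by least squares on a 0/1
# target and show the fitted line takes values below 0 and above 1.
def ols_fit(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
         sum((xi - mx) ** 2 for xi in x)
    b0 = my - b1 * mx
    return b0, b1

x = list(range(10))                    # a single input
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]    # binary response
b0, b1 = ols_fit(x, y)

# Linear "probabilities" at the edges of the observed range:
p_low, p_high = b0 + b1 * 0, b0 + b1 * 9
print(round(p_low, 3), round(p_high, 3))   # below 0 and above 1
```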
Functional Form

pi = 1 / (1 + exp(−(β0 + β1x1i + … + βkxki)))

where pi is the posterior probability, the βs are the parameters, and the xs are the inputs.

[Figure: the S-shaped logistic curve, approaching pi = 1 for larger values of the linear predictor and pi = 0 for smaller values]
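The logistic functional form maps any value of the linear predictor into (0, 1). A minimal sketch (the parameter values are illustrative, not from the slides):

```python
import math

# The logistic (sigmoid) functional form: posterior probability as a
# function of a linear predictor in the inputs.
def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

b0, b1 = -2.0, 0.5   # illustrative parameter values (assumed)
for x in (-4, 0, 4, 12):
    p = logistic(b0 + b1 * x)
    print(x, round(p, 3))   # p stays strictly between 0 and 1
```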


The Fitted Surface

LOGISTIC Procedure

proc logistic data=develop plots(only)=
      (effect(clband x=(ddabal depamt checks res))
       oddsratio (type=horizontalstat));
   class res (param=ref ref='S');
   /* model inputs inferred from the PLOTS= list above */
   model ins(event='1') = ddabal depamt checks res;
   units ddabal=1000 depamt=1000 / default=1;
   oddsratio 'Comparisons of Residential Classification' res / diff=all cl=pl;
run;

Properties of the Odds Ratio

odds ratio = 1: no association
odds ratio > 1: the group in the numerator has higher odds of the event
odds ratio < 1: the group in the denominator has higher odds of the event

(the odds ratio ranges over 0 … 1 … ∞)

Estimated logistic regression model:

logit(p) = −0.7567 + 0.4373*(gender)

where females are coded 1 and males are coded 0

Estimated odds ratio (females to males):

odds ratio = e^(−0.7567 + 0.4373) / e^(−0.7567) = e^0.4373 = 1.55
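The slide's arithmetic can be checked directly; the intercept cancels in the ratio, so the odds ratio is simply exp of the gender coefficient:

```python
import math

# With logit(p) = -0.7567 + 0.4373*gender (female = 1), the
# female-to-male odds ratio equals exp(b1), independent of the intercept.
b0, b1 = -0.7567, 0.4373
odds_f = math.exp(b0 + b1)    # odds for females
odds_m = math.exp(b0)         # odds for males
print(round(odds_f / odds_m, 2))   # 1.55, i.e. exp(b1)
```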

Results from ODDSRATIO

oddsratio 'Comparisons of Residential Classification' res / diff=all cl=pl;

Results from PLOTS=(EFFECT(…

plots(only)=(effect(clband x=(ddabal depamt checks res))


Logistic Discrimination

oversampling

Sampling Designs

Joint (random) sampling: draw (x, y) pairs together from the population:

{(x,y), (x,y), (x,y), (x,y), …}

Separate (case-control) sampling: draw the x's separately within the y = 0 stratum and the y = 1 stratum, then attach the labels:

{(x,0), (x,0), (x,1), (x,1), …}

The Effect of Oversampling

Offset

Two ways to correct for oversampling:

1. Include an offset term in the model: model … / offset=X
2. Adjust the probabilities output by the model

(the correction compares the event proportion in reality with the event proportion in the sample)

Adjusting the Probabilities

/* Specify the prior probability */
/* to correct for oversampling */
%let pi1=.02;

/* Correct predicted probabilities */
proc logistic data=develop;
   score data=pmlr.new out=scored priorevent=&pi1;
run;
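The PRIOREVENT= correction can be sketched in plain Python. The formula below is the standard prior-correction identity and is an assumption here, not taken from the slide:

```python
# Rescale an oversampled predicted probability p back to the population
# prior pi1, given the event proportion rho1 in the (oversampled) sample.
def adjust(p, pi1, rho1):
    num = p * pi1 / rho1
    den = num + (1 - p) * (1 - pi1) / (1 - rho1)
    return num / den

pi1, rho1 = 0.02, 0.50       # population prior 2%, balanced sample
print(round(adjust(0.50, pi1, rho1), 6))   # a 50% sample score maps back to the prior
```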

[Slide: a small data table with one value missing, shown as "?"]

Missing values

Does Pr(missing) Depend on the Data?

• No: MCAR (missing completely at random)
• Yes: the missingness may depend on
  • that unobserved value itself
  • other unobserved values
  • other observed values (including the target)
Complete Case Analysis

[Figure: the cases-by-input-variables grid; any case with a missing value is dropped]

New Missing Values

[Slide: a fitted model applied to a new case with a missing input; the predicted value cannot be computed]

Missing Value Imputation

[Table: sample records before and after missing-value imputation]

Imputation + Indicators

| Incomplete data | Completed data | Missing indicator |
|---|---|---|
| 34 | 34 | 0 |
| 63 | 63 | 0 |
| . | 30 | 1 |
| 22 | 22 | 0 |
| 26 | 26 | 0 |
| 54 | 54 | 0 |
| 18 | 18 | 0 |
| . | 30 | 1 |
| 47 | 47 | 0 |
| 20 | 20 | 0 |

Median of the observed values = 30

data develop1;  /* Create missing indicators */
   set develop;
   /* name the missing indicator variables */
   array mi{*} MIAcctAg MIPhone … MICRScor;
   /* select variables with missing values */
   array x{*} acctage phone … crscore;
   do i=1 to dim(mi);
      mi{i}=(x{i}=.);
   end;
run;

proc stdize data=develop1
            reponly
            method=median  /* Impute missing values with the median */
            out=imputed;
   var &inputs;
run;
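The same indicator-plus-median-imputation logic, sketched in Python on the slide's ten-value example (median of the observed values = 30):

```python
# Build a missing indicator, then fill missing values with the median
# of the observed values, mirroring the DATA step + PROC STDIZE pair.
def impute_median(values):
    observed = sorted(v for v in values if v is not None)
    n = len(observed)
    median = (observed[n // 2 - 1] + observed[n // 2]) / 2 if n % 2 == 0 \
             else observed[n // 2]
    indicator = [1 if v is None else 0 for v in values]
    completed = [median if v is None else v for v in values]
    return completed, indicator, median

x = [34, 63, None, 22, 26, 54, 18, None, 47, 20]
completed, indicator, median = impute_median(x)
print(median)       # 30.0
print(indicator)    # [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
```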

Cluster Imputation [covered in a later lecture]

Categorical Inputs

Dummy Variables

| X | DA | DB | DC | DD |
|---|---|---|---|---|
| D | 0 | 0 | 0 | 1 |
| B | 0 | 1 | 0 | 0 |
| C | 0 | 0 | 1 | 0 |
| C | 0 | 0 | 1 | 0 |
| A | 1 | 0 | 0 | 0 |
| A | 1 | 0 | 0 | 0 |
| D | 0 | 0 | 0 | 1 |
| C | 0 | 0 | 1 | 0 |
| A | 1 | 0 | 0 | 0 |
| … | | | | |
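The dummy coding can be generated with one indicator column per level; a minimal sketch:

```python
# Expand a nominal input X with levels A-D into indicator columns DA..DD.
levels = ["A", "B", "C", "D"]
X = ["D", "B", "C", "C", "A", "A", "D", "C", "A"]

dummies = {lvl: [1 if x == lvl else 0 for x in X] for lvl in levels}
print(dummies["A"])   # [0, 0, 0, 0, 1, 1, 0, 0, 1]
print(dummies["D"])   # [1, 0, 0, 0, 0, 0, 1, 0, 0]
```

Each case turns on exactly one of the four indicators, which is why one level is usually dropped (or reference-coded) before model fitting.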

Smarter Variables

[Table: each ZIP code (99801, 99622, 99523, …) replaced by derived inputs: Urbanicity, HomeVal, Local]

Quasi-Complete Separation

| Level | y = 0 | y = 1 |
|---|---|---|
| A | 28 | 7 |
| B | 16 | 0 |
| C | 94 | 11 |
| D | 23 | 21 |

Level B has no events (a zero cell), so its effect estimate diverges: quasi-complete separation.

Clustering Levels

Merge levels step by step, at each step keeping as much of the full-table chi-square as possible:

| Level | y = 0 | y = 1 |
|---|---|---|
| A | 28 | 7 |
| B | 16 | 0 |
| C | 94 | 11 |
| D | 23 | 21 |

| Merged | χ² | % of full χ² |
|---|---|---|
| (none) | 31.7 | 100% |
| B & C | 30.7 | 97% |
| A & BC | 28.6 | 90% |
| ABC & D | 0 | 0% |

Greenacre (1988, 1993): PROC MEANS, PROC CLUSTER, PROC TREE, … (HOMEWORK)
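The chi-square values used when collapsing levels can be recomputed; a sketch using the slide's counts (A: 28/7, B: 16/0, C: 94/11, D: 23/21):

```python
# Pearson chi-square of a level-by-target crosstab: the criterion for
# deciding which pair of levels to merge.
def chi_square(table):
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 /
               (rows[i] * cols[j] / n)
               for i in range(len(table)) for j in range(len(table[0])))

counts = [[28, 7], [16, 0], [94, 11], [23, 21]]          # levels A, B, C, D
full = chi_square(counts)
merged_bc = chi_square([[28, 7], [110, 11], [23, 21]])   # B & C merged
print(round(full, 1), round(merged_bc, 1))   # 31.7 30.7: merging B & C keeps 97%
```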

Variable Clustering

PROC VARCLUS [covered in a later lecture]

Checking Deposits

Mortgage Balance

Number of Checks

Teller Visits

Credit Card Balance

Age

Variable Screening

Univariate Screening

Univariate Smoothing

Empirical Logits

where

mi = number of events

Mi = number of cases
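The slide's empirical-logit formula was an image; a commonly used smoothed form (the +1/2 adjustment is an assumption here, not taken from the slide) keeps bins with zero events finite:

```python
import math

# Smoothed empirical logit for a bin: m = number of events,
# M = number of cases. Adding 1/2 to each cell avoids log(0).
def empirical_logit(m, M):
    return math.log((m + 0.5) / (M - m + 0.5))

print(round(empirical_logit(0, 40), 3))    # finite even with zero events
print(round(empirical_logit(12, 40), 3))
```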

1. Hand-Crafted New Input Variables

2. Polynomial Models

3. Flexible Multivariate Function Estimators

4. Do Nothing

Empirical Logit Plots

Subset Selection

Scalability in PROC LOGISTIC

[Figure: run time vs. number of variables (25, 50, 75, 100, 150, 200) for all-subsets, stepwise, and fast backward selection]

Honest Assessment

The Optimism Principle

[Figure: the same classifier over inputs x1 and x2, scored on training data (accuracy = 70%) and on test data (accuracy = 47%)]

Data Splitting

Partition the data into Training, Validation, and Test sets.

Other Approaches

k-fold cross-validation with five partitions A, B, C, D, E:

1) train on BCDE, validate on A
2) train on ACDE, validate on B
3) train on ABDE, validate on C
4) train on ABCE, validate on D
5) train on ABCD, validate on E
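The five train/validate rotations can be generated mechanically; a minimal sketch:

```python
# k-fold rotation: each partition serves once as the validation set
# while the remaining partitions are used for training.
def kfold(partitions):
    for i, valid in enumerate(partitions):
        train = [p for j, p in enumerate(partitions) if j != i]
        yield "".join(train), valid

for train, valid in kfold(list("ABCDE")):
    print("train on", train, "validate on", valid)
```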

Misclassification

Confusion Matrix

|  | Predicted 0 (Negative) | Predicted 1 (Positive) |
|---|---|---|
| Actual 0 (Negative) | True Negative | False Positive |
| Actual 1 (Positive) | False Negative | True Positive |

Sensitivity and Positive Predicted Value

Sensitivity = True Positives / Actual Positives

Positive Predicted Value = True Positives / Predicted Positives
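Both quantities are simple ratios of confusion-matrix counts; a sketch with illustrative counts (not taken from the slides):

```python
# Sensitivity and positive predicted value from confusion-matrix counts.
def sensitivity(tp, fn):
    return tp / (tp + fn)          # true positives / actual positives

def ppv(tp, fp):
    return tp / (tp + fp)          # true positives / predicted positives

tn, fp, fn, tp = 66, 9, 4, 21      # illustrative counts
print(round(sensitivity(tp, fn), 2), round(ppv(tp, fp), 2))
```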

[Tables: confusion matrices computed on the oversampled sample vs. the population]

Oversampled Test Set

Adjusted to the population using the priors π0 and π1 and the sample specificity (Sp) and sensitivity (Se):

| Actual Class | Predicted 0 | Predicted 1 | Total |
|---|---|---|---|
| 0 | π0·Sp | π0·(1 − Sp) | π0 |
| 1 | π1·(1 − Se) | π1·Se | π1 |

Total Profit

Profit matrix (per case):

|  | Predicted 0 | Predicted 1 |
|---|---|---|
| Actual 0 | $0 | −$1 |
| Actual 1 | $0 | $99 |

Confusion matrices at three cutoffs, and the resulting total profit:

| TN | FP | FN | TP | Total profit |
|---|---|---|---|---|
| 70 | 5 | 9 | 16 | 16·99 − 5 = $1579 |
| 66 | 9 | 4 | 21 | 21·99 − 9 = $2070 |
| 57 | 18 | 1 | 24 | 24·99 − 18 = $2358 |
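The per-cutoff totals follow from the profit matrix: $99 per true positive, minus $1 per false positive. A sketch reproducing the three figures:

```python
# Total profit from confusion-matrix counts, given the slide's profit
# matrix: each true positive earns $99, each false positive costs $1.
def total_profit(tp, fp, profit_tp=99, cost_fp=1):
    return tp * profit_tp - fp * cost_fp

cutoffs = [(5, 16), (9, 21), (18, 24)]   # (false positives, true positives)
print([total_profit(tp, fp) for fp, tp in cutoffs])   # [1579, 2070, 2358]
```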

Allocation Rules

Profit Matrix

Decision

Bayes Rule: allocate a case to Decision 1 if its expected profit (the profit-matrix entries for Actual Class 0 and 1, weighted by the posterior probability) exceeds the expected profit of Decision 0.

Classifier Performance

Using Profit to Assess Fit

Overall Predictive Power

Class Separation

Area under the ROC Curve

ROC and ROCCONTRAST Statements

ROC <'label'> <specification> </ options>;

ROCCONTRAST <'label'> <contrast> </ options>;
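The area under the ROC curve equals the probability that a randomly chosen event is scored higher than a randomly chosen non-event (concordance). A small sketch with illustrative scores (ties count one half):

```python
# AUC as the concordance probability over all (event, non-event) pairs.
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pairs = [(p, n) for p in pos for n in neg]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]   # illustrative model scores
labels = [1,   1,   0,   1,   0,   0]     # illustrative outcomes
print(auc(scores, labels))
```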