Logistic Regression
PowerPoint Slideshow about 'Logistic Regression' - ashlyn


Presentation Transcript
Remember?

Applications: Prediction vs. Explanatory Analysis

Prediction:
  • The terms in the model, the values of their coefficients, and their statistical significance are of secondary importance.
  • The focus is on producing a model that is the best at predicting future values of Y as a function of the Xs.

Explanatory analysis:
  • The focus is on understanding the relationship between the dependent variable and the independent variables.
  • Consequently, the statistical significance of the coefficients is important, as well as the magnitudes and signs of the coefficients.
Logistic Regression

Example Applications

  • Target Marketing
  • Attrition Prediction
  • Credit Scoring
  • Fraud Detection

Logistic Regression

Regression and other models

Logistic Regression

Types of Logistic Regression

Logistic Regression

Supervised (binary) Classification

[Data layout: rows are cases 1…n; columns are the binary target y and the input variables x1, x2, …, xk.]
Logistic Regression

Problem and Data

  • Target: did the customer purchase the variable annuity product? (1 = yes, 0 = no)
  • Inputs: other product usage in a three-month period, plus demographics
  • ~32,000 observations, 47 variables

Logistic Regression

Problem and Data (continued)

Analytical Challenges

Opportunistic Data

  • Operational / observational
  • Massive
  • Analytical data preparation step:
    • Benchmark: 80/20
    • [My] life: 99/1
  • Errors and outliers (2+2=5)
  • Missing values

Analytical Challenges

Mixed Measurement Scales

  • Nominal: sales, executive, homemaker, ...
  • Continuous: 88.60, 3.92, 34890.50, 45.01, ...
  • Ordinal: F, D, C, B, A
  • Counts: 0, 1, 2, 3, 4, 5, 6, ...
  • Binary: M, F
  • Codes: 27513, 21737, 92614, 10043, ...

Analytical Challenges

High Dimensionality

Analytical Challenges

Rare Target Event

  Event   | No Event
  --------|------------
  respond | not respond
  churn   | stay
  default | pay off
  fraud   | legitimate

Analytical Challenges

Nonlinearities and Interactions

[Surface plots of E(y) against x1 and x2: a linear, additive response surface vs. a nonlinear, nonadditive one.]

Analytical Challenges

Model Selection

[Fitted-curve illustration on the same data: Underfitting, Overfitting, Just Right.]
Logistic Regression

Why Not Linear?

  • OLS regression: Yi = β0 + β1X1i + εi
  • Linear probability model: pi = β0 + β1X1i
  • Probabilities are bounded, but linear functions can take on any value. (Once again, how do you interpret a predicted value of -0.4 or 1.1?)
  • Given the bounded nature of probabilities, can you assume a linear relationship between X and p throughout the possible range of X?
  • Can you assume a random error with constant variance?
  • What is the observed probability for an observation?
  • If the response variable is categorical, then how do you code the response numerically?
  • If the response is coded (1=Yes and 0=No) and your regression equation predicts 0.5, 1.1, or -0.4, what does that mean practically?
  • If there are only two (or a few) possible response levels, is it reasonable to assume constant variance and normality?
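The boundedness problem in the bullets above can be seen numerically. A minimal Python sketch with made-up coefficients (b0 and b1 are illustrative only, not estimated from any data in this deck):

```python
import math

# Illustrative coefficients only -- not from any slide in this deck.
b0, b1 = -0.2, 0.3

def linear_prob(x):
    # Linear probability model: p = b0 + b1*x (unbounded)
    return b0 + b1 * x

def logistic_prob(x):
    # Logistic model: p = 1 / (1 + exp(-(b0 + b1*x))), always in (0, 1)
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

print(linear_prob(-2))    # below 0: not a valid probability
print(linear_prob(5))     # above 1: not a valid probability
print(logistic_prob(-2))  # stays strictly between 0 and 1
print(logistic_prob(5))   # stays strictly between 0 and 1
```

The linear form happily produces "probabilities" like -0.8 or 1.3, while the logistic form cannot leave (0, 1).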
Logistic Regression

Functional Form

  p = 1 / (1 + e^−(β0 + β1x1 + … + βkxk))

  posterior probability: p;  parameters: the β's;  inputs: the x's

Logistic Regression

The Logit Link Function

  logit(p) = ln( p / (1 − p) ) = β0 + β1x1 + … + βkxk

  As the linear predictor grows larger, pi → 1; as it grows smaller, pi → 0.

Logistic Regression

The Fitted Surface

Logistic Regression

LOGISTIC Procedure

proc logistic data=develop
     plots(only)=(effect(clband x=(ddabal depamt checks res))
                  oddsratio(type=horizontalstat));
   class res (param=ref ref='S');
   model ins(event='1') =
         dda ddabal dep depamt cashbk checks res / stb clodds=pl;
   units ddabal=1000 depamt=1000 / default=1;
   oddsratio 'Comparisons of Residential Classification' res / diff=all cl=pl;
run;

Logistic Regression

Properties of the Odds Ratio

  odds ratio < 1: the group in the denominator has higher odds of the event
  odds ratio = 1: no association
  odds ratio > 1: the group in the numerator has higher odds of the event

Estimated logistic regression model:

  logit(p) = -0.7567 + 0.4373*(gender)

where females are coded 1 and males are coded 0.

Estimated odds ratio (Females to Males):

  odds ratio = e^(-0.7567 + 0.4373) / e^(-0.7567) = e^0.4373 = 1.55
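The odds-ratio arithmetic above can be checked directly; Python is used here only as a calculator:

```python
import math

# logit(p) = -0.7567 + 0.4373*gender, with gender: 1 = female, 0 = male.
odds_female = math.exp(-0.7567 + 0.4373)
odds_male = math.exp(-0.7567)
odds_ratio = odds_female / odds_male   # the intercept cancels: equals exp(0.4373)
print(round(odds_ratio, 2))  # 1.55
```

Note that the intercept drops out of the ratio, which is why the odds ratio for a binary input is simply e raised to its coefficient.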

Logistic Regression

Results from ODDSRATIO

oddsratio 'Comparisons of Residential Classification' res / diff=all cl=pl;

Logistic Regression

Results from PLOTS=(EFFECT(…

plots(only)=(effect(clband x=(ddabal depamt checks res))

Logistic Regression

Logistic Discrimination

Oversampling

Sampling Designs

  • Joint sampling: cases (x, y) are drawn together from the full population, e.g. {(x,y),(x,y),(x,y),(x,y)}.
  • Separate sampling: the x's are drawn separately from the y = 0 group and the y = 1 group, e.g. {(x,0),(x,0),(x,1),(x,1)}.

Oversampling

The Effect of Oversampling

Oversampling

Offset

Two ways to correct for oversampling:

  • Include an offset term in the model: model … / offset=X
  • Adjust the probabilities output by the model (adjusted probability), using the event rate in reality (the prior) and the event rate in the sample.

Oversampling

Adjusting the Predicted Probabilities

/* Specify the prior probability */
/* to correct for oversampling   */
%let pi1=.02;

/* Correct predicted probabilities */
proc logistic data=develop;
   model ins(event='1') = dda ddabal dep depamt cashbk checks;
   score data=pmlr.new out=scored priorevent=&pi1;
run;
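The PRIOREVENT= correction can also be mirrored by hand. A sketch using the standard prior-correction formula (the formula itself is not shown on the slides; pi1 is the population event rate, rho1 the sample event rate):

```python
def adjust_probability(p_hat, pi1, rho1):
    """Rescale a probability fitted on an oversampled sample back to the
    population scale. pi1 = population (prior) event rate, rho1 = sample
    event rate. Standard prior-correction formula (an assumption here)."""
    pi0, rho0 = 1.0 - pi1, 1.0 - rho1
    num = p_hat * pi1 / rho1
    return num / (num + (1.0 - p_hat) * pi0 / rho0)

# On a 50/50 oversample, a "no information" score of 0.5 maps back
# to the population prior (pi1 = 0.02, as in the %let above).
print(round(adjust_probability(0.5, pi1=0.02, rho1=0.5), 6))  # 0.02
```

A useful sanity check: when the model score equals the sample base rate, the adjusted probability equals the population prior.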

Missing Values

[Example data table in which one cell is missing, shown as "?".]

Missing values

Does Pr(missing) Depend on the Data?

  • No
    • MCAR (missing completely at random)
  • Yes
    • that unobserved value
    • other unobserved values
    • other observed values (including the target)
Missing Values

Complete Case Analysis

[Grid of cases by input variables with scattered missing cells.]

Missing Values

Complete Case Analysis (continued)

[Only the cases with no missing inputs remain in the analysis.]

Missing Values

New Missing Values

[A fitted model, a new case with a missing input, and the resulting undefined predicted value.]

Missing Values

Missing Value Imputation

   6    03  2.6  0    8.3  42  66  C03
  12    04  1.8  0    0.5  86  65  C14
   6.5  01  2.3  .33  4.8  37  66  C00
   8    01  2.1  1    4.8  37  64  C08
   6    01  2.8  1    9.6  22  66  C99
   3    01  2.7  0    1.1  28  64  C00
   2    02  2.1  1    5.9  21  63  C03
  10    03  2.0  0    0.8   0  63  C99
   7    01  2.5  0    5.5  62  67  C12
   6.5  01  2.4  0    0.9  29  63  C05

Missing Values

Imputation + Indicators

  Incomplete | Completed | Missing
  Data       | Data      | Indicator
     34      |    34     |    0
     63      |    63     |    0
      .      |    30     |    1
     22      |    22     |    0
     26      |    26     |    0
     54      |    54     |    0
     18      |    18     |    0
      .      |    30     |    1
     47      |    47     |    0
     20      |    20     |    0

  Median = 30

Missing Values

Imputation + Indicators

data develop1; /* Create missing indicators */
   set develop;
   /* name the missing indicator variables */
   array mi{*} MIAcctAg MIPhone … MICRScor;
   /* select variables with missing values */
   array x{*} acctage phone … crscore;
   do i=1 to dim(mi);
      mi{i}=(x{i}=.);
   end;
run;

proc stdize data=develop1
            reponly
            method=median /* Impute missing values with the median */
            out=imputed;
   var &inputs;
run;
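The same indicator-plus-median-imputation idea, sketched in plain Python (the SAS code above is the course's actual version; this just restates the logic on the ten values from the "Imputation + Indicators" slide):

```python
def impute_with_indicators(values):
    """values: numeric list with None marking missing entries.
    Returns (completed values, missing indicators)."""
    observed = sorted(v for v in values if v is not None)
    mid = len(observed) // 2
    median = (observed[mid] if len(observed) % 2 == 1
              else (observed[mid - 1] + observed[mid]) / 2)
    completed = [median if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return completed, indicator

vals = [34, 63, None, 22, 26, 54, 18, None, 47, 20]
completed, miss = impute_with_indicators(vals)
print(completed)  # the two gaps are filled with the median of the observed values, 30
print(miss)       # [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
```

The median of the eight observed values is (26 + 34) / 2 = 30, matching the slide.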

Missing Values

Cluster Imputation [at later lectures]

[A case with X1 observed and X2 = ? (missing).]

Categorical Inputs

Dummy Variables

  X | DA  DB  DC  DD
  D |  0   0   0   1
  B |  0   1   0   0
  C |  0   0   1   0
  C |  0   0   1   0
  A |  1   0   0   0
  A |  1   0   0   0
  D |  0   0   0   1
  C |  0   0   1   0
  A |  1   0   0   0
  ⋮
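The dummy-variable table above is ordinary one-hot coding; a minimal sketch:

```python
# Build dummy columns DA..DD for the categorical input X shown above.
levels = ["A", "B", "C", "D"]
x = ["D", "B", "C", "C", "A", "A", "D", "C", "A"]
dummies = [[1 if value == level else 0 for level in levels] for value in x]
print(dummies[0])  # [0, 0, 0, 1] -- first case is level D
print(dummies[4])  # [1, 0, 0, 0] -- fifth case is level A
```

Each row has exactly one 1, which is why one dummy is usually dropped (reference coding, as in the CLASS statement's param=ref) to avoid perfect collinearity with the intercept.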

Categorical Inputs

Smarter Variables

  ZIP   | HomeVal | Local | Urbanicity
  99801 |    75   |   1   |     1
  99622 |   100   |   1   |     2
  99523 |   150   |   1   |     1
  99523 |   150   |   0   |     1
  99737 |   150   |   1   |     3
  99937 |    75   |   1   |     3
  99533 |   100   |   1   |     2
  99523 |   150   |   0   |     1
  99622 |   100   |   1   |     3
  ⋮

Categorical Inputs

Quasi-Complete Separation

       DA  DB  DC  DD | y=0  y=1
  A  |  1   0   0   0 |  28    7
  B  |  0   1   0   0 |  16    0
  C  |  0   0   1   0 |  94   11
  D  |  0   0   0   1 |  23   21

  Level B has no events (y = 1), so its dummy variable quasi-completely separates the data.

Categorical Inputs

Clustering Levels

       y=0  y=1
  A  |  28    7
  B  |  16    0
  C  |  94   11
  D  |  23   21

  Merged: (none)   χ² = 31.7   (100%)

Categorical Inputs

Clustering Levels (continued)

          y=0  y=1
  A     |  28    7
  B & C | 110   11
  D     |  23   21

  Merged: B & C   χ² = 30.7   (97% of the unmerged χ² = 31.7)

Categorical Inputs

Clustering Levels (continued)

           y=0  y=1
  A & BC | 138   18
  D      |  23   21

  Merged: A & BC   χ² = 28.6   (90%)

Categorical Inputs

Clustering Levels (continued)

            y=0  y=1
  ABC & D | 161   39

  Merge history:  B & C     χ² = 30.7   (97%)
                  A & BC    χ² = 28.6   (90%)
                  ABC & D   χ² = 0      (0%)
  (Unmerged: χ² = 31.7, 100%)

  Greenacre (1988, 1993). PROC MEANS – PROC CLUSTER – PROC TREE – … HOME WORK
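The χ² values that drive the merges can be reproduced from the slide's counts. A sketch of the Pearson chi-square statistic (computed here from scratch so the arithmetic is visible):

```python
def chi_square(table):
    """Pearson chi-square for a list of (y=0 count, y=1 count) rows."""
    n0 = sum(r[0] for r in table)
    n1 = sum(r[1] for r in table)
    n = n0 + n1
    stat = 0.0
    for y0, y1 in table:
        row = y0 + y1
        e0, e1 = row * n0 / n, row * n1 / n  # expected counts under independence
        stat += (y0 - e0) ** 2 / e0 + (y1 - e1) ** 2 / e1
    return stat

full = [(28, 7), (16, 0), (94, 11), (23, 21)]   # levels A, B, C, D
bc = [(28, 7), (110, 11), (23, 21)]             # B & C merged
abc = [(138, 18), (23, 21)]                     # A & BC merged
print(round(chi_square(full), 1))  # 31.7
print(round(chi_square(bc), 1))    # 30.7
print(round(chi_square(abc), 1))   # 28.6
```

Each merge joins the two most similar levels, so the statistic (and the percentage of the original association retained) drops slowly until the final, degenerate merge.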

Variable Clustering

PROC VARCLUS [later lecture]

Checking Deposits

Mortgage Balance

Number of Checks

Teller Visits

Credit Card Balance

Age

Variable Screening

Univariate Screening

Variable Screening

Univariate Smoothing

Empirical Logits

  empirical logit = ln( (mi + 1/2) / (Mi − mi + 1/2) )

  where

  mi = number of events

  Mi = number of cases
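A small helper for the empirical logit (the +1/2 smoothing shown above is one common convention, assumed here; it keeps the logit finite in bins with zero events):

```python
import math

def empirical_logit(m, M):
    """Adjusted empirical logit: ln((m + 1/2) / (M - m + 1/2)),
    where m = number of events and M = number of cases in the bin."""
    return math.log((m + 0.5) / (M - m + 0.5))

print(empirical_logit(5, 10))            # 0.0 -- a 50% bin maps to logit 0
print(round(empirical_logit(0, 10), 3))  # finite (negative) even with zero events
```

Plotting these bin-wise logits against a binned input is what the next slide's options respond to when the relationship looks nonlinear.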

Empirical Logit Plots

  1. Hand-Crafted New Input Variables
  2. Polynomial Models
  3. Flexible Multivariate Function Estimators
  4. Do Nothing
Subset Selection

Scalability in PROC LOGISTIC

[Chart: run time vs. number of variables (25, 50, 75, 100, 150, 200) for All Subsets, Stepwise, and Fast Backward selection.]

Honest Assessment

The Optimism Principle

[Scatter plots of x1 vs. x2 (gray and black classes): the same classifier achieves Accuracy = 70% on the training data but only 47% on the test data.]

Honest Assessment

Data Splitting

  • Training
  • Validation
  • Test

Honest Assessment

Other Approaches: 5-Fold Cross-Validation

  Fold | Train | Validate
   1)  | BCDE  |    A
   2)  | ACDE  |    B
   3)  | ABDE  |    C
   4)  | ABCE  |    D
   5)  | ABCD  |    E
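The rotation shown above is plain 5-fold cross-validation: each part A–E is held out once while the model trains on the remaining four. Sketched:

```python
parts = ["A", "B", "C", "D", "E"]

# Each part is held out for validation exactly once; the rest form the training set.
folds = [("".join(p for p in parts if p != held_out), held_out)
         for held_out in parts]
for train, validate in folds:
    print(train, "->", validate)
# BCDE -> A, ACDE -> B, ABDE -> C, ABCE -> D, ABCD -> E
```

Averaging the five validation results gives an honest performance estimate while still using every case for both training and validation.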

Misclassification

Confusion Matrix

                Predicted 0        Predicted 1
  Actual 0  |  True Negative   |  False Positive
  Actual 1  |  False Negative  |  True Positive

Misclassification

Sensitivity and Positive Predicted Value

  Sensitivity = True Positives / Actual Positives
  Positive Predicted Value = True Positives / Predicted Positives
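These two ratios computed from a 2×2 confusion matrix, using the sample counts that appear on the oversampled-test-set slide:

```python
def sensitivity_ppv(tn, fp, fn, tp):
    sensitivity = tp / (tp + fn)  # events caught, among actual events
    ppv = tp / (tp + fp)          # actual events, among predicted events
    return sensitivity, ppv

# Sample confusion matrix: actual 0 -> (29, 21), actual 1 -> (17, 33).
se, ppv = sensitivity_ppv(tn=29, fp=21, fn=17, tp=33)
print(round(se, 2), round(ppv, 2))  # 0.66 0.61
```

Sensitivity conditions on the actual class, PPV on the predicted class; only the latter shifts when the event rate changes, which is why oversampling needs the adjustment shown two slides down.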

Misclassification

Oversampled Test Set

  Sample:                          Population:
            Predicted                        Predicted
             0    1   total                   0    1   total
  Actual 0 | 29 | 21 |  50        Actual 0 | 56 | 41 |  97
  Actual 1 | 17 | 33 |  50        Actual 1 |  1 |  2 |   3
  total    | 46 | 54 | 100        total    | 57 | 43 | 100

Misclassification

Adjustments for Oversampling

                Predicted 0       Predicted 1
  Actual 0  |  π0·Sp          |  π0·(1−Sp)
  Actual 1  |  π1·(1−Se)      |  π1·Se

  where π0 and π1 are the population priors for non-events and events,
  and Se and Sp are the sensitivity and specificity estimated on the
  oversampled test set.

Allocation Rules

Profit Matrix

  Profit matrix:     Decision 0   Decision 1
  Actual 0        |     $0      |    -$1
  Actual 1        |     $0      |    $99

  Confusion matrices at three cutoffs, with total profit:

            Predicted              Total Profit
             0    1
  Actual 0 | 70 |  5
  Actual 1 |  9 | 16       16*99 - 5  = $1579

  Actual 0 | 66 |  9
  Actual 1 |  4 | 21       21*99 - 9  = $2070

  Actual 0 | 57 | 18
  Actual 1 |  1 | 24       24*99 - 18 = $2358
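The total-profit column follows directly from the profit matrix: only true positives earn ($99 each) and only false positives cost ($1 each). A sketch reproducing the slide's arithmetic:

```python
def total_profit(fp, tp, tp_profit=99, fp_cost=1):
    """Total profit under the matrix above: true positive earns $99,
    false positive costs $1, everything else contributes $0."""
    return tp * tp_profit - fp * fp_cost

print(total_profit(fp=5, tp=16))   # 1579
print(total_profit(fp=9, tp=21))   # 2070
print(total_profit(fp=18, tp=24))  # 2358
```

Note that the most profitable cutoff here is also the most permissive one: with a $99 upside and a $1 downside, extra false positives are cheap.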

Allocation Rules

Profit Matrix: Bayes Rule

  Bayes rule: choose decision 1 if its expected profit exceeds that of
  decision 0. With the profit matrix above ($99 for a true positive,
  -$1 for a false positive, $0 otherwise), decide 1 when
  p·99 > (1 − p)·1, that is, when p > 0.01.

Allocation Rules

Classifier Performance

Allocation Rules

Using Profit to Assess Fit

Overall Predictive Power

Class Separation

Overall Predictive Power

Area under the ROC Curve

ROC and ROCCONTRAST Statements

ROC <'label'> <specification> </ options>;

ROCCONTRAST <'label'> <contrast> </ options>;