An Introduction to Logistic Regression

Presentation Transcript


An Introduction to Logistic Regression

Eni Sumarminingsih, SSi, MM

Program Studi Statistika, Jurusan Matematika, Universitas Brawijaya

Outline
  • Introduction and Description

  • Some Potential Problems and Solutions


Introduction and Description

  • Why use logistic regression?

  • Estimation by maximum likelihood

  • Interpreting coefficients

  • Hypothesis testing

  • Evaluating the performance of the model


Why use logistic regression?

  • There are many important research topics for which the dependent variable is "limited."

  • For example, voting, morbidity or mortality, and participation data are not continuous or normally distributed.

  • Binary logistic regression is a type of regression analysis where the dependent variable is a dummy variable, coded 0 (did not vote) or 1 (did vote).


The Linear Probability Model

In the OLS regression:

Y = α + βX + ε, where Y ∈ {0, 1}

  • The error terms are heteroskedastic

  • ε is not normally distributed because Y takes on only two values

  • The predicted probabilities can be greater than 1 or less than 0
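The last point is easy to demonstrate. The sketch below (with made-up data, not the pulse-rate dataset used later) fits a simple OLS line to a 0/1 outcome and shows a fitted "probability" below zero:

```python
# Hypothetical illustration: OLS on a binary outcome can produce
# fitted values outside [0, 1]. Data are made up for this sketch.

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # predictor
y = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]    # binary response

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Closed-form simple-regression estimates: b = Sxy / Sxx, a = ybar - b * xbar
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
b = sxy / sxx
a = my - b * mx

fitted = [a + b * xi for xi in x]
print(min(fitted))  # negative: an impossible "probability"
```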


An Example

You are a researcher who is interested in understanding the effect of smoking and weight on resting pulse rate. Because you have categorized the response (pulse rate) into low and high, binary logistic regression is appropriate for investigating the effects of smoking and weight on pulse rate.


The Data


OLS Results


Regression Analysis: Tekanan Darah versus Weight, Merokok

The regression equation is

Tekanan Darah = 0.745 - 0.00392 Weight + 0.210 Merokok

Predictor      Coef       SE Coef    T      P
Constant       0.7449     0.2715     2.74   0.007
Weight        -0.003925   0.001876  -2.09   0.039
Merokok        0.20989    0.09626    2.18   0.032

S = 0.416246   R-Sq = 7.9%   R-Sq(adj) = 5.8%



Predicted Values Outside the [0, 1] Range

Descriptive Statistics: FITS1

Variable  N   N*  Mean    StDev   Minimum  Q1      Median  Q3      Maximum
FITS1     92  0   0.2391  0.1204  -0.0989  0.1562  0.2347  0.3132  0.5309

Note that the minimum fitted value, -0.0989, falls below zero.




The Logistic Regression Model

The "logit" model solves these problems:

ln[p/(1-p)] = α + βX + ε

  • p is the probability that the event Y occurs, p(Y=1)

  • p/(1-p) is the "odds"

  • ln[p/(1-p)] is the log odds, or "logit"



  • The logistic distribution constrains the estimated probabilities to lie between 0 and 1.

  • The estimated probability is: p = 1/[1 + exp(-α - βX)]

  • If α + βX = 0, then p = .50

  • As α + βX gets very large, p approaches 1

  • As α + βX gets very small (large and negative), p approaches 0
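These three bullets can be checked directly with a minimal sketch of the logistic (inverse-logit) function:

```python
import math

def logistic(z):
    """p = 1 / (1 + exp(-z)), where z = α + βX."""
    return 1.0 / (1.0 + math.exp(-z))

print(logistic(0))    # exactly 0.5
print(logistic(10))   # close to 1
print(logistic(-10))  # close to 0
```

However extreme z becomes, p stays strictly between 0 and 1, which is exactly the constraint the linear probability model lacks.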

Comparing LP and Logit Models

[Figure: predicted probabilities from the LP model (a straight line) versus the logit model (an S-shaped curve bounded by 0 and 1)]


Maximum Likelihood Estimation (MLE)

  • MLE is a statistical method for estimating the coefficients of a model; it chooses the coefficient values that make the observed data most likely.
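As a sketch of the idea (using made-up data and plain gradient ascent on the log-likelihood, not Minitab's iterative algorithm), MLE for a one-predictor logistic model looks like this:

```python
import math

# Made-up data for illustration, not the pulse-rate dataset.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

def p_of(a, b, xi):
    """Estimated probability under the current coefficients."""
    return 1.0 / (1.0 + math.exp(-(a + b * xi)))

# Gradient ascent: the log-likelihood gradient for logistic regression
# is sum(y - p) for the intercept and sum((y - p) * x) for the slope.
a, b = 0.0, 0.0
lr = 0.01
for _ in range(100_000):
    resid = [yi - p_of(a, b, xi) for xi, yi in zip(x, y)]
    ga = sum(resid)
    gb = sum(r * xi for r, xi in zip(resid, x))
    a += lr * ga
    b += lr * gb

print(a, b)  # b < 0: larger x lowers the probability that y = 1
```

At the maximum, the gradient is (near) zero, which is the condition Minitab's iterations also solve for.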


Interpreting Coefficients

  • Since: ln[p/(1-p)] = α + βX + ε

    The slope coefficient (β) is interpreted as the rate of change in the "log odds" as X changes, which is not very useful.


  • An interpretation of the logit coefficient which is usually more intuitive is the "odds ratio"

  • Since: [p/(1-p)] = exp(α + βX)

    exp(β) is the effect of the independent variable on the "odds ratio"


From Minitab Output:

Logistic Regression Table

                                               Odds     95% CI
Predictor    Coef        SE Coef     Z      P      Ratio  Lower  Upper
Constant    -1.98717     1.67930    -1.18   0.237
Merokok
 Yes        -1.19297     0.552980   -2.16   0.031   0.30   0.10   0.90
Weight       0.0250226   0.0122551   2.04   0.041   1.03   1.00   1.05

  • Although there is evidence that the estimated coefficient for Weight is not zero, the odds ratio is very close to one (1.03), indicating that a one-pound increase in weight only minimally affects a person's resting pulse rate.

  • Given that subjects have the same weight, the odds ratio for smoking can be interpreted as follows: the odds of smokers in the sample having a low pulse are 30% of the odds of non-smokers having a low pulse.
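The "Odds Ratio" column is simply exp(coef), which can be checked against the coefficients in the table above:

```python
import math

# Coefficients taken from the Minitab logistic regression output above.
coef_smoking = -1.19297   # Merokok = Yes row
coef_weight = 0.0250226   # Weight row

print(math.exp(coef_smoking))  # about 0.30: smokers' odds relative to non-smokers
print(math.exp(coef_weight))   # about 1.03: odds multiplier per additional pound
```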


Hypothesis Testing

  • The Wald statistic for the coefficient β is: Wald = [β / s.e.(β)]², which is distributed chi-square with 1 degree of freedom.

  • The last log-likelihood from the maximum likelihood iterations is displayed along with the statistic G. This statistic tests the null hypothesis that all the coefficients associated with predictors equal zero versus the alternative that they are not all zero. In this example, G = 7.574 with a p-value of 0.023, indicating that there is sufficient evidence that at least one of the coefficients is different from zero, given an accepted α-level greater than 0.023.
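The Wald test for an individual coefficient can be reproduced from the output above. This sketch uses the Weight estimates and the fact that a chi-square(1) test on Z² gives the same p-value as a two-sided normal test on Z:

```python
import math

# Estimates for Weight from the Minitab logistic regression output above.
coef = 0.0250226
se = 0.0122551

z = coef / se
wald = z ** 2                               # chi-square with 1 df
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail = chi2(1) tail of z**2

print(wald)     # about 4.17
print(p_value)  # about 0.041, matching the P column for Weight
```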


Evaluating the Performance of the Model

The goodness-of-fit section displays the Pearson, deviance, and Hosmer-Lemeshow tests. If the p-value is less than your accepted α-level, the test rejects the null hypothesis of an adequate fit.

Here, the goodness-of-fit tests, with p-values ranging from 0.312 to 0.724, indicate that there is insufficient evidence to claim that the model does not fit the data adequately.



Some Potential Problems and Solutions

  • The presence of multicollinearity will not lead to biased coefficients.

  • But the standard errors of the coefficients will be inflated.

  • If a variable which you think should be statistically significant is not, consult the correlation coefficients.

  • If two variables are correlated at a rate greater than roughly .6 or .7, try dropping the less theoretically important of the two.
