Statistical Analysis SC504/HS927 Spring Term 2008

Introduction to Logistic Regression

Dr. Daniel Nehring


Outline

  • Preliminaries: The SPSS syntax

  • Linear regression and logistic regression

  • OLS with a binary dependent variable

  • Principles of logistic regression

  • Interpreting logistic regression coefficients

  • Advanced principles of logistic regression (for self-study)

  • Source:

    http://privatewww.essex.ac.uk/~dfnehr


PRELIMINARIES


The SPSS syntax

  • Simple programming language allowing access to all SPSS operations

  • Access to operations not covered in the main interface

  • Accessible through syntax windows

  • Accessible through ‘Paste’ buttons in every window of the main interface

  • Documentation available in ‘Help’ menu


Using SPSS syntax files

  • Saved as separate syntax files (.sps) from the syntax window

  • Run commands by highlighting them and clicking the Run (arrow) button

  • Comments can be added to the syntax

  • Pasting commands from the dialog boxes is an easy way to learn the syntax

  • Working from syntax is always preferable to the point-and-click interface: it keeps a log of your work and makes it easier to identify and correct mistakes


PART I


Simple linear regression

  • Relation between 2 continuous variables

    Regression coefficient b1

    • Measures the association between y and x

    • Amount by which y changes on average when x changes by one unit

    • Estimated by the least squares method

[Figure: regression line of y on x; the slope of the line is b1]
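The least squares slope can be sketched in a few lines of Python. This is a minimal illustration rather than SPSS's own routine: the function name `least_squares_line` and the data are hypothetical, and it uses the textbook formula b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², with the intercept chosen so the line passes through the means.

```python
def least_squares_line(xs, ys):
    """Return (intercept a, slope b1) minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b1 = sum of cross-products of deviations / sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    a = mean_y - b1 * mean_x  # the fitted line passes through (mean_x, mean_y)
    return a, b1

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b1 = least_squares_line(xs, ys)
print(f"a = {a:.3f}, b1 = {b1:.3f}")
```

Here b1 is the average change in y for a one-unit change in x, as in the slide.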


Multiple linear regression

  • Relation between a continuous variable and a set of i continuous variables

  • Partial regression coefficients bi

    • Amount by which y changes on average when xi changes by one unit and all the other xi remain constant

    • Measures the association between xi and y, adjusted for all the other xi
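To make "adjusted for all the other xi" concrete, the sketch below fits y = a + b1x1 + b2x2 by solving the least squares normal equations for two predictors in centred form. The function name and the data are hypothetical; the data are constructed so that y = 1 + 2x1 - 0.5x2 exactly, with x1 and x2 correlated, so the fit should recover those coefficients.

```python
def two_predictor_ols(x1, x2, y):
    """OLS fit of y = a + b1*x1 + b2*x2 via the centred normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centred sums of squares and cross-products
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    # Solve [[s11, s12], [s12, s22]] [b1, b2] = [s1y, s2y] by Cramer's rule
    det = s11 * s22 - s12 ** 2  # zero if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

# Hypothetical data built from y = 1 + 2*x1 - 0.5*x2
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * u - 0.5 * v for u, v in zip(x1, x2)]
a, b1, b2 = two_predictor_ols(x1, x2, y)
print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Regressing y on x1 alone would give a different slope for these data, because x1 and x2 are correlated. The partial coefficient b1 is the effect of x1 with x2 held constant.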


Multiple linear regression

$$y = a + b_1x_1 + b_2x_2 + \dots + b_ix_i$$

  • y: predicted, response or dependent variable

  • x1 to xi: predictor, explanatory or independent variables


OLS with a binary dependent variable

  • Binary variables can take only 2 possible values:

    • yes/no (e.g. educated to degree level, smoker/non-smoker)

    • success/failure (e.g. of a medical treatment)

  • Coded 1 or 0 (by convention 1 = yes/success)

  • When OLS is used with a binary dependent variable, the predicted values can be interpreted as probabilities, which should lie between 0 and 1

  • But nothing constrains the regression model to predict values between 0 and 1; predictions below 0 or above 1 are possible and have no logical interpretation

  • Approaches that ensure predicted values lie between 0 and 1, such as logistic regression, are therefore required
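The problem is easy to reproduce. The sketch below fits an ordinary least squares line (a "linear probability model") to a hypothetical 0/1 outcome and evaluates the predictions. The data and function name are illustrative assumptions. With these data the prediction is already negative at x = 1, inside the observed range, and exceeds 1 at x = 10.

```python
def ols_fit(xs, ys):
    """Simple OLS: return (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Hypothetical binary outcome, coded 1 = yes, 0 = no
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = ols_fit(xs, ys)
for x in [1, 4.5, 10]:
    # Predicted "probability" from the straight line; nothing keeps it in [0, 1]
    print(f"x = {x}: predicted value = {a + b * x:.3f}")
```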


Fitting equation to the data

  • Linear regression: Least squares

  • Logistic regression: Maximum likelihood

  • Likelihood function

    • Estimates the parameters such that the likelihood (probability) of the observed data is higher than under any other parameter values

    • In practice it is easier to work with the log-likelihood
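For a binary outcome the log-likelihood is a sum over individuals: each contributes ln p if the event occurred and ln(1 - p) if it did not, where p comes from the logistic model. A minimal sketch, assuming a single predictor and hypothetical data, compares two candidate parameter sets. The one with the higher log-likelihood fits the data better, and MLE searches for the values that make it as high as possible.

```python
import math

def log_likelihood(a, b, xs, ys):
    """Log-likelihood of 0/1 data ys under p_i = 1 / (1 + exp(-(a + b*x_i)))."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        # ln p if the event occurred (y = 1), ln(1 - p) if it did not (y = 0)
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

# Hypothetical data and candidate parameters, for illustration only
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
print(log_likelihood(0.0, 0.0, xs, ys))   # every p = 0.5
print(log_likelihood(-4.5, 1.0, xs, ys))  # p rises with x
```

Working with the sum of logs avoids multiplying many small probabilities together, which is why the log-likelihood is easier in practice.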


Maximum Likelihood Estimation (MLE)

  • OLS cannot be used for logistic regression, since the relationship between the dependent variable and the independent variables is non-linear

  • MLE is used instead to estimate the coefficients on the independent variables (the parameters)

  • Of all possible values of these parameters, MLE chooses those under which the model would have been most likely to generate the observed sample
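One common way to find the maximum likelihood estimates is Newton-Raphson iteration on the log-likelihood. The slides do not say which algorithm SPSS uses, so treat this as an illustrative stand-in. The sketch fits logit(p) = a + b*x for one predictor; the function name, starting values, tolerance and data are all assumptions.

```python
import math

def fit_logit(xs, ys, max_iter=25, tol=1e-10):
    """Maximum likelihood fit of logit(p) = a + b*x by Newton-Raphson (illustrative)."""
    a, b = 0.0, 0.0  # starting values
    for _ in range(max_iter):
        # Score (gradient of the log-likelihood) and information matrix entries
        g_a = g_b = 0.0
        h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        # Newton step: solve [[h_aa, h_ab], [h_ab, h_bb]] * step = [g_a, g_b]
        det = h_aa * h_bb - h_ab ** 2
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b

# Hypothetical binary data (not from the Health Survey for England)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
a_hat, b_hat = fit_logit(xs, ys)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

At the maximum the score is zero, so Σ(y - p) = 0 and Σ(y - p)x = 0. If the data were perfectly separated (all 0s below some value of x and all 1s above it), no finite maximum would exist and the iterations would not settle.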


Logistic regression

  • Models the relationship between a set of variables xi, which can be

    • dichotomous (yes/no)

    • categorical (social class, ...)

    • continuous (age, ...)

    and a dichotomous (binary) variable Y


PART II


Logistic regression (1)

  • ‘Logistic regression’ or ‘logit’

  • p is the probability of an event occurring

  • 1-p is the probability of the event not occurring

  • p can take any value from 0 to 1

  • the odds of the event occurring = $\frac{p}{1-p}$

  • the dependent variable in a logistic regression is the natural log of the odds: $\ln\left(\frac{p}{1-p}\right)$
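Odds and log odds are simple transformations of p. The short sketch below uses hypothetical helper names. It shows that p = 0.5 gives odds of 1 and log odds of 0, and that probabilities symmetric around 0.5 give log odds of equal size and opposite sign.

```python
import math

def odds(p):
    """Odds of an event with probability p (0 <= p < 1): p / (1 - p)."""
    return p / (1 - p)

def log_odds(p):
    """Natural log of the odds, i.e. the logit of p (0 < p < 1)."""
    return math.log(odds(p))

for p in [0.1, 0.5, 0.9]:
    print(f"p = {p}: odds = {odds(p):.3f}, log odds = {log_odds(p):.3f}")
```

As p approaches 0 the log odds head towards minus infinity, and as p approaches 1 they head towards plus infinity. This is why the log odds, unlike p, can take any value.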


Logistic regression (2)

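  • $\ln\left(\frac{p}{1-p}\right)$ can take any value, while p always lies between 0 and 1

  • the equation to be estimated is:

$$\ln\left(\frac{p}{1-p}\right) = a + b_1x_1 + b_2x_2 + \dots + b_kx_k$$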


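Logistic regression (3)

Logistic transformation: the logit of P(y|x)

$$\operatorname{logit} P(y \mid x) = \ln\left(\frac{P(y \mid x)}{1 - P(y \mid x)}\right) = a + b_1x_1 + \dots + b_kx_k$$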


Predicting p

let $z_i = a + b_1x_{1i} + b_2x_{2i} + \dots + b_kx_{ki}$

then to predict p for individual i,

$$p_i = \frac{e^{z_i}}{1 + e^{z_i}}$$
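The prediction step is the inverse of the logit: build the linear predictor, exponentiate it to get the odds, then divide the odds by 1 plus the odds. A minimal sketch, with a hypothetical function name and the coefficients passed in as a list:

```python
import math

def predict_p(a, bs, xs):
    """Predicted probability for one individual.

    z = a + sum(b_k * x_k);  p = e^z / (1 + e^z).
    """
    z = a + sum(b * x for b, x in zip(bs, xs))
    odds = math.exp(z)  # e^z is the predicted odds
    return odds / (1 + odds)

# Hypothetical coefficients and values, for illustration only
p = predict_p(-1.0, [0.5, 2.0], [2.0, 0.3])
print(round(p, 3))
```

Taking ln(p / (1 - p)) of the result recovers z, which confirms that the two forms of the model agree. For very large z, `math.exp` can overflow, so production code usually computes 1 / (1 + e^(-z)) on the appropriate side.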


Logistic function (1)

[Figure: the logistic function, plotting the probability of event y (between 0 and 1) against x]


PART III


Interpreting logistic regression coefficients

  • the intercept is the value of the log of the odds when all independent variables are zero

  • each slope coefficient is the change in the log odds from a 1-unit increase in the independent variable, controlling for the effects of the other variables

  • two problems:

    • log odds are not easy to interpret

    • the change in the probability p from a 1-unit increase in one independent variable depends on the values of the other independent variables

  • but the exponent of b (e^b) does not depend on the values of the other independent variables, and it is the odds ratio
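A quick numerical check of the last two points uses hypothetical coefficients for a model with two predictors. Raising x1 by one unit changes p by different amounts depending on the value of x2, but the ratio of the odds after and before is always e^b1.

```python
import math

def prob(z):
    """Inverse of the logit: p = 1 / (1 + e^(-z))."""
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients, for illustration only
a, b1, b2 = -1.0, 0.5, 1.2

changes, odds_ratios = [], []
for x2 in [0, 1, 3]:
    p_before = prob(a + b1 * 0 + b2 * x2)  # x1 = 0
    p_after = prob(a + b1 * 1 + b2 * x2)   # x1 increased by one unit
    changes.append(p_after - p_before)
    odds_ratios.append((p_after / (1 - p_after)) / (p_before / (1 - p_before)))
    print(f"x2 = {x2}: change in p = {changes[-1]:.3f}, "
          f"odds ratio = {odds_ratios[-1]:.3f}")

print(f"e^b1 = {math.exp(b1):.3f}")
```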


Odds ratio

  • odds ratio for the coefficient on a dummy variable, e.g. female = 1 for women, 0 for men

  • odds ratio = ratio of the odds of the event occurring for women to the odds of it occurring for men

  • the odds for women are e^b times the odds for men
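With a single dummy variable, the logistic regression estimate reproduces the group proportions exactly. As a result, e^b equals the odds ratio you would compute directly from the 2 × 2 table. The sketch below uses hypothetical counts to show this, and checks that the intercept (ln of the men's odds) plus b gives back the women's observed proportion.

```python
import math

# Hypothetical counts, for illustration only
women_event, women_no_event = 30, 70
men_event, men_no_event = 20, 80

odds_women = women_event / women_no_event
odds_men = men_event / men_no_event
odds_ratio = odds_women / odds_men

a = math.log(odds_men)    # intercept: log odds for the reference group (men)
b = math.log(odds_ratio)  # coefficient on the female dummy

# Predicted probability for women from the model: e^(a+b) / (1 + e^(a+b))
p_women = math.exp(a + b) / (1 + math.exp(a + b))
print(f"odds ratio = {odds_ratio:.3f}, b = {b:.3f}, p(women) = {p_women:.3f}")
```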


General rules for interpreting logistic regression coefficients

if b1 > 0, X1 increases p

if b1 < 0, X1 decreases p

if odds ratio >1, X1 increases p

if odds ratio < 1, X1 decreases p

if the confidence interval (CI) for b1 includes 0, X1 does not have a statistically significant effect on p

if the CI for the odds ratio includes 1, X1 does not have a statistically significant effect on p


An example: modelling the relationship between disability, age and income in the 65+ population

  • dependent variable = presence of disability (1=yes,0=no)

  • independent variables:

    X1 age in years in excess of 65 (i.e. age 65 is coded 0, age 70 is coded 5)

    X2 whether the person has a low income (in the lowest third of the income distribution)

  • data: Health Survey for England, 2000


Example: logistic regression estimate for probability of being disabled, people aged 65+


PART IV


Odds, log odds, odds ratios and probabilities


Odds, odds ratios and probabilities

  • pj = 0.2, i.e. a 20% probability

  • oddsj = 0.2/(1-0.2) = 0.2/0.8 = 0.25

  • pk = 0.4

  • oddsk = 0.4/0.6 = 0.67

  • relative probability/risk pj/pk = 0.2/0.4 = 0.5

  • odds ratio oddsj/oddsk = 0.25/0.67 = 0.37

  • the odds ratio is not equal to the relative probability/risk

  • except approximately, when pj and pk are both small
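The slide's arithmetic, and the small-probability exception, can be checked directly. Without rounding oddsk to 0.67, the odds ratio for pj = 0.2 and pk = 0.4 is 0.375. When both probabilities are divided by 10 (a hypothetical second case), the odds ratio moves close to the relative risk of 0.5.

```python
def odds(p):
    """Odds p / (1 - p)."""
    return p / (1 - p)

def compare(pj, pk):
    """Return (relative probability/risk pj/pk, odds ratio oddsj/oddsk)."""
    return pj / pk, odds(pj) / odds(pk)

rr, odds_ratio = compare(0.2, 0.4)          # the slide's example
rr_small, odds_ratio_small = compare(0.02, 0.04)  # both probabilities small
print(f"p = 0.2 vs 0.4:   RR = {rr:.3f}, OR = {odds_ratio:.3f}")
print(f"p = 0.02 vs 0.04: RR = {rr_small:.3f}, OR = {odds_ratio_small:.3f}")
```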


Points to note from logit example.xls

  • if you see an odds ratio of, e.g., 1.5 for a dummy variable indicating female, beware of saying ‘women have a probability 50% higher than men’: you can only say this (approximately) if both probabilities are small

  • better to calculate probabilities for example cases and compare these
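Following that advice, the sketch below takes a hypothetical odds ratio of 1.5 for women and computes the women's probability implied by several assumed baseline probabilities for men. The ratio of probabilities is close to 1.5 only when the baseline is small, and it shrinks as the baseline grows.

```python
def p_women_given(p_men, odds_ratio=1.5):
    """Women's probability implied by men's probability and an odds ratio."""
    odds_women = odds_ratio * p_men / (1 - p_men)
    return odds_women / (1 + odds_women)

# Hypothetical baseline probabilities for men, for illustration only
baselines = [0.01, 0.1, 0.3, 0.6]
ratios = []
for p_men in baselines:
    p_w = p_women_given(p_men)
    ratios.append(p_w / p_men)
    print(f"p(men) = {p_men:.2f}  p(women) = {p_w:.3f}  "
          f"ratio of probabilities = {ratios[-1]:.2f}")
```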



E.g.: Predicting a probability from our model

  • Predict disability for someone on a low income aged 75:

  • Add up the linear equation

    a (= -0.912) + 10 × 0.078 [age over 65] + 1 × (-0.27) [low income]

    = -0.402

  • Take the exponent of it to get the odds of being disabled

    e^(-0.402) = 0.669

  • Divide the odds by 1 + the odds to give the probability

    0.669/1.669 = c. 0.40, i.e. a 40 per cent chance of being disabled
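The same three steps can be reproduced in a few lines. The coefficients are the ones quoted in the example (intercept -0.912, 0.078 per year over 65, -0.27 for low income). The variable names are hypothetical.

```python
import math

# Coefficients from the worked example
a = -0.912           # intercept
b_age = 0.078        # per year of age over 65
b_low_income = -0.27 # low income dummy

age_over_65 = 75 - 65  # someone aged 75
low_income = 1         # on a low income

z = a + b_age * age_over_65 + b_low_income * low_income  # step 1: linear equation
odds_disabled = math.exp(z)                              # step 2: odds
p_disabled = odds_disabled / (1 + odds_disabled)         # step 3: probability
print(f"z = {z:.3f}, odds = {odds_disabled:.3f}, p = {p_disabled:.3f}")
```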


Goodness of fit in logistic regressions

  • based on improvements in the likelihood of observing the sample

  • use a chi-square test with the test statistic

$$\chi^2 = -2\left(\ln L_R - \ln L_U\right)$$

    with degrees of freedom equal to the number of coefficients restricted to 0

  • where R and U indicate the restricted and unrestricted models

  • unrestricted: all independent variables in the model

  • restricted: all or a subset of the variables excluded from the model (their coefficients restricted to be 0)
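A minimal sketch of the likelihood-ratio test, using only the Python standard library, is shown below. The chi-square upper-tail probability uses the closed-form series for integer degrees of freedom: a finite sum of exponential terms for even df, and `math.erfc` plus a finite sum for odd df. The log-likelihood values in the usage line are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = math.exp(-x / 2)
        total = term
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) plus a finite series (empty when df = 1)
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return total

def lr_test(loglik_restricted, loglik_unrestricted, df):
    """Likelihood-ratio statistic -2(ln L_R - ln L_U) and its p-value."""
    stat = -2 * (loglik_restricted - loglik_unrestricted)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods: 2 coefficients restricted to 0
stat, p_value = lr_test(-120.5, -115.2, df=2)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

The statistic is never negative, because adding variables cannot lower the maximised log-likelihood.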


Statistical significance of coefficient estimates in logistic regressions

  • Calculated using standard errors, as in OLS

  • for large n, |t| = |b/SE(b)| > 1.96 means that, if the true value of the coefficient were 0, an estimate at least this far from 0 would occur with a probability of less than 5%

    i.e. p < 0.05
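The large-sample test can be sketched with the standard normal distribution: divide the estimate by its standard error, then take the two-sided tail probability, which `math.erfc` gives directly. The estimate and standard error below are hypothetical. SPSS's logistic regression output reports a "Wald" statistic, which for a single coefficient is this ratio squared, referred to a chi-square distribution with 1 degree of freedom.

```python
import math

def wald_test(b, se):
    """Large-sample test of H0: coefficient = 0, using z = b / SE(b)."""
    z = b / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical estimate and standard error, for illustration only
z, p_value = wald_test(0.5, 0.2)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```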


95% confidence intervals for logistic regression coefficient estimates

  • For large samples, the 95% CI for a coefficient is b ± 1.96 × SE(b)

  • For CIs of odds ratios, calculate the CIs for the coefficients and take their exponents
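A short sketch of both intervals, assuming a hypothetical estimate and standard error, is given below. Exponentiating the limits gives an interval for the odds ratio that is not symmetric around e^b. A coefficient interval that contains 0 always maps to an odds-ratio interval that contains 1, which matches the general rules given earlier.

```python
import math

def coef_and_odds_ratio_ci(b, se, z_crit=1.96):
    """95% CI for a coefficient (b +/- 1.96*SE) and for its odds ratio."""
    lower, upper = b - z_crit * se, b + z_crit * se
    return (lower, upper), (math.exp(lower), math.exp(upper))

# Hypothetical estimate and standard error, for illustration only
(b_lo, b_hi), (or_lo, or_hi) = coef_and_odds_ratio_ci(0.4, 0.15)
print(f"b: ({b_lo:.3f}, {b_hi:.3f})  odds ratio: ({or_lo:.3f}, {or_hi:.3f})")
```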

