Logistic Regression Analysis for Predicting Binary Outcomes
Learn the principles of logistic regression and how to predict binary outcomes using SPSS. Explore odds ratio, logit transformation, and analyze categorical and continuous predictors.
Logistic regression analysis • Martin van der Esch, PhD
Discovering Statistics Using SPSS, Andy Field • http://www.youtube.com/watch?v=OvQShzJ7Sns (part 1) • http://www.youtube.com/watch?v=zdJhydkcqv4 (part 2) • http://www.youtube.com/watch?v=hxcDOoupB4Y (part 3) • etc.
Logistic regression analysis • The basic principle of logistic regression is much the same as in linear regression analysis • The aim is to predict a transformation of the dichotomous dependent variable: the logit transformation
Steps to follow • Step 1: write the simple linear regression equation for the binary dependent variable • Step 2: formulate the estimated probability of Y • Step 3: in logistic regression, use the odds of the estimated probability
Steps to follow 2 • Step 4: because the odds are skewed (to the right), apply the logit transformation, which gives the log odds • Step 5: different ways of presentation: the estimated probability p can be calculated from a combination of variables
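The equations shown on these two slides were not preserved in the transcript. A possible reconstruction in standard notation is sketched here, assuming a single predictor X and coefficients β0 and β1 (these symbols are assumptions, not taken from the slides):

```latex
\begin{align*}
\text{Step 1:}\quad & Y = \beta_0 + \beta_1 X \\
\text{Step 2:}\quad & \hat{p} = P(Y = 1 \mid X) \\
\text{Step 3:}\quad & \text{odds} = \frac{\hat{p}}{1 - \hat{p}} \\
\text{Step 4:}\quad & \operatorname{logit}(\hat{p}) = \ln\!\left(\frac{\hat{p}}{1 - \hat{p}}\right) = \beta_0 + \beta_1 X \\
\text{Step 5:}\quad & \hat{p} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
\end{align*}
```

Step 5 is Step 4 solved for the probability, which is how an estimated probability is recovered from a combination of predictor values.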
Binary instead of continuous outcome • We are interested in a binary outcome measure • For example, heart attack: Y = 0 (“no”), Y = 1 (“yes”)
… and we want But, how do we get there…?
Analysing a binary variable (Y) as if it were a continuous variable • Not possible, because Y (heart attack) is either no or yes (0 or 1)
Possible… • [Figure: relation between age (x-axis, 20 to 90) and the probability of a heart attack, p(y=1) (y-axis, 0 to 1)]
Use of the logistic model • The dichotomous outcome event itself is NOT modelled • The probability of the outcome event is modelled, given a set of prognostic factors • probability (D=1 | X1,X2,…,Xn) • e.g. probability (death | man, 80 yrs, with hypertension, normal cholesterol level)
Estimated probability of outcome • But the distribution of the probability is skewed…
Logit(p) of outcome • Logit transformation of the proportion to remove skewness!
Logit(p) of outcome • A probability can be transformed into a number between minus infinity and infinity in two steps • 1. obtain the odds (2 out of 5 are sick: odds = 2/3) • 2. take the natural logarithm • The natural logarithm is the logarithm with base e (e = 2.71828…): 'elog' or 'ln'
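The two-step transformation can be tried out on the slide's own numbers (2 out of 5 sick). The following is a minimal sketch; the function names `logit` and `inverse_logit` are chosen here for illustration and do not come from the slides.

```python
import math

def logit(p: float) -> float:
    """Return ln(odds) for a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inverse_logit(x: float) -> float:
    """Map any real number back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

p_sick = 2 / 5                        # 2 out of 5 are sick
odds_sick = p_sick / (1 - p_sick)     # step 1: odds = 2/3
log_odds_sick = logit(p_sick)         # step 2: ln(2/3), a negative number

print(odds_sick, log_odds_sick, inverse_logit(log_odds_sick))
```

A probability below 0.5 gives negative log odds, 0.5 gives exactly 0, and a probability above 0.5 gives positive log odds.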
Model • the ln(odds) of an event is modelled • the model is similar to the linear regression model
Model • It is far easier to model along the whole number line, from minus infinity to infinity, as in linear regression • A probability, however, is restricted to values between 0 and 1
Solution: logit transformation • The logit is linear in x
Model for Logistic Regression • Outcome = natural logarithm of the odds of the outcome
Summary • 1. Rewrite the outcome as a probability of the outcome • 2. Logit transformation: rewrite the outcome as ln(odds)
Model for logistic regression • β’s (betas) are estimated with the maximum likelihood procedure
Logistic regression analysis • The ‘best’ line is calculated with the ‘maximum likelihood procedure’ • Maximum likelihood: the estimates are obtained through several repeated cycles of calculation (iterations)
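To illustrate what "repeated cycles of calculation" can look like, here is a minimal sketch of one common iterative scheme, Newton-Raphson, for a model with a single predictor. The slides do not say which algorithm the software uses, and the grouped data below are hypothetical, invented only for this example.

```python
import math

# Hypothetical grouped data (NOT from the slides): (x, events, total)
groups = [(0, 40, 100), (1, 60, 100)]

def fit_logistic(groups, cycles=25, tol=1e-10):
    """Estimate b0 and b1 of ln(odds) = b0 + b1*x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(cycles):
        g0 = g1 = 0.0            # gradient of the log likelihood
        h00 = h01 = h11 = 0.0    # entries of the information matrix
        for x, events, total in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = events - total * p       # observed minus expected events
            w = total * p * (1.0 - p)        # binomial variance weight
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system (information matrix) * step = gradient
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break                            # estimates have stopped changing
    return b0, b1

b0, b1 = fit_logistic(groups)
print(b0, b1, math.exp(b1))
```

With one binary predictor the model fits the two groups exactly, so the estimates should settle on the observed log odds: b0 = ln(40/60) and b1 = ln(2.25).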
Example: Binary outcome (heart attack) and one binary predictor (smoking)
ln(odds) heart attack = -0.171 + 0.8 × Smoking • What is β0? • β0 = ln(odds) of heart attack for a non-smoker • odds of heart attack for a non-smoker = EXP(β0)
ln(odds) heart attack = -0.171 + 0.8 × Smoking • ln(odds) smoking - ln(odds) non-smoking = β0 + β1 - β0 = β1 • ln[(odds) smoking / (odds) non-smoking] = β1 • ln(OR) = β1 • OR = EXP(β1) = EXP(0.8) = 2.23 • Interpretation?
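The derivation can be checked numerically with the two coefficients from the slide. This is a small sketch; the variable names are chosen here for readability, and the conversion from odds to probability is added for illustration.

```python
import math

b0 = -0.171   # intercept from the slide: ln(odds) for a non-smoker
b1 = 0.8      # coefficient for smoking (0 = non-smoker, 1 = smoker)

odds_non_smoker = math.exp(b0)
odds_smoker = math.exp(b0 + b1)

# The ratio of the two odds reduces to exp(b1); b0 cancels out.
odds_ratio = odds_smoker / odds_non_smoker

# Odds can be converted back to probabilities: p = odds / (1 + odds)
p_non_smoker = odds_non_smoker / (1.0 + odds_non_smoker)
p_smoker = odds_smoker / (1.0 + odds_smoker)

print(round(odds_ratio, 2), round(p_non_smoker, 3), round(p_smoker, 3))
```

One way to read the result: the odds of a heart attack for smokers are about 2.23 times the odds for non-smokers. This is a ratio of odds, not a ratio of probabilities.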
Hypothesis testing: statistical difference between smokers and non-smokers • Wald test • 95% CI of the odds ratio • Likelihood ratio test (see M2-HC7 diagnosis)
Wald test = (b/SE(b))² • Chi-square distributed with one degree of freedom • (0.7997 / 0.2454)² = 10.6231
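The Wald statistic and the 95% CI of the odds ratio can both be computed from the coefficient and its standard error. The sketch below uses the slide's values; the p-value and interval formulas are standard but are not shown on the slides. Because the inputs are rounded to four decimals, the statistic computed here may differ slightly from the 10.6231 reported on the slide.

```python
import math

b = 0.7997    # regression coefficient from the slide
se = 0.2454   # its standard error from the slide

wald = (b / se) ** 2

# For a chi-square variable with 1 degree of freedom,
# P(chi2 > w) = erfc(sqrt(w / 2)).
p_value = math.erfc(math.sqrt(wald / 2.0))

# 95% CI of the odds ratio: exponentiate b +/- 1.96 * SE(b)
z = 1.959964
odds_ratio = math.exp(b)
ci_low = math.exp(b - z * se)
ci_high = math.exp(b + z * se)

print(round(wald, 2), p_value, odds_ratio, ci_low, ci_high)
```

If the confidence interval of the odds ratio excludes 1, that agrees with a Wald test that is significant at the 5% level.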
Testing the regression coefficient • Likelihood ratio test: • the -2 log likelihood of the model with the determinant is compared with the -2 log likelihood of the model without the determinant • The difference is chi-square distributed • The number of degrees of freedom equals the difference in the number of variables between the two models
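A likelihood ratio test for one binary determinant can be sketched as follows. The counts are hypothetical (not from the slides), and the fitted probabilities are taken directly from the observed proportions, which is what maximum likelihood gives for these two simple models.

```python
import math

# Hypothetical data (NOT from the slides): (events, total) per group
non_smokers = (40, 100)
smokers = (60, 100)

def neg2_log_likelihood(groups_with_p):
    """-2 log likelihood for groups given as (events, total, fitted p)."""
    ll = 0.0
    for events, total, p in groups_with_p:
        ll += events * math.log(p) + (total - events) * math.log(1.0 - p)
    return -2.0 * ll

# Model without the determinant: one overall probability for everyone
p_all = (non_smokers[0] + smokers[0]) / (non_smokers[1] + smokers[1])
neg2ll_without = neg2_log_likelihood(
    [(*non_smokers, p_all), (*smokers, p_all)]
)

# Model with the determinant: each group gets its own probability
neg2ll_with = neg2_log_likelihood(
    [(*non_smokers, non_smokers[0] / non_smokers[1]),
     (*smokers, smokers[0] / smokers[1])]
)

lr_chi2 = neg2ll_without - neg2ll_with   # one extra variable, so df = 1
p_value = math.erfc(math.sqrt(lr_chi2 / 2.0))  # chi-square tail, 1 df only

print(neg2ll_without, neg2ll_with, lr_chi2, p_value)
```

The p-value shortcut with `math.erfc` holds only for one degree of freedom; with more added variables a general chi-square distribution function would be needed.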
Logistic regression with categorical predictor Analysis of three groups
Frequency of ‘recovery’
group          recovery: yes   recovery: no
medication1    35              65
medication2    40              60
placebo        20              80
What to do? • We compare both medication groups with the placebo group using dummy variables
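Dummy coding with placebo as the reference group can be sketched with the recovery frequencies from the table (35/65, 40/60 and 20/80). The dummy names `d_med1` and `d_med2` are assumptions made here. Because the model contains one parameter per group, the coefficients equal differences in observed log odds and no iterative fitting is needed for this illustration.

```python
import math

# Recovery counts from the slides: group -> (recovered, not recovered)
table = {
    "medication1": (35, 65),
    "medication2": (40, 60),
    "placebo": (20, 80),
}

def dummies(group: str) -> tuple:
    """Return (d_med1, d_med2); placebo is the reference: (0, 0)."""
    return (int(group == "medication1"), int(group == "medication2"))

def log_odds(group: str) -> float:
    yes, no = table[group]
    return math.log(yes / no)

b0 = log_odds("placebo")              # intercept: ln(odds) for placebo
b1 = log_odds("medication1") - b0     # coefficient for d_med1
b2 = log_odds("medication2") - b0     # coefficient for d_med2

or_med1 = math.exp(b1)   # odds ratio, medication1 versus placebo
or_med2 = math.exp(b2)   # odds ratio, medication2 versus placebo

for group in table:
    print(group, dummies(group))
print(round(or_med1, 2), round(or_med2, 2))
```

Each odds ratio compares one medication group with placebo; comparing the two medications with each other would require a different reference group.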
We can also analyse the relationship between a continuous variable and a binary outcome with logistic regression analysis
Logistic regression analysis with a continuous variable • Relation between age and pain (no/yes) • The effect size is the odds ratio for a one-unit change in the determinant
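For a continuous determinant the odds ratio refers to a one-unit change, so a larger change needs rescaling on the log-odds scale. The sketch below uses a hypothetical coefficient for age of 0.05 per year; the slides do not report the actual estimate.

```python
import math

b_age = 0.05   # HYPOTHETICAL coefficient per year of age (not from the slides)

or_per_year = math.exp(b_age)          # odds ratio for a 1-year difference
or_per_decade = math.exp(10 * b_age)   # odds ratio for a 10-year difference

# Multiplying the coefficient by 10 equals raising the per-year OR to the 10th power.
same_thing = or_per_year ** 10

print(round(or_per_year, 3), round(or_per_decade, 3), round(same_thing, 3))
```

The odds ratio for a 10-year difference is the per-year odds ratio raised to the power 10, not multiplied by 10.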
Linearity check • Similar to linear regression analysis • No scatter plot, but a histogram: • Adding a quadratic term and splitting the exposure variable into groups • Be careful: do not use OR, but !