1. Chapter_6_Field_2005: Logistic Regression. Logistic regression is used when the outcome variable is categorical (binary).
Can we tell in which category (e.g., male or female; + or - performance in a test; dead or alive after a treatment) a person belongs by looking at some predictor variable(s)?
2. The logistic regression equation(s). Normal regression:
We predict the value of the outcome variable Y from the predictor(s) X
Simple: Yi = (b0 + b1Xi) + εi
Multiple: Yi = (b0 + b1X1 + b2X2 + ... + bnXn) + εi
Logistic regression:
We predict the probability P of the outcome Y (e.g., showing + or - performance) from the predictor(s) X
Simple logistic regression: P(Y) = 1 / (1 + e^-(b0 + b1X1 + εi))
Multiple logistic regression: P(Y) = 1 / (1 + e^-(b0 + b1X1 + b2X2 + ... + bnXn + εi))
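To make the formula concrete, here is a minimal Python sketch of the logistic transformation; the coefficient values b0 and b1 are made up purely for illustration.

```python
import numpy as np

def predicted_probability(x, b0, b1):
    """Simple logistic regression: P(Y) = 1 / (1 + e^-(b0 + b1*x))."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

# Hypothetical coefficients: whatever value the predictor x takes,
# the output always lies between 0 and 1.
for x in (-10, 0, 10):
    print(x, predicted_probability(x, b0=-1.0, b1=0.5))
```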
3. The logistic function in regression. The logistic function has the advantage that it:
Yields probabilities (between 0 and 1).
Can integrate dependencies through parameters (the regression coefficients).
4. Comparing linear with logistic regression (taken from http://faculty.vassar.edu/lowry/logreg1.html). The problem with linear regression is that a continuation of the line (for higher or lower values of x) would exceed the range of probabilities from 0 to 1. The sigmoid shape of the logistic curve ensures that the outcome always stays within that range.
5. What's the difference? In normal linear regression, the relationship between the predictor(s) X and the continuous outcome Y is linear. In logistic regression, there is no such linearity: the outcome is not continuous (but binary/categorical), so the relation between X(s) and Y cannot be linear.
Binary outcome and binomial distribution:
When the response variable is binary (e.g. death or survival), then the probability distribution of the number of deaths in a sample of a particular size, for given values of the explanatory variables, is usually assumed to be binomial.
6. An example: dose of a drug and +/- successful treatment (taken from: http://www.dtreg.com/logistic.htm). Predictor variable: dosage of the drug (continuous).
Outcome variable:
+/- successful treatment.
The form of the logistic (sigmoid) function yields a (quasi-)binary outcome.
7. Logistic regression. A logistic regression gives us the probability P(Yi) that Y has occurred for individual i. P ranges from 0 to 1.
We build a model from the data that allows us to estimate the P of the outcome variable Y, given the predictor(s) X.
Maximum-likelihood estimation is used to select the best-fitting coefficients.
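As a hedged illustration of maximum-likelihood fitting outside SPSS, the following sketch uses the statsmodels library on simulated data; the data and coefficients are invented for demonstration and are not the book's example.

```python
import numpy as np
import statsmodels.api as sm

# Toy data: one continuous predictor and a binary outcome (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=100)
p = 1 / (1 + np.exp(-(-0.5 + 1.2 * X)))   # "true" model used to simulate Y
y = rng.binomial(1, p)

# Maximum-likelihood estimation of the coefficients b0 and b1.
model = sm.Logit(y, sm.add_constant(X)).fit()
print(model.params)   # estimated b0, b1
print(model.llf)      # log-likelihood of the fitted model
```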
8. The log(istic)-likelihood statistic. In order to assess the fit of the model, we compare the observed and predicted values. The measure is the log-likelihood:
log-likelihood = Σ (i = 1 to N) [Yi ln(P(Yi)) + (1 - Yi) ln(1 - P(Yi))]
In the log-likelihood statistic, the probabilities P for predicted and actual outcomes are compared: how much unexplained variance is left (analogous to the residual sum of squares, RSS)?
The log-likelihood is a measure of error in categorical models and is comparable to the residual sum of squares in normal linear regression.
→ The smaller the error, the better, so we hope for a small value (a small "RSS").
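A small numeric sketch of the sum above; the arrays y and p are placeholders for the observed outcomes and the model's predicted probabilities.

```python
import numpy as np

def log_likelihood(y, p):
    """Sum of Y*ln(P) + (1-Y)*ln(1-P) over all cases."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Illustrative values only: good predictions give a log-likelihood close to 0,
# poor predictions give a large negative value.
print(log_likelihood([1, 0, 1, 1], [0.9, 0.1, 0.8, 0.7]))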
9. Comparing the basic model to the log regression model As in linear regression, in log regression, we compare our log regression model to a baseline model. However, the baseline is not the mean (since for 2 categories such as +/- it has no meaning) but the more frequent category. Thus, we only include the constant (b0) in the baseline model.
When more predictors are added, the improvement of the model can be assessed by subtracting the baseline model from the new model:
χ2 = 2[LL(New) - LL(Baseline)]
df = k(New) - k(Baseline), where k = number of predictors (always 1 for the baseline model)
LL = log-likelihood
You multiply by 2 in order to produce a χ2 distribution (comparing two frequency values). The χ2 value is tested for significance.
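This baseline-versus-new comparison is a likelihood-ratio test; a sketch with SciPy, where the log-likelihood values and predictor counts are placeholders.

```python
from scipy.stats import chi2

def likelihood_ratio_test(ll_new, ll_baseline, k_new, k_baseline=1):
    """chi2 = 2 * (LL(New) - LL(Baseline)), with df = k_new - k_baseline."""
    chi_sq = 2 * (ll_new - ll_baseline)
    df = k_new - k_baseline
    p_value = chi2.sf(chi_sq, df)
    return chi_sq, df, p_value

# Hypothetical log-likelihoods for a constant-only baseline and a larger model.
print(likelihood_ratio_test(ll_new=-40.2, ll_baseline=-51.8, k_new=4))
```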
10. Correlation coefficients R and R2. Normal regression:
R = partial correlation between the outcome variable (Y) and each of the predictor variables (X)
R2 = amount of variance explained by the model
Logistic regression:
R = same logic as in normal regression; however, R depends on the Wald statistic and is therefore NOT equivalent to R in normal regression
R2L = roughly equivalent to R2, but determined differently
11. The Wald statistic (taken from http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1065119)
Wald ?2 statistics are used to test the significance of individual coefficients in the model according to:
Wald = (coefficient b / SE of coefficient b)²
Each Wald statistic is compared with a ?2 distribution with 1 df. Wald statistics are easy to calculate but their reliability is questionable, particularly for small samples. For data that produce large estimates of the coefficient, the standard error is often inflated, resulting in a lower Wald statistic, and therefore the explanatory variable may be incorrectly assumed to be unimportant in the model. Likelihood ratio tests (...) are generally considered to be superior.
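A minimal sketch of the Wald test as described above; the coefficient and standard error are illustrative numbers, not values from a real model.

```python
from scipy.stats import chi2

def wald_test(b, se):
    """Wald = (b / SE)^2, compared with a chi-square distribution with 1 df."""
    wald = (b / se) ** 2
    p_value = chi2.sf(wald, df=1)
    return wald, p_value

print(wald_test(b=2.76, se=1.12))   # hypothetical coefficient and SE
```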
12. R and Hosmer & Lemeshow's R2 in logistic regression
R = ±√[(Wald - 2·df) / (-2LL(Original))]
R2L = model χ2 / (-2LL(Original))
R2L (Hosmer & Lemeshow) is derived by dividing the model χ2 (based on the -2LL) by the original -2LL (the log-likelihood of the baseline model).
R2L shows how much the baseline model improves when new predictors are added. It ranges between 0 and 1.
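Both statistics can be computed directly from quantities SPSS reports; a sketch under the assumption that the Wald value, the model chi-square, and the baseline -2LL are already known (the numbers in the example call are placeholders, and the R formula follows the reconstruction above).

```python
import math

def r_from_wald(wald, df, minus_2ll_original, sign=1):
    """R = +/- sqrt((Wald - 2*df) / -2LL(Original)), per the slide's formula."""
    return sign * math.sqrt(max(wald - 2 * df, 0) / minus_2ll_original)

def r2_hosmer_lemeshow(model_chi_sq, minus_2ll_original):
    """R2_L: model chi-square divided by the baseline -2LL."""
    return model_chi_sq / minus_2ll_original

# Illustrative numbers only.
print(r2_hosmer_lemeshow(model_chi_sq=26.08, minus_2ll_original=96.12))
```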
13. The Hosmer-Lemeshow test (taken from http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1065119). The Hosmer-Lemeshow test is a commonly used test for assessing the goodness of fit of a model and allows for any number of explanatory variables, which may be continuous or categorical. The test is similar to a χ2 goodness-of-fit test and has the advantage of partitioning the observations into groups of approximately equal size, so that groups with very low observed and expected frequencies are less likely. The observations are grouped into deciles (sets of 10, my addition) based on the predicted probabilities. The test statistic is calculated as above (see the preceding slide, my addition) using the observed and expected counts for both events (my addition: e.g., the deaths and survivals), and has an approximate χ2 distribution with 8 (= 10 - 2) degrees of freedom.
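A rough sketch of the decile-based grouping the test uses; this is a simplified illustration, not the exact SPSS implementation, and y and p are placeholders for observed outcomes and predicted probabilities.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p, groups=10):
    """Group cases into deciles of predicted probability, compare observed
    with expected counts for events and non-events; df = groups - 2."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    y, p = y[order], p[order]
    bins = np.array_split(np.arange(len(p)), groups)   # roughly equal-sized groups
    stat = 0.0
    for idx in bins:
        obs_events, exp_events = y[idx].sum(), p[idx].sum()
        obs_non, exp_non = len(idx) - obs_events, len(idx) - exp_events
        stat += (obs_events - exp_events) ** 2 / exp_events
        stat += (obs_non - exp_non) ** 2 / exp_non
    df = groups - 2
    return stat, chi2.sf(stat, df)
```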
14. Cox and Snell's R2CS and Nagelkerke's R2N. Instead of R2L, SPSS produces R2CS, which is based on the LL of the new model vs. the LL of the original model and the sample size n:
R2CS = 1 - e^[-(2/n)(LL(New) - LL(Baseline))]
Since the R2CS statistic never reaches the maximum of 1, Nagelkerke's R2N is used instead ;-)
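A sketch of both statistics computed from the log-likelihoods and the sample size n; Nagelkerke's R2N simply rescales R2CS by its maximum attainable value. The numbers in the example call are placeholders.

```python
import math

def r2_cox_snell(ll_new, ll_baseline, n):
    """R2_CS = 1 - e^[-(2/n) * (LL(New) - LL(Baseline))]."""
    return 1 - math.exp(-(2.0 / n) * (ll_new - ll_baseline))

def r2_nagelkerke(ll_new, ll_baseline, n):
    """Rescale R2_CS so that the maximum possible value is 1."""
    r2_cs = r2_cox_snell(ll_new, ll_baseline, n)
    r2_max = 1 - math.exp((2.0 / n) * ll_baseline)
    return r2_cs / r2_max

print(r2_nagelkerke(ll_new=-40.2, ll_baseline=-51.8, n=75))
```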
15. Different measures, similar meaning. We have encountered three alternative measures of R2 in logistic regression:
Hosmer & Lemeshow's R2L
Cox & Snell's R2CS
Nagelkerke's R2N
The R2 statistics do not measure the goodness of fit of the model but indicate how useful the explanatory variables are in predicting the response variable, and can be referred to as measures of effect size. (http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1065119)
They all correspond to R2 in normal regression and reflect the significance of the LL-based model.
16. Assessing the b-coefficients. Normal regression:
In normal regression, the b-coefficients of the predictors X are estimated, their SEs computed, and a t-statistic tells us whether these b-values are significant, i.e., whether the predictors contribute substantially to the model.
Logistic regression:
In logistic regression, the analogous statistic is the Wald statistic, which is based on the chi-square (χ2) distribution. The χ2 test tells us whether the b-values are significant.
Wald = b/SEb
(A χ2 test compares frequencies of categories, here the number of occurrences and non-occurrences of Y.) The χ2 distribution can be found in Appendix A4 (p. 760) of Field's textbook.
17. The exponent of b: Exp(B). Exp(B) tells us the change in the odds of Y occurring for person i if the predictor changes by 1 unit (i.e., from one category to the other)
→ analogous to the b-coefficient in normal regression
odds = P(event: win) / P(no event: no win)
P(event Y) = 1 / (1 + e^-(b0 + b1X1))
P(no event Y) = 1 - P(event Y)
18. Exp(B): change in odds (continued)
Δodds = (odds after a unit change in the predictor) / (original odds, before that change)
(For a dichotomous predictor, a unit change amounts to switching from 0 to 1.)
The proportionate change in odds is Exp(B).
If Exp(B) > 1: when the predictor increases by 1 unit, the odds of the outcome occurring increase (by the factor Exp(B)).
If Exp(B) < 1: when the predictor increases by 1 unit, the odds of the outcome occurring decrease.
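The proportionate change in odds equals e raised to the coefficient b1, which can be checked directly; a short illustration with made-up coefficients.

```python
import math

def probability(x, b0, b1):
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

def odds(p):
    return p / (1 - p)

b0, b1 = -1.0, 0.8                            # hypothetical coefficients
odds_before = odds(probability(0, b0, b1))    # predictor at 0
odds_after = odds(probability(1, b0, b1))     # predictor increased by one unit
print(odds_after / odds_before)               # proportionate change in odds
print(math.exp(b1))                           # Exp(B): the same value
```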
19. Methods of logistic regression. Forced entry method (default): all predictors are entered at the same time (in the same block).
Stepwise methods:
Forward: First, only b0 is entered, then, stepwise, the other predictors are entered, according to their significance, until significant predictors are exhausted
Backward: All predictors are included, then removed (or retained) stepwise.
(The Backward method is better at avoiding suppressor effects, i.e., when a predictor has an influence which does not show because it is interrelated with, and therefore suppressed by, another predictor.)
Stepwise methods are recommended for exploratory purposes
Forced entry is recommended for theory testing
20. Example 1: Logistic regression in SPSS (using display.sav). Predictors (independent variables):
Age (continuous)
ToM test: False Belief
(categorical: passing the 'display' test: +/- appropriate display)
Outcome (dependent variable):
Possession of 'display' rules (categorical, +/-)
21. Analyze --> Regression --> Binary Logistic
22. Options for Statistics and Plots → run the analysis
23. Interpreting Logistic Regression The coding of the categorical variables
24. The initial model with only the constant b0
25. Should the other variables be included? → YES
26. Predictor 1: False Belief (FB) understanding: crosstabs and classification table. Analyze --> Descriptive Statistics --> Crosstabs (just specify FB for the row and Display rule for the column)
27. Predictor 1: False Belief (FB) understanding: summary statistics
28. Predictor 1: False Belief (FB) understanding: variables in the equation. Logit: natural logarithm of the odds of Y (outcome) occurring.
29. Various R2's
30. Exp(B) = change in odds. In order to know the change in odds resulting from a unit change in the predictor (FB task), we:
1. calculate the odds of a child having display rule understanding (outcome) without mastering the FB-task (original odds).
2. calculate the odds of a child having display rule understanding (outcome) given that she has false belief understanding (predictor).
Δodds = (odds after a unit change in the predictor) / (original odds)
31. Original odds
P(event Y) = 1 / (1 + e^-(b0 + b1X1)) = 1 / (1 + e^-[-1.3437 + (2.7607 × 0)]) = .2069
P(no event Y) = 1 - P(event Y) = 1 - .2069 = .7931
odds = .2069 / .7931 = .2609
→ The odds of showing the right display behavior while having NO understanding of false belief (X1 = 0) are .2609. These are the original odds.
32. Model odds
33. Proportionate change in odds = Exp(B) of X1
Δodds = (odds after a unit change in the predictor) / (original odds) = 4.1256 / .2609 = 15.81
→ If you change the predictor (X1 = FB) from 0 to 1, the odds that this child will show proper display rule behavior rise by a factor of about 15.8.
→ The odds that a child with FB understanding also shows proper display behavior are about 15 times higher than the odds for a child with NO FB understanding.
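The numbers on the preceding slides can be reproduced from the reported coefficients b0 = -1.3437 and b1 = 2.7607; a small verification sketch.

```python
import math

b0, b1 = -1.3437, 2.7607      # coefficients from the display-rule model

def odds(x):
    p = 1 / (1 + math.exp(-(b0 + b1 * x)))
    return p / (1 - p)

original_odds = odds(0)       # no false-belief understanding: about .26
model_odds = odds(1)          # false-belief understanding: about 4.13
print(original_odds, model_odds, model_odds / original_odds)   # ratio ~ e^b1 ~ 15.8
```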
34. Without the model... → We do not have to add further variables
36. Summary. The Model Summary with the -2 log-likelihood statistic shows the overall fit of the model.
If the χ2 statistic is significant (p < .05), the model is a significant improvement over the default model.
37. Summary continued. In the 'Variables in the equation' table you can check the contribution of each variable. The p-values of the Wald statistics should be < .05, too.
Exp(B) of a variable: if > 1, the odds of the outcome occurring increase; if < 1, the odds decrease.
38. Predicted probabilities for all 4 cases (FB = 0/1; display behavior = 0/1)
39. Interpreting residuals To assess points of poor fit of the model and
To identify influential cases
We look at
Studentized residuals (in display.sav as SRE)
Standardized residuals (in display.sav as ZRE)
Deviance statistics
As in normal regression,
95% of all cases should fall within ±1.96 SD
99% of all cases should fall within ±2.58 SD
Cases exceeding these boundaries are outliers
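A small sketch of this screening rule; the array of standardized residuals is a placeholder for, e.g., the ZRE values saved by SPSS.

```python
import numpy as np

zre = np.array([0.3, -1.2, 2.1, -2.9, 0.7])    # hypothetical standardized residuals

outliers_95 = np.where(np.abs(zre) > 1.96)[0]  # cases outside the 95% band
outliers_99 = np.where(np.abs(zre) > 2.58)[0]  # cases outside the 99% band
print(outliers_95, outliers_99)
```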
40. Residuals in the Case summaries
41. Example 2: predicting success in penalty shooting (using penalty.sav). Problem: predicting success or failure of a football player in penalty shooting.
42. Block entry regression with 2 blocks 1. Enter the two known variables: pswq and previous as 1 block
2. Next enter the new variable anxious as the 2nd block
43. Block 0: the basic model of the constant. The constant-only model predicts that the player always scores (because scoring is the most frequent outcome). In 40 out of n = 75 cases this is correct, since 40 players actually scored; in 35 out of 75 cases it is incorrect, since 35 players missed.
Thus, the basic model is correct in 53.3% of cases.
44. Block 0: the basic model of the constant. The basic model with only the constant does not yield a significant prediction.
45. Block 1: 'pswq' and 'previous'
46. Block 1: 'pswq' and 'previous': variables in the equation
47. Block 1: Classification plot
48. Determining R2
R2 = model χ2 / original -2LL = 54.977 / 103.6385 = .53
→ 53% of the variance can be accounted for by the model (block 1)
49. Block 2 adding 'anxious'
50. Block 2 Variables in equation
51. Block 2 classification plot
52. Testing collinearity in logistic regression
→ Collinearity: the two variables previous and anxious are strongly linearly related.
→ We have to use linear regression: Analyze → Regression → Linear. Enter the variables in the same way as before, but tick only 'Collinearity' in the Statistics options.
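Collinearity can also be checked outside SPSS, for instance via variance inflation factors in statsmodels; the predictor values below are simulated to mimic two strongly related variables and are not the penalty.sav data.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictor matrix: 'previous' and 'anxious' made strongly correlated.
rng = np.random.default_rng(1)
previous = rng.normal(size=75)
anxious = 0.95 * previous + 0.05 * rng.normal(size=75)
X = sm.add_constant(np.column_stack([previous, anxious]))

# VIFs well above 10 are a common warning sign of collinearity.
for i, name in enumerate(["previous", "anxious"], start=1):
    print(name, variance_inflation_factor(X, i))
```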
53. Collinearity statistics and diagnostics
54. Pearson correlationsAnalyze ? Correlate ? Bivariate
55. What to do in case of collinearity? Omit one of the two collinear variables.
Problem: it is unclear which variable to drop; statistics cannot help here any further.
Replace one of the variables.
If several variables are involved in the multicollinearity: run a factor analysis to see how the variables load on the underlying factors.
Acknowledge the inconclusiveness of the model
56. Prediction in logistic regression is not always causal ;-) Continuous predictor:
size
Binary outcome:
gender