
What you've always wanted to know about logistic regression analysis, but were afraid to ask...


Presentation Transcript


  1. What you've always wanted to know about logistic regression analysis, but were afraid to ask... February 1, 2010. Gerrit Rooks, Sociology of Innovation, Innovation Sciences & Industrial Engineering. Phone: 5509, email: g.rooks@tue.nl

  2. This lecture • Why logistic regression analysis? • The logistic regression model • Estimation • Goodness of fit • An example

  3. What's the difference between 'normal' regression and logistic regression? Regression analysis: • Relate one or more independent (predictor) variables to a dependent (outcome) variable

  4. What's the difference between 'normal' regression and logistic regression? • Often you will be confronted with outcome variables that are dichotomous: • success vs. failure • employed vs. unemployed • promoted or not • sick or healthy • pass or fail an exam

  5. Example: Relationship between hours studied for an exam and success

  6. Linear regression analysis: Why is this wrong?

  7. Logistic regression: The better alternative

  8. The logistic regression equation: predicting probabilities. The model gives a predicted probability (always between 0 and 1); the linear part inside the equation is similar to ordinary regression analysis (see the equation below).
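
For reference, the standard form of the logistic regression equation referred to here (written with predictors X1 ... Xn; the notation on the original slide may differ slightly) is:

P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n)}}

where P(Y) is the predicted probability, which always lies between 0 and 1, and the linear predictor b_0 + b_1 X_1 + \dots + b_n X_n is similar to the right-hand side of an ordinary regression equation.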

  9. The logistic function. Sometimes authors rearrange the model, expressing it in terms of the odds or the log-odds (see the forms below).
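
The rearranged forms mentioned here are, in the usual textbook notation (again, the slide's own notation may differ):

\frac{P(Y)}{1 - P(Y)} = e^{\,b_0 + b_1 X_1 + \dots + b_n X_n}
\qquad \text{or also} \qquad
\ln\!\left(\frac{P(Y)}{1 - P(Y)}\right) = b_0 + b_1 X_1 + \dots + b_n X_n

i.e. the same model written in terms of the odds and in terms of the log-odds (the logit).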

  10. How do we estimate coefficients? Maximum-likelihood estimation. Parameters are estimated by 'fitting' models, based on the available predictors, to the observed data. The model chosen is the one that fits the data best, i.e. is closest to the data. Fit is determined by the so-called log-likelihood statistic.

  11. Maximum likelihood estimation: the log-likelihood statistic. Large values of LL (i.e. values far from zero) indicate poor fit of the model. HOWEVER, THIS STATISTIC CANNOT BE USED TO EVALUATE THE FIT OF A SINGLE MODEL.
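
The log-likelihood statistic referred to here is usually written as:

LL = \sum_{i=1}^{N} \Big[ Y_i \ln\big(P(Y_i)\big) + (1 - Y_i)\ln\big(1 - P(Y_i)\big) \Big]

where Y_i is the observed outcome (0 or 1) and P(Y_i) the predicted probability for case i. Because each term is the log of a probability, LL is at most 0; the further below 0 it falls, the more the predictions disagree with the observed outcomes.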

  12. An example to illustrate maximum likelihood and the log likelihood statistic Suppose we know hours spent studying and the outcome of an exam

  13. In maximum likelihood estimation, different values for the parameters are 'tried'. Let's look at two possibilities: (1) b0 = 0 and b1 = 0.05; (2) b0 = 0 and b1 = 0.05

  14. We are now able to calculate the log-likelihood statistic.
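
A minimal Python sketch of this calculation, using made-up hours/outcome data and made-up candidate values for b0 and b1 (neither the data nor the second candidate value from the slides is reproduced here):

import numpy as np

# Hypothetical data: hours studied and exam outcome (1 = pass, 0 = fail)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
passed = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)

def log_likelihood(b0, b1, x, y):
    # P(Y=1) = 1 / (1 + exp(-(b0 + b1*x))), log-likelihood summed over all cases
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Two candidate parameter sets; the one whose LL is closest to 0 fits better
for b0, b1 in [(0.0, 0.05), (0.0, 0.5)]:
    print(f"b0 = {b0}, b1 = {b1}: LL = {log_likelihood(b0, b1, hours, passed):.3f}")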

  15. Two models and their log-likelihood statistic. Based on a clever algorithm, the model with the best fit (LL closest to 0) is chosen.

  16. After estimation: How do I determine significance? Obviously SPSS does all the work for you. There are two major issues when interpreting the output of SPSS: • Overall model fit (between-model comparisons, pseudo R-square, predictive accuracy / classification test) • Coefficients (Wald test, likelihood ratio test, odds ratios)

  17. Model fit: between-model comparison. The log-likelihood ratio test statistic can be used to test the fit of a model by comparing the fit of the full model with the fit of the reduced model. The test statistic has a chi-square distribution.

  18. Model fit. The log-likelihood ratio test statistic can be used to test the fit of a model: it compares the fit of the full model with the fit of the reduced model.

  19. Between-model comparison • Estimate a null model (the baseline model) • Estimate an improved model (this model contains more variables) • Assess the difference in -2LL between the models (see the formula after this list) • This difference follows a chi-square distribution • Degrees of freedom = # estimated parameters in proposed model – # estimated parameters in null model
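
In the usual notation, the between-model test sketched above is:

\chi^2 = -2LL(\text{baseline}) - \big(-2LL(\text{new})\big) = 2\big[LL(\text{new}) - LL(\text{baseline})\big],
\qquad
df = k_{\text{new}} - k_{\text{baseline}}

where k is the number of estimated parameters in each model.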

  20. Overall model fit: R and R². In multiple regression, R² is the SS due to regression divided by the total SS, i.e. a measure of the variance explained by the model.
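
Written out:

R^2 = \frac{SS_{\text{regression}}}{SS_{\text{total}}}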

  21. Overall model fit: pseudo R². Just like in multiple regression, a logit R² ranges from 0.0 to 1.0; it is based on the log-likelihood of the model that you want to test and the log-likelihood of the model before any predictors were entered. The Cox and Snell R² cannot theoretically reach 1; the Nagelkerke R² is adjusted so that it can reach 1 (see the formulas below). NOTE: R² in logistic regression tends to be (even) smaller than in multiple regression.
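
The two pseudo R² measures mentioned here are usually defined as follows, where LL(0) is the log-likelihood of the baseline model, LL(M) that of the model you want to test, and n the sample size; this is the textbook form, offered as a sketch rather than as SPSS's exact computation:

R^2_{CS} = 1 - \exp\!\left(\frac{2}{n}\big[LL(0) - LL(M)\big]\right)
\qquad
R^2_{N} = \frac{R^2_{CS}}{1 - \exp\!\left(\frac{2}{n}\,LL(0)\right)}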

  22. What is a small or large R and R²? Strength of correlation.

  23. Overall model fit: classification table. How well does the model predict outcomes? (See the SPSS output.) We assume that if our model predicts that a player will score with a probability above .5 (e.g. .51), the prediction is a score; a predicted probability below .5 counts as a miss. A sketch of the construction follows.
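
A small sketch of how such a table can be built, using hypothetical predicted probabilities and observed outcomes (not the actual penalty data) and the .5 cut-off from the slide:

import numpy as np

# Hypothetical predicted probabilities and observed outcomes (1 = scored, 0 = missed)
p_hat = np.array([0.81, 0.46, 0.63, 0.22, 0.55, 0.35, 0.90, 0.49])
observed = np.array([1, 0, 1, 0, 0, 1, 1, 0])

predicted = (p_hat >= 0.5).astype(int)   # cut-off at .5: above .5 counts as a score

# 2 x 2 classification table: rows = observed outcome, columns = predicted outcome
table = np.zeros((2, 2), dtype=int)
for obs, pred in zip(observed, predicted):
    table[obs, pred] += 1

accuracy = (table[0, 0] + table[1, 1]) / table.sum()
print(table)
print(f"Overall percentage correct: {accuracy:.0%}")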

  24. Testing significance of coefficients: the Wald statistic (not really good; see the formula after this list) • In linear regression analysis, a similar statistic (the estimate divided by the standard error of the estimate, evaluated against a t-distribution) is used to test significance • In logistic regression something similar exists • However, when b is large, its standard error tends to become inflated, hence the statistic is underestimated (Type II errors are more likely)
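
For reference, the statistic has the form of the estimate divided by its standard error; SPSS reports the squared version, which is evaluated against a chi-square distribution with 1 degree of freedom:

z = \frac{b}{SE_b}, \qquad \text{Wald} = \left(\frac{b}{SE_b}\right)^2 \sim \chi^2(1)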

  25. Likelihood ratio test: an alternative way to test the significance of a coefficient. To avoid Type II errors, for some variables you had best use the likelihood ratio test, which compares the model with the variable to the model without the variable (a sketch follows).
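
A minimal sketch of this test with made-up log-likelihood values (the numbers are purely illustrative, not taken from the slides):

from scipy.stats import chi2

# Hypothetical log-likelihoods of the two models
ll_with = -48.7      # model including the variable of interest
ll_without = -51.9   # model without that variable

lr_stat = 2 * (ll_with - ll_without)   # equals the difference in -2LL
p_value = chi2.sf(lr_stat, df=1)       # 1 df: one extra estimated parameter
print(f"LR chi-square = {lr_stat:.2f}, p = {p_value:.4f}")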

  26. Before we go to the example: A recap • Logistic regression • dichotomous outcome • logistic function • log-likelihood / maximum likelihood • Model fit • likelihood ratio test (compare LL of models) • Pseudo R-square • Classification table • Wald test

  27. Illustration with SPSS • Penalty kicks data, variables: • Scored: outcome variable, 0 = penalty missed, 1 = penalty scored • Pswq: degree to which a player worries • Previous: percentage of penalties scored by a particular player in their career
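
As a rough Python equivalent of the SPSS analysis that follows (a sketch only: the file name penalty_kicks.csv and its column layout are assumptions, since the data set itself is not included here):

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Assumed CSV with columns: scored (0/1), pswq, previous
data = pd.read_csv("penalty_kicks.csv")

X = sm.add_constant(data[["pswq", "previous"]])   # add the intercept term
model = sm.Logit(data["scored"], X).fit()         # maximum-likelihood estimation

print(model.summary())        # coefficients, standard errors, z (Wald) statistics, LL
print(np.exp(model.params))   # Exp(B): the change in odds per unit increase in a predictor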

  28. SPSS OUTPUT: Logistic Regression. Tells you something about the number of observations and missing cases.

  29. Block 0: Beginning Block. This table is based on the empty model, i.e. only the constant is in the model; the other variables will be entered into the model later on.

  30. Block 1: Method = Enter. The 'Block' row is useful to check the significance of individual coefficients (see Field). The chi-square shown here is the test statistic for the new model; the output reports the log-likelihood after multiplying by -2. Note: the Nagelkerke R² is larger than the Cox and Snell R².

  31. Block 1: Method = Enter (continued). Predictive accuracy has improved (it was 53%). The coefficient table shows the estimates, their standard errors, the significance based on the Wald statistic, and Exp(B), the change in odds (see below).
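
The 'change in odds' column, Exp(B), is interpreted via the odds ratio:

\text{odds} = \frac{P(Y)}{1 - P(Y)}, \qquad e^{b} = \frac{\text{odds after a one-unit change in the predictor}}{\text{original odds}}

so an Exp(B) above 1 means the odds of scoring increase as the predictor increases, and a value below 1 means they decrease.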

  32. How is the classification table constructed? The marked cases ('oops, wrong prediction') are the cases the model classifies incorrectly.

  33. How is the classification table constructed?

  34. How is the classification table constructed?
