
D/RS 1013

D/RS 1013. Logistic Regression. Some Questions. Do children have a better chance of surviving a severe illness than adults? Can income, credit history, & education distinguish those who will repay a loan from those who will not?


Presentation Transcript


  1. D/RS 1013 Logistic Regression

  2. Some Questions • Do children have a better chance of surviving a severe illness than adults? • Can income, credit history, & education distinguish those who will repay a loan from those who will not? • Are clients with high scores on a personality test more likely to respond to psychotherapy than clients with low scores? • Can scores on a math pretest predict who will pass or fail a course?

  3. Answering these questions • Linear regression? • Why not? (a linear fit can predict values below 0 or above 1) • Logistic regression answers the same questions as discriminant analysis, without its distributional assumptions about the predictors

  4. Logistic regression • expect a nonlinear relationship • s shaped (sigmoidal) curve • curve never below zero or above 1 • predicted values interpreted as probability of group membership
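
A minimal sketch of the s-shaped logistic curve in Python (the intercept A and slope B here are placeholders for illustration, not values estimated from the math data):

```python
import numpy as np

def logistic(x, A, B):
    """S-shaped logistic curve: predicted probability of group membership."""
    u = A + B * x
    return np.exp(u) / (1 + np.exp(u))

# Placeholder intercept and slope; the curve stays strictly between 0 and 1
# for any predictor value, approaching but never reaching either bound.
x = np.linspace(1, 11, 100)
p = logistic(x, A=-6.0, B=1.0)
print(p.min(), p.max())   # both inside (0, 1)
```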

  5. Logistic Curve • math data: scores of 1-11 on pretest; fail = 0, pass = 1

  6. Residuals • generally small, largest in the middle of the curve • actual value - predicted value • for a student with a pretest score of 5 who passed the test: • 1 (actual value) - .21 (predicted value) = .79 (residual, or estimation error) • two possible residual values for each value of the predictor
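
A sketch of the residual arithmetic described above, using the predicted probability of .21 quoted for a pretest score of 5:

```python
# Residual = actual outcome (0 or 1) minus the predicted probability.
predicted = 0.21                     # predicted probability of passing at pretest = 5
residual_if_passed = 1 - predicted   # 0.79, as in the slide
residual_if_failed = 0 - predicted   # -0.21, the other possible residual at this score
print(residual_if_passed, residual_if_failed)
```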

  7. Different Shapes and Directions

  8. Negative Curve

  9. Assumptions • outcomes on the DV are mutually exclusive and exhaustive • sample size recommendations range from 10-50 cases per IV • a sample that is too small can lead to: • extremely high parameter estimates and standard errors • failure to converge

  10. Assumptions (cont.) • either increase cases or decrease predictors • large samples required for maximum likelihood estimation

  11. Testing the Overall Model • "constant only" model • no IVs entered • first -2 log likelihood • full model • all IVs entered • second -2 log likelihood • the difference is the overall "model" chi-square; if p < .05, the model provides classification power
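
A sketch of the overall model test in Python with statsmodels (the package choice and the pass/fail data below are assumptions for illustration; the slides do not name software):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical pretest scores (1-11) and pass/fail outcomes, made up for illustration.
pretest = np.array([1, 2, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 9, 10, 11])
passed  = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1])

X = sm.add_constant(pretest)              # adds the intercept column
result = sm.Logit(passed, X).fit(disp=0)  # full model (constant + predictor)

# -2 log likelihood for the constant-only and full models;
# their difference is the overall "model" chi-square.
neg2ll_constant_only = -2 * result.llnull
neg2ll_full          = -2 * result.llf
model_chi_square     = neg2ll_constant_only - neg2ll_full   # same as result.llr
print(model_chi_square, result.llr_pvalue)  # p < .05 => model has classification power
```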

  12. Coefficients and Testing • each B coefficient is the natural log of the odds ratio associated with its variable • convert to odds by raising "e" to the B power • the significance of each is tested via the associated Wald statistic • similar to the t test used for coefficients in linear regression; p < .05 indicates that the coefficient is not zero

  13. Coefficient Interpretation • interpret odds ratios rather than raw coefficients, though the sign of a B coefficient still gives us information • positive B coefficient: odds increase as the predictor increases • negative B coefficient: odds decrease as the predictor increases

  14. Coefficient Interpretation (cont.) • exp(B) converts the coefficient to an odds ratio • the change in odds associated with a one-unit increase in the predictor • to see the change for a two-unit increase in the predictor • multiply B by 2 before raising e to that power • i.e., calculate e^(2B)
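
A quick check of that odds-ratio arithmetic, using the slope B = 2.69 reported later for the math example:

```python
import math

B = 2.69
odds_ratio_one_unit  = math.exp(B)      # change in odds per 1-unit increase in pretest
odds_ratio_two_units = math.exp(2 * B)  # multiply B by 2 before exponentiating
print(odds_ratio_one_unit)    # ~14.73
print(odds_ratio_two_units)   # ~217.02
```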

  15. The Logistic Model • Ŷi = e^u / (1 + e^u) • where: Ŷi = estimated probability • u = A + BX (in our math example) • or more generally (multiple predictors) • u = A + B1X1 + B2X2 + … + BkXk (k = # of predictors)

  16. Applying the Model • math data, intercept (constant) and slope found to be: • A = -14.79 and B = 2.69 • for a pretest score of 5, we want to find the probability of passing • u = -14.79 + 2.69(5) = -1.34, so Ŷ = e^(-1.34) / (1 + e^(-1.34)) = .2075
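
The same arithmetic written out as a short Python sketch:

```python
import math

A, B = -14.79, 2.69
u = A + B * 5                              # -14.79 + 2.69(5) = -1.34
p_pass = math.exp(u) / (1 + math.exp(u))   # logistic model: e^u / (1 + e^u)
print(round(u, 2), round(p_pass, 4))       # -1.34, ~0.2075
```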

  17. Converting to Odds • p(target)/p(other) = • .2075/.7925 = .2618

  18. Applying the Model (cont.) • Pretest score of 7: u = -14.79 + 2.69(7) = 4.04, so Ŷ = .9827 • odds are .9827/.0173 = 56.8263

  19. Crosschecking • 56.8263/.2618 = 217.03, which not coincidentally equals (within rounding error): • e^(2 × 2.69) = e^5.38 = 217.022 • since we moved 2 units, multiply B by 2 before finding exp(B)
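
The crosscheck as a sketch, computing the odds at scores 5 and 7 and comparing their ratio to exp(2B):

```python
import math

A, B = -14.79, 2.69
p5 = math.exp(A + B * 5) / (1 + math.exp(A + B * 5))
p7 = math.exp(A + B * 7) / (1 + math.exp(A + B * 7))

odds5 = p5 / (1 - p5)    # ~0.2618
odds7 = p7 / (1 - p7)    # ~56.83
print(odds7 / odds5)     # ~217.02
print(math.exp(2 * B))   # ~217.02 -- the two-unit odds ratio
```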

  20. Confidence Intervals for Coefficients • odds ratios for coefficients are presented with 95% confidence intervals • if 1 is in the CI, the coefficient is not statistically significant at the .05 level
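
One way to read off odds-ratio confidence intervals, assuming the statsmodels workflow and hypothetical data from the earlier sketch:

```python
import numpy as np
import statsmodels.api as sm

# Same hypothetical pretest/pass data used in the overall-model-test sketch.
pretest = np.array([1, 2, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 9, 10, 11])
passed  = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1])
result = sm.Logit(passed, sm.add_constant(pretest)).fit(disp=0)

# Exponentiate the 95% coefficient CIs to get odds-ratio CIs;
# an interval containing 1 means that coefficient is not significant at .05.
print(np.exp(result.conf_int()))
```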

  21. Classification Table • Same idea as classification results (confusion matrix) in discriminant analysis. • Overall % accuracy=N(on diagonal)/total N • Sensitivity - % of target group accurately classified • Specificity - % of "other group" correctly classified
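
A sketch of the classification-table arithmetic (the counts below are hypothetical; group 1 is the target group):

```python
import numpy as np

actual    = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
predicted = np.array([1, 1, 1, 0, 0, 0, 0, 1, 1, 0])

target, other = actual == 1, actual == 0
overall_accuracy = np.mean(actual == predicted)   # N on diagonal / total N
sensitivity = np.mean(predicted[target] == 1)     # % of target group correctly classified
specificity = np.mean(predicted[other] == 0)      # % of "other" group correctly classified
print(overall_accuracy, sensitivity, specificity)
```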

  22. Final Points • general procedure • fit the model • remove non-significant (ns) predictors • rerun • report only significant predictors • cross-validation • generate/modify the model with one half of the data, test the classification with the other half
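
A sketch of the split-half cross-validation idea using scikit-learn (the package and the data are assumptions; the slides do not prescribe a tool):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical pretest scores (1-11) and pass/fail outcomes.
rng = np.random.default_rng(0)
pretest = rng.integers(1, 12, size=60).reshape(-1, 1)
passed = (pretest.ravel() + rng.normal(0, 2, size=60) > 6).astype(int)

# Fit the model on one half, test the classification on the other half.
X_dev, X_val, y_dev, y_val = train_test_split(
    pretest, passed, test_size=0.5, random_state=0)
model = LogisticRegression().fit(X_dev, y_dev)
print(model.score(X_val, y_val))   # cross-validated classification accuracy
```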
