
Lecture 7: Modeling for prediction




  1. Lecture 7: Modeling for prediction

Draper & Smith (1981): "The screening of variables should never be left to the sole discretion of any statistical procedure."

Hosmer & Lemeshow (2000): "The analyst, not the computer, is ultimately responsible for the review and evaluation of the model."

Analysis should begin with an exploration of univariate associations. As in linear regression, there are a number of different approaches to variable selection once the form of the "full" model has been determined.

Biost 536 Thompson Part 3

  2. Example

Tibial fractures are prone to delayed union or nonunion, especially those occurring in the lower third of the bone. This study aimed to develop a model to predict the probability of nonunion (within six months) of tibial fractures.

Variables:
  • nonu: nonunion (0=no, 1=yes)
  • age: age (yrs)
  • race: race (0=non-African, 1=African)
  • disp: displacement (0=none/minimal, 1=half/full-diameter)
  • inj: associated injuries (0=none, 1=some)
  • med: associated medical conditions, e.g. epilepsy (0=no, 1=yes)
  • alc: intoxicated at time of injury (0=no, 1=yes)
  • crutch: time to walking with a crutch (wks)

There were 191 individuals in the study; 86 of them are reserved for validation, leaving 105 for model building. Exploratory data analysis should precede formal modeling!

. tab nonu race, col

           |         race
      nonu |         0          1 |     Total
-----------+----------------------+----------
         0 |        29         27 |        56
           |     45.31      65.85 |     53.33
-----------+----------------------+----------
         1 |        35         14 |        49
           |     54.69      34.15 |     46.67
-----------+----------------------+----------
     Total |        64         41 |       105
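The `col` option reports, within each column (race group), the percentage of subjects in each row (nonunion status). A minimal Python sketch of that arithmetic, using the counts from the race table; Python is used here only for illustration, since the lecture itself works in Stata, and `column_percentages` is a hypothetical helper name.

```python
def column_percentages(table):
    """Column percentages of a two-way table of counts, as `tab ..., col` reports them.

    table[i][j] is the count in row category i and column category j.
    """
    n_rows, n_cols = len(table), len(table[0])
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[100.0 * table[i][j] / col_totals[j] for j in range(n_cols)]
            for i in range(n_rows)]


# Counts from `tab nonu race, col`: rows are nonu = 0, 1; columns are race = 0, 1
race_counts = [[29, 27],
               [35, 14]]
race_pct = column_percentages(race_counts)
for nonu, row in enumerate(race_pct):
    print(nonu, ["%.2f" % v for v in row])
```

Reading down a column: 54.69% of non-African subjects (race = 0) had nonunion, compared with 34.15% of African subjects (race = 1).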

  3. . tab nonu inj, col

           |          inj
      nonu |         0          1 |     Total
-----------+----------------------+----------
         0 |        52          4 |        56
           |     53.61      50.00 |     53.33
-----------+----------------------+----------
         1 |        45          4 |        49
           |     46.39      50.00 |     46.67
-----------+----------------------+----------
     Total |        97          8 |       105
           |    100.00     100.00 |    100.00

. tab nonu med, col

           |          med
      nonu |         0          1 |     Total
-----------+----------------------+----------
         0 |        53          3 |        56
           |     53.00      60.00 |     53.33
-----------+----------------------+----------
         1 |        47          2 |        49
           |     47.00      40.00 |     46.67
-----------+----------------------+----------
     Total |       100          5 |       105
           |    100.00     100.00 |    100.00

. tab nonu alc, col

           |          alc
      nonu |         0          1 |     Total
-----------+----------------------+----------
         0 |        33         23 |        56
           |     54.10      52.27 |     53.33
-----------+----------------------+----------
         1 |        28         21 |        49
           |     45.90      47.73 |     46.67
-----------+----------------------+----------
     Total |        61         44 |       105
           |    100.00     100.00 |    100.00

  4. . summ age crutch

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |       105    33.74286    14.17814         12         86
      crutch |       105     4.52381     4.03374          0         20

. lowess nonu age, gen(nage)
. twoway (scatter nonu age) (line nage age, sort), legend(off) scheme(s1mono) ytitle(Probability of nonunion) xtitle(age (years))

  5. . fracpoly logistic nonu age race disp inj med alc crutch
-> gen double Icrut__1 = crutch-4.523809524 if e(sample)
........
-> gen double Iage__1 = X^3-38.41895606 if e(sample)
-> gen double Iage__2 = X^3*ln(X)-46.72450666 if e(sample)
   (where: X = age/10)

Log likelihood = -58.743122                       Pseudo R2       =     0.1903
------------------------------------------------------------------------------
        nonu | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Iage__1 |   1.070328   .0310319     2.34   0.019     1.011202     1.13291
     Iage__2 |   .9698842   .0133055    -2.23   0.026     .9441533    .9963163
        race |   .2344612   .1207148    -2.82   0.005     .0854714    .6431629
        disp |   4.188472   2.166944     2.77   0.006     1.519423    11.54602
         inj |   2.832849   2.541565     1.16   0.246     .4881414    16.43998
         med |   .4261364   .4768636    -0.76   0.446     .0475358     3.82012
         alc |   .4591479   .2475397    -1.44   0.149     .1596047    1.320868
    Icrut__1 |   1.134159    .076004     1.88   0.060     .9945625     1.29335
------------------------------------------------------------------------------
Deviance: 117.49. Best powers of age among 44 models fit: 3 3.

. fracplot, ciopts(lcolor(dkgreen) lwidth(medthick))
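The generated variables above can be reproduced by hand. A sketch in Python, assuming fracpoly's default power set {-2, -1, -0.5, 0, 0.5, 1, 2, 3} with power 0 meaning ln(x); under that assumption, 8 one-term (FP1) models plus 36 two-term (FP2) models give the "44 models" in the output. A repeated power p yields the pair x^p and x^p ln(x), and each term is centered at its value for the mean of X = age/10, which reproduces the constants 38.41895606 and 46.72450666. The helper names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

# Assumed FP power set (fracpoly's default); power 0 stands for ln(x)
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_power(x, p):
    """x**p, with the FP convention that p == 0 means ln(x). Requires x > 0."""
    return math.log(x) if p == 0 else x ** p


def fp2_terms(x, p1, p2):
    """The two FP2 terms; a repeated power p gives x**p and x**p * ln(x)."""
    first = fp_power(x, p1)
    second = first * math.log(x) if p1 == p2 else fp_power(x, p2)
    return first, second


# Candidate models: 8 FP1 (single power) + 36 FP2 (unordered pairs, repeats allowed)
n_models = len(POWERS) + len(list(combinations_with_replacement(POWERS, 2)))
print(n_models)

# Centering constants: the terms evaluated at the mean of X = age/10
MEAN_AGE = 33.74285714   # mean age, as printed in the slide 7 gen line
C1, C2 = fp2_terms(MEAN_AGE / 10, 3, 3)
print(round(C1, 6), round(C2, 6))


def age_terms(age):
    """Centered Iage__1, Iage__2 for a given age, mirroring the generated variables."""
    t1, t2 = fp2_terms(age / 10, 3, 3)
    return t1 - C1, t2 - C2
```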

  6. [Figure: nonunion (0/1) plotted against time to crutch walking (weeks), with lowess smooth; y-axis: probability of nonunion]

. lowess nonu crutch, gen(ncrtch)
. twoway (scatter nonu crutch) (line ncrtch crutch, sort), legend(off) scheme(s1mono) ytitle(Probability of nonunion) xtitle(Time to crutch walking (weeks))

  7. . fracpoly logistic nonu crutch race disp inj med alc age
-> gen double Iage__1 = age-33.74285714 if e(sample)
........
-> gen double Icrut__1 = X^-2-3.277348395 if e(sample)
-> gen double Icrut__2 = X^-2*ln(X)+1.9451631 if e(sample)
   (where: X = (crutch+1)/10)

Log likelihood = -56.283737                       Pseudo R2       =     0.2242
------------------------------------------------------------------------------
        nonu | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    Icrut__1 |   .7282482   .0783032    -2.95   0.003     .5898696    .8990894
    Icrut__2 |   .8764888   .0404932    -2.85   0.004     .8006107    .9595583
        race |   .2353746   .1210225    -2.81   0.005     .0859206    .6447949
        disp |   3.632398    1.85468     2.53   0.012     1.335288    9.881246
         inj |    2.10494   1.951578     0.80   0.422     .3420229    12.95461
         med |   .5656956   .6921998    -0.47   0.642      .051408    6.224938
         alc |   .4084044   .2203485    -1.66   0.097     .1418526    1.175827
     Iage__1 |   1.038973   .0205459     1.93   0.053     .9994743    1.080033
------------------------------------------------------------------------------
Deviance: 112.57. Best powers of crutch among 44 models fit: -2 -2.

. fracplot, ciopts(lcolor(dkgreen) lwidth(medthick))
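To see what the two crutch terms imply jointly, one can evaluate the fitted FP2 function on the log-odds scale. A Python sketch, taking the coefficients as the logs of the Icrut__1 and Icrut__2 odds ratios in this table and centering at the mean crutch time (4.523809524 weeks), which reproduces the constants 3.277348395 and -1.9451631 in the gen lines. The function names are hypothetical, and the output describes only this fitted curve, not a validated dose-response.

```python
import math

# Odds ratios for the FP2 crutch terms from the fracpoly output; logs give coefficients
B1 = math.log(0.7282482)   # Icrut__1
B2 = math.log(0.8764888)   # Icrut__2


def crutch_terms(crutch):
    """Uncentered FP2 terms for powers (-2, -2) on X = (crutch + 1)/10.

    The +1 shift keeps X > 0 because crutch can be 0 weeks."""
    x = (crutch + 1) / 10
    return x ** -2, x ** -2 * math.log(x)


# Centering constants: the terms evaluated at the mean crutch time
C1, C2 = crutch_terms(4.523809524)


def crutch_log_odds(crutch):
    """Fitted contribution of crutch to the log-odds of nonunion, relative to the mean."""
    t1, t2 = crutch_terms(crutch)
    return B1 * (t1 - C1) + B2 * (t2 - C2)


for weeks in (0, 1, 2, 4, 9, 20):
    lo = crutch_log_odds(weeks)
    print(f"{weeks:2d} weeks: log-odds {lo:+.3f}, odds ratio vs mean {math.exp(lo):.3f}")
```

Under these estimates the fitted log-odds are not monotone in crutch: they dip near 1 week and then rise with longer times to crutch walking, the kind of feature whose plausibility the analyst should check.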

  8. Stepwise selection algorithms

• Step 0:
  • Fit a model with the intercept only and evaluate its likelihood, L0.
  • Fit each of the k possible univariate models, evaluate their likelihoods, Lj0 (j = 1, 2, ..., k), and carry out the LRTs comparing L0 with each Lj0.
  • The variable (say the 1st) with the smallest p-value is included in the model, provided this p-value is less than some pre-specified pE.
• Step 1:
  • All k-1 models containing the intercept, the 1st variable and one of the remaining variables are fitted.
  • Their log-likelihoods are compared with that of the model containing just the intercept and the 1st variable.
  • Say the 2nd variable has the smallest LRT p-value, p2. It is then included in the model, provided p2 < pE.

  9. • Step 2:
  • Carry out an LRT to assess whether the 1st variable, given the presence of the 2nd variable, can be dropped from the model.
  • Compare the p-value from this LRT with a pre-specified p-value pR. If p < pR, retain the 1st variable and continue; otherwise drop it.
  • All k-2 models containing the intercept, the first two variables and one of the remaining variables are fitted.
  • Their log-likelihoods are compared with that of the model containing just the intercept and the first two variables.
  • Say the 3rd variable has the smallest LRT p-value, p3. It is then included in the model, provided p3 < pE.
  ……
• Step S (stop):
  • The process stops when all k variables have been included in the model, or when all variables in the model have p-values less than pR and all variables not in the model have p-values greater than pE.

The same principles can be applied working backward from the "full" model containing all k variables.
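The forward algorithm with a removal check can be sketched in plain Python. This is an illustration of the steps above, not Stata's `sw` implementation: logistic models are fitted by Newton-Raphson, LR p-values use the chi-square upper tail with one degree of freedom per column, and a multi-column term (like `(Icrut__1 Icrut__2)` in the `sw` calls) enters or leaves as a unit. The data are simulated, not the tibial fracture data, and all function names are hypothetical; pe=0.15 and pr=0.20 mirror `pe(.15) pr(.2)`.

```python
import math
import random


def chi2_sf(x, df):
    """P(chi-square_df > x), from the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return max(0.0, 1.0 - math.exp(a * math.log(z) - z - math.lgamma(a)) * total)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def logistic_loglik(X, y, max_iter=50, tol=1e-10):
    """Maximized log-likelihood of a logistic regression fitted by Newton-Raphson."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * row[j] * row[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    etas = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return sum(yi * e - math.log1p(math.exp(e)) for yi, e in zip(y, etas))


def forward_stepwise(terms, y, pe=0.15, pr=0.20, max_steps=50):
    """Forward stepwise selection by LR tests; terms maps a name to a list of columns."""
    n = len(y)

    def loglik(names):
        X = [[1.0] + [col[i] for t in names for col in terms[t]] for i in range(n)]
        return logistic_loglik(X, y)

    model = []
    for _ in range(max_steps):
        base = loglik(model)
        p_enter = {t: chi2_sf(2 * (loglik(model + [t]) - base), len(terms[t]))
                   for t in terms if t not in model}
        if not p_enter or min(p_enter.values()) >= pe:
            break
        model.append(min(p_enter, key=p_enter.get))
        # Removal step: drop the least significant term while its p-value >= pr
        while len(model) > 1:
            full = loglik(model)
            p_drop = {t: chi2_sf(2 * (full - loglik([u for u in model if u != t])),
                                 len(terms[t])) for t in model}
            worst = max(p_drop, key=p_drop.get)
            if p_drop[worst] < pr:
                break
            model.remove(worst)
    return model


# Usage on simulated data (not the study data): only x1 affects the outcome
random.seed(536)
n_obs = 300
x1 = [random.gauss(0, 1) for _ in range(n_obs)]
x2 = [random.gauss(0, 1) for _ in range(n_obs)]
y = [1 if random.random() < 1 / (1 + math.exp(0.3 - 1.5 * v)) else 0 for v in x1]
selected = forward_stepwise({"x1": [x1], "x2": [x2]}, y)
print(selected)
```

Choosing pE < pR, as in the lecture's example, makes it harder for a variable to be added and then immediately removed; the `max_steps` cap is a simple guard against cycling.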

  10. Tibial fracture example continued

. logit nonu age race disp inj med alc Icrut__1 Icrut__2

Iteration 0:   log likelihood = -72.546947
Iteration 1:   log likelihood = -56.942279
Iteration 2:   log likelihood = -56.292889
Iteration 3:   log likelihood = -56.283739
Iteration 4:   log likelihood = -56.283737

Logistic regression                               Number of obs   =        105
                                                  LR chi2(8)      =      32.53
                                                  Prob > chi2     =     0.0001
Log likelihood = -56.283737                       Pseudo R2       =     0.2242

------------------------------------------------------------------------------
        nonu |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0382327   .0197752     1.93   0.053    -.0005259    .0769913
        race |  -1.446577   .5141697    -2.81   0.005    -2.454331   -.4388231
        disp |   1.289893   .5105939     2.53   0.012     .2891473    2.290639
         inj |   .7442871   .9271418     0.80   0.422    -1.072877    2.561452
         med |  -.5696992   1.223626    -0.47   0.642    -2.967962    1.828563
         alc |  -.8954975    .539535    -1.66   0.097    -1.952967    .1619717
    Icrut__1 |  -.3171133   .1075226    -2.95   0.003    -.5278538   -.1063728
    Icrut__2 |  -.1318313   .0461994    -2.85   0.004    -.2223804   -.0412822
       _cons |  -.8352537   .8876475    -0.94   0.347    -2.575011    .9045035
------------------------------------------------------------------------------
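The header statistics follow directly from the iteration log. A Python sketch: the LR chi2(8) is twice the gain in log-likelihood over the intercept-only model (iteration 0), the pseudo R2 is McFadden's 1 - l/l0, and exponentiating a coefficient gives the odds ratio shown in the fracpoly table for the same model. For even degrees of freedom the chi-square upper tail has the closed form used in `chi2_sf_even` (a hypothetical helper name).

```python
import math


def chi2_sf_even(x, df):
    """P(chi-square_df > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)**i / i!."""
    z = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= z / i
        total += term
    return math.exp(-z) * total


ll_null = -72.546947   # iteration 0: intercept-only model
ll_fit = -56.283737    # converged fit with 8 predictor terms

lr = 2 * (ll_fit - ll_null)        # LR chi2(8)
p_value = chi2_sf_even(lr, 8)
pseudo_r2 = 1 - ll_fit / ll_null   # McFadden's pseudo R2
print(round(lr, 2), round(p_value, 4), round(pseudo_r2, 4))

# Exponentiated coefficients reproduce the odds ratios in the fracpoly table
print(round(math.exp(0.0382327), 6), round(math.exp(-1.446577), 6))
```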

  11. Tibial fracture example

. sw logistic nonu age race disp inj med alc (Icrut__1 Icrut__2), forw pe(.15) pr(.2) lr
                      LR test                 begin with empty model
p = 0.0006 <  0.1500  adding   Icrut__1 Icrut__2
p = 0.0148 <  0.1500  adding   race
p = 0.0213 <  0.1500  adding   disp
p = 0.1070 <  0.1500  adding   age
p = 0.0827 <  0.1500  adding   alc

Logistic regression                               Number of obs   =        105
                                                  LR chi2(6)      =      31.79
                                                  Prob > chi2     =     0.0000
Log likelihood = -56.653938                       Pseudo R2       =     0.2191

------------------------------------------------------------------------------
        nonu | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    Icrut__1 |    .713378   .0749687    -3.21   0.001     .5805868    .8765412
    Icrut__2 |   .8684253   .0391119    -3.13   0.002     .7950533    .9485684
        race |   .2453815   .1248113    -2.76   0.006     .0905493    .6649646
        disp |   3.424641   1.720637     2.45   0.014     1.279226    9.168173
         age |   1.033878   .0193218     1.78   0.075     .9966934     1.07245
         alc |    .405238   .2177009    -1.68   0.093     .1413936    1.161423
------------------------------------------------------------------------------

. sw logistic nonu age race disp inj med alc (Icrut__1 Icrut__2), pe(.15) pr(.2) lr
                      LR test                 begin with full model
p = 0.6337 >= 0.2000  removing med
p = 0.4737 >= 0.2000  removing inj

Log likelihood = -56.653938
------------------------------------------------------------------------------
        nonu | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.033878   .0193218     1.78   0.075     .9966934     1.07245
        race |   .2453815   .1248113    -2.76   0.006     .0905493    .6649646
        disp |   3.424641   1.720637     2.45   0.014     1.279226    9.168173
         alc |    .405238   .2177009    -1.68   0.093     .1413936    1.161423
    Icrut__1 |    .713378   .0749687    -3.21   0.001     .5805868    .8765412
    Icrut__2 |   .8684253   .0391119    -3.13   0.002     .7950533    .9485684
------------------------------------------------------------------------------
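Forward selection from the empty model and backward elimination from the full model arrive at the same six-term model here. The log-likelihoods printed in the lecture also allow a joint LR test that the inj and med coefficients are both zero (2 df), alongside the one-at-a-time removal tests in the backward run. A Python sketch under that reading; the chi-square(2) upper tail is exactly exp(-x/2).

```python
import math

ll_full = -56.283737    # age race disp inj med alc Icrut__1 Icrut__2 (8 terms)
ll_final = -56.653938   # inj and med removed (6 terms)
ll_null = -72.546947    # intercept only (iteration 0 on slide 10)

# Joint LR test that the inj and med coefficients are both zero (2 df)
lr_drop = 2 * (ll_full - ll_final)
p_drop = math.exp(-lr_drop / 2)   # chi-square(2) upper tail is exp(-x/2)

# Overall LR chi2(6) and McFadden pseudo R2 of the selected model
lr_final = 2 * (ll_final - ll_null)
pseudo_r2 = 1 - ll_final / ll_null
print(round(lr_drop, 3), round(p_drop, 3), round(lr_final, 2), round(pseudo_r2, 4))
```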

  12. All subsets

• All possible models may be compared on the basis of a formal, explicit criterion, for instance:
  • Akaike's information criterion, AIC = -2l + 2k, where l is the maximized log-likelihood and k is the number of parameters estimated.
  • Mallows' Cq (H&L Section 4.4).
  • Deviance (where appropriate).
• If the number of variables is large, specialized software is needed:
  • Hosmer DW, Jovanovic B, Lemeshow S (1989). Best subsets logistic regression. Biometrics 45: 1265-1270.
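As a worked example of the AIC formula, the two tibial fracture models fitted on slides 10 and 11 can be compared directly (k counts the intercept). The sketch also counts the candidate models when the 8 columns are treated separately, as logsub does: 2^8 = 256 subsets, including the intercept-only model. Python; `aic` is a hypothetical helper name.

```python
from itertools import combinations


def aic(loglik, n_params):
    """Akaike's information criterion, AIC = -2l + 2k."""
    return -2.0 * loglik + 2.0 * n_params


aic_full = aic(-56.283737, 9)      # 8 predictor columns + intercept
aic_reduced = aic(-56.653938, 7)   # inj and med removed
print(round(aic_full, 2), round(aic_reduced, 2))

predictors = ["age", "race", "disp", "inj", "med", "alc", "Icrut__1", "Icrut__2"]
subsets = [s for r in range(len(predictors) + 1) for s in combinations(predictors, r)]
print(len(subsets))
```

The smaller AIC favors the reduced model: dropping two parameters costs less than 1 unit of deviance but saves 4 units of penalty.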

  13. . logsub nonu age race disp inj med alc Icrut__1 Icrut__2

Logistic Regression Subsets
Response variable: nonu

8-predictor model, no variables removed
     Cq    chi2   df
   9.00    0.00    0    age race disp inj med alc Icrut__1 Icrut__2

7 predictors, 1 variable removed
     Cq    chi2   df      p
  10.98    3.74    1  0.053    age
  15.43    7.92    1  0.005    race
  13.80    6.38    1  0.012    disp
   7.69    0.64    1  0.422    inj
   7.23    0.22    1  0.642    med
   9.93    2.75    1  0.097    alc
  16.27    8.70    1  0.003    Icrut__1
  15.67    8.14    1  0.004    Icrut__2

6 predictors, 2 variables removed
     Cq    chi2   df      p
  16.16   10.48    2  0.005    age race
  13.42    7.91    2  0.019    age disp
   9.06    3.81    2  0.149    age inj
   8.98    3.74    2  0.154    age med
  10.59    5.25    2  0.072    age alc
  20.14   14.21    2  0.001    age Icrut__1
  19.64   13.74    2  0.001    age Icrut__2
  17.66   11.89    2  0.003    race disp
  13.72    8.19    2  0.017    race inj
  13.46    7.94    2  0.019    race med
  14.44    8.86    2  0.012    race alc
  19.87   13.96    2  0.001    race Icrut__1
  19.51   13.62    2  0.001    race Icrut__2
  11.92    6.50    2  0.039    disp inj
  11.97    6.54    2  0.038    disp med
  14.63    9.04    2  0.011    disp alc
  23.33   17.20    2  0.000    disp Icrut__1
  22.66   16.58    2  0.000    disp Icrut__2
   5.77    0.72    2  0.698    inj med
   8.61    3.39    2  0.184    inj alc
  16.07   10.39    2  0.006    inj Icrut__1
  15.51    9.87    2  0.007    inj Icrut__2

  14. 6 predictors, 2 variables removed (continued)
     Cq    chi2   df      p
   8.14    2.95    2  0.229    med alc
  15.19    9.56    2  0.008    med Icrut__1
  14.60    9.01    2  0.011    med Icrut__2
  14.62    9.03    2  0.011    alc Icrut__1
  14.11    8.55    2  0.014    alc Icrut__2
  15.24    9.61    2  0.008    Icrut__1 Icrut__2

5 predictors, 3 variables removed
     Cq    chi2   df      p
  16.91   13.06    3  0.005    age race disp
  14.17   10.49    3  0.015    age race inj
  14.19   10.51    3  0.015    age race med
  14.59   10.89    3  0.012    age race alc
  22.23   18.05    3  0.000    age race Icrut__1
  21.91   17.75    3  0.000    age race Icrut__2
  11.42    7.91    3  0.048    age disp inj
  11.44    7.92    3  0.048    age disp med
  13.41    9.77    3  0.021    age disp alc
  23.98   19.69    3  0.000    age disp Icrut__1
  23.39   19.15    3  0.000    age disp Icrut__2
   7.06    3.81    3  0.283    age inj med
   8.73    5.37    3  0.146    age inj alc
  18.67   14.71    3  0.002    age inj Icrut__1
  18.18   14.26    3  0.003    age inj Icrut__2
   8.60    5.26    3  0.154    age med alc
  18.38   14.44    3  0.002    age med Icrut__1
  17.88   13.97    3  0.003    age med Icrut__2
  18.18   14.25    3  0.003    age alc Icrut__1
  17.66   13.76    3  0.003    age alc Icrut__2
  (output continues for the remaining subsets)
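The chi2 column for single removed variables appears to be the Wald statistic (b/se)^2 from the full logit fit on slide 10; for example, age has z = 1.93 there and 1.93^2 is about 3.74. A Python sketch that checks this for three variables; the 1-df upper tail is erfc(sqrt(w/2)). This is an observation from the printed numbers, not a description of logsub's documented internals.

```python
import math


def wald_chi2(coef, se):
    """Single-coefficient Wald chi-square (b/se)**2 and its 1-df p-value."""
    w = (coef / se) ** 2
    return w, math.erfc(math.sqrt(w / 2.0))


# Coefficients and standard errors from the full logit fit on slide 10
full_fit = {"age": (0.0382327, 0.0197752),
            "race": (-1.446577, 0.5141697),
            "inj": (0.7442871, 0.9271418)}
results = {name: wald_chi2(b, se) for name, (b, se) in full_fit.items()}
for name, (w, p) in results.items():
    print(name, round(w, 2), round(p, 3))
```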

  15. Best subsets summary

. logistic nonu age race disp alc Icrut__1 Icrut__2 if group==0

Logistic regression                               Number of obs   =        105
                                                  LR chi2(6)      =      31.79
                                                  Prob > chi2     =     0.0000
Log likelihood = -56.653938                       Pseudo R2       =     0.2191

------------------------------------------------------------------------------
        nonu | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.033878   .0193218     1.78   0.075     .9966934     1.07245
        race |   .2453815   .1248113    -2.76   0.006     .0905493    .6649646
        disp |   3.424641   1.720637     2.45   0.014     1.279226    9.168173
         alc |    .405238   .2177009    -1.68   0.093     .1413936    1.161423
    Icrut__1 |    .713378   .0749687    -3.21   0.001     .5805868    .8765412
    Icrut__2 |   .8684253   .0391119    -3.13   0.002     .7950533    .9485684
------------------------------------------------------------------------------

Pause and consider the scientific/biological plausibility of the model.

  16. How well does this model predict nonunion?

. predict pnonu, p
. list nonu age race disp alc crutch pnonu if _n<=20, noobs

  +----------------------------------------------------+
  | nonu   age   race   disp   alc   crutch      pnonu |
  |----------------------------------------------------|
  |    1    30      0      1     0        0   .7627636 |
  |    0    46      1      0     0        5   .4056404 |
  |    0    20      0      1     0        2     .56468 |
  |    0    35      1      1     0        2   .3441154 |
  |    0    37      1      0     0        4   .2882892 |
  |----------------------------------------------------|
  |    0    31      1      1     1        9   .4616252 |
  |    1    34      0      1     1        3   .5952806 |
  |    1    30      0      1     0        1   .4234816 |
  |    1    23      0      1     0        5   .8157244 |
  |    0    44      0      0     0        1   .2548225 |
  |----------------------------------------------------|
  |    0    50      0      1     1        5   .8151621 |
  |    0    23      0      1     1        6   .6760721 |
  |    1    36      0      1     0        4   .8453948 |
  |    0    27      0      0     0        1   .1625406 |
  |    1    22      0      1     0        4   .7742515 |
  |----------------------------------------------------|
  |    1    47      0      0     1        2   .2739798 |
  |    1    37      0      1     1        2   .4808351 |
  |    1    54      0      1     1        8   .8761599 |
  |    0    46      1      1     1       12     .61348 |
  |    0    61      1      1     0        0   .6890706 |
  +----------------------------------------------------+

  17. . estat classification if group==0

Logistic model for nonu

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |        33            17  |         50
     -     |        16            39  |         55
-----------+--------------------------+-----------
   Total   |        49            56  |        105

Classified + if predicted Pr(D) >= .5
True D defined as nonu != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   67.35%
Specificity                     Pr( -|~D)   69.64%
Positive predictive value       Pr( D| +)   66.00%
Negative predictive value       Pr(~D| -)   70.91%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   30.36%
False - rate for true D         Pr( -| D)   32.65%
False + rate for classified +   Pr(~D| +)   34.00%
False - rate for classified -   Pr( D| -)   29.09%
--------------------------------------------------
Correctly classified                        68.57%
--------------------------------------------------

. estat classification if group==0, cutoff(.4)

Logistic model for nonu

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |        40            21  |         61
     -     |         9            35  |         44
-----------+--------------------------+-----------
   Total   |        49            56  |        105

Classified + if predicted Pr(D) >= .4
True D defined as nonu != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   81.63%
Specificity                     Pr( -|~D)   62.50%
Positive predictive value       Pr( D| +)   65.57%
Negative predictive value       Pr(~D| -)   79.55%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   37.50%
False - rate for true D         Pr( -| D)   18.37%
False + rate for classified +   Pr(~D| +)   34.43%
False - rate for classified -   Pr( D| -)   20.45%
--------------------------------------------------
Correctly classified                        71.43%
--------------------------------------------------
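Every percentage in these tables is a ratio of cells in the 2x2 classification table. A Python sketch that rebuilds the table from predicted probabilities (classified + when the probability is at least the cutoff, as in the output) and reproduces the reported percentages at cutoffs 0.5 and 0.4; the function names are hypothetical.

```python
def classify(probs, outcomes, cutoff=0.5):
    """Return (tp, fp, fn, tn), classifying + when the predicted probability >= cutoff."""
    tp = fp = fn = tn = 0
    for p, d in zip(probs, outcomes):
        if p >= cutoff:
            tp, fp = (tp + 1, fp) if d == 1 else (tp, fp + 1)
        else:
            fn, tn = (fn + 1, tn) if d == 1 else (fn, tn + 1)
    return tp, fp, fn, tn


def summary(tp, fp, fn, tn):
    """Percentages reported by estat classification."""
    return {
        "sensitivity": 100 * tp / (tp + fn),   # Pr(+|D)
        "specificity": 100 * tn / (tn + fp),   # Pr(-|~D)
        "ppv": 100 * tp / (tp + fp),           # Pr(D|+)
        "npv": 100 * tn / (tn + fn),           # Pr(~D|-)
        "correct": 100 * (tp + tn) / (tp + fp + fn + tn),
    }


at_50 = summary(33, 17, 16, 39)   # cell counts at cutoff .5
at_40 = summary(40, 21, 9, 35)    # cell counts at cutoff .4
for name in at_50:
    print(f"{name:12s} {at_50[name]:6.2f} {at_40[name]:6.2f}")
```

Lowering the cutoff trades specificity for sensitivity: more subjects are classified +, so fewer true nonunions are missed.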

  18. . lroc if group==0
. lsens if group==0
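`lroc` plots sensitivity against 1 - specificity over all cutoffs and reports the area under that ROC curve; `lsens` plots sensitivity and specificity against the cutoff. A Python sketch of both quantities, using the equivalence between the ROC area and the probability that a randomly chosen case has a higher predicted probability than a randomly chosen non-case (ties count 1/2). For illustration only, it uses the 20 rows listed on slide 16, not the full training sample, so the result is not the `lroc` area; the function names are hypothetical.

```python
def roc_auc(probs, outcomes):
    """ROC area as the concordance probability between cases and non-cases."""
    cases = [p for p, d in zip(probs, outcomes) if d == 1]
    controls = [p for p, d in zip(probs, outcomes) if d == 0]
    score = sum(1.0 if c > k else 0.5 if c == k else 0.0
                for c in cases for k in controls)
    return score / (len(cases) * len(controls))


def sens_spec(probs, outcomes, cutoff):
    """Sensitivity and specificity when + means predicted probability >= cutoff."""
    n_d = sum(outcomes)
    tp = sum(1 for p, d in zip(probs, outcomes) if d == 1 and p >= cutoff)
    tn = sum(1 for p, d in zip(probs, outcomes) if d == 0 and p < cutoff)
    return tp / n_d, tn / (len(outcomes) - n_d)


# The 20 listed rows from slide 16 (pnonu and nonu)
pnonu = [.7627636, .4056404, .56468, .3441154, .2882892, .4616252, .5952806,
         .4234816, .8157244, .2548225, .8151621, .6760721, .8453948, .1625406,
         .7742515, .2739798, .4808351, .8761599, .61348, .6890706]
nonu = [1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0]
auc_20 = roc_auc(pnonu, nonu)
print(round(auc_20, 3))
for cut in (0.3, 0.4, 0.5, 0.6, 0.7):   # an lsens-style sweep over cutoffs
    print(cut, [round(v, 2) for v in sens_spec(pnonu, nonu, cut)])
```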

  19. "Testing the procedure on the data that gave it birth is almost certain to overestimate performance, for the optimizing process that chose it from among many possible procedures will have made the greatest use possible of any and all idiosyncrasies of those particular data … As a result, the procedure will likely work better for these data than for almost any other data that will arise in practice."

Mosteller F, Tukey JW (1977). Data analysis and regression.

  20. Using the validation sample

. estat classification if group==1

Logistic model for nonu

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |        18            26  |         44
     -     |        11            31  |         42
-----------+--------------------------+-----------
   Total   |        29            57  |         86

Classified + if predicted Pr(D) >= .5
True D defined as nonu != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   62.07%
Specificity                     Pr( -|~D)   54.39%
Positive predictive value       Pr( D| +)   40.91%
Negative predictive value       Pr(~D| -)   73.81%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   45.61%
False - rate for true D         Pr( -| D)   37.93%
False + rate for classified +   Pr(~D| +)   59.09%
False - rate for classified -   Pr( D| -)   26.19%
--------------------------------------------------
Correctly classified                        56.98%
--------------------------------------------------

. lroc if group==1
. roctab nonu pnonu if group==1

                      ROC                    -Asymptotic Normal--
           Obs       Area     Std. Err.      [95% Conf. Interval]
     --------------------------------------------------------
            86     0.6400       0.0650        0.51268     0.76742
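The drop in performance from the training to the validation sample is what the Mosteller and Tukey quote predicts. A Python sketch comparing the two at cutoff 0.5, using the cell counts from the classification tables, and recomputing the validation ROC interval under the assumption that the "Asymptotic Normal" interval is area ± z(0.975) × standard error. Because the printed area and standard error are rounded, the recomputed limits agree with the output only to about three decimals.

```python
from statistics import NormalDist


def auc_wald_ci(area, se, level=0.95):
    """Normal-approximation confidence interval for a ROC area."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return area - z * se, area + z * se


lo, hi = auc_wald_ci(0.6400, 0.0650)
print(round(lo, 3), round(hi, 3))

# Cutoff 0.5: training (group==0) vs validation (group==1) cell-count ratios
train = {"sensitivity": 33 / 49, "specificity": 39 / 56, "correct": 72 / 105}
valid = {"sensitivity": 18 / 29, "specificity": 31 / 57, "correct": 49 / 86}
for name in train:
    print(f"{name:12s} {100 * train[name]:6.2f} {100 * valid[name]:6.2f}")
```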

  21. Useful references

• Lee K, Koval JJ (1997). Determination of the best significance level in forward stepwise logistic regression. Communications in Statistics 26: 559-575.
• Raftery AE, Madigan D, Hoeting JA (1997). Bayesian model averaging for linear regression models. Journal of the American Statistical Association 92: 179-191.
• Hoeting JA, Madigan D, Raftery AE (1999). Bayesian model averaging: a tutorial. Statistical Science 14: 382-401.
• McIntosh MW, Pepe MS (2002). Combining several screening tests: optimality of the risk score. Biometrics 58: 657-664.


  23. Assessing model fit


  25. Overall assessment of fit

  26. Example: esophageal cancer case-control study

. xi: logistic case i.age i.alcohol i.tobacco
i.age             _Iage_1-6           (naturally coded; _Iage_1 omitted)
i.alcohol         _Ialcohol_1-4       (naturally coded; _Ialcohol_1 omitted)
i.tobacco         _Itobacco_1-4       (naturally coded; _Itobacco_1 omitted)

Logistic regression                               Number of obs   =        975
                                                  LR chi2(11)     =     285.62
                                                  Prob > chi2     =     0.0000
Log likelihood = -351.93592                       Pseudo R2       =     0.2887

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _Iage_2 |   7.249153   8.003167     1.79   0.073     .8328154    63.09948
     _Iage_3 |   43.65363   46.62157     3.54   0.000      5.38204    354.0738
     _Iage_4 |   76.33883   81.30049     4.07   0.000     9.467161    615.5611
     _Iage_5 |    133.808   144.0209     4.55   0.000     16.22978    1103.193
     _Iage_6 |   124.7787   139.9078     4.30   0.000     13.85905    1123.434
 _Ialcohol_2 |   4.198086   1.049782     5.74   0.000     2.571568    6.853377
 _Ialcohol_3 |    7.24794   2.063937     6.96   0.000     4.147867    12.66497
 _Ialcohol_4 |   36.70338   14.13218     9.36   0.000     17.25685    78.06397
 _Itobacco_2 |   1.549686   .3538287     1.92   0.055     .9905925    2.424334
 _Itobacco_3 |   1.669657   .4557781     1.88   0.060     .9778419    2.850925
 _Itobacco_4 |   5.160313   1.775732     4.77   0.000     2.628853    10.12945
------------------------------------------------------------------------------

  27. . lfit, group(10) table

Logistic model for case, goodness-of-fit test

  (Table collapsed on quantiles of estimated probabilities)
  +--------------------------------------------------------+
  | Group |   Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |
  |-------+--------+-------+-------+-------+-------+-------|
  |     1 | 0.0070 |     0 |   0.3 |    99 |  98.7 |    99 |
  |     2 | 0.0299 |     1 |   1.9 |   125 | 124.1 |   126 |
  |     3 | 0.0456 |     4 |   3.4 |    76 |  76.6 |    80 |
  |     4 | 0.0717 |     4 |   6.7 |   100 |  97.3 |   104 |
  |     5 | 0.1193 |    12 |  12.1 |    96 |  95.9 |   108 |
  |-------+--------+-------+-------+-------+-------+-------|
  |     6 | 0.1790 |    12 |  10.8 |    56 |  57.2 |    68 |
  |     7 | 0.2450 |    25 |  24.8 |    82 |  82.2 |   107 |
  |     8 | 0.3590 |    34 |  31.4 |    59 |  61.6 |    93 |
  |     9 | 0.4891 |    44 |  39.9 |    49 |  53.1 |    93 |
  |    10 | 0.9625 |    64 |  68.7 |    33 |  28.3 |    97 |
  +--------------------------------------------------------+

       number of observations =       975
             number of groups =        10
      Hosmer-Lemeshow chi2(8) =         4.31
                  Prob > chi2 =         0.8284

No evidence of lack of fit.
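The Hosmer-Lemeshow statistic sums (O - E)^2 / E over both outcome columns of the g groups and is compared with chi-square on g - 2 degrees of freedom. A Python sketch using the table above; because the printed expected counts are rounded to one decimal, the recomputed statistic is about 4.33 rather than exactly 4.31. The p-value is recomputed from the reported 4.31.

```python
import math


def chi2_sf_even(x, df):
    """P(chi-square_df > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)**i / i!."""
    z = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= z / i
        total += term
    return math.exp(-z) * total


# (Obs_1, Exp_1, Obs_0, Exp_0) for the 10 groups in the lfit table
groups = [(0, 0.3, 99, 98.7), (1, 1.9, 125, 124.1), (4, 3.4, 76, 76.6),
          (4, 6.7, 100, 97.3), (12, 12.1, 96, 95.9), (12, 10.8, 56, 57.2),
          (25, 24.8, 82, 82.2), (34, 31.4, 59, 61.6), (44, 39.9, 49, 53.1),
          (64, 68.7, 33, 28.3)]
hl = sum((o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0 for o1, e1, o0, e0 in groups)
df = len(groups) - 2
print(round(hl, 2), df, round(chi2_sf_even(4.31, df), 4))
```

The group sizes are unequal (68 to 126) because many subjects share the same covariate pattern and hence the same estimated probability, so the quantile groups cannot be split evenly.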

  28. Define residuals

  29. Summary measures of goodness of fit

Pearson lack-of-fit test in the esophageal cancer study:

. lfit

Logistic model for case, goodness-of-fit test

       number of observations =       975
 number of covariate patterns =        88
             Pearson chi2(76) =        86.56
                  Prob > chi2 =         0.1913
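The Pearson statistic is the sum of squared Pearson residuals over covariate patterns, r = (y - m·π)/sqrt(m·π·(1 - π)) for a pattern with m subjects, y cases and fitted probability π. Its degrees of freedom are the number of patterns minus the number of estimated parameters: 88 - (11 + 1) = 76 here. A Python sketch; the per-pattern data are not shown in the lecture, so the code only reproduces the degrees of freedom and the p-value from the reported statistic, and the helper names are hypothetical.

```python
import math


def pearson_residual(y, m, pi):
    """Pearson residual for a covariate pattern with m subjects, y cases, fitted probability pi."""
    return (y - m * pi) / math.sqrt(m * pi * (1 - pi))


def pearson_chi2(patterns):
    """Sum of squared Pearson residuals over (y, m, pi) covariate patterns."""
    return sum(pearson_residual(y, m, pi) ** 2 for y, m, pi in patterns)


def chi2_sf_even(x, df):
    """P(chi-square_df > x) for even df, accumulating the Poisson-sum terms iteratively."""
    z = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= z / i
        total += term
    return math.exp(-z) * total


# Esophageal model: 88 covariate patterns; 5 age + 3 alcohol + 3 tobacco terms + intercept
df = 88 - (5 + 3 + 3 + 1)
print(df, round(chi2_sf_even(86.56, df), 4))
```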
