Lecture 15: Logistic Regression: Inference and link functions

BMTRY 701 Biostatistical Methods II

More on our example

> pros5.reg <- glm(cap.inv ~ log(psa) + gleason, family=binomial)
> summary(pros5.reg)

Call:
glm(formula = cap.inv ~ log(psa) + gleason, family = binomial)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -8.1061     0.9916  -8.174 2.97e-16 ***
log(psa)      0.4812     0.1448   3.323 0.000892 ***
gleason       1.0229     0.1595   6.412 1.43e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 512.29  on 379  degrees of freedom
Residual deviance: 403.90  on 377  degrees of freedom
AIC: 409.9
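
To put these coefficients on the odds-ratio scale, exponentiate them. A minimal sketch, assuming pros5.reg is the fit above (confint.default() gives Wald intervals):

> exp(coef(pros5.reg))               # adjusted odds ratios
> exp(confint.default(pros5.reg))    # Wald 95% CIs: estimate +/- 1.96*SE, exponentiated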

What is a good multiple regression model?
  • Principles of model building are analogous to linear regression
  • We use the same approach:
    • Look for significant covariates in simple models
    • Consider multicollinearity
    • Look for confounding (i.e., change in betas when a covariate is removed)
Multiple regression model proposal
  • Candidates: Gleason score, log(PSA), volume, digital rectal exam result (DPROS), and detection of capsular involvement on rectal exam (DCAPS)
  • But what about collinearity? With 5 covariates, there are 5-choose-2 = 10 pairs to check.

         gleason log.psa.   vol
gleason     1.00     0.46 -0.06
log.psa.    0.46     1.00  0.05
vol        -0.06     0.05  1.00
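
A correlation matrix like this can be produced with cor(); a sketch, assuming the variables are in the workspace and vol has a missing value (hence use = "complete.obs"):

> round(cor(cbind(gleason, log.psa. = log(psa), vol), use = "complete.obs"), 2)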

Categorical pairs

> dpros.dcaps <- epitab(dpros, dcaps)
> dpros.dcaps$tab
         Outcome
Predictor   1        p0  2         p1 oddsratio     lower     upper
        1  95 0.2802360  4 0.09756098  1.000000        NA        NA
        2 123 0.3628319  9 0.21951220  1.737805 0.5193327  5.815089
        3  84 0.2477876 12 0.29268293  3.392857 1.0540422 10.921270
        4  37 0.1091445 16 0.39024390 10.270270 3.2208157 32.748987
         Outcome
Predictor      p.value
        1           NA
        2 4.050642e-01
        3 3.777900e-02
        4 1.271225e-05

> fisher.test(table(dpros, dcaps))

        Fisher's Exact Test for Count Data

data:  table(dpros, dcaps)
p-value = 2.520e-05
alternative hypothesis: two.sided
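
Note that epitab() is not in base R; it comes from the epitools package. A minimal sketch of the call, assuming dpros and dcaps are in the workspace:

> library(epitools)                    # provides epitab()
> dpros.dcaps <- epitab(dpros, dcaps)  # odds ratio for each DPROS level vs. level 1
> dpros.dcaps$tab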

Categorical vs. continuous
  • t-tests and ANOVA: compare means by category

> summary(lm(log(psa) ~ dcaps))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2506     0.1877   6.662 9.55e-11 ***
dcaps         0.8647     0.1632   5.300 1.97e-07 ***
---
Residual standard error: 0.9868 on 378 degrees of freedom
Multiple R-squared: 0.06917,    Adjusted R-squared: 0.06671
F-statistic: 28.09 on 1 and 378 DF,  p-value: 1.974e-07

> summary(lm(log(psa) ~ factor(dpros)))

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     2.1418087  0.0992064  21.589  < 2e-16 ***
factor(dpros)2 -0.1060634  0.1312377  -0.808    0.419
factor(dpros)3  0.0001465  0.1413909   0.001    0.999
factor(dpros)4  0.7431101  0.1680055   4.423 1.28e-05 ***
---
Residual standard error: 0.9871 on 376 degrees of freedom
Multiple R-squared: 0.07348,    Adjusted R-squared: 0.06609
F-statistic: 9.94 on 3 and 376 DF,  p-value: 2.547e-06
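
For a two-level predictor like DCAPS, the same comparison can also be run as a pooled-variance t-test; a sketch (var.equal = TRUE reproduces the t statistic from the lm() slope above):

> t.test(log(psa) ~ factor(dcaps), var.equal = TRUE)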

Categorical vs. continuous

> summary(lm(vol ~ dcaps))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   22.905      3.477   6.587 1.51e-10 ***
dcaps         -6.362      3.022  -2.106   0.0359 *
---
Residual standard error: 18.27 on 377 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.01162,    Adjusted R-squared: 0.009003
F-statistic: 4.434 on 1 and 377 DF,  p-value: 0.03589

> summary(lm(vol ~ factor(dpros)))

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      17.417      1.858   9.374   <2e-16 ***
factor(dpros)2   -1.638      2.453  -0.668    0.505
factor(dpros)3   -1.976      2.641  -0.748    0.455
factor(dpros)4   -3.513      3.136  -1.120    0.263
---
Residual standard error: 18.39 on 375 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.003598,   Adjusted R-squared: -0.004373
F-statistic: 0.4514 on 3 and 375 DF,  p-value: 0.7164

Categorical vs. continuous

> summary(lm(gleason ~ dcaps))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.2560     0.1991  26.401  < 2e-16 ***
dcaps         1.0183     0.1730   5.885 8.78e-09 ***
---
Residual standard error: 1.047 on 378 degrees of freedom
Multiple R-squared: 0.08394,    Adjusted R-squared: 0.08151
F-statistic: 34.63 on 1 and 378 DF,  p-value: 8.776e-09

> summary(lm(gleason ~ factor(dpros)))

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      5.9798     0.1060  56.402  < 2e-16 ***
factor(dpros)2   0.4217     0.1403   3.007  0.00282 **
factor(dpros)3   0.4890     0.1511   3.236  0.00132 **
factor(dpros)4   0.9636     0.1795   5.367 1.40e-07 ***
---
Residual standard error: 1.055 on 376 degrees of freedom
Multiple R-squared: 0.07411,    Adjusted R-squared: 0.06672
F-statistic: 10.03 on 3 and 376 DF,  p-value: 2.251e-06

Lots of “correlation” between covariates
  • With this much correlation among the covariates, we should expect some insignificance and confounding in a multiple regression model.
  • Still, try the ‘full model’ and see what happens.
Full model results

> mreg <- glm(cap.inv ~ gleason + log(psa) + vol + dcaps +
+             factor(dpros), family=binomial)
> summary(mreg)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)    -8.617036   1.102909  -7.813 5.58e-15 ***
gleason         0.908424   0.166317   5.462 4.71e-08 ***
log(psa)        0.514200   0.156739   3.281  0.00104 **
vol            -0.014171   0.007712  -1.838  0.06612 .
dcaps           0.464952   0.456868   1.018  0.30882
factor(dpros)2  0.753759   0.355762   2.119  0.03411 *
factor(dpros)3  1.517838   0.372366   4.076 4.58e-05 ***
factor(dpros)4  1.384887   0.453127   3.056  0.00224 **
---
    Null deviance: 511.26  on 378  degrees of freedom
Residual deviance: 376.00  on 371  degrees of freedom
  (1 observation deleted due to missingness)
AIC: 392
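
For interpretation, exponentiating a coefficient gives an adjusted odds ratio; a sketch using Wald intervals, as before:

> exp(cbind(OR = coef(mreg), confint.default(mreg)))
> exp(0.908424)   # e.g., ~2.48: odds of capsular involvement per 1-unit increase in Gleason, adjusted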

What next?
  • Drop or retain?
  • How to interpret?
Likelihood Ratio Test
  • Recall testing multiple coefficients in linear regression
  • Approach there: ANOVA (the F-test)
  • We don’t have the ANOVA F-test for logistic regression
  • More general approach: the Likelihood Ratio Test (LRT)
  • Based on the likelihood (or log-likelihood) of “competing” nested models
Likelihood Ratio Test
  • H0: the small (reduced) model
  • Ha: the large (full) model
  • Example: H0: cap.inv ~ gleason + log(psa) + vol versus Ha: the same model plus factor(dpros)
Estimating the log-likelihood
  • Recall that we use the log-likelihood because it is simpler to work with (as in linear regression)
  • MLEs:
    • Betas are selected to maximize the likelihood
    • Betas also maximize the log-likelihood
    • If we plug in the estimated betas, we get our ‘maximized’ log-likelihood for that model (see the sketch below)
  • We compare the log-likelihoods from competing (nested) models
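
A sketch of pulling the maximized log-likelihood from a fitted glm (the df attribute that logLik() reports is the number of estimated parameters):

> logLik(mreg)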
Likelihood Ratio Test
  • LR statistic: G2 = -2*(logL(H0) - logL(H1))
  • Under the null: G2 ~ χ2(p-q), where p and q are the numbers of parameters in the large and small models
  • If G2 < χ2(p-q),1-α, fail to reject H0 (keep the small model)
  • If G2 > χ2(p-q),1-α, conclude H1 (the large model fits better)
LRT in R
  • -2LogL = Residual Deviance
  • So, G2 = Dev(0) - Dev(1)
  • Fit two models:

> mreg1 <- glm(cap.inv ~ gleason + log(psa) + vol + factor(dpros),
+              family=binomial)
> mreg0 <- glm(cap.inv ~ gleason + log(psa) + vol, family=binomial)
> mreg1

Coefficients:
   (Intercept)         gleason        log(psa)             vol
      -8.31383         0.93147         0.53422        -0.01507
factor(dpros)2  factor(dpros)3  factor(dpros)4
       0.76840         1.55109         1.44743

Degrees of Freedom: 378 Total (i.e. Null);  372 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 377.1    AIC: 391.1

> mreg0

Coefficients:
(Intercept)      gleason     log(psa)          vol
   -7.76759      0.99931      0.50406     -0.01583

Degrees of Freedom: 378 Total (i.e. Null);  375 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 399    AIC: 407
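
The LRT statistic can be computed by hand from the two fits; a sketch:

> G2 <- deviance(mreg0) - deviance(mreg1)   # 399.02 - 377.06
> G2                                        # 21.96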

Testing DPROS
  • Dev(0) – Dev(1) = 399.02 – 377.06 = 21.96
  • p – q = 7 – 4 = 3
  • χ2(3),0.95 = 7.81
  • Conclusion? 21.96 > 7.81, so reject H0: factor(dpros) stays in the model
  • p-value? 1 - pchisq(21.96, 3) ≈ 6.6e-05
More in R

> qchisq(0.95, 3)
> -2*(logLik(mreg0) - logLik(mreg1))
> 1 - pchisq(21.96, 3)

> anova(mreg0, mreg1)
Analysis of Deviance Table

Model 1: cap.inv ~ gleason + log(psa) + vol
Model 2: cap.inv ~ gleason + log(psa) + vol + factor(dpros)
  Resid. Df Resid. Dev Df Deviance
1       375     399.02
2       372     377.06  3    21.96
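
Asking anova() for the chi-square test appends the LRT p-value directly; a sketch:

> anova(mreg0, mreg1, test = "Chisq")   # adds a Pr(>Chi) column, here about 6.6e-05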

Notes on LRT
  • Again, models have to be NESTED
  • For comparing models that are not nested, you need to use other approaches
  • Examples (AIC/BIC sketch below):
    • AIC
    • BIC
    • DIC
  • Next time….
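
A sketch of the AIC/BIC comparison in R (both work for non-nested models; smaller is better):

> AIC(mreg0, mreg1)
> BIC(mreg0, mreg1)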
For next time, read the following article

Low Diagnostic Yield of Elective Coronary Angiography
Patel, Peterson, Dai, et al.
NEJM, 362(10), pp. 886-95, March 11, 2010
http://content.nejm.org/cgi/content/short/362/10/886?ssource=mfv
