
Lecture 9: ANOVA tables, F-tests

BMTRY 701 Biostatistical Methods II


ANOVA

  • Analysis of Variance

  • Similar in derivation to the ANOVA that generalizes the two-sample t-test

  • Partitioning of variance into several parts

    • that due to the ‘model’: SSR

    • that due to ‘error’: SSE

  • The sum of the two parts is the total sum of squares: SST (written out below)
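Written out (these are exactly the quantities computed in the R session on the next slide):

$$\mathrm{SST}=\sum_{i=1}^{n}(Y_i-\bar{Y})^2,\qquad \mathrm{SSR}=\sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2,\qquad \mathrm{SSE}=\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2,\qquad \mathrm{SST}=\mathrm{SSR}+\mathrm{SSE}$$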






Example: logLOS ~ BEDS

> ybar <- mean(data$logLOS)

> yhati <- reg$fitted.values

> sst <- sum((data$logLOS - ybar)^2)

> ssr <- sum((yhati - ybar )^2)

> sse <- sum((data$logLOS - yhati)^2)

>

> sst

[1] 3.547454

> ssr

[1] 0.6401715

> sse

[1] 2.907282

> sse+ssr

[1] 3.547454

>


Degrees of Freedom

  • Degrees of freedom for SST: n - 1

    • one df is lost because it is used to estimate the mean of Y

  • Degrees of freedom for SSR: 1

    • only one df because all estimates are based on the same fitted regression line

  • Degrees of freedom for SSE: n - 2

    • two df are lost due to estimating the regression line (slope and intercept); see the check below
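A quick check for the logLOS ~ BEDS example (a minimal sketch, assuming the data and reg objects from the earlier slide):

n <- nrow(data)   # 113 hospitals here
n - 1             # df for SST: 112
n - 2             # df for SSE: 111 (the Residuals df in anova(reg) below)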


Mean Squares

  • “Scaled” version of Sum of Squares

  • Mean Square = SS/df

  • MSR = SSR/1

  • MSE = SSE/(n-2)

  • Notes:

    • mean squares are not additive! That is, MSR + MSE ≠ SST/(n-1)

    • MSE is the same estimate of σ² we saw previously (checked below)
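Continuing the earlier sketch (assuming ssr, sse, and data are still in the workspace):

msr <- ssr / 1                  # 0.6401715
mse <- sse / (nrow(data) - 2)   # 2.907282 / 111 ≈ 0.02619

These are the Mean Sq values reported by anova(reg) on the next slide.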



ANOVA for logLOS ~ BEDS

> anova(reg)

Analysis of Variance Table

Response: logLOS

Df Sum Sq Mean Sq F value Pr(>F)

BEDS 1 0.64017 0.64017 24.442 2.737e-06 ***

Residuals 111 2.90728 0.02619


Inference?

  • What is of interest and how do we interpret?

  • We’d like to know if BEDS is related to logLOS.

  • How do we do that using ANOVA table?

  • We need to know the expected value of the MSR and MSE:
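The expected mean squares for simple linear regression are the standard results (consistent with the Implications slide that follows):

$$E(\mathrm{MSE})=\sigma^2,\qquad E(\mathrm{MSR})=\sigma^2+\beta_1^2\sum_{i=1}^{n}(X_i-\bar{X})^2$$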


Implications

  • mean of the sampling distribution of MSE is σ² regardless of whether or not β1 = 0

  • If β1 = 0, E(MSE) = E(MSR)

  • If β1 ≠ 0, E(MSE) < E(MSR)

  • To test significance of β1, we can test if MSR and MSE are of the same magnitude.


F-test

  • Derived naturally from the arguments just made

  • Hypotheses:

    • H0: β1 = 0

    • H1: β1 ≠ 0

  • Test statistic: F* = MSR/MSE (checked numerically below)

  • Based on the earlier argument, we expect F* > 1 if H1 is true.

  • This implies a one-sided test.
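A quick numeric check, assuming the msr and mse values from the Mean Squares sketch above:

msr / mse   # ≈ 24.44, the F value reported by anova(reg) below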


F-test

  • The distribution of F under the null has two sets of degrees of freedom (df)

    • numerator degrees of freedom

    • denominator degrees of freedom

  • These correspond to the df as shown in the ANOVA table

    • numerator df = 1

    • denominator df = n-2

  • Test is based on: under H0, F* ~ F(1, n-2)


Implementing the F-test

  • The decision rule:

  • If F* > F(1-α; 1, n-2), then reject H0

  • If F* ≤ F(1-α; 1, n-2), then fail to reject H0



ANOVA for logLOS ~ BEDS

> anova(reg)

Analysis of Variance Table

Response: logLOS

Df Sum Sq Mean Sq F value Pr(>F)

BEDS 1 0.64017 0.64017 24.442 2.737e-06 ***

Residuals 111 2.90728 0.02619

> qf(0.95, 1, 111)

[1] 3.926607

> 1-pf(24.44,1,111)

[1] 2.739016e-06


More interesting: MLR

  • You can test that several coefficients are zero at the same time

  • Otherwise, the F-test gives the same result as a t-test

  • That is: for testing the significance of ONE covariate in a linear regression model, an F-test and a t-test give the same result (see the check below):

    • H0: β1 = 0

    • H1: β1 ≠ 0
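A minimal check, assuming reg is the logLOS ~ BEDS fit from earlier: for a single covariate, the squared t statistic equals the F statistic.

t_beds <- summary(reg)$coefficients["BEDS", "t value"]
t_beds^2                        # ≈ 24.44
anova(reg)["BEDS", "F value"]   # 24.442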


General F-testing approach

  • The previous case seems simple

  • It is in this case, but it can be generalized to be more useful

  • Imagine a more general test:

    • Ho: small model

    • Ha: large model

  • Constraint: the small model must be ‘nested’ in the large model

  • That is, the small model must be a ‘subset’ of the large model (the test statistic is given below)
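In general form, the test statistic (the standard general linear test) compares the error sums of squares of the small (reduced, R) and large (full, F) models:

$$F^*=\frac{[\mathrm{SSE}(R)-\mathrm{SSE}(F)]/(df_R-df_F)}{\mathrm{SSE}(F)/df_F}\sim F(df_R-df_F,\ df_F)\ \text{under } H_0$$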


Example of ‘nested’ models

Model 1: E(LOS) = β0 + β1·INFRISK + β2·ms + β3·NURSE + β4·nurse2

Model 2: E(LOS) = β0 + β1·INFRISK + β3·NURSE + β4·nurse2

Model 3: E(LOS) = β0 + β1·INFRISK + β2·ms

Models 2 and 3 are nested in Model 1

Model 2 is not nested in Model 3

Model 3 is not nested in Model 2


Testing: Models must be nested!

  • To test Model 1 vs. Model 2

    • we are testing that β2 = 0

    • Ho: β2 = 0 vs. Ha: β2 ≠ 0

    • If we reject the null hypothesis (i.e., conclude that β2 ≠ 0), then Model 1 is superior to Model 2

    • If we fail to reject, the simpler Model 2 is adequate

Model 1: E(LOS) = β0 + β1·INFRISK + β2·ms + β3·NURSE + β4·nurse2

Model 2: E(LOS) = β0 + β1·INFRISK + β3·NURSE + β4·nurse2


R

reg1 <- lm(LOS ~ INFRISK + ms + NURSE + nurse2, data=data)

reg2 <- lm(LOS ~ INFRISK + NURSE + nurse2, data=data)

reg3 <- lm(LOS ~ INFRISK + ms, data=data)

> anova(reg1)

Analysis of Variance Table

Response: LOS

Df Sum Sq Mean Sq F value Pr(>F)

INFRISK 1 116.446 116.446 45.4043 8.115e-10 ***

ms 1 12.897 12.897 5.0288 0.02697 *

NURSE 1 1.097 1.097 0.4277 0.51449

nurse2 1 1.789 1.789 0.6976 0.40543

Residuals 108 276.981 2.565

---


R

> anova(reg2)

Analysis of Variance Table

Response: LOS

Df Sum Sq Mean Sq F value Pr(>F)

INFRISK 1 116.446 116.446 44.8865 9.507e-10 ***

NURSE 1 8.212 8.212 3.1653 0.078 .

nurse2 1 1.782 1.782 0.6870 0.409

Residuals 109 282.771 2.594

---

> anova(reg1, reg2)

Analysis of Variance Table

Model 1: LOS ~ INFRISK + ms + NURSE + nurse2

Model 2: LOS ~ INFRISK + NURSE + nurse2

Res.Df RSS Df Sum of Sq F Pr(>F)

1 108 276.981

2 109 282.771 -1 -5.789 2.2574 0.1359
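To see where that F value comes from, a sketch using the RSS values printed above (full model reg1: RSS 276.981 on 108 df; reduced model reg2: RSS 282.771 on 109 df):

# general linear test for dropping 'ms': (increase in RSS / df dropped) / (full-model MSE)
((282.771 - 276.981) / 1) / (276.981 / 108)   # ≈ 2.26, matching F = 2.2574 up to rounding

Equivalently, the t value for ms in summary(reg1) on the next slide is 1.502, and 1.502^2 ≈ 2.26, with the same p-value (0.136 vs. 0.1359): for a single coefficient, the partial F-test and the t-test agree.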


R

> summary(reg1)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 6.355e+00 5.266e-01 12.068 < 2e-16 ***

INFRISK 6.289e-01 1.339e-01 4.696 7.86e-06 ***

ms 7.829e-01 5.211e-01 1.502 0.136

NURSE 4.136e-03 4.093e-03 1.010 0.315

nurse2 -5.676e-06 6.796e-06 -0.835 0.405

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.601 on 108 degrees of freedom

Multiple R-squared: 0.3231, Adjusted R-squared: 0.2981

F-statistic: 12.89 on 4 and 108 DF, p-value: 1.298e-08

>


Testing more than two covariates

  • To test Model 1 vs. Model 3

    • we are testing that β3 = 0 AND β4 = 0

    • Ho: β3 = β4 = 0 vs. Ha: β3 ≠ 0 or β4 ≠ 0

    • If we reject the null hypothesis (i.e., conclude that β3 ≠ 0 or β4 ≠ 0), then Model 1 is superior to Model 3

    • If we fail to reject, the simpler Model 3 is adequate

Model 1: E(LOS) = β0 + β1·INFRISK + β2·ms + β3·NURSE + β4·nurse2

Model 3: E(LOS) = β0 + β1·INFRISK + β2·ms


R

> anova(reg3)

Analysis of Variance Table

Response: LOS

Df Sum Sq Mean Sq F value Pr(>F)

INFRISK 1 116.446 116.446 45.7683 6.724e-10 ***

ms 1 12.897 12.897 5.0691 0.02634 *

Residuals 110 279.867 2.544

---

> anova(reg1, reg3)

Analysis of Variance Table

Model 1: LOS ~ INFRISK + ms + NURSE + nurse2

Model 2: LOS ~ INFRISK + ms

Res.Df RSS Df Sum of Sq F Pr(>F)

1 108 276.981

2 110 279.867 -2 -2.886 0.5627 0.5713
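The same check for this 2-df test, using the RSS values printed above (full model reg1: RSS 276.981 on 108 df; reduced model reg3: RSS 279.867 on 110 df):

# general linear test for dropping NURSE and nurse2
((279.867 - 276.981) / 2) / (276.981 / 108)   # ≈ 0.563, matching F = 0.5627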


R

> summary(reg3)

Call:

lm(formula = LOS ~ INFRISK + ms, data = data)

Residuals:

Min 1Q Median 3Q Max

-2.9037 -0.8739 -0.1142 0.5965 8.5568

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 6.4547 0.5146 12.542 <2e-16 ***

INFRISK 0.6998 0.1156 6.054 2e-08 ***

ms 0.9717 0.4316 2.251 0.0263 *

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.595 on 110 degrees of freedom

Multiple R-squared: 0.3161, Adjusted R-squared: 0.3036

F-statistic: 25.42 on 2 and 110 DF, p-value: 8.42e-10


Testing multiple coefficients simultaneously

  • Region: it is a ‘factor’ variable with 4 categories
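A minimal sketch of what this sets up, assuming the region variable is stored in the data frame under the (hypothetical) name REGION: a 4-level factor enters the model as 3 dummy variables, so testing region means testing 3 coefficients simultaneously with a nested-model F-test.

# hypothetical sketch: 3-df partial F-test for a 4-category region factor
reg_small <- lm(LOS ~ INFRISK, data = data)
reg_full  <- lm(LOS ~ INFRISK + factor(REGION), data = data)
anova(reg_small, reg_full)   # tests H0: all 3 region coefficients are zero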

