
# Lecture 9: ANOVA tables, F-tests



BMTRY 701 Biostatistical Methods II

• Analysis of Variance

• Similar in derivation to the ANOVA that generalizes the two-sample t-test

• Partitions the variance into several parts:

• the part due to the ‘model’: SSR

• the part due to ‘error’: SSE

• The sum of the two parts is the total sum of squares: SST
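In symbols, with $\hat{Y}_i$ the fitted values and $\bar{Y}$ the sample mean, the partition is the identity below. It holds exactly for a least-squares fit that includes an intercept, because the cross-product term $\sum_i (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)$ is then zero.

$$
\underbrace{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}_{\text{SST}}
= \underbrace{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}_{\text{SSR}}
+ \underbrace{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}_{\text{SSE}}
$$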

    > # reg is the simple linear regression of logLOS on BEDS: lm(logLOS ~ BEDS, data = data)
    > ybar <- mean(data$logLOS)
    > yhati <- reg$fitted.values
    > sst <- sum((data$logLOS - ybar)^2)
    > ssr <- sum((yhati - ybar)^2)
    > sse <- sum((data$logLOS - yhati)^2)
    > sst
    [1] 3.547454
    > ssr
    [1] 0.6401715
    > sse
    [1] 2.907282
    > sse + ssr
    [1] 3.547454
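The course data are not included here, so the sketch below checks the same identity on a small set of made-up numbers. It is a minimal Python illustration. The function names `fit_simple_ols` and `sums_of_squares` are hypothetical and were written for this note. The fit is ordinary least squares with an intercept, which is the setting where SST = SSR + SSE holds exactly.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least-squares fit of y on x."""
    b0, b1 = fit_simple_ols(x, y)
    ybar = sum(y) / len(y)
    yhat = [b0 + b1 * xi for xi in x]                       # fitted values
    sst = sum((yi - ybar) ** 2 for yi in y)                 # total
    ssr = sum((yh - ybar) ** 2 for yh in yhat)              # regression ("model")
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))    # residual ("error")
    return sst, ssr, sse


# Made-up illustrative data (not the course data set)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9]
sst, ssr, sse = sums_of_squares(x, y)
print(sst, ssr + sse)  # equal up to floating-point rounding
```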

• Degrees of freedom for SST: n - 1

• one df is lost because it is used to estimate the mean of Y

• Degrees of freedom for SSR: 1

• only one df, because all fitted values come from the same fitted regression line and differ from Ȳ only through the single slope estimate

• Degrees of freedom for SSE: n - 2

• two df are lost in estimating the regression line (intercept and slope)
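The one-df claim for SSR can be made concrete with a short derivation. Because $b_0 = \bar{Y} - b_1 \bar{X}$, every deviation of a fitted value from $\bar{Y}$ is a multiple of the single slope estimate. The degrees of freedom then add up in the same way as the sums of squares. The value n = 113 used below is inferred from the 111 residual df in the ANOVA table for logLOS ~ BEDS (n - 2 = 111).

$$
\hat{Y}_i - \bar{Y} = b_1 (X_i - \bar{X})
\quad\Longrightarrow\quad
\text{SSR} = b_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2
$$

$$
\underbrace{n - 1}_{\text{SST}} = \underbrace{1}_{\text{SSR}} + \underbrace{n - 2}_{\text{SSE}},
\qquad \text{e.g. } 112 = 1 + 111 \text{ when } n = 113
$$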

• Mean squares: a “scaled” version of the sums of squares

• Mean Square = SS/df

• MSR = SSR/1

• MSE = SSE/(n-2)

• Notes:

• mean squares are not additive! That is, MSR + MSE ≠ SST/(n-1)

• MSE is the same estimate of σ² that we saw previously

    > anova(reg)
    Analysis of Variance Table

    Response: logLOS
               Df  Sum Sq Mean Sq F value    Pr(>F)
    BEDS        1 0.64017 0.64017  24.442 2.737e-06 ***
    Residuals 111 2.90728 0.02619
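As a quick arithmetic check of the mean-square definitions and the "not additive" note, this Python sketch uses only the numbers printed in the table above. The sample size n = 113 is inferred from the 111 residual df (n - 2 = 111). The variable names are chosen for this illustration only.

```python
# Values copied from the anova(reg) table for logLOS ~ BEDS
ssr, df_reg = 0.64017, 1      # BEDS row
sse, df_res = 2.90728, 111    # Residuals row
n = df_res + 2                # residual df = n - 2, so n = 113
sst = ssr + sse               # total sum of squares

msr = ssr / df_reg            # mean square for regression
mse = sse / df_res            # mean square error
print(round(mse, 5))          # 0.02619, the Residuals "Mean Sq"
print(round(msr / mse, 3))    # 24.442, the "F value" column

# Mean squares are not additive: compare MSR + MSE with SST / (n - 1)
print(round(msr + mse, 5), round(sst / (n - 1), 5))
```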

• What is of interest, and how do we interpret it?

• We’d like to know whether BEDS is related to logLOS.

• How do we assess that using the ANOVA table?

• We need the expected values of MSR and MSE:

• the mean of the sampling distribution of MSE is σ², regardless of whether β1 = 0

• If β1 = 0, E(MSE) = E(MSR) = σ²

• If β1 ≠ 0, E(MSE) < E(MSR)
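The two bullets above follow from the standard expected mean squares for the simple linear regression model. The extra term in E(MSR) is zero exactly when β1 = 0. It is positive otherwise, provided the $X_i$ are not all equal. This is why comparing MSR with MSE makes sense as a test.

$$
E(\text{MSE}) = \sigma^2,
\qquad
E(\text{MSR}) = \sigma^2 + \beta_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2
$$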

• To test the significance of β1, we can check whether MSR and MSE are of the same magnitude.

• The F-test is derived naturally from the arguments just made

• Hypotheses:

• H0: β1 = 0

• H1: β1 ≠ 0

• Test statistic: F* = MSR/MSE

• Based on the earlier argument, we expect F* > 1 if H1 is true.

• This implies a one-sided (upper-tail) test.

• Under H0, F* follows an F distribution with two degrees-of-freedom (df) parameters:

• numerator degrees of freedom

• denominator degrees of freedom

• These correspond to the df shown in the ANOVA table

• numerator df = 1

• denominator df = n-2

• The test is based on F* ~ F(1, n-2) under H0

• The decision rule, where F(1-α; 1, n-2) is the (1-α) quantile of F(1, n-2):

• If F* > F(1-α; 1, n-2), then reject H0

• If F* ≤ F(1-α; 1, n-2), then fail to reject H0

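    > anova(reg)
    Analysis of Variance Table

    Response: logLOS
               Df  Sum Sq Mean Sq F value    Pr(>F)
    BEDS        1 0.64017 0.64017  24.442 2.737e-06 ***
    Residuals 111 2.90728 0.02619
    > qf(0.95, 1, 111)          # critical value F(0.95; 1, 111)
    [1] 3.926607
    > 1 - pf(24.44, 1, 111)     # p-value: P(F > F*) under H0
    [1] 2.739016e-06

Since F* = 24.44 > F(0.95; 1, 111) = 3.93 (equivalently, p ≈ 2.7e-06 < 0.05), we reject H0 at α = 0.05 and conclude that BEDS is related to logLOS.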
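`qf` and `pf` are R's F-distribution quantile and CDF functions. For readers working outside R, here is a dependency-free Python sketch of the same two quantities, written for this note. The helper names `betainc_reg`, `f_sf`, and `f_crit` are hypothetical and do not come from any library. The sketch evaluates the F upper tail through the regularized incomplete beta function, using the standard continued-fraction (modified Lentz) evaluation. It finds the critical value by bisection. If SciPy is installed, `scipy.stats.f.sf` and `scipy.stats.f.ppf` provide the same quantities.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (tiny if abs(d) < tiny else d)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (tiny if abs(d) < tiny else d)
            c = 1.0 + aa / c
            c = tiny if abs(c) < tiny else c
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # convergence test on the odd step
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    # symmetry I_x(a, b) = 1 - I_{1-x}(b, a), where the fraction converges faster
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """P(F > f) for F ~ F(df1, df2); plays the role of 1 - pf(f, df1, df2)."""
    if f <= 0.0:
        return 1.0
    return betainc_reg(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_crit(alpha, df1, df2):
    """F(1 - alpha; df1, df2) by bisection; plays the role of qf(1 - alpha, df1, df2)."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, df1, df2) > alpha:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_sf(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


f_star = 24.44                    # F* as used in the slide's pf() call
crit = f_crit(0.05, 1, 111)       # compare with qf(0.95, 1, 111)
p_value = f_sf(f_star, 1, 111)    # compare with 1 - pf(24.44, 1, 111)
print(crit, p_value)
print("reject H0" if f_star > crit else "fail to reject H0")
```

It should agree with the R values above to several digits. The decision rule is then a single comparison of F* with the critical value.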

• The F-test lets you test whether several coefficients are zero at the same time

• For a single coefficient, the F-test gives the same result as the t-test

• That is, for testing the significance of ONE covariate in a linear regression model, the F-test and the two-sided t-test give the same p-value, since F* = (t*)²:

• H0: β1 = 0

• H1: β1 ≠ 0
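Here is why F* = (t*)² for a single slope. SSR = b1² Σ(Xi - X̄)² and se(b1)² = MSE / Σ(Xi - X̄)², so (b1 / se(b1))² = SSR / MSE = F*. The Python sketch below checks this numerically on made-up data, not the course data.

```python
import math

# Made-up illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.9, 3.1, 2.8, 4.6, 4.4, 6.2, 6.0, 7.9]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - ybar) ** 2 for yh in yhat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
mse = sse / (n - 2)

f_star = (ssr / 1) / mse                # ANOVA F statistic, df = (1, n - 2)
t_star = b1 / math.sqrt(mse / sxx)      # t statistic for H0: beta1 = 0, df = n - 2
print(f_star, t_star ** 2)              # identical up to rounding
```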

• The previous test seems simple

• It is in this case, but it can be generalized to be more useful

• Imagine a more general test:

• H0: small model

• Ha: large model

• Constraint: the small model must be ‘nested’ in the large model

• That is, the small model must be a ‘subset’ of the large model
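For nested linear models, the standard statistic for this comparison is the general linear test below. SSE and df denote each model's residual sum of squares and residual degrees of freedom. This is the F that R reports when two nested `lm` fits are passed to `anova()`, as in `anova(reg1, reg2)`. With the intercept-only model as the small model (SSE = SST, df = n - 1) and the simple regression as the large model, it reduces to F* = MSR/MSE.

$$
F^* = \frac{(\text{SSE}_{\text{small}} - \text{SSE}_{\text{large}}) / (df_{\text{small}} - df_{\text{large}})}{\text{SSE}_{\text{large}} / df_{\text{large}}}
\;\sim\; F(df_{\text{small}} - df_{\text{large}},\ df_{\text{large}}) \quad \text{under } H_0
$$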

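Model 1: $\text{LOS}_i = \beta_0 + \beta_1\,\text{INFRISK}_i + \beta_2\,\text{ms}_i + \beta_3\,\text{NURSE}_i + \beta_4\,\text{nurse2}_i + \varepsilon_i$

Model 2: $\text{LOS}_i = \beta_0 + \beta_1\,\text{INFRISK}_i + \beta_3\,\text{NURSE}_i + \beta_4\,\text{nurse2}_i + \varepsilon_i$

Model 3: $\text{LOS}_i = \beta_0 + \beta_1\,\text{INFRISK}_i + \beta_2\,\text{ms}_i + \varepsilon_i$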

Models 2 and 3 are nested in Model 1

Model 2 is not nested in Model 3

Model 3 is not nested in Model 2
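When all models share the same response and an intercept, a simple way to see these nesting relations is to treat each model as its set of predictor terms. The terms here are taken from the `reg1`, `reg2`, and `reg3` fits in this lecture. The Python sketch below is an illustration only, and the helper `is_nested` is hypothetical. Set inclusion of terms is a sufficient condition. More generally, nesting means the small model can be obtained from the large one by constraining its coefficients.

```python
# Each model as its set of predictor terms (all share the response LOS and an intercept)
model1 = {"INFRISK", "ms", "NURSE", "nurse2"}
model2 = {"INFRISK", "NURSE", "nurse2"}
model3 = {"INFRISK", "ms"}


def is_nested(small, large):
    """True if every term of the small model also appears in the large model."""
    return small <= large


print(is_nested(model2, model1))  # True
print(is_nested(model3, model1))  # True
print(is_nested(model2, model3))  # False
print(is_nested(model3, model2))  # False
```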

• To test Model 1 vs. Model 2

• we are testing whether β2 = 0

• H0: β2 = 0 vs. Ha: β2 ≠ 0

• If we reject H0 (conclude β2 ≠ 0), then we conclude that Model 1 is superior to Model 2

• If we fail to reject H0, the simpler Model 2 is adequate

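Model 1: $\text{LOS}_i = \beta_0 + \beta_1\,\text{INFRISK}_i + \beta_2\,\text{ms}_i + \beta_3\,\text{NURSE}_i + \beta_4\,\text{nurse2}_i + \varepsilon_i$

Model 2: $\text{LOS}_i = \beta_0 + \beta_1\,\text{INFRISK}_i + \beta_3\,\text{NURSE}_i + \beta_4\,\text{nurse2}_i + \varepsilon_i$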

    reg1 <- lm(LOS ~ INFRISK + ms + NURSE + nurse2, data = data)
    reg2 <- lm(LOS ~ INFRISK + NURSE + nurse2, data = data)
    reg3 <- lm(LOS ~ INFRISK + ms, data = data)

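    > anova(reg1)
    Analysis of Variance Table

    Response: LOS
               Df  Sum Sq Mean Sq F value    Pr(>F)
    INFRISK     1 116.446 116.446 45.4043 8.115e-10 ***
    ms          1  12.897  12.897  5.0288   0.02697 *
    NURSE       1   1.097   1.097  0.4277   0.51449
    nurse2      1   1.789   1.789  0.6976   0.40543
    Residuals 108 276.981   2.565
    ---

    > anova(reg2)
    Analysis of Variance Table

    Response: LOS
               Df  Sum Sq Mean Sq F value    Pr(>F)
    INFRISK     1 116.446 116.446 44.8865 9.507e-10 ***
    NURSE       1   8.212   8.212  3.1653     0.078 .
    nurse2      1   1.782   1.782  0.6870     0.409
    Residuals 109 282.771   2.594
    ---

Note: anova() applied to a single model reports sequential sums of squares. Each row is the reduction in SSE from adding that term after the terms listed above it. This is why INFRISK has the same Sum Sq (116.446) in both fits, while NURSE has 8.212 in reg2 but only 1.097 in reg1, where it enters after ms.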

    > anova(reg1, reg2)
    Analysis of Variance Table

    Model 1: LOS ~ INFRISK + ms + NURSE + nurse2
    Model 2: LOS ~ INFRISK + NURSE + nurse2
      Res.Df     RSS Df Sum of Sq      F Pr(>F)
    1    108 276.981
    2    109 282.771 -1    -5.789 2.2574 0.1359

    > summary(reg1)

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept)  6.355e+00  5.266e-01  12.068  < 2e-16 ***
    INFRISK      6.289e-01  1.339e-01   4.696 7.86e-06 ***
    ms           7.829e-01  5.211e-01   1.502    0.136
    NURSE        4.136e-03  4.093e-03   1.010    0.315
    nurse2      -5.676e-06  6.796e-06  -0.835    0.405
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 1.601 on 108 degrees of freedom
    Multiple R-squared: 0.3231, Adjusted R-squared: 0.2981
    F-statistic: 12.89 on 4 and 108 DF,  p-value: 1.298e-08
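The F value in `anova(reg1, reg2)` is the nested-model statistic computed from the two residual sums of squares. This short Python sketch recomputes it from the printed RSS and residual df; the helper `nested_f` is hypothetical. Only one coefficient (for `ms`) is being tested, so the sketch also compares the result with the square of the `ms` t value in `summary(reg1)`. Both land near R's 2.2574. The small differences come from the rounding of the printed values.

```python
def nested_f(rss_small, df_small, rss_large, df_large):
    """General linear test statistic for a small model nested in a large model."""
    numerator = (rss_small - rss_large) / (df_small - df_large)
    denominator = rss_large / df_large
    return numerator / denominator


# Residual sums of squares and residual df printed by anova(reg1, reg2)
rss_large, df_large = 276.981, 108   # reg1: INFRISK + ms + NURSE + nurse2
rss_small, df_small = 282.771, 109   # reg2: INFRISK + NURSE + nurse2 (drops ms)

f_star = nested_f(rss_small, df_small, rss_large, df_large)
t_ms = 1.502                         # t value for ms in summary(reg1)
print(round(f_star, 3), round(t_ms ** 2, 3))
```

The matching p-values (0.1359 from the F-test, 0.136 for `ms` in the t-test) reflect the same equivalence.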

• To test Model 1 vs. Model 3

• we are testing whether β3 = 0 AND β4 = 0

• H0: β3 = β4 = 0 vs. Ha: β3 ≠ 0 or β4 ≠ 0

• If we reject H0 (at least one of β3, β4 is nonzero), then we conclude that Model 1 is superior to Model 3

• If we fail to reject H0, the simpler Model 3 is adequate

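Model 1: $\text{LOS}_i = \beta_0 + \beta_1\,\text{INFRISK}_i + \beta_2\,\text{ms}_i + \beta_3\,\text{NURSE}_i + \beta_4\,\text{nurse2}_i + \varepsilon_i$

Model 3: $\text{LOS}_i = \beta_0 + \beta_1\,\text{INFRISK}_i + \beta_2\,\text{ms}_i + \varepsilon_i$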

    > anova(reg3)
    Analysis of Variance Table

    Response: LOS
               Df  Sum Sq Mean Sq F value    Pr(>F)
    INFRISK     1 116.446 116.446 45.7683 6.724e-10 ***
    ms          1  12.897  12.897  5.0691   0.02634 *
    Residuals 110 279.867   2.544
    ---

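    > anova(reg1, reg3)
    Analysis of Variance Table

    Model 1: LOS ~ INFRISK + ms + NURSE + nurse2
    Model 2: LOS ~ INFRISK + ms
      Res.Df     RSS Df Sum of Sq      F Pr(>F)
    1    108 276.981
    2    110 279.867 -2    -2.886 0.5627 0.5713

R numbers the models in the order they are passed to anova(), so its "Model 2" here is reg3, i.e. Model 3 on the slides.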
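This comparison drops two coefficients, so the F statistic has 2 numerator df. For an F(2, d) variable, the upper tail has the closed form P(F > f) = (1 + 2f/d)^(-d/2). The Python sketch below uses that closed form to recompute both the F value and the p-value from the printed RSS values. It is an illustration, not R's own code path. Because the printed RSS values are rounded, the results agree with R to about three decimals. With p ≈ 0.57, we fail to reject H0 at α = 0.05.

```python
# Residual sums of squares and residual df printed by anova(reg1, reg3)
rss_large, df_large = 276.981, 108   # reg1: INFRISK + ms + NURSE + nurse2
rss_small, df_small = 279.867, 110   # reg3: INFRISK + ms (drops NURSE and nurse2)

q = df_small - df_large              # number of coefficients set to zero under H0 (2)
f_star = ((rss_small - rss_large) / q) / (rss_large / df_large)

# Closed-form upper tail of an F(2, d) distribution: P(F > f) = (1 + 2 f / d) ** (-d / 2)
p_value = (1 + 2 * f_star / df_large) ** (-df_large / 2)
print(f_star, p_value)
```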

    > summary(reg3)

    Call:
    lm(formula = LOS ~ INFRISK + ms, data = data)

    Residuals:
        Min      1Q  Median      3Q     Max
    -2.9037 -0.8739 -0.1142  0.5965  8.5568

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   6.4547     0.5146  12.542   <2e-16 ***
    INFRISK       0.6998     0.1156   6.054    2e-08 ***
    ms            0.9717     0.4316   2.251   0.0263 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 1.595 on 110 degrees of freedom
    Multiple R-squared: 0.3161, Adjusted R-squared: 0.3036
    F-statistic: 25.42 on 2 and 110 DF,  p-value: 8.42e-10

• Region is a ‘factor’ variable with 4 categories