### Lecture 9: ANOVA tables, F-tests

BMTRY 701 Biostatistical Methods II

ANOVA

- Analysis of Variance
- Similar in derivation to the ANOVA that generalizes the two-sample t-test
- Partitioning of variance into several parts
  - that due to the ‘model’: SSR
  - that due to ‘error’: SSE
- The sum of the two parts is the total sum of squares: SST
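
Written out, the partition is the standard identity for a least-squares fit that includes an intercept. The symbols $Y_i$ (observed response), $\hat{Y}_i$ (fitted value) and $\bar{Y}$ (sample mean) are notation assumed here:

```latex
\[
\underbrace{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}_{SST}
\;=\;
\underbrace{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}_{SSR}
\;+\;
\underbrace{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}_{SSE}
\]
```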

Example: logLOS ~ BEDS

> reg <- lm(logLOS ~ BEDS, data = data)

> ybar <- mean(data$logLOS)

> yhati <- reg$fitted.values

> sst <- sum((data$logLOS- ybar)^2)

> ssr <- sum((yhati - ybar )^2)

> sse <- sum((data$logLOS - yhati)^2)

>

> sst

[1] 3.547454

> ssr

[1] 0.6401715

> sse

[1] 2.907282

> sse+ssr

[1] 3.547454

>

Degrees of Freedom

- Degrees of freedom for SST: n - 1
  - one df is lost because it is used to estimate the mean of Y
- Degrees of freedom for SSR: 1
  - only one df because all estimates are based on the same fitted regression line
- Degrees of freedom for SSE: n - 2
  - two df are lost due to estimating the regression line (slope and intercept)

Mean Squares

- “Scaled” version of Sum of Squares
- Mean Square = SS/df
  - MSR = SSR/1
  - MSE = SSE/(n-2)
- Notes:
  - mean squares are not additive! That is, MSR + MSE ≠ SST/(n-1)
  - MSE is the same as we saw previously
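
As a quick arithmetic check, the mean squares can be recomputed from the sums of squares printed in the logLOS ~ BEDS example. This is only a sketch: the value n = 113 is inferred from the residual df of 111 (n - 2) in the ANOVA output, and the variable names are illustrative.

```r
# Sums of squares reported in the logLOS ~ BEDS example
ssr <- 0.6401715        # sum of squares due to the model
sse <- 2.907282         # sum of squares due to error
sst <- ssr + sse        # total sum of squares
n   <- 113              # assumption: inferred from residual df = 111 = n - 2

msr <- ssr / 1          # mean square for regression
mse <- sse / (n - 2)    # mean square error
mst <- sst / (n - 1)    # SST scaled by its own df

# Mean squares are not additive: these two numbers differ
msr + mse
mst

# Ratio of the two mean squares (the F statistic in the ANOVA table)
fstar <- msr / mse
fstar                   # about 24.44
```

The ratio agrees with the F value of 24.442 in the `anova(reg)` output up to rounding of the inputs.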

ANOVA for logLOS ~ BEDS

> anova(reg)

Analysis of Variance Table

Response: logLOS

           Df  Sum Sq Mean Sq F value    Pr(>F)
BEDS        1 0.64017 0.64017  24.442 2.737e-06 ***
Residuals 111 2.90728 0.02619

Inference?

- What is of interest and how do we interpret?
- We’d like to know if BEDS is related to logLOS.
- How do we do that using ANOVA table?
- We need to know the expected value of the MSR and MSE:
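
A sketch of the standard results for the simple linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ with independent errors of constant variance $\sigma^2$ (the notation $X_i$, $\bar{X}$ is assumed here):

```latex
\[
E(MSE) = \sigma^2
\qquad\qquad
E(MSR) = \sigma^2 + \beta_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2
\]
```

The extra term in $E(MSR)$ is zero only when $\beta_1 = 0$; otherwise it is positive, so $E(MSR)$ exceeds $E(MSE)$.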

Implications

- mean of sampling distribution of MSE is σ² regardless of whether or not β1 = 0
- If β1 = 0, E(MSE) = E(MSR)
- If β1 ≠ 0, E(MSE) < E(MSR)
- To test significance of β1, we can test if MSR and MSE are of the same magnitude.

F-test

- Derived naturally from the arguments just made
- Hypotheses:
  - H0: β1 = 0
  - H1: β1 ≠ 0
- Test statistic: F* = MSR/MSE
- Based on the earlier argument, we expect F* > 1 if H1 is true.
- This implies a one-sided (upper-tail) test.

F-test

- The distribution of F under the null has two sets of degrees of freedom (df)
  - numerator degrees of freedom
  - denominator degrees of freedom
- These correspond to the df as shown in the ANOVA table
  - numerator df = 1
  - denominator df = n-2
- Test is based on F* = MSR/MSE, which follows an F(1, n-2) distribution under H0

Implementing the F-test

- The decision rule
  - If F* > F(1-α; 1, n-2), then reject H0
  - If F* ≤ F(1-α; 1, n-2), then fail to reject H0

ANOVA for logLOS ~ BEDS

> anova(reg)

Analysis of Variance Table

Response: logLOS

           Df  Sum Sq Mean Sq F value    Pr(>F)
BEDS        1 0.64017 0.64017  24.442 2.737e-06 ***
Residuals 111 2.90728 0.02619

> qf(0.95, 1, 111)

[1] 3.926607

> 1-pf(24.44,1,111)

[1] 2.739016e-06

More interesting: MLR (multiple linear regression)

- You can test that several coefficients are zero at the same time
- Otherwise, the F-test gives the same result as a t-test
- That is: for testing the significance of ONE covariate in a linear regression model, an F-test and a t-test give the same result (F* = t², with identical p-values):
  - H0: β1 = 0
  - H1: β1 ≠ 0

General F testing approach

- The previous test seems simple
- It is in this case, but it can be generalized to be more useful
- Imagine a more general test:
  - Ho: small model
  - Ha: large model
- Constraint: the small model must be ‘nested’ in the large model
- That is, the terms in the small model must be a ‘subset’ of the terms in the large model
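
For reference, the usual form of the test statistic for comparing a small (reduced, $R$) model against a large (full, $F$) model is sketched below. The subscripts and the symbols $SSE_R$, $SSE_F$, $df_R$, $df_F$ (error sums of squares and residual degrees of freedom of each model) are notation assumed here:

```latex
\[
F^{*} \;=\;
\frac{\left( SSE_R - SSE_F \right) / \left( df_R - df_F \right)}
     {SSE_F / df_F}
\;\sim\; F\!\left( df_R - df_F,\; df_F \right)
\quad \text{under } H_0
\]
```

With the intercept-only model as the small model and simple linear regression as the large one, $SSE_R = SST$, $df_R - df_F = 1$ and $df_F = n - 2$, so this reduces to $F^{*} = MSR/MSE$.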

Example of ‘nested’ models

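Model 1: LOS = β0 + β1 INFRISK + β2 ms + β3 NURSE + β4 nurse2 + ε

Model 2: LOS = β0 + β1 INFRISK + β3 NURSE + β4 nurse2 + ε

Model 3: LOS = β0 + β1 INFRISK + β2 ms + ε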

Models 2 and 3 are nested in Model 1

Model 2 is not nested in Model 3

Model 3 is not nested in Model 2

Testing: Models must be nested!

- To test Model 1 vs. Model 2
  - we are testing that β2 = 0
  - Ho: β2 = 0 vs. Ha: β2 ≠ 0
  - If we fail to reject Ho (no evidence that β2 ≠ 0), we prefer the smaller Model 2 over Model 1
  - If we reject Ho, we keep the larger Model 1

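Model 1: LOS = β0 + β1 INFRISK + β2 ms + β3 NURSE + β4 nurse2 + ε

Model 2: LOS = β0 + β1 INFRISK + β3 NURSE + β4 nurse2 + ε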

R

reg1 <- lm(LOS ~ INFRISK + ms + NURSE + nurse2, data=data)

reg2 <- lm(LOS ~ INFRISK + NURSE + nurse2, data=data)

reg3 <- lm(LOS ~ INFRISK + ms, data=data)

> anova(reg1)

Analysis of Variance Table

Response: LOS

           Df  Sum Sq Mean Sq F value    Pr(>F)
INFRISK     1 116.446 116.446 45.4043 8.115e-10 ***
ms          1  12.897  12.897  5.0288   0.02697 *
NURSE       1   1.097   1.097  0.4277   0.51449
nurse2      1   1.789   1.789  0.6976   0.40543
Residuals 108 276.981   2.565

---

R

> anova(reg2)

Analysis of Variance Table

Response: LOS

           Df  Sum Sq Mean Sq F value    Pr(>F)
INFRISK     1 116.446 116.446 44.8865 9.507e-10 ***
NURSE       1   8.212   8.212  3.1653     0.078 .
nurse2      1   1.782   1.782  0.6870     0.409
Residuals 109 282.771   2.594

---

> anova(reg1, reg2)

Analysis of Variance Table

Model 1: LOS ~ INFRISK + ms + NURSE + nurse2

Model 2: LOS ~ INFRISK + NURSE + nurse2

  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1    108 276.981
2    109 282.771 -1    -5.789 2.2574 0.1359
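
The F value in this comparison can be reproduced by hand from the two residual sums of squares: the drop in RSS per extra parameter, divided by the large model's mean square error. A minimal sketch using only the numbers printed above (the variable names are illustrative):

```r
# Residual sums of squares and residual df from anova(reg1, reg2)
sse_full <- 276.981; df_full <- 108   # LOS ~ INFRISK + ms + NURSE + nurse2
sse_red  <- 282.771; df_red  <- 109   # LOS ~ INFRISK + NURSE + nurse2

# Numerator: increase in RSS per dropped coefficient
# Denominator: MSE of the larger model
fstar <- ((sse_red - sse_full) / (df_red - df_full)) / (sse_full / df_full)

# Upper-tail p-value from the F(1, 108) distribution
pval <- pf(fstar, df_red - df_full, df_full, lower.tail = FALSE)

fstar   # about 2.26
pval    # about 0.136
```

Small differences from the printed 2.2574 come from using the rounded RSS values.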

R

> summary(reg1)

Coefficients:

              Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.355e+00  5.266e-01  12.068  < 2e-16 ***
INFRISK      6.289e-01  1.339e-01   4.696 7.86e-06 ***
ms           7.829e-01  5.211e-01   1.502    0.136
NURSE        4.136e-03  4.093e-03   1.010    0.315
nurse2      -5.676e-06  6.796e-06  -0.835    0.405

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.601 on 108 degrees of freedom

Multiple R-squared: 0.3231, Adjusted R-squared: 0.2981

F-statistic: 12.89 on 4 and 108 DF, p-value: 1.298e-08
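
The t-test for `ms` in this summary and the F-test from `anova(reg1, reg2)` are the same test, which can be checked numerically. A small sketch using the printed (rounded) values, so agreement is only approximate:

```r
t_ms     <- 1.502    # t value for ms in summary(reg1)
f_ms     <- 2.2574   # F from anova(reg1, reg2)
df_resid <- 108      # residual df of the larger model

# For a single coefficient, the squared t statistic equals the F statistic
t_ms^2               # about 2.256

# Two-sided t-test p-value vs. upper-tail F-test p-value
p_t <- 2 * pt(-abs(t_ms), df_resid)
p_f <- pf(f_ms, 1, df_resid, lower.tail = FALSE)
c(p_t = p_t, p_f = p_f)
```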

>

Testing two or more covariates

- To test Model 1 vs. Model 3
  - we are testing that β3 = 0 AND β4 = 0
  - Ho: β3 = β4 = 0 vs. Ha: β3 ≠ 0 or β4 ≠ 0
  - If we fail to reject Ho (no evidence that β3 or β4 differs from 0), we prefer the smaller Model 3 over Model 1
  - If we reject Ho, we keep the larger Model 1

Model 1: LOS = β0 + β1 INFRISK + β2 ms + β3 NURSE + β4 nurse2 + ε

Model 3: LOS = β0 + β1 INFRISK + β2 ms + ε

R

> anova(reg3)

Analysis of Variance Table

Response: LOS

           Df  Sum Sq Mean Sq F value    Pr(>F)
INFRISK     1 116.446 116.446 45.7683 6.724e-10 ***
ms          1  12.897  12.897  5.0691   0.02634 *
Residuals 110 279.867   2.544

---

> anova(reg1, reg3)

Analysis of Variance Table

Model 1: LOS ~ INFRISK + ms + NURSE + nurse2

Model 2: LOS ~ INFRISK + ms

  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1    108 276.981
2    110 279.867 -2    -2.886 0.5627 0.5713
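
Since the lecture's data set is not included here, the same kind of joint test can be tried end to end on simulated data. Everything below is hypothetical: the data frame `sim`, the variables `x1`, `x2`, `x3`, `x3sq` and the coefficients used to generate `y` are made up for illustration, and the small model is passed first so that the Df column is positive.

```r
set.seed(701)
n <- 113

# Hypothetical data: y depends on x1 and x2 only; x3 and x3sq have no true effect
sim <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.3), x3 = rnorm(n))
sim$x3sq <- sim$x3^2
sim$y <- 6 + 0.7 * sim$x1 + 1.0 * sim$x2 + rnorm(n, sd = 1.5)

large <- lm(y ~ x1 + x2 + x3 + x3sq, data = sim)
small <- lm(y ~ x1 + x2, data = sim)

# Joint test that both extra coefficients are zero
tab <- anova(small, large)
tab

# The same statistic computed by hand from the residual sums of squares
sse_l <- sum(resid(large)^2); df_l <- df.residual(large)
sse_s <- sum(resid(small)^2); df_s <- df.residual(small)
f_hand <- ((sse_s - sse_l) / (df_s - df_l)) / (sse_l / df_l)
p_hand <- pf(f_hand, df_s - df_l, df_l, lower.tail = FALSE)
c(F = f_hand, p = p_hand)
```

The hand-computed F and p-value should match the second row of the `anova()` table, with 2 numerator df because two coefficients are tested at once.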

R

> summary(reg3)

Call:

lm(formula = LOS ~ INFRISK + ms, data = data)

Residuals:

    Min      1Q  Median      3Q     Max
-2.9037 -0.8739 -0.1142  0.5965  8.5568

Coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.4547     0.5146  12.542   <2e-16 ***
INFRISK       0.6998     0.1156   6.054    2e-08 ***
ms            0.9717     0.4316   2.251   0.0263 *

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.595 on 110 degrees of freedom

Multiple R-squared: 0.3161, Adjusted R-squared: 0.3036

F-statistic: 25.42 on 2 and 110 DF, p-value: 8.42e-10

Testing multiple coefficients simultaneously

- Region: it is a ‘factor’ variable with 4 categories
