Topic 7: Analysis of Variance

Presentation Transcript
1. Topic 7: Analysis of Variance

2. Outline • Partitioning sums of squares • Breakdown of degrees of freedom • Expected mean squares (EMS) • F test • ANOVA table • General linear test • Pearson correlation / R²

3. Analysis of Variance • Organize results arithmetically • Total sum of squares in Y is $SSTO = \sum_i (Y_i - \bar{Y})^2$ • Partition this into two sources: • Model (explained by regression) • Error (unexplained / residual)

4. Total Sum of Squares • $SSTO = \sum_i (Y_i - \bar{Y})^2$ • dfT = n-1 • MST = SSTO/dfT • MST is the usual estimate of the variance of Y if there are no explanatory variables • SAS uses the term Corrected Total for this source • The uncorrected total is $\sum_i Y_i^2$ • “Corrected” means that we subtract off the mean before squaring

5. Model Sum of Squares • $SSR = \sum_i (\hat{Y}_i - \bar{Y})^2$ • dfR = 1 (due to the addition of the slope) • MSR = SSR/dfR • KNNL uses “regression” for what SAS calls “model” • So SSR (KNNL) is the same as SS Model (SAS)

6. Error Sum of Squares • $SSE = \sum_i (Y_i - \hat{Y}_i)^2$ • dfE = n-2 (both slope and intercept are estimated) • MSE = SSE/dfE • MSE is an estimate of the variance of Y taking into account (or conditioning on) the explanatory variable(s) • MSE = s²

7. ANOVA Table
Source        df     SS      MS
Regression    1      SSR     SSR/dfR
Error         n-2    SSE     SSE/dfE
________________________________
Total         n-1    SSTO    SSTO/dfT
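
The partition summarized in the table above can be checked numerically. The following is a minimal sketch in Python (the course itself uses SAS); the five data points are hypothetical and invented purely for illustration, and the variable names are not from the slides.

```python
# Hypothetical data, invented for illustration only (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept for simple linear regression.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# The three sums of squares of the ANOVA table.
ssto = sum((yi - y_bar) ** 2 for yi in y)               # corrected total
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)            # regression / model
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # error / residual

# Degrees of freedom and mean squares.
df_r, df_e, df_t = 1, n - 2, n - 1
msr = ssr / df_r
mse = sse / df_e

print("SSTO =", round(ssto, 6), " SSR + SSE =", round(ssr + sse, 6))
print("dfT =", df_t, " dfR + dfE =", df_r + df_e)
print("MSR =", round(msr, 6), " MSE =", round(mse, 6))
```

Both the sums of squares and the degrees of freedom add up: SSTO = SSR + SSE and dfT = dfR + dfE.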

8. Expected Mean Squares • MSR, MSE are random variables • $E(MSR) = \sigma^2 + \beta_1^2 \sum_i (X_i - \bar{X})^2$ • $E(MSE) = \sigma^2$ • When H0: β1 = 0 is true, E(MSR) = E(MSE)

9. F test • Under H0: β1 = 0, F* = MSR/MSE ~ F(dfR, dfE) = F(1, n-2) • See KNNL pp. 69-71 • When H0: β1 = 0 is false, MSR tends to be larger than MSE • We reject H0 when F* is large: reject if F* > F(1-α, dfR, dfE), which is F(.95, 1, n-2) for α = .05 • In practice we use P-values

10. F test • When H0: β1 = 0 is false, F* has a noncentral F distribution • This can be used to calculate power • Recall that t* = b1/s(b1) tests H0: β1 = 0 • It can be shown that (t*)² = F* (p. 71) • The two approaches give the same P-value

11. ANOVA Table
Source    df     SS     MS     F          P
Model     1      SSM    MSM    MSM/MSE    0.##
Error     n-2    SSE    MSE
Total     n-1
**Note: “Model” is used here instead of “Regression”, which is closer to the SAS output

12. Examples • Tower of Pisa study (n=13 cases)
    proc reg data=a1;
      model lean=year;
    run;
• Toluca lot size study (n=25 cases)
    proc reg data=toluca;
      model hours=lotsize;
    run;

13. Pisa Output

14. Pisa Output • (t*)² = (30.07)² = 904.2, which equals F* up to rounding error

15. Toluca Output

16. Toluca Output • (t*)² = (10.29)² = 105.88 = F*
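
As a rough check of F* = MSM/MSE against the squared t statistics above, here is a minimal Python sketch (not the course's SAS code). It uses only the sums of squares and sample sizes quoted elsewhere in this deck (Pisa: SSM = 15804, SSTO = 15997, n = 13; Toluca: SSM = 252378, SSTO = 307203, n = 25). The function name `f_statistic` is a hypothetical helper. Because the quoted sums of squares are rounded, the Pisa value agrees with 904.2 only approximately.

```python
def f_statistic(ss_model, ss_total, n):
    """Return F* = MSM / MSE for simple linear regression (one slope)."""
    df_model = 1
    df_error = n - 2
    ss_error = ss_total - ss_model
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return ms_model / ms_error


# Sums of squares and n as quoted on the R-Square output slides.
f_toluca = f_statistic(252378, 307203, 25)
f_pisa = f_statistic(15804, 15997, 13)

# Compare F* with the squared t statistics quoted on the output slides.
print("Toluca: F* =", round(f_toluca, 2), " t*^2 =", round(10.29 ** 2, 2))
print("Pisa:   F* =", round(f_pisa, 1), " t*^2 =", round(30.07 ** 2, 1))
```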

17. General Linear Test • A different view of the same problem • We want to compare two models • Yi = β0 + β1Xi + ei (full model) • Yi = β0 + ei (reduced model) • Compare the two models using their error sums of squares; the better model will have the “smaller” mean square error

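18. General Linear Test • Let SSE(F) = SSE for the full model and SSE(R) = SSE for the reduced model, with error degrees of freedom dfF and dfR (here dfR refers to the reduced model, not to the regression df) • $F^* = \frac{(SSE(R) - SSE(F)) / (df_R - df_F)}{SSE(F) / df_F}$ • Compare with F(1-α, dfR-dfF, dfF)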

19. Simple Linear Regression • SSE(R) = SSTO and SSE(F) = SSE • dfR = n-1, dfF = n-2 • dfR - dfF = 1 • F* = (SSTO - SSE)/MSE = SSR/MSE • Same test as before • This approach is more general
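
The equivalence stated above can be sketched in a few lines of Python (again, not the course's SAS code). The helper `general_linear_f` is a hypothetical name; the Toluca sums of squares (SSM = 252378, SSTO = 307203, n = 25) are the values quoted on the R-Square output slide of this deck.

```python
def general_linear_f(sse_reduced, df_reduced, sse_full, df_full):
    """General linear test statistic comparing a reduced and a full model."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


# Toluca values as quoted in this deck.
n = 25
ssto = 307203
ssm = 252378
sse = ssto - ssm

# Reduced model Y = b0 + e: its SSE is SSTO, with n-1 error df.
# Full model Y = b0 + b1*X + e: its SSE is SSE, with n-2 error df.
f_glt = general_linear_f(ssto, n - 1, sse, n - 2)

# ANOVA form of the same statistic: MSM / MSE.
f_anova = (ssm / 1) / (sse / (n - 2))

print("general linear test F* =", round(f_glt, 2))
print("ANOVA F*               =", round(f_anova, 2))
```

The same function applies unchanged to any pair of nested models, which is the sense in which this approach is more general.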

20. Pearson Correlation • r is the usual correlation coefficient • It is a number between –1 and +1 and measures the strength of the linear relationship between two variables

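21. Pearson Correlation • Notice that $b_1 = r \, s_Y / s_X$, where sX and sY are the sample standard deviations of X and Y • Testing H0: β1 = 0 is similar to testing H0: ρ = 0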

22. R² and r² • $R^2 = SSM / SSTO = 1 - SSE / SSTO$ • Ratio of explained to total variation

23. R² and r² • We use R² when the number of explanatory variables is arbitrary (simple and multiple regression) • r² = R² only for simple regression • R² is often multiplied by 100 and thereby expressed as a percent
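
The links between the correlation r, the slope b1, and R² in simple regression can be checked numerically. The sketch below is in Python rather than SAS, and the five data points are hypothetical, invented only for illustration. It checks the standard identities b1 = r·sY/sX and r² = SSR/SSTO, and that the t statistic for the slope matches the usual t statistic for H0: ρ = 0, r·sqrt(n-2)/sqrt(1-r²).

```python
import math

# Hypothetical data, invented for illustration only (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)  # this is SSTO
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Pearson correlation, slope, and sample standard deviations.
r = sxy / math.sqrt(sxx * syy)
b1 = sxy / sxx
s_x = math.sqrt(sxx / (n - 1))
s_y = math.sqrt(syy / (n - 1))

# Sums of squares for the simple regression of y on x.
ssr = b1 ** 2 * sxx
sse = syy - ssr
mse = sse / (n - 2)
r2_model = ssr / syy

# t statistic from the slope, and from the correlation.
t_slope = b1 / math.sqrt(mse / sxx)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
f_star = ssr / mse

print("b1 =", round(b1, 6), " r*sY/sX =", round(r * s_y / s_x, 6))
print("r^2 =", round(r ** 2, 6), " SSR/SSTO =", round(r2_model, 6))
print("t (slope) =", round(t_slope, 6), " t (corr) =", round(t_corr, 6))
print("t^2 =", round(t_slope ** 2, 6), " F* =", round(f_star, 6))
```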

24. R² and r² • R² never decreases (and typically increases) when additional explanatory variables are added to the model • Adjusted R² “penalizes” larger models • Adjusted R² does not necessarily get larger when variables are added

25. Pisa Output R-Square 0.9880 (SAS) = SSM/SSTO = 15804/15997 = 0.9879 (the last digit differs, presumably because the sums of squares shown are rounded)

26. Toluca Output R-Square 0.8215 (SAS) = SSM/SSTO = 252378/307203 = 0.8215
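
The two R-Square values above, together with the adjusted R² mentioned earlier, can be reproduced with a short Python sketch (not the course's SAS code). The function names are hypothetical. The adjusted R² formula used here is the standard 1 - (SSE/(n-p)) / (SSTO/(n-1)), with p = 2 parameters (intercept and slope); the slides do not state it, so treat it as an assumption of this example.

```python
def r_squared(ss_model, ss_total):
    """R^2 = SSM / SSTO: the explained share of total variation."""
    return ss_model / ss_total


def adjusted_r_squared(ss_model, ss_total, n, p):
    """Adjusted R^2; p counts all regression parameters, intercept included."""
    ss_error = ss_total - ss_model
    return 1 - (ss_error / (n - p)) / (ss_total / (n - 1))


# Sums of squares and n as quoted on the output slides.
r2_pisa = r_squared(15804, 15997)
r2_toluca = r_squared(252378, 307203)

# Simple regression has p = 2 parameters (intercept and slope).
adj_pisa = adjusted_r_squared(15804, 15997, 13, 2)
adj_toluca = adjusted_r_squared(252378, 307203, 25, 2)

print("Pisa:   R^2 =", round(r2_pisa, 4), " adjusted R^2 =", round(adj_pisa, 4))
print("Toluca: R^2 =", round(r2_toluca, 4), " adjusted R^2 =", round(adj_toluca, 4))
```

In both examples the adjusted value is slightly smaller than R², which reflects the penalty for the estimated slope.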

27. Background Reading • You may find Sections 2.10 and 2.11 interesting • 2.10 provides cautionary remarks; we will discuss these as they arise • 2.11 discusses the bivariate Normal distribution: similarities and differences, and a confidence interval for r • Program topic7.sas has the code to generate the ANOVA output • Read Chapter 3