
Analysis of Variance

Analysis of Variance. Harry R. Erwin, PhD, School of Computing and Technology, University of Sunderland. Resources: Crawley, M. J. (2005) Statistics: An Introduction Using R. Wiley. Gonick, L., and Woollcott Smith (1993) The Cartoon Guide to Statistics. HarperResource (for fun).


Presentation Transcript


  1. Analysis of Variance Harry R. Erwin, PhD School of Computing and Technology University of Sunderland

  2. Resources • Crawley, M. J. (2005) Statistics: An Introduction Using R. Wiley. • Gonick, L., and Woollcott Smith (1993) The Cartoon Guide to Statistics. HarperResource (for fun).

  3. When is ANOVA Used? • All explanatory variables are categorical: unquantified and unordered. • The explanatory variables are called ‘factors’; each has two or more levels. • With one factor and two levels, use Student’s t test. • With one factor and three or more levels, use one-way ANOVA. • With two factors, use two-way ANOVA; with three factors, three-way ANOVA, and so on. • If every combination of factor levels is present, you have a factorial design, which lets you study interactions between variables (and order no longer matters!).
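The one-factor, two-level case can be checked both ways. A minimal sketch with hypothetical data, showing that a pooled-variance t test and a one-way ANOVA give the same p-value (and F = t²):

```r
# Hypothetical data: one factor with two levels.
y <- c(3, 4, 4, 3, 2, 4, 5, 5, 6, 5, 4, 7)      # response
A <- factor(rep(c("low", "high"), each = 6))    # two-level factor

t.test(y ~ A, var.equal = TRUE)   # pooled-variance Student's t test
summary(aov(y ~ A))               # same p-value; the F value equals t squared
```

Note that `var.equal = TRUE` is needed: R's default `t.test` is the Welch test, which does not pool variances and so does not match the ANOVA exactly.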

  4. The Basic Idea of ANOVA • You compare means by comparing variances. • Compute the overall variance of the data: s² = total sum of squares (SSY) / degrees of freedom (kn − 1), for k treatments with n replicates each. • If the treatment means are significantly different, the error sum of squares SSE, computed from deviations about the individual treatment means, will be smaller than SSY, computed about the overall mean; SSE has k(n − 1) degrees of freedom. • The treatment sum of squares is SSA = SSY − SSE, with k − 1 degrees of freedom. • Finally, use an F test of [SSA/(k − 1)] / [SSE/(k(n − 1))] to determine whether the treatment effect is significant.
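The decomposition above can be computed by hand in R. A sketch with hypothetical data (k = 2 treatments, n = 6 replicates each), which should agree with `summary(aov(y ~ A))`:

```r
# ANOVA decomposition by hand: SSY about the grand mean, SSE about the
# treatment means, SSA = SSY - SSE, then an F test.
y <- c(3, 4, 4, 3, 2, 4, 5, 5, 6, 5, 4, 7)
A <- factor(rep(c("a", "b"), each = 6))
k <- nlevels(A)
n <- 6

SSY <- sum((y - mean(y))^2)                                  # total SS, df = k*n - 1
SSE <- sum(tapply(y, A, function(g) sum((g - mean(g))^2)))   # error SS, df = k*(n - 1)
SSA <- SSY - SSE                                             # treatment SS, df = k - 1

F <- (SSA / (k - 1)) / (SSE / (k * (n - 1)))
pf(F, k - 1, k * (n - 1), lower.tail = FALSE)   # p-value, as in summary(aov(y ~ A))
```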

  5. ANOVA Tools • model<-aov(y~A) • summary(model) • Tells you whether the SSA is significant. • plot(model) • Checks for constancy of variance and normality of errors. • Demonstration (pages 155–161 of text)

  6. Demonstration oneway<-read.table("oneway.txt",header=T) attach(oneway) names(oneway) [1] "ozone" "garden" plot(1:20,ozone,ylim=c(0,8),ylab="y",xlab="order") abline(mean(ozone),0) for(i in 1:20) lines(c(i,i),c(mean(ozone),ozone[i]))

  7. Variance from Mean

  8. Separating the two gardens

  9. Analysis summary(aov(ozone~garden))
              Df  Sum Sq Mean Sq F value   Pr(>F)
  garden       1 20.0000 20.0000      15 0.001115 **
  Residuals   18 24.0000  1.3333                     <- residual mean square
  plot(aov(ozone~garden)) <- similar to lm plots, less informative
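The entries in that table can be reproduced by hand from the sums of squares it reports:

```r
# Checking the ANOVA table entries (values taken from the output above).
SSA <- 20; df1 <- 1     # garden (treatment) sum of squares and df
SSE <- 24; df2 <- 18    # residual sum of squares and df

MSA <- SSA / df1        # treatment mean square: 20
MSE <- SSE / df2        # residual mean square: 1.3333
F   <- MSA / MSE        # F value: 15
pf(F, df1, df2, lower.tail = FALSE)   # the Pr(>F) column: about 0.001115
```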

  10. Investigating Factor Levels • summary.aov() allows you to do hypothesis testing. • Usually more interesting than hypothesis testing are the effects of the factor levels. • For those, use summary.lm() from the regression lecture: • summary.lm(aov(ozone~garden)) • Discuss (pages 164–166 of text)
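A sketch of the two summaries side by side, assuming the `oneway` data from the demonstration slide are attached. With R's default treatment contrasts, the intercept in `summary.lm()` is the mean of the first factor level (alphabetical order), and each remaining coefficient is the difference between that level's mean and the first level's mean:

```r
model <- aov(ozone ~ garden)
summary(model)      # hypothesis test: is there any difference among gardens?
summary.lm(model)   # effects: mean of the first garden, plus differences from it
```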

  11. Plotting ANOVA • Box and whisker plots • Show the nature of the variation within each treatment. • Show skew. • Bar plots with error bars • Preferred by many journals. • Show when means are significantly different. • Demo (pages 168–169)
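Both plot types can be produced in base R. A sketch, again assuming the `ozone` and `garden` variables from the demonstration slide are attached; the error bars here are one standard error of the mean:

```r
# Box-and-whisker plot: spread and skew within each treatment level.
boxplot(ozone ~ garden)

# Bar plot of treatment means with one-standard-error bars.
means <- tapply(ozone, garden, mean)
ses   <- tapply(ozone, garden, function(g) sd(g) / sqrt(length(g)))
mids  <- barplot(means, ylim = c(0, max(means + 2 * ses)))
arrows(mids, means - ses, mids, means + ses, angle = 90, code = 3, length = 0.1)
```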

  12. Factorial Experiments • All combinations of factor levels present. Highly desirable. • Allow us to investigate interactions. • summary() • summary.lm() • Demo of model simplification.
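A minimal sketch of a factorial analysis with hypothetical data (a 2 × 2 design, three replicates per cell). In an R model formula, `A * B` expands to `A + B + A:B`, so the interaction is fitted along with the main effects; simplification then means dropping a non-significant interaction:

```r
# Hypothetical 2x2 factorial with 3 replicates per combination.
y <- c(4, 5, 4, 6, 7, 6, 5, 5, 6, 9, 8, 9)
A <- factor(rep(c("a1", "a2"), each = 6))
B <- factor(rep(rep(c("b1", "b2"), each = 3), 2))

model <- aov(y ~ A * B)              # fits A, B, and the A:B interaction
summary(model)                       # tests both main effects and the interaction
model2 <- update(model, ~ . - A:B)   # simplify: drop the interaction if not significant
summary(model2)
```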

  13. Pseudoreplication • aov and lme can handle complicated error structures. • Pseudoreplication arises in two common settings: • Nested sampling • Split-plot designs • You can average away spatial pseudoreplication and conduct individual ANOVAs for each time. • Has some weaknesses (page 180)
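A split-plot sketch with hypothetical data and names: factor A is applied to whole fields, factor B to subplots within each field, so A must be tested against the field-level error stratum. The `Error()` term in `aov` sets this up:

```r
# Four fields, each split into two subplots; A applied to fields, B to subplots.
field <- factor(rep(1:4, each = 2))
A <- factor(rep(c("a1", "a2"), each = 4))   # whole-field treatment
B <- factor(rep(c("b1", "b2"), 4))          # subplot treatment
y <- c(5, 6, 4, 5, 8, 9, 7, 9)

model <- aov(y ~ A * B + Error(field))
summary(model)   # one table per error stratum: A tested between fields,
                 # B and A:B tested within fields
```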

  14. Random Effects and Nested Designs • Mixed-effects models: both fixed effects (affecting the mean) and random effects (affecting the variance) among the explanatory variables. • The analysis depends on how the data are grouped. • Page 179 for categorisation.
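A mixed-effects sketch using `lme` from the nlme package (the tool named on the pseudoreplication slide). The data and variable names here are hypothetical: treatment is a fixed effect shifting the mean, and garden-to-garden variation enters as a random intercept contributing variance:

```r
library(nlme)

# Hypothetical data: 3 gardens, 2 treatments, 2 replicates each.
df <- data.frame(
  garden    = factor(rep(1:3, each = 4)),
  treatment = factor(rep(c("ctl", "trt"), 6)),
  y         = c(4, 6, 5, 7, 3, 5, 4, 6, 5, 7, 6, 8)
)

model <- lme(y ~ treatment, random = ~ 1 | garden, data = df)
summary(model)   # fixed effect of treatment; random garden-level variance component
```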

  15. Longitudinal Data • Repeated measurements from the same individual. • Common in medical studies. • Be critical: longitudinal designs can separate age effects from cohort effects. • The response is a series of measurements.

  16. Derived Variable Analysis • Summarise the repeated measurements into one statistic per individual, averaging away the pseudoreplication, and analyse the summaries. • Weak if the explanatory variables change over time. • Watch for variation in: • random effects (VCA) • serial correlation • measurement error
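A derived-variable sketch with hypothetical data: four repeated measures per individual are collapsed to a single mean, so each individual contributes one value and the ordinary one-way ANOVA on the summaries is free of pseudoreplication:

```r
# Hypothetical data: 6 individuals (3 control, 3 treated), 4 measures each.
df <- data.frame(
  individual = factor(rep(1:6, each = 4)),
  group      = factor(rep(c("ctl", "trt"), each = 12)),
  y          = c(5, 4, 6, 5,  4, 5, 5, 6,  6, 5, 4, 5,
                 7, 8, 6, 7,  8, 7, 7, 6,  7, 8, 8, 7)
)

# Derived variable: one mean per individual.
means <- aggregate(y ~ individual + group, data = df, FUN = mean)

# Ordinary one-way ANOVA on the summaries: no pseudoreplication.
summary(aov(y ~ group, data = means))
```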
