This resource provides a thorough overview of Analysis of Variance (ANOVA), detailing its applications and techniques for comparing means through variances. It explains the use of one-way, two-way, and factorial ANOVA, along with model fitting in R. Key concepts include hypothesis testing, effect of factor levels, and visualization through box plots and bar plots. The guide emphasizes the importance of avoiding pseudoreplication and explains mixed-effects models for random and fixed effects. Ideal for students and professionals in statistics and research.
Analysis of Variance Harry R. Erwin, PhD School of Computing and Technology University of Sunderland
Resources • Crawley, MJ (2005) Statistics: An Introduction Using R. Wiley. • Gonick, L., and Smith, W. (1993) The Cartoon Guide to Statistics. HarperResource (for fun).
When is Anova Used? • All explanatory variables are categorical—unquantified and unordered • The explanatory variables are called ‘factors’; each has two or more levels. • If there is one factor with two levels, use Student’s t. • If there is one factor with three+ levels, use one-way Anova. • If there are two factors, use two-way Anova. • For three factors, use three-way Anova, and so on… • If every combination of factors is present, you have a factorial design, allowing you to study interactions between variables (and order no longer matters!).
The Basic Idea of Anova • You compare means by comparing variances. (Picture) • With k treatments and n replicates per treatment, compute the overall variance of the data: • s² = sum of squares (SSY)/degrees of freedom (kn-1) • SSY is computed from the overall mean; SSE is computed from the individual treatment means, with degrees of freedom = k(n-1). • SSE can never exceed SSY; if the treatment means are significantly different, SSE will be much smaller than SSY. • SSA = SSY-SSE, df = k-1. • Finally, use an F test, F = (SSA/(k-1))/(SSE/(k(n-1))), to determine if the SSA is significant.
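The arithmetic on this slide can be checked by hand in R. The following is a minimal sketch, assuming made-up values for k = 2 treatments with n = 10 observations each; the data and the names y, A, F_ratio and p_value are illustrative and are not taken from the lecture files.

```r
# Hypothetical data: k = 2 treatments, n = 10 observations each.
y <- c(3, 4, 4, 3, 2, 3, 1, 3, 5, 2,    # treatment A
       5, 5, 6, 7, 4, 4, 3, 5, 6, 5)    # treatment B
A <- factor(rep(c("A", "B"), each = 10))
k <- nlevels(A)
n <- 10

SSY <- sum((y - mean(y))^2)      # total sum of squares, about the overall mean
SSE <- sum((y - ave(y, A))^2)    # error sum of squares, about each treatment mean
SSA <- SSY - SSE                 # sum of squares explained by the treatments

# F ratio: treatment mean square over error mean square.
F_ratio <- (SSA / (k - 1)) / (SSE / (k * (n - 1)))
p_value <- pf(F_ratio, k - 1, k * (n - 1), lower.tail = FALSE)

print(c(SSY = SSY, SSE = SSE, SSA = SSA, F = F_ratio, p = p_value))
```

With these values SSY = 44, SSE = 24 and SSA = 20, so F = 15 on 1 and 18 degrees of freedom, which is what summary(aov(y ~ A)) would report for the same data.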
Anova Tools • model<-aov(y~A) • summary(model) • Tells you whether the SSA is significant • plot(model) • Checks for constancy of variance and normality of errors. • Demonstration (pages 155-161 of text)
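As a rough sketch of this workflow, the following fits a one-way model to simulated data. The data frame d, the three-level factor A and the group means used in the simulation are assumptions made purely for illustration.

```r
set.seed(1)                          # reproducible simulated data
d <- data.frame(
  A = factor(rep(c("a", "b", "c"), each = 10)),
  y = rnorm(30, mean = rep(c(3, 5, 4), each = 10), sd = 1)
)

model <- aov(y ~ A, data = d)        # one factor with three levels: one-way ANOVA
print(summary(model))                # F test: is the treatment sum of squares significant?

par(mfrow = c(2, 2))                 # show the diagnostic plots on one page
plot(model)                          # residuals vs fitted, normal Q-Q, and so on
```

The first diagnostic plot is the one to inspect for constancy of variance, and the normal Q-Q plot for normality of errors.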
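Demonstration
oneway <- read.table("oneway.txt", header = TRUE)
attach(oneway)
names(oneway)
[1] "ozone" "garden"
# Plot the 20 observations in order, with the overall mean as a horizontal line
plot(1:20, ozone, ylim = c(0, 8), ylab = "y", xlab = "order")
abline(mean(ozone), 0)
# Draw each observation's deviation from the overall mean
for (i in 1:20) lines(c(i, i), c(mean(ozone), ozone[i]))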
Analysis
summary(aov(ozone~garden))
            Df  Sum Sq Mean Sq F value   Pr(>F)
garden       1 20.0000 20.0000      15 0.001115 **
Residuals   18 24.0000  1.3333
# 1.3333 is the residual mean square
plot(aov(ozone~garden))   # similar to lm plots, less informative
Investigating Factor Levels • summary.aov() allows you to do hypothesis testing. • What is more interesting (usually) than hypothesis testing are the effects of factor levels. • That uses summary.lm() from the regression lecture • summary.lm(aov(ozone~garden)) • Discuss (pages 164-166 of text)
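A small sketch of reading effect sizes from summary.lm() follows. The ozone values below are stand-ins chosen to reproduce the sums of squares in the Analysis table (SSA = 20, SSE = 24); they are an assumption and are not read from oneway.txt.

```r
# Stand-in data (hypothetical): 10 readings per garden.
ozone  <- c(3, 4, 4, 3, 2, 3, 1, 3, 5, 2,    # garden A
            5, 5, 6, 7, 4, 4, 3, 5, 6, 5)    # garden B
garden <- factor(rep(c("A", "B"), each = 10))

model <- aov(ozone ~ garden)
print(summary.lm(model))     # coefficient table rather than the ANOVA table

# With R's default treatment contrasts, the intercept is the mean of the first
# level (garden A) and 'gardenB' is the difference between the B and A means.
est <- coef(model)
print(est)
```

Here the intercept is 3 (the garden A mean) and the gardenB coefficient is 2, so the table reports the size of the difference between gardens as well as whether it is significant.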
Plotting ANOVA • Box and whisker plots • Show the nature of the variation within each treatment. • Show skew. • Bar plots with error bars • Preferred by many journals • Demo (pages 168-169) • Show when means are significantly different.
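Both kinds of plot can be sketched in base R. The data below are hypothetical stand-ins. Drawing error bars of plus or minus one standard error from the pooled error variance is an assumption, since the slide does not say which kind of error bar is intended.

```r
# Stand-in data (hypothetical): 10 readings per garden.
ozone  <- c(3, 4, 4, 3, 2, 3, 1, 3, 5, 2,
            5, 5, 6, 7, 4, 4, 3, 5, 6, 5)
garden <- factor(rep(c("A", "B"), each = 10))

par(mfrow = c(1, 2))
# Box-and-whisker plot: spread and skew within each treatment.
boxplot(ozone ~ garden, xlab = "garden", ylab = "ozone")

# Bar plot of treatment means with +/- 1 standard error bars.
means <- tapply(ozone, garden, mean)
n_per <- as.vector(table(garden))
s2    <- sum((ozone - ave(ozone, garden))^2) /
         (length(ozone) - nlevels(garden))      # pooled error variance
se    <- sqrt(s2 / n_per)                       # standard error of each mean

mid <- barplot(means, ylim = c(0, 1.2 * max(means + se)),
               xlab = "garden", ylab = "mean ozone")
arrows(mid, means - se, mid, means + se,
       angle = 90, code = 3, length = 0.1)      # flat-capped error bars
```

Other bar lengths (confidence intervals, least significant differences) are equally possible; only the se line would change.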
Factorial Experiments • All combinations of factors present. Highly desirable. • Allow us to investigate interactions. • summary() • summary.lm() • Demo of simplification.
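The following sketch fits a two-factor factorial model and tries one simplification step. The data are simulated without an interaction, and the factor names A and B, the level labels and the effect sizes are all assumptions made for illustration.

```r
set.seed(2)
# Every combination of A (2 levels) and B (3 levels), 4 replicates each.
d <- expand.grid(rep = 1:4,
                 A = factor(c("a1", "a2")),
                 B = factor(c("b1", "b2", "b3")))
d$y <- rnorm(nrow(d), mean = 10 + 2 * (d$A == "a2") + as.integer(d$B))

full <- aov(y ~ A * B, data = d)       # main effects plus the A:B interaction
print(summary(full))                   # hypothesis tests
print(summary.lm(full))                # effect sizes

# Simplification: remove the interaction and compare the two models.
reduced <- update(full, . ~ . - A:B)
print(anova(reduced, full))            # a large p-value suggests A:B can be dropped
```

Whether the interaction is actually dropped depends on the comparison's p-value for the data in hand.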
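Pseudoreplication • aov and lme can handle complicated error structures. • Recognise the two kinds of design in which pseudoreplication arises, and avoid it: • Nested sampling (e.g. repeated measurements from the same individual) • Split-plot experiments (treatments applied to plots of different sizes) • You can average away spatial pseudoreplication and conduct individual ANOVAs for each time. • This approach has some weaknesses (page 180)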
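One way to declare such an error structure in aov is an Error() term. The sketch below assumes a hypothetical split-plot layout: 4 blocks, irrigation applied to half of each block, and three densities within each half. The variable names and simulated effects are illustrative only.

```r
set.seed(3)
d <- expand.grid(density    = factor(c("low", "medium", "high")),
                 irrigation = factor(c("control", "irrigated")),
                 block      = factor(LETTERS[1:4]))
block_effect <- rnorm(4, sd = 2)[as.integer(d$block)]   # shared within a block
d$yield <- 50 + 5 * (d$irrigation == "irrigated") +
           block_effect + rnorm(nrow(d))

# Error(block / irrigation) says that irrigation varies only between half-blocks,
# so it is tested against the half-block error, not the within-plot error.
model <- aov(yield ~ irrigation * density + Error(block / irrigation), data = d)
print(summary(model))
```

In the summary, irrigation is tested with 3 residual degrees of freedom in the block:irrigation stratum, whereas treating all 24 rows as independent replicates would overstate the evidence.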
Random Effects and Nested Designs • Mixed effects models: both fixed (affecting the mean) and random (affecting the variance) effects in the explanatory variables. • Affected by grouping. • Page 179 for categorisation.
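A minimal mixed-effects sketch using lme from the nlme package follows. The simulated design (10 subjects, 4 measurements each, treatment applied per subject) and all variable names are assumptions for illustration.

```r
library(nlme)

set.seed(4)
n_subj <- 10
n_rep  <- 4
d <- data.frame(
  subject   = factor(rep(seq_len(n_subj), each = n_rep)),
  treatment = factor(rep(c("control", "treated"), each = n_subj / 2 * n_rep))
)
subj_effect <- rnorm(n_subj, sd = 2)[as.integer(d$subject)]
d$y <- 10 + 3 * (d$treatment == "treated") + subj_effect + rnorm(nrow(d))

# Fixed effect: treatment (shifts the mean).
# Random effect: subject (adds between-subject variance via a random intercept).
fit <- lme(y ~ treatment, random = ~ 1 | subject, data = d)
print(fixef(fit))      # estimated fixed effects
print(VarCorr(fit))    # between-subject and residual variance components
```

The grouping enters through the "| subject" part of the random formula, which tells lme that measurements sharing a subject are not independent.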
Longitudinal Data • Repeated measurements from an individual • Common in medical studies • Be critical! Can separate age effects from cohort effects. • Response is a measurement series.
Derived Variable Analysis • Summarise each individual's measurements with a statistic (such as a mean) to average away the pseudoreplication, then analyse those statistics. • Weak if the explanatory variables change over time. • Watch for variation in: • random effects (variance components analysis, VCA) • serial correlation • measurement error
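A derived variable analysis can be sketched as follows. The example assumes simulated longitudinal data (10 subjects, 6 time points each) and uses the per-subject mean as the summary statistic. Other summaries, such as a per-subject slope, would follow the same pattern.

```r
set.seed(5)
n_subj <- 10
n_time <- 6
d <- data.frame(
  subject   = factor(rep(seq_len(n_subj), each = n_time)),
  treatment = factor(rep(c("control", "treated"), each = n_subj / 2 * n_time)),
  time      = rep(seq_len(n_time), times = n_subj)
)
d$y <- 10 + 2 * (d$treatment == "treated") +
       rnorm(n_subj, sd = 1.5)[as.integer(d$subject)] + rnorm(nrow(d))

# Step 1: collapse the repeated measurements to one number per subject.
per_subject <- aggregate(y ~ subject + treatment, data = d, FUN = mean)

# Step 2: analyse the summaries; each subject now contributes one data point.
model <- aov(y ~ treatment, data = per_subject)
print(summary(model))
```

The residual degrees of freedom are 8 (10 subjects minus 2 treatment means) rather than 58, which reflects the number of independent units.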