Bios 101 Lecture 8: Analysis of Variance (ANOVA) • Shankar Viswanathan, DrPH • Division of Biostatistics, DEPH • December 20, 2011

Analysis of Variance (ANOVA) is a technique for assessing how one or more nominal/categorical independent variables affect a continuous dependent variable. ANOVA is usually concerned with comparisons involving several population means. In the simplest special case, a comparison of two population means, the ANOVA procedure is equivalent to the usual two-sample t-test.

Why Not the t-Test? • Each test carries a risk of Type I error, that is, of rejecting a null hypothesis that is in fact true. The significance level at which we reject the null hypothesis is also the level of risk with which we are comfortable. • If the level is 0.05, we accept the risk that 5 times out of 100 our rejection of a true null hypothesis will be in error. When we carry out multiple t-tests on independent samples, the overall error rate grows with the number of tests conducted. Suppose we have four groups A, B, C and D; comparing every pair requires 6 t-tests. The probability of at least one Type I error is • $1-(1-\alpha)^t$, where $\alpha$ is the level of significance and $t$ is the number of tests. • $1-(1-0.05)^6 = 0.265$
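The arithmetic above can be reproduced in a few lines. This is a minimal sketch, assuming the tests are independent (as the formula itself does); the function name `familywise_error_rate` is illustrative and not taken from the lecture.

```python
from itertools import combinations


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


groups = ["A", "B", "C", "D"]
# Every pair of groups needs its own t-test: AB, AC, AD, BC, BD, CD.
pairs = list(combinations(groups, 2))
rate = familywise_error_rate(0.05, len(pairs))

print(len(pairs), round(rate, 3))  # 6 0.265
```

With four groups there are six pairwise tests, so the chance of at least one false rejection is roughly five times the nominal 0.05 level.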

WHY ANOVA? • Although means are compared, the comparisons are made using estimates of variance. The ANOVA test statistics, or F statistics, are ratios of estimates of variance. ANOVA is used only when the independent variable is nominal and the dependent variable is continuous.

FACTORS AND LEVELS • A nominal/categorical independent variable is called a factor. The different categories of a factor are referred to as its levels. • Example: • If we wanted to compare the effects of several drugs on some human health response, the nominal variable “drug” would be the single factor and the specific drugs would be its levels.

ASSUMPTIONS • Random samples are selected from each of K populations or groups. • A value of a specified dependent variable is recorded for each experimental unit sampled. • The dependent variable is normally distributed in each population. • The variance of the dependent variable is the same in each population.

Source of Variation • ANOVA separates the variation in all the data into two parts: • the variation between each group mean and the overall mean for all the groups (the between-group variability) and • the variation between each study participant and that participant's group mean (the within-group variability). • If the between-group variability is much greater than the within-group variability, there are likely to be differences between the group means.

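The main analysis I • The null hypothesis is that the population means are all equal. If there are K means, then the null hypothesis is $H_0: \mu_1 = \mu_2 = \cdots = \mu_K$. • The alternative hypothesis is given by $H_1$: the population means are not all equal (at least one $\mu_i$ differs from the others).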

SUM OF SQUARES • The statistical concept of variation involves quantifying the amount of variation of a variable around its mean. We call this the SUM OF SQUARES: the sum of the squared deviations of the observations from their respective mean. The sum of squares is used to measure the total variation, which is split into: • Between group variation • Within group variation

Within group variation (WSS) • The within group variation is the total variation that occurs inside the subgroups. It is calculated by finding the sum of squares within each group separately and then summing the results. • Between group variation (BSS) • The between group variation examines how each of the groups varies from the grand mean. We use the group means as representatives of the individual groups, so the between group variation measures the variation of the group means from the grand mean.
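In symbols, the two definitions can be written as follows. The notation is an assumption introduced here for illustration, not the lecture's own: $x_{ij}$ is the $i$-th observation in group $j$, $n_j$ is the size of group $j$, $\bar{x}_j$ is the mean of group $j$, $\bar{x}$ is the grand mean, and $K$ is the number of groups.

```latex
\begin{align*}
\mathrm{WSS} &= \sum_{j=1}^{K} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2 \\
\mathrm{BSS} &= \sum_{j=1}^{K} n_j \left(\bar{x}_j - \bar{x}\right)^2
\end{align*}
```

Each group's squared deviation in BSS is weighted by $n_j$ because the group mean stands in for every observation in that group.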

Total variation (TSS) • The total sum of squares is equal to the sum of the squared deviations of every score, in all the groups, from the grand mean (all subjects are treated as belonging to one population). • TSS = BSS + WSS • Example: Suppose we are interested in finding whether the following response scores differ significantly between the three groups.

Total sum of squares

Within Sum of Squares • Therefore, within sum of squares = 2 + 2 + 6 = 10

Between Sum of Squares • Between sum of squares = (4 × 4) + (4 × 0) + (4 × 4) = 32
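The decomposition TSS = BSS + WSS can be checked numerically. The sketch below uses hypothetical scores, which are not the lecture's data: three groups of four values chosen only so that they reproduce the totals above (WSS = 2 + 2 + 6 = 10 and BSS = 32). It also forms the F ratio as the between-group mean square divided by the within-group mean square, with K − 1 and N − K degrees of freedom.

```python
# Hypothetical scores (an assumption for illustration, NOT the lecture's data).
groups = {
    "group_1": [2, 3, 3, 4],  # mean 3, within-group sum of squares 2
    "group_2": [4, 5, 5, 6],  # mean 5, within-group sum of squares 2
    "group_3": [6, 6, 7, 9],  # mean 7, within-group sum of squares 6
}


def mean(values):
    return sum(values) / len(values)


all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Within: squared deviations of each score from its own group mean.
wss = sum(sum((x - mean(s)) ** 2 for x in s) for s in groups.values())
# Between: squared deviation of each group mean from the grand mean,
# weighted by the group size.
bss = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
# Total: squared deviations of every score from the grand mean.
tss = sum((x - grand_mean) ** 2 for x in all_scores)

k = len(groups)      # number of groups
n = len(all_scores)  # total number of observations
f_stat = (bss / (k - 1)) / (wss / (n - k))

print(wss, bss, tss, round(f_stat, 1))  # 10.0 32.0 42.0 14.4
```

For these made-up scores the total is 42 = 32 + 10, and the F ratio of 14.4 would be compared with an F distribution on 2 and 9 degrees of freedom.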

Multiple Comparisons procedure • ANOVA is a "group comparison" that determines whether a statistically significant difference exists somewhere among the groups studied. • If a significant difference is indicated, ANOVA is usually followed by a "multiple comparison procedure" that compares combinations of groups to examine further any differences among them. • The most common multiple comparison procedure is the "pairwise comparison", in which each group mean is compared (two at a time) to all other group means to determine which groups differ significantly.

Multiple t-test (Least-significant difference) • Uses t-tests to perform all pairwise comparisons between group means. No adjustment is made to the error rate for multiple comparisons. • The drawback of this procedure is that the Type I error rate increases with the number of tests made.

Bonferroni Test • Uses t-tests to perform pairwise comparisons between group means, but controls the overall error rate by setting the error rate for each test to the experimentwise error rate divided by the total number of tests. • The disadvantage of this procedure is that the true overall level may be so much less than the maximum value $\alpha$ that none of the individual tests is likely to reject its null hypothesis.
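As a rough illustration of the adjustment, the sketch below divides an experimentwise level of 0.05 across all pairwise tests among four groups. The function name `bonferroni_alpha` and the choice of four groups are assumptions for illustration, and the final check treats the tests as independent.

```python
from math import comb


def bonferroni_alpha(experimentwise_alpha: float, n_groups: int):
    """Return the per-test significance level and the number of pairwise tests."""
    n_tests = comb(n_groups, 2)  # number of distinct pairs of groups
    return experimentwise_alpha / n_tests, n_tests


per_test_alpha, n_tests = bonferroni_alpha(0.05, 4)

# Overall Type I error rate if the tests were independent:
# it stays below the experimentwise level of 0.05.
overall = 1 - (1 - per_test_alpha) ** n_tests

print(n_tests, round(per_test_alpha, 4))  # 6 0.0083
```

Each pairwise t-test is then judged at about 0.0083 rather than 0.05, which is why individual comparisons become harder to declare significant as the number of groups grows.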

Tukey’s Method • Uses the studentized range statistic to make all of the pairwise comparisons between groups. Sets the experimentwise error rate at the error rate for the collection of all pairwise comparisons. • This method is applicable when • 1. The sizes of the samples from each group are equal. • 2. Pairwise comparisons of means are of primary interest, that is, null hypotheses of the form $H_0: \mu_i = \mu_j$ (for two groups $i \neq j$) are to be considered.

Scheffé test • Performs simultaneous joint pairwise comparisons for all possible pairwise combinations of means. Uses the F sampling distribution. • This method is recommended when • 1. The sizes of the samples selected from the different populations are unequal. • 2. Comparisons other than simple pairwise comparisons between two means are of interest.

[Flow chart: Selecting a multiple comparison procedure. Decision nodes (each with Yes/No branches): ANOVA statistically significant? Planned comparisons? Pairwise comparisons? Group sizes equal? Outcomes: Bonferroni test, Scheffé test, Tukey test.]

Two Way Analysis of Variance • Two-way ANOVA assesses the effect of two categorical explanatory variables (called factors) on a single continuous response variable. • The analysis is similar to the one-way procedure but contains an additional break-up of the variance in the total sum of squares. • Advantages: • 1. Economy: many hypotheses can be tested for almost the same cost. • 2. Ability to test for interactions, that is, to find whether the effect of an approach varies depending on the group of subjects.
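One common way to write the additional break-up is sketched below. It assumes a balanced design with more than one observation per combination of factor levels; the symbols $SS_A$, $SS_B$, $SS_{AB}$ and $SS_E$ (sums of squares for factor A, factor B, their interaction, and error) and the corresponding mean squares $MS$ are notation introduced here, not the lecture's.

```latex
\begin{align*}
\mathrm{TSS} &= SS_A + SS_B + SS_{AB} + SS_E \\
F_A &= \frac{MS_A}{MS_E}, \qquad
F_B = \frac{MS_B}{MS_E}, \qquad
F_{AB} = \frac{MS_{AB}}{MS_E}
\end{align*}
```

Each F ratio tests one hypothesis (the two main effects and the interaction) against the same error term, which is the sense in which several hypotheses are tested for almost the same cost. With only one observation per cell, the interaction cannot be separated from the error.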

GUIDELINES • Confirm that the assumptions of the analysis have been met. • Report the results of the ANOVA in a table. • Report the actual P value for each explanatory variable. • Report how any outlying data were treated in the analysis. • Name the statistical package or program used in the analysis.
