Slides Prepared by JOHN S. LOUCKS St. Edward’s University


Chapter 13
Analysis of Variance and Experimental Design
• An Introduction to Analysis of Variance
• Analysis of Variance: Testing for the Equality of k Population Means
• Multiple Comparison Procedures
• An Introduction to Experimental Design
• Completely Randomized Designs
• Randomized Block Design
• Factorial Experiments

An Introduction to Analysis of Variance
• Analysis of variance (ANOVA) can be used to test for the equality of three or more population means using data obtained from observational or experimental studies.
• We want to use the sample results to test the following hypotheses:
  H0: μ1 = μ2 = μ3 = . . . = μk
  Ha: Not all population means are equal
• If H0 is rejected, we cannot conclude that all population means are different.
• Rejecting H0 means that at least two population means have different values.

Assumptions for Analysis of Variance
• For each population, the response variable is normally distributed.
• The variance of the response variable, denoted σ², is the same for all of the populations.
• The observations must be independent.

Analysis of Variance: Testing for the Equality of k Population Means
• Between-Treatments Estimate of Population Variance
• Within-Treatments Estimate of Population Variance
• Comparing the Variance Estimates: The F Test
• ANOVA Table

Between-Treatments Estimate of Population Variance
• A between-treatments estimate of σ² is called the mean square due to treatments (MSTR).
• The numerator of MSTR is called the sum of squares due to treatments (SSTR).
• The denominator of MSTR represents the degrees of freedom associated with SSTR.

Within-Treatments Estimate of Population Variance
• The estimate of σ² based on the variation of the sample observations within each treatment is called the mean square due to error (MSE).
• The numerator of MSE is called the sum of squares due to error (SSE).
• The denominator of MSE represents the degrees of freedom associated with SSE.

Comparing the Variance Estimates: The F Test
• If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSTR/MSE is an F distribution with MSTR d.f. equal to k - 1 and MSE d.f. equal to nT - k.
• If the means of the k populations are not equal, the value of MSTR/MSE will be inflated because MSTR overestimates σ².
• Hence, we will reject H0 if the resulting value of MSTR/MSE appears to be too large to have been selected at random from the appropriate F distribution.

Test for the Equality of k Population Means
• Hypotheses
  H0: μ1 = μ2 = μ3 = . . . = μk
  Ha: Not all population means are equal
• Test Statistic
  F = MSTR/MSE

Test for the Equality of k Population Means
• Rejection Rule
  Using test statistic: Reject H0 if F > Fα
  Using p-value: Reject H0 if p-value < α
  where the value of Fα is based on an F distribution with k - 1 numerator degrees of freedom and nT - k denominator degrees of freedom

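Sampling Distribution of MSTR/MSE
• The figure below shows the rejection region associated with a level of significance equal to α, where Fα denotes the critical value.
  [Figure: sampling distribution of MSTR/MSE. Values below the critical value Fα fall in the "Do Not Reject H0" region; values above it fall in the "Reject H0" region.]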
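As a rough illustration of the sampling distribution described above, the following sketch simulates many data sets for which H0 is true and counts how often MSTR/MSE exceeds a critical value. The setup is an assumption made only for this illustration: k = 3 normal populations with mean 60 and standard deviation 5, samples of size 5, a fixed random seed, and the critical value 3.89 (F.05 with 2 and 12 degrees of freedom). If the ANOVA assumptions hold, the share of rejections should be close to α = .05.

```python
import random
from statistics import mean, variance


def f_ratio(samples):
    """Return MSTR/MSE for a list of samples (one list per treatment)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    overall_mean = sum(sum(s) for s in samples) / n_total
    sstr = sum(len(s) * (mean(s) - overall_mean) ** 2 for s in samples)
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    return (sstr / (k - 1)) / (sse / (n_total - k))


# Assumed setup: three identical N(60, 5^2) populations, so H0 is true.
rng = random.Random(13)   # fixed seed so the run is repeatable
F_CRIT = 3.89             # F.05 with 2 and 12 degrees of freedom
TRIALS = 5000

rejections = 0
for _ in range(TRIALS):
    samples = [[rng.gauss(60, 5) for _ in range(5)] for _ in range(3)]
    if f_ratio(samples) > F_CRIT:
        rejections += 1

rejection_rate = rejections / TRIALS
print(f"Share of true-H0 data sets with F > {F_CRIT}: {rejection_rate:.3f}")
```

Changing the population means so that they differ should push the rejection share well above .05, which is the inflation of MSTR/MSE that the F test relies on.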

ANOVA Table
Source of      Sum of     Degrees of    Mean
Variation      Squares    Freedom       Squares    F
Treatment      SSTR       k - 1         MSTR       MSTR/MSE
Error          SSE        nT - k        MSE
Total          SST        nT - 1
SST divided by its degrees of freedom nT - 1 is simply the overall sample variance that would be obtained if we treated all nT observations as one data set.

Example: Reed Manufacturing
• Analysis of Variance
  J. R. Reed would like to know if the mean number of hours worked per week is the same for the department managers at her three manufacturing plants (Buffalo, Pittsburgh, and Detroit). A simple random sample of 5 managers from each of the three plants was taken, and the number of hours worked by each manager for the previous week is shown on the next slide.

Example: Reed Manufacturing
• Sample Data
                   Plant 1    Plant 2       Plant 3
Observation        Buffalo    Pittsburgh    Detroit
1                  48         73            51
2                  54         63            63
3                  57         66            61
4                  54         64            54
5                  62         74            56
Sample Mean        55         68            57
Sample Variance    26.0       26.5          24.5

Example: Reed Manufacturing
• Hypotheses
  H0: μ1 = μ2 = μ3
  Ha: Not all the means are equal
  where:
  μ1 = mean number of hours worked per week by the managers at Plant 1
  μ2 = mean number of hours worked per week by the managers at Plant 2
  μ3 = mean number of hours worked per week by the managers at Plant 3

Example: Reed Manufacturing
• Mean Square Due to Treatments
  Since the sample sizes are all equal, the overall sample mean is
  x̿ = (55 + 68 + 57)/3 = 60
  SSTR = 5(55 - 60)² + 5(68 - 60)² + 5(57 - 60)² = 490
  MSTR = 490/(3 - 1) = 245
• Mean Square Due to Error
  SSE = 4(26.0) + 4(26.5) + 4(24.5) = 308
  MSE = 308/(15 - 3) = 25.667
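The arithmetic on this slide can be retraced from the summary statistics alone. The sketch below is only a check of those numbers. The variable names are illustrative, and averaging the three sample means to get the overall mean is valid here only because the sample sizes are equal.

```python
# Summary statistics from the Reed Manufacturing sample data.
sample_means = [55, 68, 57]          # Buffalo, Pittsburgh, Detroit
sample_variances = [26.0, 26.5, 24.5]
n = 5                                # managers sampled per plant
k = len(sample_means)                # number of treatments (plants)
n_total = n * k

# With equal sample sizes the overall mean is the mean of the sample means.
overall_mean = sum(sample_means) / k

# Between-treatments estimate of the population variance.
sstr = sum(n * (m - overall_mean) ** 2 for m in sample_means)
mstr = sstr / (k - 1)

# Within-treatments estimate of the population variance.
sse = sum((n - 1) * v for v in sample_variances)
mse = sse / (n_total - k)

print(f"overall mean = {overall_mean:.0f}")
print(f"SSTR = {sstr:.0f}, MSTR = {mstr:.0f}")
print(f"SSE  = {sse:.0f}, MSE  = {mse:.3f}")
```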

Example: Reed Manufacturing
• F Test
  If H0 is true, the ratio MSTR/MSE should be near 1 because both MSTR and MSE are estimating σ². If Ha is true, the ratio should be significantly larger than 1 because MSTR tends to overestimate σ².

Example: Reed Manufacturing
• Rejection Rule
  Using test statistic: Reject H0 if F > 3.89
  Using p-value: Reject H0 if p-value < .05
  where F.05 = 3.89 is based on an F distribution with 2 numerator degrees of freedom and 12 denominator degrees of freedom

Example: Reed Manufacturing
• Test Statistic
  F = MSTR/MSE = 245/25.667 = 9.55
• Conclusion
  F = 9.55 > F.05 = 3.89, so we reject H0. The mean number of hours worked per week by department managers is not the same at each plant.
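The critical value 3.89 normally comes from an F table or software. As a sketch, it can also be reproduced with the standard library alone, because an F distribution with exactly 2 numerator degrees of freedom has the closed-form upper-tail probability P(F > f) = (1 + 2f/d2)^(-d2/2), where d2 is the denominator degrees of freedom. The helper name below is hypothetical, and the shortcut does not apply to other numerator degrees of freedom.

```python
def f_critical_2_numerator_df(alpha, df2):
    """Upper-tail critical value of F with 2 numerator df and df2 denominator df.

    Obtained by solving alpha = (1 + 2F/df2) ** (-df2 / 2) for F; this
    closed form holds only when the numerator degrees of freedom equal 2.
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


mstr = 245            # mean square due to treatments (Reed example)
mse = 308 / 12        # mean square due to error (Reed example)
f_stat = mstr / mse
f_crit = f_critical_2_numerator_df(alpha=0.05, df2=12)
reject_h0 = f_stat > f_crit

print(f"F = {f_stat:.2f}, critical value = {f_crit:.2f}, reject H0: {reject_h0}")
```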

Example: Reed Manufacturing
• ANOVA Table
Source of      Sum of     Degrees of    Mean
Variation      Squares    Freedom       Square     F
Treatments     490        2             245        9.55
Error          308        12            25.667
Total          798        14
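The Total row can be checked directly from the raw observations. The sketch below, with illustrative variable names, computes SST as (nT - 1) times the variance of all 15 observations pooled into one data set and confirms that it splits into SSTR + SSE.

```python
from statistics import mean, variance

# Hours worked, one list per plant (Buffalo, Pittsburgh, Detroit).
plants = [
    [48, 54, 57, 54, 62],
    [73, 63, 66, 64, 74],
    [51, 63, 61, 54, 56],
]

all_obs = [x for plant in plants for x in plant]
n_total = len(all_obs)
overall_mean = mean(all_obs)

# Total sum of squares: (nT - 1) times the overall sample variance.
sst = (n_total - 1) * variance(all_obs)

# Treatment and error sums of squares.
sstr = sum(len(p) * (mean(p) - overall_mean) ** 2 for p in plants)
sse = sum((len(p) - 1) * variance(p) for p in plants)

print(f"SSTR = {sstr:.0f}, SSE = {sse:.0f}, SST = {sst:.0f}")
print(f"SSTR + SSE equals SST: {abs(sstr + sse - sst) < 1e-9}")
```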

Using Excel’s Anova: Single Factor Tool
• Step 1 Select the Tools pull-down menu
• Step 2 Choose the Data Analysis option
• Step 3 Choose Anova: Single Factor from the list of Analysis Tools
… continued

Using Excel’s Anova: Single Factor Tool
• Step 4 When the Anova: Single Factor dialog box appears:
  Enter B1:D6 in the Input Range box
  Select Grouped By Columns
  Select Labels in First Row
  Enter .05 in the Alpha box
  Select Output Range
  Enter A8 (your choice) in the Output Range box
  Click OK

Using Excel’s Anova: Single Factor Tool • Value Worksheet (top portion)

Using Excel’s Anova: Single Factor Tool • Value Worksheet (bottom portion)

Using Excel’s Anova: Single Factor Tool
• Using the p-Value
  • The value worksheet shows that the p-value is .00331.
  • The rejection rule is “Reject H0 if p-value < .05”.
  • Thus, we reject H0 because the p-value = .00331 < α = .05.
  • We conclude that the mean number of hours worked per week by the managers differs among the three plants.
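Outside Excel, the same p-value can be approximated with a few lines of standard-library Python. This is a sketch that relies on a special case: with 2 numerator degrees of freedom, the upper-tail area of the F distribution is (1 + 2F/d2)^(-d2/2), where d2 is the denominator degrees of freedom. The function name is hypothetical, and a statistics library would be needed for other degrees of freedom.

```python
def f_p_value_2_numerator_df(f_stat, df2):
    """Upper-tail p-value for an F statistic with 2 numerator df.

    Uses the closed form P(F > f) = (1 + 2f/df2) ** (-df2 / 2), which is
    valid only when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


f_stat = 245 / (308 / 12)   # MSTR/MSE for the Reed Manufacturing data
p_value = f_p_value_2_numerator_df(f_stat, df2=12)
alpha = 0.05

print(f"p-value = {p_value:.5f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```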

Multiple Comparison Procedures
Suppose that analysis of variance has provided statistical evidence to reject the null hypothesis of equal population means. Fisher’s least significant difference (LSD) procedure can be used to determine where the differences occur.

Fisher’s LSD Procedure
• Hypotheses
  H0: μi = μj
  Ha: μi ≠ μj
• Test Statistic
  t = (x̄i - x̄j) / √(MSE(1/ni + 1/nj))

Fisher’s LSD Procedure
• Rejection Rule
  Using test statistic: Reject H0 if t < -tα/2 or t > tα/2
  Using p-value: Reject H0 if p-value < α
  where the value of tα/2 is based on a t distribution with nT - k degrees of freedom.

Fisher’s LSD Procedure Based on the Test Statistic x̄i - x̄j
• Hypotheses
  H0: μi = μj
  Ha: μi ≠ μj
• Test Statistic
  x̄i - x̄j
• Rejection Rule
  Reject H0 if |x̄i - x̄j| > LSD
  where LSD = tα/2 √(MSE(1/ni + 1/nj))

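Example: Reed Manufacturing
• Fisher’s LSD
  Assuming α = .05, t.025 = 2.179 with 12 degrees of freedom, so
  LSD = 2.179 √(25.667(1/5 + 1/5)) = 6.98
• Hypotheses (A)
  H0: μ1 = μ2
  Ha: μ1 ≠ μ2
• Test Statistic
  |x̄1 - x̄2| = |55 - 68| = 13
• Conclusion
  Because 13 > LSD = 6.98, we reject H0. The mean number of hours worked at Plant 1 is not equal to the mean number worked at Plant 2.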

Example: Reed Manufacturing
• Fisher’s LSD
• Hypotheses (B)
  H0: μ1 = μ3
  Ha: μ1 ≠ μ3
• Test Statistic
  |x̄1 - x̄3| = |55 - 57| = 2
• Conclusion
  Because 2 < LSD = 6.98, we cannot reject H0. There is no significant difference between the mean number of hours worked at Plant 1 and the mean number of hours worked at Plant 3.

Example: Reed Manufacturing
• Fisher’s LSD
• Hypotheses (C)
  H0: μ2 = μ3
  Ha: μ2 ≠ μ3
• Test Statistic
  |x̄2 - x̄3| = |68 - 57| = 11
• Conclusion
  Because 11 > LSD = 6.98, we reject H0. The mean number of hours worked at Plant 2 is not equal to the mean number worked at Plant 3.
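The three comparisons (A), (B), and (C) can be run in one pass. The sketch below assumes the critical value t.025 = 2.179 for 12 degrees of freedom is read from a t table and passed in as a constant, since the Python standard library has no t quantile function. The variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

T_CRIT = 2.179          # t.025 with nT - k = 12 df, taken from a t table
MSE = 308 / 12          # mean square due to error from the ANOVA table
N_PER_PLANT = 5         # equal sample sizes, so one LSD serves every pair

sample_means = {"Plant 1": 55, "Plant 2": 68, "Plant 3": 57}

# Least significant difference for two samples of size 5.
lsd = T_CRIT * sqrt(MSE * (1 / N_PER_PLANT + 1 / N_PER_PLANT))
print(f"LSD = {lsd:.2f}")

# Compare every pair of plants against the LSD.
significant = {}
for (name_i, mean_i), (name_j, mean_j) in combinations(sample_means.items(), 2):
    difference = abs(mean_i - mean_j)
    significant[(name_i, name_j)] = difference > lsd
    verdict = "reject H0" if difference > lsd else "cannot reject H0"
    print(f"{name_i} vs {name_j}: |difference| = {difference}, {verdict}")
```

With unequal sample sizes, the LSD would have to be recomputed for each pair using that pair's ni and nj.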

Type I Error Rates
• The comparisonwise Type I error rate α indicates the level of significance associated with a single pairwise comparison.
• The experimentwise Type I error rate αEW is the probability of making a Type I error on at least one of the k(k – 1)/2 pairwise comparisons:
  αEW = 1 – (1 – α)^(k(k – 1)/2)
• The experimentwise Type I error rate gets larger for problems with more populations (larger k).
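To see how quickly the experimentwise rate grows, the sketch below evaluates 1 – (1 – α)^C for several values of k. It counts the pairwise comparisons among k populations as C = k(k – 1)/2 and, like the formula itself, treats the comparisons as if they were independent, which is an approximation. The function name is hypothetical.

```python
def experimentwise_error_rate(alpha, k):
    """Approximate probability of at least one Type I error across all pairs.

    Counts C = k(k - 1)/2 pairwise comparisons among k populations and
    treats them as independent, each tested at level alpha.
    """
    comparisons = k * (k - 1) // 2
    return 1 - (1 - alpha) ** comparisons


for k in (3, 4, 5, 6):
    rate = experimentwise_error_rate(0.05, k)
    print(f"k = {k}: {k * (k - 1) // 2} comparisons, experimentwise rate = {rate:.4f}")
```

For the three plants in the Reed example (k = 3) there are three comparisons, and the rate is about .14 rather than .05.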

An Introduction to Experimental Design
• Statistical studies can be classified as being either experimental or observational.
• In an experimental study, one or more factors are controlled so that data can be obtained about how the factors influence the variables of interest.
• In an observational study, no attempt is made to control the factors.
• Cause-and-effect relationships are easier to establish in experimental studies than in observational studies.

An Introduction to Experimental Design
• A factor is a variable that the experimenter has selected for investigation.
• A treatment is a level of a factor.
• Experimental units are the objects of interest in the experiment.
• A completely randomized design is an experimental design in which the treatments are randomly assigned to the experimental units.
• If the experimental units are heterogeneous, blocking can be used to form homogeneous groups, resulting in a randomized block design.
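The random assignment behind a completely randomized design can be sketched in a few lines. Everything in this example is hypothetical: 15 unit labels, three treatments named A, B, and C, and a fixed seed used only so the result can be reproduced. Shuffling the units and cutting the shuffled list into equal groups gives every unit the same chance of receiving each treatment.

```python
import random


def completely_randomized_assignment(units, treatments, seed=None):
    """Randomly split the units into equal-sized groups, one per treatment."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    shuffled = list(units)          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    group_size = len(units) // len(treatments)
    return {
        treatment: sorted(shuffled[i * group_size:(i + 1) * group_size])
        for i, treatment in enumerate(treatments)
    }


# Hypothetical experiment: 15 experimental units, 3 treatments.
units = [f"unit{i:02d}" for i in range(1, 16)]
assignment = completely_randomized_assignment(units, ["A", "B", "C"], seed=2024)

for treatment, group in assignment.items():
    print(treatment, group)
```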

Completely Randomized Designs
• Between-Treatments Estimate of Population Variance
• Within-Treatments Estimate of Population Variance
• Comparing the Variance Estimates: The F Test
• ANOVA Table
• Pairwise Comparisons

Between-Treatments Estimate of Population Variance
• In the context of experimental design, the between-samples estimate of σ² is referred to as the mean square due to treatments (MSTR).
• The formula for MSTR is
  MSTR = SSTR/(k - 1), where SSTR = Σ nj(x̄j - x̿)², summed over the k treatments
• The numerator is called the sum of squares due to treatments (SSTR).
• The denominator k - 1 represents the degrees of freedom associated with SSTR.

Within-Treatments Estimate of Population Variance
• The second estimate of σ², the within-samples estimate, is referred to as the mean square due to error (MSE).
• The formula for MSE is
  MSE = SSE/(nT - k), where SSE = Σ (nj - 1)sj², summed over the k treatments
• The numerator is called the sum of squares due to error (SSE).
• The denominator nT - k represents the degrees of freedom associated with SSE.

ANOVA Table for a Completely Randomized Design
Source of      Sum of     Degrees of    Mean
Variation      Squares    Freedom       Squares                F
Treatments     SSTR       k - 1         MSTR = SSTR/(k - 1)    MSTR/MSE
Error          SSE        nT - k        MSE = SSE/(nT - k)
Total          SST        nT - 1

Example: Home Products, Inc.
• Completely Randomized Design
  Home Products, Inc. is considering marketing a long-lasting car wax. Three different waxes (Type 1, Type 2, and Type 3) have been developed. In order to test the durability of these waxes, 5 new cars were waxed with Type 1, 5 with Type 2, and 5 with Type 3. Each car was then repeatedly run through an automatic carwash until the wax coating showed signs of deterioration. The number of times each car went through the carwash is shown on the next slide. Home Products, Inc. must decide which wax to market. Are the three waxes equally effective?

Example: Home Products, Inc.
                   Wax       Wax       Wax
Observation        Type 1    Type 2    Type 3
1                  27        33        29
2                  30        28        28
3                  29        31        30
4                  28        30        32
5                  31        30        31
Sample Mean        29.0      30.4      30.0
Sample Variance    2.5       3.3       2.5

Example: Home Products, Inc.
• Hypotheses
  H0: μ1 = μ2 = μ3
  Ha: Not all the means are equal
  where:
  μ1 = mean number of washes for Type 1 wax
  μ2 = mean number of washes for Type 2 wax
  μ3 = mean number of washes for Type 3 wax

Example: Home Products, Inc.
• Mean Square Between Treatments
  Since the sample sizes are all equal:
  x̿ = (x̄1 + x̄2 + x̄3)/3 = (29 + 30.4 + 30)/3 = 29.8
  SSTR = 5(29 – 29.8)² + 5(30.4 – 29.8)² + 5(30 – 29.8)² = 5.2
  MSTR = 5.2/(3 - 1) = 2.6
• Mean Square Error
  SSE = 4(2.5) + 4(3.3) + 4(2.5) = 33.2
  MSE = 33.2/(15 - 3) = 2.77

Example: Home Products, Inc.
• Rejection Rule
  Using test statistic: Reject H0 if F > 3.89
  Using p-value: Reject H0 if p-value < .05
  where F.05 = 3.89 is based on an F distribution with 2 numerator degrees of freedom and 12 denominator degrees of freedom

Example: Home Products, Inc.
• Test Statistic
  F = MSTR/MSE = 2.6/2.77 = .939
• Conclusion
  Since F = .939 < F.05 = 3.89, we cannot reject H0. There is insufficient evidence to conclude that the mean numbers of washes for the three wax types are not all the same.

Example: Home Products, Inc.
• ANOVA Table
Source of      Sum of     Degrees of    Mean
Variation      Squares    Freedom       Squares    F
Treatments     5.2        2             2.60       .9398
Error          33.2       12            2.77
Total          38.4       14
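For readers working outside Excel, the whole table can be rebuilt from the raw wash counts. The function below is a sketch with a hypothetical name. Its p-value step uses the closed form (1 + 2F/d2)^(-d2/2), which holds only for 2 numerator degrees of freedom (three treatments), so the function refuses other cases rather than return a wrong answer.

```python
from statistics import mean, variance


def one_way_anova_three_groups(samples):
    """Return the one-way ANOVA table entries for exactly three treatments."""
    k = len(samples)
    if k != 3:
        raise ValueError("the closed-form p-value used here needs exactly 3 treatments")
    n_total = sum(len(s) for s in samples)
    overall_mean = sum(sum(s) for s in samples) / n_total
    sstr = sum(len(s) * (mean(s) - overall_mean) ** 2 for s in samples)
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    df_error = n_total - k
    mstr = sstr / (k - 1)
    mse = sse / df_error
    f_stat = mstr / mse
    # Upper-tail area of F with 2 numerator df and df_error denominator df.
    p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)
    return {"SSTR": sstr, "SSE": sse, "SST": sstr + sse,
            "MSTR": mstr, "MSE": mse, "F": f_stat, "p": p_value}


# Number of washes survived, one list per wax type (Type 1, Type 2, Type 3).
washes = [
    [27, 30, 29, 28, 31],
    [33, 28, 31, 30, 30],
    [29, 28, 30, 32, 31],
]

table = one_way_anova_three_groups(washes)
for name, value in table.items():
    print(f"{name:>4} = {value:.4f}")
```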

Using Excel’s ANOVA: Single Factor Tool • Value Worksheet (top portion)

Using Excel’s ANOVA: Single Factor Tool • Value Worksheet (bottom portion)

Using Excel’s ANOVA: Single Factor Tool
• Conclusion Using the p-Value
  • The value worksheet shows a p-value of .418.
  • The rejection rule is “Reject H0 if p-value < .05”.
  • Because .418 > .05, we cannot reject H0. There is insufficient evidence to conclude that the mean numbers of washes for the three wax types are not all the same.

Randomized Block Design
• ANOVA Procedure
• Computations and Conclusions
