Recall what we did last class in Exercise #2 on Class Handout #9:

2. A 0.05 significance level is chosen for a two-way ANOVA to study the height to which wheat grows for two types of wheat, labeled D and E, and four different types of soil, labeled C, G, H, and T. A random sample of heights is recorded in inches for each possible combination of wheat type and soil type with the following results:

                Soil Type C         Soil Type G         Soil Type H         Soil Type T
Wheat Type D    37.4  35.1  41.8    44.1  46.4  40.0    42.2  38.8  47.4    33.0  31.1  27.1
Wheat Type E    31.8  28.5  36.6    35.6  42.6  38.5    27.9  22.9  26.0    30.8  36.9  34.3

The data has been stored in the SPSS data file wheat_height.

(a) Identify the dependent (response) variable and the independent (explanatory) variables.

The dependent (response) variable is wheat height, and the independent (explanatory) variables are wheat type and soil type.

(b) Explain why only one dummy variable is sufficient to represent the row independent variable, and only three dummy variables are sufficient to represent the column independent variable. Then define dummy variables sufficient to represent each independent variable.

A qualitative variable with k categories can be represented in a regression model with k – 1 dummy variables. Wheat type has 2 categories, so 2 – 1 = 1 dummy variable is sufficient; soil type has 4 categories, so 4 – 1 = 3 dummy variables are sufficient.

The variable wheat type can be represented by the following dummy variable:

R1 = 1 for wheat type D, 0 for wheat type E

The variable soil type can be represented by the following three dummy variables:

C1 = 1 for soil type C, 0 otherwise
C2 = 1 for soil type G, 0 otherwise
C3 = 1 for soil type H, 0 otherwise
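The coding in part (b) can be sketched in a few lines of Python. This is only an illustration of the definitions above, not the SPSS recoding procedure; the function name `dummy_code` is hypothetical, and the three products are the interaction terms that the regression equation in part (c) uses.

```python
# Illustrative sketch of the dummy coding in part (b).
# The function name and dictionary layout are assumptions made for this example.
def dummy_code(wheat, soil):
    """Return R1, C1, C2, C3 and the three interaction products."""
    r1 = 1 if wheat == "D" else 0   # wheat type E is the reference category
    c1 = 1 if soil == "C" else 0
    c2 = 1 if soil == "G" else 0
    c3 = 1 if soil == "H" else 0    # soil type T is the reference category
    return {
        "R1": r1, "C1": c1, "C2": c2, "C3": c3,
        "R1C1": r1 * c1, "R1C2": r1 * c2, "R1C3": r1 * c3,
    }

# Print the coding for all eight wheat/soil combinations.
for wheat in "DE":
    for soil in "CGHT":
        print(wheat, soil, dummy_code(wheat, soil))
```

The reference combination (wheat type E with soil type T) is the one for which every dummy variable is 0.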

(c) Use the dummy variables defined in part (b) to complete the description of the least squares regression equation below the two-way ANOVA table, and identify the terms that correspond to the f-tests in the ANOVA table. Then complete the Source column and the df column in the two-way ANOVA table.

Source          df
Model           (2)(4) – 1 = 7
Wheat Type      1
Soil Type       3
Interaction     (1)(3) = 3
Error           24 – (2)(4) = 16
Total           24 – 1 = 23

$\hat{Y} = a + b_1 R_1 + b_2 C_1 + b_3 C_2 + b_4 C_3 + b_5 R_1 C_1 + b_6 R_1 C_2 + b_7 R_1 C_3$

The term $b_1 R_1$ corresponds to the Wheat Type f-test.
The terms $b_2 C_1 + b_3 C_2 + b_4 C_3$ correspond to the Soil Type f-test.
The terms $b_5 R_1 C_1 + b_6 R_1 C_2 + b_7 R_1 C_3$ correspond to the Interaction f-test.
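The df column follows a fixed pattern for any two-way layout with r row categories, c column categories, and n observations in total. A minimal sketch (the function name `two_way_anova_df` is an assumption for this example):

```python
# Degrees of freedom for a two-way ANOVA with interaction.
# r = number of row categories, c = number of column categories,
# n = total number of observations.
def two_way_anova_df(r, c, n):
    return {
        "model": r * c - 1,
        "row": r - 1,
        "column": c - 1,
        "interaction": (r - 1) * (c - 1),
        "error": n - r * c,
        "total": n - 1,
    }

# Wheat data: 2 wheat types, 4 soil types, 24 heights.
df = two_way_anova_df(r=2, c=4, n=24)
print(df)
```

The row, column, and interaction df add up to the model df, and the model and error df add up to the total df, which is a quick check on a completed table.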

(d) In the document titled Using SPSS Version 19.0, use SPSS with the section titled Creating new variables by recoding existing variables to add the dummy variables defined in part (b) to the data file; then use SPSS with the section Creating new variables with transformation of existing variables to add the products of dummy variables which are necessary to allow for interaction.

(e) In the document titled Using SPSS Version 19.0, look at the section titled Performing a multiple linear regression with checks for multicollinearity and of linearity, homoscedasticity, and normality assumptions, and notice the following:
steps 2 to 6 are not applicable, since the linearity assumption need only be checked for quantitative independent variables;
step 7 was already done in part (d);
step 10 is not applicable, since Levene’s test can be used to assess the uniform variance assumption (just as is done in a one-way ANOVA);
step 12 is unnecessary, since much of the output from this step will be available in a better format from the SPSS option designed specifically for two-way ANOVA.
In view of this, follow the instructions in steps 1, 8, 9, 11, 13, 14, and 15 to create graphs for assessing whether or not the normality assumption appears to be satisfied; then decide whether or not this assumption appears to be satisfied.

The histogram does not look bell-shaped, and the points on the normal probability plot show some departure from the diagonal line.

(f) Based on the histogram and normal probability plot for the standardized residuals in part (e), explain why we might want to look at the skewness coefficient, the kurtosis coefficient, and the results of the Shapiro-Wilk test. Then use SPSS with the section titled Data Diagnostics to make a statement about whether or not non-normality needs to be a concern.

Since there appears to be some possible evidence of non-normality in part (e), we want to know if non-normality needs to be a concern. Since the skewness and kurtosis coefficients are each well within two standard errors of zero, and the Shapiro-Wilk p-value of 0.047 is not less than 0.001, non-normality need not be a concern.
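The "within two standard errors of zero" check can be sketched by hand. The sketch below makes two assumptions that are not stated in the handout: the residuals are taken as deviations from the cell means (which is what the regression with all interaction terms fits), and the skewness, kurtosis, and standard error formulas are the usual bias-adjusted small-sample ones. Whether these agree digit for digit with the SPSS Data Diagnostics output has not been checked here. The Shapiro-Wilk test is not reproduced.

```python
import math

# Heights (inches) from the data table, keyed by (wheat type, soil type).
cells = {
    ("D", "C"): [37.4, 35.1, 41.8], ("D", "G"): [44.1, 46.4, 40.0],
    ("D", "H"): [42.2, 38.8, 47.4], ("D", "T"): [33.0, 31.1, 27.1],
    ("E", "C"): [31.8, 28.5, 36.6], ("E", "G"): [35.6, 42.6, 38.5],
    ("E", "H"): [27.9, 22.9, 26.0], ("E", "T"): [30.8, 36.9, 34.3],
}

# Assumption: residual = height - cell mean (model with all interaction terms).
residuals = []
for heights in cells.values():
    cell_mean = sum(heights) / len(heights)
    residuals.extend(h - cell_mean for h in heights)

n = len(residuals)
center = sum(residuals) / n
m2 = sum((e - center) ** 2 for e in residuals) / n   # second central moment
m3 = sum((e - center) ** 3 for e in residuals) / n   # third central moment
m4 = sum((e - center) ** 4 for e in residuals) / n   # fourth central moment

# Bias-adjusted sample skewness and excess kurtosis (scale-free, so
# standardizing the residuals first would not change them).
skewness = (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)
kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (m4 / m2 ** 2 - 3) + 6)

# Standard errors of the two coefficients under normality.
se_skewness = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
se_kurtosis = 2 * se_skewness * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

for name, value, se in [("skewness", skewness, se_skewness),
                        ("kurtosis", kurtosis, se_kurtosis)]:
    print(f"{name}: {value:.3f}, SE {se:.3f}, within two SE: {abs(value) < 2 * se}")
```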

(g) In the document titled Using SPSS Version 19.0, use SPSS with the section titled Performing a two-way ANOVA (analysis of variance) to obtain the output for a two-way ANOVA.

We now continue with this exercise. Before running the SPSS program in part (g), go to the beginning of the handout to see the three possible scenarios for the results in a two-way ANOVA:

There are essentially three possible scenarios for the results in a two-way ANOVA: (1) Interaction effects and main effects are all found not to be statistically significant. In this scenario, the researcher would conclude that there is no difference in cell means, or in other words, the row variable and column variable are not significant in predicting the quantitative dependent variable. Hence, no further analysis would be necessary. (2) Interaction effects are found not to be statistically significant, but main effects are found to be statistically significant. In this scenario, the researcher would conclude that there is at least one difference in either row means, column means or both. Hence, further analysis would be necessary. When only two categories are compared, then describing the direction of the difference is necessary; when more than two categories are compared, then a multiple comparison procedure, such as those used when the f test in a one-way ANOVA is statistically significant, can be employed: Tukey’s Honestly Significant Difference (HSD) method, the Least Significant Difference (LSD) method, Bonferroni’s method, and Scheffe’s method are available when equal variances can be assumed, and Tamhane’s T2 method is available when unequal variances are assumed.

(3) Interaction effects are found to be statistically significant. Hence, further analysis would be necessary to describe the interaction effects. The researcher may or may not choose to consider main effects (using the procedures discussed in (2)), since these would now only be of secondary interest. To describe interaction, one possible method is Scheffe’s Multiple Comparison Procedure for Contrasts, which is performed in three steps (Steps 1 to 3, worked through in part (k) of Exercise #2).

Return to Exercise #2 and run part (g) to obtain the SPSS output for the two-way ANOVA.

(h) Observe that the SPSS output displays a warning that no post hoc tests were performed for type of wheat because there were fewer than three groups. Explain why post hoc tests are unnecessary when fewer than three groups are being compared.

When only two groups are being compared, only one statistically significant difference is possible, making multiple comparison unnecessary.
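The point in part (h) is a matter of counting pairs: k group means give k(k – 1)/2 distinct pairwise comparisons. A small sketch (the function name is hypothetical; `math.comb` requires Python 3.8 or later):

```python
from math import comb

def pairwise_comparisons(k):
    """Number of distinct pairs among k group means."""
    return comb(k, 2)

print("wheat type:", pairwise_comparisons(2))
print("soil type:", pairwise_comparisons(4))
```

With two wheat types there is a single pair, so the main-effect f-test already is that one comparison; with four soil types there are six pairs, which is why a multiple comparison procedure is needed for soil type.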

(i) Comment on what the results of Levene’s test tell us about the equal variance assumption for a two-way ANOVA.

Levene’s test is not statistically significant at the 0.05 level ($f_{7,16} = 0.172$, p = 0.988). Thus, we conclude that the equal variance assumption for a two-way ANOVA is satisfied.

(j) Looking at the ANOVA table displayed on the SPSS output, explain which of the three possible scenarios for the results in a two-way ANOVA has occurred with this data.

We find from the ANOVA table that there are statistically significant interaction effects at the 0.05 level ($f_{3,16} = 9.282$, p = 0.001), so scenario (3) has occurred. We also find that there are statistically significant main effects for both wheat type ($f_{1,16} = 18.261$, p = 0.001) and soil type ($f_{3,16} = 7.609$, p = 0.002), but describing the interaction is the primary interest.
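The f statistics quoted in part (j) can be reproduced from the raw heights with the standard sums-of-squares formulas for a balanced two-way layout. The sketch below is a hand computation in plain Python, not the SPSS procedure, and all variable names are illustrative; p-values are not computed.

```python
# Heights (inches) from the data table, keyed by (wheat type, soil type).
cells = {
    ("D", "C"): [37.4, 35.1, 41.8], ("D", "G"): [44.1, 46.4, 40.0],
    ("D", "H"): [42.2, 38.8, 47.4], ("D", "T"): [33.0, 31.1, 27.1],
    ("E", "C"): [31.8, 28.5, 36.6], ("E", "G"): [35.6, 42.6, 38.5],
    ("E", "H"): [27.9, 22.9, 26.0], ("E", "T"): [30.8, 36.9, 34.3],
}
wheat_types, soil_types = ["D", "E"], ["C", "G", "H", "T"]
r, c, m = len(wheat_types), len(soil_types), 3   # m = observations per cell


def mean(values):
    return sum(values) / len(values)


grand = mean([h for heights in cells.values() for h in heights])
cell_mean = {key: mean(heights) for key, heights in cells.items()}
row_mean = {w: mean([h for s in soil_types for h in cells[(w, s)]]) for w in wheat_types}
col_mean = {s: mean([h for w in wheat_types for h in cells[(w, s)]]) for s in soil_types}

# Sums of squares for a balanced design.
ss_wheat = m * c * sum((row_mean[w] - grand) ** 2 for w in wheat_types)
ss_soil = m * r * sum((col_mean[s] - grand) ** 2 for s in soil_types)
ss_cells = m * sum((cell_mean[key] - grand) ** 2 for key in cells)
ss_inter = ss_cells - ss_wheat - ss_soil
ss_error = sum((h - cell_mean[key]) ** 2 for key, hs in cells.items() for h in hs)

# Mean squares and f statistics.
df_error = r * c * m - r * c
mse = ss_error / df_error
f_wheat = (ss_wheat / (r - 1)) / mse
f_soil = (ss_soil / (c - 1)) / mse
f_inter = (ss_inter / ((r - 1) * (c - 1))) / mse

print(f"MSE = {mse:.3f}")
print(f"f wheat = {f_wheat:.3f}, f soil = {f_soil:.3f}, f interaction = {f_inter:.3f}")
```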

Scheffe’s Multiple Comparison Procedure for Contrasts is performed in three steps. Note: we shall let $y_{ij}$ represent the cell mean in the data for row i and column j.

Step 1: Find the absolute value of each contrast involving cell means of the form

            Column #g    Column #h
Row #i      y_ig         y_ih
Row #j      y_jg         y_jh

$|(y_{ig} - y_{ih}) - (y_{jg} - y_{jh})| = |y_{ig} - y_{ih} - y_{jg} + y_{jh}|$

(k) Complete the analysis of this two-way ANOVA data according to the scenario for the results in part (j).

Since we have concluded that there are significant interaction effects, we need to describe the interaction effects, and we shall use Scheffe’s Multiple Comparison Procedure for Contrasts to do this. From the SPSS output, we construct the following table of cell means:

                Soil Type C    Soil Type G    Soil Type H    Soil Type T
Wheat Type D    38.1           43.5           42.8           30.4
Wheat Type E    32.3           38.9           25.6           34.0

Absolute value of contrasts (rows D and E, with the pair of soil types indicated):

C and G:  | 38.1 – 43.5 – 32.3 + 38.9 | = 1.2
C and H:  | 38.1 – 42.8 – 32.3 + 25.6 | = 11.4
C and T:  | 38.1 – 30.4 – 32.3 + 34.0 | = 9.4
G and H:  | 43.5 – 42.8 – 38.9 + 25.6 | = 12.6
G and T:  | 43.5 – 30.4 – 38.9 + 34.0 | = 8.2
H and T:  | 42.8 – 30.4 – 25.6 + 34.0 | = 20.8

Step 2: Declare a contrast to be statistically significant if it has an absolute value larger than

$\sqrt{(r-1)(c-1)\; f_{(r-1)(c-1),\, n-rc;\, \alpha}\; (MSE)\; (1/n_{ig} + 1/n_{ih} + 1/n_{jg} + 1/n_{jh})}$

Here $f_{(r-1)(c-1),\, n-rc;\, \alpha}$ is the tabled f value defining the rejection region in the test concerning interaction, and MSE is the Mean Square Error with interaction terms in the model. If all cell sizes are equal, say to m, then the sum $1/n_{ig} + 1/n_{ih} + 1/n_{jg} + 1/n_{jh}$ reduces to 4/m.

For the wheat data, every cell size is 3, the tabled f value is 3.24, and MSE = 11.829, so the critical value is

$\sqrt{(1)(3)(3.24)(11.829)(1/3 + 1/3 + 1/3 + 1/3)} = 12.38$

Step 3: Summarize the results by describing the interaction effect corresponding to each statistically significant contrast. There are basically two types of interaction effects which can occur: One is where the difference between means is in the same direction in two different categories but is of a different magnitude in the two categories. The other is where the difference between means is in opposite directions in two different categories.

Comparing each of the six contrasts with the critical value 12.38, only two are statistically significant: the contrast for soil types G and H (12.6) and the contrast for soil types H and T (20.8). With α = 0.05, we conclude the following:

The amount by which mean height with soil type G exceeds mean height with soil type H is larger with wheat type E than with wheat type D.

For wheat type D, mean height with soil type H is larger than mean height with soil type T; for wheat type E, this difference is in the opposite direction.
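Steps 1 and 2 of the contrast procedure can be sketched in plain Python. The cell means are computed from the raw heights; the tabled f value 3.24 and MSE 11.829 are taken from the handout rather than computed, and the variable names are illustrative.

```python
import math
from itertools import combinations

# Heights (inches) from the data table, keyed by (wheat type, soil type).
cells = {
    ("D", "C"): [37.4, 35.1, 41.8], ("D", "G"): [44.1, 46.4, 40.0],
    ("D", "H"): [42.2, 38.8, 47.4], ("D", "T"): [33.0, 31.1, 27.1],
    ("E", "C"): [31.8, 28.5, 36.6], ("E", "G"): [35.6, 42.6, 38.5],
    ("E", "H"): [27.9, 22.9, 26.0], ("E", "T"): [30.8, 36.9, 34.3],
}
cell_mean = {key: sum(hs) / len(hs) for key, hs in cells.items()}

# Step 1: absolute contrast for rows D and E and each pair of soil types (g, h).
contrasts = {}
for g, h in combinations("CGHT", 2):
    contrasts[(g, h)] = abs(
        cell_mean[("D", g)] - cell_mean[("D", h)]
        - cell_mean[("E", g)] + cell_mean[("E", h)]
    )

# Step 2: critical value, with the tabled f value and MSE quoted in the handout.
r, c, m = 2, 4, 3        # rows, columns, observations per cell
f_tabled = 3.24          # from the handout, for 3 and 16 df at the 0.05 level
mse = 11.829             # from the handout (model with interaction terms)
critical = math.sqrt((r - 1) * (c - 1) * f_tabled * mse * (4 / m))

significant = [pair for pair, value in contrasts.items() if value > critical]

print(f"critical value = {critical:.2f}")
for pair, value in contrasts.items():
    flag = "significant" if value > critical else "not significant"
    print(pair, f"{value:.1f}", flag)
```

Only the soil pairs in `significant` need to be described in Step 3.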

The conclusions were stated in terms of comparing the differences between soil types for the two wheat types. Alternatively, the conclusions can be stated in terms of comparing the differences between wheat types for the four soil types.

Although describing the interaction is of primary interest, significant main effects for both wheat type and soil type were found. From Bonferroni’s multiple comparison method with α = 0.05, we conclude the following:

The mean height is greater with Soil Type G than with Soil Type C (p = 0.049).
The mean height is greater with Soil Type G than with Soil Type H (p = 0.017).
The mean height is greater with Soil Type G than with Soil Type T (p = 0.002).

With α = 0.05, we also conclude that mean height is greater with Wheat Type D than with Wheat Type E (p = 0.001).

(l) The “Partial Eta Squared” column of the ANOVA table on the SPSS output displays, for each of Wheat Type, Soil Type, and the Interaction between Wheat Type and Soil Type, the proportion of variance in the dependent variable (height) accounted for by that effect once the variance attributed to the other effects is set aside, that is, SS(effect) / (SS(effect) + SS(error)). Find the percent of variance accounted for by each of these three effects.

Wheat Type accounts for 53.3% of the variance in height, Soil Type accounts for 58.8% of the variance in height, and Interaction between Wheat Type and Soil Type accounts for 63.5% of the variance in height. Because each percentage is computed relative to its own effect plus error rather than relative to the total, the three percentages need not sum to 100%.
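Partial eta squared can be recovered from an f statistic and its degrees of freedom, since SS(effect) / (SS(effect) + SS(error)) equals f × df(effect) / (f × df(effect) + df(error)). A minimal sketch using the f values reported in part (j); the function name is an assumption for this example.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared = SS_effect / (SS_effect + SS_error),
    rewritten in terms of the f statistic and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)

# f statistics and degrees of freedom from part (j); error df is 16.
effects = {
    "Wheat Type": (18.261, 1),
    "Soil Type": (7.609, 3),
    "Interaction": (9.282, 3),
}
eta = {name: partial_eta_squared(f, df, 16) for name, (f, df) in effects.items()}

for name, value in eta.items():
    print(f"{name}: {100 * value:.1f}%")
```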

3. Mean yearly income ($1000s) for voters who are employed full time in a state is being studied. A 0.05 significance level is chosen for a two-way ANOVA to compare mean yearly income for males and females and for three areas of residence: rural, suburban, and urban. The random sample of voters to be used consists of those in the data set of Exercise #1 on Class Handout #1; this data is stored in the SPSS data file survey. (Note: Examination of the data will reveal that each cell size is 5.)

(a) Identify the dependent (response) variable and the independent (explanatory) variables.

The dependent (response) variable is yearly income, and the independent (explanatory) variables are sex of the voter and area of residence.

(b) Explain why only one dummy variable is sufficient to represent one of the independent variables, and only two dummy variables are sufficient to represent the other independent variable. Then define dummy variables sufficient to represent each independent variable.

A qualitative variable with k categories can be represented in a regression model with k – 1 dummy variables. Sex has 2 categories, so 2 – 1 = 1 dummy variable is sufficient; area of residence has 3 categories, so 3 – 1 = 2 dummy variables are sufficient.

The variable sex of the voter can be represented by the following dummy variable:

sex = 1 for female, 0 for male

Note that the variable sex in the data file is already coded this way.

The variable area of residence can be represented by the following two dummy variables:

C1 = 1 for rural, 0 otherwise
C2 = 1 for suburban, 0 otherwise

(c) Use the dummy variables defined in part (b) to complete the description of the least squares regression equation below the two-way ANOVA table, and identify the terms that correspond to the f-tests in the ANOVA table. Then complete the Source column and the df column in the two-way ANOVA table.

Source          df
Model           (2)(3) – 1 = 5
Sex             1
Residence       2
Interaction     (1)(2) = 2
Error           30 – (2)(3) = 24
Total           30 – 1 = 29

$\hat{Y} = a + b_1\,\mathrm{sex} + b_2 C_1 + b_3 C_2 + b_4(\mathrm{sex})C_1 + b_5(\mathrm{sex})C_2$

The term $b_1\,\mathrm{sex}$ corresponds to the Sex f-test.
The terms $b_2 C_1 + b_3 C_2$ correspond to the Residence f-test.
The terms $b_4(\mathrm{sex})C_1 + b_5(\mathrm{sex})C_2$ correspond to the Interaction f-test.

(d) In the document titled Using SPSS Version 19.0, use SPSS with the section titled Creating new variables by recoding existing variables to add the dummy variables defined in part (b) to the data file; then use SPSS with the section Creating new variables with transformation of existing variables to add the products of dummy variables which are necessary to allow for interaction.

(e) In the document titled Using SPSS Version 19.0, look at the section titled Performing a multiple linear regression with checks of linearity, homoscedasticity, and normality assumptions, and notice the following:
steps 2 to 6 are not applicable, since the linearity assumption need only be checked for quantitative independent variables;
step 7 was already done in part (d);
step 10 is not applicable, since Levene’s test can be used to assess the uniform variance assumption (just as is done in a one-way ANOVA);
step 12 is unnecessary, since much of the output from this step will be available in a better format from the SPSS option designed specifically for two-way ANOVA.
In view of this, follow the instructions in steps 1, 8, 9, 11, 13, 14, and 15 to create graphs for assessing whether or not the normality assumption appears to be satisfied; then decide whether or not this assumption appears to be satisfied.

The histogram is not too different from a bell-shaped distribution, and the points on the normal probability plot do not depart too drastically from the diagonal line.

(f) In the document titled Using SPSS Version 19.0, use SPSS with the section titled Performing a two-way ANOVA (analysis of variance) to obtain the output for a two-way ANOVA.

(g) Observe that the SPSS output displays a warning that no post hoc tests were performed for sex because there were fewer than three groups. Explain why post hoc tests are unnecessary when fewer than three groups are being compared.

When only two groups are being compared, only one statistically significant difference is possible, making multiple comparison unnecessary.

(h) Comment on what the results of Levene’s test tell us about the equal variance assumption for a two-way ANOVA.

Levene’s test is not statistically significant at the 0.05 level ($f_{5,24} = 0.941$, p = 0.473). Thus, we conclude that the equal variance assumption for a two-way ANOVA is satisfied.

(i) Looking at the ANOVA table displayed on the SPSS output, explain which of the three possible scenarios for the results in a two-way ANOVA has occurred with this data.

We find from the ANOVA table that there are no statistically significant interaction effects at the 0.05 level ($f_{2,24} = 2.702$, p = 0.087). Even though the interaction is not statistically significant at the chosen 0.05 level, the fact that the p-value is less than 0.10 may lead some researchers to think that there could be statistical significance in a future study with a larger sample size. We find that there are statistically significant main effects for both sex ($f_{1,24} = 15.086$, p = 0.001) and residence ($f_{2,24} = 6.541$, p = 0.005). This is scenario (2): since only the main effects are statistically significant, identifying these significant main effects is of primary interest.

(j) Complete the analysis of this two-way ANOVA data according to the scenario for the results in part (i).

Since we have concluded that there are significant main effects, we need to identify significant differences in row means and in column means. The row variable sex has only two categories, and since we concluded that mean yearly income differs between the sexes, we just need to identify the direction of the difference. The column variable area of residence has three categories, and since we concluded that there is at least one difference in mean yearly income, we need to use multiple comparison to identify the differences. We shall use Bonferroni’s method to do this.

From Bonferroni’s multiple comparison method with α = 0.05, we conclude the following:

The mean yearly income is greater in the suburban area than in the rural area (p = 0.010).
The mean yearly income is greater in the urban area than in the rural area (p = 0.020).

With α = 0.05, we also conclude that mean yearly income is greater for males than for females (p = 0.001).

(k) The “Partial Eta Squared” column of the ANOVA table on the SPSS output displays, for each of sex, residence, and the interaction between sex and residence, the proportion of variance in the dependent variable (income) accounted for by that effect once the variance attributed to the other effects is set aside, that is, SS(effect) / (SS(effect) + SS(error)). Find the percent of variance accounted for by each of these three effects.

Sex of the voter accounts for 38.6% of the variance in yearly income, area of residence accounts for 35.3% of the variance in yearly income, and interaction between sex and residence accounts for 18.4% of the variance in yearly income.

3. Read the “INTRODUCTION” and “TWO-WAY ANALYSIS OF VARIANCE” sections of Chapter 6. Open the SPSS data file Job Satisfaction.

(a) In the “PRACTICAL EXAMPLE” section, read the discussion for assumptions number 1 to 4 in the subsection “Hypothesis Testing”; then, use the Analyze > Descriptive Statistics > Explore options in SPSS to obtain the information in Table 6.2 and the graphs in Figure 6.1. (Table 6.3 displayed in this subsection can be obtained from work to be done in the subsection which follows.)

(b) In the “PRACTICAL EXAMPLE” section, read the subsection “How to Use SPSS to Run Two-Way ANOVA”, and follow the instructions with SPSS, which should produce the output displayed in Table 6.4 to Table 6.9 and in Figure 6.2. Compare the syntax file commands generated by the output with those shown on page 167 of the textbook.

Read the remaining portion of Chapter 6.
