Stat 112: Lecture 21 Notes

Stat 112: Lecture 21 Notes • Model Building (Brief Discussion) • Chapter 9.1: One-way Analysis of Variance. • Homework 6 is due Friday, Dec. 1st. • I will be e-mailing you some comments on your project ideas tonight or tomorrow. • I will have the quizzes graded by tomorrow’s office hours (Wed. 1:30-2:30); otherwise, I will return them to you next Tuesday.

Model Building • Among the potential explanatory variables, think about which ones address the question of interest. • For each explanatory variable, investigate whether it needs a transformation, either because of curvature or because of crunching. • Consider adding polynomial terms for a variable if curvature remains for that variable: keep adding higher-order terms as long as the highest-order term has p-value < 0.05. • Consider interactions between the explanatory variables, adding an interaction if the p-value on the interaction term is < 0.05.
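
The stopping rule for polynomial terms can be sketched in Python. This is a minimal illustration under an assumption, not a JMP feature: `highest_term_pvalue` is a hypothetical function you would supply yourself (for example, by reading p-values off JMP output), returning the p-value of the highest-order term when the variable enters the model as a polynomial of the given degree.

```python
from typing import Callable


def choose_polynomial_degree(
    highest_term_pvalue: Callable[[int], float],
    alpha: float = 0.05,
    max_degree: int = 5,
) -> int:
    """Pick a polynomial degree with the 'keep adding while significant' rule.

    highest_term_pvalue(d) is a hypothetical, user-supplied function giving the
    p-value of the x**d term in a model containing x, x**2, ..., x**d.
    """
    degree = 1  # start with the linear term only
    # Try the next higher power; keep it only if its p-value is below alpha.
    while degree < max_degree and highest_term_pvalue(degree + 1) < alpha:
        degree += 1
    return degree


# Hypothetical p-values for the highest-order term at each degree.
example_pvalues = {2: 0.001, 3: 0.02, 4: 0.31, 5: 0.04}
print(choose_polynomial_degree(lambda d: example_pvalues[d]))  # prints 3
```

The search stops at the first non-significant highest-order term (degree 4 here), so the degree-5 p-value is never consulted.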

Analysis of Variance • The goal of analysis of variance is to compare the means of several (often many) groups. • Analysis of variance is regression with only categorical explanatory variables. • One-way analysis of variance: Groups are defined by one categorical variable. • Two-way analysis of variance: Groups are defined by two categorical variables.

Milgram’s Obedience Experiments • Subjects were recruited to take part in an experiment on “memory and learning.” • The subject is the teacher. The subject conducted a paired-associate learning task with the student and was instructed by the experimenter to administer a shock to the student each time the student gave a wrong response. Moreover, the subject was instructed to “move one level higher on the shock generator each time the learner gives a wrong answer.” The subject was also instructed to announce the voltage level before administering each shock.

Four Experimental Conditions • Remote-Feedback condition: The student is placed in a room where the subject can neither see him nor hear his voice; his answers flash silently on a signal box. However, at 300 volts the laboratory walls resound as he pounds in protest. After 315 volts, no further answers appear, and the pounding ceases. • Voice-Feedback condition: Same as the remote-feedback condition except that vocal protests were introduced that could be heard clearly through the walls of the laboratory.

• Proximity condition: Same as the voice-feedback condition except that the student was placed in the same room as the subject, a few feet from the subject. Thus, he was visible as well as audible. • Touch-Proximity condition: Same as the proximity condition except that the student received a shock only when his hand rested on a shock plate. At the 150-volt level, the student demanded to be let free and refused to place his hand on the shock plate. The experimenter ordered the subject to force the victim’s hand onto the plate.

Two Key Questions • Is there any difference among the mean voltage levels of the four conditions? • If there are differences, what conditions specifically are different?

Multiple Regression Model for Analysis of Variance • To answer these questions, we can fit a multiple regression model with voltage level as the response and one categorical explanatory variable (condition). • We obtain a sample from each level of the categorical variable (group) and want to estimate the population means of the groups based on these samples. • Assumptions of the multiple regression model for one-way analysis of variance: • Linearity: automatically satisfied, because the model fits a separate mean for each group. • Constant variance: Check whether the spread within each group is the same. • Normality: Check whether the distribution within each group is normal. • Independence: The sample consists of independent observations.

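Comparing the Groups • The coefficient Condition[Proximity] = -26.25 means that the proximity condition is estimated to have a mean voltage level 26.25 volts below the average of the four condition means. • Adding this coefficient to the intercept gives the estimated mean for the proximity group, which equals the sample mean of the proximity group.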
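
The relationship between the coefficients and the group sample means can be checked with a short Python sketch. It assumes effect (sum-to-zero) coding, which is the interpretation given above: the intercept is the average of the group means, and each group's coefficient is its mean minus that average. The data are made up for illustration and are not Milgram's.

```python
import statistics


def effect_coded_fit(groups: dict[str, list[float]]) -> tuple[float, dict[str, float]]:
    """One-way ANOVA fit with effect (sum-to-zero) coding.

    Returns the intercept (the unweighted average of the group means) and,
    for each group, its coefficient: group mean minus that average.
    """
    group_means = {name: statistics.mean(ys) for name, ys in groups.items()}
    intercept = statistics.mean(list(group_means.values()))
    effects = {name: m - intercept for name, m in group_means.items()}
    return intercept, effects


# Hypothetical voltage levels, NOT Milgram's data.
data = {
    "Remote": [300.0, 450.0, 450.0],
    "Voice": [300.0, 375.0, 450.0],
    "Proximity": [150.0, 300.0, 315.0],
    "Touch": [150.0, 150.0, 270.0],
}
intercept, effects = effect_coded_fit(data)
for name, effect in effects.items():
    # intercept + effect recovers the group's sample mean
    print(f"{name}: coefficient {effect:+.2f}, fitted mean {intercept + effect:.2f}")
print(f"sum of coefficients: {sum(effects.values()):.2f}")  # effects sum to zero
```

Because the coefficients sum to zero, regression output can omit one level's coefficient; that coefficient equals minus the sum of the others.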

The Effect Test tests the null hypothesis that the mean voltage level is the same in all four conditions versus the alternative hypothesis that at least two of the conditions have different means. • The p-value of the Effect Test is < 0.0001, so there is strong evidence that the population means are not the same for all four conditions.
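
The statistic behind the Effect Test (the one-way ANOVA F statistic) compares the variation between group means with the variation within groups. The sketch below computes F and its degrees of freedom using only the standard library. It does not compute the p-value. If SciPy is available, `scipy.stats.f.sf(f_stat, df1, df2)` gives the upper-tail probability, which corresponds to Prob>F. The small data set is hypothetical.

```python
import statistics


def one_way_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, numerator df, denominator df) for the one-way ANOVA effect test."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: how far the group means are from the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(sum((y - statistics.mean(g)) ** 2 for y in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df1, df2 = one_way_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat, df1, df2)  # 27.0 2 6
```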

JMP for One-way ANOVA • One-way ANOVA can be carried out in JMP either by using Fit Model with a categorical explanatory variable or by using Fit Y by X with the categorical variable as the X variable. • After using the Fit Y by X command, click the red triangle next to Oneway Analysis, then Display Options, Boxplots to see side-by-side boxplots. Click Means/Anova to see the means of the different groups and the test of whether all groups have the same mean. The p-value for this test is Prob>F in the ANOVA table.

Prob>F = p-value for the test that all groups have the same mean. This is the same as the p-value for the Effect Test in the Fit Model output.

Two Key Questions • Is there any difference among the mean voltage levels of the four conditions? Yes, there is strong evidence of a difference. p-value of Effect Test < 0.0001. • If there are differences, what conditions specifically are different?

Testing whether each of the groups is different • Naïve approach to deciding which groups have a mean that differs from the average of the means of all groups: do a t-test for each group and look for groups with p-value < 0.05. • Problem: multiple comparisons.

Errors in Hypothesis Testing • When we do one hypothesis test and reject the null hypothesis if the p-value is < 0.05, the probability of making a Type I error (rejecting the null hypothesis when it is true) is 0.05. We protect against falsely rejecting a null hypothesis by making the probability of a Type I error small.

Multiple Comparisons Problem • Compound uncertainty: when doing more than one test, there is an increased chance of making a mistake. • If we do multiple hypothesis tests and reject the null hypothesis in each test whose p-value is < 0.05, then, when all the null hypotheses are true, the probability of falsely rejecting at least one null hypothesis is > 0.05.
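
A quick calculation shows how fast this probability grows. The sketch assumes the k tests are independent, in which case P(at least one false rejection) = 1 - (1 - 0.05)^k. For 20 independent tests this comes to about 0.64. Pairwise comparisons among groups are not independent, so for them the formula is only a rough guide. The value 190 is the number of pairs among 20 groups.

```python
def prob_at_least_one_false_rejection(k: int, alpha: float = 0.05) -> float:
    """P(at least one Type I error) for k INDEPENDENT tests, each at level alpha,
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 20, 190):
    print(k, round(prob_at_least_one_false_rejection(k), 3))
```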

Multiple Comparisons Simulation • In multiplecomp.JMP, 20 groups are compared, with a sample size of ten for each group. • The observations for each group are simulated from a standard normal distribution. Thus, in fact, all 20 groups have the same population mean (0). • Number of pairs found to have significantly different means using a t-test at level 0.05

Multiple Comparisons Simulation • In multiplecomp.JMP, 20 groups are compared, with a sample size of ten for each group. • The observations for each group are simulated from a standard normal distribution. Thus, in fact, all 20 groups have the same population mean (0). • Number of groups found to have means different from the average using a t-test and rejecting if p-value < 0.05
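
The pairwise version of this simulation can be sketched in Python. It is not multiplecomp.JMP itself. The sketch draws 20 groups of ten observations from a standard normal distribution and runs a pooled two-sample t-test on each of the 190 pairs, so every rejection is a false one. The critical value 2.101 is the two-sided 0.05 value of t with 18 df, taken from a t table, and it is only valid for groups of size ten. Each pair is rejected with probability 0.05, so on average about 190 × 0.05 = 9.5 pairs per simulation come out "significant".

```python
import itertools
import random
import statistics

N_GROUPS = 20
N_PER_GROUP = 10
# Two-sided 0.05 critical value of t with 10 + 10 - 2 = 18 df (from a t table).
# Only valid for N_PER_GROUP = 10.
T_CRIT_18DF = 2.101


def count_significant_pairs(seed: int) -> int:
    """Simulate 20 N(0, 1) groups and count pairs 'significant' at the 0.05 level."""
    rng = random.Random(seed)
    groups = [[rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)] for _ in range(N_GROUPS)]
    # Summarize each group once: (sample mean, sample variance).
    summaries = [(statistics.mean(g), statistics.variance(g)) for g in groups]
    n = N_PER_GROUP
    count = 0
    for (m1, v1), (m2, v2) in itertools.combinations(summaries, 2):
        pooled_var = (v1 + v2) / 2  # pooled variance when both groups have size n
        t_stat = (m1 - m2) / (pooled_var * (2 / n)) ** 0.5
        if abs(t_stat) > T_CRIT_18DF:
            count += 1  # a false rejection: all population means are equal
    return count


counts = [count_significant_pairs(seed) for seed in range(200)]
print("pairs tested per simulation:", N_GROUPS * (N_GROUPS - 1) // 2)
print("average number of 'significant' pairs:", statistics.mean(counts))
print("simulations with at least one:", sum(c > 0 for c in counts), "of", len(counts))
```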

Individual vs. Familywise Error Rate • When several tests are considered simultaneously, they constitute a family of tests. • Individual Type I error rate: Probability for a single test that the null hypothesis will be rejected, assuming that the null hypothesis is true. • Familywise Type I error rate: Probability for a family of tests that at least one null hypothesis will be rejected, assuming that all of the null hypotheses are true. • When we consider a family of tests, we want to make the familywise error rate small, say 0.05, to protect against falsely rejecting a null hypothesis.

Bonferroni Method • General method for doing multiple comparisons for any family of k tests. • Denote the familywise Type I error rate we want by $p^*$, say $p^* = 0.05$. • Compute the p-value for each individual test: $p_1, \ldots, p_k$. • Reject the null hypothesis for the ith test if $p_i < p^*/k$. • Guarantees that the familywise Type I error rate is at most $p^*$. • Why Bonferroni works: if we do k tests and all null hypotheses are true, then using Bonferroni with $p^* = 0.05$, we have probability $0.05/k$ of making a Type I error in each test, so we expect to make $k \cdot (0.05/k) = 0.05$ errors in total. The probability of making at least one error is never larger than the expected number of errors, so the familywise Type I error rate is at most 0.05.
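
The Bonferroni rule takes only a few lines of Python. The p-values in the example are hypothetical.

```python
def bonferroni_reject(p_values: list[float], familywise_rate: float = 0.05) -> list[bool]:
    """Reject H0 for test i when p_i < p*/k, where k is the number of tests."""
    k = len(p_values)
    threshold = familywise_rate / k
    return [p < threshold for p in p_values]


# Hypothetical p-values from k = 4 tests; the cutoff is 0.05 / 4 = 0.0125.
print(bonferroni_reject([0.001, 0.012, 0.03, 0.4]))  # [True, True, False, False]
```

A p-value of 0.03 would be "significant" on its own, but with four tests it does not clear the Bonferroni cutoff.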

Tukey’s HSD • Tukey’s HSD is a method that is specifically designed to control the familywise type I error rate (at 0.05) for analysis of variance. • After Fit Model, click the red triangle next to the X variable and click LSMeans Tukey HSD.

Comparisons shown in red are those for which the Tukey HSD procedure rejects the null hypothesis that the two group means are the same; the procedure controls the familywise Type I error rate at 0.05. The third and fourth lines show a confidence interval for the difference in group means that adjusts for multiple comparisons.
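
The textbook Tukey-Kramer form of Tukey's HSD can be sketched in Python. It reduces to Tukey's HSD when the group sizes are equal. The sketch may differ in detail from how JMP computes or presents its output. Two groups i and j are declared different when |ȳ_i − ȳ_j| > q · sqrt(MSE/2 · (1/n_i + 1/n_j)), where MSE is the pooled within-group variance and q is the upper 0.05 critical value of the studentized range distribution for k groups and N − k error df. The caller must supply q, either from a table or, in recent SciPy versions, from `scipy.stats.studentized_range.ppf(0.95, k, N - k)`. The data and the value 4.34 below are assumptions for illustration.

```python
import itertools
import math
import statistics

Comparison = tuple[str, str, float, float, float, bool]


def tukey_kramer(groups: dict[str, list[float]], q_crit: float) -> list[Comparison]:
    """All pairwise comparisons of group means with Tukey-Kramer intervals.

    Returns (group_a, group_b, mean difference, lower, upper, significant).
    """
    k = len(groups)
    n_total = sum(len(ys) for ys in groups.values())
    means = {name: statistics.mean(ys) for name, ys in groups.items()}
    # Pooled within-group variance (MSE) from the one-way ANOVA.
    sse = sum(sum((y - means[name]) ** 2 for y in ys) for name, ys in groups.items())
    mse = sse / (n_total - k)
    results = []
    for a, b in itertools.combinations(groups, 2):
        diff = means[a] - means[b]
        half_width = q_crit * math.sqrt(mse / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
        lower, upper = diff - half_width, diff + half_width
        # The pair differs significantly when the interval excludes zero.
        results.append((a, b, diff, lower, upper, lower > 0 or upper < 0))
    return results


# Hypothetical data with k = 3 groups and N - k = 6 error df.
data = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
Q_CRIT = 4.34  # assumed table value of q at the 0.05 level for k = 3, df = 6
for a, b, diff, lower, upper, significant in tukey_kramer(data, Q_CRIT):
    print(f"{a} - {b}: {diff:+.2f}  CI ({lower:.2f}, {upper:.2f})  significant: {significant}")
```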

Assumptions in one-way ANOVA • Assumptions needed for validity of one-way analysis of variance p-values and CIs: • Linearity: automatically satisfied. • Constant variance: Spread within each group is the same. • Normality: Distribution within each group is normally distributed. • Independence: Sample consists of independent observations.

Rule of thumb for checking constant variance • Constant variance: Look at the standard deviations of the different groups by using Fit Y by X and clicking Means and Std Dev. • Rule of Thumb: Check whether (highest group standard deviation/lowest group standard deviation) is greater than 2. If it is greater than 2, constant variance is not reasonable and a transformation should be considered. If it is less than 2, constant variance is reasonable. • (Highest group standard deviation/lowest group standard deviation) = (131.874/63.640) = 2.07 > 2. Thus, constant variance is not reasonable for Milgram’s data.
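
The rule of thumb is simple arithmetic. The sketch below applies it to the highest and lowest group standard deviations reported above. The other two groups' SDs lie between these values, so they do not change the ratio.

```python
def sd_ratio_check(group_sds: list[float], cutoff: float = 2.0) -> tuple[float, bool]:
    """Return (largest SD / smallest SD, whether constant variance looks reasonable)."""
    ratio = max(group_sds) / min(group_sds)
    return ratio, ratio <= cutoff


# Highest and lowest group SDs of voltage level reported above for Milgram's data.
ratio, ok = sd_ratio_check([131.874, 63.640])
print(f"ratio = {ratio:.2f}, constant variance reasonable: {ok}")
# ratio = 2.07, constant variance reasonable: False
```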

Transformations to correct for nonconstant variance • If the standard deviation is highest for groups with high means, try transforming Y to log Y or √Y. If the standard deviation is highest for groups with low means, try transforming Y to Y². • For Milgram’s data, the SD is particularly low for the group with the highest mean, so try transforming to Y². To make the transformation, right-click in an empty column, click New Column, then right-click the newly created column, click Formula, and enter the appropriate formula for the transformation.

Transformation of Milgram’s data to Squared Voltage Level • Check of constant variance for the transformed data: (highest group standard deviation/lowest group standard deviation) = 1.63 < 2, so the constant variance assumption is reasonable for voltage squared. • Analysis of variance tests are therefore approximately valid for the voltage squared data, so we reanalyze the data using voltage squared.
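
Why squaring helps when the high-mean group has the small SD: squaring stretches large values more than small ones, so it inflates the spread of the high-mean group relative to the low-mean group. The sketch uses made-up data, not Milgram's. It squares the response and recomputes the rule-of-thumb ratio, which drops from above 2 to below 2.

```python
import statistics


def sd_ratio(groups: dict[str, list[float]]) -> float:
    """Largest group standard deviation divided by the smallest."""
    sds = [statistics.stdev(ys) for ys in groups.values()]
    return max(sds) / min(sds)


def square_response(groups: dict[str, list[float]]) -> dict[str, list[float]]:
    """Transform Y to Y**2 within every group."""
    return {name: [y ** 2 for y in ys] for name, ys in groups.items()}


# Hypothetical data: the group with the higher mean has the smaller SD.
data = {"low_mean": [20.0, 150.0, 280.0], "high_mean": [240.0, 300.0, 360.0]}
print(f"SD ratio for Y:   {sd_ratio(data):.2f}")
print(f"SD ratio for Y^2: {sd_ratio(square_response(data)):.2f}")
```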

Analysis using Voltage Squared • There is strong evidence that the group mean voltage squared levels are not all the same. • Taking the multiple comparisons into account, there is strong evidence that the remote-feedback condition has a higher mean voltage squared level than the proximity and touch-proximity conditions, and that the voice-feedback condition has a higher mean voltage squared level than the touch-proximity condition.

Rule of Thumb for Checking Normality in ANOVA • The normality assumption for ANOVA is that the distribution in each group is normal. It can be checked by looking at the boxplot, histogram and normal quantile plot for each group. • If there are more than 30 observations in each group, the normality assumption is not important: ANOVA p-values and CIs will still be approximately valid even for nonnormal data. • If there are fewer than 30 observations per group, we can check normality by clicking Analyze, Distribution, then putting the Y variable in the Y, Columns box and the categorical variable denoting the group in the By box. We can then create a normal quantile plot for each group and check that, for each group, the points in the plot fall within the confidence bands. If there is nonnormality, we can try a transformation such as log Y and check whether the transformed data are approximately normally distributed in each group.
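
The ingredients of this check can be sketched in Python. One function flags groups small enough that normality matters; the cutoff of 30 comes from the rule of thumb. The other pairs each sorted observation with a standard normal quantile, which is what a normal quantile plot displays. The plotting positions i/(n + 1) are an assumption, and JMP may use a different convention. The sketch draws no confidence bands.

```python
from statistics import NormalDist


def groups_needing_normality_check(groups: dict[str, list[float]], min_n: int = 30) -> list[str]:
    """Groups with min_n or fewer observations, where the normality assumption matters."""
    return [name for name, ys in groups.items() if len(ys) <= min_n]


def normal_quantile_pairs(ys: list[float]) -> list[tuple[float, float]]:
    """Pair each sorted observation with a standard normal quantile.

    Plotting positions i / (n + 1) are an assumed convention; points close to a
    straight line suggest the group is approximately normal.
    """
    n = len(ys)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf(i / (n + 1)), y) for i, y in enumerate(sorted(ys), start=1)]


# Hypothetical groups, for illustration only.
groups = {"small": [3.1, 2.4, 5.0, 4.2], "large": [float(i) for i in range(40)]}
print(groups_needing_normality_check(groups))
for z, y in normal_quantile_pairs(groups["small"]):
    print(f"{z:+.3f}  {y}")
```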

One-way Analysis of Variance: Steps in Analysis • Check assumptions (constant variance, normality, independence). If constant variance is violated, try transformations. • Use the effect test (commonly called the F-test) to test whether all group means are the same. • If the effect test finds that at least two group means differ, use Tukey’s HSD procedure to investigate which groups are different, taking into account that multiple comparisons are being made.
