## Overview of experimental research


**Chapter 9: The analysis of variance for simple experiments** (single factor, unrelated groups designs)

**Overview of experimental research**

• Groups start off much the same on every measure.
• During the experiment, groups are TREATED DIFFERENTLY.
• Responses thought to be affected by the different treatments are then measured.
• If the group means become different from each other, the differences may have been caused, in part, by the different ways the groups were treated.
• Determining whether the differences between group means result simply from sampling fluctuation or are (probably) due in part to the treatment differences is the job of the statistical analysis.

**Let’s take that one point at a time. At the beginning of an experiment:**

• Participants are randomly selected from a population. Then they are randomly assigned to treatment groups.
• Thus, at the beginning of the study, each treatment group is a random (sub)sample from a specific population.

**Groups start off much the same in every possible way**

• Since each treatment group is a random sample from the population, each group’s mean and variance will be similar to those of the population.
• That is, each group’s mean will be a best estimate of mu, the population mean.
• And the spread of scores around each group’s mean will yield a best estimate of sigma² and sigma.

**So: At the beginning of an experiment the treatment groups differ only because of random sampling fluctuation.**

When there are different people in each group, the random sampling fluctuation is caused by (1) random individual differences and (2) random measurement problems.

**Sampling fluctuation is the product of the inherent variability of the data.**

That is what is indexed by sigma², the average squared distance of scores from the population mean, mu.

**To summarize:**

• Since the group means and variances of random samples will be similar to those of the population, they will be similar to each other.
• This is true for any and all things you can measure.
• The only differences among the groups at the beginning of the study, on any and all measures, will be the mostly minor differences associated with random sampling fluctuation. That fluctuation arises because there are different people in each group (individual differences, ID) and because there are always random measurement problems (MP): ID + MP.

**The ultimate question**

• If we then treat the groups differently, will the treatments make the groups more different from each other at the end of the experiment than they would be if only sampling fluctuation created their differences?

**In the simplest experiments (Ch 9)**

• In the simplest experiments, the groups are exposed to treatments that vary on a single dimension.
• The dimension on which treatments of the groups vary is called the independent variable.
• We call the specific ways the groups are treated the “levels of the independent variable.”

**The independent variable**

• An independent variable can be any preplanned difference in the way groups are treated. Which kind of difference you choose relates to the experimental hypothesis, H1.
• For example, if you think you have a new medication for bipolar disorder, you would compare the effect of various doses of the new drug to placebo in a random sample of bipolar patients. Thus, the groups would differ in terms of the dose of drug.
• Proper experimental design would ensure that the difference in dose received is the only way the groups are systematically treated differently from each other.

**Why is it called the “independent variable”?**

Remember, we call the different treatments the “levels” of the independent variable. Who gets which level is random. It is determined solely by the group to which a participant is randomly assigned.
• So, any difference in the way a person is treated during the experiment is unrelated to, or “independent of,” the infinite number of pre-existing differences that precluded causal statements in correlational research.

**The dependent variable**

• Relevant responses (called dependent variables) are then measured to see whether the independent variable caused differences among the treatment conditions beyond those expected given ordinary sampling fluctuation.
• That is, we want to see whether responses are related to (dependent on) the different levels of the independent variable to which the treatment groups were exposed.

**Differences after the experiment among group means on the dependent variable may well be simple sampling fluctuation!**

• The groups will always differ somewhat from each other on anything you measure due to sampling fluctuation.
• With three groups, one will score highest, one lowest and one in the middle just by chance. With four groups, one will score highest, one lowest, with two in the middle, one higher than the other. Etc.
• So the simple fact that the groups differ somewhat is not enough to determine that the independent variable, the different ways the groups were treated, caused the differences.
• We have to determine whether the groups are more different than they should be if only sampling fluctuation is at work.

**H0 & H1: If one is wrong, the other must be right.**

• Either the independent variable would cause differences in responses (the dependent variable) in the population as a whole or it would not.
• H0: The different conditions embodied by the independent variable would have NO EFFECT if administered to the whole population.
• H1: The different conditions embodied by the independent variable would produce different responses if administered to the whole population.

**The population can be expected to respond to the different levels of the IV similarly to the samples**

• Remember, random samples are representative of the population from which they are drawn.
• If the different levels of the independent variable cause the groups to differ (more than they would from simple sampling fluctuation), the same thing should be true for the rest of the population.

**For example:**

• Say a new psychotherapy causes a random sample of anxious patients to become less anxious in comparison to treatment groups given more conventional approaches or pill placebo.
• Then, we would expect all anxious patients to respond better to the new treatment than to the ones to which it was compared.

**However:**

• As in the case of correlation, we don’t want to toss out treatments that we know work just because a new treatment happens to do better in one experiment.
• We would want to be sure that the difference after treatment is not just a chance finding based on random sampling fluctuation.

**The Null Hypothesis**

• The null hypothesis (H0) states that the only reason that the treatment group means are different is sampling fluctuation. It says that the independent variable causes no systematic differences among the groups.
• A corollary: if H0 is true and you try the experiment again, a different group may well score highest and another lowest. If that is so, you should not generalize from which group in your study scored highest or lowest to the population from which the samples were drawn.
• Your best prediction remains that everyone will score at the mean on the dependent variable; that is, treatment condition will not predict response.

**People often respond to random sampling fluctuation as if something was causing a difference.**
• People take all kinds of food supplements because they believe the supplements (e.g., echinacea) will make colds go away more quickly.
• If you tried it, and it worked, wouldn’t you tell your friends? Wouldn’t you try it again with your next cold?
• Having recovered quickly after taking something seems to provide the evidence. After all, it’s what happened to you!

**But did the food supplement really make a difference?**

• To this point NO food supplement has proved to shorten colds when carefully tested.
• The mistake lay in taking random variation in the duration of a cold as evidence that the echinacea (or whatever) did something beneficial.
• That’s OK if it is just your pocketbook that is affected. But what if you were an FDA scientist? Wouldn’t people expect better evidence of efficacy before they gave the food supplement company an enormous amount of their money?

**We call rejecting a true null hypothesis a “Type 1 Error.”**

• The first rule in science is “Do not increase error.”
• Scientists don’t like to say something will make a difference when it isn’t true.
• So, before we toss away proven treatments or say that something will cause illness or health, we want to be fairly sure that we are not just responding to sampling fluctuation.

**The scientist’s answer: test the null hypothesis**

• So, as we did with correlation and regression, we assume that everything is equal, that all treatments have the same effect, unless we can prove otherwise.
• The null hypothesis says that the treatments do not systematically differ; one is as good as another.
• As usual, we test the null hypothesis by asking it to make a prediction and then establishing a range of results for the test statistic consistent with that prediction.
• As usual, that range is a 95% CI for the test statistic.

**The test statistic: F and t tests**

• In Chapter 8, you learned to use Pearson’s r as a test statistic.
• When it fell outside a 95% confidence interval consistent with the null hypothesis, we rejected the null.
• In experimental research, we generally use the F and t statistics to test the null.
• When there are only two groups, t is used as the test statistic. When there are three or more groups, Fisher’s ratio (called the F statistic) is used as the test statistic.

**Nonsignificant results**

• Each actual t or F will fall either inside or outside the 95% CI that is consistent with the null hypothesis.
• Results inside the range consistent with the null are called nonsignificant. Results outside the 95% CI are called significant. One or the other must occur in each statistical analysis.
• If you get nonsignificant results, you have failed to reject the null and you may not extrapolate from the differences among your experimental (treatment) groups to the population. You must go back to saying that your best prediction is that everyone will be equal and the differences among the treatments don’t matter.

**If t or F falls outside the 95% CI, you have statistically significant findings.**

• If your results are statistically significant, then the results are not consistent with the notion that the between group differences are solely the product of sampling fluctuation.
• Since that is what the null says, you must declare the null false and reject it.
• If the experiment is well run, the differences in the way you treated the groups will be the only systematic difference among the groups.

**Getting statistically significant findings is important.**

• If you get them, you must say, as a scientist, that the responses of the different treatment groups should be mirrored by the population as a whole were it exposed to the same conditions.
• Scientists tend to be cautious about making such statements, bracketing them with “more research is necessary” type phrases.
• But they still have to say it.

**The Experimental Hypothesis (H1)**

• Unlike the null, H1 is different in each experiment.
• The experimental hypothesis tells us the way(s) we must treat the groups differently and what to measure.
• Therefore, the experimental hypothesis tells us (in broad terms) how to design the experiment.
• For example, if we hypothesize that embarrassed people remember sad things better, we need to embarrass different groups to different degrees (from not at all to a lot) and measure their memories for sad and happy events.

**The Experimental Hypothesis**

• The experimental hypothesis (H1) states that between group differences on the dependent variable are caused by the independent variable as well as by sampling fluctuation.
• If F or t is significant and the null is shown to be false, and the only systematic difference among the groups is how they were treated (the differing levels of the IV), then H1 must be right.
• In that case, we must extrapolate our findings to the rest of the population, assuming that they would respond as did our different treatment groups.

**The F test**

• In order to statistically test the null hypothesis, we are going to ask it to make a prediction about the relationship between two estimates of sigma².
• In an F test, we compare two different mean squares, each calculated as an estimate of the population variance.
• To estimate sigma² you always divide a sum of squares by its degrees of freedom.
• Remember, random sampling fluctuation is indexed by sigma², the population variance.

**Our two estimates of sigma²**

• One way to estimate sigma² is to find the difference between each score and its group mean, then square and sum those differences. This yields a sum of squares within group (SSW). To estimate sigma² you divide SSW by degrees of freedom within group (dfW = n-k). This estimate of sigma² is called the mean square within groups, MSW. You have been calculating it since Chapter 5.
• The other way to estimate sigma² is to square and sum the differences between each participant’s group mean and the overall mean. This yields a sum of squares between group and grand means (SSB). To estimate sigma² you divide SSB by degrees of freedom between groups (dfB = k-1). This is called the mean square between groups, MSB. It is new.

**What is indexed by sigma² and its best estimate: MSW**

• Sigma² indexes random sampling fluctuation. It comprises individual differences and random measurement problems (ID + MP).
• MSW: Since everyone in a specific group is treated the same way, differences between participants’ scores and their own group mean, the basis of MSW, can only reflect ID + MP.
• Thus, MSW is always a good estimate of sigma², the population variance, as both index ID + MP.

**What is indexed by the mean square between groups (MSB)**

• Since we treat the groups differently, the distance between each group’s mean and the overall mean can reflect the effects of the independent variable (as well as the effects of random individual differences and random measurement problems).
• Thus MSB = ID + MP + (?)IV
• If the independent variable pushes the group means apart, MSB will overestimate sigma² and be larger than MSW.

**Testing the Null Hypothesis (H0)**

• H0 says that the IV has no effect.
• If H0 is true, groups differ from each other and from the overall mean only because of sampling fluctuation based on random individual differences and measurement problems (ID + MP).
• These are the same things that make scores differ from their own group means.
• So, according to H0, MSB and MSW are two ways of measuring the same thing (ID + MP) and are both good estimates of sigma².
• Two measurements of the same thing should be about equal to each other, and a ratio between them should be about equal to 1.00.

**In simple experiments (Ch. 9), the ratio between MSB and MSW is the Fisher or F ratio.**

In simple experiments, F = MSB/MSW.
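
Written out in symbols, the two estimates and their ratio might look as follows. The notation is an assumption introduced here for illustration and is not taken from the slides: X_ij is the score of participant i in group j, M_j is the mean of group j, M is the overall mean, n_j is the number of participants in group j, n is the total number of participants and k is the number of groups.

```latex
\begin{aligned}
SS_W &= \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(X_{ij} - M_j\right)^2, & df_W &= n - k, & MS_W &= \frac{SS_W}{df_W},\\
SS_B &= \sum_{j=1}^{k} n_j \left(M_j - M\right)^2, & df_B &= k - 1, & MS_B &= \frac{SS_B}{df_B},\\
F &= \frac{MS_B}{MS_W}.
\end{aligned}
```

The factor n_j in SS_B appears because the squared distance between a group mean and the overall mean is counted once for each participant in that group.
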
H0 says F should be about 1.00.

**The Experimental Hypothesis (H1)**

• The experimental hypothesis says that the groups’ means will be made different from each other (pushed apart) by the IV, the independent variable (as well as by random individual differences and measurement problems).
• If the means are pushed apart, MSB will increase, reflecting the effects of the independent variable (as well as of the random factors). MSW will not.
• So MSB will be larger than MSW.
• Therefore, H1 suggests that an F ratio comparing MSB to MSW should be larger than 1.00.

**As usual, we set up 95% confidence intervals around the prediction of the null.**

• In Ch. 9, the ratio MSB/MSW is called the F ratio.
• If the F ratio is about 1.00, the result is consistent with the prediction of the null.
• It is rare for the F ratio to be exactly 1.00.
• At some point, the ratio gets too different from 1.00 to be consistent with the null. We are only interested in the case where the ratio is greater than 1.00, which means that the means are further apart than the null suggests.
• The F table tells us when the difference among the means is too large to be explained as sampling fluctuation alone.

**An experiment**

• Population: Male, self-selected, “social drinkers”
• Number of participants (9) and groups (3)
• Design: Single factor, unrelated groups
• Independent variable: Stress
• Level 1: No stress
• Level 2: Moderate stress
• Level 3: High stress
• Dependent variable: ounces consumed
• H0: Stress does not affect alcohol consumption.
• H1: Stress will cause increased alcohol consumption.

**Computing MSW and MSB**

| Participant | Score | Group mean | Score - group mean | Squared | Overall mean | Group mean - overall mean | Squared |
|---|---|---|---|---|---|---|---|
| 1.1 | 8 | 10 | -2 | 4 | 13 | -3 | 9 |
| 1.2 | 10 | 10 | 0 | 0 | 13 | -3 | 9 |
| 1.3 | 12 | 10 | 2 | 4 | 13 | -3 | 9 |
| 2.1 | 12 | 13 | -1 | 1 | 13 | 0 | 0 |
| 2.2 | 12 | 13 | -1 | 1 | 13 | 0 | 0 |
| 2.3 | 15 | 13 | 2 | 4 | 13 | 0 | 0 |
| 3.1 | 13 | 16 | -3 | 9 | 13 | 3 | 9 |
| 3.2 | 17 | 16 | 1 | 1 | 13 | 3 | 9 |
| 3.3 | 18 | 16 | 2 | 4 | 13 | 3 | 9 |

The squared within-group deviations sum to SSW = 28. The squared between-group deviations sum to SSB = 54.

**CPE 9.2.1 - ANOVA summary table**

Divide MSB by MSW to calculate F.

| Source | SS | df | MS | F | p |
|---|---|---|---|---|---|
| Between Groups (Stress level) | 54 | 2 | 27 | 5.78 | ? |
| Within Groups (Error) | 28 | 6 | 4.67 | | |

We need to look at the F table to determine significance.

**Ratio of mean squares = F ratio**

• Mean square between groups (the numerator): possibly affected by the independent variable.
• Mean square within groups (the denominator): not affected by the independent variable.

If the independent variable causes differences between the group means, then MSB will be larger than MSW. If the effect is large enough and/or there are enough degrees of freedom, the result may be a statistically significant F ratio.

**The F Test**

• The null predicts that we will find an F ratio close to 1.00, not an unusually large F ratio.
• The F table tells us whether the F ratio is significant.
• p < .05 means that we have found an F ratio so large that it would occur in 5 or fewer samples in 100 when the null is true. If we find an F ratio that much larger than the null predicts, we have shown H0 to predict badly and we reject it.
• Results are statistically significant when you equal or exceed the critical value of F at p < .05.

**Critical values in the F table**

• The critical values in the F table depend on how good MSB and MSW are as estimates of sigma².
• The better the estimates, the closer to 1.00 the null must predict that their ratio will fall.
• What makes estimates better? DEGREES OF FREEDOM. Each degree of freedom corrects the sample statistic back towards its population parameter.
• Thus, the more degrees of freedom for MSW and MSB, the closer the critical value of F will be to 1.00.

**Using the F table**

So, to use the F table, you must specify the degrees of freedom (df) for the numerator and denominator of the F ratio. In both Ch 9 and Ch 10 the denominator is MSW. As you know, dfW = n-k.
In Ch 9, the numerator is MSB and dfB = k-1.

**Degrees of freedom in Numerator**

| Df in denominator | alpha | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 3 | .05 | 10.13 | 9.55 | 9.28 | 9.12 | 9.01 | 8.94 | 8.88 | 8.84 |
| 3 | .01 | 34.12 | 30.82 | 29.46 | 28.71 | 28.24 | 27.91 | 27.67 | 27.49 |
| 4 | .05 | 7.71 | 6.94 | 6.59 | 6.39 | 6.26 | 6.16 | 6.09 | 6.04 |
| 4 | .01 | 21.20 | 18.00 | 16.69 | 15.98 | 15.52 | 15.21 | 14.98 | 14.80 |
| 5 | .05 | 6.61 | 5.79 | 5.41 | 5.19 | 5.05 | 4.95 | 4.88 | 4.82 |
| 5 | .01 | 16.26 | 13.27 | 12.06 | 11.39 | 10.97 | 10.67 | 10.45 | 10.27 |
| 6 | .05 | 5.99 | 5.14 | 4.76 | 4.53 | 4.39 | 4.28 | 4.21 | 4.15 |
| 6 | .01 | 13.74 | 10.92 | 9.78 | 9.15 | 8.75 | 8.47 | 8.26 | 8.10 |
| 7 | .05 | 5.59 | 4.74 | 4.35 | 4.12 | 3.97 | 3.87 | 3.79 | 3.73 |
| 7 | .01 | 12.25 | 9.55 | 8.45 | 7.85 | 7.46 | 7.19 | 7.00 | 6.84 |
| 8 | .05 | 5.32 | 4.46 | 4.07 | 3.84 | 3.69 | 3.58 | 3.50 | 3.44 |
| 8 | .01 | 11.26 | 8.65 | 7.59 | 7.01 | 6.63 | 6.37 | 6.19 | 6.03 |
| 9 | .05 | 5.12 | 4.26 | 3.86 | 3.63 | 3.48 | 3.37 | 3.29 | 3.23 |
| 9 | .01 | 10.56 | 8.02 | 6.99 | 6.42 | 6.06 | 5.80 | 5.62 | 5.47 |

**Degrees of freedom in Numerator**

| Df in denominator | alpha | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 36 | .05 | 4.11 | 3.26 | 2.86 | 2.63 | 2.48 | 2.36 | 2.28 | 2.21 |
| 36 | .01 | 7.39 | 5.25 | 4.38 | 3.89 | 3.58 | 3.35 | 3.18 | 3.04 |
| 40 | .05 | 4.08 | 3.23 | 2.84 | 2.61 | 2.45 | 2.34 | 2.26 | 2.19 |
| 40 | .01 | 7.31 | 5.18 | 4.31 | 3.83 | 3.51 | 3.29 | 3.12 | 2.99 |
| 60 | .05 | 4.00 | 3.15 | 2.76 | 2.52 | 2.37 | 2.25 | 2.17 | 2.10 |
| 60 | .01 | 7.08 | 4.98 | 4.13 | 3.65 | 3.34 | 3.12 | 2.95 | 2.82 |
| 100 | .05 | 3.94 | 3.09 | 2.70 | 2.46 | 2.30 | 2.19 | 2.10 | 2.03 |
| 100 | .01 | 6.90 | 4.82 | 3.98 | 3.51 | 3.20 | 2.99 | 2.82 | 2.69 |
| 400 | .05 | 3.86 | 3.02 | 2.62 | 2.39 | 2.23 | 2.12 | 2.03 | 1.96 |
| 400 | .01 | 6.70 | 4.66 | 3.83 | 3.36 | 3.06 | 2.85 | 2.69 | 2.55 |
| ∞ | .05 | 3.84 | 2.99 | 2.60 | 2.37 | 2.21 | 2.09 | 2.01 | 1.94 |
| ∞ | .01 | 6.64 | 4.60 | 3.78 | 3.32 | 3.02 | 2.80 | 2.64 | 2.51 |

**Reading the F table**

• The degrees of freedom in the numerator (the columns) are related to the number of different treatment groups: k-1. They relate to the mean square between groups.
• The degrees of freedom in the denominator (the rows) are related to the number of subjects: n-k. They relate to the mean square within groups.
• The critical values in the top row of each pair are for alpha = .05.
• The critical values in the bottom row of each pair are for bragging rights (p < .01).
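
The whole procedure for the stress experiment can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `one_way_anova` and the dictionary layout are hypothetical, and the two critical values are simply copied from the F table for 2 and 6 degrees of freedom rather than computed.

```python
# Scores (ounces consumed) from the stress experiment, keyed by stress level.
groups = {
    "no stress": [8, 10, 12],
    "moderate stress": [12, 12, 15],
    "high stress": [13, 17, 18],
}


def one_way_anova(groups):
    """Single factor, unrelated groups ANOVA (hypothetical helper)."""
    scores = [x for g in groups.values() for x in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n

    ssw = 0.0  # squared distances of scores from their own group mean
    ssb = 0.0  # squared distances of group means from the overall mean
    for g in groups.values():
        group_mean = sum(g) / len(g)
        ssw += sum((x - group_mean) ** 2 for x in g)
        ssb += len(g) * (group_mean - grand_mean) ** 2

    df_b, df_w = k - 1, n - k
    ms_b, ms_w = ssb / df_b, ssw / df_w
    return {"SSB": ssb, "SSW": ssw, "dfB": df_b, "dfW": df_w,
            "MSB": ms_b, "MSW": ms_w, "F": ms_b / ms_w}


result = one_way_anova(groups)

# Critical values read from the F table for dfB = 2 and dfW = 6.
F_CRIT_05, F_CRIT_01 = 5.14, 10.92

print(result)
print("significant at .05:", result["F"] >= F_CRIT_05)
print("significant at .01:", result["F"] >= F_CRIT_01)
```

For these nine scores the sketch gives SSB = 54 and SSW = 28, so MSB = 27 and MSW is about 4.67. Dividing by the unrounded MSW gives an F of about 5.79; dividing by 4.67 gives the 5.78 shown in the summary table. Either way F exceeds 5.14 but not 10.92, so the result is significant at .05 but not at .01.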