Chapter 11

Chapter 11 Tukey’s Honestly Significant Difference

Other stuff to come
Descriptive Statistics
• Ch 1. The Mean, the Number of Observations, and the Standard Deviation: N, populations, parameters; measures of central tendency (median, mode, mean); measures of variability (range, sum of squares, variance, standard deviation)
• Ch 2. Frequency Distributions and Histograms: frequency distributions; bar graphs and histograms; continuous vs. discrete variables
• Ch 3. The Normal Curve: Z scores and percentiles; least squares; unbiased estimates
• Ch 4. Translating To and From Z Scores: normal scores; scale scores; raw scores; percentiles
Inferential Statistics: Parametric
• Ch 5. Inferential Statistics: random samples; estimating population statistics; correlation of two variables; experimental methods; standard error of the mean
• Ch 6. t Scores and t Curves: estimates of Z scores; computing t scores; critical values; degrees of freedom
• Ch 7. Correlation: variable relationships (linearity, direction, strength); the correlation coefficient; scatter plots; best-fitting lines
• Ch 8. Regression: predicting with the regression equation; generalizing and the null hypothesis; degrees of freedom and statistical significance
• Ch 9. Experimental Studies: independent and dependent variables; the experimental hypothesis; the F test and the t test
• Ch 10. Two-Way Factorial Analysis of Variance: three null hypotheses; graphing the means; factorial designs
• Ch 11. Tukey’s Honestly Significant Difference: testing differences in group means; alpha for the whole experiment; HSD (Honestly Significant Difference)
• Ch 12. Power Analysis: Type 1 error and alpha; Type 2 error and beta; how many subjects do you need?
• Ch 13. Assumptions Underlying Parametric Statistics: sample means form a normal curve; subjects are randomly selected from the population; homogeneity of variance; experimental error is random across samples
Non Parametric
• Ch 14. Chi Square: nominal data

Objectives
At the end of this chapter you will understand how to tell which group means are significantly different from each other. You will be able to:
• Calculate the number of pairwise comparisons for group means.
• Calculate the harmonic mean.
• Look up the q value in a table.
• Calculate Tukey’s HSD.
• Determine which means are different and which are not.

Comparing Experimental Group Means

Means and Significance
• A significant F test can tell us that there is a main effect or an interaction effect.
• When that happens, we know that there is a difference among the means that is unlikely to have occurred just by chance.
• But which means?
• The only ones we can be sure are significantly different from each other are the highest and lowest means. This is not a problem with the t test, because there are only two means.
• But when there are more than two means, what can we say about the relationships among the means other than the highest and lowest?

WHY WE NEED TUKEY’S HSD: Multiple Pairwise Comparisons

Multiple comparisons
• With 3 groups in a study, comparing every possible pair of the three means requires three comparisons.
• With four groups there are six comparisons, with five groups there are ten, and so on.
• We set alpha at .05 on the assumption that we were doing only one t test.
• What are the odds of finding significance on one or more of 6, 10, or more t tests purely because of sampling fluctuation? They quickly become much higher than .05.

Example: here are the (4)(3)/2 = 6 comparisons for 4 groups
Group # vs. Group #
1 vs. 2
1 vs. 3
1 vs. 4
2 vs. 3
2 vs. 4
3 vs. 4

Here is the formula for the number of possible pairwise comparisons
• Number of possible pairwise comparisons = k(k - 1)/2, where k is the number of groups.
• In words: the number of groups times (the number of groups minus one), divided by 2.
• 6 groups: (6)(6 - 1)/2 = (6)(5)/2 = 30/2 = 15
• 7 groups: (7)(7 - 1)/2 = (7)(6)/2 = 42/2 = 21
• Etc.
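Both the formula and the list of pairs on the previous slide can be generated in code. A minimal Python sketch (the function names are illustrative, not from any library):

```python
from itertools import combinations


def num_pairwise(k: int) -> int:
    """Number of distinct pairs of group means: k(k - 1)/2."""
    return k * (k - 1) // 2


def list_pairs(k: int) -> list[tuple[int, int]]:
    """Every pairwise comparison among groups numbered 1..k."""
    return list(combinations(range(1, k + 1), 2))


for k in (3, 4, 5, 6, 7):
    print(f"{k} groups: {num_pairwise(k)} comparisons")

# The six comparisons for 4 groups, in the same order as the slide
print(list_pairs(4))
```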

We need a fancy t test
• The rule is "t for 2": when you want to compare 2 groups to determine whether they are significantly different, use the t test.
• But here we need a t test designed for multiple comparisons, with alpha adjusted so that experimentwise alpha = .05.
• It should make it easy to compare all the groups in any specific experiment, no matter how many there are.

Keeping experimentwise alpha at .05 is critical
• Significance tests exist to minimize Type 1 error (rejecting a true null hypothesis).
• So we don’t want to say that the difference between two group means is significant (and, therefore, shows how the independent variable would affect the whole population) when the groups differ only because of sampling fluctuation.
• We must keep the odds on making that kind of mistake at 5 in 100, no matter how many comparisons we make.

What are the odds on failing to reject the null when H0 is true and you make three pairwise comparisons?
• In an ordinary t test, the odds on obtaining a strange sample and getting statistical significance when H0 is true are set, by convention, at .05 (5 in 100) for each comparison.
• The odds on not rejecting a null hypothesis that is correct are therefore 95 in 100 (.95).
• But the odds on not rejecting 3 correct nulls in a row, using simple t tests, are (.95)(.95)(.95) = (.95)^N = (.95)^3 = .8574, where N is the number of comparisons.
• The odds on at least one of the three comparisons being significant when all three nulls are true are 1 - .8574 = .1426, or almost exactly 1 out of 7, not 1 out of 20.
• In a three-group study there are 3 pairwise comparisons.
• If we do three comparisons and don’t adjust alpha for each one, the odds on sampling fluctuation (mis)leading us to find a significant difference between at least one of the 3 pairs are over 14 in 100, not 5 in 100.

When there are more comparisons, the odds on a Type 1 error get much higher!
• Say we have four groups; that yields (4 × 3)/2 = 6 pairwise comparisons.
• If we do 6 t tests with alpha at .05 for each, we have 95 chances in 100 of properly failing to reject the null (and retaining it) each time.
• But the odds on properly retaining it all 6 times are (.95)^6 = .95 × .95 × .95 × .95 × .95 × .95 = .735.
• So there is a 1 - .735 = .265 = 26.5% chance of committing at least one Type 1 error by rejecting the null hypothesis when the only reason two groups differ is random sampling fluctuation.

Five groups: 10 pairwise comparisons
• Let’s say that we had 5 groups in a study.
• That gives us (5)(4)/2 = 10 possible pairwise comparisons.
• Assume H0 is true.
• With 10 comparisons, the chance that none of the 10 comparisons is statistically significant is (.95)^10 = .5987.
• So the odds on at least one significant finding (even though H0 is true) are 1 - .5987 = .4013, or over 40%.
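The arithmetic on the last three slides can be reproduced in a few lines. This is a minimal Python sketch; like the slides, it treats the comparisons as independent, which is an approximation, since pairwise comparisons that share a group are somewhat correlated.

```python
def familywise_error(alpha: float, m: int) -> float:
    """P(at least one Type 1 error) over m independent tests, each run at level alpha."""
    return 1 - (1 - alpha) ** m


# (number of groups, number of pairwise comparisons)
for k, m in [(3, 3), (4, 6), (5, 10)]:
    print(f"{k} groups, {m} comparisons: {familywise_error(0.05, m):.4f}")
```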

We have to lower alpha for each comparison to keep experimentwise alpha at .05
• To keep alpha at .05 when all the comparisons are considered together, we must lower alpha (quite a bit) for each comparison.
• For example, with 5 groups, we need to set alpha at a little more than .005 (.005116) for each of the 10 comparisons.
• Taken together, those 10 comparisons then yield an experimentwise alpha of .05:
• (1 - .005116)^10 = (.994884)^10 = .950000, and 1 - .950000 = .05.

Think of it this way.
• You must pass a true/false test about tensor calculus. By choosing True or False at random, you have a 5% chance of passing the test each time you take it.
• Which would you prefer: taking the test once, or as many times as you wish?
• You know intuitively that if you can retake it as often as you like, you will pass eventually just by chance.

It’s the same thing with the t test
• Compare 2 groups that differ only by chance and you have only a 5% chance of making a Type 1 error.
• Make lots of comparisons and, sooner or later, you will make a Type 1 error simply by chance.
• The solution: change the alpha level, and thus the critical value of t, on each test so that there is only a 5% chance of getting any Type 1 errors across all the comparisons you have to make.

It’s like dividing .05 by the number of comparisons
• When there are three groups (and 3 comparisons), it’s almost as if you divide .05 by 3 and set the critical value of t so that a proportion of .05/3 = .0167 (1.67%) stays in the tails for each t test.
• That means you are creating a 1.000 - .0167 = .9833 (98.33%) confidence interval that is consistent with the null hypothesis for each t test.
• Then you have 5% altogether for the 3 tests.
• The actual values for the confidence interval are slightly different from that, but it’s close.

More than 3 groups? Divide by the number of comparisons.
• Four groups = 6 comparisons: the critical value for t leaves about .05/6 ≈ .0083 in the tails and about 1 - .0083 = .9917 in the body.
• Five groups = 10 comparisons: the critical value for t leaves about .05/10 = .0050 in the tails and about 1 - .005 = .995 in the body.
• (The actual values involve the nth root of .95, so they are very slightly different from the values above: e.g., .994884 in the body for 5 groups and 10 comparisons instead of .995000. But dividing .05 by the number of comparisons is a "good enough" way to think about it.)
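A sketch comparing the two ways of setting the per-comparison alpha described above: dividing .05 by the number of comparisons (usually called the Bonferroni correction) and the exact nth-root version (usually called the Šidák correction). Neither is the Tukey procedure itself, which uses the q table; `bonferroni_alpha` and `sidak_alpha` are illustrative names.

```python
def bonferroni_alpha(alpha_ew: float, m: int) -> float:
    """'Good enough' version from the slides: experimentwise alpha divided by m."""
    return alpha_ew / m


def sidak_alpha(alpha_ew: float, m: int) -> float:
    """Exact per-comparison alpha for m independent tests: 1 - (1 - alpha)^(1/m)."""
    return 1 - (1 - alpha_ew) ** (1 / m)


for k, m in [(3, 3), (4, 6), (5, 10)]:
    b = bonferroni_alpha(0.05, m)
    s = sidak_alpha(0.05, m)
    print(f"{k} groups ({m} comparisons): .05/m = {b:.6f}, nth-root = {s:.6f}")
```

The nth-root value is always a little larger than .05/m, which is why dividing is a slightly conservative shortcut.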

Summary: We have to lower alpha for each individual t test to keep experimentwise alpha at .05
• If we kept alpha for each individual t test at .05 and then made 10, 20, or 30 comparisons between pairs of means, we would almost certainly get at least one, and possibly more, Type 1 errors.
• That is, we would get statistically significant findings that would force us to say that two treatments would differ in their effects in the population as a whole, when that isn’t true.
• So we must lower alpha for each comparison to get an experimentwise alpha of .05.

The q table and the Tukey test
• We could find the correct critical values for t with a lot of work and a very lengthy t table.
• Fortunately, someone did it for us: a man named Tukey. He gave us the Tukey test and the q table.
• The q table is a fancy t table: each value of q equals the proper critical value for t, corrected for the number of comparisons to be made, and then multiplied by the square root of 2 (1.414) to make the equations simpler.
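The √2 relationship can be checked directly for k = 2, where there is only one comparison and no correction is needed: q should equal the ordinary two-tailed critical t at .05 times √2. The critical t values below are typed in from a standard t table (they are not given in these slides), so treat them as assumed inputs.

```python
import math

# Two-tailed critical t at alpha = .05, from a standard t table (assumed inputs).
T_CRIT_05 = {5: 2.5706, 10: 2.2281, 20: 2.0860, 30: 2.0423, 60: 2.0003, 120: 1.9799}

# k = 2 column of this chapter's q table for alpha = .05, keyed by dfW.
Q_K2 = {5: 3.64, 10: 3.15, 20: 2.95, 30: 2.89, 60: 2.83, 120: 2.80}

for df, t in T_CRIT_05.items():
    q_from_t = t * math.sqrt(2)  # critical t times the square root of 2
    print(f"dfW = {df:>3}: t * sqrt(2) = {q_from_t:.3f}, table q = {Q_K2[df]:.2f}")
```

For k > 2, q comes from the studentized range distribution rather than from a simple adjustment to t. If SciPy 1.7 or later is available, `scipy.stats.studentized_range.ppf(0.95, k, df)` should reproduce the table values.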

HSD

Tukey’s Honestly Significant Difference
• An HSD is the minimum difference between two means that can be deemed statistically significant while keeping the experimentwise alpha at .05.
• Any two means separated by this amount or more are significantly different from each other.
• Any two means separated by less than this amount cannot be considered significantly different.

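Calculating HSD
• HSD = q × √(MSW / n_h)
• q: look up in a table based on dfW and k.
• MSW: the mean square within groups from the ANOVA.
• n_h: the harmonic mean, a special kind of average of the number of subjects in each group (not the ordinary arithmetic mean): n_h = k / (1/n1 + 1/n2 + ... + 1/nk).
• Remember that this is a post hoc comparison; therefore, we have already calculated MSW, computed the ANOVA, and found a statistically significant F ratio.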

Oh no!! My rat died! What is going to happen to my experiment?
Calculating the harmonic mean: notice that this technique allows different numbers of subjects in each group.

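When the groups are the same size, the harmonic mean and the ordinary mean number of participants are the same.
• Example: 3 groups, 4 subjects each. Harmonic mean = 3 / (1/4 + 1/4 + 1/4) = 3 / .75 = 4; ordinary mean = 12/3 = 4.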

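When groups do not have equal numbers, the harmonic mean is smaller than the ordinary mean.
• Example: 4 groups with 6, 4, 8, and 4 participants.
• Ordinary mean = 22/4 = 5.5 participants per group.
• Harmonic mean = 4 / (1/6 + 1/4 + 1/8 + 1/4) = 4 / .7917 ≈ 5.05 participants per group.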
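Both harmonic mean examples can be checked with a small helper. `harmonic_mean_n` is an illustrative name; Python’s standard library also provides `statistics.harmonic_mean`, which should give the same result for positive group sizes.

```python
def harmonic_mean_n(group_sizes: list[int]) -> float:
    """k divided by the sum of the reciprocals of the group sizes."""
    k = len(group_sizes)
    return k / sum(1 / n for n in group_sizes)


equal = [4, 4, 4]
unequal = [6, 4, 8, 4]

for sizes in (equal, unequal):
    ordinary = sum(sizes) / len(sizes)
    print(f"{sizes}: harmonic = {harmonic_mean_n(sizes):.4f}, ordinary = {ordinary:.2f}")
```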

Calculating HSD: finding q
• Look up q in a table based on dfW and k.

q table for α = .05 (number of groups k across the top; dfW down the left)
dfW  2     3     4     5     6     7     8     9     10
5    3.64  4.60  5.22  5.67  6.03  6.33  6.58  6.80  6.99
6    3.46  4.34  4.90  5.30  5.63  5.90  6.12  6.32  6.49
7    3.34  4.16  4.68  5.06  5.36  5.61  5.82  6.00  6.16
8    3.26  4.04  4.53  4.89  5.17  5.40  5.60  5.77  5.92
9    3.20  3.95  4.41  4.76  5.02  5.24  5.43  5.59  5.74
10   3.15  3.88  4.33  4.65  4.91  5.12  5.30  5.46  5.60
11   3.11  3.82  4.26  4.57  4.82  5.03  5.20  5.35  5.49
12   3.08  3.77  4.20  4.51  4.75  4.95  5.12  5.27  5.39
13   3.06  3.73  4.15  4.45  4.69  4.88  5.05  5.19  5.32
14   3.03  3.70  4.11  4.41  4.64  4.83  4.99  5.13  5.25
15   3.01  3.67  4.08  4.37  4.59  4.78  4.94  5.08  5.20
16   3.00  3.65  4.05  4.33  4.56  4.74  4.90  5.03  5.15
17   2.98  3.63  4.02  4.30  4.52  4.70  4.86  4.99  5.11
18   2.97  3.61  4.00  4.28  4.49  4.67  4.82  4.96  5.07
19   2.96  3.59  3.98  4.25  4.47  4.65  4.79  4.92  5.04
20   2.95  3.58  3.96  4.23  4.45  4.62  4.77  4.90  5.01
24   2.92  3.53  3.90  4.17  4.37  4.54  4.68  4.81  4.92
30   2.89  3.49  3.85  4.10  4.30  4.46  4.60  4.72  4.82
40   2.86  3.44  3.79  4.04  4.23  4.39  4.52  4.63  4.73
60   2.83  3.40  3.74  3.98  4.16  4.31  4.44  4.55  4.65
120  2.80  3.36  3.68  3.92  4.10  4.24  4.36  4.47  4.56
500  2.77  3.31  3.63  3.86  4.03  4.17  4.29  4.39  4.47

The table in the book has bad values.
• Number of groups (means), k, across the top.
• dfW (n - k) down the left.
• There is a whole other table for α = .01.

Two examples
• Effects of alcohol
• Vitamin B in various teas

A rat in group 3 died! Ethanol and minutes of REM sleep
• MSW = 65, k = 4
• Originally n = 16 (4 rats per group); after the death, n = 15 (3 rats in group 3)
• Means: 0 g/kg: 79.28 min; 1 g/kg: 61.54 min; 2 g/kg: 47.92 min; 3 g/kg: 32.76 min
• dfW = n - k = 15 - 4 = 11

Finding q for the ethanol example (k = 4, dfW = 11): read across the dfW = 11 row of the α = .05 q table to the k = 4 column. q = 4.26.

Harmonic Mean

Ethanol and sleep
• MSW = 65, k = 4, n = 15, dfW = n - k = 15 - 4 = 11
• Means: 0 g/kg: 79.28 min; 1 g/kg: 61.54 min; 2 g/kg: 47.92 min; 3 g/kg: 32.76 min
• q = 4.26
• Harmonic mean n_h = 4 / (1/4 + 1/4 + 1/3 + 1/4) ≈ 3.692, so HSD = 4.26 × √(65 / 3.692) ≈ 17.87.
• Means as far apart as 17.87 or farther represent a significant difference and can be generalized.
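The ethanol HSD can be reproduced from the group sizes, MSW, and the q value read from the table. This sketch implements HSD = q × √(MSW / n_h) with the harmonic mean, which matches the slide’s numbers; the function names are illustrative. (Another common unequal-n variant, Tukey–Kramer, uses the two groups’ own sizes for each pair instead of one harmonic mean.)

```python
import math


def harmonic_mean_n(group_sizes: list[int]) -> float:
    """k divided by the sum of the reciprocals of the group sizes."""
    return len(group_sizes) / sum(1 / n for n in group_sizes)


def tukey_hsd(q: float, ms_within: float, n_h: float) -> float:
    """Minimum difference between two means that counts as significant."""
    return q * math.sqrt(ms_within / n_h)


# Ethanol example: group 3 (2 g/kg) lost one rat, so the sizes are 4, 4, 3, 4.
sizes = [4, 4, 3, 4]
df_within = sum(sizes) - len(sizes)  # n - k = 15 - 4 = 11
n_h = harmonic_mean_n(sizes)
hsd = tukey_hsd(q=4.26, ms_within=65, n_h=n_h)  # q for k = 4, dfW = 11
print(f"dfW = {df_within}, n_h = {n_h:.4f}, HSD = {hsd:.2f}")
```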

Ethanol and Sleep – the six comparisons (HSD = 17.87)
Comparison                         Difference  p
0 g/kg (79.28) vs. 1 g/kg (61.54)  17.74       n.s.
0 g/kg (79.28) vs. 2 g/kg (47.92)  31.36       .05
0 g/kg (79.28) vs. 3 g/kg (32.76)  46.52       .05
1 g/kg (61.54) vs. 2 g/kg (47.92)  13.62       n.s.
1 g/kg (61.54) vs. 3 g/kg (32.76)  28.78       .05
2 g/kg (47.92) vs. 3 g/kg (32.76)  15.16       n.s.

Ethanol and Sleep Conclusion
• 2 and 3 g/kg of ethanol interrupted sleep significantly more than no ethanol.
• Also, 3 g/kg of ethanol interrupted sleep significantly more than 1 g/kg of ethanol.
• No adjoining doses differed significantly (0 vs. 1, 1 vs. 2, 2 vs. 3: all n.s.).
Summary: 0 vs. 1 n.s.; 0 vs. 2 .05; 0 vs. 3 .05; 1 vs. 2 n.s.; 1 vs. 3 .05; 2 vs. 3 n.s.

Tea Example
• Means: Brand A: 8.27 ml; Brand B: 7.50 ml; Brand C: 6.15 ml; Brand D: 6.00 ml; Brand E: 5.82 ml
• MSW = 1.51, k = 5
• n = 50 (10 per group)
• dfW = n - k = 50 - 5 = 45

Finding q for the tea example (k = 5, dfW = 45): the α = .05 q table has no row for dfW = 45, so use the next smaller df listed, 40 (or interpolate). That gives q = 4.04.
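A sketch of the table lookup, including the rule of dropping to the next smaller tabled df when dfW is not listed (this choice is conservative, because q shrinks as df grows). Only an excerpt of the α = .05 table is typed in, and `Q_TABLE_05` and `lookup_q` are illustrative names, not part of any library.

```python
import bisect

# Excerpt of the alpha = .05 q table: dfW -> {k: q}
Q_TABLE_05 = {
    10: {2: 3.15, 3: 3.88, 4: 4.33, 5: 4.65},
    11: {2: 3.11, 3: 3.82, 4: 4.26, 5: 4.57},
    12: {2: 3.08, 3: 3.77, 4: 4.20, 5: 4.51},
    30: {2: 2.89, 3: 3.49, 4: 3.85, 5: 4.10},
    40: {2: 2.86, 3: 3.44, 4: 3.79, 5: 4.04},
    60: {2: 2.83, 3: 3.40, 4: 3.74, 5: 3.98},
}


def lookup_q(k: int, df_within: int) -> float:
    """Return q for k groups from the largest tabled df that is <= df_within."""
    rows = sorted(Q_TABLE_05)
    i = bisect.bisect_right(rows, df_within) - 1
    if i < 0:
        raise ValueError(f"dfW={df_within} is below the smallest tabled df ({rows[0]})")
    return Q_TABLE_05[rows[i]][k]


print(lookup_q(4, 11))  # ethanol example: exact row, 4.26
print(lookup_q(5, 45))  # tea example: no row for 45, so the df = 40 row gives 4.04
```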

Harmonic Mean - Tea

Tea Example – amount of vitamin B present in various cups of tea (10 cups in each group)
• MSW = 1.51, k = 5, n = 50 (10 per group), dfW = n - k = 50 - 5 = 45
• Means: Brand A: 8.27 ml; Brand B: 7.50 ml; Brand C: 6.15 ml; Brand D: 6.00 ml; Brand E: 5.82 ml
• q = 4.04
• With equal groups the harmonic mean is simply 10, so HSD = 4.04 × √(1.51 / 10) ≈ 1.57.
• Means as far apart as 1.57 or farther represent an honestly significant difference.

Tea Example – the ten comparisons (HSD = 1.57)
Brand vs. Brand        Difference  p
A (8.27) vs. B (7.50)  0.77        n.s.
A (8.27) vs. C (6.15)  2.12        .05
A (8.27) vs. D (6.00)  2.27        .05
A (8.27) vs. E (5.82)  2.45        .05
B (7.50) vs. C (6.15)  1.35        n.s.
B (7.50) vs. D (6.00)  1.50        n.s.
B (7.50) vs. E (5.82)  1.68        .05
C (6.15) vs. D (6.00)  0.15        n.s.
C (6.15) vs. E (5.82)  0.33        n.s.
D (6.00) vs. E (5.82)  0.18        n.s.
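The ten tea comparisons can be generated mechanically: list every pair, take the absolute difference, and compare it with the HSD. This sketch hard-codes the HSD of 1.57 from the slide; `compare_all` is an illustrative helper name. Differences are rounded to two decimals, as on the slide, so that floating-point noise cannot flip a borderline comparison.

```python
from itertools import combinations

means = {"A": 8.27, "B": 7.50, "C": 6.15, "D": 6.00, "E": 5.82}
HSD = 1.57  # from the slide: q = 4.04, MSW = 1.51, 10 cups per brand


def compare_all(group_means: dict[str, float], hsd: float) -> list[tuple[str, str, float, bool]]:
    """Return (group1, group2, |difference|, significant?) for every pair of groups."""
    results = []
    for g1, g2 in combinations(group_means, 2):
        diff = round(abs(group_means[g1] - group_means[g2]), 2)
        results.append((g1, g2, diff, diff >= hsd))  # "as far apart as HSD or farther"
    return results


for g1, g2, diff, sig in compare_all(means, HSD):
    print(f"{g1} vs {g2}: {diff:.2f}  {'.05' if sig else 'n.s.'}")
```

The same function applied to the four ethanol means with HSD = 17.87 would flag 0 vs. 2, 0 vs. 3, and 1 vs. 3, matching the ethanol table.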

Tea Conclusion
Summary: A vs. B n.s.; A vs. C .05; A vs. D .05; A vs. E .05; B vs. C n.s.; B vs. D n.s.; B vs. E .05; C vs. D n.s.; C vs. E n.s.; D vs. E n.s.
• Brand A has significantly more nutritional value, as measured by amount of vitamin B, than Brands C, D, and E.
• Brand B has significantly more vitamin B than Brand E.
• No other brands differed significantly in nutritional value.
