
Power and Effect Size


Presentation Transcript


  1. Power and Effect Size

  2. Previous Weeks • A few weeks ago I made a small chart outlining all the different statistical tests we’ve covered (week 9) • I want to complete that chart using information from the past week • Most of this is a repeat – but a few new tests have been added • It is important that you are familiar with these tests, know when they are appropriate to use, and know how to run (most of) them in SPSS • You are excused from running ANCOVA and RM ANOVA

  3. When to use specific statistical tests…

  4. Tonight… • A break from learning a new statistical ‘test’ • Focus will be on two critical statistical ‘concepts’ • Statistical Power • Related to Alpha/Statistical Significance • Brief overview of Effect Size • Statistically significant results vs Meaningful results • First, a quick review of error in testing…

  5. Example Hypothesis • Pretend my masters thesis topic is the influence of exercise on body composition • I believe people that exercise more, will have lower %BF • To study this: • I draw a sample and group subjects by how much they exercise –High and Low Exercise Groups (this is my IV) • I also assess %BF in each subject as a continuous variable (DV) • I plan to see if the two groups have different mean %BF • My hypotheses (HO and HA): • HA: There is a difference in %BF between the groups • HO: There is not a difference in %BF between the groups

  6. Example Continued • Now I’m going to run my statistical test, get my test statistic, and calculate a p-value • I’ve set alpha at the standard 0.05 level • By the way, what statistical test should I use…? • My final decision on my hypotheses is going to be based on that p-value: • I could reject the null hypothesis (accept HA) • I could accept the null hypothesis (reject HA)
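
  With two independent groups (the IV) and a continuous DV, an independent-samples t-test fits this design. A minimal sketch of that analysis in Python (the course itself uses SPSS; the %BF values below are hypothetical, invented numbers):

```python
# A minimal sketch of the example analysis (hypothetical data).
from scipy import stats

high_exercise_bf = [18, 21, 24, 20, 25, 22, 23, 21]  # %BF, high exercisers
low_exercise_bf = [24, 27, 25, 29, 23, 28, 26, 26]   # %BF, low exercisers

# Independent-samples t-test: two separate groups, one continuous DV
t_stat, p_value = stats.ttest_ind(high_exercise_bf, low_exercise_bf)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```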

  7. Statistical Errors… • Since there are two potential decisions (and only one of them can be correct), there are two possible errors I can make: • Type I Error • We could reject the null hypothesis although it was really true (should have accepted null) • Type II Error • We could fail to reject the null hypothesis when it was really untrue (should have rejected null)

  8. HA: There is a difference in %BF between the groups • HO: There is not a difference in %BF between the groups

  9. Statistical Errors… • Remember – My final decision is based on the p-value

  10. If p ≤ 0.05, our decision is reject HO • If p > 0.05, our decision is accept HO
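
  Written out explicitly as a tiny sketch (alpha and p here are placeholders for whatever your own analysis produces):

```python
# The slide's decision rule, written out explicitly.
alpha = 0.05
p = 0.08  # e.g., the p-value from the thesis example

if p <= alpha:
    print("Reject HO - the difference is statistically significant")
else:
    print("Accept HO - cannot rule out random sampling error")
```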

  11. Statistical Errors… • In my analysis, I find: • High Exercise Group mean %BF = 22% • Low Exercise Group mean %BF = 26% • p = 0.08 • What is my decision? • Accept HO • There is NOT a difference in %BF between the groups • Why is that my decision, when the means ARE different? • I can’t be confident that the 4% difference between the two groups is not due to random sampling error. Is it possible I’ve made an error in my decision?

  12. Possible Error…? • If I did make an error, what type would it be? • Type II Error • When you find a p-value greater than alpha • The only possible error is Type II error • When you find a p-value less than alpha • The only possible error is Type I error

  13. If p </= 0.05, our decision is reject HO • If p > 0.05, our decision is accept HO

  14. Possible Error…? • Compare Type I and Type II error like this: • The only concern when you find statistical significance (p < 0.05) is Type I Error • Is the difference between groups REAL or due to Random Sampling Error? • Helpfully, the p-value tells you how likely a difference this large would be if the null were true • In other words, the p-value tells you how likely Type I error is • But, does the p-value tell you how likely Type II error is? • No – the probability of Type II error is instead tied to Power

  15. Possible Error…? • Probability of Type II error is provided by Power • Statistical Power = 1 – β, where β is the probability of Type II error • We will not discuss the specific calculation of power in this class • SPSS can calculate this for you • Power is related to Alpha, but: • Alpha is the probability of having Type I error • Lower number is better (i.e., 0.05 vs 0.01 vs 0.001) • Power is the probability of NOT having Type II error • The probability of being right (correctly rejecting the null hypothesis) • Higher number is better (typical goal is 0.80) • Let’s continue this in the context of my ‘thesis’ example
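
  A minimal sketch of what such a power calculation looks like outside SPSS, using statsmodels (the effect size and group size below are hypothetical, chosen only for illustration):

```python
# Power of an independent-samples t-test under assumed conditions.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower().power(effect_size=0.5,  # assumed Cohen's d
                              nobs1=30,         # subjects per group
                              alpha=0.05)
print(f"Power = {power:.2f}")  # probability of correctly rejecting HO
```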

  16. Statistical Errors… • In my analysis, I found: • High Exercise Group mean %BF = 22% • Low Exercise Group mean %BF = 26% • p = 0.08 • Decided to accept the null • What do I do when I don’t find statistical significance? • What happens when the result does not reflect expectations? First, consider the situation

  17. Should it be statistically significant? • The most obvious thing you need to consider is whether you REALLY should have found a statistically significant result • Just because you wanted your test to be significant doesn’t mean it should be • This wouldn’t be Type II error – it would just be the correct decision! • In my example, researchers have shown in several studies that exercise does influence %BF • This result ‘should’ be statistically significant, right? • If the answer is yes, then you need to consider power

  18. In my ‘thesis’ • This result ‘should’ be statistically significant, right? • Probably an issue with Statistical Power • This scenario plays out at least once a year between me and a grad student working on a thesis or research project • How can I increase the chance that I will find statistically significant results? • Why was this analysis not statistically significant? • What can I do to decrease the chance of Type II error? • Several different factors influence power – your ability to detect a true difference

  19. How can I increase Power? • 1) Increase Alpha level • Changing alpha from 0.05 to 0.10 will increase your power (better chance of finding significant results) • Downsides to increasing your alpha level? • This will increase the chance of Type I error! • This is rarely acceptable in practice • Only really an option when working in a new area: • Researchers are unsure of how to measure a new variable • Researchers are unaware of confounders to control for

  20. How can I increase Power? • 2) Increase N • Sample size is directly used when calculating p-values • Including more subjects will increase your chance of finding statistically significant results • Downsides to increasing sample size? • More subjects means more time/money • More subjects is ALWAYS a better option if possible
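
  A sketch of the same relationship turned around: given an assumed (hypothetical) effect size of d = 0.5 and the typical power goal of 0.80, statsmodels can solve for the required group size:

```python
# How many subjects per group are needed to reach 80% power?
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5,
                                          power=0.80,
                                          alpha=0.05)
print(f"About {n_per_group:.0f} subjects per group")  # ~64 for d = 0.5
```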

  21. How can I increase Power? • 3) Use fewer groups/variables (simpler designs) • Related to sample size but different • ‘Use fewer groups’ NOT ‘Use fewer subjects’ • ↑ groups negatively affects your degrees of freedom • Remember, df is calculated from # groups and # subjects • Lots of variables, groups, and interactions make it more difficult to find statistically significant differences • Remember, the whole point of family-wise error rate corrections is to make it harder to find significant results! • Downsides to fewer groups/variables? • Sometimes you NEED to make several comparisons and test for interactions – unavoidable

  22. How can I increase Power? • 4) Measure variables more accurately • If variables are poorly measured (sloppy work, broken equipment, outdated equipment, etc.) this increases measurement error • More measurement error decreases confidence in the result • For example, perhaps I underestimated %BF in my ‘low exercise’ group? This could lead to Type II Error. • More of an internal validity problem than a statistical problem • Downsides to measuring more accurately? • None – if you can afford the best tools

  23. How can I increase Power? • 5) Decrease subject variability • Subjects will have various characteristics that may also be correlated with your variables • SES, sex, race/ethnicity, age, etc… • These variables can confound your results, making it harder to find statistically significant results • When planning your sample (to enhance power), select subjects that are very similar to each other • This is a reason why repeated measures tests and paired samples are more likely to have statistically significant results • Downside to decreasing subject variability? • Will decrease your external validity – generalizability • If you only test women, your results do not apply to men

  24. How can I increase Power? • 6) Increase magnitude of the mean difference • If your groups are not different enough, make them more different! • For example, instead of measuring just high and low exercisers, perhaps I compare marathon runners vs completely sedentary people? • Compare a ‘very’ high exercise group to a ‘very’ low exercise group • Sampling at the extremes, getting rid of the middle group • Downsides to using the extremes? • Similar to decreasing subject variability, this will decrease your external validity. Questions on Power/Increasing Power?

  25. The Catch-22 of Power and P-values • I’ve mentioned this previously – but once you are able to draw a large sample, this will ruin the utility of p-values/statistical significance • The larger your sample, the more likely you’ll find statistically significant results • Sometimes minuscule differences between groups or tiny correlations are ‘significant’ • This becomes relevant once sample size grows to 100–150 subjects per group • Once you approach 1000 subjects, it’s hard not to find p < 0.05 • Example from the most highly cited paper in Psych, 2004…
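
  A small sketch of this catch-22, holding the correlation fixed at r = 0.10 (to mirror the upcoming scatterplot) and using the standard t-transform of a Pearson correlation:

```python
# The p-value for one and the same correlation shrinks as N grows,
# via t = r * sqrt(n - 2) / sqrt(1 - r^2) for a Pearson correlation.
from math import sqrt
from scipy import stats

r = 0.10
for n in (30, 150, 1000):
    t = r * sqrt(n - 2) / sqrt(1 - r**2)
    p = 2 * stats.t.sf(t, df=n - 2)  # two-tailed p-value
    print(f"n = {n:4d}: p = {p:.4f}")
# The correlation (the effect size) never changes; only the p-value does.
```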

  26. This paper was the first to find a link between playing video games/TV and aggression in children: • Every correlation in this table except 1 has p < 0.05 • Do you remember what a correlation of 0.10 looks like?

  27. r = 0.10. Do you see a relationship between these two variables?

  28. What now? • This realization has led scientists to begin to avoid p-values (or at least avoid reporting only p-values) • Moving towards reporting 95% confidence intervals • Especially in areas of research where large samples are common (epidemiology, psychology, sociology, etc.) • Some people interpret ‘statistically significant’ as meaning ‘important’ • We’ve mentioned several times this is NOT true • Statistically significant just means the result is likely not Type I error • You can have ‘important’ results that aren’t statistically significant

  29. Effect Size • To get an idea of how ‘important’ a difference or association is, we can use Effect Size • There are over 40 different types of effect size • Depends on statistical test used • SPSS will NOT always calculate effect size • Effect size is like a ‘descriptive’ statistic that tells you about the magnitude of the association or group difference • Not impacted by statistical significance • Effect size can stay the same even if p-value changes • Present the two together when possible • The goal is not to teach you how to calculate effect size, but to understand how to interpret it when you see it

  30. Effect Size • Understanding effect size from correlations and regressions is easy (and you already know it): • r2, coefficient of determination • % Variance accounted for • Pearson correlations between %BF and 3 variables: • r = 0.54, r = -0.92, r = 0.70 • Which of the three correlations has the most important association with %BF? • r2 = 0.29, r2 = 0.85, r2 = 0.49
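
  As a quick check, squaring each correlation reproduces the r2 values on the slide:

```python
# r^2 (proportion of variance in %BF accounted for) for each correlation.
for r in (0.54, -0.92, 0.70):
    print(f"r = {r:+.2f}  ->  r^2 = {r * r:.2f}")
# r = -0.92 wins: it accounts for ~85% of the variance in %BF,
# even though the correlation itself is negative.
```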

  31. Interpreting Effect Size • Usually, guidelines are given for interpreting the effect size • Help you to know how important the effect is • Only a guide, you can use your own brain to compare • In general, r2 is interpreted as: • 0.01 or smaller, a Trivial Effect • 0.01 to 0.09, a Small Effect • 0.09 to 0.25, a Moderate Effect • > 0.25, a Large Effect

  32. Effect Size in Regression • Two regression equations each contain 4 predictors of %BF. Each ‘model’ is statistically significant. Here are their r2 values: • 0.29 and 0.15 • Which has the largest effect size? Do either of the regression models have a large effect size? • The 0.29 model is the most important, and has a ‘large’ effect size • The 0.15 model is of ‘moderate’ importance

  33. Effect Size for Group Differences • Effect size in t-tests and ANOVAs is a bit more complicated • In general, effect size is a ratio of the mean difference between two groups to the standard deviation • Does this remind you of anything we’ve previously seen? • Z-score = (Score – Mean)/SD • Effect size, when calculated this way, is basically determining how many standard deviations apart the two groups are • E.g., an effect size of 1 means the two groups differ by 1 standard deviation (this would be a big difference)!
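
  A minimal sketch of this calculation for two independent groups; the pooled standard deviation used here is one common convention for the “SD” in the denominator:

```python
# Cohen's d: mean difference divided by the pooled standard deviation.
import statistics
from math import sqrt

def cohens_d(group1, group2):
    """Mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd
```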

  34. Example • When working with t-tests, calculating effect size by the mean difference/SD is called Cohen’s d • < 0.1 Trivial effect • 0.1-0.3 Small effect • 0.3-0.5 Medium effect • > 0.5 Large effect • The next slide is the result of a repeated measures t-test from a past lecture, we’ll calculate Cohen’s d

  35. Paired-Samples t-test Output • Mean difference = 2.9, Std. Deviation = 5.2 • Cohen’s d = 2.9 / 5.2 ≈ 0.56, a large effect size • Essentially, the weight loss program reduced body weight by just about half a standard deviation

  36. Other example • I sample a group of 100 ISU students and find their average IQ is 103. • Recall, the population mean for IQ is 100, SD = 15. • I run a one-sample t-test and find it to be statistically significant (p < 0.05) • However, the effect size is… • (103 – 100) / 15 = 0.2, a Small Effect • Interpretation: While this difference is likely not due to random sampling error – it’s not very important either
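
  A sketch of the IQ example from the summary statistics alone. One labeled assumption: the sample SD is taken to equal the population SD of 15, since the slide gives no separate sample SD.

```python
# One-sample t and Cohen's d from summary statistics (slide's numbers).
from math import sqrt
from scipy import stats

n, sample_mean = 100, 103
pop_mean, sd = 100, 15  # sample SD assumed equal to population SD

t = (sample_mean - pop_mean) / (sd / sqrt(n))  # = 2.0
p = 2 * stats.t.sf(abs(t), df=n - 1)           # two-tailed, ~0.048 < 0.05
d = (sample_mean - pop_mean) / sd              # = 0.2, a small effect
print(f"t = {t:.2f}, p = {p:.3f}, d = {d:.1f}")
```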

  37. Other types of effect sizes • SPSS will not calculate Cohen’s d for t-tests • However, it will calculate effect size for ANOVAs (if you request it) • Not Cohen’s d, but Partial Eta Squared (η2) • Similar to r2, interpreted the same way (same scale) • Here is last week’s cancer example • Do Tumor Size and Lymph Node Involvement affect Survival Time? • I’ll re-run and request effect size…

  38. Notice, η2 can be used for the entire ‘model’, or each main effect and interaction individually • How would you describe the effect of Tumor Size, or our interaction? • Trivial to Small Effect – How did we get a significant p-value? • Other factors not in our model are also very important

  39. Notice that the r2 is equal to the η2 of the full model • The advantage of η2 is that you can evaluate individual effects
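
  Since partial η2 for an effect is just SS_effect / (SS_effect + SS_error), it can be computed by hand from any ANOVA table. A sketch using statsmodels on randomly generated stand-in data (NOT the course’s cancer data set, so the effects here will come out near zero):

```python
# Partial eta squared from an ANOVA table, computed by hand.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical stand-in data: two factors, one continuous DV.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "tumor_size": rng.choice(["small", "large"], n),
    "nodes": rng.choice(["no", "yes"], n),
    "survival": rng.normal(60, 20, n),
})

model = ols("survival ~ C(tumor_size) * C(nodes)", data=df).fit()
table = sm.stats.anova_lm(model, typ=2)

# partial eta^2 = SS_effect / (SS_effect + SS_error), row by row
ss_error = table.loc["Residual", "sum_sq"]
table["partial_eta_sq"] = table["sum_sq"] / (table["sum_sq"] + ss_error)
print(table.drop("Residual")[["sum_sq", "PR(>F)", "partial_eta_sq"]])
```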

  40. Effect Size Summary • Many other types of effect sizes are out there – I just wanted to show you the effect sizes most commonly used with the tests we know: • Correlation and Regression: r2 • T-tests: Cohen’s d • ANOVA: Partial eta squared (η2) and/or r2 • You are responsible for knowing: • The general theory behind effect size/why to use them • What tests they are associated with • How to interpret them

  41. QUESTIONS on Power? Effect Size?

  42. Upcoming… • In-class activity • Homework: • Cronk – Read Appendix A (pp. 115–119) on Effect Size • Holcomb Exercises 21 and 22 • No out-of-class SPSS work this week • Things are slowing down – next week we’ll discuss non-parametric tests • Chi-Square and Odds Ratio
