Effect size: Why what we teach psychologists is wrong
Dr Thom Baguley, Psychology, Nottingham Trent University
Thomas.Baguley@ntu.ac.uk
0. Overview
1. Introduction
2. Standardized effect size (or, I do not think it means what you think it means)
3. We aren’t selling T-shirts here
4. Never mind the quantity feel the width
5. Big isn’t always better
6. Conclusions
1. Introduction
• Statistical significance does not imply practical significance (e.g., Rosenthal, 1994; Kirk, 1996)
• Practical significance depends (though by no means exclusively) on the magnitude of the effect
• Advice (e.g., from the APA) is to report effect size, for instance alongside the results of a significance test
2. Standardized effect size (or, I do not think it means what you think it means)
• Simple (unstandardized) effect size uses the original units of measurement, e.g.:
  - unstandardized regression slope (b)
  - simple difference in group means (M1 − M2)
• Standardized effect size replaces the original units with standard deviation units (or equivalents such as the variance), e.g.:
  - standardized regression slope (β or r)
  - Cohen’s d = (M1 − M2) / SDpooled
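A minimal Python sketch of the two kinds of effect size above, assuming two independent groups and the usual pooled SD built from n − 1 weighted sample variances. The scores are hypothetical and the function names are illustrative only, not taken from any package.

```python
import statistics
from math import sqrt

def simple_difference(group1, group2):
    """Unstandardized effect: difference in means, in the original units."""
    return statistics.mean(group1) - statistics.mean(group2)

def pooled_sd(group1, group2):
    """Pooled SD from the two sample variances, weighted by n - 1."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    return sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))

def cohens_d(group1, group2):
    """Standardized effect: mean difference expressed in pooled-SD units."""
    return simple_difference(group1, group2) / pooled_sd(group1, group2)

# Hypothetical scores (illustrative only, not from any study)
treatment = [12, 15, 11, 14, 13]
control = [10, 11, 9, 12, 8]
print(simple_difference(treatment, control))   # 3, in the original units
print(round(cohens_d(treatment, control), 2))  # 1.9, in SD units
```

Rescaling every score (say, multiplying by 10 to change units) rescales the simple difference but leaves d untouched; conversely, anything that changes the SD changes d even when the raw difference is fixed.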
Problems with standardized effect size
• Standardized units are tricky to interpret because the original context is lost (in particular for applied research; see Baguley, 2004)
• Standardized units confound the magnitude of an effect with its variability
Q. Why is the latter a problem?
A. The variability of an effect is not stable …
Baguley (in press) discusses some of the factors that influence the variability of an effect:
• reliability (measurement error)
• range restriction
• design of the study
Attenuation due to unreliability
According to classical test theory, an observed correlation $r_{xy}$ depends on the reliability with which X and Y are measured:
$$r_{xy} = \rho_{xy}\sqrt{r_{xx}\, r_{yy}}$$
where $\rho_{xy}$ is the correlation between the true scores and $r_{xx}$ and $r_{yy}$ are the reliabilities of X and Y.
• It follows that standardized effect size is distorted (usually reduced) by measurement error
• Simple effect size is robust with respect to reliability (for analyses with orthogonal predictors)
(a) The unstandardized slope between two normal random variables, X and Y: Y = 26.34 + 0.4743X
(b) The unstandardized slope, selecting only the upper and lower quartiles of X: Y = 26.10 + 0.4894X
(c) The standardized slope of X and Y: r(99) = .605
(d) The standardized slope of X and Y, selecting only the upper and lower quartiles of X: r(49) = .735
Selecting only the extremes of X leaves the unstandardized slope almost unchanged (0.4743 vs. 0.4894) but inflates the correlation (.605 vs. .735).
Study design
Aspects of a study’s design (such as sample characteristics) also influence the variability of an effect. Consider a study of negative priming effects comparing young and old people:
(Adapted from Buchner & Mayr, 2004; Experiment 1)
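To see why this matters for standardized effect size, here is a sketch with purely hypothetical summary statistics (not Buchner and Mayr’s numbers): the same 20 ms raw effect measured in two samples, one of which is twice as variable. The helper `d_from_summary` is an illustrative name, and it assumes d is standardized by the pooled SD of the two conditions.

```python
from math import sqrt

def d_from_summary(mean_diff, sd1, sd2, n1, n2):
    """Cohen's d from a raw mean difference and two SDs (pooled, n - 1 weighted)."""
    sd_pooled = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return mean_diff / sd_pooled

# Hypothetical: identical 20 ms effects, but the second sample is twice as variable
d_sample_a = d_from_summary(20, 40, 40, 30, 30)
d_sample_b = d_from_summary(20, 80, 80, 30, 30)
print(d_sample_a, d_sample_b)  # 0.5 0.25
```

Comparing the two d values would suggest the effect is twice as large in the first sample, even though the raw effect in milliseconds is identical.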
3. We aren’t selling T-shirts here
Cohen (1988) labeled effect sizes in the behavioural sciences as small, medium or large (e.g., d = 0.2, 0.5 and 0.8; r = .1, .3 and .5)
Cohen originally intended these labels only as a last resort in sample size calculations
‘T-shirt’ effect sizes
• Lenth (2001) has called them ‘canned’ effect sizes or (more recently) ‘T-shirt’ effect sizes
• These labels are dangerous because they ignore so many important factors (e.g., Glass et al., 1981; Lenth, 2001; Baguley, 2004, 2008)
• Comments about the absolute magnitude of an effect (e.g., that it was ‘large’) can mislead (Robinson et al., 2003)
4. Never mind the quantity feel the width
• We focus too much on the point estimate of an effect size (whether standardized or simple)
• The uncertainty in the point estimate needs to be considered when interpreting an effect
• Confidence intervals (CIs) offer a convenient way to do this; e.g., if the sampling distribution of the mean is Normal, a 95% CI for the mean is the sample mean ± 1.96 SEs
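A minimal sketch of the ±1.96 SE rule, using hypothetical data and an illustrative helper name. The multiplier comes from the Normal distribution, as on the slide; with small samples the t distribution (giving a somewhat wider interval) is the more usual choice.

```python
import statistics
from math import sqrt

def normal_ci_mean(data, level=0.95):
    """Approximate CI for a mean: M +/- z * SE, with z from the Normal distribution."""
    m = statistics.mean(data)
    se = statistics.stdev(data) / sqrt(len(data))
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se

# Hypothetical scores (illustrative only)
sample = [4.1, 5.3, 6.0, 5.5, 4.8, 5.9, 6.2, 5.1]
lo, hi = normal_ci_mean(sample)
print(f"mean = {statistics.mean(sample):.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```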
Example: CI for a correlation
• Reporting NHST for a correlation: r(49) = .168, p > .05
• Reporting a 95% CI for the correlation: r(49) = .168 (-.116, .455)
• Unlike the NHST, the CI makes it obvious that concluding r is exactly or very close to zero would be unwarranted: plausible values run from a small negative correlation to one of about .45
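One common way to get such an interval is Fisher’s z transformation, sketched below. The slide does not say how its interval was computed, so the limits from this sketch need not match (-.116, .455) exactly. It assumes r(49) reports df = n − 2, i.e. n = 51; `fisher_z_ci` is an illustrative name.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

def fisher_z_ci(r, n, level=0.95):
    """Approximate CI for a Pearson correlation via Fisher's z transformation."""
    z = atanh(r)                                   # roughly Normal on this scale
    se = 1 / sqrt(n - 3)                           # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    return tanh(z - crit * se), tanh(z + crit * se)  # back-transform to r

# r(49) implies df = n - 2 = 49, so n = 51 (assumption)
lower, upper = fisher_z_ci(0.168, 51)
print(f"r(49) = .168, 95% CI ({lower:.3f}, {upper:.3f})")
```

The back-transformed interval is asymmetric around r, which is one reason to prefer it to a simple r ± 1.96 SE.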
5. Big isn’t always better
Supposedly ‘small’ effects can be impressive too, e.g., consider the classic Salk vaccine trial data:
• r = .0106 (r² = 0.000113 ≈ 0.01%)
• … but the odds ratio = 3.48 (2.36, 5.12)
(The odds of getting polio are 3 to 4 times higher for unvaccinated children)
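Both numbers come from the same 2 × 2 table. The counts below are an assumption on our part: they are the figures commonly quoted for the placebo-controlled arm of the Salk trial, and they reproduce the r and odds ratio above. The CI is a standard Wald interval on the log odds ratio; the function names are illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Assumed counts (commonly quoted for the Salk trial):
#               polio   no polio
# placebo         115    201,114   (n = 201,229)
# vaccinated       33    200,712   (n = 200,745)
a, b = 115, 201_114   # placebo: polio, no polio
c, d = 33, 200_712    # vaccinated: polio, no polio

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio (a/b) / (c/d) with a Wald CI computed on the log scale."""
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)

def phi(a, b, c, d):
    """Phi coefficient: the Pearson r for a 2 x 2 table."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

or_, lo, hi = odds_ratio_ci(a, b, c, d)
r = phi(a, b, c, d)
print(f"OR = {or_:.2f} ({lo:.2f}, {hi:.2f}), r = {r:.4f}")
# prints: OR = 3.48 (2.36, 5.12), r = 0.0106
```

The tiny r reflects how rare polio was in both arms, not how well the vaccine worked; the odds ratio is insensitive to that base rate.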
Likewise, ‘big’ effects can be unimpressive in some contexts, or even cause for suspicion
• e.g., impossibly large correlations in social neuroscience studies (Vul et al., in press)
• Others have argued that unusually large effects in high-impact journals are particularly likely to be false, for instance because of publication bias (e.g., Ioannidis, 2008; Young et al., 2008)
Figure: Social neuroscience correlations reported by Vul et al. (in press), by method of calculation
• Correlations above 0.7 are implausibly high† (except through sampling error), and an observed r of around 0.5 or 0.6 would be pretty impressive!
• (Even the independent method probably overestimates r)
† Because the reliability of these measures is rarely > 0.7
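The footnote’s logic follows from classical test theory: even a perfect true correlation shows up as at most √(rxx · ryy), so two measures with reliability 0.7 cannot be expected to correlate above 0.7 except through sampling error. The reliabilities below are hypothetical values chosen for illustration.

```python
from math import sqrt

def max_observed_r(rel_x, rel_y):
    """Ceiling on an observed correlation when the true correlation is 1:
    sqrt(rel_x * rel_y) under classical test theory."""
    return sqrt(rel_x * rel_y)

# Hypothetical reliability pairs
for rel_x, rel_y in [(0.7, 0.7), (0.8, 0.7), (0.9, 0.9)]:
    print(f"reliabilities {rel_x}, {rel_y}: max r = {max_observed_r(rel_x, rel_y):.3f}")
```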
6. Conclusions
• Psychologists overemphasize standardized effect size
• Reporting and interpreting research findings isn’t like selling T-shirts (we can’t cram everything into three sizes)
• Effect sizes are imprecise estimates
• Determining the practical or theoretical importance of a study is highly context dependent
A final thought
As teachers of psychology we emphasize the importance of critical thinking to our students. Statistics seems to be an exception to this: more often than not we teach ritualized methods. Why is this?