Evaluating importance: An overview

PowerPoint Slideshow about 'Evaluating importance: An overview' - juliette

Presentation Transcript
Evaluating importance: An overview
  • Size (magnitude) of effect (a.k.a. practical significance)
    • d or other
  • Functional significance (a.k.a. clinical significance)
    • e.g., social validity ratings
  • Cost-benefit ratio
  • Feasibility
Practical vs statistical significance
  • Statistical significance (alpha level; p-value) reflects the probability of obtaining a finding at least as extreme as the one observed if chance alone were operating (i.e., if the null hypothesis were true).
  • If the p-value for a difference between two groups is 0.05, a difference at least that large would be expected to occur by chance just 5 times out of 100 when no real difference exists (thus, chance alone is an unlikely explanation).
  • If the p-value for the difference is 0.01, a difference at least that large would be expected to occur by chance just one time out of 100 (thus, we can be even more confident that the difference is not simply random).
Practical significance
  • Reflects the magnitude, or size, of the difference, not the odds that it could have occurred by chance
  • Arguably much more important than statistical significance, especially for clinical questions
  • Measures of effect size (ES) quantify practical significance of a finding
Effect size
  • The degree to which the null hypothesis is false, e.g., not just that two groups differ significantly, but how much they differ (Cohen, 1990)
  • Several measures of ES exist; use “whatever conveys the magnitude of the phenomenon of interest appropriate to the research context” (Cohen, 1990, p. 1310)
  • IQ and height example (Cohen, 1990)
The height-IQ correlation: Cohen’s (1990) example on statistical and practical significance
  • A study of 14,000 children ages 6-17 showed a “highly significant” (p < .001) correlation (r = .11) between height and IQ
  • What does this p indicate?
  • What’s the magnitude of this correlation?
    • Accounts for 1% of the variance
    • Based on an r this big, you’d expect that increasing a child’s height by 4 feet would increase IQ by 30 points, and that increasing IQ by 233 points would increase height by 4 inches (as a correlation, the predicted relationship could work in either direction)
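
The arithmetic behind "accounts for 1% of the variance" can be sketched in a few lines. This is only an illustration: the 15-point IQ standard deviation is an assumption that does not appear on the slide, and `predicted_shift_in_sd` is a hypothetical helper name.

```python
# Cohen's (1990) height-IQ example: how much does r = .11 actually explain?
r = 0.11

# Proportion of variance in one variable accounted for by the other.
variance_explained = r ** 2


def predicted_shift_in_sd(r, shift_in_sd):
    """Predicted change in one variable (in SD units) for a given change
    in the other variable (also in SD units), using simple linear regression."""
    return r * shift_in_sd


# ASSUMPTION (not on the slide): IQ is scaled with SD = 15,
# so a 30-point gain corresponds to 2 SDs.
iq_gain_in_sd = 30 / 15

# Height change (in SDs of height) needed to predict that 2-SD IQ gain.
height_change_needed_in_sd = iq_gain_in_sd / r

print(f"variance explained: {variance_explained:.4f}")                 # 0.0121
print(f"height change needed: {height_change_needed_in_sd:.1f} SDs")   # 18.2 SDs
```

A shift of roughly 18 standard deviations in height to predict a 2-SD change in IQ is what makes the "highly significant" correlation practically negligible.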
2 main types of ES measures
  • Variance accounted for
    • a squared metric reflecting the percentage of variance in the dependent variable explained by the independent variable
    • e.g., squared correlations, odds ratios, kappa statistics
  • Standardized difference
    • scales measurements across studies into a single metric referenced to some standard deviation
    • d is the most common and conceptually the easiest; it is our focus today
Effect size
  • APA (2001) Publication Manual mandates: “. . . it is almost always necessary to include some index of effect size or strength of relationship . . . provide the reader not only with information about statistical significance but also with enough information to assess the magnitude of the observed effect or relationship” (pp. 25-26).
  • APA guidelines (2001) mandate inclusion of ES information (not just p-value information) in all published reports
  • Until that happy day, if ES information is missing, readers must estimate ES for themselves
  • When group means and SDs are reported, you often can estimate effect size quickly and decide whether to keep reading or not
Finding, estimating and interpreting d in group comparison studies
  • d = Difference between the means of the two groups, divided by the standard deviation (SD)
  • Interpret as size of group difference in SD units
  • When the mean difference between treatment (tx) and control groups is 0.8 to 1 SD, practical significance has been defined as “high”
Estimating d
  • Find group means, subtract them, and divide by the standard deviation.
  • When SDs for the groups are identical, hooray. When not, arguments have been made for using the control group SD, or the average of the two SDs.
    • My preference is the second, which is more conservative and strikes me as more appropriate when dealing with the large variability we see in many groups of patients with disorders
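
The two choices of standard deviation described above can be sketched as a small function. The function name `cohens_d`, the `sd_choice` parameter, and the example numbers are all illustrative assumptions, not taken from any study.

```python
def cohens_d(mean_a, sd_a, mean_b, sd_b, sd_choice="average"):
    """Standardized mean difference between group A and group B.

    sd_choice="average" divides by the simple average of the two SDs;
    sd_choice="control" divides by group B's SD (B treated as the control).
    """
    if sd_choice == "average":
        sd = (sd_a + sd_b) / 2
    elif sd_choice == "control":
        sd = sd_b
    else:
        raise ValueError("sd_choice must be 'average' or 'control'")
    return (mean_a - mean_b) / sd


# HYPOTHETICAL numbers: a treatment group (mean 10, SD 6) that is more
# variable than the control group (mean 12, SD 4).
d_avg = cohens_d(10, 6, 12, 4, sd_choice="average")
d_ctrl = cohens_d(10, 6, 12, 4, sd_choice="control")

print(f"d using the average SD: {d_avg:.2f}")   # -0.40
print(f"d using the control SD: {d_ctrl:.2f}")  # -0.50
```

In this made-up case the averaged SD yields the smaller (more conservative) d because the treatment group is the more variable one; the ordering would reverse if the control group had the larger SD.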
Exercise 1: Calculating effect size, given group means and SDs
  • Data from Arnold et al. (2004) study comparing scores on SNAP composite test after four types of treatment for ADHD

Scores on SNAP composite (lower = better):

Treatment group        Mean (SD)
Combined               0.92 (0.50)
Medical management     0.95 (0.51)
Behavioral             1.34 (0.56)
Community care         1.40 (0.54)

d demonstration, comparing SNAP performance in Combined and Medical Mgt groups

Combined 0.92 (0.50)

Medical management 0.95 (0.51)

d = (0.92 - 0.95) / 0.505 = -0.03 / 0.505 = -0.059 (0.505 is the average of the two SDs)

Interpretation: The Combined group scored about 6/100 of a standard deviation better (lower) than the Medical Mgt group. This is an extremely small difference; these treatment approaches resulted in virtually the same outcomes on the SNAP measure.

d for Combined vs Community Care treatment groups

Combined 0.92 (0.50)

Community care 1.40 (0.54)

d = (0.92 - 1.40) / 0.52 = -0.48 / 0.52 = -0.92

Interpretation: The Combined group scored nearly a whole standard deviation better than the Community care group; this is a large effect size. Combined treatment is substantially better than Community care.

d for Medical Mgt vs Behavioral treatment

Medical management 0.95 (0.51)

Behavioral 1.34 (0.56)

d = ?


d for Medical Mgt vs Behavioral treatment

Medical management 0.95 (0.51)

Behavioral 1.34 (0.56)

d = (0.95 - 1.34) / 0.535 = -0.39 / 0.535 = -0.729, or about -0.73

Interpretation: The Medical Mgt group scored about three-quarters of an SD better than the Behavioral group. This is a solid effect size suggesting that Medical Mgt treatment was substantially more effective than Behavioral treatment.

Exercise 1: Interpreting d in the happy cases when it’s reported
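  • Treatment-difference effect sizes (Cohen’s d) from Arnold et al., 2004 (Table II, p. 45)

Comparison                            d
Combined vs Medical Management        0.06
Combined vs Behavioral                0.79
Combined vs Community Care            0.92
Medical Management vs Behavioral      0.72
Medical Mgt vs Community Care         0.85
Behavioral vs Community Care          0.11

  • Note that our calculated ds match these in size to within rounding (we calculated 0.73 where the table reports 0.72); our values are negative only because of the order in which we subtracted the means.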
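
As a check on that claim, the sketch below recomputes d for all six pairs from the group means and SDs in the exercise, using the average of the two SDs as the divisor, and compares the sizes with the reported values. The names `groups`, `reported`, and `d_average_sd` are illustrative, and the divisor used in the original paper is not stated on the slides, so the averaged SD is an assumption.

```python
from itertools import combinations

# Mean and SD on the SNAP composite, as given in the exercise table.
groups = {
    "Combined": (0.92, 0.50),
    "Medical management": (0.95, 0.51),
    "Behavioral": (1.34, 0.56),
    "Community care": (1.40, 0.54),
}

# Cohen's d values listed on the slide from Arnold et al. (2004), Table II.
reported = {
    ("Combined", "Medical management"): 0.06,
    ("Combined", "Behavioral"): 0.79,
    ("Combined", "Community care"): 0.92,
    ("Medical management", "Behavioral"): 0.72,
    ("Medical management", "Community care"): 0.85,
    ("Behavioral", "Community care"): 0.11,
}


def d_average_sd(group_a, group_b):
    """d = difference in means divided by the average of the two SDs."""
    (mean_a, sd_a), (mean_b, sd_b) = group_a, group_b
    return (mean_a - mean_b) / ((sd_a + sd_b) / 2)


# Compute d for every pair, in the same order as the table.
computed = {
    (a, b): d_average_sd(groups[a], groups[b])
    for a, b in combinations(groups, 2)
}

for pair, d in computed.items():
    print(f"{pair[0]} vs {pair[1]}: computed |d| = {abs(d):.2f}, "
          f"reported = {reported[pair]:.2f}")
```

Every computed value falls within about 0.01 of the reported one, which is the size of discrepancy expected from working with means and SDs rounded to two decimals.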
On to theme 3: an overview of evaluating precision
  • Precision is reflected by the width of the confidence interval (CI) surrounding a given finding
  • Any given finding is acknowledged to be an estimate of the “real” or “true” finding
  • CI reflects the range of values that includes the real finding with a known probability
  • A finding with a narrower CI is more precise (and thus more clinically useful) than a finding with a broader CI
Evaluating precision (cont.)
  • CIs are calculated by adding and subtracting a multiple of the standard error for a finding/value (e.g., value ± 1.96 SE to determine the 95% CI)
  • standard error depends on sample size and reliability; larger samples and higher reliability will result in narrower CIs, all else being equal
  • Sackett et al. (2000) Appendix 1 shows how to calculate CIs by hand, and easy-to-use statistical programs (many free on the web) provide CIs when raw data are available.
Finding and interpreting evidence of precision
  • CIs for difference between means of 206 children receiving early TTP and 196 receiving late TTP for OME (Paradise et al. 2001)

Measure    Early, mean (SD)    Late, mean (SD)    95% CI for difference
PPVT       92 (13)             92 (15)            -2.8 to 2.8
NDW        124 (32)            126 (30)           -7.6 to 4.8
PCC-R      85 (7)              86 (7)             -2.1 to 0.7

  • CIs are narrow thanks to the large sample
  • Contrast with risk estimates for low PCC-R from smaller samples of children with (n=15) and without (n=47) OME-associated hearing loss (Shriberg et al., 2000)
  • Estimated risk was 9.60 (i.e., children with hearing loss were 9.6 times more likely to have low PCC-R at age 3 than children without)
  • But the 95% confidence interval was 1.08-85.58, meaning that this increased risk was somewhere between almost none and a lot. Not very precise!
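
The link between sample size and CI width can be illustrated with the PPVT row (means of 92 with SDs of 13 and 15, n = 206 and 196). The sketch assumes a large-sample normal approximation with an unpooled standard error; the method actually used by Paradise et al. is not given on the slides, and `ci_for_mean_difference` is a hypothetical helper.

```python
import math


def ci_for_mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Approximate CI for the difference between two independent group means.

    ASSUMPTION: normal approximation, with the standard error of the
    difference taken as sqrt(sd_a^2 / n_a + sd_b^2 / n_b).
    """
    diff = mean_a - mean_b
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return diff - z * se, diff + z * se


# PPVT row from the table: early TTP (n = 206) vs late TTP (n = 196).
low, high = ci_for_mean_difference(92, 13, 206, 92, 15, 196)

# Same means and SDs, but with HYPOTHETICAL group sizes of 15 and 47
# (borrowed from the smaller study only to show the effect of n).
low_small, high_small = ci_for_mean_difference(92, 13, 15, 92, 15, 47)

print(f"large samples: {low:.2f} to {high:.2f}")
print(f"small samples: {low_small:.2f} to {high_small:.2f}")
```

With the large samples the interval comes out at roughly ±2.75, close to the reported -2.8 to 2.8; with the small group sizes the same means and SDs give an interval nearly three times as wide. The other two rows reproduce less exactly, presumably because the slide reports rounded means.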
Predict precision
  • In one study, children with histories of OME (n=10) had significantly lower scores on a competitive listening task than children without OME histories (n=13)


-6.8 (2.8) -9.7 (2.6) .016

  • How could you quantify importance?
  • What would you predict about precision?
When multiple studies of a question are available, meta-analysis
  • Quantitative summary of effects across a number of studies addressing particular question, usually in the form of a d (effect size) statistic
  • In EBP evidence reviews, the highest quality evidence comes from meta-analysis of studies with strong validity, precision, and importance
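
One common way to form such a quantitative summary is a fixed-effect, inverse-variance weighted average of the study ds. The sketch below is a generic illustration under that assumption, not the procedure of any meta-analysis cited in these slides; the function names are hypothetical and the input values are invented purely for demonstration.

```python
import math


def d_variance(d, n1, n2):
    """Approximate sampling variance of d for two independent groups."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def pooled_d(studies, z=1.96):
    """Fixed-effect pooled d with an approximate 95% CI.

    studies: list of (d, n1, n2) tuples, one per study.
    Each study is weighted by the inverse of its sampling variance,
    so larger (more precise) studies count for more.
    """
    weights = [1 / d_variance(d, n1, n2) for d, n1, n2 in studies]
    total_weight = sum(weights)
    estimate = sum(w * d for w, (d, _, _) in zip(weights, studies)) / total_weight
    se = math.sqrt(1 / total_weight)
    return estimate, estimate - z * se, estimate + z * se


# HYPOTHETICAL inputs for illustration only; not from any study cited here.
hypothetical_studies = [(-0.30, 20, 20), (0.10, 150, 150), (-0.05, 60, 60)]

estimate, ci_low, ci_high = pooled_d(hypothetical_studies)
print(f"pooled d = {estimate:.3f}, 95% CI {ci_low:.3f} to {ci_high:.3f}")
```

Because the weights track precision, the pooled estimate sits closest to the largest study, and its CI is narrower than that of any single contributing study.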
Evidence levels for evaluating quality of treatment studies

         Level   Description
Best     Ia      Meta-analysis of >1 randomized controlled trial (RCT)
         Ib      Well-designed randomized controlled study
         IIa     Well-designed controlled study without randomization
         IIb     Well-designed quasi-experimental study
         III     Well-designed non-experimental studies, i.e., comparative, correlational, and case studies
Worst    IV      Expert committee report, consensus conference, clinical experience of respected authorities

A meta-analysis of OME and speech and language (Casby, 2001)
  • Casby (2001) summarized results of available studies of OME and children’s language
  • For global language abilities, the effect size for comparing mean language scores from children with and without OME histories was d = -.07.
  • Interpretation and a graphic representation
A more informative graphic for meta-analyses
  • Shows d from each study as well as associated 95% CI.
d and 95% CI boundaries for OME and vocabulary comprehension (Casby, 2001)

[Figure: d with upper and lower 95% CI boundaries for each study (Paradise 00, Black 93, Lonigan 92, Roberts 91m, Roberts 91l, Teele 90, Lous 88, Teele 84); axis runs from "Better with OME" to "Worse with OME". Overall d = .001]


The need for meta-analyses in communication disorders
  • Relatively few have been conducted, primarily because many studies in our literature
    • have not been conducted using procedures that would warrant their inclusion
    • may have been conducted carefully, but have not reported the information required
  • CONSORT (www.consort-statement.org) and STARD (Bossuyt et al., 2003) statements as one solution
Given that few meta-analyses are available
  • Rapidly identify best available evidence addressing the foreground question
  • Appraise it critically with respect to validity, precision, and importance
  • Use CAT format to summarize your appraisal in an organized, readily accessible (and update-able) way