Effect Sizes in Education Research: What They Are, What They Mean, and Why They’re Important

Howard Bloom

(MDRC; Howard.Bloom2@mdrc.org)

Carolyn Hill

(Georgetown; cjh34@georgetown.edu)

Alison Rebeck Black

(MDRC; alison.black@mdrc.org)

Mark Lipsey

(Vanderbilt; mark.lipsey@vanderbilt.edu)

Institute of Education Sciences 2006 Research Conference

Washington DC

Today’s Session
  • Goal: introduce key concepts and issues
  • Approach: focus on nexus between analytics and interpretation
  • Agenda
    • Core concepts
    • Empirical benchmarks
    • Important applications
Starting Point
  • Statistical significance vs. substantive importance
  • Effect size measures for continuous outcomes (our focus)
  • Effect size measures for discrete outcomes
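For continuous outcomes, the effect size discussed in this session is a standardized mean difference. A minimal sketch, assuming the difference in group means is divided by the pooled standard deviation (the slides do not say which standard deviation the presenters use); the function name and the scores are hypothetical:

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treatment, control):
    """Difference in group means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_t - 1) * variance(treatment)
                  + (n_c - 1) * variance(control)) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical test scores, invented purely for illustration.
treatment_scores = [52, 55, 61, 58, 64]
control_scores = [50, 53, 59, 56, 62]
effect_size = standardized_mean_difference(treatment_scores, control_scores)
print(f"Effect size: {effect_size:.2f} standard deviations")
```

The result is expressed in standard deviation units (σ), which is what allows impacts on different outcome scales to be compared.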
Variance components framework

Decomposing the total national variance

Career Academies and Future Earnings for Young Men

Impact on future earnings:

  • Dollars per month increase: $212
  • Percentage increase: 18%
  • Effect size: 0.30σ

Aspirin and heart attacks

Rate of heart attacks:

  • With placebo: 1.71%
  • With aspirin: 0.94%
  • Difference: 0.77 percentage points
  • Effect size: 0.06σ

Measures of Effect Size,” in Harris Cooper and Larry V. Hedges, The Handbook of Research Synthesis (New York: Russell Sage Foundation)
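The slide does not show how the aspirin effect size was computed. One way to arrive at roughly the same figure, offered as an assumption rather than the presenters' method, is to treat "had a heart attack" as a 0/1 outcome and divide the difference in rates by the placebo group's standard deviation:

```python
from math import sqrt


def effect_size_from_rates(control_rate, treatment_rate):
    """Absolute difference in rates over the control group's SD for a 0/1 outcome.

    Standardizing by the control-group SD is an assumption made for this
    sketch; other standardizations give slightly different values.
    """
    control_sd = sqrt(control_rate * (1 - control_rate))
    return abs(control_rate - treatment_rate) / control_sd


# Heart attack rates taken from the slide: 1.71% with placebo, 0.94% with aspirin.
aspirin_es = effect_size_from_rates(0.0171, 0.0094)
print(round(aspirin_es, 2))  # 0.06
```

A small standardized effect can still matter a great deal in practice, which is the contrast this slide draws with the career academies example.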

Five-year impacts of the Tennessee class-size experiment

13-17 versus 22-26 students per class

Effect sizes: 0.11σ to 0.22σ for reading and math

Findings were summarized from Nye, Barbara, Larry V. Hedges and Spyros Konstantopoulos (1999) “The Long-Term Effects of Small Classes: A Five-Year Follow-up of the Tennessee Class Size Experiment,” Educational Evaluation and Policy Analysis, Vol. 21, No. 2: 127-142.

Part 2: What’s a Big Effect Size, and How to Tell?

Carolyn Hill, Georgetown University

Alison Rebeck Black, MDRC

How Big is the Effect?
  • Need to interpret an effect size when:
    • Designing an intervention study
    • Interpreting an intervention study
    • Synthesizing intervention studies
  • To assess practical significance of an effect size:
    • Compare to external criterion/standard
      • Related to outcome construct
      • Related to context
Prevailing Practice for Interpreting Effect Size: “Rules of Thumb”

  • Small = 0.20σ
  • Medium = 0.50σ
  • Large = 0.80σ

Cohen, Jacob (1988) Statistical Power Analysis for the Behavioral Sciences 2nd edition (Hillsdale, NJ: Lawrence Erlbaum).

  • Small = 0.15σ
  • Medium = 0.45σ
  • Large = 0.90σ

Lipsey, Mark W. (1990) Design Sensitivity: Statistical Power for Experimental Research (Newbury Park, CA: Sage Publications).
Preferred Approaches for Assessing Effect Size (K-12)
  • Compare ES from the study with:
    • ES distributions from similar studies
    • Student attainment of performance criterion without intervention
    • Normative expectations for change
    • Subgroup performance gaps
    • School performance gaps
ES Distribution from Similar Studies

Percentile distribution of 145 achievement effect sizes from meta-analysis of comprehensive school reform studies (Borman et al. 2003).

[Figure not reproduced in this transcript; axis label: Effect Size (σ)]

Normative Expectations for Change: Estimating Annual Reading and Math Gains in Effect Size from National Norming Samples for Standardized Tests

  • Seven tests were used for reading and six tests were used for math
  • The mean and standard deviation of scale scores for each grade were obtained from test manuals
  • The standardized mean difference across succeeding grades was computed
  • These results were averaged across tests and weighted according to Hedges (1982)
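The steps above can be sketched as follows. This is an illustration under stated assumptions, not the presenters' code: the weighting "according to Hedges (1982)" is assumed here to be inverse-variance weighting, the large-sample variance formula for a standardized mean difference is used, and all means, standard deviations, and sample sizes are hypothetical.

```python
from math import sqrt


def grade_gain_effect_size(mean_lo, sd_lo, n_lo, mean_hi, sd_hi, n_hi):
    """Standardized mean difference between two succeeding grades.

    Returns the effect size and its approximate sampling variance.
    """
    pooled_sd = sqrt(((n_lo - 1) * sd_lo**2 + (n_hi - 1) * sd_hi**2)
                     / (n_lo + n_hi - 2))
    d = (mean_hi - mean_lo) / pooled_sd
    var_d = (n_lo + n_hi) / (n_lo * n_hi) + d**2 / (2 * (n_lo + n_hi))
    return d, var_d


def weighted_mean_effect_size(estimates):
    """Average (effect size, variance) pairs using inverse-variance weights."""
    weights = [1 / var for _, var in estimates]
    return sum(w * d for w, (d, _) in zip(weights, estimates)) / sum(weights)


# Hypothetical norming-sample statistics for one grade transition on two tests.
test_a = grade_gain_effect_size(600, 40, 2000, 616, 40, 2000)
test_b = grade_gain_effect_size(480, 30, 1000, 489, 30, 1000)
average_gain = weighted_mean_effect_size([test_a, test_b])
print(f"Weighted average annual gain: {average_gain:.2f} standard deviations")
```

Because the weights are inverse variances, tests with larger norming samples pull the average toward their own estimate.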
Annual Reading and Math Growth

Grade Transition    Reading Growth Effect Size (σ)    Math Growth Effect Size (σ)
K - 1               1.59                              1.13
1 - 2               0.94                              1.02
2 - 3               0.57                              0.83
3 - 4               0.37                              0.50
4 - 5               0.40                              0.59
5 - 6               0.35                              0.41
6 - 7               0.21                              0.30
7 - 8               0.25                              0.32
8 - 9               0.26                              0.19
9 - 10              0.20                              0.22
10 - 11             0.21                              0.15
11 - 12             0.03                              0.00

Based on work in progress using documentation on the national norming samples for the CAT5, SAT9, Terra Nova CTBS, Gates MacGinitie, MAT8, Terra Nova CAT, and SAT10.

Demographic Performance Gaps from Selected Tests
  • Interventions may aim to close demographic performance gaps
  • Effectiveness of interventions can be judged relative to the size of gaps they are designed to close
  • Effect size gaps vary across grades, years, tests, and districts
Performance Gaps between “Average” and “Weak” Schools
  • Main idea:
    • What is the performance gap (effect size) for the same types of students in different schools?
  • Approach:
    • Estimate a regression model that controls for student characteristics: race/ethnicity, prior achievement, gender, overage for grade, and free lunch status.
    • Infer performance gap (effect size) between schools at different percentiles of the performance distribution
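The second step of the approach can be sketched as below. This assumes the regression-adjusted school means are roughly normally distributed and that "average" and "weak" correspond to the 50th and 10th percentiles; neither assumption is stated on the slide, and the standard deviations are hypothetical.

```python
from statistics import NormalDist


def school_gap_effect_size(school_sd, student_sd, pct_high=50, pct_low=10):
    """Gap between schools at two percentiles, in student-level SD units.

    school_sd:  SD of regression-adjusted school mean scores (hypothetical input)
    student_sd: SD of individual student scores (hypothetical input)
    """
    z = NormalDist()
    # Distance between the two percentiles in school-level SD units.
    z_gap = z.inv_cdf(pct_high / 100) - z.inv_cdf(pct_low / 100)
    return z_gap * school_sd / student_sd


gap = school_gap_effect_size(school_sd=6.0, student_sd=30.0)
print(f"Average vs. weak school gap: {gap:.2f} standard deviations")
```

Dividing by the student-level standard deviation keeps the gap on the same scale as the intervention effect sizes it is compared against.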
Interpreting the Magnitude of Effect Sizes
  • “One size” does not fit all
  • Instead, interpret magnitudes of effects in context
    • Of the interventions being studied
    • Of the outcomes being measured
    • Of the samples/subsamples being examined
  • Consider different frames of reference in context, instead of a universal standard:
    • ES distributions, external performance criteria, normative change, subgroup/school gaps, etc.
Part 3: Using Effect Sizes in Power Analysis and Research Synthesis

Mark W. Lipsey

Vanderbilt University

Statistical Power
  • The probability that a true intervention effect will be found statistically significant.
Estimating Statistical Power Prospectively: Finding the MDE


Specify:

  • alpha level: conventionally .05
  • sample size (at all levels if multilevel design)
  • correlation between any covariates to be used and the dependent variable
  • intracluster correlation coefficients (ICCs) if multilevel design
  • target power level: conventionally set at .80

Estimate: minimum detectable effect size (MDE)
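A minimal sketch of how these inputs combine into an MDE in effect size units for a two-level design in which whole clusters are randomized. It uses the commonly cited formula of a multiplier times the standard error of the impact estimate, with a normal approximation for the multiplier (an exact calculation would use the t distribution). The function name, parameter names, and the ICC value in the example are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def mdes_two_level(n_clusters, cluster_size, icc, alpha=0.05, power=0.80,
                   r2_cluster=0.0, r2_student=0.0, prop_treated=0.5):
    """Approximate minimum detectable effect size for a cluster-randomized design."""
    z = NormalDist()
    # Two-sided test: about 1.96 + 0.84 = 2.80 at alpha = .05 and power = .80.
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    denom = prop_treated * (1 - prop_treated) * n_clusters
    # Variance of the impact estimate in effect size units: a between-cluster
    # term plus a within-cluster term, each reduced by its covariate R-squared.
    var_impact = (icc * (1 - r2_cluster) / denom
                  + (1 - icc) * (1 - r2_student) / (denom * cluster_size))
    return multiplier * sqrt(var_impact)


# Hypothetical design: 40 classrooms of 20 students, ICC = 0.15.
mdes = mdes_two_level(n_clusters=40, cluster_size=20, icc=0.15)
mdes_with_covariate = mdes_two_level(40, 20, 0.15, r2_cluster=0.50)
print(f"MDE without covariate: {mdes:.2f}; with cluster covariate: {mdes_with_covariate:.2f}")
```

In this formula the number of clusters usually matters far more than the number of students per cluster, because the between-cluster term shrinks only as clusters are added.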

Assessing the MDE
  • Compare with a target effect size: the smallest ES judged to have practical significance in the intervention context
  • Design is underpowered if MDE > target (back to the drawing board)
  • Design is adequately powered if MDE ≤ target value

Where Do You Get the Target Value for Practical Significance?
  • NOT some broad rule of thumb, e.g., Cohen’s “small,” “medium,” and “large”
  • Use a frame of reference appropriate to the outcome, population, and intervention
    • meaningful success criterion
    • research findings for similar interventions
    • change expected without intervention
    • gaps between relevant comparison groups
    • et cetera
Selecting the Target MDE
  • Identify one or more reference frames that may be applicable to the intervention circumstances
  • Use that frame to guide selection of an MDE; involve other stakeholders
  • Use different reference frames to consider:
    • which is most applicable to the context
    • how sensitive the choice is to the frames
    • what the most conservative selection might be
Power for Different Target MDEs (2-level design: students in classrooms)

[Figure not reproduced in this transcript; axis label: Number of Classrooms of N=20]

Power for Different Target MDEs (same with classroom covariate R² = .50)

[Figure not reproduced in this transcript; axis label: Number of Classrooms of N=20]
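The power curves themselves are not available in this transcript. The sketch below shows how curves of this kind could be computed for classrooms of 20 students, with and without a classroom-level covariate of R² = .50. It is an approximation under stated assumptions: a normal approximation to the test statistic, half of classrooms assigned to treatment, and a hypothetical ICC of 0.15 (the value used for the original charts is not given).

```python
from math import sqrt
from statistics import NormalDist


def power_two_level(effect_size, n_classrooms, class_size=20, icc=0.15,
                    r2_classroom=0.0, alpha=0.05, prop_treated=0.5):
    """Approximate power of a two-sided test in a classroom-randomized design."""
    z = NormalDist()
    denom = prop_treated * (1 - prop_treated) * n_classrooms
    # Standard error of the impact estimate in effect size units.
    se = sqrt(icc * (1 - r2_classroom) / denom
              + (1 - icc) / (denom * class_size))
    # Probability the estimate exceeds the critical value; the negligible
    # chance of a significant result in the wrong direction is ignored.
    return z.cdf(effect_size / se - z.inv_cdf(1 - alpha / 2))


target_es = 0.25  # hypothetical target effect size
for n_classrooms in (20, 40, 60, 80):
    without = power_two_level(target_es, n_classrooms)
    with_cov = power_two_level(target_es, n_classrooms, r2_classroom=0.50)
    print(f"{n_classrooms} classrooms: power {without:.2f} without covariate, "
          f"{with_cov:.2f} with covariate")
```

Under these assumptions a classroom-level covariate raises power at every sample size, which is the comparison the two slides are set up to show.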

Interpreting Effect Sizes Found in Individual Studies & Meta-Analysis
  • The practical significance of empirically observed effect sizes should be interpreted using approaches like those described here
  • This is especially important when disseminating research results to practitioners and policymakers
  • For standardized achievement measures, the practical significance of ES values will vary by student population and grade.
Example: Computer-Assisted Instruction for Beginning Reading (Grades 1-4)

Consider an MDE = .25

  • Mean ES = .25 found in the Blok et al. (2002) meta-analysis
  • 27-65% increase over “normal” year-to-year growth depending on age
  • About 30% of the Grade 4 majority-minority achievement gap
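The comparison with "normal" year-to-year growth can be approximated by dividing the candidate MDE by the annual reading growth effect sizes reported earlier in this presentation. The slide does not say which grade transitions or growth figures underlie the 27-65% range, so the selection of transitions below is an assumption:

```python
# Annual reading growth in effect size units, copied from the
# "Annual Reading and Math Growth" table in this presentation.
# Which transitions correspond to "Grades 1-4" is an assumption of this sketch.
reading_growth = {"1 - 2": 0.94, "2 - 3": 0.57, "3 - 4": 0.37, "4 - 5": 0.40}
target_es = 0.25

# Express the target effect as a share of one year's typical growth.
share_of_annual_growth = {
    transition: target_es / growth
    for transition, growth in reading_growth.items()
}
for transition, share in share_of_annual_growth.items():
    print(f"Grade {transition}: {share:.0%} of a typical year's reading growth")
```

The shares run from roughly a quarter of a year's growth at the grade 1 to 2 transition to around two thirds for the later transitions, broadly in line with the range quoted on the slide.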

Bloom, Howard S. 2005. “Randomizing Groups to Evaluate Place-Based Programs.” In Howard S. Bloom, editor. Learning More from Social Experiments: Evolving Analytic Approaches. New York: Russell Sage Foundation, pp. 115-172.

Bloom, Howard S. 1995. “Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs.” Evaluation Review 19(5): 547-56.

Borman, Geoffrey D., Gina M. Hewes, Laura T. Overman, and Shelly Brown. 2003. “Comprehensive School Reform and Achievement: A Meta-Analysis.” Review of Educational Research 73(2): 125-230.

Hedges, Larry V. 1982. “Estimation of Effect Size from a Series of Independent Experiments.” Psychological Bulletin 92(2): 490-499.

Kane, Thomas J. 2004. “The Impact of After-School Programs: Interpreting the Results of Four Recent Evaluations.” William T. Grant Foundation Working Paper, January 16. http://www.wtgrantfoundation.org/usr_doc/After-school_paper.pdf

Konstantopoulos, Spyros, and Larry V. Hedges. 2005. “How Large an Effect Can We Expect from School Reforms?” Working paper #05-04, Institute for Policy Research, Northwestern University. http://www.northwestern.edu/ipr/publications/papers/2005/WP-05-04.pdf.

Lipsey, Mark W. 1990. Design Sensitivity: Statistical Power for Experimental Research. Newbury Park, CA: Sage Publications.

Schochet, Peter Z. 2005. “Statistical Power for Random Assignment Evaluations of Education Programs.” Project report submitted by Mathematica Policy Research, Inc. to Institute of Education Sciences, U.S. Department of Education. http://www.mathematica-mpr.com/publications/PDFs/statisticalpower.pdf

Contact Information

Howard Bloom


Carolyn Hill


Alison Rebeck Black


Mark Lipsey