Rachel A. Gordon
University of Illinois at Chicago
Kerry G. Hofer
Peabody Research Institute, Vanderbilt University
Assuring Quality Preschool: Where Are We and Where Do We Need to Go?
Presentation in the Presidential Session on Universal Preschool: What Have We Learned, and What Does It Mean for Practice and Policy? Annual Meeting of the American Educational Research Association (April 6, 2014).
We present some results from our preliminary investigations in this new project. Although we have confidence in what is presented here, these analyses are the first steps towards a more thorough look at the validity of two quality measures. The results may change as we move forward, including as we revise the details of the regression models and the meta-analytic techniques used and as we take the products through peer review.
Policy initiatives focus on high-quality preschool…
high quality early childhood education
high-quality early learning programs
ECERS-R: Average overall score of at least 4.5 with no classroom below 4.0, verified by on-site independent assessment
CLASS: Emotional support and classroom organization average scores above 5.0 with no classroom below 4.0, verified by on-site independent assessment
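The two criteria above amount to a simple decision rule: a floor on the program average plus a floor on every classroom. The following is a minimal sketch in Python, not any agency's actual procedure. The function name, its parameters, and the scores are hypothetical, and it assumes one score per classroom and an inclusive floor ("at least") on the average unless stated otherwise.

```python
from statistics import mean


def meets_quality_rule(classroom_scores, min_average, min_classroom,
                       average_inclusive=True):
    """Check a program against an average floor and a per-classroom floor.

    average_inclusive=True reads the rule as "at least min_average";
    False reads it as "above min_average". Both readings are assumptions
    about how the policy text would be applied.
    """
    if not classroom_scores:
        raise ValueError("need at least one classroom score")
    average = mean(classroom_scores)
    if average_inclusive:
        average_ok = average >= min_average
    else:
        average_ok = average > min_average
    return average_ok and min(classroom_scores) >= min_classroom


# Hypothetical classroom scores for two programs.
program_a = [5.0, 4.6, 4.2]   # average 4.6, lowest classroom 4.2
program_b = [5.5, 5.4, 3.8]   # average 4.9, but one classroom below 4.0

print(meets_quality_rule(program_a, min_average=4.5, min_classroom=4.0))  # True
print(meets_quality_rule(program_b, min_average=4.5, min_classroom=4.0))  # False
```

Program B illustrates why the per-classroom floor matters: its average is higher than Program A's, yet it fails because one classroom falls below 4.0. For the CLASS rule, the same function could be applied to each domain score with `min_average=5.0` and `average_inclusive=False`.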
I will begin by examining evidence regarding whether the scales predict child outcomes.
This aspect of validity is relevant to policy, to the extent that public investments in early care and education are meant to promote optimal child development and school readiness.
Over 400 indicators across the 43 items!
Source: Harms, T., Clifford, R.M., & Cryer, D. (1998). Early Childhood Environment Rating Scale, Revised Edition. New York, NY: Teachers College Press.
If higher scores reflect higher quality, then average quality scores should be higher for centers rated in higher categories versus lower categories.
In item response theory models, the thresholds between categories should also show a stair-step progression, if they are ordered so that higher categories mark higher quality.
Source: Gordon, Rachel A., Ken Fujimoto, Robert Kaestner, Sanders Korenman, and Kristin Abner. 2013. “An Assessment of the Validity of the ECERS-R with Implications for Assessments of Child Care Quality and its Relation to Child Development.” Developmental Psychology, 49: 146-160
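The stair-step expectation can be checked mechanically once threshold estimates are in hand. The sketch below assumes the thresholds have already been estimated by some item response theory model and simply tests whether they increase from one category to the next. The item names and values are made up for illustration and are not results from the cited study.

```python
def is_strictly_increasing(values):
    """True when each value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def disordered_items(thresholds_by_item):
    """Return names of items whose category thresholds are not in stair-step order."""
    return [name for name, thresholds in thresholds_by_item.items()
            if not is_strictly_increasing(thresholds)]


# Made-up threshold estimates (logit scale) for three hypothetical items,
# each scored in seven categories and therefore having six thresholds.
example_thresholds = {
    "item_a": [-2.1, -1.0, 0.3, 1.2, 2.4, 3.0],
    "item_b": [-1.5, 0.4, 0.1, 1.8, 2.2, 2.9],  # third threshold falls below the second
    "item_c": [-0.8, -0.2, 0.9, 1.1, 1.7, 2.5],
}

print(disordered_items(example_thresholds))  # ['item_b']
```

The same `is_strictly_increasing` helper could be applied to average quality scores listed by rating category, which is the first check described above.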
Source: Gordon, Rachel, Kerry Hofer, Ken Fujimoto, Nicole Colwell, Robert Kaestner, and Sanders Korenman. “Measuring Aspects of Child Care Quality Specific to Domains of Child Development: An Indicator-level Analysis of the ECERS-R.” Presented in the Paper Symposium "Measuring Early Care and Education Quality: New Insights about the Early Childhood Environment System Rating Scale - Revised" (Chair: Rachel Gordon; Discussant: Margaret Burchinal) (Saturday, April 20, 2013, Seattle, WA).
Source: Pianta, R. C., La Paro, K. M., & Hamre, B. K. (2008). Classroom Assessment Scoring System Manual, PreK. Baltimore MD: Brookes Publishing.
This body of evidence highlights the way in which measures developed for other purposes have been adopted for high-stakes policy uses.
Not surprisingly, there are limitations in the validity of these measures for this high-stakes purpose.
As a concrete example, if it is desirable to distinguish classrooms that fall above and below specific thresholds, as in current policy uses, then measures with very high information (and low error) at those thresholds are needed.
If instead it is desirable to invest public dollars in improving quality through coaching, then we would like to have a measure (or two linked measures) that cover the continuum of quality over which growth is expected.
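The first case, precision at a specific cut point, can be illustrated with the standard item information function. The sketch below uses the two-parameter logistic model as a simplified stand-in; the quality scales discussed here have multi-category items, so this is an illustration of the idea and not the model used in our analyses. All item parameters are hypothetical.

```python
import math


def item_information(theta, discrimination, difficulty):
    """Fisher information of one two-parameter logistic item at quality level theta."""
    p = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    return discrimination ** 2 * p * (1.0 - p)


def standard_error(theta, items):
    """Standard error of measurement at theta: 1 / sqrt(total information)."""
    total = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(total)


# Hypothetical policy cut point on the latent quality scale.
cut_point = 0.0

# Two hypothetical four-item measures given as (discrimination, difficulty) pairs:
# one with items located at the cut point, one with items located far above it.
items_near_cut = [(1.5, 0.0)] * 4
items_far_from_cut = [(1.5, 3.0)] * 4

print(standard_error(cut_point, items_near_cut))
print(standard_error(cut_point, items_far_from_cut))
```

Under these assumptions, the measure whose items sit at the cut point yields the smaller standard error there, so it classifies classrooms around that threshold with less error. A measure meant to track growth through coaching would instead need items spread across the whole range over which change is expected.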
As another example, it is essential to think carefully about variation in quality across children, classrooms, times of day, days of week, and weeks of year.
We currently have very little evidence about such variation, or about the extent to which choices about when and how to observe classrooms affect measure validity, including for high-stakes uses.