Validity of Inferences from Test-Based Educational Accountability Systems. Robert L. Linn.
Paper presented at the National Evaluation Institute, sponsored by the Consortium for Research on Educational Accountability and Teacher Evaluation (CREATE) and the Dallas Independent School District, Dallas, TX, July 7, 2006.
States required to adopt “challenging academic content standards” that “specify what children are expected to know and be able to do; contain coherent and rigorous content; [and] encourage the teaching of advanced skills” (NCLB, 2001, part A, subpart 1, Sec. 1111, a (D)).
The frequently conflicting messages about how schools are performing from state accountability and NCLB accountability systems “are confusing to the public” (National Education Network, 2006, p. 8).
Inferences about school effectiveness from differences in student test performance at a fixed point in time are “scientifically indefensible” (Raudenbush, 2004).
Current status on achievement tests used for purposes of NCLB accountability is “contaminated with factors other than school performance, in particular the average level of achievement prior to entering first grade – average effects of student family and community characteristics on student growth from first grade through the grade in which the student is tested” (Myers, 2000).
Progress of successive cohorts of students, longitudinal tracking of students, and value-added analyses can rule out some, but not all, of the alternative explanations of school differences in performance.
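The contrast between current status and growth can be illustrated with a minimal sketch. The two schools and all of their scores below are hypothetical, invented only for illustration; the point is simply that a ranking by status at the tested grade can reverse once prior achievement is taken into account. The gain computed here is still a descriptive quantity, not an estimate of a school's causal effect.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    entering_mean: float  # hypothetical mean achievement before first grade
    tested_mean: float    # hypothetical mean achievement at the tested grade

    @property
    def gain(self) -> float:
        # Simple difference score; descriptive only, not a causal estimate.
        return self.tested_mean - self.entering_mean


def rank_by(schools, key):
    """Return school names ordered from highest to lowest on the given key."""
    return [s.name for s in sorted(schools, key=key, reverse=True)]


# Hypothetical data: School X enrolls students who start higher.
schools = [
    School("School X", entering_mean=60.0, tested_mean=70.0),
    School("School Y", entering_mean=40.0, tested_mean=62.0),
]

status_order = rank_by(schools, lambda s: s.tested_mean)
gain_order = rank_by(schools, lambda s: s.gain)

print("Ranked by current status:", status_order)
print("Ranked by gain:          ", gain_order)
```

In this made-up case School X ranks first on status while School Y ranks first on gain, so the two descriptive summaries support different stories about the same schools.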
Apparent gains and changes in achievement gaps using NAEP achievement levels depend on the choice of level, e.g., basic or above vs. proficient or above. See, for example, Holland, P. W. (2002). Two measures of change in gaps between CDFs of test score distributions. JEBS, 27, 3-17.
“Using differences in percents above cut scores can give a confusing impression of a rather simple situation” (Holland, 2002).
Need to look beyond percents basic or above or proficient or above, to average scale scores and comparisons of score distributions.
Gaps measured in terms of percent proficient or above on state assessments could be quite misleading due to the wide variation in the stringency of state definitions of the proficient performance standard.
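The dependence of gap trends on the choice of cut score can be sketched numerically. The example below assumes, purely for illustration, that both groups' scores are normally distributed with the same standard deviation and that both groups gain the same number of scale points, so the gap in mean scale scores does not change. The means, standard deviation, and cut scores are hypothetical and are not NAEP values.

```python
from statistics import NormalDist

SD = 30.0  # hypothetical common standard deviation

# Hypothetical group means: both groups gain 10 points, so the
# mean gap stays at 20 scale points in both years.
year1 = {"group_a": 250.0, "group_b": 230.0}
year2 = {"group_a": 260.0, "group_b": 240.0}

# Hypothetical cut scores for two achievement levels.
cuts = {"basic": 220.0, "proficient": 270.0}


def pct_above(mean: float, sd: float, cut: float) -> float:
    """Percent of a normal score distribution at or above the cut score."""
    return 100.0 * (1.0 - NormalDist(mean, sd).cdf(cut))


def pct_gap(means: dict, cut: float) -> float:
    """Gap between the groups in percent at or above the cut score."""
    return pct_above(means["group_a"], SD, cut) - pct_above(means["group_b"], SD, cut)


def gap_changes() -> dict:
    """Change in the percent-above-cut gap from year 1 to year 2, per cut."""
    return {label: pct_gap(year2, cut) - pct_gap(year1, cut)
            for label, cut in cuts.items()}


changes = gap_changes()
for label, change in changes.items():
    print(f"{label} or above: gap change = {change:+.1f} percentage points")
```

Under these assumptions the gap in percent basic or above narrows while the gap in percent proficient or above widens, even though the difference in average scale scores is identical in both years. That is one reason to report average scale scores and full score distributions alongside percents above a cut.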
1. Test-based accountability systems are used to infer relative school effectiveness, but the validity of such inferences is dubious at best.
2. Although NCLB emphasizes “scientifically-based research,” the NCLB accountability results do not live up to that level of evidence. Inferences about school effectiveness based on AYP are not scientifically defensible.
3. Causal inferences about school effectiveness are not justified from test-based accountability systems regardless of whether they rely on current status measures, progress of successive cohorts, or value-added analyses of longitudinal data.
4. Accountability results can still be valuable if treated as descriptive measures and as a source of hypotheses that can be followed up by collecting information about instructional practices and about teacher and student characteristics.
5. Closing gaps in achievement is a worthwhile goal of NCLB and essential to achieving equity in education.
6. Measuring achievement gaps needs to involve more than tracking the percentage of students in various subgroups who are at the proficient level or above.