Validity of Inferences from Test-Based Educational Accountability Systems. Robert L. Linn.


Presentation Transcript


  1. Validity of Inferences from Test-Based Educational Accountability Systems Robert L. Linn Paper presented at the National Evaluation Institute, sponsored by the Consortium for Research on Educational Accountability and Teacher Evaluation (CREATE) and the Dallas Independent School District, Dallas, TX, July 7, 2006.

  2. State Accountability Systems • Most states had test-based accountability systems before the enactment of NCLB • Systems varied • Grades and subjects • Reporting results • School report cards • Sanctions and/or rewards

  3. State Accountability Systems • Systems varied • Current status • Progress • Combination • Assessing progress • Comparison of successive cohorts • Longitudinal tracking of individual students

  4. NCLB • States required to adopt “challenging academic content standards” that “specify what children are expected to know and be able to do; contain coherent and rigorous content; [and] encourage the teaching of advanced skills” (NCLB, 2001, Part A, Subpart 1, Sec. 1111(a)(D)).

  5. NCLB • States required to assess all students in grades 3 through 8 and one grade in high school in mathematics and reading/English language arts • Assessments must be aligned with state’s academic content standards

  6. Definition of Proficient Achievement • NCLB: States must “describe two levels of high achievement (proficient and advanced) [and] a third level of achievement (basic)” • Setting levels left to the states, but must have all students at “proficient” level by 2014

  7. Adequate Yearly Progress (AYP) • Central to the Accountability System of the No Child Left Behind (NCLB) Act of 2001 • States required to define AYP for the state, school districts, and schools in a way that enables all children to meet the state’s student achievement standards by 2014

  8. Annual Measurable Objectives (AMOs) • Target percentages proficient or better in mathematics and reading/language arts • Set each year from 2002 to 2014 so that they lead to 100% proficient or above in 2014
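The arithmetic behind slide 8 can be sketched in a few lines. The numbers below are hypothetical (a state starting at 40% proficient in 2002), and the straight-line path is only an illustration; states set their own intermediate targets and many used stepped rather than linear schedules. The fixed endpoint is 100% proficient or above in 2014.

```python
# Illustrative sketch only: hypothetical starting point and a simple linear
# trajectory. Actual state AMO schedules varied in shape, but all had to
# reach 100% proficient or above by 2014.

def linear_amos(start_pct: float, start_year: int = 2002, end_year: int = 2014):
    """Return {year: target percent proficient or above}, rising evenly to 100%."""
    steps = end_year - start_year
    increment = (100.0 - start_pct) / steps
    return {start_year + i: round(start_pct + i * increment, 1) for i in range(steps + 1)}

# Example: a hypothetical state starting at 40% proficient in 2002
for year, target in linear_amos(40.0).items():
    print(year, f"{target}%")
```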

  9. Mixed Messages • State accountability system results and NCLB results often give conflicting indications of school success • A school that fails to make AYP may look good according to the state system and vice versa

  10. Florida Example • 68% of schools got a grade of A or B and only 8.8% got a grade of D or F in 2004 • 77% of schools failed to make AYP in 2004 • 56% of schools that got an A in 2004 failed to make AYP that year

  11. Confusion Regarding Mixed Messages The frequently conflicting messages about how schools are performing from state accountability and NCLB accountability systems “are confusing to the public” (Public Education Network, 2006, p. 8).

  12. Test-Based Accountability and School Effectiveness • School effectiveness – a causal inference • Must be able to eliminate plausible alternative explanations

  13. Alternative Explanations • Prior achievement differences • Differences in student characteristics relevant to achievement • Differences in home support during school year

  14. Alternative Explanations (cont’d) • Score inflation: “a gain in scores that substantially overstates the improvement in learning it implies” (Koretz, 2005) • Differential inflation of test scores

  15. Inferences About School Effectiveness From AYP Inferences about school effectiveness from differences in student test performance at a fixed point in time are “scientifically indefensible” (Raudenbush, 2004)

  16. AYP and School Effectiveness Current status on achievement tests used for purposes of NCLB accountability is “contaminated with factors other than school performance, in particular the average level of achievement prior to entering first grade – average effects of student family and community characteristics on student growth from first grade through the grade in which the student is tested” (Myers, 2000).

  17. Current Status vs. Progress Measures • “If NCLB benchmarks are not reached, no amount of improvement can put a school in compliance with NCLB” (Public Education Network, 2006, p. 10). • A large majority (85%) of the public thinks that “school performance should be judged based on improvement shown” while only 13% think that it should be judged on the basis of the percentage of students who pass a test (Rose & Gallup, 2005, p. 55).

  18. Progress Measures Progress of successive cohorts of students, longitudinal tracking of individual students, and value-added analyses can rule out some, but not all, of the alternative explanations of school differences in performance.

  19. Value-Added Models • Value-added label implies causal interpretation of results. • But causal claims are not justified. • Value-added analyses “should not be seen as estimating causal effects of teachers or schools, but rather as descriptive measures” (Rubin, Stuart & Zanutto, 2004).

  20. Vertical Scales • Scores treated as if exchangeable, but they do not meet the requirements of equating: • Measure the same constructs • Equal difficulty • Nearly equal reliability

  21. Vertical Scales (cont’d) • Test difficulty increases with grade level by design • Mix of constructs changes with grade level • “For mathematics, for example, the tests at the 3rd grade measure predominately arithmetic skills. By 8th grade, the test shifts to problem solving, pre-algebra and algebra skills” (Reckase, 2004)

  22. Making AYP • Conjunctive, multiple-hurdle approach • Many ways to fail but only one way to make AYP • Small school with homogeneous student body must clear 5 hurdles • Large school with diverse student body and enough students in each of 4 subgroups for disaggregated reporting must clear 21 hurdles
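The counts of 5 and 21 hurdles on slide 22 follow from the conjunctive structure of AYP. The sketch below assumes one common accounting, not spelled out on the slide: each reportable group (all students plus every subgroup large enough for disaggregated reporting) must meet the proficiency target and the 95% participation requirement in both reading/language arts and mathematics, and the school must also meet one additional academic indicator. Missing any single target means the school does not make AYP.

```python
# Sketch of the conjunctive ("multiple-hurdle") AYP rule. Assumed accounting:
# every reportable group must meet the proficiency AMO and the participation
# requirement in each of two subjects, plus one school-wide additional
# academic indicator (e.g., attendance or graduation rate).

def ayp_hurdles(num_subgroups: int, subjects: int = 2) -> int:
    """Count the separate targets a school must clear to make AYP."""
    groups = 1 + num_subgroups        # "all students" plus each reportable subgroup
    per_group = subjects * 2          # proficiency target + participation, per subject
    return groups * per_group + 1     # plus one additional academic indicator

def makes_ayp(targets_met: list) -> bool:
    """Conjunctive rule: a school makes AYP only if every target is met."""
    return all(targets_met)

print(ayp_hurdles(num_subgroups=0))             # small, homogeneous school -> 5
print(ayp_hurdles(num_subgroups=4))             # large school, 4 subgroups -> 21
print(makes_ayp([True] * 20 + [False]))         # one missed target -> fails AYP
```

The asymmetry Linn notes follows directly: each added subgroup multiplies the number of ways to fail, while there is still only one way to make AYP.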

  23. Reporting on Subgroup Performance • Critical for monitoring the closing of gaps in achievement • No real relevance for small schools with homogeneous student bodies • However, it leads to many hurdles that large, diverse schools must meet

  24. Subgroup Gains in NAEP Mathematics Scale Scores (1996 to 2005)

  25. Closing Achievement Gaps: NAEP Mathematics Average Scale Scores (1996 to 2005)

  26. Subgroup Gains in NAEP Reading Scale Scores (1998 to 2005)

  27. Closing Achievement Gaps: NAEP Reading Average Scale Scores (1998 to 2005)

  28. Apparent gains and changes in achievement gaps using NAEP achievement levels depend on choice of level, e.g., basic or above vs. proficient or above. See, for example, Holland, P. W. (2002). Two measures of change in gaps between CDFs of test score distributions. JEBS, 27, 3-17.

  29. Subgroup Gains in NAEP Mathematics Percent at or Above Basic or Proficient (1996 to 2005)

  30. Closing Achievement Gaps: NAEP Mathematics Percent at or Above Basic or Proficient (1996 to 2005)

  31. Gaps and Percent Proficient or Above “Using differences in percents above cut scores can give a confusing impression of a rather simple situation” (Holland, 2002). Need to look beyond percent at or above basic or proficient, to average scale scores and comparisons of score distributions
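Holland's point can be made concrete with a small numerical sketch. The score distributions below are invented (two normal distributions whose means differ by a constant 20 points, not NAEP data); even though the scale-score gap is identical everywhere, the gap expressed as percent at or above a cut looks quite different depending on whether the cut sits near the basic level or up at the proficient level.

```python
# Illustrative only: hypothetical normal score distributions, not NAEP data.
# The scale-score gap between the groups is fixed at 20 points, yet the gap
# in "percent at or above the cut" depends heavily on where the cut score is
# placed (cf. Holland, 2002).
from statistics import NormalDist

group_a = NormalDist(mu=240, sigma=35)   # higher-scoring group (hypothetical)
group_b = NormalDist(mu=220, sigma=35)   # lower-scoring group (hypothetical)

for label, cut in [("basic", 215), ("proficient", 280)]:
    pct_a = 100 * (1 - group_a.cdf(cut))     # percent of group A at or above the cut
    pct_b = 100 * (1 - group_b.cdf(cut))     # percent of group B at or above the cut
    print(f"{label:>10} cut at {cut}: {pct_a:5.1f}% vs {pct_b:5.1f}%  "
          f"gap = {pct_a - pct_b:4.1f} percentage points")
```

With these made-up numbers the gap is roughly 21 percentage points at the lower cut but only about 8 at the higher one, which is why the slide recommends looking at average scale scores and full score distributions rather than a single percent-above-cut figure.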

  32. Comparing States on Closing Gaps Gaps measured in terms of percent proficient or above on state assessments could be quite misleading due to the wide variation in the stringency of state definitions of the proficient performance standard.

  33. Conclusions 1. Test-based accountability systems are used to infer relative school effectiveness, but the validity of such inferences is dubious at best. 2. Although NCLB emphasizes “scientifically-based research,” the NCLB accountability results do not live up to that level of evidence. Inferences about school effectiveness based on AYP are not scientifically defensible.

  34. Conclusions (continued) 3. Causal inferences about school effectiveness are not justified from test-based accountability systems regardless of whether they rely on current status measures, progress of successive cohorts, or value-added analyses of longitudinal data. 4. Accountability results can still be valuable if treated as descriptive measures and as a source of hypotheses that can be followed up by collecting information about instructional practices and teacher and student characteristics.

  35. Conclusions (continued) 5. Closing gaps in achievement is a worthwhile goal of NCLB and essential to achieving equity in education. 6. Measuring achievement gaps needs to involve more than tracking the percentage of students in various subgroups who are at the proficient level or above.
