Robert L. Linn


Validity of Inferences from Test-Based Educational Accountability Systems. Robert L. Linn.


Validity of Inferences from Test-Based Educational Accountability Systems

Robert L. Linn

Paper presented at National Evaluation Institute Sponsored by the Consortium on Educational Accountability and Teacher Evaluation (CREATE) and the Dallas Independent School District, Dallas, TX, July 7, 2006.

State Accountability Systems
  • Most states had test-based accountability systems before the enactment of NCLB
  • Systems varied
    • Grades and subjects
    • Reporting results
    • School report cards
    • Sanctions and/or rewards
State Accountability Systems
  • Systems varied
    • Current status
    • Progress
    • Combination
  • Assessing progress
    • Comparison of successive cohorts
    • Longitudinal tracking of individual students
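
The two ways of assessing progress listed above can be contrasted with a minimal sketch. The student ids and scale scores are hypothetical, and the longitudinal version assumes the grade 4 and grade 5 scores sit on a common scale, which is itself a strong assumption.

```python
from statistics import mean

# Hypothetical scale scores keyed by student id (illustrative values only).
grade4_2005 = {"a": 200, "b": 210, "c": 220}
grade4_2006 = {"d": 215, "e": 225, "f": 235}  # next cohort: different students
grade5_2006 = {"a": 212, "b": 221, "c": 233}  # same students, one year later


def successive_cohort_change(prev_cohort, curr_cohort):
    """Change in mean score between two different groups tested in the same grade."""
    return mean(curr_cohort.values()) - mean(prev_cohort.values())


def longitudinal_gain(earlier, later):
    """Mean gain for students who have a score in both years."""
    matched = earlier.keys() & later.keys()
    return mean(later[s] - earlier[s] for s in matched)


cohort_change = successive_cohort_change(grade4_2005, grade4_2006)
tracked_gain = longitudinal_gain(grade4_2005, grade5_2006)
print(cohort_change, tracked_gain)
```

In this toy data the successive-cohort change partly reflects the fact that the 2006 cohort is a different group of students, while the longitudinal gain describes growth of the same students.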

States required to adopt “challenging academic content standards” that “specify what children are expected to know and be able to do; contain coherent and rigorous content; [and] encourage the teaching of advanced skills” (NCLB, 2001, part A, subpart 1, Sec. 1111, a (D)).

  • States required to assess all students in grades 3 through 8 and one grade in high school in mathematics and reading/English language arts
  • Assessments must be aligned with state’s academic content standards
Definition of Proficient Achievement
  • NCLB: States must “describe two levels of high achievement (proficient and advanced) [and] a third level of achievement (basic)”
  • Setting levels left to the states, but must have all students at “proficient” level by 2014
Adequate Yearly Progress (AYP)
  • Central to the Accountability System of the No Child Left Behind (NCLB) Act of 2001
  • States required to define AYP for the state, school districts, and schools in a way that enables all children to meet the state’s student achievement standards by 2014
Annual Measurable Objectives (AMOs)
  • Target percentages proficient or better in mathematics and reading/language arts
  • Set each year from 2002 to 2014 so that they lead to 100% proficient or above in 2014
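
As an illustration of how a set of AMOs can lead to 100% by 2014, here is a minimal sketch that assumes equal annual increments from a hypothetical 2002 starting point of 40% proficient. The starting value and the straight-line shape are assumptions made for the example; states were free to choose other trajectories.

```python
def linear_amos(start_pct, start_year=2002, end_year=2014):
    """Return {year: target percent proficient}, rising in equal steps to 100."""
    step = (100.0 - start_pct) / (end_year - start_year)
    return {
        year: round(start_pct + step * (year - start_year), 1)
        for year in range(start_year, end_year + 1)
    }


# Hypothetical starting point: 40% proficient or above in 2002.
amos = linear_amos(start_pct=40.0)
for year, target in amos.items():
    print(year, target)
```

Under these assumptions the target rises by five percentage points per year, whatever progress a given school has actually made.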
Mixed Messages
  • State accountability system results and NCLB results often give conflicting indications of school success
  • A school that fails to make AYP may look good according to the state system and vice versa
Florida Example
  • 68% of schools got a grade of A or B and only 8.8% got a grade of D or F in 2004
  • 77% of schools failed to make AYP in 2004
  • 56% of schools that got an A in 2004 failed to make AYP that year
Confusion Regarding Mixed Messages

The frequently conflicting messages about how schools are performing from state accountability and NCLB accountability systems “are confusing to the public” (National Education Network, 2006, p. 8).

Test-based Accountability and School Effectiveness
  • School effectiveness – a causal inference
  • Must be able to eliminate plausible alternative explanations
Alternative Explanations
  • Prior achievement differences
  • Differences in student characteristics relevant to achievement
  • Differences in home support during school year
Alternative Explanations (cont’d)
  • Score inflation: “a gain in scores that substantially overstates the improvement in learning it implies” (Koretz, 2005)
  • Differential inflation of test scores
Inferences About School Effectiveness From AYP

Inferences about school effectiveness from differences in student test performance at a fixed point in time are “scientifically indefensible” (Raudenbush, 2004)

AYP and School Effectiveness

Current status on achievement tests used for the purpose of NCLB accountability is “contaminated with factors other than school performance, in particular the average level of achievement prior to entering first grade – average effects of student family and community characteristics on student growth from first grade through the grade in which the student is tested” (Myers, 2000).

Current Status vs. Progress Measures
  • “If NCLB benchmarks are not reached, no amount of improvement can put a school in compliance with NCLB” (Public Education Network, 2006, p. 10).
  • A large majority (85%) of the public thinks that “school performance should be judged based on improvement shown” while only 13% think that it should be judged on the basis of the percentage of students who pass a test (Rose & Gallup, 2005, p. 55).
Progress Measures

Progress of successive cohorts of students, longitudinal tracking of students, and value-added analyses can rule out some, but not all, of the alternative explanations of school differences in performance.

Value-Added Models
  • Value-added label implies causal interpretation of results.
  • But, causal claims are not justified.
  • Value-added analyses “should not be seen as estimating causal effects of teachers or schools, but rather as descriptive measure” (Rubin, Stewart & Zanutto, 2004).
Vertical Scales
  • Scores treated as if exchangeable, but they do not meet the requirements of equating:
    • Measure the same constructs
    • Equal difficulty
    • Nearly equal reliability
Vertical Scales (cont’d)
  • Test difficulty increases with grade level by design
  • Mix of constructs changes with grade level
    • “For mathematics, for example, the tests at the 3rd grade measure predominately arithmetic skills. By 8th grade, the test shifts to problem solving, pre-algebra and algebra skills” (Reckase, 2004).
Making AYP
  • Conjunctive, multiple-hurdle approach
  • Many ways to fail but only one way to make AYP
    • Small school with homogeneous student body must clear 5 hurdles
    • Large school with diverse student body and enough students in each of 4 subgroups for disaggregated reporting must clear 21 hurdles
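
The hurdle counts above are consistent with one common reading of the AYP rules, sketched here under stated assumptions: each reportable group (all students plus each subgroup) faces four hurdles (a proficiency target and a participation requirement in each of two subjects), and the school faces one additional school-wide indicator. That breakdown is an assumption used for illustration; the slides give only the totals of 5 and 21.

```python
HURDLES_PER_GROUP = 4     # assumed: proficiency + participation, in 2 subjects
SCHOOLWIDE_HURDLES = 1    # assumed: one additional school-wide indicator


def count_hurdles(n_reportable_subgroups):
    """Number of hurdles a school must clear under the assumed breakdown."""
    groups = 1 + n_reportable_subgroups  # "all students" plus each subgroup
    return groups * HURDLES_PER_GROUP + SCHOOLWIDE_HURDLES


def makes_ayp(hurdle_results):
    """Conjunctive rule: every hurdle must be cleared; one miss fails the school."""
    return all(hurdle_results)


small_school = count_hurdles(0)   # homogeneous student body, no subgroups
large_school = count_hurdles(4)   # four reportable subgroups
print(small_school, large_school)

# A school that clears 20 of its 21 hurdles still does not make AYP.
print(makes_ayp([True] * 20 + [False]))
```

Because the rule is conjunctive, adding hurdles can only lower a school's chance of making AYP, which is why large, diverse schools are more exposed to failure.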
Reporting on Subgroup Performance
  • Critical for monitoring the closing of gaps in achievement
  • No real relevance for small schools with homogeneous student bodies
  • However, it leads to many hurdles that large, diverse schools must meet

Apparent gains and changes in achievement gaps using NAEP achievement levels depend on the choice of level, e.g., basic or above vs. proficient or above. See, for example, Holland, P. W. (2002). Two measures of change in gaps between CDFs of test score distributions. Journal of Educational and Behavioral Statistics, 27, 3-17.

Closing Achievement Gaps: NAEP Mathematics Percent at or Above Basic or Proficient (1996 to 2005)
Gaps and Percent Proficient or Above

“Using differences in percents above cut scores can give a confusing impression of a rather simple situation” (Holland, 2002).

Need to look beyond percents basic or above or proficient or above, to average scale scores and comparisons of score distributions
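
A small numerical sketch can show why percent-above-cut gaps are hard to read. It assumes two groups with normally distributed scores, equal spread, and an identical 10-point gain for both groups, so the gap in average scores does not change at all. The means, standard deviation, and the two cut scores labeled "basic" and "proficient" are hypothetical values, not NAEP figures.

```python
from statistics import NormalDist

SD = 30.0                            # assumed common standard deviation
CUTS = {"basic": 230.0, "proficient": 270.0}  # hypothetical cut scores


def pct_above(mean, cut, sd=SD):
    """Percent of a normal score distribution at or above the cut score."""
    return 100.0 * (1.0 - NormalDist(mean, sd).cdf(cut))


def gap_change(cut, high_mean=250.0, low_mean=220.0, gain=10.0):
    """Change in the percent-above-cut gap when both groups gain equally."""
    gap_before = pct_above(high_mean, cut) - pct_above(low_mean, cut)
    gap_after = pct_above(high_mean + gain, cut) - pct_above(low_mean + gain, cut)
    return gap_after - gap_before


for label, cut in CUTS.items():
    print(label, round(gap_change(cut), 1))
```

With these assumed values the gap appears to narrow at the lower cut and to widen at the higher cut, even though both groups improved by the same amount and the difference in average scale scores is unchanged.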

Comparing States on Closing Gaps

Gaps measured in terms of percent proficient or above on state assessments could be quite misleading due to the wide variation in the stringency of state definitions of the proficient performance standard.


Conclusions

1. Test-based accountability systems are used to infer relative school effectiveness, but the validity of such inferences is dubious at best.

2. Although NCLB emphasizes “scientifically-based research,” the NCLB accountability results do not live up to that level of evidence. Inferences about school effectiveness based on AYP are not scientifically defensible.

Conclusions (continued)

3. Causal inferences about school effectiveness are not justified from test-based accountability systems regardless of whether they rely on current status measures, progress of successive cohorts, or value-added analyses of longitudinal data.

4. Accountability results can still be valuable if treated as descriptive measures and as a source of hypotheses that can be followed up by collecting information about instructional practices and about teacher and student characteristics.

Conclusions (continued)

5. Closing gaps in achievement is a worthwhile goal of NCLB and essential to achieving equity in education.

6. Measuring achievement gaps needs to involve more than tracking the percentage of students in various subgroups who are at the proficient level or above.