
Presentation Transcript


  1. Evaluating and Improving Value-Added Modeling
  Douglas N. Harris, University of Wisconsin at Madison

  2. Background
  • IES Teacher Quality Grant; Harris and Sass
  • 2006 IES conference
  • November mini-conference at UW-Madison
  • Caveat: Multidisciplinary group but “econ-centric” presentation

  3. Summary
  • Purposes of value-added modeling (VAM)
  • Criteria for evaluating VAM
  • Some problematic results
  • Methodological issues
  • A research agenda and upcoming conference

  4. Different Purposes
  • There are two main purposes of value-added models:
  (1) VAM for program evaluation (VAM-P)
  (2) VAM for accountability (VAM-A)
  • In both cases, the model is arguably trying to mimic a random-assignment experiment (a generic specification is sketched below)
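To make the two purposes concrete, here is a generic value-added regression of the kind common in this literature (a minimal sketch with illustrative notation, not a specification taken from the slides):

```latex
% Generic value-added specification (illustrative notation)
% A_{it}: test score of student i in year t
% X_{it}: student/classroom covariates
% P_{it}: program or policy indicator       -> VAM-P targets \gamma
% \tau_{j(i,t)}: effect of i's teacher j    -> VAM-A targets every \tau_j
\[
A_{it} = \lambda A_{i,t-1} + X_{it}\beta + \gamma P_{it} + \tau_{j(i,t)} + \varepsilon_{it}
\]
```

VAM-P asks about a single parameter, γ; VAM-A asks about a separate τ for every teacher, which is why the criteria on the next slide are so much harder to meet in the VAM-A case.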

  5. Criteria for Evaluating VAM
  • Different purposes imply different criteria for evaluation:
  - Criteria for VAM-P: validity and reliability of the program/policy effect parameter
  - Criteria for VAM-A: validity and reliability of individual personnel effects
  • Meeting the criteria appears more difficult for VAM-A, which involves hundreds or thousands of parameters

  6. Tentative, But Problematic, Findings
  • In some VAM-A models, teacher effects are unstable for individual teachers over time (the simulation sketch below shows how estimation noise alone can produce this)
  • When teacher effects are estimated from the same data but with different VAM-A models, the results are only weakly correlated
  • VAM-A teacher effects are imprecise, making it difficult to distinguish teacher effectiveness with the usual degree of confidence
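A minimal simulation sketch (not from the talk; the variance magnitudes are assumptions chosen for illustration) shows how classroom- and student-level noise alone can make a perfectly stable true teacher effect look unstable from one year to the next:

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers, class_size = 200, 25

# True teacher effects, fixed across years (in student test-score SD units).
# The 0.15 SD spread is an assumed, illustrative magnitude.
true_effect = rng.normal(0.0, 0.15, n_teachers)

def estimate_effects(year_seed):
    """Estimate each teacher's effect as the mean gain of one year's class."""
    r = np.random.default_rng(year_seed)
    class_shock = r.normal(0.0, 0.10, n_teachers)                  # peers, disruptions, etc.
    student_noise = r.normal(0.0, 0.50, (n_teachers, class_size))  # student-level noise
    gains = true_effect[:, None] + class_shock[:, None] + student_noise
    return gains.mean(axis=1)

year1, year2 = estimate_effects(1), estimate_effects(2)
print("year-to-year correlation of estimated effects:",
      round(float(np.corrcoef(year1, year2)[0, 1]), 2))
```

With these assumed magnitudes the correlation comes out near 0.5 even though the true effects never change, one mechanical reason estimated VAM-A effects can look unstable over time.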

  7. Methodological Issues
  • Assumptions about student test scores
  • Assumptions about teaching and learning
  • Others: amount of information, complexity of computation, missing data
  • The significance of these methodological issues varies by purpose (VAM-P vs. VAM-A)

  8. Assumptions about Test Scores
  • VAM assumes that test scores are on an interval scale
  - In other words, a one-point increase means the same thing no matter where we start
  - Put differently, vertical scaling works
  • Some (many?) psychometricians believe that, despite best efforts, test scores are not really on an interval scale
  • Ad hoc adjustments may not solve the problem (both are sketched below):
  - a non-linear term on the right-hand side
  - grade-by-year fixed effects
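For concreteness, here is how those two ad hoc adjustments enter the generic specification from slide 4 (again a sketch with illustrative notation, not any particular published model):

```latex
% Non-linear prior-score term and grade-by-year fixed effects
% \phi A_{i,t-1}^2: relaxes linearity in the prior score
% \delta_{g(i,t),t}: grade-by-year fixed effect, absorbing scale shifts
%                    common to all students in grade g in year t
\[
A_{it} = \lambda A_{i,t-1} + \phi A_{i,t-1}^{2} + X_{it}\beta
       + \tau_{j(i,t)} + \delta_{g(i,t),t} + \varepsilon_{it}
\]
```

Neither adjustment makes a non-interval scale interval: the first only relaxes how the prior score enters, and the second only nets out shifts common to a grade-year cell, which is the sense in which they may not solve the problem.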

  9. Assumptions about Learning
  • VAM models make assumptions about the decay of past learning/inputs (see the sketch below)
  • All VAM models assume that nothing happens between one test administration and the beginning of the subsequent school year
  - i.e., they ignore summer learning loss
  • VAM models do NOT assume, however, that students learn “smoothly”
  - some express concern that students learn in spurts, in ways that are independent of instructional quality
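The decay assumption lives in the persistence parameter λ of the specification above; the following taxonomy is standard in the VAM literature (a sketch, not quoted from the slides):

```latex
% Decay of past learning/inputs via the persistence parameter \lambda
\[
A_{it} - \lambda A_{i,t-1} = X_{it}\beta + \tau_{j(i,t)} + \varepsilon_{it}
\]
% \lambda = 1:       gain-score model; past learning persists fully (no decay)
% 0 < \lambda < 1:   all past inputs decay geometrically at a common rate
% \lambda estimated: lets the data choose, but still imposes one decay rate
%                    for every past input
```

In every variant, the prior score, usually from a spring test, is treated as the stock of knowledge at the start of the next fall, which is exactly the "nothing happens over the summer" assumption.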

  10. Assumptions about Teaching
  • VAM-A assumes that mediating factors influencing student achievement (e.g., class size) influence the effectiveness of all teachers in the same way
  • A specific and important example is the assumption that teachers are equally effective with all types of students (one way to relax this is sketched below)
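One illustrative way to relax the equal-effectiveness assumption (a sketch; the interaction coefficient δ_j is notation introduced here, not from the slides) is to let each teacher's effect vary with a student's prior score:

```latex
% Teacher-by-student-type interaction
% \delta_{j}: how teacher j's effect varies with a student's prior achievement
\[
A_{it} = \lambda A_{i,t-1} + X_{it}\beta + \tau_{j(i,t)}
       + \delta_{j(i,t)}\,A_{i,t-1} + \varepsilon_{it}
\]
% \delta_j = 0 for all j recovers the standard homogeneous-effect assumption,
% at the cost of roughly doubling the number of teacher parameters to estimate.
```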

  11. Lots of Assumptions & Problems, But . . .
  • Even with modest validity and reliability, VAM-A could improve education:
  - The education system already uses student test scores, and uses them badly
  - Violations of assumptions per se do not invalidate VAM-A
  • There is little question that VAM-P should be pursued

  12. Short-Term Research Agenda
  • Follow up on the earlier “problematic” findings
  - in progress: testing the robustness of teacher effects across VAM-A models
  • Clarify the assumptions being made in each type of VAM model
  • Test the sensitivity of VAM results to test scaling (and test type); a simulation sketch of such a test follows this list
  • Test whether teachers have different levels of effectiveness with different types of students (e.g., students with different initial test scores)
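A minimal sketch of what a scaling-sensitivity test could look like (all data and magnitudes here are simulated assumptions, not results): re-score the same students under a monotone but non-interval transformation of the test metric and check whether teacher rankings survive.

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers, class_size = 200, 25

# Simulated interval-scale scores: prior score plus a true teacher effect plus noise.
teacher_effect = rng.normal(0.0, 0.15, n_teachers)
pre = rng.normal(0.0, 1.0, (n_teachers, class_size))
post = pre + teacher_effect[:, None] + rng.normal(0.0, 0.5, pre.shape)

def teacher_gains(transform):
    """Mean student gain per teacher after re-scoring the test metric."""
    return (transform(post) - transform(pre)).mean(axis=1)

def spearman(a, b):
    """Rank correlation computed with numpy (avoids a scipy dependency)."""
    rank = lambda x: x.argsort().argsort()
    return np.corrcoef(rank(a), rank(b))[0, 1]

interval = teacher_gains(lambda s: s)  # original interval-scale metric
rescaled = teacher_gains(np.tanh)      # monotone re-scaling that compresses the extremes

print(f"rank correlation of teacher effects across scalings: {spearman(interval, rescaled):.2f}")
```

A rank correlation well below 1 under an order-preserving re-scaling would mean teacher rankings depend on the interval-scale assumption flagged in slide 8.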

  13. Long-Term Research Agenda
  • Test VAM with experiments
  • Study the effects of VAM-A on school decision-making
  - Does VAM-A (w/o high stakes) appear to yield better decisions about, for example, the allocation of school resources?
  - Does VAM-A w/ merit pay result in higher student test scores? (i.e., use VAM-P to evaluate VAM-A)
  - Do these changes in scores reflect real improvements in learning, or gaming of the system?
  - Studies in progress

  14. For All Future VAM Work . . .
  • Be explicit about assumptions and their potential implications
  • Test the assumptions
  • Where assumptions fail, compare different models to test for robustness

  15. Steps Down the Path
  • A larger national conference in Madison, WI, in spring 2008
  • Co-chairs: Harris, Gamoran, Raudenbush
  • Program committee members: Braun, Lockwood, Meyer, Sass
  • Interdisciplinary
  • 10 commissioned papers, plus policy discussions

  16. Final Thoughts
  • There is considerable interest in VAM, and policymakers are eager for direction
  • There is (or should be) near consensus that VAM-P is an important advance
  - policymakers should push forward in collecting student-level data with unique student identifiers
  • VAM-A is worth cautious experimentation and further study, but not yet widespread adoption with high stakes
