Value Added Measures: Implications for Policy and Practice

Friday, May 23, 2008

Robert Gordon, Center for American Progress

Overview
• Because teachers matter so much and differ so much, we should act on, rather than ignore, the most relevant differences among teachers.
• Value-added data should be one basis, though not the only one, for recognizing those differences.
• Given our record (especially for poor and minority children, and in spite of significant investments), we should test and evaluate a promising approach.

Certification Does Not Predict Performance (chart; source: Gordon, Kane, Staiger 2006)

Past Performance Predicts Future Performance (chart; source: Gordon, Kane, Staiger 2006)

Key Facts About Teacher Effectiveness From Aggregated Data
• Teachers differ enormously in their effect on achievement:
  • Differences from top to bottom quartile are roughly twice the class-size impacts in Tennessee STAR
  • Differences from top to bottom quartile are one-third to one-fourth of the achievement gap
• Difficult to predict success up front based on degrees, qualifications, or aptitude
• Best way to predict future performance is past performance

Performance Matters Most, But We Don’t Act On It
• Tenure often granted as a matter of course
• Fewer than 2% of teachers with 1-3 years of experience leave teaching and cite dismissal or involuntary transfer as the reason (SASS 2000-01)
• Although less effective teachers are somewhat likelier to exit in early years (Boyd et al. 2007), the differential is modest. Large numbers of teachers receive tenure even though they are less effective than novices.
• Teachers often paid the same, except for:
  • Master’s degree: no relation to achievement, but $8 billion in costs (Roza 2007)
  • Experience: related to achievement in the first 3-5 years, but the difference is much smaller than the difference between top and bottom quartiles (less than half in Los Angeles data)
• Rarely better pay in higher-need schools, even though the work is harder, the need greater, and the teacher force weaker

A New Deal for New Teachers?
One model follows, but substance and implementation should vary based on dialogue with teachers.
• Raise the bar for tenure: tenure only teachers in the top three quartiles, or only teachers better than the average novice
• Offer much higher pay through “master teacher” positions for high achievers who stay in high-need schools for long periods
  • Focus is attraction and retention of effective teachers, not motivation
• Eliminate barriers to hiring and retention that are not related to performance, but offer mentoring, induction, and support based in practice
• Offer awards and honors to all teachers at schools with strong gains
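The tenure bar above can be written as a simple decision rule. The sketch below is illustrative only: it assumes each teacher already has a single value-added estimate, and the function name, teacher ids, and inputs are hypothetical. It shows the two cutoffs named on the slide, not any district's actual procedure.

```python
from statistics import mean, quantiles


def tenure_decisions(va_estimates, novice_va, rule="top_three_quartiles"):
    """Return {teacher_id: grant_tenure} under one of two illustrative rules.

    va_estimates: hypothetical mapping of teacher id -> value-added estimate.
    novice_va: hypothetical list of value-added estimates for novice teachers.
    """
    scores = list(va_estimates.values())
    if rule == "top_three_quartiles":
        # quantiles(..., n=4) returns the three quartile cut points;
        # the first one is the 25th percentile.
        cutoff = quantiles(scores, n=4)[0]
    elif rule == "better_than_average_novice":
        cutoff = mean(novice_va)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return {tid: score > cutoff for tid, score in va_estimates.items()}


# Usage with made-up scores for eight teachers and two novices.
va = {f"t{i}": float(i) for i in range(1, 9)}
print(tenure_decisions(va, [4.0, 5.0]))
print(tenure_decisions(va, [4.0, 5.0], rule="better_than_average_novice"))
```

Which rule is stricter depends on where the average novice falls in the distribution of experienced teachers; the slide leaves that choice open.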

A New Deal for New Teachers?
• Use a combination of metrics:
  • Value-added measures, where available
  • Qualitative evaluation (e.g., Danielson), implemented through:
    • Principal reviews
    • Peer reviews
  • TAP and Denver offer examples of different approaches
• Hold principals accountable for achievement outcomes
• Apply the new deal to all new teachers, but give existing teachers the option to participate
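One way to combine value-added measures “where available” with qualitative evaluations is a weighted average of standardized scores, with the weights renormalized when a teacher has no value-added score. This is a minimal sketch under that assumption; the measure names and weights are hypothetical, and nothing here is claimed to be the TAP or Denver formula.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert {teacher_id: raw score} to z-scores across teachers.

    Assumes the scores are not all identical (otherwise the SD is zero).
    """
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    return {tid: (v - mu) / sd for tid, v in scores.items()}


def composite_scores(measures, weights):
    """Weighted average of standardized measures for each teacher.

    measures: {measure_name: {teacher_id: raw score}}; a teacher may be absent
    from a measure (e.g. no value-added data in an untested grade or subject).
    weights: {measure_name: weight}, renormalized per teacher over the
    measures that teacher actually has.
    """
    z = {name: standardize(scores) for name, scores in measures.items()}
    teachers = set().union(*(scores.keys() for scores in measures.values()))
    result = {}
    for tid in sorted(teachers):
        available = [name for name in measures if tid in z[name]]
        total_weight = sum(weights[name] for name in available)
        weighted = sum(weights[name] * z[name][tid] for name in available)
        result[tid] = weighted / total_weight
    return result


# Hypothetical data: teacher "c" teaches an untested subject.
measures = {
    "value_added": {"a": 1.0, "b": 3.0},
    "observation": {"a": 2.0, "b": 4.0, "c": 3.0},
}
weights = {"value_added": 0.5, "observation": 0.5}
print(composite_scores(measures, weights))
```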

Why Not?
• This is the wrong approach by definition:
  • Will drive competition rather than collaboration among teachers
  • Will drive teaching to the test and gaming of metrics
• This could work, but metrics aren’t “ready for prime time”:
  • No value-added measures in many subjects and grades
  • Tests don’t capture relevant learning
  • Value-added metrics are flawed

Wrong by Definition? Smart Policies Mitigate Objections
• Preserving cooperation
  • Schoolwide as well as individual recognition
  • No forced choice among teachers at a school
  • Absolute standards for achievement
  • Use value-added data to improve practice as well as to evaluate it
  • In other professions, recognition of excellence and teamwork are not considered incompatible
• Avoiding teaching to the test
  • Multiple measures, including observational assessment
  • Improved assessments, better aligned to standards and curriculum
  • Audits of tests for gaming and cheating

Metrics Could Be Better… But Existing Metrics Tell Us Something Too
• We need improved metrics:
  • Develop and validate more qualitative metrics
  • Develop value-added methods in high school
  • Improve tests!
• Even so:
  • “[T]he earnings advantages to higher achievement on standardized tests are quite substantial …. [O]ne standard deviation increase in mathematics performance at the end of high school translates into 12 percent higher annual earnings.” [Hanushek 2006]
  • “[T]est score gains during high school do predict subsequent employment status and earnings.” And “for low-scoring students, gains substantially affect earnings….” [Rose 2006; Jencks & Phillips 1999]
• If we make high-stakes decisions about the future of children based on the tests we have, why not the same for teachers?

Are value-added metrics too flawed to use for evaluative purposes?
• “Value-added data offer no useful information about teaching effectiveness.” Like height and weight statistics, or phrenology.
  • This is not true.
• “Value-added data offer some useful information, but not enough that their evaluative use would improve student achievement.” Like the 100-yard dash in Moneyball.
  • There is good reason to believe this is not true.
• “Use of value-added data would improve student achievement, but the improvement is not enough to justify the harms to some teachers.” Like abandoning “beyond a reasonable doubt.”
  • This is a value judgment.

Value-Added Estimates Persist with Random Assignment
“The most important assumption made in value added modeling concerns the assignment of students to teachers.” (Rothstein 2007)
Nye, Konstantopoulos, and Hedges (2004) note that “random assignment of students would assure that all observable and unobservable differences between students in different classes would be no larger than would be expected by chance.” They consider just such a study: Tennessee STAR.
• Results “suggest that teacher effects are real and are of a magnitude that is consistent with that estimated by previous studies.”
Value-added effects are not random effects.

Value-Added Estimates Predict Random Assignment Results
Kane and Staiger (2008) compare value-added estimates with actual test scores for classrooms randomly assigned among teachers.
• “[T]hose classrooms assigned to teachers with higher non-experimental estimates of effectiveness scored higher on both math and English language arts at the end of the first school year following random assignment.”
• Value-added score predicts gains under experimental conditions.
• Value-added score, with student (or student + school) controls, may be an unbiased prediction of gains.
Value-added estimates predict actual gains.
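What “unbiased prediction of gains” means can be illustrated with a stylized simulation: if a prior estimate is an unbiased predictor, regressing later outcomes (under random assignment) on it should give a slope near 1. The sketch assumes normally distributed true effects and noise, and a shrinkage factor equal to the reliability of the raw estimate; it is not Kane and Staiger's estimation procedure, and every parameter value is an assumption.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def simulate(n_teachers=20000, sd_true=1.0, sd_noise=1.0, seed=0):
    """Return slopes of experimental outcomes on raw and shrunken estimates."""
    rng = random.Random(seed)
    # Share of a raw estimate's variance that reflects true effectiveness.
    reliability = sd_true**2 / (sd_true**2 + sd_noise**2)
    raw, shrunk, experimental = [], [], []
    for _ in range(n_teachers):
        true_effect = rng.gauss(0.0, sd_true)
        estimate = true_effect + rng.gauss(0.0, sd_noise)  # non-experimental estimate
        raw.append(estimate)
        shrunk.append(reliability * estimate)  # shrink toward the mean of 0
        # Outcome after random assignment: same true effect, fresh noise.
        experimental.append(true_effect + rng.gauss(0.0, sd_noise))
    return ols_slope(raw, experimental), ols_slope(shrunk, experimental)


raw_slope, shrunk_slope = simulate()
print(f"slope on raw estimate:      {raw_slope:.2f}")  # expected to be close to 0.5
print(f"slope on shrunken estimate: {shrunk_slope:.2f}")  # expected to be close to 1.0
```

In this setup the raw estimate over-predicts differences (slope below 1) because part of it is noise; scaling it by its reliability restores a slope of about 1.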

Value-Added Estimates Predict Principal Evaluations
Value-added results are clearly correlated with principal evaluations, especially at the top and bottom of the distribution. Both measures have more predictive power than the certification and salary information on which we currently rely.
• Jacob & Lefgren (2006) find that principals correctly identify the top category of teachers by VAM 69% of the time for math (26% if random) and 52% of the time for reading (14% if random).
• Past performance is the best predictor of future performance; principal evaluation is next. Teacher salary (incorporating experience and degree accumulation) has no predictive value.
• In an NYC sample, principals rate 11% of teachers as exceptional in math. Of these, 42% are in the top quartile for value-added in math (25% if random). Of the 10% of teachers rated by principals as exceptional in ELA, 37% are in the top quartile. (forthcoming)
• Harris & Sass (forthcoming) find a modest correlation (1 point on a 1-9 scale = 0.05 standard deviations). But they add: experience, advanced degrees, and certification are not “statistically significant determinants of higher value added scores. In contrast, when a principal’s overall rating of a teacher is added to the model, its coefficient is positive and highly significant in both reading and math.”
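The “if random” figures above make it possible to express each agreement rate as a multiple of chance. The short calculation below uses only the percentages quoted on the slide; for the NYC ELA case it assumes the same 25% chance rate that applies to any top-quartile comparison.

```python
def lift_over_chance(observed_share, chance_share):
    """Ratio of the observed hit rate to the rate expected under random assignment."""
    return observed_share / chance_share


# Shares quoted on the slide: (observed, expected if random).
comparisons = {
    "Jacob & Lefgren, math": (0.69, 0.26),
    "Jacob & Lefgren, reading": (0.52, 0.14),
    "NYC principals, math": (0.42, 0.25),
    "NYC principals, ELA": (0.37, 0.25),
}
for label, (observed, chance) in comparisons.items():
    print(f"{label}: {lift_over_chance(observed, chance):.2f}x chance")
```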

Year-to-Year Comparisons
• In a San Diego elementary school study, 35% of teachers in the top quintile in year 1 are in the top quintile in year 2. Without student and school fixed effects, the share is 50%. [Koedel & Betts 2007]
• In a Chicago high school sample, 40% of teachers in the top quartile in year 1 are also in the top quartile in year 2. Without school fixed effects, the share is 60%. [Aaronson, Sander, and Barrow 2003]
• Both of these studies compare a single year to a single year. A more policy-relevant comparison would aggregate years. In a New York City sample, more than 75% of top-quintile teachers after two years of data are also in the top quintile in the third year (without school effects). (forthcoming)
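Why aggregating years should raise persistence can be shown with a stylized simulation: each teacher has a stable true effect plus independent noise each year, so averaging two years cancels some noise. The reliability of 0.5 and the other parameters are assumptions chosen for illustration, not estimates from the San Diego, Chicago, or New York data.

```python
import random


def top_group(values, k):
    """Indices of the k highest values."""
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(ranked[:k])


def persistence(n_teachers=50000, reliability=0.5, top_share=0.2, seed=1):
    """Simulate three years of noisy value-added scores for the same teachers.

    True effects have SD 1; yearly noise is scaled so a single year's score
    has the given reliability (share of its variance due to the true effect).
    Returns the share of top-group teachers who are top-group again for
    (a) year 1 vs. year 2 and (b) the average of years 1-2 vs. year 3.
    """
    rng = random.Random(seed)
    noise_sd = ((1 - reliability) / reliability) ** 0.5
    true_effects = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]
    y1, y2, y3 = (
        [t + rng.gauss(0.0, noise_sd) for t in true_effects] for _ in range(3)
    )
    k = int(top_share * n_teachers)
    two_year_avg = [(a + b) / 2 for a, b in zip(y1, y2)]
    single_year = len(top_group(y1, k) & top_group(y2, k)) / k
    aggregated = len(top_group(two_year_avg, k) & top_group(y3, k)) / k
    return single_year, aggregated


single, aggregated = persistence()
print(f"top quintile again, one year of data:  {single:.2f}")
print(f"top quintile again, two years of data: {aggregated:.2f}")
```

Under random ranking the share would be 20% (the quintile size); both simulated shares sit above that, and the two-year figure sits higher still.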

Use of Value-Added Results Could Improve Achievement
• For example, consider tenure for only the top 75% of teachers:
  • Even with noise in the data, average gain in student achievement from retained teachers: 1.5 percentile points
  • Average loss in achievement from a higher share of novices: 0.3 percentile points
  • Net effect: 1.2 percentile points per year (Gordon, Kane, Staiger 2006)
• If 1.2 points per year accumulated over 12 years, the gains would be enormous.
  • Fade-out may significantly limit the accumulation of gains. (Kane & Staiger 2008)
  • On the other hand, improving retention of high-achieving teachers could significantly increase the impact in each year.
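The arithmetic above, and the role of fade-out, can be made explicit. The sketch assumes a simple geometric fade-out model in which a gain earned k years earlier retains `persistence**k` of its size; that model and any persistence value below 1 are hypothetical, not figures from Kane & Staiger (2008).

```python
def net_annual_gain(retained_gain=1.5, novice_loss=0.3):
    """Net percentile-point gain per year: gain from retaining stronger
    teachers minus the loss from employing a larger share of novices."""
    return retained_gain - novice_loss


def cumulative_gain(annual_gain, years=12, persistence=1.0):
    """Total effect after `years` of exposure under geometric fade-out.

    persistence=1.0 means no fade-out (simple accumulation); a hypothetical
    value such as 0.5 means half of each year's gain survives each later year.
    """
    return sum(annual_gain * persistence**k for k in range(years))


gain = net_annual_gain()
print(f"net gain per year: {gain:.1f} percentile points")
print(f"12 years, no fade-out: {cumulative_gain(gain):.1f}")
print(f"12 years, persistence 0.5 (hypothetical): {cumulative_gain(gain, persistence=0.5):.2f}")
```

With no fade-out the twelve-year total is 12 × 1.2 = 14.4 percentile points. With any persistence below 1, the total can never exceed annual_gain / (1 - persistence), which is why fade-out matters so much to the “enormous gains” claim.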

Use of Value-Added Results Could Improve Achievement
• A new policy is always subject to uncertainty, and this is no exception:
  • Botched implementation
  • Changes in supply of new teachers
  • Changes in school ethos
• We need to improve metrics, particularly qualitative proxies for achievement. This should be an immediate research focus.
• Implementation uncertainties will not be resolved without implementation.

Is it fair to teachers?
• How do we balance the interests of adults and children?
  • “Beyond a reasonable doubt”: when we worry enormously about adults. Tenure is a high-stakes decision for adults.
  • “Best interests of the child”: when we worry enormously about children. Tenure is a high-stakes decision for children.
• New regime could be better for many teachers:
  • Increases legitimacy of tenure and builds support for pay hikes
  • Offers new opportunities for effective teachers
  • Contributes to a stronger team at school
  • Enables teachers to share evaluative responsibilities with administration
  • Preserves rights of existing teachers, while providing different rules for new teachers with different attitudes (Duffett et al. 2008)

“An ignorant man, who is not fool enough to meddle with his clock, is however sufficiently confident to think he can safely take to pieces, and put together at his pleasure, a moral machine of another guise, importance, and complexity.... [D]elusive good intention is no sort of excuse for … presumption.”

“The country needs and, unless I mistake its temper, the country demands, bold, persistent experimentation. It is common sense to take a method and try it: If it fails, admit it frankly and try another. But above all, try something.”

The “experiment” underway (chart; source: NCES)