
Teacher Productivity & Models of Employer Learning

Economic Models in Education Research Workshop

University of Chicago

April 7, 2011

Douglas O. Staiger

Dartmouth College


Teacher Productivity & Models of Employer Learning

  • Teacher productivity

    • Estimating value added models

    • Statistical tests of model assumptions

    • Stability of the effects

  • Models of employer learning

    • Searching for effective teachers -- heterogeneity

    • Career concerns – heterogeneity & effort


Teacher Productivity

  • Huge non-experimental literature on “teacher effects”

    • Non-experimental studies estimate a standard deviation of teacher effects of .10 to .25 student-level standard deviations (2-5 percentiles) per year.

    • Key findings in the non-experimental literature:

      • Teacher effects unrelated to traditional teacher credentials

      • Payoff to experience steep in first 3 years but flat afterwards

      • Predict sizable differences with 1-3 years prior performance

    • One experimental study (TN class-size experiment) yields similar estimate of variance.






How Are Teacher Effects Estimated?

  • Growing use of “value added” estimates to identify effective teachers for pay, promotion, and professional development.

  • But growing concern that statistical assumptions needed to estimate teacher effect are strong and untested – are these “causal” effects of teachers?


Basics of Value Added Analysis

  • Teacher value added compares actual student achievement to a counterfactual expectation

    • Difference between actual and expected achievement, averaged over the teacher's students (average residual)

  • Expected achievement is the average achievement of students who looked similar at the start of the year:

    • Same prior-year test scores

    • Same demographics and program participation

    • Same characteristics of peers in the classroom


Estimating Value Added

1. Estimating Non-Experimental Teacher Effects

Teacher residuals are similar by OLS, RE, or FE (β is driven by within variation). What matters is whether X includes the baseline score and peer measures.
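
To make step 1 concrete, here is a minimal sketch (not the authors' code) of estimating non-experimental teacher effects as average student residuals. The data and variable names are synthetic and illustrative; a real specification would also include peer means and school-by-grade-by-year fixed effects as described above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Synthetic student-level data (illustrative variable names)
n_teachers, class_size = 200, 25
mu = rng.normal(0, 0.15, n_teachers)                       # persistent teacher effect
df = pd.DataFrame({
    "teacher": np.repeat(np.arange(n_teachers), class_size),
    "baseline": rng.normal(size=n_teachers * class_size),   # prior-year score
    "frl": rng.integers(0, 2, n_teachers * class_size),     # free-lunch indicator
})
df["score"] = (0.7 * df["baseline"] - 0.1 * df["frl"]
               + mu[df["teacher"]] + rng.normal(0, 0.5, len(df)))

# Step 1: regress end-of-year scores on observed controls
resid = smf.ols("score ~ baseline + frl", data=df).fit().resid

# Raw (noisy) teacher effect: mean residual over each teacher's students
raw_va = resid.groupby(df["teacher"]).mean()
print(raw_va.std())   # larger than 0.15 because it still contains sampling noise
```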


Estimating Value Added

2. Generating Empirical Bayes Estimates of Non-Experimental Teacher Effects


Empirical Bayes Methods

  • Goal: Forecast teacher performance next year (BLUP)

    • Forecast is prediction of persistent teacher component

      • “Shrinkage” estimator = posterior mean: E(μ|M) = Mβ

    • Weight (β) placed on a measure increases with:

      • Correlation with the persistent component of interest

      • Reliability with which the measure is estimated (which may vary by teacher – e.g., based on sample size)

    • Can apply to any measure (value added, video rating, etc.) or combination of measures (composite estimates)


Error components

  • Performance measure (Mjc) for teacher j in classroom c is noisy estimate of persistent teacher effect (μj).

  • Noise consists of two independent components:

    • classroom component (θjc) representing peer effects, etc.

    • sampling error (νjc) if measure averages over students, videos, raters, etc. (variance depends on sample size)

  • Model for error prone measure: Mjc = μj + θjc + νjc
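
One way to separate these components is a method-of-moments style decomposition; the sketch below is my simplification (not necessarily the exact estimator used in this literature), and the column names are hypothetical.

```python
import pandas as pd

def error_components(va: pd.DataFrame) -> dict:
    """va: one row per teacher-classroom with hypothetical columns
    'teacher', 'year', 'M' (classroom-level measure), 'n_students',
    and 'resid_var' (within-classroom variance of student residuals)."""
    # Sampling error: within-classroom student variance divided by class size
    var_nu = (va["resid_var"] / va["n_students"]).mean()

    # Total variance of the classroom-level measure
    var_total = va["M"].var()

    # Persistent teacher variance: covariance of the same teacher's measure in
    # two different classrooms (theta_jc and nu_jc are independent across rooms)
    wide = va.pivot_table(index="teacher", columns="year", values="M")
    y0, y1 = wide.columns[:2]
    var_mu = wide[y0].cov(wide[y1])

    # Classroom-level shock is whatever variance remains
    var_theta = var_total - var_mu - var_nu
    return {"var_mu": var_mu, "var_theta": var_theta, "var_nu": var_nu}
```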


Prediction in simple case

  • Using one measure (M) to predict teacher performance on a possibly different measure (M') in a different classroom simplifies to predicting the persistent teacher component: E(μ'j|Mjc) = Mjcβj

  • Optimal weights (βj) are analogous to regression coefficients:

    βj = Cov(μ'j, Mjc) / Var(Mjc)
       = Cov(μ'j, μj) / [Var(μj) + Var(θjc) + Var(νjc)]
       = {Cov(μ'j, μj) / Var(μj)} * {Var(μj) / [Var(μj) + Var(θjc) + Var(νjc)]}
       = {β if Mjc had no noise} * {reliability of Mjc}
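
A small numeric sketch of that formula (illustrative variance values only): the weight equals the no-noise weight times teacher j's reliability, so it is smaller for teachers observed with more sampling error.

```python
def shrinkage_weight(cov_mu_prime_mu, var_mu, var_theta, var_nu_j):
    """beta_j = Cov(mu', mu) / [Var(mu) + Var(theta) + Var(nu_j)]
             = (weight if M had no noise) * (reliability of M for teacher j)."""
    reliability_j = var_mu / (var_mu + var_theta + var_nu_j)
    return (cov_mu_prime_mu / var_mu) * reliability_j

# Same persistent outcome (Cov = Var(mu)); a large class vs. a small class
print(shrinkage_weight(0.15**2, 0.15**2, 0.01, 0.005))  # ~0.60
print(shrinkage_weight(0.15**2, 0.15**2, 0.01, 0.020))  # ~0.43
# Prediction for teacher j: E(mu'_j | M_jc) = shrinkage_weight(...) * M_jc
```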


Two Key Measurement Problems

  • Reliability/Instability

    • Imprecision → transitory measurement error

    • E.g., low correlation across classrooms

  • Validity/Bias

    • Persistently misrepresents performance (e.g., student sorting)

    • Test scores capture only one dimension of performance

    • Depends on design, content, & scaling of test

  • Validity & reliability determine a measure's ability to predict performance

    • Correlation of measure with true performance = (correlation of persistent part of measure with true performance) * (square root of reliability) (see the worked example below)

    • E.g., teacher certification versus value added
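
For example (illustrative numbers only, not estimates from the deck): if the persistent part of a value-added measure correlates 0.8 with true performance and the measure's reliability is 0.4, its correlation with true performance is about 0.8 * √0.4 ≈ 0.51; a credential that is measured almost without error (reliability ≈ 1) but whose persistent part correlates only 0.1 with performance predicts only about 0.1.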


Statistical Tests of Model Assumptions

  • Experimental forecasting test (Kane & Staiger)

  • Observational specification tests (Rothstein)

  • Quasi-experimental forecasting test (Carrell & West)


What Kane/Staiger Do

  • Randomly assign 78 pairs of teachers to classrooms in LAUSD elementary schools

  • Provides experimental estimate of parameter of interest

    • If a given classroom of students were to have teacher A rather than teacher B, how much different would their average test scores be at the end of the year?

  • Evaluate whether pre-experimental estimates from various value-added models predict experimental results


Experimental Design

  • All NBPTS applicants from the Los Angeles area.

  • For each NBPTS applicant, identified comparison teachers working in same school, grade, calendar track.

  • LAUSD chief of staff wrote letters to principals inviting them to draw up two classrooms that they would be willing to assign to either teacher.

  • If the principal agreed, classroom rosters (not individual students) were randomly assigned by LAUSD on the day of the switch, and LAUSD made paper copies of the rosters that day.

  • Yielded 78 pairs of teachers (156 classrooms and 3500 students) for whom we had estimates of “value-added” impacts from the pre-experimental period.


LAUSD Data

All test scores standardized by grade and year.

  • Grades 2 through 5

  • Three Time Periods:

    • Years before Random Assignment: Spring 2000 through Spring 2003

    • Years of Random Assignment: Either Spring 2004 or 2005

    • Years after Random Assignment: Spring 2005 (or 2006) through Spring 2007

  • Outcomes:

    • California Standards Test (Spring 2004-2007)

    • Stanford 9 Tests (Spring 2000 through 2002)

    • California Achievement Test (Spring 2003)

  • Covariates:

    • Student: baseline math and reading scores (interacted with grade), race/ethnicity (Hispanic, white, black, other or missing), ever retained, Title I, eligible for free lunch, gifted and talented, special education, English language development (levels 1-5).

    • Peers: Means of all the above for students in classrooms.

    • Fixed Effects: School x Grade x Track x Year

  • Sample Exclusions:

    • Classes with more than 20 percent special education students

    • Classes with fewer than 5 or more than 36 students


Evaluating Value Added

3. Test validity of VAj against experimental outcomes
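
One common way to implement such a forecasting test is sketched below (my own assumptions and hypothetical variable names, not the authors' code): within randomized pairs, regress the experimental difference in mean achievement on the difference in pre-experimental value added and test whether the slope is one.

```python
import pandas as pd
import statsmodels.formula.api as smf

def forecasting_test(pairs: pd.DataFrame):
    """pairs: one row per randomized teacher pair, with hypothetical columns
    'd_score' (experimental-year difference in classroom mean achievement)
    and 'd_va' (difference in pre-experimental value added)."""
    fit = smf.ols("d_score ~ d_va", data=pairs).fit()
    slope, se = fit.params["d_va"], fit.bse["d_va"]
    t_against_one = (slope - 1.0) / se   # H0: value added is an unbiased forecast
    return slope, t_against_one
```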


How much of the variance in (μ2p – μ1p) is "explained" by (VA2p – VA1p)?


Not Clear How To Interpret Fade-out

  • Forgetting, transitory teaching-to-test → value added overstates long-term impact

  • Knowledge that is not used becomes inoperable → need a string of good teachers to maintain the effect

  • Grade-specific content of tests is not cumulative → later tests understate the contribution of the current teacher

  • Students of the best teachers are mixed with students of the worst teachers in the following year, and the new teacher will focus effort on students who are behind (peer effects) → no fade-out if all teachers were effective
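
One way to read the annual fade-out rate of .4-.6 reported later in the deck is as the fraction of a teacher's effect lost each subsequent year; the arithmetic below is purely illustrative.

```python
# Fraction of a one-year teacher effect remaining after k later years,
# under assumed annual fade-out rates of 0.4 and 0.6
for rate in (0.4, 0.6):
    print(rate, [round((1 - rate) ** k, 2) for k in range(4)])
# 0.4 -> [1.0, 0.6, 0.36, 0.22]   0.6 -> [1.0, 0.4, 0.16, 0.06]
```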


Reconciling with Rothstein (2010)

[Table omitted in transcript: correlations of alternative value-added specifications with VAM4 were .94, .93, .98, .998]

  • Both of us find that past teachers have lingering effects due to fade-out.

  • Rothstein finds that richer set of covariates has negligible effects.

  • While Rothstein speculates that selection on unobservables could cause problems, our results fail to find evidence of bias.


Reconciling Kane/Staiger with Rothstein

  • Both Rothstein and Kane/Staiger find evidence of fade-out

    • Rothstein finds current student gain is associated with past teacher assignment, conditional on student’s prior test score.

      • Consistent with fade out of prior teacher’s effect in Kane/Staiger

    • Bias in current teacher effect depends on correlation between current & past teacher value added (small in Rothstein & Kane/Staiger data).

  • Both Rothstein and Kane/Staiger find that after conditioning on prior test score, other observables don’t matter much

    • Rothstein finds prior student gain is associated with current teacher assignment, conditional on student's prior test score.

      • i.e., current teacher assignment is associated with past 2 tests.

    • Rothstein (and others) finds that controlling for earlier tests has little effect on estimates of teacher effect (corr>.98)

  • Rothstein speculates that other unobservables used to track students may bias estimates of teacher effects

    • Kane/Staiger find no substantial bias from such omitted factors


Carrell/West – A Cautionary Tale!

    • Quasi-experimental evidence from the US Air Force Academy

    • Randomized to classes, common test & grading

    • Estimate teacher effect in 1st-year intro classes

    • Does it predict performance in 2nd-year class?

  • Strong evidence of teaching-to-test

    • Big teacher effects in 1st year

      • Lower-ranked instructors have larger "value added" & higher student satisfaction

      • But predicts worse performance in 2nd year class

    • The Academy's common-exam system facilitated teaching to the test


Summary of Statistical Tests

  • Value-added estimates in low-stakes environment yielded unbiased predictions of causal effects of teachers on short-term student achievement

    • controlling for baseline score yielded unbiased predictions

    • Further controlling for peer characteristics yielded highest explanatory power, explaining over 50% of teacher variation

  • Relative differences in achievement between teachers' students fade out at an annual rate of .4-.6. Understanding the mechanism is key to the long-term benefits of using value added.

  • Performance measures can go wrong when easily gamed.


Are Teacher Effects Stable?

  • Different across students within a class? No.

  • Change over time?

    • Correlation falls slowly at longer lags

    • Teacher peer effects (Jackson/Bruegman)

    • Effect of evaluation on performance (Taylor/Tyler)

  • Depend on match/context?

    • Correlation falls when teachers change grades or courses

    • Correlation falls when teachers change schools (Jackson)


How Should Value Added Be Used?

  • Growing use of value added to identify effective teachers for pay, promotion, and professional development

  • Concern that current value added estimates are too imprecise & volatile to be used in high-stakes decisions

    • Year-to-year correlation (reliability) around 0.3-0.5

    • Of teachers in the top quartile one year, >10% are in the bottom quartile the next year (see the sketch below)

  • No systematic analysis of what this evidence implies for how measures could be used
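
The quartile statement above can be checked with a small simulation under an assumed bivariate-normal model for two years of noisy value added (the distributional assumption is mine).

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.4                                   # assumed year-to-year correlation
y1 = rng.normal(size=1_000_000)
y2 = rho * y1 + np.sqrt(1 - rho**2) * rng.normal(size=1_000_000)

top_q1 = y1 > np.quantile(y1, 0.75)
share = np.mean(y2[top_q1] < np.quantile(y2, 0.25))
print(round(share, 3))   # roughly 0.10-0.12: more than 10% of top-quartile
                         # teachers land in the bottom quartile the next year
```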


Models of Employer Learning

Motivating facts:

  • Large persistent variation across teachers (heterogeneity)

  • Difficult to predict at hire (not an inspection good)

  • Predictable after hire (experience good → learning)

  • Return to experience in first few years (cost of hiring)


Searching For Effective Teachers

  • Use a simple search model to illustrate how one could use imperfect information on effectiveness to screen teachers

  • Use estimates of model parameters from NYC & LAUSD to simulate the potential gains from screening teachers

  • Evaluate potential gains from:

    • Observing teacher performance for more years

    • Obtaining more reliable information on teacher performance

    • Obtaining more reliable information at time of hire


Simple search model: Setup

  • Teacher effect: μ ~ N(0, σμ²)

  • Pre-hire signal (if available): Y0 ~ N(μ, σ0²), with reliability = σμ² / (σμ² + σ0²)

  • #applicants = 10 times natural attrition

  • Constraint: #hired = #dismissed + natural turnover

  • Annual performance on the job (t = 1, …, 30): Yt ~ N(μ + βt, σ²), with reliability = σμ² / (σμ² + σ²)

  • Return to experience: βt < 0 for early t (the cost of hiring)

  • Exogenous annual turnover rate (t < 30): π

  • Can dismiss up until tenure at t = T


Simple search model: Solution

  • Objective: maximize student achievement by screening out ineffective teachers using an imperfect performance measure

  • Solution is similar to the Jovanovic (1979) matching model

    • Principal sets a reservation value (rt), increasing with t

    • Dismiss after period t if E(μ|Y0, …, Yt) < rt, where the posterior mean comes from the normal learning model (sketched below)

    • Reservation value increases because of declining option value

  • No simple analytic solution to the general model

    • Numerically estimate optimal rt through simulations
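
The slide's formula for E(μ|Y0, …, Yt) is not reproduced in the transcript; a standard normal-learning posterior consistent with the setup (my reconstruction, with illustrative numbers) is sketched below.

```python
def posterior_mean(signals, noise_vars, var_mu):
    """Precision-weighted posterior mean of mu, given prior mu ~ N(0, var_mu) and
    de-meaned signals Y_tau - beta_tau = mu + noise with variances noise_vars."""
    prior_precision = 1.0 / var_mu
    data_precision = sum(1.0 / v for v in noise_vars)
    weighted_signals = sum(y / v for y, v in zip(signals, noise_vars))
    return weighted_signals / (prior_precision + data_precision)

# Illustrative: sigma_mu = 0.15 and 40% reliability imply noise variance 1.5 * sigma_mu^2
var_mu = 0.15 ** 2
var_noise = var_mu * (1 - 0.4) / 0.4
print(posterior_mean([0.10], [var_noise], var_mu))   # a +0.10 first-year signal shrinks to ~0.04
```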


Tenure cutoff in simple case

  • Suppose:

    • No pre-hire signal (a new hire is a random draw)

    • Tenure after 1 year (no option value)

    • Return to experience only in year 1 (β1 < 0)

  • First-order condition: the marginal tenured teacher equals the average teacher next year


Simulation assumptions from NYC & LAUSD

  • Maintained assumptions across all simulations

    • SD of teacher effect: σμ = 0.15 (in student SD units; national black-white gap = .8-.9)

    • Turnover rate if not dismissed: π = 5%

  • Assumptions for simplest base case (will be varied later)

    • No useful information at time of hire

    • Reliability of Yt: σμ² / (σμ² + σ²) = 0.4 (40% reliability)

    • Cost of hiring a new teacher: βt = -.07 in 1st year, -.02 in 2nd year

    • Dismissal only after the first year (e.g., tenure decision after 1 year)
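
Below is a deliberately simplified single-position simulation using these parameter values; the fixed dismissal cutoff, the one-year shrinkage rule, and the steady-state accounting are my own illustrative assumptions, not the authors' simulation code, and the cutoff is not optimized.

```python
import numpy as np

rng = np.random.default_rng(0)

SIGMA_MU = 0.15                      # sd of persistent teacher effect
RELIABILITY = 0.4                    # reliability of one year of value added
SIGMA_NOISE = SIGMA_MU * np.sqrt((1 - RELIABILITY) / RELIABILITY)
HIRE_COST = {1: -0.07, 2: -0.02}     # achievement cost of an inexperienced teacher
TURNOVER = 0.05                      # exogenous annual exit rate
CAREER = 30

def average_achievement(dismiss_below, n_years=100_000):
    """Long-run mean classroom achievement for one position under a rule that
    dismisses a first-year teacher whose posterior mean is below `dismiss_below`."""
    total, year = 0.0, 0
    while year < n_years:
        mu = rng.normal(0, SIGMA_MU)              # new hire's persistent effect
        experience = 0
        while year < n_years:
            experience += 1
            year += 1
            total += mu + HIRE_COST.get(experience, 0.0)
            signal = mu + rng.normal(0, SIGMA_NOISE)
            posterior = RELIABILITY * signal      # shrink the noisy signal toward 0
            if experience == 1 and posterior < dismiss_below:
                break                             # dismissed; position is refilled
            if experience >= CAREER or rng.random() < TURNOVER:
                break                             # retires or quits
    return total / n_years

print(average_achievement(dismiss_below=-np.inf))   # never dismiss (baseline)
print(average_achievement(dismiss_below=0.02))      # screen out weak first-year signals
```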



Why dismiss so many probationary teachers?

  • Differences in teacher effects are large & persistent, relative to short-lived costs of hiring a new teacher

  • Even unreliable performance measures predict substantial differences in teacher effects

    • Costs of retaining an ineffective teacher outweigh costs of dismissing an effective teacher

  • Option value of new hires

    • For every 5 new hires, one will be highly effective

    • Trade off short-term cost of 4 dismissed vs. long-term benefit of 1 retained


Why not dismiss so many probationary teachers?

  • Smaller benefits than assumed in the model?

    • High turnover rates

    • Teacher differences that do not persist in the future (including if professional development can help ineffective teachers)

    • High stakes → distortion of performance measures

  • Larger costs than assumed in the model?

    • Direct costs of recruiting/firing (little effect if added)

    • Difficulty recruiting applicants (but LAUSD did)

    • Higher pay required to offset job insecurity (particularly if teacher training is required up front)


Requiring a 2nd or 3rd year to evaluate a probationary teacher is a bad idea.


Allowing a 2nd or 3rd year to evaluate a probationary teacher is a good idea.


Obtaining more reliable information on teacher performance is valuable, but has little effect on the dismissal rate.


Obtaining more reliable information at time of hire is even more valuable, and reduces the dismissal rate.


Implications

  • Why do principals set a low tenure bar?

    • Poor incentives (private schools?)

    • Lack of verifiable performance information

    • Current up-front training requirements (not necessary?)

    • Lose best teachers if cannot raise pay

  • Why don't other occupations & professions dismiss 80%?

    • Job ladder – a low-stakes entry-level job used to screen

    • MD, JD – require up-front training, with job differentiation later

  • Alternatives to the current system

    • No up-front investment – can train later

    • Rather than credentials, base certification on performance

    • Develop a "job ladder" pre-screen – e.g., an initial job where few students are put at risk but that reveals ability (summer school?)


Summary

  • Potential gain is large

    • Could raise average annual achievement gains by ≈0.08

    • Similar magnitude to the STAR class-size experiment and to recent results from charter school lotteries

    • Gains could be doubled with a more reliable performance measure, and tripled if it were observed pre-hire

  • Select only the most effective teachers, and do it quickly

  • There may be practical reasons limiting the success of this strategy

    • May require rethinking teacher training & the job ladder

  • Focused on screening, but other uses may yield large gains


Combining Heterogeneity & Effort: Model of Career Concerns

  • Gibbons & Murphy (1992)

    • Output (yt) is the sum of ability (η), effort (at), and noise (et).

    • Workers risk-averse, convex costs of effort

    • Information imperfect, but symmetric, so firms pay expected output.


Combining Heterogeneity & Effort: Model of Career Concerns

  • Gibbons & Murphy (1992)

    • Simple optimal linear contract: wt = ct + bt*(yt - ct), and ct = at* + mt-1 (see the sketch below)

    • Base pay (ct) is expected value at t-1 of output at t, and is sum of two terms:

      • equilibrium effort (at*) – an experience effect.

      • Posterior mean of ability (mt-1) based on earlier output

    • Incentive payment depends on:

      • How much output exceeds expectations ( yt - ct )

      • Weight (bt*) declines with noise in yt, and grows with experience – early effort rewarded indirectly through impact on beliefs (mt-1)
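
The sketch below instantiates the pay rule on this slide; the normal-learning update for the posterior mean of ability is an assumption consistent with the model, not a quotation from Gibbons & Murphy.

```python
def career_concerns_pay(y_t, a_star_t, m_prev, b_t):
    """Pay rule on this slide: w_t = c_t + b_t*(y_t - c_t), with c_t = a_t* + m_{t-1}."""
    c_t = a_star_t + m_prev                 # expected output given current beliefs
    return c_t + b_t * (y_t - c_t)          # base pay plus incentive payment

def update_belief(m_prev, precision_prev, y_t, a_star_t, noise_precision):
    """Normal-learning update of the posterior mean of ability eta after
    observing y_t = eta + a_t* + noise (my reconstruction of m_{t-1})."""
    signal = y_t - a_star_t
    precision = precision_prev + noise_precision
    m_t = (precision_prev * m_prev + noise_precision * signal) / precision
    return m_t, precision

# Illustrative: output 0.2 above expectations, with equal prior and noise precision,
# moves beliefs halfway (to 0.1) and pays a bonus of b_t * 0.2
m1, p1 = update_belief(m_prev=0.0, precision_prev=1.0, y_t=0.2, a_star_t=0.0,
                       noise_precision=1.0)
print(m1, career_concerns_pay(y_t=0.2, a_star_t=0.0, m_prev=0.0, b_t=0.3))
```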


Implications For Teacher Contract

  • Base pay determined by experience & Empirical Bayes estimate of performance (pay grades)

  • Incentive payment later in career (after tenure) based on performance relative to others in your pay grade

  • Incentives depend on noise in value added – may be worse in some subjects (ELA) or small classes – so could base on percentile rank within class size & subject categories.

