The Evaluation of Teacher and School Effectiveness Using Growth Models and Value Added Modeling: Hope Versus Reality

Robert W. Lissitz University of Maryland - PowerPoint PPT Presentation



The Evaluation of Teacher and School Effectiveness Using Growth Models and Value Added Modeling: Hope Versus Reality. Robert W. Lissitz University of Maryland. http://marces.org/Completed.htm. Thank you. First, I want to thank… The creators of this symposium Burcu Kaniskan

Presentation Transcript

The Evaluation of Teacher and School Effectiveness Using Growth Models and Value Added Modeling: Hope Versus Reality

Robert W. Lissitz

University of Maryland

http://marces.org/Completed.htm

Maryland Assessment Research Center for Education Success


Thank you

  • First, I want to thank…

  • The creators of this symposium

    • Burcu Kaniskan

  • The State of Maryland

  • MARCES:

    • Laura Reiner, Yuan Zhang, Xiaoshu Zhu, and Dr. Bill Schafer

  • Drs. Xiaodong Hou and Ying Li

  • Yong Luo, Matt Griffin, Tiago Calico, and Christy Lewis


Preview

  • History of VAM

  • Literature:

    • Reliability

    • Validity

  • Application of VAM

  • Direction of VAM in the future

    • Applied viewpoint

    • Psychometric viewpoint


Introduction and History

RACE TO THE MIDDLE

  • The federal government is asking psychometricians to help make decisions

    • Race to the Top

    • Earlier: No Child Left Behind (“Race to the Middle”)

  • The government wants a system that will

    • Pressure educational administrations to do the right thing

    • Combat teachers’ unions, which are perceived as obstacles


Introduction and History

WHAT IS VAM?

  • Value-added modeling (VAM) is a system that we hope can determine the effectiveness of some mechanism

    • Usually teachers or schools

  • Most popular models include

    • Simple regression

    • Recording transitions between performance levels in adjacent grades

    • Mixed effects or multilevel regression models

      • Teacher or school as level 2 effect


Introduction and History

WHAT IS VAM?

  • Results for each student are usually aggregated

    • Provides summaries of every student for each teacher

  • Attempt to show whether students associated with a teacher are performing above or below statistically expected values, or values associated with other teachers

  • Usually normative in nature


Introduction and History

MANDEVILLE – late 1980s

  • Investigated school effectiveness and reliability of indicators

  • Findings:

    • Some schools are better than others

    • Differences in quality are inconsistent

      • Across years

      • Within schools across grade levels and subject areas


Introduction and History

DALLAS – mid-1990s

  • 1994: School effects

  • 1995-1996: Teacher effects

  • Model with two stages:

    • Regression to control for “fairness variables”

      • Gender, ethnicity, English proficiency, SES, etc.

    • HLM to control for prior achievement, attendance, and school-level variables

  • High stakes decisions

    • Bonuses

    • Frequency of classroom observations


Introduction and History

TVAAS – mid-1990s

  • Sanders et al.

  • “Layered” multiple regression model

    • Effects of teachers and past teachers

  • Multiple years of prior performance on several subject matter exams

    • Used to covary out the effect of undesirable student characteristics on growth

  • Complex interactions could not be statistically removed

    • Effects may have different influence on students of different ability levels

    • Probably not possible to eliminate statistically

  • Future might look at latent classes of students and teachers


Introduction and History

CHALLENGES – CRITICISM

  • Nonrandom assignment of students to teachers

    • Effect not controlled by use of prior performance level

    • Bias reduced by using multiple prior measures

  • “Dynamic” interaction between students and teachers

    • Association between teacher effectiveness and student characteristics

  • VAM for high-stakes decisions does not cover all teachers

    • Many teachers teach subjects that are not tested

    • Memphis, TN – VAM does not apply to 70% of teachers


Reliability

GENERALIZABILITY

  • Think of the reliability of VAM as a generalizability problem.

  • Is teacher effectiveness justified as a main effect, or are teachers actually effective in some circumstances and ineffective in others?

  • If interactions exist, the problem for the principal changes from “who is ineffective?” to “are there conditions in which this teacher can be effective?”


Reliability

STABILITY OVER A ONE-YEAR PERIOD

  • Mandeville (1988):

  • Correlations of school effectiveness estimates one year apart ranged from 0.34 to 0.66

    • Large differences across grade level and subject matter

  • McCaffrey (2009):

  • Teacher effect estimates one year apart had correlations around 0.2 to 0.3

  • Teaching itself may not be a stable phenomenon

    • Variability may be due to actual performance changes from year to year; instability may be intractable


Reliability

STABILITY OVER A SHORT PERIOD OF TIME

  • Sass (2008) and Newton et al. (2010):

  • Estimates of teacher effectiveness from test-retest assessments over a short time period

  • Correlations in the range of 0.6

  • For high stakes testing, we usually require reliability greater than 0.8

  • Still may indicate a real phenomenon, but modest


Reliability

STABILITY ACROSS GRADE AND SUBJECT

  • Mandeville & Anderson (1987) and others (Rockoff, 2004; Newton et al., 2010):

  • Stability fluctuates across grade and subject matter

  • Limited stability found more often with math courses, less often with reading courses

  • Success depends on what class you are assigned rather than your ability?

    • Serious issues of fairness and comparability


Reliability

STABILITY AT THE SCHOOL LEVEL

  • Perception that entire school is good or bad is very popular

  • St. Louis, early 1990s

    • Challenged advisory committee to find a school that remained at the top 3 years in a row

    • No system that reported back had even one

  • Federal Blue Ribbon Schools

    • “Winning school in one year was typically not at the top a year or two later”

  • Bottom line:

    • Rankings or groupings of schools (e.g., quintiles) are not stable.


Reliability

STABILITY ACROSS TEST FORMS

  • Sass (2008):

  • Top quintile and bottom quintile seem the most stable

  • Correlation of teacher effectiveness in those groups was 0.48 across comparable exams over a short time

  • Time extended to a year between tests: correlation dropped to 0.27

  • Papay (2011):

  • Three different tests

  • Rank order correlations of teacher effectiveness across time ranged from 0.15 to 0.58 across different tests

  • Test timing and measurement error have effects


Reliability

STABILITY ACROSS STATISTICAL MODELS

  • Tekwe et al. (2004):

  • Compared four regression models

  • Unless models involve different variables, results tend to be similar

  • Dawes (1979):

  • Linear composites seem to be pretty much the same regardless of how one gets the weights

  • Hill et al. (2011):

  • Convergent validity problem


Reliability

STABILITY ACROSS CLASSROOMS

  • Newton et al. (2010):

  • Students who are less advantaged, ESL, or on a lower track can have a negative impact on teacher effect estimates

  • Multiple VAM models were tested

  • Success of matching teacher characteristics to VAM outcomes was modest

  • VAM could be used as a criterion to judge other variables, but validity is questionable


Reliability

SOURCES OF UNRELIABILITY

  • Persistent effects (teacher consistency), non-persistent effects (inconsistency), and non-persistence due to sampling error (unknown)

  • 30-60% of variation is due to sampling error

    • In part due to small numbers of students as the basis of effectiveness estimates

  • Regression to the mean

    • Class sizes vary within a school or district

    • Classrooms with fewer students tend toward the mean

  • Bayes estimates in multilevel modeling also introduce bias that is a function of sample size

  • Other occupations: Lack of consistency is typical of complex professions – baseball players, stock investors…
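
The point about Bayes estimates and sample size can be made concrete with a standard random-intercept model. This is a sketch under assumptions: the talk does not specify a model, and the symbols below (x for a prior score, u for the teacher effect, τ² and σ² for the between-teacher and within-teacher variances, n_j for the number of students linked to teacher j) are introduced here only for illustration.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij},
\qquad u_j \sim N(0, \tau^2), \quad e_{ij} \sim N(0, \sigma^2)

\hat{u}_j = \lambda_j \, \bar{r}_j,
\qquad \lambda_j = \frac{\tau^2}{\tau^2 + \sigma^2 / n_j},
\qquad \bar{r}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} \left( y_{ij} - \beta_0 - \beta_1 x_{ij} \right)
```

With illustrative values τ² = 1 and σ² = 9, a teacher with 9 students has λ = 0.5, while a teacher with 36 students has λ = 0.8. The same average residual is therefore pulled further toward zero for the smaller class, which is the sample-size dependence noted in the list above.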


Validity

JOB APPLICATIONS AS PREDICTIVE MEASURES

  • Years of experience, advanced degrees, certification, licensure, school quality, etc. have low relationships (if any) to teacher effectiveness

    • Weak relationship between effectiveness and advanced degree

  • Knowledge of mathematics positively correlated with teaching mathematics effectively

  • VAM estimates provide better measures of teacher impact on student test scores than measures from a teacher’s job application


Validity

TRIANGULATION OF MULTIPLE INDICATORS

  • Goe et al. (2008):

  • Context for evaluation

  • Teachers should be compared to other teachers who:

    • Teach similar courses

    • In same grade

    • In a similar context

    • Assessed by same or similar examination

  • Probably necessary to establish validity


Validity

COMPARABILITY

  • Ability is very likely correlated with growth and status

    • Do gifted students learn at the same rate as others?

    • Gifted students and their teachers have an advantage

  • Interaction between student ability and teachers’ ability to be effective

  • Mixture models are in development


Validity

CAUSALITY, RESEARCH DESIGN, AND THEORY

  • Rubin (2004):

  • Missing data is not missing at random

    • Missing in a way that confounds results and complicates inferences

  • We do not have a clear idea what our hypothesis is

  • Multiple operational definitions of growth, but no developmental science for the phenomenon


Validity

CAUSALITY, RESEARCH DESIGN, AND THEORY

  • Without carefully controlled experiments, we cannot isolate teacher effects

    • Students have multiple teachers

    • Influence of prior performance and experience

  • What do we even mean by causal effect?

    • How do teachers and schools impart their effect?

    • How is it internalized by the student?

  • Lord’s paradox

    • ANCOVA does not lead to unambiguous interpretations

  • Only experimental efforts will provide adequate results

  • Eminent faculty member: teacher decision-making - unclear what is optimal


Validity

WHY SHOULD WE CARE?

  • Are teachers the most important factor determining student achievement?

    • Nye et al. (2004): 11% of variation in student gains explained by teacher effects

    • Rockoff (2004):

      • Teacher effects: 5.0-6.4%

      • School effects: 2.7-6.1%

      • Student fixed effects: 59-68%


Validity

WHY SHOULD WE CARE?

  • Importance of classroom context

    • Kennedy (2010), etc.:

      • Situational factors influence teacher success

        • Time, materials, work assignments

      • Controlling behavioral issues; mainstreaming only students who are willing and able to be non-disruptive

      • Technical assistance with teaching (computers, etc.)

  • New teacher’s goal: Maximize context for learning


Validity

WHY SHOULD WE CARE?

  • New paradigm – different orientation toward the learning process

  • Teacher optimizes the context of the classroom

    • Adding to motivation

    • Preventing disruption

    • Providing opportunity for enhanced learning engagement

  • Use of assistive teaching devices (computers) will change teacher’s role

  • Develop a learning science

    • Current paradigm emphasizes external validity and immediate generality

    • Instead, create laboratory for education science


Validity

WHY SHOULD WE CARE?

  • Fairness

    • Little evidence VAM is ready for high stakes use

    • But…

      Is it less fair than traditional personnel selection that focuses on advanced degrees and certificates, more credit hours, and working more years? Classroom observations?


OUR STUDY

COMPARING MODELS USING REAL DATA

  • The MARCES Center has studied 11 of the simplest models that might be applied

  • The full VAM report and the full text supporting this presentation can be accessed at

  • http://marces.org/Completed.htm


OUR STUDY

COMPARING MODELS USING REAL DATA

  • We obtained 3 years of data on the same students, linked to their teachers

  • Students divided into four cohorts: (N ≈ 5000 per cohort)

  • Math and reading data from yearly spring state assessment (2008-2010)

    • No vertical scale

    • Horizontally equated from year to year

  • VAM models chosen for comparison do not require vertical scaling

    • Nine models compare growth from first to second year

    • Two models compare growth from first and second to third year


TABLE 2: Data used in our study


OUR STUDY

MODELS

  • BETEBENNER’S MODEL

  • Used in Colorado

  • Looks at conditional percentile of each student’s performance in the second year, compared to other students who started in the same percentile the first year

  • Aggregates conditional percentiles of students exposed to the same teacher

  • QRG1 uses the prior year to condition the percentile the next year

  • ConD is a simplification: aggregates students into deciles one year and compares to deciles the second year

  • QRG2 uses 2 prior years to condition the percentile the 3rd year
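
The idea of a conditional percentile can be illustrated with a rough sketch. It groups students by year-one decile and computes each student's percentile rank on the year-two score within that group. This is an assumption-laden simplification: it is not the quantile regression behind QRG1 and QRG2, and it is not necessarily how ConD was computed in the study. The function names, the mid-rank tie rule, and the use of the median for aggregation are all choices made here for illustration.

```python
from collections import defaultdict
from statistics import median

def conditional_percentiles(year1, year2, n_groups=10):
    """Percentile rank of each year-two score among peers from the same year-one group."""
    n = len(year1)
    # Rank students on the year-one score and cut the ranking into equal-sized groups.
    order = sorted(range(n), key=lambda i: year1[i])
    group = [0] * n
    for rank, i in enumerate(order):
        group[i] = rank * n_groups // n

    # Collect the year-two scores of each year-one group.
    peers = defaultdict(list)
    for i in range(n):
        peers[group[i]].append(year2[i])

    result = []
    for i in range(n):
        scores = peers[group[i]]
        below = sum(s < year2[i] for s in scores)
        ties = sum(s == year2[i] for s in scores)
        # Mid-rank percentile (assumed convention): ties count as half below.
        result.append(100.0 * (below + 0.5 * ties) / len(scores))
    return result

def median_percentile_by_teacher(percentiles, teacher_ids):
    """Aggregate student percentiles to one number per teacher (median assumed)."""
    by_teacher = defaultdict(list)
    for p, t in zip(percentiles, teacher_ids):
        by_teacher[t].append(p)
    return {t: median(v) for t, v in by_teacher.items()}
```

A teacher whose students mostly land above the 50th percentile of their starting group would be read as showing above-typical growth under this sketch.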


OUR STUDY

MODELS

  • THUM’S MODEL

  • Similar to ConD, but looks at effect size

  • Uses z score to identify student’s performance level compared to the average student the first year

  • In second year, compares student’s z score to students who started at same z position (within a decile) in the prior year

  • Conditional z scores aggregated for each teacher to provide measure of effectiveness

  • THUM’S MODEL

  • Our simplification: z score conditional on prior deciles:

    • Rank order all students’ year one scale scores; divide into 10 deciles

    • Compute mean of year 2 scale scores for students within each decile

    • Compute deviation scores from the decile mean of year 2 scale scores for students within each decile

    • Compute pooled within-decile SD of year 2 scale scores

    • Compute growth z score for each student: deviation score divided by the pooled within-decile SD
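
The decile-based steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names are hypothetical, ties in year-one scores are broken by input order, and the pooled SD is assumed to divide the summed squared deviations by the number of students minus the number of groups.

```python
from collections import defaultdict
from math import sqrt

def decile_growth_z(year1, year2, n_groups=10):
    """Return one growth z score per student, conditional on year-one decile."""
    n = len(year1)
    # Rank order year-one scale scores and divide the ranking into equal-sized groups.
    order = sorted(range(n), key=lambda i: year1[i])
    group = [0] * n
    for rank, i in enumerate(order):
        group[i] = rank * n_groups // n

    # Mean of year-two scale scores within each year-one group.
    members = defaultdict(list)
    for i in range(n):
        members[group[i]].append(year2[i])
    means = {g: sum(v) / len(v) for g, v in members.items()}

    # Deviation of each student's year-two score from the group mean.
    dev = [year2[i] - means[group[i]] for i in range(n)]

    # Pooled within-group SD (assumed divisor: N minus number of groups).
    # Raises ZeroDivisionError if there is no within-group variation.
    pooled_sd = sqrt(sum(d * d for d in dev) / (n - len(members)))
    return [d / pooled_sd for d in dev]

def teacher_means(z, teacher_ids):
    """Average the growth z scores of the students linked to each teacher."""
    by_teacher = defaultdict(list)
    for zi, t in zip(z, teacher_ids):
        by_teacher[t].append(zi)
    return {t: sum(v) / len(v) for t, v in by_teacher.items()}
```

Because each z score is centered within its starting group, a teacher's mean reflects how students did relative to peers who started at a similar level, not their absolute status.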


OUR STUDY

MODELS

  • ORDINARY LEAST SQUARES REGRESSION

  • Aggregates errors of prediction across teachers to see which teacher’s students tend to perform above or below prediction

OLS2

Independent variables: first two years’ scale scores

Effectiveness measure: deviation from expected scale score for year three

OLS1

Independent variable: first year scale score

Effectiveness measure: deviation from expected scale score for year two
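
As a rough illustration of the OLS1 idea, the sketch below fits a single-predictor regression of year-two scores on year-one scores and averages the prediction errors by teacher. The function names and the toy data are assumptions made for this example; the study's own implementation is not shown in this transcript.

```python
from collections import defaultdict

def ols1_residuals(year1, year2):
    """Residuals from regressing year-two scale scores on year-one scale scores."""
    n = len(year1)
    mean_x = sum(year1) / n
    mean_y = sum(year2) / n
    sxx = sum((x - mean_x) ** 2 for x in year1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(year1, year2))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Positive residual: the student scored above the predicted year-two score.
    return [y - (intercept + slope * x) for x, y in zip(year1, year2)]

def mean_residual_by_teacher(residuals, teacher_ids):
    """Aggregate errors of prediction across each teacher's students."""
    by_teacher = defaultdict(list)
    for r, t in zip(residuals, teacher_ids):
        by_teacher[t].append(r)
    return {t: sum(v) / len(v) for t, v in by_teacher.items()}

# Toy data (hypothetical): four students, two teachers.
residuals = ols1_residuals([1, 2, 3, 4], [3, 3, 7, 7])
effects = mean_residual_by_teacher(residuals, ["A", "B", "A", "B"])
```

OLS2 would follow the same pattern with two predictors (year-one and year-two scores) and year three as the outcome, which needs a multiple-regression fit and is left out of this sketch.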


OUR STUDY

MODELS

  • REGRESSION USING SPLINE SCORES

  • Calculated with scores that had been transformed by a spline function

  • Gives relational meaning to points along the performance continuum across grades

    • Builds a quasi-vertical scale without common items

  • Transformation matched to cut scores for 3 proficiency levels: basic, proficient, advanced

OLSS applies ordinary least squares to the spline scale scores and looks at deviations from predicted values

DIFS subtracts spline function transformed score at year 1 from the transformed score at year 2, as though they were a true vertical scale
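
The transcript does not give the spline function itself, so the sketch below swaps in a plain piecewise-linear interpolation through the cut scores as a stand-in. It shows only the general idea behind DIFS: map each grade's scale onto a common scale anchored at the cut scores, then subtract. All knot values and names are hypothetical.

```python
from bisect import bisect_right

def make_rescaler(knots_raw, knots_common):
    """Piecewise-linear map that sends each raw knot to its common-scale knot."""
    def rescale(score):
        # Find the segment containing the score; scores outside the end knots
        # are extrapolated along the first or last segment.
        k = bisect_right(knots_raw, score) - 1
        k = max(0, min(k, len(knots_raw) - 2))
        x0, x1 = knots_raw[k], knots_raw[k + 1]
        y0, y1 = knots_common[k], knots_common[k + 1]
        return y0 + (score - x0) * (y1 - y0) / (x1 - x0)
    return rescale

# Hypothetical knots: lowest score, proficient cut, advanced cut, highest score.
rescale_grade3 = make_rescaler([240, 380, 440, 650], [300, 350, 400, 450])
rescale_grade4 = make_rescaler([260, 390, 450, 650], [400, 450, 500, 550])

def difs(year1_score, year2_score):
    """Difference of transformed scores, treated as if on a true vertical scale."""
    return rescale_grade4(year2_score) - rescale_grade3(year1_score)
```

In this toy setup, a student who sits exactly at the proficient cut in both years gets the same difference as any other student who holds position at that cut, which is the "relational meaning" the transformation is meant to provide.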


OUR STUDY

MODELS

  • TRANSITION MODELS

  • Used in Delaware and Arkansas

  • Classify students into categories in year one (basic, proficient, advanced)

    • Divide each category into three subcategories

  • Observe year two category conditional on year one performance

  • Matrix associated with transition from level at year one to level at year two

    • Values represent importance of each transition; determined by educators

  • TRUG rewards students only for growth

    • Does not punish for regressing

    • Does not distinguish much between amounts of growth

  • TRUD values reflect growth as well as decreased performance

    • Does not reward for status

  • TRSG rewards students for maintaining previous status and for growth within and across performance levels

    • Reward increases with higher performance level status
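
A transition model can be sketched as a lookup in a value matrix followed by averaging within teacher. The actual matrix values were set by educators and are not given in this transcript, so the two scoring rules below are hypothetical stand-ins that only mimic the described behavior of TRUG (credit for growth, no penalty) and TRUD (credit for growth, penalty for decline). TRSG is omitted because its status weights are not described in enough detail.

```python
from collections import defaultdict

N_CATEGORIES = 9  # 3 performance levels x 3 subcategories, coded 0 (lowest) to 8 (highest)

def growth_only_value(start, end):
    """Hypothetical TRUG-like rule: credit for any upward move, no penalty otherwise."""
    return 1 if end > start else 0

def growth_and_decline_value(start, end):
    """Hypothetical TRUD-like rule: signed number of subcategories moved."""
    return end - start

def build_matrix(value_fn):
    """Rows are year-one categories, columns are year-two categories."""
    return [[value_fn(s, e) for e in range(N_CATEGORIES)] for s in range(N_CATEGORIES)]

def teacher_scores(matrix, transitions):
    """Average the transition values of each teacher's students."""
    values = defaultdict(list)
    for teacher, start, end in transitions:
        values[teacher].append(matrix[start][end])
    return {t: sum(v) / len(v) for t, v in values.items()}

# Toy data (hypothetical): (teacher, year-one category, year-two category).
transitions = [("A", 0, 2), ("A", 4, 4), ("B", 5, 3), ("B", 1, 2)]
trug_like = teacher_scores(build_matrix(growth_only_value), transitions)
trud_like = teacher_scores(build_matrix(growth_and_decline_value), transitions)
```

On this toy data the two teachers tie under the growth-only rule and separate under the signed rule, a small illustration of how the choice of value matrix can change the ranking.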



OUR STUDY

INTER-CORRELATION OF STUDENT GROWTH SCORES AND THEIR DIMENSIONALITY

  • Each student had growth calculation from year 1-2 and year 2-3

  • Factor analysis of student growth scores from these models, intercorrelated for year 1-2 and replicated for year 2-3

    • One dimension accounts for largest percentage of variance

    • Great deal of noise in results

    • Over 80% of variance unexplained by first dimension

    • Results of factor analysis same for each pair of years, for each cohort, and for each content area


OUR STUDY

INTER-CORRELATION OF STUDENT GROWTH SCORES AND THEIR DIMENSIONALITY

  • Example: Scree Plot for Math 2008-2009, Cohort 1


OUR STUDY

RELATION TO DEMOGRAPHIC VARIABLES AND PRE- AND POSTTEST SCORES

  • Growth in reading tends to be slightly more correlated with SES and race than growth in math

  • Correlations between TRSG and pre- and post-tests are strongest among all the models

    • Correlation between TRSG and pretest around 0.5

    • Correlation between TRSG and posttest around 0.8

  • Correlations otherwise…

    • Between pretest and regression-based models: low

    • Between pretest and transition-based models: medium

    • Between posttest and regression-based models: higher

    • Between posttest and transition-based models: lower


OUR STUDY

THE CORRELATION BETWEEN GROWTH IN MATH AND GROWTH IN READING

  • Year 2008-2009


OUR STUDY

THE CORRELATION BETWEEN GROWTH IN MATH AND GROWTH IN READING

  • Year 2009-2010


OUR STUDY

THE CORRELATION BETWEEN THE TWO GROWTH PERIODS (YEAR 1-2 AND YEAR 2-3)

  • Math


OUR STUDY

THE CORRELATION BETWEEN THE TWO GROWTH PERIODS (YEAR 1-2 AND YEAR 2-3)

  • Reading


OUR STUDY

TEACHER EFFECTIVENESS AND TEACHER RELIABILITY

  • Square Root of Intra-Class Correlations for Year 2008-2009


OUR STUDY

TEACHER EFFECTIVENESS AND TEACHER RELIABILITY

  • Square Root of Intra-Class Correlations for Year 2009-2010


OUR STUDY

TEACHER EFFECTIVENESS AND TEACHER RELIABILITY

  • Year to Year Reliability of Teacher Effectiveness

  • Between 2008-2009 and 2009-2010


OUR STUDY

SCHOOL EFFECTIVENESS AND SCHOOL RELIABILITY

  • Sq. root of School Intra-Class Correlation for Year 2008-2009


OUR STUDY

SCHOOL EFFECTIVENESS AND SCHOOL RELIABILITY

  • Sq. root of School Intra-Class Correlation for Year 2009-2010


OUR STUDY

SCHOOL EFFECTIVENESS AND SCHOOL RELIABILITY

  • Year to Year Reliability of School Effectiveness

  • Between 2008-2009 and 2009-2010


OUR STUDY

COMPARISON BETWEEN SCHOOL AND TEACHER EFFECT

  • Levels of Effectiveness

  • 2008-2009


OUR STUDY

COMPARISON BETWEEN SCHOOL AND TEACHER EFFECT

  • Levels of Effectiveness

  • 2009-2010


OUR STUDY

METHODOLOGICAL ISSUES

  • Math Cohort 1 in Year 2008-2009


OUR CONCLUSIONS

The model you use can make a difference

  • Deciding how to balance status against growth

  • No standardization for the modeling of VAM

  • Traditional qualitative approaches used by principals are not likely to be an improvement on VAM

  • Using either approach for high stakes testing and decision-making seems premature

    • Combining two procedures that are not valid will not necessarily result in a valid system


OUR CONCLUSIONS

More sophisticated growth models

  • Would be nice to explore different models

    • Example: 4 level model

      • Many vertically scaled time points

      • Many subject matter assessments

      • Nested within students (level 2)

      • Nested within teachers (level 3)

      • Nested within school context (level 4)

    • Mixture and latent class models

      • Student and teachers as members of discrete groups that interact


OUR CONCLUSIONS

Interactions should be modeled

  • Why model teacher effects…

    • as if all students react the same way?

    • as if all teachers are the same over time?

  • School and classroom context effects

    • Should be investigated as well

    • Implications for how to create a learning science

    • May add to the modest results for teachers and schools


OUR CONCLUSIONS

Change in instruction involving supportive technology

  • The transition (paradigm shift) may be closer than we think

  • Cognitive scientists, computer scientists, econometricians, and engineering scientists are beginning to study education

    • Field can be expected to change as researchers and their students change

  • The nature of teachers and instructional decision-making

    • Radical changes for the better are expected


OUR CONCLUSIONS

VAM for high stakes

  • Right now, I do not encourage it

  • It makes a difference what VAM model we implement

  • Choose the model based on policy decisions that capture the goals and intent of the school system

  • Factors not in the teacher’s control have an effect


OUR CONCLUSIONS

Relating VAM to what teachers are doing

  • Create causal models and explore with experiments

Interested in implementing a VAM?

Read Finlay and Manavi (2008)

  • Practical political issues of using VAM in schools

    • Unions, federal government, special education advocates…

  • Effective teaching requires good measurement and presents a great challenge and is a worthy goal…


Questions?

Visit http://marces.org/Completed.htm to find references, the full text of this talk, our comparison of value-added models, and other projects.

Robert W. Lissitz

University of Maryland

Maryland Assessment Research Center for Education Success

