Plausible values, plausible transformations. Or, “some of my best friends are economists!” Discussion of Rothstein & von Davier. Andrew Ho, Harvard Graduate School of Education. PIAAC Methodological Seminar, Organisation for Economic Co-operation and Development, Paris, France, June 14, 2019.
Jacob & Rothstein (2016) vs. Braun & von Davier (2017)
Why are these LSAS issues important, now?
• LSAS are Large Scale. In particular, LSAS target population-level inferences and comparisons (across subgroups, states, and countries).
• LSAS are Low Stakes. They are held in high esteem by researchers and the public, and they are not natural targets for political opposition or score inflation.
• LSAS are assessments, not evaluations. They are designed for measurement, but they are (or would be!) natural tools for policy evaluations using current statistical and econometric techniques.
• LSAS are oracular. Few understand how they work or what to do with the available secondary data (plausible values).
Three Essential Questions
• Are currently released plausible values useful for answering causal questions?
  • We can’t always tell, so, no.
• Test score scales are not equal-interval. Is this a problem?
  • No more than many other scales, so, no.
• What should we do about this?
  • For 1, allow select researchers access to item-level data.
  • For 2, assess plausible transformations as a specification check.
Source: https://nces.ed.gov/nationsreportcard/tdw/analysis/summary_proced_biases.aspx
From Braun & von Davier (2017), with my comments:
“Note that the secondary analysis model is typically a subset of the latent regression model used to generate the PVs.” I’m not sure we can assume this for some folks, especially policy analysts.
“However, if variables beyond those in the latent regression are used in a secondary analysis, then biased estimates may result (Mislevy, 1991; Meng, 1994).” Yes, well known.
“On the other hand, since the PV-generating model typically includes as many factors as are available (the ‘kitchen-sink approach’: Graham, 2012), even these additional variables may be effectively included by proxy, to the extent that they are correlated with the variables incorporated in the latent regression.” But to what extent?
From Braun & von Davier (2017), with my comments:
“JR also offer examples of situations where certain school-level characteristics are of interest but were not included in the conditioning model.” Yes, this is the concern.
“In actual practice, this may not be a problem. Such characteristics are either drawn directly from items incorporated in the school questionnaire and are part of the conditioning, or indirectly, through inclusion of a dummy-coded school identifier.” It may not be a problem.
“If particular characteristics that become subsequently available are of interest, then supplementary latent regression models can be run to generate new PVs so as to ensure unbiased estimation.” Yes, why not allow select folks to do that themselves?
Wait, IRT does provide an equal-interval scale, if the model fits the data!
[Figure: item characteristic curves plotting Probability (Correct) against the latent scale (θ), annotated with each item’s slope (discrimination) and threshold (difficulty).]
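The item characteristic curves on this slide can be sketched with the two-parameter logistic (2PL) item response function; the function name and parameter values below are illustrative, not from the talk.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct response
    given latent trait theta, slope (discrimination) a, and
    threshold (difficulty) b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An examinee located exactly at the item's threshold answers
# correctly half the time, regardless of the slope.
print(p_correct(theta=0.5, a=1.2, b=0.5))  # 0.5

# The curve is monotone increasing in theta.
print(p_correct(2.0, a=1.2, b=0.5) > p_correct(-2.0, a=1.2, b=0.5))  # True
```

The slope a controls how sharply the curve rises near the threshold b, which is why the slide labels these two quantities on each curve.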
The θ scale is linear in the probits of correct responses to items. The scale renders normal the underlying response processes of respondents. A logit (log of the odds) approximates a probit up to a scaling constant: Logit ≈ 1.7 × Probit.
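The logit-probit approximation on this slide is easy to check numerically: the logistic curve with scaling constant 1.7 stays within about 0.01 of the normal ogive everywhere. A minimal sketch (function names are mine):

```python
import math

def probit_curve(z):
    """Standard normal CDF (the normal-ogive / probit curve)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_curve(z, D=1.7):
    """Logistic curve with scaling constant D; D = 1.7 makes it
    closely track the normal ogive."""
    return 1.0 / (1.0 + math.exp(-D * z))

# Largest absolute gap between the two curves over a grid of z values.
max_gap = max(abs(probit_curve(z) - logistic_curve(z))
              for z in (i / 100.0 for i in range(-400, 401)))
print(max_gap < 0.01)  # True: the curves never differ by more than 0.01
```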
So IRT’s θ is a scale that is linear in the log of the odds of correct responses to items.*
• This does NOT imply that the resulting scale has universal equal-interval properties.
• The consensus view in educational measurement is that the θ scale from well-fit IRT models is convenient, not cardinal (Ho, 2009; Lord, 1980; Yen, 1986; Zwick, 1992).
• Monotone transformations of the θ scale fit the data equally well.
• But this is true of many equal-interval scales, as Braun & von Davier note.
• The limited equal-interval properties of IRT’s θ make it a good starting point from which to evaluate sensitivity to transformations (e.g., Reardon & Ho, 2015).
*3PL model interpretations are less elegant.
And we know which analyses are scale-sensitive (Ho, 2008; Ho & Haertel, 2006).
• A/B comparisons, whether treatment/control or focal/reference gaps, are not generally scale-sensitive.
• A/B differences, whether interactions, gap trends, or differences-in-differences, are often scale-sensitive.
Gap trends are almost always transformation-reversible (Ho & Haertel, 2006; see Bond’s talk)
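The reversibility point can be shown with a toy example: a monotone transformation can flip the sign of a gap trend (a difference-in-differences) while leaving every within-time A-versus-B comparison intact. The cell values below are assumed numbers chosen purely for illustration.

```python
import math

# Cell scores on the original (theta) scale: group A vs. group B
# at times 1 and 2. Toy numbers for illustration only.
a1, b1 = 1.0, 0.0
a2, b2 = 1.1, 0.2

def gap_trend(f):
    """Difference-in-differences of the A-B gap under score transform f."""
    return (f(a2) - f(b2)) - (f(a1) - f(b1))

identity = lambda x: x
print(gap_trend(identity) < 0)   # True: the gap narrows on the original scale
print(gap_trend(math.exp) > 0)   # True: the gap widens after a monotone transform

# Within-time orderings (A above B) survive any monotone transform:
print(math.exp(a1) > math.exp(b1) and math.exp(a2) > math.exp(b2))  # True
```

This is why A/B comparisons on the previous slide are not generally scale-sensitive, while gap trends often are.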