
Andrew Ho, Harvard Graduate School of Education

Plausible values, plausible transformations. Or, “some of my best friends are economists!” Discussion of Rothstein & von Davier. Andrew Ho, Harvard Graduate School of Education. PIAAC Methodological Seminar, Organisation for Economic Co-operation, Paris, France, June 14, 2019.


Presentation Transcript


  1. Plausible values, plausible transformations. Or, “some of my best friends are economists!” Discussion of Rothstein & von Davier. Andrew Ho, Harvard Graduate School of Education. PIAAC Methodological Seminar, Organisation for Economic Co-operation, Paris, France, June 14, 2019.

  2. Jacob & Rothstein (2016) vs. Braun & von Davier (2017)

  3. Why are these LSAS issues important, now?
     • LSAS are Large Scale. In particular, LSAS target population-level inferences and comparisons (across subgroups, states, & countries).
     • LSAS are Low Stakes. They are held in high esteem by researchers and the public. They are not natural targets for political opposition or inflation.
     • LSAS are assessments, not evaluations. They are designed for measurement. But they are (or would be!) natural tools for policy evaluations using current statistical and econometric techniques.
     • LSAS are oracular. Few understand how they work or what to do with available secondary data (plausible values).

  4. Three Essential Questions
     • Are currently released plausible values useful for answering causal questions?
       • We can’t always tell, so, no.
     • Test score scales are not equal-interval. Is this a problem?
       • No more than many other scales, so, no.
     • What should we do about this?
       • For 1, allow select researchers access to item-level data.
       • For 2, assess plausible transformations as a specification check.
     [Name labels on slide, shown twice: Andrew, Jesse, Matthias]

  5. Source: https://nces.ed.gov/nationsreportcard/tdw/analysis/summary_proced_biases.aspx

  6. From Braun & von Davier (2017) with my comments
     • Braun & von Davier: Note that the secondary analysis model is typically a subset of the latent regression model used to generate the PVs.
       Comment: I’m not sure we can assume this for some folks, especially policy analysts.
     • Braun & von Davier: However, if variables beyond those in the latent regression are used in a secondary analysis, then biased estimates may result (Mislevy, 1991; Meng, 1994).
       Comment: Yes, well known.
     • Braun & von Davier: On the other hand, since the PVs generating model typically includes as many factors as are available (“kitchen-sink approach”: Graham 2012), even these additional variables may be effectively included by proxy, to the extent that they are correlated with the variables incorporated in the latent regression.
       Comment: To what extent is this?

  7. From Braun & von Davier (2017) with my comments
     • Braun & von Davier: JR also offer examples of situations where certain school-level characteristics are of interest but were not included in the conditioning model.
       Comment: Yes, this is the concern.
     • Braun & von Davier: In actual practice, this may not be a problem. Such characteristics are either drawn directly from items incorporated in the school questionnaire and are part of the conditioning, or indirectly, through inclusion of a dummy coded school identifier.
       Comment: It may not be a problem.
     • Braun & von Davier: If particular characteristics that become subsequently available are of interest, then supplementary latent regression models can be run to generate new PVs so as to ensure unbiased estimation.
       Comment: Yes, why not allow select folks to do that themselves?

  8. Three Essential Questions
     • Are currently released plausible values useful for answering causal questions?
       • We can’t always tell, so, no.
     • Test score scales are not equal-interval. Is this a problem?
       • No more than many other scales, so, no.
     • What should we do about this?
       • For 1, allow select researchers access to item-level data.
       • For 2, assess plausible transformations as a specification check.
     [Name labels on slide, shown twice: Andrew, Jesse, Matthias]

  9-22. Wait, IRT does provide an equal-interval scale, if the model fits the data! [Figure, repeated across slides 9 to 22: Probability (Correct) plotted against the Latent Scale (θ), with labels Slope (Discrimination) and Threshold (Difficulty).]
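The equal-interval claim on these slides can be illustrated with a small sketch. It assumes a two-parameter logistic (2PL) item response function, which the slides do not name explicitly; the parameter values `a` (slope) and `b` (threshold) are hypothetical and chosen only for illustration. Under that assumption, equal steps on the latent scale produce equal steps in the log-odds of a correct response, but unequal steps in the probability itself.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function (assumed form): P(correct | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_odds(p):
    """Log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Hypothetical item parameters, for illustration only.
a, b = 1.2, 0.5  # slope (discrimination), threshold (difficulty)

# Equally spaced points on the latent scale.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = [p_correct(t, a, b) for t in thetas]
logits = [log_odds(p) for p in probs]

# Differences between adjacent points.
prob_steps = [probs[i + 1] - probs[i] for i in range(len(probs) - 1)]
logit_steps = [logits[i + 1] - logits[i] for i in range(len(logits) - 1)]

print("probability steps:", [round(s, 4) for s in prob_steps])
print("log-odds steps:   ", [round(s, 4) for s in logit_steps])
```

Each log-odds step equals the slope `a`, while the probability steps vary with location on the scale. This is the limited sense in which the scale is equal-interval: equal in the log-odds of item responses, conditional on model fit.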

  23. The scale is linear in the probits of correct responses to items. The scale renders normal the underlying response processes of respondents. A logit (log of the odds) approximates a probit up to a scale factor of about 1.7: logit(p) ≈ 1.7 × probit(p). [Figure: Probability (Correct) plotted against the Latent Scale (θ), with labels Threshold (Difficulty) and Slope (Discrimination).]
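The 1.7 relationship between logits and probits on slide 23 can be checked numerically. The sketch below compares the standard normal CDF with a logistic CDF whose argument is scaled by 1.7, using only the Python standard library. The grid range and spacing are arbitrary choices made for this illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(x):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-x))

D = 1.7  # scaling constant relating the logit and probit metrics

# Evaluate both curves on a grid from -4 to 4 in steps of 0.01.
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(normal_cdf(z) - logistic_cdf(D * z)) for z in grid)

print(f"max |Phi(z) - logistic(1.7 z)| on [-4, 4]: {max_gap:.4f}")
```

The largest discrepancy between the two curves stays below 0.01 in probability, which is why the logistic and normal-ogive models are treated as near-interchangeable once the scale factor is applied.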

  24. So IRT’s scale is linear in the log of the odds of correct responses to items.*
     • This does NOT imply that the resulting scale has universal equal-interval properties.
     • The consensus view in educational measurement is that the scale from well-fit IRT models is convenient, not cardinal (Ho, 2009; Lord, 1980; Yen, 1986; Zwick, 1992).
     • Monotone transformations of the scale fit the data equally well.
     • But this is true of many equal-interval scales, as Braun & von Davier note.
     • The limited equal-interval properties of IRT’s scale make it a good starting point from which to evaluate sensitivity to transformations (e.g., Reardon & Ho, 2015).
     *3PL model interpretations are less elegant.

  25. And we know which analyses are scale-sensitive (Ho, 2008; Ho & Haertel, 2006).
     • A/B comparisons, whether treatment/control or focal/reference gaps, are not generally scale-sensitive.
     • A/B differences, whether interactions, gap trends, or differences-in-differences, are often scale-sensitive.

  26. Gap trends are almost always transformation-reversible (Ho & Haertel, 2006; see Bond’s talk)
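A toy example may help show what "transformation-reversible" means for a gap trend. The numbers below are hypothetical group medians, not data from any assessment, and the natural log stands in for one of many possible monotone transformations. Medians are used because a monotone increasing transformation of a median is the median of the transformed scores, so each transformed value remains a valid group summary.

```python
import math

# Hypothetical group medians at two time points (illustration only).
scores = {
    ("high", 1): 4.0, ("high", 2): 6.0,
    ("low", 1): 1.0, ("low", 2): 2.5,
}

def gap_trend(transform):
    """Return (gap at time 1, gap at time 2, change in gap) on a given scale."""
    gap_t1 = transform(scores[("high", 1)]) - transform(scores[("low", 1)])
    gap_t2 = transform(scores[("high", 2)]) - transform(scores[("low", 2)])
    return gap_t1, gap_t2, gap_t2 - gap_t1

original = gap_trend(lambda x: x)  # scale as reported
logged = gap_trend(math.log)       # monotone increasing transformation

print("original scale: gaps %.3f, %.3f, trend %+.3f" % original)
print("log scale:      gaps %.3f, %.3f, trend %+.3f" % logged)
```

On both scales the higher-scoring group stays ahead at each time point, so the direction of the A/B comparison is preserved. The sign of the gap trend flips: the gap widens on the original scale and narrows on the log scale.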

  27. Three Essential Questions
     • Are currently released plausible values useful for answering causal questions?
       • We can’t always tell, so, no.
     • Test score scales are not equal-interval. Is this a problem?
       • No more than many other scales, so, no.
     • What should we do about this?
       • For 1, allow select researchers access to item-level data.
       • For 2, assess plausible transformations as a specification check.
     [Name labels on slide, shown twice: Andrew, Jesse, Matthias]
