This paper examines retrospective measurement of change, using Item Response Theory (IRT) to detect beta change in evaluations and comparing IRT-based methods with traditional approaches. It discusses the concept of beta change, IRT models and item parameters, and differential functioning of items and tests, and presents analyses and findings. The study involved raters from the US and New Zealand using a skill assessment instrument. Results show promise for IRT-based methods in detecting beta change, while highlighting limitations in sample size and other methodological considerations.
Measuring Change Retrospectively: An Examination Based on Item Response Theory
S. Bartholomew Craig, Kaplan DeVries, Inc.
Charles J. Palus & Sharon Rogolsky, Center for Creative Leadership
How to obtain this paper
• Download it from the Internet at http://sbcraig.com
• Request an electronic copy by sending email to bcraig@kaplandevries.com
• The old-fashioned way… pick up a paper copy from the front of the room
Overview
• Retrospective measurement as a solution to the problem of beta change (response shift bias)
• Beta change as differential item / test functioning (DIF / DTF)
• Comparison of retrospective measurement and IRT-based DIF methods for detecting beta change
What is Beta Change?
• Introduced by Golembiewski, Billingsley, & Yeager (1976)
• Also called "response shift bias" (Howard & Dailey, 1979) and "instrumentation bias" (Campbell & Stanley, 1963)
• Occurs when the intervention being evaluated alters raters' mental frame of reference
• Renders pre- and post-intervention scores incomparable
• The post-intervention metric is usually more severe
Item Response Theory and Beta Change
• IRT models the probability of item responses as a function of item characteristics and the latent trait being measured (θ)
• When raters at the same level of θ have different response probabilities across groups, differential item functioning (DIF) is said to occur
• By treating pre- and post-intervention raters as separate groups, IRT can identify beta change as DIF (see the sketch below)
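The slides do not commit to a particular response model at this point, so the sketch below uses the dichotomous two-parameter logistic (2PL) model with made-up parameter values purely to illustrate the idea: if separate calibrations of the pre- and post-intervention groups yield different item parameters, raters with identical θ get different response probabilities, which is DIF.

```python
import numpy as np

def p_2pl(theta, a, b):
    """Two-parameter logistic model: probability of endorsing an item,
    given trait level theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = 0.0  # a rater with an average perception of ratee performance

# Hypothetical calibrations of the same item in the two rater groups.
# If the intervention shifted the rating metric (beta change), the
# post-group difficulty is higher even though theta is identical.
p_pre  = p_2pl(theta, a=1.2, b=-0.3)  # pre-intervention calibration
p_post = p_2pl(theta, a=1.2, b=0.4)   # post-intervention calibration
print(p_pre, p_post)  # different probabilities at the same theta => DIF
```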
Item Response Theory: Item Parameters
• As a set, an item's parameters define its relation to the latent trait (θ)
• Discrimination parameter (a)
  • one per item
  • higher values mean the item discriminates more sharply among individuals
• Difficulty parameters (b)
  • one per boundary between adjacent response categories
  • each marks the point on θ at which a rater has a 50% chance of responding in that category or higher
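Because the instrument uses ordered rating categories, a polytomous model such as Samejima's graded response model (GRM) fits this description of one a and multiple boundary b parameters per item; the slides do not name the model, so the GRM is an assumption here, and the parameter values below are invented for illustration.

```python
import numpy as np

def grm_category_probs(theta, a, bs):
    """Graded response model: probabilities of each response category,
    given trait level theta, discrimination a, and sorted boundary
    difficulties bs (a 4-category item has 3 boundaries)."""
    cum = np.array([1.0]
                   + [1.0 / (1.0 + np.exp(-a * (theta - b))) for b in bs]
                   + [0.0])          # P(X >= k) for k = lowest..highest
    return -np.diff(cum)             # P(X = k) = P(X >= k) - P(X >= k+1)

# At theta = b_k, a rater has a 50% chance of responding in category k
# or higher; parameter values here are hypothetical.
probs = grm_category_probs(theta=0.5, a=1.5, bs=[-1.0, 0.0, 1.2])
print(probs, probs.sum())  # four category probabilities summing to 1
```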
Differential Functioning of Items and Tests (DFIT)
• Raju, van der Linden, & Fleer (1995)
• Models DIF as the difference in expected item response for raters with identical perceptions of ratee performance
• Allows for item- and scale-level analyses (sketched below)
  • NCDIF (noncompensatory DIF, item level)
  • DTF (differential test functioning, scale level)
  • CDIF (compensatory DIF, item level; CDIF values sum to DTF)
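A compressed sketch of the three DFIT indices, assuming graded-response-model expected scores, item parameters already linked to a common metric, and θ estimates for the focal group; the function and variable names are ours, not from the paper.

```python
import numpy as np

def expected_score(theta, a, bs):
    """Expected item score under the graded response model:
    sum over categories k of k * P(X = k | theta)."""
    cum = np.array([1.0]
                   + [1.0 / (1.0 + np.exp(-a * (theta - b))) for b in bs]
                   + [0.0])
    probs = -np.diff(cum)
    return float(np.dot(np.arange(len(probs)), probs))

def dfit_indices(thetas, params_focal, params_ref):
    """NCDIF and CDIF per item plus DTF, following the logic of Raju,
    van der Linden, & Fleer (1995). params_* hold one (a, bs) tuple per
    item from each group's calibration, assumed linked to a common
    metric; thetas are trait estimates for the focal group."""
    d = np.array([[expected_score(t, a_f, bs_f) - expected_score(t, a_r, bs_r)
                   for t in thetas]
                  for (a_f, bs_f), (a_r, bs_r) in zip(params_focal, params_ref)])
    D = d.sum(axis=0)               # test-level score difference per rater
    ncdif = (d ** 2).mean(axis=1)   # squared, so DIF cannot cancel out
    cdif = (d * D).mean(axis=1)     # signed; item CDIF values sum to DTF
    dtf = float((D ** 2).mean())    # differential test functioning
    return ncdif, cdif, dtf
```

NCDIF squares each item's difference before averaging, so DIF cannot cancel across raters; CDIF keeps the sign, so item-level DIF can offset at the test level, which is why the CDIF values sum to DTF.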
Participants and Measure
• 415 raters from the US and New Zealand
  • 29 focal program participants were rated
  • raters were superiors, peers, and subordinates
  • analyses used sample sizes from 20 to 278
• 54 items from the SkillChange 360° assessment instrument were analyzed (see the sketch below)
  • "now" and "about one year ago" ratings at Times 1 & 2
  • 9-point response format (collapsed to 4 categories)
  • unidimensional
  • Cronbach's alpha = .97
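Neither the collapsing rule for the 9-point scale nor the reliability computation is spelled out in the slides, so the sketch below assumes an even binning (hypothetical cut points) and shows a standard Cronbach's alpha calculation for a raters-by-items score matrix.

```python
import numpy as np

def collapse_9_to_4(x):
    """Map 9-point responses (1-9) to 4 ordered categories. The paper's
    actual collapsing rule is not given; even binning is assumed here."""
    bins = [2, 4, 6]                  # hypothetical cuts: 1-2, 3-4, 5-6, 7-9
    return np.digitize(x, bins, right=True) + 1

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_raters x n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

rng = np.random.default_rng(0)
raw = rng.integers(1, 10, size=(415, 54))  # hypothetical 9-point ratings
# Random data, so alpha will be near zero; the real ratings gave .97.
print(cronbach_alpha(collapse_9_to_4(raw)))
```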
Analyses
• Beta change assessed as in past research (see the sketch below)
  • repeated measures ANOVA using observed scores (N ≈ 20)
  • examine for significant pre-then differences
• Beta change assessed using IRT / DFIT
  • compare pre-then, pre-post, and post-then
  • Ns = 278, 78, 59
• Assessed convergence of the two methods
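A minimal sketch of the traditional observed-score check, with simulated data: with only two repeated measurements (pre vs. then), the repeated measures ANOVA F-test is equivalent to a paired t-test (F = t²), so the t-test serves as a compact stand-in; all values below are simulated for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_raters = 20  # approximate per-analysis N in the observed-score approach

# Hypothetical observed scores: "pre" collected before the program,
# "then" collected afterward as a retrospective rating of the same period.
pre  = rng.normal(loc=6.0, scale=1.0, size=n_raters)
then = pre - rng.normal(loc=0.4, scale=0.8, size=n_raters)  # simulated shift

# With two repeated measures, the RM ANOVA F-test equals the squared
# paired t statistic, so the paired t-test gives the same decision.
t, p = stats.ttest_rel(pre, then)
print(f"t = {t:.2f}, p = {p:.3f}")  # significant => inferred beta change
```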
Conclusions
• IRT-based DIF methods show promise for detecting beta change
• Most items do not exhibit beta change, regardless of the detection method used
• Pre-then differences are not a reliable indicator of beta change (both false positives and false negatives occurred)
• Post-then differences are generally on the same metric, but what they reflect is uncertain
Limitations
• Small sample size (especially post-program)
• Rater groups collapsed (insufficient N to examine self-ratings alone)
• Response categories collapsed (results may not generalize to the uncollapsed 9-point scale)
• Experiment-wise Type I error rate > .05