This paper examines retrospective measurement of change, using Item Response Theory (IRT) to detect beta change in evaluations and comparing IRT-based methods with traditional approaches. It discusses the concept of beta change, IRT models and item parameters, and differential functioning of items and tests, and presents analyses and findings. The study involved raters from the US and New Zealand using a skill assessment instrument. Results show promise for IRT-based methods in detecting beta change, while highlighting limitations in sample size and other methodological considerations.
Measuring Change Retrospectively: An Examination Based on Item Response Theory
S. Bartholomew Craig, Kaplan DeVries, Inc.
Charles J. Palus & Sharon Rogolsky, Center for Creative Leadership
How to obtain this paper
• Download it from the Internet at http://sbcraig.com
• Request an electronic copy by sending email to bcraig@kaplandevries.com
• The old-fashioned way… pick up a paper copy from the front of the room
Overview
• Retrospective measurement as a solution to the problem of beta change (response shift bias)
• Beta change as differential item / test functioning (DIF / DTF)
• Comparison of retrospective measurement and IRT-based DIF methods for detecting beta change
What is Beta Change?
• Introduced by Golembiewski, Billingsley, & Yeager (1976)
• Also called "response shift bias" (Howard & Dailey, 1979) and "instrumentation bias" (Campbell & Stanley, 1963)
• Occurs when the intervention being evaluated alters raters' mental frame of reference
• Renders pre- and post-intervention scores incomparable
• The post-intervention metric is usually more severe
Item Response Theory and Beta Change
• IRT models the probability of item responses as a function of item characteristics and the latent trait being measured (θ)
• When raters at the same level of θ have different response probabilities across groups, differential item functioning (DIF) is said to occur
• By treating pre- and post-intervention raters as separate groups, IRT can identify beta change as DIF (see the sketch below)
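The slides do not commit to a particular response model at this point, so the sketch below uses the dichotomous two-parameter logistic (2PL) model with made-up parameter values purely to illustrate the idea: if separate calibrations of the pre- and post-intervention groups yield different item parameters, raters with identical θ get different response probabilities, which is DIF.

```python
import numpy as np

def p_2pl(theta, a, b):
    """Two-parameter logistic model: probability of endorsing an item,
    given trait level theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = 0.0  # a rater with an average perception of ratee performance

# Hypothetical calibrations of the same item in the two rater groups.
# If the intervention shifted the rating metric (beta change), the
# post-group difficulty is higher even though theta is identical.
p_pre  = p_2pl(theta, a=1.2, b=-0.3)  # pre-intervention calibration
p_post = p_2pl(theta, a=1.2, b=0.4)   # post-intervention calibration
print(p_pre, p_post)  # different probabilities at the same theta => DIF
```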
Item Response Theory: Item Parameters
• As a set, an item's parameters define its relation to the latent trait (θ)
• Discrimination parameter (a)
  • one per item
  • higher values mean the item discriminates more sharply among individuals
• Difficulty parameters (b)
  • one per boundary between adjacent response categories
  • each marks the point on θ at which a rater has a 50% chance of responding in that category or higher
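Because the instrument uses ordered rating categories, a polytomous model such as Samejima's graded response model (GRM) fits this description of one a and multiple boundary b parameters per item; the slides do not name the model, so the GRM is an assumption here, and the parameter values below are invented for illustration.

```python
import numpy as np

def grm_category_probs(theta, a, bs):
    """Graded response model: probabilities of each response category,
    given trait level theta, discrimination a, and sorted boundary
    difficulties bs (a 4-category item has 3 boundaries)."""
    cum = np.array([1.0]
                   + [1.0 / (1.0 + np.exp(-a * (theta - b))) for b in bs]
                   + [0.0])          # P(X >= k) for k = lowest..highest
    return -np.diff(cum)             # P(X = k) = P(X >= k) - P(X >= k+1)

# At theta = b_k, a rater has a 50% chance of responding in category k
# or higher; parameter values here are hypothetical.
probs = grm_category_probs(theta=0.5, a=1.5, bs=[-1.0, 0.0, 1.2])
print(probs, probs.sum())  # four category probabilities summing to 1
```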
Differential Functioning of Items and Tests (DFIT)
• Raju, van der Linden, & Fleer (1995)
• Models DIF as the difference in expected item response for raters with identical perceptions of ratee performance
• Allows for item- and scale-level analyses (sketched below)
  • NCDIF (noncompensatory DIF, item level)
  • DTF (differential test functioning, scale level)
  • CDIF (compensatory DIF, item level; CDIF values sum to DTF)
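A compressed sketch of the three DFIT indices, assuming graded-response-model expected scores, item parameters already linked to a common metric, and θ estimates for the focal group; the function and variable names are ours, not from the paper.

```python
import numpy as np

def expected_score(theta, a, bs):
    """Expected item score under the graded response model:
    sum over categories k of k * P(X = k | theta)."""
    cum = np.array([1.0]
                   + [1.0 / (1.0 + np.exp(-a * (theta - b))) for b in bs]
                   + [0.0])
    probs = -np.diff(cum)
    return float(np.dot(np.arange(len(probs)), probs))

def dfit_indices(thetas, params_focal, params_ref):
    """NCDIF and CDIF per item plus DTF, following the logic of Raju,
    van der Linden, & Fleer (1995). params_* hold one (a, bs) tuple per
    item from each group's calibration, assumed linked to a common
    metric; thetas are trait estimates for the focal group."""
    d = np.array([[expected_score(t, a_f, bs_f) - expected_score(t, a_r, bs_r)
                   for t in thetas]
                  for (a_f, bs_f), (a_r, bs_r) in zip(params_focal, params_ref)])
    D = d.sum(axis=0)               # test-level score difference per rater
    ncdif = (d ** 2).mean(axis=1)   # squared, so DIF cannot cancel out
    cdif = (d * D).mean(axis=1)     # signed; item CDIF values sum to DTF
    dtf = float((D ** 2).mean())    # differential test functioning
    return ncdif, cdif, dtf
```

NCDIF squares each item's difference before averaging, so DIF cannot cancel across raters; CDIF keeps the sign, so item-level DIF can offset at the test level, which is why the CDIF values sum to DTF.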
Participants and Measure
• 415 raters from the US and New Zealand
  • 29 focal program participants were rated
  • raters were superiors, peers, and subordinates
  • analyses used sample sizes from 20 to 278
• 54 items from the SkillChange 360° assessment instrument were analyzed (see the sketch below)
  • "now" and "about one year ago" ratings at Times 1 & 2
  • 9-point response format (collapsed to 4 categories)
  • unidimensional
  • Cronbach's alpha = .97
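Neither the collapsing rule for the 9-point scale nor the reliability computation is spelled out in the slides, so the sketch below assumes an even binning (hypothetical cut points) and shows a standard Cronbach's alpha calculation for a raters-by-items score matrix.

```python
import numpy as np

def collapse_9_to_4(x):
    """Map 9-point responses (1-9) to 4 ordered categories. The paper's
    actual collapsing rule is not given; even binning is assumed here."""
    bins = [2, 4, 6]                  # hypothetical cuts: 1-2, 3-4, 5-6, 7-9
    return np.digitize(x, bins, right=True) + 1

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_raters x n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

rng = np.random.default_rng(0)
raw = rng.integers(1, 10, size=(415, 54))  # hypothetical 9-point ratings
# Random data, so alpha will be near zero; the real ratings gave .97.
print(cronbach_alpha(collapse_9_to_4(raw)))
```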
Analyses
• Beta change assessed as in past research (see the sketch below)
  • repeated measures ANOVA using observed scores (N ≈ 20)
  • examine for significant pre-then differences
• Beta change assessed using IRT / DFIT
  • compare pre-then, pre-post, and post-then
  • Ns = 278, 78, 59
• Assessed convergence of the two methods
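A minimal sketch of the traditional observed-score check, with simulated data: with only two repeated measurements (pre vs. then), the repeated measures ANOVA F-test is equivalent to a paired t-test (F = t²), so the t-test serves as a compact stand-in; all values below are simulated for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_raters = 20  # approximate per-analysis N in the observed-score approach

# Hypothetical observed scores: "pre" collected before the program,
# "then" collected afterward as a retrospective rating of the same period.
pre  = rng.normal(loc=6.0, scale=1.0, size=n_raters)
then = pre - rng.normal(loc=0.4, scale=0.8, size=n_raters)  # simulated shift

# With two repeated measures, the RM ANOVA F-test equals the squared
# paired t statistic, so the paired t-test gives the same decision.
t, p = stats.ttest_rel(pre, then)
print(f"t = {t:.2f}, p = {p:.3f}")  # significant => inferred beta change
```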
Conclusions
• IRT-based DIF methods show promise for detecting beta change
• Most items do not exhibit beta change, regardless of the detection method used
• Pre-then differences are not a reliable indicator of beta change (both false positives and false negatives occurred)
• Post-then differences are generally on the same metric, but what they reflect is uncertain
Limitations
• Small sample size (especially post-program)
• Rater groups collapsed (insufficient N to examine self-ratings alone)
• Response categories collapsed (results may not generalize to the uncollapsed 9-point scale)
• Experiment-wise Type I error rate > .05