DIF detection using OLR

  1. DIF detection using OLR Paul K. Crane, MD MPH Internal Medicine University of Washington

  2. Outline • Statistical background • DIFdetect package • What do we do when we find DIF? • DIF adjustments to PARSCALE code • How good are adjusted scores? • Discussion

  3. Statistical background • Recall the definition of DIF: demographic characteristic(s) interfere with the relationship expected between ability level and responses to an item • This is a conditional definition: we have to control for ability level, or else we cannot differentiate between DIF and differential test impact

  4. Logistic regression applied to DIF detection • Swaminathan and Rogers (1990) • Tested two models: • P(Y=1 | X, group) = f(β1·X + β2·group + β3·X·group) • P(Y=1 | X) = f(β1·X) • Compared the –2 log likelihoods of these two models to a chi-squared distribution with 2 df • Uniform and non-uniform DIF tested at the same time
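A minimal Stata sketch of the Swaminathan and Rogers comparison, assuming a dichotomous item variable item, an observed ability score x, and a binary demographic indicator group (all variable names hypothetical):

      logit item c.x i.group c.x#i.group
      estimates store full
      logit item c.x
      estimates store reduced
      lrtest full reduced    // 2-df chi-squared test: uniform and non-uniform DIF jointly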

  5. Camilli and Shepard (1994) • Recommended a two-step procedure: first test for non-uniform DIF, then for uniform DIF • P(Y=1 | X, group) = f(β1·X + β2·group + β3·X·group) • P(Y=1 | X, group) = f(β1·X + β2·group) • P(Y=1 | X) = f(β1·X) • –2 log likelihoods of each pair of nested models compared to determine non-uniform DIF and uniform DIF in two separate steps
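The same two-step procedure as a hedged Stata sketch, with the same hypothetical variable names:

      logit item c.x
      estimates store m1
      logit item c.x i.group
      estimates store m2
      logit item c.x i.group c.x#i.group
      estimates store m3
      lrtest m3 m2    // step 1: non-uniform DIF (interaction term, 1 df)
      lrtest m2 m1    // step 2: uniform DIF (group main effect, 1 df)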

  6. Millsap and Everson (1994) • Dismissive of “observed score” techniques such as logistic regression • The ability score X contains items that themselves have DIF, so adjusting for X is theoretically problematic • Advocated latent approaches such as IRT for DIF detection • Very influential publication

  7. Zumbo (1999) • Extended the Swaminathan and Rogers framework to the ordinal logistic regression case to handle polytomous items • Did not address the latent trait; also used a single step rather than two
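Zumbo's single-step ordinal extension, sketched in Stata for a polytomous item (hypothetical variable names as before):

      ologit item c.x i.group c.x#i.group
      estimates store full
      ologit item c.x
      estimates store reduced
      lrtest full reduced    // uniform and non-uniform DIF tested together (2 df)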

  8. Crane, van Belle, Larson (2004) • Pointed out that the logistic regression model is a re-parameterization of the IRT model as long as IRT-derived θ estimates are used as ability scores • Addressed multiple hypothesis testing for non-uniform DIF; found no difference between four different adjustment techniques
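To see the re-parameterization in the dichotomous (2PL) case, written in the slides' own notation (a standard algebraic identity, not part of the original slides): the logistic regression model P(Y=1 | θ) = f(β0 + β1·θ) and the 2PL IRT model in the logistic metric, P(Y=1 | θ) = f(a·(θ − b)), trace the same curve when a = β1 and b = −β0/β1, so a regression on IRT-derived θ estimates carries the same information as the IRT item parameters.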

  9. Crane et al. (2004) – 2 • Biggest change was in the specific criteria for uniform DIF • Recognized that non-uniform and uniform DIF are analogous to effect modification and confounding, respectively • Employed epidemiological thinking about how to detect confounding relationships from the data

  10. Crane et al. (2004) – 3 • Same models used (though now θ, not X) • P(Y=1 | θ, group) = f(β1·θ + β2·group) • P(Y=1 | θ) = f(β1′·θ) • Determine the impact of including the group term on the magnitude of the relationship between θ and the item responses • Compute |(β1 − β1′)/β1|; if this is large, uniform DIF (confounding) is present • This criterion follows Maldonado and Greenland’s simulation study on confounder selection strategies
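The proportional-change criterion as a minimal Stata sketch, assuming an IRT-derived ability estimate theta (variable names hypothetical):

      ologit item c.theta
      scalar b1_prime = _b[theta]          // β1′ from the model without the group term
      ologit item c.theta i.group
      scalar b1 = _b[theta]                // β1 from the model with the group term
      display abs((b1 - b1_prime) / b1)    // large proportional change => uniform DIF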

  11. Work still pending • “Optimal” criteria for uniform and non-uniform DIF are unknown • Should α be adjusted for multiple hypotheses? If so, for how many? • What effect size for non-uniform DIF? In huge data sets, a significant interaction term is likely • What proportional change in β1 constitutes meaningful uniform DIF?

  12. DIFdetect package • Can be downloaded from the web • www.alz.washington.edu/DIFDETECT/welcome.html • STATA-based, user-friendly package

  13. Outline revisited • Statistical background • DIFdetect package • What do we do when we find DIF? • DIF adjustments to PARSCALE code • How good are adjusted scores? • Discussion

  14. What to do when we find DIF? • In educational settings, items with DIF are often discarded • An unattractive option for us • Tests are too short as it is; we would lose variation • We would lose precision • DIF doesn’t mean that the item doesn’t measure the underlying construct at all, just that it does so differently in different groups

  15. What do we do – 2 • Need a technique that incorporates items found to have DIF differently from DIF-free items • Precedent for this approach in Reise, Widaman, and Pugh (1993) • Constrain parameters for DIF-free items to be identical across groups • Estimate parameters for items found to have DIF separately in the appropriate groups

  16. Compensatory DIF • Compensatory DIF occurs when DIF in some items leads to erroneous findings in other items • Both false-positive and false-negative DIF findings • Iterative process for each covariate until stable solution is reached (i.e., same items identified with DIF on separate runs of DIFdetect)

  17. Adjustments to PARSCALE • Create a new dataset that treats items according to their DIF status

  18. Modified data set
      0001 12XX2
      0002 12XX4
      0003 01XX3
      …
      0132 1X2X2
      0133 0X1X3
      0134 1X2X4
      …
      0932 0XX22
      0933 1XX23
      0934 0XX14
      …

  19. PARSCALE code • Need new lines (new blocks) for all of the new items we create • We are automating this step as an extension to DIFdetect • Current best advice is to use a huge table in Word • Creation of new items is easy; we have STATA code for creating virtual items (a sketch follows)
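A minimal sketch of virtual-item creation, not the actual DIFdetect extension: an item flagged with (say) education DIF is split into group-specific copies that are missing outside their group (all names hypothetical):

      gen item3_lowed  = item3 if loweduc == 1   // responses kept only for the low-education group
      gen item3_highed = item3 if loweduc == 0   // responses kept only for the high-education group
      drop item3                                 // original item replaced by its two copies

PARSCALE can then estimate separate parameters for each copy, while DIF-free items remain constrained across groups.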

  20. Preparation of data for PARSCALE

  21. Reminder of PARSCALE tips • When outfiling from STATA, use wide format • Use commas • Change missing values to .x • Open the file in Word and replace “.x” with X • Remember to change 2-digit numbers to their appropriate letters
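The export steps above as a hedged Stata sketch (file and variable names hypothetical):

      foreach v of varlist item1-item40 {
          replace `v' = .x if missing(`v')       // mark missing responses with extended missing value .x
      }
      outfile id item1-item40 using casi.dat, comma wide replace

Then open casi.dat in Word, replace “.x” with X, and recode any 2-digit responses to their letter equivalents, as the tips describe.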

  22. It gets complicated… • This is the CASI after the first run of education DIF, after looking at gender and age:

  23. Table helps with PARSCALE code

  24. Adjusted scores related to dementia and CIND • In the ACT study, controlling for CASI score (continuous): odds ratio of 2.9 (1.8-4.9) for low DIF-adjusted IRT score (among those with low CASI scores) • Adjusted for gender, education, and age • Strict 2-stage sample design → verification bias • In the CSHA, controlling for 3MS score (continuous): weighted odds ratio of 1.6 (1.1-2.3) for dementia for low DIF-adjusted IRT score, and 1.4 (1.2-1.8) for CIND • Adjusted for education and language • Sampling and weighting to deal with verification bias

  25. Incorporation of adjusted scores into analyses • Here we are in novel territory • Is there a reason not to adjust scores for DIF? • Questions and comments

  26. Comparison of OLR with other techniques • OLR is more flexible (can look at continuous variables, e.g., education, without dichotomizing or grouping) • DIFdetect is very fast • When IRT-derived θ scores are used, it is a re-parameterization of IRT analyses • DIFdetect OLR incorporates the epidemiological concepts of confounding and effect modification • A special issue of Medical Care edited by Teresi is forthcoming
