
Bridging the gap from LogR to IRT

Bridging the gap from LogR to IRT. Indebted to: Wu, A. D., & Zumbo, B.D. (2007). Thinking About Item Response Theory from a Logistic Regression Perspective: A Focus on Polytomous Models.

palma


Presentation Transcript


  1. Bridging the gap from LogR to IRT. Indebted to: Wu, A. D., & Zumbo, B. D. (2007). Thinking About Item Response Theory from a Logistic Regression Perspective: A Focus on Polytomous Models. In Shlomo S. Sawilowsky (Ed.), Real Data Analysis (pp. 241-269). Information Age Publishing, Inc., Greenwich, CT.

  2. Bridging the gap from LogR to IRT • The explanatory variable • In IRT, the exposure is a continuous latent variable • Hence IRT = generalized linear latent model • The outcome variable(s) • Logistic regression typically models ONE outcome, whereas IRT models a number of categorical outcomes simultaneously

  3. Aim of IRT • To relate a subject's responses to a number of test items to an underlying ability (AKA trait) by way of a mathematical function • Because the relationship is non-linear, a logistic curve is often used; it is referred to as the Item Characteristic Curve (ICC) or Item Response Function (IRF)
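As a rough illustration of the logistic curve described on this slide, the sketch below evaluates an ICC at a few trait levels. The function name `icc` and the parameter values are assumptions made for illustration; they do not come from the slides.

```python
import math

def icc(theta, discrimination=1.0, difficulty=0.0):
    """Probability of a correct response at trait level theta (logistic ICC)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# Hypothetical item: the probability rises monotonically with the latent trait
# and equals 0.5 where the trait level matches the item's difficulty.
trait_levels = (-3, -1, 0.5, 2, 4)
probs = [icc(t, discrimination=1.5, difficulty=0.5) for t in trait_levels]
for t, p in zip(trait_levels, probs):
    print(f"theta = {t:>4}: P(correct) = {p:.3f}")
```

A larger discrimination makes the curve steeper around the difficulty; a larger difficulty shifts the whole curve to the right.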

  4. [Figure: ICC / IRF. Horizontal axis: increasing level of latent trait. Vertical axis: increasing probability of correct response.]

  5. Options for form of ICC Examples • Step function (Guttman) • 2 parameter normal ogive (Lord) • 2 parameter logistic (Birnbaum) • 1 parameter logistic (Rasch) • Nonparametric, monotone increasing (Mokken)
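To make the first three options concrete, here is a minimal sketch comparing a step function, a normal ogive and a logistic curve on a standardized scale. The scaling constant 1.7 is the one that appears later in the Mplus output; the function names and the evaluation grid are assumptions made for illustration.

```python
import math

def step(x):
    """Guttman step function: the response is deterministic at the threshold."""
    return 1.0 if x >= 0 else 0.0

def normal_ogive(x):
    """Standard normal CDF, i.e. the normal-ogive ICC in standardized form."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic(x, scale=1.0):
    """Logistic ICC; scale multiplies the distance from the threshold."""
    return 1.0 / (1.0 + math.exp(-scale * x))

# Largest vertical gap between the ogive and the rescaled logistic on a grid.
grid = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(normal_ogive(x) - logistic(x, scale=1.7)) for x in grid)
print(f"largest ogive-vs-logistic gap with scaling 1.7: {max_gap:.4f}")
```

The gap stays below 0.01 everywhere on the grid, which is why the logistic form is commonly used as a convenient stand-in for the normal ogive.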

  6. Back to the logistic form • Two-parameter binary logistic IRT model • θ: ability level • αi: the slope (AKA discrimination) for item i • βi: the threshold (AKA difficulty) for item i • (θ – βi): discrepancy between the item's difficulty and the respondent's ability
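The formula shown on this slide did not survive in the transcript. In the slide's own notation, the standard two-parameter logistic model can be written as follows; the response variable X_i (1 = positive response to item i) is a label introduced here.

```latex
P(X_i = 1 \mid \theta)
  = \frac{\exp\{\alpha_i(\theta - \beta_i)\}}{1 + \exp\{\alpha_i(\theta - \beta_i)\}}
```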

  7. For a single item: Let X = (θ – βi) & add intercept c

  8. Recall: αi is the slope; βi is the value of the covariate (ability) at the point of inflection, where the probability of a positive response is 0.5

  9. So • For a uni-dimensional IRT model (a single trait θ), the 2PL IRT model for each item is a simple LogR model • In the binary IRT setting we model a number of items simultaneously • The parameters may or may not vary across items
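One way to see the equivalence is to expand the 2PL logit into intercept-and-slope form. The symbols b_{0i} and b_{1i} are labels introduced here for the logistic regression intercept and slope; they are not taken from the slides.

```latex
\operatorname{logit} P(X_i = 1 \mid \theta)
  = \alpha_i(\theta - \beta_i)
  = \underbrace{(-\alpha_i \beta_i)}_{b_{0i}} + \underbrace{\alpha_i}_{b_{1i}}\,\theta ,
\qquad
\beta_i = -\frac{b_{0i}}{b_{1i}} .
```

The only difference from an ordinary logistic regression is that the covariate θ is latent rather than observed.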

  10. Conditional Independence [Figure: two panels, "Before" and "After", each showing Items 1 to 6, with the latent Trait included in the diagram]
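Conditional independence means that, once the trait level is fixed, the probability of a whole response pattern is just the product of the item probabilities. The sketch below shows this for four items; the difficulties, the common slope and the trait level are hypothetical values chosen for illustration.

```python
import math
from itertools import product

def icc(theta, slope, difficulty):
    """Logistic probability of a positive response to one item."""
    return 1.0 / (1.0 + math.exp(-slope * (theta - difficulty)))

def pattern_prob_given_theta(pattern, theta, slope, difficulties):
    """Multiply the item probabilities, which is valid only at a fixed theta."""
    prob = 1.0
    for response, difficulty in zip(pattern, difficulties):
        p = icc(theta, slope, difficulty)
        prob *= p if response == 1 else (1.0 - p)
    return prob

# Hypothetical difficulties for four items sharing one slope.
difficulties = [0.5, -0.2, -0.4, -0.3]
theta = 0.3

# The 16 possible response patterns must exhaust the probability at this theta.
total = sum(
    pattern_prob_given_theta(pattern, theta, 1.0, difficulties)
    for pattern in product((0, 1), repeat=4)
)
print(f"sum over all 16 patterns: {total:.10f}")
```

Marginally (averaging over the trait distribution) the items remain correlated; the product rule only applies conditional on the trait.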

  11. The Rasch model A worked example across multiple packages

  12. Abortion data • Source: Bartholomew, D. J., Steele, F., Moustaki, I., & Galbraith, J., Analysis of Multivariate Social Science Data, Second Edition • The dataset actually comes from the first edition, so hope it's still in the second!

  13. Idea • Same Rasch model 4 ways • R (ltm) • Mplus • Raschtest • GLLAMM (via long format data-prepping)

  14. Table 7.1 – attitude towards abortion. Abortion should be permitted if:
      1] The woman decides on her own that she does not wish to have the child
      2] The couple agree that they do not wish to have the child
      3] The woman is not married and does not wish to marry the man
      4] The couple cannot afford any more children

  15. Basic output
      SUMMARY OF CATEGORICAL DATA PROPORTIONS
        WOMAN
          Category 1    0.562
          Category 2    0.438
        COUPLE
          Category 1    0.406
          Category 2    0.594
        NOT_MARR
          Category 1    0.364
          Category 2    0.636
        AFFORD
          Category 1    0.383
          Category 2    0.617

  16. Abortion: [R]

  17. > library(ltm)
      > rasch1 <- rasch(data = abortion[, c(2, 3, 4, 5)], IRT.param = FALSE)
      > summary(rasch1)

      Model Summary:
         log.Lik      AIC      BIC
       -657.7894 1325.579 1345.078

      Coefficients:
              value std.err  z.vals
      woman -0.7843  0.2762 -2.8395
      coupl  1.1288  0.2724  4.1437
      nt.mr  1.7950  0.2969  6.0453
      affrd  1.5129  0.2870  5.2716
      z      4.9064  0.4264 11.5057

      Integration:
      method: Gauss-Hermite
      quadrature points: 21

      Optimization:
      Convergence: 0
      max(|grad|): 0.00097
      quasi-Newton: BFGS

  18. plot(rasch1, type = c("ICC"))

  19. par(mfrow = c(2, 2))
      plot(rasch1, items = c(1), type = c("IIC"), ylim = c(0, 7))
      plot(rasch1, items = c(2), type = c("IIC"), ylim = c(0, 7))
      plot(rasch1, items = c(3), type = c("IIC"), ylim = c(0, 7))
      plot(rasch1, items = c(4), type = c("IIC"), ylim = c(0, 7))

  20. margins(rasch2, "two")

      Response: (0,0)
        Item i Item j Obs    Exp (O-E)^2/E
      1      2      4 111 119.38      0.59
      2      1      4 125 133.09      0.49
      3      1      2 143 140.62      0.04

      Response: (1,0)
        Item i Item j Obs    Exp (O-E)^2/E
      1      1      4  13   7.28      4.50 ***
      2      1      2   5   9.57      2.18
      3      2      4  27  20.99      1.72

      Response: (0,1)
        Item i Item j Obs    Exp (O-E)^2/E
      1      2      4  37  30.81      1.24
      2      1      4  80  72.37      0.80
      3      3      4  19  21.27      0.24

      Response: (1,1)
        Item i Item j Obs    Exp (O-E)^2/E
      1      1      4 147 152.26      0.18
      2      1      2 155 149.97      0.17
      3      3      4 208 203.36      0.11

  21. margins(rasch2, "three")

      Response: (0,0,0)
        Item i Item j Item k Obs    Exp (O-E)^2/E
      1      1      2      4 111 117.01      0.31
      2      2      3      4 102 104.53      0.06
      3      1      3      4 110 110.44      0.00

      Response: (1,0,0)
        Item i Item j Item k Obs    Exp (O-E)^2/E
      1      1      2      4   0   2.37      2.37
      2      1      2      3   0   2.04      2.04
      3      2      3      4  10   7.63      0.74

      Response: (0,1,0)
        Item i Item j Item k Obs    Exp (O-E)^2/E
      1      1      3      4  15  22.65      2.58
      2      2      3      4   9  14.85      2.30
      3      1      2      4  14  16.08      0.27

      Response: (1,1,0)
        Item i Item j Item k Obs    Exp (O-E)^2/E
      1      1      2      4  13   4.90     13.38 ***
      2      1      3      4  11   5.56      5.33 ***
      3      2      3      4  17  13.36      0.99

      Response: (0,0,1)
        Item i Item j Item k Obs    Exp (O-E)^2/E
      1      1      2      4  32  23.61      2.98
      2      1      2      3  29  26.93      0.16
      3      2      3      4  12  11.20      0.06

      Response: (1,0,1)
        Item i Item j Item k Obs    Exp (O-E)^2/E
      1      1      3      4   2   4.19      1.15
      2      2      3      4   7  10.07      0.94
      3      1      2      3   5   7.53      0.85

      Response: (0,1,1)
        Item i Item j Item k Obs    Exp (O-E)^2/E
      1      2      3      4  25  19.61      1.48
      2      1      3      4  63  55.29      1.08
      3      1      2      3  49  51.00      0.08

      Response: (1,1,1)
        Item i Item j Item k Obs    Exp (O-E)^2/E
      1      1      2      3 151 146.10      0.16
      2      1      2      4 142 145.07      0.06
      3      1      3      4 145 148.07      0.06

      '***' denotes a chi-squared residual greater than 3.5
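The last column of the margins output is a chi-squared residual, (O-E)^2/E, and a cell is starred when it exceeds 3.5. A minimal sketch of that calculation follows, using two cells copied from the two-way output for response (1,0). The function names are assumptions, and because the expected counts on the slides are rounded, recomputed residuals can differ from the printed ones in the last digit.

```python
def chisq_residual(observed, expected):
    """Chi-squared residual for one cell of a margin table."""
    return (observed - expected) ** 2 / expected

def is_flagged(observed, expected, rule=3.5):
    """Apply the rule of thumb stated in the output footer."""
    return chisq_residual(observed, expected) > rule

# (observed, expected) pairs taken from the two-way output, response (1,0).
cells = {
    "items 1 and 4": (13, 7.28),
    "items 1 and 2": (5, 9.57),
}
for name, (obs, expd) in cells.items():
    marker = "***" if is_flagged(obs, expd) else ""
    print(f"{name}: residual = {chisq_residual(obs, expd):.2f} {marker}")
```

Only the first cell crosses the threshold, which matches the starred row in the output.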

  22. Abortion: [Mplus]

  23. Read the data into Mplus
      data:
          file is "abortion_attitude.txt";
      variable:
          names are woman couple not_marr afford num;
          categorical are woman couple not_marr afford;
          usevariables are woman couple not_marr afford;
          freqweight = num;
      analysis:
          type = basic;

  24. Basic output – sample stats
      FIRST ORDER SAMPLE PROPORTIONS :
                    WOMAN     COUPLE    NOT_MARR  AFFORD
                    ________  ________  ________  ________
       1            0.438     0.594     0.636     0.617

      SECOND ORDER SAMPLE PROPORTIONS
                    WOMAN     COUPLE    NOT_MARR  AFFORD
                    ________  ________  ________  ________
       WOMAN
       COUPLE       0.420
       NOT_MARR     0.420     0.538
       AFFORD       0.396     0.512     0.559

      SAMPLE THRESHOLDS
                    WOMAN$1   COUPLE$1  NOT_MARR  AFFORD$1
                    ________  ________  ________  ________
       1            0.156     -0.237    -0.347    -0.299

  25. Basic output – sample stats
      SAMPLE TETRACHORIC CORRELATIONS
                    WOMAN     COUPLE    NOT_MARR  AFFORD
                    ________  ________  ________  ________
       WOMAN
       COUPLE       0.902
       NOT_MARR     0.866     0.882
       AFFORD       0.768     0.821     0.903

      STANDARD DEVIATIONS FOR SAMPLE TETRACHORIC CORRELATIONS
                    WOMAN     COUPLE    NOT_MARR  AFFORD
                    ________  ________  ________  ________
       WOMAN
       COUPLE       0.125
       NOT_MARR     0.158     0.137
       AFFORD       0.217     0.181     0.120

  26. Rasch model in Mplus
      data:
          file is "...abortion_attitude.txt";
      variable:
          names are woman couple not_marr afford num;
          usevariables are woman couple not_marr afford;
          categorical are woman couple not_marr afford;
          freqweight = num;
      analysis:
          ESTIMATOR = MLR;
      model:
          ! the shared label (1) constrains all four loadings to be equal
          F by woman* (1)
               couple (1)
               not_marr (1)
               afford (1);
          ! fix the factor variance at 1 to identify the model
          F@1;
      plot:
          type = plot3;

  27. Mplus results
      TESTS OF MODEL FIT
      Loglikelihood
          H0 Value                            -709.937
          H0 Scaling Correction Factor           1.009
            for MLR
      Information Criteria
          Number of Free Parameters                  5
          Akaike (AIC)                        1429.874
          Bayesian (BIC)                      1449.562
          Sample-Size Adjusted BIC            1433.698
            (n* = (n + 2) / 24)
      Chi-Square Test of Model Fit for the Binary and Ordered Categorical (Ordinal) Outcomes
          Pearson Chi-Square
              Value                             22.788
              Degrees of Freedom                    10
              P-Value                           0.0116
          Likelihood Ratio Chi-Square
              Value                             22.595
              Degrees of Freedom                    10
              P-Value                           0.0123
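The information criteria in this output can be reproduced from the log-likelihood and the number of free parameters. The sketch below assumes a sample size of n = 379, which is the sum of the frequency weights (the num column) in the data listing; that value is not printed in the Mplus output itself, and the function names are illustrative.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n):
    """Bayesian information criterion for sample size n."""
    return -2.0 * loglik + n_params * math.log(n)

loglik = -709.937   # H0 value reported by Mplus
n_params = 5        # number of free parameters reported by Mplus
n = 379             # ASSUMPTION: sum of the frequency weights in the data

adjusted_n = (n + 2) / 24   # n* as defined in the Mplus output

print(f"AIC          = {aic(loglik, n_params):.3f}")
print(f"BIC          = {bic(loglik, n_params, n):.3f}")
print(f"adjusted BIC = {bic(loglik, n_params, adjusted_n):.3f}")
```

Under that assumed sample size, all three values agree with the printed ones to the displayed precision.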

  28. Mplus results
                                               Two-Tailed
                     Estimate    S.E.  Est./S.E.  P-Value
      F BY
          WOMAN         4.336   0.390    11.124     0.000
          COUPLE        4.336   0.390    11.124     0.000
          NOT_MARR      4.336   0.390    11.124     0.000
          AFFORD        4.336   0.390    11.124     0.000
      Thresholds
          WOMAN$1       0.776   0.311     2.496     0.013
          COUPLE$1     -1.047   0.306    -3.417     0.001
          NOT_MARR$1   -1.573   0.315    -4.994     0.000
          AFFORD$1     -1.339   0.322    -4.161     0.000
      Variance of
          F             1.000   0.000   999.000   999.000

      IRT PARAMETERIZATION IN TWO-PARAMETER LOGISTIC METRIC
      WHERE THE LOGIT IS 1.7*DISCRIMINATION*(THETA - DIFFICULTY)
      Item Discriminations
      F BY
          WOMAN         2.551   0.229    11.124     0.000
          COUPLE        2.551   0.229    11.124     0.000
          NOT_MARR      2.551   0.229    11.124     0.000
          AFFORD        2.551   0.229    11.124     0.000
      Item Difficulties
          WOMAN$1       0.179   0.071     2.514     0.012
          COUPLE$1     -0.241   0.072    -3.353     0.001
          NOT_MARR$1   -0.363   0.074    -4.913     0.000
          AFFORD$1     -0.309   0.074    -4.199     0.000
      Variance of
          F             1.000   0.000   999.000   999.000
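The two blocks of this output are the same model in two parameterizations. With the factor variance fixed at 1, the logit is loading × θ − threshold, so matching it to 1.7 × discrimination × (θ − difficulty) gives discrimination = loading / 1.7 and difficulty = threshold / loading. The sketch below checks this against the printed estimates; the variable names are illustrative.

```python
SCALING = 1.7      # constant stated in the Mplus output
loading = 4.336    # common factor loading (F BY ...)
thresholds = {
    "WOMAN": 0.776,
    "COUPLE": -1.047,
    "NOT_MARR": -1.573,
    "AFFORD": -1.339,
}

# loading * theta - threshold == SCALING * discrimination * (theta - difficulty)
discrimination = loading / SCALING
difficulties = {item: tau / loading for item, tau in thresholds.items()}

print(f"discrimination: {discrimination:.3f}")
for item, difficulty in difficulties.items():
    print(f"{item:<9} difficulty: {difficulty:.3f}")
```

The converted values reproduce the "Item Discriminations" and "Item Difficulties" rows to three decimals.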

  29. ICC’s

  30. Abortion: [Stata] –Raschtest–

  31. raschtest woman couple not_marr afford, meandiffic

      Estimation method: Conditional maximum likelihood (CML)
      Number of items: 4
      Number of groups: 5 (3 of them are used to compute the statistics of test)
      Number of individuals: 365 (0 individuals removed for missing values)
      Number of individuals with null or perfect score: 242
      Conditional log-likelihood: -131.2562
      Log-likelihood: -320.5403

                 Difficulty                                  Standardized
      Items      parameters  Std. Err.     R1c  df  p-value  Outfit   Infit       U
      -----------------------------------------------------------------------------
      woman         1.64747    0.19064   1.940   2   0.3790  -1.232  -0.422  -1.411
      couple       -0.19486    0.16979   2.342   2   0.3100  -0.574  -0.313  -0.838
      not_marr     -0.87046    0.18302   1.580   2   0.4538  -1.272  -1.467  -0.854
      afford       -0.58216    0.17588   3.937   2   0.1397   2.336   2.113   3.015
      -----------------------------------------------------------------------------
      R1c test                     R1c= 15.343   6   0.0177
      Andersen LR test               Z= 14.594   6   0.0237
      -----------------------------------------------------------------------------
      The mean of the difficulty parameters is fixed to 0
      You have groups of scores with less than 30 individuals. The tests can be invalid.

                       Ability                     Expected
      Group  Score  parameters  Std. Err.  Freq.      Score        ll
      ---------------------------------------------------------------
          0      0      -2.560      2.860    102       0.38
          1      1      -1.114      0.814     29       1.15  -31.4221
          2      2      -0.109      0.664     33       1.97  -39.3159
          3      3       1.054      0.984     61       2.84  -53.2214
          4      4       2.833      3.626    140       3.66
      ---------------------------------------------------------------

  32. raschtest woman couple not_marr afford, method(mml)

      Estimation method: Marginal maximum likelihood (MML)
      Number of items: 4
      Number of groups: 5 (5 of them are used to compute the statistics of test)
      Number of individuals: 365 (0 individuals removed for missing values)
      Number of individuals with null or perfect score: 242
      Marginal log-likelihood: -665.8056
      Log-likelihood: -281.5298

                 Difficulty                                  Standardized
      Items      parameters  Std. Err.     R1m  df  p-value  Outfit   Infit
      ----------------------------------------------------------------------
      woman         1.25298    0.26213   4.606   2   0.0999  -2.624   0.164
      couple       -0.66034    0.30265  18.408   2   0.0001       .  -3.567
      not_marr     -1.27117    0.29512  11.668   2   0.0029       .  -0.046
      afford       -1.02314    0.29784  26.037   2   0.0000       .  -0.611
      ----------------------------------------------------------------------
      R1m test                     R1m= 31.056   8   0.0001
      ----------------------------------------------------------------------
      Sigma         4.12109    0.28776
      ----------------------------------------------------------------------
      You have groups of scores with less than 30 individuals. The tests can be invalid.

                       Ability                     Expected
      Group  Score  parameters  Std. Err.  Freq.      Score
      ----------------------------------------------------
          0      0    -4.33624    1.60486    102       0.11
          1      1    -2.24208    0.68050     29       0.70
          2      2    -1.29596    1.34362     33       1.34
          3      3     2.05542    0.97822     61       3.55
          4      4     3.91275    1.56707    140       3.91
      ----------------------------------------------------

  33. Abortion: [Stata] –GLLAMM–

  34. Rasch model – data prepping Difficulty

  35. The data
            woman  couple  not_marr  afford  num
        1.      1       1         1       1  141
        2.      0       0         0       0  103
        3.      0       1         1       1   44
        4.      0       0         1       1   21
        5.      0       0         0       1   13
        6.      1       1         1       0   12
        7.      0       0         1       0   10
        8.      0       1         0       0    9
        9.      0       1         1       0    7
       10.      1       0         1       1    6
       11.      0       1         0       1    6
       12.      1       1         0       1    3
       13.      1       1         0       0    3
       14.      1       0         0       0    1
       15.      1       0         1       0    0
       16.      1       0         0       1    0

  36. The data, with the items renamed (woman → i1, couple → i2, not_marr → i3, afford → i4)
            i1  i2  i3  i4  num
        1.   1   1   1   1  141
        2.   0   0   0   0  103
        3.   0   1   1   1   44
        4.   0   0   1   1   21
        5.   0   0   0   1   13
        6.   1   1   1   0   12
        7.   0   0   1   0   10
        8.   0   1   0   0    9
        9.   0   1   1   0    7
       10.   1   0   1   1    6
       11.   0   1   0   1    6
       12.   1   1   0   1    3
       13.   1   1   0   0    3
       14.   1   0   0   0    1
       15.   1   0   1   0    0
       16.   1   0   0   1    0

  37. Reshape
      Wide format (first four patterns):
        i1  i2  i3  i4  num  pattern
         1   1   1   1  141        1
         0   0   0   0  103        2
         0   1   1   1   44        3
         0   0   1   1   21        4

      gen pattern = _n
      reshape long i, i(pattern) j(item)
      rename i score

      Long format (first three patterns):
        pattern  item  score  num
              1     1      1  141
              1     2      1  141
              1     3      1  141
              1     4      1  141
              2     1      0  103
              2     2      0  103
              2     3      0  103
              2     4      0  103
              3     1      0   44
              3     2      1   44
              3     3      1   44
              3     4      1   44

  38. Create dummies for the 4 items
      tab item, gen(d)
      forvalues i=1/4 {
          gen negd`i' = -d`i'
      }

        pattern  item  score  num  d1  d2  d3  d4  negd1  negd2  negd3  negd4
              1     1      1  141   1   0   0   0     -1      0      0      0
              1     2      1  141   0   1   0   0      0     -1      0      0
              1     3      1  141   0   0   1   0      0      0     -1      0
              1     4      1  141   0   0   0   1      0      0      0     -1
              2     1      0  103   1   0   0   0     -1      0      0      0
              2     2      0  103   0   1   0   0      0     -1      0      0
              2     3      0  103   0   0   1   0      0      0     -1      0
              2     4      0  103   0   0   0   1      0      0      0     -1
        …etc.
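For readers who do not use Stata, the reshape and dummy-variable steps can be sketched in plain Python as below. This is only an illustration of the data layout that GLLAMM needs; the function name `to_long` is hypothetical, and only the first three response patterns from the data listing are included.

```python
# First three response patterns from the wide data (items renamed i1..i4).
wide = [
    {"i1": 1, "i2": 1, "i3": 1, "i4": 1, "num": 141},
    {"i1": 0, "i2": 0, "i3": 0, "i4": 0, "num": 103},
    {"i1": 0, "i2": 1, "i3": 1, "i4": 1, "num": 44},
]

def to_long(rows, n_items=4):
    """One output row per pattern-by-item, with item dummies and their negatives."""
    long_rows = []
    for pattern, row in enumerate(rows, start=1):      # gen pattern = _n
        for item in range(1, n_items + 1):             # reshape long
            record = {
                "pattern": pattern,
                "item": item,
                "score": row[f"i{item}"],
                "num": row["num"],
            }
            for j in range(1, n_items + 1):            # tab item, gen(d)
                dummy = 1 if j == item else 0
                record[f"d{j}"] = dummy
                record[f"negd{j}"] = -dummy            # gen negd = -d
            long_rows.append(record)
    return long_rows

long_rows = to_long(wide)
for record in long_rows[:4]:
    print(record)
```

The negated dummies let the fitted coefficients be read directly as difficulties, since the Rasch logit is θ minus the item difficulty.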

  39. Relate this back to original equation
        pattern  item  score  num  d1  d2  d3  d4  negd1  negd2  negd3  negd4
              1     1      1  141   1   0   0   0     -1      0      0      0
              1     2      1  141   0   1   0   0      0     -1      0      0
              1     3      1  141   0   0   1   0      0      0     -1      0
              1     4      1  141   0   0   0   1      0      0      0     -1

  40. Rasch model using GLLAMM
      * the weights apply at level 2 (i.e. the person level), hence the name wt2
      rename num wt2
      constraint def 1 [patt1]_cons = 1
      gllamm score negd1-negd4, i(pattern) ///
          weight(wt) ///
          link(logit) ///
          family(binomial) ///
          frload(1) constr(1) ///
          nip(15) nocons adapt trace

  41. Rasch results
      log likelihood = -852.8413222195184
      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
             negd1 |    .333879   .1287913     2.59   0.010     .0814527    .5863054
             negd2 |  -.4827996   .1301057    -3.71   0.000    -.7378022   -.2277971
             negd3 |  -.7138122   .1321945    -5.40   0.000    -.9729087   -.4547158
             negd4 |  -.6117231    .131175    -4.66   0.000    -.8688214   -.3546249
      ------------------------------------------------------------------------------
      Variances and covariances of random effects
      ------------------------------------------------------------------------------
      ***level 2 (pattern)
          var(1): 1 (0)
      ------------------------------------------------------------------------------
      We constrained this to unit variance

  42. Plot ICC’s
      ----------------------------
                        |    Coef.
      ------------------+---------
      negd1 (woman)     |     .334
      negd2 (couple)    |    -.483
      negd3 (not_marr)  |    -.714
      negd4 (afford)    |    -.612
      ----------------------------
      • Curves are parallel
      • First item is most “difficult”

      twoway (function Woman = invlogit(x-[score]negd1), range(-6 6)) ///
             (function Couple = invlogit(x-[score]negd2), range(-6 6) lpatt(".")) ///
             (function Not_married = invlogit(x-[score]negd3), range(-6 6) lpatt("-")) ///
             (function Afford = invlogit(x-[score]negd4), range(-6 6) lpatt("_"))

  43. GLLAMM versus raschtest • Raschtest • Avoids the need to derive dummy variables • Needs complete dataset, not frequency-weights • Reformats the dataset in the background so no need to do it yourself • Can employ CML (an estimation specific to Rasch models) which requires no integration
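CML needs no integration because, in a Rasch model, conditioning on the raw score removes the ability parameter: the probability of a response pattern given its score depends only on the item difficulties, through elementary symmetric functions. The sketch below illustrates that conditional probability using the CML difficulty estimates from the raschtest output. The function names are assumptions, and this shows the principle only, not how raschtest is implemented.

```python
import math
from itertools import product

def elementary_symmetric(eps):
    """gamma[r] = sum, over all subsets of size r, of the product of eps values."""
    gamma = [1.0] + [0.0] * len(eps)
    for e in eps:
        # Update from the top down so each item is used at most once per subset.
        for r in range(len(eps), 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma

def conditional_pattern_prob(pattern, difficulties):
    """P(pattern | raw score) under the Rasch model; ability has cancelled out."""
    eps = [math.exp(-b) for b in difficulties]
    gamma = elementary_symmetric(eps)
    numerator = math.prod(e for x, e in zip(pattern, eps) if x == 1)
    return numerator / gamma[sum(pattern)]

# CML difficulty estimates from the raschtest output (woman, couple, not_marr, afford).
difficulties = [1.64747, -0.19486, -0.87046, -0.58216]

score_two = [p for p in product((0, 1), repeat=4) if sum(p) == 2]
total = sum(conditional_pattern_prob(p, difficulties) for p in score_two)
for p in score_two:
    print(p, f"{conditional_pattern_prob(p, difficulties):.3f}")
print(f"sum over patterns with score 2: {total:.6f}")
```

Patterns with a score of 0 or 4 have conditional probability 1 and carry no information about the difficulties, which fits the separate count of respondents with a null or perfect score in the raschtest output.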

  44. Polytomous IRT

  45. Extension to polytomous IRT • We now have a hierarchy of parameters to model • At the test level • A number of items are modelled simultaneously, with the potential for parameters to vary across items • At the item level • Contrasts are used to model the response categories within each item. Parameters may or may not vary across response categories.

  46. So when faced with a set of polytomous items We must decide • The payoff from not collapsing into binary items • The form of contrasts needed to model over response categories within items • Any constraints required across these response categories • Any parameter constraints across items within a single test

  47. 4 commonly used polytomous IRT models • Partial Credit model (PCM) • Masters (1982) • Rating Scale model (RSM) • Andrich (1978a/b) • Graded Response model (GRM) • Samejima (1969) • Nominal Response model (NRM) • Bock (1972)

  48. Partial Credit Model

  49. Partial Credit Model (PCM) • Designed for items where you can obtain a “partial credit”, e.g. 0 = solved nothing, 1 = solved part A, 2 = solved parts A and B • i.e. those who scored a ‘2’ can also be thought of as having achieved a ‘1’
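As a rough sketch of how partial credit translates into category probabilities, the function below uses the usual PCM form, in which the log-odds of scoring k rather than k-1 is θ minus a step difficulty. The step difficulties and trait level are hypothetical values, and the function name `pcm_probs` is an assumption.

```python
import math

def pcm_probs(theta, step_difficulties):
    """Probabilities of scores 0..m for one item under the partial credit model."""
    # Cumulative sums of (theta - delta_j); the empty sum for score 0 is zero.
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical item scored 0, 1 or 2 (nothing, part A, parts A and B).
probs = pcm_probs(theta=0.0, step_difficulties=[-1.0, 1.0])
for score, p in enumerate(probs):
    print(f"P(score = {score}) = {p:.3f}")
```

With a single step the function reduces to the binary Rasch model, which is one way to check that the polytomous extension is consistent with the dichotomous case.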
