
IRT Fixed Parameter Calibration and Maintaining Item Parameters on Common Scale

This presentation discusses the nature of IRT ability scale, approaches to maintaining item parameters on a common scale, and the use of fixed parameter calibration. It includes applications of fixed parameter calibration and discusses the need for a fixed common ability scale.


Presentation Transcript


  1. IRT Fixed Parameter Calibration and Other Approaches to Maintaining Item Parameters on a Common Ability Scale. Seonghoon Kim, PhD, Keimyung University. Email: seonghoonkim@empal.com. Presented at Measured Progress on July 10, 2008.

  2. Overview • I. Nature of IRT Ability Scale • II. Three Approaches to Maintaining Item Parameters on a Common Scale • III. Principle of Fixed Parameter Calibration (FPC) • IV. Use of Computer Programs for FPC • V. Applications of FPC for Scaling and Equating

  3. Reference Guide • This presentation was prepared based on my articles, • Kim, S. (2006a). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43 (4), 355-381. • Kim, S. (2006b). A study on IRT fixed parameter calibration methods using BILOG-MG. Journal of Educational Evaluation, 19 (1), 323-342. • Kim, S., & Kolen, M. J. (2006). Robustness to format effects of IRT linking methods for mixed-format tests. Applied Measurement in Education, 19 (4), 357-381. • Kim, S., & Lee, W. (2006). An extension of four IRT linking methods for mixed-format tests. Journal of Educational Measurement, 43 (1), 53-76. • Kim, S., & Kolen, M. J. (2007). Effects on scale linking of different definitions of criterion functions for the IRT characteristic curve methods. Journal of Educational and Behavioral Statistics, 32 (4), 371-397. • and my recent thoughts and works on FPC

  4. I. Nature of IRT Ability Scale: Indeterminacy in IRT Modeling • Item response function (IRF) and metrics • Two-parameter logistic (2PL) model • IRF = P(θ | a, b) = 1/[1 + exp(-Da(θ - b))] • Suppose that θ_O = Aθ_N + B • If a_O = a_N/A and b_O = Ab_N + B, then P(θ_O | a_O, b_O) = P(θ_N | a_N, b_N), because a_O(θ_O - b_O) = a_N(θ_N - b_N). Therefore, the IRF is invariant under a linear transformation of the ability scale when the item parameters are transformed accordingly. • Thus, in practice, either θ_O or θ_N can be used; this is the scale indeterminacy.
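The invariance stated on this slide can be checked numerically. The following is a minimal sketch, assuming D = 1.7 and arbitrary illustrative values for the item parameters and for A and B (none of these numbers come from the presentation).

```python
import numpy as np

D = 1.7  # assumed scaling constant in the 2PL formula

def irf_2pl(theta, a, b):
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + np.exp(-D * a * (theta - b)))

# Hypothetical new-scale abilities, item parameters, and linking coefficients.
theta_n = np.linspace(-3.0, 3.0, 7)
a_n, b_n = 1.2, 0.4
A, B = 1.5, -0.8

# Move abilities and item parameters to the old scale.
theta_o = A * theta_n + B
a_o = a_n / A
b_o = A * b_n + B

# The response probabilities agree on both scales (up to rounding error).
p_new = irf_2pl(theta_n, a_n, b_n)
p_old = irf_2pl(theta_o, a_o, b_o)
max_diff = float(np.max(np.abs(p_new - p_old)))
print(max_diff)
```

Because the two sets of probabilities coincide, the response data alone cannot distinguish the two scales, which is why an origin and unit must be fixed by convention.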

  5. I. Nature of IRT Ability Scale: “0, 1” Scaling vs. Rasch Scaling • “0, 1” scaling • Scaling by arbitrarily assuming that the mean (M) and standard deviation (SD) of the ability distribution are equal to 0 (origin) and 1 (unit). • Such arbitrary but “standardized” fixing is unavoidable when the M and SD are unknown. • Rasch scaling • Setting the origin (0) of the scale at the average difficulty of all items involved, while fixing the unit at 1. • The fixed unit is guaranteed by the Rasch modeling.

  6. I. Nature of IRT Ability Scale: Need for a Fixed Common Ability Scale • A fixed common scale should be used across test administrations for several reasons • To check the invariance property of item parameters • To achieve comparability between item parameters from different administrations • To develop an item pool • To conduct IRT equating

  7. I. Nature of IRT Ability Scale: Need for a Fixed Common Ability Scale • Developing a common ability scale requires all new scales to be linked to the fixed old scale θ_O. [Figure: the new scales θ_N1, θ_N2, and θ_N3 are each linked to the old scale θ_O.]

  8. I. Nature of IRT Ability Scale: Factors for Development of a Common Scale • Development of a fixed common scale is subject to • Data collection design for IRT scaling and equating test forms • Random groups design vs. Common-item nonequivalent groups design • Scaling convention • “0, 1” scaling vs. Rasch scaling • Item parameter estimation method • Marginal maximum likelihood (MML) estimation vs. Joint maximum likelihood (JML) estimation

  9. The Context Assumed in This Presentation • Data collection design for IRT scaling and equating test forms • Common-item nonequivalent groups (CING) design • Anchor items (i.e., common items) link two test forms • Scaling convention • “0, 1” scaling • Group dependent • In a random groups design, two “0, 1” scales from alternative forms may be considered equivalent. • Marginal Maximum Likelihood (MML) Estimation • Estimation of Item parameters • Estimation of Underlying Ability Distribution • Quadrature weights are estimated at quadrature points.
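In MML estimation the ability distribution is represented by weights at a set of quadrature points. As a small illustration, the sketch below builds a discrete "0, 1" distribution and computes its mean and SD; the choice of 31 equally spaced points on [-4, 4] and the helper names are assumptions made for the example, not settings of any particular program.

```python
import numpy as np

def normal_quadrature(n_points=31, lo=-4.0, hi=4.0, mean=0.0, sd=1.0):
    """Equally spaced points with normalized normal-density weights."""
    points = np.linspace(lo, hi, n_points)
    dens = np.exp(-0.5 * ((points - mean) / sd) ** 2)
    return points, dens / dens.sum()

def dist_mean_sd(points, weights):
    """Mean and SD of a discrete distribution on quadrature points."""
    m = float(np.sum(weights * points))
    s = float(np.sqrt(np.sum(weights * (points - m) ** 2)))
    return m, s

points, weights = normal_quadrature()
m, s = dist_mean_sd(points, weights)
print(round(m, 4), round(s, 4))
```

The same two helpers apply to any set of estimated (posterior) weights, so the mean and SD of an estimated ability distribution can be read off in the same way.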

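  10. Data Structure Illustration for the CING Design. [Figure: item-by-group layout. Old form (Group 1): old form unique items, administered to the old group only, plus the common (anchor) items. New form (Group 2): the common (anchor) items plus new form unique items, administered to the new group only. The common items are administered to both the old and new groups.]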

  11. II. Three Approaches to Maintaining a Common Scale • Separate calibration by form and linking • Estimate transformation coefficients A and B using two sets of item parameter estimates for the anchor items • Use A and B to transform new form item parameter estimates into those on the old scale • Fixed parameter calibration (FPC) • Holding the old form anchor item parameters fixed and estimating the new form non-anchor items • Concurrent calibration (aka multiple-group estimation) • Combining new and old form data and estimating all item parameters together with the underlying ability distributions, with the old group designated as the reference-scale group • Will not be addressed in detail in this presentation

  12. II. Maintaining the Old Scale: Separate Calibration by Form and Linking • “0, 1” scales from two test forms • Old form scale: θ_O (reference; fixed origin and unit) • New form scale: θ_N (arbitrary origin and unit) • Scheme of linking two “0, 1” scales • θ_O = Aθ_N + B [Figure: the points -1, 0, 1 on the θ_N scale are mapped onto the θ_O scale through A and B.]

  13. II. Maintaining the Old Scale: Separate Calibration by Form and Linking • Linking ability scales is completed by placing all item parameters from separate calibrations onto the fixed old scale. • In the case of the 2PL model, given A and B, the a_N and b_N parameters from a new scale are transformed into a* = a_N/A and b* = Ab_N + B • In practice, A and B are estimated with item parameter estimates from the old and new scales. • Mean-Sigma Method (Marco, 1977) • Mean-Mean Method (Loyd & Hoover, 1980) • Haebara Method (Haebara, 1980) • Stocking-Lord Method (Stocking & Lord, 1983)
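The two moment methods listed above have simple closed forms. The sketch below uses the usual textbook definitions (mean-sigma: A is the ratio of the SDs of the anchor b estimates; mean-mean: A is the ratio of the mean anchor a estimates) and applies them to made-up anchor parameters; the function names and all numbers are illustrative assumptions. The Haebara and Stocking-Lord methods require numerical minimization and are not shown.

```python
import numpy as np

def mean_sigma(b_old, b_new):
    """Mean-sigma coefficients from anchor difficulty estimates."""
    A = np.std(b_old) / np.std(b_new)
    B = np.mean(b_old) - A * np.mean(b_new)
    return float(A), float(B)

def mean_mean(a_old, a_new, b_old, b_new):
    """Mean-mean coefficients from anchor discrimination and difficulty estimates."""
    A = np.mean(a_new) / np.mean(a_old)
    B = np.mean(b_old) - A * np.mean(b_new)
    return float(A), float(B)

def to_old_scale(a_new, b_new, A, B):
    """Place new-scale 2PL parameters on the old scale: a* = a/A, b* = A*b + B."""
    return a_new / A, A * b_new + B

# Hypothetical anchor parameters on the old scale.
a_old = np.array([0.8, 1.0, 1.3, 0.6])
b_old = np.array([-1.5, -0.5, 0.3, 1.2])

# The same items expressed on a new scale related by theta_O = 1.2 * theta_N + 0.5.
true_A, true_B = 1.2, 0.5
a_new = a_old * true_A
b_new = (b_old - true_B) / true_A

A_ms, B_ms = mean_sigma(b_old, b_new)
A_mm, B_mm = mean_mean(a_old, a_new, b_old, b_new)
a_star, b_star = to_old_scale(a_new, b_new, A_ms, B_ms)
print(A_ms, B_ms, A_mm, B_mm)
```

With error-free parameters both methods recover A and B exactly; with real estimates they generally differ, which is one motivation for the characteristic curve methods.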

  14. II. Maintaining the Old Scale: Comparative Performance • Suppose that a characteristic curve method (Haebara or Stocking-Lord) is employed as the linking method for the “separate calibration and linking” approach. • The relative performance of the three alternative approaches to maintaining the old scale differs depending on whether the new form items are common items or not (Hanson & Béguin, 2002; Kim, 2006b; Kim & Kolen, in process). • For the common items, concurrent calibration would perform best, due mainly to their larger sample size (new group + old group) compared with the non-common items. • For the non-common items, the three approaches would perform almost equally.

  15. II. Maintaining the Old Scale: Comparative Performance

  16. II. Maintaining the Old Scale: When is FPC most appropriate? • When using the “stable” old form anchor item parameters to obtain or diagnose the parameters of new form non-anchor items on the fixed old scale • Note • Placing the parameters of new form non-anchor items on the old scale is the focus. • Updating the old form item parameters is not a concern here. • The old form anchor items are assumed to have stable parameter estimates because a large sample was used for obtaining them.

  17. III. Principle of FPC: Basics • Why • To place the parameters of new form non-anchor items onto the fixed old scale • How • Holding the old form anchor item parameters fixed and estimating the new form non-anchor items • Critical Process • Estimating the underlying distribution of ability for the new form on the fixed old scale so that the new item parameters may be properly expressed on the old scale. • By the IRT modeling, the underlying distribution can be estimated using both the new form data and the fixed anchor item parameters.

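  18. III. Principle of FPC: Schematic Illustration of Updating Priors and Underlying Distributions of Ability. [Figure: with the anchor item parameters fixed at a_1O, b_1O, a_2O, b_2O, …, the EM iterations start from the 1st initial prior on θ_O. The 1st estimated ability distribution becomes the 2nd initial prior, the 2nd estimated ability distribution becomes the 3rd initial prior, and so on until the final estimated ability distribution. The new item parameters a_1N, b_1N, …, b_JN are re-estimated at each stage, ending with the estimated new item parameters on the θ_O scale.]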

  19. III. Principle of FPC: Numerical Expression: Multiple Prior Weights Updating and Multiple EM Cycles (MWU-MEM) • Likelihood function for estimating new form non-anchor item parameters (iteration s, quadrature point k, person i, data y, parameters Δ) [equation not reproduced in this transcript] • Closed-form formula for estimating quadrature weights of the underlying ability distribution from the new form data [equation not reproduced in this transcript] • Refer to Kim (2006a) for numerical details.
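The formulas on this slide did not survive in the transcript. As a rough illustration only, the sketch below uses the standard MML-EM update for a discrete ability distribution, in which each new weight is the average over examinees of the posterior probability at that quadrature point, with the anchor item parameters held fixed. It is not claimed to match the exact expressions in Kim (2006a). The 2PL model, D = 1.7, the anchor parameters, and the simulated data are all assumptions, and a full FPC run would also re-estimate the non-anchor item parameters in each M-step.

```python
import numpy as np

D = 1.7  # assumed scaling constant

def irf_2pl(theta, a, b):
    """2PL probabilities; rows index theta values, columns index items."""
    return 1.0 / (1.0 + np.exp(-D * a[None, :] * (theta[:, None] - b[None, :])))

def dist_mean_sd(points, weights):
    """Mean and SD of a discrete distribution on quadrature points."""
    m = float(np.sum(weights * points))
    s = float(np.sqrt(np.sum(weights * (points - m) ** 2)))
    return m, s

def update_weights(responses, a_fixed, b_fixed, points, weights, n_cycles=50):
    """Re-estimate quadrature weights with the anchor parameters held fixed."""
    p = irf_2pl(points, a_fixed, b_fixed)                # K points x J items
    # Log-likelihood of every response pattern at every point (N x K).
    loglik = responses @ np.log(p).T + (1.0 - responses) @ np.log1p(-p).T
    for _ in range(n_cycles):
        log_post = loglik + np.log(np.maximum(weights, 1e-300))
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)          # posterior per examinee
        weights = post.mean(axis=0)                      # averaged posterior = new weights
    return weights

# Hypothetical setup: 20 fixed anchor items, new group drawn from N(1, 1).
rng = np.random.default_rng(0)
a_fixed = np.linspace(0.6, 1.2, 20)
b_fixed = np.linspace(-2.0, 2.0, 20)
theta = rng.normal(1.0, 1.0, 2000)
responses = (rng.random((2000, 20)) < irf_2pl(theta, a_fixed, b_fixed)).astype(float)

# Start from a "0, 1" prior on 31 points and let the data pull it to the old scale.
points = np.linspace(-4.0, 4.0, 31)
prior = np.exp(-0.5 * points ** 2)
prior /= prior.sum()

weights = update_weights(responses, a_fixed, b_fixed, points, prior)
est_mean, est_sd = dist_mean_sd(points, weights)
print(round(est_mean, 3), round(est_sd, 3))
```

In this toy setup the estimated mean should move from 0 toward the generating value of 1, which is the "pulling onto the old scale" behavior that repeated weight updating is meant to achieve.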

  20. III. Principle of FPC: Summary of Key Points • The values of the fixed anchor item parameters are expressed on the fixed old scale, so the origin and unit of the ability scale for the new form data have been already set. That is, we do not need to use “0, 1” scaling for the new form data. • New form non-anchor item parameters should be estimated using the new form underlying distribution that is properly recovered on the fixed old scale. • As with ability estimates, the underlying distribution can be estimated using the new form data and the fixed anchor item parameters. • Fixing the anchor item parameters pulls the underlying distribution onto the old scale gradually. Accordingly, the new form item parameters are also pulled onto the old scale.

  21. III. Principle of FPC: Concerns about the Unstable Estimates of Anchor Item Parameters • Unstable estimates of the fixed item parameters might adversely affect the performance of FPC. • However, Kim (2006a) showed that FPC is robust to sampling errors of the fixed item parameter estimates in calibrating non-anchor items. • This seems to be because the new form data collaborate with the fixed item parameters in “revealing” the old scale. • In other words, as long as the sample size of the new group is large enough, unstable estimates of the fixed item parameters would have little effect on the proper estimation of both the underlying distribution for the new group and the non-anchor item parameters.

  22. III. Principle of FPC: Two Alternatives to the MWU-MEM Method • Some computer programs, such as BILOG-MG, do not update the prior quadrature weights during EM cycles when conducting FPC. • The resulting posterior (quadrature) weights would not properly represent the underlying ability distribution for the new form data. • Two ad hoc methods can be used to obtain good estimates of the quadrature weights for the underlying distribution. • Simple Transformation Prior Update (STPU) Method • Iterative-Run Prior Update (IRPU) Method

  23. III. Principle of FPC: Two Alternatives to the MWU-MEM Method • Simple Transformation Prior Update (STPU) Method • Uses A and B from a linking method to update the prior ability distribution by transforming the posterior distribution obtained from the regular, separate calibration of the new form. FPC is then conducted with the updated prior ability distribution. • Iterative-Run Prior Update (IRPU) Method • Uses iteratively updated prior ability distributions through multiple FPC runs of BILOG-MG. The posterior distribution estimated in one calibration run is used as the prior distribution in the next run, and the sequence continues until the difference between the two distributions is negligible.

  24. III. Principle of FPC: Two Alternatives to the MWU-MEM Method • Kim (2006b) shows that the two ad hoc methods for updating the prior ability distribution work very well. • In recovering the parameters of non-anchor items, the two methods perform almost equally to the Stocking-Lord linking method and concurrent calibration. • In practice, the STPU method may be preferred due to simplicity. • The IRPU method has the same feature as the MWU-MEM method, except for multiple runs of FPC. Thus, theoretically, the IRPU method may be more acceptable than the STPU method.

  25. III. Principle of FPC: Caveats against Using “Constrained” Estimation for FPC • One might think that imposing strong Bayesian priors on the fixed item parameters and freeing the non-anchor item parameters would function similarly to FPC. • A rationale for such constrained estimation can be found in, for example, the BILOG (Mislevy & Bock, 1990) manual. • In theory, it sounds reasonable. • However, my experience suggests that using strong priors to fix the anchor item parameters tends to distort the non-fixed item parameters.

  26. III. Principle of FPC: Caveats against Using “Constrained” Estimation for FPC • Note that in constrained estimation the anchor item parameters are to be estimated (although almost fixed), while in FPC they are excluded from the parameter list to be estimated. • Without a facility to update ability prior weights, both the underlying distribution and non-anchor item parameters would be distorted.

  27. IV. Use of Computer Programs for FPC • BILOG-MG 7.0 (Zimowski et al., 2003) • The “FIX” option does not function properly because the prior weights are not updated during EM cycles (Kim, 2006a). • The STPU or IRPU method can be used. • PARSCALE 4.1 (Muraki & Bock, 2003) • For FPC to work properly, the “POSTERIOR” option should be used (Kim, 2006a). • Without the “POSTERIOR” option, the STPU or IRPU method can be used.

  28. IV. Use of Computer Programs for FPC: Illustration of FPC with BILOG-MG • Data • 3,000 examinees for the new form data • The data were obtained by simulating examinees from a N(1, 1) distribution, relative to the old group's N(0, 1) distribution. • 25-item multiple-choice (MC) test • FPC • First 20 items fixed (item parameters are ready for use) • Last 5 items freed • The three-parameter logistic (3PL) model is used for item analyses. • Comparison of Default, STPU, and IRPU FPC methods
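Data of the kind described on this slide could be generated along the following lines. This is only a sketch: the generating item parameters, the seed, and D = 1.7 are hypothetical, since the presentation does not list the values actually used to simulate the 25 items.

```python
import numpy as np

D = 1.7  # assumed scaling constant

def irf_3pl(theta, a, b, c):
    """3PL probabilities; rows index examinees, columns index items."""
    z = D * a[None, :] * (theta[:, None] - b[None, :])
    return c[None, :] + (1.0 - c[None, :]) / (1.0 + np.exp(-z))

rng = np.random.default_rng(2008)
n_examinees, n_items = 3000, 25

# Hypothetical generating parameters for the 25 MC items.
a = rng.uniform(0.5, 1.2, n_items)
b = rng.uniform(-2.0, 2.0, n_items)
c = rng.uniform(0.12, 0.25, n_items)

# New group abilities from N(1, 1), expressed on the old group's N(0, 1) scale.
theta = rng.normal(1.0, 1.0, n_examinees)
prob = irf_3pl(theta, a, b, c)
responses = (rng.random((n_examinees, n_items)) < prob).astype(int)

# One row of 25 scored responses (0/1) per examinee.
lines = ["".join(str(x) for x in row) for row in responses]
print(lines[0])
```

Each element of `lines` is a 25-character string of 0s and 1s, the same shape as the scored response rows in the data file used in the illustration.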

  29. IV. Use of Computer Programs for FPC: Illustration of FPC with BILOG-MG. Command File (to Use the Default FPC Facility)
      Default FPC with BILOG-MG
      The examinee group (2) was sampled from N(1,1)
      >COMMENT Fixed-parameter calibration
      >GLOBAL DFNAME='New.txt', PRNAME='Sample.PRM', NPARM=3, SAVE;
      >SAVE PAR='itempar';
      >LENGTH NITEMS=25;
      >INPUT NTOT=25, SAMPLE=3000, NALT=5, NID=4;
      >ITEMS INUM=(1(1)25), INAMES=(O01(1)O20, P01, P02, P03, P04, P05);
      >TEST TNAME=G2_FIX, INUM=(1(1)25), FIX=(1(0)20, 0(0)5);
      (4A1, T1, 25A1)
      >CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=1, NOADJUST;
      >SCORE NOPRINT;

  30. IV. Use of Computer Programs for FPC: Illustration of FPC with BILOG-MG. Data File (New.txt). Slide callout: item responses for anchor items (the first 20 of the 25 items).
      1111111111111111110111111
      1111110100111111110011100
      1111101000111111010111111
      1111110111111111111111111
      1111111110111111011011111
      1111111110111011101011111
      0111100100000100001001111
      0110011110111111010011111
      1111111111111111111111111
      0111101111111011110011111
      1111111111111110111111111
      1111010111111111111011111
      1111111110011111100011110
      1111111111111111111111111
      . . .

  31. IV. Use of Computer Programs for FPC: Illustration of FPC with BILOG-MG. Fixed Parameter File (Sample.PRM). The first line gives the number of fixed items; each following line gives the item number and its a, b, and c values.
      20
      01  0.48877  -1.76191  0.18850
      02  0.78980  -1.51222  0.18301
      03  0.86113  -1.46012  0.17266
      04  0.59502  -1.07553  0.20835
      05  0.81096  -0.79854  0.20981
      06  0.84988  -0.62070  0.12481
      07  0.59386  -0.30609  0.17302
      08  0.79144  -0.07422  0.23463
      09  0.51684   0.48596  0.20394
      10  0.90287   1.19854  0.16761
      11  0.50175  -2.00058  0.21263
      12  0.81267  -1.53418  0.15649
      13  1.16172  -1.22405  0.13872
      14  0.52306  -1.01148  0.18519
      15  0.74785  -0.84378  0.20893
      16  0.77883  -0.68332  0.19013
      17  0.88805  -0.41610  0.18126
      18  0.90752   0.08592  0.17534
      19  0.62818   0.65946  0.26229
      20  0.85275   1.82052  0.13813

  32. IV. Use of Computer Programs for FPC: Illustration of FPC with BILOG-MG. Command File for the STPU Method (Before Transformation)
      Single Group “0, 1” Scaling,
      Although the examinee group was sampled from N(1,1).
      >COMMENT STPU FPC before Transformation of Ability Points
      >GLOBAL DFNAME='New.txt', NPARM=3, SAVE;
      >SAVE PAR='sampleSim01.PAR';
      >LENGTH NITEMS=25;
      >INPUT NTOT=25, SAMPLE=3000, NALT=5, NID=4;
      >ITEMS INUM=(1(1)25), INAMES=(O01(1)O20, P01, P02, P03, P04, P05);
      >TEST TNAME=NO_FIX;
      (4A1, T1, 25A1)
      >CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=0, IDIST=0;
      >SCORE NOPRINT;

  33. IV. Use of Computer Programs for FPC: Illustration of FPC with BILOG-MG. Posterior Distribution from “0, 1” Scaling for the STPU Method
      QUADRATURE POINTS, POSTERIOR WEIGHTS, MEAN AND S.D.:
                         1            2            3            4            5
      POINT      -0.4036E+01  -0.3767E+01  -0.3498E+01  -0.3229E+01  -0.2960E+01
      POSTERIOR   0.2163E-04   0.7268E-04   0.2169E-03   0.5802E-03   0.1392E-02
                         6            7            8            9           10
      POINT      -0.2691E+01  -0.2422E+01  -0.2153E+01  -0.1884E+01  -0.1615E+01
      POSTERIOR   0.3030E-02   0.6054E-02   0.1104E-01   0.1842E-01   0.2878E-01
                        11           12           13           14           15
      POINT      -0.1346E+01  -0.1076E+01  -0.8074E+00  -0.5384E+00  -0.2693E+00
      POSTERIOR   0.4281E-01   0.5985E-01   0.7752E-01   0.9294E-01   0.1036E+00
                        16           17           18           19           20
      POINT      -0.2361E-03   0.2688E+00   0.5379E+00   0.8069E+00   0.1076E+01
      POSTERIOR   0.1074E+00   0.1034E+00   0.9265E-01   0.7725E-01   0.6001E-01
                        21           22           23           24           25
      POINT       0.1345E+01   0.1614E+01   0.1883E+01   0.2152E+01   0.2421E+01
      POSTERIOR   0.4343E-01   0.2927E-01   0.1837E-01   0.1073E-01   0.5841E-02
                        26           27           28           29           30
      POINT       0.2690E+01   0.2959E+01   0.3228E+01   0.3498E+01   0.3767E+01
      POSTERIOR   0.2957E-02   0.1399E-02   0.6105E-03   0.2514E-03   0.9631E-04
                        31
      POINT       0.4036E+01
      POSTERIOR   0.3212E-04
      MEAN   0.00000
      S.D.   1.00000

  34. IV. Use of Computer Programs for FPC: Illustration of FPC with BILOG-MG. Command File for the STPU Method (After Transformation). Slide callouts: the POINTS are rescaled by θ* = Aθ + B, with A = 1.040535 and B = 1.033264; the WEIGHTS are taken from the “0, 1” scaling run (not transformed).
      STPU FPC with Transformed Prior Points
      The examinee group was sampled from N(1,1).
      (Omitted: the same as the commands for the before-transformation “0, 1” calibration)
      >TEST TNAME=G2_FIX, INUM=(1(1)25), FIX=(1(0)20, 0(0)5);
      (4A1, T1, 25A1)
      >CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=0, IDIST=1, NOADJUST;
      >QUAD POINTS=(
        -3.1663E+000 -2.8864E+000 -2.6065E+000 -2.3266E+000 -2.0467E+000
        -1.7668E+000 -1.4869E+000 -1.2070E+000 -9.2710E-001 -6.4720E-001
        -3.6730E-001 -8.6352E-002  1.9314E-001  4.7304E-001  7.5305E-001
         1.0330E+000  1.3130E+000  1.5930E+000  1.8729E+000  2.1529E+000
         2.4328E+000  2.7127E+000  2.9926E+000  3.2725E+000  3.5524E+000
         3.8323E+000  4.1122E+000  4.3921E+000  4.6731E+000  4.9530E+000
         5.2329E+000),
        WEIGHTS=(
         2.1630E-005  7.2680E-005  2.1690E-004  5.8020E-004  1.3920E-003
         3.0300E-003  6.0540E-003  1.1040E-002  1.8420E-002  2.8780E-002
         4.2810E-002  5.9850E-002  7.7520E-002  9.2940E-002  1.0360E-001
         1.0740E-001  1.0340E-001  9.2650E-002  7.7250E-002  6.0010E-002
         4.3430E-002  2.9270E-002  1.8370E-002  1.0730E-002  5.8410E-003
         2.9570E-003  1.3990E-003  6.1050E-004  2.5140E-004  9.6310E-005
         3.2120E-005);
      >SCORE NOPRINT;
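The rescaled POINTS in this command file can be reproduced from the “0, 1” posterior output with the linear transformation θ* = Aθ + B. The sketch below uses the points and posterior weights printed in the “0, 1” scaling output and the A and B values given on the slide; the variable and function names are illustrative.

```python
import numpy as np

A, B = 1.040535, 1.033264  # linking coefficients reported on the slide

# Quadrature points and posterior weights from the "0, 1" scaling run.
points_01 = np.array([
    -4.036, -3.767, -3.498, -3.229, -2.960, -2.691, -2.422, -2.153,
    -1.884, -1.615, -1.346, -1.076, -0.8074, -0.5384, -0.2693, -0.0002361,
    0.2688, 0.5379, 0.8069, 1.076, 1.345, 1.614, 1.883, 2.152,
    2.421, 2.690, 2.959, 3.228, 3.498, 3.767, 4.036])
weights = np.array([
    2.163e-5, 7.268e-5, 2.169e-4, 5.802e-4, 1.392e-3, 3.030e-3, 6.054e-3,
    1.104e-2, 1.842e-2, 2.878e-2, 4.281e-2, 5.985e-2, 7.752e-2, 9.294e-2,
    1.036e-1, 1.074e-1, 1.034e-1, 9.265e-2, 7.725e-2, 6.001e-2, 4.343e-2,
    2.927e-2, 1.837e-2, 1.073e-2, 5.841e-3, 2.957e-3, 1.399e-3, 6.105e-4,
    2.514e-4, 9.631e-5, 3.212e-5])

def dist_mean_sd(points, w):
    """Mean and SD of a discrete distribution (weights renormalized)."""
    w = w / w.sum()
    m = float(np.sum(w * points))
    s = float(np.sqrt(np.sum(w * (points - m) ** 2)))
    return m, s

# STPU: move the points to the old scale and keep the weights unchanged.
points_old = A * points_01 + B
mean_old, sd_old = dist_mean_sd(points_old, weights)
print(np.round(points_old[:3], 4), round(mean_old, 3), round(sd_old, 3))
```

Because only the points move, the prior keeps its shape while its mean and SD become approximately B and A, which is how the STPU method places the prior on the old scale before the FPC run.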

  35. IV. Use of Computer Programs for FPC: Illustration of FPC with BILOG-MG. 2nd Command File for the IRPU Method. Slide callouts: the POINTS are fixed (-4.0 to 4.0); the WEIGHTS are updated (= posterior weights from the 1st run of IRPU FPC).
      IRPU FPC with Updated Prior Weights
      The examinee group was sampled from N(1,1).
      (Omitted: the same as the commands for the default FPC run)
      >TEST TNAME=G2_FIX, INUM=(1(1)25), FIX=(1(0)20, 0(0)5);
      (4A1, T1, 25A1)
      >CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=0, IDIST=1, NOADJUST;
      >QUAD POINTS=(
        -4.0000E+000 -3.7330E+000 -3.4670E+000 -3.2000E+000 -2.9330E+000
        -2.6670E+000 -2.4000E+000 -2.1330E+000 -1.8670E+000 -1.6000E+000
        -1.3330E+000 -1.0670E+000 -8.0000E-001 -5.3330E-001 -2.6670E-001
        -7.7720E-016  2.6670E-001  5.3330E-001  8.0000E-001  1.0670E+000
         1.3330E+000  1.6000E+000  1.8670E+000  2.1330E+000  2.4000E+000
         2.6670E+000  2.9330E+000  3.2000E+000  3.4670E+000  3.7330E+000
         4.0000E+000),
        WEIGHTS=(
         8.8370E-007  3.0840E-006  1.0040E-005  3.1720E-005  9.4690E-005
         2.5560E-004  6.3580E-004  1.4490E-003  3.0500E-003  6.0110E-003
         1.1060E-002  1.8890E-002  3.0200E-002  4.5590E-002  6.4400E-002
         8.4190E-002  1.0160E-001  1.1300E-001  1.1550E-001  1.0830E-001
         9.2970E-002  7.3160E-002  5.2690E-002  3.4660E-002  2.0800E-002
         1.1400E-002  5.7180E-003  2.6290E-003  1.1160E-003  4.3390E-004
         1.5790E-004);
      >SCORE NOPRINT;

  36. IV. Use of Computer Programs for FPC: Illustration of FPC with BILOG-MG. History of Updated Posterior Distributions by the IRPU Method
      Iter#   Mean    Std. Dev.
      0       0.000   1.000
      1       0.699   0.923   (from default FPC)
      2       0.876   0.921
      3       0.933   0.932
      4       0.954   0.943
      5       0.963   0.951
      6       0.967   0.956
      7       0.969   0.960
      8       0.971   0.963
      9       0.972   0.965
      10      0.973   0.966
      11      0.973   0.967
      12      0.974   0.968
      Iterations stopped because the M and SD did not change by more than the 0.001 limit.
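The bookkeeping between IRPU runs amounts to computing the mean and SD of each run's posterior weights and comparing them with the previous run. The sketch below does this for the fixed points (-4.0 to 4.0) and the posterior weights from the 1st IRPU run shown in the 2nd command file; the function names and the exact form of the stopping check are assumptions, with only the 0.001 limit taken from the slide.

```python
import numpy as np

def dist_mean_sd(points, weights):
    """Mean and SD of a discrete distribution (weights renormalized)."""
    w = weights / weights.sum()
    m = float(np.sum(w * points))
    s = float(np.sqrt(np.sum(w * (points - m) ** 2)))
    return m, s

def converged(prev, curr, tol=0.001):
    """True when neither the mean nor the SD moved by more than tol."""
    return abs(curr[0] - prev[0]) <= tol and abs(curr[1] - prev[1]) <= tol

# Fixed quadrature points and the posterior weights from the 1st IRPU run.
points = np.linspace(-4.0, 4.0, 31)
weights_run1 = np.array([
    8.837e-7, 3.084e-6, 1.004e-5, 3.172e-5, 9.469e-5, 2.556e-4, 6.358e-4,
    1.449e-3, 3.050e-3, 6.011e-3, 1.106e-2, 1.889e-2, 3.020e-2, 4.559e-2,
    6.440e-2, 8.419e-2, 1.016e-1, 1.130e-1, 1.155e-1, 1.083e-1, 9.297e-2,
    7.316e-2, 5.269e-2, 3.466e-2, 2.080e-2, 1.140e-2, 5.718e-3, 2.629e-3,
    1.116e-3, 4.339e-4, 1.579e-4])

run0 = (0.0, 1.0)                          # the "0, 1" starting prior
run1 = dist_mean_sd(points, weights_run1)  # close to the iteration 1 row
print(round(run1[0], 3), round(run1[1], 3), converged(run0, run1))
```

In an actual IRPU sequence the posterior weights saved from each BILOG-MG run would be pasted into the WEIGHTS list of the next command file, and the loop would end once `converged` returns True.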

  37. IV. Use of Computer Programs for FPC: Illustration of FPC with BILOG-MG. FPC Estimates of Non-Anchor Item Parameters on the Fixed Old Scale
             Mean/Sigma                Default FPC
      Item   a      b       c          a      b       c
      21     0.591  -1.947  0.212      0.650  -1.994  0.214
      22     0.831  -1.643  0.222      0.922  -1.699  0.230
      23     1.027  -1.781  0.196      1.128  -1.850  0.198
      24     0.566  -0.988  0.213      0.635  -1.089  0.220
      25     0.605  -0.727  0.206      0.681  -0.847  0.216

             STPU FPC                  IRPU FPC
      Item   a      b       c          a      b       c
      21     0.605  -1.909  0.210      0.624  -1.844  0.208
      22     0.863  -1.587  0.222      0.887  -1.542  0.217
      23     1.065  -1.723  0.196      1.100  -1.663  0.195
      24     0.575  -0.991  0.209      0.594  -0.952  0.207
      25     0.614  -0.729  0.205      0.637  -0.689  0.206

  38. IV. Use of Computer Programs for FPC: Illustration of FPC with BILOG-MG. FPC Estimates of Mean and SD of the Underlying Distribution on the Fixed Old Scale
      Method        Mean    Std. Dev.
      Default FPC   0.699   0.923     (under-estimation)
      STPU FPC      1.003   1.018
      IRPU FPC      0.974   0.968
      Mean-Sigma    B = 1.033, A = 1.041
      Note. The new group examinees were from a N(1,1) distribution that was expressed on the fixed old scale.

  39. IV. Use of Computer Programs for FPC: Illustration of FPC with PARSCALE • Data • 3,000 examinees for the new form data • The data were obtained by simulating examinees from a N(0.5, 1.2^2) distribution, relative to the old group's N(0, 1) distribution. • A mixed-format test of 15 MC items and 2 five-category constructed-response (CR) items • FPC • First 10 MC items fixed (item parameters are ready for use) • Last 5 MC and 2 CR items freed • The 3PL model for MC items and the generalized partial credit (GPC) model for CR items • Comparison of STPU and MWU-MEM methods

  40. IV. Use of Computer Programs for FPC: Illustration of FPC with PARSCALE. Command File (MWU-MEM FPC)
      MWU-MEM FPC with PARSCALE
      The examinee group was sampled from N(0.5, 1.2^2)
      >COMMENT 10 common items fixed and 2 CR items calibration
      >FILE DFNAME='new.txt', IFNAME='MC10FIX.IFN', SAVE;
      >SAVE PARM='MC10FIX';
      >INPUT NTOT=17, TAKE=3000, NID=5, NTEST=1, LENGTH=17;
      (5A1, T1, 17A1)
      >TEST TNAME=I10FIX, ITEMS=(1(1)45), NBLOCK=17;
      >BLOCK BNAME=FIXED, NITEMS=1, NCAT=2, ORI=(0, 1), MOD=(1, 2), REP=10, SKIP;
      >BLOCK BNAME=FREEMC, NITEMS=1, NCAT=2, ORI=(0, 1), MOD=(1, 2), GPARM=0.2, GUESS=(2, EST), REP=5;
      >BLOCK BNAME=FREED, NITEMS=1, NCAT=5, ORI=(0,1,2,3,4), MOD=(1,2,3,4,5), REP=2;
      >CALIB NQPT=41, PAR, LOG, SCALE=1.7, CYCLE=200, NEWTON=0, FREE=(NOADJUST, NOADJUST), ESTORDER, SPRI, GPRI, POSTERIOR;
      >SCORE NOSCORE;

  41. IV. Use of Computer Programs for FPC: Illustration of FPC with PARSCALE. Data File (New.txt). Slide callouts: item responses for anchor items; item responses for CR items.
      11111101111111132
      11111111111111144
      11111011001111032
      11111111101111134
      11111100011111031
      11111111111111144
      11110110000101113
      11010100011111111
      01111101001111144
      00000101000100001
      00011000001100100
      11111101101111122
      . . .

  42. IV. Use of Computer Programs for FPC: Illustration of FPC with PARSCALE. Command File to Prepare IFNAME File (MC10FIX.IFN). Slide callouts: no IFNAME on the FILE command; no SKIP on the FIXED block.
      MWU-MEM FPC with PARSCALE
      No Fix, “0, 1” Scaling
      >COMMENT 10 common items fixed and 2 CR items calibration
      >FILE DFNAME='new.txt', SAVE;
      >SAVE PARM='MC10FIX';
      >INPUT NTOT=17, TAKE=3000, NID=5, NTEST=1, LENGTH=17;
      (5A1, T1, 17A1)
      >TEST TNAME=I10FIX, ITEMS=(1(1)45), NBLOCK=17;
      >BLOCK BNAME=FIXED, NITEMS=1, NCAT=2, ORI=(0, 1), MOD=(1, 2), GPARM=0.2, GUESS=(2, EST), REP=10;
      >BLOCK BNAME=FREEMC, NITEMS=1, NCAT=2, ORI=(0, 1), MOD=(1, 2), GPARM=0.2, GUESS=(2, EST), REP=5;
      >BLOCK BNAME=FREED, NITEMS=1, NCAT=5, ORI=(0,1,2,3,4), MOD=(1,2,3,4,5), REP=2;
      >CALIB NQPT=41, PAR, LOG, SCALE=1.7, CYCLE=200, NEWTON=0, FREE=(NOADJUST, NOADJUST), ESTORDER, SPRI, GPRI, POSTERIOR;
      >SCORE NOSCORE;

  43. IV. Use of Computer Programs for FPC: Illustration of FPC with PARSCALE. Item Parameter Output File from “0, 1” Scaling
      MWU-MEM FPC with PARSCALE
      No Fix, “0, 1” Scaling
      I10FIX 17 17 7 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
      GROUP 01
      FIXED 20001 0.94308 0.07058 -1.12375 0.14908 0.26134 0.07792 0.00000 0.00000 0.00000 0.00000
      BLOCK 20002 0.98019 0.06877 -0.93880 0.12173 0.21813 0.06540 0.00000 0.00000 0.00000 0.00000
      BLOCK 20003 1.18582 0.07723 -0.72689 0.08253 0.19030 0.04856 0.00000 0.00000 0.00000 0.00000
      (Omitted)
      FREED 50016 1.16556 0.03437 -0.14845 0.01309 0.00000 0.00000 0.00000 1.25729 0.29044 -0.33537 -1.21236 0.00000 0.04262 0.03157 0.02902 0.03037
      BLOCK 50017 1.42147 0.04095 -0.19171 0.01178 0.00000 0.00000 0.00000 1.29058 0.38858 -0.50917 -1.16999 0.00000 0.03895 0.02653 0.02434 0.02606

  44. IV. Use of Computer Programs for FPC: Illustration of FPC with PARSCALE. Modified Item Parameter File (MC10FIX.IFN). Slide callouts: for the 10 fixed items, the estimates are replaced with the fixed a, fixed b, and fixed c values.
      MWU-MEM FPC with PARSCALE
      No Fix, “0, 1” Scaling
      I10FIX 17 17 7 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
      GROUP 01
      FIXED 20001 0.69300 0.00000 -1.50000 0.00000 0.12500 0.00000 0.00000 0.00000 0.00000 0.00000
      BLOCK 20002 0.78600 0.00000 -1.00000 0.00000 0.18500 0.00000 0.00000 0.00000 0.00000 0.00000
      BLOCK 20003 0.89700 0.00000 -0.60000 0.00000 0.23300 0.00000 0.00000 0.00000 0.00000 0.00000
      (Omitted)
      FREED 50016 1.16556 0.03437 -0.14845 0.01309 0.00000 0.00000 0.00000 1.25729 0.29044 -0.33537 -1.21236 0.00000 0.04262 0.03157 0.02902 0.03037
      BLOCK 50017 1.42147 0.04095 -0.19171 0.01178 0.00000 0.00000 0.00000 1.29058 0.38858 -0.50917 -1.16999 0.00000 0.03895 0.02653 0.02434 0.02606

  45. IV. Use of Computer Programs for FPC: Illustration of FPC with PARSCALE. Command File for the STPU Method (After Transformation). Slide callouts: the POINTS are rescaled by θ* = Aθ + B, with A = 1.38 and B = 0.24; the WEIGHTS are taken from the “0, 1” scaling run (not transformed).
      STPU FPC with Transformed Prior Points
      The examinee group was sampled from N(1,1).
      (Omitted: the same as the commands for MWU-MEM)
      >CALIB NQPT=31, PAR, LOG, SCALE=1.7, CYCLE=200, NEWTON=0, FREE=(NOADJUST, NOADJUST), ESTORDER, SPRI, GPRI, DIST=4, QPREAD;
      >QUADP POINTS=(
        -5.2976E+000 -4.9280E+000 -4.5598E+000 -4.1902E+000 -3.8206E+000
        -3.4524E+000 -3.0828E+000 -2.7132E+000 -2.3450E+000 -1.9754E+000
        -1.6059E+000 -1.2377E+000 -8.6808E-001 -4.9891E-001 -1.2988E-001
         2.3929E-001  6.0846E-001  9.7749E-001  1.3467E+000  1.7162E+000
         2.0844E+000  2.4540E+000  2.8236E+000  3.1918E+000  3.5614E+000
         3.9310E+000  4.2992E+000  4.6688E+000  5.0384E+000  5.4066E+000
         5.7761E+000),
        WEIGHTS=(
         1.2430E-005  3.4290E-005  8.7330E-005  2.0480E-004  4.4420E-004
         9.1150E-004  1.8720E-003  4.1960E-003  1.0550E-002  2.5160E-002
         4.2780E-002  5.0290E-002  5.8510E-002  8.6110E-002  9.9290E-002
         8.6880E-002  9.7990E-002  1.0840E-001  9.0140E-002  7.7730E-002
         6.4860E-002  4.3230E-002  2.4440E-002  1.3010E-002  6.7400E-003
         3.3710E-003  1.6010E-003  7.1410E-004  2.9710E-004  1.1500E-004
         4.1380E-005);
      >SCORE NOSCORE;

  46. IV. Use of Computer Programs for FPC: Illustration of FPC with PARSCALE. FPC Estimates of Non-Anchor Item Parameters on the Fixed Old Scale (c is reported for the MC items 11-15; c2 to c5 are reported for the CR items 16-17)
      STPU Method
      Item   a      b       c      c2      c3      c4     c5
      11     0.741  -1.361  0.194
      12     0.767  -0.995  0.238
      13     0.741  -0.906  0.185
      14     0.942  -0.442  0.140
      15     1.181  -0.113  0.234
      16     0.920   0.025         -1.569  -0.343  0.449  1.562
      17     1.120  -0.031         -1.667  -0.522  0.615  1.452

      MWU-MEM Method
      Item   a      b       c      c2      c3      c4     c5
      11     0.741  -1.361  0.194
      12     0.768  -0.994  0.238
      13     0.741  -0.908  0.184
      14     0.942  -0.444  0.139
      15     1.180  -0.113  0.234
      16     0.921   0.025         -1.568  -0.342  0.450  1.561
      17     1.120  -0.030         -1.666  -0.522  0.615  1.454

  47. IV. Use of Computer Programs for FPC: Illustration of FPC with PARSCALE. FPC Estimates of Mean and SD of the Underlying Distribution on the Fixed Old Scale
      Method        Mean    Std. Dev.
      STPU FPC      0.460   1.242
      MWU-MEM FPC   0.456   1.227
      Mean-Sigma    B = 0.239, A = 1.384
      Note. The new group examinees were from a N(0.5, 1.2^2) distribution that was expressed on the fixed old scale.
      Slide callouts: over-estimation; under-estimation.

  48. V. Applications of FPC for Scaling and Equating • Online Calibration in Computerized Adaptive Testing (CAT) • Calibration of Pretest Items on the Fixed Operational Scale in Regular, Non-CAT Administration • In a Mixed-Format Test, Separate Calibration of CR Items from MC Items To Minimize Effects of Bad CR Items on MC Item Calibration • Equating Test Forms in the CING Design

  49. V. Applications of FPC: Online Calibration in CAT • In CAT, different sets of operational items are adaptively administered to examinees, with pretest items “seeded” in a certain common block of examinee groups. • Because the operational items were already calibrated, their parameters are known in CAT. • Thus, FPC may be the best way to calibrate and diagnose the pretest items on the scale of the operational items, without affecting the operational item parameters.

  50. V. Applications of FPC: Calibration of Pretest Items on the Fixed Operational Scale • To develop test forms, pretest items are often administered together with operational items to examinees. • However, it would be wise to calibrate operational items separately from pretest items, because the operational item parameters could be contaminated by bad pretest items. • In this case, the ability distribution that is estimated using only the operational items can be reasonably used as the prior ability distribution for FPC with the pretest items, while the operational item parameters are used to fix the operational items in the FPC.
