IRT basics: Theory and parameter estimation

Wayne C. Lee, David Chuah, Patrick Wadlington, Steve Stark, & Sasha Chernyshenko

On-line! Overview • How do I begin a set of IRT analyses? • What do I need? • Software • Data • What do I do? • Input/ syntax files • Examination of output

“Eye-ARE-What?” • Item response theory (IRT) • Set of probabilistic models that… • Describes the relationship between a respondent’s magnitude on a construct (a.k.a. latent trait; e.g., extraversion, cognitive ability, affective commitment)… • To his or her probability of a particular response to an individual item

But what does that buy you? • Provides more information than classical test theory (CTT) • Classical test statistics depend on the set of items and sample examined • IRT item parameters do not depend on the particular sample examined (given adequate model fit) • Can examine item bias/measurement equivalence and provide conditional standard errors of measurement

Before we begin… • Data preparation • Raw data must be recoded if necessary (negatively worded items must be reverse coded such that all items in the scale indicate a positive direction) • Dichotomization (optional) • Reducing multiple options into two separate values (0, 1; right, wrong)
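The recoding steps above can be sketched in Python. This is a minimal illustration, not the authors' procedure: the 1-5 response scale, the cut point of 4, and the positions of the negatively worded items are all assumptions made for the example.

```python
def reverse_code(response, low=1, high=5):
    """Reverse-code a response so that a high value always indicates more of the trait."""
    return low + high - response


def dichotomize(response, cut=4):
    """Collapse a polytomous response to 1 (at or above `cut`) or 0 (below it)."""
    return 1 if response >= cut else 0


raw = [5, 4, 2, 1]          # one respondent, four hypothetical items
negatively_worded = {1, 3}  # zero-based positions of reverse-keyed items (assumed)

recoded = [reverse_code(r) if i in negatively_worded else r
           for i, r in enumerate(raw)]
binary = [dichotomize(r) for r in recoded]

print(recoded)
print(binary)
```

Reverse coding is applied before dichotomizing so that the cut point means the same thing for every item.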

Calibration and validation files • Data is split into two separate files • Calibration sample for estimating IRT parameters • Validation sample for assessing the fit of the model to the data • Data files for the programs that we will be discussing must be in ASCII/ text format
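A minimal Python sketch of the split described above. The 50/50 split, the random seed, and the helper names are assumptions; the record layout (a 4-character ID followed by 10 one-character responses) mirrors the (4A1,10A1) format statement that appears in the input files later in the deck.

```python
import random


def format_record(person_id, responses, id_width=4):
    """Return one fixed-width line: zero-padded ID, then one character per item."""
    return f"{person_id:0{id_width}d}" + "".join(str(r) for r in responses)


def split_sample(records, calibration_fraction=0.5, seed=0):
    """Shuffle a copy of the records and split it into calibration and validation lists."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * calibration_fraction)
    return shuffled[:cut], shuffled[cut:]


def write_ascii(path, lines):
    """Write records as plain ASCII text, one respondent per line."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        handle.write("\n".join(lines) + "\n")


# Twenty made-up respondents with ten dichotomous responses each.
records = [format_record(i, [(i + j) % 2 for j in range(10)]) for i in range(1, 21)]
calibration, validation = split_sample(records)
print(len(calibration), len(validation))
```

Calling `write_ascii` with a file name of your choice on each list would produce the two text files; it is left uncalled here so the sketch does not touch the disk.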

Investigating dimensionality • The models presented make a common assumption of unidimensionality • Hattie (1985) reviewed 30 techniques • Some propose the ratio of the 1st eigenvalue to the 2nd eigenvalue (Lord, 1980) • On-line we describe how to examine the eigenvalues following Principal Axis Factoring (PAF)

PAF and scree plots • If the data are dichotomous, factor analyze tetrachoric correlations • Assume continuum underlies item responses • [Scree plot] Dominant first factor
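The eigenvalue-ratio check can be sketched as follows. This is a simplification under stated assumptions: it takes an already computed correlation matrix and returns the ratio of the first to the second eigenvalue. It does not perform principal axis factoring and does not compute tetrachoric correlations; the toy matrix and the function name are hypothetical.

```python
import numpy as np


def eigenvalue_ratio(corr):
    """Ratio of the largest to the second-largest eigenvalue of a symmetric matrix."""
    values = np.linalg.eigvalsh(np.asarray(corr, dtype=float))  # ascending order
    return values[-1] / values[-2]


# Toy 3 x 3 correlation matrix with every off-diagonal correlation equal to 0.5.
# Its eigenvalues are 2.0, 0.5, and 0.5.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
print(eigenvalue_ratio(corr))
```

A large ratio is read as evidence of a dominant first factor; the deck does not give a cutoff, so none is assumed here.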

Two models presented • The Three Parameter Logistic model (3PL) • For dichotomous data • E.g., cognitive ability tests • Samejima's Graded Response model • For polytomous data where options are ordered along a continuum • E.g., Likert scales • Common models among applied psychologists

The 3PL model • Three parameters: • a = item discrimination • b = item extremity/ difficulty • c = lower asymptote, “pseudo-guessing” • Theta refers to the latent trait
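The model equation from the slide did not survive extraction. The sketch below uses the usual form of the 3PL, P(theta) = c + (1 - c) / (1 + exp(-D a (theta - b))), with the scaling constant D = 1.7. Treating D = 1.7 as the metric used in this deck is an assumption, and the function name is hypothetical. The parameter values are those listed for item 1 in the parameter file later in the deck.

```python
import math


def p3pl(theta, a, b, c, d=1.7):
    """Probability of a correct (keyed) response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-d * a * (theta - b)))


# Item 1 estimates as listed in the parameter file shown later in the deck.
a, b, c = 1.533393, -0.737439, 0.147203

for theta in (-3.0, b, 0.0, 3.0):
    print(f"theta={theta:6.2f}  P={p3pl(theta, a, b, c):.3f}")
```

At theta = b the probability is halfway between c and 1, and as theta decreases the curve flattens out at c, which is why c is called the lower asymptote.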

Effect of the “a” parameter • [Plot] Small “a”: poor discrimination

Effect of the “a” parameter • [Plot] Larger “a”: better discrimination

Effect of the “b” parameter • [Plot] Low “b”: “easy” item

Effect of the “b” parameter • [Plot] Higher “b”: more difficult item • “b” is inversely related to the CTT p value

Effect of the “c” parameter • [Plot] c = 0: asymptote at zero

Effect of the “c” parameter • [Plot] “Low ability” respondents may endorse the correct response

Estimating 3PL parameters • DOS version of BILOG (Scientific Software) • Multiple files in directory, but small size overall • Easier to estimate parameters for a large number of scales or experimental groups • Data file must be saved as ASCII text • ID number • Individual responses • Input file (ASCII text)

BILOG input file (*.BLG)

    AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
    >COMMENT
    >GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY', SAVE;
    >SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV = 'AGR2_CAL.COV';
    >LENGTH NITEMS=(10);
    >INPUT SAMPLE=99999;
    (4A1,10A1)
    >TEST TNAME=AGR;
    >CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;
    >SCORE MET=2, IDIST=0, RSC=0, NOPRINT;

• First line: title line

BILOG input file (*.BLG): the >GLOBAL command

    >GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY', SAVE;

• DFN: data file name • NPARM: parameters • OFNAME: file for missing • NIDW: characters in ID field

BILOG input file (*.BLG): the >SAVE command

    >SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV = 'AGR2_CAL.COV';

• Requested files for: scoring (SCO), parameters (PARM), covariances (COV)

BILOG input file (*.BLG): the >LENGTH and >INPUT commands

    >LENGTH NITEMS=(10);
    >INPUT SAMPLE=99999;

• NITEMS: number of items • SAMPLE: sample size

BILOG input file (*.BLG): the format statement and >TEST command

    (4A1,10A1)
    >TEST TNAME=AGR;

• (4A1,10A1): FORTRAN statement for reading data • TNAME: name of scale/measure

BILOG input file (*.BLG): the >CALIB command

    >CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;

• Estimation specifications (not the default for BILOG)

BILOG input file (*.BLG): the >SCORE command

    >SCORE MET=2, IDIST=0, RSC=0, NOPRINT;

• Scoring: maximum likelihood, no prior distribution of scale scores, no rescaling

Phase one output file (*.PH1)

    CLASSICAL ITEM STATISTICS FOR SUBTEST AGR

                      NUMBER   NUMBER                     ITEM*TEST CORRELATION
    ITEM  NAME         TRIED    RIGHT  PERCENT  LOGIT/1.7    PEARSON   BISERIAL
    ---------------------------------------------------------------------
       1  0001        1500.0   1158.0    0.772       0.72      0.535      0.742
       2  0002        1500.0    991.0    0.661       0.39      0.421      0.545
       3  0003        1500.0   1354.0    0.903       1.31      0.290      0.500
       4  0004        1500.0   1187.0    0.791       0.78      0.518      0.733
       5  0005        1500.0    970.0    0.647       0.36      0.566      0.728
       6  0006        1500.0   1203.0    0.802       0.82      0.362      0.519
       7  0007        1500.0    875.0    0.583       0.20      0.533      0.674
       8  0008        1500.0    810.0    0.540       0.09      0.473      0.594
       9  0009        1500.0   1022.0    0.681       0.45      0.415      0.542
      10  0010        1500.0    869.0    0.579       0.19      0.426      0.538
    ---------------------------------------------------------------------

• Can indicate problems in parameter estimation
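As a worked check on the phase one table, the LOGIT/1.7 column appears consistent with the log-odds of the proportion right divided by 1.7. That reading is inferred from the printed values rather than taken from BILOG documentation, and the function name is hypothetical.

```python
import math


def logit_over_1_7(number_right, number_tried):
    """Log-odds of the proportion right, divided by the 1.7 scaling constant."""
    p = number_right / number_tried
    return math.log(p / (1.0 - p)) / 1.7


# (item, NUMBER RIGHT, NUMBER TRIED) copied from three rows of the table.
rows = [(1, 1158, 1500), (3, 1354, 1500), (8, 810, 1500)]
for item, right, tried in rows:
    print(item, round(right / tried, 3), round(logit_over_1_7(right, tried), 2))
```

Items with a proportion right close to 0 or 1 give extreme logits, which is one way this table can flag items that may be hard to calibrate.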

Phase two output file (*.PH2)

    CYCLE 12: LARGEST CHANGE = 0.00116
    -2 LOG LIKELIHOOD = 15181.4541
    CYCLE 13: LARGEST CHANGE = 0.00071
    [FULL NEWTON STEP]
    -2 LOG LIKELIHOOD = 15181.2347
    CYCLE 14: LARGEST CHANGE = 0.00066

• Check for convergence

Phase three output file (*.PH3) • Theta estimation • Scoring of individual respondents • Required for DTF analyses

Parameter file (specified, *.PAR)

    AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
    >COMMENT
    1 10 10
    0001AGR 111     1.130784    1.533393   -0.737439    0.652148    0.147203
                    0.101834    0.185726    0.135455    0.078989    0.053688
    0002AGR 211     0.360630    0.870309   -0.414371    1.149018    0.132796
                    0.087236    0.097709    0.098866    0.129000    0.054461
    0003AGR 311     1.474175    0.743095   -1.983831    1.345723    0.197127
                    0.108974    0.084487    0.250499    0.153003    0.087578
    0004AGR 411     1.196368    1.256263   -0.952323    0.796012    0.090901
                    0.087856    0.114710    0.123613    0.072684    0.042937
    0005AGR 511     0.544388    1.403904   -0.387767    0.712300    0.056774
                    0.071490    0.133486    0.080438    0.067727    0.026086
    0006AGR 611     0.892399    0.777440   -1.147869    1.286273    0.173882
                    0.093109    0.082096    0.152846    0.135828    0.075829
    0007AGR 711     0.174395    1.369223   -0.127368    0.730341    0.088135
                    0.083777    0.159712    0.085084    0.085190    0.032376

• Columns marked on the slide: “a” (second numeric column), “b” (third), “c” (fifth) • Format statement: (32X,2F12.6,12X,F12.6)

PARTO3PL output (*.3PL)

    0001AGR 111     1.130784    1.533393   -0.737439    0.652148    0.147203
    0002AGR 211     0.360630    0.870309   -0.414371    1.149018    0.132796
    0003AGR 311     1.474175    0.743095   -1.983831    1.345723    0.197127
    0004AGR 411     1.196368    1.256263   -0.952323    0.796012    0.090901
    0005AGR 511     0.544388    1.403904   -0.387767    0.712300    0.056774
    0006AGR 611     0.892399    0.777440   -1.147869    1.286273    0.173882
    0007AGR 711     0.174395    1.369223   -0.127368    0.730341    0.088135
    0008AGR 811     0.042231    0.979045   -0.043135    1.021403    0.056546
    0009AGR 911     0.441586    0.839144   -0.526234    1.191691    0.129646
    0010AGR 1011    0.104452    0.879683   -0.118738    1.136773    0.101087

• Columns marked on the slide: a (second numeric column), b (third), c (fifth)
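The format statement (32X,2F12.6,12X,F12.6) shown with the parameter file can be read as: skip 32 columns, read two 12-column numbers (a and b), skip 12 columns, read one more 12-column number (c). A minimal Python sketch of that reading follows. The exact column layout of the real file is an assumption: the demo line is built here with a 20-character label followed by five 12-column fields, and the function name is hypothetical.

```python
def read_abc(line):
    """Apply (32X,2F12.6,12X,F12.6) to one fixed-width line and return (a, b, c)."""
    a = float(line[32:44])  # first F12.6 after skipping 32 columns
    b = float(line[44:56])  # second F12.6
    c = float(line[68:80])  # F12.6 after skipping 12 more columns
    return a, b, c


# Demo line in the assumed layout, using the item 1 values from the deck.
label = "0001AGR".ljust(20)
fields = [1.130784, 1.533393, -0.737439, 0.652148, 0.147203]
line = label + "".join(f"{value:12.6f}" for value in fields)

print(read_abc(line))
```

Slicing by column position rather than splitting on whitespace is what a FORTRAN-style format does, and it stays correct when adjacent fields run together.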

Scoring and covariance files • Like the *.PAR file, specifically requested • *.COV - Provides parameters as well as the variances/covariances between the parameters • Necessary for DIF analyses • *.SCO - Provides ability score information for each respondent

Samejima's Graded Response model • Used when options are ordered along a continuum, as with Likert scales • v = response to the polytomously scored item i • k = particular option • a = discrimination parameter • b = extremity parameter
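The model equation from the slide did not survive extraction. In the usual statement of the graded response model, the probability of responding in option k or higher follows a two-parameter logistic curve with threshold b_k, and the probability of option k is the difference between adjacent curves. The sketch below follows that form. It uses the logistic metric with no separate 1.7 constant, on the assumption that a already includes it as the MULTILOG output later in the deck notes. The function name is hypothetical, and the parameter values are the item 1 estimates from that output.

```python
import math


def grm_probabilities(theta, a, thresholds):
    """Option probabilities for one item under the graded response model."""
    # Boundary curves: P(response in option k or higher), bracketed by 1 and 0.
    boundaries = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    # Each option's probability is the drop between adjacent boundary curves.
    return [boundaries[k] - boundaries[k + 1] for k in range(len(boundaries) - 1)]


# Item 1 estimates from the MULTILOG output shown later in the deck.
a = 1.99
b = [-3.03, -2.35, -0.98, 2.01]

probs = grm_probabilities(0.0, a, b)
print([round(p, 3) for p in probs])
```

Four thresholds define five options, and the thresholds must be in increasing order for every option probability to be positive.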

Sample SGR Plot • [Plot] Low discrimination (a=0.4) • Curves labeled “Low option” and “High option”

Sample SGR Plot • [Plot] Better discrimination (a=2)

Running MULTILOG • MULTILOG for DOS • Example with DOS batch file • INFORLOG with MULTILOG • INFORLOG is typically interactive • Process automated with batch file and an input file (described on-line) • *.IN1 (parameter estimation) • *.IN2 (scoring)

The first input file (*.IN1)

    CALIBRATION OF AGREEABLENESS GRADED RESPONSE MODEL
    >PRO IN RA NI=10 NE=1500 NCHAR=4 NG=1;
    >TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
    >EST NC=50;
    >SAVE;
    >END;
    5
    01234
    1111111111
    2222222222
    3333333333
    4444444444
    5555555555
    (4A1,10A1)

• First line: title line

The first input file (*.IN1): the >PRO command

    >PRO IN RA NI=10 NE=1500 NCHAR=4 NG=1;

• NI: number of items • NE: number of examinees • NCHAR: characters in the ID field • NG=1: single group

The first input file (*.IN1): the >TEST command

    >TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);

• GR: SGR model • NC: number of options for each item

The first input file (*.IN1): the >EST and >END commands

    >EST NC=50;
    >SAVE;
    >END;

• NC=50: number of cycles for estimation • >END: end of command syntax

The first input file (*.IN1): the response codes

    5
    01234

• Five characters denoting five options

The first input file (*.IN1): the recoding lines

    1111111111
    2222222222
    3333333333
    4444444444
    5555555555

• Recoding of options for MULTILOG

The second input file (*.IN2)

    SCORING AGREEABLENESS SCALE SGR MODEL
    >PRO SCORE IN RA NI=10 NE=1500 NCHAR=4 NG=1;
    >TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
    >START;
    Y
    >SAVE;
    >END;
    5
    12345
    1111111111
    2222222222
    3333333333
    4444444444
    5555555555
    (4A1,10A1)

• SCORE: scoring • Y: “yes” to INFORLOG (parameters in a separate file)

Running MULTILOG • Run the batch file • *.IN1 → *.LS1 (*.lis file renamed as *.ls1) • Ensure that the data were read in and the model specified correctly • Also provides a report of the estimation procedure with the estimated item parameters • Things of note…

    0ITEM 1: 5 GRADED CATEGORIES
            P(#)  ESTIMATE  (S.E.)
    A         1      1.99   (0.12)
    B( 1)     2     -3.03   (0.18)
    B( 2)     3     -2.35   (0.11)
    B( 3)     4     -0.98   (0.06)
    B( 4)     5      2.01   (0.10)
    0 @THETA:    -2.0  -1.5  -1.0  -0.5   0.0   0.5   1.0   1.5   2.0
      I(THETA):  1.08  1.04  1.05  0.81  0.49  0.35  0.47  0.79  0.99
    0 OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN
      CATEGORY(K):     1     2     3     4     5
      OBS. FREQ.      21    44   277  1050   108
      OBS. PROP.    0.01  0.03  0.18  0.70  0.07
      EXP. PROP.    0.01  0.03  0.19  0.70  0.07

• A: “a” includes a 1.7 scaling factor • OBS. FREQ.: frequencies for each option • Collapsing options
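Two of the callouts on this output can be checked with a few lines of Python. The sketch recomputes the observed proportions from the option frequencies, flags sparse options as candidates for collapsing, and divides a by 1.7 to remove the scaling factor. The 5% cutoff is a hypothetical rule of thumb, not something stated in the deck, and reading “Collapsing options” as a response to sparse options is likewise an assumption.

```python
observed = [21, 44, 277, 1050, 108]  # OBS. FREQ. row for item 1
total = sum(observed)
proportions = [count / total for count in observed]

SPARSE_CUTOFF = 0.05  # hypothetical threshold, chosen only for illustration
sparse = [k for k, p in enumerate(proportions, start=1) if p < SPARSE_CUTOFF]

a_logistic = 1.99              # "A" estimate, which includes the 1.7 factor
a_normal = a_logistic / 1.7    # same discrimination without the factor

print([round(p, 2) for p in proportions])
print("sparse options:", sparse)
print(round(a_normal, 2))
```

The recomputed proportions match the OBS. PROP. row, and options 1 and 2 fall below the illustrative cutoff.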

Scoring output • *.IN2 → *.LS2 • Last portion of the file contains the person parameters (estimated theta, standard error, the number of iterations used, and the respondent's ID number).

What now? • Review • Data requirements for IRT • Two models: 3PL (dichotomous), SGR (polytomous), more on-line! • MODFIT • Can plot IRFs, ORFs • Model-data fit: Input parameters, validation sample
