
Presentation Transcript


  1. A Different Way to Think About Measurement Development: An Introduction to Item Response Theory (IRT) Joseph Olsen, Dean Busby, & Lena Chiu. Jan 23, 2015

  2. Content • Introduction • Item Response Models and Outcomes • Software Packages • Demonstration • Additional Concepts • References and Resources

  3. Introduction • IRT surfaced in the 1970s (originally called “latent trait models”) • It became popular in the 1980s and was adopted for ability tests such as the SAT and GRE • Social scientists have started using IRT in the past decade.

  4. Classical Test Theory (CTT) versus IRT • Generally speaking, CTT is used with continuous variables, while IRT is used with categorical (dichotomous/polytomous) variables. • In personality and attitude assessment we are more likely to use CTT, but IRT provides advantages, including item characteristic curves and item information curves. • IRT provides more precise and accurate measurement of the latent trait: a well-built IRT model is more precise than CTT. (But you can mess up IRT just like you can mess up CTT.) • IRT aims for reliable measurement across the whole trait continuum, from the lowest to the highest levels; that is usually not a consideration in CTT analyses.

  5. IRT Models and Outcomes • Item Difficulty: How difficult the item is. In social science studies this is sometimes called “item endorsability” (some items are more readily endorsed than others). • Item Discrimination: How strongly the item response is related to the underlying latent trait, or how well the item discriminates among participants located at different points on the latent continuum. • Pseudo-chance parameter: The probability of choosing a correct answer by chance.

  6. IRT Models and Outcomes • 3 Parameter Logistic (3PL) model: A model that estimates all three parameters. Not usually used in social science scales. • 2 Parameter Logistic (2PL) model: A model that estimates item difficulty and item discrimination (with the pseudo-chance parameter constrained to 0). • 1 Parameter Logistic (1PL) model: A model that estimates only item difficulty (holding item discrimination constant across all items, with the pseudo-chance parameter constrained to 0). We compare model fit indices to decide which model is most appropriate; the formulas below show how the three models are nested.
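  To make the nesting explicit, these are the standard logistic response functions (standard IRT notation, not shown on the slide): theta_i is person i's latent trait level, and a_j, b_j, and c_j are the discrimination, difficulty, and pseudo-chance parameters of item j.

    % 3PL: all three item parameters are estimated
    P(y_{ij}=1 \mid \theta_i) = c_j + (1 - c_j)\,\frac{1}{1 + \exp[-a_j(\theta_i - b_j)]}
    % 2PL: pseudo-chance constrained to zero (c_j = 0)
    P(y_{ij}=1 \mid \theta_i) = \frac{1}{1 + \exp[-a_j(\theta_i - b_j)]}
    % 1PL: additionally, one common discrimination a for all items (a_j = a)
    P(y_{ij}=1 \mid \theta_i) = \frac{1}{1 + \exp[-a(\theta_i - b_j)]}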

  7. Example for Deciding between 2PL and 1PL Models (a difference-test sketch follows below)
  • Syntax for Mplus 1PL model:
    Avoidance BY T1v755* T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762 (1);
    Avoidance@1;
  • Syntax for Mplus 2PL model:
    Avoidance BY T1v755* T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
    Avoidance@1;
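  Because the 1PL is nested in the 2PL (the (1) label above equates all eight loadings), the fit indices can be supplemented with a likelihood-ratio chi-square difference test; a sketch under the deck's ML estimator, where LL denotes each model's obtained loglikelihood:

    \Delta\chi^2 = 2\,(LL_{2PL} - LL_{1PL}), \qquad df = 8 - 1 = 7

  A significant chi-square difference favors the less constrained 2PL; otherwise the more parsimonious 1PL is retained.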

  8. Software Packages - MPLUS • Mplus can estimate basic IRT models. • See the demonstrations later in this presentation. • For more complex models, dedicated IRT software is required.

  9. Software Packages - FlexMirt • https://flexmirt.vpgcentral.com/

  10. Software Packages – IRT Pro • http://www.ssicentral.com/irt/

  11. Demonstration: Graded Response Item Response Theory (IRT) Model for Avoidant Attachment Items

  12. Sample and Measures • The Avoidant Attachment Scale in RELATE. • 6,089 individuals who took READY and answered the Avoidant Attachment Scale questions.

  13. Eight Items Measuring Avoidant Attachment
  Items:
  755. I find it relatively easy to get close to others.
  756. I’m not very comfortable having to depend on other people.
  757. I’m comfortable having others depend on me.
  758. I don’t like people getting too close to me.
  759. I’m somewhat uncomfortable being too close to others.
  760. I find it difficult to trust others completely.
  761. I’m nervous whenever anyone gets too close to me.
  762. Others often want me to be more intimate than I feel comfortable being.
  Reverse-Coded Items: 756, 758, 759, 760, 761, 762 (a reverse-coding sketch follows below).
  Original Response Categories: 1 = Strongly Disagree; 2 = Disagree; 3 = Somewhat Disagree; 4 = Undecided; 5 = Somewhat Agree; 6 = Agree; 7 = Strongly Agree
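  A minimal sketch of how the reverse coding could be done in Mplus with the DEFINE command, assuming the items are recoded before analysis as 8 minus the original 7-point response (the slides do not say whether RELATE delivers the items already reversed):

    DEFINE:
      T1v756 = 8 - T1v756;   ! 1 becomes 7, 2 becomes 6, and so on
      T1v758 = 8 - T1v758;
      T1v759 = 8 - T1v759;
      T1v760 = 8 - T1v760;
      T1v761 = 8 - T1v761;
      T1v762 = 8 - T1v762;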

  14. Mplus Commands for Single-Factor EFA with Categorical Items
  Title: Single-Factor Exploratory Factor Analysis (EFA)
  Data: File is READY_attachment scale.dat;
  Variable: Names are Gender Culture T1vAge T1v755 T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762 T1v763 T1v764 T1v765 T1v766 T1v767 T1v768 T1v769 T1v770 T1v771;
    MISSING ARE ALL (-9999);
    CATEGORICAL ARE ALL;
    USEVARIABLES ARE T1v755 T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
  ANALYSIS: ESTIMATOR IS ML;
    TYPE IS EFA 1 1;
  PLOT: TYPE IS PLOT2;

  15. Establishing Construct Unidimensionality: Scree Plot for 8 Avoidant Attachment Items • Categorical Exploratory Factor Analysis • Eigenvalues: 4.085, .897, .796, .656, .544, .470, .306, .246 (a quick ratio check follows below)
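  A rule-of-thumb reading of these eigenvalues (a common heuristic, not from the slides): the first factor dominates the second by roughly a factor of 4.6,

    \frac{\lambda_1}{\lambda_2} = \frac{4.085}{0.897} \approx 4.55,

  and ratios above about 3, together with the sharp elbow after the first eigenvalue, are commonly taken as support for treating the construct as unidimensional.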

  16. Mplus Commands for Single-Factor CFA with Categorical Items
  Title: Single-Factor Categorical Confirmatory Factor Analysis (CFA) - Equivalent to a 2-Parameter Logistic (2PL) Graded Response Model
  Data: File is READY_attachment scale.dat;
  Variable: Names are Gender Culture T1vAge T1v755 T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762 T1v763 T1v764 T1v765 T1v766 T1v767 T1v768 T1v769 T1v770 T1v771;
    MISSING ARE ALL (-9999);
    CATEGORICAL ARE ALL;
    USEVARIABLES ARE T1v755 T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
  ANALYSIS: ESTIMATOR IS ML;
  MODEL: avoid BY T1v755* T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
    avoid@1;
  PLOT: TYPE IS PLOT2;
  OUTPUT: STDYX;

  17. EFA and Standardized CFA Factor Loadings with Maximum Likelihood (Logistic) Estimation for Categorical Items

  EFA    CFA    Item
  .642   .646   755. I find it relatively easy to get close to others.
  .489   .495   756. I’m not very comfortable having to depend on other people.
  .402   .405   757. I’m comfortable having others depend on me.
  .850   .856   758. I don’t like people getting too close to me.
  .834   .844   759. I’m somewhat uncomfortable being too close to others.
  .625   .630   760. I find it difficult to trust others completely.
  .837   .848   761. I’m nervous whenever anyone gets too close to me.
  .576   .582   762. Others often want me to be more intimate than I feel comfortable being.

  18. Item Characteristic Curves: Category Usage for Items 755 and 756 • 755. I find it relatively easy to get close to others. • 756. I’m not very comfortable having to depend on other people. (reversed)

  19. Item Characteristic Curves: Category Usage for Items 757 and 758 • 757. I’m comfortable having others depend on me. • 758. I don’t like people getting too close to me. (reversed)

  20. Item Characteristic Curves: Category Usage for Items 759 and 760 • 759. I’m somewhat uncomfortable being too close to others. (reversed) • 760. I find it difficult to trust others completely. (reversed)

  21. Item Characteristic Curves: Category Usage for Items 761 and 762 • 761. I’m nervous whenever anyone gets too close to me. (reversed) • 762. Others often want me to be more intimate than I feel comfortable being. (reversed)

  22. Item and Test Information Curves for the Avoidant Attachment Items (Items 755-762) Item Information Curves Test (Total) Information Curve

  23. Partial Total Information Curves for Two Sets of Items Items 755, 756, 757, 760, and 762 Items 758, 759, 761

  24. Quick Comparison between CTT and IRT Models and Output

  25. CTT Reliability Test (Cronbach’s Alpha=0.833)
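  For reference (standard formula, not shown on the slide), Cronbach's alpha for a k-item scale is

    \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_{Y_i}}{\sigma^2_{X}}\right),

  where sigma^2_{Y_i} is the variance of item i and sigma^2_X is the variance of the total score; here k = 8, and the reported alpha of 0.833 indicates good internal consistency by conventional standards.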

  26. CTT Confirmatory Factor Analysis
  MODEL FIT INFORMATION
    Akaike (AIC)                  163581.316
    Bayesian (BIC)                163742.454
    RMSEA (Root Mean Square Error of Approximation)
      Estimate                    0.087
      Probability RMSEA <= .05    0.000
    CFI                           0.944
    TLI                           0.921

  STDYX Standardization                           Two-Tailed
                Estimate    S.E.     Est./S.E.    P-Value
  ATTACHME BY
    T1V755      0.577       0.010    60.529       0.000
    T1V756      0.473       0.011    43.298       0.000
    T1V757      0.337       0.012    27.213       0.000
    T1V758      0.808       0.006    139.126      0.000
    T1V759      0.791       0.006    129.527      0.000
    T1V760      0.598       0.009    64.402       0.000
    T1V761      0.802       0.006    135.732      0.000
    T1V762      0.538       0.010    53.443       0.000

  27. Items 755, 756, 757, 760, and 762:
  755. I find it relatively easy to get close to others.
  756. I’m not very comfortable having to depend on other people.
  757. I’m comfortable having others depend on me.
  760. I find it difficult to trust others completely.
  762. Others often want me to be more intimate than I feel comfortable being.
  Items 758, 759, and 761:
  758. I don’t like people getting too close to me.
  759. I’m somewhat uncomfortable being too close to others.
  761. I’m nervous whenever anyone gets too close to me.

  28. Additional Information:More Concept Introductions

  29. Item Response Theory Models for Dichotomous and Polytomous items Introduction to the Graded Response Model

  30. The Rasch Model
  • Threshold parameterization (IRTPRO, FlexMirt): $z_{ij} = \theta_i - b_j$, where
    - $\theta$ is the latent trait
    - $b_j$ is the estimated difficulty of item $j$
    - $\sigma^2_\theta$ is the estimated variance of the latent trait
    - The expected value (mean) of the latent trait is 0
  • Intercept parameterization (Mplus): $z_{ij} = \theta_i - \tau_j$, where the threshold $\tau_j$ plays the role of $b_j$ when all loadings are fixed at 1
  • Probability model: $P(y_{ij} = 1 \mid \theta_i) = \dfrac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}$

  31. Logistic Item Characteristic Curves for Five Equally Discriminating Items Differing only in Difficulty

  32. The One Parameter Logistic (1PL) Model
  • Threshold parameterization: $z_{ij} = a(\theta_i - b_j)$, where
    - $\theta$ is the latent trait
    - $b_j$ is the difficulty of item $j$
    - $a$ is a common discrimination parameter for all items
    - The estimated variance of the latent trait is fixed at 1
    - The expected value of the latent trait is 0
  • Intercept parameterization: $z_{ij} = c_j + a\theta_i$, with $c_j = -a\,b_j$

  33. The Two Parameter Logistic (2PL) Model
  • Threshold parameterization: $z_{ij} = a_j(\theta_i - b_j)$, where
    - $a_j$ is an item-specific discrimination parameter
    - The estimated variance of the latent trait is fixed at 1
  • Intercept parameterization: $z_{ij} = c_j + a_j\theta_i$, with $c_j = -a_j b_j$ / $b_j = -c_j / a_j$
  • Probability model: $P(y_{ij} = 1 \mid \theta_i) = \dfrac{1}{1 + \exp[-a_j(\theta_i - b_j)]}$

  34. Item Characteristic Curves for Five Equally Difficult Items Differing only in their Discrimination Parameters

  35. Parameter Constraints for Selected Dichotomous Item IRT Models
  [Path diagram: a single latent trait measured by item1, item2, item3, with columns for Latent Trait Variance, Intercepts/Thresholds, and Loadings]

  Model                                  Loadings                     Latent Trait Variance
  • Rasch:                               All fixed at 1               Estimated
  • One Parameter Logistic (1PL):        One common loading (est.)    Fixed at 1
  • Two Parameter Logistic (2PL):        Freely estimated             Fixed at 1
  • Confirmatory Factor Analysis (CFA):  First fixed at 1             Estimated
  • Effect-coded 2PL:                    Average slope is 1           Estimated

  Intercepts/thresholds are freely estimated in each model; in the effect-coded 2PL, all other parameters are freely estimated.

  36. Logits for Dichotomous and Polytomous (Graded Response) Logistic IRT Models
  • Dichotomous: $\mathrm{logit}[P(y_{ij} = 1)] = a_j(\theta_i - b_j)$
  • Polytomous (Graded Response Model): $\mathrm{logit}[P(y_{ij} \ge k)] = a_j(\theta_i - b_{jk})$
  • Cumulative probability: $P(y_{ij} \ge k \mid \theta_i) = \dfrac{1}{1 + \exp[-a_j(\theta_i - b_{jk})]}$

  37. The Graded Response Model (GRM)
  • Threshold parameterization (IRTPRO, FlexMirt): $z_{ijk} = a_j(\theta_i - b_{jk})$, where
    - $a_j$ is the estimated discrimination parameter for item $j$
    - $b_{jk}$ is the estimated category boundary threshold between categories $k$ and $k-1$ for item $j$
    - The estimated variance of the latent trait ($\sigma^2_\theta$) is fixed at 1
  • Intercept parameterization (Mplus): $z_{ijk} = \lambda_j\theta_i - \tau_{jk}$ / $b_{jk} = \tau_{jk} / \lambda_j$, where
    - $\lambda_j$ is the estimated factor loading for item $j$
    - $\tau_{jk}$ is the estimated (rescaled and sign-reversed) category boundary threshold between categories $k$ and $k-1$ for item $j$
    - The estimated variance of the latent variable ($\psi$) is fixed at 1
  (The category probabilities implied by these cumulative logits are derived below.)
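  A step the slides leave implicit: in the GRM, the probability of responding in exactly category $k$ is the difference between adjacent cumulative probabilities. Writing $P^*_{jk}(\theta) = P(y_{ij} \ge k \mid \theta)$ for an item with $K$ ordered categories:

    P(y_{ij} = k \mid \theta) = P^*_{jk}(\theta) - P^*_{j,k+1}(\theta), \qquad P^*_{j1}(\theta) = 1, \quad P^*_{j,K+1}(\theta) = 0.

  These category probabilities are what the category usage curves on slides 18-21 display.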

  38. The Graded Response Model with a Common Discrimination Parameter
  • Threshold parameterization: $z_{ijk} = a(\theta_i - b_{jk})$, where
    - $a$ is a common discrimination parameter for all items
    - The estimated variance of the latent trait is fixed at 1
  • Intercept parameterization: $z_{ijk} = \lambda\theta_i - \tau_{jk}$, with a common loading $\lambda$ for all items

  39. The Graded Response “Rasch” Model
  • Graded Response Rasch threshold parameterization: $z_{ijk} = \theta_i - b_{jk}$, where
    - $\sigma^2_\theta$ is the estimated variance of the latent trait (all discriminations fixed at 1)
  • Graded Response Rasch intercept parameterization: $z_{ijk} = \theta_i - \tau_{jk}$, with all loadings fixed at 1 and the latent trait variance estimated

  40. Model Constraints for the Graded Response Model
  [Path diagram: a latent trait measured by item1, item2, item3, each item with its own set of category thresholds]

  Model                                                  Loadings               Latent Trait Variance
  • Graded Response Rasch Model:                         All fixed at 1         Estimated
  • GRM with a Common Discrimination Parameter:          One common loading     Fixed at 1
  • Traditional Graded Response Model:                   Freely estimated       Fixed at 1
  • CFA Graded Response Model:                           First fixed at 1       Estimated
  • Effect-coded Graded Response Model:                  Average fixed at 1     Estimated

  (The Mplus syntax for each of these constraint patterns appears on the next slide.)

  41. Estimating the Graded Response Model as Constrained Item Factor Analysis Models with Mplus

  • Graded Response Rasch model:
    MODEL: f BY item1-item7@1;        ! Fix all of the loadings at 1
           f*;                        ! Estimate the latent trait variance

  • Graded Response Model with a Common Discrimination Parameter:
    MODEL: f BY item1-item7* (a);     ! Estimate a common factor loading
           f@1;                       ! Fix the latent trait variance at 1

  • Traditional Graded Response Model:
    MODEL: f BY item1-item7*;         ! Freely estimate all of the loadings
           f@1;                       ! Fix the latent trait variance at 1

  • Graded Response Model with a reference indicator (Mplus default):
    MODEL: f BY item1-item7;          ! Fix the loading for the first item at 1
           f*;                        ! Estimate the latent trait variance

  • Effect-coded Graded Response Model:
    MODEL: f BY item1-item7* (a1-a7); ! Estimate and label the loadings
           f*;                        ! Estimate the latent trait variance
    MODEL CONSTRAINT:                 ! Constrain the loadings to average 1
      a1 = 7 - a2 - a3 - a4 - a5 - a6 - a7;
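  For context, a minimal sketch of a complete input file that one of these MODEL fragments would slot into, patterned after the CFA input on slide 16 (the data file and item names are placeholders, not from the slides):

    Title: Traditional Graded Response Model (2PL GRM)
    Data: File is mydata.dat;          ! placeholder file name
    Variable: Names are item1-item7;
      MISSING ARE ALL (-9999);
      CATEGORICAL ARE ALL;
    ANALYSIS: ESTIMATOR IS ML;         ! maximum likelihood (logistic) estimation
    MODEL: f BY item1-item7*;          ! freely estimate all of the loadings
      f@1;                             ! fix the latent trait variance at 1
    PLOT: TYPE IS PLOT2;               ! request item characteristic/information plots
    OUTPUT: STDYX;                     ! standardized loadings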

  42. Graded Response and Generalized Partial Credit Logistic IRT Models for Polytomous Data

  43. References and Resources • de Ayala, R. J. (2009). The theory and practice of item response theory. New York: Guilford Press.

  44. References and Resources: Joint Committee on Standards for Educational and Psychological Testing of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association. http://www.apa.org/science/programs/testing/standards.aspx

  45. References and Resources:
