Assessing the Fit of IRT Models in Language Testing

Muhammad Naveed Khalid

Ardeshir Geranpayeh


Outline

  • Item Response Theory (IRT)

  • Importance of Model Fit within IRT

  • Fit Procedures

    • Issues and Limitations

    • Lagrange Multiplier (LM) Test

  • An empirical study using LM Fit statistics

    • Sharing Results

  • Conclusions


Item Response Theory (IRT)

  • A family of mathematical models that provide a common framework for describing people and items

  • Examinee performance can be predicted in terms of the underlying trait

  • Provides a means for estimating abilities of people and characteristics of items


IRT Models

  • Dichotomous or Discrete

    • 1 Parameter Logistic Model / Rasch (1PL)

    • 2 Parameter Logistic Model (2PL)

    • 3 Parameter Logistic Model (3PL)

  • Polytomous or Scalar

    • Partial Credit Model (PCM)

    • Generalized Partial Credit Model (GPCM)

    • Graded Response Model (GRM)
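
As a rough illustration of the dichotomous models listed above, the sketch below writes the 3PL item response function in its usual logistic form and treats the 2PL and 1PL as special cases. The function names and the parameter names `a` (discrimination), `b` (difficulty) and `c` (lower asymptote) are assumptions made for this example; they do not come from the slides.

```python
import math

def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing). All names are illustrative.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def irf_2pl(theta, a, b):
    # 2PL: the 3PL with the lower asymptote fixed at 0.
    return irf_3pl(theta, a=a, b=b, c=0.0)

def irf_1pl(theta, b):
    # 1PL / Rasch: additionally fixes the discrimination at 1.
    return irf_3pl(theta, a=1.0, b=b, c=0.0)

# An examinee whose ability equals the item difficulty has a 50% chance
# of success under the 1PL.
print(irf_1pl(theta=0.0, b=0.0))
```

Writing the simpler models as restrictions of the 3PL makes the nesting between the three models explicit.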


Shape of Item Response Function


Model for Item with 5 response categories

[Figure: Probability by Response Category]
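
To make the five-category case concrete, here is a minimal sketch of category probabilities under the Generalized Partial Credit Model, one of the polytomous models named earlier. The function name `gpcm_probs`, the step parameters and the ability value are hypothetical choices for illustration, not values from the presentation.

```python
import math

def gpcm_probs(theta, a, steps):
    """Category probabilities under the GPCM for categories 0..len(steps).

    theta: examinee ability; a: item discrimination;
    steps: step (threshold) parameters, one per category boundary.
    """
    # Cumulative sums of a * (theta - b_v); the sum for category 0 is 0.
    cum = [0.0]
    for b in steps:
        cum.append(cum[-1] + a * (theta - b))
    top = max(cum)  # subtract the maximum for numerical stability
    expo = [math.exp(z - top) for z in cum]
    total = sum(expo)
    return [e / total for e in expo]

# Four step parameters give an item with five response categories.
probs = gpcm_probs(theta=0.0, a=1.0, steps=[-1.5, -0.5, 0.5, 1.5])
print(probs)
```

Setting `a=1.0` for every item reduces this to the Partial Credit Model.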


IRT Applications

In language testing, IRT is mainly used for:

  • Test development

  • Item banking

  • Differential item functioning (DIF)

  • Computerized adaptive testing (CAT)

  • Test equating, linking and scaling

  • Standard setting

The utility of an IRT model depends on how accurately the model reflects the data.


Model Fit from Item Perspective

Measurement Invariance (MI): Item responses can be described by the same parameters in all sub-populations.

Item Characteristic Curve (ICC): Describes the relation between the latent variable and the observable responses to items.

Local Independence (LI): Responses to different items are independent given the value of the latent trait variable.

Unidimensionality

Speededness

Global


Consequences of Misfit

Yen (2000) and Wainer & Thissen (2003) have shown the consequences of inadequate model-data fit.

Some of the adverse consequences are:

  • Biased ability estimates

  • Unfair ranks

  • Wrongly equated scores

  • Student misclassifications

  • Reduced score precision

  • Threats to validity


Existing Item Fit Procedures

Chi-Square Statistics

Tests of the discrepancy between the observed and expected frequencies.

Pearson-Type Item-Fit Indices (Yen, 1984; Bock, 1972).

Likelihood Ratio Based Item-Fit Indices (McKinley & Mills, 1985).
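
The sketch below shows the general shape of a Pearson-type item-fit index for one dichotomous item: examinees are sorted by ability estimate, split into groups, and the observed proportion correct in each group is compared with the model-implied proportion. It is a simplified illustration in the spirit of the indices cited above, not a reproduction of any published statistic; the function name, the choice of ten groups and the simulated data are all assumptions.

```python
import math
import random

def pearson_item_fit(thetas, responses, prob_fn, n_groups=10):
    """Pearson-type item-fit statistic for one dichotomous item.

    thetas: ability estimates; responses: 0/1 item scores;
    prob_fn: model-implied probability of a correct response given theta.
    """
    order = sorted(range(len(thetas)), key=lambda i: thetas[i])
    n_total = len(order)
    stat = 0.0
    for g in range(n_groups):
        # Integer arithmetic keeps the groups contiguous and near-equal in size.
        idx = order[g * n_total // n_groups:(g + 1) * n_total // n_groups]
        if not idx:
            continue
        n = len(idx)
        observed = sum(responses[i] for i in idx) / n
        expected = sum(prob_fn(thetas[i]) for i in idx) / n
        stat += n * (observed - expected) ** 2 / (expected * (1.0 - expected))
    return stat

# Illustrative data simulated from a 1PL item with difficulty 0.3.
# True abilities stand in for ability estimates here.
random.seed(0)
prob = lambda t: 1.0 / (1.0 + math.exp(-(t - 0.3)))
thetas = [random.gauss(0.0, 1.0) for _ in range(2000)]
responses = [1 if random.random() < prob(t) else 0 for t in thetas]
print(pearson_item_fit(thetas, responses, prob))
```

Larger values indicate a bigger gap between observed and expected proportions; how to refer such a value to a reference distribution is a separate question that this sketch does not settle.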


Issues in Existing Fit Procedures

  • The standard theory for chi-square statistics does not hold.

  • They fail to take into account the stochastic nature of the item parameter estimates.

  • Subgroups for the test are formed from model-dependent trait estimates.

  • The appropriate number of degrees of freedom is unclear.

  • The statistics are sensitive to test length and sample size.


Lagrange Multiplier (LM) Test

Glas (1999) proposed the LM test for evaluating model fit.

LM tests evaluate a restricted model against a more general alternative.

Consider a null hypothesis about a model with a given set of parameters.

This model is a special case of a more general model with additional parameters.
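
For orientation, the general form of an LM (score) statistic can be sketched as follows. The notation is an assumption made for this illustration and is not taken from the slides: the parameters of the general model are partitioned as $\eta = (\eta_1, \eta_2)$, and the restricted model fixes $\eta_2 = 0$.

```latex
H_0:\ \eta_2 = 0
\qquad
\mathrm{LM} = h_2(\hat{\eta}_1, 0)^{\top}\, W^{-1}\, h_2(\hat{\eta}_1, 0),
\qquad
W = I_{22} - I_{21}\, I_{11}^{-1}\, I_{12}
```

Here $h_2$ is the vector of first-order derivatives of the log-likelihood with respect to $\eta_2$, evaluated at the estimates $\hat{\eta}_1$ obtained under the restricted model, and $I_{jk}$ are the corresponding blocks of the information matrix. Under the null hypothesis the statistic is asymptotically chi-square distributed, with degrees of freedom equal to the number of parameters in $\eta_2$. Only the restricted model needs to be estimated.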


LM Item Fit Statistics

Each statistic tests a null model against an alternative model:

  • MI / DIF

  • LI

  • ICC


Empirical Example

  • Data from Cambridge English First (FCE)

    • Reading 3 parts/30 questions

    • Listening 4 parts/30 questions

  • Sample size: over 35,000

  • The approach can be applied to any other language exam


Conclusions

  • LM statistics overcome the issues with existing fit procedures

  • Less computationally intensive

  • The size of the residuals, reported as Abs.Dif, is highly informative

  • The IRT model fits the FCE data reasonably well

  • Items showing violations: MI (4); ICC (3); LI (7)

  • Magnitude of violation is not severe


Thank you! Questions?

