
A Practitioner’s Introduction to Equating



Presentation Transcript


  1. A Practitioner’s Introduction to Equating with Primers on Classical Test Theory (CTT) and Item Response Theory (IRT) Joseph Ryan, Arizona State University Frank Brockmann, Center Point Assessment Solutions Workshop: Assessment, Research and Evaluation Colloquium Neag School of Education, University of Connecticut October 22, 2010

  2. Acknowledgments • Council of Chief State School Officers (CCSSO) • Technical Issues in Large Scale Assessment (TILSA) and its Subcommittee on Equating, part of the State Collaborative on Assessment and Student Standards (SCASS) • Doug Rindone and Duncan MacQuarrie, CCSSO TILSA Co-Advisers • Phoebe Winter, Consultant • Michael Muenks, TILSA Equating Subcommittee Chair • Technical Special Interest Group of National Assessment of Educational Progress (NAEP) coordinators • Hariharan Swaminathan, University of Connecticut • Special thanks to Michael Kolen, University of Iowa

  3. Workshop Topics The workshop covers the following topics: • Overview - Key concepts of assessment, linking, and equating • Measurement Primer – Classical and IRT theories • Equating Basics • The Mechanics of Equating • Equating Issues

  4. 1. Overview Key Concepts in Assessment, Linking, Equating

  5. Assessment, Linking, and Equating Validity is… … an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment. (Messick, 1989, p. 13) Validity is the essential motivation for developing and evaluating appropriate linking and equating procedures.

  6. Assessment, Linking, and Equating

  7. Linking and Equating • Equating • Scale aligning • Predicting/Projecting (Holland, in Dorans, Pommerich, and Holland, 2007)

  8. Misconceptions About Equating Equating is… • …a threat to measuring gains. (MYTH) • …a tool for universal applications. (WISHFUL THOUGHT) • …a repair shop. (MISCONCEPTION) • …a semantic misappropriation. (MISUNDERSTANDING)

  9. 2. Measurement Primer Classical Test Theory (CTT) Item Response Theory (IRT)

  10. Classical Test Theory The Basic Model: O = T + E, where O is the observed score, T is the true score, and E is the error score (with some MAJOR assumptions) • Reliability is derived from the ratio of true score variance to observed score variance • Key item features include: • Difficulty • Discrimination • Distractor Analysis
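  In symbols (a standard CTT formulation consistent with the slide, not quoted from it), the model and the resulting reliability coefficient can be written as

      O = T + E, \qquad \rho_{OO'} = \frac{\sigma^2_T}{\sigma^2_O} = 1 - \frac{\sigma^2_E}{\sigma^2_O},

  assuming errors have mean zero and are uncorrelated with true scores.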

  11. Classical Test Theory Reliability reflects the consistency of students' scores • Over time, test retest • Over forms, alternate form • Within forms, internal consistency Validity reflects the degree to which scores assess what the test is designed to measure in terms of • Content • Criterion related measures • Construct
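  The slide names internal consistency as one way to look at reliability. A minimal sketch of one common internal-consistency estimate (coefficient alpha) follows; the function name and the small score matrix are illustrative assumptions, not material from the workshop.

      import numpy as np

      def coefficient_alpha(scores):
          # scores: 2-D array, rows = students, columns = items
          scores = np.asarray(scores, dtype=float)
          n_items = scores.shape[1]
          item_variances = scores.var(axis=0, ddof=1)      # variance of each item
          total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
          return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

      # Illustrative data: 5 students x 4 dichotomously scored items
      example = [[1, 1, 0, 1],
                 [0, 1, 0, 0],
                 [1, 1, 1, 1],
                 [0, 0, 0, 1],
                 [1, 0, 1, 1]]
      print(round(coefficient_alpha(example), 3))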

  12. Item Response Theory (IRT) The Concept • An approach to item and test analysis that estimates students’ probable responses to test questions, based on • the ability of the students • one or more characteristics of the test items

  13. Item Response Theory (IRT) • IRT is now used in most large-scale assessment programs • IRT models apply to items that use • dichotomous scoring, with right (1) or wrong (0) answers, and • polytomous scoring, with ordered score categories (e.g., 1, 2, 3, 4), common with written essays and open-ended constructed-response items • IRT is used in addition to procedures from CTT INFO

  14. Item Response Theory (IRT) IRT Models • All IRT models reflect the ability of students. In addition, the most common basic IRT models include: • The 1-parameter model (aka Rasch model) – models item difficulty • The 2-parameter model – models item difficulty and discrimination • The 3-parameter model – models item difficulty, discrimination, and pseudo-guessing
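  A minimal sketch of the item response functions behind these three models. The parameter names follow common IRT usage (theta for ability, b for difficulty, a for discrimination, c for pseudo-guessing), and the example values are illustrative assumptions.

      import math

      def p_1pl(theta, b):
          # Rasch / 1-parameter: probability of a correct response depends only
          # on the gap between ability (theta) and item difficulty (b)
          return 1.0 / (1.0 + math.exp(-(theta - b)))

      def p_2pl(theta, b, a):
          # 2-parameter: adds item discrimination (a)
          return 1.0 / (1.0 + math.exp(-a * (theta - b)))

      def p_3pl(theta, b, a, c):
          # 3-parameter: adds a pseudo-guessing lower asymptote (c)
          return c + (1.0 - c) * p_2pl(theta, b, a)

      # Illustrative values only
      print(round(p_1pl(0.5, -0.2), 3))
      print(round(p_2pl(0.5, -0.2, 1.2), 3))
      print(round(p_3pl(0.5, -0.2, 1.2, 0.2), 3))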

  15. Item Response Theory (IRT) IRT Assumptions • Item Response Theory requires major assumptions: • Unidimensionality • Item Independence • Data-Model Fit • Fixed but arbitrary scale origin

  16. Item Response Theory (IRT) A Simple Conceptualization [Figure: an ability (theta) scale divided into BASIC, PROFICIENT, and ADVANCED regions, with the values -1.5 and +2.25 marked on the scale]

  17. Item Response Theory (IRT) Probability of a Student Answer
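  The original slide is a graphic. As a purely illustrative calculation (the ability and difficulty values below are assumptions, not numbers from the deck), the 1-parameter (Rasch) model gives the probability of a correct answer as

      P(\text{correct} \mid \theta, b) = \frac{e^{\theta - b}}{1 + e^{\theta - b}},
      \qquad \theta = 1.0,\; b = -0.5 \;\Rightarrow\; P = \frac{e^{1.5}}{1 + e^{1.5}} \approx 0.82.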

  18. Item Response Theory (IRT) Item Characteristic Curve for Item 2

  19. Item Response Theory (IRT)

  20. IRT and Flexibility IRT provides considerable flexibility in terms of • constructing alternate test forms • administering tests well matched or adapted to students’ ability level • building sets of connected tests that span a wide range (perhaps two or more grades) • inserting or embedding new items into existing test forms for field-testing purposes so new items can be placed on the measurement scale INFO

  21. 3. Equating Basics Basic Terms (Sets 1, 2, and 3) Equating Designs (a, b, c) Item Banking (a, b, c, d)

  22. Basic Terms Set 1
  Column A              Column B
  __ Anchor Items       A. Sleepwear
  __ Appended Items     B. Nautically themed apparel
  __ Embedded Items     C. Vestigial organs
                        D. EMIP learning module
  USEFUL TERMS

  23. Basic Terms Set 2 For each term, make some notes on your handout: Pre-equating – Post-equating – USEFUL TERMS

  24. Basic Terms Set 3 For each term, make some notes on your handout: Horizontal Equating – Vertical Equating (Vertical Scaling) – Form-to-Form (Chained) Equating – Item Banking – USEFUL TERMS

  25. Equating Designs Random Equivalent Groups Single Group Anchor Items

  26. Equating Designs a. Random Equivalent Groups

  27. Equating Designs b. Single Group The potential for order effects is significant; equating designs that use this data collection method should always be counterbalanced! CAUTION

  28. Equating Designs b. Single Group with Counterbalance

  29. Equating Designs c. Anchor Item Design (note: anchor items are not always placed at the end of the form)

  30. Equating Designs c. Anchor Item Set

  31. Equating Designs c. Anchor Item Designs • Internal/Embedded • Internal/Appended • External USEFUL TERMS

  32. Equating Designs Internal Embedded Anchor Items

  33. Equating Designs Internal Appended Anchor Items

  34. Equating Designs External Anchor Items

  35. Equating Designs Guidelines for Anchor Items • Mini-Test • Similar Location • No Alterations • Item Format Representation RULES of THUMB
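  The deck stops at design guidelines here and does not show the arithmetic that follows data collection. As one common possibility (an assumption, not a method the presenters prescribe), anchor-item difficulty estimates from two forms are often placed on a common scale with a mean/sigma transformation; a minimal sketch:

      import numpy as np

      def mean_sigma_link(b_anchor_old, b_anchor_new):
          # Mean/sigma linking: find slope A and intercept B that map the new
          # form's difficulty scale onto the old (base) scale using the anchor items.
          b_old = np.asarray(b_anchor_old, dtype=float)
          b_new = np.asarray(b_anchor_new, dtype=float)
          A = b_old.std(ddof=1) / b_new.std(ddof=1)
          B = b_old.mean() - A * b_new.mean()
          return A, B

      # Illustrative anchor-item difficulty estimates (assumed values)
      old_form_anchors = [-1.2, -0.4, 0.3, 1.1]
      new_form_anchors = [-1.0, -0.3, 0.5, 1.4]
      A, B = mean_sigma_link(old_form_anchors, new_form_anchors)
      # Any new-form estimate can then be rescaled as A * estimate + B
      # to express it on the old form's scale.
      print(round(A, 3), round(B, 3))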

  36. 3. Equating Basics Basic Terms (Sets 1, 2, and 3) Equating Designs (a, b, c) Item Banking (a, b, c, d)

  37. Item Banking Basic Concepts Anchor-item Based Field Test Matrix Sampling Spiraling Forms

  38. Item Banking a. Basic Concepts • An item bank is a large collection of calibrated and scaled test items representing the full range, depth, and detail of the content standards • Item Bank development is supported by field testing a large number of items, often with one or more anchor item sets. • Item banks are designed to provide a pool of items from which equivalent test forms can be built. • Pre-equated forms are based on a large and stable item bank.

  39. Item Banking b. Anchor Item Based Field Test Design Field test items are most appropriately embedded within, not appended to, the common items. RULE of THUMB

  40. Item Banking c. Matrix Sampling • Items can be assembled into relatively small blocks (or sets) of items. • A small number of blocks can be assigned to each test form to reduce test length. • Blocks may be assigned to multiple forms to enhance equating (illustrated in the sketch below). • Blocks need not be assigned to multiple forms if randomly equivalent groups are used.
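  A minimal sketch of the idea, with made-up block and form labels (all names and the overlap pattern are illustrative assumptions): each form carries only a few blocks, and some blocks appear on more than one form so the forms share common material.

      # Hypothetical matrix-sampling layout: four item blocks spread across
      # three shorter forms, with some blocks shared between forms.
      blocks = {
          "B1": ["item01", "item02", "item03"],
          "B2": ["item04", "item05", "item06"],
          "B3": ["item07", "item08", "item09"],
          "B4": ["item10", "item11", "item12"],
      }

      forms = {
          "Form A": ["B1", "B2"],
          "Form B": ["B2", "B3"],   # shares B2 with Form A
          "Form C": ["B3", "B4"],   # shares B3 with Form B
      }

      for form, block_ids in forms.items():
          items = [item for b in block_ids for item in blocks[b]]
          print(form, "->", items)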

  41. Item Banking c. Matrix Sampling

  42. Item Banking d. Spiraling Forms • Test forms can be assigned to individual students, or to students grouped in classrooms, schools, districts, or some other units. • “Spiraling” at the student level involves assigning different forms to different students within a classroom. • “Spiraling” at the classroom level involves assigning different forms to different classrooms within a school. • “Spiraling” at the school or district level follows a similar pattern.

  43. Item Banking d. Spiraling Forms

  44. Item Banking d. Spiraling Forms • Spiraling at the student level is technically desirable: • provides randomly equivalent groups • minimizes classroom effects on IRT estimates (most IRT procedures assume independent responses) • Spiraling at the student level is logistically problematic: • exposes all items in one location • requires careful monitoring of test packets and distribution • requires matching test form to answer key at the student level
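  A minimal sketch of student-level spiraling as described above (the classroom roster and form labels are illustrative assumptions): forms are handed out in a repeating cycle within each classroom, which is what yields roughly randomly equivalent groups across forms.

      from itertools import cycle

      def spiral_assign(students, forms):
          # Hand forms out in a repeating cycle within a classroom so that
          # each form reaches a roughly equivalent slice of students.
          assignment = {}
          form_cycle = cycle(forms)
          for student in students:
              assignment[student] = next(form_cycle)
          return assignment

      classroom = ["Ava", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus"]
      print(spiral_assign(classroom, ["Form A", "Form B", "Form C"]))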

  45. It’s Never Simple! Linking and equating procedures are employed in the broader context of educational measurement, which includes, at least, the following sources of random variation (statistical error variance) or imprecision: • Content and process representation • Errors of measurement • Sampling errors • Violations of assumptions • Parameter estimation variance • Equating estimation variance IMPORTANT CAUTION

  46. 4. The Mechanics of Equating The Linking-Equating Continuum Classical Test Theory (CTT) Approaches Item Response Theory (IRT) Approaches

  47. The Linking-Equating Continuum • Linking is the broadest term used to refer to a collection of procedures through which performance on one assessment is associated or paired with performance on a second assessment. • Equating is the strongest claim made about the relationship between performance on two assessments and asserts that the scores that are equated have the same substantive meaning. USEFUL TERMS
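  The CTT approaches named on the section agenda are not detailed in this portion of the transcript. As a minimal illustration of what placing scores on a common scale can mean in practice (an assumed method, not one the presenters specify here), linear equating under a random groups design maps a Form Y raw score onto the Form X scale by matching the two forms' means and standard deviations:

      def linear_equate(y, mean_x, sd_x, mean_y, sd_y):
          # Linear equating (random groups design): map a Form Y raw score y
          # onto the Form X scale by matching means and standard deviations.
          return mean_x + (sd_x / sd_y) * (y - mean_y)

      # Illustrative summary statistics (assumed values)
      print(round(linear_equate(32, mean_x=30.0, sd_x=8.0, mean_y=28.0, sd_y=7.0), 2))
      # A Form Y score of 32 corresponds to roughly 34.57 on the Form X scale.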

  48. The Linking-Equating Continuum [Diagram: a continuum of different forms of linking, with equating shown as the strongest kind of linking]

  49. The Linking-Equating Continuum Frameworks • There are a number of frameworks for describing various forms of linking: • Mislevy, 1992 • Linn, 1993 • Holland (in Dorans, Pommerich, and Holland, 2007)

  50. The Linking-Equating Continuum In 1992, Mislevy described a typology with four kinds of linking between test forms: moderation, projection, calibration, and equating (Mislevy, 1992, pp. 21-26). In his model, moderation is the weakest form of linking tests, while equating is considered the strongest. Thus, equating is done to make scores as interchangeable as possible.
