Presentation Transcript


  1. Introduction to IRT/Rasch Measurement with Winsteps. Ken Conrad, University of Illinois at Chicago; Barth Riley and Michael Dennis, Chestnut Health Systems

  2. Agenda
     12:30. Ken Conrad: PowerPoint presentation on classical test theory compared to Rasch; includes history and an introduction to the Rasch model.
     2:15. Break.
     2:30. Discussion of an application of Rasch analysis in the measurement of posttraumatic stress disorder, with interpretation of Rasch/Winsteps output.
     3:15. Barth Riley: Implications and Extensions of Rasch Measurement.
     4:15. Break.
     4:30. Mike Dennis: Practical applications of IRT/Rasch in SUD screening and outcome assessment.
     5:15. Open discussion and Q & A.
     5:30. End of workshop.

  3. The Dream of Rulers of Human Functioning
     • Beyond organ function to human function (WHO, 1947)
     • E.g., quality of life: we need to ask the person
     • 1970s: physical, social, and mental health issues
     • Measuring many constructs requires many items: time, money, respondent burden
     • Today: the need for psychometric efficiency without loss of reliability and construct validity

  4. Prevailing Paradigm: Classical Test Theory
     • CTT: more items for more reliability.
     • Since we seek efficiency (fewer items), items tend to be written where most of the people are, around the mean.
     • Result: redundancy at the mid-range, few items at the extremes, and ceiling and floor effects.
     • It is impossible to measure improvement among those at the ceiling or decline among those at the floor.

  5. How children measure wooden rods (from Piaget)
     • Classification: separate the rods from the cups, the balls, etc. (nominal)
     • Seriation: line them up by size (ordinal)
     • Iteration: develop a unit to know how much bigger (interval)
     • Standardization: make a rule(r) and a process for determining how many units each rod has
     • Children know that classification and seriation are not measurement; Stevens did not (nominal, ordinal, interval, ratio).

  6. Improvement: IRT/Rasch measurement and computers
     • The Rasch measurement model enables construction of a ruler with as many items as we want at any level of the construct.
     • The computer enables choice of items based on each person's pattern of responses.
     • Each test is tailored to the individual, and not all of the items are needed.
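A minimal sketch, not from the workshop materials, of what this slide describes: the dichotomous Rasch model gives the probability of success from the difference between a person's ability and an item's difficulty, and a computer-adaptive test picks as its next item the unanswered item whose difficulty is closest to the current ability estimate (the most informative item under the Rasch model). The item difficulties and the ability value below are invented for illustration.

    import math

    def rasch_prob(theta, b):
        """Probability of success under the dichotomous Rasch model:
        P(X=1) = exp(theta - b) / (1 + exp(theta - b))."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    # Hypothetical item difficulties in logits, ordered easy -> hard.
    difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]

    def next_item(theta_hat, answered):
        """Adaptive selection: the unanswered item whose difficulty is
        closest to the provisional ability estimate."""
        candidates = [i for i in range(len(difficulties)) if i not in answered]
        return min(candidates, key=lambda i: abs(difficulties[i] - theta_hat))

    theta_hat = 0.7                       # provisional ability estimate (assumed)
    print(next_item(theta_hat, set()))    # -> 3, since difficulty 1.0 is closest to 0.7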

  7. Classical Test Theory
     A measure is a sample of items from an infinite domain of items that represent the attribute of interest.
     • Items are treated as replicates of one another, in the sense that differences among the items are ignored in scaling.
     • More items = more reliability.
     • Everyone gets the same items.
     • Answers are needed to all items.
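The "more items = more reliability" point can be made concrete with the Spearman-Brown prophecy formula. A small sketch; the starting reliability of 0.70 is an assumed value, not from the slides.

    def spearman_brown(rel, k):
        """Projected reliability when a test is lengthened by a factor k
        with parallel items: k*rel / (1 + (k - 1)*rel)."""
        return k * rel / (1 + (k - 1) * rel)

    rel_10_items = 0.70                        # assumed reliability of a 10-item test
    print(spearman_brown(rel_10_items, 2))     # 20 items -> about 0.82
    print(spearman_brown(rel_10_items, 4))     # 40 items -> about 0.90

Lengthening the instrument raises the coefficient mechanically, which is why CTT pushes toward longer tests and toward the mid-range redundancy noted on slide 4.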

  8. Ranking is sample dependent. E.g., NBA players vs. jockeys: height could be rated on the same 1-5 ordinal metric, where both a jockey and an NBA player could be rated 5, but that rating could only be interpreted with reference to a particular sample. The sample defines height. With interval scaling, height defines the sample: over 6' = NBA, under 6' = jockey.

  9. Classical Test Theory
     • Uses ordinal data as if they were interval.
     • Using presumably impermissible transformations (i.e., treating ordinal as interval) usually makes little, if any, difference to the results of most analyses.
     • Thus, the argument goes, if it behaves like an interval scale, it can be treated as one.
     • Just use the raw scores. Add 'em up.
     • Clean and easy.

  10. Assumption: all items are created equal. But we know that is not true. Is that how we measure potatoes? How about spelling? Items actually range from easy to hard, like addition -> division. E.g., response strings on ten items ordered easy -> hard:
     Guttman pattern: 1111100000
     Lack of recent practice on item 5: 1111011000
     Educated guess on item 8: 1111100100
     Slow, nervous start: 0111111000

  11. No Difficulty Parameter in CTT. What if two students both got 5 out of 10 correct, but one got the 5 easiest right and the other the 5 hardest? With items ordered easy -> hard:
     Peter: 1111100000
     Paul:  0000011111
     Do they have the same ability? Wouldn't you like to get a better idea of what happened on Paul's test? Did he arrive late? Were test pages missing? Maybe they were word problems, and Paul is a foreign student.
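One way to see what a difficulty parameter buys: under the Rasch model both students get the same raw score, and therefore the same ability estimate, but Paul's response string badly misfits and would be flagged for exactly the follow-up questions the slide asks. The sketch below is illustrative only; the ten item difficulties are assumed values, not workshop data.

    import math

    difficulties = [-2.5, -2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, 2.5]  # easy -> hard (assumed)
    peter = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    paul  = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

    def prob(theta, b):
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    def estimate_theta(raw_score, bs):
        """Solve sum_i P_i(theta) = raw_score by bisection (Rasch ability
        estimation depends only on the raw score, not on which items)."""
        lo, hi = -6.0, 6.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if sum(prob(mid, b) for b in bs) < raw_score:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    def outfit(theta, responses, bs):
        """Outfit mean-square: average squared standardized residual."""
        z2 = [(x - prob(theta, b)) ** 2 / (prob(theta, b) * (1 - prob(theta, b)))
              for x, b in zip(responses, bs)]
        return sum(z2) / len(z2)

    theta = estimate_theta(5, difficulties)              # same estimate for both students
    print(round(outfit(theta, peter, difficulties), 2))  # ~0.3: orderly, expected pattern
    print(round(outfit(theta, paul,  difficulties), 2))  # ~5.7: severe misfit, flag for review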

  12. With CTT, it is extremely difficult to compare a person's scores on two or more different tests; usually we compare z-scores.
     • This assumes that the samples taking both tests center on the same mean.
     • It assumes that all of the tests are normally distributed, which is rarely the case.

  13. Assumptions of CTT
     • CTT = take the whole test, e.g., SD, D, A, or SA on 50 items. What if there are missing data?
     • CTT uses ordinal scaling but assumes equal intervals in the rating scale. However, we know that distances between scale points usually are not equal, e.g.:
       "The President is doing a good job." SD D A SA
       To WWII veterans: "Do you wear fashionable shoes?" N SD D A SA
     • CTT gives us very limited ability to examine the performance of our rating scales. Do they really work the way we want them to?

  14. Cronbach's Alpha
     • Adding items improves alpha, but are they good items?
     • Ceiling and floor effects improve alpha.
     • CTT assumes homoscedasticity: that the error of measurement is the same at the high end of the scale as in the middle or at the low end.
     • However, ordinal measures are biased, especially at the extremes, where there is much more error.
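For reference, coefficient alpha depends only on the item variances and the variance of the total score, which is why adding items can push it up without telling us whether the items are good or where the measurement error sits. A small sketch on made-up Likert ratings (the data are illustrative only):

    def cronbach_alpha(items):
        """items: one list of scores per item, same respondents in each list.
        alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)."""
        k = len(items)
        n = len(items[0])

        def var(xs):                      # population variance
            m = sum(xs) / len(xs)
            return sum((x - m) ** 2 for x in xs) / len(xs)

        totals = [sum(col[i] for col in items) for i in range(n)]
        return k / (k - 1) * (1 - sum(var(col) for col in items) / var(totals))

    # Hypothetical 1-4 Likert ratings from five respondents on three items.
    ratings = [
        [1, 2, 3, 3, 4],
        [1, 2, 2, 3, 4],
        [2, 2, 3, 4, 4],
    ]
    print(round(cronbach_alpha(ratings), 2))   # -> 0.96 for this tiny made-up example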

  15. To Count -> To Measure. E.g., from counting potatoes to measuring their quality. From counting the number of drinks to measuring substance use disorders. From summing Likert ratings to linear, interval measurement.
