Models for Measuring Respondent Abilities


What do the models have in common?
• They are all special cases of a general model.
• How are people responding?
• What are your intentions in the analysis?
• The items and persons are separable.
• They all start with a “number correct” (test) or an “integer score” (Likert scale), so responses must be whole numbers.
• They do not estimate a slope parameter: slopes do not vary from item to item (or person to person).
• All person parameters and item parameters are expressed in the same scale units (logits).

Dichotomous Model
• Pass/fail, right/wrong, yes/no
• One step: successfully complete it or not
$$P_{ni1} = \frac{\exp(\beta_n - \delta_i)}{1 + \exp(\beta_n - \delta_i)}$$
• $P_{ni1}$: person n's probability of scoring 1 rather than 0 on item i
• $\beta_n$: ability of person n
• $\delta_i$: difficulty of item i (the step from 0 to 1)

Item Characteristic Curves for Five Dichotomous Items

What happens to the probability of getting a 0 as ability increases? A 1?
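These questions can be checked numerically. The following is a minimal Python sketch of the dichotomous Rasch model. The five item difficulties and the ability grid are hypothetical values chosen for illustration and are not read off the slide's figure.

```python
import math

def p_correct(beta: float, delta: float) -> float:
    """Dichotomous Rasch model: probability that a person of ability beta
    scores 1 rather than 0 on an item of difficulty delta (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))

# Hypothetical difficulties for five items (illustrative values only).
DIFFICULTIES = [-2.0, -1.0, 0.0, 1.0, 2.0]
# Hypothetical ability grid at which each curve is evaluated.
ABILITIES = [-3.0, -1.5, 0.0, 1.5, 3.0]

for delta in DIFFICULTIES:
    p1 = [p_correct(b, delta) for b in ABILITIES]
    p0 = [1.0 - p for p in p1]  # P(0) is the complement of P(1)
    print(f"delta={delta:+.1f}  P(1)=" + " ".join(f"{p:.2f}" for p in p1)
          + "  P(0)=" + " ".join(f"{p:.2f}" for p in p0))
```

Reading across each row, P(1) rises and P(0) falls as ability increases. Each curve crosses 0.5 where ability equals the item's difficulty.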

What happens if we add another category?

Interpreting the curves
• Between the 0 and 2 curves is the curve showing the probability of a score of 1.
• When a person's “ability” is very low relative to the item's difficulty, the most likely response is 0.
• When a person's “ability” is moderate relative to the item's difficulty, the most likely response is 1.
• When a person's “ability” is much greater than the item's difficulty, the most likely response is 2.

The τs are Thresholds
• They mark the points on the variable where adjacent responses are equally likely: 0 and 1 at the first threshold, 1 and 2 at the second.
• In the case of a dichotomous response (two categories), the only threshold is the difficulty, which is the point where responses of 0 and 1 are equally likely.
• In the case of three categories there are two thresholds, each of which is located relative to the item's average difficulty.
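The curves for a three-category item and the meaning of its thresholds can be reproduced with a short sketch. The sketch assumes the polytomous Rasch form, in which the log-odds of category k over category k − 1 is β − (δ + τ_k). The item (δ = 0, thresholds at −1 and +1) is hypothetical.

```python
import math

def category_probs(beta, delta, taus):
    """Category probabilities [P(0), ..., P(m)] for one polytomous Rasch item.

    beta  : person ability (logits)
    delta : average item difficulty (logits)
    taus  : thresholds tau_1..tau_m, located relative to delta
    """
    # Cumulative sums of (beta - (delta + tau_k)); category 0 has the empty sum 0.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (beta - (delta + tau)))
    # Normalise, subtracting the maximum first for numerical stability.
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical three-category item: delta = 0, thresholds at -1 and +1.
DELTA, TAUS = 0.0, [-1.0, 1.0]

# Most likely response at low, moderate and high ability.
for beta in (-3.0, 0.0, 3.0):
    probs = category_probs(beta, DELTA, TAUS)
    modal = max(range(len(probs)), key=probs.__getitem__)
    print(f"beta={beta:+.1f}  probs={[round(p, 3) for p in probs]}  most likely={modal}")

# At each threshold (beta = delta + tau_k), categories k-1 and k are equally likely.
for k, tau in enumerate(TAUS, start=1):
    probs = category_probs(DELTA + tau, DELTA, TAUS)
    print(f"threshold {k}: P({k - 1})={probs[k - 1]:.3f}  P({k})={probs[k]:.3f}")
```

With a single threshold at 0, the function reduces to the dichotomous case: `category_probs(0.5, 0.5, [0.0])` gives equal probabilities for 0 and 1.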

Rating Scale
• Specifies that a set of items share the same rating scale structure.
• Originates in attitude surveys, where the respondent is presented with the same response choices for several items.
• When measures are communicated to others, it is impractical to present a different rating scale structure for each item.
• Perhaps the audience can comprehend two structures: one for positively worded items and one for negatively worded items.

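Rating Scale Model
$$P_{nix} = \frac{\exp \sum_{k=0}^{x} \left[\beta_n - (\delta_i + \tau_k)\right]}{\sum_{j=0}^{m} \exp \sum_{k=0}^{j} \left[\beta_n - (\delta_i + \tau_k)\right]}, \qquad \sum_{k=0}^{0} \left[\beta_n - (\delta_i + \tau_k)\right] \equiv 0$$
• $P_{nix}$: probability of person n responding in category x to item i
• A position on the variable, $\beta_n$, is estimated for each person n
• $\delta_i$ is the location of item i on the variable, and $\tau_k$ is the location of the kth step in each item relative to that item's scale value
• m response thresholds $\tau_1, \tau_2, \ldots, \tau_m$ are estimated for the m + 1 rating categories (0 to m)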
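The key property of the Rating Scale Model is that one set of thresholds τ_1..τ_m is shared by every item, and items differ only in their location δ_i. The minimal sketch below shows this property. The four-category scale, the thresholds, and the three item locations are all hypothetical.

```python
import math

def rsm_probs(beta, delta_i, taus):
    """Rating Scale Model: [P(x = 0), ..., P(x = m)] for person ability beta on
    item i at location delta_i; taus = [tau_1..tau_m] are shared by all items."""
    cumulative = [0.0]  # category 0: the empty sum is defined as 0
    for tau_k in taus:
        cumulative.append(cumulative[-1] + beta - (delta_i + tau_k))
    top = max(cumulative)  # subtract the max before exponentiating, for stability
    expd = [math.exp(c - top) for c in cumulative]
    total = sum(expd)
    return [e / total for e in expd]

def expected_score(probs):
    """Expected rating: sum over categories of x * P(x)."""
    return sum(x * p for x, p in enumerate(probs))

# Hypothetical 4-category scale (m = 3 thresholds) shared by three items.
TAUS = [-1.5, 0.0, 1.5]
ITEM_LOCATIONS = {"item A": -1.0, "item B": 0.0, "item C": 1.0}

beta = 0.5
for name, delta_i in ITEM_LOCATIONS.items():
    probs = rsm_probs(beta, delta_i, TAUS)
    print(name, [round(p, 3) for p in probs], round(expected_score(probs), 2))
```

Because the thresholds are common, harder items shift the whole category structure along the variable without changing its shape. This is why a single structure can be reported to the audience.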

Partial Credit
• We can take the second step only if we have successfully completed the first.
• Responses that are incorrect but indicate some knowledge are given partial credit toward a correct response. The amount of partial correctness varies across items.
• Response structure and process: the response of one person to one item in one of the categories.
• Specifies that each item has its own rating scale structure.

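Partial Credit Model
$$P_{nix} = \frac{\exp \sum_{j=0}^{x} (\beta_n - \delta_{ij})}{\sum_{h=0}^{m_i} \exp \sum_{j=0}^{h} (\beta_n - \delta_{ij})}, \qquad \sum_{j=0}^{0} (\beta_n - \delta_{ij}) \equiv 0$$
• $P_{nix}$: probability of person n completing x steps on item i
• $\beta_n$: ability of person n
• $\delta_{ij}$: difficulty of step j of item i ($m_i$ is the number of steps in item i)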
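In the Partial Credit Model each item has its own list of step difficulties δ_ij, and different items may have different numbers of steps. The sketch below makes that concrete. The three items and their step values are hypothetical.

```python
import math

def pcm_probs(beta, step_difficulties):
    """Partial Credit Model: [P(0 steps), ..., P(m_i steps)] for item i.

    beta              : ability of person n (logits)
    step_difficulties : [delta_i1, ..., delta_im_i], the difficulty of each
                        step j of this item (the length may differ by item)
    """
    cumulative = [0.0]  # completing 0 steps: the empty sum is defined as 0
    for delta_ij in step_difficulties:
        cumulative.append(cumulative[-1] + (beta - delta_ij))
    top = max(cumulative)  # subtract the max before exponentiating, for stability
    expd = [math.exp(c - top) for c in cumulative]
    total = sum(expd)
    return [e / total for e in expd]

# Hypothetical items, each with its own step structure.
ITEMS = {
    "Q1 (2 steps)": [-0.5, 1.0],
    "Q2 (3 steps)": [-2.0, 0.0, 0.5],
    "Q3 (1 step, dichotomous)": [0.3],
}
for name, steps in ITEMS.items():
    print(name, [round(p, 3) for p in pcm_probs(0.0, steps)])
```

Setting δ_ij = δ_i + τ_j for every item recovers the Rating Scale Model. A one-step item reduces to the dichotomous model.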

Rasch Reliability: “Reproducibility of Relative Measure Location”
• High reliability: there is a high probability that persons (or items) estimated with high measures actually do have higher measures than persons (or items) estimated with low measures.
• Winsteps reports a “model” and a “real” reliability:
  • The “model” reliability is an upper bound to the true reliability.
  • The “real” reliability is a lower bound to the true reliability.
• Raw score-based reliability vs. measure-based reliability: www.rasch.org/rmt/rmt113l.htm
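A measure-based reliability can be computed as the share of observed variance that is not error variance, (observed variance − mean squared SE) / observed variance. The sketch below uses that formula. It assumes, as a labelled assumption, that the “real” SEs are obtained by inflating each model SE by √max(1, infit mean-square). It also uses the population variance. Winsteps' exact computation may differ in detail, and all data values here are hypothetical.

```python
import math
from statistics import mean, pvariance

def rasch_reliability(measures, standard_errors):
    """(observed variance - mean error variance) / observed variance, floored at 0."""
    observed_var = pvariance(measures)  # population variance (assumed choice)
    error_var = mean(se ** 2 for se in standard_errors)
    return max(0.0, (observed_var - error_var) / observed_var)

def real_standard_errors(model_ses, infit_mnsq):
    """Assumed convention: inflate each model SE by sqrt(max(1, infit mean-square))."""
    return [se * math.sqrt(max(1.0, fit)) for se, fit in zip(model_ses, infit_mnsq)]

# Hypothetical person measures (logits), model SEs, and infit mean-squares.
measures = [-1.8, -0.9, -0.2, 0.4, 1.1, 1.9, 2.6]
model_se = [0.45, 0.40, 0.38, 0.38, 0.40, 0.45, 0.52]
infit = [0.9, 1.3, 1.0, 0.8, 1.6, 1.1, 1.0]

model_rel = rasch_reliability(measures, model_se)
real_rel = rasch_reliability(measures, real_standard_errors(model_se, infit))
print(f"model reliability = {model_rel:.3f}  (upper bound)")
print(f"real reliability  = {real_rel:.3f}  (lower bound)")
```

Misfit only ever enlarges the error variance under this convention, so the “real” value can never exceed the “model” value.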

Person Reliability
• Equivalent to the traditional "test" reliability.
• Does your instrument discriminate the sample into enough levels for your purpose?
  • Reliability 0.9 = 3 or 4 levels; 0.8 = 2 or 3 levels; 0.5 = 1 or 2 levels.
• Low values indicate a narrow range of person measures OR a small number of items.
• To improve person reliability:
  • Test persons with a wider range of abilities.
  • Lengthen the instrument.
  • Improving the test targeting may help slightly.
• Note: Person reliability is independent of sample size.
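The “levels” figures can be reproduced from two commonly used formulas. The separation index is G = √(R / (1 − R)), and the number of statistically distinct strata is H = (4G + 1) / 3. The sketch below simply evaluates both formulas for the three reliabilities on the slide.

```python
import math

def separation_from_reliability(r):
    """Separation index G = sqrt(R / (1 - R)), for 0 <= R < 1."""
    return math.sqrt(r / (1.0 - r))

def strata(g):
    """Number of statistically distinct strata, H = (4G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0

for r in (0.9, 0.8, 0.5):
    g = separation_from_reliability(r)
    print(f"R = {r:.1f}  ->  G = {g:.2f},  strata H = {strata(g):.2f}")
```

For 0.9, 0.8 and 0.5 this gives G = 3, 2 and 1, and H ≈ 4.33, 3 and 1.67. These values match the slide's “3 or 4”, “2 or 3” and “1 or 2” levels if G is read as the lower count and H as the upper.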

Item Reliability
• Low reliability means that your sample is not big enough to locate the items precisely on the latent variable.
• To improve item reliability:
  • Increase the variance of item difficulties.
  • Increase the person sample size.
• Note: Item reliability is independent of test length.
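The two remedies can be illustrated with a rough simulation. The sketch approximates each item's standard error as 1 / √(Σ P(1 − P)) summed over persons, which is the usual asymptotic form for a dichotomous Rasch item. It then estimates reliability as true variance / (true variance + mean error variance). The item difficulties and the evenly spread ability grid are hypothetical, and the sketch is an approximation, not a full estimation run.

```python
import math
from statistics import mean, pvariance

def p_correct(beta, delta):
    """Dichotomous Rasch probability of a 1."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))

def item_standard_error(delta, abilities):
    """Asymptotic SE of one item difficulty: 1 / sqrt(sum over persons of P(1 - P))."""
    information = sum(p_correct(b, delta) * (1.0 - p_correct(b, delta)) for b in abilities)
    return 1.0 / math.sqrt(information)

def expected_item_reliability(difficulties, abilities):
    """Approximate reliability = true variance / (true variance + mean error variance)."""
    true_var = pvariance(difficulties)
    error_var = mean(item_standard_error(d, abilities) ** 2 for d in difficulties)
    return true_var / (true_var + error_var)

def even_abilities(n, low=-2.0, high=2.0):
    """Hypothetical sample: n persons spread evenly from low to high (n >= 2)."""
    return [low + (high - low) * k / (n - 1) for k in range(n)]

NARROW = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical item difficulties (logits)
WIDE = [-2.0, -1.0, 0.0, 1.0, 2.0]    # same items, larger difficulty variance

for n in (20, 100, 500):
    print(n, round(expected_item_reliability(NARROW, even_abilities(n)), 3))
print("wider spread, n=100:", round(expected_item_reliability(WIDE, even_abilities(100)), 3))
```

Adding persons shrinks every item's error variance. Spreading the difficulties raises the true variance faster than it raises the error variance.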

What is Separation?
• Separation is the number of statistically different performance strata that the test can identify in the sample.
• A separation of 2 implies that only two levels of performance can be consistently identified by the test for samples like the one tested.
• A reliability of 0.95 corresponds to a separation of about 4.4 (√(0.95/0.05) ≈ 4.36), meaning roughly 4 consistently identifiable strata.

Relationship of Reliability and Separation http://www.rasch.org/rmt/rmt63i.htm
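The relationship runs in both directions: G = √(R / (1 − R)) and, inverting it, R = G² / (1 + G²). A minimal sketch of the two conversions follows.

```python
import math

def separation(reliability_r):
    """Separation G from reliability R: G = sqrt(R / (1 - R))."""
    return math.sqrt(reliability_r / (1.0 - reliability_r))

def reliability(separation_g):
    """Reliability R from separation G: R = G**2 / (1 + G**2)."""
    return separation_g ** 2 / (1.0 + separation_g ** 2)

print(f"{reliability(2.0):.2f}")   # separation 2 -> reliability 0.80
print(f"{separation(0.95):.2f}")   # reliability 0.95 -> separation 4.36
```

A separation of 2 corresponds to a reliability of 0.80, and a reliability of 0.95 corresponds to a separation of about 4.36. Because each reliability gain near 1 buys progressively more separation, going from 0.9 to 0.95 raises G from 3 to about 4.4.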
