Models for Measuring
What do the models have in common? • They are all cases of a general model. • How are people responding? • What are your intentions in the analysis? • The items and persons are separable. • They all start with a "number correct" (test) or an "integer score" (Likert scale). • You must have whole-number responses. • They do not use a slope parameter. • Slopes do not vary from person to person (or item to item). • All person parameters and item parameters are expressed in the same scale units.
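Dichotomous Model • Pass / Fail…Right / Wrong…Yes / No • One step: either complete it successfully or not • $P_{ni1} = \dfrac{\exp(\beta_n - \delta_i)}{1 + \exp(\beta_n - \delta_i)}$, equivalently $\ln\left(P_{ni1}/P_{ni0}\right) = \beta_n - \delta_i$ • $P_{ni1}$: the probability that person n scores 1 rather than 0 on item i • $\beta_n$: ability of person n • $\delta_i$: difficulty of item i (the step from 0 to 1)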
What happens to the probability of scoring 0 as ability increases? Of scoring 1?
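A minimal Python sketch of the dichotomous model that answers the question numerically. The item difficulty of 0 logits and the grid of abilities are illustrative assumptions, not values from the slides.

```python
import math

def p_correct(beta: float, delta: float) -> float:
    """Dichotomous Rasch model: probability that a person of ability beta
    scores 1 rather than 0 on an item of difficulty delta (both in logits)."""
    # exp(b - d) / (1 + exp(b - d)) rewritten as a logistic function
    return 1.0 / (1.0 + math.exp(-(beta - delta)))

delta = 0.0  # hypothetical item difficulty
for beta in (-3, -1, 0, 1, 3):
    p1 = p_correct(beta, delta)
    # P(0) is the complement of P(1) because there are only two categories
    print(f"beta={beta:+d}  P(0)={1 - p1:.3f}  P(1)={p1:.3f}")
```

As ability rises, P(0) falls and P(1) rises; the two curves cross at 0.5 exactly where ability equals the item difficulty.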
Interpreting the curves • The curve showing the probability of a score of 1 lies between the curves for scores of 0 and 2. • When a person has very low "ability" relative to the item's difficulty, the most likely response is 0. • When a person has moderate "ability" relative to the item's difficulty, the most likely response is 1. • When a person has an "ability" much greater than the item's difficulty, the most likely response is 2.
The τs are Thresholds • They mark the points where responses of 0 and 1, or of 1 and 2, are equally likely. • For a dichotomous response (two categories), the only threshold is the item difficulty, the point where scores of 0 and 1 are equally likely. • With three categories there are two thresholds, each expressed relative to (and so qualifying) the average difficulty of the item.
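The curve interpretation and the threshold definition can be checked with a small sketch of a three-category item. It assumes the adjacent-category (Andrich) form, in which each step k contributes a factor exp(β − δ − τ_k); the values δ = 0, τ1 = −1 and τ2 = +1 are hypothetical.

```python
import math

def three_category_probs(beta, delta, tau1, tau2):
    """P(0), P(1), P(2) for a three-category item with thresholds tau1, tau2
    expressed relative to the item difficulty delta."""
    # Unnormalised weights: each completed step k multiplies in exp(beta - delta - tau_k).
    w0 = 1.0
    w1 = math.exp(beta - delta - tau1)
    w2 = math.exp((beta - delta - tau1) + (beta - delta - tau2))
    total = w0 + w1 + w2
    return (w0 / total, w1 / total, w2 / total)

delta, tau1, tau2 = 0.0, -1.0, 1.0  # hypothetical item difficulty and thresholds

# Low, moderate and high ability relative to the item
for beta in (-3.0, 0.0, 3.0):
    probs = three_category_probs(beta, delta, tau1, tau2)
    print(f"beta={beta:+.1f}", [round(p, 3) for p in probs],
          "most likely score:", probs.index(max(probs)))

# At beta = delta + tau_k the categories on either side of threshold k tie.
p0, p1, _ = three_category_probs(delta + tau1, delta, tau1, tau2)
_, q1, q2 = three_category_probs(delta + tau2, delta, tau1, tau2)
print(f"at tau1: P(0)={p0:.3f}, P(1)={p1:.3f}; at tau2: P(1)={q1:.3f}, P(2)={q2:.3f}")
```

The most likely score moves from 0 to 1 to 2 as ability increases, and at each threshold the two adjacent categories are equally probable.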
Rating Scale • Specifies that a set of items share the same rating scale structure. • Originates in attitude surveys, where respondents are presented with the same response choices for several items. • When measures are communicated to others, it is impractical to present a different rating scale structure for each item. • The audience may be able to comprehend two structures: one for positively worded items and one for negatively worded items.
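Rating Scale Model • $P_{nix} = \dfrac{\exp\sum_{k=1}^{x}\left(\beta_n - (\delta_i + \tau_k)\right)}{\sum_{j=0}^{m}\exp\sum_{k=1}^{j}\left(\beta_n - (\delta_i + \tau_k)\right)}$, where an empty sum (x = 0) is 0; equivalently $\ln\left(P_{nix}/P_{ni(x-1)}\right) = \beta_n - (\delta_i + \tau_x)$ • $P_{nix}$: probability of person n responding in category x to item i • A position on the variable, $\beta_n$, is estimated for each person n • $\delta_i$ is the location of item i on the variable, and $\tau_k$ is the location of the kth step in each item relative to that item's scale value • m response "thresholds" $\tau_1, \tau_2, \ldots, \tau_m$ are estimated for the m+1 rating categories (0 to m)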
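A minimal sketch of the rating scale model formula, showing that every item shares one set of thresholds while each item keeps its own location. The thresholds, item locations and person measure are hypothetical values chosen for illustration.

```python
import math

def rsm_probs(beta, delta_i, taus):
    """Rating scale model: P(x) for x = 0..m on item i; all items share taus."""
    # Exponent for category x: sum_{k=1..x} (beta - (delta_i + tau_k)); empty sum = 0.
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + beta - (delta_i + tau))
    top = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 4-category scale (m = 3) shared by every item
taus = [-1.5, 0.0, 1.5]
items = {"item A": -0.5, "item B": 0.8}  # hypothetical item locations delta_i
beta = 0.3  # hypothetical person measure

for name, delta_i in items.items():
    probs = rsm_probs(beta, delta_i, taus)
    expected = sum(x * p for x, p in enumerate(probs))
    print(name, [round(p, 3) for p in probs], "expected score:", round(expected, 2))
```

Because the thresholds are shared, the two items differ only by a shift along the variable: the harder item B yields a lower expected score for the same person.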
Partial Credit • The second step can be taken only if the first has been completed successfully. • Responses that are incorrect but show some knowledge receive partial credit toward a correct response; the amount of partial credit varies across items. • Response structure and process: each observation is the response of one person to one item, in one of that item's categories. • Specifies that each item has its own rating scale structure.
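Partial Credit Model • $P_{nix} = \dfrac{\exp\sum_{j=1}^{x}\left(\beta_n - \delta_{ij}\right)}{\sum_{k=0}^{m_i}\exp\sum_{j=1}^{k}\left(\beta_n - \delta_{ij}\right)}$, where an empty sum (x = 0) is 0 and $m_i$ is the number of steps in item i • $P_{nix}$: probability of person n completing x steps on item i • $\beta_n$: ability of person n • $\delta_{ij}$: difficulty of item i on step j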
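A minimal sketch of the partial credit model formula, in which each item carries its own step difficulties and may have a different number of steps. The step difficulties and person ability are hypothetical.

```python
import math

def pcm_probs(beta, step_difficulties):
    """Partial credit model for one item.
    step_difficulties[j-1] is delta_ij, the difficulty of step j of this item."""
    exponents = [0.0]  # category 0: empty sum
    for d in step_difficulties:
        exponents.append(exponents[-1] + (beta - d))
    top = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical items, each with its own step structure
item_steps = {
    "item 1": [-1.0, 0.5],       # 3 categories, m_i = 2
    "item 2": [-0.5, 0.2, 1.8],  # 4 categories, m_i = 3
}
beta = 0.4  # hypothetical person ability
for name, steps in item_steps.items():
    print(name, [round(p, 3) for p in pcm_probs(beta, steps)])
```

Setting $\delta_{ij} = \delta_i + \tau_j$ with the same τs for every item recovers the rating scale model, and a single step recovers the dichotomous model.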
Rasch Reliability: "Reproducibility of Relative Measure Location" • High reliability: there is a high probability that persons (or items) estimated with high measures actually do have higher measures than persons (or items) estimated with low measures. • Winsteps reports a "model" and a "real" reliability: • The "model" reliability is an upper bound to the true reliability. • The "real" reliability is a lower bound to the true reliability. • Raw score-based reliability vs. measure-based reliability: www.rasch.org/rmt/rmt113l.htm
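A rough sketch of why the "real" value sits below the "model" value. It uses the usual Rasch definition, reliability = (observed variance − mean error variance) / observed variance, and assumes that "real" standard errors inflate model errors by √max(1, infit mean-square); that inflation rule is our reading of the Winsteps documentation and should be checked there. All measures, errors and fit values are hypothetical.

```python
import statistics

def rasch_reliability(measures, std_errors):
    """Reliability = (observed variance - mean error variance) / observed variance."""
    observed_var = statistics.pvariance(measures)
    error_var = statistics.fmean(se ** 2 for se in std_errors)
    return max(0.0, (observed_var - error_var) / observed_var)

# Hypothetical person measures (logits), model standard errors and infit mean-squares
measures = [-1.8, -0.9, -0.2, 0.4, 1.1, 1.9, 2.6]
model_se = [0.45, 0.40, 0.38, 0.38, 0.40, 0.45, 0.52]
infit_mnsq = [0.9, 1.3, 1.0, 0.8, 1.6, 1.1, 0.7]

# Assumption: "real" errors = model errors * sqrt(max(1, infit MnSq)),
# so misfit can only enlarge an error, never shrink it.
real_se = [se * max(1.0, f) ** 0.5 for se, f in zip(model_se, infit_mnsq)]

model_rel = rasch_reliability(measures, model_se)
real_rel = rasch_reliability(measures, real_se)
print(f"model reliability = {model_rel:.3f}, real reliability = {real_rel:.3f}")
```

Since the real errors are never smaller than the model errors, the real reliability can never exceed the model reliability, which is why the two bracket the true value.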
Person Reliability • Equivalent to the traditional "test" reliability. • Does your instrument discriminate the sample into enough levels for your purpose? • 0.9 = 3 or 4 levels; 0.8 = 2 or 3 levels; 0.5 = 1 or 2 levels. • Low values indicate a narrow range of person measures OR a small number of items. • To improve person reliability: • Test persons with a wider range of abilities • Lengthen the instrument • Improving the test targeting may help slightly • Note: Person reliability is independent of sample size.
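One way to reproduce the rule of thumb for levels (our reading; the slide does not state it) is to compute the separation G = √(R / (1 − R)) and the strata index H = (4G + 1) / 3 (Wright and Masters' strata formula). The quoted number of levels then falls between G and H.

```python
import math

def separation(reliability):
    """Separation G = sqrt(R / (1 - R)): spread of true measures in error units."""
    return math.sqrt(reliability / (1.0 - reliability))

def strata(reliability):
    """Strata H = (4G + 1) / 3: distinct levels assuming three-error-unit gaps."""
    return (4.0 * separation(reliability) + 1.0) / 3.0

for r in (0.9, 0.8, 0.5):
    print(f"R = {r}: separation = {separation(r):.2f}, strata = {strata(r):.2f}")
```

This prints separations of 3.00, 2.00 and 1.00 with strata of 4.33, 3.00 and 1.67, matching "3 or 4", "2 or 3" and "1 or 2" levels.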
Item Reliability • Low item reliability means that your sample is not big enough to locate the items precisely on the latent variable. • To improve item reliability: • Increase the variance of item difficulties • Increase the person sample size • Note: Item reliability is independent of test length.
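A rough sketch of why a larger person sample raises item reliability. It assumes dichotomous items, model standard errors SE_i = 1/√(Σ_n P_ni(1 − P_ni)), and treats the listed difficulties as if they were the estimates; the difficulties and the evenly spread samples are hypothetical.

```python
import math
import statistics

def item_reliability(item_difficulties, person_abilities):
    """Item reliability for dichotomous Rasch items using model standard errors."""
    error_vars = []
    for d in item_difficulties:
        # Item information: sum over persons of P(1 - P)
        info = 0.0
        for b in person_abilities:
            p = 1.0 / (1.0 + math.exp(-(b - d)))
            info += p * (1.0 - p)
        error_vars.append(1.0 / info)  # squared standard error = 1 / information
    observed_var = statistics.pvariance(item_difficulties)
    return max(0.0, (observed_var - statistics.fmean(error_vars)) / observed_var)

items = [-1.5, -0.75, 0.0, 0.75, 1.5]  # hypothetical item difficulties

def sample(n):
    # Hypothetical sample: n abilities spread evenly between -2 and +2 logits
    return [-2.0 + 4.0 * k / (n - 1) for k in range(n)]

for n in (20, 200):
    print(n, "persons -> item reliability", round(item_reliability(items, sample(n)), 3))
```

The item difficulties and their spread stay fixed; only the number of persons changes, and the shrinking standard errors alone drive the reliability up.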
What is Separation? • Separation is the number of statistically different performance strata that the test can identify in the sample. • A separation of "2" implies that only two levels of performance can be consistently identified by the test for samples like the one tested. • A reliability of 0.95 corresponds to a separation of about 4.4 (√(0.95/0.05) ≈ 4.36), meaning 4 consistently identifiable strata.
Relationship of Reliability and Separation http://www.rasch.org/rmt/rmt63i.htm
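A minimal sketch of the two-way relationship between reliability R and separation G, using G = √(R / (1 − R)) and its inverse R = G² / (1 + G²). The list of reliabilities is only illustrative.

```python
import math

def separation_from_reliability(r):
    # G = sqrt(R / (1 - R)), valid for 0 <= R < 1
    return math.sqrt(r / (1.0 - r))

def reliability_from_separation(g):
    # R = G^2 / (1 + G^2)
    return g * g / (1.0 + g * g)

for r in (0.5, 0.8, 0.9, 0.95):
    g = separation_from_reliability(r)
    print(f"R = {r:.2f} -> G = {g:.2f} -> R = {reliability_from_separation(g):.2f}")
```

For example, R = 0.95 gives G ≈ 4.36, while a separation of 4.5 maps back to R ≈ 0.953. Because G grows like 1/√(1 − R), each further gain in reliability near 1 buys a large increase in separation.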