## PowerPoint Slideshow about 'CRT Dependability' - libitha

### CRT Dependability

Consistency for criterion-referenced decisions

Challenges for CRT dependability

- Raw scores may not show much variation (skewed distributions)
- CRT decisions are based on acceptable performance rather than relative position
- A measure of the dependability of the classification (i.e., master / non-master) is needed

Approaches using cut-score

- Threshold loss agreement
  - In a test-retest situation, how consistently are students classified as master / non-master?
  - All misclassifications are considered equally serious
- Squared-error loss agreement
  - How consistent are the classifications, taking distance from the cut-point into account?
  - The consequences of misclassifying students far above or far below the cut-point are considered more serious

Berk, R. A. (1984). Selecting the index of reliability. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. 231-266). Baltimore, MD: The Johns Hopkins University Press.

Issues with cut-scores

- “The validity of the final classification decisions will depend as much upon the validity of the standard as upon the validity of the test content” (Shepard, 1984, p. 169)
- “Just because excellence can be distinguished from incompetence at the extremes does not mean excellence and incompetence can be unambiguously separated at the cut-off.” (p. 171)

Shepard, L. A. (1984). Setting performance standards. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. 169-198). Baltimore, MD: The Johns Hopkins University Press.

Methods for determining cut-scores

- Method 1: expert judgments about the performance of hypothetical students on the test
- Method 2: the test performance of actual students

Setting cut-scores

(Brown, 1996, p. 257)

Institutional decisions

(Brown, 1996, p. 260)

Agreement coefficient (Po) and kappa

|                     | Master (test 2) | Non-master (test 2) | Total |
|---------------------|-----------------|---------------------|-------|
| Master (test 1)     | 77 (A)          | 6 (B)               | 83    |
| Non-master (test 1) | 6 (C)           | 21 (D)              | 27    |
| Total               | 83              | 27                  | 110   |

Po = (A + D) / N = (77 + 21) / 110 = .89

Pchance = [(A + B)(A + C) + (C + D)(B + D)] / N² = [(83)(83) + (27)(27)] / 110² = .63

K = (Po - Pchance) / (1 - Pchance) = (.89 - .63) / (1 - .63) = .70
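The worked example can be checked with a short Python sketch; the cell labels A–D follow the usual 2x2 layout, with A and D as the consistent (master/master and non-master/non-master) cells.

```python
# Agreement coefficient (Po), chance agreement (Pchance), and kappa
# for a 2x2 master/non-master classification table.

def agreement_stats(a, b, c, d):
    """Return (Po, Pchance, kappa) for a 2x2 classification table."""
    n = a + b + c + d
    po = (a + d) / n                                            # observed agreement
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # chance agreement
    kappa = (po - p_chance) / (1 - p_chance)
    return po, p_chance, kappa

po, p_chance, kappa = agreement_stats(77, 6, 6, 21)
print(round(po, 2), round(p_chance, 2), round(kappa, 2))  # 0.89 0.63 0.71
```

Note that exact arithmetic gives kappa ≈ .705; the slide's .70 comes from rounding Po and Pchance to two decimals before dividing.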

Short-cut methods for one administration

- Calculate an NRT reliability coefficient (split-half, KR-20, or Cronbach's alpha)
- Convert the cut-score to a standardized score: z = (cut-score - 0.5 - mean) / SD
- Use Table 7.9 to estimate the agreement coefficient
- Use Table 7.10 to estimate kappa
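The standardization step can be sketched in Python; the mean and SD below are hypothetical placeholders, not values from the slides.

```python
# Convert a raw cut-score to a z-score before entering the agreement/kappa
# lookup tables. The 0.5 is the continuity correction in the formula above.

def cut_score_z(cut_score, mean, sd):
    return (cut_score - 0.5 - mean) / sd

# Hypothetical mean and SD, for illustration only:
print(round(cut_score_z(27, 28.0, 5.0), 2))  # -0.3
```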

Estimate the dependability for the HELP Reading test

- Assume a cut-point of 60%. What is the raw score? (Answer: 27)
- z = -0.36
- Look at Table 9.1. What is the approximate value of the agreement coefficient?
- Look at Table 9.2. What is the approximate value of the kappa coefficient?

Squared-error loss agreement

- Sensitive to degrees of mastery / non-mastery
- A short-cut form of a generalizability study
- Classical test theory: OS = TS + E
- Generalizability theory: OS = TS + (E1 + E2 + ... + Ek)

Brennan, R. (1995). Handout from generalizability theory workshop.

Phi (lambda) dependability index

Calculated from:

- Number of items
- Cut-point
- Mean of proportion scores
- Standard deviation of proportion scores
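These four quantities feed Brennan's single-administration short-cut for Phi(lambda). The formula below is the commonly cited version of that estimate; the exact variance definition, and the values used here (which are hypothetical), should be checked against Brown (1996).

```python
# Phi(lambda): dependability of classifications at cut-point `lam`,
# computed from the number of items and the mean and SD of proportion scores.
# Formula: 1 - [1/(n-1)] * [M(1-M) - S^2] / [(M - lam)^2 + S^2]

def phi_lambda(n_items, lam, mean_p, sd_p):
    var_p = sd_p ** 2
    error = (mean_p * (1 - mean_p) - var_p) / ((mean_p - lam) ** 2 + var_p)
    return 1 - error / (n_items - 1)

# Hypothetical values: 30 items, cut-point .60, mean .75, SD .12
print(round(phi_lambda(30, 0.60, 0.75, 0.12), 2))  # 0.84
```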

Domain score dependability

- Does not depend on the cut-point for its calculation
- “estimates the stability of an individual’s score or proportion correct in the item domain, independent of any mastery standard” (Berk, 1984, p. 252)
- Assumes a well-defined domain of behaviors

Confidence intervals

- Analogous to the SEM for NRTs
- Interpreted in proportion-correct units rather than raw-score units
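As a sketch, a proportion-score band can be formed the same way as an NRT SEM band; `sem_p` here is an assumed SEM already expressed in proportion-correct units, and the score used is hypothetical.

```python
# Confidence band around a proportion-correct score, clipped to [0, 1].

def proportion_ci(p_hat, sem_p, z=1.96):
    lower = max(0.0, round(p_hat - z * sem_p, 3))
    upper = min(1.0, round(p_hat + z * sem_p, 3))
    return lower, upper

# Hypothetical score of .72 with an SEM of .05 (proportion units):
print(proportion_ci(0.72, 0.05))  # (0.622, 0.818)
```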

Reliability Recap

- Longer tests are better than shorter tests
- Well-written items are better than poorly written items
- Items with high discrimination (ID for NRTs, B-index for CRTs) are better
- A test made up of similar items is better
- For CRTs, a test that is closely related to the objectives is better
- For NRTs, a test that is well-centered and spreads students out is better
