
CRT Dependability

Consistency for criterion-referenced decisions


Challenges for CRT dependability

  • Raw scores may not show much variation (skewed distributions)

  • CRT decisions are based on acceptable performance rather than relative position

  • A measure of the dependability of the classification (i.e., master / non-master) is needed


Approaches using cut-score

  • Threshold loss agreement

    • In a test-retest situation, how consistently are students classified as master / non-master?

    • All misclassifications are considered equally serious

  • Squared error loss agreement

    • How consistent are the classifications?

    • The consequences of misclassifying students far above or far below the cut-point are considered more serious

Berk, R. A. (1984). Selecting the index of reliability. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. 231-266). Baltimore, MD: The Johns Hopkins University Press.


Issues with cut-scores

  • “The validity of the final classification decisions will depend as much upon the validity of the standard as upon the validity of the test content” (Shepard, 1984, p. 169)

  • “Just because excellence can be distinguished from incompetence at the extremes does not mean excellence and incompetence can be unambiguously separated at the cut-off.” (p. 171)

Shepard, L. A. (1984). Setting performance standards. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. 169-198). Baltimore, MD: The Johns Hopkins University Press.


Methods for determining cut-scores

  • Method 1: expert judgments about the performance of hypothetical students on the test

  • Method 2: test performance of actual students


Setting cut-scores

(Slide reproduces material from Brown, 1996, p. 257; not captured in this transcript)


Institutional decisions

(Slide reproduces material from Brown, 1996, p. 260; not captured in this transcript)


Agreement coefficient (po), kappa

Po = (A + D) / N

Pchance = [(A + B)(A + C) + (C + D)(B + D)] / N²

K = (Po – Pchance) / (1 – Pchance)

Master / non-master classifications across two administrations:

                          Admin 2: master    Admin 2: non-master    Total
Admin 1: master               A = 77               B = 6              83
Admin 1: non-master           C = 6                D = 21             27
Total                            83                   27             110

Po = (77 + 21) / 110 = .89

Pchance = [(83)(83) + (27)(27)] / 110² = .63

K = (.89 – .63) / (1 – .63) = .70
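
As a check on the arithmetic above, here is a minimal Python sketch (not part of the original slides) that computes po, pchance, and kappa from the four cells of the classification table:

```python
def threshold_agreement(a, b, c, d):
    """Agreement coefficient (po) and kappa for a 2 x 2 master / non-master table.

    a = master on both administrations, d = non-master on both,
    b and c = classified differently on the two administrations.
    """
    n = a + b + c + d
    p_o = (a + d) / n                                            # observed agreement
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2  # chance agreement
    kappa = (p_o - p_chance) / (1 - p_chance)                    # agreement beyond chance
    return p_o, p_chance, kappa

# Cell values from the slide: A = 77, B = 6, C = 6, D = 21, N = 110
p_o, p_chance, kappa = threshold_agreement(77, 6, 6, 21)
print(round(p_o, 2), round(p_chance, 2), round(kappa, 2))
# 0.89 0.63 0.71  (the slide's .70 comes from rounding po and pchance before dividing)
```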


Short-cut methods for one administration

  • Calculate an NRT reliability coefficient

    • Split-half, KR-20, Cronbach alpha

  • Convert the cut-score to a standardized score (a worked sketch follows this list)

    • z = (cut-score – .5 – mean) / SD

  • Use Table 7.9 to estimate Agreement

  • Use Table 7.10 to estimate Kappa
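
A minimal Python sketch of the first two steps follows; the 0/1 response matrix, the cut-score of 3, and the function names are hypothetical, and the last step (reading agreement and kappa out of the look-up tables) is not reproduced here. For dichotomous items, Cronbach's alpha equals KR-20.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a persons x items matrix of scored responses."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def standardized_cut(cut_score, total_scores):
    """z = (cut-score - .5 - mean) / SD, the standardized cut-point."""
    total_scores = np.asarray(total_scores, dtype=float)
    return (cut_score - 0.5 - total_scores.mean()) / total_scores.std(ddof=1)

# Hypothetical 0/1 responses (5 examinees x 4 items) and a cut-score of 3
responses = np.array([[1, 1, 1, 0],
                      [1, 0, 1, 1],
                      [0, 0, 1, 0],
                      [1, 1, 1, 1],
                      [0, 1, 0, 0]])
alpha = cronbach_alpha(responses)
z = standardized_cut(3, responses.sum(axis=1))
# alpha and z are then used to read approximate agreement / kappa from the tables
```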


Estimate the dependability for the HELP Reading test

Assume a cut-point of 60%. What is the raw score? (Answer: 27)

The corresponding standardized cut-score: z = -0.36

Look at Table 9.1. What is the approximate value of the agreement coefficient?

Look at Table 9.2. What is the approximate value of the kappa coefficient?


Squared-error loss agreement

  • Sensitive to degrees of mastery / non-mastery

  • Short-cut form of generalizability study

  • Classical Test Theory

    • OS = TS + E (observed score = true score + a single undifferentiated error term)

  • Generalizability Theory

    • OS = TS + (E1 + E2 + … + Ek) (error partitioned into k identifiable sources)

Brennan, R. (1995). Handout from generalizability theory workshop.
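
To make the generalizability side concrete, a minimal persons x items G-study sketch (not from the slides; the function name and any data fed to it are illustrative) estimates the person, item, and residual variance components that squared-error loss indices are built from:

```python
import numpy as np

def g_study_components(scores):
    """Variance components for a single-facet persons x items G-study.

    scores: persons x items matrix of item scores (e.g., 0/1).
    Returns (person, item, residual) variance component estimates.
    """
    scores = np.asarray(scores, dtype=float)
    n_p, n_i = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    item_means = scores.mean(axis=0)

    ms_p = n_i * ((person_means - grand) ** 2).sum() / (n_p - 1)   # persons mean square
    ms_i = n_p * ((item_means - grand) ** 2).sum() / (n_i - 1)     # items mean square
    resid = scores - person_means[:, None] - item_means[None, :] + grand
    ms_res = (resid ** 2).sum() / ((n_p - 1) * (n_i - 1))          # residual mean square

    var_res = ms_res                          # person x item interaction confounded with error
    var_p = max((ms_p - ms_res) / n_i, 0.0)   # person (universe-score) variance
    var_i = max((ms_i - ms_res) / n_p, 0.0)   # item difficulty variance
    return var_p, var_i, var_res
```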


Phi (lambda) dependability index

The formula on the slide is computed from four quantities:

  • number of items

  • cut-point (lambda)

  • mean of proportion scores

  • standard deviation of proportion scores
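
One widely used single-administration short-cut estimate is built from exactly these four quantities. Since the slide's formula image is not reproduced in the transcript, the sketch below should be read as an assumption rather than a transcription; the proportion scores and the 40-item test length are made up:

```python
import numpy as np

def phi_lambda(prop_scores, cut_point, n_items):
    """Phi(lambda): single-administration dependability of mastery decisions.

    phi(lambda) = 1 - [1 / (n - 1)] * [mean(1 - mean) - var] / [(mean - cut)^2 + var],
    with mean and var taken over examinees' proportion-correct scores.
    """
    p = np.asarray(prop_scores, dtype=float)
    mean_p = p.mean()
    var_p = p.var()                            # N in the denominator
    error = mean_p * (1 - mean_p) - var_p
    signal = (mean_p - cut_point) ** 2 + var_p
    return 1 - error / ((n_items - 1) * signal)

# Hypothetical proportion scores on a 40-item test with a cut-point of .60
scores = np.array([.55, .62, .70, .48, .80, .66, .73, .59])
print(round(phi_lambda(scores, 0.60, 40), 2))   # about 0.48 for these made-up scores
```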


Domain score dependability

  • Does not depend on cut-point for calculation

  • “estimates the stability of an individual’s score or proportion correct in the item domain, independent of any mastery standard” (Berk, 1984, p. 252)

  • Assumes a well-defined domain of behaviors
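
Using variance components such as those from the G-study sketch above, the domain-score dependability index (phi) is estimated with no reference to a cut-point, which is the sense in which it does not depend on one; a minimal sketch:

```python
def phi_dependability(var_p, var_i, var_res, n_items):
    """Phi: dependability of domain (proportion-correct) scores.

    Absolute error variance = (item + residual components) / number of items,
    so phi = person variance / (person variance + absolute error variance).
    """
    abs_error = (var_i + var_res) / n_items
    return var_p / (var_p + abs_error)

# e.g., with components from g_study_components(scores) on an n-item form:
# phi = phi_dependability(var_p, var_i, var_res, n_items=scores.shape[1])
```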



Confidence intervals

  • Analogous to SEM for NRTs

  • Interpreted as a proportion correct score rather than raw score
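
The deck does not give the formula. One common single-administration estimate of this SEM-like quantity (an assumption here, not taken from the slides) is the square root of [mean(1 - mean) - variance] / (n - 1) computed on proportion scores, i.e. the error term used in phi(lambda) above; a sketch under that assumption:

```python
import numpy as np

def crt_confidence_interval(prop_scores, n_items):
    """Absolute-error SEM analogue for a CRT, in the proportion-correct metric."""
    p = np.asarray(prop_scores, dtype=float)
    mean_p = p.mean()
    var_p = p.var()                            # N in the denominator, as above
    return np.sqrt((mean_p * (1 - mean_p) - var_p) / (n_items - 1))

# Band a hypothetical examinee's proportion score of .65 on a 40-item CRT
scores = np.array([.55, .62, .70, .48, .80, .66, .73, .59])
ci = crt_confidence_interval(scores, 40)
lower, upper = 0.65 - ci, 0.65 + ci            # interpreted as proportion-correct scores
```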


Reliability Recap

  • Longer tests are better than shorter tests

  • Well-written items are better than poorly written items

  • Items with high discrimination (ID for NRT, B-index for CRT) are better

  • A test made up of similar items is better

  • CRTs – a test that is related to the objectives is better

  • NRTs – a test that is well-centered and spreads out students is better

