1. Psychometrics Reliability: Refers to the dependability, consistency, and stability of a test
2. Measurements are more or less variable from one occasion to the next.
If they are reliable, then they are relatively stable
If unreliable, then they are relatively unstable
3. Example: You have an orthopedic patient and it is necessary to frequently measure the ROM in the wrist.
Usually you do the measurement, but on some occasions another therapist measures for you.
If the test is unreliable, differences in performance may be due to the raters rather than due to the patient's performance.
4. Think about reliability in terms of accuracy
How close are our measurements to the "true" value?
Observed score = True score + Error
Reliability reflects the size of the error term in the above equation.
A more reliable test will have smaller error and the observed score will be closer to the true score.
5. Reliable Vs. Unreliable Test
6. Sources of Measurement Error Individual Factors: Health, motivation, mental efficiency, concentration, luck
Situational Factors: Room environment
Subjectivity: A problem with non-objective tests
Instrumental Factors: Errors in equipment
7. Reliability Coefficient Is a correlation coefficient
Ranges from -1.0 to +1.0
Reliabilities in the (-) range are not reported, and if obtained, may indicate a severe problem with the test.
8. Types of Reliability 1. Test-Retest
2. Alternate Form
3. Internal Consistency
9. Test-Retest Reliability Results in a coefficient of test stability
Refers to how stable (or consistent) a test is from one administration to the next (by the same examiner).
It tells you how stable the test results are over a time period.
10. Example You are evaluating a child with a test that looks at tactile hypersensitivity. If the test is unreliable, then your test results will differ each time you use it.
The problem with this is that you can't tell if the test results are different due to a poorly designed test, or due to differences in the patient.
11. Is Test-Retest Reliability Necessary for Every Test? No, in fact it isn't appropriate for every test.
If the test doesn't measure stable traits, then you wouldn't expect to have high test-retest reliability.
For example: tests that measure attitudes would have lower test-retest reliability because attitudes vary over time.
12. How is Test-Retest Reliability Computed? Need one group of subjects
Give the test to all the subjects
Wait a period of time
Give the same test to the same group (using the same examiner)
Correlate the results of the two test administrations
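The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the Pearson correlation is used as the coefficient; the scores and the helper name pearson_r are hypothetical and not part of the original slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores: the same five subjects, tested twice by the same examiner.
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]

# The correlation between the two administrations is the test-retest coefficient.
test_retest_r = pearson_r(first_administration, second_administration)
print(round(test_retest_r, 2))
```

A coefficient near +1.0 means subjects kept roughly the same rank order across the two administrations.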
13. How Long to Wait Between Administrations? Depends on the nature of the test
For developmental tests wait less time (no more than a couple of weeks or so).
If you wait too long you can't tell if differences in scores are due to the test being unreliable or due to developmental factors.
14. For tests that measure stable characteristics you can wait longer.
You would probably not wait more than two months in any case.
The longer the time period, the more variation there will be in the scores from one administration to the next.
15. Alternate Form Reliability Also called parallel-form or equivalent form
Tells you about the similarity between two forms of a test.
Only appropriate (or necessary) if you need to have multiple forms of the same test.
Not usually an issue for tests used in OT.
16. How to Compute Alternate Form Reliability Construct two forms of a test
Forms need to have the same number of questions
Each item on one test must be related to an item on the other test
Tests should have the same means and SDs when given to a group.
17. Have one group of subjects
Half take form A of the test, then take form B
The other half take form B first, then form A.
This is called counterbalancing. It controls for order effects such as fatigue, inability to finish the test, etc.
Correlate scores between forms A and B to get reliability coefficient.
18. Internal Consistency This type of reliability tells you if a test is homogeneous in its content
1. Split-half: Performance on the two halves of the test is compared.
Example: test with 100 items
Even items: 1st form
Odd items: 2nd form
Calculate two scores for each person: one for each form of the test
Correlate the two scores (odd and even)
Each person only has to take one test.
Can determine reliability with one test, one administration, one group of subjects
Reliability is affected by the number of items. The larger the number, the better the reliability (in general).
This is adjusted for using the Spearman-Brown Prophecy Formula.
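The split-half procedure and the Spearman-Brown adjustment can be sketched as follows. This is a minimal illustration with hypothetical data (six people, six right/wrong items); the function names are made up for the example. The adjustment used is the standard Spearman-Brown formula for a test twice the length of each half: r_full = 2 * r_half / (1 + r_half).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per person (1 = right, 0 = wrong)."""
    odd_totals = [sum(person[0::2]) for person in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(person[1::2]) for person in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown: estimate the reliability of the full-length test
    # from the correlation between its two half-length forms.
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full

# Hypothetical responses, one row per person.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
]

r_half, r_full = split_half_reliability(responses)
print(round(r_half, 3), round(r_full, 3))
```

The adjusted value is higher than the raw half-test correlation, which reflects the point that longer tests are generally more reliable.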
20. 2. Kuder-Richardson (KR 20, KR 21)
Also a good way to obtain a reliability coefficient using only one test administration and one group of subjects.
These procedures are equivalent to doing numerous split-half estimates (splitting the test up a different way each time) and then taking the average of all these split-half estimates.
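In practice KR-20 is usually computed from a closed-form expression rather than by literally averaging split halves. The sketch below uses the commonly cited formula KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores), where k is the number of items, p is the proportion answering an item correctly, and q = 1 - p. The formula is brought in for illustration and is not stated in the slides; it assumes items scored right/wrong, and the data are hypothetical.

```python
def kr20(item_scores):
    """KR-20 for right/wrong items; item_scores holds one list of 0/1 scores per person."""
    n_people = len(item_scores)
    n_items = len(item_scores[0])

    # Sum of p * q across items (p = proportion correct, q = 1 - p).
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(person[i] for person in item_scores) / n_people
        sum_pq += p * (1 - p)

    # Population variance of the total scores, to match the p * q item variances.
    totals = [sum(person) for person in item_scores]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)

# Hypothetical responses, one row per person.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
]

kr20_value = kr20(responses)
print(round(kr20_value, 3))
```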
21. Inter-Rater Reliability Estimates how consistent the test is when used by different raters.
Percent agreement between raters
Correlation of raters' scores
Kappa statistic (percent agreement that is corrected for chance)
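Percent agreement and the chance-corrected kappa statistic can be compared on the same data. The sketch below assumes two raters and uses Cohen's kappa, kappa = (observed agreement - chance agreement) / (1 - chance agreement); the ratings and function names are hypothetical.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters gave the same rating."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own base rates.
    expected = sum(
        (counts_a[category] / n) * (counts_b[category] / n)
        for category in set(rater_a) | set(rater_b)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail ratings of the same ten patients by two therapists.
rater_a = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass", "fail", "pass"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(round(agreement, 2), round(kappa, 2))
```

Kappa comes out lower than raw percent agreement because some of the raters' matches would be expected by chance alone.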
22. Factors that Influence Reliability Length of the test (longer test, higher reliability)
Range of scores in the sample (reliability higher if sample is more heterogeneous)
Difficulty of a test (items of average difficulty give the most information and make the test more reliable)
23. Reliability coefficient also affected by what method was used to estimate it.
Parallel form generally higher than test-retest
Length of time between tests affects test-retest
Internal consistency may be high, but it doesn't tell you about stability of the test over time or stability between raters.
24. How High Should Reliability Be? Depends on the type of instrument.
Tests in the cognitive domain should have reliability coefficients in the .90s
Tests in the affective domain should have reliabilities in the .80s and above
Motor tests frequently in the .79 to .80 range.
25. What About Tests with Low Reliability? Some tests may have overall reliability that is sufficient, but have some subtests with low coefficients.
How should you use these tests?
Be careful of using them for placement decisions
Try to use another test in addition so that you can best-estimate the patient's abilities.
26. Standard Error of Measurement (SEM) Reliability is a statistic about a group. It doesn't tell you anything about an individual's performance.
SEM is related to reliability and is used to make inferences about an individual.
We can use it to better estimate a patient's true performance on a test.
27. Calculating SEM SEM = SD × √(1 − reliability)
The higher the reliability, the smaller the SEM
The SEM can never be larger than the SD
If a test is completely unreliable (reliability = 0), then the SEM is equal to the SD
The poorer the reliability, the larger the SEM.
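The relationship between reliability and SEM can be checked with a short calculation. This is a minimal sketch of the formula above; the SD of 15 and the reliability values are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)

# Hypothetical test with SD = 15, at several reliability levels.
sd = 15
for reliability in (0.0, 0.50, 0.80, 0.91, 1.0):
    sem = standard_error_of_measurement(sd, reliability)
    print(reliability, round(sem, 2))
```

At reliability 0 the SEM equals the SD, and it shrinks toward 0 as reliability approaches 1.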
28. Interpreting SEM SEM is interpreted the same way that SDs are interpreted.
Think about an individual taking a test many times.
These test results (observed scores) can be plotted on a graph and will eventually form a normal curve.
The SD for this distribution is the SEM
29. Since the SEM is like a SD, we can use the same percentages to interpret scores
About 68% of the time the true score will be within +/- 1 SEM of the observed score
About 95% of the time the true score will be within +/- 2 SEM of the observed score
About 99.7% of the time the true score will be within +/- 3 SEM of the observed score
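These percentages translate directly into score ranges. The sketch below builds the +/- 1, 2, and 3 SEM bands around a single observed score; the observed score of 100, the SEM of 4.5, and the function name score_band are hypothetical.

```python
def score_band(observed_score, sem, n_sems):
    """Range of observed_score plus or minus n_sems standard errors of measurement."""
    margin = n_sems * sem
    return observed_score - margin, observed_score + margin

# Hypothetical patient: observed score of 100 on a test whose SEM is 4.5.
observed_score = 100
sem = 4.5

# Approximate coverage for each band, taken from the normal-curve percentages.
for n_sems, coverage in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    low, high = score_band(observed_score, sem, n_sems)
    print(coverage, low, high)
```

Reporting the band instead of the single score makes the uncertainty in the measurement explicit.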
30. SEM gives you a confidence interval around a person's score
Using this interval, you can avoid reporting one score (in some cases)
Instead, you report a range of scores, as is seen in a profile.
In a highly reliable test, the SEM is relatively smaller, and the confidence intervals around a person's score are smaller.
31. Conversely, when a test is very unreliable, the SEM is relatively larger.
This means that the confidence intervals around a person's score will be larger.
You are more uncertain of their true performance on the test.
32. The SEM is a powerful concept and will allow you to more accurately determine a test score.
SEM should be reported in the test manual, and you should be able to use it for any test or subtest.