Chapter 3 Reliability and Objectivity

Chapter 3 Outline

- Selecting a Criterion Score
- Types of Reliability
- Reliability Theory
- Estimating Reliability – Intraclass R
- Spearman-Brown Prophecy Formula
- Standard Error of Measurement
- Objectivity
- Reliability of Criterion-referenced Tests
- Reliability of Difference Scores

Objectivity

- Interrater Reliability
- Agreement of competent judges about the value of a measure.

Reliability

- Dependability of scores
- Consistency
- Degree to which a test is free from measurement error.

Selecting a Criterion Score

- Criterion score – the measure used to indicate a person’s ability.
- Can be based on the mean score or the best score.

- Mean Score – average of all trials.
- Usually a more reliable estimate of a person’s true ability.

- Best Score – optimal score a person achieves on any one trial.
- May be used when criterion score is to be used as an indicator of maximum possible performance.

Potential Methods to Select a Criterion Score

- Mean of all trials.
- Best score of all trials.
- Mean of selected trials based on trials on which group scored best.
- Mean of selected trials based on trials on which individual scored best (i.e., omit outliers).
Appropriate method to use depends on the situation.

Norm-referenced Test

- Designed to reflect individual differences.

In Norm-referenced Framework

- Reliability - ability to detect reliable differences between subjects.

Types of Reliability

- Stability
- Internal Consistency

Stability (Test-retest) Reliability

- Each subject is measured with same instrument on two or more different days.
- Scores are then correlated.
- An intraclass correlation should be used.

Internal Consistency Reliability

- Consistent rate of scoring throughout a test or from trial to trial.
- All trials are administered in a single day.
- Trial scores are then correlated.
- An intraclass correlation should be used.

Sources of Measurement Error

- Lack of agreement among raters (i.e., objectivity).
- Lack of consistent performance by person.
- Failure of instrument to measure consistently.
- Failure of tester to follow standardized procedures.

X = T + E

Observed score = True score + Error

σ²X = σ²T + σ²E

Observed score variance = True score variance + Error variance

Reliability = σ²T ÷ σ²X

Reliability = (σ²X - σ²E) ÷ σ²X
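As a quick numeric sketch of these definitions (the variance figures below are invented for illustration):

```python
# Reliability-theory sketch: reliability = true-score variance / observed variance
#                                        = (observed - error variance) / observed variance.
# The variances below are made-up illustration values.
var_observed = 100.0   # observed score variance
var_error = 15.0       # error variance

reliability = (var_observed - var_error) / var_observed
print(reliability)  # 0.85
```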

Reliability depends on:

- Decreasing measurement error
- Detecting individual differences among people
- ability to discriminate among different ability levels

Reliability

- Ranges from 0 to 1.00
- When R = 0, there is no reliability.
- When R = 1.00, there is maximum reliability.

- ANOVA is used to partition the variance of a set of scores.
- Parts of the variance are used to calculate the intraclass R.

Estimating Reliability

- Intraclass correlation from one-way ANOVA:
- R = (MSA – MSW) ÷ MSA
- MSA = Mean square among subjects (also called between subjects)
- MSw = Mean square within subjects
- Mean square = variance estimate

- This represents reliability of the mean test score for each person.
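The one-way computation can be sketched directly from these definitions. The score matrix below is invented illustration data (4 subjects × 3 trials):

```python
# Intraclass R from a one-way ANOVA: R = (MSA - MSW) / MSA.
# Scores are made-up illustration data (rows = subjects, columns = trials).
scores = [
    [9, 8, 9],   # subject 1, three trials
    [6, 7, 6],   # subject 2
    [8, 8, 7],   # subject 3
    [5, 6, 5],   # subject 4
]
n = len(scores)        # number of subjects
k = len(scores[0])     # number of trials
grand_mean = sum(sum(row) for row in scores) / (n * k)

# Among-subjects mean square (MSA), df = n - 1
subject_means = [sum(row) / k for row in scores]
ss_among = k * sum((m - grand_mean) ** 2 for m in subject_means)
msa = ss_among / (n - 1)

# Within-subjects mean square (MSW), df = n(k - 1)
ss_within = sum(
    (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
)
msw = ss_within / (n * (k - 1))

R = (msa - msw) / msa
print(round(R, 3))  # reliability of the mean score across k trials
```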

Estimating Reliability

- Intraclass correlation from two-way ANOVA:
- R = (MSA – MSR) ÷ MSA
- MSA = Mean square among subjects (also called between subjects)
- MSR = Mean square residual
- Mean square = variance estimate

- Used when trial-to-trial variance is not considered measurement error (e.g., Likert-type scale).
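A sketch of the two-way version, in which the trial-to-trial (column) variance is separated out so it is not counted as error. The scores are invented illustration data:

```python
# Intraclass R from a two-way ANOVA: R = (MSA - MSR) / MSA,
# where MSR is the residual mean square after removing trial variance.
# Scores are made-up illustration data (rows = subjects, columns = trials).
scores = [
    [9, 8, 9],
    [6, 7, 6],
    [8, 8, 7],
    [5, 6, 5],
]
n, k = len(scores), len(scores[0])
grand = sum(sum(r) for r in scores) / (n * k)

subj_means = [sum(r) / k for r in scores]
trial_means = [sum(r[j] for r in scores) / n for j in range(k)]

ss_among = k * sum((m - grand) ** 2 for m in subj_means)    # subjects
ss_trials = n * sum((m - grand) ** 2 for m in trial_means)  # trials
ss_total = sum((x - grand) ** 2 for r in scores for x in r)
ss_resid = ss_total - ss_among - ss_trials                  # residual

msa = ss_among / (n - 1)
msr = ss_resid / ((n - 1) * (k - 1))
R = (msa - msr) / msa
print(round(R, 3))
```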

What is acceptable reliability?

- Depends on:
- age
- gender
- experience of people tested
- size of reliability coefficients others have obtained
- number of days or trials
- stability vs. internal consistency coefficient

What is acceptable reliability?

- Most physical measures are stable from day to day.
- Expect test-retest Rxx between .80 and .95.

- Expect lower Rxx for tests with an accuracy component (e.g., .70).
- For written test, want RXX > .70.
- For psychological instruments, want RXX > .70.
- Critical issue: time interval between 2 test sessions for stability reliability estimates. 1 to 3 days apart for physical measures is usually appropriate.

- Type of test.
- Maximum-effort test: expect Rxx around .80
- Accuracy-type test: expect Rxx around .70
- Psychological inventories: expect Rxx around .70

- Range of ability.
- Rxx higher for heterogeneous groups than for homogeneous groups.

- Test length.
- Longer test, higher Rxx

- Scoring accuracy.
- Person administering test must be competent.

- Test difficulty.
- Test must discriminate among ability levels.

- Test environment, organization, and instructions.
- favorable to good performance; subjects motivated to do well, ready to be tested, and knowing what to expect.

- Fatigue
- decreases Rxx

- Practice trials
- increase Rxx

- Also known as Cronbach’s alpha
- Most widely used with attitude instruments
- Same as two-way intraclass R through ANOVA
- An estimate of Rxx of a criterion score that is the sum of trial scores in one day

Rα = [K ÷ (K - 1)] × [(S²x - S²trials) ÷ S²x]

• K = # of trials or items

• S²x = variance for criterion score (sum of all trials)

• S²trials = sum of variances for all trials
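This formula can be sketched numerically; the 5-person × 3-trial scores below are invented for illustration:

```python
# Coefficient alpha sketch:
# R_alpha = [K/(K-1)] * [(S2_x - sum of trial variances) / S2_x].
# Scores are made-up illustration data (rows = people, columns = trials).
trials = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [3, 3, 2],
    [4, 4, 5],
]

def variance(xs):
    # Population variance (divide by N), matching the slide's S2 notation.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

K = len(trials[0])                     # number of trials/items
totals = [sum(row) for row in trials]  # criterion score = sum of trials
s2_x = variance(totals)                # variance of criterion scores
s2_trials = sum(variance([row[j] for row in trials]) for j in range(K))

alpha = (K / (K - 1)) * ((s2_x - s2_trials) / s2_x)
print(round(alpha, 3))
```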

- Estimate of internal consistency reliability by determining how all items on a test relate to the total test.
- KR formulas 20 and 21 are typically used to estimate Rxx of knowledge tests.
- Used with dichotomous items (scored as right or wrong).
- KR20 = coefficient alpha

KR20

- KR20 = [K ÷ (K - 1)] × [(S²x - Σpq) ÷ S²x]
• K = # of trials or items

• S²x = variance of scores

• p = proportion answering item right

• q = proportion answering item wrong

• Σpq = sum of pq products for all K items

KR20 Example

If Mean = 2.45 and SD = 1.2 (so S²x = 1.2² = 1.44), what is KR20?

| Item | p | q | pq |
| --- | --- | --- | --- |
| 1 | .50 | .50 | .25 |
| 2 | .25 | .75 | .1875 |
| 3 | .80 | .20 | .16 |
| 4 | .90 | .10 | .09 |

Σpq = 0.6875

KR20 = (4 ÷ 3) × (1.44 – 0.6875) ÷ 1.44

KR20 = .70
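The worked example can be checked in a few lines, using the item difficulties and SD from the slide:

```python
# KR20 check using the slide's worked example:
# four items with p = .50, .25, .80, .90 and test SD = 1.2.
p = [0.50, 0.25, 0.80, 0.90]   # proportion answering each item right
q = [1 - pi for pi in p]       # proportion answering each item wrong
sum_pq = sum(pi * qi for pi, qi in zip(p, q))

K = len(p)                     # number of items
s2_x = 1.2 ** 2                # test variance = SD squared = 1.44

kr20 = (K / (K - 1)) * ((s2_x - sum_pq) / s2_x)
print(round(sum_pq, 4), round(kr20, 2))
```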

KR21

- If all test items are assumed to be equally difficult, KR20 can be simplified to KR21:
KR21 = [(K × S²) - (Mean × (K - Mean))] ÷ [(K - 1) × S²]

• K = # of trials or items

• S2 = variance of test

• Mean = mean of test
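Applying KR21 to the same illustrative numbers as the KR20 example (K = 4, Mean = 2.45, SD = 1.2) shows the simplification at work; because those item difficulties are not actually equal, KR21 comes out below KR20 here:

```python
# KR21 sketch: KR21 = [(K * S2) - Mean * (K - Mean)] / [(K - 1) * S2],
# using the illustrative KR20-example numbers (K = 4, Mean = 2.45, SD = 1.2).
K, mean, sd = 4, 2.45, 1.2
s2 = sd ** 2   # test variance

kr21 = ((K * s2) - mean * (K - mean)) / ((K - 1) * s2)
print(round(kr21, 3))
```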

Equivalence Reliability (Parallel Forms)

- Two equivalent forms of a test are administered to same subjects.
- Scores on the two forms are then correlated.

Spearman-Brown Prophecy formula

- Used to estimate rxx of a test that is changed in length.
- rkk = (k × r11) ÷ [1 + (k - 1) × r11]
- k = number of times test is changed in length.
- k = (# trials want) ÷ (# trials have)
- r11 = reliability of test you’re starting with
- Spearman-Brown formula will give an estimate of maximum reliability that can be expected (upper bound estimate).
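A sketch of the formula; the starting reliability and test lengths below are hypothetical:

```python
# Spearman-Brown sketch: r_kk = (k * r11) / (1 + (k - 1) * r11).
# Hypothetical numbers: a 6-trial test with r11 = .70 lengthened to 12 trials.
def spearman_brown(r11, k):
    # k = (number of trials you want) / (number of trials you have)
    return (k * r11) / (1 + (k - 1) * r11)

k = 12 / 6                      # doubling the test length
r_kk = spearman_brown(0.70, k)
print(round(r_kk, 3))
```

Running the formula with k < 1 estimates the reliability of a shortened test the same way.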

Standard Error of Measurement (SEM)

- Degree you expect test score to vary due to measurement error.
- Standard deviation of a test score.
- SEM = Sx × √(1 - Rxx)
• Sx = standard deviation of group

• Rxx = reliability coefficient

- Small SEM indicates high reliability

SEM

- Example: written test with Sx = 5 and Rxx = .88
- SEM = 5 × √(1 - .88) = 1.73
- Confidence Interval:
68%: X ± 1.00 (SEM)

95%: X ± 1.96 (SEM)

- If X = 23: 23 + 1.73 = 24.73 and 23 - 1.73 = 21.27

- 68% confident true score is between 21.27 and 24.73
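The slide's numbers can be reproduced directly:

```python
import math

# SEM sketch using the slide's example: Sx = 5, Rxx = .88, observed X = 23.
sx, rxx, x = 5, 0.88, 23
sem = sx * math.sqrt(1 - rxx)

ci_68 = (x - 1.00 * sem, x + 1.00 * sem)   # 68% confidence interval
ci_95 = (x - 1.96 * sem, x + 1.96 * sem)   # 95% confidence interval
print(round(sem, 2), [round(v, 2) for v in ci_68])
```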

Objectivity (Rater Reliability)

- Degree of agreement between raters.
- Depends on:
- clarity of scoring system.
- degree to which judge can assign scores accurately.

- If test is highly objective, objectivity is obvious and rarely calculated.
- As subjectivity increases, test developer should report estimate of objectivity.

Two Types of Objectivity:

- Intrajudge objectivity
- consistency in scoring when test user scores same test two or more times.

- Interjudge objectivity
- consistency between two or more independent judgments of same performance.

- Calculate objectivity like reliability, but substitute judges’ scores for trials.

Criterion-referenced Test

- A test used to classify a person as proficient or nonproficient (pass or fail).

In Criterion-referenced Framework:

- Reliability - defined as consistency of classification.

Reliability of Criterion-referenced Test Scores

- To estimate reliability, a double-classification or contingency table is formed.

- Most popular way to estimate Rxx of CRT.
- Pa = (A + D) ÷ (A + B + C + D)
- Pa does not take into account that some consistent classifications could happen by chance.

|  | Day 2: Pass | Day 2: Fail |
| --- | --- | --- |
| Day 1: Pass | 45 (A) | 12 (B) |
| Day 1: Fail | 8 (C) | 35 (D) |

Pa = (A + D) ÷ (A + B + C + D)

Pa = (45 + 35) ÷ (45 + 12 + 8 + 35)

Pa = 80 ÷ 100 = .80
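In code, using the table's cell counts:

```python
# Proportion of agreement from the slide's contingency table:
# A = pass both days, D = fail both days, B and C = inconsistent.
A, B, C, D = 45, 12, 8, 35

pa = (A + D) / (A + B + C + D)
print(pa)  # 0.8
```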

- Estimate of CRT Rxx with correction for chance agreements.
K = (Pa - Pc) ÷ (1 - Pc)

• Pa = Proportion of Agreement

• Pc = Proportion of Agreement expected by chance

Pc = [(A+B)(A+C) + (C+D)(B+D)] ÷ (A+B+C+D)²


Pc = [(A+B)(A+C) + (C+D)(B+D)] ÷ (A+B+C+D)²

Pc = [(45+12)(45+8)+(8+35)(12+35)]÷(100)2

Pc = [(57)(53)+(43)(47)]÷(10,000) = 5,042÷10,000

Pc = .5042

- K = (Pa - Pc) ÷ (1 - Pc)
- K = (.80 - .5042) ÷ (1 - .5042)
- K = .597
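The kappa computation, using the same cell counts:

```python
# Kappa sketch: chance-corrected agreement from the slide's table.
A, B, C, D = 45, 12, 8, 35
n = A + B + C + D

pa = (A + D) / n                                        # observed agreement
pc = ((A + B) * (A + C) + (C + D) * (B + D)) / n ** 2   # chance agreement
kappa = (pa - pc) / (1 - pc)
print(round(kappa, 3))
```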

- Kq may be more appropriate than K when proportion of people passing a criterion-referenced test is not predetermined.
- Most situations in exercise science do not predetermine the number of people who will pass.

- Kq = (Pa – 1/q) ÷ (1 – 1/q)
- q = number of classification categories
- If pass-fail, q = 2

- Kq = (.80 - .50) ÷ (1 - .50)
- Kq = .60

- Interpreted same as K.
- When proportion of masters = .50, Kq = K.
- Otherwise, Kq > K.
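And the modified kappa, with Pa = .80 from the earlier table and q = 2 categories:

```python
# Modified kappa sketch: Kq = (Pa - 1/q) / (1 - 1/q),
# with q = 2 classification categories (pass/fail) and Pa = .80.
pa, q = 0.80, 2
kq = (pa - 1 / q) / (1 - 1 / q)
print(round(kq, 2))
```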

Interpretation of Rxx for CRT

- Pa (Proportion of Agreement)
• Affected by chance classifications

• Pa values below .50 are unacceptable (that much agreement can occur by chance alone)

• Pa should be > .80 in most situations.

- K and Kq (Kappa and Modified Kappa)
• Interpretable range: 0.0 to 1.0

- Minimum acceptable value = .60

- Report both indices of Rxx.

Formative Evaluation of Chapter Objectives

- Define and differentiate between reliability and objectivity for norm-referenced tests.
- Identify factors that influence reliability and objectivity of norm-referenced test scores.
- Identify factors that influence reliability of criterion-referenced test scores.
- Select a reliable criterion score based on measurement theory.
