PTP 560 • Research Methods • Week 3 Thomas Ruediger, PT
Reliability • Observed score = True score ± Error, (X) = (T) ± (E) • Consistent • Score • Performance • The true score (T) is free from error • Measurement Error (E) • Hypothetically it could be zero • Practically, it is… • Systematic = comes from the instrument (e.g., scale, stadiometer) • Random = done differently for no reason • Or both
Types of Measurement Error • Systematic • Biased= always there • Consistent= use same instrument • Often more of a validity concern, but affects reliability • Examples? • Random • Unpredictable factors • As likely to be high as low • Examples?
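The difference between the two error types can be sketched with a small simulation. This is only an illustration under assumed values: the 70 kg true score, the 2 kg bias, and the 1 kg random spread are hypothetical numbers, and `simulate_measurements` is a helper name invented for this example.

```python
import random
import statistics


def simulate_measurements(true_score, n, bias=0.0, noise_sd=0.0, seed=0):
    """Return n observed scores: X = T + systematic bias + random error."""
    rng = random.Random(seed)
    return [true_score + bias + rng.gauss(0, noise_sd) for _ in range(n)]


TRUE_WEIGHT_KG = 70.0  # hypothetical true score

# Systematic error only: a scale that always reads 2 kg heavy (assumed bias).
systematic = simulate_measurements(TRUE_WEIGHT_KG, n=1000, bias=2.0)

# Random error only: readings scatter above and below the true score.
random_only = simulate_measurements(TRUE_WEIGHT_KG, n=1000, noise_sd=1.0)

print("systematic mean:", round(statistics.mean(systematic), 2))
print("random-only mean:", round(statistics.mean(random_only), 2))
print("random-only SD:", round(statistics.stdev(random_only), 2))
```

Systematic error shifts every reading in the same direction, so averaging does not remove it. Random error is as likely to be high as low, so the average of many readings stays close to the true score.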
Sources of Measurement Error • Individual • Skill of the person taking the measure • Also called rater or tester error • The instrument • Error can be limited by using the same instrument each time • Lability of the phenomenon (when the error is not from the instrument or tester) • An actual change from measurement to measurement, so a real difference is observed
Regression towards the mean • Initial extreme high scores • Subsequent scores will tend toward the mean • Proportional to the amount of error • Extreme low scores • Will also tend toward the mean subsequently • Proportional to the amount of error • “Bell Shaped” • Research repercussion • Group assignments based on scores • Intervention effect may be masked
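Regression toward the mean can be illustrated with a rough simulation. The sketch below assumes normally distributed true scores (mean 50, SD 10) and independent random error on each test; all numbers and the function name `simulate_regression_to_mean` are hypothetical choices for this example.

```python
import random
import statistics


def simulate_regression_to_mean(n=5000, true_sd=10.0, error_sd=10.0, seed=1):
    """Test everyone twice, then follow the top 10% from the first test."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n)]
    # Each observed score is the true score plus fresh random error.
    test1 = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    test2 = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    # Select the extreme high scorers on the first test.
    cutoff = sorted(test1)[int(0.9 * n)]
    top = [i for i in range(n) if test1[i] >= cutoff]
    return {
        "group_mean": statistics.mean(test1),
        "top_test1": statistics.mean([test1[i] for i in top]),
        "top_test2": statistics.mean([test2[i] for i in top]),
    }


with_error = simulate_regression_to_mean(error_sd=10.0)
no_error = simulate_regression_to_mean(error_sd=0.0)

print("with error:", {k: round(v, 1) for k, v in with_error.items()})
print("no error:  ", {k: round(v, 1) for k, v in no_error.items()})
```

With measurement error, the extreme group scores closer to the overall mean on retest even though nothing was done to them. With no error, the retest mean is unchanged, which matches the point that the effect is tied to the amount of error.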
Reliability Coefficients • True Score Variance/Total Variance • Can range from 0 to 1 • By convention 0.00 to 1.00 • 0.00 = no reliability • 1.00 = perfect reliability • Portney and Watkins Guidelines *TESTABLE • Less than 0.50 = poor reliability • 0.50 to 0.75 = moderate reliability • 0.75 to 1.00 = good reliability • These are NOT standards • Acceptable level should be based on application
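The ratio on this slide can be written as a short calculation. The sketch below assumes total variance is the sum of true score variance and error variance (the classical test theory model). The function names and the example variances are hypothetical. How the guideline treats a value of exactly 0.75 is also an assumption, since the slide's ranges overlap there.

```python
def reliability_coefficient(true_variance, error_variance):
    """Reliability = true score variance / total variance."""
    total_variance = true_variance + error_variance  # assumed decomposition
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance


def describe_reliability(r):
    """Label a coefficient with the Portney and Watkins guideline ranges."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability coefficient must be between 0 and 1")
    if r < 0.50:
        return "poor"
    if r <= 0.75:  # assumption: exactly 0.75 is counted as moderate
        return "moderate"
    return "good"


# Hypothetical variances, for illustration only.
r = reliability_coefficient(true_variance=80.0, error_variance=20.0)
print(r, describe_reliability(r))
```

These labels are guidelines, not standards, so the acceptable level still depends on the application.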
Correlation v Agreement • Correlation – degree of association • Is X correlated/associated with Y? • Usually not as clinically important for PT • We want to know whether the measures agree, not just whether they are correlated. We want accuracy to be consistent. • We generally want to know agreement • Between tests • Between raters
Correlation v Agreement In this case both are perfect
Correlation v Agreement In this case correlation is still perfect, but there is no agreement
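The contrast between the two cases can be reproduced with made-up numbers. In this sketch, the rater scores are hypothetical, and `pearson_r` and `percent_exact_agreement` are helper names written for the example. The second rater in the offset case always reads 10 units higher than the first.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def percent_exact_agreement(x, y):
    """Percentage of paired scores that are identical."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return 100.0 * matches / len(x)


rater_a = [10, 20, 30, 40, 50]               # hypothetical scores
rater_b_same = list(rater_a)                 # identical scores
rater_b_offset = [a + 10 for a in rater_a]   # always 10 units higher

print("same:  ", pearson_r(rater_a, rater_b_same),
      percent_exact_agreement(rater_a, rater_b_same))
print("offset:", pearson_r(rater_a, rater_b_offset),
      percent_exact_agreement(rater_a, rater_b_offset))
```

A constant offset leaves the correlation perfect while the raters never record the same value, which is why correlation alone can overstate how interchangeable two raters are.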
Reliability • Required for validity • A measure must be reliable to be valid • But a measure does not have to be valid to be reliable • Four general approaches • Test-Retest • (Nominal data) Kappa statistic (percent agreement corrected for chance) • Good vs. No Good • (Ordinal data) Spearman rho • (Interval or ratio data) Pearson product-moment correlation • ICC (for ordinal, interval, and ratio data) • Association and agreement reflected • The current preferred index
Reliability • Rater reliability • ICC should be used • Alternate forms • Limits of Agreement • Internal Consistency (Homogeneity) • Usually Cronbach’s alpha
Reliability • Generalizability • Reliability is not “owned” by the instrument • May not apply to: • Another population • Another rater (or group of raters) • Different time interval • Minimum Detectable Difference • Or minimum detectable change • How much change is needed to say it’s not chance • Not the same as MCID
Minimum detectable difference (MDD)? • Smallest difference that reflects a true difference • The better the reliability, the smaller the MDD • Different than statistical difference • MDD = 1.96 × SEM × √2 (SEM = standard error of measurement; 1.96 corresponds to a 95% CI) • Ask yourself: Difference b/w 1 and 2? • Is it statistically different? • Is it clinically different? (Next slide) • References: Eliasziw M, Young SL, Woodbury MG, Fryday-Field K. Statistical methodology for the concurrent assessment of interrater and intrarater reliability: using goniometric measurements as an example. Phys Ther. Aug 1994;74(8):777-788. • Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. Feb 2005;19(1):231-240.
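The MDD formula on this slide can be worked through as a short calculation. The slide does not say how to obtain the SEM; the sketch below assumes the commonly used relation SEM = SD × √(1 − ICC). The sample SD of 5 and the ICC of 0.90 are hypothetical values, and the function names are invented for the example.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC). Assumed relation, not stated on the slide."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must be between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimum_detectable_difference(sem, z=1.96):
    """MDD = z * SEM * sqrt(2); z = 1.96 corresponds to a 95% CI."""
    return z * sem * math.sqrt(2)


# Hypothetical goniometry-style values, in degrees.
sem = standard_error_of_measurement(sd=5.0, icc=0.90)
mdd = minimum_detectable_difference(sem)
print("SEM:", round(sem, 2), "MDD95:", round(mdd, 2))

# Better reliability gives a smaller MDD for the same SD.
mdd_high_icc = minimum_detectable_difference(
    standard_error_of_measurement(sd=5.0, icc=0.98))
print("MDD95 with ICC 0.98:", round(mdd_high_icc, 2))
```

A change between two measurements smaller than the MDD cannot be distinguished from measurement error at the chosen confidence level. Whether a larger change matters clinically is a separate question.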
Minimum Clinically Important Difference (MCID)? • Smallest difference considered clinically non-trivial • Smallest that patient perceives as beneficial • Usually associated with either: • Expert judgment of clinician • External Health Status Measure
Validity • Measurement measures what is intended • We use them to draw inferences in clinical use • Due to indirect nature of measuring • To apply our result to a diagnostic challenge • Ex: Why do we do a manual muscle test? • Validity • Is not something an instrument has • Is specific to the intended use • Not required for Reliability • (i.e. Just because it is reliable does not mean it is valid)
Validity • Multiple types • Face Validity (LEAST rigorous, looks like it should make sense) • Content (tests content, GRE content is a good predictor of passing licensure exam) • Criterion-referenced (To a GOLD or a Reference standard) • Concurrent validity • Predictive validity • Construct (Figure 6.2 in P&W helpful here) • Part content • Part theoretical • Multiple ways to assess (I won’t test these!)
Validity of Change • Change is often how we make clinical decisions • Evaluate treatment effect • Consider different options • Validity affected by four issues • Level of measurement (Ordinal has highest risk) • Reliability • There will likely be a change due to chance • There may be a true change (one suggestion: reliability > 0.50 to use change scores) • Stability of variable • Baseline scores • Floor effect • Ceiling effect
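Truth v Test (2 × 2 table) • Columns: Truth + | Truth - • Test +: a (true positive) | b (false positive) • Test -: c (false negative) | d (true negative) • Sn = a/(a+c) • Sp = d/(b+d) • PPV = a/(a+b) • NPV = d/(c+d) • +LR = Sn/(1 - Sp) • -LR = (1 - Sn)/Sp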
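The 2 × 2 formulas can be collected into one helper. This is a minimal sketch: `diagnostic_stats` is a name invented for this example, and the cell counts are hypothetical, chosen only to show the arithmetic.

```python
def diagnostic_stats(a, b, c, d):
    """Compute 2 x 2 table statistics.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    sn = a / (a + c)   # sensitivity
    sp = d / (b + d)   # specificity
    return {
        "Sn": sn,
        "Sp": sp,
        "PPV": a / (a + b),       # positive predictive value
        "NPV": d / (c + d),       # negative predictive value
        "+LR": sn / (1 - sp),     # positive likelihood ratio
        "-LR": (1 - sn) / sp,     # negative likelihood ratio
    }


# Hypothetical counts, for illustration only.
stats = diagnostic_stats(a=45, b=10, c=5, d=40)
for name, value in stats.items():
    print(name, round(value, 3))
```

Sn and Sp are computed down the Truth columns, while PPV and NPV are computed across the Test rows. The function does not guard against a specificity of exactly 0 or 1, which would make a likelihood ratio undefined.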
Truth + (disorder present) • Test +: a = 99 | b • Test -: c = 1 | d • Sn = a/(a+c) • Sp = d/(b+d) • Sn = ? • In this example we picked 100 people with a known disorder, applied our clinical test, and got these results.
Truth - (disorder absent) • Test +: a | b = 20 • Test -: c | d = 80 • Sn = a/(a+c) • Sp = d/(b+d) • Sp = ? • In this example we picked 100 people known to not have the disorder, applied our clinical test, and got these results.
Now a patient comes in • The history suggests to you that she has the disorder • You do the clinical test • The result of the test is negative • Which is more useful? • SpPin? or • SnNout?
Another patient comes in • The history suggests to you that she does not have the disorder • She is very concerned that she has it • You do the clinical test • The result of the test is positive • Which is more useful: • SpPin or • SnNout
Truth v Test (combined example) • Columns: Truth + | Truth - • Test +: 99 | 20 • Test -: 1 | 80 • +LR = ? • -LR = ?
Likelihood Ratios • Allow us to quantify the likelihood of a condition (present or absent) • Importance ↑ as they move away from 1 • An LR of 1 does not change our confidence • Which number is further away from 1? • (look at the nomogram) • -LR is further away from 1 (this is a logarithmic scale)
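The comparison of the two likelihood ratios can be checked with the counts from the worked example (99 and 1 among people with the disorder, 20 and 80 among people without it). The sketch below uses the standard definitions +LR = Sn/(1 − Sp) and −LR = (1 − Sn)/Sp. Measuring "distance from 1" as the absolute base-10 logarithm is an assumption made here to mirror the logarithmic scale of the nomogram.

```python
import math

# Counts from the worked example: rows are Test +/-, columns are Truth +/-.
a, b = 99, 20   # Test +: true positives, false positives
c, d = 1, 80    # Test -: false negatives, true negatives

sn = a / (a + c)
sp = d / (b + d)

pos_lr = sn / (1 - sp)
neg_lr = (1 - sn) / sp

# On a logarithmic scale, distance from LR = 1 is |log10(LR)|.
pos_distance = abs(math.log10(pos_lr))
neg_distance = abs(math.log10(neg_lr))

print("Sn:", round(sn, 2), "Sp:", round(sp, 2))
print("+LR:", round(pos_lr, 2), "log distance:", round(pos_distance, 2))
print("-LR:", round(neg_lr, 4), "log distance:", round(neg_distance, 2))
print("further from 1:", "-LR" if neg_distance > pos_distance else "+LR")
```

With these counts the +LR is about 4.95 and the −LR is about 0.0125. On a linear scale the +LR looks further from 1, but on the logarithmic scale the −LR is further away. A negative result on this test therefore shifts confidence more than a positive one.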