
PTP 560




  1. PTP 560 • Research Methods • Week 3 Thomas Ruediger, PT

  2. Reliability • Observed score = True score ± Error: (X) = (T) ± (E) • Consistency of • Score • Performance • The true score (T) is free from error • Measurement error • Hypothetically it could be zero • Practically, it is always present • Systematic: e.g., a miscalibrated scale or stadiometer • Random: the measurement is done differently for no identifiable reason • Or both

  3. Types of Measurement Error • Systematic • Biased: always present, in the same direction • Consistent: recurs when the same instrument is used • Often more of a validity concern, but affects reliability • Examples? • Random • Due to unpredictable factors • As likely to be high as low • Examples?

  4. Sources of Measurement Error • The individual • Skill of the person taking the measure • Also called rater or tester error • The instrument • Error can be limited by using the same instrument each time • Lability of the phenomenon (when error is not from the instrument or tester) • If there is an actual change from measurement to measurement, a real difference is observed

  5. Regression toward the mean • Initial extreme high scores • Subsequent scores will tend toward the mean • Proportional to the amount of error • Extreme low scores • Will also tend toward the mean on subsequent measurement • Proportional to the amount of error • “Bell-shaped” distribution • Research repercussion • If group assignments are based on extreme scores • The intervention effect may be masked

  6. Reliability Coefficients • Reliability = true score variance / total variance • Can range from 0 to 1, by convention written 0.00 to 1.00 • 0.00 = no reliability • 1.00 = perfect reliability • Portney and Watkins guidelines *TESTABLE • Less than 0.50 = poor reliability • 0.50 to 0.75 = moderate reliability • 0.75 to 1.00 = good reliability • These are NOT standards • The acceptable level should be based on the application
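As a sketch of this variance ratio, the following hypothetical simulation builds each observed score as a true score plus random error, then recovers the reliability coefficient as true-score variance over total variance (all numbers here are assumed for illustration):

```python
import random

# Hypothetical simulation: observed score X = true score T + random error E.
# Reliability coefficient = var(T) / var(X).
random.seed(1)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

true_scores = [random.gauss(50, 10) for _ in range(10000)]  # var(T) ≈ 100
errors = [random.gauss(0, 5) for _ in range(10000)]         # var(E) ≈ 25
observed = [t + e for t, e in zip(true_scores, errors)]

reliability = variance(true_scores) / variance(observed)
print(round(reliability, 2))  # ≈ 100 / (100 + 25) = 0.80
```

With these assumed variances the coefficient lands near 0.80, which Portney and Watkins would label good reliability; shrinking the error variance pushes it toward 1.00.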

  7. Correlation v Agreement • Correlation: degree of association • Is X associated with Y? • Usually not as clinically important for PT • We generally want to know agreement, not just correlation; we want accuracy to be consistent • Between tests • Between raters

  8. Correlation v Agreement In this case both are perfect

  9. Correlation v Agreement In this case correlation is still perfect, but there is no agreement
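The contrast in slides 8 and 9 can be reproduced numerically. In this assumed example, rater B always scores 10 points above rater A, so correlation is perfect while agreement is zero:

```python
# Hypothetical ratings: rater B is systematically 10 points above rater A.
rater_a = [10, 20, 30, 40, 50]
rater_b = [a + 10 for a in rater_a]

def pearson_r(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(round(pearson_r(rater_a, rater_b), 6))          # 1.0 — perfect association
print(any(a == b for a, b in zip(rater_a, rater_b)))  # False — no agreement at all
```

This is exactly why agreement indices (kappa, ICC, limits of agreement) matter clinically: a systematic offset between raters is invisible to correlation.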

  10. Reliability • Reliability is required for validity • A measure must be reliable to be valid • But it does not have to be valid to be reliable • Four general approaches • Test-Retest • Nominal data: Kappa statistic (chance-corrected percent agreement), e.g., Good vs. No Good • Ordinal data: Spearman rho • Interval or ratio data: Pearson product-moment correlation • ICC (for ordinal, interval, and ratio data) • Reflects both association and agreement • The currently preferred index
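The chance correction that kappa adds to raw percent agreement can be sketched directly. The "Good" / "No Good" calls below are hypothetical ratings by two raters:

```python
# Hypothetical nominal ratings from two raters (Good vs. No Good).
rater1 = ["Good", "Good", "No Good", "Good", "No Good", "Good", "No Good", "Good"]
rater2 = ["Good", "No Good", "No Good", "Good", "No Good", "Good", "Good", "Good"]

n = len(rater1)
# Raw percent agreement
p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n

# Agreement expected by chance, from each rater's marginal proportions
cats = set(rater1) | set(rater2)
p_chance = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in cats)

# Cohen's kappa: agreement beyond chance, scaled by the room above chance
kappa = (p_observed - p_chance) / (1 - p_chance)
print(round(p_observed, 3), round(kappa, 3))  # 0.75 observed, kappa ≈ 0.467
```

Note how 75% raw agreement shrinks to kappa ≈ 0.47 once chance agreement is removed; this is why kappa, not raw percent agreement, is reported for nominal data.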

  11. Reliability • Rater reliability • ICC should be used • Alternate forms • Limits of Agreement • Internal Consistency (Homogeneity) • Usually Cronbach’s alpha
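Cronbach's alpha can be computed by hand from item variances and the variance of the total score; the 4-item, 5-respondent data set below is purely illustrative:

```python
# Hypothetical 4-item scale, 5 respondents (rows = items, columns = respondents).
items = [
    [3, 4, 3, 5, 4],
    [3, 5, 3, 4, 4],
    [2, 4, 3, 5, 3],
    [3, 4, 2, 5, 4],
]

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

k = len(items)
totals = [sum(col) for col in zip(*items)]  # each respondent's total score

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)
alpha = (k / (k - 1)) * (1 - sum(variance(it) for it in items) / variance(totals))
print(round(alpha, 3))  # 0.917 for this made-up data set
```

High alpha here reflects items that vary together across respondents, i.e., internal consistency (homogeneity) of the scale.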

  12. Reliability • Generalizability • Reliability is not “owned” by the instrument • May not apply to: • Another population • Another rater (or group of raters) • A different time interval • Minimum Detectable Difference • Or minimum detectable change • How much change is needed to say the difference is not due to chance • Not the same as the MCID

  13. Minimum detectable difference (MDD)? • Smallest difference that reflects a true difference • The better the reliability, the smaller the MDD • Different from a statistical difference • MDD = 1.96 × SEM × √2 (1.96 corresponds to the 95% CI) • Ask yourself: What is the difference between measurement 1 and measurement 2? • Is it statistically different? • Is it clinically different? (Next slide) Eliasziw M, Young SL, Woodbury MG, Fryday-Field K. Statistical methodology for the concurrent assessment of interrater and intrarater reliability: using goniometric measurements as an example. Phys Ther. Aug 1994;74(8):777-788. Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. Feb 2005;19(1):231-240.
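The slide's formula can be worked through with assumed numbers (goniometric ROM scores with SD = 5 degrees and ICC = 0.90 are hypothetical), using SEM = SD × √(1 − reliability) as in Weir (2005):

```python
import math

# Assumed example values: ROM scores with SD = 5 degrees, ICC = 0.90.
sd = 5.0
icc = 0.90

sem = sd * math.sqrt(1 - icc)        # standard error of measurement
mdd95 = 1.96 * sem * math.sqrt(2)    # minimum detectable difference at 95% CI

print(round(sem, 2), round(mdd95, 2))  # 1.58 and 4.38 degrees
```

So with these assumptions a change smaller than about 4.4 degrees cannot be distinguished from measurement error; raising the ICC shrinks the SEM and with it the MDD, which is the "better the reliability, smaller the MDD" point above.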

  14. Minimum Clinically Important Difference (MCID)? • Smallest difference considered clinically non-trivial • Smallest change the patient perceives as beneficial • Usually determined by either: • Expert judgment of clinicians • An external health status measure

  15. Validity • A valid measurement measures what it is intended to measure • We use measurements to draw inferences in clinical use • Because measurement is indirect • To apply our result to a diagnostic challenge • Example: Why do we do a manual muscle test? • Validity • Is not something an instrument has • Is specific to the intended use • Is not required for reliability • (i.e., just because a measure is reliable does not mean it is valid)

  16. Validity • Multiple types • Face validity (LEAST rigorous: the measure looks like it makes sense) • Content validity (covers the content domain; e.g., GRE content as a predictor of passing the licensure exam) • Criterion-referenced validity (against a GOLD or reference standard) • Concurrent validity • Predictive validity • Construct validity (Figure 6.2 in P&W is helpful here) • Part content • Part theoretical • Multiple ways to assess (I won’t test these!)

  17. Validity of Change • Change is often how we make clinical decisions • Evaluate treatment effect • Consider different options • Validity of change scores is affected by four issues • Level of measurement (ordinal carries the highest risk) • Reliability • Some change will likely occur due to chance alone • There may be a true change (one suggestion: reliability > 0.50 before using change scores) • Stability of the variable • Baseline scores • Floor effect • Ceiling effect

  18. The 2 × 2 table relating a clinical test to the truth (reference standard):

                       Truth
                     +       -
         Test   +    a       b       PPV = a/(a+b)
                -    c       d       NPV = d/(c+d)
                 Sn = a/(a+c)   Sp = d/(b+d)

         +LR = Sn/(1-Sp)        -LR = (1-Sn)/Sp

  19. In this example we picked 100 people with a known disorder, applied our clinical test, and got these results:

                       Truth
                     +
         Test   +   99   (a)
                -    1   (c)

         Sn = a/(a+c) = ?

  20. In this example we picked 100 people known to not have the disorder, applied our clinical test, and got these results:

                       Truth
                     -
         Test   +   20   (b)
                -   80   (d)

         Sp = d/(b+d) = ?
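Putting the counts from slides 19 and 20 into one calculation (a = 99, c = 1 from the group with the disorder; b = 20, d = 80 from the disorder-free group):

```python
# Counts from the two example slides: 100 people with the disorder,
# 100 people without it.
a, b, c, d = 99, 20, 1, 80

sensitivity = a / (a + c)  # Sn = 99/100 = 0.99
specificity = d / (b + d)  # Sp = 80/100 = 0.80
ppv = a / (a + b)          # positive predictive value ≈ 0.832
npv = d / (c + d)          # negative predictive value ≈ 0.988

print(sensitivity, specificity, round(ppv, 3), round(npv, 3))
```

One caution: these PPV and NPV values inherit the 50% "prevalence" built into this sampling design (100 with, 100 without); in a clinic population with a different prevalence the predictive values would change, while Sn and Sp would not.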

  21. Now a patient comes in • The history suggests to you that she has the disorder • You do the clinical test • The result of the test is negative • Which is more useful? • SpPin? or • SnNout?

  22. Another patient comes in • The history suggests to you that she does not have the disorder • She is very concerned that she has it • You do the clinical test • The result of the test is positive • Which is more useful? • SpPin? or • SnNout?

  23. Combining both examples into one table:

                       Truth
                     +       -
         Test   +   99      20
                -    1      80

         +LR = Sn/(1-Sp) = ?        -LR = (1-Sn)/Sp = ?

  24. Likelihood Ratios • Allow us to quantify the likelihood that a condition is present or absent • Importance increases as an LR moves away from 1 • An LR of 1 does not change our confidence • Which number is further from 1? • (look at the nomogram) • The -LR is further from 1 (the nomogram uses a logarithmic scale)
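A sketch of the arithmetic behind the nomogram, using the Sn = 0.99 and Sp = 0.80 implied by the earlier counts: likelihood ratios multiply pre-test odds, not probabilities, so we convert probability to odds, apply the LR, and convert back (the 50% pre-test probability below is an assumed example):

```python
# Sn and Sp from the example 2x2 table (a=99, b=20, c=1, d=80).
sn, sp = 0.99, 0.80

pos_lr = sn / (1 - sp)   # +LR = 0.99 / 0.20 = 4.95
neg_lr = (1 - sn) / sp   # -LR = 0.01 / 0.80 = 0.0125

def post_test_prob(pretest, lr):
    """Post-test probability via odds: post odds = pre odds * LR."""
    odds = pretest / (1 - pretest)
    post_odds = odds * lr
    return post_odds / (1 + post_odds)

print(round(pos_lr, 2), round(neg_lr, 4))
# On the nomogram's log scale, -LR = 0.0125 sits further from 1 than +LR = 4.95,
# so a negative result shifts probability more for this test.
print(round(post_test_prob(0.50, neg_lr), 3))  # 50% pre-test -> ~1.2% after a negative test
```

This matches the slide's point: |log(0.0125)| > |log(4.95)|, so for this test a negative result is the more informative one, which is also why SnNout applies in slide 21.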
