
Measuring Agreement


Presentation Transcript


  1. Measuring Agreement

  2. Introduction • Different types of agreement • Diagnosis by different methods • Do both methods give the same results? • Disease absent or Disease present • Staging of carcinomas • Will different methods lead to the same results? • Will different raters lead to the same results? • Measurements of blood pressure • How consistent are measurements made • Using different devices? • With different observers? • At different times?

  3. Investigating agreement • Need to consider • Data type • Categorical or continuous • How are the data repeated? • Measuring instrument(s), rater(s), time(s) • The goal • Are ratings consistent? • Estimate the magnitude of differences between measurements • Investigate factors that affect ratings • Number of raters

  4. Data type • Categorical • Binary • Disease absent, disease present • Nominal • Hepatitis • Viral A, B, C, D, E or autoimmune • Ordinal • Severity of disease • Mild, moderate, severe • Continuous • Size of tumour • Blood pressure

  5. How are data repeated? • Same person, same measuring instrument • Different observers • Inter-rater reliability • Same observer at different times • Intra-rater reliability • Repeatability • Internal consistency • Do the items of a test measure the same attribute?

  6. Measures of agreement • Categorical • Kappa • Weighted • Fleiss’ • Continuous • Limits of agreement • Coefficient of variation (CV) • Intraclass Correlation (ICC) • Cronbach’s α • Internal consistency

  7. Number of raters • Two • Three or more

  8. Categorical data: two raters • Kappa • Commonly quoted interpretations of magnitude • ≥0.75 Excellent, 0.40 to 0.75 Fair to good, <0.40 Poor • 0 to 0.20 Slight, >0.20 to 0.40 Fair, >0.40 to 0.60 Moderate, >0.60 to 0.80 Substantial, >0.80 Almost perfect • Degree of disagreement can be included • Weighted kappa • Values close together do not count towards disagreement as much as those further apart • Linear / quadratic weightings
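A minimal Python sketch of unweighted and weighted kappa for two raters, using scikit-learn's cohen_kappa_score; the ordinal scores below are invented purely for illustration:

```python
# Cohen's kappa for two raters: unweighted, linear-weighted and quadratic-weighted.
from sklearn.metrics import cohen_kappa_score

# Made-up ordinal ratings (1-5) of the same ten subjects by two raters.
rater_a = [1, 2, 2, 3, 4, 5, 3, 2, 1, 4]
rater_b = [1, 2, 3, 3, 4, 5, 2, 2, 1, 5]

# Unweighted kappa: every disagreement counts equally.
print(cohen_kappa_score(rater_a, rater_b))

# Weighted kappa: disagreements between nearby categories are penalised less;
# quadratic weights reduce the penalty for near misses more than linear weights do.
print(cohen_kappa_score(rater_a, rater_b, weights="linear"))
print(cohen_kappa_score(rater_a, rater_b, weights="quadratic"))
```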

  9. Categorical data: > two raters • Different tests for • Binomial data • Data with more than two categories • Online calculators • http://www.vassarstats.net/kappa.html
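For more than two raters, Fleiss' kappa is the usual choice; a hedged sketch using statsmodels, with a made-up rating matrix:

```python
# Fleiss' kappa for several raters rating the same subjects.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Made-up data: one row per subject, one column per rater, categories 1-3.
ratings = np.array([
    [1, 1, 2],
    [2, 2, 2],
    [3, 3, 2],
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
])

# aggregate_raters converts subject-by-rater codes into subject-by-category counts.
table, categories = aggregate_raters(ratings)
print(fleiss_kappa(table, method="fleiss"))
```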

  10. Example 1 • Two raters • Scores 1 to 5 • Unweighted kappa 0.79, 95% CI (0.62 to 0.96) • Linear weighting 0.84, 95% CI (0.70 to 0.98) • Quadratic weighting 0.90, 95% CI (0.77 to 1.00)

  11. Example 2 • Binomial data • Three raters • Two ratings each • Inter-rater agreement • Intra-rater agreement

  12. Example 2 ctd. • Inter-rater agreement • Kappa(1,2) = 0.865 (P<0.001) • Kappa(1,3) = 0.054 (P=0.765) • Kappa(2,3) = -0.071 (P=0.696) • Intra-rater agreement • Kappa(1) = 0.800 (P<0.001) • Kappa(2) = 0.790 (P<0.001) • Kappa(3) = 0.000 (P=1.000)
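The raw ratings behind Example 2 are not shown in the transcript, so the following sketch uses invented binary data just to illustrate how the pairwise inter-rater kappas could be computed:

```python
# Pairwise Cohen's kappa between three raters on binary (0/1) ratings.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Hypothetical first-round ratings of the same eight subjects.
ratings = {
    "rater1": [0, 1, 1, 0, 1, 0, 1, 1],
    "rater2": [0, 1, 1, 0, 1, 0, 1, 0],
    "rater3": [1, 0, 1, 1, 0, 0, 1, 1],
}

# Inter-rater agreement: kappa for every pair of raters.
for a, b in combinations(ratings, 2):
    print(a, b, cohen_kappa_score(ratings[a], ratings[b]))

# Intra-rater agreement would compare each rater's first and second
# ratings of the same subjects with cohen_kappa_score in the same way.
```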

  13. Continuous data • Test for bias • Check differences not related to magnitude • Calculate mean and SD of differences • Limits of agreement • Coefficient of variation • ICC

  14. Test for bias • Student’s paired t (mean) • Wilcoxon matched pairs (median) • If there is bias, agreement cannot be investigated further
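A minimal sketch of both bias tests with SciPy, assuming two arrays of paired measurements (the values are made up):

```python
# Test for systematic bias between two measurement methods.
import numpy as np
from scipy import stats

method_a = np.array([430.0, 445.0, 460.0, 452.0, 438.0, 470.0, 455.0, 442.0])
method_b = np.array([428.0, 450.0, 455.0, 449.0, 441.0, 468.0, 457.0, 440.0])

# Paired t test: is the mean difference different from zero?
print(stats.ttest_rel(method_a, method_b))

# Wilcoxon matched-pairs test: is the median difference different from zero?
print(stats.wilcoxon(method_a, method_b))
```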

  15. Example 3: Test for bias • Paired t test • P=0.362 • No bias

  16. Check differences unrelated to magnitude • Clearly no relationship between the differences and the size of the measurements
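This check is usually done with a Bland-Altman style plot; a sketch, again with invented paired measurements:

```python
# Plot each pair's difference against its mean to check that the
# differences are unrelated to the magnitude of the measurements.
import numpy as np
import matplotlib.pyplot as plt

method_a = np.array([430.0, 445.0, 460.0, 452.0, 438.0, 470.0, 455.0, 442.0])
method_b = np.array([428.0, 450.0, 455.0, 449.0, 441.0, 468.0, 457.0, 440.0])

means = (method_a + method_b) / 2
diffs = method_a - method_b

plt.scatter(means, diffs)
plt.axhline(diffs.mean(), linestyle="--")   # mean difference
plt.xlabel("Mean of the paired measurements")
plt.ylabel("Difference between measurements")
plt.show()
```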

  17. Calculate mean and SD of the differences • [Slide shows software output with the mean difference and its standard deviation, s, annotated]

  18. Limits of agreement • Lower limit of agreement (LLA) = mean - 1.96×s = -37.6 • Upper limit of agreement (ULA) = mean + 1.96×s = 47.5 • 95% of differences between a pair of measurements for an individual lie in (-37.6, 47.5)
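Working back from the limits quoted above, the slide's mean difference is roughly 4.95 with s ≈ 21.72; the sketch below shows the general calculation on a hypothetical set of differences:

```python
# Limits of agreement from the mean and sample SD of the paired differences.
import numpy as np

diffs = np.array([2.0, -15.0, 30.0, 5.0, -10.0, 18.0, -3.0, 12.0])  # made-up values

mean_diff = diffs.mean()
s = diffs.std(ddof=1)            # sample SD of the differences

lla = mean_diff - 1.96 * s       # lower limit of agreement
ula = mean_diff + 1.96 * s       # upper limit of agreement
print(lla, ula)                  # about 95% of individual differences lie in this range
```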

  19. Coefficient of variation • Measure of variability of differences • Expressed as a proportion of the average measured value • Suitable when error (the differences between pairs) increases with the measured values • Other measures require this not to be the case • 100 × s ÷ mean of the measurements • 100 × 21.72÷ 447.88 • 4.85%
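Using the slide's own figures, the calculation is just:

```python
# Coefficient of variation: SD of the differences as a percentage of the
# mean of the measurements (figures taken from the slide).
sd_of_differences = 21.72
mean_of_measurements = 447.88

cv = 100 * sd_of_differences / mean_of_measurements
print(round(cv, 2))   # 4.85 (%)
```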

  20. Intraclass Correlation • Continuous data • Two or more sets of measurements • Measure of correlation that adjusts for differences in scale • Several models • Absolute agreement or consistency • Raters chosen randomly or same raters throughout • Single or average measures
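A hedged sketch of the ICC using the pingouin package; the long-format data frame and the column names Subject, Rater and Score are assumptions made for illustration:

```python
# Intraclass correlation: pingouin reports single and average measures under
# both the absolute-agreement and consistency models in one table.
import pandas as pd
import pingouin as pg

df = pd.DataFrame({
    "Subject": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "Rater":   ["A", "B"] * 6,
    "Score":   [10.0, 11.0, 14.0, 13.0, 9.0, 9.5, 12.0, 12.5, 15.0, 14.0, 8.0, 8.5],
})

icc = pg.intraclass_corr(data=df, targets="Subject", raters="Rater", ratings="Score")
print(icc)
```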

  21. Intraclass Correlation • ≥0.75 Excellent • 0.4 to 0.75 Fair to Good • <0.4 Poor

  22. Cronbach’s α • Internal consistency • Used for total scores made up of several components • α ≥ 0.8 good • α ≥ 0.7 adequate
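Cronbach's α can be computed directly from the standard formula, α = k/(k-1) × (1 - sum of item variances / variance of total score); a small sketch with invented item scores:

```python
# Cronbach's alpha for a scale made up of k items.
import numpy as np

# Made-up data: rows = respondents, columns = items of the scale.
items = np.array([
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
    [3, 2, 3, 3],
    [4, 5, 4, 4],
])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)        # variance of each item
total_var = items.sum(axis=1).var(ddof=1)    # variance of the total score

alpha = k / (k - 1) * (1 - item_vars.sum() / total_var)
print(alpha)
```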

  23. Investigating agreement • Data type • Categorical • Kappa • Continuous • Limits of agreement • Coefficient of variation • Intraclass correlation • How are the data repeated? • Measuring instrument(s), rater(s), time(s) • Number of raters • Two • Straightforward • Three or more • Help!
