
Measuring Agreement


Presentation Transcript


  1. Measuring Agreement

  2. Introduction • Different types of agreement • Diagnosis by different methods • Do both methods give the same results? • Disease absent or Disease present • Staging of carcinomas • Will different methods lead to the same results? • Will different raters lead to the same results? • Measurements of blood pressure • How consistent are measurements made • Using different devices? • With different observers? • At different times?

  3. Investigating agreement • Need to consider • Data type • Categorical or continuous • How are the data repeated? • Measuring instrument(s), rater(s), time(s) • The goal • Are ratings consistent? • Estimate the magnitude of differences between measurements • Investigate factors that affect ratings • Number of raters

  4. Data type • Categorical • Binary • Disease absent, disease present • Nominal • Hepatitis • Viral A, B, C, D, E or autoimmune • Ordinal • Severity of disease • Mild, moderate, severe • Continuous • Size of tumour • Blood pressure

  5. How are data repeated? • Same person, same measuring instrument • Different observers • Inter-rater reliability • Same observer at different times • Intra-rater reliability • Repeatability • Internal consistency • Do the items of a test measure the same attribute?

  6. Measures of agreement • Categorical • Kappa • Weighted • Fleiss’ • Continuous • Limits of agreement • Coefficient of variation (CV) • Intraclass Correlation (ICC) • Cronbach’s α • Internal consistency

  7. Number of raters • Two • Three or more

  8. Categorical data: two raters • Kappa • Commonly quoted interpretations of magnitude • ≥0.75 Excellent, 0.40 to 0.75 Fair to good, <0.40 Poor • 0 to 0.20 Slight, >0.20 to 0.40 Fair, >0.40 to 0.60 Moderate, >0.60 to 0.80 Substantial, >0.80 Almost perfect • Degree of disagreement can be included • Weighted kappa • Values close together do not count towards disagreement as much as those further apart • Linear / quadratic weightings
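A minimal Python sketch of unweighted and weighted kappa for two raters, using scikit-learn's cohen_kappa_score; the ordinal scores below are invented purely for illustration:

```python
# Cohen's kappa for two raters: unweighted, linear-weighted and quadratic-weighted.
from sklearn.metrics import cohen_kappa_score

# Made-up ordinal ratings (1-5) of the same ten subjects by two raters.
rater_a = [1, 2, 2, 3, 4, 5, 3, 2, 1, 4]
rater_b = [1, 2, 3, 3, 4, 5, 2, 2, 1, 5]

# Unweighted kappa: every disagreement counts equally.
print(cohen_kappa_score(rater_a, rater_b))

# Weighted kappa: disagreements between nearby categories are penalised less;
# quadratic weights reduce the penalty for near misses more than linear weights do.
print(cohen_kappa_score(rater_a, rater_b, weights="linear"))
print(cohen_kappa_score(rater_a, rater_b, weights="quadratic"))
```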

  9. Categorical data: > two raters • Different tests for • Binomial data • Data with more than two categories • Online calculators • http://www.vassarstats.net/kappa.html
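For more than two raters, Fleiss' kappa is the usual choice; a hedged sketch using statsmodels, with a made-up rating matrix:

```python
# Fleiss' kappa for several raters rating the same subjects.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Made-up data: one row per subject, one column per rater, categories 1-3.
ratings = np.array([
    [1, 1, 2],
    [2, 2, 2],
    [3, 3, 2],
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
])

# aggregate_raters converts subject-by-rater codes into subject-by-category counts.
table, categories = aggregate_raters(ratings)
print(fleiss_kappa(table, method="fleiss"))
```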

  10. Example 1 • Two raters • Scores 1 to 5 • Unweighted kappa 0.79, 95% CI (0.62 to 0.96) • Linear weighting 0.84, 95% CI (0.70 to 0.98) • Quadratic weighting 0.90, 95% CI (0.77 to 1.00)

  11. Example 2 • Binomial data • Three raters • Two ratings each • Inter-rater agreement • Intra-rater agreement

  12. Example 2 ctd. • Inter-rater agreement • Kappa(1,2) = 0.865 (P<0.001) • Kappa(1,3) = 0.054 (P=0.765) • Kappa(2,3) = -0.071 (P=0.696) • Intra-rater agreement • Kappa(1) = 0.800 (P<0.001) • Kappa(2) = 0.790 (P<0.001) • Kappa(3) = 0.000 (P=1.000)
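The raw ratings behind Example 2 are not shown in the transcript, so the following sketch uses invented binary data just to illustrate how the pairwise inter-rater kappas could be computed:

```python
# Pairwise Cohen's kappa between three raters on binary (0/1) ratings.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Hypothetical first-round ratings of the same eight subjects.
ratings = {
    "rater1": [0, 1, 1, 0, 1, 0, 1, 1],
    "rater2": [0, 1, 1, 0, 1, 0, 1, 0],
    "rater3": [1, 0, 1, 1, 0, 0, 1, 1],
}

# Inter-rater agreement: kappa for every pair of raters.
for a, b in combinations(ratings, 2):
    print(a, b, cohen_kappa_score(ratings[a], ratings[b]))

# Intra-rater agreement would compare each rater's first and second
# ratings of the same subjects with cohen_kappa_score in the same way.
```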

  13. Continuous data • Test for bias • Check differences not related to magnitude • Calculate mean and SD of differences • Limits of agreement • Coefficient of variation • ICC

  14. Test for bias • Student’s paired t (mean) • Wilcoxon matched pairs (median) • If there is bias, agreement cannot be investigated further
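A minimal sketch of both bias tests with SciPy, assuming two arrays of paired measurements (the values are made up):

```python
# Test for systematic bias between two measurement methods.
import numpy as np
from scipy import stats

method_a = np.array([430.0, 445.0, 460.0, 452.0, 438.0, 470.0, 455.0, 442.0])
method_b = np.array([428.0, 450.0, 455.0, 449.0, 441.0, 468.0, 457.0, 440.0])

# Paired t test: is the mean difference different from zero?
print(stats.ttest_rel(method_a, method_b))

# Wilcoxon matched-pairs test: is the median difference different from zero?
print(stats.wilcoxon(method_a, method_b))
```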

  15. Example 3: Test for bias • Paired t test • P=0.362 • No bias

  16. Check differences unrelated to magnitude • Clearly no relationship between the differences and the size of the measurements
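This check is usually done with a Bland-Altman style plot; a sketch, again with invented paired measurements:

```python
# Plot each pair's difference against its mean to check that the
# differences are unrelated to the magnitude of the measurements.
import numpy as np
import matplotlib.pyplot as plt

method_a = np.array([430.0, 445.0, 460.0, 452.0, 438.0, 470.0, 455.0, 442.0])
method_b = np.array([428.0, 450.0, 455.0, 449.0, 441.0, 468.0, 457.0, 440.0])

means = (method_a + method_b) / 2
diffs = method_a - method_b

plt.scatter(means, diffs)
plt.axhline(diffs.mean(), linestyle="--")   # mean difference
plt.xlabel("Mean of the paired measurements")
plt.ylabel("Difference between measurements")
plt.show()
```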

  17. Calculate mean and SD of the differences • [Slide shows software output with the mean difference and its standard deviation, s, annotated]

  18. Limits of agreement • Lower limit of agreement (LLA) = mean - 1.96×s = -37.6 • Upper limit of agreement (ULA) = mean + 1.96×s = 47.5 • 95% of differences between a pair of measurements for an individual lie in (-37.6, 47.5)
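Working back from the limits quoted above, the slide's mean difference is roughly 4.95 with s ≈ 21.72; the sketch below shows the general calculation on a hypothetical set of differences:

```python
# Limits of agreement from the mean and sample SD of the paired differences.
import numpy as np

diffs = np.array([2.0, -15.0, 30.0, 5.0, -10.0, 18.0, -3.0, 12.0])  # made-up values

mean_diff = diffs.mean()
s = diffs.std(ddof=1)            # sample SD of the differences

lla = mean_diff - 1.96 * s       # lower limit of agreement
ula = mean_diff + 1.96 * s       # upper limit of agreement
print(lla, ula)                  # about 95% of individual differences lie in this range
```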

  19. Coefficient of variation • Measure of variability of differences • Expressed as a proportion of the average measured value • Suitable when error (the differences between pairs) increases with the measured values • Other measures require this not to be the case • 100 × s ÷ mean of the measurements • 100 × 21.72÷ 447.88 • 4.85%
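Using the slide's own figures, the calculation is just:

```python
# Coefficient of variation: SD of the differences as a percentage of the
# mean of the measurements (figures taken from the slide).
sd_of_differences = 21.72
mean_of_measurements = 447.88

cv = 100 * sd_of_differences / mean_of_measurements
print(round(cv, 2))   # 4.85 (%)
```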

  20. Intraclass Correlation • Continuous data • Two or more sets of measurements • Measure of correlation that adjusts for differences in scale • Several models • Absolute agreement or consistency • Raters chosen randomly or same raters throughout • Single or average measures
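A hedged sketch of the ICC using the pingouin package; the long-format data frame and the column names Subject, Rater and Score are assumptions made for illustration:

```python
# Intraclass correlation: pingouin reports single and average measures under
# both the absolute-agreement and consistency models in one table.
import pandas as pd
import pingouin as pg

df = pd.DataFrame({
    "Subject": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "Rater":   ["A", "B"] * 6,
    "Score":   [10.0, 11.0, 14.0, 13.0, 9.0, 9.5, 12.0, 12.5, 15.0, 14.0, 8.0, 8.5],
})

icc = pg.intraclass_corr(data=df, targets="Subject", raters="Rater", ratings="Score")
print(icc)
```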

  21. Intraclass Correlation • ≥0.75 Excellent • 0.4 to 0.75 Fair to Good • <0.4 Poor

  22. Cronbach’s α • Internal consistency • Used for total scores made up of several components • α ≥ 0.8 good • α ≥ 0.7 adequate
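Cronbach's α can be computed directly from the standard formula, α = k/(k-1) × (1 - sum of item variances / variance of total score); a small sketch with invented item scores:

```python
# Cronbach's alpha for a scale made up of k items.
import numpy as np

# Made-up data: rows = respondents, columns = items of the scale.
items = np.array([
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
    [3, 2, 3, 3],
    [4, 5, 4, 4],
])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)        # variance of each item
total_var = items.sum(axis=1).var(ddof=1)    # variance of the total score

alpha = k / (k - 1) * (1 - item_vars.sum() / total_var)
print(alpha)
```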

  23. Investigating agreement • Data type • Categorical • Kappa • Continuous • Limits of agreement • Coefficient of variation • Intraclass correlation • How are the data repeated? • Measuring instrument(s), rater(s), time(s) • Number of raters • Two • Straightforward • Three or more • Help!
