
Interpreting Kappa in Observational Research: Baserate Matters

Cornelia Taylor Bruckner

Vanderbilt University


Acknowledgements

  • Paul Yoder

  • Craig Kennedy

  • Niels Waller

  • Andrew Tomarken

  • MRDD training grant

  • KC Quant core


Overview

  • Agreement is a proxy for accuracy

  • Agreement statistics 101

    • Chance agreement

    • Agreement matrix

    • Baserate

  • Kappa and baserate, a paradox

  • Estimating accuracy from kappa

  • Applied example


Framing as observational coding

  • I will frame today's talk within observational measurement, but the concepts apply to many other situations, e.g.:

    • Agreement between clinicians on diagnosis

    • Agreement between reporters on child symptoms (e.g. mothers and fathers)


“Rater accuracy”: A fictitious session

  • Madeline Scientist writes a script for an interval-coded observation session that specifies the presence or absence of the target behavior in each interval.

  • Two coders (Eager Beaver and Slack Jack), blind to the script, are asked to code the session.

  • Each coder's accuracy against the script is calculated




Who has the best accuracy?

  • Eager Beaver of course.

  • Slack Jack was not very accurate

  • Notice that accuracy is about agreement with the occurrence and nonoccurrence of behavior.


We don’t always know the truth

  • It is great when we know the true occurrence and nonoccurrence of behaviors

  • But, in the real world we deal with agreement between fallible observers


Agreement between raters

  • Point-by-point interobserver agreement is achieved when independent observers:

    • see the same thing (behavior, event)

    • at the same time


Difference between agreement and accuracy

  • Agreement can be directly measured.

  • Accuracy cannot be directly measured.

    • We don’t know the “truth” of a session.

  • However, agreement is used as a proxy for accuracy

  • Accuracy can be estimated from agreement

    • The method for this estimation is the focus of today’s talk


Percent agreement

  • Percent agreement

    • The proportion of intervals that were agreed upon

    • Agreements / (agreements + disagreements)

    • Takes into account occurrence and nonoccurrence agreement

    • Varies from 0-100%
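
A minimal sketch of this calculation in Python (the coder records below are illustrative, not data from the talk):

```python
def percent_agreement(coder_a, coder_b):
    """Proportion of intervals on which two coders agree (occurrence or nonoccurrence)."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must rate the same number of intervals.")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return agreements / len(coder_a)

# 1 = target behavior recorded in the interval, 0 = not recorded (illustrative data)
eager_beaver = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
slack_jack   = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]

print(percent_agreement(eager_beaver, slack_jack))  # 0.8, i.e., 80% agreement
```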


Occurrence and nonoccurrence agreement

  • Occurrence agreement

    • Of the intervals in which either coder recorded an occurrence, the proportion that were agreements

    • Positive agreement

  • Non-occurrence agreement

    • Of the intervals in which either coder recorded a nonoccurrence, the proportion that were agreements

    • Negative agreement
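
A sketch of these two indices, using the definitions above and the 2x2 cell counts a (both coded an occurrence), b and c (the two kinds of disagreement), and d (both coded a nonoccurrence); the counts used here match the Happy example shown later in the talk:

```python
def occurrence_agreement(a, b, c, d):
    """Agreements on occurrence out of all intervals where either coder recorded an occurrence."""
    return a / (a + b + c)

def nonoccurrence_agreement(a, b, c, d):
    """Agreements on nonoccurrence out of all intervals where either coder recorded a nonoccurrence."""
    return d / (b + c + d)

a, b, c, d = 60, 10, 7, 123   # counts from the Happy / all-other-emotions table
print(round(occurrence_agreement(a, b, c, d), 2))     # 60 / 77  ≈ 0.78
print(round(nonoccurrence_agreement(a, b, c, d), 2))  # 123 / 140 ≈ 0.88
```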


Problem with agreement statistics

  • We assume that agreement is due to accuracy

  • Agreement statistics do not control for chance agreement

  • So agreement could be due only to chance


Chance agreement and point-by-point agreement

[Figure: illustration of chance agreement within occurrence agreement and nonoccurrence agreement]



Using a 2x2 table to check agreement on individual codes

  • When IOA is computed on the total code set, it is an omnibus measure of agreement

  • This does not tell us about agreement on any one code.

  • To assess agreement on a particular code, the confusion matrix must be collapsed into a 2x2 matrix (the target code vs. all other codes), as in the sketch below.
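
A minimal sketch of that collapsing step; the emotion codes and off-diagonal counts are illustrative, but the collapsed Happy cells match the 2x2 table on the next slide:

```python
def collapse_to_2x2(confusion, target):
    """Collapse a multi-code confusion matrix into (a, b, c, d) for one target code.

    confusion[r1][r2] = number of intervals coder 1 assigned r1 and coder 2 assigned r2.
    """
    a = b = c = d = 0
    for r1, row in confusion.items():
        for r2, n in row.items():
            if r1 == target and r2 == target:
                a += n   # both coded the target
            elif r1 == target:
                b += n   # only coder 1 coded the target
            elif r2 == target:
                c += n   # only coder 2 coded the target
            else:
                d += n   # neither coded the target
    return a, b, c, d

# Illustrative three-code confusion matrix (rows = coder 1, columns = coder 2)
confusion = {
    "happy":   {"happy": 60, "sad": 6,  "neutral": 4},
    "sad":     {"happy": 3,  "sad": 50, "neutral": 10},
    "neutral": {"happy": 4,  "sad": 8,  "neutral": 55},
}
print(collapse_to_2x2(confusion, "happy"))  # (60, 10, 7, 123)
```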


Baserate in a 2x2 table

                         Eager Beaver
Slack Jack               Happy    All other emotions    Total
Happy                      60             10              70
All other emotions          7            123             130
Total                      67            133             200

Estimated baserate = (67 + 70) / (2 * 200) = .34
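
The same best-estimate calculation, written as a small function over the 2x2 cell counts:

```python
def estimated_baserate(a, b, c, d):
    """Average of the two coders' marginal occurrence proportions from a 2x2 table."""
    n = a + b + c + d
    coder1_occurrences = a + b   # row total for the target code
    coder2_occurrences = a + c   # column total for the target code
    return (coder1_occurrences + coder2_occurrences) / (2 * n)

print(estimated_baserate(60, 10, 7, 123))  # (70 + 67) / 400 = 0.3425, i.e., about .34
```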


Review

  • Defined accuracy

  • Described the relationship between chance agreement and IOA

  • Created a 2x2 table

  • Calculated a best estimate of the base rate


Kappa

  • Kappa is an agreement statistic that controls for chance agreement

  • Before kappa there was a sense that we should control for chance but we did not know how

  • Cohen’s 1960 paper has been cited over 7000 times


Definition of Kappa

  • Kappa is the proportion of non-chance agreement observed out of all possible non-chance agreement:

    K = (Po - Pe) / (1 - Pe)


Definition of Terms

  • Po = the proportion of events for which there is observed agreement.

    • Same metric as percent agreement

  • Pe = the proportion of events for which agreement would be expected by chance alone

    • Defined as the probability of two raters coding the same behavior at the same time by chance


Agreement matrix for EB and SJ (chance agreement in parentheses)

Po = .36 + .18 = .54;  Pe = .33 + .15 = .48;  k = (.54 - .48) / (1 - .48) = .12
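
A small sketch that computes kappa both from Po and Pe directly (reproducing the EB and SJ numbers above) and from raw 2x2 counts:

```python
def kappa_from_proportions(p_o, p_e):
    """Cohen's kappa from observed agreement (Po) and chance agreement (Pe)."""
    return (p_o - p_e) / (1 - p_e)

def kappa_from_counts(a, b, c, d):
    """Cohen's kappa from 2x2 counts: a = both occurrence, b/c = disagreements, d = both nonoccurrence."""
    n = a + b + c + d
    p_o = (a + d) / n
    p_e = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n)
    return kappa_from_proportions(p_o, p_e)

print(round(kappa_from_proportions(0.54, 0.48), 2))  # 0.12, matching the line above
print(round(kappa_from_counts(60, 10, 7, 123), 2))   # about 0.81 for the Happy table
```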


What determines the value of kappa

  • Accuracy and base rate

  • Increasing accuracy increases observed agreement; therefore kappa is a consistent estimator of accuracy when the base rate is held constant

  • If accuracy is held constant, kappa decreases as the estimated true base rate deviates from .5 (a numerical sketch of this relationship follows below)
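
One way to see this relationship numerically is to assume a simple error model: the behavior truly occurs with probability equal to the baserate, and each observer independently records every interval correctly with probability equal to their accuracy. This model is an illustrative assumption, not necessarily the one behind the presenter's curves or handout table, but it produces the same pattern:

```python
def expected_kappa(accuracy, baserate):
    """Expected kappa for two observers under the assumed independent-error model."""
    a, p = accuracy, baserate
    p_o = a * a + (1 - a) * (1 - a)   # both right or both wrong
    q = p * a + (1 - p) * (1 - a)     # each observer's expected recorded baserate
    p_e = q * q + (1 - q) * (1 - q)   # chance agreement from the marginals
    return (p_o - p_e) / (1 - p_e)

for p in (0.05, 0.10, 0.30, 0.50):
    print(f"baserate {p:.2f}: kappa ≈ {expected_kappa(0.80, p):.2f} at 80% accuracy, "
          f"≈ {expected_kappa(0.99, p):.2f} at 99% accuracy")
```

Under this assumed model, kappa at 80% accuracy climbs from roughly .10 at a baserate of .05 to roughly .36 at a baserate of .50, while kappa at 99% accuracy stays above .80 throughout; the two figures summarized on the next slides show the same shape.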



Obtained kappa, across baserate, for 80% and 99% accuracy

[Figure: obtained kappa plotted against baserate, with one curve for 99% accuracy and one for 80% accuracy]


Obtained kappa, across baserate, from 80% to 99% accuracy

[Figure: obtained kappa plotted against baserate, with curves for 99%, 95%, 90%, 85%, and 80% accuracy]


Bottom line

  • When we observe behaviors with high or low baserates, our kappas will be low.

  • This is important for researchers studying low baserate behaviors

    • Many of the behaviors we observe in young children with developmental disabilities are very low baserate


Criterion values for IOA

  • Cohen never suggested using criterion values for kappa

  • Many professional organizations recommend criteria for IOA

  • e.g., The Council for Exceptional Children: Division for Research Recommendations 2005

    • “ Data are collected on the reliability or inter-observer agreement (IOA) associated with each dependent variable, and IOA levels meet minimal standards (e.g., IOA = 80%; Kappa = .60)”


Criterion accuracy?

  • Setting a criterion for kappa independent of baserate is not useful

  • If we can estimate accuracy

    • And I am suggesting that we can

  • We need to consider what sufficient accuracy would be


Criterion accuracy (cont.)

  • If we consider 80% agreement sufficient, then

    • Would we consider 80% accuracy sufficient?

  • If we used 80% accuracy as a criterion

    • Acceptable kappa could be as low as .19 depending on baserate
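
As a concrete illustration under the same assumed independent-error model used earlier (an assumption, not the handout's exact table), a quick sweep shows roughly where 80%-accurate coders land:

```python
def expected_kappa(accuracy, baserate):
    """Expected kappa under the assumed independent-error model (see earlier sketch)."""
    p_o = accuracy ** 2 + (1 - accuracy) ** 2
    q = baserate * accuracy + (1 - baserate) * (1 - accuracy)
    p_e = q ** 2 + (1 - q) ** 2
    return (p_o - p_e) / (1 - p_e)

for baserate in (0.05, 0.10, 0.12, 0.20, 0.30, 0.50):
    print(f"baserate {baserate:.2f}: expected kappa ≈ {expected_kappa(0.80, baserate):.2f}")
# At 80% accuracy, kappa is about .36 at a baserate of .5 under this model,
# but falls to roughly .19 near a baserate of .12, and lower still for rarer behaviors.
```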


Why it is really important not to use criterion kappas

  • There is a belief that the quality of data will be higher if kappa is higher.

  • This is only true if there is no associated loss of content or construct validity.

  • The processes of collapsing and redefining codes often result in a loss of validity.


Applied example

  • See handout for formulas and data


Use the table on the first page of your handout to determine the accuracy of raters from baserate and kappa.


[Slide graphic: worked values from the handout table, .32 and .85]


Recommendations

  • Calculate agreement for each code using a 2x2 table

  • Use the table to determine the accuracy of observers from baserate and obtained kappa

  • Report kappa and accuracy
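
The handout's lookup table is not reproduced in this transcript. As an illustration only, under the same assumed independent-error model used in the earlier sketches, accuracy can be recovered from an obtained kappa and the estimated baserate by numerically inverting the expected-kappa function:

```python
def expected_kappa(accuracy, baserate):
    """Expected kappa under the assumed independent-error model (see earlier sketches)."""
    p_o = accuracy ** 2 + (1 - accuracy) ** 2
    q = baserate * accuracy + (1 - baserate) * (1 - accuracy)
    p_e = q ** 2 + (1 - q) ** 2
    return (p_o - p_e) / (1 - p_e)

def estimated_accuracy(obtained_kappa, baserate, tol=1e-6):
    """Bisection search for the accuracy in (0.5, 1.0) whose expected kappa matches the obtained kappa."""
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_kappa(mid, baserate) < obtained_kappa:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example: an obtained kappa of .60 on a code with an estimated baserate of .34
print(round(estimated_accuracy(0.60, 0.34), 2))  # about 0.90 under this assumed model
```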


Software to calculate kappa

  • Comkappa, developed by Bakeman, calculates kappa, the standard error of kappa, kappa max, and weighted kappa.

  • MOOSES, developed by Jon Tapp, calculates kappa on the total code set and on individual codes. It can be used with live coding, video coding, and transcription.

  • SPSS


Challenge

  • The challenge is to change the standards of observational research that demand kappas above a criterion of .6:

    • Editors

    • PIs

    • Collaborators

