Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks. EMNLP 2008.
Rion Snow (CS, Stanford), Brendan O'Connor (Dolores Labs), Daniel Jurafsky (Linguistics, Stanford), Andrew Y. Ng (CS, Stanford).

Agenda: Introduction; Task Design: Amazon Mechanical Turk (AMT)
5 Experts (E) and 10 Non-Experts (NE)
Overall, roughly 4 non-expert annotations per example are needed to match the correlation achieved by a single expert annotator.
At 3,500 non-expert annotations per USD, and roughly 4 non-expert annotations per expert-equivalent label, this works out to 3,500 / 4 ≈ 875 expert-equivalent annotations per USD.
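A minimal simulation of this averaging effect (not the paper's data; the item count, rating scale, and noise levels below are made-up parameters, so the exact crossover point differs from the paper's k = 4):

```python
import numpy as np

rng = np.random.default_rng(0)

n_items, n_nonexperts = 100, 10
gold = rng.uniform(0, 100, n_items)                  # latent "true" rating per item
expert = gold + rng.normal(0, 10, n_items)           # one expert: low-noise ratings
workers = gold[None, :] + rng.normal(0, 25, (n_nonexperts, n_items))  # noisier workers

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print(f"single expert:       r = {corr(expert, gold):.3f}")

# Average the first k non-experts and watch the correlation climb toward
# (and eventually past) the single-expert level as k grows.
for k in range(1, n_nonexperts + 1):
    avg = workers[:k].mean(axis=0)
    print(f"k = {k:2d} non-experts: r = {corr(avg, gold):.3f}")
```

Averaging k independent noisy ratings cuts the noise variance by a factor of k, which is why the averaged non-expert ratings approach expert-level correlation after a handful of annotations.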
Time is measured as the total number of hours elapsed from the moment the requester submits the task to AMT until the last worker submits the final assignment.
Evaluated with 20-fold cross-validation.
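As a concrete illustration, here is a hedged sketch of a 20-fold cross-validation setup in scikit-learn; the synthetic data and logistic-regression learner are stand-ins, not the paper's actual features or model:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data; in the paper, features and labels would come
# from the annotated task data instead.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

cv = KFold(n_splits=20, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(f"mean accuracy over 20 folds: {scores.mean():.3f} (+/- {scores.std():.3f})")
```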
Why is a single set of non-expert annotations better than a single expert annotation?
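One plausible answer, in line with the paper's discussion of annotator bias: every annotator, expert or not, makes idiosyncratic errors, and aggregating several independent non-expert labels averages many of them out. A toy simulation of this effect for binary labels (all accuracy figures below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 10_000
gold = rng.integers(0, 2, n_items)           # binary gold labels

def annotate(acc):
    """Return labels that match gold with probability `acc`."""
    flip = rng.random(n_items) >= acc
    return np.where(flip, 1 - gold, gold)

expert = annotate(0.90)                                     # one accurate expert
nonexperts = np.stack([annotate(0.80) for _ in range(5)])   # five noisier workers
vote = (nonexperts.mean(axis=0) > 0.5).astype(int)          # majority vote

print(f"single expert accuracy: {(expert == gold).mean():.3f}")
print(f"5-worker majority vote: {(vote == gold).mean():.3f}")
```

With five 80%-accurate workers, the majority vote is right whenever at least three of them agree with gold, which happens about 94% of the time, beating the lone 90%-accurate expert.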