  1. Systematization of Crowdsourcing for Data Annotation. Aobo, Feb. 2010

  2. Outline • Overview • Related Work • Analysis and Classification • Recommendation • Future Work • Conclusions • References

  3. Overview • Contribution • Provide a faceted analysis of existing crowdsourcing annotation applications. • Discuss recommendations on how practitioners can take advantage of crowdsourcing. • Discuss the potential opportunities in this area. • Definitions • Crowdsourcing • GWAP (Games With A Purpose) • Distributed Human-based Computation • AMT (Amazon Mechanical Turk) • HIT (Human Intelligence Task)

  4. Related Work • “A Taxonomy of Distributed Human Computation” • Authors: A. J. Quinn and B. B. Bederson • Year: 2009 • Contribution • Divides DHC applications into seven genres. • Proposes six dimensions to help characterize the different approaches. • Proposes recommendations and future directions.

  5. Related Work • “A Survey of Human Computation Systems” • Authors: Yuen, Chen, and King • Year: 2009 • Contribution • Surveys various human computation systems individually. • Compares GWAPs by game structure, verification method, and game mechanism. • Presents performance issues of GWAPs.

  6. Analysis and Classification • Dimensions

  7. Analysis and Classification • GWAP • High score in: GUI design, Implementation cost, Annotation speed • Low score in: Annotation cost, Difficulty, Participation time, Domain coverage, Popularization • Medium score in: Annotation accuracy, Data size • NLP tasks: - Word Sense Disambiguation - Coreference Annotation
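
As a concrete illustration of how a GWAP typically collects labels, the sketch below implements the output-agreement idea described by von Ahn and Dabbish (see References): two paired players see the same item, and a label is accepted only when both produce it independently. The class and function names here are illustrative, not from the talk.

```python
from collections import defaultdict


class OutputAgreementGame:
    """Toy output-agreement GWAP round: a label is accepted only when
    two independently paired players give the same answer for an item.
    (Illustrative sketch; names are not from the talk.)"""

    def __init__(self):
        # item -> list of accepted (agreed-upon) labels
        self.accepted_labels = defaultdict(list)

    def play_round(self, item, label_player1, label_player2):
        """Record the label and return True if the two players agree."""
        a, b = label_player1.strip().lower(), label_player2.strip().lower()
        if a == b:
            self.accepted_labels[item].append(a)
            return True
        return False


# Example: word-sense labels for an ambiguous word, as in the WSD task above.
game = OutputAgreementGame()
game.play_round("bank (in 'river bank')", "shore", "Shore")    # agreement -> accepted
game.play_round("bank (in 'river bank')", "shore", "finance")  # disagreement -> discarded
print(game.accepted_labels)
```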

  8. Analysis and Classification • AMT • High score in: Annotation cost • Low score in: GUI design, Implementation cost, Number of participants, Data size • Medium score in: Popularization, Difficulty, Domain coverage, Participation time, Annotation accuracy • NLP tasks: - Parsing - Part-of-Speech Tagging
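
For context on how AMT annotation work is farmed out as HITs, the sketch below posts a single part-of-speech question. It uses the present-day boto3 MTurk client, which postdates this 2010 talk; the title, reward, and question text are placeholder assumptions, and the sandbox endpoint is used so nothing is actually paid.

```python
import boto3  # AWS SDK; assumes AWS credentials are configured locally

# Sandbox endpoint so the sketch can be tried without paying workers.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# Minimal free-text QuestionForm asking for a POS tag (one of the NLP tasks above).
question_xml = """<?xml version="1.0" encoding="UTF-8"?>
<QuestionForm xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd">
  <Question>
    <QuestionIdentifier>pos_tag</QuestionIdentifier>
    <QuestionContent><Text>What is the part of speech of "run" in "They run fast"?</Text></QuestionContent>
    <AnswerSpecification><FreeTextAnswer/></AnswerSpecification>
  </Question>
</QuestionForm>"""

hit = mturk.create_hit(
    Title="Part-of-speech tagging (single word)",
    Description="Label the part of speech of one highlighted word.",
    Keywords="nlp, annotation, pos tagging",
    Reward="0.02",                    # low per-HIT cost is the main draw of AMT
    MaxAssignments=3,                 # redundant answers for later quality control
    AssignmentDurationInSeconds=300,
    LifetimeInSeconds=86400,
    Question=question_xml,
)
print("HIT created:", hit["HIT"]["HITId"])
```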

  9. Analysis and Classification • Wisdom of Volunteers • High score in: Number of participants, Data size, Difficulty, Participation time • Low score in: GUI design, Fun • Medium score in: Implementation cost, Annotation accuracy • NLP tasks: - Paraphrasing - Machine Translation - Summarization
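
The High/Medium/Low scores from slides 7-9 can also be read as a small lookup table. The sketch below is one possible encoding of those scores as a data structure (transcribed from the slides; only dimensions explicitly scored there are included), which makes it easy to compare the three approaches on a given facet.

```python
# Faceted scores transcribed from slides 7-9 (H = high, M = medium, L = low).
SCORES = {
    "GWAP": {
        "GUI design": "H", "Implementation cost": "H", "Annotation speed": "H",
        "Annotation cost": "L", "Difficulty": "L", "Participation time": "L",
        "Domain coverage": "L", "Popularization": "L",
        "Annotation accuracy": "M", "Data size": "M",
    },
    "AMT": {
        "Annotation cost": "H",
        "GUI design": "L", "Implementation cost": "L",
        "Number of participants": "L", "Data size": "L",
        "Popularization": "M", "Difficulty": "M", "Domain coverage": "M",
        "Participation time": "M", "Annotation accuracy": "M",
    },
    "Wisdom of Volunteers": {
        "Number of participants": "H", "Data size": "H",
        "Difficulty": "H", "Participation time": "H",
        "GUI design": "L", "Fun": "L",
        "Implementation cost": "M", "Annotation accuracy": "M",
    },
}


def compare(dimension):
    """Return each approach's score on one dimension ('-' if not scored)."""
    return {name: scores.get(dimension, "-") for name, scores in SCORES.items()}


print(compare("Annotation cost"))
# {'GWAP': 'L', 'AMT': 'H', 'Wisdom of Volunteers': '-'}
```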

  10. Recommendation • GWAP • Submit GWAP games to a popular game website that hosts and recommends new games to players • Build a uniform game development platform • AMT • Make the task fun • Rank employers by their contribution • Reward employers who provide original data to be annotated • Donate all or part of the proceeds to charity • Wisdom of Volunteers • Rank users by their contribution • Push tasks to public users

  11. Conclusions • Propose different dimensions for analyzing existing crowdsourcing annotation applications. • Discuss recommendations for each crowdsourcing approach. • Discuss potential opportunities in this area.

  12. References • Alexander J. Quinn and Benjamin B. Bederson. 2009. A taxonomy of distributed human computation. • Aniket Kittur, Ed H. Chi, and Bongwon Suh. 2008. Crowdsourcing user studies with Mechanical Turk. • Rion Snow, Brendan O’Connor, Daniel Jurafsky, and Andrew Ng. 2008. Cheap and fast – but is it good? Evaluating non-expert annotations for natural language tasks. • A. Sorokin and D. Forsyth. 2008. Utility data annotation with Amazon Mechanical Turk. • Luis von Ahn and Laura Dabbish. 2008a. Designing games with a purpose. Commun. ACM, 51(8):58–67. • Luis von Ahn and Laura Dabbish. 2008b. General techniques for designing games with a purpose. Commun. ACM, 51(8):58–67. • Man-Ching Yuen, Ling-Jyh Chen, and Irwin King. 2009. A survey of human computation systems. Computational Science and Engineering, IEEE International Conference on, 4:723–728.
