Data Annotation using Human Computation

  1. Data Annotation using Human Computation HOANG Cong Duy Vu 07/10/2009

  2. Outline • Introduction • Data Annotation with GWAP • Data Annotation with AMT • Characterization of Multiple Dimensions • Correlation Analysis • Future Directions • Conclusion

  3. Introduction • Data annotation refers to the task of adding specific information to raw data • In computational linguistics, information such as morphology, POS, syntax, semantics, discourse … • In computer vision, information such as image labels, regions, video descriptions …

  4. Introduction • Annotated data is extremely important for computational & learning problems and for training AI algorithms • But it is also very non-trivial to obtain, due to • ambiguity in processing information • a costly, time-consuming, labor-intensive and error-prone process

  5. Introduction • Motivating facts: • Gaming data [1] • Each day, more than 200 million hours are spent playing games in the U.S. • By age 21, the average American has spent more than 10,000 hours playing games, equivalent to five years of working a full-time job 40 hours per week • With the explosion of web services, we should consider taking advantage of this community activity. How can we leverage it for annotation?

  6. Introduction • Human computation emerges as a viable synergy for data annotation • Its main idea is to harness what humans are good at but machines are poor at • Use the ability and speed of a community to solve particular tasks • Computer programs can simultaneously be used for other purposes (e.g. educational entertainment)

  7. Introduction • What is human computation? • a CS technique in which a computational process performs its function by outsourcing certain steps to humans (from Wikipedia) • Also called human-based computation or human computing • More general term: “Crowdsourcing” • Typical frameworks: • Game With A Purpose (GWAP) • Amazon Mechanical Turk (AMT)

  8. Data Annotation with GWAP • GWAP - Game With A Purpose • Pioneered by Luis von Ahn at CMU in his PhD thesis in 2005 • GWAPs are online games with a special mechanism • Humans enjoy playing games provided by computers • Humans help computers do implicit annotation tasks integrated into such games

  9. Data Annotation with GWAP • How does GWAP work? Developers: build everything for both the server and clients Bots: sometimes developers create bots that stand in for real players, since the number of players in a GWAP is limited at any given time Players: people who play the game, with pairwise interaction GUI: Graphical User Interface Data sources: the data that needs to be annotated
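
To make the pairwise interaction concrete, here is a minimal Python sketch of the output-agreement rule used by ESP-Game-style GWAPs: two paired players label the same item independently, and a label is only recorded once both players produce it. The function and the data are illustrative assumptions, not the actual game code.

```python
def play_round(image_id, labels_a, labels_b, taboo=frozenset()):
    """Return the agreed label for one round, or None if the players never match."""
    seen_a, seen_b = set(), set()
    # Interleave the two players' guesses to simulate them typing in parallel.
    for guess_a, guess_b in zip(labels_a, labels_b):
        seen_a.add(guess_a.lower())
        seen_b.add(guess_b.lower())
        common = (seen_a & seen_b) - taboo   # agreement, excluding taboo words
        if common:
            return common.pop()              # record this as a label for image_id
    return None

# Example round: both players eventually type "dog", so "dog" becomes an annotation.
print(play_round("img_001",
                 ["animal", "puppy", "dog"],
                 ["dog", "pet", "cute"],
                 taboo=frozenset({"animal"})))
```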

  10. Data Annotation with GWAP • Input-output mechanism of GWAP [1]

  11. Data Annotation with GWAP • Example 1: Image labeling Computer Vision Game Captured from http://www.gwap.com

  12. Data Annotation with GWAP • Example 2: Video description tagging Computer Vision Game Captured from http://www.gwap.com

  13. Data Annotation with GWAP • Example 3: Online word games Natural Language Processing Game Captured from http://wordgame.stanford.edu/freeAssociation.html

  14. Data Annotation with GWAP • Recently, various GWAP games have been developed across a wide range of AI domains • Computer Vision: ESPGame, Peekaboom, TagATune, Google Image Labeler, … • Semantic web: OntoGame, Verbosity • Natural Language Processing: Word Online Games (Categorilla, Categodzilla, and Free Association), Phrase Detectives

  15. Data Annotation with GWAP • Results obtained so far: • ESP Game Dataset [1] (CMU): 100,000 images with English labels from ESPGame (1.8 GB) • Online word games [2] (Stanford): 800,000 data instances for semantic processing • TagATune music data1 (CMU): 25,863 clips from 5,405 source MP3s with 188 unique tags 1 from http://musicmachinery.com/2009/04/01/magnatagatune-a-new-research-data-set-for-mir/

  16. Data Annotation with GWAP • Advantages • Free • People always love playing games • Fun, attractive and sometimes addictive • Disadvantages • A highly visual game design requires much effort • Integrating annotation tasks into games is hard, comparable to devising algorithms • It is very hard to design GWAP games for complex processing tasks

  17. Data Annotation with GWAP • Players have fun and enjoy the games • The more players play the game, the more annotated data we obtain • Question: if the games are not fun, can we still attract many people to join? • Viable answer: Amazon Mechanical Turk (AMT) ???

  18. Data Annotation with AMT • AMT – Amazon Mechanical Turk • one of the tools of Amazon Web Services • a broad marketplace for work • utilizes human intelligence for tasks that computers are unable to do but humans can do effectively • Located at https://www.mturk.com/mturk/

  19. Data Annotation with AMT Captured from https://www.mturk.com/mturk/

  20. Data Annotation with AMT • How does AMT work? Requesters: define tasks, known as HITs (Human Intelligence Tasks), through an interactive GUI or the APIs provided by AMT HIT: each HIT lets requesters specify task instructions, required qualifications, duration, and monetary reward Broker: the web service that plays the intermediary role, supplying and assisting both sides Workers: people who solve HIT tasks to earn money
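
As an illustration of how a requester defines a HIT programmatically, here is a minimal sketch using the modern boto3 MTurk API (which post-dates this presentation). It targets the requester sandbox; the external annotation URL is a hypothetical page used only for illustration.

```python
import boto3

# Connect to the MTurk requester sandbox (assumes AWS credentials are configured locally).
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# Hypothetical external annotation page, shown to workers inside an iframe.
question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/annotate?image=001</ExternalURL>
  <FrameHeight>400</FrameHeight>
</ExternalQuestion>
"""

hit = mturk.create_hit(
    Title="Label an image",
    Description="Type a few keywords that describe the image.",
    Keywords="image, labeling, annotation",
    Reward="0.05",                    # monetary reward per assignment, in USD
    MaxAssignments=3,                 # repetition: collect three independent answers
    LifetimeInSeconds=24 * 3600,      # how long the HIT stays on the marketplace
    AssignmentDurationInSeconds=300,  # time a worker has to finish one assignment
    Question=question_xml,
)
print("Created HIT:", hit["HIT"]["HITId"])
```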

  21. Data Annotation with AMT • An example of a HIT:

  22. Data Annotation with AMT • Statistics related to AMT: • The AMT service was initially launched publicly in 2005 • According to a report1 from March 2007, there were more than 100,000 workers in over one hundred countries • Why can it attract so many participants? 1 from http://en.wikipedia.org/wiki/Amazon_Mechanical_Turk

  23. Data Annotation with AMT • AMT seems to have a wider reach due to its ease and simplicity • Results obtained so far: • Statistics from the Amazon website • 69,452 HITs currently available • Some requesters make their annotated data public • Sorokin [3] (UIUC): 25,000 annotated images at a cost of $800 • Snow [4] (Stanford): linguistic annotations (WSD, temporal ordering, word similarity, …)

  24. Data Annotation with AMT • Advantages • For users • HITs are not hard to solve • They can easily earn money while still relaxing • For developers • APIs provided by AMT make HITs easy to build • The demographics of users on the Amazon website are diverse • Large-scale annotated data can hopefully be obtained very quickly over time

  25. Data Annotation with AMT • Disadvantages • It is hard to control and maintain the tradeoff between data quantity and quality • Effective strategies are needed

  26. Data Annotation with AMT • Example 1: Word Similarity NLP task Captured from http://nlpannotations.googlepages.com/wordsim_sample.html

  27. Data Annotation with AMT • Example 2: Image Labeling CV task Captured from http://visionpc.cs.uiuc.edu/~largescale/protocols/4/index.html

  28. Characterization of Multiple Dimensions • Overview of the dimensions considered: • Setup Effort - creating the interfaces through which people participate in the annotation process; these should be designed to ensure the objective of obtaining large, clean and useful data • Quality of Annotation - the quality or accuracy of the annotated outputs • Data Selection - figuring out where and which data need to be annotated • Annotation Participation - factors relating to the participants in the annotation process

  29. Characterization of Multiple Dimensions • Setup Effort • UI design/Visual impact • the graphical characteristics of the user interface design • a substantial factor that determines the efficiency of the annotation process • GWAPs need much effort on the GUI to make the game entertaining enough to motivate players • AMT needs less effort to build HIT tasks, but they should still be designed to be fun, easy and attractive • Scale - none/low/basic/average/distinctive/excellent

  30. Characterization of Multiple Dimensions • Fun • a very significant factor, because players simply will not join GWAP games that are not fun • making games fun is comparable to designing algorithms • Some ways: • Timing (GWAP & AMT) • Scores, top scores, top players (GWAP) • Levels (GWAP & AMT) • Money (AMT) • Scale - none/low/fair/high/very high

  31. Characterization of Multiple Dimensions • Payment • Makes the annotation process more motivating • For example: • In GWAP, player pairs earn scores • In AMT, workers receive monetary payment or bonuses from requesters • Scale - none/score/monetary payment

  32. Characterization of Multiple Dimensions • Cheating • Sometimes unmotivated or lazy participants use tricks when doing annotation tasks • Some ways to prevent this: • filter players by IP address, location, or training (GWAP) • use qualifications (AMT) • Scale - none/low/possible/high (-)

  33. Characterization of Multiple Dimensions • Implementation Cost • Various costs • Designing annotation tasks • Creation of timing controllers • Game mechanism (online or offline) • Network infrastructure (client/server, peer-to-peer) • Record and statistics (user scores, player skill, qualification) • Building intelligent bots • Scale - none/low/fair/high/very high

  34. Characterization of Multiple Dimensions • Exposure • Because social impact matters, letting people know about the system is very important • A GWAP must do this itself by publicizing on social websites, contributor sites and gaming sites • AMT sits under the umbrella of Amazon's web services -> higher impact • Scale - none/low/fair/high

  35. Characterization of Multiple Dimensions • Centralization • measures whether there is a single entity or owner that defines which tasks are presented to workers • In the case of GWAP, there are currently only 5 games; for AMT, anyone can define their own tasks for their own evaluation purposes • Scale - yes/no (-)

  36. Characterization of Multiple Dimensions • Scale • a metric of how many tasks the system will be able to accomplish • GWAP can produce extremely large volumes of data, because the operating costs are low • AMT scales really well, but it costs money • For example • if we have many millions of tasks to accomplish, GWAP is the better approach • at 10,000 tasks, AMT will do well (and requires less effort to set up and to approve submitted tasks) • Scale - none/low/fair/high/very high
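
As a rough worked example of the cost side (using the Sorokin figure cited on slide 23, $800 for 25,000 images, i.e. roughly $0.03 per annotated image): 10,000 AMT tasks at one label each would cost on the order of 10,000 × $0.032 ≈ $320 before Amazon's fees, whereas the same 10,000 labels from a GWAP cost only the effort of attracting enough players.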

  37. Characterization of Multiple Dimensions • Annotation participation • Number of participants • utilize people with different skills to improve the diversity and quality of annotated data • A small study6 indicated that the demographics of AMT currently correlate with the demographics of Internet users • Scale - none/low/fair/high/very high 6http://behind-the-enemy-lines.blogspot.com/2008/03/mechanical-turk-demographics.html

  38. Characterization of Multiple Dimensions • Motivation • Reflects the attractiveness of the annotation system • Some reasons: • money • entertainment/fun • killing free time • challenge/self-competition • Scale - none/low/fair/high

  39. Characterization of Multiple Dimensions • Interaction • Different ways to create interaction among participants • Scale - none/multiple without interaction/multiple with pair-wise interaction/multiple with multiple interaction

  40. Characterization of Multiple Dimensions • Qualification • restrict workers so that only qualified workers can do the tasks • Scale - none/low/fair/high

  41. Characterization of Multiple Dimensions • Data Selection • Size • choose which data resources will be annotated • Scale - none/small/fair/large/very large

  42. Characterization of Multiple Dimensions • Coverage • Coverage means whether the data covers the expected real-world population and distribution of data • Scale - none/low/fair/high

  43. Characterization of Multiple Dimensions • Quality of Annotation • Annotation accuracy • Use different strategies to control the quality of annotation • Use repetition: an output is not considered correct until a certain number of players have entered it • Use post-processing steps to re-evaluate the annotated data • Scale - none/low/fair/high/very high
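
A minimal sketch of the repetition strategy, assuming an agreement threshold of three distinct contributors per label; the threshold and the helper function are illustrative, not taken from any particular system.

```python
from collections import defaultdict

REQUIRED_REPETITIONS = 3  # assumption: accept a label once 3 distinct contributors agree

votes = defaultdict(set)      # (item_id, label) -> set of contributor ids
accepted = defaultdict(set)   # item_id -> labels considered correct so far

def record_label(item_id, contributor_id, label):
    """Record one contributor's label and promote it once enough distinct people agree."""
    label = label.strip().lower()
    votes[(item_id, label)].add(contributor_id)
    if len(votes[(item_id, label)]) >= REQUIRED_REPETITIONS:
        accepted[item_id].add(label)

# Example: the label "dog" for item 42 is accepted only after three different contributors enter it.
record_label(42, "p1", "dog")
record_label(42, "p2", "Dog")
record_label(42, "p3", "dog ")
assert "dog" in accepted[42]
```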

  44. Characterization of Multiple Dimensions • Inter-annotator agreement • Inter-annotator agreement measures the degree of agreement among data annotators • Scale - none/low/fair/high
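
One standard way to quantify this is Cohen's kappa between two annotators, which corrects raw agreement for the agreement expected by chance. The small sketch below uses made-up part-of-speech labels purely for illustration.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators who labeled the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators independently pick the same class.
    expected = sum(freq_a[c] / n * freq_b[c] / n for c in set(labels_a) | set(labels_b))
    return (observed - expected) / (1 - expected)

# Toy example: the annotators agree on 4 of 5 items, giving kappa ≈ 0.583.
print(round(cohen_kappa(["noun", "verb", "noun", "adj", "noun"],
                        ["noun", "verb", "noun", "noun", "noun"]), 3))
```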

  45. Characterization of Multiple Dimensions • Quality control • Filter bad data out and integrate a correction model to minimize errors during the annotation process • For example: • In AMT, requesters approve submitted HITs, using a voting threshold to accept answers • In GWAP, all data contributed by players is checked after a fixed time • Scale - none/low/fair/high/very high
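
For the AMT case, a sketch of threshold-based approval using the modern boto3 API: submitted assignments for a HIT are fetched, a majority answer is computed, and agreeing assignments are approved. The HIT id, the free-text answer handling, and the two-vote threshold are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter
import boto3

mturk = boto3.client(
    "mturk",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

HIT_ID = "EXAMPLE_HIT_ID"  # hypothetical HIT id

# Fetch all assignments submitted for this HIT but not yet reviewed.
resp = mturk.list_assignments_for_hit(HITId=HIT_ID, AssignmentStatuses=["Submitted"])
assignments = resp["Assignments"]

def extract_answer(answer_xml):
    """Pull the first free-text answer out of the QuestionFormAnswers XML (schema namespace assumed)."""
    ns = "{http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd}"
    root = ET.fromstring(answer_xml)
    return root.find(f"{ns}Answer/{ns}FreeText").text.strip().lower()

answers = [extract_answer(a["Answer"]) for a in assignments]
majority_label, num_votes = Counter(answers).most_common(1)[0]

# Approve assignments that agree with the majority answer; reject the rest.
for a in assignments:
    if extract_answer(a["Answer"]) == majority_label and num_votes >= 2:
        mturk.approve_assignment(AssignmentId=a["AssignmentId"])
    else:
        mturk.reject_assignment(
            AssignmentId=a["AssignmentId"],
            RequesterFeedback="Answer disagreed with the majority of workers.",
        )
```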

  46. Characterization of Multiple Dimensions • Usability • the annotated data should be shown to be useful and to have real-world impact • Scale - none/low/fair/high

  47. Characterization of Multiple Dimensions • Annotation Speed • measures how many labels people can obtain per day/hour/minute • Scale - none/slow/fair/fast

  48. Characterization of Multiple Dimensions • Annotation Cost • measures the total cost paid to obtain the annotated data • Scale - none/cheap/fair/expensive

  49. Correlation Analysis • To analyze correlations between dimensions • Collect information about existing human computation systems • 28 popular systems covering 4 types of human computation

  50. Correlation Analysis • Human computation systems
