Data annotation with Amazon Mechanical Turk. X 100 000 = $5000 Alexander Sorokin David Forsyth University of Illinois at Urbana-Champaign http://vision.cs.uiuc.edu/annotation/
Motivation • Unlabeled data is free (47M Creative Commons-licensed images at Flickr) • Labels are useful • We need large volumes of labeled data • Different labeling needs: • Is there X in the image? • Outline X. • Where is part Y of X? • Of these 500 images, which belong to category X? • … and many more …
Amazon Mechanical Turk [Diagram: Task ("Is this a dog?" o Yes o No; pay: $0.01) -> Broker (www.mturk.com) -> Workers; answer: Yes] $0.01 x 100 000 = $1 000
Annotation protocols • Type keywords • Select relevant images • Click on landmarks • Outline something • Detect features • … anything else …
Type keywords $0.01 http://austinsmoke.com/turk/.
Select examples Joint work with Tamara and Alex Berg http://vision.cs.uiuc.edu/annotation/data/simpleevaluation/html/horse.html
Select examples $0.02 requester mtlabel
Click on landmarks $0.01 http://vision-app1.cs.uiuc.edu/mt/results/people14-batch11/p7/
Outline something $0.01 http://vision.cs.uiuc.edu/annotation/results/production-3-2/results_page_013.html Data from Ramanan NIPS06
Detect features Measuring molecules. Joint work with Rebecca Schulman (Caltech) ?? $0.1 http://vision.cs.uiuc.edu/annotation/all_examples.html
Ideal task properties
• Easy cognitive task
  Good: Where is the car? (bounding box)
  Good: How many cars are there? (3)
  Bad: How many cars are there? (132)
• Well-defined task
  Good: Locate corners of the eyes.
  Bad: Label joint locations. (low resolution or close-up images)
• Concise definition
  Good: 1-2 paragraphs, fixed for all tasks
  Good: 1-2 unique sentences per task
  Bad: 300-page annotation manual
• Low amount of input
  Good: a few clicks or a couple of words
  Bad: detailed outlines of all objects (100s of control points)
Ideal task properties
• High volume
  Good: 2-100K tasks
  Bad: <500 tasks (DIY)
• Data diversity
  Bad: Independently label consecutive video frames.
• Data is being used
  Good: Direct input into [active] learning.
  Bad: Let’s build a dataset for other people to use.
• Pay “well”
  Good: try to pay at the market rate, $0.03-$0.05/image
  Good: offer bonuses for good work
  Bad: $0.01 for detailed image segmentation
Price • $0.01 per image (16 clicks) ~ $1500 / 100 000 images • >1000 images per day, <4 months • Amazon listing fee: 10%, $0.005 min • Workers suggested $0.03-$0.05 per image • $3500-$5500 / 100 000 images
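The totals on this slide can be checked with a few lines of arithmetic. The sketch below assumes the listing fee is charged per image as 10% of the reward with a $0.005 minimum; under that reading the figures $1500, $3500 and $5500 per 100 000 images come out as stated. The function names are illustrative only and are not part of any Amazon API.

```python
def cost_per_image(reward, fee_rate=0.10, fee_min=0.005):
    """Reward paid to the worker plus the listing fee.

    Assumption: the fee is fee_rate of the reward, but never below fee_min.
    """
    return reward + max(fee_rate * reward, fee_min)


def total_cost(reward, n_images=100_000):
    """Total cost in dollars of labeling n_images at the given reward."""
    return cost_per_image(reward) * n_images


def days_needed(n_images=100_000, images_per_day=1000):
    """Days to label n_images at a steady daily throughput."""
    return n_images / images_per_day


if __name__ == "__main__":
    for reward in (0.01, 0.03, 0.05):
        print(f"${reward:.2f}/image: ${total_cost(reward):,.0f} per 100 000 images")
    print(f"{days_needed():.0f} days at 1000 images/day")
```

At rewards this small the $0.005 minimum dominates the 10% rate, which is why the fee adds the same $500 per 100 000 images in all three cases. At 1000 images per day the full set takes 100 days, consistent with the "<4 months" estimate.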
Price-elastic throughput • $0.01 / 40 clicks: 900 labels in 15 hours • $0.01 / 14 clicks: 900 labels in 1.6 hours • $0.01 / 16 clicks: 900 labels in 4 hours
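To make the comparison explicit, the three runs can be converted to labels per hour and price per click. This is a minimal sketch that assumes each triple on the slide (clicks per task, hours, labels) describes one batch paid at $0.01 per task; the names RUNS and summarize are made up for illustration.

```python
# (clicks per task, hours to finish the batch, labels collected)
# Assumption: every task in every batch paid a $0.01 reward.
RUNS = [(40, 15.0, 900), (14, 1.6, 900), (16, 4.0, 900)]
REWARD = 0.01


def labels_per_hour(hours, labels):
    """Average throughput of a batch."""
    return labels / hours


def price_per_click(reward, clicks):
    """Effective pay per click when a task needs `clicks` clicks."""
    return reward / clicks


def summarize(runs, reward=REWARD):
    """Return (clicks, price per click, labels per hour), best paid per click first."""
    rows = [
        (clicks, price_per_click(reward, clicks), labels_per_hour(hours, labels))
        for clicks, hours, labels in runs
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for clicks, ppc, rate in summarize(RUNS):
        print(f"{clicks:2d} clicks: ${ppc:.5f}/click, {rate:.1f} labels/hour")
```

With these three data points, throughput falls as the effective pay per click falls, which is the sense in which throughput is price-elastic.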
Annotation quality • Annotators agree within 5-10 pixels on a 500x500 screen • There are bad ones. • Protocol: label people, 14 pts; volume: 305 images [Figure panels: A, C, E, G]
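One way to flag the bad ones automatically is to compare two annotations of the same image landmark by landmark. The sketch below is an assumption about how such a check could look, not the procedure used in this project: it treats an annotation as a list of 14 (x, y) pixel coordinates and uses a 10-pixel tolerance taken from the upper end of the agreement range above. The sample coordinates are synthetic.

```python
import math


def point_distances(a, b):
    """Per-landmark Euclidean distance between two annotations of one image."""
    if len(a) != len(b):
        raise ValueError("annotations must have the same number of landmarks")
    return [math.hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b)]


def agree(a, b, tol=10.0):
    """True if every landmark pair lies within tol pixels (hypothetical threshold)."""
    return all(d <= tol for d in point_distances(a, b))


# Synthetic 14-point annotations on a 500x500 image (made-up data).
base = [(30 * i, 30 * i) for i in range(14)]
shifted = [(x + 3, y + 4) for x, y in base]   # every landmark is 5 pixels off
bad = [(x + 100, y) for x, y in base[:1]] + base[1:]  # one landmark is 100 pixels off

if __name__ == "__main__":
    print(agree(base, shifted), agree(base, bad))
```

Annotations that disagree with each other beyond the tolerance are candidates for manual verification rather than automatic rejection, since the check cannot tell which of the two is wrong.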
Submission breakdown • Protocol: label people, box + 14 pts; volume: 3078 HITs • We need to “manually” verify the work
Grading tasks • Take 10 submitted results • Create new task to verify the result • Verification is easy • Pay the same or slightly higher price • Total overhead - 10% (work in progress) http://vision-app1.cs.uiuc.edu/mt/grading/people14-batch11-small/p1/
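The 10% overhead follows from grouping submissions ten to a grading task and paying the grader about the same as the original worker. The sketch below illustrates that bookkeeping under those assumptions; the helper names are hypothetical, and the figure of 3078 submissions is borrowed from the submission-volume slide purely as an example input.

```python
import math


def make_grading_tasks(submissions, batch_size=10):
    """Split submitted results into batches; each batch becomes one grading task."""
    return [
        submissions[i:i + batch_size]
        for i in range(0, len(submissions), batch_size)
    ]


def grading_overhead(n_submissions, reward, grading_reward=None, batch_size=10):
    """Grading cost as a fraction of the original labeling cost.

    Assumption: one grading task covers batch_size submissions and pays
    grading_reward (defaults to the same price as the original task).
    """
    if grading_reward is None:
        grading_reward = reward
    n_tasks = math.ceil(n_submissions / batch_size)
    return (n_tasks * grading_reward) / (n_submissions * reward)


if __name__ == "__main__":
    tasks = make_grading_tasks(list(range(3078)))
    print(len(tasks), "grading tasks")
    print(f"overhead: {grading_overhead(3078, 0.01):.1%}")
```

Paying graders slightly more than the original price raises the overhead proportionally; for example, a grading reward of $0.012 against a $0.01 task would give roughly 12%.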
How do I sign up? • Go to our web page: http://vision.cs.uiuc.edu/annotation/ • Send me an e-mail: firstname.lastname@example.org • Register at Amazon Mechanical Turk http://www.mturk.com
What are the next steps? • Collecting more data • 100K labeled people at $5000 • Accurate models for 2.1D pose estimation • Complex models, high accuracy, real time • Visualization and storage • If we all collect labels, how do we share? • Active learning/Online classifiers • If we can ask for labels, why not automatically? • Limited domain Human-Computer racing • Run learning until computer model beats humans
Open Issues • What data to annotate? • Is image resolution important? • Images or videos? • Licensing? • How to allocate resources? • Uniformly per object category • Non-uniformly and use transfer learning • How much data do we need? • What is the value of labeled data? • Will 10 000 000 labeled images (for $1M) solve everything?
Acknowledgments Special thanks to: David Forsyth Tamara Berg Rebecca Schulman David Martin Kobus Barnard Mert Dikmen All workers at Amazon Mechanical Turk This work was supported in part by the National Science Foundation under IIS - 0534837 and in part by the Office of Naval Research under N00014-01-1-0890 as part of the MURI program. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the National Science Foundation or the Office of Naval Research.
Thank you X 100 000 = $5000
References • Mechanical turk web site http://www.mturk.com • Our project web site http://vision.cs.uiuc.edu/annotation/ • Label Me - open annotation tool http://labelme.csail.mit.edu/ • Games with a purpose (ESP++) http://www.gwap.com/gwap/ • Lotus hill research institute/image parsing http://www.imageparsing.com/ • Tips on how to formulate a task http://developer.amazonwebservices.com/connect/thread.jspa?threadID=17867
Creative Commons Licenses Attribution. You must attribute the work in the manner specified by the author… Noncommercial. You may not use this work for commercial purposes ShareAlike. You may distribute the modified work only under the same, similar or a compatible license. No Derivative Works. You may not alter, transform, or build upon this work. Adapted from http://creativecommons.org/licenses/
Flickr images by license • BY: 8,831,568 • BY-SA: 6,137,030 • BY-NC-SA: 21,678,154 • BY-NC: 10,724,800 • Total: 47,371,552 (http://flickr.com/creativecommons/, as of 07/20/08)
Motivation X 100 000 = $5000 Custom annotations Large scale Low price
Mechanical Turk terminology • Requester • Worker • HIT (human intelligence task) • Reward • Bonus • Listing fee • Qualification
Commercial applications • Label objects on the highway (asset management) • Create transcripts of video and audio (text-based video search) • Outline a golf course and objects (property valuation) • Write and summarize product reviews
Scalability • My current throughput is 1000 HITs/day • There are 30K - 60K HITs at a time • Workers enjoy what they do • Popular HITs “disappear” very quickly • Scalability is Amazon’s job!
Why talk to us? • We can jump-start your annotation project • We discuss the annotation protocol • You give us sample data (e.g. 100 images) • We run it through MT • We give you detailed step-by-step instructions on how to run it • We can build new tools • All our tools are public • You can always do it yourself
Objective • To build • A simple tool • To obtain annotations • At large scale for • A specific research project • Very quickly • And at low cost
Projects in progress • People joint locations • 2380 images / 2729 good annotations • Relevant images • Consistency at 20 annotations/set • Annotate molecules • 30% usable data at the first round