1 / 49

Rewarding Crowdsourced Workers

Rewarding Crowdsourced Workers. Panos Ipeirotis New York University and Google. Twitter: @ ipeirotis “A Computer Scientist in a Business School” http://behind-the-enemy-lines.com. Joint work with: Jing Wang, Foster Provost, Josh Attenberg, and Victor Sheng; .

kana
Download Presentation

Rewarding Crowdsourced Workers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Rewarding Crowdsourced Workers PanosIpeirotis New York University and Google Twitter: @ipeirotis “A Computer Scientist in a Business School”http://behind-the-enemy-lines.com Joint work with: Jing Wang, Foster Provost, Josh Attenberg, and Victor Sheng;

  2. Example: Build an Web Page Classifier • Need a large number of labeled sites for training • Get people to look at sites and label them as: G (general audience)PG (parental guidance) R(restricted)X (porn) • Cost/Speed Statistics • Undergrad intern: 200 websites/hr, cost: $15/hr • Mechanical Turk: 2500 websites/hr, cost: $12/hr

  3. Challenges • We do not know the true category for the objects • Available only after (costly) manual inspection • We do not know quality of the workers • We want to label objects with true categories • We want (need?) to know the quality of the workers

  4. Expectation Maximization Estimation Iterative process to estimate worker error rates Initialize “correct” label for each object (e.g., use majority vote) Estimate error rates for workers (using “correct” labels) Estimate “correct” labels(using error rates, weight worker votes according to quality) Go to Step 2 and iterate until convergence

  5. Challenge: From Confusion Matrixes to Quality Scores • Confusion matrix for spammer worker • P[X → X]=0.847% P[X → G]=99.153% • P[G → X]=0.053% P[G → G]=99.947% • Confusion matrix for good worker • P[X → X]=99.847% P[X → G]=0.153% • P[G → X]=4.053% P[G → G]=95.947% How to check if a worker is a spammer using the confusion matrix? (hint: error rate not enough)

  6. Challenge 1: Spammers are lazy and smart! • Confusion matrix for spammer • P[X → X]=0% P[X → G]=100% • P[G → X]=0% P[G → G]=100% • Confusion matrix for good worker • P[X → X]=80% P[X → G]=20% • P[G → X]=20% P[G → G]=80% • Spammers figure out how to fly under the radar… • In reality, we have 85% G sites and 15% X sites • Error rate of spammer = 0% * 85% + 100% * 15% = 15% • Error rate of good worker = 85% * 20% + 85% * 20% = 20% False negatives: Spam workers pass as legitimate

  7. Challenge 2: Humans are biased! • Error rates for legitimate (but biased) employee • P[G → G]=20.0% P[G → P]=80.0% P[G → R]=0.0% P[G → X]=0.0% • P[P → G]=0.0% P[P → P]=0.0%P[P → R]=100.0% P[P → X]=0.0% • P[R → G]=0.0% P[R → P]=0.0% P[R → R]=100.0% P[R → X]=0.0% • P[X → G]=0.0% P[X → P]=0.0% P[X → R]=0.0% P[X → X]=100.0% • We have 85% G sites, 5% P sites, 5% R sites, 5% X sites • Error rate of spammer (all G) = 0% * 85% + 100% * 15% = 15% • Error rate of biased worker = 80% * 85% + 100% * 5% = 73% • False positives: Legitimate workers appear to be spammers • (important note: bias is not just a matter of “ordered” classes)

  8. Solution: Fix bias first, compute error rate afterwards • Error Rates for legitimate (but biased) employee • P[G → G]=20.0% P[G → P]=80.0% P[G → R]=0.0% P[G → X]=0.0% • P[P → G]=0.0% P[P → P]=0.0%P[P → R]=100.0% P[P → X]=0.0% • P[R → G]=0.0% P[R → P]=0.0% P[R → R]=100.0% P[R → X]=0.0% • P[X → G]=0.0% P[X → P]=0.0% P[X → R]=0.0% P[X → X]=100.0% • When biased worker says G, it is 100% G • When biased worker says P, it is 100% G • When biased worker says R, it is 50% P, 50% R • When biased worker says X, it is 100% X Small ambiguity for “R-rated” votes but other than that, fine!

  9. Solution: Fix bias first, compute error rate afterwards • Error Rates for spammer • P[G → G]=100.0%P[G → P]=0.0% P[G → R]=0.0% P[G → X]=0.0% • P[P→ G]=100.0% P[P → P]=0.0% P[P → R]=0.0% P[P → X]=0.0% • P[R → G]=100.0% P[R → P]=0.0% P[R → R]=0.0% P[R → X]=0.0% • P[X → G]=100.0% P[X → P]=0.0% P[X → R]=0.0% P[X → X]=0.0% • When spammer says G, it is 25% G, 25% P, 25% R, 25% X • When spammer says P, it is 25% G, 25% P, 25% R, 25% X • When spammer says R, it is 25% G, 25% P, 25% R, 25% X • When spammer says X, it is 25% G, 25% P, 25% R, 25% X [note: assume equal priors] The results are highly ambiguous. No information provided!

  10. Expected Misclassification Cost • High cost: probability spread across classes • Low cost: probability massconcentrated in one class [***Assume misclassification cost equal to 1, solution generalizes to arbitrary misclassification costs across categories]

  11. Quality Score Quality Score: A scalar measure of quality • A spammer is a worker who assigns labels randomly, regardless of what the true class is. • Quality score useful for ranking workers • Unaffected by systematic biases • Scalar, so no need to examine confusion matrices

  12. Quality Score Question: How to pay workers? • Challenges • Thresholdinghas wrong incentive structure: • Decent (but still useful) workers remain unused • If you are above the threshold, no need to improve • Uncertainty: The quality score is not really a fixed number • Fluctuations in payment are puzzling for workers • Best to have only increases in payment

  13. Two Types of Workers • Divide workers into two groups • Qualified Workers • The quality satisfies the target quality • Unqualified Workers • The qualify fails to meet the target quality levels

  14. A Simple Pricing Model for Qualified Workers • p: the price paid to all qualified workers • fW(w): the pdf distribution of worker reservation wage • FW(w): the cdf distribution of worker reservation wage • R: the fixed price paid by external client for each qualified object

  15. Example Optimal Worker Salary = 21 Revenue fW(w) • fW(w): LogNormal(3,1), Selling price R=50

  16. The Value of Unqualified Workers • Binary classification (1:1) • Worker confusion matrix • “Accept” classification cost <=0.1 We need ~9 workers to achieve the required quality Value: 1/9

  17. A Pricing Model for Workers • p: the price paid to all qualified workers • fW(w): the pdf distribution of worker reservation wageFW(w): the cdf distribution of worker reservation wage • Adjust for the presence of “unqualified” workers, and each unqualified worker “counts” as 1/k of a qualified one • R: the fixed price paid by external client for each qualified object

  18. Optimal Prices p*/9 p* p*/3

  19. Quality Score Question: How to pay workers? • Challenges • Thresholdinghas wrong incentive structure: • Decent (but still useful) workers remain unused • If you are above the threshold, no need to improve • Uncertainty: The quality score is not really a fixed number • Fluctuations in payment are puzzling for workers • Best to have only increases in payment

  20. Bayesian Estimates for Uncertainty Worker B Worker A P[0 → 0]=Beta(101,1) P[0 → 1]=Beta(1,101) P[1 → 0]=Beta(1,101) P[0 → 0]=Beta(101,1) P[0 → 0]=Beta(2,1) P[0 → 1]=Beta(1,2) P[1 → 0]=Beta(1,2) P[0 → 0]=Beta(2,1)

  21. Real-Time Payment and Reimbursement Example of the piece-rate payment of a worker Fair Payment

  22. Real-Time Payment and Reimbursement Example of the piece-rate payment of a worker Fair Payment Potential “Bonus” Piece-rate Payment Payment Number of Tasks 10

  23. Real-Time Payment and Reimbursement Example of the piece-rate payment of a worker Fair Payment Potential “Bonus” Piece-rate Payment Reimbursement Payment Payment 10 20 Number of Tasks

  24. Real-Time Payment and Reimbursement Example of the piece-rate payment of a worker Fair Payment Potential “Bonus” Reimbursement Piece-rate Payment Reimbursement Payment Payment Payment 10 20 30

  25. Real-Time Payment and Reimbursement Example of the piece-rate payment of a worker Fair Payment Potential “Bonus” Reimbursement Reimbursement Piece-rate Payment Reimbursement Payment Payment Payment Payment 10 20 30 40

  26. Synthetic Experiment Setup • N=10,000 tasks • R=200 cents • Cost <=0.01 • Labeling Process: Workers • Arrival frequency: every 600 seconds • Number of each arrival: 10 workers • Submitting speed: 30 seconds per task • The evaluation criterion is Unit Time Profit:

  27. (Synthetic) Experimental Setup Scatter Plot of Confusion Matrix, Reservation Wage Qualify level and reservation wage independently distributed Qualify level and reservation wage positively correlated

  28. Experimental Results Average Profit per Second

  29. Workers reacting to bad rewards/scores Score-based feedback leads to strange interactions: The “angry, has-been-burnt-too-many-times” worker: • “F*** YOU! I am doing everything correctly and you know it! Stop trying to reject me with your stupid ‘scores’!” The overachiever worker: • “What am I doing wrong?? My score is 92% and I want to have 100%”

  30. National Academy of Sciences Dec 2010 “Frontiers of Science” conference Your workers behave like my mice! An unexpected connection…

  31. Your workers behave like my mice! Eh?

  32. Your workerswant to use only their motorskills,nottheircognitive skills

  33. The Biology Fundamentals • Brain functions are biologically expensive (20% of total energy consumption in humans) • Motor skills are more energy efficient than cognitive skills (e.g., walking) • Brain tends to delegate easy tasks to part of the neural system that handles motor skills

  34. An unexpected connection at theNAS “Frontiers of Science” conf. Your workerswant to use only their motorskills,nottheircognitive skills Makes sense

  35. An unexpected connection at theNAS “Frontiers of Science” conf. And here is how I train mymice to behave…

  36. The Mice Experiment Cognitive Solve mazeFind pellet Motor Push lever three timesPellet drops

  37. How to Train the Mice? Confuse motor skills! Reward cognition! I should try this the moment that I get back to my room

  38. Punishing Worker’s Motor Skills • Punishbad answers with frustration of motor skills (e.g., add delays between tasks) • “Loading image, please wait…” • “Image did not load, press here to reload” • “404 error. Return the HIT and accept again” →Make this probabilisticto keep feedback implicit

  39. Rewarding (?) Cognitive Effort • Rewardgood answers by rewarding the cognitive part of the brain • Introduce variety • Introduce novelty • Give new tasks fast • Show score improvements faster (but not the opposite) • Show optimistic score estimates

  40. Experiments • Web page classification • Image tagging • Email & URL collection

  41. Experimental Summary (I) • Spammer workers quickly abandon • No need to display scores, or ban • Low quality submissions from ~60% to ~3% • Half-life of low-quality from 100+ HITs to less than 5 • Good workers unaffected • No significant effect on participation of workers with good performance • Lifetime of participants unaffected • Longer response times (after removing the “intervention delays”; that was puzzling)

  42. Experimental Summary (II) • Remember, scheme was for training the mice… • 15%-20% of the spammers start submitting good work! ????

  43. Two key questions • Why response time was slower for some good workers? • Why some low quality workers start working well? ????

  44. System 1: “Automatic” actions • System 2:“Intelligent” actions

  45. System 1 Tasks

  46. System 2 Tasks

  47. Not Performing Well?Disrupt and Engage System 2 Not Performing Well?Hell/Slow ban Performing Well? Out Status: Usage of System 1 (“Automatic”) Status: Usage of System 2 (“Intelligent”) Performing Well? Check if System 1 can Handle, remove System 2 stimuli

  48. Thanks!Q & A?

More Related