
Get Another Label? Using Multiple, Noisy Labelers


Presentation Transcript


  1. Get Another Label? Using Multiple, Noisy Labelers Joint work with Victor Sheng and Foster Provost Panos Ipeirotis, Stern School of Business, New York University

  2. Motivation • Many tasks rely on high-quality labels for objects: • relevance judgments • duplicate database records • image recognition • song categorization • videos • Labeling can be relatively inexpensive, using Mechanical Turk, ESP game …

  3. ESP Game (by Luis von Ahn)

  4. Mechanical Turk Example “Are these two documents about the same topic?”

  5. Mechanical Turk Example

  6. Motivation • Labels can be used in training predictive models • Duplicate detection systems • Image recognition • Web search • But: labels obtained from the above sources are noisy, and this directly affects the quality of the learned models • How can we know the quality of the annotators? • How can we know the correct answer? • How can we best use the noisy annotators?

  7. Quality and Classification Performance • Labeling quality increases → classification quality increases [Plot: learning curves for labeling quality Q = 1.0, 0.8, 0.6, 0.5]

  8. How to Improve Labeling Quality • Find better labelers • Often expensive, or beyond our control • Use multiple, noisy labelers: repeated-labeling • Our focus

  9. Our Focus: Labeling Using Multiple Noisy Labelers • Multiple labelers and resulting label quality • Multiple labelers and classification quality • Selective label acquisition

  10. Majority Voting and Label Quality • Ask multiple labelers, keep majority label as “true” label • Quality is probability of majority label being correct [Plot: majority-vote quality vs. number of labelers, for individual labeler accuracy P = 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0; P is the probability of an individual labeler being correct]
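
The quality curve on this slide can be reproduced in a few lines of Python. This is a sketch, not code from the talk: with n independent labelers, each correct with probability p, the majority label is correct with the binomial tail probability below.

```python
from math import comb

def majority_vote_quality(p: float, n: int) -> float:
    """Probability that the majority of n independent labelers,
    each correct with probability p, produces the correct label.
    Assumes n is odd so ties cannot occur."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n // 2 + 1, n + 1))

# Example: individual quality 0.7; a majority of 11 labelers reaches ~0.92
print(majority_vote_quality(0.7, 11))
```

For p > 0.5 the quality climbs toward 1.0 as n grows; for p < 0.5 it decays toward 0.0, matching the P = 0.4 curve on the slide.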

  11. So… • Multiple noisy labelers improve quality • (Sometimes) the quality of multiple noisy labelers is better than the quality of the best labeler in the set So, should we always get multiple labels?

  12. Tradeoffs for Classification • Get more labels → improve label quality → improve classification • Get more examples → improve classification [Plot: learning curves for labeling quality Q = 1.0, 0.8, 0.6, 0.5]

  13. Basic Labeling Strategies • Get as many data points as possible, one label each • Repeatedly label everything, the same number of times

  14. Repeat-Labeling vs. Single Labeling [Plot: learning curves for repeated vs. single labeling; P = 0.6 labeling quality, K = 5 labels/example] With high noise, repeated labeling is better than single labeling
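
The underlying tradeoff can be made concrete with a small sketch (an assumed setup, not from the slides), reusing majority_vote_quality from the sketch after slide 10: a fixed budget of label acquisitions buys either many singly-labeled examples at quality p, or fewer examples at the higher majority-vote quality. Which option wins for classification depends on the learning curve, which is what this plot and the next one compare.

```python
# Reuses majority_vote_quality from the sketch after slide 10.
budget, K = 1000, 5  # total labels to buy; labels per example when repeating
for p in (0.6, 0.8):
    singles = budget       # many examples, each labeled once at quality p
    repeats = budget // K  # fewer examples, each at majority-vote quality
    print(f"p={p}: single -> {singles} examples at quality {p:.2f}; "
          f"repeated -> {repeats} examples at quality "
          f"{majority_vote_quality(p, K):.3f}")
```

At p = 0.6 the budget buys 1000 examples at quality 0.60 or 200 at about 0.68; at p = 0.8 it buys 1000 at 0.80 or 200 at about 0.94. The plots show that under high noise the quality gain wins, while under low noise the extra examples win.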

  15. Repeat-Labeling vs. Single Labeling [Plot: learning curves for repeated vs. single labeling; P = 0.8 labeling quality, K = 5 labels/example] With low noise, more (singly labeled) examples are better

  16. Estimating Labeler Quality • (Dawid, Skene 1979): “Multiple diagnoses” • Initialize by assuming equal labeler qualities • Estimate “true” labels for the examples • Estimate the quality of each labeler given the “true” labels • Repeat until convergence (an EM-style iteration)
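
A compact sketch of that iteration, under simplifying assumptions not in the original Dawid-Skene formulation (binary labels, a single symmetric accuracy per labeler instead of a full confusion matrix, uniform class prior):

```python
import numpy as np

def dawid_skene_binary(votes, n_iter=50):
    """votes: (n_examples, n_labelers) array of 0/1 labels.
    Returns P(true label = 1) per example and an accuracy per labeler."""
    quality = np.full(votes.shape[1], 0.7)  # step 1: assume equal qualities
    for _ in range(n_iter):
        # E-step: posterior of the "true" label given current qualities
        log1 = np.where(votes == 1, np.log(quality), np.log(1 - quality)).sum(axis=1)
        log0 = np.where(votes == 0, np.log(quality), np.log(1 - quality)).sum(axis=1)
        p1 = 1.0 / (1.0 + np.exp(log0 - log1))
        # M-step: each labeler's expected agreement with the soft "true" labels
        agree = votes * p1[:, None] + (1 - votes) * (1 - p1)[:, None]
        quality = np.clip(agree.mean(axis=0), 1e-3, 1 - 1e-3)
    return p1, quality

# Toy check: 4 simulated labelers with accuracies 0.9, 0.7, 0.6, 0.55
rng = np.random.default_rng(0)
true = rng.integers(0, 2, 500)
acc = np.array([0.9, 0.7, 0.6, 0.55])
votes = np.where(rng.random((500, 4)) < acc, true[:, None], 1 - true[:, None])
_, est = dawid_skene_binary(votes)
print(est.round(2))  # should land near the true accuracies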

  17. Selective Repeated-Labeling • We have seen: • With noise and enough (noisy) examples, getting multiple labels is better than single labeling • Can we do better? • Selectively allocate repeated labels to the data points with the highest uncertainty score, e.g. {+,-,+,+,-,+,+} vs. {+,+,+,+}

  18. Natural Candidate: Entropy • Entropy is a natural measure of label uncertainty: • E({+,+,+,+,+,+})=0 • E({+,-, +,-, +,- })=1 Strategy: Get more labels for high-entropy examples
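
A one-function sketch of this entropy score over a multiset of labels (the helper name is mine, not from the talk):

```python
from math import log2
from collections import Counter

def label_entropy(labels) -> float:
    """Shannon entropy (in bits) of the empirical label distribution."""
    n = len(labels)
    return 0.0 - sum(c / n * log2(c / n) for c in Counter(labels).values())

print(label_entropy("++++++"))  # 0.0 -- unanimous, no uncertainty
print(label_entropy("+-+-+-"))  # 1.0 -- maximal disagreement
```

Note that label_entropy("+++--") and label_entropy("+" * 600 + "-" * 400) both return ~0.971, which is exactly the scale-invariance problem slide 20 raises.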

  19. What Not to Do: Use Entropy [Plot: label quality for entropy-based selection vs. round robin] Entropy improves at first, but hurts in the long run

  20. Why Not Entropy • In the presence of noise, entropy will be high even with many labels • Entropy is scale invariant: • (3+, 2-) has the same entropy as (600+, 400-)

  21. Estimating Label Uncertainty (LU) • Observe +’s and –’s and compute Pr{+|obs} and Pr{-|obs} • Label uncertainty = tail of the Beta distribution [Plot: Beta probability density function over [0.0, 1.0]; S_LU is the tail mass on the far side of 0.5]
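
A sketch of how S_LU can be computed, under the assumption of a uniform Beta(1,1) prior over the true positive-label frequency, with the score taken as the posterior mass on the losing side of 0.5:

```python
from scipy.stats import beta

def label_uncertainty(pos: int, neg: int) -> float:
    """Tail of the Beta posterior over the positive-label frequency:
    the probability mass on the losing side of 0.5 (Beta(1,1) prior)."""
    cdf_at_half = beta.cdf(0.5, pos + 1, neg + 1)
    return min(cdf_at_half, 1.0 - cdf_at_half)

# More labels at a stable ratio -> the tail shrinks, unlike entropy
print(label_uncertainty(3, 2))   # ~0.34
print(label_uncertainty(7, 3))   # ~0.11
print(label_uncertainty(14, 6))  # ~0.04
```

This is what slides 22-24 illustrate: as evidence accumulates at roughly the same ratio, the Beta tail keeps shrinking even though the entropy barely moves.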

  22. Label Uncertainty • p = 0.7 • 5 labelers (3+, 2-) • Entropy ≈ 0.97 [Plot: Beta posterior density]

  23. Label Uncertainty • p = 0.7 • 10 labelers (7+, 3-) • Entropy ≈ 0.88 [Plot: Beta posterior density; tail beyond 0.5 shrinks]

  24. Label Uncertainty • p = 0.7 • 20 labelers (14+, 6-) • Entropy ≈ 0.88 [Plot: Beta posterior density; tail shrinks further while entropy stays flat]

  25. Comparison [Plot: label quality; label uncertainty selection vs. uniform round robin]

  26. Model Uncertainty (MU) • However, we do not only have labelers • A classifier can also give us labels! • Model uncertainty: get more labels for ambiguous/difficult examples • Intuitively: make sure that difficult cases are correct [Illustration: + and - examples around a decision boundary, with the ambiguous cases marked “?”]
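
One simple way to score model uncertainty, sketched with scikit-learn; the classifier choice and the 0-to-1 scaling here are my assumptions for illustration, and the paper's exact scoring may differ:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def model_uncertainty(X_train, y_train, X_pool):
    """Train a model on the current (noisy) labels, then score each
    candidate example by how ambiguous the model finds it:
    1.0 at predicted P(+) = 0.5, falling to 0.0 at P(+) = 0 or 1."""
    model = DecisionTreeClassifier(max_depth=5).fit(X_train, y_train)
    p_pos = model.predict_proba(X_pool)[:, 1]
    return 1.0 - 2.0 * np.abs(p_pos - 0.5)
```

Examples scoring near 1.0 sit close to the model's decision boundary (the “?” cases in the illustration) and are the ones worth relabeling.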

  27. Label + Model Uncertainty • Label and model uncertainty (LMU): avoid examples where either strategy is certain
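
If I recall the accompanying paper correctly, the combined score is the geometric mean of the two uncertainties; treating that as an assumption, the combination rule is a one-liner:

```python
from math import sqrt

def lmu_score(s_lu: float, s_mu: float) -> float:
    """Label-and-model uncertainty: high only when BOTH scores are high,
    since either factor near zero pulls the product toward zero."""
    return sqrt(s_lu * s_mu)
```

The geometric mean implements the bullet above directly: an example where either the labels or the model are already confident gets a low score and is skipped.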

  28. Comparison [Plot: label quality for label + model uncertainty, label uncertainty, and uniform round robin] Model uncertainty alone also improves quality

  29. Classification Improvement [Plot: classification accuracy under selective repeated labeling]

  30. Conclusions • Gathering multiple labels from noisy users is a useful strategy • Under high noise, almost always better than single-labeling • Selectively labeling using label and model uncertainty is more effective

  31. More Work to Do • Estimating the labeling quality of each labeler • Increased compensation vs. labeler quality • Example-conditional quality issues (some examples more difficult than others) • Multiple “real” labels • Hybrid labeling strategies using “learning-curve gradient”

  32. Other Projects • SQoUT project: Structured Querying over Unstructured Text, http://sqout.stern.nyu.edu • Faceted Interfaces • EconoMining project: The Economic Value of User-Generated Content, http://economining.stern.nyu.edu

  33. SQoUT: Structured Querying over Unstructured Text • Information extraction applications extract structured relations from unstructured text [Example: a New York Times article (“May 19 1995, Atlanta -- The Centers for Disease Control and Prevention, which is in the front line of the world's response to the deadly Ebola epidemic in Zaire, is finding itself hard pressed to cope with the crisis…”) fed through an information extraction system (e.g., NYU’s Proteus) to populate a Disease Outbreaks in The New York Times relation]

  34. SQoUT: The Questions (SIGMOD’06, TODS’07, + in progress) [Pipeline: text databases → extraction system(s) → output tuples; retrieve documents from database/web/archive, process documents, extract output tuples] Questions: How do we retrieve the documents? How do we configure the extraction systems? What is the execution time? What is the output quality?

  35. EconoMining Project: Show me the Money! • Basic idea: opinion mining is an important application of information extraction • Opinions of users are reflected in some economic variable (price, sales) • Applications (in increasing order of difficulty): • Buyer feedback and seller pricing power in online marketplaces (ACL 2007) • Product reviews and product sales (KDD 2007) • Importance of reviewers based on economic impact (ICEC 2007) • Hotel ranking based on “bang for the buck” (WebDB 2008) • Political news (MSM, blogs), prediction markets, and news importance

  36. Some Indicative Dollar Values [Chart: estimated dollar values of review phrases, from negative to positive] • A natural method for extracting sentiment strength and polarity • Captures misspellings as well • Naturally captures the pragmatic meaning within the given context (e.g., “good packaging” → -$0.56)

  37. Thanks! Q & A?
