
Efficiently Learning the Accuracy of Labeling Sources for Selective Sampling




  1. Efficiently Learning the Accuracy of Labeling Sources for Selective Sampling by Pinar Donmez, Jaime Carbonell, Jeff Schneider School of Computer Science, Carnegie Mellon University KDD ’09 June 30th 2009 Paris, France

  2. Problem Illustration [figure: a pool of instances labeled by multiple oracles whose accuracies vary, e.g. 0.55, 0.58, 0.67, 0.69, 0.74, 0.8, 0.83, 0.9]

  3. Interval Estimate Threshold (IEThresh) • Goal: find the labeler(s) with the highest expected accuracy • Our work builds upon Interval Estimation [L. P. Kaelbling] • Estimate the reward of each labeler (more on next slide) • Compute an upper confidence interval for each labeler • Select the labelers whose upper interval exceeds a threshold • Observe the output of the chosen oracles to update their reward estimates • Repeat from step 1 • Filters out unreliable labelers • Reduces labeling cost
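The selection loop above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the fixed critical value `t_crit` and the threshold fraction `eps` are assumptions standing in for the Student-t quantile and threshold used in the talk.

```python
import math
import statistics

def upper_interval(rewards, t_crit=1.96):
    """Upper end of the confidence interval on a labeler's mean reward.

    rewards: list of observed 0/1 rewards for one labeler.
    t_crit: assumed fixed critical value; a Student-t quantile that
    depends on the sample size would be used in practice.
    """
    n = len(rewards)
    if n < 2:
        return float("inf")  # unexplored labelers get queried first
    mean = sum(rewards) / n
    sd = statistics.stdev(rewards)
    return mean + t_crit * sd / math.sqrt(n)

def select_labelers(history, eps=0.8):
    """Select labelers whose upper interval exceeds a fraction eps
    of the best labeler's upper interval (the threshold step)."""
    ui = {k: upper_interval(v) for k, v in history.items()}
    best = max(ui.values())
    return [k for k, u in ui.items() if u >= eps * best]
```

With `history = {"a": [1, 1, 1, 0], "b": [0, 0, 1, 0], "c": [1, 1, 1, 1]}`, labelers "a" and "c" clear the threshold while the unreliable "b" is filtered out, so only the reliable oracles are queried on the next instance.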

  4. Reward of the labelers • Each labeler's reward is unknown => it must be estimated • A labeler's reward corresponds to eliciting the true label • The true label is also unknown => it is estimated by the majority vote • We propose the reward function below: reward = 1 if the labeler agrees with the majority label, reward = 0 otherwise

  5. IEThresh at the Beginning [figure: expected reward (y-axis) per oracle (x-axis) at the start, before any filtering]

  6. IEThresh Oracle Selection [figure: expected-reward intervals for oracles 1–5; oracles whose upper interval exceeds the threshold are selected]

  7. IE Learning Snapshot II [figure: updated expected-reward intervals for oracles 1–5 against the threshold after further queries]

  8. IEThresh Instance Selection [figure: oracles 1–5 queried on a selected instance]

  9. Uniform Expert Accuracy ∈ (0.5, 1] [figure: classification error comparison] Repeated Labeling [Sheng et al., 2008]: querying all experts for labels

  10. # Oracle Queries vs. Accuracy [figure: query counts over the first 10 iterations, the next 40 iterations, and the next 100 iterations]

  11. # Oracle queries to reach a target accuracy [figure: IEThresh's advantage grows as the skew in labeler accuracy increases]

  12. Results on AMT Data with Human Annotators • IEThresh reaches the best performance with effort similar to Repeated Labeling • The Repeated baseline needs 840 queries in total to reach 0.95 accuracy [figures: results with 5 annotators and with 6 annotators] Dataset at http://nlpannotations.googlepages.com/, made available by [Snow et al., 2008]

  13. Conclusions and Future Work • Conclusions • IEThresh effectively balances the exploration vs. exploitation tradeoff • Early filtering of unreliable labelers boosts performance • Utilizing labeler accuracy estimates is more effective than asking all labelers or asking at random • Future Work • from consistent to time-varying labeler quality • label noise conditioned on the data instance • correlated labeling errors

  14. THANK YOU!
