Generative Models for Crowdsourced Data

- What is Crowdsourcing?
- Modeling the labeling process
- Example with real data
- Extensions
- Future directions

- Human-based computation.
- Outsourcing certain steps of a computation to humans.
- “Artificial artificial intelligence.”
- Data science:
- Making an immediate decision.
- Creating a labeled data set for learning.

- Not everybody agrees on the gender of a Twitter profile. Reasons include:
- Difficult instances
- Worker ability / motivation
- Worker bias
- Adversarial behaviour

- When some workers say “male” and some workers say “female”, what to do?

- Assign label l to item x if a majority of workers agree.
- Otherwise item x remains unlabeled.
- Ignores prior worker data.
- Introduces bias in the labeled data, since items without a majority are dropped.
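A minimal sketch of the majority-vote rule for a single item, assuming "majority" means strictly more than half of the workers (a plurality rule is another possible reading). The function name `majority_vote` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(labels: Sequence[Hashable]) -> Optional[Hashable]:
    """Label given by a strict majority of workers for one item, else None."""
    if not labels:
        return None  # no workers, nothing to vote on
    label, count = Counter(labels).most_common(1)[0]
    # Strict majority: more than half of all worker labels.
    return label if 2 * count > len(labels) else None
```

For example, `majority_vote(["male", "male", "female"])` returns `"male"`, while `majority_vote(["male", "female"])` returns `None`, so that item stays unlabeled.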

- For the labeled-data-set workflow:
- Add all item-label pairs to the data set.
- Equivalent to a cost vector derived from the empirical label distribution:
- $P(l \mid \{l_w\}) = \frac{1}{n_w} \sum_w \mathbb{1}\{l = l_w\}$

- Ignores prior worker data.
- Models the crowd, not the “ground truth.”
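Why the equivalence holds: with 0/1 loss, predicting $d$ on the $n_w$ added pairs costs $\sum_w \mathbb{1}\{d \ne l_w\} = n_w \left(1 - P(d \mid \{l_w\})\right)$, so up to the factor $n_w$ the item carries the cost vector $c_d = 1 - P(d \mid \{l_w\})$. A minimal sketch of both quantities; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence


def empirical_label_distribution(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """P(l | {l_w}) = (1/n_w) * sum_w 1{l = l_w}: share of workers who chose l."""
    n_w = len(labels)
    return {label: count / n_w for label, count in Counter(labels).items()}


def zero_one_cost_vector(labels: Sequence[Hashable],
                         classes: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Cost c_d = 1 - P(d | {l_w}) for each candidate prediction d (0/1 loss)."""
    dist = empirical_label_distribution(labels)
    return {d: 1.0 - dist.get(d, 0.0) for d in classes}
```

With labels `["male", "male", "female"]` the distribution is 2/3 and 1/3, and the costs are 1/3 for `"male"` and 2/3 for `"female"`.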

- Different theoretical approaches.
- PAC learning with noisy labels.
- Fully adversarial active learning.

- Bayesians have been very active.
- “Easy” to posit a functional form and quickly develop inference algorithms.
- Whether a model is correct is ultimately an empirical question.

- (2009) Whitehill et al. GLAD framework.
- (1979) Dawid and Skene. Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm.

- (2010) Welinder et al. The Multidimensional Wisdom of Crowds.
- (2010) Raykar et al. Learning from Crowds.

- Define ground truth via a generative model which describes how “ground truth” is related to the observed output of crowdsource workers.
- Fit to observed data.
- Extract posterior over ground truth.
- Make decision or train classifier.

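- Each worker has a matrix $\alpha = \begin{pmatrix} -1 & \alpha_{01} \\ \alpha_{10} & -1 \end{pmatrix}$.
- Each item has a scalar difficulty $\beta > 0$.
- $P(l_w = j \mid z = i) = \frac{e^{-\beta \alpha_{ij}}}{\sum_k e^{-\beta \alpha_{ik}}}$
- $\alpha_{ij} \sim N(\mu_{ij}, 1)$; $\mu_{ij} \sim N(0, 1)$
- $\log \beta \sim N(\rho, 1)$; $\rho \sim N(0, 1)$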
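A minimal sketch of the label likelihood and the priors above, for one worker and one item with $L$ classes. It assumes $\mu$ and $\rho$ are population-level hyperparameters (presumably shared across workers and items); here they are drawn once per call purely for illustration. The function names are hypothetical. Note that in this parameterization a larger $\beta$ makes each row of $P$ more peaked.

```python
import numpy as np


def label_probabilities(alpha: np.ndarray, beta: float) -> np.ndarray:
    """P[i, j] = P(l_w = j | z = i) = exp(-beta*alpha[i, j]) / sum_k exp(-beta*alpha[i, k])."""
    logits = -beta * alpha
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilise the row softmax
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def sample_from_prior(num_classes: int, rng: np.random.Generator):
    """Draw one worker matrix alpha and one item scalar beta from the stated priors."""
    mu = rng.normal(0.0, 1.0, size=(num_classes, num_classes))  # mu_ij ~ N(0, 1)
    alpha = rng.normal(mu, 1.0)                                  # alpha_ij ~ N(mu_ij, 1)
    np.fill_diagonal(alpha, -1.0)                                # diagonal fixed at -1
    rho = rng.normal(0.0, 1.0)                                   # rho ~ N(0, 1)
    beta = float(np.exp(rng.normal(rho, 1.0)))                   # log beta ~ N(rho, 1)
    return alpha, beta
```

For example, `label_probabilities(np.array([[-1.0, 1.0], [1.0, -1.0]]), 1.0)` gives rows of roughly `[0.881, 0.119]` and `[0.119, 0.881]`.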

- Multiclass classification:
- Same as binary, with a larger confusion matrix.

- Ordinal classification (“Hot or not”):
- Confusion matrix has a special form.
- $O(L)$ parameters instead of $O(L^2)$ for $L$ classes.

- Multilabel classification:
- Reduce to multiclass on power set.
- Assume low-rank confusion matrix.
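A minimal sketch of the power-set reduction: each subset of $K$ tags becomes one multiclass class, giving $2^K$ classes and a full confusion matrix with $4^K$ entries, which is why a low-rank assumption helps. The tag names and function names are hypothetical, and the low-rank part is not shown.

```python
from itertools import combinations
from typing import AbstractSet, FrozenSet, List, Sequence


def power_set_classes(tags: Sequence[str]) -> List[FrozenSet[str]]:
    """Every subset of `tags` (including the empty set) becomes one multiclass class."""
    return [frozenset(c) for r in range(len(tags) + 1) for c in combinations(tags, r)]


def to_class_index(tag_set: AbstractSet[str], classes: List[FrozenSet[str]]) -> int:
    """Map a worker's multilabel answer to its power-set class index."""
    return classes.index(frozenset(tag_set))
```

With the three tags `["sports", "politics", "tech"]` there are 8 classes, so an unconstrained confusion matrix already has 64 entries.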

- Initially all workers are assumed moderately accurate and without bias.
- This implies the initial estimate of the ground-truth distribution favors consensus.
- Disagreeing with the majority is therefore treated as a likely error.

- Initially all workers are assumed moderately accurate.
- Workers consistently in the minority see their confusion probabilities increase.
- Workers with higher confusion probabilities contribute less to the ground-truth distribution.

- Workers that are consistently in the minority will not contribute strongly to the posterior distribution over ground truth.
- Even if they are actually more accurate.
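To make "contribute less" concrete, consider the simplest case, assumed here for illustration: binary labels, a uniform prior over ground truth, and symmetric confusion matrices with accuracy $a_w$. Each worker then shifts the log-odds of $z = 1$ by $\pm\log\frac{a_w}{1 - a_w}$, a weight that shrinks to zero as $a_w \to 0.5$. The function names are hypothetical.

```python
import math
from typing import Sequence


def worker_weight(accuracy: float) -> float:
    """Log-odds weight log(a / (1 - a)); zero for a coin-flipping worker (a = 0.5)."""
    return math.log(accuracy / (1.0 - accuracy))


def posterior_q1(labels: Sequence[int], accuracies: Sequence[float]) -> float:
    """q*(z = 1) for binary labels, uniform prior, symmetric confusion matrices."""
    # A label of 1 adds the worker's weight to the log-odds of z = 1; a label of 0 subtracts it.
    log_odds = sum((1.0 if label == 1 else -1.0) * worker_weight(a)
                   for label, a in zip(labels, accuracies))
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With labels `[0, 0, 1]` and all accuracies 0.8, $q^*_1 = 0.2$; once the minority worker's accuracy estimate drops to 0.6, $q^*_1 \approx 0.086$, so that worker pulls the posterior much less.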

- Can correct errors when one or more accurate workers are paired with inaccurate workers.
- Good for breaking ties.
- Raykar et al.

- Given a set of worker-label pairs for a single item:
- (Inference) Using current α, find most likely β* and distribution q* over ground truth.
- (Training) Do SGD update of α with respect to EM auxiliary function evaluated at β* and q*.
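A simplified sketch of one online step for a single item, under assumptions not stated in the slides: a uniform prior over ground truth, a grid search over $\log\beta$ for the MAP $\beta^*$ with $\rho$ held fixed, no prior term on $\alpha$, and a fixed learning rate. The gradient uses $\partial \log P(l_w = j \mid z = i) / \partial \alpha_{ik} = \beta\,(P(k \mid i) - \mathbb{1}\{k = j\})$. All names are hypothetical; this is not the implementation in the software linked at the end.

```python
import numpy as np


def label_probs(alpha, beta):
    """P[i, j] = P(l_w = j | z = i): row softmax of -beta * alpha."""
    logits = -beta * alpha
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def logsumexp(x):
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def log_joint(labels, alphas, beta):
    """log P(z = i, {l_w} | alpha, beta) for every i, with a uniform prior on z (assumption)."""
    num_classes = next(iter(alphas.values())).shape[0]
    out = np.full(num_classes, -np.log(num_classes))
    for worker, label in labels:
        out = out + np.log(label_probs(alphas[worker], beta)[:, label])
    return out


def infer(labels, alphas, rho=0.0, log_beta_grid=np.linspace(-3.0, 3.0, 121)):
    """Inference: MAP beta* by grid search over log beta, then q*(z) = P(z | {l_w}, beta*)."""
    # Log-likelihood plus log N(log beta; rho, 1), up to an additive constant.
    scores = [logsumexp(log_joint(labels, alphas, np.exp(lb))) - 0.5 * (lb - rho) ** 2
              for lb in log_beta_grid]
    beta_star = float(np.exp(log_beta_grid[int(np.argmax(scores))]))
    lj = log_joint(labels, alphas, beta_star)
    return beta_star, np.exp(lj - logsumexp(lj))


def sgd_step(labels, alphas, beta_star, q_star, lr=0.1):
    """Training: one gradient-ascent step on sum_i q*_i sum_w log P(l_w | z = i), in place."""
    for worker, label in labels:
        alpha = alphas[worker]
        p = label_probs(alpha, beta_star)
        onehot = np.zeros(alpha.shape[1])
        onehot[label] = 1.0
        # d log P(l_w = j | z = i) / d alpha[i, k] = beta * (P[i, k] - 1{k = j})
        grad = q_star[:, None] * beta_star * (p - onehot[None, :])
        np.fill_diagonal(grad, 0.0)  # the diagonal stays fixed at -1
        alpha += lr * grad
```

With three identical, moderately accurate workers of whom two answer 0 and one answers 1, `infer` puts most of $q^*$ on class 0, and `sgd_step` then raises the minority worker's confusion probability $P(l = 1 \mid z = 0)$.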

- Take an immediate cost-sensitive decision:
- $d^* = \arg\min_d \mathbb{E}_{z \sim q^*}[f(z, d)]$
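A minimal sketch of the decision rule, representing the loss as a matrix with `loss[z, d]` $= f(z, d)$. The function name and the asymmetric loss in the example are hypothetical.

```python
import numpy as np


def bayes_decision(q_star: np.ndarray, loss: np.ndarray):
    """Return d* = argmin_d E_{z~q*}[f(z, d)] and the expected costs, where loss[z, d] = f(z, d)."""
    expected_cost = q_star @ loss  # expected_cost[d] = sum_z q*(z) * f(z, d)
    return int(np.argmin(expected_cost)), expected_cost
```

With $q^* = (0.7, 0.3)$ and a loss where deciding 0 when the truth is 1 costs 5 (`[[0, 1], [5, 0]]`), the expected costs are 1.5 and 0.7, so $d^* = 1$ even though $z = 0$ is more likely.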

- Train an (importance-weighted) classifier:
- Cost vector $c_d = \mathbb{E}_{z \sim q^*}[f(z, d)]$.
- e.g. 0/1 loss: $c_d = 1 - q^*_d$.
- e.g. binary 0/1 loss: importance weight $|c_1 - c_0| = |1 - 2q^*_1|$.
- No need to decide what the true label is!
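A minimal sketch of the cost vector and of the common reduction from a binary cost-sensitive example to an importance-weighted one (label = cheaper decision, weight = cost difference). The reduction is a standard one assumed here for illustration, not necessarily what the author's software does; function names are hypothetical.

```python
import numpy as np


def cost_vector(q_star: np.ndarray, loss: np.ndarray) -> np.ndarray:
    """c_d = E_{z~q*}[f(z, d)] with loss[z, d] = f(z, d); for 0/1 loss this is 1 - q*."""
    return q_star @ loss


def importance_weighted_binary(q1: float):
    """Binary 0/1 loss: turn (c_0, c_1) = (q1, 1 - q1) into (label, importance weight)."""
    c0, c1 = q1, 1.0 - q1
    label = 0 if c0 <= c1 else 1  # the cheaper decision
    weight = abs(c1 - c0)         # equals |1 - 2 * q1|; zero when q1 = 0.5
    return label, weight
```

An item with $q^*_1 = 0.8$ becomes a positive example with weight 0.6, while an item with $q^*_1 = 0.5$ gets weight 0 and does not affect training, so no hard label is ever forced on uncertain items.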

- Raykar et al.: why not jointly estimate the classifier and worker confusion?

- Cost vector is constructed by estimating worker confusion matrices.
- Subsequently, classifier is trained; it will sometimes disagree with workers.
- Would be nice to use that disagreement to inform the worker confusion matrices.
- Circular dependency suggests joint estimation.

- Initially the classifier outputs an uninformative prior, so it is trained to follow the consensus of workers.
- Eventually, workers who disagree with the classifier see their confusion probabilities increase.
- Workers consistently in the minority can contribute strongly to the posterior if they tend to agree with the classifier.
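One way to read the joint model, sketched here under the assumption that the classifier's predicted distribution for the item plays the role of the prior over ground truth in the posterior (in the spirit of Raykar et al., though not necessarily their exact formulation). The function name and numbers are hypothetical.

```python
import numpy as np
from typing import Sequence


def posterior_with_classifier(classifier_probs: Sequence[float], labels: Sequence[int],
                              confusions: Sequence[np.ndarray]) -> np.ndarray:
    """q(z = i) proportional to classifier_probs[i] * prod_w confusions[w][i, labels[w]]."""
    log_q = np.log(np.asarray(classifier_probs, dtype=float))  # classifier output as the prior
    for label, confusion in zip(labels, confusions):
        log_q = log_q + np.log(confusion[:, label])            # P(l_w = label | z = i) for all i
    q = np.exp(log_q - log_q.max())
    return q / q.sum()
```

With three workers of accuracy 0.7 answering `[0, 0, 1]`, an uninformative classifier `[0.5, 0.5]` yields $q = (0.7, 0.3)$, following the consensus; a classifier predicting `[0.1, 0.9]` flips it to $q_1 \approx 0.79$, so the minority worker who agrees with the classifier now dominates.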

- Software
- http://code.google.com/p/nincompoop

- Blog
- http://machinedlearnings.com/