Crowd Algorithms

Scoop — The Stanford – Santa Cruz Project for Cooperative Computing with Algorithms, Data, and People

Hector Garcia-Molina, Stephen Guo, Aditya Parameswaran, Hyunjung Park, Alkis Polyzotis, Petros Venetis, Jennifer Widom

Stanford and UC Santa Cruz

- Design Fundamental Algorithms for Human Computation

- Which questions do I ask?
- When do I ask the questions?
- When do I stop?
- How do I combine the answers?

Latency / Uncertainty / Cost

- Crowd-Sort / Max: Difficult!
- Crowd-GraphSearch: Difficult!
- Crowd-Categorize: Difficult!
- Crowd-Filter: Difficult!

[VLDB 2011]

Summaries of the rest

Progress!

The focus of this talk.

Is this an image of Bytes Café?

- Given:
- Error Probability (FP/FN) & Selectivity for each predicate
- Desired Overall Error Probability

- To: Compose a filtering strategy
- Minimize Overall Cost (# of questions)

[Figure: Dataset of Items → Predicate 1 ("Is the image blurry?") → Predicate 2 ("Does it show people’s faces?") → … → Predicate k → Filtered Dataset]

- Which questions do I ask?
- When do I ask the questions?
- When do I stop?
- How do I combine the answers?

- Surprisingly difficult!
- Need to meet an overall error threshold
- Say, up to 10% of my images may be wrongly filtered

- Minimize overall expected number of questions
- Boils down to the following:
- Take one item
- Ask some questions
- Results in a certain number of (Y, N) for a given item

- Do I stop (if so, what do I return), or do I continue asking?
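
The stop-or-continue decision at a point (Y, N) depends on how confident we are in the item's true status. A minimal sketch of that confidence, assuming workers answer independently and that the names `selectivity`, `fp_rate`, and `fn_rate` (hypothetical, not from the slides) stand for the per-predicate parameters listed under "Given":

```python
def posterior_pass(yes, no, selectivity, fp_rate, fn_rate):
    """Probability that the item truly satisfies the predicate,
    given `yes` YES answers and `no` NO answers (Bayes' rule)."""
    # If the item truly passes, a YES is a correct answer and a NO is a false negative.
    like_pass = (1 - fn_rate) ** yes * fn_rate ** no
    # If the item truly fails, a YES is a false positive and a NO is a correct answer.
    like_fail = fp_rate ** yes * (1 - fp_rate) ** no
    numerator = selectivity * like_pass
    return numerator / (numerator + (1 - selectivity) * like_fail)


# Example: one YES answer from a worker who errs 10% of the time.
confidence = posterior_pass(yes=1, no=0, selectivity=0.5, fp_rate=0.1, fn_rate=0.1)
```

With no answers the function returns the selectivity itself; each extra answer shifts the estimate toward "Pass" or "Fail".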

[Figure: Dataset of Items → Predicate 1 → Filtered Dataset]

- Solutions from statistics guarantee the same error per item
- Important in contexts like:
- Automobile testing
- Diagnosis

- We’re worried about aggregate error over all items: a uniquely data-oriented problem
- I don’t care if every image is perfect as long as the overall error is met.
- As we will see, this results in $$$ savings

YES Answers

- Reformulated Task:
- For each point in grid : Return Pass/Fail/Cont.
- Equivalently,
- Find the best shape and color it!

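[Figure: grid of points with YES answers on one axis and NO answers on the other. Start at the origin, with no questions. Example points: YES = 5, NO = 6: Return “Passed”; YES = 3, NO = 5: Continue; YES = 3, NO = 7: Return “Failed”.]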

- Always ask X questions, return most likely answer
- The triangle shape

- If you get X YES, return “Pass” or Y NO, return “Fail”, else keep asking.
- Rectangular shape

- Ask until |#YES - #NO| > X, or at most Y questions
- Chopped off rectangle
- Anhai’s work on MOBS
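
The three heuristic shapes above can be written as functions that map a grid point (#YES, #NO) to a decision. This is an illustrative sketch only: the function names are hypothetical, and "most likely answer" is approximated by a majority vote with ties going to "Fail", which is an assumption that holds only for symmetric error rates and an even prior.

```python
PASS, FAIL, CONT = "Pass", "Fail", "Continue"


def fixed_budget(yes, no, x):
    """Always ask x questions, then return the majority answer (triangle shape)."""
    if yes + no < x:
        return CONT
    return PASS if yes > no else FAIL


def first_to_threshold(yes, no, x, y):
    """Pass at x YES answers, fail at y NO answers, else keep asking (rectangle)."""
    if yes >= x:
        return PASS
    if no >= y:
        return FAIL
    return CONT


def lead_or_cap(yes, no, x, y):
    """Ask until |#YES - #NO| > x, or until y questions were asked (chopped-off rectangle)."""
    if abs(yes - no) > x or yes + no >= y:
        return PASS if yes > no else FAIL
    return CONT
```

Each function answers the same question for one item: given the answers so far, stop with a verdict or ask again.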

- A characterization of which “shapes” are optimal
- An optimal PTIME “probabilistic” approach
- LP leveraging the inherent DP structure
- Optimal: Strategy with minimum overall cost
- for given parameters and requirements

- Probabilistic: Probability of “Pass” “Fail” “Continue”
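
The sketch below does not solve the LP mentioned above. It only illustrates the dynamic-programming structure such an approach can build on: a forward pass over the grid that computes the expected cost and expected error of a given probabilistic strategy. The function and parameter names are hypothetical, and worker answers are assumed independent.

```python
from collections import defaultdict


def evaluate_strategy(strategy, max_questions, selectivity, fp_rate, fn_rate):
    """Return (expected #questions, expected error) for one item.

    `strategy(yes, no)` returns probabilities (p_pass, p_fail, p_cont).
    """
    # (yes, no) -> [probability of reaching the point with a truly passing item,
    #               probability of reaching it with a truly failing item]
    reach = defaultdict(lambda: [0.0, 0.0])
    reach[(0, 0)] = [selectivity, 1.0 - selectivity]
    cost = error = 0.0
    for asked in range(max_questions + 1):
        for yes in range(asked + 1):
            no = asked - yes
            mass_pass, mass_fail = reach[(yes, no)]
            p_pass, p_fail, p_cont = strategy(yes, no)
            if asked == max_questions and p_cont > 0:
                raise ValueError("strategy must stop within max_questions")
            # Stopping here costs `asked` questions for the mass that stops.
            cost += asked * (p_pass + p_fail) * (mass_pass + mass_fail)
            # Errors: failing a true pass, or passing a true fail.
            error += mass_pass * p_fail + mass_fail * p_pass
            if p_cont > 0:
                # Push the continuing mass to the two neighbouring points.
                reach[(yes + 1, no)][0] += mass_pass * p_cont * (1 - fn_rate)
                reach[(yes, no + 1)][0] += mass_pass * p_cont * fn_rate
                reach[(yes + 1, no)][1] += mass_fail * p_cont * fp_rate
                reach[(yes, no + 1)][1] += mass_fail * p_cont * (1 - fp_rate)
    return cost, error


def majority_of_three(yes, no):
    """Example strategy: always ask 3 questions, then take the majority."""
    if yes + no < 3:
        return (0.0, 0.0, 1.0)
    return (1.0, 0.0, 0.0) if yes > no else (0.0, 1.0, 0.0)


cost, error = evaluate_strategy(majority_of_three, 3, 0.5, 0.1, 0.1)
```

An optimizer would search over the per-point probabilities to minimize `cost` subject to `error` staying below the threshold; a deterministic strategy is the special case where every triple is 0 or 1.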

Generate Parameters

- Evaluation on 10000 synthetic scenarios
- Tested:
- Optimal, Brute Force, Statistical, 5 Heuristic Algorithms

- Optimal Probabilistic issues fewer questions overall
- 15% savings on average compared to brute force
- 32% savings when optimal wins

- 22% savings on average compared to the statistics approach
- 49% savings when optimal wins

[Figure: cost comparison of Brute Force, Deterministic, Optimal Probabilistic, and Other Algorithms, with costs labeled COST1, COST2, COST3]

Translates to $$$ for many items !!

- The problem(s):
- Find the strategy of sorting n items
- Given: Probability of error for a comparison
- Given: Desired threshold on error, #questions, #rounds

- Find the strategy of sorting n items
- Sorting automatically given evidence
- NP-Hard even for a simple probability of error model
- Related work in the area of voting theory, economics

- Which r questions do we ask next?

- One question in each round

- Ask all pairs a total of 2k/n times

- Tournament, with k repetitions at each level
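
As a rough illustration of the tournament option, the sketch below finds the maximum with a single-elimination tournament in which each match is asked k times and decided by majority vote. The names `tournament_max` and `noisy_compare` are hypothetical, the worker is simulated, and items are assumed distinct and non-empty.

```python
import random


def noisy_compare(a, b, error_prob, rng):
    """Simulated worker: returns the larger item, but errs with error_prob."""
    better, worse = (a, b) if a > b else (b, a)
    return worse if rng.random() < error_prob else better


def tournament_max(items, k, ask):
    """Return (winner, number of rounds) of a tournament with k votes per match."""
    current = list(items)
    rounds = 0
    while len(current) > 1:
        survivors = []
        # All matches of a round (and their k repetitions) could be asked in parallel.
        for i in range(0, len(current) - 1, 2):
            a, b = current[i], current[i + 1]
            votes_a = sum(1 for _ in range(k) if ask(a, b) == a)
            survivors.append(a if 2 * votes_a > k else b)
        if len(current) % 2 == 1:
            survivors.append(current[-1])  # odd item out gets a bye
        current = survivors
        rounds += 1
    return current[0], rounds


rng = random.Random(0)
winner, rounds = tournament_max(
    [3, 9, 1, 7, 5], k=3, ask=lambda a, b: noisy_compare(a, b, 0.0, rng)
)
```

With `error_prob` set to 0 the workers never err, so the true maximum always wins; raising it shows how larger k buys accuracy at the price of more questions.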

Decreasing Parallelism

More Accuracy

Image Categorization Example

To attach: image of a Honda car

- Is image one of vehicle? YES!
- Is image one of toyota? NO!
- Is image one of honda? YES!

[Figure: category hierarchy with nodes vehicle, car, nissan, honda, toyota, maxima, sentra]

target node = intended category

Is the image one of X? = Is the target node reachable from X?

Find the target node by asking minimum number of search questions.
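
A minimal sketch of this search, assuming the category hierarchy is a tree and the crowd answers reachability questions correctly. It uses a simple top-down descent, which is not claimed to ask the minimum number of questions. The tree layout, `TREE`, `reachable`, and `find_target` are hypothetical illustrations.

```python
# Hypothetical hierarchy built from the node names in the example.
TREE = {
    "vehicle": ["car"],
    "car": ["toyota", "nissan", "honda"],
    "nissan": ["maxima", "sentra"],
}


def reachable(tree, start, target):
    """True if `target` is `start` or one of its descendants."""
    stack = [start]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        stack.extend(tree.get(node, []))
    return False


def find_target(tree, root, ask):
    """Descend from `root`, asking "is the target reachable from X?" per child.

    Assumes the target is reachable from `root`. Returns (target, #questions).
    """
    node, questions = root, 0
    while True:
        for child in tree.get(node, []):
            questions += 1
            if ask(child):
                node = child
                break
        else:
            # No child contains the target, so the current node is the target.
            return node, questions


# Simulated crowd for an image whose intended category is "honda".
found, asked = find_target(TREE, "vehicle", lambda x: reachable(TREE, x, "honda"))
```

Here the search asks about car, toyota, nissan, and honda in turn, so it locates "honda" after four questions; a smarter question order could do better on larger hierarchies.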

- k buckets, n items
- Categorize every item, overall error < threshold
- For k = 1, same as filters problem
- Two versions:
- Discrete
- Independent (like in the filters case)
- Dependent buckets (e.g., colors, GraphSearch)

- Continuous (e.g., age)
