Crowd Algorithms

Scoop — The Stanford – Santa Cruz Project for Cooperative Computing with Algorithms, Data, and People

Hector Garcia-Molina, Stephen Guo, Aditya Parameswaran, Hyunjung Park, Alkis Polyzotis, Petros Venetis, Jennifer Widom

Stanford and UC Santa Cruz

The Goal

  • Design Fundamental Algorithms for Human Computation


  • Which questions do I ask?

  • When do I ask the questions?

  • When do I stop?

  • How do I combine the answers?



The Problems

  • Crowd-Sort / Max: Difficult!

  • Crowd-GraphSearch: Difficult!

  • Crowd-Categorize: Difficult!

  • Crowd-Filter: Difficult!

[VLDB 2011]

The focus of this talk is the filtering problem; summaries of the rest follow afterward.
Is this image that of Bytes Café?

  • Given:

    • Error Probability (FP/FN) & Selectivity for each predicate

    • Desired Overall Error Probability

  • To: Compose a filtering strategy

    • Minimize Overall Cost (# of questions)

[Pipeline: Dataset of Items → Predicate 1 (“Is the image blurry?”) → Predicate 2 (“Does it show people’s faces?”) → … → Predicate k → Filtered Dataset]

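As a minimal sketch of how such a pipeline might run (not the optimized strategies discussed later in the talk), here is a simulated filter that applies predicates in sequence and decides each one by a majority vote over a fixed number of crowd answers. All function names and parameters here are illustrative, and the crowd is simulated by a coin flip with the given error probability:

```python
import random

def ask_crowd(true_value, error_prob, rng):
    """Simulate one crowd answer: correct with probability 1 - error_prob."""
    return true_value if rng.random() >= error_prob else not true_value

def passes_predicate(truth, error_prob, num_questions, rng):
    """Decide one predicate for one item by majority vote over crowd answers."""
    yes_votes = sum(ask_crowd(truth, error_prob, rng) for _ in range(num_questions))
    return yes_votes > num_questions / 2

def crowd_filter(items, predicates, error_prob=0.1, num_questions=5, seed=0):
    """Keep items that pass every predicate, short-circuiting on the first failure."""
    rng = random.Random(seed)
    return [item for item in items
            if all(passes_predicate(pred(item), error_prob, num_questions, rng)
                   for pred in predicates)]
```

With `error_prob=0.0` this reduces to an exact filter, e.g. `crowd_filter(range(10), [lambda x: x % 2 == 0], error_prob=0.0)` keeps the even numbers.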

Single Filter

  • Surprisingly difficult!

  • Need to meet an overall error threshold

    • Say, up to 10% of my images may be wrongly filtered

  • Minimize overall expected number of questions

  • Boils down to the following:

    • Take one item

    • Ask some questions

      • Results in some number of (YES, NO) answers for that item

    • Do I stop (if so, what do I return), or do I continue asking?


Hasn’t this been done before?

  • Solutions from statistics guarantee the same error per item

    • Important in contexts like:

      • Automobile testing

      • Diagnosis

  • We’re worried about aggregate error over all items: a uniquely data-oriented problem

    • I don’t care whether each individual image is judged perfectly, as long as the overall error target is met.

    • As we will see, results in $$$ savings


Strategy as a grid (x-axis: # of YES answers; y-axis: # of NO answers)

  • Reformulated Task: for each point in the grid, return Pass / Fail / Continue

  • Equivalently: find the best shape and color it!

Examples from the grid: at (YES = 5, NO = 6), return “Passed”; at (YES = 3, NO = 7), return “Failed”; at (YES = 3, NO = 5), keep asking. Start at the origin, with no questions asked.



Common Strategies

  • Always ask X questions, return most likely answer

    • The triangle shape

  • If you get X YES, return “Pass” or Y NO, return “Fail”, else keep asking.

    • Rectangular shape

  • Ask until |#YES - #NO| > X, or at most Y questions

    • Chopped off rectangle

    • AnHai Doan’s work on MOBS
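These three stopping rules can be written down as decision functions over the current (#YES, #NO) counts. A sketch, with the thresholds x and y as free parameters (the default values are arbitrary):

```python
def fixed_budget(yes, no, x=5):
    """Always ask x questions, then return the majority answer (triangle shape)."""
    if yes + no < x:
        return "Continue"
    return "Pass" if yes > no else "Fail"

def count_threshold(yes, no, x=4, y=4):
    """Stop as soon as x YES answers or y NO answers arrive (rectangular shape)."""
    if yes >= x:
        return "Pass"
    if no >= y:
        return "Fail"
    return "Continue"

def margin_with_cap(yes, no, x=2, y=7):
    """Ask until |#YES - #NO| > x, up to at most y questions (chopped-off rectangle)."""
    if yes + no >= y or abs(yes - no) > x:
        return "Pass" if yes > no else "Fail"
    return "Continue"
```

Each function maps a grid point to Pass / Fail / Continue, i.e. it colors one point of the grid from the previous slide.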

Summary of Results

  • A characterization of which “shapes” are optimal

  • An optimal PTIME “probabilistic” approach

    • LP leveraging the inherent DP structure

    • Optimal: Strategy with minimum overall cost

      • for given parameters and requirements

    • Probabilistic: each grid point gets probabilities of returning “Pass”, “Fail”, or “Continue”
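The LP and the optimality results are in the paper; as a rough illustration of the underlying DP structure only, here is a dynamic program that evaluates a *given* deterministic strategy, computing its expected number of questions and its error probability for an item whose true label is known, assuming each answer is independently wrong with probability `answer_error` (all names here are mine):

```python
def evaluate_strategy(decision, true_label, answer_error, max_q=20):
    """DP over the (#YES, #NO) grid: propagate the probability of reaching each
    point, accumulating expected cost and the probability of a wrong terminal
    decision.  Assumes the strategy terminates within max_q questions."""
    p_yes = 1 - answer_error if true_label == "Pass" else answer_error
    reach = {(0, 0): 1.0}            # probability of reaching each grid point
    expected_cost, error = 0.0, 0.0
    for asked in range(max_q + 1):
        frontier = {}
        for (yes, no), p in reach.items():
            verdict = decision(yes, no)
            if verdict == "Continue" and asked < max_q:
                expected_cost += p   # one more question is asked from here
                frontier[(yes + 1, no)] = frontier.get((yes + 1, no), 0.0) + p * p_yes
                frontier[(yes, no + 1)] = frontier.get((yes, no + 1), 0.0) + p * (1 - p_yes)
            elif verdict != true_label:
                error += p           # stopped with the wrong answer
        reach = frontier
    return expected_cost, error

def majority_of_3(yes, no):
    """Example strategy: always ask 3 questions, return the majority."""
    if yes + no < 3:
        return "Continue"
    return "Pass" if yes > no else "Fail"
```

For a true “Pass” item with answer_error = 0.2, majority_of_3 costs 3 questions and errs with probability 0.2³ + 3 · 0.2² · 0.8 = 0.104.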

Empirical Results


  • Evaluation on 10000 synthetic scenarios

  • Tested:

    • Optimal, Brute Force, Statistical, 5 Heuristic Algorithms

  • Optimal Probabilistic issues fewer questions overall

    • 15% savings on average compared to brute force

      • 32% savings when optimal wins

    • 22% savings on average compared to the statistics approach

      • 49% savings when optimal wins

[Chart: number of questions asked by Brute Force, Optimal Probabilistic, and the other algorithms]

Translates to $$$ savings for many items!

Crowd Sort / Max

  • The problem(s):

    • Find the strategy of sorting n items

      • Given: Probability of error for a comparison

      • Given: Desired threshold on error, # of questions, # of rounds

  • Sorting automatically given evidence

    • NP-Hard even for a simple probability of error model

    • Related work in the area of voting theory, economics

  • Which r questions do we ask next?

  • One question in each round

  • Ask all pairs a total of 2k/n times

  • Tournament, with k repetitions at each level

(Going down this list of strategies: decreasing parallelism, increasing accuracy.)
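As a sketch of the tournament idea for Max (names and parameters are illustrative, and the comparison noise is simulated rather than crowdsourced): run a knockout tournament where each match is decided by a majority over k repeated comparisons.

```python
import random

def noisy_compare(a, b, error_prob, rng):
    """Simulated crowd answer to 'is a bigger than b?'; wrong with error_prob."""
    truth = a > b
    return truth if rng.random() >= error_prob else not truth

def tournament_max(items, k=3, error_prob=0.1, seed=0):
    """Find the max by a knockout tournament; each match is decided by a
    majority vote over k repeated crowd comparisons."""
    rng = random.Random(seed)
    level = list(items)
    while len(level) > 1:
        winners = []
        for i in range(0, len(level) - 1, 2):
            a, b = level[i], level[i + 1]
            votes = sum(noisy_compare(a, b, error_prob, rng) for _ in range(k))
            winners.append(a if votes > k / 2 else b)
        if len(level) % 2:           # odd one out gets a bye to the next level
            winners.append(level[-1])
        level = winners
    return level[0]
```

Raising k at each level (or only at the later, more consequential levels) trades extra questions for accuracy, which is the knob the slide refers to.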

Crowd GraphSearch

Image Categorization Example

To attach: image of a honda car

Is image one of vehicle?




Is image one of toyota?





Is image one of honda?




target node = intended category

Is the image one of X? = Is the target node reachable from X?

Find the target node by asking minimum number of search questions.
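A minimal sketch of the search procedure, under the (strong) assumption that every answer is correct: descend from the root, asking one reachability question per child and following the YES branch. The tree and the oracle below are hypothetical stand-ins for the category graph and the crowd:

```python
def find_target(tree, root, is_reachable_from):
    """Locate the target category by descending from the root: at each node,
    ask a reachability question per child ('Is the image one of <child>?')
    and follow the child that answers YES.  `tree` maps node -> children."""
    node = root
    questions = 0
    while tree.get(node):
        for child in tree[node]:
            questions += 1
            if is_reachable_from(child):   # the crowd answers the search question
                node = child
                break
        else:
            break                          # no child matches: target is `node`
    return node, questions
```

On the slide's example tree (vehicle → {toyota, honda}) with a honda image, this asks two questions and attaches the image at the honda node; the algorithmic question in the paper is how to order and batch such questions to minimize their number under noisy answers.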

Crowd Categorize

  • k buckets, n items

  • Categorize every item, overall error < threshold

  • For k = 1, same as filters problem

  • Two versions:

    • Discrete

      • Independent (like in the filters case)

      • Dependent buckets (e.g., colors, GraphSearch)

    • Continuous (e.g., age)
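These are not the project's error-threshold algorithms, but as a baseline sketch, the two versions suggest different rules for aggregating repeated crowd answers about one item: the majority label for discrete buckets, the median for continuous quantities.

```python
from collections import Counter
from statistics import median

def categorize_discrete(answers):
    """Discrete buckets: return the most common crowd label."""
    return Counter(answers).most_common(1)[0][0]

def categorize_continuous(answers):
    """Continuous quantities (e.g., age): the median resists outlier answers."""
    return median(answers)
```

For dependent buckets (colors, GraphSearch), majority voting over raw labels ignores the bucket structure, which is part of what makes that version harder.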
