Wisdom of Crowds and Rank Aggregation. Mark Steyvers Department of Cognitive Sciences University of California, Irvine. Joint work with: Brent Miller, Pernille Hemmer, Mike Yi, Michael Lee. Wisdom of crowds phenomenon.


Wisdom of Crowds and Rank Aggregation

Mark Steyvers

Department of Cognitive Sciences

University of California, Irvine

Joint work with:

Brent Miller, Pernille Hemmer, Mike Yi, Michael Lee

Wisdom of crowds phenomenon
  • Aggregating over individuals in a group often leads to an estimate that is better than any of the individual estimates
Examples of wisdom of crowds phenomenon

Galton’s Ox (1907): Median of individual weight estimates came close to true answer

Prediction markets
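Galton's observation can be illustrated with a small sketch. The median of a set of guesses often lands closer to the truth than most of the individual guesses do. The guesses below are hypothetical numbers chosen for illustration. They are not Galton's data, and the function name is our own.

```python
from statistics import median


def crowd_median_error(estimates, truth):
    """Return the median estimate, its absolute error, and the fraction of
    individuals whose own error is larger than the median's error."""
    group = median(estimates)
    group_error = abs(group - truth)
    worse = sum(1 for e in estimates if abs(e - truth) > group_error)
    return group, group_error, worse / len(estimates)


# Hypothetical weight guesses (pounds) for an ox whose true weight is 1200.
guesses = [1050, 1120, 1180, 1210, 1230, 1290, 1400]
print(crowd_median_error(guesses, truth=1200))
```

With these numbers, the median (1210) misses by 10 pounds. Six of the seven individuals miss by more than that.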

Our research: ranking problems

What is the correct chronological order?

[Figure: Abraham Lincoln, Ulysses S. Grant, Rutherford B. Hayes, James Garfield, and Andrew Johnson placed along a time axis; correct order: Lincoln, Andrew Johnson, Grant, Hayes, Garfield]

Aggregating ranking data

[Figure: the individual orderings A D B C, D A B C, B A D C, A C B D, and A B D C are fed to an aggregation algorithm, and its group answer (?) is compared with the ground truth A B C D]

Task constraints
  • No communication between individuals
  • There is always a true answer (ground truth)
  • Unsupervised algorithms
    • no feedback is available
    • ground truth is used only for evaluation
Unsupervised models for ranking data
  • Classic models:
    • Thurstone (1927)
    • Mallows (1957); Fligner & Verducci (1986)
    • Diaconis (1989)
    • Voting methods: e.g. Borda count (1770)
  • Machine learning applications
    • Information retrieval and meta-search
      • e.g. Klementiev, Roth et al. (2008; 2009), Lebanon & Mao (2008); Dwork et al. (2001)
    • Multi-object tracking
      • e.g. Huang, Guestrin, Guibas (2009); Kondor, Howard, Jebara (2007)

Many of these models were developed for preference rankings and voting situations, where there is no known ground truth

Unsupervised Approach

[Figure: a generative model that incorporates individual differences links a latent ground truth (? ? ? ?) to the observed orderings A D B C, D A B C, B A D C, A C B D, and A B D C]

Overview of talk
  • Reconstruct the order of US presidents
  • Effect of group size and expertise
  • Reconstruct the order of events
  • Traveling Salesman Problem
Measuring performance

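Kendall’s tau distance (τ): the number of adjacent pairwise swaps needed to turn one ordering into the other

Example: ordering by individual A B E C D; true order A B C D E. Swapping E with C gives A B C E D (τ = 1). Swapping E with D then gives A B C D E (τ = 1 + 1 = 2).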
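The swap count above equals the number of item pairs that the two orderings place in opposite relative order. The sketch below counts those discordant pairs directly. The function name is our own.

```python
from itertools import combinations


def kendall_tau_distance(order, truth):
    """Count item pairs that `order` ranks in the opposite relative order to
    `truth`. This equals the minimum number of adjacent swaps needed to turn
    one ordering into the other."""
    pos = {item: i for i, item in enumerate(truth)}
    ranks = [pos[item] for item in order]  # true position of each listed item
    return sum(1 for a, b in combinations(ranks, 2) if a > b)


print(kendall_tau_distance("ABECD", "ABCDE"))  # 2
```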

Empirical Results

(random guessing)

t
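The random-guessing reference level has a simple closed form. A uniformly random ordering of n items reverses each of the n(n-1)/2 item pairs with probability 1/2, so its expected Kendall tau distance from any fixed ground truth is n(n-1)/4. The sketch below checks this formula by brute force for small n. It illustrates the formula only and is not necessarily how the line in the figure was computed.

```python
from fractions import Fraction
from itertools import combinations, permutations


def inversions(seq):
    """Number of pairs (i < j) with seq[i] > seq[j], i.e. Kendall distance to sorted order."""
    return sum(1 for a, b in combinations(seq, 2) if a > b)


def expected_random_tau(n):
    """Closed form: n(n-1)/2 pairs, each reversed with probability 1/2."""
    return Fraction(n * (n - 1), 4)


def brute_force_random_tau(n):
    """Exact average Kendall distance over all n! orderings."""
    perms = list(permutations(range(n)))
    return Fraction(sum(inversions(p) for p in perms), len(perms))


for n in range(2, 7):
    assert brute_force_random_tau(n) == expected_random_tau(n)
print(float(expected_random_tau(10)))  # 22.5 for a 10-item problem
```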

Thurstonian Model

A. George Washington

B. James Madison

C. Andrew Jackson

Each item has a true coordinate on some dimension

Thurstonian Model

A. George Washington

B. James Madison

C. Andrew Jackson

… but there is noise because of encoding errors

Thurstonian Model

A. George Washington

B. James Madison

C. Andrew Jackson


Each person’s mental encoding is based on a single sample from each distribution

Thurstonian Model

A. George Washington

B. James Madison

C. Andrew Jackson

Samples in this draw: A < C < B

The observed ordering is based on the ordering of the samples

Thurstonian Model

A. George Washington

B. James Madison

C. Andrew Jackson

Samples in this draw: A < B < C

The observed ordering is based on the ordering of the samples

Thurstonian Model

A. George Washington

B. James Madison

C. Andrew Jackson

Important assumption: across individuals, the standard deviations can vary, but the means cannot
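The generative story on these slides can be simulated in a few lines. Every item has one latent mean shared by everyone, and each person has a noise level. A person draws one sample per item and reports the items sorted by their samples. The means and noise levels below are hypothetical. The sketch only simulates data and does not perform the Bayesian inference used in the talk.

```python
import random


def sample_ordering(means, sigma, rng):
    """Simulate one individual's ordering under a Thurstonian model.

    means: dict mapping item -> latent mean (the same for every individual)
    sigma: this individual's noise level (standard deviation)
    rng:   a random.Random instance, so results are reproducible
    Returns the items sorted by one Gaussian sample per item, smallest first.
    """
    samples = {item: rng.gauss(mu, sigma) for item, mu in means.items()}
    return sorted(samples, key=samples.get)


# Hypothetical latent coordinates: A = Washington, B = Madison, C = Jackson.
means = {"A": 1.0, "B": 2.0, "C": 3.0}
rng = random.Random(0)
for sigma in (0.1, 1.0, 3.0):  # a low-, medium-, and high-noise individual
    print(sigma, sample_ordering(means, sigma, rng))
```

Raising σ makes draws such as A < C < B more likely, while the means stay fixed across people, as the assumption above requires.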

Graphical Model of Extended Thurstonian Model

[Figure: graphical model with nodes for the latent group means, each individual's noise level, the mental representation, and the observed ordering, with a plate over j individuals]
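The graphical model can also be written as equations. The notation is our own: μ_i is the latent group mean of item i, σ_j is the noise level of individual j, x_ij is the mental representation, and y_j is the observed ordering. Priors on μ and σ are omitted because the slide does not specify them.

```latex
\begin{aligned}
x_{ij} &\sim \mathcal{N}\!\left(\mu_i,\ \sigma_j^{2}\right),
  \qquad i = 1,\dots,n \ (\text{items}),\quad j = 1,\dots,m \ (\text{individuals}),\\
y_j &= \operatorname{argsort}\left(x_{1j}, \dots, x_{nj}\right).
\end{aligned}
```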

Inferred Distributions for 44 US Presidents

George Washington (1)

John Adams (2)

Thomas Jefferson (3)

James Madison (4)

James Monroe (6)

John Quincy Adams (5)

Andrew Jackson (7)

Martin Van Buren (8)

William Henry Harrison (21)

John Tyler (10)

James Knox Polk (18)

Zachary Taylor (16)

Millard Fillmore (11)

Franklin Pierce (19)

James Buchanan (13)

Abraham Lincoln (9)

Andrew Johnson (12)

Ulysses S. Grant (17)

Rutherford B. Hayes (20)

James Garfield (22)

Chester Arthur (15)

Grover Cleveland 1 (23)

Benjamin Harrison (14)

Grover Cleveland 2 (25)

William McKinley (24)

Theodore Roosevelt (29)

William Howard Taft (27)

Woodrow Wilson (30)

Warren Harding (26)

Calvin Coolidge (28)

Herbert Hoover (31)

Franklin D. Roosevelt (32)

Harry S. Truman (33)

Dwight Eisenhower (34)

John F. Kennedy (37)

Lyndon B. Johnson (36)

Richard Nixon (39)

Gerald Ford (35)

James Carter (38)

Ronald Reagan (40)

George H.W. Bush (41)

William Clinton (42)

George W. Bush (43)

Barack Obama (44)

Error bars: median and minimum σ
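The names above are listed in true chronological order. The parenthesized numbers form a permutation of 1 to 44, which we read as each president's position in the inferred group ordering (our interpretation of the figure). A permutation and its inverse have the same number of inversions, so under either reading the Kendall tau distance between the two orderings is the inversion count of that sequence. The sketch below uses a merge-sort count, which runs in O(n log n) time instead of checking all pairs. The function name is our own.

```python
def count_inversions(seq):
    """Return (sorted copy of seq, number of pairs i < j with seq[i] > seq[j])."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged, inv = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left
            merged.append(right[j])
            j += 1
            inv += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv


# Parenthesized numbers from the slide, in true chronological order.
inferred_positions = [1, 2, 3, 4, 6, 5, 7, 8, 21, 10, 18, 16, 11, 19, 13, 9, 12, 17,
                      20, 22, 15, 23, 14, 25, 24, 29, 27, 30, 26, 28, 31, 32, 33, 34,
                      37, 36, 39, 35, 38, 40, 41, 42, 43, 44]
_, tau = count_inversions(inferred_positions)
print(tau)
```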

Calibration of individuals

[Figure: for each individual, distance to ground truth (τ) plotted against that individual's inferred noise level (σ)]

Alternative Heuristic Models
  • Many heuristic methods from voting theory
    • E.g., Borda count method
  • Suppose we have 10 items
    • assign a count of 10 to the first item, 9 to the second item, and so on
    • add counts over individuals
    • order items by the Borda count
    • i.e., rank by average rank across people
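The Borda steps above can be sketched as follows, applied to the five example orderings from the aggregation figure earlier in the talk (A D B C, D A B C, B A D C, A C B D, A B D C). With n items, the first-ranked item earns n points, the second n - 1, and so on. Sorting by total points is therefore the same as sorting by average rank. The alphabetical tie-breaking rule is our own choice.

```python
from collections import defaultdict


def borda(orderings):
    """Aggregate full orderings with the Borda count.

    The item in 0-based position k of an n-item ordering earns n - k points.
    Items are returned by decreasing total, with ties broken alphabetically.
    """
    scores = defaultdict(int)
    for order in orderings:
        n = len(order)
        for k, item in enumerate(order):
            scores[item] += n - k
    return sorted(scores, key=lambda item: (-scores[item], item))


people = ["ADBC", "DABC", "BADC", "ACBD", "ABDC"]
print(borda(people))  # ['A', 'B', 'D', 'C']
```

For these orderings, the Borda answer A B D C is one adjacent swap (τ = 1) away from the ground truth A B C D.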
Overview of talk
  • Reconstruct the order of US presidents
  • Effect of group size and expertise
  • Reconstruct the order of events
  • Traveling Salesman Problem
Experiment
  • 78 participants
  • 17 ordering problems, each with 10 items
    • Chronological events
    • Physical measures
    • Purely ordinal problems, e.g.
      • Ten Amendments
      • Ten Commandments
Ordering states west-east

Oregon (1)

Utah (2)

Nebraska (3)

Iowa (4)

Alabama (6)

Ohio (5)

Virginia (7)

Delaware (8)

Connecticut (9)

Maine (10)

Ordering Ten Amendments

Freedom of speech & religion (1)

Right to bear arms (2)

No quartering of soldiers (4)

No unreasonable searches (3)

Due process (5)

Trial by Jury (6)

Civil Trial by Jury (7)

No cruel punishment (8)

Right to non-specified rights (10)

Power for the States & People (9)

How effective are small groups of experts?
  • Want to find experts endogenously – without feedback
  • Approach: select individuals with the smallest estimated noise levels based on previous tasks
  • We are identifying general expertise (“Spearman’s g”)
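The selection step can be sketched under one simple assumption: each person's noise level σ has already been estimated on each previous task, for example by the Thurstonian model. The sketch averages σ over the previous tasks and keeps the k people with the smallest average. The function name, input format, and averaging rule are hypothetical. With no previous tasks (T = 0) there is nothing to rank on, so the sketch requires at least one estimate per person.

```python
from statistics import mean


def select_experts(noise_history, k):
    """Pick the k individuals with the lowest mean estimated noise level.

    noise_history: dict mapping person id -> list of sigma estimates,
                   one per previous task (hypothetical input format).
    Ties are broken by person id so the result is deterministic.
    """
    ranked = sorted(noise_history, key=lambda p: (mean(noise_history[p]), p))
    return ranked[:k]


history = {"p1": [0.5, 0.7], "p2": [0.2, 0.3], "p3": [1.0, 0.9], "p4": [0.4, 0.6]}
print(select_experts(history, 2))  # ['p2', 'p4']
```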
Group Composition based on prior performance

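[Figure: τ as a function of group size (best individuals first), for groups selected using T = 0, 2, or 8 previous tasks]

[Figure: endogenous selection (no feedback required) compared with exogenous selection (selecting people based on actual performance), plotted as τ]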

Overview of talk
  • Reconstruct the order of US presidents
  • Effect of group size and expertise
  • Reconstruct the order of events
  • Traveling Salesman Problem
Recollecting Order from Episodic Memory

Study this sequence of images

Calibration of individuals

[Figure: for each individual, distance to ground truth (τ) plotted against the inferred noise level (σ); pizza sequence, perturbation model]

Overview of talk
  • Reconstruct the order of US presidents
  • Effect of group size and expertise
  • Reconstruct the order of events
  • Traveling Salesman Problem
Find the shortest route between cities

[Figure: tours drawn by Individuals 5, 83, and 60 compared with the optimal tour (problem B30-21)]

Dataset: Vickers, Bovet, Lee, & Hughes (2003)
  • 83 participants
  • 7 problems of 30 cities
TSP Aggregation Problem
  • Data consists of city order only
    • No access to city locations
Heuristic Approach
  • Idea: find tours with edges for which many individuals agree
  • Calculate agreement matrix A
    • A is an n × n matrix, where n is the number of cities
    • $a_{ij}$ is the number of participants whose tours connect cities i and j directly
  • Find the tour π that maximizes the total agreement $\sum_{k=1}^{n} a_{\pi(k)\,\pi(k+1)}$, where $\pi(n+1) = \pi(1)$

(this itself is a non-Euclidean TSP problem)
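The agreement-matrix heuristic above can be shown at toy scale. The sketch builds A from each participant's closed tour and then searches exhaustively for the tour with the largest total agreement. Exhaustive search is feasible only for a handful of cities. For 30-city problems the maximization step would need a proper TSP solver or heuristic instead. The tours and function names below are hypothetical.

```python
from itertools import permutations


def agreement_matrix(tours, n):
    """a[i][j] = number of participants whose closed tour uses edge {i, j}."""
    a = [[0] * n for _ in range(n)]
    for tour in tours:
        for k, i in enumerate(tour):
            j = tour[(k + 1) % len(tour)]  # wrap around to close the tour
            a[i][j] += 1
            a[j][i] += 1
    return a


def tour_score(tour, a):
    """Total agreement summed over the edges of a closed tour."""
    return sum(a[c][tour[(k + 1) % len(tour)]] for k, c in enumerate(tour))


def best_consensus_tour(a):
    """Exhaustive search over tours that start at city 0 (small n only)."""
    n = len(a)
    candidates = ((0,) + rest for rest in permutations(range(1, n)))
    best = max(candidates, key=lambda t: tour_score(t, a))
    return list(best), tour_score(best, a)


# Three hypothetical participants' tours over 5 cities (0-4).
tours = [[0, 1, 2, 3, 4], [0, 1, 2, 4, 3], [0, 2, 1, 3, 4]]
a = agreement_matrix(tours, 5)
print(best_consensus_tour(a))  # ([0, 1, 2, 3, 4], 11)
```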

Summary
  • Combine ordering / ranking data
    • going beyond numerical estimates or multiple choice questions
  • Incorporate individual differences
    • assume some individuals might be “experts”
    • going beyond models that treat every vote equally
  • Applications
    • combine multiple eyewitness accounts
    • combine solutions in complex problem-solving situations
    • fantasy football
That’s all

Do the experiments yourself:

http://psiexp.ss.uci.edu/

Predictive Rankings: fantasy football

Australian Football League (29 people rank 16 teams)

South Australian Football League (32 people rank 9 teams)

Predicting problem difficulty

[Figure: distance of the group answer to ground truth (τ) plotted against the dispersion of noise levels across individuals, std(σ); labeled problems include city size rankings and ordering states geographically]

Related Concepts in Supervised Learning
  • Boosting
    • combining multiple classifiers
  • Bagging (Bootstrap Aggregating)