
Less is More: Probabilistic Models for Retrieving Fewer Relevant Documents

Harr Chen, David R. Karger

MIT CSAIL

ACM SIGIR 2006

August 9, 2006


Outline

  • Motivations

  • Expected Metric Principle

  • Metrics

  • Bayesian Retrieval

  • Objectives

  • Heuristics

  • Experimental Results

  • Related Work

  • Future Work and Conclusions

Motivation

  • In IR, we have formal models and formal metrics

  • Models provide framework for retrieval

    • E.g.: Probabilistic

  • Metrics provide rigorous evaluation mechanism

    • E.g.: Precision and recall

  • Probability ranking principle (PRP) provably optimal for precision/recall

    • Ranking by probability of relevance

  • But other metrics capture other notions of result set quality → PRP isn’t necessarily optimal

Example: Diversity

  • User may be satisfied with one relevant result

    • Navigational queries, question/answering

  • In this case, we want to “hedge our bets” by retrieving for diversity in result set

    • Better to satisfy different users with different interpretations than one user many times over

  • Reciprocal rank/search length metrics capture this notion

  • PRP is suboptimal

IR System Design

  • Metrics define preference ordering on result sets

    • Metric[Result set 1] > Metric[Result set 2] → Result set 1 preferred to Result set 2

  • Traditional approach: Try out heuristics that we believe will improve relevance performance

    • Heuristics not directly motivated by metric

    • E.g.: synonym expansion, pseudorelevance feedback

  • Observation: Given a model, we can try to directly optimize for some metric

Expected Metric Principle (EMP)

  • Knowing which metric to use tells us what to maximize for – the expected value of the metric for each result set, given a model

[Diagram: for a toy corpus of Documents 1, 2, and 3, enumerate the candidate result sets (1,2), (1,3), (2,1), (2,3), (3,1), (3,2); calculate E[Metric] for each using the model; return the set with the maximum score. A brute-force sketch of this procedure follows.]
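A brute-force sketch of the procedure in the diagram, assuming a hypothetical expected_metric(result_set, model) function that returns E[Metric] under the probabilistic model (the function name and signature are assumptions of this sketch, not from the paper):

```python
from itertools import permutations

def emp_retrieve(corpus_docs, n, expected_metric, model):
    """Enumerate every ordered result set of size n, score each by its
    expected metric value under the model, and return the best one.
    Exponential in n; shown only to illustrate the principle."""
    candidates = permutations(corpus_docs, n)
    return list(max(candidates, key=lambda s: expected_metric(list(s), model)))
```

For the toy corpus of three documents and n = 2, this enumerates exactly the six ordered result sets shown in the diagram.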

Our Contributions

  • Primary: EMP – metric as retrieval goal

    • Metric designed to measure retrieval quality

      • Metrics we consider: precision/recall @ n, search length, reciprocal rank, instance recall, k-call

    • Build probabilistic model

    • Retrieve to maximize an objective: the expected value of the metric

      • Expectations calculated according to our probabilistic model

    • Use computational heuristics to make optimization problem tractable

  • Secondary: retrieving for diversity (special case)

    • A natural side effect of optimizing for certain metrics


Detour: What is a Heuristic?

  • Ad hoc approach

    • Use heuristics that are believed to be correlated with good performance

    • Heuristics used to improve relevance

    • Heuristics (probably) make the system slower

    • Infinite number of possibilities, no formalism

    • Model and heuristics intertwined

  • Our approach

    • Build a model that directly optimizes for good performance

    • Heuristics used to improve efficiency

    • Heuristics (probably) make the optimization worse

    • Well-known space of optimization techniques

    • Clean separation between model and heuristics

Search Length/Reciprocal Rank

  • (Mean) search length (MSL): number of irrelevant results until first relevant

  • (Mean) reciprocal rank (MRR): one over rank of first relevant

[Example result list with the first relevant document at rank 3: search length = 2, reciprocal rank = 1/3. A small sketch of both computations follows.]
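As a minimal sketch (mine, not the paper's), assuming relevance judgments are given as a ranked list of 0/1 values:

```python
def search_length(relevance):
    """Number of irrelevant results seen before the first relevant one."""
    for i, rel in enumerate(relevance):
        if rel:
            return i
    return len(relevance)  # convention for "no relevant result found"

def reciprocal_rank(relevance):
    """1 / rank of the first relevant result, or 0 if there is none."""
    for i, rel in enumerate(relevance):
        if rel:
            return 1.0 / (i + 1)
    return 0.0

# Toy ranking with the first relevant document at rank 3, as in the example above.
assert search_length([0, 0, 1, 0, 1]) == 2
assert reciprocal_rank([0, 0, 1, 0, 1]) == 1.0 / 3
```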

Instance Recall

  • Each topic has multiple instances (subtopics, aspects)

  • Instance recall @ n is the fraction of a topic’s instances covered (in union) by the first n results

[Example result list: instance recall @ 5 = 0.75. A small sketch of the computation follows.]
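A minimal sketch (mine), assuming each ranked result is annotated with the set of instances it covers; the data layout is an assumption of this sketch:

```python
def instance_recall_at_n(doc_instances, all_instances, n):
    """Fraction of the topic's instances covered by the union of the
    instance sets of the first n results."""
    covered = set().union(*doc_instances[:n])
    return len(covered & all_instances) / len(all_instances)

# Toy example (not the slide's actual documents) that also yields 0.75.
results = [{"a"}, {"b"}, set(), {"a", "c"}, set()]
assert instance_recall_at_n(results, {"a", "b", "c", "d"}, 5) == 0.75
```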

k-call @ n

  • Binary metric: 1 if the top n results contain at least k relevant documents, 0 otherwise

  • 1-call is (1 – %no)

    • See TREC robust track

[Example result list with two relevant documents in the top 5: 1-call @ 5 = 1, 2-call @ 5 = 1, 3-call @ 5 = 0. A small sketch of the computation follows.]
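A minimal sketch (mine), again over a ranked list of 0/1 relevance judgments:

```python
def k_call_at_n(relevance, k, n):
    """1 if at least k of the top n results are relevant, else 0."""
    return 1 if sum(relevance[:n]) >= k else 0

# Toy ranking with two relevant documents in the top 5, as in the example above.
ranking = [0, 1, 0, 0, 1]
assert k_call_at_n(ranking, 1, 5) == 1
assert k_call_at_n(ranking, 2, 5) == 1
assert k_call_at_n(ranking, 3, 5) == 0
```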

Motivation for k-call

  • 1-call: Want one relevant document

    • Many queries satisfied with one relevant result

    • Only need one relevant document, more room to explore → promotes result set diversity

  • n-call: Want all n retrieved documents to be relevant

    • “Perfect precision”

    • Home in on one interpretation and stick to it!

  • Intermediate k

    • Risk/reward tradeoff

  • Plus, easily modeled in our framework

    • Binary variable

Bayesian Retrieval Model

  • There exist distributions that generate relevant documents and irrelevant documents

  • PRP: rank by P(rel | d), the probability that document d is relevant (via Bayes’ rule on the two distributions)

  • Remaining modeling questions: form of rel/irrel distributions and parameters for those distributions

  • In this paper, we assume multinomial models, and choose parameters by maximum a posteriori

    • Prior is background corpus word distribution
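To make this concrete, here is a rough sketch (my illustration, not the paper's exact estimator) of smoothed multinomial models and PRP-style scoring; the names smoothed_multinomial, log_odds_relevant, and the pseudo-count weight mu are assumptions of this sketch:

```python
import math

def smoothed_multinomial(term_counts, background_prob, mu, vocab):
    """Word probabilities for a multinomial model: observed counts blended
    with the background corpus distribution, which plays the role of the
    prior. Assumes background_prob assigns nonzero mass to every vocab word."""
    total = sum(term_counts.values()) + mu
    return {w: (term_counts.get(w, 0) + mu * background_prob[w]) / total
            for w in vocab}

def log_odds_relevant(doc_terms, rel_model, irrel_model):
    """log P(d | relevant) - log P(d | irrelevant) under the two multinomials;
    with a fixed prior on relevance, ranking by this value gives the same
    ordering as ranking by P(rel | d), i.e. the PRP ordering."""
    return sum(math.log(rel_model[w]) - math.log(irrel_model[w])
               for w in doc_terms
               if w in rel_model and w in irrel_model)
```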

Objective

  • Probability Ranking Principle (PRP): maximize P(r_i = 1 | d_i) independently at each step in the ranking

  • Expected Metric Principle (EMP): maximize E[metric | d_1, …, d_n] for the complete result set

  • In particular, for k-call @ n, maximize P(r_1 + … + r_n ≥ k | d_1, …, d_n)
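As a worked special case (a reconstruction using the notation above, with r_i the binary relevance of the document at rank i), the expected 1-call objective expands to

\[
\mathbb{E}[\text{1-call@}n \mid d_1,\dots,d_n]
  = P\Big(\sum_{i=1}^{n} r_i \ge 1 \,\Big|\, d_1,\dots,d_n\Big)
  = 1 - P\big(r_1 = 0,\dots,r_n = 0 \mid d_1,\dots,d_n\big),
\]

so maximizing expected 1-call amounts to minimizing the joint probability that every retrieved document is irrelevant, which is what drives the diversity behavior discussed earlier.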

Optimization of Objective

  • Exact optimization of objective is usually NP-hard

    • E.g.: Exact optimization for k-call is NP-hard (shown via a relationship to the maximum graph clique problem)

  • Approximation heuristic: Greedy algorithm

    • Select documents successively in rank order

    • Hold previous documents fixed, optimize objective at each rank

[Illustration of the greedy steps: at rank 1, choose d1 to maximize E[metric | d]; at rank 2, hold d1 fixed and choose d2 to maximize E[metric | d, d1]; at rank 3, hold d1 and d2 fixed and choose d3 to maximize E[metric | d, d1, d2]; and so on. A sketch of this loop follows.]
Greedy on 1-call and n-call

  • 1-greedy

    • Greedy algorithm reduces to ranking each successive document assuming all previous documents are irrelevant

    • Algorithm has “discovered” incremental negative pseudorelevance feedback

  • n-greedy: Assume all previous documents relevant
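A rough sketch of the 1-greedy reduction, assuming hypothetical helpers prob_relevant(d, model) (the model's estimate of P(rel | d)) and add_negative_feedback(model, d) (folds d into the irrelevant distribution); both names are assumptions of this sketch, not the paper's API:

```python
def one_greedy(corpus_docs, n, prob_relevant, add_negative_feedback, model):
    """1-greedy: rank each successive document as if all previously chosen
    documents were irrelevant (incremental negative pseudorelevance feedback)."""
    ranking, remaining = [], list(corpus_docs)
    for _ in range(n):
        best = max(remaining, key=lambda d: prob_relevant(d, model))
        ranking.append(best)
        remaining.remove(best)
        # Treat the chosen document as irrelevant when scoring later ranks.
        model = add_negative_feedback(model, best)
    return ranking
```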

Greedy on Other Metrics

  • Greedy with precision/recall reduces to PRP!

  • Greedy on k-call for general k (k-greedy)

    • More complicated…

  • Greedy with MSL, MRR, or instance recall works out to the 1-greedy algorithm

    • Intuition: to make first relevant document appear earlier, we want to hedge our bets as to query interpretation (i.e., diversify)

Experiments Overview

  • Experiments verify that optimizing for a metric improves performance on that metric

    • They do not tell us which metrics to use

  • Looked at ad hoc diversity examples

  • TREC topics/queries

  • Tuned weights on separate development set

  • Tested on:

    • Standard ad hoc (robust track) topics

    • Topics with multiple annotators

    • Topics with multiple instances

Diversity on Google Results

  • Task: reranking top 1,000 Google results

  • In optimizing 1-call, our algorithm finds more diverse results than PRP or the original Google ranking

Experiments: Robust Track

  • TREC 2003, 2004 robust tracks

    • 249 topics

    • 528,000 documents

  • 1-call and 10-call results are statistically significant

Experiments: Instance Retrieval

  • TREC-6,7,8 interactive tracks

    • 20 topics

    • 210,000 documents

    • 7 to 56 instances per topic

  • PRP baseline: instance recall @ 10 = 0.234

  • Greedy 1-call: instance recall @ 10 = 0.315

Experiments: Multi-annotator

  • TREC-4,6 ad hoc retrieval

    • Independent annotators assessed same topics

    • TREC-4: 49 topics, 568,000 documents, 3 annotators

    • TREC-6: 50 topics, 556,000 documents, 2 annotators

  • More annotators are satisfied using 1-greedy

Related Work

  • Fits in risk minimization framework (objective as negative loss function)

  • Other approaches look at optimizing for metrics directly, with training data

  • Pseudorelevance feedback

  • Subtopic retrieval

  • Maximal marginal relevance

  • Clustering

  • See paper for references

Future Work

  • General k-call (k = 2, etc.)

    • Determine whether this is what users want

  • Better underlying probabilistic model

    • Our contribution is in the ranking objective, not the model → the model can be arbitrarily sophisticated

  • Better optimization techniques

    • E.g., Local search would differentiate algorithms for MRR and 1-call

  • Other metrics

    • Preliminary work on mean average precision, precision @ recall

      • (Perhaps) surprisingly, these metrics are not optimized by PRP!

Conclusions

  • EMP: Metric can motivate model – choosing and believing in a metric already gives us a reasonable objective, E[metric]

  • Can potentially apply EMP on top of a variety of different underlying probabilistic models

  • Diversity is one practical example of a natural side effect of using EMP with the right metric

Acknowledgments

  • Harr Chen supported by the Office of Naval Research through a National Defense Science and Engineering Graduate Fellowship

  • Jaime Teevan, Susan Dumais, and anonymous reviewers provided constructive feedback

  • ChengXiang Zhai, William Cohen, and Ellen Voorhees provided code and data
