Shuffling a Stacked Deck




Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results

Sandeep Pandey (1), Sourashis Roy (2), Christopher Olston (1), Junghoo Cho (2), Soumen Chakrabarti (3)

(1) Carnegie Mellon, (2) UCLA, (3) IIT Bombay



Popularity as a Surrogate for Quality

  • Search engines want to measure the “quality” of pages

  • Quality is hard to define and measure

  • Various “popularity” measures are used in ranking

    • e.g., in-links, PageRank, user traffic


Relationship Between Popularity and Quality

  • Popularity: depends on the number of users who “like” a page

    • relies on both quality and awareness of the page

  • Popularity is different from quality

    • But strongly correlated when awareness is large

[Figure: users who are aware of page p and users who like page p]


Problem

  • Popularity/quality correlation weak for young pages

    • Even if of high quality, may not (yet) be popular due to lack of user awareness

  • Plus, process of gaining popularity inhibited by “entrenchment effect”

    • [Cho et al. WWW’04], [Chakrabarti et al. SODA’05], [Mowshowitz et al. Communication’02], and many others


Entrenchment Effect

  • Search engines show entrenched (already-popular) pages at the top

  • Users discover pages via search engines; tend to focus on top results

[Figure: ranked result list with the labels "user attention", "entrenched pages", and "new unpopular pages"]


Outline

  • Problem introduction

  • Key idea: Mitigate entrenchment by introducing randomness into ranking

    • Randomized Rank Promotion Scheme

    • Model of ranking and popularity evolution

    • Evaluation

  • Summary


Alternative Approaches to Counteract Entrenchment Effect

  • Weight links to young pages more

    • [Baeza-Yates et al. SPIRE ’02]

    • Proposed an age-based variant of PageRank

  • Extrapolate quality based on increase in popularity

    • [Cho et al. SIGMOD ’05]

    • Proposed an estimate of quality based on the derivative of popularity


Our Approach: Randomized Rank Promotion

[Figure: ranked list of pages (ranks 1 to 501) before and after a random promotion]

  • Select random (young) pages to promote to good rank positions

  • Rank position to promote to is chosen at random


Our Approach: Randomized Rank Promotion

  • Consequence: Users visit promoted pages; improves ability to estimate quality via popularity

  • Compared with previous approaches:

    • Does not rely on temporal measurements (+)

    • Sub-optimal (-)


Exploration/Exploitation Tradeoff

  • Exploration/Exploitation tradeoff

    • exploit known high-quality pages by assigning good rank positions

    • explore quality of new pages by promoting them in rank

  • Existing search engines only exploit (to our knowledge)


Possible Objectives for Rank Promotion

  • Fairness

    • Give each page an equal chance to become popular

    • Incentive for search engines to be fair?

  • Quality (our choice)

    • Maximize quality of search results seen by users (in aggregate)

    • Quality of page p: extent to which users “like” p

    • Q(p) ∈ [0,1]


Quality-Per-Click Metric (QPC)

  • V(p,t): number of visits made to page p at time t through the search engine

  • QPC: average quality of pages viewed by users, amortized over time
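
The QPC description can be made concrete with a small sketch. It uses one plausible reading of "average quality of pages viewed, amortized over time": the sum of V(p,t)·Q(p) over all pages and time steps, divided by the sum of V(p,t). The function name `quality_per_click` and the sample numbers are illustrative assumptions, not taken from the slides; see the paper for the exact definition.

```python
from typing import Dict, List


def quality_per_click(visits: List[Dict[str, float]],
                      quality: Dict[str, float]) -> float:
    """Visit-weighted average quality over all recorded time steps.

    visits[t][p] plays the role of V(p,t); quality[p] plays the role of Q(p).
    """
    total_quality = 0.0
    total_visits = 0.0
    for step in visits:
        for page, v in step.items():
            total_quality += v * quality[page]
            total_visits += v
    if total_visits == 0:
        raise ValueError("no visits recorded")
    return total_quality / total_visits


# Hypothetical data: two pages observed over two time steps.
quality = {"a": 1.0, "b": 0.5}
visits = [{"a": 1, "b": 3}, {"a": 3, "b": 1}]
qpc = quality_per_click(visits, quality)
print(qpc)
```

Under this reading, QPC rises whenever visits shift toward higher-quality pages, which is the effect rank promotion is meant to achieve over time.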




Desiderata for Randomized Rank Promotion

[Figure: ranked list of pages (ranks 1 to 501) before and after a random promotion]

Want ability to:

  • Control exploration/exploitation tradeoff

  • “Select” certain pages as candidates for promotion

  • “Protect” certain pages from demotion


Randomized Rank Promotion Scheme

[Figure: the set of pages W is split into a promotion pool Wm and the remainder W-Wm. The promotion pool is placed in random order to form the list Lm; the remainder is ordered by popularity to form the list Ld.]

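Randomized Rank Promotion Scheme

[Figure: the promotion list Lm and the remainder list Ld are merged into one result list (example with six positions, k = 3, r = 0.5). The first k-1 positions are filled from Ld; each later position is filled from Lm with probability r and from Ld with probability 1-r.]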


Parameters

  • Promotion pool (Wm)

    • Uniform rank promotion: give an equal chance to each page

    • Selective rank promotion: exclusively target zero-awareness pages

  • Start rank (k)

    • rank to start randomization from

  • Degree of randomization (r)

    • controls the tradeoff between exploration and exploitation
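
The three parameters can be seen working together in a minimal sketch of the merge step. It assumes that positions before the start rank k always come from the popularity-ordered list Ld, and that each later position is taken from the randomly ordered promotion list Lm with probability r. The function name `randomized_rank_promotion`, its signature, and the sample pages are hypothetical; the paper gives the authoritative procedure.

```python
import random
from typing import Dict, List, Optional, Set


def randomized_rank_promotion(popularity: Dict[str, float],
                              promotion_pool: Set[str],
                              k: int,
                              r: float,
                              rng: Optional[random.Random] = None) -> List[str]:
    """Merge a shuffled promotion pool into a popularity-ordered list."""
    if rng is None:
        rng = random.Random()
    # Lm: promotion pool in random order (sorted first so a seed is reproducible).
    lm = sorted(promotion_pool)
    rng.shuffle(lm)
    # Ld: all remaining pages, most popular first.
    ld = sorted((p for p in popularity if p not in promotion_pool),
                key=lambda p: popularity[p], reverse=True)
    result: List[str] = []
    i = j = 0  # next unused index in ld and lm
    while i < len(ld) or j < len(lm):
        position = len(result) + 1
        # Promote only from position k onward, or when Ld has run out.
        take_promoted = j < len(lm) and (
            i >= len(ld) or (position >= k and rng.random() < r))
        if take_promoted:
            result.append(lm[j])
            j += 1
        else:
            result.append(ld[i])
            i += 1
    return result


# Selective promotion: the pool holds only zero-awareness pages (hypothetical data).
popularity = {"a": 0.9, "b": 0.5, "c": 0.2, "new1": 0.0, "new2": 0.0}
pool = {"new1", "new2"}
print(randomized_rank_promotion(popularity, pool, k=2, r=0.5,
                                rng=random.Random(0)))
```

Setting k = 2 keeps the top result fixed, and r = 0 reduces the scheme to plain popularity ranking with the pool appended at the end.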


Tuning the Parameters

  • Objective: maximize quality-per-click (QPC)

  • Two ways to tune

    • Real-world experiment

    • Analytical modeling




Popularity Evolution Cycle

[Figure: a cycle through four quantities: Popularity P(p,t) → Rank R(p,t) → Visit rate V(p,t) → Awareness A(p,t) → Popularity P(p,t). Each arrow is labeled with a function: FPR(P(p,t)) gives rank from popularity, FRV(R(p,t)) gives visit rate from rank, FVA(V(p,t)) gives awareness from visit rate, and FAP(A(p,t)) gives popularity from awareness.]
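
One pass around this cycle can be sketched as a discrete-time simulation. The slides name the four functions but do not give their forms, so every form below is an assumption made for illustration: popularity is awareness times quality, visits fall off as a power law of rank (exponent 1.5), and each visit comes from a uniformly random user out of `num_users`. The function name `cycle_step` and the sample pages are likewise hypothetical.

```python
from typing import Dict


def cycle_step(awareness: Dict[str, float], quality: Dict[str, float],
               total_visits: float, num_users: int) -> Dict[str, float]:
    """One pass: popularity -> rank -> visit rate -> updated awareness."""
    # FAP (assumed form): popularity = awareness * quality.
    popularity = {p: awareness[p] * quality[p] for p in awareness}
    # FPR: rank 1 is the most popular page; ties are broken by page name.
    ordered = sorted(popularity, key=lambda p: (-popularity[p], p))
    rank = {p: i + 1 for i, p in enumerate(ordered)}
    # FRV (assumed form): visit share proportional to rank ** -1.5.
    weight = {p: rank[p] ** -1.5 for p in rank}
    norm = sum(weight.values())
    visits = {p: total_visits * weight[p] / norm for p in rank}
    # FVA (assumed form): each visit makes one random user aware of the page.
    return {p: 1.0 - (1.0 - awareness[p]) * (1.0 - 1.0 / num_users) ** visits[p]
            for p in awareness}


# Hypothetical community: an entrenched page and a new, higher-quality page.
quality = {"old": 0.4, "new": 0.9}
awareness = {"old": 0.9, "new": 0.0}
for _ in range(5):
    awareness = cycle_step(awareness, quality, total_visits=10, num_users=100)
print(awareness)
```

With pure popularity ranking the new page starts at the bottom and gains awareness only through the small share of visits that reach low ranks, which is the entrenchment effect the promotion scheme targets.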


Deriving Popularity Evolution Curve

Next step: derive formula for popularity evolution curve

[Figure: popularity P(p,t) plotted against time (t)]

  • Assumptions

    • Number of pages constant

    • Pages are created and retired according to a Poisson process with rate parameter

    • Quality distribution of pages is stationary


Deriving Popularity Evolution Curve

Doing the steady-state analysis, we get:

[Equation shown on the slide; see paper]


Use Popularity Evolution Model to Tune Parameters

  • Model of popularity evolution process (see paper)

    • Complex dynamic process

    • To study, we combine approximate analysis with simulation

  • Next step: use model to tune rank promotion scheme

    • Parameters: k, r and Wm

    • Objective: maximize QPC


Tuning: Promotion Pool (Wm)

[Figure: comparison of no promotion, uniform promotion, and selective promotion, with k = 1 and r = 0.2]


Tuning: k and r

  • k: start rank

  • r: degree of randomization

  • Maximize QPC (quality-per-click)

  • Avoid excessive “junk”

  • Preserve #1 result for navigational searches


Model of the Web

  • Web = collection of multiple disjoint topic-specific communities (e.g., “Linux”, “Squash”, etc.)

  • A community is made up of a set of pages, interested users and related queries



Summary

  • Entrenchment effect hurts search result quality

  • Solution: Randomized rank promotion

  • Model of Web evolution and QPC metric

    • Used to tune & evaluate randomized rank promotion

  • Results:

    • New high-quality pages become popular much faster

    • Aggregate search result quality significantly improved


THE END

  • Paper available at:

    www.cs.cmu.edu/~spandey

