Explore how introducing randomness into ranking can mitigate the entrenchment effect in search engine results, improving the quality of pages seen by users. The correlation between popularity and quality, impact of search engines on page popularity, and alternative approaches are discussed.
Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results • Sandeep Pandey¹, Sourashis Roy², Christopher Olston¹, Junghoo Cho², Soumen Chakrabarti³ • ¹ Carnegie Mellon ² UCLA ³ IIT Bombay
Popularity as a Surrogate for Quality • Search engines want to measure the “quality” of pages • Quality hard to define and measure • Various “popularity” measures are used in ranking • e.g., in-links, PageRank, user traffic
Relationship Between Popularity and Quality • Popularity : depends on the number of users who “like” a page • relies on both awareness and quality of the page • Popularity correlated with quality • when awareness is large
Problem • Popularity/quality correlation weak for young pages • Even if of high quality, may not (yet) be popular due to lack of user awareness • Plus, process of gaining popularity inhibited by “entrenchment effect”
Entrenchment Effect • Search engines show entrenched (already-popular) pages at the top • Users discover pages via search engines; tend to focus on top results • [Figure: ranked result list with user attention concentrated on the entrenched pages at the top]
Outline • Problem introduction • Evidence of entrenchment effect • Key idea: Mitigate entrenchment by introducing randomness into ranking • Model of ranking and popularity evolution • Evaluation • Summary
Evidence of the Entrenchment Effect • Do search engines suppress controversy? - Susan L. Gerhart • More news, less diversity - New York Times • Googlearchy Distinction of retrievability and visibility • The politics of search engines - IEEE Computer • The political economy of linking on the Web - ACM conf. on Hypertext & Hypermedia • Are search engines biased? - Chris Sherman • Bias on the Web - Comm. of the ACM
Quantification of Entrenchment Effect • Impact of Search Engines on Page Popularity • Real Web study by Cho et al. [WWW’04] • Pages downloaded every week from 154 sites • Partitioned into 10 groups based on initial link popularity • After 7 months: • 70% of new links went to the top 20% of pages • PageRank decreased for the bottom 50% of pages
Alternative Approaches to Counteract Entrenchment Effect • Weight links to young pages more • [Baeza-Yates et al., SPIRE ’02] • Proposed an age-based variant of PageRank • Extrapolate quality based on increase in popularity • [Cho et al., SIGMOD ’05] • Proposed an estimate of quality based on the derivative of popularity
Our Approach: Randomized Rank Promotion • Select random (young) pages to promote to good rank positions • Rank position to promote to is chosen at random • [Figure: example ranked list (ranks 1 to 501) before and after rank promotion]
Our Approach: Randomized Rank Promotion • Consequence: Users visit promoted pages; improves quality estimate • Compared with previous approaches: • Does not rely on temporal measurements (+) • Sub-optimal (-)
Exploration/Exploitation Tradeoff • Exploration/Exploitation tradeoff • exploit known high-quality pages by assigning good rank positions • explore quality of new pages by promoting them in rank • Existing search engines only exploit (to our knowledge)
Possible Objectives for Rank Promotion • Fairness • Give each page an equal chance to become popular • Incentive for search engines to be fair? • Quality (our choice) • Maximize quality of search results seen by users (in aggregate) • Quality of page p: extent to which users “like” p • Q(p) ∈ [0,1]
Model of the Web • Web = collection of multiple disjoint topic-specific communities (e.g., “Linux”, “Squash”, etc.) • A community is made up of a set of pages, interested users and related queries
Model of the Web • Users visit pages only by issuing queries to search engine • Mixed surfing & searching considered in the paper • Query answer = ordered list containing all pages in the corresponding community • A single ranked list associated with each community • Since queries within a community are very similar
Model of the Web • Consequence: Each community evolves independently of the other communities • [Figure: separate ranked lists for the community on Squash and the community on Linux]
Quality-Per-Click Metric (QPC) • V(p,t): number of visits to page p at time t • QPC: average quality of pages viewed by users, amortized over time
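The verbal QPC definition can be sketched as a visit-weighted average of page quality. This is only an illustration: the function name `quality_per_click`, the `visits[t][p]` layout, and the finite time horizon are assumptions, not something the slides specify.

```python
from typing import Sequence


def quality_per_click(visits: Sequence[Sequence[float]],
                      quality: Sequence[float]) -> float:
    """Visit-weighted average quality over a finite time horizon.

    visits[t][p] plays the role of V(p,t); quality[p] plays the role of Q(p).
    The finite horizon and this data layout are assumptions of the sketch.
    """
    total_visits = 0.0
    weighted_quality = 0.0
    for visits_at_t in visits:
        if len(visits_at_t) != len(quality):
            raise ValueError("each time step needs one visit count per page")
        for v, q in zip(visits_at_t, quality):
            total_visits += v
            weighted_quality += v * q
    if total_visits == 0:
        raise ValueError("QPC is undefined when there are no visits")
    return weighted_quality / total_visits
```

A ranking that steers more visits to high-quality pages raises this value; a ranking that keeps visits on entrenched but mediocre pages lowers it.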
Desiderata for Randomized Rank Promotion • Want ability to: • Control exploration/exploitation tradeoff • “Select” certain pages as candidates for promotion • “Protect” certain pages from demotion
Randomized Rank Promotion Scheme • W: all pages • Promotion pool Wm: random ordering gives list Lm • Remainder W-Wm: order by popularity gives list Ld
Randomized Rank Promotion Scheme • [Figure: the promotion list Lm and the remainder list Ld are merged into one result list; the diagram is labeled with k-1, r and 1-r; example with k = 3, r = 0.5]
Parameters • Promotion pool (Wm) • Uniform rank promotion: give an equal chance to each page • Selective rank promotion: exclusively target zero-awareness pages • Start rank (k) • rank to start randomization from • Degree of randomization (r) • controls the tradeoff between exploration and exploitation
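One plausible reading of the scheme, based on the diagram labels (k-1, r, 1-r) and the parameter descriptions, can be sketched as follows. The merge rule is an interpretation: the top k-1 positions come from Ld, and each later position is filled from Lm with probability r, otherwise from Ld. The function name and signature are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Optional, Set


def randomized_rank_promotion(popularity: Dict[Hashable, float],
                              promotion_pool: Set[Hashable],
                              k: int,
                              r: float,
                              rng: Optional[random.Random] = None) -> List[Hashable]:
    """Merge a shuffled promotion list with a popularity-ordered remainder.

    Assumed merge rule (an interpretation of the slides, not a confirmed spec):
    positions 1..k-1 come from the remainder list; every later position takes
    the next promoted page with probability r, else the next remainder page.
    """
    if k < 1:
        raise ValueError("start rank k must be at least 1")
    if not 0.0 <= r <= 1.0:
        raise ValueError("degree of randomization r must lie in [0, 1]")
    if rng is None:
        rng = random.Random()

    # L_m: pages of the promotion pool W_m in random order.
    promoted = [p for p in popularity if p in promotion_pool]
    rng.shuffle(promoted)
    # L_d: remaining pages ordered by decreasing popularity.
    remainder = sorted((p for p in popularity if p not in promotion_pool),
                       key=lambda p: popularity[p], reverse=True)

    result = remainder[:k - 1]   # protected top k-1 positions
    rest = remainder[k - 1:]
    m_idx = d_idx = 0
    while m_idx < len(promoted) or d_idx < len(rest):
        # Take from L_m if L_d is exhausted, or with probability r otherwise.
        take_promoted = d_idx >= len(rest) or (
            m_idx < len(promoted) and rng.random() < r)
        if take_promoted:
            result.append(promoted[m_idx])
            m_idx += 1
        else:
            result.append(rest[d_idx])
            d_idx += 1
    return result
```

Under this reading, r = 0 places the promotion pool after all other pages, and k = 2 keeps the single most popular page fixed at rank 1.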
Tuning the Parameters • Objective: maximize quality-per-click (QPC) • Entrenchment in a community depends on many factors • Number of pages and users • Page lifetimes • Visits per user • Two ways to tune • set parameters per community • one parameter setting for all communities
Popularity Evolution Cycle • Popularity P(p,t) → Rank R(p,t) → Visit rate V(p,t) → Awareness A(p,t) → Popularity P(p,t)
DETAIL Popularity to Rank Relationship • Rank of a page under randomized rank promotion scheme • determined by a combination of popularity and randomness • Deterministic popularity-based ranking is a special case • i.e., r=0 • Unknown function FPR: rank as a function of the popularity of page p under a given randomized scheme • R(p,t) = FPR(P(p,t))
DETAIL Viewing Likelihood • Depends primarily on rank in list [Joachims KDD’02] • From AltaVista data [Lempel et al. WWW’03]: probability of viewing FRV(r) ∝ r^(-1.5), where r here denotes rank • [Figure: view probability (0 to 1.2) versus rank (0 to 150)]
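The rank-to-visit relationship can be sketched as a power law over rank with exponent 1.5, as read from the slide. Normalizing the weights so they sum to 1 over a list of `n_ranks` results is an assumption made here for illustration, and the function name is hypothetical.

```python
from typing import List


def view_probabilities(n_ranks: int, exponent: float = 1.5) -> List[float]:
    """Relative chance that a user views each rank position (1-based).

    Weights follow rank ** -exponent; dividing by their sum (an assumption
    of this sketch) turns them into a distribution over the result list.
    """
    if n_ranks < 1:
        raise ValueError("need at least one rank position")
    weights = [rank ** -exponent for rank in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

With exponent 1.5, rank 1 is viewed about 2.8 times as often as rank 2, which is why pages far down the list gain awareness so slowly.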
DETAIL Visit to Awareness Relationship • Awareness A(p,t): fraction of users who have visited page p at least once by time t
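The slide's formula for this relationship did not survive, so the following is only a sketch under a stated assumption: every visit is made by a user drawn uniformly at random, independently of earlier visits. This may differ from the authors' exact expression; the function name is hypothetical.

```python
def update_awareness(awareness: float, visits: float, n_users: int) -> float:
    """Expected awareness after `visits` further visits to a page.

    Assumption of this sketch: each visit comes from a uniformly random user,
    so a given user misses all of them with probability (1 - 1/n_users) ** visits.
    """
    if n_users < 1:
        raise ValueError("need at least one user")
    if not 0.0 <= awareness <= 1.0:
        raise ValueError("awareness is a fraction in [0, 1]")
    if visits < 0:
        raise ValueError("visits cannot be negative")
    still_unaware = (1.0 - awareness) * (1.0 - 1.0 / n_users) ** visits
    return 1.0 - still_unaware
```

Awareness never decreases under this update, and repeat visits by already-aware users give diminishing returns.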
DETAIL Awareness to Popularity Relationship • Quality Q(p): extent to which users like page p (contribute towards its popularity) • Popularity P(p,t) :
Popularity Evolution Cycle • Popularity P(p,t) → Rank R(p,t) via FPR(P(p,t)) • Rank R(p,t) → Visit rate V(p,t) via FRV(R(p,t)) • Visit rate V(p,t) → Awareness A(p,t) via FVA(V(p,t)) • Awareness A(p,t) → Popularity P(p,t) via FAP(A(p,t))
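The four-step cycle can be made concrete with a toy simulation of deterministic popularity-based ranking (the r = 0 special case). Several pieces are assumptions of this sketch and not taken from the slides: popularity is modeled as awareness times quality, visits follow a normalized rank^(-1.5) law, each visit comes from a uniformly random user, and pages are neither created nor retired.

```python
from typing import List


def simulate_cycle(quality: List[float], n_users: int, visits_per_step: float,
                   n_steps: int, exponent: float = 1.5) -> List[float]:
    """Run popularity -> rank -> visits -> awareness for n_steps.

    Returns the final popularity of each page. Every functional form used
    here is an assumption for illustration only.
    """
    n_pages = len(quality)
    awareness = [0.0] * n_pages
    # Assumed F_RV: share of visits per rank position, normalized over the list.
    weights = [rank ** -exponent for rank in range(1, n_pages + 1)]
    total = sum(weights)
    for _ in range(n_steps):
        # Assumed F_AP: popularity = awareness * quality.
        popularity = [a * q for a, q in zip(awareness, quality)]
        # F_PR with r = 0: order by decreasing popularity (ties keep page order).
        order = sorted(range(n_pages), key=lambda p: -popularity[p])
        for rank_index, page in enumerate(order):
            visits = visits_per_step * weights[rank_index] / total
            # Assumed F_VA: each visit comes from a uniformly random user.
            unaware = (1.0 - awareness[page]) * (1.0 - 1.0 / n_users) ** visits
            awareness[page] = 1.0 - unaware
    return [a * q for a, q in zip(awareness, quality)]
```

Swapping the `sorted` step for a randomized ranking function is the natural way to compare promotion settings within the same loop.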
Deriving Popularity Evolution Curve • Next step: derive formula for popularity evolution curve • Derive it using the awareness distribution of pages • [Figure: popularity P(p,t) versus time (t)]
Deriving Popularity Evolution Curve • Assumptions • Number of pages constant • Pages are created and retired according to a Poisson process with rate parameter • Quality distribution of pages is stationary • In the steady state, both popularity and awareness distribution of the pages are stationary
DETAIL Popularity Evolution Curve and Awareness Distribution • Awareness distribution: fraction of pages of quality q whose awareness is i / (#users) • Popularity evolution curve E(x,q): time duration for which a page of quality q has popularity value x • Next: derive popularity evolution curve using the awareness distribution
DETAIL Popularity Evolution Curve and Awareness Distribution • Awareness distribution: interpret it as the probability that a page of quality q has awareness a_i at any point in time • We know that: • Hence,
DETAIL Deriving Awareness Distribution • Awareness distribution: fraction of pages of quality q whose awareness is i / (#users) • Steady-state analysis gives an expression for it, but remember that we do not know FPR yet • R(p,t) = FPR(P(p,t))
DETAIL Deriving Awareness Distribution • Good news: since rank is a combination of popularity and randomness, we can derive FPR given the awareness distribution • Start with an initial form of FPR; iterate till convergence
Summary of Where We Stand • Formalized the popularity evolution cycle • Relationship between popularity evolution and awareness distribution • Derived the awareness distribution • Next step: tune parameters • Recall, goal is to obtain scheme that: • achieves high QPC (quality per click) • is robust across a wide range of community types
Tuning the Promotion Scheme • Parameters: k, r and Wm • Objective: maximize QPC • Influential factors: • Number of pages and users • Page lifetimes • Visits per user
Default Community Setting • Number of pages = 10,000 * • Number of users = 1000 • Visits per user = 1000 visits per day • Page lifetimes = 1.5 years [Ntoulas et al., WWW’04] • * How Much Information? SIMS, Berkeley, 2003
Tuning: Wm parameter (k=1 and r=0.2) • no promotion • uniform promotion • selective promotion
Tuning: k and r • Optimal r lies in (0,1) • Optimal r increases with increasing k • Based on simulation (reason: analysis is accurate only for small values of r)
Tuning: k and r Deciding k & r : • k >= 2 for “feeling lucky” • Minimize amount of “junk” perceived • Maximize QPC
Final Parameter Settings • Promotion pool (Wm ): zero-awareness pages • Start rank (k): 1 or 2 • Randomization (r) : 0.1
Influence of Visit Rate 1000 visits/day per user
Summary • Entrenchment effect hurts search result quality • Solution: Randomized rank promotion • Model of Web evolution and QPC metric • Used to tune & evaluate randomized rank promotion • Initial results • Significantly increases QPC • Robust across wide range of Web communities • More study required
THE END • Paper available at : www.cs.cmu.edu/~spandey