Accurately Interpreting Clickthrough Data as Implicit Feedback

Joachims, Granka, Pan, Hembrooke, Gay • Paper Presentation: Vinay Goel, 10/27/05

Introduction • Adapt a retrieval system to users and/or collections • Manual adaptation is time consuming or even impractical • Explore and evaluate implicit feedback • Use clickthrough data in WWW search

User Study • Record and evaluate user actions • Provide insight into the decision process • Record users’ eye movements (eye tracking)

Questions used

Two Phases of the Study • Phase I • 34 participants • Start the search with a Google query, then search for answers • Phase II • Investigate how users react to manipulations of search results • Same instructions as Phase I • Each subject assigned to one of three experimental conditions • Normal, Swapped, Reversed

Explicit Relevance Judgments • Collected explicit relevance judgments for all queries and results pages • Inter-judge agreements
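One way to quantify inter-judge agreement is to compare the pairwise orderings two judges assign to the same results. The sketch below is only an illustration under stated assumptions: the function name `pairwise_agreement`, the numeric score dictionaries, and the choice to skip pairs where either judge is indifferent are hypothetical and are not taken from the slides.

```python
from itertools import combinations

def pairwise_agreement(judge_a, judge_b):
    """Fraction of result pairs that both judges order the same way.

    judge_a, judge_b: dicts mapping a result id to a relevance score
    (hypothetical input format; higher means more relevant).
    Pairs where either judge gives a tie are skipped (an assumption).
    Returns None if no pair has a strict preference from both judges.
    """
    shared = sorted(set(judge_a) & set(judge_b))
    agree = total = 0
    for x, y in combinations(shared, 2):
        diff_a = judge_a[x] - judge_a[y]
        diff_b = judge_b[x] - judge_b[y]
        if diff_a == 0 or diff_b == 0:
            continue  # at least one judge expresses no strict preference
        total += 1
        if (diff_a > 0) == (diff_b > 0):
            agree += 1
    return agree / total if total else None

# Made-up scores for three results, purely for illustration.
a = {"d1": 3, "d2": 2, "d3": 1}
b = {"d1": 3, "d2": 1, "d3": 2}
print(pairwise_agreement(a, b))
```

In this toy input the judges agree on two of the three pairs and disagree on the order of d2 and d3.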

Analysis of user behavior • Which links do users view and click? • Do users scan links from top to bottom? • Which links do users evaluate before clicking?

Which links do users view and click? • 1st and 2nd links are viewed with almost equal frequency, but the 1st link receives more clicks • Once the user has started scrolling, rank appears to become less of an influence

Do users scan links from top to bottom? • Big gap before viewing 3rd ranked abstract • Users scan viewable results thoroughly before scrolling

Which links do users evaluate before clicking? • Abstracts closer above the clicked link are more likely to be viewed • Abstract right below the clicked link is viewed roughly 50% of the time

Analysis of Implicit Feedback • Does relevance influence user decisions? • Are clicks absolute relevance judgments?

Does relevance influence user decisions? • Yes • Use the “reversed” condition • Controllably decreases the quality of the retrieval function and relevance of highly ranked abstracts • Users react in two ways • View lower ranked links more frequently, scan significantly more abstracts • Subjects are much less likely to click on the first link, more likely to click on a lower ranked link

Clicks = absolute relevance judgments? • Interpretation is problematic • Trust Bias • Abstract ranked first receives more clicks than the second • Either the first link is more relevant (not influenced by order of presentation), or • Users prefer the first link due to some level of trust in the search engine (influenced by order of presentation)

Trust Bias • Hypothesis that users are not influenced by presentation order can be rejected • Users have substantial trust in search engine’s ability to estimate relevance

Quality Bias • Quality of the ranking influences the user’s clicking behavior • If relevance of retrieved results decreases, users click on abstracts that are on average less relevant • Confirmed by the “reversed” condition

Are clicks relative relevance judgments? • An accurate interpretation of clicks needs to take into consideration • User’s trust in the quality of the search engine • Quality of the retrieval function itself • Both are difficult to measure explicitly • Interpret clicks as pairwise preference statements

Strategy 1 • Takes trust and quality bias into consideration • Substantially and significantly better than random • Close in accuracy to inter-judge agreement
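The slide does not spell out the rule behind Strategy 1. Assuming it is the "Click > Skip Above" rule from the paper (a clicked result is preferred over every unclicked result ranked above it), a minimal sketch of extracting pairwise preferences from one query's clicks could look like this. The function name and the 1-based rank representation are illustrative choices, not the authors' code.

```python
def click_skip_above(clicked_ranks):
    """Pairwise preferences under an assumed 'Click > Skip Above' rule.

    clicked_ranks: iterable of 1-based ranks the user clicked.
    Returns a list of (preferred_rank, less_preferred_rank) tuples:
    each clicked rank is preferred over every unclicked rank above it.
    """
    clicked = set(clicked_ranks)
    preferences = []
    for i in sorted(clicked):
        for j in range(1, i):        # ranks presented above the click
            if j not in clicked:     # skipped, i.e. not clicked
                preferences.append((i, j))
    return preferences

# Clicks on ranks 1, 3 and 5.
print(click_skip_above([1, 3, 5]))  # → [(3, 2), (5, 2), (5, 4)]
```

Every extracted preference runs against the presented order, which is one way to read the slide's point that this strategy takes trust and quality bias into consideration: a click on the top-ranked link alone yields no preference.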

Strategy 2 • Slightly more accurate than Strategy 1 • Not a significant difference in Phase II

Strategy 3 • Accuracy worse than Strategy 1 • Ranking quality has an effect on the accuracy

Strategy 4 • No significant differences compared to Strategy 1

Strategy 5 • Highly accurate in the “normal” condition • Misleading • Aligned preferences probably less valuable for learning • Better results even if user behaves randomly • Less accurate than Strategy 1 in the “reversed” condition
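To see why preferences from Strategy 5 can be misleading, assume it is the "Click > No-Click Next" rule from the paper: a clicked result is preferred over the immediately following result if that one was not clicked. The sketch below uses hypothetical names (`click_no_click_next`, `num_results`) and 1-based ranks.

```python
def click_no_click_next(clicked_ranks, num_results):
    """Pairwise preferences under an assumed 'Click > No-Click Next' rule.

    clicked_ranks: iterable of 1-based ranks the user clicked.
    num_results: number of results presented (hypothetical parameter).
    Returns (preferred_rank, less_preferred_rank) tuples: a clicked rank
    is preferred over the next rank when that next rank was not clicked.
    """
    clicked = set(clicked_ranks)
    return [
        (i, i + 1)
        for i in sorted(clicked)
        if i + 1 <= num_results and i + 1 not in clicked
    ]

prefs = click_no_click_next([1, 3, 5], num_results=10)
print(prefs)  # → [(1, 2), (3, 4), (5, 6)]
# Every preference agrees with the order the engine already presented.
print(all(better < worse for better, worse in prefs))  # → True
```

Because each pair simply confirms the existing ranking, such preferences score well whenever the ranking is already good, even for random clicks, and they give a learner no signal to reorder results.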

Conclusion • Users’ clicking decisions are influenced by trust bias and quality bias • Strategies for generating relative relevance feedback signals • Implicit relevance signals are less consistent with explicit judgments than the explicit judgments are with each other • Encouraging results
