Click Chain Model in Web Search

Fan Guo, Carnegie Mellon University. PPT revised and presented by Xin Xin.

Outline • Background and motivation • Designing a click model • Algorithms • Experiments

How to utilize users’ feedback to improve search engine results?

Diverse User Feedback • Click-through • Browser action • Dwell time • Explicit judgment • Other page elements

Web Search Click Log • Auto-generated data keeping important information about search activity.

A real world example

How large is the click log? • Search logs: 10+ TB/day • In existing publications: • [Craswell+08]: 108k sessions • [Dupret+08]: 4.5M sessions (21 subsets × 216k sessions) • [Guo+09a]: 8.8M sessions from 110k unique queries • [Guo+09b]: 8.8M sessions from 110k unique queries • [Chapelle+09]: 58M sessions from 682k unique queries • [Liu+09a]: 0.26PB data from 103M unique queries

Intuition to Utilize Clicks • Adapt ranking to user clicks • (Figure: # of clicks received)

Position Bias Problem • (Figure: # of clicks received)

Problem Definition • Given a click log data set, compute the user-perceived relevance of each query-document pair. • The solution should be: • Aware of position bias and context dependency • Scalable to terabytes of data • Incremental, so that it stays up to date

Examination Hypothesis • A document must be examined before a click. • The (conditional) probability of click upon examination depends on document relevance.
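The examination hypothesis can be illustrated with a small sketch. It assumes, for simplicity, that the click probability given examination is equal to the document relevance (the slide only says it depends on relevance); the function name and the numbers are hypothetical.

```python
def click_probability(p_examined: float, relevance: float) -> float:
    """P(click) = P(examined) * P(click | examined).

    Assumption for this sketch: P(click | examined) equals the relevance.
    """
    return p_examined * relevance


# Hypothetical numbers: the same document shown at two positions.
# Only the examination probability differs, yet the click rate drops.
top = click_probability(p_examined=1.0, relevance=0.4)
low = click_probability(p_examined=0.25, relevance=0.4)
print(top, low)
```

This is why raw click counts cannot be read as relevance: a lower position reduces clicks even when relevance is unchanged.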

Cascade Hypothesis • The first document is always examined. • First-order Markov property: examination at position (i+1) depends only on examination and click at position i. • Examination follows a strict linear order: Position i → Position (i+1)

User Behavior Description • Examine the document, then: Click? • If no click: See next doc? Yes: examine the next document. No: done. • If click: See next doc? Yes: examine the next document. No: done.
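The browsing loop described above can be sketched as a small simulation. This is an illustrative sketch, not the model's actual parameterization: the two continuation probabilities (`p_next_after_skip`, `p_next_after_click`) are hypothetical names, and the click probability upon examination is assumed to equal the document relevance.

```python
import random


def simulate_session(relevances, p_next_after_skip, p_next_after_click, rng):
    """Simulate one session; return a 0/1 click indicator per position."""
    clicks = [0] * len(relevances)
    for i, relevance in enumerate(relevances):
        # Position i is being examined (the first position always is).
        clicked = rng.random() < relevance
        clicks[i] = int(clicked)
        # "See next doc?" depends on whether the current document was clicked.
        p_next = p_next_after_click if clicked else p_next_after_skip
        if rng.random() >= p_next:
            break  # the user is done; lower positions are never examined
    return clicks


rng = random.Random(0)
session = simulate_session([0.6, 0.3, 0.5], 0.8, 0.5, rng)
print(session)
```

Positions after the stopping point stay at 0 because they were never examined, not because the documents were judged irrelevant.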

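Click Chain Model (graphical model) • Relevance variables: R1, R2, R3, R4, R5, … • Examination variables: E1, E2, E3, E4, E5, … (chained by the cascade hypothesis) • Click variables: C1, C2, C3, C4, C5, … (tied to Ei and Ri by the examination hypothesis)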

A Coin-Toss Example for Bayesian Framework • Prior and posterior density function (not normalized), updated after each toss: $x^1(1-x)^0$, $x^2(1-x)^0$, $x^3(1-x)^0$, $x^3(1-x)^1$, $x^4(1-x)^1$
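The coin-toss update can be checked numerically. The sketch below assumes a uniform prior and evaluates the unnormalized density on a grid; the exponents on the slide correspond to the outcomes heads, heads, heads, tails, heads. The function names and the grid size are choices made for this illustration.

```python
def unnormalized_posterior(x, heads, tails):
    """Density of the heads probability x, up to a constant (uniform prior)."""
    return x ** heads * (1.0 - x) ** tails


def posterior_mean(heads, tails, grid_size=10001):
    """Approximate the posterior mean by summing over an evenly spaced grid."""
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [unnormalized_posterior(x, heads, tails) for x in xs]
    total = sum(weights)
    return sum(x * w for x, w in zip(xs, weights)) / total


# Replay the tosses: only two counters need to be stored.
heads = tails = 0
for outcome in "HHHTH":
    if outcome == "H":
        heads += 1
    else:
        tails += 1
    print(outcome, heads, tails, round(posterior_mean(heads, tails), 3))
```

Normalization is deferred until a summary such as the mean is needed, so each update only increments an exponent.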

Click Data Example • Prior density function (not normalized), updated after each observation: • $x^1(1-x)^0(1-0.6x)^0(1+0.3x)^1(1-0.5x)^0(1-0.2x)^0 \cdots$ • $x^1(1-x)^1(1-0.6x)^0(1+0.3x)^1(1-0.5x)^0(1-0.2x)^0 \cdots$ • $x^2(1-x)^1(1-0.6x)^0(1+0.3x)^2(1-0.5x)^0(1-0.2x)^0 \cdots$ • $x^3(1-x)^1(1-0.6x)^1(1+0.3x)^2(1-0.5x)^0(1-0.2x)^0 \cdots$ • $x^3(1-x)^1(1-0.6x)^1(1+0.3x)^2(1-0.5x)^1(1-0.2x)^0 \cdots$
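The click data example follows the same pattern as the coin toss: the unnormalized density is a product of a fixed set of factors, so only an exponent per factor has to be stored. The sketch below uses only the six factors visible on the slide (the slide elides further factors with "…", which are left out here); the coefficient list, function names, and grid size are assumptions of this illustration.

```python
# Coefficients of the (1 + beta * x) factors visible on the slide.
BETAS = [-0.6, 0.3, -0.5, -0.2]


def density(x, exponents):
    """Unnormalized density; exponents = [a, b, c1, c2, c3, c4] for
    x^a * (1-x)^b * prod_k (1 + BETAS[k] * x)^c_k."""
    a, b, *cs = exponents
    value = x ** a * (1.0 - x) ** b
    for beta, c in zip(BETAS, cs):
        value *= (1.0 + beta * x) ** c
    return value


def update(exponents, factor_index):
    """Record one more observation of the given factor (returns a new list)."""
    new = list(exponents)
    new[factor_index] += 1
    return new


def posterior_mean(exponents, grid_size=2001):
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    ws = [density(x, exponents) for x in xs]
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)


state = [1, 0, 0, 1, 0, 0]   # first row of the slide
state = update(state, 1)     # second row: the (1 - x) exponent goes up by one
print(state, round(posterior_mean(state), 3))
```

Because each observation only increments a counter, updates are cheap and can be applied incrementally as new sessions arrive.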

Estimating P(C|Ri)

(Figure: graphical model with relevance R1–R5, examination E1–E5, and click C1–C5 variables; example click sequence 0 1 0 1 …)

Putting them together

Alpha Estimation

Data Set • Collected over 2 weeks in July 2008. • Preprocessing: • Discard no-click sessions for fair comparison. • Remove the 178 most frequent queries. • Split into training/test sets according to time stamps.

Data Set • After preprocessing: • 110,630 distinct queries; • 4.8M/4.0M query sessions in the training/test set.
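The preprocessing steps listed for the data set can be sketched as follows. The session layout (a dict with `query`, `timestamp`, and `clicks` keys), the function name, and the choice to count query frequency after dropping no-click sessions are all assumptions of this illustration, not details given in the slides.

```python
from collections import Counter


def preprocess(sessions, n_frequent_to_drop, split_time):
    """Filter sessions and split them into (train, test) by time stamp."""
    # 1. Discard sessions without any click.
    clicked = [s for s in sessions if any(s["clicks"])]
    # 2. Remove the most frequent queries.
    counts = Counter(s["query"] for s in clicked)
    frequent = {q for q, _ in counts.most_common(n_frequent_to_drop)}
    kept = [s for s in clicked if s["query"] not in frequent]
    # 3. Earlier sessions train the model; later ones test it.
    train = [s for s in kept if s["timestamp"] < split_time]
    test = [s for s in kept if s["timestamp"] >= split_time]
    return train, test


# Toy data with hypothetical queries and time stamps.
sessions = [
    {"query": "a", "timestamp": 1, "clicks": [1, 0]},
    {"query": "a", "timestamp": 2, "clicks": [0, 1]},
    {"query": "b", "timestamp": 3, "clicks": [0, 0]},
    {"query": "c", "timestamp": 4, "clicks": [1, 0]},
    {"query": "d", "timestamp": 9, "clicks": [0, 1]},
]
train, test = preprocess(sessions, n_frequent_to_drop=1, split_time=5)
print(len(train), len(test))
```

Splitting by time rather than at random keeps the test set strictly in the future of the training set, which matches how a model would be used on live traffic.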

Metric • Efficiency: • Computational time • Effectiveness: • Perplexity • Log-likelihood • Click prediction
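The slides do not spell out the metric formulas, so the sketch below uses a commonly used definition as an assumption: for a given position, perplexity is 2 raised to the negative average base-2 log-probability that the model assigns to the observed click/no-click events, and log-likelihood is the average natural log-probability. The function names are hypothetical.

```python
import math


def average_log_likelihood(clicks, predicted):
    """Mean log-probability of the observed binary click events."""
    total = sum(math.log(q) if c else math.log(1.0 - q)
                for c, q in zip(clicks, predicted))
    return total / len(clicks)


def perplexity(clicks, predicted):
    """2 ** (-mean log2-probability); 1.0 is perfect, 2.0 is a fair coin flip."""
    total = sum(math.log2(q) if c else math.log2(1.0 - q)
                for c, q in zip(clicks, predicted))
    return 2.0 ** (-total / len(clicks))


observed = [1, 0, 0, 1]                # clicks at one position over four sessions
confident = [0.9, 0.1, 0.2, 0.8]       # hypothetical model predictions
uninformed = [0.5, 0.5, 0.5, 0.5]
print(perplexity(observed, confident), perplexity(observed, uninformed))
```

Lower perplexity and higher log-likelihood both indicate better predictions. Predicted probabilities of exactly 0 or 1 would need clipping before taking logarithms.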

Competitors • UBM: User Browsing Model (Dupret et al., SIGIR’08) • DCM: Dependent Click Model (WSDM’09)

Results – Time • Environment: Unix server, 2.8GHz cores, MATLAB R2008b.

Results – Perplexity • (Figure: perplexity comparison; lower is better)

Results – Log Likelihood • (Figure: log-likelihood comparison; higher is better)

First Clicked Position

Last Clicked Position

The End
