Click Chain Model in Web Search. Fan Guo Carnegie Mellon University. PPT Revised and Presented by Xin Xin. Outline. Background and motivation Designing a click model Algorithms Experiments. How to utilize users’ feedback to improve search engine results?. Diverse User Feedback.
Click Chain Model in Web Search • Fan Guo, Carnegie Mellon University • PPT Revised and Presented by Xin Xin
Outline • Background and motivation • Designing a click model • Algorithms • Experiments
How to utilize users’ feedback to improve search engine results?
Diverse User Feedback • Click-through • Browser action • Dwelling time • Explicit judgment • Other page elements
Web Search Click Log • Auto-generated data keeping important information about search activity.
How large is the click log? • Search logs: 10+ TB/day • In existing publications: • [Craswell+08]: 108k sessions • [Dupret+08]: 4.5M sessions (21 subsets × 216k sessions) • [Guo+09a]: 8.8M sessions from 110k unique queries • [Guo+09b]: 8.8M sessions from 110k unique queries • [Chapelle+09]: 58M sessions from 682k unique queries • [Liu+09a]: 0.26PB data from 103M unique queries
Intuition to Utilize Clicks • Adapt ranking to user clicks [Chart: # of clicks received]
Position Bias Problem [Chart: # of clicks received]
Problem Definition • Given a click log data set, compute the user-perceived relevance of each query-document pair. The solution should be: • Aware of position bias and context dependency • Scalable to terabytes of data • Incremental, so that estimates stay up to date
Outline • Background and motivation • Designing a click model • Algorithms • Experiments
Examination Hypothesis • A document must be examined before a click. • The (conditional) probability of click upon examination depends on document relevance.
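The two statements above amount to a product rule: the probability of a click is the probability of examination times the click probability given examination. A minimal sketch follows; the function name and the example numbers are illustrative assumptions, not values from the slides.

```python
def click_probability(p_examine: float, relevance: float) -> float:
    """Examination hypothesis: P(C=1) = P(E=1) * P(C=1 | E=1).

    An unexamined document is never clicked, so P(C=1 | E=0) = 0.
    `relevance` stands in for P(C=1 | E=1) (assumed notation).
    """
    if not (0.0 <= p_examine <= 1.0 and 0.0 <= relevance <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    return p_examine * relevance


# The same document (relevance 0.5, a made-up value) shown at a position
# that is always examined and at one that is examined 20% of the time.
p_top = click_probability(1.0, 0.5)
p_low = click_probability(0.2, 0.5)
```

The gap between `p_top` and `p_low` is the position bias: raw click counts differ even though the relevance is identical.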
Cascade Hypothesis • The first document is always examined. • First-order Markov property: • Examination at position (i+1) depends on examination and click at position i only • Examination follows a strict linear order: Position i → Position (i+1)
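User Behavior Description [Flowchart] • Examine the document → Click? • Click? No → See next doc? (Yes: examine the next document; No: done) • Click? Yes → See next doc? (Yes: examine the next document; No: done)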
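The flowchart above can be read as a small simulation loop. The sketch below is a simplification under stated assumptions: the two continuation probabilities `p_continue_no_click` and `p_continue_click` are hypothetical fixed parameters introduced for illustration, and the slides do not specify how the "See Next Doc?" decisions are parameterized.

```python
import random


def simulate_session(relevances, p_continue_no_click, p_continue_click, rng):
    """Simulate one search session top-down under the cascade hypothesis.

    relevances: per-position click probability given examination.
    The two continuation probabilities are illustrative assumptions.
    Returns a list of 0/1 clicks, one per position.
    """
    clicks = []
    examining = True  # the first document is always examined
    for relevance in relevances:
        if not examining:
            clicks.append(0)  # an unexamined document cannot be clicked
            continue
        clicked = rng.random() < relevance
        clicks.append(1 if clicked else 0)
        # "See next doc?" depends only on what happened at this position.
        p_next = p_continue_click if clicked else p_continue_no_click
        examining = rng.random() < p_next
    return clicks


rng = random.Random(0)
session = simulate_session([0.7, 0.4, 0.2, 0.1], 0.9, 0.6, rng)
```

Once the user stops, every later position is unexamined, which is why clicks thin out toward the bottom of the result list even for relevant documents.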
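Click Chain Model [Figure: graphical model over positions 1 to 5 with relevance nodes R1 … R5, examination nodes E1 … E5 (chained by the Cascade Hypothesis), and click nodes C1 … C5 (tied to Ri and Ei by the Examination Hypothesis)]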
Outline • Background and motivation • Designing a click model • Algorithms • Experiments
A Coin-Toss Example for Bayesian Framework • Prior → Posterior • Density function (not normalized) after each successive toss: $x^1(1-x)^0$, $x^2(1-x)^0$, $x^3(1-x)^0$, $x^3(1-x)^1$, $x^4(1-x)^1$
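The sequence of densities in the coin-toss example can be reproduced by counting exponents: each head raises the power of x by one and each tail raises the power of (1-x) by one. The sketch below assumes a uniform prior (exponents 0 and 0) and the toss order head, head, head, tail, head, which is what the listed exponents imply; the grid-based mean is only an illustrative way to summarize the posterior.

```python
def update(counts, outcome):
    """Bayesian update for a coin: bump the exponent of x or of (1 - x)."""
    heads, tails = counts
    return (heads + 1, tails) if outcome else (heads, tails + 1)


def posterior_mean(counts, grid_size=10001):
    """Mean of the unnormalized density x^heads * (1 - x)^tails on [0, 1],
    approximated on an evenly spaced grid."""
    heads, tails = counts
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [x ** heads * (1.0 - x) ** tails for x in xs]
    return sum(x * w for x, w in zip(xs, weights)) / sum(weights)


tosses = [1, 1, 1, 0, 1]  # 1 = head, 0 = tail (assumed order)
counts = (0, 0)           # uniform prior: x^0 * (1 - x)^0
history = []
for outcome in tosses:
    counts = update(counts, outcome)
    history.append(counts)

mean_after_all = posterior_mean(counts)
```

Only the two exponents need to be stored, so the posterior can be updated one observation at a time without revisiting earlier data.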
Click Data Example • Prior → Posterior • Density function (not normalized) after each successive update:
$x^1(1-x)^0(1-0.6x)^0(1+0.3x)^1(1-0.5x)^0(1-0.2x)^0 \dots$
$x^1(1-x)^1(1-0.6x)^0(1+0.3x)^1(1-0.5x)^0(1-0.2x)^0 \dots$
$x^2(1-x)^1(1-0.6x)^0(1+0.3x)^2(1-0.5x)^0(1-0.2x)^0 \dots$
$x^3(1-x)^1(1-0.6x)^1(1+0.3x)^2(1-0.5x)^0(1-0.2x)^0 \dots$
$x^3(1-x)^1(1-0.6x)^1(1+0.3x)^2(1-0.5x)^1(1-0.2x)^0 \dots$
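As in the coin-toss case, the click data posterior can be tracked as a vector of exponents, one per linear factor. The sketch below uses only the six factors visible in the example (the elided "…" factors are left out), and the grouping of updates into five steps is read off the exponent changes between successive lines; both are assumptions about the slide, not a statement of the authors' implementation.

```python
# Each factor is (a, b), meaning (a + b * x). Coefficients are the ones
# visible in the example; any further factors are not modeled here.
FACTORS = [(0.0, 1.0), (1.0, -1.0), (1.0, -0.6),
           (1.0, 0.3), (1.0, -0.5), (1.0, -0.2)]


def density(x, exponents):
    """Unnormalized posterior density: product of (a + b*x)^n over factors."""
    value = 1.0
    for (a, b), n in zip(FACTORS, exponents):
        value *= (a + b * x) ** n
    return value


def posterior_mean(exponents, grid_size=2001):
    """Grid approximation of the posterior mean on [0, 1]."""
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [density(x, exponents) for x in xs]
    return sum(x * w for x, w in zip(xs, weights)) / sum(weights)


# Factor indices incremented at each step, inferred from the example.
updates = [(0, 3), (1,), (0, 3), (0, 2), (4,)]
exponents = [0] * len(FACTORS)
trace = []
for step in updates:
    for index in step:
        exponents[index] += 1
    trace.append(list(exponents))

relevance_estimate = posterior_mean(exponents)
```

Because each observation only increments a few integer counters, the update is incremental and the storage per query-document pair stays constant.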
[Figure, repeated over five animation frames: the Click Chain Model graph (R1 … R5, E1 … E5, C1 … C5) annotated with the example values 0 1 0 1 …]
Outline • Background and motivation • Designing a click model • Algorithms • Experiments
Data Set • Collected over 2 weeks in July 2008. • Preprocessing: • Discard no-click sessions for fair comparison. • Remove the 178 most frequent queries. • Split into training/test sets according to time stamps.
Data Set • After preprocessing: • 110,630 distinct queries; • 4.8M/4.0M query sessions in the training/test set.
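The preprocessing steps on the Data Set slides can be sketched as a small filter-and-split function. The `Session` fields, the function name, the order of the two filtering steps, and the single cut-off time are all assumptions made for illustration; the slides only name the steps.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    query: str
    timestamp: float   # hypothetical field: time of the query
    clicks: List[int]  # 0/1 per result position


def preprocess(sessions, num_top_queries_removed, split_time):
    """Drop no-click sessions, drop the most frequent queries,
    then split by time stamp into (train, test)."""
    clicked = [s for s in sessions if any(s.clicks)]
    # Assumption: query frequency is counted after removing no-click sessions.
    counts = Counter(s.query for s in clicked)
    top = {q for q, _ in counts.most_common(num_top_queries_removed)}
    kept = [s for s in clicked if s.query not in top]
    train = [s for s in kept if s.timestamp < split_time]
    test = [s for s in kept if s.timestamp >= split_time]
    return train, test


toy_log = [
    Session("a", 1.0, [1, 0]), Session("a", 2.0, [0, 1]),
    Session("a", 4.0, [1, 1]), Session("b", 1.5, [1, 0]),
    Session("c", 5.0, [0, 1]), Session("d", 2.5, [0, 0]),
]
train, test = preprocess(toy_log, num_top_queries_removed=1, split_time=3.0)
```

Splitting by time rather than at random keeps the test set strictly later than the training set, which mirrors how a deployed model would be used.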
Metric • Efficiency: • Computational time • Effectiveness: • Perplexity • Log-likelihood • Click prediction
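The slides do not spell out the metric formulas. The sketch below uses definitions that are common in the click-model literature and should be read as assumptions: perplexity at one position is 2 raised to the negative mean log2-probability of the observed click outcomes, and log-likelihood is the mean natural log of the probability a model assigns to each observed session. Function names and the clipping constant are illustrative.

```python
import math


def perplexity(clicks, predicted, eps=1e-12):
    """Perplexity at one position over N sessions.

    clicks: observed 0/1 outcomes; predicted: model click probabilities.
    Probabilities are clipped to [eps, 1 - eps] to avoid log(0).
    A perfect predictor scores 1; larger values are worse.
    """
    total = 0.0
    for c, q in zip(clicks, predicted):
        q = min(max(q, eps), 1.0 - eps)
        total += c * math.log2(q) + (1 - c) * math.log2(1.0 - q)
    return 2.0 ** (-total / len(clicks))


def mean_log_likelihood(session_probabilities):
    """Average log-probability assigned to the observed click vectors.

    Values are at most 0; closer to 0 is better.
    """
    logs = [math.log(p) for p in session_probabilities]
    return sum(logs) / len(logs)


coin_flip = perplexity([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
sharp = perplexity([1, 0, 1, 0], [0.9, 0.1, 0.9, 0.1])
llh = mean_log_likelihood([0.5, 0.25])
```

A model that predicts 0.5 everywhere has perplexity 2, so values between 1 and 2 indicate predictions that are more informative than a coin flip.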
Competitors • UBM: User Browsing Model (Dupret et al., SIGIR’08) • DCM: Dependent Click Model (WSDM’09)
Results - Time Environment: Unix Server, 2.8GHz cores, MATLAB R2008b.
Results – Perplexity [Chart: perplexity comparison; axis annotated "Worse" / "Better"]
Results – Log Likelihood [Chart: log-likelihood comparison; axis annotated "Better" / "Worse"]