Click Chain Model in Web Search

Click Chain Model in Web Search. Fan Guo, Carnegie Mellon University. Joint work with Chao Liu (MSR, ISRC-Redmond), Anitha Kannan (MSR, Search Lab), Tom Minka (MSR, Cambridge), Mike Taylor (MSR, Cambridge), Yi-Min Wang (MSR, ISRC-Redmond), and Christos Faloutsos (Carnegie Mellon University).

Presentation Transcript


  1. Click Chain Model in Web Search. Fan Guo, Carnegie Mellon University. WWW'09, Madrid, Spain

  2. Joint Work With… • Chao Liu (MSR, ISRC-Redmond) • Anitha Kannan (MSR, Search Lab) • Tom Minka (MSR, Cambridge) • Mike Taylor (MSR, Cambridge) • Yi-Min Wang (MSR, ISRC-Redmond) • Christos Faloutsos (Carnegie Mellon University)

  4. Click Logs • Auto-generated data keeping important information about search activity.

  5. Problem Definition • Given a click log data set, for each query-document pair, compute user-perceived relevance. (Figure labels: Impression Data, Click Data)

  6. Relevance Representation (Figure labels: Human Judge; Previous Click Models; Click Chain Model; Integration; values shown: 0, 0.75, 1)

  7. Applications • Automated Ranking Alterations • Search Engine Performance Metric • Calibrate Human Judgment • Related Application in Sponsored Search

  8. Roadmap • Motivation and Problem Definition • Click Model Basics • CCM and Algorithms • Experimental Evaluation • Related Work and Conclusion

  10. Eye-Tracking User Study (Figure: fixation heat map)

  11. Overall: fixation is biased toward higher ranks, and so are clicks. • For each position: fixation and clicks are context dependent. (Figure panels: Normal Impression, Reversed Impression)

  12. Problem Definition (Recap) • Given a click log data set, for each query-document pair, compute user-perceived relevance. The solution should be: • Aware of position bias and context dependency • Scalable to terabyte-scale data • Incremental, to stay updated

  13. Examination Hypothesis • User behavior abstraction: fixation → binary examination variable; click → binary click variable. • A document must be examined before being clicked.

  14. Examination Hypothesis • For each position: P(Click=1) = P(Examination=1) × Relevance, where Relevance = P(Click=1 | Examination=1). • The position bias is reflected in the derivation of P(Examination).
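As a minimal sketch of the relation on this slide, the snippet below computes a click probability from an examination probability and a relevance value, then inverts the relation to recover relevance. The function names and every number are hypothetical and chosen only for illustration.

```python
def click_probability(p_examination: float, relevance: float) -> float:
    """P(Click=1) = P(Examination=1) * Relevance."""
    return p_examination * relevance


def relevance_from_ctr(click_through_rate: float, p_examination: float) -> float:
    """Invert the relation: Relevance = P(Click=1) / P(Examination=1)."""
    if p_examination <= 0.0:
        raise ValueError("relevance is undefined when the position is never examined")
    return click_through_rate / p_examination


# Hypothetical numbers: the same document (relevance 0.6) shown at two positions.
p_exam_top, p_exam_low = 0.9, 0.3
ctr_top = click_probability(p_exam_top, 0.6)
ctr_low = click_probability(p_exam_low, 0.6)

# Raw click-through rates differ by position, but dividing out the
# examination probability recovers the same relevance in both cases.
recovered_top = relevance_from_ctr(ctr_top, p_exam_top)
recovered_low = relevance_from_ctr(ctr_low, p_exam_low)
```

The point of the example is that a lower click-through rate at a lower position does not by itself imply lower relevance once examination is accounted for.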

  15. Cascade Hypothesis • The user scans through documents and makes decisions in strict linear order. • The decision process: E1, C1, E2, C2, … • The essential part of a click model: what is the probability of “See Next Doc”?

  16. Roadmap • Motivation and Problem Definition • Click Model Basics • CCM and Algorithms • Experimental Evaluation • Related Work and Conclusion

  17. The Context • Top-10 organic search results only. • Query sessions are independent. • Semantic information is not used. (Figure labels: Suggestions, Ads, Other Elements)

  18. User Behavior Description (Flowchart: Examine the Document → Click? After either answer, No or Yes, the user decides See Next Doc?; Yes moves on to examine the next document, No ends the session with Done.)
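The flow on this slide can be sketched as a chain of examination probabilities. This is a simplified illustration, not the Click Chain Model itself: it assumes the first result is always examined and uses two constant, hypothetical "See Next Doc" probabilities, one after a skip and one after a click, whereas the deck later notes that document relevance also influences that decision. All names and numbers are assumptions.

```python
from typing import List


def examination_probabilities(
    relevance: List[float],
    continue_after_skip: float,
    continue_after_click: float,
) -> List[float]:
    """Return P(E_i = 1) for each position, scanning top to bottom."""
    p_exam = 1.0  # assumption: the first result is always examined
    result = []
    for r in relevance:
        result.append(p_exam)
        # Reaching the next document requires examining this one, then
        # choosing "See Next Doc" on whichever branch (skip or click) occurred.
        p_continue = (1.0 - r) * continue_after_skip + r * continue_after_click
        p_exam *= p_continue
    return result


relevance = [0.5, 0.2, 0.4]  # hypothetical P(Click=1 | Examination=1) per position
p_exam = examination_probabilities(
    relevance, continue_after_skip=0.8, continue_after_click=0.5
)
# Examination hypothesis: P(Click=1) = P(Examination=1) * Relevance.
p_click = [e * r for e, r in zip(p_exam, relevance)]
```

Setting `continue_after_skip=1.0` and `continue_after_click=0.0` in this sketch gives a user who always reads on after a skip and stops at the first click.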

  19. Click Chain Model (Figure: graphical model over relevance variables R1 … R5, examination variables E1 … E5, and click variables C1 … C5, continuing to later positions)

  20. Why Bayesian? • Modeling benefit: a principled way of smoothing the relevance estimates; offers more flexibility, such as computing P(Ri>Rj). • Computational benefit: avoids the iterative optimization procedure of maximum-likelihood estimation.
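To illustrate the flexibility mentioned on this slide, the sketch below approximates P(Ri>Rj) by numerical integration over a grid on [0, 1], given two unnormalized posterior densities. The two densities used here are hypothetical stand-ins, not posteriors derived from the Click Chain Model, and the two relevance values are assumed independent.

```python
from typing import Callable, List


def normalized_grid(density: Callable[[float], float], n: int = 2000) -> List[float]:
    """Evaluate an unnormalized density at n midpoints of [0, 1]; normalize to sum 1."""
    values = [density((k + 0.5) / n) for k in range(n)]
    total = sum(values)
    return [v / total for v in values]


def prob_greater(
    density_i: Callable[[float], float],
    density_j: Callable[[float], float],
    n: int = 2000,
) -> float:
    """Approximate P(R_i > R_j) for independent R_i and R_j on [0, 1]."""
    w_i = normalized_grid(density_i, n)
    w_j = normalized_grid(density_j, n)
    prob, cdf_j = 0.0, 0.0
    for a, b in zip(w_i, w_j):
        # Mass of R_j below this grid cell, plus half of the shared cell.
        prob += a * (cdf_j + 0.5 * b)
        cdf_j += b
    return prob


def posterior_i(r: float) -> float:
    return r ** 8 * (1.0 - r) ** 2  # hypothetical: mostly clicked when examined


def posterior_j(r: float) -> float:
    return r ** 3 * (1.0 - r) ** 7  # hypothetical: mostly skipped when examined


p_i_beats_j = prob_greater(posterior_i, posterior_j)
```

A point estimate alone would only say which document has the higher mean; working with full posteriors also says how confident that ordering is.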

  21. Relevance Inference • Given a query and all its click data, compute the posterior for each possible j. • Then focus on the click probability for a particular session, and look at the different cases.

  22. Click Chain Model (Figure: the graphical model over R1 … R5, E1 … E5, and C1 … C5, annotated with the labels Cascade Hypothesis and Examination Hypothesis)

  23–27. (Figure, repeated on five consecutive slides: the graphical model over R1 … R5, E1 … E5, and C1 … C5, with the values 0, 1, 0, 1 marked on the diagram)

  28. Putting them together

  29. Summary of the Algorithm • Initialize (2*10+2) counts for each query-document pair; • Go through the click log once and update the counts; • Compute the parameter values and get the β values; • Output the results (using numerical integration if necessary).
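The single-pass, count-based shape of these steps can be sketched as follows. This is deliberately not the (2*10+2)-count update of the Click Chain Model: it is a simplified stand-in that keeps only two counts per query-document pair (clicks and skips at or above the last click) and assumes a posterior proportional to R^clicks × (1-R)^skips. The session format and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Session = Tuple[str, List[str], List[int]]  # (query, ranked doc ids, 0/1 clicks)


def update_counts(counts: Dict[Tuple[str, str], List[int]], session: Session) -> None:
    """Update [clicks, skips] for every document at or above the last click."""
    query, docs, clicks = session
    if 1 not in clicks:
        return  # simplification: sessions without clicks are ignored
    last_click = max(i for i, c in enumerate(clicks) if c == 1)
    for doc, c in zip(docs[: last_click + 1], clicks):
        counts[(query, doc)][0 if c == 1 else 1] += 1


def posterior_mean(n_click: int, n_skip: int, grid: int = 10000) -> float:
    """Mean of the density proportional to R^n_click * (1-R)^n_skip (midpoint rule)."""
    xs = [(k + 0.5) / grid for k in range(grid)]
    ws = [x ** n_click * (1.0 - x) ** n_skip for x in xs]
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)


log = [
    ("q", ["a", "b", "c"], [0, 1, 0]),
    ("q", ["a", "b", "c"], [1, 0, 0]),
    ("q", ["b", "a", "c"], [1, 1, 0]),
]
counts: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
for session in log:  # one pass; new sessions can be folded in the same way later
    update_counts(counts, session)

mean_a = posterior_mean(*counts[("q", "a")])
```

Because the state is just a table of counts, processing cost is linear in the log size and new sessions update the estimates without revisiting old data.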

  30. Sanity Check • The algorithm should be: • Aware of position bias and context dependency • Scalable to terabyte-scale data: single pass, linear • Incremental, to stay updated: update counts

  31. Roadmap • Motivation and Problem Definition • Click Model Basics • CCM and Algorithms • Experimental Evaluation • Related Work and Conclusion

  32. Data Set • Collected over 2 weeks in July 2008. • Preprocessing: • Discard no-click sessions for fair comparison. • Remove the 178 most frequent queries. • Split into training/test sets according to time stamps.

  33. Data Set • After preprocessing: • 110,630 distinct queries; • 4.8M/4.0M query sessions in the training/test set.

  34. Metric • Efficiency: computational time. • Effectiveness: click prediction on the test set, with document identities known, using the relevance and parameters learned on the training set (an indirect measure).

  35. Competitors • UBM: User Browsing Model (Dupret et al., SIGIR’08): more parameters; iterative, more expensive algorithm • DCM: Dependent Click Model (WSDM’09): models 1+ clicks per session

  36. Results - Time • Environment: Unix Server, 2.8GHz cores, MATLAB R2008b.

  37. Results – Perplexity • Perplexity: quality of click prediction for each position individually. • Random Guess (pH=0.5): 2.00 • Best Guess (pH=0.8): 1.65 • Ground Truth (Cheating): 1.00
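The three reference values on this slide can be reproduced with a short sketch, assuming perplexity is defined in the usual way as 2 raised to the average binary cross-entropy (in bits) between observed outcomes and predicted probabilities. The function name and the ten-outcome sample are hypothetical.

```python
import math
from typing import Sequence


def perplexity(clicks: Sequence[int], predicted: Sequence[float]) -> float:
    """2 ** (average binary cross-entropy in bits) over observed 0/1 outcomes."""
    total_bits = 0.0
    for c, q in zip(clicks, predicted):
        total_bits -= math.log2(q) if c == 1 else math.log2(1.0 - q)
    return 2.0 ** (total_bits / len(clicks))


# A coin that lands heads 80% of the time (pH = 0.8), as 10 observed outcomes.
outcomes = [1] * 8 + [0] * 2
random_guess = perplexity(outcomes, [0.5] * 10)                 # always predict 0.5
best_guess = perplexity(outcomes, [0.8] * 10)                   # predict the true rate
ground_truth = perplexity(outcomes, [float(c) for c in outcomes])  # peek at the outcomes
```

Under this definition lower is better, and 1.00 is reachable only by a predictor that already knows every outcome.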

  38. Results – Perplexity (Figure: perplexity plot, annotated with "Worse" and "Better" directions)

  39. Results – Perplexity • Average Perplexity over top 10 positions.

  40. Results – Log Likelihood • Log-likelihood: the log of the probability of recovering the entire click vector out of 2^10 possibilities.
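As a sketch of what scoring an entire click vector involves, the snippet below computes the log-probability of a 10-position click vector under a simplified chain model. It is not the Click Chain Model: it assumes the first result is always examined and uses two constant, hypothetical continuation probabilities. All names and numbers are assumptions; the check at the end confirms that the probabilities of all 2^10 vectors sum to 1.

```python
import math
from itertools import product
from typing import Sequence


def session_log_likelihood(
    clicks: Sequence[int],
    relevance: Sequence[float],
    continue_after_skip: float,
    continue_after_click: float,
) -> float:
    """Log-probability of a full 0/1 click vector under a simple chain model."""
    log_lik = 0.0
    p_exam = 1.0  # P(E_i = 1 | clicks above i); assumes the first result is examined
    for c, r in zip(clicks, relevance):
        p_click = p_exam * r
        if c == 1:
            if p_click == 0.0:
                return float("-inf")
            log_lik += math.log(p_click)
            p_exam = continue_after_click  # a click implies the document was examined
        else:
            p_no_click = 1.0 - p_click
            if p_no_click == 0.0:
                return float("-inf")
            log_lik += math.log(p_no_click)
            # Probability the document was examined but skipped, then "See Next Doc".
            p_exam = p_exam * (1.0 - r) / p_no_click * continue_after_skip
    return log_lik


relevance = [0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]  # hypothetical
clicks = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
model_ll = session_log_likelihood(clicks, relevance, 0.8, 0.5)
uniform_ll = math.log(1.0 / 2 ** 10)  # guessing uniformly among 2^10 click vectors

# Sanity check: the probabilities of all 2^10 click vectors sum to 1.
total = sum(
    math.exp(session_log_likelihood(v, relevance, 0.8, 0.5))
    for v in product((0, 1), repeat=len(relevance))
)
```

Unlike per-position perplexity, this score rewards a model only when it gets the joint pattern of clicks and skips right.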

  41. Results – Log Likelihood (Figure: log-likelihood plot, annotated with "Better" and "Worse" directions)

  42. Roadmap • Motivation and Problem Definition • Click Model Basics • CCM and Algorithms • Experimental Evaluation • Related Work and Conclusion

  43. Related Work • User behavior studies and hypotheses: Eye-tracking Study (Joachims et al., KDD’05, ACM TOIS); Examination Hypothesis (Richardson et al., WWW’07); Cascade Hypothesis (Craswell et al., WSDM’08) • Other click models: Logistic Regression (Dupret et al., SIGIR’08); Dynamic Bayesian Network (Chapelle et al., WWW’09); Bayesian Browsing Model (KDD’09, to appear)

  44. Conclusion • Click Chain Model: a probabilistic approach to interpreting clicks; a Bayesian approach to modeling relevance; both scalable and incremental. • Future Directions: validation/bucket test; pairwise comparison; more on context dependency.

  45. Thank you :-)

  46. Abstract/Document Relevance • Relevance of Abstract: the conditional probability of click, as defined by the examination hypothesis • Relevance of Document: determines the probability of “See Next Doc”; a binary random variable (integrated out under CCM)

  47. Alt. User Behavior Description (Flowchart: Examine the Document, followed by the decision nodes Click? and Relevant?, with each outcome leading to its own See Next Doc? decision)

  48. Results – Perplexity (by Freq) (Figure: perplexity plot, annotated with "Worse" and "Better" directions)

  49. Examination/Click Distribution
