
Personalizing Search


Presentation Transcript


  1. Personalizing Search Jaime Teevan, MIT Susan T. Dumais, MSR and Eric Horvitz, MSR

  2. Query: “pia workshop” [screenshot: a search-results page with the relevant result highlighted]

  3. Outline • Approaches to personalization • The PS algorithm • Evaluation • Results • Future work

  4. Approaches to Personalization • Content of user profile • Long-term interests • Liu, et al. [14], Compass Filter [13] • Short-term interests • Query refinement [2,12,15], Watson [4] • How user profile is developed • Explicit • Relevance feedback [19], query refinement [2,12,15] • Implicit • Query history [20, 22], browsing history [16, 23] → very rich user profile

  5. PS [architecture diagram: the user’s query is issued to the search engine]

  6. PS [diagram: adds the user profile, shown as lists of terms such as “forest hiking walking gorp”, “dog cat monkey banana food”, “baby infant child boy girl”, “csail mit artificial research robot”, and “web search retrieval ir hunt”]

  7. PS [diagram: results on the search results page are scored against the profile terms “web search retrieval ir hunt”; example scores 6.0, 1.6, 0.2, 2.7, 0.2, 1.3]

  8. Calculating a Document’s Score • Based on standard tf.idf: Score = Σ_i tf_i · w_i [figure: a result scored against the profile terms “web search retrieval ir hunt”, giving 1.3]

  9. Calculating a Document’s Score • Based on standard tf.idf: Score = Σ_i tf_i · w_i [figure: example per-term contributions 0.1, 0.05, 0.5, 0.35, 0.3 summing to a document score of 1.3]

  10. Calculating a Document’s Score • Based on standard tf.idf: Score = Σ_i tf_i · w_i, with term weight w_i = log(N / n_i) taken from corpus (“World”) document counts [figure: example per-term frequencies and weights]
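
A minimal sketch of this scoring step in Python (my own illustration, not the authors' code): the weight w_i = log(N / n_i) comes from corpus document counts, and a document's score sums tf_i · w_i over the terms it contains. The term names and counts below are invented.

```python
import math

def idf_weight(N: int, n_i: int) -> float:
    """Corpus-based term weight: w_i = log(N / n_i)."""
    return math.log(N / n_i)

def score(doc_tf: dict[str, float], corpus_df: dict[str, int], N: int) -> float:
    """Score = sum of tf_i * w_i over the terms in the document."""
    total = 0.0
    for term, tf in doc_tf.items():
        n_i = corpus_df.get(term)
        if n_i:  # skip terms the corpus has never seen
            total += tf * idf_weight(N, n_i)
    return total

# Hypothetical counts, loosely echoing the slide's example terms.
corpus_df = {"web": 9_000, "search": 4_000, "retrieval": 500, "ir": 300, "hunt": 800}
doc_tf = {"web": 2, "search": 3, "retrieval": 1, "ir": 1}
print(score(doc_tf, corpus_df, N=10_000))
```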

  11. Calculating a Document’s Score • Based on standard tf.idf: Score = Σ_i tf_i · w_i
      • Relevance-feedback term weight†: w_i = log[ (r_i + 0.5)(N - n_i - R + r_i + 0.5) / ((n_i - r_i + 0.5)(R - r_i + 0.5)) ]
      • With the user’s documents folded into the corpus counts: w_i' = log[ (r_i + 0.5)(N' - n_i' - R + r_i + 0.5) / ((n_i' - r_i + 0.5)(R - r_i + 0.5)) ], where N' = N + R and n_i' = n_i + r_i
      • N, n_i are corpus (“World”) counts; R, r_i are counts over the user’s (“Client”) documents
      † From Sparck Jones, Walker and Robertson, 1998 [21].
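
A sketch of the corrected relevance-feedback weight as written above, assuming N and n_i are corpus (“World”) document counts and R and r_i are counts over the user's personal (“Client”) documents, with the N' = N + R, n_i' = n_i + r_i substitution applied. The example numbers are invented.

```python
import math

def rf_weight(N: int, n_i: int, R: int, r_i: int) -> float:
    """Relevance-feedback term weight (Sparck Jones, Walker and Robertson, 1998 [21]),
    with the corpus counts corrected to include the user's documents:
    N' = N + R, n_i' = n_i + r_i."""
    N_p = N + R        # N'
    n_p = n_i + r_i    # n_i'
    num = (r_i + 0.5) * (N_p - n_p - R + r_i + 0.5)
    den = (n_p - r_i + 0.5) * (R - r_i + 0.5)
    return math.log(num / den)

# Hypothetical counts: a term that is common in the user's documents but rare
# on the Web gets a large positive weight.
print(rf_weight(N=1_000_000, n_i=500, R=2_000, r_i=400))
```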

  12. Finding the Parameter Values • Corpus representation (N, n_i) • How common is the term in general? • Web vs. result set • User representation (R, r_i) • How well does it represent the user’s interest? • All vs. recent vs. Web vs. queries vs. none • Document representation • What terms to sum over? • Full document vs. snippet
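
To make the search over settings concrete, a small sketch that enumerates just the representation choices named on this slide (the full study covered 67 combinations, more than the 2 × 5 × 2 grid shown here; the option names and the evaluate_ranking hook are hypothetical).

```python
from itertools import product

corpus_options   = ["web", "result_set"]                        # source of N, n_i
user_options     = ["all", "recent", "web", "queries", "none"]  # source of R, r_i
document_options = ["full_document", "snippet"]                 # terms to sum over

for corpus, user, document in product(corpus_options, user_options, document_options):
    config = {"corpus": corpus, "user": user, "document": document}
    print(config)  # evaluate_ranking(config) would score each query under this setting
```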

  13. Building a Test Bed • 15 evaluators x ~10 queries • 131 queries total • Personally meaningful queries • Selected from a list • Queries issued earlier (kept diary) • Evaluate 50 results for each query • Highly relevant / relevant / irrelevant • Index of personal information

  14. Evaluating Personalized Search • Measure algorithm quality with DCG: DCG(i) = Gain(i) if i = 1, and DCG(i) = DCG(i-1) + Gain(i)/log(i) otherwise • Look at one parameter at a time • 67 different parameter combinations! • Hold other parameters constant and vary one • Look at best parameter combination • Compare with various baselines
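
A direct transcription of that recurrence, as a sketch: the slide does not fix the log base or the gain values, so base 2 and gains of 2 / 1 / 0 for highly relevant / relevant / irrelevant results are assumptions here.

```python
import math

def dcg(gains: list[float], base: float = 2.0) -> float:
    """DCG(i) = Gain(i) if i = 1, else DCG(i-1) + Gain(i)/log(i).
    The slide leaves the log base unspecified; base 2 is a common choice."""
    total = 0.0
    for i, gain in enumerate(gains, start=1):
        total += gain if i == 1 else gain / math.log(i, base)
    return total

# Assumed gain values for the three judgment levels used in the test bed.
gain_of = {"highly relevant": 2.0, "relevant": 1.0, "irrelevant": 0.0}
judged = ["highly relevant", "irrelevant", "relevant", "relevant", "irrelevant"]
print(dcg([gain_of[j] for j in judged]))
```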

  15. Analysis of Parameters [chart: effect of the user representation]

  16. Analysis of Parameters [charts: effect of the corpus, user, and document representations]

  17. PS Improves Text Retrieval [chart: DCG of 0.37 with no model, 0.41 with relevance feedback, and 0.46 with personalized search]

  18. Text Features Not Enough [chart: the default Web ranking scores 0.56, above all of the text-only rankings (0.46, 0.41, 0.37)]

  19. Take Advantage of Web Ranking [chart: combining PS with the Web ranking (PS+Web) scores 0.58, above the Web ranking alone (0.56) and the text-only rankings (0.46, 0.41, 0.37)]
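
The slides do not say how the text score and the Web ranking are merged; one simple blend, shown purely as an illustration rather than the authors' method, mixes the normalized personalized score with the reciprocal of the original Web rank. The alpha weight could also back the Web-vs-personalized slider mentioned on slide 22; all scores and ranks below are invented.

```python
def combined_score(ps_score: float, web_rank: int,
                   alpha: float = 0.5, max_ps_score: float = 1.0) -> float:
    """Blend a personalized text score with the engine's original ranking.
    web_rank is 1-based; reciprocal rank rewards documents the engine already
    ranked highly. alpha = 1.0 ignores the Web ranking, alpha = 0.0 ignores PS."""
    ps_component = ps_score / max_ps_score if max_ps_score else 0.0
    web_component = 1.0 / web_rank
    return alpha * ps_component + (1.0 - alpha) * web_component

# Re-rank a result set given (doc, personalized score, original Web rank).
results = [("doc_a", 6.0, 3), ("doc_b", 1.6, 1), ("doc_c", 2.7, 2)]
max_ps = max(s for _, s, _ in results)
reranked = sorted(results, reverse=True,
                  key=lambda r: combined_score(r[1], r[2], max_ps_score=max_ps))
print([doc for doc, _, _ in reranked])
```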

  20. Summary • Personalization of Web search • Result re-ranking • User’s documents as relevance feedback • Rich representations important • Rich user profile particularly important • Efficiency hacks possible • Need to incorporate features beyond text

  21. Further Exploration • Improved non-text components • Usage data • Personalized PageRank • Learn parameters • Based on individual • Based on query • Based on results • UIs for user control

  22. User Interface Issues • Make personalization transparent • Give user control over personalization • Slider between Web and personalized results • Allows for background computation • Exacerbates problem with re-finding • Results change as user model changes • Thesis research – Re:Search Engine

  23. Thank you! teevan@csail.mit.edu sdumais@microsoft.com horvitz@microsoft.com

  24. Much Room for Improvement • Group ranking • Best improves on Web by 23% • More people → less improvement • Personal ranking • Best improves on Web by 38% • Remains constant [chart label: Potential for Personalization]
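
One way to read the group-vs-personal comparison, sketched below under invented judgments (my own construction, not the authors' evaluation code): the best single ranking for a group of users is compared against each person's individually best ranking, and the gap in average DCG is the potential for personalization.

```python
import math

def dcg(gains):
    """DCG(i) = Gain(i) if i = 1, else DCG(i-1) + Gain(i)/log2(i)."""
    return sum(g if i == 1 else g / math.log2(i) for i, g in enumerate(gains, 1))

# Assumed per-user gains for five documents on one query (2 / 1 / 0 judgments).
judgments = {
    "user1": {"d1": 2, "d2": 0, "d3": 1, "d4": 0, "d5": 1},
    "user2": {"d1": 0, "d2": 2, "d3": 1, "d4": 1, "d5": 0},
    "user3": {"d1": 1, "d2": 0, "d3": 0, "d4": 2, "d5": 0},
}
docs = ["d1", "d2", "d3", "d4", "d5"]

# Best single ranking for the group: order documents by total gain across users.
group_order = sorted(docs, key=lambda d: -sum(j[d] for j in judgments.values()))
group_dcg = sum(dcg([j[d] for d in group_order]) for j in judgments.values()) / len(judgments)

# Best personal ranking: order documents separately for each user.
personal_dcg = sum(dcg(sorted(j.values(), reverse=True)) for j in judgments.values()) / len(judgments)

print(f"group-optimal DCG: {group_dcg:.2f}, personal-optimal DCG: {personal_dcg:.2f}")
```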

  25. Evaluating Personalized Search • Query selection • Chose from 10 pre-selected queries (cancer, Microsoft, traffic, Las Vegas, rice, McDonalds, bichon frise, Red Sox, airlines, Mary Joe) or used a previously issued query • Total: 137 queries, 53 pre-selected (2-9 per query)

  26. Making PS Practical • Learn most about personalization by deploying a system • Best algorithm reasonably efficient • Merging server and client • Query expansion • Get more relevant results in the set to be re-ranked • Design snippets for personalization
