Model-based Feedback in the Language Modeling Approach to Information Retrieval


  1. Model-based Feedback in the Language Modeling Approach to Information Retrieval • Chengxiang Zhai and John Lafferty • School of Computer Science, Carnegie Mellon University

  2. Outline • The Language Modeling Approach to IR • Feedback: Expansion-based vs. Model-based • Two Model-based feedback algorithms • Evaluation • Conclusions & Future Work

  3. Text Retrieval (TR) • Given a query, find relevant documents in a document collection (ranking documents) • Many applications (Web pages, news, email, …) • Many models developed (vector space, probabilistic) • The “language modeling approach” is a new model that is promising …

  4. Retrieval as Language Model Estimation • Document ranking based on query likelihood (Ponte & Croft 98, Miller et al. 99, Berger & Lafferty 99, Hiemstra 2000, etc.) • Retrieval problem ⇒ estimation of the document language model p(w_i|d) • Many advantages: good statistical foundation, reuse of existing LM methods ... • But feedback is awkward …
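
As a minimal sketch of query-likelihood ranking, the snippet below scores each document by log p(q|d) under a unigram document model. The smoothing scheme (linear interpolation with the collection model), the `smooth` weight, and the toy documents are assumptions made for illustration; the slide does not specify them. Query terms are assumed to occur somewhere in the collection.

```python
import math
from collections import Counter

def query_log_likelihood(query_terms, doc_terms, coll_counts, coll_len, smooth=0.5):
    """log p(q|d) under a unigram document model smoothed with the collection.

    `smooth` is a hypothetical interpolation weight, not a value from the talk.
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for w in query_terms:
        p_ml = doc_counts[w] / doc_len if doc_len else 0.0   # maximum likelihood estimate
        p_coll = coll_counts[w] / coll_len                    # collection (background) estimate
        score += math.log((1 - smooth) * p_ml + smooth * p_coll)
    return score

# Toy collection (illustrative only).
docs = {
    "d1": "airport security screening at the airport".split(),
    "d2": "the security of the network".split(),
}
coll_counts = Counter(w for terms in docs.values() for w in terms)
coll_len = sum(coll_counts.values())

query = "airport security".split()
ranking = sorted(
    docs,
    key=lambda d: query_log_likelihood(query, docs[d], coll_counts, coll_len),
    reverse=True,
)
```

Here `ranking` lists document ids from most to least likely to have generated the query.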

  5. Feedback in Text Retrieval • Learning from examples • In effect, new, related terms are extracted to enhance the original query • Generally leads to performance increase (both average precision and recall)

  6. Relevance Feedback [diagram] Query → Retrieval Engine (over the document collection) → Results: d1 3.5, d2 2.4, …, dk 0.5, ... → User → Judgments: d1 +, d2 -, d3 +, …, dk -, ... → Feedback → Updated query → Retrieval Engine

  7. Pseudo/Blind/Automatic Feedback [diagram] Query → Retrieval Engine (over the document collection) → Results: d1 3.5, d2 2.4, …, dk 0.5, ... → Judgments (top 10 assumed relevant): d1 +, d2 +, d3 +, …, dk -, ... → Feedback → Updated query → Retrieval Engine
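
The selection step of pseudo feedback can be sketched as follows: with no user judgments available, the top-ranked documents are simply treated as relevant. The function name and the cutoff used in the example are hypothetical; the slide uses the top 10.

```python
def pseudo_feedback_docs(scores, k=10):
    """Return the ids of the k highest-scoring documents, treated as relevant."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

# Scores for d1, d2 and dk follow the slide; the score for d3 is made up for illustration.
scores = {"d1": 3.5, "d2": 2.4, "d3": 1.9, "dk": 0.5}
feedback_ids = pseudo_feedback_docs(scores, k=3)
```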

  8. Feedback in the Language Modeling Approach • Mostly expansion-based: adding new terms to the query (Ponte 1998, Miller et al. 1999, Ng 1999) • Query term reweighting, no expansion (Hiemstra 2001) • Implicit feedback (Berger & Lafferty 99) • Conceptual inconsistency in expansion-based approaches • Original query: as text • Expanded query: as text + {terms}

  9. Question: How to exploit language modeling to perform natural and effective feedback? • Answer: Introduce a query model and treat feedback as query model updating • Retrieval function: Query-likelihood ⇒ KL-divergence • Feedback: Expansion-based ⇒ Model-based

  10. A KL-Divergence Unigram Retrieval Model • A special case of the general risk minimization retrieval framework (Lafferty & Zhai 2001) • Retrieval formula: rank documents by −D(θ_Q || θ_D); the query entropy term is ignored for ranking • Retrieval ⇒ estimation of θ_Q and θ_D • Special case: θ_Q = empirical distribution of q recovers “query-likelihood”
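
The slide's formula image did not survive, but the standard expansion of KL divergence gives −D(θ_Q || θ_D) = Σ_w p(w|θ_Q) log p(w|θ_D) + H(θ_Q), where the entropy H(θ_Q) is the same for every document. The sketch below scores with the first term only and checks the special case from the last bullet: with θ_Q set to the empirical query distribution, the score equals the query log-likelihood divided by the query length. The document model values are invented and assumed to be already smoothed.

```python
import math
from collections import Counter

def neg_kl_score(query_model, doc_model):
    """Rank-equivalent part of -D(theta_Q || theta_D): sum_w p(w|Q) * log p(w|D)."""
    return sum(p * math.log(doc_model[w]) for w, p in query_model.items() if p > 0)

def empirical_model(terms):
    """Maximum likelihood (relative frequency) distribution over the given terms."""
    counts = Counter(terms)
    n = len(terms)
    return {w: c / n for w, c in counts.items()}

query = ["airport", "security"]
theta_q = empirical_model(query)

# Hypothetical, already-smoothed document model (values are illustrative).
theta_d = {"airport": 0.3, "security": 0.2, "the": 0.5}

kl_score = neg_kl_score(theta_q, theta_d)
query_loglik = sum(math.log(theta_d[w]) for w in query)
```

Because dividing by the query length does not change the ordering of documents, both scores produce the same ranking.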

  11. Expansion-based vs. Model-based [diagram] • Expansion-based feedback: feedback docs modify the query Q; document D → doc model; scoring by query likelihood → results • Model-based feedback: query Q → query model; feedback docs modify the query model; document D → doc model; scoring by KL-divergence → results

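  12. Feedback as Model Interpolation [diagram] • Document D → θ_D (ML + smoothing); query Q → θ_Q (ML); θ_Q is scored against θ_D → results • Feedback docs F={d1, d2, …, dn} → θ_F, estimated by a generative model or by divergence minimization • Updated query model: θ_Q' = (1−α) θ_Q + α θ_F • α=0: no feedback; α=1: full feedback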
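
A minimal sketch of the interpolation step, assuming both models are stored as word-to-probability dictionaries. The function name and the example probabilities are hypothetical.

```python
def interpolate(query_model, feedback_model, alpha):
    """Return (1 - alpha) * theta_Q + alpha * theta_F over the union vocabulary."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    vocab = set(query_model) | set(feedback_model)
    return {
        w: (1 - alpha) * query_model.get(w, 0.0) + alpha * feedback_model.get(w, 0.0)
        for w in vocab
    }

# Illustrative models (values are made up).
theta_q = {"airport": 0.5, "security": 0.5}
theta_f = {"airport": 0.4, "security": 0.3, "screening": 0.2, "passenger": 0.1}

updated = interpolate(theta_q, theta_f, alpha=0.5)
no_feedback = interpolate(theta_q, theta_f, alpha=0.0)
full_feedback = interpolate(theta_q, theta_f, alpha=1.0)
```

With `alpha=0.0` the original query model is recovered (feedback-only words get probability zero), and with `alpha=1.0` the result is the feedback model alone.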

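  13. θ_F Estimation Method I: Generative Mixture Model [diagram] • Each word w in F={d1,…,dn} is generated from one of two sources, chosen with probability P(source) • Background words: w drawn from P(w|C), with probability λ • Topic words: w drawn from P(w|θ), with probability 1−λ • Maximum likelihood: use EM to find θ_F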
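
The EM procedure for this two-component mixture can be sketched as below. This is an illustrative implementation of the textbook updates for a mixture with a fixed background component and a fixed λ, not necessarily the exact code used in the experiments. The initialisation, the iteration count, and the toy data are assumptions, and every feedback word is assumed to have nonzero collection probability.

```python
from collections import Counter

def estimate_feedback_model(feedback_docs, coll_model, lam=0.5, iters=50):
    """Fit theta_F by EM with mixing weight lam fixed on the collection model."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    counts = Counter(w for doc in feedback_docs for w in doc)
    total = sum(counts.values())
    theta = {w: c / total for w, c in counts.items()}  # start from the ML estimate
    for _ in range(iters):
        # E-step: probability that an occurrence of w came from the topic model.
        topic_post = {}
        for w in counts:
            topic = (1 - lam) * theta[w]
            topic_post[w] = topic / (topic + lam * coll_model[w])
        # M-step: re-estimate theta from the expected topic counts.
        norm = sum(counts[w] * topic_post[w] for w in counts)
        theta = {w: counts[w] * topic_post[w] / norm for w in counts}
    return theta

# Toy feedback set and collection model (illustrative values only).
feedback_docs = [
    ["airport", "security", "the", "the"],
    ["security", "screening", "the", "of"],
]
coll_model = {"the": 0.5, "of": 0.3, "airport": 0.05, "security": 0.1, "screening": 0.05}

theta_f = estimate_feedback_model(feedback_docs, coll_model, lam=0.5)
```

In this toy run the common words "the" and "of" lose probability mass relative to their raw frequencies, because the background component explains them, while "security" ends up as the most probable topic word.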

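  14. θ_F Estimation Method II: Empirical Divergence Minimization [diagram] • θ_F should be close to the feedback documents F={d1,…,dn} and far (weight λ) from the collection C • Objective: empirical divergence, minimized over θ (divergence minimization) • Given F, C, and λ, the solution θ_F is available in closed form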
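
The slide's objective and solution were images and are not preserved. One way to formalise "close to the feedback documents, far from the collection" is to minimise (1/n) Σ_i D(θ || θ_di) − λ D(θ || p(·|C)). Under that assumed objective a Lagrangian argument gives p(w|θ_F) ∝ exp( [ (1/n) Σ_i log p(w|θ_di) − λ log p(w|C) ] / (1−λ) ). The sketch below implements this form, assuming smoothed (nonzero) document models and invented probabilities.

```python
import math

def divergence_min_model(doc_models, coll_model, lam=0.5):
    """Closed-form minimiser of the assumed objective described in the lead-in."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    n = len(doc_models)
    log_scores = {}
    for w, p_coll in coll_model.items():
        avg_log = sum(math.log(dm[w]) for dm in doc_models) / n
        log_scores[w] = (avg_log - lam * math.log(p_coll)) / (1 - lam)
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    unnorm = {w: math.exp(s - top) for w, s in log_scores.items()}
    z = sum(unnorm.values())
    return {w: v / z for w, v in unnorm.items()}

# Illustrative, already-smoothed models (values are made up).
coll_model = {"the": 0.6, "airport": 0.2, "security": 0.2}
doc_models = [
    {"the": 0.5, "airport": 0.3, "security": 0.2},
    {"the": 0.5, "airport": 0.2, "security": 0.3},
]

plain = divergence_min_model(doc_models, coll_model, lam=0.0)    # geometric mean of the doc models
focused = divergence_min_model(doc_models, coll_model, lam=0.5)  # pushed away from the collection
```

Raising `lam` shifts probability away from words that are frequent in the collection, here "the".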

  15. Example of Feedback Query Model • TREC topic 412: “airport security” • Mixture model approach • Web database, top 10 docs • Feedback models shown for λ=0.9 and λ=0.7

  16. Model-based feedback vs. Simple LM

  17. Sensitivity of Precision to α [plot] • α=0: original query model; α=1: feedback model only • Div. min.: less sensitive • Mixture model: more sensitive

  18. Sensitivity of Precision to λ (Mixture Model & Divergence Min., α=0.5) [plot] • Mixture model: less sensitive • Div. min.: more sensitive • Plot labels: no feedback; more common words “ignored” • Over-discrimination can be harmful

  19. The Lemur Toolkit • Language Modeling and Information Retrieval Toolkit • Under development at CMU and UMass • All experiments reported here were run using Lemur • http://www.cs.cmu.edu/~lemur • Contact us if you are interested in using it

  20. Conclusions • Model-based feedback is natural and effective • Performance is sensitive to both α and λ • Mixture model: more sensitive to α, but less to λ (0.5) • Divergence min.: more sensitive to λ, but less to α (0.3) • The sensitivity suggests more robust models are needed, e.g., use the query to focus the model • Markov chain query model (Lafferty & Zhai, 2001) • Relevance language model (Lavrenko & Croft, 2001)

  21. Future Work • Evaluating methods for relevance feedback • Examples in pseudo feedback can be quite noisy • Relevance feedback better reflects “learning ability” • More robust feedback models, e.g., • Query-focused feedback (e.g., Query translation model) • Passage-based feedback (e.g., Hidden Markov model)
