
Risk Minimization and Language Modeling in Text Retrieval


Presentation Transcript


  1. Risk Minimization and Language Modeling in Text Retrieval. ChengXiang Zhai. Thesis Committee: John Lafferty (Chair), Jamie Callan, Jaime Carbonell, David A. Evans, W. Bruce Croft (Univ. of Massachusetts, Amherst)

  2. Information Overflow. [Figure: chart of web site growth over time]

  3. Text Retrieval (TR): a user issues a query (e.g., "Tips on thesis defense") to a retrieval system, which searches a database/collection of text docs and returns relevant docs to the user.

  4. Challenges in TR: (1) parameter tuning is ad hoc; (2) relevance is treated as independent and purely topical, whereas what the user actually needs is utility.

  5. Sophisticated Parameter Tuning in the Okapi System: "k1, b and k3 are parameters which depend on the nature of the queries and possibly on the database; k1 and b default to 1.2 and 0.75 respectively, but smaller values of b are sometimes advantageous; in long queries k3 is often set to 7 or 1000 (effectively infinite)." (Robertson et al. 1999)

  6. More Than "Relevance": the desired ranking must also account for redundancy and readability, not just relevance ranking.

  7. Meeting the Challenges: statistical language models support principled parameter estimation; Bayesian decision theory supports utility-based retrieval; together they yield the Risk Minimization Framework.

  8. Map of Thesis. New TR framework: the Risk Minimization Framework. New TR models, each with a distinguishing feature: the Two-stage Language Model (automatic parameter setting), the KL-divergence Retrieval Model (natural incorporation of feedback), and the Aspect Retrieval Model (non-traditional ranking).

  9. Retrieval as Decision-Making. Given a query: Which documents should be selected? (D) How should these docs be presented to the user? (π) Choose the pair (D, π). Possible presentations include a ranked list (1, 2, 3, 4, ...), an unordered subset, or a clustering.

  10. Generative Model of Document & Query: a user U (partially observed) generates the query q (observed); a document source S (inferred) generates each document d (observed).

  11. Bayesian Decision Theory: risk minimization. Observed: the query q (from the hidden user U) and the doc set C (from the hidden source S). Each possible choice (D1, π1), (D2, π2), ..., (Dn, πn) incurs a loss L; retrieval selects the choice (D, π) with the smallest Bayes risk.
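
  The slide's formula is not recoverable, so here is a reconstruction of the Bayes-risk criterion in the notation of the published framework (Lafferty & Zhai, 2001), where θ stands for the hidden user and source parameters:

      R(D, \pi \mid q, U, C, S)
        = \int_{\Theta} L(D, \pi, \theta)\, p(\theta \mid q, U, C, S)\, d\theta,
      \qquad
      (D^{*}, \pi^{*}) = \arg\min_{(D, \pi)} R(D, \pi \mid q, U, C, S)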

  12. Special Cases. Set-based models (choose D): the Boolean model. Ranking models (choose π): with an independent loss (which implies the Probability Ranking Principle), a relevance-based loss yields the probabilistic relevance model and the two-stage LM, while a distance-based loss yields the vector-space model and the KL-divergence model; with a dependent loss (MMR loss, MDR loss), we obtain the aspect retrieval model.

  13. Map of Existing TR Models. Relevance can be formalized three ways. (1) Probability of relevance P(r=1|q,d), r ∈ {0,1}, computed via P(d|q) or P(q|d): a regression model (Fox 83), or a generative model, which splits into doc generation, the classical probabilistic model (Robertson & Sparck Jones, 76) and the probabilistic distribution model (Wong & Yao, 89), and query generation, the LM approach (Ponte & Croft, 98; Lafferty & Zhai, 01a). (2) Probabilistic inference over representations (R(q), R(d)), under different inference systems: the inference network model (Turtle & Croft, 91) and the probabilistic concept space model (Wong & Yao, 95). (3) Similarity, under different representations and similarity measures: the vector space model (Salton et al., 75).

  14. Where Are We? Risk Minimization Framework → Two-stage Language Model (this part), KL-divergence Retrieval Model, Aspect Retrieval Model.

  15. Two-stage Language Models. Stage 1: compute the document model from d and the source S using Dirichlet prior smoothing. Stage 2: compute the query model from q and the user U using a mixture model. Plugging both into the loss function gives the risk ranking formula: two-stage smoothing.

  16. The Need for Query Modeling (Dual Role of Smoothing). [Figures: retrieval sensitivity to smoothing for keyword queries vs. verbose queries]

  17. Interaction of the Two Roles of Smoothing

  18. Two-stage Smoothing:

      p(w \mid d) = (1 - \lambda)\,\frac{c(w, d) + \mu\, p(w \mid C)}{|d| + \mu} + \lambda\, p(w \mid U)

  Stage 1 (Dirichlet prior, Bayesian, parameter μ) explains unseen words; Stage 2 (two-component mixture, parameter λ) explains noise in the query.
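
  A minimal Python sketch of this formula, assuming documents as word-count dictionaries and callable background models; all names are illustrative, not from the thesis:

      import math

      def two_stage_prob(w, doc_counts, doc_len, p_collection, p_user,
                         mu=2000.0, lam=0.5):
          # Stage 1: Dirichlet prior smoothing with the collection model (mu)
          dirichlet = (doc_counts.get(w, 0) + mu * p_collection(w)) / (doc_len + mu)
          # Stage 2: mixture with the user background model (lambda)
          return (1.0 - lam) * dirichlet + lam * p_user(w)

      def query_likelihood_score(query_words, doc_counts, p_collection, p_user,
                                 mu=2000.0, lam=0.5):
          # Rank documents by the sum of log p(w|d) over query words
          doc_len = sum(doc_counts.values())
          return sum(math.log(two_stage_prob(w, doc_counts, doc_len,
                                             p_collection, p_user, mu, lam))
                     for w in query_words)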

  19. Estimating μ using leave-one-out. For each word wi in a document d, compute P(wi | d - wi) under Dirichlet smoothing; summing the log-likelihood over all words w1, ..., wn and all documents gives the leave-one-out log-likelihood, which is maximized with respect to μ by Newton's method (a maximum likelihood estimator).
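
  A sketch of the leave-one-out log-likelihood, assuming documents as word-count dictionaries; for brevity this maximizes by a simple grid search in place of the Newton's method used in the thesis:

      import math

      def leave_one_out_loglik(mu, docs, p_collection):
          # docs: list of dicts mapping word -> c(w, d)
          # p(w | d - w) = (c(w,d) - 1 + mu * p(w|C)) / (|d| - 1 + mu)
          ll = 0.0
          for counts in docs:
              d_len = sum(counts.values())
              for w, c in counts.items():
                  p = (c - 1 + mu * p_collection(w)) / (d_len - 1 + mu)
                  ll += c * math.log(p)
          return ll

      def estimate_mu(docs, p_collection,
                      grid=(10, 50, 100, 500, 1000, 2000, 5000)):
          # Grid search stands in for Newton's method here
          return max(grid, key=lambda mu: leave_one_out_loglik(mu, docs, p_collection))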

  20. Estimating λ using a mixture model. Each query word is assumed drawn from the mixture (1-λ)·p(w|di) + λ·p(w|U), where p(w|di) are the stage-1 smoothed models of documents d1, ..., dN; λ is set to its maximum-likelihood value using the Expectation-Maximization (EM) algorithm.
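
  A minimal EM sketch for λ under that mixture, assuming the stage-1 smoothed document models are given as functions; names are illustrative:

      def estimate_lambda(query_words, doc_models, p_user, iters=50, lam=0.5):
          # doc_models: list of functions w -> p(w | d_i), stage-1 smoothed
          for _ in range(iters):
              post_sum, n = 0.0, 0
              for p_d in doc_models:
                  for w in query_words:
                      noise = lam * p_user(w)             # query-noise component
                      topical = (1.0 - lam) * p_d(w)      # document component
                      post_sum += noise / (noise + topical)   # E-step: P(noise | w)
                      n += 1
              lam = post_sum / n                          # M-step: average posterior
          return lam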

  21. Automatic 2-stage results ≈ optimal 1-stage results: average precision over 3 databases and 4 query types, 150 topics. [Table: average precision comparison]

  22. Where Are We? Risk Minimization Framework → Two-stage Language Model (done), KL-divergence Retrieval Model (this part), Aspect Retrieval Model.

  23. KL-divergence Retrieval Models. Estimate a query model θQ from q and U and a document model θD from d and S; the loss function is the KL divergence between them, and the risk ranking formula scores each document by -D(θQ || θD).
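
  Written out, the ranking criterion is the standard KL-divergence formula of this line of work (the slide's own rendering was lost):

      \mathrm{score}(d; q)
        = -D(\theta_Q \,\|\, \theta_D)
        = -\sum_w p(w \mid \theta_Q) \log \frac{p(w \mid \theta_Q)}{p(w \mid \theta_D)}
        \;\stackrel{\mathrm{rank}}{=}\; \sum_w p(w \mid \theta_Q) \log p(w \mid \theta_D)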

  24. Expansion-based vs. Model-based Feedback. Expansion-based: the feedback docs modify the query Q itself; the modified query is scored against each document model by query likelihood. Model-based: the feedback docs modify the query model θQ; documents are then scored by the KL-divergence between the query model and the document model.

  25. =0 =1 No feedback Full feedback Feedback as Model Interpolation Document D Results Query Q Feedback Docs F={d1, d2 , …, dn} Generative model Divergence minimization

  26. θF Estimation Method I: Generative Mixture Model. Each word in F = {d1, ..., dn} is generated either from the background model P(w|C) (background words, with source probability λ) or from the unknown topic model P(w|θF) (topic words, with probability 1-λ); θF is estimated by maximum likelihood.
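
  An EM sketch for this mixture, estimating the topic distribution with the background weight λ held fixed; the counts and names are illustrative:

      from collections import Counter

      def estimate_feedback_model(feedback_docs, p_background, lam=0.5, iters=30):
          # feedback_docs: list of token lists; lam (background weight) is fixed
          counts = Counter()
          for doc in feedback_docs:
              counts.update(doc)
          vocab = list(counts)
          theta = {w: 1.0 / len(vocab) for w in vocab}      # uniform init
          for _ in range(iters):
              # E-step: posterior that an occurrence of w came from the topic
              post = {w: (1 - lam) * theta[w]
                         / ((1 - lam) * theta[w] + lam * p_background(w))
                      for w in vocab}
              # M-step: re-normalize expected topic counts
              total = sum(counts[w] * post[w] for w in vocab)
              theta = {w: counts[w] * post[w] / total for w in vocab}
          return theta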

  27. θF Estimation Method II: Empirical Divergence Minimization. Choose θF to be close to the feedback documents d1, ..., dn but far from the background model C, by minimizing an empirical divergence objective.
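
  A reconstruction of the divergence-minimization objective in the form used in the accompanying papers (hedged; the slide's own formula is not recoverable), trading off closeness to the feedback documents against distance from the collection model via a weight λ:

      \hat{\theta}_F = \arg\min_{\theta}\;
        \frac{1}{n} \sum_{i=1}^{n} D(\theta \,\|\, \theta_{d_i})
        \;-\; \lambda\, D(\theta \,\|\, p(\cdot \mid C)), \qquad 0 \le \lambda < 1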

  28. Example of Feedback Query Model. TREC topic 412: "airport security"; mixture-model approach on a Web database with the top 10 docs as feedback. [Tables: learned feedback query models for λ = 0.9 and λ = 0.7]

  29. Model-based feedback vs. Simple LM

  30. Where Are We? Risk Minimization Framework → Two-stage Language Model (done), KL-divergence Retrieval Model (done), Aspect Retrieval Model (this part).

  31. Aspect Retrieval. Query: "What are the applications of robotics in the world today? Find as many DIFFERENT applications as possible." Each document receives binary aspect judgments over aspects A1, A2, A3, ..., Ak (e.g., d1 covers A1 and A2; d2 covers A2, A3, A4; dk covers A1, A3, and Ak). Example aspects: A1: spot-welding robotics; A2: controlling inventory; A3: pipe-laying robots; A4: talking robot; A5: robots for loading & unloading memory tapes; A6: robot [telephone] operators; A7: robot cranes; ...

  32. Evaluation Measures. Aspect Coverage (AC) measures per-doc coverage: #distinct-aspects / #docs; maximizing it is equivalent to the "set cover" problem, NP-hard. Aspect Uniqueness (AU) measures redundancy: #distinct-aspects / #aspects; maximizing it is equivalent to the "volume cover" problem, NP-hard. Example, using accumulated counts down a ranking d1, d2, d3: after 1, 2, 3 docs there are 2, 5, 8 aspect occurrences, of which 2, 4, 5 are distinct, giving AC = 2/1 = 2.0, 4/2 = 2.0, 5/3 = 1.67 and AU = 2/2 = 1.0, 4/5 = 0.8, 5/8 = 0.625.
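
  A small sketch computing the AC and AU curves from a binary aspect matrix (rows = ranked docs, columns = aspects), following the definitions above; the example matrix is illustrative, chosen to reproduce the slide's numbers since the slide's own matrix is not recoverable:

      def aspect_curves(aspect_matrix):
          # aspect_matrix: rows = ranked docs, columns = 0/1 aspect judgments
          seen, occurrences, curves = set(), 0, []
          for rank, row in enumerate(aspect_matrix, start=1):
              hits = [a for a, judged in enumerate(row) if judged]
              occurrences += len(hits)
              seen.update(hits)
              ac = len(seen) / rank                                  # coverage
              au = len(seen) / occurrences if occurrences else 0.0   # uniqueness
              curves.append((ac, au))
          return curves

      matrix = [[1, 1, 0, 0, 0],    # d1: 2 aspect occurrences, 2 new
                [0, 1, 1, 1, 0],    # d2: 3 occurrences, 2 new
                [1, 0, 1, 0, 1]]    # d3: 3 occurrences, 1 new
      print(aspect_curves(matrix))
      # [(2.0, 1.0), (2.0, 0.8), (1.67, 0.625)] -- matches the slide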

  33. Two loss functions for choosing the next document dk+1 given d1, ..., dk (models θ1, ..., θk known). Maximal Marginal Relevance (MMR): the best dk+1 is both novel, Nov(θk+1 | θ1 ... θk), and relevant, Rel(θk+1). Maximal Diverse Relevance (MDR): using aspect coverage distributions p(a|θi), the best dk+1 is the one whose coverage is complementary, via a loss function L(θk+1 | θ1 ... θk).

  34. Maximal Marginal Relevance (MMR) Models • Maximizing aspect coverage indirectly through redundancy elimination • Elements • Redundancy/Novelty measure • Combination of novelty and relevance • Proposed & studied six novelty measures • Proposed & studied four combination strategies
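
  To make the MMR idea concrete, here is a generic greedy re-ranking sketch in the style of Carbonell & Goldstein's original MMR; the thesis studies six novelty measures and four combination strategies, so the linear combination and the generic similarity function below are illustrative, not the thesis's specific choices:

      def mmr_rerank(candidates, relevance, similarity, k=10, beta=0.5):
          # Greedily pick the candidate maximizing
          #   beta * relevance(d) - (1 - beta) * max similarity to picks so far
          selected, pool = [], list(candidates)
          while pool and len(selected) < k:
              def marginal(d):
                  redundancy = max((similarity(d, s) for s in selected),
                                   default=0.0)
                  return beta * relevance(d) - (1.0 - beta) * redundancy
              best = max(pool, key=marginal)
              selected.append(best)
              pool.remove(best)
          return selected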

  35. Comparison of Novelty Measures (Aspect Coverage)

  36. Comparison of Novelty Measures (Aspect Uniqueness)

  37. A Mixture Model for Redundancy. A new document is modeled as a mixture of the reference document's model P(w|Old), with weight λ, and the collection background model P(w|Background), with weight 1-λ; the maximum-likelihood λ, estimated by Expectation-Maximization, serves as the redundancy measure (λ = ?).

  38. Cost-based Combination of Relevance and Novelty. [Formula: the relevance score and the novelty score are combined through cost constants]

  39. Maximal Diverse Relevance (MDR) Models • Maximizing aspect coverage directly through aspect modeling • Elements • Aspect loss function • Generative Aspect Model • Proposed & studied KL-divergence aspect loss function • Explored two aspect models (PLSI, LDA)

  40. Aspect Generative Model of Document & Query. As in the basic model, the user U generates the query q and the source S generates each document d, but now through aspect models with parameters θ = (θ1, ..., θk); two instantiations are explored: PLSI and LDA.

  41. Aspect Loss Function. [Diagram: the loss is defined between the aspect model of the query (from U, q) and the aspect models of the documents (from S, d)]

  42. Aspect Loss Function: Illustration. The desired coverage is p(a|Q); the "already covered" distribution comes from p(a|θ1) ... p(a|θk-1); a new candidate contributes p(a|θk). The combined coverage is compared against the desired coverage: a perfect candidate fills the remaining gap, a redundant one re-covers what is already covered, and a non-relevant one covers nothing desired.

  43. Preliminary Evaluation: MMR vs. MDR • On the relevant data set, both MMR and MDR are effective, but they complement each other: MMR improves AU more than AC, while MDR improves AC more than AU. • On the mixed data set, however, MMR is only effective when relevance ranking is accurate, whereas MDR improves AC even though relevance ranking is degraded.

  44. Further Work is Needed • Controlled experiments with synthetic data • Level of redundancy • Density of relevant documents • Per-document aspect counts • Alternative loss functions • Aspect language models, especially along the line of LDA • Aspect-based feedback

  45. Summary of Contributions. New TR framework: the Risk Minimization Framework unifies existing models, incorporates LMs, and serves as a map for exploring new models. New TR models, with specific contributions: Two-stage Language Model: empirical study of smoothing (dual role of smoothing), a new smoothing method (two-stage smoothing), automatic parameter setting (leave-one-out, mixture). KL-divergence Retrieval Model: query/document distillation; feedback with LMs (mixture model & divergence minimization). Aspect Retrieval Model: evaluation criteria (AC, AU); redundancy/novelty measures (mixture weight); MMR with LMs (cost combination); aspect-based loss function ("collective KL-divergence").

  46. Future Research Directions • Better approximation of the risk integral • More effective LMs for "traditional" retrieval • Can we beat TF-IDF without increasing computational complexity? • Automatic parameter setting, especially for feedback models • Flexible passage retrieval, especially with HMMs • Beyond unigrams (more linguistics)

  47. More Future Research Directions • Aspect Retrieval Models • Document structure/sub-topic modeling • Aspect-based feedback • Interactive information retrieval models • Risk minimization for information filtering • Personalized & context-sensitive retrieval

  48. Thank you!
