
Two-stage Language Models for Information Retrieval


Presentation Transcript


  1. Two-stage Language Models for Information Retrieval ChengXiang Zhai*, John Lafferty School of Computer Science Carnegie Mellon University *New Address Department of Computer Science University of Illinois, Urbana-Champaign

  2. Motivation • Retrieval parameters are needed to • model different user preferences • customize a retrieval model according to different queries and documents • So far, parameters have been set through empirical experimentation • Can we set parameters automatically?

  3. Parameters in Traditional Models • EXTERNAL to the model, hard to interpret • Most parameters are introduced heuristically to implement our “intuition” • As a result, no principles to quantify them • Set through empirical experiments • Lots of experimentation • Optimality for new queries is not guaranteed

  4. Example of Parameter Tuning (Okapi) “k1, b and k3 are parameters which depend on the nature of the queries and possibly on the database; k1 and b default to 1.2 and 0.75 respectively, but smaller values of b are sometimes advantageous; in long queries k3 is often set to 7 or 1000 (effectively infinite).” (Robertson et al. 1999)

  5. The Way to Automatic Tuning ... • Parameters must be PART of the model! • Query modeling (explain difference in query) • Document modeling (explain difference in doc) • De-couple the influence of a query on parameter setting from that of documents • To achieve stable setting of parameters • To pre-compute query-independent parameters

  6. The Rest of the Talk • Risk Minimization Retrieval Framework • Two-stage Language Models • Two-stage Dirichlet-Mixture Smoothing • Parameter Estimation

  7. The Risk Minimization Framework (Lafferty & Zhai 01, Zhai 02) [Diagram: the user issues a query, represented by a query language model (query modeling); each document is represented by a document language model (doc modeling); a loss function encodes user preferences (user modeling), and retrieval is cast as the decision that minimizes expected loss.]

  8. Parameter Setting in Risk Minimization [Diagram: the same framework, annotated with where parameters live: query model parameters are estimated from the query, document model parameters are estimated from the documents, and user model parameters are set in the loss function.]

  9. Two-stage Language Models [Diagram: stage 1 estimates a smoothed document language model from each document (smoothing!); stage 2 generates the query from the query language model, which mixes in query noise; together the two stages yield the risk ranking formula.]
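
Written out, the risk ranking formula scores each document by the query likelihood under the two-stage smoothed model. This is a reconstruction consistent with the smoothing formula on slide 13, not a verbatim slide equation:

```latex
% Query likelihood under two-stage smoothing:
% stage 1 (Dirichlet prior, mu) smooths p(w|d) toward the collection model p(w|C);
% stage 2 (linear interpolation, lambda) absorbs query noise via the user model p(w|U).
p(q \mid d) \;=\; \prod_{w \in q}
  \left[ (1-\lambda)\,\frac{c(w,d) + \mu\, p(w \mid C)}{|d| + \mu}
         + \lambda\, p(w \mid U) \right]^{c(w,q)}
```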

  10. Sensitivity in Traditional (“one-stage”) Smoothing [Plots: retrieval performance as a function of the smoothing parameter, one panel for keyword queries and one for verbose (sentence-like) queries; the best parameter setting depends on the query type.]

  11. The Need of Two-stage Smoothing (I): Accurate Estimation of the Doc Model Query = “data mining algorithms”. Document = a text mining paper, |d| = 500, with maximum-likelihood language model p(w|d): text 10/500 = 0.02, mining 3/500 = 0.006, association 1/500 = 0.002, algorithm 2/500 = 0.004, …, data 0/500 = 0. Then p(q|d) = p(“data”|d) · p(“mining”|d) · p(“algorithm”|d) = 0 × 0.006 × 0.004 = 0! So what should p(“data”|d) be? And p(“unicorn”|d)?
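
A minimal sketch of the zero-probability problem, using the slide's toy counts; the collection model p(w|C) and the Dirichlet prior μ below are illustrative assumptions:

```python
# Slide 11's toy example: unsmoothed maximum likelihood zeroes out p(q|d)
# because "data" never occurs in the document; Dirichlet (stage-1) smoothing
# adds mu pseudo-counts distributed as the collection model p(w|C).
# The values of p_C and mu are assumptions for illustration.

doc_counts = {"text": 10, "mining": 3, "association": 1, "algorithm": 2}
doc_len = 500
p_C = {"data": 1e-3, "mining": 5e-4, "algorithm": 5e-4}  # assumed p(w|C)
mu = 2000.0                                              # assumed prior sample size

def p_mle(w):
    return doc_counts.get(w, 0) / doc_len

def p_dirichlet(w):
    return (doc_counts.get(w, 0) + mu * p_C[w]) / (doc_len + mu)

query = ["data", "mining", "algorithm"]

p_q = 1.0
for w in query:
    p_q *= p_mle(w)
print("MLE:       p(q|d) =", p_q)   # 0.0: the document is ranked as impossible

p_q = 1.0
for w in query:
    p_q *= p_dirichlet(w)
print("Dirichlet: p(q|d) =", p_q)   # small but nonzero
```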

  12. The Need of Two-stage Smoothing (II): Explanation of Noise in the Query
  Query = “the algorithms for data mining”
        the     algorithms  for     data    mining
  d1:   0.04    0.001       0.02    0.002   0.003
  d2:   0.02    0.001       0.01    0.003   0.004
  p(“algorithms”|d1) = p(“algorithms”|d2), p(“data”|d1) < p(“data”|d2), and p(“mining”|d1) < p(“mining”|d2), but p(q|d1) > p(q|d2)! We should make p(“the”) and p(“for”) less different across documents.
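
A quick sketch of the arithmetic, plus what a stage-2 mixture does to it; the user model p(w|U) and λ below are illustrative assumptions, not values from the talk:

```python
# Slide 12's numbers: the noise words "the" and "for" dominate p(q|d) and
# invert the ranking.  Interpolating with a user/background model p(w|U)
# (stage 2) suppresses their influence.  p_U and lam are assumed.

query = ["the", "algorithms", "for", "data", "mining"]
p_d1 = {"the": 0.04, "algorithms": 0.001, "for": 0.02, "data": 0.002, "mining": 0.003}
p_d2 = {"the": 0.02, "algorithms": 0.001, "for": 0.01, "data": 0.003, "mining": 0.004}
p_U  = {"the": 0.05, "algorithms": 1e-4, "for": 0.03, "data": 1e-4, "mining": 1e-4}
lam = 0.7  # assumed stage-2 mixing weight

def query_likelihood(p_d, lam=0.0):
    p_q = 1.0
    for w in query:
        p_q *= (1 - lam) * p_d[w] + lam * p_U[w]
    return p_q

print(query_likelihood(p_d1), query_likelihood(p_d2))            # 4.8e-12 > 2.4e-12: d1 wins
print(query_likelihood(p_d1, lam), query_likelihood(p_d2, lam))  # now d2 wins, as it should
```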

  13. Two-stage Dirichlet-Mixture Smoothing • Stage-1 smoothing: explain unseen words (Dirichlet prior, add μ pseudo-counts) • Stage-2 smoothing: explain noise in the query (2-component mixture, linear interpolation with weight λ). Combined: p(w|d) = (1 − λ) · (c(w,d) + μ·p(w|C)) / (|d| + μ) + λ·p(w|U)
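
As a direct transcription of the formula (a minimal sketch; the inputs are assumed to be given rather than taken from any particular toolkit):

```python
# Two-stage Dirichlet-mixture smoothing, transcribed from slide 13.
# doc_counts maps w -> c(w,d); doc_len is |d|; p_C is the collection model
# p(w|C); p_U is the user/background model p(w|U); mu is the stage-1
# Dirichlet prior sample size and lam the stage-2 interpolation weight.

def two_stage_prob(w, doc_counts, doc_len, p_C, p_U, mu, lam):
    stage1 = (doc_counts.get(w, 0) + mu * p_C[w]) / (doc_len + mu)  # Dirichlet prior
    return (1 - lam) * stage1 + lam * p_U[w]                        # noise mixture
```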

  14. Estimating μ Using Leave-One-Out Hold out one occurrence of each word w in each document in turn and compute p(w|d − w) under the stage-1 (Dirichlet) model; summing the log of these predictions over all words and documents gives the leave-one-out log-likelihood ℓ₋₁(μ|C) = Σ_d Σ_w c(w,d) · log[(c(w,d) − 1 + μ·p(w|C)) / (|d| − 1 + μ)]. The maximum-likelihood estimator sets μ to the maximizer of ℓ₋₁, found with Newton's method.
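
A minimal sketch of the Newton iteration for μ, using the leave-one-out log-likelihood above; the starting point, iteration cap, and tolerance are assumptions:

```python
# Newton's method for the leave-one-out estimate of mu (slide 14).
# docs: list of {word: count} dictionaries; p_C: collection model p(w|C),
# assumed to cover every word in docs.  g and h are the first and second
# derivatives of l_{-1}(mu | C) with respect to mu.

def estimate_mu(docs, p_C, mu=1.0, iters=50, tol=1e-6):
    for _ in range(iters):
        g = h = 0.0
        for counts in docs:
            d_len = sum(counts.values())
            for w, c in counts.items():
                a = c - 1 + mu * p_C[w]           # numerator of p(w | d - w)
                b = d_len - 1 + mu                # denominator
                g += c * (p_C[w] / a - 1.0 / b)
                h += c * (1.0 / b**2 - (p_C[w] / a) ** 2)
        if h == 0.0:
            break
        step = g / h
        mu = max(mu - step, 1e-3)                 # Newton step; keep mu > 0
        if abs(step) < tol:
            break
    return mu
```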

  15. Estimating λ Using a Mixture Model At stage 2 the query is viewed as a sample from a mixture over the N (stage-1-smoothed) document models: document d_i is chosen with weight α_i, and each query word is then drawn from (1 − λ)·p(w|d_i) + λ·p(w|U). Simultaneously adjust λ and α₁, …, α_N to maximize the query likelihood; the maximum-likelihood estimate is computed with the Expectation-Maximization (EM) algorithm.
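
A hedged EM sketch for this mixture; the update equations below are one plausible instantiation of the E- and M-steps for the model as stated on the slide, not the paper's derivation verbatim:

```python
# EM for stage-2 estimation (slide 15): alpha_i weights document d_i, and
# each query word is drawn from (1-lam)*p(w|d_i) + lam*p(w|U).  Assumes
# every query word has an entry in each doc_model and in p_U, and that the
# query is short enough that the likelihoods do not underflow.

def estimate_lambda(query, doc_models, p_U, lam=0.5, iters=100):
    n = len(doc_models)
    alpha = [1.0 / n] * n
    for _ in range(iters):
        # E-step: responsibility of each document for the query, and, per
        # word, the posterior probability it came from the noise model U.
        resp, noise = [], []
        for a, p_d in zip(alpha, doc_models):
            like, t = a, []
            for w in query:
                mix = (1 - lam) * p_d[w] + lam * p_U[w]
                like *= mix
                t.append(lam * p_U[w] / mix)
            resp.append(like)
            noise.append(t)
        z = sum(resp)
        resp = [r / z for r in resp]
        # M-step: expected document shares and expected fraction of noise words.
        alpha = resp
        lam = sum(r * sum(t) for r, t in zip(resp, noise)) / len(query)
    return lam, alpha
```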

  16. Effectiveness of Parameter Estimation • Five databases • News articles (AP, WSJ, ZIFF, FBIS, FT, LA) • Government documents (Federal Register) • Web pages • Four types of queries • Long vs. short • Verbose (sentence-like) vs. keyword • Results: Automatic 2-stage ≈ Optimal 1-stage

  17. Automatic 2-stage results ≈ Optimal 1-stage results [Table: average precision over 3 databases × 4 query types, 150 topics.]

  18. Automatic 2-stage results ≈ Optimal 1-stage results [Table: average precision over 2 large databases × 2 query types, 50 topics.]

  19. Conclusions • Two-stage language models • Direct modeling of both queries and documents • Parameters are part of a probabilistic model • Parameters can be estimated using standard estimation techniques • Two-stage Dirichlet-Mixture smoothing • Involves two meaningful parameters (i.e., document sample size and query noise) • Achieves very good performance with automatically set smoothing parameters • It is possible to set parameters automatically!

  20. Future Work • Optimality analysis in the two-stage parameter space • Offline vs. online estimation • Alternative estimation methods • Parameter estimation for more sophisticated language models (e.g., with feedback)

  21. Thank you!
