
Term Necessity Prediction P(t | R_q)


Presentation Transcript


  1. Main Points • Necessity is as important as idf (theory) • Explains behavior of IR models (practice) • Can be predicted • Performance gain • Term Necessity Prediction P(t | R_q) • Le Zhao and Jamie Callan, Language Technologies Institute, School of Computer Science, Carnegie Mellon University • Oct 27, CIKM 2010

  2. Definition of Necessity P(t | R_q) • Necessity == term recall == 1 − mismatch • Directly calculated given relevance judgements for q: the fraction of the relevant documents of q that contain t • (Figure: Venn diagram of the collection, the relevant docs of q, and the docs that contain t; the overlap gives P(t | R_q) = 0.4 in the example)
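A minimal Python sketch of this definition, with toy stand-ins for the relevance judgements (the set-of-terms documents below are hypothetical):

```python
def term_necessity(term, relevant_docs):
    """P(t | R_q): fraction of relevant documents that contain the term."""
    if not relevant_docs:
        return 0.0
    return sum(term in doc for doc in relevant_docs) / len(relevant_docs)

# Toy example: 5 relevant documents as sets of terms; "third" occurs in 2.
rel = [{"third", "party"}, {"party", "politics"}, {"viability"},
       {"third", "prognosis"}, {"election"}]
print(term_necessity("third", rel))  # 0.4, matching the slide's example
```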

  3. Why Necessity? Roots in Probabilistic Models • Binary Independence Model [Robertson and Spärck Jones 1976] • its "Relevance Weight" / "Term Relevance" • P(t | R) is effectively the only part of the model about relevance • The term weight factors into necessity odds and an idf-like (sufficiency) part: w_t = log [ P(t|R) / (1 − P(t|R)) ] + log [ (1 − P(t|R̄)) / P(t|R̄) ]
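As a concrete rendering of that factorization, here is a small Python sketch of the Robertson/Spärck Jones weight, with p = P(t|R) (necessity) and q = P(t | non-relevant), the latter commonly estimated from document frequency (the idf-like part):

```python
import math

def rsj_weight(p, q):
    """RSJ relevance weight, split into the two factors named on the slide."""
    necessity_odds = math.log(p / (1 - p))  # the relevance (necessity) part
    idf_part = math.log((1 - q) / q)        # the idf-like (sufficiency) part
    return necessity_odds + idf_part

# With p fixed at a constant, only the idf-like part distinguishes terms.
print(rsj_weight(p=0.4, q=0.01))
```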

  4. Without Necessity • The emphasis problem for idf-only term weighting • idf-only weighting over-emphasizes the high-idf terms in a query • Example: "prognosis/viability of a political third party in U.S." (Topic 206)

  5. Ground Truth • (Figure: true necessity values for the terms of TREC 4 topic 206, showing which terms idf-only weighting wrongly emphasizes)

  6. Indri Top Results
  1. (ZF32-220-147) Recession concerns lead to a discouraging prognosis for 1991
  2. (AP880317-0017) Politics … party … Robertson's viability as a candidate
  3. (WSJ910703-0174) political parties …
  4. (AP880512-0050) there is no viable opposition …
  5. (WSJ910815-0072) A third of the votes
  6. (WSJ900710-0129) politics, party, two thirds
  7. (AP880729-0250) third ranking political movement …
  8. (AP881111-0059) political parties
  9. (AP880224-0265) prognosis for the Sunday school
  10. (ZF32-051-072) third party provider
  (Google and Bing still have top-10 false positives. Emphasis is also a problem for large search engines!)

  7. Without Necessity • The emphasis problem for idf-only term weighting • it over-emphasizes high-idf terms in the query • "prognosis/viability of a political third party in U.S." (Topic 206) • False positives throughout the ranked list • especially detrimental at the top ranks • Ignoring term recall hurts precision at all recall levels • (true for BIM, and also for BM25 and LM, which use tf) • How significant is the emphasis problem?

  8. Failure Analysis of 44 Topics from TREC 6-8 • RIA workshop 2003 (7 top research IR systems, >56 expert-weeks) • (Diagram: the failure categories map onto necessity term weighting and necessity-guided expansion, plus bigrams and term restriction using doc fields, with term necessity prediction as the basis for all of them)

  9. Given True Necessity • +100% over BIM (in precision at all recall levels) [Robertson and Spärck Jones 1976] • +30-80% over Language Model and BM25 (in MAP) (this work) • For a new query without relevance judgements, necessity must be predicted • Predictions don't need to be very accurate to show a performance gain

  10. How Necessary are Words? (Examples from TREC 3 topics)

  11. Mismatch Statistics • (Figures: mismatch variation across terms, for TREC 3 title and TREC 9 desc queries) • Mismatch is not constant across terms: it needs prediction

  12. Mismatch Statistics (2) • Mismatch variation for the same term in different queries (TREC 3 recurring words) • Query-dependent features needed (1/3 of term occurrences have necessity variation > 0.1)

  13. Prior Prediction Approaches • Croft/Harper combination match (1979) • treats P(t | R) as a tuned constant • when > 0.5, rewards docs that match more query terms • Greiff's (1998) exploratory data analysis • used idf to predict the overall term weight • improved over BIM • Metzler's (2008) generalized idf • used idf to predict P(t | R) • improved over BIM • Years of the simple idf feature, limited success • Missing piece: P(t | R) = term necessity = term recall
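To make the combination-match idea concrete, here is a toy Python sketch under a simplified additive scoring model (not the paper's code): with P(t | R) treated as a single constant p > 0.5, every matched query term adds the same positive log-odds bonus, so documents matching more query terms are rewarded:

```python
import math

def combination_match_score(matched_idfs, p=0.6):
    """Score a doc from the idfs of the query terms it matches; constant P(t|R)."""
    bonus = math.log(p / (1 - p))  # positive whenever p > 0.5
    return sum(idf + bonus for idf in matched_idfs)

# Each matched term earns the same constant bonus on top of its idf, so
# coordination (matching more terms) counts, not just idf mass.
print(combination_match_score([1.2, 0.8, 0.5]), combination_match_score([2.0, 1.9]))
```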

  14. Factors that Affect Necessity • What causes a query term to not appear in relevant documents? • Topic Centrality (Concept Necessity) • e.g., "Laser research related or potentially related to US defense", "Welfare laws propounded as reforms" • Synonyms • e.g., movie == film == … • Abstractness • e.g., "ailments" in the vitamin query, "Dog Maulings", "Christian Fundamentalism" • The worst case is a rare & abstract term, e.g. "prognosis"

  15. Features • We need to identify synonyms/searchonyms of a query term • in a query-dependent way • Use thesauri? • biased (not collection dependent) • static (not query dependent) • not promising, not easy • Instead: term-term similarity in a concept space • Local LSI (Latent Semantic Indexing) • LSI of the (e.g. 200) top-ranked documents • keep (e.g. 150) dimensions
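A minimal sketch of the local LSI step, assuming scikit-learn as the toolbox (the talk's acknowledgements mention LSI code for the Lemur toolkit; this stand-in is not that implementation). Here top_docs would be the top-ranked documents retrieved for the query:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def local_lsi_term_vectors(top_docs, k=150):
    """Map each term in the top-ranked docs to a vector in a k-dim concept space."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(top_docs)    # docs x terms matrix
    k = min(k, min(X.shape) - 1)              # guard for small toy inputs
    svd = TruncatedSVD(n_components=k).fit(X)
    # Rows of V * Sigma are the term vectors in the reduced concept space.
    term_vecs = (svd.components_ * svd.singular_values_[:, None]).T
    return dict(zip(vectorizer.get_feature_names_out(), term_vecs))
```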

  16. Features • Topic Centrality • length of the term vector after dimension reduction (local LSI) • Synonymy (Concept Necessity) • average similarity score of the top 5 most similar terms • Replaceability • adjusts the Synonymy measure by how many new documents the synonyms match • Abstractness • users modify abstract terms with concrete ones, e.g. "effects on the US educational program", "prognosis of a political third party"
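The centrality and synonymy features can be read directly off the term vectors from the previous sketch; the following Python sketch follows the bullet descriptions above, and the paper's exact formulas may differ:

```python
import numpy as np

def centrality(term_vecs, term):
    """Topic centrality: length of the term vector after dimension reduction."""
    return float(np.linalg.norm(term_vecs[term]))

def synonymy(term_vecs, term, k=5):
    """Average cosine similarity of the top-k most similar terms (searchonyms)."""
    v = term_vecs[term]
    sims = sorted(
        (float(np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u) + 1e-9))
         for t, u in term_vecs.items() if t != term),
        reverse=True,
    )
    return sum(sims[:k]) / max(len(sims[:k]), 1)
```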

  17. Experiments • Necessity prediction error • a regression problem • model: RBF kernel regression, M: (f1, f2, .., f5) → P(t | R) • Necessity for term weighting • end-to-end retrieval performance • how to weight terms by their necessity • in BM25: the Binary Independence Model component • in Language Models: the Relevance Model P_m(t | R), multinomial (Lavrenko and Croft 2001)
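The regression step might look like the sketch below; the slide names RBF kernel regression, and scikit-learn's KernelRidge with an RBF kernel is used here as one concrete stand-in, with placeholder data and hyperparameters rather than the paper's:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# X: one row of the 5 features per (query, term) pair; y: true P(t|R) from qrels.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((100, 5)), rng.random(100)  # placeholder data

model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.5)
model.fit(X_train, y_train)

X_new = rng.random((3, 5))                       # features for unseen query terms
pred = np.clip(model.predict(X_new), 0.0, 1.0)   # keep predictions in [0, 1]
print(pred)
```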

  18. Necessity Prediction Example • Trained on TREC 3, tested on TREC 4 • (Figure: predicted necessities for an example query, with the emphasis marked)

  19. Necessity Prediction Error • L1 loss, the lower the better: the absolute difference between predicted and true necessity • (Figure: L1 loss of the predictor vs. baselines)
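As a worked example of the metric (assuming, as is standard, that the per-term absolute errors are averaged; the numbers are made up):

```python
import numpy as np

true_p = np.array([0.9, 0.8, 0.2])        # true P(t|R) values from judgements
pred_p = np.array([0.7, 0.8, 0.5])        # model predictions
l1_loss = np.abs(pred_p - true_p).mean()  # (0.2 + 0.0 + 0.3) / 3 = 0.1667
print(l1_loss)                            # lower is better
```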

  20. Predicted Necessity Weighting • 10-25% gain (necessity weighting) • 10-20% gain (top precision) • (Table: retrieval results)

  21. Predicted Necessity Weighting (ctd.) • (Figure: further retrieval results)

  22. vs. Relevance Model • Relevance Model: #weight( (1−λ) #combine( t1 t2 ) λ #weight( w1 t1 w2 t2 w3 t3 … ) ) • with term weights w_i ~ P(t_i | R) • Weight Only ≈ Expansion • Supervised > Unsupervised (5-10%)
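One way to realize the weighting-only variant is to fill the slide's Indri template with predicted necessities; this sketch builds the query string, with hypothetical terms, weights, and λ (lam) as placeholders:

```python
def necessity_weighted_query(terms, weights, lam=0.5):
    """Interpolate the plain #combine query with a necessity-weighted #weight."""
    combine = "#combine( " + " ".join(terms) + " )"
    weighted = "#weight( " + " ".join(f"{w:.2f} {t}" for w, t in zip(weights, terms)) + " )"
    return f"#weight( {1 - lam:.2f} {combine} {lam:.2f} {weighted} )"

print(necessity_weighted_query(["third", "party", "prognosis"], [0.9, 0.8, 0.2]))
# #weight( 0.50 #combine( third party prognosis )
#          0.50 #weight( 0.90 third 0.80 party 0.20 prognosis ) )
```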

  23. Take Home Messages • Necessity is as important as idf (theory) • Explains behavior of IR models (practice) • Effective features can predict necessity • Performance gain

  24. Acknowledgements • Reviewers from multiple venues • Ni Lao, Frank Lin, Yiming Yang, Stephen Robertson, Bruce Croft, Matthew Lease (discussions & references) • David Fisher, Mark Hoy (maintaining the Lemur toolkit) • Andrea Bastoni and Lorenzo Clemente (maintaining LSI code for the Lemur toolkit) • SVM-light, Stanford parser • TREC (all the data) • NSF Grants IIS-0707801 and IIS-0534345 • Feedback: Le Zhao (lezhao@cs.cmu.edu)
