
A Field Relevance Model


Presentation Transcript


  1. A Field Relevance Model for Structured Document Retrieval JIN YOUNG KIM @ ECIR 2012

  2. Three Themes • The Concept of Field Relevance • Using Field Relevance for Retrieval • The Estimation of Field Relevance [Diagram: Relevance → Field Weighting → Field Relevance]

  3. The Field Relevance

  4. IR: The Quest for Relevance • The Role of Relevance • Core Component of Retrieval Models • Basis of (Pseudo) Relevance Feedback • Retrieval Models Based on Relevance • Binary Independence Model (BM25) [Robertson76] • Relevance-based Language Model [Lavrenko01] [Notation: term-level relevance model P(w|R), V = (w1 w2 ... wm)]

  5. Structured Document Retrieval • Documents have multiple fields • Emails, products (entities), and so on • Retrieval models exploit the structure • Field weighting is common [Diagram: each query term q1 ... qm is scored against fields f1 ... fn; per-field scores are combined with fixed weights w1 ... wn (sum), and the per-term scores are then multiplied]
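For concreteness, the fixed-weight field mixture described on this slide could be scored roughly as in the Python sketch below; the function name, the token-list document representation, and the Dirichlet smoothing with its parameters are illustrative assumptions, not the exact formulation from the talk.

```python
import math

def mflm_score(query_terms, doc_fields, field_weights, collection_lm, mu=1000.0):
    """Fixed-field-weight mixture score (MFLM-style): for each query term,
    mix smoothed per-field term probabilities using fixed field weights,
    then combine per-term scores by multiplication (log-sum here).
    Smoothing scheme, mu, and the 1e-9 floor are illustrative assumptions."""
    log_score = 0.0
    for q in query_terms:
        term_prob = 0.0
        for field, weight in field_weights.items():
            tokens = doc_fields.get(field, [])          # field text as a token list
            tf = tokens.count(q)
            p_coll = collection_lm.get(field, {}).get(q, 1e-9)
            p_q_f = (tf + mu * p_coll) / (len(tokens) + mu)   # Dirichlet smoothing
            term_prob += weight * p_q_f
        log_score += math.log(term_prob)
    return log_score

# Example usage (hypothetical email fields and weights):
# doc = {"to": ["james"], "subject": ["registration", "deadline"], "content": ["please", "register"]}
# mflm_score(["james", "registration"], doc, {"to": 0.3, "subject": 0.3, "content": 0.4}, coll_lm)
```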

  6. Relevance for Structured Document Retrieval • Term-level Relevance • Which term is important for the user’s information need? • Field-level Relevance • Which field is important for the user’s information need? [Notation: term-level relevance P(w|R) over V = (w1 w2 ... wm); field-level relevance P(F|R) over F = (F1 F2 … Fn)]

  7. Defining the Field Relevance • Field relevance P(F|w,R): the distribution of per-term relevance over document fields [Diagram: a query of m words, Q = (q1 q2 ... qm), and a collection where each document has n fields, F = (F1 F2 … Fn); each query term qi has its own field distribution P(F|qi,R)]
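The notation on this slide, written out as a short math block (the non-negativity and normalization constraints are simply the standard properties of a probability distribution, stated here for clarity):

```latex
% A query of m terms and documents with n fields (notation from the slide).
\[
  Q = (q_1, q_2, \ldots, q_m), \qquad F = (F_1, F_2, \ldots, F_n)
\]
% Field relevance: for each query term q_i, a distribution over the fields.
\[
  P(F_j \mid q_i, R) \ge 0, \qquad \sum_{j=1}^{n} P(F_j \mid q_i, R) = 1 .
\]
```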

  8. Why P(F|w,R) instead of P(F|R)? • Different fields are relevant for different query terms • Query: ‘james registration’ • ‘james’ is relevant when it occurs in <to> • ‘registration’ is relevant when it occurs in <subject>

  9. More Evidence for the Field Relevance • Field Operator / Advanced Search Interface • User’s search terms are found in multiple fields • Understanding Re-finding Behavior in Naturalistic Email Interaction Logs. Elsweiler, D., Harvey, M., Hacker, M. [SIGIR'11] • Evaluating Search in Personal Social Media Collections. Lee, C.-J., Croft, W.B., Kim, J.Y. [WSDM'12]

  10. The Field Relevance Model

  11. Retrieval over Structured Documents • Field-based Retrieval Models • Score each field against each query term • Combine field-level scores using field weights • Fixed field weights wj can be too restrictive

  12. Using the Field Relevance for Retrieval • Field Relevance Model • Comparison with the Mixture of Field Language Models (MFLM) [Diagram: MFLM combines per-term field scores with fixed field weights w1 ... wn; the Field Relevance Model instead uses per-term field weights P(F1|qi) ... P(Fn|qi), summing over fields for each term and multiplying over terms]
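A minimal sketch of how per-term field weights change the scoring relative to the fixed-weight mixture sketched earlier; the per-term weight dictionary and the smoothing details are illustrative assumptions.

```python
import math

def frm_score(query_terms, doc_fields, field_relevance, collection_lm, mu=1000.0):
    """Field-relevance-weighted mixture score: identical in form to the
    fixed-weight mixture sketched earlier, except the weight on field F_j
    now depends on the query term, i.e. P(F_j | q_i).
    Smoothing scheme and parameter values are illustrative assumptions."""
    log_score = 0.0
    for q in query_terms:
        per_term_weights = field_relevance[q]          # {field: P(field | q)}
        term_prob = 0.0
        for field, weight in per_term_weights.items():
            tokens = doc_fields.get(field, [])
            tf = tokens.count(q)
            p_coll = collection_lm.get(field, {}).get(q, 1e-9)
            p_q_f = (tf + mu * p_coll) / (len(tokens) + mu)
            term_prob += weight * p_q_f
        log_score += math.log(term_prob)
    return log_score
```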

  13. Structured Document Retrieval: PRM-S [Kim, Xue, Croft 09] • Probabilistic Retrieval Model for Semi-structured Data • Estimate the mapping between query terms and document fields • Use the mapping probability as per-term field weights • Estimation is based on limited sources
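The PRM-S mapping step could be sketched roughly as below, assuming precomputed per-field collection language models and field priors; the variable names and the probability floor are illustrative.

```python
def prms_field_weights(query_terms, collection_lm, field_prior):
    """PRM-S style per-term field weights: P(F_j | q_i) taken proportional
    to P(q_i | F_j) * P(F_j), normalized over all fields.
    collection_lm[field][term] -> collection-level term probability for that field;
    field_prior[field] -> a priori field probability (both assumed precomputed)."""
    weights = {}
    for q in query_terms:
        scores = {f: collection_lm[f].get(q, 1e-9) * field_prior[f]
                  for f in collection_lm}
        z = sum(scores.values())
        weights[q] = {f: s / z for f, s in scores.items()}
    return weights
```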

  14. Using the Field Relevance for Retrieval • Field Relevance Model • Comparison with PRM-S • FRM has the same functional form as PRM-S • FRM differs in how the per-term field weights are estimated

  15. Estimating Field Relevance

  16. Estimating Field Relevance: in a Nutshell • If User Provides Feedback • Relevant document provides sufficient information • If No Feedback is Available • Combine field-level term statistics from multiple sources [Diagram: field-level statistics from the collection plus the top-k retrieved documents approximate those of the relevant documents, over fields such as from/to, title, and content]

  17. Estimating Field Relevance using Feedback • Assume a user who marked document DR as relevant • Estimate the field relevance from the field-level term distribution of DR • We can personalize the results accordingly • Rank documents with a similar field-level term distribution higher • Example field relevance: <to> is relevant for ‘james’; <content> is relevant for ‘registration’
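A sketch of the feedback case under a deliberately simple assumption: the per-term field relevance is taken as the term's normalized frequency across the fields of the document marked relevant (the epsilon floor is an illustrative detail to keep the distribution well defined).

```python
def field_relevance_from_feedback(query_terms, relevant_doc_fields, epsilon=1e-9):
    """Per-term field relevance from a single relevant document D_R:
    for each query term, normalize its frequency across D_R's fields.
    relevant_doc_fields maps field name -> token list; epsilon avoids
    an all-zero distribution for terms missing from every field."""
    relevance = {}
    for q in query_terms:
        counts = {f: tokens.count(q) + epsilon
                  for f, tokens in relevant_doc_fields.items()}
        z = sum(counts.values())
        relevance[q] = {f: c / z for f, c in counts.items()}
    return relevance

# Example (hypothetical email): 'james' concentrates in the <to> field and
# 'registration' in <subject>/<content>, matching the slide's example.
```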

  18. Estimating Field Relevance without Feedback • Method • Linear combination of multiple sources • Weights estimated using training queries • Features • Field-level term distribution of the collection (unigram and bigram LM; the unigram version is the same as PRM-S) • Field-level term distribution of the top-k retrieved documents (unigram and bigram LM; a form of pseudo-relevance feedback) • A priori importance of each field wj, estimated using held-out training queries (similar to the fixed weights in MFLM and BM25F)
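The linear combination of sources might look roughly like the sketch below; the source names, dictionary layout, and the use of training queries to fit the mixture weights follow the slide, while the rest is an assumption for illustration.

```python
def combine_field_relevance(query_terms, sources, lambdas):
    """Linearly mix several per-term field-relevance estimates, e.g.
    sources = {"collection": ..., "topk": ..., "prior": ...}, where each
    estimate maps query term -> {field: probability} (the 'prior' source
    can repeat the same a priori field distribution for every term).
    lambdas holds one mixture weight per source, fit on training queries."""
    combined = {}
    for q in query_terms:
        mix = {}
        for name, estimate in sources.items():
            for f, p in estimate[q].items():
                mix[f] = mix.get(f, 0.0) + lambdas[name] * p
        z = sum(mix.values())
        combined[q] = {f: v / z for f, v in mix.items()}
    return combined
```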

  19. Experiments

  20. Experimental Setup • Collections • TREC Emails • IMDB Movies • Monster Resumes • Distribution of the Most Relevant Field

  21. Query Examples (Indri) • Oracle Estimates of Field Relevance [Table: example queries and oracle field-relevance estimates for the TREC, IMDB, and Monster collections]

  22. Retrieval Methods Compared • Baselines • DQL / BM25F • MFLM: field weights fixed regardless of terms • PRM-S: per-term weights estimated using the collection • Field Relevance Models • FRM-C: estimated using the combination of sources • FRM-O: estimated using relevant documents • The methods differ only in the field weighting!

  23. Retrieval Effectiveness (Metric: Mean Reciprocal Rank) [Chart: methods with per-term field weights vs. fixed field weights]

  24. Quality of Field Relevance Estimation • Aggregated KL-Divergence from Oracle Estimates • Aggregated Cosine Similarity with Oracle Estimates
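Both quality measures on this slide can be computed per query term and then aggregated; a minimal sketch, with the epsilon guard, the KL direction, and averaging as the aggregation all being assumptions:

```python
import math

def kl_divergence(est, oracle, epsilon=1e-9):
    """KL(est || oracle) between an estimated and an oracle per-term field
    distribution; epsilon guards against log(0). The direction of the
    divergence is an assumption here, not taken from the slide."""
    return sum(p * math.log(p / max(oracle.get(f, 0.0), epsilon))
               for f, p in est.items() if p > 0)

def cosine_similarity(est, oracle):
    """Cosine similarity between two field-relevance vectors over fields."""
    dot = sum(p * oracle.get(f, 0.0) for f, p in est.items())
    norm = math.sqrt(sum(v * v for v in est.values())) * \
           math.sqrt(sum(v * v for v in oracle.values()))
    return dot / norm if norm else 0.0

# Aggregation over query terms, e.g. simple averaging (assumption):
# avg_kl = sum(kl_divergence(est[t], oracle[t]) for t in terms) / len(terms)
```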

  25. Feature Ablation Results • Features Revisited • Field-level term distribution of the collection (PRM-S) • Field-level term distribution of top-k documents • A priori importance of each field (prior) • Results for the TREC Collection

  26. Conclusions

  27. Summary • Field relevance as a generalization of field weighting • Relevance modeling for structured document retrieval • Field relevance model for structured doc. retrieval • Using field relevance to combine per-field LM scores • Estimating the field relevance using relevant docs • Providing a natural way to incorporate relevance feedback • Estimating the field relevance by combining sources • Improved performance over MFLM and PRM-S

  28. Ongoing Work • Large-scale batch evaluation on a book collection • Test collections built using OpenLibrary.org query logs • Evaluation of relevance feedback on FRM • Does relevance feedback improve subsequent results? • Integrating term relevance and field relevance • Further improvement is expected when the two are combined

  29. I’m on the job market! More at @jin4ir, or cs.umass.edu/~jykim • Structured Document Retrieval • A Probabilistic Retrieval Model for Semi-structured Data [ECIR09] • A Field Relevance Model for Structured Document Retrieval [ECIR12] • Personal Search • Retrieval Experiments using Pseudo-Desktop Collections [CIKM09] • Ranking using Multiple Document Types in Desktop Search [SIGIR10] • Evaluating an Associative Browsing Model for Personal Info. [CIKM11] • Evaluating Search in Personal Social Media Collections [WSDM12] • Web Search • An Analysis of Instability for Web Search Results [ECIR10] • Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic [WSDM12]

  30. Optional Slides

  31. Optimality of Field Relevance Estimation • This results in the optimal field weighting • Scores DR as highly as possible against other docs • Under the language modeling framework for IR • Proof in the extended version

  32. Features based on Field-level Term Distributions [Table: summary and estimation of the unigram LM (= PRM-S) and bigram LM features]
