
Relevance Feedback and other Query Modification Techniques



Presentation Transcript


  1. Relevance Feedback and other Query Modification Techniques Course: Information Retrieval and Recommendation Techniques (資訊擷取與推薦技術); Advisor: Prof. 黃三益; Presenters: first-year Ph.D. students 楊錦生 (d9142801) and 曾繁絹 (d9142803)

  2. Introduction • Precision vs. Recall • When a high recall ratio is critical, users need to retrieve more of the relevant documents. • Methods to retrieve more: • “Expand” the search by broadening a narrow Boolean query or looking further down a ranked list of retrieved documents. • Modify the original query.
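
As a reminder of the two measures contrasted above (standard definitions, not part of the original slide):

\[
\text{Precision} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|},
\qquad
\text{Recall} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}
\]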

  3. Introduction (cont’d) • “Word Mismatch” problem: • Some of the unretrieved relevant documents are indexed by a different set of terms than those in the query or in most of the other relevant documents. • Approaches for improving the initial query: • Relevance Feedback • Automatic Query Modification

  4. Conceptual Model of Relevance Feedback • [Diagram: the user issues a query and receives a result set; after the user provides relevance feedback on the result set, a new query is formed based on that result set.]

  5. Basic Ideas about Relevance Feedback • Two components of relevance feedback: • Reweighting of query terms based on the distribution of these terms in the relevant and nonrelevant documents retrieved in response to those queries • Changing the actual terms in the query

  6. Basic Ideas about Relevance Feedback (cont’d) • Evaluation of Relevance Feedback • Comparing the results after one iteration of feedback against those using no feedback generally shows spectacular improvement • Another way to evaluate the results is to compare only the residual collections, i.e., the collections with the documents already seen by the user removed

  7. Basic approach to Relevance Feedback • Rocchio’s approach used the vector space model to rank documents
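
The slide’s formula is given here in a standard form (notation may differ from the original slide). Rocchio’s vector-space feedback formula builds a new query from the original query vector and the retrieved documents judged relevant and nonrelevant:

\[
Q_1 = Q_0 + \frac{1}{n_1} \sum_{i=1}^{n_1} R_i - \frac{1}{n_2} \sum_{i=1}^{n_2} S_i
\]

where Q0 is the original query vector, the Ri are the vectors of the n1 relevant retrieved documents, and the Si are the vectors of the n2 nonrelevant retrieved documents. Later variants weight the three components with tunable constants.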

  8. Ide developed three particular strategies extending Rocchio’s approach • Rocchio’s basic formula, minus the normalization for the number of relevant and nonrelevant documents • Allowing feedback only from relevant documents • Allowing limited negative feedback from only the highest-ranked nonrelevant document

  9. Term reweighting without Query Expansion • A probabilistic model proposed by Robertson and Sparck Jones (1976); the weight formula (sketched below) uses: • Wij = the term weight for term i in query j • r = the number of relevant documents for query j having term i • R = the total number of relevant documents for query j • n = the number of documents in the collection having term i • N = the number of documents in the collection
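
A standard (unsmoothed) statement of the Robertson and Sparck Jones relevance weight, consistent with the definitions above (the slide’s own rendering of the formula may differ slightly):

\[
W_{ij} = \log \frac{r / (R - r)}{(n - r) / (N - n - R + r)}
\]

In practice, 0.5 is often added to each of the four quantities to avoid division by zero.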

  10. Term reweighting without Query Expansion (cont’d) • Croft (1983) extended this weighting scheme with separate forms for the initial search and for feedback (a sketch follows below), where: • Wijk = the term weight for term i in query j and document k • IDFi = the IDF weight for term i in the entire collection • Pij = the probability that term i is assigned within the set of relevant documents for query j • Qij = the probability that term i is assigned within the set of nonrelevant documents for query j • Fik = K + (1 - K)(freqik / maxfreqk), where freqik = the frequency of term i in document k and maxfreqk = the maximum frequency of any term in document k
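
A hedged sketch of the shape of Croft’s two formulas, reconstructed from the definitions above (the exact constants and scaling in Croft, 1983, may differ):

\[
W_{ijk}^{\,initial} = (C + IDF_i)\, F_{ik},
\qquad
W_{ijk}^{\,feedback} = \left( \log \frac{P_{ij}}{1 - P_{ij}} + \log \frac{1 - Q_{ij}}{Q_{ij}} \right) F_{ik}
\]

where C is a tuning constant and Fik is the normalized within-document frequency defined above.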

  11. Query Expansion • The query can be expanded by • offering users a selection of the terms most closely related to the initial query terms (a thesaurus) • presenting users with a sorted list of terms drawn from the relevant documents or from all retrieved documents

  12. Query Expansion (cont’d) • Terms from the relevant/nonrelevant documents are ranked using some ranking method, and then either • the user selects from the top N terms, or • the top terms are added to the query automatically (a minimal sketch follows this slide) • The early SMART experiments both expanded the query and reweighted the query terms by adding the vectors of the relevant and nonrelevant documents.
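
A minimal sketch of the “rank terms from the relevant documents and take the top N” idea described on this slide. The function name and the plain frequency-based ranking are illustrative assumptions, not the chapter’s prescribed method:

```python
from collections import Counter

def candidate_expansion_terms(relevant_docs, query_terms, top_n=10):
    """Rank terms from user-judged relevant documents as expansion candidates.

    relevant_docs: list of documents, each given as a list of tokens.
    Returns the top_n terms not already in the query, ranked by raw
    frequency across the relevant documents (one simple choice of
    ranking method; others are possible).
    """
    counts = Counter()
    for doc in relevant_docs:
        counts.update(t for t in doc if t not in query_terms)
    return [term for term, _ in counts.most_common(top_n)]

# The candidates can be shown to the user for selection, or appended
# to the query automatically.
relevant = [["relevance", "feedback", "query", "expansion"],
            ["feedback", "term", "reweighting", "expansion"]]
print(candidate_expansion_terms(relevant, {"query"}, top_n=3))
```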

  13. Query Expansion (cont’d) • Modification of terms in relevant/nonrelevant documents: • Any relevant document(s) can serve as a “new query” (Noreault, 1979) • If no relevant documents are indicated, the term list shown to the user is a list of related terms, based on terms previously sorted in the inverted file

  14. Query Expansion with Term Reweighting • The vast majority of relevance feedback and query expansion research has used both query expansion and term reweighting. • Three of the most-used feedback methods: • Ide Regular

  15. Query Expansion with Term Reweighting (cont’d) • Ide dec-hi • Standard Rocchio (see the formulas below) • Si = the top-ranked nonrelevant document
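
Standard statements of the three methods (given here in their commonly cited forms, which may differ in notation from the original slides), using Q0 for the original query vector, Ri for the vectors of the n1 relevant retrieved documents, Si for the vectors of the n2 nonrelevant retrieved documents, and S1 for the top-ranked nonrelevant document:

\[
\text{Ide Regular:}\quad Q_{new} = Q_0 + \sum_{i \in rel} R_i - \sum_{i \in nonrel} S_i
\]
\[
\text{Ide dec-hi:}\quad Q_{new} = Q_0 + \sum_{i \in rel} R_i - S_1
\]
\[
\text{Standard Rocchio:}\quad Q_{new} = \alpha\, Q_0 + \frac{\beta}{n_1} \sum_{i \in rel} R_i - \frac{\gamma}{n_2} \sum_{i \in nonrel} S_i
\]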

  16. Automatic Query Modification • The major disadvantage of relevance feedback is that it increases the burden on the users [X97]. • Approaches for automatic query modification: • Local feedback • Automatic query expansion (dictionary-based, global analysis, or local analysis)

  17. Local Feedback • Local feedback is similar to relevance feedback. • Difference: assume the top ranked documents are relevant without human judgment. • It saves the costs of relevance judgment, but it can result in poor retrieval if the top ranked documents are non-relevant.
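
A minimal sketch of local feedback, assuming queries and documents are represented as sparse term-weight dictionaries and using a Rocchio-style positive-only update; the function name, parameters, and defaults are illustrative:

```python
def local_feedback(query_vec, ranked_docs, k=5, alpha=1.0, beta=0.75):
    """Pseudo-relevance ('local') feedback: treat the top-k documents
    from the initial ranking as relevant and move the query toward
    their centroid (a Rocchio-style update with no negative component).

    query_vec: {term: weight}; ranked_docs: list of {term: weight}
    dictionaries ordered by the initial retrieval score.
    """
    top = ranked_docs[:k]
    new_query = {t: alpha * w for t, w in query_vec.items()}
    for doc in top:
        for term, weight in doc.items():
            new_query[term] = new_query.get(term, 0.0) + beta * weight / len(top)
    return new_query


# Example: expand a one-term query with terms from the top 2 documents.
ranked = [{"mining": 0.8, "association": 0.5},
          {"mining": 0.6, "rules": 0.4}]
print(local_feedback({"mining": 1.0}, ranked, k=2))
```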

  18. Automatic Query Expansion • Basic idea: • A user query is expanded by adding semantically similar and/or statistically associated terms, each with a corresponding weight. • Thesauri are needed for the similarity judgment. • Two approaches to thesaurus construction: • Manual thesauri • Automatic thesauri

  19. Dictionary-based Query Expansion • Based on manual thesauri (e.g., WordNet [M95]). • In the expansion process, synonyms (or words related through other semantic relations) of the initial query terms are selected, and each is assigned a weight. • Disadvantages: • Constructing a manual thesaurus requires a great deal of human labor. • A general manual thesaurus does not consistently improve retrieval performance.

  20. Example - WordNet
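
As a stand-in for the slide’s example, here is a small illustration of looking up synonym lemmas in WordNet through NLTK; the helper function and the example term are assumptions, not the slide’s original content:

```python
# Requires: pip install nltk, then a one-time nltk.download('wordnet').
from nltk.corpus import wordnet as wn

def wordnet_expansion_terms(term):
    """Collect synonym lemmas for a query term from all of its WordNet
    synsets -- the kind of dictionary-based expansion described above."""
    synonyms = set()
    for synset in wn.synsets(term):
        for lemma in synset.lemma_names():
            if lemma.lower() != term.lower():
                synonyms.add(lemma.replace("_", " "))
    return sorted(synonyms)

print(wordnet_expansion_terms("car"))  # e.g., 'auto', 'automobile', ...
```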

  21. Automatic Thesauri Construction Approach • Thesauri are constructed from the whole data corpus (or a part of it). • Basic idea of automatic thesauri construction: • Term co-occurrence • Methods of automatic thesauri construction: • Traditional TFxIDF [Y02] • Variant of TFxIDF (i.e., the similarity thesaurus [QF93]) • Mining association rules [WBO00]

  22. Example of Thesaurus Construction • To each term ti is associated a vector of weights over the documents in the collection • The relationship between two terms tu and tv is computed from these vectors, according to [QF93] (a sketch follows below)
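
A hedged sketch of the similarity-thesaurus idea, consistent with [QF93] but omitting their exact term-weighting function:

\[
\vec{t}_i = (w_{i,1}, w_{i,2}, \ldots, w_{i,N}),
\qquad
sim(t_u, t_v) = \vec{t}_u \cdot \vec{t}_v = \sum_{j=1}^{N} w_{u,j}\, w_{v,j}
\]

where wij is the weight of term ti in document dj (in [QF93] this combines a normalized within-document frequency with an inverse term frequency factor) and N is the number of documents in the collection.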

  23. Example of Thesaurus Construction (cont’d) • [Figure: a term-association graph centered on Data Mining, linking terms such as CRM, Knowledge Discovery, Text Mining, Data Warehouse, Decision Tree, Clustering Analysis, Classification Analysis, C4.5, and Prediction, with association weights between 0.12 and 0.90 on the edges.]

  24. Global Analysis • The whole collection of documents is used for thesaurus creation. • Approaches: • Similarity Thesaurus [QF93] • Statistical Thesaurus [CY92]

  25. Global Analysis (cont’d) • [Flow: a thesaurus is constructed from the whole data corpus; the initial user query is expanded using the thesaurus; the expanded query is used to retrieve the relevant documents.]

  26. Local Analysis • Unlike global analysis, only the top-ranked documents are used for constructing the thesaurus. • Approaches: • Local Clustering [AF77] • Local Context Analysis [X97, XC96, XC00] • According to [XC96, X97, XC00], local analysis is more effective than global analysis.

  27. Local Analysis (cont’d) • [Flow: the initial user query drives a first retrieval; a thesaurus is constructed from the top-ranked documents; the query is expanded using this thesaurus; a second retrieval with the expanded query returns the relevant documents.]

  28. References • [AF77] Attar, R. and Fraenkel, A. S., “Local Feedback in Full-Text Retrieval Systems,” Journal of the ACM, Volume 24, Issue 3, 1977, pp. 397-417. • [BR99] Baeza-Yates, R. and Ribeiro-Neto, B., Modern Information Retrieval, Addison Wesley/ACM Press, Harlow, England, 1999. • [CY92] Crouch, C. J. and Yang, B., “Experiments in Automatic Statistical Thesaurus Construction,” Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1992, pp. 77-88. • [M95] Miller, G. A., “WordNet: A Lexical Database for English,” Communications of the ACM, Vol. 38, No. 11, November 1995, pp. 39-41. • [QF93] Qiu, Y. and Frei, H. P., “Concept Based Query Expansion,” Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1993, pp. 160-169. • [WBO00] Wei, J., Bressan, S., and Ooi, B. C., “Mining Term Association Rules for Automatic Global Query Expansion: Methodology and Preliminary Results,” Proceedings of the First International Conference on Web Information Systems Engineering, Volume 1, 2000, pp. 366-373.

  29. References (cont’d) • [X97] Xu, J., “Solving the Word Mismatch Problem Through Automatic Text Analysis,” PhD Thesis, University of Massachusetts at Amherst, 1997. • [XC96] Xu, J. and Croft, W. B., “Query Expansion Using Local and Global Document Analysis,” Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1996, pp. 4-11. • [XC00] Xu, J. and Croft, W. B., “Improving the Effectiveness of Information Retrieval with Local Context Analysis,” ACM Transactions on Information Systems, Volume 18, Issue 1, 2000, pp. 79-112. • [Y02] Yang, C., “Investigation of Term Expansion on Text Mining Techniques,” Master’s Thesis, National Sun Yat-sen University, Taiwan, 2002.
