
Enhancing Relevance Feedback in Document Retrieval: A Comparative Study of Section Queries

DESCRIPTION

This study investigates the effectiveness of section-based relevance feedback in improving document retrieval. We compare the performance of section queries against traditional retrieval methods using a test collection of over 1.3 million research papers. By evaluating relevance judgments for user queries and the overlap between the returned result sets, we show how marking a significant sub-section as relevant can lead to more relevant results. Our findings aim to provide insights into optimizing search processes and into the practical significance of refined feedback techniques in information retrieval systems.


Presentation Transcript


  1. Section Based Relevance Feedback Student: Nat Young Supervisor: Prof. Mark Sanderson

  2. Relevance Feedback • Search engine (SE) user marks document(s) as relevant • E.g. “find more like this” • Terms are extracted from the full document • The whole document may not be relevant • Could marking a sub-section as relevant be better?
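The contrast on this slide can be sketched in a few lines. This is a toy illustration, not the system's actual term-weighting: `feedback_terms` simply takes the most frequent terms of whatever text the user marked, so feeding it a marked sub-section keeps the expanded query on topic, while feeding it the whole document can pull in terms from irrelevant sections.

```python
from collections import Counter

def feedback_terms(text, k=5):
    """Pick the k most frequent terms from marked-relevant text as
    feedback terms (a toy stand-in for real term-weighting schemes)."""
    tokens = [t.lower() for t in text.split() if t.isalpha()]
    return [term for term, _ in Counter(tokens).most_common(k)]

# Marking only the relevant sub-section keeps the feedback terms on topic;
# extracting from the whole document may also pull in off-topic sections.
section = "neural ranking models improve retrieval with relevance feedback"
print(feedback_terms(section, k=3))
```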

  3. Test Collections • Simulate a real user’s search process • Submit queries in batch mode • Evaluate the result sets • Relevance Judgments • QREL: <topicId, docId> pairs (1 … n) • Traditionally produced by human assessors
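Batch evaluation against QRELs reduces to set membership tests: with the judgments stored as ⟨topicId, docId⟩ pairs, each run is scored without a human in the loop. A minimal sketch (topic and document IDs are made up for illustration):

```python
# QRELs as a set of (topicId, docId) pairs; IDs here are hypothetical.
qrels = {("T1", "D3"), ("T1", "D7"), ("T2", "D1")}

def relevant_at_k(topic, ranked_docs, k=20):
    """Count how many of the top-k results are judged relevant."""
    return sum((topic, doc) in qrels for doc in ranked_docs[:k])

# A batch "run": each topic's ranked result list from the search engine.
run = {"T1": ["D3", "D9", "D7", "D2"], "T2": ["D5", "D4"]}
for topic, docs in run.items():
    print(topic, relevant_at_k(topic, docs))
```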

  4. Building a Test Collection • Documents • 1,388,939 research papers • Stop words removed • Porter Stemmer applied • Topics • 100 random documents • Their sub-sections (6 per document)
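The document preprocessing on this slide (stop-word removal, then stemming) can be sketched as below. The stop-word list is a tiny hypothetical sample, and `crude_stem` is a deliberately simplified suffix stripper standing in for the actual Porter stemmer the collection used:

```python
# Tiny illustrative stop-word list; real systems use a much longer one.
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to"}

def crude_stem(word):
    """Very rough suffix stripper; the real pipeline used the Porter stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, drop stop words, stem the rest."""
    tokens = [t.lower() for t in text.split() if t.isalpha()]
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("Stemming reduces the indexed terms"))
```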

  5. Building a Test Collection • In-edges • Documents that cite paper X • Found 943 using the CiteSeerX database • Out-edges • Documents cited by paper X • Found 397 using pattern matching on titles
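In-edges and out-edges fall directly out of a list of citation pairs. A sketch with made-up paper IDs (the real in-edges came from the CiteSeerX database, the out-edges from title pattern matching):

```python
from collections import defaultdict

# Citation pairs (citing, cited); paper IDs are hypothetical.
edges = [("P1", "X"), ("P2", "X"), ("X", "P3"), ("X", "P4"), ("P2", "P3")]

in_edges = defaultdict(set)   # in_edges[p]  = papers that cite p
out_edges = defaultdict(set)  # out_edges[p] = papers that p cites
for citing, cited in edges:
    in_edges[cited].add(citing)
    out_edges[citing].add(cited)

print(sorted(in_edges["X"]))   # papers citing X
print(sorted(out_edges["X"]))  # papers cited by X
```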

  6. QRELs • Total • 1,340 QRELs • Avg. 13.4 QRELs per document • Previous work: • Anna Ritchie et al. (2006) • 82 Topics, Avg. 11.4 QRELs • 196 Topics, Avg. 4.5 QRELs • Last year • 71 Topics, Avg. 2.9 QRELs

  7. Section Queries • RQ1 Do the sections return different results?

  8. Section Queries • RQ2 Do the sections return different relevant results? Avg. = the average number of relevant results returned @ 20. E.g. Abstract queries returned on average 2 relevant (QREL) results

  9. Section Queries Average intersection sizes of relevant results E.g. Avg(|Abstract ∩ All|) = 0.63 Avg(|Abstract \ All|) = 1.37 100 - ((0.63 / 2) * 100) = 68.5% difference
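The percentage on this slide follows from the two averages: if Abstract queries return 2 relevant results on average and 0.63 of those on average also appear in the All-sections results, then 1.37 are unique to the Abstract queries, i.e. 68.5% of them. A quick check of that arithmetic:

```python
def pct_difference(avg_intersection, avg_relevant):
    """Share (%) of a section's relevant results NOT shared with the
    comparison result set: 100 - (|A ∩ B| / |A|) * 100."""
    return 100 - (avg_intersection / avg_relevant) * 100

# Slide's numbers: Abstract averages 2 relevant results, of which 0.63
# also appear in the All-sections results, leaving 1.37 unshared.
print(round(pct_difference(0.63, 2), 1))
```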

  10. Section Queries Average set complement % of relevant results E.g. Section X returned n% of its relevant results that section Y did not return

  11. Next • Practical Significance • Does SRF provide benefits over standard RF?
