Boosting the Ranking Function Learning Process using Clustering
This paper discusses an innovative approach to improve the training input of ranking function learning systems by utilizing clustering. With the ever-growing web, search engines yield excessive results that are challenging for users to sift through. The study focuses on deriving user feedback—both explicit and implicit—more efficiently by clustering search results based on similarity. By expanding relevance judgments from a limited set of top results using clustering methods, the research aims to automate user feedback generation, ultimately enhancing the search experience.
Presentation Transcript
Boosting the Ranking Function Learning Process using Clustering WIDM 2008
Outline • Introduction • Problem definition • Approach • Evaluation • Conclusion
Introduction • Abstract • As the Web continuously grows, search engines return too many results for users to review • User feedback has gained a lot of attention as training input for ranking functions • Learning a ranking function requires a large amount of user feedback on the results • Goal: • Produce user feedback “automatically” by clustering search results and expanding a small set of relevance judgements
Problem definition • User feedback • Explicit feedback (user relevance judgement) • Implicit feedback • Click information • Users usually inspect only the first few results returned by a search engine, and click even fewer • Collecting relevance judgements from clickthrough data is a time-consuming process • Problem • How to use explicit feedback to generate implicit feedback? (relevance relations expansion)
Approach procedure • Process • Assume that only the relevance judgements of the top-10 results are available for each query (ranked by the BM25 feature) • Group all the search results into clusters of documents with similar content • Expand the initial set (top-10 results) of relevance judgements using cluster information
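The first step of the procedure, picking the top-10 results per query by BM25, can be sketched as follows. This is a minimal illustration rather than the authors' setup: the slides only name the BM25 feature, so the parameter values `k1=1.2` and `b=0.75`, the IDF variant, and the function names `bm25_scores` and `top_k` are all assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    k1 and b are common default values, assumed here; the slides do not give them.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Non-negative IDF variant (an assumption, one of several in use).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def top_k(query, docs, k=10):
    """Return the indices of the k highest-scoring documents, best first."""
    scores = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

Under the slide's assumption, only the documents returned by `top_k` would carry relevance judgements; every other result for the query starts out unlabeled.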
Clustering • Represent each document by a feature vector • Dimension: total number of distinct terms in all documents • Clustering method • Bisection clustering • Similarity • Cosine similarity
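A rough sketch of this clustering step is given below. It assumes raw term counts as vector entries and reads the slide's clustering method as repeated bisection, where the largest cluster is split in two by a small 2-means loop under cosine similarity. The seeding rule, the iteration count, and all function names are illustrative assumptions, not details taken from the slides.

```python
import math
from collections import Counter

def to_vector(tokens, vocab):
    """Term-count vector with one dimension per distinct term in the collection."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def bisect(vectors, members, iters=10):
    """Split one cluster in two; returns (members, []) if no split is found."""
    # Deterministic seeding (an assumption): the first member and the
    # member least similar to it.
    a = members[0]
    b = min(members, key=lambda i: cosine(vectors[a], vectors[i]))
    centers = [vectors[a], vectors[b]]
    split = (members, [])
    for _ in range(iters):
        left = [i for i in members
                if cosine(vectors[i], centers[0]) >= cosine(vectors[i], centers[1])]
        right = [i for i in members if i not in left]
        if not left or not right:
            break
        split = (left, right)
        centers = [centroid([vectors[i] for i in left]),
                   centroid([vectors[i] for i in right])]
    return split

def bisecting_clustering(vectors, k):
    """Repeatedly bisect the largest cluster until k clusters exist."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len, reverse=True)
        left, right = bisect(vectors, clusters[0])
        if not right:  # the largest cluster could not be split further
            break
        clusters = clusters[1:] + [left, right]
    return clusters
```

For example, building `vocab` as the sorted set of all tokens and calling `bisecting_clustering([to_vector(d, vocab) for d in docs], k)` yields lists of document indices, one list per cluster.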
Relation expansion • [Figure: train query and train query expansion]
Relation expansion • Expansion Algorithm:
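The algorithm itself is not reproduced in this transcript. The sketch below is therefore not the authors' algorithm; it only illustrates the general idea of expanding judgements through clusters, under a hypothetical rule: each unjudged document inherits the majority relevance grade of the judged documents in its cluster, ties go to the lower grade, and clusters without any judged document are left unlabeled. The name `expand_judgements` is made up for the example.

```python
from collections import Counter

def expand_judgements(clusters, judged):
    """Propagate relevance grades from judged documents to their cluster mates.

    clusters: list of lists of document ids
    judged:   dict mapping document id -> relevance grade (e.g. 0, 1, 2)
    The majority-vote rule is an assumption made for illustration.
    """
    expanded = dict(judged)
    for members in clusters:
        labels = [judged[d] for d in members if d in judged]
        if not labels:
            continue  # no judged document in this cluster: nothing to propagate
        counts = Counter(labels)
        best = max(counts.values())
        # Tie-break toward the lower grade (a conservative, assumed choice).
        label = min(g for g, c in counts.items() if c == best)
        for d in members:
            expanded.setdefault(d, label)  # never overwrite an original judgement
    return expanded
```

With clusters `[["d1", "d2", "d3"], ["d4", "d5"], ["d6"]]` and judgements for `d1`, `d2` and `d4` only, `d3` and `d5` receive labels from their clusters while `d6` stays unjudged.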
Evaluation • Dataset • LETOR OHSUMED collection • 348,566 records and 16,140 relevance judgements • 84 training queries and 22 testing queries • Relevance judgement • 0 (irrelevant), 1 (partially relevant), 2 (strongly relevant) • Training method • RankSVM
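RankSVM learns from pairwise preferences rather than from the graded judgements directly. As a minimal sketch of how the 0/1/2 grades for one query could be turned into such training pairs, assuming each document is already a numeric feature vector, consider the helper below. The name `pairwise_examples` is hypothetical, and the SVM training itself is not shown.

```python
def pairwise_examples(features, labels):
    """Build (difference_vector, target) pairs for the documents of one query.

    A pair is created only when two documents have different relevance grades;
    target is +1 if the first document is more relevant, otherwise -1.
    """
    examples = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if labels[i] == labels[j]:
                continue  # equal grades carry no preference
            diff = [a - b for a, b in zip(features[i], features[j])]
            target = 1 if labels[i] > labels[j] else -1
            examples.append((diff, target))
    return examples
```

Pairs are formed within a single query only. Expanding the set of judged documents per query therefore increases the number of usable preference pairs, which is the sense in which the approach increases the training input.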
Evaluation • Clustering precision
Evaluation • Use 160 relevance judgements
Conclusion • We presented a methodology for increasing the training input of ranking function learning systems • Future work • Deciding whether a cluster is valid • Different ways of labeling clusters