
Query Dependent Ranking Using K-Nearest Neighbor





  1. Query Dependent Ranking Using K-Nearest Neighbor SIGIR 2008 Presenter: Hsuan-Yu Lin

  2. Outline • Introduction • Motivation • Method • Experiments • Conclusion

  3. Introduction • Machine learning techniques have been proposed for ranking in information retrieval • In web search, queries vary widely in semantics and in users’ intentions • Goal • Propose a query-dependent ranking approach • Use K-Nearest Neighbor (KNN) to find training queries similar to the test query for model training

  4. Motivation • A single ranking model alone cannot handle all kinds of queries properly • A query classification approach is not used, because it is hard to draw clear boundaries between queries in different categories

  5. Motivation • Reduce the query feature space from 27 dimensions to 2 using Principal Component Analysis
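The dimensionality reduction mentioned above can be sketched as follows; this is an illustrative PCA implementation, not the paper's actual code, and the 27-dimensional input is just the feature count quoted on the slide.

```python
import numpy as np

def pca_2d(X):
    """Project feature vectors onto their top-2 principal components.

    X: (n_queries, n_features) array, e.g. 27 query features per the slide.
    Returns an (n_queries, 2) array suitable for a 2-D scatter plot.
    """
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # feature covariance matrix
    vals, vecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    top2 = vecs[:, -2:][:, ::-1]             # two largest components, largest first
    return Xc @ top2                         # project onto the 2-D subspace
```

Projecting to two dimensions this way preserves the directions of greatest variance, which is why clusters of similar queries remain visible in the plot.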

  6. Motivation • Why use the KNN approach? • With high probability, a query belongs to the same category as its neighbors • KNN can be viewed as performing ‘soft’ classification in the query feature space

  7. KNN Online • Define the query feature: • For each query q, use a reference model (BM25) to retrieve its top-T ranked documents, and take the mean of those T documents’ feature vectors as the feature of the query
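The query-feature definition on this slide can be sketched directly; the array names below are hypothetical, but the computation (mean of the top-T documents' features under BM25) follows the slide.

```python
import numpy as np

def query_feature(doc_features, bm25_scores, T=50):
    """Represent a query by the mean feature vector of its top-T documents
    under a reference model such as BM25 (slide 7; T=50 per slide 12).

    doc_features: (n_docs, n_features) feature matrix for the query's documents
    bm25_scores:  (n_docs,) reference-model scores
    """
    top = np.argsort(bm25_scores)[::-1][:T]   # indices of the top-T documents
    return doc_features[top].mean(axis=0)     # mean feature vector = query feature
```

This maps every query into the same feature space as its documents, which is what makes nearest-neighbor search over queries possible.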

  8. KNN Online • Steps (b) and (c) cost too much time to run online for every test query
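The online procedure can be sketched as below. This is a hedged reconstruction from the slides, with hypothetical names: `train_ranker` stands in for the actual learning algorithm (Ranking SVM per slide 12), and the data layout is assumed, not given.

```python
import numpy as np

def knn_online_rank(test_qf, train_qfs, train_data, test_docs, k, train_ranker):
    """Online KNN ranking, per the slides:
      (a) find the k training queries nearest to the test query
          in the query feature space,
      (b) train a local ranking model on those neighbors' labeled data
          (expensive to do online),
      (c) score the test query's documents with that model.

    train_ranker: callable taking a list of training items and
                  returning a scoring function over document features.
    """
    dists = np.linalg.norm(train_qfs - test_qf, axis=1)       # (a) distances
    neighbors = np.argsort(dists)[:k]                         # (a) k nearest
    model = train_ranker([train_data[i] for i in neighbors])  # (b) local training
    return model(test_docs)                                   # (c) scoring
```

Because step (b) retrains a model per test query, this version is too slow in practice, which motivates the offline variants on the next slides.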

  9. KNN Offline-1

  10. KNN Offline-2

  11. Time Complexity • n: number of documents to be ranked for the test query • k: number of nearest neighbors • m: number of queries in the training data • Most of the time is usually spent on model training

  12. Experiments • Dataset • Two datasets • Dataset 1: 1,500 training queries, 400 test queries • Dataset 2: 3,000 training queries, 800 test queries • Labels: 0~4 (perfect, excellent, good, fair, bad) • Features: 200 • Learning approach • Rank-SVM • Parameters • λ: 0.01 • T (top-T documents per query): 50 • k: 400 (dataset 1), 800 (dataset 2) • Evaluation measure • NDCG
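Since NDCG is the evaluation measure, a minimal sketch of it may help; this uses the common (2^rel − 1)/log2(rank + 1) gain/discount formulation, and the paper's exact variant may differ in details such as the cutoff handling.

```python
import numpy as np

def ndcg_at_k(labels, k=10):
    """NDCG@k for one ranked list.

    labels: graded relevance labels in ranked order,
            e.g. 4 = perfect ... 0 = bad (slide 12's five grades).
    """
    labels = np.asarray(labels, dtype=float)

    def dcg(ls):
        ls = ls[:k]                                  # truncate at rank k
        discounts = np.log2(np.arange(2, ls.size + 2))
        return ((2 ** ls - 1) / discounts).sum()     # gain / positional discount

    ideal = dcg(np.sort(labels)[::-1])               # DCG of the ideal ordering
    return dcg(labels) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list scores 1.0; placing a highly relevant document low in the ranking pulls the score below 1.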

  13. Experiments • Baselines • Single: single-model approach • QC: query-classification-based approach • Classifies queries into three categories (topic distillation, named page finding, homepage finding)

  14. Experiments • Dataset 1:

  15. Experiments • Y-axis: change ratio between the online and offline methods • A small change ratio means the two result sets have a large overlap
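The slide does not give the change ratio's formula; one plausible set-based definition consistent with "small ratio means large overlap" is the fraction of elements the two sets do not share. This is an assumption, not the paper's definition.

```python
def change_ratio(online_set, offline_set):
    """Hypothetical change ratio between two result sets:
    the fraction of their union that lies outside the intersection.
    0.0 = identical sets, 1.0 = completely disjoint sets.
    """
    a, b = set(online_set), set(offline_set)
    union = a | b
    return len(union - (a & b)) / len(union) if union else 0.0
```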

  16. Conclusion • Use different ranking models for queries with different properties • A K-Nearest Neighbor approach is proposed for selecting training data • Future work • The complexity of offline processing is still high • Use KD-trees or other advanced structures for nearest-neighbor search • Better query feature definitions
