
Presentation Transcript


  1. Learning to rank. Web Science 2013. Jaspreet Singh

  2. Overview • Optimizing search engines using clickthrough data. Thorsten Joachims, SIGKDD 2002. • Large scale learning to rank. D. Sculley. (Slide diagram: a machine learning algorithm produces a retrieval function.)

  3. Optimizing search engines using clickthrough data • Explicit feedback vs. clickthrough data: clicks are available in abundance at no extra cost to the user, but only as implicit feedback. • Clickthrough data is recorded as triplets (q, r, c): q is the query, r is the ranking presented to the user, and c is the set of links the user clicked on. • Assuming the user scanned the ranking from top to bottom, a click on link 3 but not on link 2 means the user observed link 2 and decided not to click it, so link 3 is preferred over link 2 (see the sketch below).
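
A minimal sketch of this preference-extraction rule, assuming the triplet representation described above (the function name and 0-based positions are illustrative, not from the paper):

```python
# Joachims-style preference extraction from one clickthrough record:
# for each clicked position i, every non-clicked document ranked above i
# was observed and skipped, yielding the preference "clicked > skipped".

def preference_pairs(ranking, clicked_ranks):
    """ranking: doc ids in rank order; clicked_ranks: 0-based clicked positions."""
    pairs = []
    for i in sorted(clicked_ranks):
        for j in range(i):                 # documents scanned before position i
            if j not in clicked_ranks:     # observed but skipped
                pairs.append((ranking[i], ranking[j]))  # ranking[i] preferred
    return pairs

# User clicked ranks 0 and 2: the click at rank 2 implies a preference
# over the skipped document at rank 1.
print(preference_pairs(["d1", "d2", "d3"], {0, 2}))  # [('d3', 'd2')]
```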

  4. Learning of retrieval functions • Learning the exact ordering of all documents is close to impossible. • Instead, measure the similarity between the optimal ordering and the given ordering using Kendall's τ, which also yields a lower bound on average precision: for two rankings, τ = (P − Q)/(P + Q), where P is the number of concordant document pairs and Q the number of discordant ones (see the sketch below). • Maximizing Kendall's τ corresponds to lowering the average rank of the relevant documents. • For a fixed but unknown distribution Pr(q, r∗) of queries and target rankings over a document collection D with m documents, the goal is to learn a retrieval function f(q) for which the expected Kendall's τ, τP(f) = ∫ τ(r_f(q), r∗) d Pr(q, r∗), is maximal. • This is equivalent to minimizing a risk functional in which −τ is the loss. • The empirical risk minimization principle states that the learning algorithm should choose the hypothesis that minimizes the empirical risk on the training sample.
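
A short, self-contained sketch of the Kendall's τ computation used as the similarity measure (helper names are mine):

```python
# Kendall's tau between two strict rankings of the same documents:
# tau = (P - Q) / (P + Q), with P the concordant and Q the discordant pairs.
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    pos_a = {d: i for i, d in enumerate(rank_a)}
    pos_b = {d: i for i, d in enumerate(rank_b)}
    concordant = discordant = 0
    for d1, d2 in combinations(rank_a, 2):
        # A pair is concordant when both rankings order it the same way.
        if (pos_a[d1] - pos_a[d2]) * (pos_b[d1] - pos_b[d2]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

print(kendall_tau(["d1", "d2", "d3"], ["d1", "d3", "d2"]))  # 0.33: 1 discordant pair
```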

  5. Rank SVM • Is it possible to design an algorithm and a family of ranking functions F so that finding the f ∈ F maximizing τ is efficient, and so that this function generalizes well beyond the training data? • A weight vector w defines the ranking: documents are ordered by w · Φ(q, d), where Φ(q, d) is a feature vector describing the match between query q and document d. • Instead of maximizing τ directly, it is equivalent to minimize the number of discordant pairs in the calculation of τ. This amounts to finding the weight vector that fulfils the maximum number of inequalities of the form w · Φ(q, di) > w · Φ(q, dj), one for every pair where di should rank above dj.

  6. Rank SVM • Finding the weight vector that fulfils the maximum number of these inequalities is NP-hard. • As in SVM classification, the problem is made tractable by introducing slack variables and a regularization parameter that trades off margin against training error, giving an approximate solution. • Implemented in SVM-light (see the pairwise-classification sketch below).
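
The reduction behind Rank SVM can be sketched as pairwise classification: each preference becomes a difference of feature vectors with label +1. The sketch below uses scikit-learn's LinearSVC in place of SVM-light, so it mirrors the idea rather than the paper's actual tooling; the synthetic features are placeholders:

```python
# Rank SVM as classification on feature differences: a preference
# "d_i over d_j for query q" becomes the example (phi_i - phi_j, +1),
# plus the mirrored pair (phi_j - phi_i, -1), trained with hinge loss.
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_dataset(feature_pairs):
    """feature_pairs: list of (phi_preferred, phi_other) vectors."""
    X, y = [], []
    for phi_i, phi_j in feature_pairs:
        X.append(phi_i - phi_j); y.append(+1)
        X.append(phi_j - phi_i); y.append(-1)
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
# Synthetic preferences: preferred documents get slightly larger features.
pairs = [(rng.normal(size=5) + 1.0, rng.normal(size=5)) for _ in range(50)]
X, y = pairwise_dataset(pairs)
model = LinearSVC(loss="hinge", C=1.0).fit(X, y)  # C: margin/error trade-off
w = model.coef_[0]  # rank documents for a query by w . phi(q, d)
```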

  7. Experiments • A meta search engine collects results from several of the best search engines and combines them into a single list by union. • To compare the quality of two retrieval functions, the key idea is to present both rankings at the same time, interleaved into one combined list, and then measure which ranking attracts more clicks (a toy version follows below):
Ranking A: D1, D2, D3
Ranking B: D4, D5, D6
Combined (union): D1, D4, D2, D5, D3, D6
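
A toy version of the combined-ranking presentation, assuming simple alternating interleaving (the paper's exact tie-breaking and depth rules are glossed over here):

```python
# Interleave two rankings so both systems are shown simultaneously;
# clicks on each side's documents then compare the two retrieval functions.
def interleave(ranking_a, ranking_b):
    combined, seen = [], set()
    for a, b in zip(ranking_a, ranking_b):
        for doc in (a, b):
            if doc not in seen:   # union: keep each document once
                combined.append(doc)
                seen.add(doc)
    return combined

print(interleave(["D1", "D2", "D3"], ["D4", "D5", "D6"]))
# ['D1', 'D4', 'D2', 'D5', 'D3', 'D6']
```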

  8. Experiments • Offline experiment: verify that the Ranking SVM can indeed learn a retrieval function maximizing Kendall's τ on partial preference feedback. • The collected queries are split into a training and a test set, and the classifier is trained using SVM-light. • Result: the Ranking SVM learns regularities in the preferences; the more training queries, the lower the error. • Online experiment: verify that the learned retrieval function improves retrieval quality as desired. • The learned retrieval function is compared against Google, MSNSearch, and Toprank. • Result: users clicked on more links from the learned ranking.

  9. Conclusion • The key insight is that clickthrough data can provide training data in the form of relative preferences. • The experimental results show that the Ranking SVM can successfully learn an improved retrieval function from clickthrough data. Without any explicit feedback or manual parameter tuning, it automatically adapted to the particular preferences of a group of 20 users (112 queries). • There is a trade-off between the amount of training data (i.e. a large group of users) and maximum homogeneity (i.e. a single user).

  10. Overview • Optimizing search engines using clickthrough data. Thorsten Joachims, SIGKDD 2002. • Large scale learning to rank. D. Sculley. (Slide diagram: a machine learning algorithm produces a retrieval function.)

  11. Large scale learning to rank • Pair-wise learning-to-rank methods such as Rank SVM give good performance, but suffer from the computational burden of optimizing an objective defined over O(n²) possible pairs for a data set with n examples. • The super-linear dependence on training set size is removed by sampling pairs from an implicit pair-wise expansion and applying efficient stochastic gradient descent learners for approximate SVMs. • The main approach of the paper is to cast the pair-wise learning-to-rank problem in the stochastic gradient descent framework.

  12. Optimization and stochastic gradient descent • The paper restricts itself to the classic Rank SVM optimization problem, first posed by Joachims: minimize the hinge loss over preference pairs (one common formulation is written out below). • Stochastic gradient descent (SGD) is a gradient-based optimization method for minimizing an objective function written as a sum of differentiable (or sub-differentiable) functions, stepping on one randomly chosen summand at a time. • The generalization ability of stochastic gradient descent depends only on the number of stochastic steps taken, not on the size of the data set.
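
For reference, one standard way to write this Rank SVM objective, over a set P of preference pairs with feature map Φ and regularization weight λ (notation assumed here, not copied from the slides):

```latex
\min_{w}\;\frac{\lambda}{2}\,\lVert w \rVert^{2}
  + \frac{1}{|P|}\sum_{(d_i, d_j) \in P}
    \max\bigl(0,\; 1 - w \cdot \bigl(\Phi(q, d_i) - \Phi(q, d_j)\bigr)\bigr)
```

SGD applies because each summand contributes a sub-gradient, so a single sampled pair gives an unbiased estimate of the full gradient.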

  13. Indexed sampling - GetRandomPair • A two-level nested hashmap indexes the training data • First level: the query is the key • Second level: the relevance rank (label) is the key • This lets a random candidate pair for one query be drawn in constant time, without materializing the full pairwise expansion (see the sketch below).
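
A rough sketch of such an index and pair sampler, assuming each sampled query has at least two distinct relevance labels (the structure follows the slide's description; names and details are not Sculley's actual code):

```python
# Two-level index: query -> relevance label -> list of feature vectors.
# Drawing a random preference pair then takes O(1) instead of touching
# the O(n^2) explicit pairwise expansion.
import random
from collections import defaultdict

def build_index(examples):
    """examples: iterable of (query, label, features) training records."""
    index = defaultdict(lambda: defaultdict(list))
    for query, label, features in examples:
        index[query][label].append(features)
    return index

def get_random_pair(index):
    query = random.choice(list(index))
    hi, lo = random.sample(list(index[query]), 2)  # two distinct labels
    if hi < lo:
        hi, lo = lo, hi
    # (preferred, non-preferred) feature vectors for the same query.
    return random.choice(index[query][hi]), random.choice(index[query][lo])

idx = build_index([("q1", 2, [1.0, 0.5]), ("q1", 1, [0.2, 0.1]),
                   ("q1", 0, [0.0, 0.0])])
pref, other = get_random_pair(idx)
```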

  14. Stochastic gradient descent • "Stochastic" refers to sampling: each update uses one randomly drawn pair rather than the full data set. • Gradient descent is a step-wise process for finding a local minimum of a function. • Rank SVM has a hinge loss function; the hinge loss is the loss used for "maximum-margin" classification. • Hence we need to minimize this function to obtain a good classifier. • The hinge loss is a convex function, so many of the usual convex optimizers used in machine learning work with it, and SGD in particular applies. • There are many SGD variants, differing in how they perform updates to the weight vector (a Pegasos-style variant is sketched below).
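
A minimal Pegasos-style SGD loop for the pairwise hinge loss, one of the update-rule variants alluded to above (the step-size schedule and the sampler interface are my simplifications, not the paper's exact algorithm):

```python
# One SGD step: sample a preference pair, form x = phi_pref - phi_other,
# shrink w by the L2-regularization gradient, and if the margin w.x < 1
# is violated, step along the hinge-loss sub-gradient.
import numpy as np

def sgd_rank_svm(sample_pair, dim, steps=10_000, lam=1e-4):
    w = np.zeros(dim)
    for t in range(1, steps + 1):
        phi_pref, phi_other = sample_pair()
        x = phi_pref - phi_other
        eta = 1.0 / (lam * t)        # decaying learning rate
        w *= 1.0 - eta * lam         # gradient of (lam/2) * ||w||^2
        if w @ x < 1.0:              # hinge active: margin violated
            w += eta * x
    return w

rng = np.random.default_rng(0)
sampler = lambda: (rng.normal(size=5) + 1.0, rng.normal(size=5))
w = sgd_rank_svm(sampler, dim=5)     # learned ranking weights
```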

  15. LETOR experiments and results • LETOR: a benchmark collection for Learning to Rank for Information Retrieval • Ranking performance: comparable to the baselines, if not better • Training speed: 100 times faster than batch Rank SVM training

  16. Conclusion • Clickthrough data can be used as partial relevance feedback. • We can learn a retrieval function that improves mean average precision. • Learning retrieval functions can be done at large scale using stochastic gradient descent. (Slide diagram: a machine learning algorithm produces a retrieval function.)
