
A fast algorithm for learning large scale preference relations




  1. A fast algorithm for learning large scale preference relations • Vikas C. Raykar and Ramani Duraiswami • University of Maryland, College Park • Balaji Krishnapuram • Siemens Medical Solutions, USA • AISTATS 2007

  2. Learning • Many learning tasks can be viewed as function estimation.

  3. Learning from examples • Not all supervised learning procedures fit the standard classification/regression framework. • In this talk we are mainly concerned with ranking/ordering. • [Slide diagram: training data → learning algorithm → learned function]

  4. Ranking / Ordering • For some applications ordering is more important. • Example 1: Information retrieval – sort in the order of relevance.

  5. Ranking / Ordering • For some applications ordering is more important. • Example 2: Recommender systems – sort in the order of preference.

  6. Ranking / Ordering • For some applications ordering is more important. • Example 3: Medical decision making – decide among different treatment options.

  7. Plan of the talk • Ranking formulation • Algorithm • Fast algorithm • Results

  8. Preference relations • Goal – learn a preference relation. • Training data – a set of pairwise preferences. • Given a preference relation we can order/rank a set of instances.

  9. Ranking function • A ranking function provides a numerical score; it is not unique. • Goal – learn a preference relation. New goal – learn a ranking function. • Why not use a classifier/ordinal regressor as the ranking function?
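A minimal sketch (not from the talk) of how a linear ranking function f(x) = w^T x induces an ordering; the array names and toy numbers below are made up for illustration.

```python
import numpy as np

def rank_instances(X, w):
    """Order instances by the score of a linear ranking function f(x) = w^T x.

    X : (n, d) array of feature vectors, w : (d,) weight vector.
    Returns instance indices sorted from highest to lowest score.
    """
    scores = X @ w                # numerical score for each instance
    return np.argsort(-scores)    # any order-preserving transform of f gives the same ranking

# toy usage
X = np.array([[1.0, 0.5], [0.2, 2.0], [0.0, 0.1]])
w = np.array([1.0, 1.0])
print(rank_instances(X, w))       # -> [1 0 2]
```

Because only the induced ordering matters, the ranking function is not unique: scaling w by any positive constant yields the same ranking.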

  10. Why is ranking different? • [Slide diagram: training data in the form of pairwise preference relations is fed to a learning algorithm; the loss is measured by pairwise disagreements.]

  11. Training data, more formally • From these two we can get a set of pairwise preference relations.

  12. Loss function • Minimize the fraction of pairwise disagreements, i.e., maximize the fraction of pairwise agreements: (total # of pairwise agreements) / (total # of pairwise preference relations). • This is the generalized Wilcoxon-Mann-Whitney (WMW) statistic.
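A short sketch of the statistic as stated above (variable names are my own): given scores and a set of pairwise preferences, count the fraction of pairs the scores agree with.

```python
def wmw_statistic(scores, preference_pairs):
    """Generalized Wilcoxon-Mann-Whitney statistic.

    scores           : sequence of ranking-function values, one per instance.
    preference_pairs : iterable of (i, j) pairs meaning instance i is preferred over instance j.
    Returns the fraction of pairwise preference relations that the scores agree with.
    """
    pairs = list(preference_pairs)
    agreements = sum(1 for i, j in pairs if scores[i] > scores[j])
    return agreements / len(pairs)
```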

  13. Consider a two-class problem • [Slide figure: a sequence of positive (+) and negative (−) examples.]

  14. Function class: linear ranking functions • Different algorithms use different function classes • RankNet – neural network • RankSVM – RKHS • RankBoost – boosted decision stumps

  15. Plan of the talk • Ranking formulation • Training data – Pairwise preference relations • Ideal Loss function – WMW statistic • Function class – linear ranking functions • Algorithm • Fast algorithm • Results

  16. The likelihood • Directly maximizing the WMW statistic is a discrete optimization problem. • Instead, model each pairwise preference with a sigmoid [Burges et al.] and, assuming every pair is drawn independently, choose w to maximize the log-likelihood.
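A sketch of that log-likelihood for a linear ranking function, assuming each preferred pair (i, j) is modelled as sigma(w^T(x_i - x_j)) with the logistic sigmoid; the function and variable names are mine, not the talk's.

```python
import numpy as np

def log_likelihood(w, X, preference_pairs):
    """Pairwise log-likelihood under a sigmoid model of preferences.

    Each pair (i, j) means instance i should be ranked above instance j;
    pairs are assumed to be drawn independently.
    """
    diffs = np.array([X[i] - X[j] for i, j in preference_pairs])   # pairwise feature differences
    z = diffs @ w
    # log sigma(z) = -log(1 + exp(-z)), computed stably with logaddexp
    return -np.sum(np.logaddexp(0.0, -z))
```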

  17. The MAP estimator

  18. Another interpretation • What we want to maximize – the 0-1 indicator function over pairs. • What we actually maximize – the log-sigmoid. • The log-sigmoid is a lower bound for the indicator function.

  19. Lower bounding the WMW Log-likelihood <= WMW
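Spelling the bound out in my own notation (the slide's formula itself is not reproduced in the transcript):

```latex
% Log-sigmoid lower bound on the 0-1 indicator, with sigma(z) = 1/(1+e^{-z}).
\[
  \log\sigma(z) \;\le\; \mathbf{1}[z > 0] \quad\text{for every } z ,
\]
% since for z > 0 we have log sigma(z) <= 0 <= 1, and for z <= 0 we have
% log sigma(z) <= log(1/2) < 0.  Summing over all preference pairs,
\[
  \sum_{(i,j)} \log\sigma\!\bigl(w^{\top}(x_i - x_j)\bigr)
  \;\le\;
  \sum_{(i,j)} \mathbf{1}\!\bigl[w^{\top}(x_i - x_j) > 0\bigr],
\]
% i.e. the log-likelihood lower-bounds the (unnormalized) WMW statistic.
```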

  20. Gradient based learning • Use a nonlinear conjugate-gradient algorithm. • Requires only gradient evaluations. • No function evaluations. • No second derivatives. • The gradient is given by [expression on the slide].
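The gradient expression itself is not reproduced in the transcript; for the sigmoid pairwise log-likelihood sketched above it would take the following form (again with my own variable names):

```python
import numpy as np

def log_likelihood_grad(w, X, preference_pairs):
    """Gradient of the pairwise log-sigmoid likelihood with respect to w."""
    diffs = np.array([X[i] - X[j] for i, j in preference_pairs])   # d_ij = x_i - x_j
    z = diffs @ w
    # d/dw log sigma(w^T d_ij) = (1 - sigma(z)) * d_ij = sigma(-z) * d_ij
    weights = 1.0 / (1.0 + np.exp(z))                              # sigma(-z)
    return diffs.T @ weights
```

This is the quantity a nonlinear CG routine (e.g. scipy.optimize.minimize with method='CG' and jac=...) consumes. Formed over all pairs explicitly it costs time quadratic in the number of instances, which is the bottleneck the fast algorithm below removes.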

  21. RankNet • [Slide diagram: training data – pairwise preference relations; loss – cross entropy; function class – neural network trained with backpropagation.]

  22. RankSVM • [Slide diagram: training data – pairwise preference relations; loss – pairwise disagreements; function class – SVM in an RKHS.]

  23. RankBoost • [Slide diagram: training data – pairwise preference relations; loss – pairwise disagreements; function class – boosted decision stumps.]

  24. Plan of the talk • Ranking formulation • Training data – Pairwise preference relations • Loss function – WMW statistic • Function class – linear ranking functions • Algorithm • Maximize a lower bound on WMW • Use conjugate-gradient • Quadratic complexity • Fast algorithm • Results

  25. Key idea • Use approximate gradient. • Extremely fast in linear time. • Converges to the same solution. • Requires a few more iterations.

  26. Core computational primitive Weighted summation of erfc functions
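The exact primitive is specified in the paper; schematically, the direct evaluation of a weighted sum of erfc functions at many target points looks like the following (the centring/scaling of the erfc argument is a placeholder, not the paper's exact expression):

```python
import numpy as np
from scipy.special import erfc

def erfc_sum_direct(x, y, q, h=1.0):
    """Directly evaluate E(y_j) = sum_i q_i * erfc((y_j - x_i) / h) for every target y_j.

    x : (N,) source points, q : (N,) weights, y : (M,) target points.
    Cost is O(M * N) -- the quadratic bottleneck the fast algorithm removes.
    """
    return np.array([np.sum(q * erfc((yj - x) / h)) for yj in y])
```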

  27. Notion of approximation

  28. Example

  29. 1. Beaulieu's series expansion • Retain only the first few terms contributing to the desired accuracy. • Derive bounds for this expansion to choose the number of terms.

  30. 2. Error bounds

  31. 3. Use truncated series

  32. 3. Regrouping • The coefficients A and B do not depend on y and can be computed in O(pN). • Once A and B are precomputed, the sums at all targets can be computed in O(pM). • Overall cost is reduced from O(MN) to O(p(M+N)).
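A schematic sketch of the regrouping step (not the paper's actual expansion): once the truncated series factorizes each term into an x-part times a y-part, the x-dependent coefficients can be accumulated once over the sources. The basis functions a_basis and b_basis below are placeholders standing in for the terms of the truncated erfc series.

```python
import numpy as np

def fast_sum(x, y, q, a_basis, b_basis, p):
    """Regrouped evaluation of sum_i q_i * sum_{k<p} a_k(x_i) * b_k(y_j) for every y_j.

    a_basis(k, x) and b_basis(k, y) are placeholder (hypothetical) basis functions
    coming from a truncated separable expansion of erfc.
    """
    # Precompute A_k = sum_i q_i * a_k(x_i): independent of y, costs O(pN).
    A = np.array([np.sum(q * a_basis(k, x)) for k in range(p)])
    # Evaluate E(y_j) = sum_k A_k * b_k(y_j) at all targets: costs O(pM).
    B = np.array([b_basis(k, y) for k in range(p)])   # shape (p, M)
    return A @ B                                       # total O(p(M+N)) instead of O(MN)
```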

  33. 3. Other tricks • Rapid saturation of the erfc function. • Space subdivision. • Choosing the parameters to achieve the error bound. • See the technical report.
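As an illustration of the first trick only (a naive sketch, without the space subdivision or the paper's parameter choices): erfc saturates quickly, so far-away sources can be replaced by the constants 0 or 2 instead of an erfc evaluation.

```python
import numpy as np
from scipy.special import erfc

def erfc_sum_saturated(x, y, q, h=1.0, r=6.0):
    """Exploit the rapid saturation of erfc: erfc(t) ~ 2 for t << 0 and ~ 0 for t >> 0.

    Sources with (y_j - x_i)/h > r contribute ~0, those with (y_j - x_i)/h < -r
    contribute ~2*q_i, and only the remaining nearby sources need an erfc call.
    The cutoff r is a placeholder; the paper chooses it from the error bound.
    """
    out = np.empty(len(y))
    for j, yj in enumerate(y):
        t = (yj - x) / h
        near = np.abs(t) <= r
        out[j] = 2.0 * q[t < -r].sum() + (q[near] * erfc(t[near])).sum()
    return out
```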

  34. Numerical experiments

  35. Precision vs Speedup

  36. Plan of the talk • Ranking formulation • Training data – Pairwise preference relations • Loss function – WMW statistic • Function class – linear ranking functions • Algorithm • Maximize a lower bound on WMW • Use conjugate-gradient • Quadratic complexity • Fast algorithm • Use fast approximate gradient • Fast summation of erfc functions • Results

  37. Datasets • 12 public benchmark datasets • Five-fold cross-validation experiments • CG tolerance 1e-3 • Accuracy for the gradient computation 1e-6

  38. Direct vs fast – WMW statistic • The WMW is similar for both the exact and the fast approximate version.

  39. Direct vs Fast – Time taken

  40. Effect of gradient approximation

  41. Comparison with other methods • RankNet - Neural network • RankSVM - SVM • RankBoost - Boosting

  42. Comparison with other methods • The WMW is comparable for all the methods. • The proposed method is faster than all the other methods. • RankBoost has the next best training time. • Only the proposed method can handle large datasets.

  43. Sample result

  44. Sample result

  45. Application to collaborative filtering • Predict movie ratings for a user based on the ratings provided by other users. • MovieLens dataset (www.grouplens.org) • 1 million ratings (1-5) • 3592 movies • 6040 users • Feature vector for each movie – the ratings provided by d other users
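A rough sketch of how such data might be assembled (the preprocessing details here are my own, not necessarily the talk's): each movie is described by the ratings given to it by d reference users, and the target user's own ratings induce pairwise preferences between movies.

```python
import numpy as np

def movie_features(ratings, reference_users):
    """ratings : (num_users, num_movies) array, 0 meaning 'not rated'.

    Returns a (num_movies, d) feature matrix: one feature vector per movie,
    consisting of the ratings given by the d chosen reference users.
    """
    return ratings[reference_users, :].T

def preference_pairs_from_ratings(user_ratings):
    """Pairs (i, j) such that the target user rated movie i higher than movie j."""
    rated = np.nonzero(user_ratings)[0]
    return [(i, j) for i in rated for j in rated if user_ratings[i] > user_ratings[j]]
```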

  46. Collaborative filtering results

  47. Collaborative filtering results

  48. Plan/Conclusion of the talk • Ranking formulation • Training data – Pairwise preference relations • Loss function – WMW statistic • Function class – linear ranking functions • Algorithm • Maximize a lower bound on WMW • Use conjugate-gradient • Quadratic complexity • Fast algorithm • Use fast approximate gradient • Fast summation of erfc functions • Results • Accuracy similar to other methods • But much faster

  49. Future work • Ranking formulation • Training data – Pairwise preference relations • Loss function – WMW statistic • Function class – linear ranking functions • Algorithm • Maximize a lower bound on WMW • Use conjugate-gradient • Quadratic complexity • Fast algorithm • Use fast approximate gradient • Fast summation of erfc functions • Results • Accuracy similar to other methods • But much faster • Other applications – neural networks, probit regression • Code coming soon

  50. Future work • Ranking formulation • Training data – Pairwise preference relations • Loss function – WMW statistic • Function class – linear ranking functions • Algorithm • Maximize a lower bound on WMW • Use conjugate-gradient • Quadratic complexity • Fast algorithm • Use fast approximate gradient • Fast summation of erfc functions • Results • Accuracy similar to other methods • But much faster • Nonlinear / kernelized variants • Other applications – neural networks, probit regression
