Combining Predictions for Accurate Recommender Systems



  1. Combining Predictions for Accurate Recommender Systems M. Jahrer1, A. Töscher1, R. Legenstein2 1Commendo Research & Consulting 2Institute for Theoretical Computer Science, Graz University of Technology KDD ‘10 2010. 11. 26. Summarized and Presented by Sang-il Song, IDS Lab., Seoul National University

  2. Contents • The Netflix Prize • Netflix Dataset • Challenge of Recommendation • Review: Collaborative Techniques • Motivation • Blending Techniques • Linear Regression • Binned Linear Regression • Neural Network • Bagged Gradient Boosted Decision Tree • Kernel Ridge Regression • K-Nearest Neighbor Blending • Results • Conclusion

  3. The Netflix Prize Open competition for the best collaborative filtering algorithm The objective is to improve the performance of Netflix’s own recommendation algorithm by 10%

  4. Netflix Dataset 480,189 users 17,770 movies 100,480,507 ratings (training data) Each rating is formed as <user, movie, date of grade, grade>

  5. Recommendation Problem

  6. Measure of CF Algorithm Error • Root Mean Square Error (RMSE): RMSE = sqrt( (1/N) Σ (r̂_ui − r_ui)² ) • r̂_ui is the rating estimated by the algorithm for user u and item i, r_ui is the observed rating • N is the size of the test dataset • The original Netflix algorithm, called “Cinematch”, achieved an RMSE of about 0.95
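For reference, a minimal sketch of the RMSE computation in Python (function and variable names are my own, not from the slides):

```python
import numpy as np

def rmse(predicted, actual):
    """Root Mean Square Error over a test set of N ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Example: three predicted vs. true ratings
print(rmse([3.5, 4.2, 2.9], [4, 4, 3]))  # ≈ 0.32
```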

  7. Challenges of Recommender Systems Reference: R. Bell – Lessons From the Netflix Prize • Size of the data • Places a premium on efficient algorithms • Stretched the memory limits of standard PCs • 99% of the data is missing • Eliminates many standard prediction methods • Certainly not missing at random • Countless factors may affect ratings • Large imbalance in the training data • The number of ratings per user or movie varies by several orders of magnitude • The information available to estimate individual parameters varies widely

  8. Collaborative Filtering Techniques • Memory based Approach • KNN user-user • KNN item-item • Model based Approach • Singular Value Decomposition (SVD) • Asymmetric Factor Model (AFM) • Restricted Boltzmann Machine (RBM) • Global Effect (GE) • Combination: Residual Training

  9. KNN user-user • Traditional approach for collaborative filtering • Method (see the sketch below) • Find the k users most similar to user u • Aggregate their ratings for item i
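A minimal sketch of user-user KNN prediction; cosine similarity on the raw rating vectors and simple weighted averaging are my own illustrative choices, the slides do not fix these details:

```python
import numpy as np

def knn_user_user_predict(R, u, i, k=20):
    """Predict the rating of user u for item i from the k most similar users.
    R: (num_users, num_items) rating matrix with 0 for missing ratings."""
    raters = np.where(R[:, i] > 0)[0]           # users who rated item i
    raters = raters[raters != u]
    if raters.size == 0:
        return R[R > 0].mean()                   # fall back to the global mean
    # cosine similarity between user u and every candidate rater
    sims = np.array([
        R[u] @ R[v] / (np.linalg.norm(R[u]) * np.linalg.norm(R[v]) + 1e-9)
        for v in raters
    ])
    top = np.argsort(sims)[::-1][:k]             # k most similar users
    w, r = sims[top], R[raters[top], i]
    return float(w @ r / (np.abs(w).sum() + 1e-9))  # similarity-weighted average
```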

  10. KNN user-user

  11. KNN item-item • Symmetric to KNN user-user: just flip the user and item sides • Method • Find the k items most similar to item i • Aggregate user u's ratings of those items

  12. KNN item-item

  13. SVD (matrix factorization) • Singular Value Decomposition • A dimension reduction technique based on matrix factorization • Captures latent semantics

  14. SVD Example The rating matrix R is factorized into R ≈ U × Σ × Vᵀ, i.e., a user factor matrix times a diagonal matrix of singular values times an item factor matrix
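A minimal sketch of a truncated SVD on a tiny, made-up rating matrix (the rank and the example values are for illustration only; real recommender systems learn the factors by gradient descent on the observed ratings rather than treating missing entries as zeros):

```python
import numpy as np

# Tiny dense rating matrix (rows = users, cols = movies); real data is ~99% missing.
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                    # keep the 2 strongest latent factors
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(R_hat, 2))                # rank-2 approximation of R
```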

  15. Asymmetric Factor Model (AFM) [Figure: a user represented by the items Item 1, Item 2, Item 3 that he or she has rated] • An extension of SVD • An item is represented by a feature vector (same as in SVD) • A user is represented by the items he or she has rated (different from SVD)

  16. Restricted Boltzmann Machine (RBM) • A neural network with one visible (input) layer and one hidden layer • Handles the sparsity of the data very well

  17. Global Effects • Motivated by data normalization • Based on user and item features • support (number of votes) • mean rating • mean standard deviation • Effective when applied to the residuals of other algorithms (a minimal sketch follows below)
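A minimal sketch of one simple global effect, centering ratings by an item offset and then a user offset; the actual paper uses more features and shrinkage, so this is only an illustration of the idea:

```python
import numpy as np

# ratings as (user, item, rating) triples
users = np.array([0, 0, 1, 1, 2])
items = np.array([0, 1, 0, 2, 2])
r     = np.array([5., 3., 4., 2., 1.])

global_mean = r.mean()
# item offset: how much an item deviates from the global mean
item_offset = {i: (r[items == i] - global_mean).mean() for i in np.unique(items)}
# user offset: fitted on the residual left after the item effect
resid = r - global_mean - np.array([item_offset[i] for i in items])
user_offset = {u: resid[users == u].mean() for u in np.unique(users)}

def predict(u, i):
    return global_mean + item_offset.get(i, 0.0) + user_offset.get(u, 0.0)
```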

  18. Residual Training [Figure: Model 1 → Model 2 → Model 3 trained in a chain] • A popular method to combine CF algorithms • Several models are trained sequentially; each model fits the residual error left by the previous ones (see the sketch below)
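A minimal sketch of residual training with two generic regression models; the choice of models and the feature-based train/predict interface are illustrative assumptions, since in the CF setting the chained models are themselves collaborative filtering algorithms:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor

def residual_train(X, y, models):
    """Fit each model on the residual left by the previous ones."""
    residual = y.copy()
    for m in models:
        m.fit(X, residual)
        residual = residual - m.predict(X)
    return models

def residual_predict(X, models):
    """The final prediction is the sum of all model outputs."""
    return sum(m.predict(X) for m in models)

# toy usage: X could be (user, item) features, y the ratings
X = np.random.rand(1000, 5)
y = X @ np.array([1.0, 0.5, 0.0, -0.5, 2.0]) + 0.1 * np.random.randn(1000)
models = residual_train(X, y, [Ridge(alpha=1.0), KNeighborsRegressor(10)])
```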

  19. Motivation • Combinations of different kinds of collaborative filtering algorithms • lead to significant performance improvements over the individual algorithms

  20. Rookies “Thanks to Paul Harrison's collaboration, a simple mix of our solutions improved our result from 6.31 to 6.75”

  21. Arek Paterek “My approach is to combine the results of many methods (also two-way interactions between them) using linear regression on the test set. The best method in my ensemble is regularized SVD with biases, post processed with kernel ridge regression” http://rainbow.mimuw.edu.pl/~ap/ap_kdd.pdf

  22. U of Toronto “When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix’s own system.” http://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf

  23. Gravity home.mit.bme.hu/~gtakacs/download/gravity.pdf

  24. When Gravity and Dinosaurs Unite “Our common team blends the result of team Gravity and team Dinosaur Planet.” Might have guessed from the name…

  25. BellKor / KorBell And, yes, the top team which is from AT&T… “Our final solution (RMSE=0.8712) consists of blending 107 individual results. “

  26. Blending Problem

  27. Blending Methods Linear Regression (baseline) Binned Linear Regression Neural Network Bagged Gradient Boosted Decision Tree Kernel Ridge Regression K-Nearest Neighbor Blending

  28. Linear Regression • The baseline blending method • Assume a quadratic error function • Find the optimal linear combination weights w by solving the least squares problem • The weights w can be computed with ridge regression (see the sketch below)
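A minimal sketch of linear blending with ridge regression; the matrix F stacks the predictions of the individual CF algorithms column-wise, and all variable names and the random placeholder data are my own:

```python
import numpy as np
from sklearn.linear_model import Ridge

# F: (num_samples, num_algorithms) predictions of each CF algorithm on a hold-out set
# y: true ratings on the same hold-out set (random placeholders here)
F = np.random.uniform(1, 5, size=(10000, 18))
y = np.random.uniform(1, 5, size=10000)

blender = Ridge(alpha=1.0, fit_intercept=True)   # ridge = regularized least squares
blender.fit(F, y)
blended = blender.predict(F)                      # blended predictions
print(blender.coef_)                              # linear combination weights w
```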

  29. Binned Linear Regression • A simple extension of linear regression • The training dataset can be divided into B disjoint subsets • The training dataset may be very large • Each subset is used to learn a different weight vector w_b • The training set can be split using the following criteria (a sketch using support follows below): • Support (number of votes) • Time • Frequency (number of ratings from a user at day t)
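A minimal sketch of binned linear regression, binning by support; the bin boundaries, the random placeholder data, and the per-bin ridge regression are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge

# F: algorithm predictions, y: true ratings, support: e.g. number of ratings per user
F = np.random.uniform(1, 5, size=(10000, 18))
y = np.random.uniform(1, 5, size=10000)
support = np.random.randint(1, 1000, size=10000)

bins = [10, 100]                        # 3 bins: <10, 10-99, >=100 ratings (illustrative)
bin_ids = np.digitize(support, bins)

# learn a separate weight vector w_b for every bin
models = {b: Ridge(alpha=1.0).fit(F[bin_ids == b], y[bin_ids == b])
          for b in np.unique(bin_ids)}

# predict: route each sample to the weights of its bin
blended = np.array([models[b].predict(f[None, :])[0] for b, f in zip(bin_ids, F)])
```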

  30. Neural Network (NN) [Figure: predictions of Alg 1–Alg 4 feed a small network whose output is the blended rating] • Efficient for huge data sets (see the sketch below)
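A minimal sketch of neural-network blending with a single hidden layer, using scikit-learn's MLPRegressor; the layer size, activation, and other hyperparameters are my own illustrative choices, not the values from the paper:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

F = np.random.uniform(1, 5, size=(10000, 18))    # predictions of 18 CF algorithms
y = np.random.uniform(1, 5, size=10000)           # true ratings (placeholders)

nn = MLPRegressor(hidden_layer_sizes=(30,), activation='tanh',
                  learning_rate_init=0.001, max_iter=200)
nn.fit(F, y)                                      # trained with stochastic gradient descent
blended = nn.predict(F)
```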

  31. Bagged Gradient Boosted Decision Tree (BGBDT) • Single decision tree • The discretized output limits its ability to model smooth functions • The number of possible outputs corresponds to the number of leaves • A single tree is trained recursively by always splitting the leaf that provides the output value for the largest number of training samples • Bagging • Trains Nbag copies of the model, each on a slightly different training set • (Stochastic gradient) boosting • Each model learns only a fraction of the desired function Ω (a combined sketch follows below)
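A minimal sketch of bagged gradient boosted trees using scikit-learn, wrapping a GradientBoostingRegressor in a BaggingRegressor; all hyperparameter values here are illustrative, not the ones used in the paper:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor

F = np.random.uniform(1, 5, size=(5000, 18))     # predictions of the individual algorithms
y = np.random.uniform(1, 5, size=5000)            # true ratings (placeholders)

# boosting: each tree fits a fraction of the target; subsample<1 makes it stochastic
gbdt = GradientBoostingRegressor(n_estimators=150, learning_rate=0.1,
                                 subsample=0.5, max_leaf_nodes=20)
# bagging: Nbag copies, each trained on a slightly different sample of the data
bgbdt = BaggingRegressor(gbdt, n_estimators=16, max_samples=0.8)
bgbdt.fit(F, y)
blended = bgbdt.predict(F)
```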

  32. BGBDT

  33. Kernel Ridge Regression Blending (KRR) • Kernel ridge regression • A regularized least squares method for classification and regression • Similar to an SVM, but it does not emphasize only the points close to the decision boundary; all training points contribute • Suitable for many features but only a small number of training samples • Training complexity: O(n³) • Space requirements: O(n²) • In practice the blend is therefore trained on a subsample (see the sketch below)
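A minimal sketch of KRR blending on a random subsample, using scikit-learn's KernelRidge; the kernel choice, subsample size, and regularization strength are illustrative assumptions:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

F = np.random.uniform(1, 5, size=(10000, 18))    # algorithm predictions (placeholders)
y = np.random.uniform(1, 5, size=10000)

# O(n^3) training cost -> fit only on a random subsample of the blend training set
idx = np.random.choice(len(F), size=2000, replace=False)
krr = KernelRidge(alpha=1.0, kernel='rbf', gamma=0.1)
krr.fit(F[idx], y[idx])
blended = krr.predict(F)
```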

  34. K-Nearest Neighbor Blending (KNN) • For a query sample <user, item>, find the k most similar training samples • Aggregate their target values (see the sketch below)
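A minimal sketch of KNN blending with scikit-learn's KNeighborsRegressor; the value of k and the distance-weighted averaging are illustrative choices:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

F = np.random.uniform(1, 5, size=(20000, 18))    # algorithm predictions per <user, item>
y = np.random.uniform(1, 5, size=20000)           # true ratings (placeholders)

knn = KNeighborsRegressor(n_neighbors=50, weights='distance')
knn.fit(F, y)
blended = knn.predict(F)                           # weighted average of the 50 nearest targets
```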

  35. Experimental Setup • 18 CF algorithms • 4 versions of AFM • 4 versions of GE • 4 versions of KNN-item • 2 versions of RBM • 4 versions of SVD • 1,400,000 samples • Run on a 3.8 GHz CPU with 12 GB of main memory

  36. Results

  37. Conclusions Combinations of collaborative filtering algorithms outperform the individual collaborative filtering algorithms

  38. Thank you
