Investigation of Various Factorization Methods for Large Recommender Systems

Presentation Transcript


  1. G. Takács, I. Pilászy, B. Németh and D. Tikk www.gravityrd.com 10th International Workshop on High Performance Data Mining (in conjunction with ICDM) Pisa, December 15th 2008 Investigation of Various Factorization Methods for Large Recommender Systems

  2. Content • Problem definition • Approaches • Matrix factorization • Basics, BRISMF, Semipositive, Retraining • Further enhancements • Transductive MF, Neighbor based correction • Experimental results

  3. Collaborative filtering

  4. Problem definition I. [Figure: a partially filled rating matrix; rows are users, columns are items, with known entries (1, 4, 3; 4, 4; 4, 2, 4) scattered among unknown cells.]

  5. Problem definition II. • The phenomenon can be modeled by the random triplet (U, I, R). • A realization (u, i, r) of the phenomenon means that the u-th user rated the i-th item with value r. • U: user id (range: {1, …, M}) • I: item id (range: {1, …, N}) • R: rating value (range: {r1, …, rL})

  6. Problem definition III. • The goal: predict R from (U, I). • Error criterion: root mean squared error (RMSE). • The task is nothing other than classical regression estimation. • Classical methods fail, however, because of the unusual characteristics of the predictor variables.
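Spelled out, for a test set T of (u, i, r) triplets and predictions r̂_ui, the criterion is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{|T|}\sum_{(u,i,r)\in T}\left(r - \hat r_{ui}\right)^2}$$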

  7. Content • Problem definition • Approaches • Matrix factorization • Basics, BRISMF, Semipositive, Retraining • Further enhancements • Transductive MF, Neighbor based correction • Experimental results

  8. Approaches • Matrix factorization: approximates the rating matrix by the product of two lower-rank matrices. • Neighbor based approach: defines similarity between the rows or the columns of the rating matrix. • Support based approach: characterizes the users based on the binarized rating matrix. • Restricted Boltzmann machine: models each user by a stochastic, recurrent neural network. • Global effects: cascades 1-variable predictors.

  9. Content • Problem definition • Approaches • Matrix factorization • Basics, BRISMF, Semipositive, Retraining • Further enhancements • Transductive MF, Neighbor based correction • Experimental results

  10. Matrix Factorization (MF) • Idea: approximate the rating matrix as the product of two lower-rank matrices, R ≈ P∙Q. • Problem: huge number of parameters (e.g. 10 million), and R is only partially known. • Solution: incremental gradient descent (sketched below). • R: rating matrix (M × N), P: user feature matrix (M × K), Q: item feature matrix (K × N).
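A minimal sketch of this procedure in Python (hyperparameters are illustrative, and the item positions of the known ratings in the toy triplets are assumptions made for the example):

```python
import numpy as np

def mf_sgd(ratings, M, N, K=2, lr=0.05, epochs=200, seed=0):
    """Incremental gradient descent for R ~ P @ Q (basic MF, slide 10).

    ratings -- list of known (user, item, value) triplets; only these
    cells of R drive the updates, the unknown cells are never touched.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(0.0, 0.1, (M, K))   # user feature matrix, M x K
    Q = rng.normal(0.0, 0.1, (K, N))   # item feature matrix, K x N
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - P[u] @ Q[:, i]     # error on this single rating
            p_old = P[u].copy()        # update P and Q simultaneously
            P[u] += lr * e * Q[:, i]
            Q[:, i] += lr * e * p_old
    return P, Q

# Toy data in the spirit of the slide example: 3 users, 5 items.
ratings = [(0, 0, 1), (0, 2, 4), (0, 4, 3),
           (1, 1, 4), (1, 3, 4),
           (2, 0, 4), (2, 2, 2), (2, 3, 4)]
P, Q = mf_sgd(ratings, M=3, N=5)
print(np.round(P @ Q, 1))  # dense prediction matrix, unknown cells filled in
```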

  11.–27. MF sample - learning [Animation over several frames: one known rating of R is visited per step, and the corresponding row of P and column of Q are nudged by a small gradient step; for example, the first user's features move from (1.2, −0.5) to (1.1, −0.2) over the first few updates.]

  28. After a while...

  29. MF sample - learning [The factors after convergence: P has rows (1.4, 1.1), (0.9, 1.9), (2.5, −0.3); Q has columns (1.5, −1.0), (2.1, 0.0), (1.0, 1.8), (0.7, 1.6), (1.6, 0.8).]

  30. MF sample - prediction [The unknown cells of R are filled in with the inner products of the corresponding P rows and Q columns; predictions such as 3.3, 2.4, −0.5, 3.5, 1.5, 4.9, 1.1 appear alongside the known ratings 1, 4, 3, …]
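As a quick check on the final frame (assuming the first row of P and the first column of Q pair with user 1's known rating of 1):

$$\mathbf{p}_1^\top \mathbf{q}_1 = 1.4\cdot 1.5 + 1.1\cdot(-1.0) = 2.1 - 1.1 = 1.0 \approx 1,$$

so the learned factors reproduce the known cells, and the same inner products supply the predictions for the unknown ones.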

  31. BRISMF • Enhancements on the previous model: • User and item Biases (offsets). • Regularization. • We can call this Biased Regularized Incremental Simultaneous MF (BRISMF). • This is a very effective MF variant indeed. • Leaving out any of these characteristics (B, R, I, S) leads to inferior accuracy.
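The biased, regularized update can plausibly take the following form (a sketch, not necessarily the paper's exact parameterization; η is the learning rate, λ the regularization weight, b_u and c_i the user and item biases):

$$e_{ui} = r_{ui} - (b_u + c_i + \mathbf{p}_u^\top \mathbf{q}_i)$$
$$\mathbf{p}_u \leftarrow \mathbf{p}_u + \eta\,(e_{ui}\,\mathbf{q}_i - \lambda\,\mathbf{p}_u), \qquad \mathbf{q}_i \leftarrow \mathbf{q}_i + \eta\,(e_{ui}\,\mathbf{p}_u - \lambda\,\mathbf{q}_i)$$
$$b_u \leftarrow b_u + \eta\,(e_{ui} - \lambda\,b_u), \qquad c_i \leftarrow c_i + \eta\,(e_{ui} - \lambda\,c_i)$$

Equivalently, the biases can be folded into P and Q as extra features whose partner feature is pinned to the constant 1, which keeps the training loop identical to plain MF.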

  32. Semipositive MF • It is useful to put a nonnegativity constraint on the user feature matrix P. • There are many possible ways to implement this (e.g. PLSA, alternating least squares). • Our solution: if a user feature becomes negative after the update, then it is set to zero.
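A minimal sketch of this rule as a per-rating update (same conventions as the mf_sgd example above; the clipping happens immediately after the user-feature step):

```python
import numpy as np

def update_semipositive(P, Q, u, i, r, lr=0.05):
    """One semipositive-MF gradient step: after the ordinary update,
    negative user features are projected back to zero (slide 32)."""
    e = r - P[u] @ Q[:, i]
    p_old = P[u].copy()
    P[u] = np.maximum(P[u] + lr * e * Q[:, i], 0.0)  # clip user features at 0
    Q[:, i] += lr * e * p_old                        # item features: plain update
    return e
```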

  33. Reset User Features • Disadvantage of BRISMF: user features updated at the beginning of an epoch may be inappropriate by the end of the epoch. • Solution: • 1) Reset the user features at the end of training. • 2A) Retrain the user features only (sketched below), or • 2B) Retrain both user and item features. [Diagram with matrices R, P, P′, Q: the retrained user matrix P′ replaces P.]
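A sketch of variant 2A on top of the earlier mf_sgd example (resetting to zeros is my assumption here; any fresh initialization serves the same purpose):

```python
import numpy as np

def retrain_user_features(ratings, P, Q, lr=0.05, epochs=50):
    """Variant 2A: discard the user features learned during training and
    relearn them against the now-frozen item features Q."""
    P = np.zeros_like(P)               # 1) reset user features
    for _ in range(epochs):            # 2A) retrain P only
        for u, i, r in ratings:
            e = r - P[u] @ Q[:, i]
            P[u] += lr * e * Q[:, i]   # note: Q is never updated here
    return P
```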

  34. Content • Problem definition • Approaches • Matrix factorization • Basics, BRISMF, Semipositive, Retraining • Further enhancements • Transductive MF, Neighbor based correction • Experimental results

  35. Transductive MF • How is it possible to use the Netflix Qualifying set in the correction phase? • We use the following simple solution: [formula on the slide; not preserved in the transcript]

  36. Fast and Accurate NB Correction I. • Neighbor based (NB) methods can improve the accuracy of factor models, but conventional NB methods are not scalable. • Is it possible to integrate the NB approach into the factor model without losing scalability?

  37. Fast and Accurate NB Correction II. • where s_jk is either the normalized scalar product based similarity of the item feature vectors, s_jk = q_jᵀq_k / (‖q_j‖ ‖q_k‖), • or a normalized Euclidean distance based similarity. [The correction formula itself is not preserved in the transcript.]
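The scalar-product variant is just the cosine of the item feature vectors (columns of Q); a minimal sketch, where the ε guard against zero-norm columns is my own implementation detail:

```python
import numpy as np

def item_similarities(Q, eps=1e-9):
    """s[j, k] = q_j . q_k / (||q_j|| * ||q_k||), with item feature
    vectors q_j, q_k taken as the columns of the K x N matrix Q."""
    norms = np.linalg.norm(Q, axis=0) + eps  # per-item feature norms
    Qn = Q / norms                           # unit-length item vectors
    return Qn.T @ Qn                         # N x N similarity matrix
```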

  38. NB Correction sample [Figure: a predicted rating is corrected using two similar items, one with similarity 0.2 and error −0.5, the other with similarity 0.8 and error +0.2; the resulting correction applied is −0.1.]

  39. Content • Problem definition • Approaches • Matrix factorization • Basics, BRISMF, Semipositive, Retraining • Further enhancements • Transductive MF, Neighbor based correction • Experimental results

  40. Results I.

  41. Results II.

  42. Thanks!
