EigenTaste: A Constant Time Collaborative Filtering Algorithm

Presentation Transcript


  1. EigenTaste: A Constant Time Collaborative Filtering Algorithm • Ken Goldberg • Students: Theresa Roeder, Dhruv Gupta, Chris Perkins • Industrial Engineering and Operations Research; Electrical Engineering and Computer Science; UC Berkeley

  2. CF Problem Definition • A set of objects (movies, books, jokes) • A user rates a subset of objects • Based on these ratings, recommend objects from the complement of this subset (objects the user has not rated) Criteria: • Effective: recommended objects should receive high ratings • Efficient: the online recommendation process should run quickly and be scalable

  3. Some Previous Work • D. Goldberg et al. - Tapestry (1992) • Riedl, Resnick, Konstan et al. - GroupLens (1994-) • Shardanand and Maes - Ringo (1995) • Resnick and Varian (1997) • Breese et al. at Microsoft Research (1998) • Pazzani (1999) • Herlocker et al. - GroupLens (1999)

  4. WWW-based Recommender Systems • MovieLens • Firefly • MovieCritic

  5. EigenTaste Algorithm 1) Principal Component Analysis 2) Universal Queries (dense ratings matrix) 3) Fine-grained ratings bar (captures nuances) 4) Offline and Online Processing 5) Online: Constant time recommendations

  6. Universal Queries • Most CF systems require users to select which items they want to rate, which yields a sparse ratings matrix • Eigentaste asks users to rate all items, based on short unbiased descriptions (e.g., a film synopsis) • Eigentaste uses a subset of highly discriminatory items for the gauge set

  7. Continuous Rating Scale (rating bar running from "Disapprove" to "Approve")

  8. EigenTaste Algorithm • $A$ is the $n \times m$ normalized rating matrix • $n$ users • $m$ objects • $C$ is the $k \times k$ reduced correlation matrix • $k$ objects in the gauge set • $C = \frac{1}{n} A^T A$, with $A$ restricted to its $k$ gauge-set columns • assumes ratings are continuous with linear relationships • $E$ is the orthogonal matrix of eigenvectors of $C$ • $\Lambda$ is the diagonal matrix of eigenvalues
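A minimal NumPy sketch of this setup, under assumptions the slide does not state: the gauge-set ratings form a dense n x k matrix (every user rates every gauge item), and "normalized" means each column is shifted to zero mean and scaled to unit variance, so that $C = \frac{1}{n} A^T A$ is the Pearson correlation matrix. The function name `gauge_correlation` and the synthetic ratings in [-10, 10] are hypothetical.

```python
import numpy as np


def gauge_correlation(ratings: np.ndarray):
    """Normalize an n x k gauge-set rating matrix and eigendecompose C.

    Assumptions (not from the slide): dense ratings, z-score normalization per column.
    """
    n = ratings.shape[0]
    # Column-wise z-scores: zero mean, unit (population) variance.
    A = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)
    # C = (1/n) A^T A is k x k; with z-scores its diagonal is all ones.
    C = (A.T @ A) / n
    # eigh handles symmetric matrices: ascending eigenvalues, eigenvectors as columns.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]   # largest eigenvalue first
    lam = eigvals[order]
    E = eigvecs[:, order].T             # rows of E are eigenvectors
    return A, C, E, lam


rng = np.random.default_rng(0)
ratings = rng.uniform(-10, 10, size=(200, 5))  # synthetic, hypothetical data
A, C, E, lam = gauge_correlation(ratings)
print(np.allclose(np.diag(C), 1.0))            # True
print(np.allclose(E @ C @ E.T, np.diag(lam)))  # True: E C E^T = Lambda
```

Storing eigenvectors as the rows of `E` matches the slide convention in which $E C E^T$ is diagonal.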

  9. Correlation Matrix

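  10. EigenTaste • $E C E^T = \Lambda$ • $C = E^T \Lambda E$ • Let $B = A E^T$ • $R_B = \frac{1}{n} B^T B = E C E^T = \Lambda$ • transformed points are uncorrelated, and column $i$ of $B$ has variance $\lambda_i$ • Principal Components (Pearson 1901) • keep the eigenvectors of the $m$ largest eigenvalues, $E_m$ (here $m$ counts retained components, not objects) • $B_m = A E_m^T$ • choose $m$ at the "knee" in the eigenvalues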
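The decorrelation claim can be checked numerically. This sketch, again assuming z-score-normalized dense gauge ratings, builds $B = A E^T$, confirms that $R_B = \frac{1}{n} B^T B$ is diagonal with the eigenvalues on its diagonal, and keeps $m$ components. The slide does not say how the knee is located, so the `knee` rule below (cut after the largest drop between consecutive eigenvalues) is a hypothetical stand-in, as is the synthetic one-factor data.

```python
import numpy as np


def principal_transform(A: np.ndarray):
    """Return B = A E^T, R_B = (1/n) B^T B, E (rows = eigenvectors), and eigenvalues."""
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eigh((A.T @ A) / n)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    lam = eigvals[order]
    E = eigvecs[:, order].T
    B = A @ E.T
    R_B = (B.T @ B) / n
    return B, R_B, E, lam


def knee(lam: np.ndarray) -> int:
    """Hypothetical knee rule: keep components up to the largest eigenvalue drop."""
    drops = lam[:-1] - lam[1:]
    return int(np.argmax(drops)) + 1


rng = np.random.default_rng(1)
taste = rng.normal(size=(500, 1))  # one latent "taste" factor (synthetic)
raw = taste @ rng.normal(size=(1, 6)) + 0.3 * rng.normal(size=(500, 6))
A = (raw - raw.mean(axis=0)) / raw.std(axis=0)

B, R_B, E, lam = principal_transform(A)
m = knee(lam)
B_m = A @ E[:m].T                       # B_m = A E_m^T
print(np.allclose(R_B, np.diag(lam)))   # True: components are uncorrelated
print(np.allclose(B.var(axis=0), lam))  # True: column i has variance lambda_i
print(m, B_m.shape)
```

Because the columns of `A` are centered, every column of `B` is centered too, which is why its variance equals the corresponding diagonal entry of $R_B$.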

  11. Dimensionality Reduction • First two principal components (eigenvectors) account for nearly 50% of the variation in user ratings • Project user ratings along the first two principal components: $x = A E_2^T$ • Facilitates visualization ...
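A short derivation shows what "nearly 50%" means in terms of the eigenvalues. Because $C$ is a correlation matrix, each diagonal entry is 1, so its trace, which equals the sum of its eigenvalues, is $k$. The share of variance captured by the eigen plane is therefore:

```latex
\frac{\lambda_1 + \lambda_2}{\sum_{i=1}^{k} \lambda_i}
  = \frac{\lambda_1 + \lambda_2}{\operatorname{tr}(C)}
  = \frac{\lambda_1 + \lambda_2}{k}
  \approx 0.5
\quad\Longrightarrow\quad
\lambda_1 + \lambda_2 \approx \frac{k}{2}
```

For a hypothetical gauge set of $k = 10$ items, this would mean $\lambda_1 + \lambda_2 \approx 5$. The derivation assumes the normalization makes $C$ an exact correlation matrix (unit diagonal).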

  12. Eigen Plane Recursive Clustering

  13. The EigenTaste Algorithm • Offline: • Compute eigenvectors and project users onto eigen plane. • Cluster and compute average ratings for each cluster. • Online: • Collect ratings for objects in gauge set • Project onto the eigen plane • Find representative cluster • Recommend objects based on average ratings within that cluster
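The offline/online split on this slide can be sketched as a small Python class. This is a sketch under stated assumptions, not the authors' implementation: gauge ratings are dense and z-score normalized, and the clustering step uses a hypothetical uniform grid over the eigen plane as a stand-in for the recursive clustering named on slide 12. The class name, the `cells_per_axis` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np


class EigentasteSketch:
    """Offline/online split; the grid clustering is a hypothetical stand-in."""

    def __init__(self, gauge, others, cells_per_axis=4):
        # Offline: normalize gauge ratings (n x k) and find the eigen plane.
        self.mu, self.sigma = gauge.mean(axis=0), gauge.std(axis=0)
        A = (gauge - self.mu) / self.sigma
        eigvals, eigvecs = np.linalg.eigh(A.T @ A / A.shape[0])
        top2 = np.argsort(eigvals)[::-1][:2]
        self.E2 = eigvecs[:, top2].T  # 2 x k, eigenvectors of the two largest eigenvalues
        x = A @ self.E2.T             # n x 2 points on the eigen plane

        # Offline: cluster the plane (uniform grid, hypothetical) and rank the
        # non-gauge items in each cluster by their average rating.
        self.lo, self.hi, self.g = x.min(axis=0), x.max(axis=0), cells_per_axis
        labels = np.array([self._cell(p) for p in x])
        self.ranked = {
            c: np.argsort(-others[labels == c].mean(axis=0))
            for c in set(labels.tolist())
        }
        self.fallback = np.argsort(-others.mean(axis=0))  # used for empty cells

    def _cell(self, p):
        # Grid cell index of a 2-D point; out-of-range points are clamped.
        ij = np.clip(((p - self.lo) / (self.hi - self.lo) * self.g).astype(int),
                     0, self.g - 1)
        return int(ij[0] * self.g + ij[1])

    def recommend(self, new_gauge, top=3):
        # Online: O(k) normalization and projection, then a dictionary lookup.
        p = self.E2 @ ((new_gauge - self.mu) / self.sigma)
        return self.ranked.get(self._cell(p), self.fallback)[:top]


rng = np.random.default_rng(0)
taste = rng.normal(size=(300, 1))
gauge = taste @ rng.normal(size=(1, 5)) + 0.5 * rng.normal(size=(300, 5))   # k = 5
others = taste @ rng.normal(size=(1, 8)) + 0.5 * rng.normal(size=(300, 8))  # 8 other items
model = EigentasteSketch(gauge, others)
print(model.recommend(gauge[0]))  # indices of 3 recommended non-gauge items
```

Ranking each cluster's items offline is the design choice that keeps the online path independent of the number of users: a new user costs one O(k) projection plus a constant-time lookup.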

  14. First Application (1999) Jester: Recommending Jokes • Sense of humor is difficult to specify • Advantages: • Rating process is not altogether unpleasant • Can evaluate jokes quickly: • Dense ratings matrix (large sample size) • Disadvantages: • Offensive/Shaggy Dog jokes • Temporal Effects, Portfolio Effects • Priming/Masking

  15. Jester: User Interface

  16. System Architecture (diagram components: Client, Internet, Web Server, Login Interface CGI, Recommendation Engine CGI, Content Database, User Rating Profiles)

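  17. Measure of Effectiveness • Metric: Normalized Mean Absolute Error (NMAE): the average absolute deviation of actual ratings from predicted ratings, normalized over the rating range • $MAE = \frac{1}{c} \sum |r - p|$, where the sum runs over the $c$ predicted ratings, $r$ is an actual rating and $p$ its prediction • $NMAE = MAE / (r_{max} - r_{min})$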
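The metric is simple to compute. A minimal Python sketch follows; the function name `nmae` and the sample ratings are hypothetical, and the [-10, 10] rating range is an assumption used only for the example.

```python
def nmae(actual, predicted, r_min, r_max):
    """Normalized mean absolute error over c (actual, predicted) rating pairs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty rating lists")
    c = len(actual)
    mae = sum(abs(r - p) for r, p in zip(actual, predicted)) / c  # MAE = (1/c) * sum |r - p|
    return mae / (r_max - r_min)                                   # NMAE = MAE / (r_max - r_min)


# Absolute errors are 2, 1, 4, 1, so MAE = 2.0 and NMAE = 2.0 / 20 = 0.1.
print(nmae([5.0, -2.0, 8.0, 0.0], [3.0, -1.0, 4.0, 1.0], -10, 10))  # 0.1
```

Dividing by the rating range makes the error comparable across rating scales; 0 means every prediction was exact.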

  18. Effectiveness (based on 18,000 users)

  19. Computational Complexity • $n$: number of users • $k$: number of objects in the gauge set • Nearest Neighbor algorithm: online processing $O(kn)$ • EigenTaste algorithm: offline processing $O(k^2 n)$, online processing $O(k)$
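The complexity gap comes from what the online step has to touch. The sketch below contrasts a simplified 1-nearest-neighbor baseline (Euclidean distance on gauge ratings; a hypothetical stand-in, since the slide does not specify the neighbor metric) with an EigenTaste-style online projection. The function names, arguments, and synthetic data are hypothetical.

```python
import numpy as np


def nearest_neighbor_online(gauge_db, others_db, new_gauge):
    # Distances to all n stored users read every entry of the n x k matrix: O(kn).
    dist = np.sum((gauge_db - new_gauge) ** 2, axis=1)
    return others_db[np.argmin(dist)]  # predict with the closest user's ratings


def eigentaste_online(E2, mu, sigma, new_gauge):
    # Normalizing and projecting k ratings onto a 2 x k basis costs O(k),
    # independent of n; the cluster lookup that follows is constant time.
    return E2 @ ((new_gauge - mu) / sigma)


rng = np.random.default_rng(2)
n, k = 1000, 5
gauge_db = rng.normal(size=(n, k))
others_db = rng.normal(size=(n, 8))

# Offline (done once): eigen plane of the z-scored gauge ratings.
mu, sigma = gauge_db.mean(axis=0), gauge_db.std(axis=0)
A = (gauge_db - mu) / sigma
w, V = np.linalg.eigh(A.T @ A / n)
E2 = V[:, np.argsort(w)[::-1][:2]].T

print(np.array_equal(nearest_neighbor_online(gauge_db, others_db, gauge_db[7]), others_db[7]))  # True
print(eigentaste_online(E2, mu, sigma, gauge_db[7]).shape)  # (2,)
```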

  20. Effectiveness and Efficiency

  21. Prediction Speed (time to process 9,000 users) • Nearest Neighbor: 28 hours • EigenTaste: 3 minutes

  22. Current Jester Dataset • 62,000 registered users • approx. 3,000,000 ratings

  23. Second Application (2000) Sleeper: Recommending Books

  24. EigenTaste Algorithm 1) Principal Component Analysis 2) Universal Queries (dense ratings matrix) 3) Fine-grained ratings bar (captures nuances) 4) Offline and Online Processing 5) Online: Constant time recommendations • Patent application filed 21 December 1999 by the UC Regents

  25. www.cs.berkeley.edu/~goldberg • goldberg@cs.berkeley.edu • Eigentaste: A Constant Time Collaborative Filtering Algorithm (to appear in Information Retrieval Journal, 2001)
