Item-Based Collaborative Filtering Recommendation Algorithms. Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl GroupLens Research Group/ Army HPC Research Center Department of Computer Science and Engineering University of Minnesota, Minneapolis, 2001 2008. Nov. 05
Presented by Eun-gyeong Kim, IDS Lab.
Contents
• Introduction
• Collaborative Filtering Based Recommender Systems
• Overview of the Collaborative Filtering Process
• Challenges of User-based Collaborative Filtering Algorithms
• Item-based Collaborative Filtering Algorithm
• Item Similarity Computation
• Prediction Computation
• Performance Implications
• Experimental Evaluation
• Contributions
• Discussion & Conclusion
Center for E-Business Technology
Introduction (What is collaborative filtering?)
• We now need technologies that help us sift through all the available information to find what is most valuable to us
• One of the most promising such technologies is collaborative filtering (CF)
• Collaborative filtering (as defined by Wikipedia)
  • The process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc.
  • The underlying assumption of the CF approach is that those who agreed in the past tend to agree again in the future
• CF systems usually take two steps
  • Step 1: look for users who share the same rating patterns with the active user
  • Step 2: use the ratings from the like-minded users found in step 1 to calculate a prediction for the active user
Two main categories of CF algorithms
• Memory-based CF algorithms
  • Utilize the entire user-item database to generate a prediction
  • Employ statistical techniques to find the neighbors
• Model-based CF algorithms
  • First develop a model of user ratings
  • Then compute the expected value of a user prediction, given his/her ratings on other items
  • Ways to build the model
    • Bayesian networks (probabilistic)
    • Clustering (classification)
    • Rule-based approaches (association rules between co-purchased items)
Recommendation Algorithms
• User-based collaborative filtering
  • Traditional collaborative filtering
  • Cluster models
• Item-based collaborative filtering
  • Search-based methods
  • Item-to-item collaborative filtering
Amazon.com Recommendations: Item-to-Item Collaborative Filtering http://www.win.tue.nl/~laroyo/2L340/resources/Amazon-Recommendations.pdf
CF Based Recommender Systems
• Provide item recommendations or predictions based on the opinions of other like-minded users
Traditional Collaborative Filtering (1)
• Represents a customer as an N-dimensional vector of items, where N is the number of distinct catalog items
• For almost all customers, this vector is extremely sparse
• Generates recommendations based on a few customers (neighbors) who are most similar to the user
• Measures the similarity of two customers, A and B
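The similarity formula from this slide did not survive. The Amazon.com paper cited in this deck describes the cosine of the angle between the two customer vectors as a common choice, so the sketch below assumes that measure. The function name, the dict-based sparse vectors, and the sample customers are illustrative assumptions, not taken from the slides.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse customer vectors.

    Each vector is a dict mapping item -> value (assumed representation);
    items absent from the dict are treated as 0.
    """
    dot = sum(value * b[item] for item, value in a.items() if item in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no defined angle; treat as dissimilar
    return dot / (norm_a * norm_b)


# Hypothetical purchase vectors: 1 means "purchased".
customer_a = {"book1": 1, "book2": 1, "book3": 1}
customer_b = {"book2": 1, "book3": 1, "book4": 1}
sim_ab = cosine_similarity(customer_a, customer_b)  # 2 shared items out of 3 each
```

Storing only the nonzero entries keeps the cost proportional to the number of purchases rather than to N, which matters because the vectors are extremely sparse.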
Traditional Collaborative Filtering (2)
• Generate recommendations
  • A common technique is to rank each item according to how many similar customers purchased it
  • O(MN) in the worst case (M customers, N catalog items)
  • Performance tends to be closer to O(M+N) because the average customer vector is extremely sparse
• Scaling issues
  • Reduce the data size
    • Reduce M by randomly sampling the customers or discarding customers with few purchases
    • Reduce N by discarding very popular or unpopular items
  • These reductions also reduce recommendation quality
• We need better algorithms that scale to large data sets and at the same time produce high-quality recommendations
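The ranking step described above can be sketched as follows. This is a minimal illustration that assumes the similar customers (neighbors) have already been found; the function name `recommend`, the `top_n` parameter, and the sample data are hypothetical.

```python
from collections import Counter


def recommend(target, purchases, neighbors, top_n=3):
    """Rank items the target has not bought by how many neighbors bought them.

    purchases: dict mapping customer -> set of purchased items (assumed layout).
    neighbors: customers already judged similar to the target.
    """
    owned = purchases[target]
    counts = Counter(
        item
        for neighbor in neighbors
        for item in purchases[neighbor]
        if item not in owned
    )
    return [item for item, _ in counts.most_common(top_n)]


# Hypothetical data: "u" is the active customer, "v" and "w" are its neighbors.
purchases = {"u": {"a"}, "v": {"a", "b", "c"}, "w": {"a", "b"}}
ranked = recommend("u", purchases, ["v", "w"])
```

Here item "b" is bought by two neighbors and "c" by one, so "b" is ranked first.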
Challenges of User-based CF Algorithms
• Sparsity
  • A person may have purchased well under 1% of the items (1% of 2 million books is 20,000 books)
  • The accuracy of recommendations may be poor
• Scalability
  • Computation grows with both the number of users and the number of items
  • Traditional CF does little or no offline computation, and its online computation scales with the number of customers and catalog items
=> The key to item-to-item CF’s scalability and performance is that it creates the expensive similar-items table offline
Item-based CF Algorithm
• Similarity computation between two items i and j
  • First isolate the users who have rated both of these items
  • Then apply a similarity computation technique to determine the similarity
• Prediction generation
  • Take a weighted average of the target user’s ratings on these similar items
Item Similarity Computation
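The formulas on these slides were not preserved. The original paper describes three item similarity measures: cosine-based, correlation-based, and adjusted cosine. The sketch below is one possible reading of them, not the authors' code. It assumes ratings are stored as a dict of user -> {item: rating}, that all three measures are computed only over users who rated both items, that item means in the correlation measure are taken over those co-rating users, and that user means in the adjusted cosine are taken over all of a user's ratings. All names and sample ratings are illustrative.

```python
import math


def co_raters(ratings, i, j):
    """Users who have rated both item i and item j."""
    return [u for u, r in ratings.items() if i in r and j in r]


def _normalized_dot(xs, ys):
    """Dot product divided by the product of the vector lengths."""
    num = sum(x * y for x, y in zip(xs, ys))
    den = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return num / den if den else 0.0


def cosine_sim(ratings, i, j):
    """Cosine of the angle between the two items' rating vectors."""
    users = co_raters(ratings, i, j)
    return _normalized_dot([ratings[u][i] for u in users],
                           [ratings[u][j] for u in users])


def correlation_sim(ratings, i, j):
    """Pearson correlation: ratings are centered on each item's mean."""
    users = co_raters(ratings, i, j)
    if not users:
        return 0.0
    mean_i = sum(ratings[u][i] for u in users) / len(users)
    mean_j = sum(ratings[u][j] for u in users) / len(users)
    return _normalized_dot([ratings[u][i] - mean_i for u in users],
                           [ratings[u][j] - mean_j for u in users])


def adjusted_cosine_sim(ratings, i, j):
    """Adjusted cosine: ratings are centered on each user's mean rating."""
    users = co_raters(ratings, i, j)
    means = {u: sum(ratings[u].values()) / len(ratings[u]) for u in users}
    return _normalized_dot([ratings[u][i] - means[u] for u in users],
                           [ratings[u][j] - means[u] for u in users])


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "u1": {"i1": 5, "i2": 4, "i3": 1},
    "u2": {"i1": 4, "i2": 5, "i3": 2},
    "u3": {"i1": 1, "i2": 2, "i3": 5},
}
cos_12 = cosine_sim(ratings, "i1", "i2")
cor_12 = correlation_sim(ratings, "i1", "i2")
adj_13 = adjusted_cosine_sim(ratings, "i1", "i3")
```

Centering on the user mean removes differences in rating scale between users, which plain cosine ignores. In this toy data i1 and i2 are rated alike, while i1 and i3 are rated in opposite ways, so the adjusted cosine for that pair comes out negative.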
Prediction Computation
• Weighted Sum
  • Compute the sum of the ratings given by the user on the items similar to i
  • Each rating is weighted by the corresponding similarity
• Regression
  • Similarities computed using cosine or correlation measures may be misleading
  • Approximated values based on a linear regression model are used instead of the similar item N’s “raw” rating values
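The weighted sum can be sketched as below. Following the original paper, the sum of similarity-weighted ratings is divided by the sum of the absolute similarities so that the prediction stays within the rating range. The function name, argument layout, and sample numbers are assumptions for illustration.

```python
def predict_weighted_sum(user_ratings, similarities):
    """Predict the user's rating for a target item.

    user_ratings: item -> rating given by the target user.
    similarities: item -> similarity between that item and the target item.
    Only similar items the user has actually rated contribute.
    """
    numerator = 0.0
    denominator = 0.0
    for item, sim in similarities.items():
        if item in user_ratings:
            numerator += sim * user_ratings[item]
            denominator += abs(sim)
    if denominator == 0:
        return None  # no rated neighbor, so no prediction can be made
    return numerator / denominator


# Hypothetical values: the user rated i2 and i3 but not i4.
user_ratings = {"i2": 4, "i3": 2}
similarities = {"i2": 0.8, "i3": 0.2, "i4": 0.9}
prediction = predict_weighted_sum(user_ratings, similarities)  # about 3.6
```

The prediction sits closer to the rating of i2 because i2 is the more similar neighbor.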
Weighted Sum Example
• Let’s predict the rating of user u4 for item i1
Item-to-item CF in Amazon.com
• We could build a product-to-product matrix by iterating through all item pairs and computing a similarity metric for each pair
• However, many product pairs have no common customers, so this approach is inefficient in terms of processing time and memory usage
• A better approach is to calculate the similarity between a single product and all related products
  • O(N^2 M) in the worst case
  • Closer to O(NM) in practice
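The "single product versus all related products" approach follows the loop given in the Amazon.com paper: for each item, visit every customer who bought it, record every other item that customer bought, then compute similarities only for the recorded pairs. A minimal sketch, assuming binary purchase data and cosine similarity; the function name and data layout are illustrative.

```python
import math
from collections import defaultdict


def build_similar_items(purchases):
    """Build the similar-items table offline.

    purchases: customer -> set of purchased items (assumed layout).
    Returns item -> {related item: cosine similarity}. Pairs with no
    common customer are never visited and never stored.
    """
    buyers = defaultdict(set)  # item -> customers who purchased it
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    table = {}
    for item1, customers in buyers.items():
        co_counts = defaultdict(int)  # item2 -> number of shared customers
        for customer in customers:
            for item2 in purchases[customer]:
                if item2 != item1:
                    co_counts[item2] += 1
        # For 0/1 purchase vectors, cosine = shared / sqrt(|buyers1| * |buyers2|).
        table[item1] = {
            item2: shared / math.sqrt(len(customers) * len(buyers[item2]))
            for item2, shared in co_counts.items()
        }
    return table


# Hypothetical purchase histories.
purchases = {"c1": {"a", "b"}, "c2": {"a", "b", "c"}, "c3": {"c", "d"}}
table = build_similar_items(purchases)
```

Items "a" and "d" share no customer, so that pair is skipped entirely, which is where the savings over the all-pairs matrix come from.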
Performance Implications
• Precompute item-item similarity scores
  • In a typical e-commerce scenario, the set of items is relatively static compared to the number of users, which changes often
  • Compute all-to-all similarity offline, then perform a quick table look-up to retrieve the required similarity values
• Generating a prediction for a user u on item i
  • Retrieve the precomputed k most similar items corresponding to the target item i
  • Then intersect those k items with the items purchased by user u
  • Compute the prediction using the basic item-based CF algorithm
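The offline and online steps above can be sketched as two small functions. This is an illustration under assumptions: the similarity table is a dict of item -> {other item: similarity}, the prediction uses the weighted sum, and the names `build_model` and `predict` are hypothetical.

```python
import heapq


def build_model(similarity_table, k):
    """Offline step: keep only the k most similar items for each item."""
    return {
        item: dict(heapq.nlargest(k, sims.items(), key=lambda pair: pair[1]))
        for item, sims in similarity_table.items()
    }


def predict(model, user_ratings, target_item):
    """Online step: table look-up, intersection, then weighted sum."""
    neighbors = model.get(target_item, {})
    # Intersect the k stored neighbors with the items this user has rated.
    rated = {item: sim for item, sim in neighbors.items() if item in user_ratings}
    denominator = sum(abs(sim) for sim in rated.values())
    if denominator == 0:
        return None
    return sum(sim * user_ratings[item] for item, sim in rated.items()) / denominator


# Hypothetical similarity table for a single target item.
similarity_table = {"i1": {"i2": 0.9, "i3": 0.5, "i4": 0.1}}
model = build_model(similarity_table, k=2)  # i4 is dropped from the model
rating = predict(model, {"i3": 4, "i4": 1}, "i1")
```

Because i4 is not among the two stored neighbors, only the user's rating of i3 contributes. A smaller k shrinks the model and the online work, at the cost of sometimes having no rated neighbor to predict from.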
Experimental Evaluation: Data set
• Movie data from MovieLens
  • 943 users (out of 43,000 users)
  • 1,682 movies (out of over 3,500 different movies)
  • 100,000 ratings (only users who had rated 20 or more movies were considered)
• The database was divided into a training set and a test set
  • X = 0.8 (80% of the data is used as the training set)
• Sparsity level: 1 - (nonzero entries / total entries) = 1 - 100,000 / (943 × 1,682) ≈ 0.9369
Experimental Evaluation: Evaluation Metrics
• Statistical accuracy metrics
  • Mean Absolute Error (MAE) measures the deviation of recommendations from their true user-specified values
  • The lower the MAE, the more accurately the recommendation engine predicts user ratings
• Decision support accuracy metrics
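MAE is the average of the absolute differences between predicted and actual ratings over all test pairs. A minimal sketch; the function name and the sample values are illustrative.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute deviation between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)


# Hypothetical predictions and true ratings for three test pairs.
mae = mean_absolute_error([4, 3, 5], [5, 3, 3])  # (1 + 0 + 2) / 3
```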
Experimental Results (1)
• Effect of Similarity Algorithms
Experimental Results (2)
• Sensitivity of Training/Test Ratio
• Experiments with neighborhood size
Experimental Results (3)
• Quality Experiments
Sensitivity of the Model Size
• High accuracy can be achieved using only a fraction of the items
• It is therefore useful to precompute the item similarities using only a fraction of the items, since good prediction quality is still possible
[Figure not preserved; surviving annotations: 100%, 96%, 98.3%]
Impact of the model size on run-time and throughput
Contributions
• Analysis of the item-based prediction algorithms and identification of different ways to implement their subtasks
• Formulation of a precomputed model of item similarity to increase the online scalability of item-based recommendations
• An experimental comparison of the quality of several different item-based algorithms to the classic user-based (nearest neighbor) algorithms
Discussion & Conclusion
• Discussion
  • The item-item scheme provides better quality of predictions than the user-user scheme
  • The item neighborhood is fairly static and can be precomputed, which results in very high online performance
  • It is possible to retain only a small subset of items and still produce reasonably good prediction quality
• Conclusion
  • Item-based techniques allow CF-based algorithms to scale to large data sets and at the same time produce high-quality recommendations
My comments
• The recommendation process is not explained in enough detail
• Does the calculated similarity really represent the similarity of items?
• The range of the similarity values is not explained
• Can’t we precompute the similarity of users?
References
• Amazon.com Recommendations: Item-to-Item Collaborative Filtering http://www.win.tue.nl/~laroyo/2L340/resources/Amazon-Recommendations.pdf
• Item-based Collaborative Filtering Recommendation Algorithms http://www.grouplens.org/papers/pdf/www10_sarwar.pdf