Item-Based Collaborative Filtering Recommendation Algorithms

Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl
GroupLens Research Group / Army HPC Research Center
Department of Computer Science and Engineering, University of Minnesota, Minneapolis, 2001
2008. Nov. 05
Presented by Eun-gyeong Kim, IDS Lab.

Contents
• Introduction
• Collaborative Filtering Based Recommender Systems
• Overview of the Collaborative Filtering Process
• Challenges of User-based Collaborative Filtering Algorithms
• Item-based Collaborative Filtering Algorithm
• Item Similarity Computation
• Prediction Computation
• Performance Implications
• Experimental Evaluation
• Contributions
• Discussion & Conclusion
Center for E-Business Technology

Introduction (What is Collaborative Filtering?)
• Now it is time to create the technologies that can help us sift through all the available information to find that which is most valuable to us.
• One of the most promising such technologies is collaborative filtering (CF).
• Collaborative filtering (as defined by Wikipedia)
  • The process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc.
• The underlying assumption of the CF approach is that those who agreed in the past tend to agree again in the future.
• CF systems usually take two steps
  1. Look for users who share the same rating patterns with the active user.
  2. Use the ratings from the like-minded users found in step 1 to calculate a prediction for the active user.

Two Main Categories of CF Algorithms
• Memory-based CF algorithms
  • Utilize the entire user-item database to generate a prediction
  • Employ statistical techniques to find the neighbors
• Model-based CF algorithms
  • First develop a model of user ratings
  • Then compute the expected value of a user's rating for an item, given his/her ratings on other items
  • Techniques used to build the model
    • Bayesian networks (probabilistic)
    • Clustering (classification)
    • Rule-based approaches (association rules between co-purchased items)

Recommendation Algorithms
• User-based collaborative filtering
  • Traditional collaborative filtering
  • Cluster models
• Item-based collaborative filtering
  • Search-based methods
  • Item-to-item collaborative filtering
Source: Amazon.com Recommendations: Item-to-Item Collaborative Filtering
http://www.win.tue.nl/~laroyo/2L340/resources/Amazon-Recommendations.pdf

CF Based Recommender Systems
• Provide item recommendations or predictions based on the opinions of other like-minded users

Traditional Collaborative Filtering (1)
• Represents a customer as an N-dimensional vector of items, where N is the number of distinct catalog items
• For almost all customers, this vector is extremely sparse
• Generates recommendations based on a few customers (neighbors) who are most similar to the user
• Measures the similarity of two customers, A and B
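A common way to measure the similarity of two customers A and B is the cosine of the angle between their item vectors. The sketch below assumes that choice; the function name and the two sample vectors are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length customer vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a customer with no purchases is treated as similar to no one
    return dot / (norm_a * norm_b)

# Hypothetical purchase vectors over a 5-item catalog (1 = purchased).
customer_a = [1, 0, 1, 1, 0]
customer_b = [1, 0, 0, 1, 0]
similarity = cosine_similarity(customer_a, customer_b)
print(round(similarity, 4))
```

In a real catalog the vectors are extremely sparse, so a dictionary or sparse-matrix representation would normally replace the dense lists used here.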

Traditional Collaborative Filtering (2)
• Generate recommendations
  • A common technique is to rank each item according to how many similar customers purchased it
• Computational cost (M customers, N catalog items)
  • O(MN) in the worst case
  • Performance tends to be closer to O(M+N) because the average customer vector is extremely sparse
• Scaling issues
  • Reducing the data size helps
    • Reduce M by randomly sampling the customers or discarding customers with few purchases
    • Reduce N by discarding very popular or unpopular items
  • But these reductions also reduce recommendation quality
• We need better algorithms that scale to large data sets and at the same time produce high-quality recommendations
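The ranking step (order items by how many similar customers purchased them) can be sketched as follows. This is a minimal illustration that assumes the neighbors have already been found; the customer names, item names, and the `recommend` helper are all hypothetical.

```python
from collections import Counter

def recommend(target, neighbors, purchases, top_n=3):
    """Rank items the target has not bought by how many neighbors bought them."""
    owned = purchases[target]
    counts = Counter(
        item
        for neighbor in neighbors
        for item in purchases[neighbor]
        if item not in owned
    )
    # Sort by count (descending), then by item name for a deterministic order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ordered][:top_n]

# Hypothetical purchase histories.
purchases = {
    "alice": {"book_a", "book_b"},
    "bob": {"book_a", "book_c", "book_d"},
    "carol": {"book_b", "book_c"},
    "dave": {"book_c", "book_e"},
}
ranked = recommend("alice", ["bob", "carol", "dave"], purchases)
print(ranked)
```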

Challenges of User-based CF Algorithms
• Sparsity
  • A person may have purchased well under 1% of the items
  • (1% of 2 million books is 20,000 books)
  • As a result, the accuracy of recommendations may be poor
• Scalability
  • Computation grows with both the number of users and the number of items
  • Traditional CF does little or no offline computation, and its online computation scales with the number of customers and catalog items
=> The key to item-to-item CF's scalability and performance is that it creates the expensive similar-items table offline

Item-based CF Algorithm
• Similarity computation between two items i and j
  • First isolate the users who have rated both of these items
  • Then apply a similarity computation technique to determine the similarity
• Prediction generation
  • Take a weighted average of the target user's ratings on these similar items
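The two similarity steps (isolate the co-rating users, then apply a similarity measure) might look like the sketch below. It assumes adjusted cosine as the measure, where each rating is offset by that user's mean rating. The data layout `{user: {item: rating}}`, the function name, and the sample ratings are illustrative assumptions.

```python
import math

def adjusted_cosine(ratings, item_i, item_j):
    """Adjusted-cosine similarity of two items; 0.0 if it is undefined."""
    # Step 1: isolate the users who have rated both items.
    co_raters = [u for u, r in ratings.items() if item_i in r and item_j in r]
    num = den_i = den_j = 0.0
    for u in co_raters:
        # Step 2: subtract the user's own mean to offset rating-scale differences.
        mean_u = sum(ratings[u].values()) / len(ratings[u])
        di = ratings[u][item_i] - mean_u
        dj = ratings[u][item_j] - mean_u
        num += di * dj
        den_i += di * di
        den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (math.sqrt(den_i) * math.sqrt(den_j))

# Hypothetical ratings; u3 has not rated i2 and is therefore ignored.
ratings = {
    "u1": {"i1": 5, "i2": 5, "i3": 2},
    "u2": {"i1": 1, "i2": 2, "i3": 3},
    "u3": {"i1": 4, "i3": 2},
}
sim_i1_i2 = adjusted_cosine(ratings, "i1", "i2")
print(round(sim_i1_i2, 4))
```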

Item Similarity Computation
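For reference, the three item-similarity measures usually associated with this algorithm (cosine, correlation, and adjusted cosine) can be written as below. The notation is an assumption for this sketch: U is the set of users who rated both items i and j, R_{u,i} is user u's rating of item i, \bar{R}_i is the mean rating of item i, and \bar{R}_u is the mean rating given by user u.

```latex
% Cosine-based similarity: items are treated as vectors in user space
\mathrm{sim}(i,j) = \cos(\vec{i},\vec{j})
  = \frac{\vec{i}\cdot\vec{j}}{\lVert\vec{i}\rVert_2 \,\lVert\vec{j}\rVert_2}

% Correlation-based similarity (Pearson correlation over co-rating users U)
\mathrm{sim}(i,j) =
  \frac{\sum_{u\in U}(R_{u,i}-\bar{R}_i)(R_{u,j}-\bar{R}_j)}
       {\sqrt{\sum_{u\in U}(R_{u,i}-\bar{R}_i)^2}\,\sqrt{\sum_{u\in U}(R_{u,j}-\bar{R}_j)^2}}

% Adjusted cosine similarity: subtract each user's mean rating instead
\mathrm{sim}(i,j) =
  \frac{\sum_{u\in U}(R_{u,i}-\bar{R}_u)(R_{u,j}-\bar{R}_u)}
       {\sqrt{\sum_{u\in U}(R_{u,i}-\bar{R}_u)^2}\,\sqrt{\sum_{u\in U}(R_{u,j}-\bar{R}_u)^2}}
```

The only difference between the last two is which mean is subtracted: the item's mean (correlation) or the user's mean (adjusted cosine).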

Prediction Computation

Prediction Computation
• Weighted sum
  • Compute the sum of the ratings given by the user on the items similar to the target item i
  • Each rating is weighted by the corresponding similarity between that item and i
  • The sum is scaled by the sum of the similarity terms so the prediction stays within the rating range
• Regression
  • Similarities computed using cosine or correlation measures may be misleading
  • Instead of using the "raw" rating values of a similar item N, use values approximated by a linear regression model
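A minimal sketch of the weighted-sum prediction, assuming the usual form P(u, i) = sum(s(i, N) * R(u, N)) / sum(|s(i, N)|) over the similar items N that the user has rated. The function name, argument layout, and numbers are hypothetical.

```python
def predict_weighted_sum(user_ratings, similarities):
    """Predict a rating for the target item.

    user_ratings: {item: rating} for the target user.
    similarities: {item: similarity between that item and the target item}.
    """
    num = den = 0.0
    for item, sim in similarities.items():
        if item in user_ratings:  # only items the user has actually rated count
            num += sim * user_ratings[item]
            den += abs(sim)
    return num / den if den else None  # None when no similar item was rated

# Hypothetical data: the user rated i2 and i3 but not i4.
prediction = predict_weighted_sum(
    {"i2": 4, "i3": 2},
    {"i2": 0.5, "i3": 0.25, "i4": 0.9},
)
print(round(prediction, 3))
```

Dividing by the sum of absolute similarities keeps the prediction inside the range of the user's own ratings when all similarities are positive.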

Weighted Sum Example
• Let’s predict the value of item i1 for u4

Item-to-item CF in Amazon.com
• We could build a product-to-product matrix by iterating through all item pairs and computing a similarity metric for each pair
• However, many product pairs have no common customers, so this approach is inefficient in terms of processing time and memory usage
• A better approach is to calculate the similarity between a single product and all related products
  • O(N²M) in the worst case
  • Closer to O(NM) in practice
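The "single product versus all related products" idea can be sketched as follows: for each item, walk only through the customers who bought it and the other items those customers bought, so pairs with no common customer are never touched. This is an illustrative sketch, not Amazon's implementation; it assumes binary purchase data and cosine similarity, and every name in it is hypothetical.

```python
import math
from collections import defaultdict

def similar_items_table(purchases):
    """purchases: {customer: set of items} -> {item: {other_item: similarity}}."""
    buyers = defaultdict(set)  # item -> customers who bought it
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    table = {}
    for item1, customers in buyers.items():
        co_counts = defaultdict(int)
        for customer in customers:
            for item2 in purchases[customer]:
                if item2 != item1:
                    co_counts[item2] += 1  # a customer bought both item1 and item2
        # Cosine similarity for binary vectors: co-purchases / sqrt(|A| * |B|).
        table[item1] = {
            item2: n / math.sqrt(len(customers) * len(buyers[item2]))
            for item2, n in co_counts.items()
        }
    return table

# Hypothetical purchase data; item "d" shares no customer with any other item.
purchases = {
    "c1": {"a", "b"},
    "c2": {"a", "b", "c"},
    "c3": {"c"},
    "c4": {"d"},
}
table = similar_items_table(purchases)
print(table["a"])
```

Only item pairs that share at least one customer get an entry, which is what saves time and memory compared with filling the full product-to-product matrix.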

Performance Implications
• Precompute item-item similarity scores
  • In a typical e-commerce scenario, the set of items is relatively static compared with the set of users, which changes much more often
  • Compute all-to-all similarity offline, then perform a quick table look-up to retrieve the required similarity values
• Generating a prediction for a user u on item i
  • Retrieve the precomputed k most similar items corresponding to the target item i
  • Intersect those k items with the items purchased by the user u
  • Compute the prediction using the basic item-based CF algorithm
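The precompute-then-look-up flow might be sketched like this: keep only the k most similar items per item as the model, then at prediction time intersect those k items with what the user has rated. The helper names, the similarity values, and the use of the weighted sum as the "basic" prediction are assumptions made for illustration.

```python
def top_k_model(similarity, k):
    """Offline step: keep only the k most similar items for each item."""
    return {
        item: dict(sorted(neighbors.items(), key=lambda kv: -kv[1])[:k])
        for item, neighbors in similarity.items()
    }

def predict(model, user_ratings, target_item):
    """Online step: table look-up, intersection, weighted-sum prediction."""
    neighbors = model.get(target_item, {})
    # Intersect the k stored neighbors with the items this user has rated.
    common = [item for item in neighbors if item in user_ratings]
    den = sum(abs(neighbors[item]) for item in common)
    if den == 0:
        return None
    return sum(neighbors[item] * user_ratings[item] for item in common) / den

# Hypothetical precomputed similarities for target item i1.
similarity = {"i1": {"i2": 0.9, "i3": 0.6, "i4": 0.1}}
model = top_k_model(similarity, k=2)  # i4 is dropped from the model
predicted = predict(model, {"i2": 5, "i3": 3, "i4": 1}, "i1")
print(sorted(model["i1"]), round(predicted, 2))
```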

Experimental Evaluation: Data Set
• Movie data
  • Data from MovieLens
  • 943 users (out of 43,000 users)
  • 1682 movies (out of over 3,500 different movies)
  • 100,000 ratings (only users who had rated 20 or more movies were considered)
• Divided the database into a training set and a test set
  • x = 0.8 (80% of the data is used as the training set)
• Sparsity level: 1 - (nonzero entries / total entries) = 1 - 100,000 / (943 × 1682) ≈ 0.937
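The sparsity level of this data set can be checked with a few lines of arithmetic, assuming the usual definition 1 - (nonzero entries / total entries) of the user-item matrix. The variable names are illustrative.

```python
users, movies, ratings = 943, 1682, 100_000

total_entries = users * movies           # size of the full user-item matrix
sparsity = 1 - ratings / total_entries   # fraction of entries with no rating

print(total_entries, round(sparsity, 3))
```

Roughly 94% of the matrix is empty, which matches the value of about 0.9369 reported for this data set.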

Experimental Evaluation: Evaluation Metrics
• Statistical accuracy metrics
  • Mean Absolute Error (MAE) measures the deviation of predicted ratings from their true user-specified values
  • The lower the MAE, the more accurately the recommendation engine predicts user ratings
• Decision support accuracy metrics
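MAE is the average of the absolute differences between predicted and actual ratings. A minimal sketch, with made-up rating pairs:

```python
def mean_absolute_error(predicted, actual):
    """Average absolute deviation between predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Hypothetical predictions versus held-out test ratings.
mae = mean_absolute_error([3.5, 4.0, 2.0], [4, 4, 3])
print(mae)
```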

Experimental Results (1)
• Effect of Similarity Algorithms

Experimental Results (2)
• Sensitivity of Training/Test Ratio
• Experiments with neighborhood size

Experimental Results (3)
• Quality Experiments

Sensitivity of the Model Size
• High accuracy can be achieved using only a fraction of the items
• It is therefore useful to precompute the item similarities using only a fraction of the items while still obtaining good prediction quality
(Figure annotations: 100%, 96%, 98.3%)

Impact of the model size on run-time and throughput

Contributions
• Analysis of the item-based prediction algorithms and identification of different ways to implement their subtasks
• Formulation of a precomputed model of item similarity to increase the online scalability of item-based recommendations
• An experimental comparison of the quality of several different item-based algorithms to the classic user-based (nearest neighbor) algorithms

Discussion & Conclusion
• Discussion
  • The item-item scheme provides better quality of predictions than the user-user scheme
  • The item neighborhood is fairly static, so it can be precomputed, which results in very high online performance
  • It is possible to retain only a small subset of items and still produce reasonably good prediction quality
• Conclusion
  • Item-based techniques allow CF-based algorithms to scale to large data sets and at the same time produce high-quality recommendations

My comments
• Lack of explanation of the recommendation process
• Does the calculated similarity really represent the similarity of items?
• Lack of explanation of the range of the similarity value
• Can't we precompute the similarity of users?

References
• Amazon.com Recommendations: Item-to-Item Collaborative Filtering
  http://www.win.tue.nl/~laroyo/2L340/resources/Amazon-Recommendations.pdf
• Item-based Collaborative Filtering Recommendation Algorithms
  http://www.grouplens.org/papers/pdf/www10_sarwar.pdf
