Crawling the Algorithmic Foundations of Recommendation Technologies

A presentation given in partial fulfillment of the requirements for the degree of Master of Science. Manos Papagelis, Computer Science Department, School of Sciences and Engineering, University of Crete & Institute of Computer Science, Foundation of Research and Technology, Hellas. Heraklion, Greece, March 24, 2005. Email: papaggel@csd.uoc.gr. Supervisor: Dimitris Plexousakis, Associate Professor.

Presentation Outline • Part I: Recommendation Algorithms • Part II: Qualitative Analysis of Prediction Algorithms • Part III: Addressing the Scalability Problem • Part IV: Addressing the Sparsity Problem • Part V: Conclusions

Part I: Recommendation Algorithms

Motivation & Placement of the Research Topic • Motivation • Information Overload • Need for Personalization • New Items, Books, Journals, Research Papers • TV Programs, Music CDs, Movie Titles • E-commerce products, Matchmaking and other e-Services • Web pages, Usenet Articles, emails • Research Topic Placement • Research Topic: Recommendation Algorithms • Research Areas: Information Retrieval, Personalization, Social Networks, Trust Management

Introduction to Recommendation Systems • Recommendation Systems were developed to address two problems • Overwhelming numbers of on-topic documents • Filtering of non-text documents, mainly based on rating activity • Formulation of the Recommendation Problem • Estimation of user ratings for unseen items (Predictions) • Recommendation of the items with the top-N predictions • Classification of Recommendation Algorithms • Content-based • Collaborative Filtering • Hybrid • Challenges and Limitations of Collaborative Filtering Methods • Scalability • Sparsity • Cold Start

Part II: Qualitative Analysis of Prediction Algorithms

Unfolding the Recommendation Process • Which items to recommend? The N items with the top-N predictions • How can predictions be made? By exploiting other users’ activity • Whose activity should be used? That of users who share the same or relevant interests • “it may be of benefit to one’s search for information to consult the behavior of other users who share the same or relevant interests” • This is the idea behind Collaborative Filtering

Collaborative Filtering (CF) [Figure: motivating illustration of user models based on rating activity; users u1 to u6 placed in a space with item axes x, y, z] • Co-rated Items: how similar are they? • Cosine Vector Similarity • Spearman Correlation • Mean-squared Difference • Entropy-based Uncertainty • Pearson Correlation Coefficient
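To make the last measure concrete, here is a minimal sketch of the Pearson correlation coefficient computed over the co-rated items of two users. The function name, the dictionary-based rating representation, and the choice to take each user's mean over the co-rated items only are assumptions made for illustration; the slides do not fix these details.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over the items rated by both users.

    ratings_a, ratings_b: dicts mapping item id -> rating (hypothetical layout).
    Returns None when the similarity cannot be defined.
    """
    co_rated = set(ratings_a) & set(ratings_b)
    if len(co_rated) < 2:
        return None  # too few co-rated items

    # Assumption: means are taken over the co-rated items only.
    mean_a = sum(ratings_a[i] for i in co_rated) / len(co_rated)
    mean_b = sum(ratings_b[i] for i in co_rated) / len(co_rated)

    numerator = sum(
        (ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in co_rated
    )
    sum_sq_a = sum((ratings_a[i] - mean_a) ** 2 for i in co_rated)
    sum_sq_b = sum((ratings_b[i] - mean_b) ** 2 for i in co_rated)
    if sum_sq_a == 0 or sum_sq_b == 0:
        return None  # one user rated all co-rated items identically

    return numerator / (sqrt(sum_sq_a) * sqrt(sum_sq_b))


u1 = {"x": 2, "y": 4, "z": 6}
u2 = {"x": 1, "y": 2, "z": 3, "w": 9}  # "w" is not co-rated and is ignored
u3 = {"x": 6, "y": 4, "z": 2}
same_taste = pearson_similarity(u1, u2)      # close to +1
opposite_taste = pearson_similarity(u1, u3)  # close to -1
```

Returning None for users without enough co-rated items makes the undefined case explicit instead of silently treating it as zero similarity.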

Rating Activity • Explicit Rating: a rating that expresses the preference of a user for a specific item • Implicit Rating: each explicit rating of a user for a specific item “implicitly” identifies the user’s preference for the categories that this item belongs to [Figure: example in which a user's explicit rating r of an item results in implicit ratings for the item's categories CatA, CatB, CatC]

Similarity Measures • Distinctions • User-based vs. Item-based Similarity • Explicit Rating vs. Implicit Rating • Definition of three matrices • User-Item Matrix • User-Category Matrix • Item-Category Bitmap • User-based Similarity derived from • Explicit Ratings (k_{x,y}) • Implicit Ratings (λ_{x,y}) • Item-based Similarity derived from • Explicit Ratings (μ_{x,y}) • Implicit Ratings (ν_{x,y})
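As a rough illustration of how the three matrices relate, the sketch below derives a user-category matrix of implicit ratings from a user-item matrix and an item-category bitmap. The aggregation rule (the mean of a user's explicit ratings over the items of each category) is an assumption; the slides only state that an explicit rating implies a preference for the item's categories. All names and the dictionary layout are hypothetical.

```python
def user_category_ratings(user_item, item_category):
    """Derive implicit category ratings from explicit item ratings.

    user_item: {user: {item: rating}}, a sparse user-item matrix.
    item_category: {item: set of categories}, the item-category bitmap.
    Returns {user: {category: mean rating of the user's items in it}}.
    """
    result = {}
    for user, ratings in user_item.items():
        totals, counts = {}, {}
        for item, rating in ratings.items():
            # Every category of the rated item receives the rating.
            for category in item_category.get(item, ()):
                totals[category] = totals.get(category, 0) + rating
                counts[category] = counts.get(category, 0) + 1
        result[user] = {c: totals[c] / counts[c] for c in totals}
    return result


user_item = {"u1": {"i1": 8, "i2": 4}}
item_category = {"i1": {"CatA", "CatB"}, "i2": {"CatB", "CatC"}}
implicit = user_category_ratings(user_item, item_category)
# u1 gets CatA = 8.0, CatB = 6.0 (mean of 8 and 4), CatC = 4.0
```

The same similarity measure used on explicit ratings can then be applied to the rows of this denser matrix.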

Prediction Algorithms • Prediction = Average + Adjustment • User-based Prediction algorithms • CFUB-ER: Based on Explicit Ratings • CFUB-ER-CB: Based on Explicit Ratings, Content Boosted • CFUB-IR: Based on Implicit Ratings • Item-based Prediction algorithms • CFIB-ER: Based on Explicit Ratings • CFIB-IR: Based on Implicit Ratings
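The "Average + Adjustment" scheme can be sketched with the widely used user-based formulation: the active user's mean rating plus a similarity-weighted average of the neighbors' deviations from their own means. This is an illustration under that assumption, not necessarily the exact form of the CFUB or CFIB algorithms; the function name and tuple layout are hypothetical.

```python
def predict_rating(active_mean, neighbors):
    """Prediction = average + adjustment (user-based sketch).

    active_mean: mean rating of the active user.
    neighbors: list of (similarity, neighbor_rating_of_item, neighbor_mean).
    """
    normalizer = sum(abs(sim) for sim, _, _ in neighbors)
    if normalizer == 0:
        return active_mean  # no usable neighbors: fall back to the average

    # Each neighbor contributes its deviation from its own mean,
    # weighted by its similarity to the active user.
    adjustment = sum(
        sim * (rating - mean) for sim, rating, mean in neighbors
    ) / normalizer
    return active_mean + adjustment


# Two neighbors: one rated the item 2 above its mean, one 1 below.
prediction = predict_rating(6.0, [(1.0, 8, 6.0), (0.5, 5, 6.0)])
```

Working with deviations rather than raw ratings compensates for users who rate systematically high or low.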

Experimental Evaluation & Results • Data Set • 2100 ratings (on a scale from 1 to 10), 115 users, 650 items, 20 item categories • Sparsity 97% • 300-item sample sets • Accuracy Metrics • Mean Absolute Error (MAE) • Receiver Operating Characteristic (ROC) curve
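The sparsity figure follows directly from the data set size, and MAE is the mean absolute difference between predicted and actual ratings. A short sketch of both computations (function names are illustrative; the sample predictions are made up):

```python
def sparsity(num_ratings, num_users, num_items):
    """Fraction of empty cells in the user-item matrix."""
    return 1.0 - num_ratings / (num_users * num_items)


def mean_absolute_error(predicted, actual):
    """Average absolute deviation between predictions and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Data set from the slide: 2100 ratings, 115 users, 650 items.
data_set_sparsity = sparsity(2100, 115, 650)  # about 0.972, i.e. 97%

# Made-up predictions, only to show the call.
example_mae = mean_absolute_error([7, 5, 9], [8, 5, 6])
```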

Mean Absolute Error (MAE) • We plot MAE vs. Sparsity [Figure: MAE vs. sparsity for the prediction algorithms; values annotated on the plot: 1.703, 1.385, 1. 35, 1. 34, 0.838]

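Receiver Operating Characteristic (ROC) Curve • We plot True Positive Fraction vs. False Positive Fraction [Figure: ROC curves for the prediction algorithms; values annotated on the plot: 0.71, 0.59, 0.55, 0.53, 0.39]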
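A single point of such a curve can be computed as sketched below. The notion of a "good" item (a rating at or above a threshold, here 7 on the 1 to 10 scale) is an assumption, as are the function name and the sample data; sweeping the threshold applied to the predictions would trace the full curve.

```python
def roc_point(predicted, actual, threshold):
    """Return (true positive fraction, false positive fraction).

    An item counts as good when its actual rating >= threshold and as
    recommended when its predicted rating >= threshold (assumed rule).
    """
    tp = fp = positives = negatives = 0
    for p, a in zip(predicted, actual):
        is_good = a >= threshold
        is_recommended = p >= threshold
        if is_good:
            positives += 1
            tp += is_recommended
        else:
            negatives += 1
            fp += is_recommended
    tpf = tp / positives if positives else 0.0
    fpf = fp / negatives if negatives else 0.0
    return tpf, fpf


actual_ratings = [9, 8, 3, 2, 7]
predicted_ratings = [8, 5, 7, 1, 9]
tpf, fpf = roc_point(predicted_ratings, actual_ratings, threshold=7)
# 2 of the 3 good items are recommended; 1 of the 2 bad items is too.
```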

Part III: Addressing the Scalability Problem

The Scalability Challenge Facts • Large numbers of users and items (e.g. Amazon.com) • CF requires expensive computations that grow with the number of items and users Requirements • Need for quick formulation of recommendations • Need for immediate incorporation of new rating information • Need for preservation of CF’s quality

Related Work • Clustering Approaches -[Breese et al. 1998, Ungar and Foster 1998] • Dimensionality Reduction of the User-Item Matrix -[Sarwar et al. 2001] • Data reduction or data focusing techniques -[Yu et al. 2002, Zeng et al. 2003] • Offline Computations -[Linden et al. 2003]

User-to-User Similarities • Classic Collaborative Filtering • Rating Process: updates the User-Item Matrix • Recommendation Process: Request → Compute User-to-User Similarities → Find Neighbors → Recommend High Rated Items → Response • Our Method • Rating Process: updates the User-Item Matrix and the User-to-User similarities • Recommendation Process: Request → Find Neighbors → Recommend High Rated Items → Response

Incremental Collaborative Filtering (ICF) • Key Idea: incremental computation of the B, C, D factors after each single rating • Classic Collaborative Filtering (based on Pearson Correlation), expressed in terms of • the number of co-rated items between ua and uy • the actual ratings of ua and uy for item ih • the average ratings of ua and uy • Incremental Collaborative Filtering (ICF): the similarity is updated from the previous B, C, D factors and their increments
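For reference, the Pearson correlation that the factors decompose can be written as follows. The assignment of the letters is an assumption, since the slide's formulas did not survive extraction: B is taken to be the numerator, C and D the two sums under the square roots, and e, f, g the increments added to them after a rating. The symbol n'' is the number of co-rated items, as on the Complexity Issues slide.

```latex
\[
\mathrm{sim}(u_a, u_y)
  = \frac{\sum_{h=1}^{n''} (r_{a,h} - \bar{r}_a)(r_{y,h} - \bar{r}_y)}
         {\sqrt{\sum_{h=1}^{n''} (r_{a,h} - \bar{r}_a)^2}\,
          \sqrt{\sum_{h=1}^{n''} (r_{y,h} - \bar{r}_y)^2}}
  = \frac{B}{\sqrt{C}\,\sqrt{D}}
\]
\[
B' = B + e, \qquad C' = C + f, \qquad D' = D + g, \qquad
\mathrm{sim}'(u_a, u_y) = \frac{B'}{\sqrt{C'}\,\sqrt{D'}}
\]
```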

ICF: Cases to be examined in the Rating Process (active user ua, other user uy, item ia) • Case 1: Submission of a new rating by ua for ia • Item ia has not been rated by user uy • Item ia has been rated by user uy • Case 2: Update of an existing rating of ua for ia • Item ia has not been rated by user uy • Item ia has been rated by user uy
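The idea of updating a similarity after each single rating can be sketched with running sums. Note that this is a simplified variant, not the B, C, D increment scheme of the slides: it keeps per-pair sums over co-rated items and uses means over those items, so the sub-cases in which uy has not rated the item leave the pair untouched. In the slides' formulation those sub-cases may still matter, because a new rating changes the user's average. The class and method names are hypothetical.

```python
from math import sqrt


class PairStats:
    """Running sums over the items co-rated by users a and y."""

    def __init__(self):
        self.n = 0
        self.sum_a = self.sum_y = 0.0
        self.sum_aa = self.sum_yy = self.sum_ay = 0.0

    def add_co_rating(self, r_a, r_y):
        """Case 1, item already rated by y: a new co-rated item appears."""
        self.n += 1
        self.sum_a += r_a
        self.sum_y += r_y
        self.sum_aa += r_a * r_a
        self.sum_yy += r_y * r_y
        self.sum_ay += r_a * r_y

    def update_rating_of_a(self, old_r_a, new_r_a, r_y):
        """Case 2, item already rated by y: a changes an existing rating."""
        self.sum_a += new_r_a - old_r_a
        self.sum_aa += new_r_a * new_r_a - old_r_a * old_r_a
        self.sum_ay += (new_r_a - old_r_a) * r_y

    def similarity(self):
        """Pearson correlation from the sums, in constant time."""
        if self.n < 2:
            return None
        cov = self.n * self.sum_ay - self.sum_a * self.sum_y
        var_a = self.n * self.sum_aa - self.sum_a ** 2
        var_y = self.n * self.sum_yy - self.sum_y ** 2
        if var_a <= 0 or var_y <= 0:
            return None
        return cov / sqrt(var_a * var_y)


stats = PairStats()
for r_a, r_y in [(2, 1), (4, 2), (6, 3)]:
    stats.add_co_rating(r_a, r_y)
sim_before = stats.similarity()   # perfectly correlated ratings
stats.update_rating_of_a(6, 0, 3) # a re-rates the third item from 6 to 0
sim_after = stats.similarity()
```

Each rating touches only a handful of numbers per affected user pair, which is what removes the similarity computation from the recommendation path.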

Caching Computation of the factors that appear in increments e, f, g

Complexity Issues • m: the number of users • n: the number of items • m’<<m: the number of users with at least one co-rated item with the active user • n’<<n: the number of items that have not been rated by the active user and have been rated by at least one of its similar users • n’’<<n: the number of co-rated items between the active user and another user [Table: worst-case and approximation complexities of Classic CF and Incremental CF]

Experimental Evaluation of ICF • Evaluation metric: Response Time in relation to Accuracy • Remarks • The performance-accuracy tradeoff in Classic CF is confirmed • ICF proves to be highly scalable while retaining the best quality of CF • The response time of ICF grows linearly, and only with the number of items

Part IV: Addressing the Sparsity Problem

The Sparsity Challenge • Facts • Large number of users and items • Even active users rate only a small fraction of the items in the database • It is possible that the similarity between two users cannot be defined • Negative impact on the effectiveness of CF • Requirements • Be able to define similarity between two users • Be able to recommend new and obscure items • Be able to recommend items to new users

Related Work • Use of profile information when calculating similarities (e.g. demographic filtering) -[Pazzani 1999] • Dimensionality reduction (e.g. Singular Value Decomposition, Latent Semantic Indexing, Principal Component Analysis) -[Sarwar et al. 2000, Deerwester et al. 1990, Goldberg et al. 2001] • Content-boosted Collaborative Filtering -[Melville et al. 2002] • Item-based similarity -[Sarwar et al. 2001, Popescul et al. 2001]

Social Networks in RS • Underlying Social Networks in Recommendation Systems • Associations based on trust • Trust through user-to-user similarity (Pearson correlation) [Figure: rating activity recorded in the User-Item Matrix links the User Space to the Item Space and gives rise to associations between users]

Trust Inferences and Paths • Trust Inferences • are transitive associations between users in the underlying network • are sources of additional information for recommendation purposes • form trust paths between distant users [Figure: rating activity on items i1 and i2 creates direct associations S–N and N–T, from which an association S–T is inferred; trust paths through intermediate users N1, N2 form a web of trust]
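One way to read a transitive inference through a single intermediate user N is as an average of the two direct trust values, weighted by the number of co-rated items behind each association. This weighting is an assumption, although it reproduces the 0.46 shown for the two-hop path pB in the illustrating example on the Managing Multiple Paths slide. Handling of negative trust values is left out, and the function and parameter names are hypothetical.

```python
def infer_trust(trust_sn, n_sn, trust_nt, n_nt):
    """Infer trust S -> T through one intermediate user N.

    trust_sn, trust_nt: direct trust values S -> N and N -> T.
    n_sn, n_nt: numbers of co-rated items behind each direct association.
    Assumed rule: average weighted by the co-rated item counts.
    """
    total = n_sn + n_nt
    if total == 0:
        return None  # no evidence behind either association
    return (n_sn * trust_sn + n_nt * trust_nt) / total


# Values of path pB: S-N3 (T=0.4, 7 co-rated), N3-T (T=0.6, 3 co-rated).
inferred = infer_trust(0.4, 7, 0.6, 3)  # about 0.46
```

Weighting by co-rated items lets the better-supported association dominate the inferred value.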

Confidence, Uncertainty and Subjectiveness • Confidence and Uncertainty in Trust Paths • umax_conf: the user with the most co-rated items (confidence 1) [Figure: users u1, u2, u3, …, un-1 and umax_conf plotted against their number of co-rated items with S; an example association has confidence 0.57 and uncertainty 0.43] • Subjectiveness [Figure: the association between S and T carries confidence 0.57 in one direction and 0.34 in the other]
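The figure labels suggest a confidence defined relative to the user with whom the source shares the most co-rated items, with uncertainty as its complement. The sketch below follows that reading, which is an assumption, and uses made-up counts. It also shows why the measure is subjective: each user normalizes by their own maximum, so the two directions of one association can differ.

```python
def confidence(co_rated_counts, source, target):
    """Confidence of the association source -> target.

    co_rated_counts: {user: {other_user: number of co-rated items}}.
    Assumed definition: co-rated items with the target divided by the
    co-rated items with the source's best-connected user (umax_conf).
    """
    counts = co_rated_counts[source]
    best = max(counts.values(), default=0)
    if best == 0:
        return None
    return counts.get(target, 0) / best


# Hypothetical counts: S shares 7 items with u1 and 4 with T,
# while T shares 12 items with u2 and 4 with S.
counts = {"S": {"u1": 7, "T": 4}, "T": {"u2": 12, "S": 4}}
conf_s_to_t = confidence(counts, "S", "T")  # 4 / 7
conf_t_to_s = confidence(counts, "T", "S")  # 4 / 12
uncertainty_s_to_t = 1.0 - conf_s_to_t
```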

Managing Multiple Paths • Path Composition • Average Composition • Weighted Average Composition • Path Selection • Maximum Path Confidence • Minimum Mean Absolute Deviation • Illustrating Example (T: trust, C: confidence, n: number of co-rated items) • Path pA: S → N1 → N2 → T • S–N1: T=0.5, C=0.7, n(IS∩IN1)=8 • N1–N2: T=0.9, C=0.4, n(IN1∩IN2)=5 • N2–T: T=0.2, C=0.5, n(IN2∩IT)=6 • Result: TST(pA)=0.44, CST(pA)=0.14 • Path pB: S → N3 → T • S–N3: T=0.4, C=0.7, n(IS∩IN3)=7 • N3–T: T=0.6, C=0.8, n(IN3∩IT)=3 • Result: TST(pB)=0.46, CST(pB)=0.56
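The numbers of the illustrating example can be used to sketch path handling. The path confidences on the slide equal the product of the confidences along each path (0.7 × 0.4 × 0.5 = 0.14 and 0.7 × 0.8 = 0.56), so the sketch uses that rule. The two composition functions are assumptions about what "average" and "weighted average" composition mean (a plain mean, and a mean weighted by path confidence); the path trust values are taken from the slide as given, and Minimum Mean Absolute Deviation is not covered.

```python
from math import prod


def path_confidence(step_confidences):
    """Confidence of a path as the product of its step confidences."""
    return prod(step_confidences)


def select_max_confidence(paths):
    """Path selection: keep the path with the highest confidence."""
    return max(paths, key=lambda name: paths[name]["confidence"])


def average_composition(paths):
    """Assumed rule: plain mean of the path trust values."""
    return sum(p["trust"] for p in paths.values()) / len(paths)


def weighted_average_composition(paths):
    """Assumed rule: mean of path trust weighted by path confidence."""
    total = sum(p["confidence"] for p in paths.values())
    return sum(p["trust"] * p["confidence"] for p in paths.values()) / total


paths = {
    "pA": {"trust": 0.44, "confidence": path_confidence([0.7, 0.4, 0.5])},
    "pB": {"trust": 0.46, "confidence": path_confidence([0.7, 0.8])},
}
selected = select_max_confidence(paths)             # "pB"
plain_average = average_composition(paths)          # about 0.45
weighted_average = weighted_average_composition(paths)
```

Selection discards the weaker path entirely, while composition keeps both and lets the weights decide how much each contributes.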

Power-law Distribution of User’s Ratings

Trust Inference Impact

Statistical Accuracy of Our Method (MAE)

Decision-support Accuracy of Our Method (ROC)

Part V: Conclusions

Extensions of Recommendation Technologies (1/2) • More Advanced Profiling Techniques • Current techniques rely on rating information • E.g. data mining rules, sequences, signatures to describe a user’s interests • Adoption of advancements in mathematical approximation theory (e.g. radial basis functions) • Multidimensionality of Recommendations • Current techniques operate on the two-dimensional User-Item space • Need for contextual recommendations (taking into account time, conditions, etc.) • Multi-criteria Ratings • Need to incorporate ratings for a variety of criteria concerning a single item

Extensions of Recommendation Technologies (2/2) • Non-intrusiveness • Implicit Rating (e.g. time spent on a web page), HCI issues • Flexibility in Integration of Recommendation Technologies, e.g. through a declarative query: RECOMMEND Movie TO User BASED ON Rating SHOW TOP 3 FROM MovieRecommender WHERE Movie.Length > 120 AND User.City = “Toronto” • Effectiveness of Recommendations • Need for metrics that adequately capture usefulness and quality • Trustworthiness and Online Feedback Mechanisms Issues • Privacy issues

Conclusions and Discussion • Qualitative Analysis of user- and item-based prediction algorithms • Incremental Collaborative Filtering (ICF) to deal with Scalability • Trust Inferences to deal with Sparsity and Cold-start • Roadmap to Future Research Work

Published Work • Papagelis, M. and Plexousakis, D. Recommendation Based Discovery of Dynamic Virtual Communities. In Short Paper Proceedings of the 15th Conference on Advanced Information Systems Engineering, 2003. • Papagelis, M. and Plexousakis, D. Qualitative Analysis of User-based and Item-based Prediction Algorithms for Recommendation Agents. Eighth International Workshop on Cooperative Information Agents, 2004. • Papagelis, M. and Plexousakis, D. Qualitative Analysis of User-based and Item-based Prediction Algorithms for Recommendation Agents. Journal of Engineering Applications of Artificial Intelligence, 18(4), June 2005. • Papagelis, M., Plexousakis, D., and Kutsuras, T. A Method for Alleviating the Sparsity Problem in Collaborative Filtering Using Trust Inferences. Proceedings of the 3rd International Conference on Trust Management, 2005. • Papagelis, M., Plexousakis, D., Rousidis, I., and Theoharopoulos, E. Qualitative Analysis of User-based and Item-based Prediction Algorithms for Recommendation Systems. 3rd Hellenic Data Management Symposium, 2004. • Papagelis, M., Rousidis, I., Plexousakis, D., and Theoharopoulos, E. Incremental Collaborative Filtering for Highly-Scalable Recommendation Algorithms. 15th International Symposium on Methodologies of Intelligent Systems, 2005.

Questions?

Thanks!
