CSE 482 Lecture 15 (Collaborative Filtering)
Outline • What is a recommender system? • What is collaborative filtering? • What are the collaborative filtering techniques?
Recommender Systems • Automated systems that make recommendations based on the preferences of users • Motivation from the user's perspective • Lots of online products, books, movies, etc. • Help me narrow the choices available… • Motivation from the business' perspective • "If I have 3 million customers on the web, I should have 3 million stores on the web." — CEO of Amazon.com
Collaborative Filtering • The technology behind most recommender systems • The process of filtering information by soliciting judgments from others to overcome the information overload problem • "Based on the premise that people looking for information should be able to make use of what others have already found and evaluated." (Maltz & Ehrlich, 1995)
Another Application: Netflix $1M Prize • Task: Given customer ratings on some movies, predict customer ratings on other movies • Example: If John rates "Mission Impossible" a 5, "Over the Hedge" a 3, and "Back to the Future" a 4, how would he rate "Harry Potter", … ?
Collaborative Filtering • Collaborative filtering techniques are used to predict how well a user will like an item that he/she has not rated, given a set of historical preference judgments for a community of users.
Technique: Nearest Neighbor • User-Based Nearest Neighbor • Given a user u, generate a prediction for an item i by using the ratings for i from users in u's neighborhood • Need to define a similarity measure and the neighborhood size
Technique: Nearest Neighbor • User-Based Nearest Neighbor • Given a user u, generate a prediction for an item i by using the ratings for i from users in u's neighborhood • Neighbor = users with similar interests • Prediction: pred(u, i) = avg(u) + [ Σn sim(u, n) × (r(n, i) − avg(n)) ] / Σn |sim(u, n)|, where avg(u) is the average rating of user u, avg(n) is the average rating of neighbor n, and the sums run over the neighbors n of u
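The user-based prediction formula above can be sketched in code. This is a minimal illustration, not the lecture's exact implementation: it assumes cosine similarity computed over co-rated items, and the sample ratings matrix is invented for the demo.

```python
import numpy as np

def predict_user_based(R, u, i, k=2):
    """Predict user u's rating for item i from the k most similar users.

    R: ratings matrix (users x items) with np.nan marking missing ratings.
    pred(u, i) = avg(u) + sum_n sim(u, n) * (R[n, i] - avg(n)) / sum_n |sim(u, n)|
    """
    rbar = np.nanmean(R, axis=1)                 # average rating of each user
    sims = []
    for n in range(R.shape[0]):
        if n == u or np.isnan(R[n, i]):
            continue                             # neighbor must have rated item i
        both = ~np.isnan(R[u]) & ~np.isnan(R[n]) # items co-rated by u and n
        if both.sum() == 0:
            continue
        x, y = R[u, both], R[n, both]
        s = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))  # cosine similarity
        sims.append((s, n))
    sims = sorted(sims, reverse=True)[:k]        # neighborhood of size k
    num = sum(s * (R[n, i] - rbar[n]) for s, n in sims)
    den = sum(abs(s) for s, n in sims)
    return rbar[u] + num / den if den > 0 else rbar[u]

# Toy example (hypothetical ratings): predict user 0's rating for item 2
R = np.array([[5., 3., np.nan],
              [4., 3., 4.],
              [5., 4., 5.]])
pred = predict_user_based(R, u=0, i=2)           # ≈ 4.33
```

Both neighbors rate item 2 about a third of a point above their own averages, so user 0's prediction lands the same amount above his average of 4.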
Technique: Nearest Neighbor • Item-Based Nearest Neighbor • Given a user u, generate a prediction for an item i by using a weighted sum of the user u’s ratings for items that are most similar to i.
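The item-based variant can be sketched the same way. Again a hedged illustration: cosine similarity over co-rating users is an assumed choice, and the ratings matrix is invented for the demo.

```python
import numpy as np

def predict_item_based(R, u, i, k=2):
    """Predict user u's rating for item i as a weighted sum of u's ratings
    on the k items most similar to i (cosine similarity over co-raters)."""
    sims = []
    for j in range(R.shape[1]):
        if j == i or np.isnan(R[u, j]):
            continue                                  # u must have rated item j
        both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])  # users who rated both
        if both.sum() == 0:
            continue
        x, y = R[both, i], R[both, j]
        s = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
        sims.append((s, j))
    sims = sorted(sims, reverse=True)[:k]             # k most similar items
    den = sum(abs(s) for s, j in sims)
    return sum(s * R[u, j] for s, j in sims) / den if den > 0 else np.nan

# Toy example (hypothetical ratings): predict user 0's rating for item 2
R = np.array([[5., 3., np.nan],
              [4., 3., 4.],
              [5., 4., 5.]])
pred = predict_item_based(R, u=0, i=2)                # ≈ 4.0
```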
Similarity Measure • Numerical measure of how alike two data instances are. • Higher when the instances are more alike • Examples of similarity measures
Jaccard Similarity • Let x and y be a pair of binary 0/1 vectors • Mij: number of elements in which x = i and y = j • Jaccard(x, y) = M11 / (M01 + M10 + M11) • Example: Jaccard(John, Mary) = 0.25
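A minimal sketch of the Jaccard computation. The John/Mary vectors below are hypothetical (the slide's original table is not shown), chosen so the result matches the slide's value of 0.25.

```python
def jaccard(x, y):
    """Jaccard similarity of two binary 0/1 vectors: M11 / (M01 + M10 + M11)."""
    m11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    m10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    m01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    return m11 / (m01 + m10 + m11)

# Hypothetical item vectors (1 = user rated/liked the item)
john = [1, 0, 1, 0, 0]
mary = [1, 1, 0, 1, 0]
sim = jaccard(john, mary)   # 1 / (2 + 1 + 1) = 0.25
```

Note that M00 (items neither user touched) is deliberately excluded, which is what makes Jaccard suitable for sparse rating data.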
Cosine Similarity • If d1 and d2 are two document vectors, then cos(d1, d2) = (d1 · d2) / (||d1|| ||d2||), where · indicates the vector dot product and ||d|| is the length of vector d • Example: d1 = 3 2 0 5 0 0 0 2 0 0, d2 = 1 0 0 0 0 0 0 1 0 2 • d1 · d2 = 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5 • ||d1|| = (3*3+2*2+0*0+5*5+0*0+0*0+0*0+2*2+0*0+0*0)^0.5 = (42)^0.5 = 6.481 • ||d2|| = (1*1+0*0+0*0+0*0+0*0+0*0+0*0+1*1+0*0+2*2)^0.5 = (6)^0.5 = 2.449 • cos(d1, d2) = 5 / (6.481 × 2.449) = 0.3150
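The worked example above translates directly to a few lines of code:

```python
import math

def cosine(d1, d2):
    """cos(d1, d2) = (d1 . d2) / (||d1|| ||d2||)."""
    dot = sum(a * b for a, b in zip(d1, d2))
    n1 = math.sqrt(sum(a * a for a in d1))
    n2 = math.sqrt(sum(b * b for b in d2))
    return dot / (n1 * n2)

# The slide's example vectors
d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
sim = cosine(d1, d2)   # 5 / (6.481 * 2.449) ≈ 0.3150
```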
Gaussian Radial Basis Function • Let x and y be the feature vectors for 2 data instances • sim(x, y) = exp( −||x − y||² / (2σ²) ) • Example: σ = 0.1
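A minimal sketch of the RBF similarity. It assumes the standard form exp(−||x − y||² / (2σ²)) with the slide's σ = 0.1; the original formula image is not shown, so the exact parameterization used in the lecture is an assumption.

```python
import math

def rbf_similarity(x, y, sigma=0.1):
    """Gaussian RBF similarity: exp(-||x - y||^2 / (2 * sigma^2)).
    Equals 1 for identical vectors and decays toward 0 with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))

same = rbf_similarity([1.0, 0.5], [1.0, 0.5])   # identical vectors -> 1.0
near = rbf_similarity([0.0], [0.1])             # close vectors -> between 0 and 1
```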
Technique: Matrix Factorization • Items are not independent and have inherent groupings • Movies can be grouped based on genres • Books can be grouped based on their topic areas • The groups can be treated as "latent" features of the data • Given: ratings matrix R (users × items)
Technique: Matrix Factorization • [Table: example movie ratings grouped by genre] • What if genre is not the optimal grouping (since some movies may belong to multiple genres)? • Can we automatically find an appropriate grouping of features?
Technique: Matrix Factorization • Given: ratings matrix R (users × items) • Goal is to factorize R into a product of two latent matrices, U and M, such that the following quantity is minimized: Σ(i,j)∈Ω(R) ( Rij − (UMT)ij )² • where Ω(R) is the set of non-missing ratings in R
Technique: Matrix Factorization • Given: ratings matrix R (users × items) • Goal: decompose R into a product of matrices U and MT (the superscript T denotes the matrix transpose operation) that best approximates R • Predicted matrix: R ≈ UMT, where U is the user feature matrix (users × features) and M is the item feature matrix (items × features)
Technique: Matrix Factorization • Given: an incomplete matrix R and parameter k • Alternating least-squares (ALS) algorithm • Randomly initialize U, M, and the missing values in R • Repeat until convergence • Find M such that ||R – UMT||F is minimized • Find U such that ||RT – MUT||F is minimized • Replace each missing value Rij with the corresponding value (UMT)ij
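The ALS steps above can be sketched with NumPy. This is a minimal, illustrative implementation of the slide's recipe (impute-and-refactorize), not production code; the sample matrix and rank k = 2 are invented for the demo.

```python
import numpy as np

def als_complete(R, k=2, n_iters=100, seed=0):
    """Alternating least-squares matrix completion (minimal sketch).

    R: ratings matrix (users x items) with np.nan marking missing entries.
    Randomly initializes U and the missing values of R, then alternates
    least-squares solves for M and U, refilling the missing entries with
    the current reconstruction U @ M.T on every pass.
    """
    rng = np.random.default_rng(seed)
    missing = np.isnan(R)
    Rf = np.where(missing, rng.uniform(1.0, 5.0, R.shape), R)
    U = rng.standard_normal((R.shape[0], k))
    for _ in range(n_iters):
        # Find M such that ||Rf - U M^T||_F is minimized (U held fixed)
        M = np.linalg.lstsq(U, Rf, rcond=None)[0].T
        # Find U such that ||Rf^T - M U^T||_F is minimized (M held fixed)
        U = np.linalg.lstsq(M, Rf.T, rcond=None)[0].T
        # Replace each missing R_ij with the corresponding (U M^T)_ij
        Rf[missing] = (U @ M.T)[missing]
    return U, M, Rf

# Toy 4-user x 3-item matrix with one missing rating
R = np.array([[5., 3., np.nan],
              [4., 3., 4.],
              [1., 1., 1.],
              [5., 4., 5.]])
U, M, Rf = als_complete(R, k=2)
```

Observed entries are never overwritten; only the missing cell is filled in from the rank-k reconstruction, which is why the objective sums only over the non-missing ratings.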
Example • ratings matrix R (users × items) • [Figures: snapshots of U, M, and UMT at iterations 1, 50, 100, 200, and 500, showing the reconstruction UMT converging toward R]
Cold-Start Problem • What will you recommend to a new user who has not provided any ratings? • Utilize side information to make the recommendation • Examples: demographic and item content information • How to incorporate side information? • Use factorization machines, or more generally, cast the task into a regression problem on user features and item features
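The "cast into a regression problem" idea can be sketched as follows. All of the side information below (demographic and genre features, and the observed ratings) is hypothetical, and ordinary least squares stands in for the more general factorization-machine approach the slide mentions.

```python
import numpy as np

# Hypothetical side information: user demographics and item content features
user_feats = np.array([[1.0, 25.0],    # e.g. [gender flag, age]
                       [0.0, 40.0],
                       [1.0, 33.0]])
item_feats = np.array([[1.0, 0.0],     # e.g. one-hot genre flags
                       [0.0, 1.0]])

# Observed ratings as (user, item, rating) triples
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 2.0), (2, 1, 4.0)]

# Regression: feature vector = [user features, item features] -> rating
X = np.array([np.concatenate([user_feats[u], item_feats[i]])
              for u, i, _ in ratings])
y = np.array([r for _, _, r in ratings])
w, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares fit of weights

# A brand-new user with no ratings but known demographics gets a prediction
new_user = np.array([1.0, 30.0])
pred = np.concatenate([new_user, item_feats[0]]) @ w
```

Because the model depends only on features, not on a user's rating history, it sidesteps the cold-start problem for any user whose side information is available.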