CSE 482 Lecture 15 (Collaborative Filtering)
Outline • What is a recommender system? • What is collaborative filtering? • What are the collaborative filtering techniques?
Recommender Systems • Automated systems that make recommendations based on the preferences of users • Motivation from the user's perspective • Lots of online products, books, movies, etc. • Help me narrow the choices available… • Motivation from the business' perspective • "If I have 3 million customers on the web, I should have 3 million stores on the web." — CEO of Amazon.com
Collaborative Filtering • The technology behind most recommender systems • The process of filtering information by soliciting judgments from others to overcome the information overload problem • "Based on the premise that people looking for information should be able to make use of what others have already found and evaluated." (Maltz & Ehrlich, 1995)
Another Application: Netflix $1M Prize • Task: Given customer ratings on some movies, predict customer ratings on other movies • Example: If John rates "Mission Impossible" a 5, "Over the Hedge" a 3, and "Back to the Future" a 4, how would he rate "Harry Potter", … ?
Collaborative Filtering • Collaborative filtering techniques are used to predict how well a user will like an item that he/she has not rated, given a set of historical preference judgments for a community of users.
Technique: Nearest Neighbor • User-Based Nearest Neighbor • Given a user u, generate a prediction for an item i by using the ratings for i from users in u's neighborhood • Need to define a similarity measure and the neighborhood size
Technique: Nearest Neighbor • User-Based Nearest Neighbor • Given a user u, generate a prediction for an item i by using the ratings for i from users in u's neighborhood • Neighbor = users with similar interests • Prediction: pred(u, i) = avg(u) + [ Σn sim(u, n) × (r(n, i) − avg(n)) ] / Σn |sim(u, n)|, where avg(u) is the average rating of user u, avg(n) is the average rating of neighbor n, and the sums run over the neighbors n of u
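The user-based prediction formula above can be sketched in code. This is a minimal illustration, not the lecture's exact implementation: it assumes cosine similarity computed over co-rated items, and the sample ratings matrix is invented for the demo.

```python
import numpy as np

def predict_user_based(R, u, i, k=2):
    """Predict user u's rating for item i from the k most similar users.

    R: ratings matrix (users x items) with np.nan marking missing ratings.
    pred(u, i) = avg(u) + sum_n sim(u, n) * (R[n, i] - avg(n)) / sum_n |sim(u, n)|
    """
    rbar = np.nanmean(R, axis=1)                 # average rating of each user
    sims = []
    for n in range(R.shape[0]):
        if n == u or np.isnan(R[n, i]):
            continue                             # neighbor must have rated item i
        both = ~np.isnan(R[u]) & ~np.isnan(R[n]) # items co-rated by u and n
        if both.sum() == 0:
            continue
        x, y = R[u, both], R[n, both]
        s = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))  # cosine similarity
        sims.append((s, n))
    sims = sorted(sims, reverse=True)[:k]        # neighborhood of size k
    num = sum(s * (R[n, i] - rbar[n]) for s, n in sims)
    den = sum(abs(s) for s, n in sims)
    return rbar[u] + num / den if den > 0 else rbar[u]

# Toy example (hypothetical ratings): predict user 0's rating for item 2
R = np.array([[5., 3., np.nan],
              [4., 3., 4.],
              [5., 4., 5.]])
pred = predict_user_based(R, u=0, i=2)           # ≈ 4.33
```

Both neighbors rate item 2 about a third of a point above their own averages, so user 0's prediction lands the same amount above his average of 4.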
Technique: Nearest Neighbor • Item-Based Nearest Neighbor • Given a user u, generate a prediction for an item i by using a weighted sum of the user u’s ratings for items that are most similar to i.
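The item-based variant can be sketched the same way. Again a hedged illustration: cosine similarity over co-rating users is an assumed choice, and the ratings matrix is invented for the demo.

```python
import numpy as np

def predict_item_based(R, u, i, k=2):
    """Predict user u's rating for item i as a weighted sum of u's ratings
    on the k items most similar to i (cosine similarity over co-raters)."""
    sims = []
    for j in range(R.shape[1]):
        if j == i or np.isnan(R[u, j]):
            continue                                  # u must have rated item j
        both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])  # users who rated both
        if both.sum() == 0:
            continue
        x, y = R[both, i], R[both, j]
        s = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
        sims.append((s, j))
    sims = sorted(sims, reverse=True)[:k]             # k most similar items
    den = sum(abs(s) for s, j in sims)
    return sum(s * R[u, j] for s, j in sims) / den if den > 0 else np.nan

# Toy example (hypothetical ratings): predict user 0's rating for item 2
R = np.array([[5., 3., np.nan],
              [4., 3., 4.],
              [5., 4., 5.]])
pred = predict_item_based(R, u=0, i=2)                # ≈ 4.0
```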
Similarity Measure • Numerical measure of how alike two data instances are. • Higher when the instances are more alike • Examples of similarity measures
Jaccard Similarity • Let x and y be a pair of binary 0/1 vectors • Mij: number of elements in which x = i and y = j • Jaccard(x, y) = M11 / (M01 + M10 + M11) • Example: Jaccard(John, Mary) = 0.25
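A minimal sketch of the Jaccard computation. The John/Mary vectors below are hypothetical (the slide's original table is not shown), chosen so the result matches the slide's value of 0.25.

```python
def jaccard(x, y):
    """Jaccard similarity of two binary 0/1 vectors: M11 / (M01 + M10 + M11)."""
    m11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    m10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    m01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    return m11 / (m01 + m10 + m11)

# Hypothetical item vectors (1 = user rated/liked the item)
john = [1, 0, 1, 0, 0]
mary = [1, 1, 0, 1, 0]
sim = jaccard(john, mary)   # 1 / (2 + 1 + 1) = 0.25
```

Note that M00 (items neither user touched) is deliberately excluded, which is what makes Jaccard suitable for sparse rating data.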
Cosine Similarity • If d1 and d2 are two document vectors, then cos(d1, d2) = (d1 · d2) / (||d1|| ||d2||), where · indicates the vector dot product and ||d|| is the length of vector d • Example: d1 = 3 2 0 5 0 0 0 2 0 0, d2 = 1 0 0 0 0 0 0 1 0 2 • d1 · d2 = 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5 • ||d1|| = (3*3+2*2+0*0+5*5+0*0+0*0+0*0+2*2+0*0+0*0)^0.5 = (42)^0.5 = 6.481 • ||d2|| = (1*1+0*0+0*0+0*0+0*0+0*0+0*0+1*1+0*0+2*2)^0.5 = (6)^0.5 = 2.449 • cos(d1, d2) = 5 / (6.481 × 2.449) = 0.3150
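The worked example above translates directly to a few lines of code:

```python
import math

def cosine(d1, d2):
    """cos(d1, d2) = (d1 . d2) / (||d1|| ||d2||)."""
    dot = sum(a * b for a, b in zip(d1, d2))
    n1 = math.sqrt(sum(a * a for a in d1))
    n2 = math.sqrt(sum(b * b for b in d2))
    return dot / (n1 * n2)

# The slide's example vectors
d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
sim = cosine(d1, d2)   # 5 / (6.481 * 2.449) ≈ 0.3150
```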
Gaussian Radial Basis Function • Let x and y be the feature vectors for 2 data instances • sim(x, y) = exp( −||x − y||² / (2σ²) ) • Example: σ = 0.1
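A minimal sketch of the RBF similarity. It assumes the standard form exp(−||x − y||² / (2σ²)) with the slide's σ = 0.1; the original formula image is not shown, so the exact parameterization used in the lecture is an assumption.

```python
import math

def rbf_similarity(x, y, sigma=0.1):
    """Gaussian RBF similarity: exp(-||x - y||^2 / (2 * sigma^2)).
    Equals 1 for identical vectors and decays toward 0 with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))

same = rbf_similarity([1.0, 0.5], [1.0, 0.5])   # identical vectors -> 1.0
near = rbf_similarity([0.0], [0.1])             # close vectors -> between 0 and 1
```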
Technique: Matrix Factorization • Items are not independent and have inherent groupings • Movies can be grouped based on genres • Books can be grouped based on their topic areas • The groups can be treated as "latent" features of the data • Given: ratings matrix R (users × items)
Technique: Matrix Factorization • [Table: example movie ratings grouped by genre] • What if genre is not the optimal grouping (since some movies may belong to multiple genres)? • Can we automatically find an appropriate grouping of features?
Technique: Matrix Factorization • Given: ratings matrix R (users × items) • Goal is to factorize R into a product of two latent matrices, U and M, such that the following quantity is minimized: Σ(i,j)∈Ω(R) ( Rij − (UMT)ij )² • where Ω(R) is the set of non-missing ratings in R
Technique: Matrix Factorization • Given: ratings matrix R (users × items) • Goal: decompose R into a product of matrices U and MT (the superscript T denotes the matrix transpose operation) that best approximates R • Predicted matrix: R ≈ UMT, where U is the user feature matrix (users × features) and M is the item feature matrix (items × features)
Technique: Matrix Factorization • Given: an incomplete matrix R and parameter k • Alternating least-squares (ALS) algorithm • Randomly initialize U, M, and the missing values in R • Repeat until convergence • Find M such that ||R – UMT||F is minimized • Find U such that ||RT – MUT||F is minimized • Replace each missing value Rij with the corresponding value (UMT)ij
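The ALS steps above can be sketched with NumPy. This is a minimal, illustrative implementation of the slide's recipe (impute-and-refactorize), not production code; the sample matrix and rank k = 2 are invented for the demo.

```python
import numpy as np

def als_complete(R, k=2, n_iters=100, seed=0):
    """Alternating least-squares matrix completion (minimal sketch).

    R: ratings matrix (users x items) with np.nan marking missing entries.
    Randomly initializes U and the missing values of R, then alternates
    least-squares solves for M and U, refilling the missing entries with
    the current reconstruction U @ M.T on every pass.
    """
    rng = np.random.default_rng(seed)
    missing = np.isnan(R)
    Rf = np.where(missing, rng.uniform(1.0, 5.0, R.shape), R)
    U = rng.standard_normal((R.shape[0], k))
    for _ in range(n_iters):
        # Find M such that ||Rf - U M^T||_F is minimized (U held fixed)
        M = np.linalg.lstsq(U, Rf, rcond=None)[0].T
        # Find U such that ||Rf^T - M U^T||_F is minimized (M held fixed)
        U = np.linalg.lstsq(M, Rf.T, rcond=None)[0].T
        # Replace each missing R_ij with the corresponding (U M^T)_ij
        Rf[missing] = (U @ M.T)[missing]
    return U, M, Rf

# Toy 4-user x 3-item matrix with one missing rating
R = np.array([[5., 3., np.nan],
              [4., 3., 4.],
              [1., 1., 1.],
              [5., 4., 5.]])
U, M, Rf = als_complete(R, k=2)
```

Observed entries are never overwritten; only the missing cell is filled in from the rank-k reconstruction, which is why the objective sums only over the non-missing ratings.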
Example • ratings matrix R (users × items) • [Figures: snapshots of U, M, and UMT at iterations 1, 50, 100, 200, and 500, showing the reconstruction UMT converging toward R]
Cold-Start Problem • What will you recommend to a new user who has not provided any ratings? • Utilize side information to make the recommendation • Examples: demographic and item content information • How to incorporate side information? • Use factorization machines, or more generally, cast the task into a regression problem on user features and item features
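The "cast into a regression problem" idea can be sketched as follows. All of the side information below (demographic and genre features, and the observed ratings) is hypothetical, and ordinary least squares stands in for the more general factorization-machine approach the slide mentions.

```python
import numpy as np

# Hypothetical side information: user demographics and item content features
user_feats = np.array([[1.0, 25.0],    # e.g. [gender flag, age]
                       [0.0, 40.0],
                       [1.0, 33.0]])
item_feats = np.array([[1.0, 0.0],     # e.g. one-hot genre flags
                       [0.0, 1.0]])

# Observed ratings as (user, item, rating) triples
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 2.0), (2, 1, 4.0)]

# Regression: feature vector = [user features, item features] -> rating
X = np.array([np.concatenate([user_feats[u], item_feats[i]])
              for u, i, _ in ratings])
y = np.array([r for _, _, r in ratings])
w, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares fit of weights

# A brand-new user with no ratings but known demographics gets a prediction
new_user = np.array([1.0, 30.0])
pred = np.concatenate([new_user, item_feats[0]]) @ w
```

Because the model depends only on features, not on a user's rating history, it sidesteps the cold-start problem for any user whose side information is available.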