CoBaFi : Collaborative Bayesian Filtering



Presentation Transcript


  1. CoBaFi: Collaborative Bayesian Filtering Alex Beutel Joint work with Kenton Murray, Christos Faloutsos, Alex Smola April 9, 2014 – Seoul, South Korea

  2. Online Recommendation: a sparse Users × Movies matrix of observed star ratings

  3. Online Rating Models

  4. Online Rating Models: Reality vs. normal collaborative filtering, which fits a Gaussian to minimize the error. Minimizing error isn't good enough: understanding the shape matters!

  5. Online Rating Models: normal collaborative filtering fits a Gaussian (minimize the error) vs. our model

  6. Our Goals and Challenges • Given: A matrix of user ratings • Find: A model that best fits and predicts user preferences • Goals: • G1. Fit the recommender distribution • G2. Understand users who rate few items • G3. Detect abnormal spam behavior

  7. 1. Background Outline 2. Model Formulation 3. Inference 4. Catching Spam 5. Experiments

  8. Collaborative Filtering [Background]: approximate the Users × Movies rating matrix X as X ≈ U Vᵀ, where the latent dimensions correspond to genres
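The factorization on this slide can be sketched in a few lines. This is an illustrative toy, not the paper's code: the latent dimension plays the role of "genres", and a predicted rating is the inner product of a user's and a movie's latent vectors.

```python
import numpy as np

# Toy sketch of X ≈ U Vᵀ: users and movies share a small latent
# "genre" space, and a rating is predicted by an inner product.
rng = np.random.default_rng(0)

n_users, n_movies, n_genres = 4, 3, 2
U = rng.normal(size=(n_users, n_genres))   # user -> genre preferences
V = rng.normal(size=(n_movies, n_genres))  # movie -> genre loadings

X_hat = U @ V.T                            # predicted Users × Movies ratings
print(X_hat.shape)                         # (4, 3)
```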

  9. Bayesian Probabilistic Matrix Factorization (Salakhutdinov & Mnih, ICML 2008) [Background] μU ~ …

  10. 1. Background Outline 2. Our Model 3. Inference 4. Catching Spam 5. Experiments

  11. Our Model Cluster users (& items) Share preferences within clusters Use user preferences to predict ratings

  12. The Recommender Distribution: first introduced by Tan et al., 2013. Linear normalization vs. quadratic normalization. With θ1 = 0, vary θ2: θ2 = 0.4 vs. θ2 = -1.0

  13. The Recommender Distribution: ui encodes genre preferences, the user's general leaning, and how polarized their ratings are. • Goal 1: Fit the recommender distribution
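A minimal sketch of the flexible rating distribution these two slides describe, assuming the exponential-family form p(r) ∝ exp(θ1·r + θ2·r²) over the discrete ratings 1–5; centering ratings at 3 is my simplification to make the two shapes from the slide easy to see (the exact parameterization follows the paper and Tan et al., 2013).

```python
import numpy as np

# Hedged sketch: discrete "recommender distribution" over ratings 1..5,
# p(r) ∝ exp(theta1*r + theta2*r^2). Positive theta2 polarizes mass
# toward 1 and 5; negative theta2 concentrates it near the middle.
def recommender_dist(theta1, theta2, ratings=np.arange(1, 6)):
    r = ratings - 3.0                      # assumed centering at rating 3
    logits = theta1 * r + theta2 * r ** 2
    p = np.exp(logits - logits.max())      # subtract max for stability
    return p / p.sum()

polarized = recommender_dist(theta1=0.0, theta2=0.4)    # mass at 1 and 5
gaussian = recommender_dist(theta1=0.0, theta2=-1.0)    # mass near 3
print(np.round(polarized, 3))
print(np.round(gaussian, 3))
```

With θ1 = 0 the distribution is symmetric; a nonzero θ1 would tilt it toward high or low ratings (the "general leaning" on slide 13).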

  14. Understanding varying preferences (example matrix of user ratings)

  15. Resulting Co-clustering V U

  16. Finding User Preferences μU μU’ • Goal 2: Understand users who rate few items

  17. Chinese Restaurant Process μ1 μ3 μ2
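The Chinese Restaurant Process on this slide can be sketched directly: customer n sits at an existing table with probability proportional to its size, or opens a new table with probability proportional to a concentration parameter α (the α value below is an arbitrary choice for illustration).

```python
import random

# Minimal CRP sketch: rich-get-richer clustering with an unbounded
# number of clusters, as used to group users (and items) in the model.
def crp_assignments(n_customers, alpha, seed=42):
    rng = random.Random(seed)
    tables = []                    # tables[k] = number of customers at table k
    assignments = []
    for _ in range(n_customers):
        weights = tables + [alpha]             # existing sizes, then "new table"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)                   # open a new cluster
        else:
            tables[k] += 1
        assignments.append(k)
    return assignments, tables

assignments, tables = crp_assignments(100, alpha=1.0)
print(len(tables), sum(tables))    # number of clusters, total customers (100)
```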

  18. 1. Background Outline 2. Our Model 3. Inference 4. Catching Spam 5. Experiments

  19. Gibbs Sampling - Clusters [Details] Probability of picking a cluster = Probability of a cluster based on size (CRP) × Probability ui would come from the cluster
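The product on this slide (CRP size prior × likelihood of ui under each cluster) can be sketched as follows. This is a deliberately simplified 1-D version with unit-variance Gaussian clusters, which is my assumption; the paper's clusters have full mean/precision parameters.

```python
import numpy as np

# Hedged sketch of the Gibbs step for one user's cluster assignment:
#   p(cluster a) ∝ (size of a, or alpha for a new cluster)  [CRP prior]
#               × p(u_i | cluster a's parameters)           [likelihood]
def cluster_posterior(u_i, cluster_sizes, cluster_means, alpha, prior_mean=0.0):
    means = np.append(cluster_means, prior_mean)   # last slot = new cluster
    prior = np.append(cluster_sizes, alpha).astype(float)
    loglik = -0.5 * (u_i - means) ** 2             # unit-variance Gaussian (assumed)
    w = prior * np.exp(loglik - loglik.max())      # stabilized product
    return w / w.sum()

p = cluster_posterior(u_i=2.1,
                      cluster_sizes=np.array([10, 3]),
                      cluster_means=np.array([2.0, -1.0]),
                      alpha=1.0)
print(np.round(p, 3))   # the large cluster near u_i dominates
```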

  20. Sampling user parameters [Details] Probability of user preferences ui = Probability of preferences ui given cluster parameters × Probability of predicting ratings ri,j using new preferences. The recommender distribution is non-conjugate, so we can't sample directly!

  21. 1. Background Outline 2. Our Model 3. Inference 4. Catching Spam 5. Experiments

  22. Review Spam and Fraud 1 5 5 5 1 1 5 1 1 5 1 1 5 1 1 5 Image from http://sinovera.deviantart.com/art/Cute-Devil-117932337

  23. Clustering Fraudsters μ3 μ1 μ2 New Spam Cluster Previous “Real” Cluster

  24. Clustering Fraudsters μ3 μ1 μ2 Too much spam gets an account separated into a "fraud" cluster. Trying to "hide" just means (a) very little spam or (b) camouflage reinforcing realistic reviews.

  25. Clustering Fraudsters μ4 μ1 μ3 μ2 μ5 Naïve Spammers Spam + Noise Hijacked Accounts • Goal 3: Detect abnormal spam behavior

  26. 1. Background Outline 2. Our Model 3. Inference 4. Catching Spam 5. Experiments

  27. Does it work? Better Fit

  28. Catching Naïve Spammers Injection 83% are clustered together

  29. Clustered Hijacked Accounts Clustered hijacked accounts Clustered “attacked” movies Injection

  30. Real world clusters

  31. Shape of real world data

  32. Shape of Netflix reviews More Skewed More Gaussian

  33. Shape of Amazon Clothing reviews Nearly all are heavily polarized!

  34. Shape of Amazon Electronics reviews Nearly all are heavily polarized!

  35. Shape of BeerAdvocate reviews Nearly all are Gaussian!

  36. Hypotheses on the shape of the data: • Hard to evaluate beyond binary • Selection bias – only committed viewers watch Season 4 of a TV series • Hard to compare value across very different items • Lots of beers and movies to compare • Fewer TV shows • Even fewer jeans or hard drives

  37. Key Points • Modeling: Fit real data with flexible recommender distribution • Prediction: Predict user preferences • Anomaly Detection: When does a user not match the normal model?

  38. Questions? Alex Beutel abeutel@cs.cmu.edu http://alexbeutel.com

  39. Sampling Cluster Parameters: hyperparameters μα, λα, Wα, ν; priors on μα, λα, Wα

  40. Gibbs Sampling - Clusters [Details] Probability ui would be sampled from cluster a × Probability of a cluster (CRP)

  41. Sampling user parameters [Details] Probability of ui given cluster parameters × Probability of predicting ratings ri,j. The recommender distribution is non-conjugate, so we can't sample directly! Use a Laplace approximation and perform Metropolis-Hastings sampling.

  42. Sampling user parameters [Details] Use a candidate normal distribution centered at the mode of p(ui) with the "variance" of p(ui). Metropolis-Hastings sampling: sample a candidate, keep the new value with the acceptance probability.
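The Laplace-plus-Metropolis-Hastings recipe on these slides can be sketched in 1-D. This is an illustrative stand-in, not the paper's posterior: `log_p` below is an arbitrary skewed non-Gaussian target playing the role of the non-conjugate user-parameter posterior; the proposal is a Gaussian centered at the mode with variance from the curvature there, and draws are accepted with the independence-sampler ratio.

```python
import numpy as np

# Hedged sketch: Laplace approximation as an MH proposal for a
# non-conjugate 1-D target (a stand-in for p(u_i)).
def log_p(u):                      # unnormalized log target (assumed shape)
    return -0.25 * u ** 4 + u      # peaked at u = 1, non-Gaussian

def log_q(u, mode, var):           # Laplace proposal log-density (up to const.)
    return -0.5 * (u - mode) ** 2 / var

# Mode via grid search; curvature via a finite-difference second derivative.
grid = np.linspace(-3, 3, 6001)
mode = grid[np.argmax(log_p(grid))]
h = 1e-4
curv = (log_p(mode + h) - 2 * log_p(mode) + log_p(mode - h)) / h ** 2
var = -1.0 / curv                  # Laplace "variance" of p(u)

rng = np.random.default_rng(0)
u = mode                           # current state
for _ in range(1000):
    u_new = rng.normal(mode, np.sqrt(var))          # sample the candidate
    log_ratio = (log_p(u_new) - log_p(u)) \
              + (log_q(u, mode, var) - log_q(u_new, mode, var))
    if np.log(rng.random()) < log_ratio:            # keep with accept prob.
        u = u_new
print(round(float(mode), 2))       # mode of the stand-in target: 1.0
```

Because the Laplace proposal closely matches a smooth unimodal target, most candidates are accepted, which is consistent with the high acceptance rates reported on slide 45.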

  43. Sampling Cluster Parameters [Details] Users/Items in the cluster Priors

  44. Inferring Hyperparameters [Details] Solved directly – no sampling needed! Prior hidden as additional cluster

  45. Does Metropolis-Hastings work? • We have to use a non-standard sampling procedure: • 99.12% acceptance rate for Amazon Electronics • 77.77% acceptance rate for Netflix 24k

  46. Does it work? Compare on Predictive Probability (PP) to see how well our model fits the data

  47. Handling Spammers Random naïve spammers in Amazon Electronics dataset Random hijacked accounts in Netflix 24k dataset

  48. Clustered Naïve Spammers 83% are clustered together

  49. Clustered Hijacked Accounts Clustered hijacked accounts Clustered “attacked” movies
