    CoBaFi: Collaborative Bayesian Filtering

    Alex Beutel

    Joint work with Kenton Murray, Christos Faloutsos, Alex Smola

    April 9, 2014 – Seoul, South Korea

    Online Recommendation

    [Figure: a sparse Users × Movies matrix with a few observed star ratings (5, 5, 2, 5, 3, 5)]

    Online Rating Models

    [Figure: the real rating distribution ("Reality") next to a Gaussian fit]

    Normal collaborative filtering: fit a Gaussian and minimize the error.

    Minimizing error isn’t good enough: understanding the shape matters!

    Online Rating Models

    Normal collaborative filtering: fit a Gaussian and minimize the error.

    [Figure: the Gaussian fit compared with Our Model]

    Our Goals and Challenges
    • Given: A matrix of user ratings
    • Find: A model that best fits and predicts user preferences
    • Goals:
      • G1. Fit the recommender distribution
      • G2. Understand users who rate few items
      • G3. Detect abnormal spam behavior
    Outline

    1. Background

    2. Model Formulation

    3. Inference

    4. Catching Spam

    5. Experiments

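    Collaborative Filtering

    [Background]

    [Figure: the Users × Movies rating matrix X is factored into user factors U and movie factors V over latent genres; each observed rating, such as a 5, is reconstructed from a user's factors and a movie's factors]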
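The factorization in the figure can be sketched in a few lines of NumPy. This is a toy illustration with made-up numbers, not CoBaFi itself. It assumes the standard collaborative-filtering setup, in which a rating is predicted as the inner product of a user's row of U and a movie's row of V, and "minimize the error" means squared error over the observed entries only. The function names are hypothetical.

```python
import numpy as np

def predict_ratings(U, V):
    """Reconstruct all ratings as X_hat = U @ V.T (one inner product per user-movie pair)."""
    return U @ V.T

def squared_error(X, mask, U, V):
    """Sum of squared errors over the observed entries only (mask is True where observed)."""
    residual = (X - predict_ratings(U, V))[mask]
    return float(np.sum(residual ** 2))

# Made-up factors: 3 users and 2 movies, each described by 2 latent "genres".
U = np.array([[2.0, 0.5],
              [0.5, 2.0],
              [1.0, 1.0]])
V = np.array([[2.0, 1.0],
              [1.0, 2.0]])

X_hat = predict_ratings(U, V)
print(X_hat)
```

Classic collaborative filtering fits U and V by driving `squared_error` down. That objective is equivalent to a Gaussian noise model on each rating, and this is the assumption the talk argues against.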


    Our Model

    Cluster users (& items)

    Share preferences within clusters

    Use user preferences to predict ratings
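The three steps above can be illustrated with a toy generative sketch. It rests on heavy assumptions of our own:

- Users join clusters through a Chinese restaurant process with concentration `alpha`.
- Each user vector is a small Gaussian perturbation of its cluster mean.
- Items are not clustered, for brevity.
- Each rating is drawn from a quadratic-exponential distribution over 1 to 5 stars.

All names (`crp_assignments`, `generate`, the layout of the user vector) are hypothetical and are not taken from the CoBaFi code.

```python
import numpy as np

RATINGS = np.arange(1, 6)

def crp_assignments(n_users, alpha, rng):
    """Chinese restaurant process: join cluster a w.p. ∝ its size, a new one w.p. ∝ alpha."""
    sizes, z = [], []
    for _ in range(n_users):
        weights = np.array(sizes + [alpha], dtype=float)
        a = int(rng.choice(len(weights), p=weights / weights.sum()))
        if a == len(sizes):          # open a new cluster
            sizes.append(0)
        sizes[a] += 1
        z.append(a)
    return z, sizes

def rating_pmf(theta1, theta2):
    """Assumed form: p(r) ∝ exp(theta1 * c + theta2 * c**2) with c = r - 3."""
    c = RATINGS - 3.0
    logits = theta1 * c + theta2 * c ** 2
    p = np.exp(logits - logits.max())
    return p / p.sum()

def generate(n_users, n_items, k, alpha, seed=0):
    rng = np.random.default_rng(seed)
    z, sizes = crp_assignments(n_users, alpha, rng)
    # Each cluster mean holds k genre preferences plus one polarization value.
    cluster_means = rng.normal(0.0, 1.0, size=(len(sizes), k + 1))
    # Users in a cluster share preferences up to small noise.
    users = cluster_means[z] + rng.normal(0.0, 0.1, size=(n_users, k + 1))
    items = rng.normal(0.0, 1.0, size=(n_items, k))
    ratings = np.zeros((n_users, n_items), dtype=int)
    for i in range(n_users):
        for j in range(n_items):
            p = rating_pmf(users[i, :k] @ items[j], users[i, k])
            ratings[i, j] = rng.choice(RATINGS, p=p)
    return z, ratings

z, R = generate(n_users=20, n_items=5, k=2, alpha=1.0)
print(z)
print(R)
```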

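    The Recommender Distribution

    First introduced by Tan et al., 2013

    [Equation: a linear term (weighted by θ1) and a quadratic term (weighted by θ2) in the rating, each with a normalization]

    [Figure: with θ1 = 0, varying θ2 changes the shape of the distribution; panels show θ2 = 0.4 and θ2 = -1.0]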
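A minimal sketch of how θ2 controls the shape. It assumes the distribution is a discrete exponential family p(r) ∝ exp(θ1·c + θ2·c²) over ratings 1 to 5, where c = r - 3 is the rating centered on the scale midpoint. The centering stands in for the slide's normalization terms. It is our assumption and not necessarily the exact form used by Tan et al.

```python
import numpy as np

RATINGS = np.arange(1, 6)        # 1..5 stars
CENTERED = RATINGS - 3.0         # assumption: center ratings on the scale midpoint

def rating_pmf(theta1, theta2):
    """p(r) ∝ exp(theta1 * c + theta2 * c**2) over r = 1..5, with c = r - 3 (assumed form)."""
    logits = theta1 * CENTERED + theta2 * CENTERED ** 2
    p = np.exp(logits - logits.max())   # subtract the max to avoid overflow
    return p / p.sum()                  # the normalization term

for theta2 in (0.4, -1.0):
    print(theta2, np.round(rating_pmf(0.0, theta2), 3))
```

With θ1 = 0 under this assumed form, the two settings behave as follows:

- θ2 = 0.4 puts most of the mass on 1 and 5 stars, giving a polarized, U-shaped distribution.
- θ2 = -1.0 concentrates the mass around 3 stars, giving a Gaussian-like bump.

A nonzero θ1 would tilt either shape toward higher or lower ratings.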

    The Recommender Distribution

    [Figure: each user's parameter vector ui is split into genre preferences, a general leaning, and how polarized the user is]

    • Goal 1: Fit the recommender distribution
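One hypothetical way to turn the three parts of ui into the two shape parameters is:

- θ1 comes from the match between the user's genre preferences and the item's genre loadings, plus the general leaning.
- θ2 comes from the polarization.

The vector layout and this mapping are assumptions for illustration only; the slide does not spell out the exact parameterization.

```python
import numpy as np

def user_item_params(u, v, k):
    """Hypothetical mapping from a user vector to the two shape parameters.

    Assumed layout: u = [k genre preferences, general leaning, polarization].
    """
    genres, leaning, polarization = u[:k], u[k], u[k + 1]
    theta1 = float(genres @ v) + leaning   # genre match plus overall leaning
    theta2 = float(polarization)           # how polarized the user is
    return theta1, theta2

def rating_pmf(theta1, theta2):
    """Assumed form p(r) ∝ exp(theta1 * c + theta2 * c**2), c = r - 3, r = 1..5."""
    c = np.arange(1, 6) - 3.0
    logits = theta1 * c + theta2 * c ** 2
    p = np.exp(logits - logits.max())
    return p / p.sum()

u = np.array([0.8, -0.2, 0.1, 0.3])   # 2 genre preferences, leaning, polarization (made up)
v = np.array([1.0, 0.5])               # an item's genre loadings (made up)
theta1, theta2 = user_item_params(u, v, k=2)
print(theta1, theta2, np.round(rating_pmf(theta1, theta2), 3))
```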
    Finding User Preferences

    [Figure: user clusters with preference means μU and μU’]

    • Goal 2: Understand users who rate few items
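A conjugate 1-D Gaussian simplification shows why a shared cluster prior helps users with few ratings. It is only an analogy, since the recommender distribution itself is non-conjugate. Each observation is treated as a noisy Gaussian measurement of the user's preference. The posterior mean is then a precision-weighted average of the cluster mean and the data. A user with one observation stays close to the cluster mean, while a user with many observations moves toward their own data. The function name and numbers are hypothetical.

```python
def shrunk_preference(cluster_mean, cluster_precision, observations, noise_precision):
    """Posterior mean of a scalar preference under a Gaussian cluster prior.

    Each observation is treated as a noisy Gaussian measurement of the preference.
    """
    n = len(observations)
    total_precision = cluster_precision + n * noise_precision
    return (cluster_precision * cluster_mean + noise_precision * sum(observations)) / total_precision

few = shrunk_preference(0.5, 4.0, [2.0], 1.0)          # one data point: pulled toward 0.5
many = shrunk_preference(0.5, 4.0, [2.0] * 20, 1.0)    # twenty data points: close to 2.0
print(few, many)
```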

    Gibbs Sampling - Clusters

    [Details]

    Probability of picking a cluster =

    Probability of a cluster based on size (CRP)

    × Probability that ui would come from the cluster
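A minimal sketch of this Gibbs step, under these assumptions:

- Each existing cluster is a diagonal Gaussian over user vectors.
- A brand-new cluster is weighted by the CRP concentration `alpha` and scored with a broad Gaussian. This is a simplification of the exact prior predictive.
- `cluster_sizes` already excludes the user being resampled.

Names and numbers are hypothetical.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal Gaussian N(x | mean, diag(var))."""
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

def cluster_probabilities(u, cluster_means, cluster_vars, cluster_sizes,
                          alpha, prior_mean, prior_var):
    """Normalized p(z_i = a | rest) ∝ n_a * N(u | mean_a, var_a).

    cluster_sizes must exclude user i itself. The last entry is a new cluster,
    weighted by alpha (scored with a broad Gaussian as a simplification).
    """
    log_w = [np.log(n) + log_gauss_diag(u, m, v)
             for m, v, n in zip(cluster_means, cluster_vars, cluster_sizes)]
    log_w.append(np.log(alpha) + log_gauss_diag(u, prior_mean, prior_var))
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())      # subtract max for numerical stability
    return w / w.sum()

def sample_cluster(u, rng, **params):
    p = cluster_probabilities(u, **params)
    return int(rng.choice(len(p), p=p))

params = dict(cluster_means=[np.zeros(2), np.full(2, 3.0)],
              cluster_vars=[np.ones(2), np.ones(2)],
              cluster_sizes=[10, 10], alpha=1.0,
              prior_mean=np.zeros(2), prior_var=np.full(2, 10.0))
u = np.array([2.8, 3.1])
p = cluster_probabilities(u, **params)
z = sample_cluster(u, np.random.default_rng(0), **params)
print(p, z)
```

Working in log space and subtracting the maximum before exponentiating keeps the product of a size term and a density from underflowing when user vectors are high-dimensional.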

    Sampling user parameters

    [Details]

    Probability of user preferences ui =

    Probability of preferences ui given cluster parameters

    × Probability of predicting ratings ri,j using new preferences

    Recommender distribution is non-conjugate

    Can’t sample directly!
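"Can't sample directly" does not mean the posterior cannot be evaluated: its unnormalized log density is a Gaussian prior term plus a sum of recommender-distribution log likelihoods. A sketch follows. It assumes the same hypothetical vector layout as before ([genre preferences, leaning, polarization]), a diagonal Gaussian cluster prior, and the centered quadratic-exponential rating distribution. None of these details are confirmed by the slides.

```python
import numpy as np

RATINGS = np.arange(1, 6)

def log_rating_pmf(r, theta1, theta2):
    """log p(r) for the assumed form p(r) ∝ exp(theta1 * c + theta2 * c**2), c = r - 3."""
    c = RATINGS - 3.0
    logits = theta1 * c + theta2 * c ** 2
    m = logits.max()
    log_z = m + np.log(np.exp(logits - m).sum())   # log of the normalization term
    return float(logits[r - 1] - log_z)

def log_posterior_user(u, cluster_mean, cluster_var, item_vectors, observed):
    """Unnormalized log p(u | cluster, ratings) = log prior + sum of log likelihoods.

    Hypothetical layout: u = [k genre preferences, leaning, polarization].
    observed: list of (item_index, rating) pairs for this user.
    """
    log_prior = -0.5 * np.sum((u - cluster_mean) ** 2 / cluster_var)
    k = len(item_vectors[0])
    genres, leaning, polarization = u[:k], u[k], u[k + 1]
    log_lik = sum(log_rating_pmf(r, genres @ item_vectors[j] + leaning, polarization)
                  for j, r in observed)
    return float(log_prior + log_lik)

items = [np.array([1.0])]
observed = [(0, 5), (0, 5), (0, 5)]
prior_mean, prior_var = np.zeros(3), np.full(3, 100.0)
low = log_posterior_user(np.array([0.0, 0.0, 0.0]), prior_mean, prior_var, items, observed)
high = log_posterior_user(np.array([1.0, 0.0, 0.0]), prior_mean, prior_var, items, observed)
print(low, high)
```

Because the normalizer Z(θ) depends on ui nonlinearly, this density has no standard closed form to draw from. It can still be evaluated pointwise, and pointwise evaluation is all a Metropolis-Hastings step needs.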


    Review Spam and Fraud

    [Figure: a column of extreme ratings, all 1s and 5s]

    Image from http://sinovera.deviantart.com/art/Cute-Devil-117932337

    Clustering Fraudsters

    [Figure: clusters with means μ1, μ2, μ3; a new spam cluster separates from the previous "real" cluster]

    Clustering Fraudsters

    [Figure: clusters with means μ1, μ2, μ3]

    Too much spam: spammers get separated into a "fraud" cluster.

    Trying to "hide" means either (a) posting very little spam or (b) camouflage that reinforces realistic reviews.

    Clustering Fraudsters

    [Figure: clusters with means μ1 to μ5; labeled groups: Naïve Spammers, Spam + Noise, Hijacked Accounts]

    • Goal 3: Detect abnormal spam behavior

    Does it work?

    [Figure: results labeled "Better Fit"]

    Catching Naïve Spammers

    [Figure: injected naïve spammers]

    83% of the injected spammers are clustered together

    Clustered Hijacked Accounts

    [Figure: injected hijacked accounts; panels show the clustered hijacked accounts and the clustered "attacked" movies]

    Shape of Netflix reviews

    [Figure: rating distributions ranging from "More Skewed" to "More Gaussian"]

    Shape of Amazon Clothing reviews

    Nearly all are heavily polarized!

    Shape of Amazon Electronics reviews

    Nearly all are heavily polarized!

    Shape of BeerAdvocate reviews

    Nearly all are Gaussian!

    Hypotheses on shape of data

    [Figure: comparison of two rating-distribution shapes ("vs.")]

    • Hard to evaluate beyond binary
    • Selection bias: only committed viewers watch Season 4 of a TV series
    • Hard to compare value across very different items
      • Lots of beers and movies to compare
      • Fewer TV shows
      • Even fewer jeans or hard drives
    Key Points
    • Modeling: Fit real data with flexible recommender distribution
    • Prediction: Predict user preferences
    • Anomaly Detection: When does a user not match the normal model?
    Questions?

    Alex Beutel

    abeutel@cs.cmu.edu

    http://alexbeutel.com

    Sampling Cluster Parameters

    [Figure: users u5 and u6 assigned to a cluster a with mean μa]

    Hyperparameters μα, λα, Wα, ν

    Priors on μα, λα, Wα

    Gibbs Sampling - Clusters

    [Details]

    Probability that ui would be sampled from cluster a

    Probability of a cluster (CRP)

    Sampling user parameters

    [Details]

    Probability of ui given cluster parameters

    Probability of predicting ratings ri,j

    Recommender distribution is non-conjugate

    Can’t sample directly!

    Use a Laplace approximation and perform Metropolis-Hastings Sampling

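    Sampling user parameters

    [Details]

    Use a candidate normal distribution $q(u_i) = \mathcal{N}(\hat{u}_i, \hat{\Sigma}_i)$

    Mode of p(ui): $\hat{u}_i = \arg\max_{u_i} \log p(u_i)$

    "Variance" of p(ui): $\hat{\Sigma}_i = \left(-\nabla^2 \log p(u_i)\big|_{u_i = \hat{u}_i}\right)^{-1}$

    Metropolis-Hastings Sampling:

    Sample $u_i' \sim q$

    Keep new with probability $\min\left(1, \frac{p(u_i')\, q(u_i)}{p(u_i)\, q(u_i')}\right)$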
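The candidate-normal Metropolis-Hastings step can be sketched in 1-D. This sketch was written for illustration and is not the authors' code:

- The mode is found with Newton's method using finite differences.
- The variance is minus the inverse second derivative of log p at the mode (a Laplace approximation).
- Proposals are drawn independently of the current state, so the acceptance ratio includes the proposal density q.

The usage target is hypothetical: a N(0, 1) prior on θ1 plus three 5-star ratings, with θ2 fixed at -0.5.

```python
import math
import random

def laplace_approximation(log_p, x0=0.0, h=1e-4, max_iter=100):
    """Mode and 'variance' of a 1-D density via Newton's method with finite differences.

    variance = -1 / (second derivative of log p at the mode); assumes log p is concave.
    """
    def second_derivative(x):
        return (log_p(x + h) - 2.0 * log_p(x) + log_p(x - h)) / h ** 2

    x = x0
    for _ in range(max_iter):
        grad = (log_p(x + h) - log_p(x - h)) / (2.0 * h)
        step = grad / second_derivative(x)
        x -= step                       # Newton step toward the mode
        if abs(step) < 1e-10:
            break
    return x, -1.0 / second_derivative(x)

def log_normal(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def mh_laplace(log_p, x_init, n_steps, rng):
    """Independence Metropolis-Hastings with the Laplace normal as the candidate."""
    mode, var = laplace_approximation(log_p, x_init)
    x, accepted, samples = x_init, 0, []
    for _ in range(n_steps):
        x_new = rng.gauss(mode, math.sqrt(var))
        # log of p(x_new) q(x) / (p(x) q(x_new))
        log_ratio = (log_p(x_new) - log_p(x)
                     + log_normal(x, mode, var) - log_normal(x_new, mode, var))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, accepted = x_new, accepted + 1
        samples.append(x)
    return samples, accepted / n_steps

def log_target(theta1, theta2=-0.5, n_five_star=3):
    """Hypothetical target: N(0, 1) prior on theta1 plus three 5-star ratings."""
    logits = [theta1 * c + theta2 * c * c for c in (-2, -1, 0, 1, 2)]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -0.5 * theta1 ** 2 + n_five_star * (logits[4] - log_z)

samples, rate = mh_laplace(log_target, 0.0, 2000, random.Random(0))
print(sum(samples) / len(samples), rate)
```

If the target is exactly Gaussian, the Laplace candidate matches it and nearly every proposal is accepted. The less Gaussian the posterior, the lower the acceptance rate.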

    Sampling Cluster Parameters

    [Details]

    [Equation: the posterior over a cluster's parameters combines the users/items in the cluster with the priors]
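The hyperparameters on the earlier slide (μα, λα, Wα, ν) have the shape of a Normal-Wishart prior over a cluster's mean and precision. Assuming that is what they are (the slides do not say so explicitly), the cluster update is the textbook conjugate one. The posterior is built from the members' mean and scatter matrix, and then a precision and a mean are drawn. The Wishart draw below sums outer products of Gaussian vectors, which is valid only for an integer number of degrees of freedom of at least the dimension. Function names are hypothetical.

```python
import numpy as np

def normal_wishart_posterior(X, mu0, lam0, W0, nu0):
    """Conjugate Normal-Wishart update for the members X (shape (n, d)) of one cluster.

    Prior: precision Lam ~ Wishart(W0, nu0); mean mu | Lam ~ N(mu0, (lam0 * Lam)^-1).
    """
    n = X.shape[0]
    if n == 0:
        return mu0, lam0, W0, nu0
    xbar = X.mean(axis=0)
    S = (X - xbar).T @ (X - xbar)                 # scatter matrix of the members
    lam_n, nu_n = lam0 + n, nu0 + n
    mu_n = (lam0 * mu0 + n * xbar) / lam_n
    diff = (xbar - mu0)[:, None]
    W_n_inv = np.linalg.inv(W0) + S + (lam0 * n / lam_n) * (diff @ diff.T)
    return mu_n, lam_n, np.linalg.inv(W_n_inv), nu_n

def sample_cluster_params(mu_n, lam_n, W_n, nu_n, rng):
    # Wishart draw as a sum of nu_n outer products (integer nu_n >= d only).
    G = rng.multivariate_normal(np.zeros(len(mu_n)), W_n, size=int(nu_n))
    Lam = G.T @ G
    mu = rng.multivariate_normal(mu_n, np.linalg.inv(lam_n * Lam))
    return mu, Lam

X = np.array([[1.0, 2.0], [3.0, 4.0]])            # two users' vectors (made up)
mu_n, lam_n, W_n, nu_n = normal_wishart_posterior(X, np.zeros(2), 1.0, np.eye(2), 3)
mu, Lam = sample_cluster_params(mu_n, lam_n, W_n, nu_n, np.random.default_rng(0))
print(mu_n, lam_n, nu_n, mu)
```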

    Inferring Hyperparameters

    [Details]

    Solved directly – no sampling needed!

    Prior hidden as additional cluster

    Does Metropolis-Hastings work?
    • Have to use a non-standard sampling procedure:
      • 99.12% acceptance rate for Amazon Electronics
      • 77.77% acceptance rate for Netflix 24k
    Does it work?

    Compare on Predictive Probability (PP) to see how well our model fits the data
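The slide does not define Predictive Probability precisely. One plausible reading, assumed here, is the average probability the fitted model assigns to each held-out rating, often reported alongside the average log-probability. The sketch takes per-rating distributions over 1 to 5 stars as input; how those distributions are obtained from the sampler is left out.

```python
import numpy as np

def predictive_probability(pmfs, held_out):
    """Average probability (and average log-probability) of held-out ratings.

    pmfs[n] is the model's distribution over ratings 1..5 for the n-th
    held-out (user, item) pair; held_out[n] is the true rating.
    """
    probs = np.array([pmf[r - 1] for pmf, r in zip(pmfs, held_out)])
    return float(probs.mean()), float(np.log(probs).mean())

uniform = np.full(5, 0.2)
peaked = np.array([0.0, 0.0, 0.0, 0.2, 0.8])
pp, mean_log_pp = predictive_probability([uniform, peaked], [3, 5])
print(pp, mean_log_pp)
```

Unlike squared error, this metric rewards a model for matching the whole shape of the rating distribution, including polarized shapes that a Gaussian cannot capture.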

    Handling Spammers

    Random naïve spammers in Amazon Electronics dataset

    Random hijacked accounts in Netflix 24k dataset
