A Content-Based Approach to Collaborative Filtering


Brandon Douthit-Wood

CS 470 – Final Presentation

Collaborative Filtering
  • Method of automating word-of-mouth
  • Large groups of users collaborate by rating products, services, news articles, etc.
  • Analyze ratings data of the group to produce recommendations for individual users
    • Find users with similar tastes
Problems with Collaborative Filtering Methods
  • Performance
    • Prohibitively large dataset
  • Scalability
    • Will the solution scale to millions of users on the Internet?
  • Sparsity of data
    • User who has rated few items
    • Item with few ratings
Problems with Collaborative Filtering Methods
  • Cannot compare users that have no common ratings

(Ratings on a scale of 1-5)
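To make the limitation concrete, here is a minimal sketch of a ratings-based comparison. It assumes the Pearson correlation is computed only over items both users have rated; the user names and ratings are made up for illustration and are not taken from the presentation. When two users share no rated items, the function has nothing to correlate and returns None.

```python
from math import sqrt

def pearson_on_common(ratings_a, ratings_b):
    """Pearson correlation over items both users rated; None if undefined."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None  # too little overlap: the users cannot be compared
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else None

# Hypothetical ratings on the 1-5 scale (illustrative only).
alice = {"Movie A": 5, "Movie B": 1, "Movie C": 4}
bob = {"Movie A": 4, "Movie B": 2, "Movie C": 5}
carol = {"Movie D": 5, "Movie E": 2}

print(pearson_on_common(alice, bob))    # a positive correlation
print(pearson_on_common(alice, carol))  # None: no common ratings
```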

A Content-Based Approach
  • Build a feature list for each user based on content of items rated
  • Compare users’ features to make recommendations
  • Now we can find similarity between users with no common ratings
Data Source
  • EachMovie Project
    • Compaq Systems Research Center
    • Over 18 months collected 2,811,983 ratings for 1,628 movies from 72,916 users
    • Ratings given on 1-5 scale
    • Dataset split into 75% training, 25% testing
  • Internet Movie Database (IMDb)
    • Huge database of movie information
      • Actors, director, genre, plot description, etc.
Creating the Feature List
  • Retrieve content information for each movie from IMDb dataset – create “bag of words”
  • Throw out common words (e.g., the, and, but)
  • Calculate frequency of remaining words, create movie’s feature list
    • Frequencies weighted based on total number of terms
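The steps above can be sketched roughly as follows. This is an illustration, not the presentation's actual code: the stop-word list is a tiny placeholder, the tokenizer is a simple regular expression, and "weighted based on total number of terms" is assumed to mean dividing each count by the number of remaining terms. The function name `movie_features` is hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "but", "a", "an", "of", "in", "to", "is"}

def movie_features(text, stop_words=STOP_WORDS):
    """Return {word: relative frequency} for the non-stop-words in text."""
    # Bag of words: lowercase the text and keep alphabetic tokens.
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in stop_words]
    counts = Counter(words)
    total = sum(counts.values())
    # Assumed weighting: count divided by total number of remaining terms.
    return {w: c / total for w, c in counts.items()} if total else {}

features = movie_features("The spy and the detective chase the spy")
print(features)  # {'spy': 0.5, 'detective': 0.25, 'chase': 0.25}
```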
Comparing Users
  • Each user has a positive and a negative feature list
    • Combine feature lists of movies they have rated
  • Compare users’ feature lists using the Pearson correlation coefficient
  • Users can be compared with no common ratings
  • Able to recommend items with few ratings
  • Users only need to rate a few items to receive recommendations
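A minimal sketch of this comparison is shown below. Several details are assumptions not stated in the slides: a rating of 4 or higher counts as positive and anything lower as negative, feature lists are combined by summing word weights, and the correlation is taken over the union of the two lists' words (a missing word counts as 0). The movie ids and feature weights are made up.

```python
from math import sqrt

def combine(feature_lists):
    """Sum several {word: weight} feature lists into one."""
    combined = {}
    for features in feature_lists:
        for word, weight in features.items():
            combined[word] = combined.get(word, 0.0) + weight
    return combined

def user_profile(ratings, features_by_movie, like_threshold=4):
    """Return (positive, negative) feature lists; the threshold is an assumption."""
    liked = [features_by_movie[m] for m, r in ratings.items() if r >= like_threshold]
    disliked = [features_by_movie[m] for m, r in ratings.items() if r < like_threshold]
    return combine(liked), combine(disliked)

def pearson(a, b):
    """Pearson correlation of two feature lists over the union of their words."""
    vocab = sorted(set(a) | set(b))
    if len(vocab) < 2:
        return 0.0  # assumed fallback when the correlation is undefined
    xs = [a.get(w, 0.0) for w in vocab]
    ys = [b.get(w, 0.0) for w in vocab]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

# Hypothetical movie feature lists and ratings (illustrative only).
features_by_movie = {
    "M1": {"spy": 0.5, "gadget": 0.25, "chase": 0.25},
    "M2": {"spy": 0.5, "gadget": 0.25, "villain": 0.25},
    "M3": {"romance": 1.0},
}
pos1, neg1 = user_profile({"M1": 5, "M3": 1}, features_by_movie)
pos2, neg2 = user_profile({"M2": 5}, features_by_movie)

# The two users share no rated movie, yet their positive lists correlate.
print(pearson(pos1, pos2))
```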
Methods
  • Three methods attempted to improve performance:
    • Clustering of users
    • Random groups of users
    • Compare users directly to items
User Clustering
  • Simple algorithm, starting with first user:
    • Compare to existing clusters first
      • If similarity is high, merge user into cluster
    • Otherwise, compare to each remaining user
    • Stop at the first user whose correlation is above the threshold
    • Once a similar user is found, create a new cluster from the two users
      • Cluster has combined feature list of all its users
  • Not as efficient as possible: O(n²)
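One possible reading of the clustering pass is sketched below. It is not the author's implementation: the threshold value is arbitrary, a user with no similar partner is assumed to start a singleton cluster (the slides do not say), and each user is represented by a single feature list for brevity. The nested scan over remaining users is what gives the quadratic cost.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two feature lists over the union of their words."""
    vocab = sorted(set(a) | set(b))
    if len(vocab) < 2:
        return 0.0
    xs = [a.get(w, 0.0) for w in vocab]
    ys = [b.get(w, 0.0) for w in vocab]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def cluster_users(profiles, threshold=0.4):
    """Greedy clustering of {user: feature list}; threshold is an assumed value."""
    clusters = []  # each cluster: {"members": [...], "features": {...}}
    assigned = set()
    users = list(profiles)

    def add(cluster, user):
        # A cluster carries the combined feature list of all its users.
        cluster["members"].append(user)
        for word, weight in profiles[user].items():
            cluster["features"][word] = cluster["features"].get(word, 0.0) + weight
        assigned.add(user)

    for i, user in enumerate(users):
        if user in assigned:
            continue
        # Compare to existing clusters first; join the first similar one.
        home = next((c for c in clusters
                     if pearson(profiles[user], c["features"]) >= threshold), None)
        if home is not None:
            add(home, user)
            continue
        # Otherwise scan the remaining users, stopping at the first similar one.
        partner = next((v for v in users[i + 1:]
                        if v not in assigned
                        and pearson(profiles[user], profiles[v]) >= threshold), None)
        cluster = {"members": [], "features": {}}
        clusters.append(cluster)
        add(cluster, user)
        if partner is not None:
            add(cluster, partner)
    return clusters

# Hypothetical user feature lists (illustrative only).
profiles = {
    "u1": {"spy": 0.5, "gadget": 0.25, "chase": 0.25},
    "u2": {"spy": 0.5, "gadget": 0.25, "villain": 0.25},
    "u3": {"romance": 0.6, "wedding": 0.4},
    "u4": {"romance": 0.7, "wedding": 0.3},
    "u5": {"spy": 0.6, "gadget": 0.2, "chase": 0.2},
}
print([c["members"] for c in cluster_users(profiles)])
```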
User Clustering
  • Once clusters are formed, we can predict ratings for each item
    • For each user, find their 10 nearest neighbors
    • Predicted rating is the average rating of item from these neighbors
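The prediction step might look like the sketch below. The function name, the `similarity` callback, and the candidate pool are hypothetical. The sketch also assumes that only candidates who actually rated the item are considered as neighbors, which the slides do not specify.

```python
def predict_rating(user, item, candidates, ratings, similarity, k=10):
    """Average the item's ratings from the k candidates most similar to user."""
    # Assumption: only candidates who rated the item can act as neighbors.
    raters = [c for c in candidates if c != user and item in ratings[c]]
    neighbors = sorted(raters, key=lambda c: similarity(user, c), reverse=True)[:k]
    if not neighbors:
        return None  # nobody to ask: no prediction
    return sum(ratings[n][item] for n in neighbors) / len(neighbors)

# Hypothetical data: precomputed similarities to u1 and ratings for movie "M".
ratings = {"u1": {}, "u2": {"M": 5}, "u3": {"M": 4}, "u4": {"M": 1}}
sims = {("u1", "u2"): 0.9, ("u1", "u3"): 0.8, ("u1", "u4"): -0.5}

def similarity(a, b):
    return sims[(a, b)]

# With k=2 the two most similar raters are u2 and u3, so the result is 4.5.
print(predict_rating("u1", "M", ["u2", "u3", "u4"], ratings, similarity, k=2))
```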
Selecting a Random Group
  • Randomly select 5000 users as a (hopefully) representative sample
  • As before, find a user’s 10 nearest neighbors from the random group
    • Predicted rating is the average rating of item from these neighbors
  • Much less work than clustering
    • How much accuracy (if any) will be lost?
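Drawing the random group is the simple part. A minimal sketch, assuming user ids are available as a sequence, is given below; the function name and the optional seed (for reproducible experiments) are illustrative additions. The sampled ids would then serve as the candidate pool for the nearest-neighbor search.

```python
import random

def sample_users(user_ids, group_size=5000, seed=None):
    """Return a random subset of user ids, without replacement."""
    rng = random.Random(seed)
    user_ids = list(user_ids)
    # Guard against asking for more users than exist.
    return rng.sample(user_ids, min(group_size, len(user_ids)))

# 72,916 is the EachMovie user count quoted in the slides; ids are placeholders.
group = sample_users(range(72916), group_size=5000, seed=0)
print(len(group))  # 5000
```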
Comparing Users to Items
  • No collaborative filtering involved
  • Compare the positive and negative feature lists of user to feature list of item
    • Make prediction based on which feature list has higher correlation with item
  • Pretty quick and easy to do
    • How accurate will this be?
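A rough sketch of this purely content-based prediction follows. It assumes the same union-of-words Pearson correlation used to compare feature lists, and that a tie is resolved as "not liked"; neither detail is stated in the slides. The feature lists are made up.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two feature lists over the union of their words."""
    vocab = sorted(set(a) | set(b))
    if len(vocab) < 2:
        return 0.0
    xs = [a.get(w, 0.0) for w in vocab]
    ys = [b.get(w, 0.0) for w in vocab]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def predict_like(user_pos, user_neg, item_features):
    """Predict 'liked' if the item correlates more with the positive list."""
    # Assumption: ties fall to 'not liked'.
    return pearson(user_pos, item_features) > pearson(user_neg, item_features)

# Hypothetical feature lists (illustrative only).
user_pos = {"spy": 0.5, "gadget": 0.25, "chase": 0.25}
user_neg = {"romance": 0.6, "wedding": 0.4}
spy_movie = {"spy": 0.5, "gadget": 0.25, "villain": 0.25}
romance_movie = {"romance": 0.7, "wedding": 0.3}

print(predict_like(user_pos, user_neg, spy_movie))      # True
print(predict_like(user_pos, user_neg, romance_movie))  # False
```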
Analyzing Predictions
  • Collected 3 metrics to evaluate predictions
    • Accuracy: fraction of all items predicted correctly
    • Precision: fraction of items predicted positive that are actually positive
    • Recall: fraction of the unseen positive items that are predicted positive
  • Precision and recall tend to have an inverse relationship
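For reference, the three metrics can be computed as in the sketch below. It uses the standard definitions from a confusion matrix and assumes each prediction is a boolean "positive" label; how the presentation mapped 1-5 ratings to positive and negative is not stated, so the inputs here are made up.

```python
def evaluate(predicted, actual):
    """Return (accuracy, precision, recall) for parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)            # predicted positive, is positive
    fp = sum(p and not a for p, a in pairs)        # predicted positive, is negative
    fn = sum(not p and a for p, a in pairs)        # predicted negative, is positive
    tn = sum(not p and not a for p, a in pairs)    # predicted negative, is negative
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical predictions for five held-out items.
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
print(evaluate(predicted, actual))  # accuracy 0.6, precision 2/3, recall 2/3
```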
Conclusions
  • Large gain from clustering users
    • Is the extra work worth it?
    • Depends on the application
  • Purely content-based predictions worked pretty well
    • Simple, fast solution
  • Random group prediction also performed reasonably well
  • Problems solved by content-based analysis:
    • Sparsity of data
    • Performance
    • Scalability