A shot at Netflix Challenge

Hybrid Recommendation System

Priyank Chodisetti

Problem and Approach
  • A data set of about 480,000 users and their ratings for 17,770 movies is provided.
  • Given a user ‘p’ and a movie ‘m’, we should predict the rating that ‘p’ will give to ‘m’.
  • My Idea: Take two entirely different approaches and merge their results.
  • Applied Latent Semantic Analysis (also known as Latent Semantic Indexing, LSI) and Collaborative Filtering to the dataset independently
  • Through LSI, mapped the dataset to a lower-dimensional space and tried to extract relations between different movies
  • Through Collaborative Filtering, tried to find each user's tastes by comparing them with other, similar users
  • Major Problems:
    • Computationally expensive: for example, one solution of mine ran for 14 hrs and gave very disappointing results
    • The rating matrix is about 99% sparse, i.e. about 99% of its values are missing
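
To make the sparsity problem concrete, here is a minimal sketch on toy data. The dictionary layout, the `sparsity` helper and the tiny 4 x 5 example are assumptions for illustration, not the representation used in the project. Only the observed ratings are stored, and the missing fraction is computed from their count.

```python
def sparsity(ratings, n_users, n_movies):
    """Fraction of (user, movie) cells that have no rating."""
    total_cells = n_users * n_movies
    return 1.0 - len(ratings) / total_cells

# Toy data (hypothetical): only observed ratings are stored,
# keyed by (user index, movie index).
ratings = {
    (0, 0): 5, (0, 3): 2,
    (1, 1): 4,
    (2, 0): 3, (2, 2): 1,
}

# 5 observed ratings in a 4 x 5 grid of 20 cells.
missing = sparsity(ratings, n_users=4, n_movies=5)
print(f"{missing:.0%} of the cells are missing")
```

Storing only the observed cells keeps memory proportional to the number of ratings rather than to users times movies.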
Handling Major Problems
  • Missing values are usually filled in with the user's average rating or with the overall average rating of all users. But I believe that the $1,000,000 winner will be the one who handles the missing values well.
  • Adopted the method described in [2], which fits the current situation well.
  • LSI:
    • Apply SVD to the matrix and retain the ‘k’ largest singular values. This gives a ‘k’-dimensional space, i.e. the best rank-‘k’ approximation of the matrix
    • But how many dimensions? Find out by experiment
    • To predict person p's rating for movie m, take the m-th row of U, multiply it by S, and multiply the result by the p-th column of V^T (the transpose of V)
  • Collaborative Filtering:
    • Find the k nearest neighbours (kNN) of the user and derive the predicted rating from their ratings.
    • With Euclidean distance as the distance measure, we would be working in 17,770 dimensions. So use the Pearson correlation coefficient instead
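
The two predictors described above can be sketched as follows. This is a minimal illustration in plain Python, not the project's code. The function names, the nested-dictionary layout and the toy data are assumptions. So is the similarity-weighted average in `predict_knn`, since the slides do not say how the neighbours' ratings are combined.

```python
from math import sqrt

def predict_from_factors(U, S, Vt, m, p):
    """(row m of U) * S * (column p of Vt).

    U is movies x k, S is the list of the k retained singular values,
    Vt is k x users.
    """
    return sum(U[m][i] * S[i] * Vt[i][p] for i in range(len(S)))

def pearson(a, b):
    """Pearson correlation of two users over the movies both have rated."""
    common = [movie for movie in a if movie in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[x] for x in common) / len(common)
    mean_b = sum(b[x] for x in common) / len(common)
    cov = sum((a[x] - mean_a) * (b[x] - mean_b) for x in common)
    var_a = sum((a[x] - mean_a) ** 2 for x in common)
    var_b = sum((b[x] - mean_b) ** 2 for x in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def predict_knn(ratings, p, m, k=2):
    """Similarity-weighted average over the k most similar users who rated m.

    Assumption: neighbours with non-positive correlation are ignored.
    Returns None when no usable neighbour exists.
    """
    candidates = [
        (pearson(ratings[p], ratings[q]), ratings[q][m])
        for q in ratings
        if q != p and m in ratings[q]
    ]
    neighbours = [(s, r) for s, r in sorted(candidates, reverse=True)[:k] if s > 0]
    if not neighbours:
        return None
    return sum(s * r for s, r in neighbours) / sum(s for s, _ in neighbours)

# Toy factors (hypothetical): the trivial SVD of the matrix [[3, 0], [0, 1]].
U = [[1.0, 0.0], [0.0, 1.0]]
S = [3.0, 1.0]
Vt = [[1.0, 0.0], [0.0, 1.0]]
lsi_estimate = predict_from_factors(U, S, Vt, m=0, p=0)

# Toy ratings (hypothetical): user -> {movie: rating}.
ratings = {
    "p": {"a": 5, "b": 3, "c": 4},
    "q": {"a": 4, "b": 2, "c": 3, "m": 4},
    "r": {"a": 1, "b": 5, "c": 2, "m": 1},
    "s": {"a": 5, "b": 3, "c": 5, "m": 5},
}
knn_estimate = predict_knn(ratings, "p", "m", k=2)
```

Pearson correlation is computed only over co-rated movies, so each comparison touches a handful of entries rather than all 17,770 dimensions.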
Implementation
  • Mixing LSI and Collaborative Filtering
    • Find the kNN in the reduced-dimension space, using Euclidean distance as the distance measure.
  • Used SVDLIBC, which uses the Lanczos method for Singular Value Decomposition
  • Computational Challenges:
    • Merged all the files in the training set into one single large file, to reduce disk access and improve response time
    • Converted the whole data set into a sparse text format
    • Also generated a second large data set in user-major format (user: movie, rating), in contrast to the given movie-major format (movie: user, rating)
    • Implemented in C++
  • Future Extensions this Winter
    • Plan to implement the Generalized Hebbian Algorithm, which should reduce the computation time and make missing values easier to handle.
    • Interested and motivated friends can join me this winter
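
The conversion from the movie-major format to the user-major format mentioned under Computational Challenges might look like the sketch below. It is written in Python for brevity and is not the author's C++ code. The line layout (a `movie_id:` header followed by `user_id,rating,date` lines), the function names and the sample lines are assumptions, since the slides do not specify the merged file's layout.

```python
from collections import defaultdict

def invert_to_user_major(lines):
    """Turn movie-major lines into {user_id: {movie_id: rating}}."""
    by_user = defaultdict(dict)
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):           # assumed header form, e.g. "17:"
            movie_id = int(line[:-1])
            continue
        if movie_id is None:
            raise ValueError("rating line before any movie header")
        user_id, rating = line.split(",")[:2]   # any trailing date is ignored
        by_user[int(user_id)][movie_id] = int(rating)
    return dict(by_user)

def format_user_major(by_user):
    """Serialise as 'user_id:' headers followed by 'movie_id,rating' lines."""
    out = []
    for user_id in sorted(by_user):
        out.append(f"{user_id}:")
        for movie_id in sorted(by_user[user_id]):
            out.append(f"{movie_id},{by_user[user_id][movie_id]}")
    return out

# Hypothetical sample: two movies, two users.
sample = ["1:", "30,3,2005-09-06", "7,5,2005-05-13", "2:", "7,4,2005-10-19"]
by_user = invert_to_user_major(sample)
print("\n".join(format_user_major(by_user)))
```

Holding the whole inverted index in memory is the simplest choice for a sketch. For the full training set an external sort or a two-pass approach may be needed.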
References
  • [1] M. Brand. Fast Online SVD Revisions for Lightweight Recommender Systems. In Proc. SIAM International Conference on Data Mining, 2003.
  • [2] M. W. Berry. Incremental Singular Value Decomposition of Uncertain Data. In Proceedings, European conference on the SIGIR. ACM, 1999.
  • [3] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Application of Dimensionality Reduction in Recommender System - A Case Study. In ACM WebKDD Workshop, 2000.