A shot at Netflix Challenge

Hybrid Recommendation System

Priyank Chodisetti

Problem and Approach
• A data set of 240,000 users and their ratings for 17,770 movies is provided.
• Given a user ‘p’ and a movie ‘m’, we should predict the rating that ‘p’ will give ‘m’
• My Idea: Take two entirely different approaches and merge their results.
• Applied Latent Semantic Analysis (LSI) and Collaborative Filtering techniques to the dataset independently
• Through LSI, mapped the dataset to a lower-dimensional space and tried to extract relations between different movies
• Through Collaborative Filtering, tried to find a user's tastes by comparing with other similar users
• Major Problems:
• Computationally expensive; for example, one solution of mine ran for 14 hrs with most disappointing results
• The matrix is sparse: ~99% of its values are missing
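To make the sparsity problem concrete: a dense matrix of the stated size would have 240,000 x 17,770, roughly 4.26 billion cells, almost all of them empty. The following is a minimal sketch of one way to keep only the observed ratings and to measure the missing fraction. The function names and the toy triples are illustrative assumptions, not taken from the presentation.

```python
# Sparse ratings store: only observed (user, movie) pairs are kept.
# All IDs and ratings below are made-up toy values.
from collections import defaultdict


def build_sparse(triples):
    """Map user -> {movie: rating} from (user, movie, rating) triples."""
    ratings = defaultdict(dict)
    for user, movie, rating in triples:
        ratings[user][movie] = rating
    return ratings


def missing_fraction(ratings, n_users, n_movies):
    """Fraction of the full n_users x n_movies matrix that has no rating."""
    observed = sum(len(movies) for movies in ratings.values())
    return 1.0 - observed / (n_users * n_movies)


toy = [(1, 10, 4), (1, 11, 5), (2, 10, 3), (3, 12, 1)]
ratings = build_sparse(toy)
# 4 observed ratings in a 3 x 3 matrix, so 5 of 9 cells are missing.
print(missing_fraction(ratings, n_users=3, n_movies=3))
```

Memory then grows with the number of ratings actually present rather than with users times movies.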
Handling Major Problems
• Missing values are generally handled by substituting the user's average rating or the overall average rating of all users. But I believe the $1,000,000 winner will be the one who handles the missing values well.
• Adopted the method described in [2], which fits the current situation well.
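For reference, the common baseline mentioned above (user average, falling back to the overall average) might look like the sketch below. This is the simple approach the slide wants to improve on, not the method from [2]. The function names and sample data are assumptions made for illustration.

```python
def user_mean(ratings, user):
    """Average of one user's observed ratings, or None if there are none."""
    vals = ratings.get(user, {})
    return sum(vals.values()) / len(vals) if vals else None


def global_mean(ratings):
    """Average over every observed rating of every user."""
    all_vals = [r for movies in ratings.values() for r in movies.values()]
    return sum(all_vals) / len(all_vals)


def fill_missing(ratings, user, movie):
    """Observed rating if present, else the user's mean, else the global mean."""
    if movie in ratings.get(user, {}):
        return ratings[user][movie]
    mean = user_mean(ratings, user)
    return mean if mean is not None else global_mean(ratings)


# Toy data: user -> {movie: rating}
ratings = {"p1": {"m1": 4, "m2": 2}, "p2": {"m1": 5}}
print(fill_missing(ratings, "p1", "m3"))  # falls back to p1's own average
print(fill_missing(ratings, "p3", "m1"))  # unknown user: global average
```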
• LSI:
• Apply SVD to the matrix and retain the ‘k’ largest singular values. This gives a ‘k’-dimensional space, i.e. the best rank-‘k’ approximation
• But how many dimensions? Determine by experiment
• To predict person p's rating for movie m (with movies as rows and users as columns of the matrix), take the mth row of U, matrix multiply it with S, and matrix multiply that with the pth column of V(t)
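The LSI prediction step above can be sketched as follows, assuming the factors U, S and V(t) have already been computed (for instance by an SVD library) and that movies are rows and users are columns. The factor values here are made up for a 2 x 2 toy case, and `predict_rating` is a hypothetical name.

```python
def predict_rating(U, S, Vt, m, p, k):
    """Rating estimate for movie m and user p from the k largest singular triplets.

    U  : rows are movies, columns are singular directions
    S  : singular values in descending order (the diagonal of S)
    Vt : V transposed; columns are users
    """
    return sum(U[m][i] * S[i] * Vt[i][p] for i in range(k))


# Made-up rank-2 factors for a 2-movie x 2-user matrix.
U = [[0.6, -0.8], [0.8, 0.6]]
S = [5.0, 1.0]
Vt = [[0.6, 0.8], [0.8, -0.6]]

full = predict_rating(U, S, Vt, m=0, p=1, k=2)   # uses both singular values
rank1 = predict_rating(U, S, Vt, m=0, p=1, k=1)  # keeps only the largest one
print(full, rank1)
```

Varying `k` in such a function is one way to run the "how many dimensions" experiment. If the ratings were mean-centred before the decomposition, the mean would have to be added back to the result.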
• Collaborative Filtering:
• Find the k nearest neighbours (kNN) and derive the predicted rating from them.
• With Euclidean distance as the distance measure, we would be working in 17,770 dimensions, so use the Pearson correlation coefficient instead
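A minimal sketch of the collaborative filtering step with Pearson correlation is given below. Pearson is computed only over the movies two users have both rated, and it is insensitive to a user rating consistently higher or lower than another. The similarity-weighted average and the rule of ignoring non-positive correlations are common choices assumed here, not details given in the presentation.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over movies both users rated; 0.0 if undefined."""
    common = [m for m in a if m in b]
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[m] for m in common) / n
    mean_b = sum(b[m] for m in common) / n
    cov = sum((a[m] - mean_a) * (b[m] - mean_b) for m in common)
    var_a = sum((a[m] - mean_a) ** 2 for m in common)
    var_b = sum((b[m] - mean_b) ** 2 for m in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(ratings, user, movie, k=2):
    """Similarity-weighted average over the k most similar users who rated movie."""
    candidates = []
    for other, theirs in ratings.items():
        if other != user and movie in theirs:
            sim = pearson(ratings[user], theirs)
            if sim > 0:  # assumption: ignore uncorrelated and opposite tastes
                candidates.append((sim, theirs[movie]))
    top = sorted(candidates, reverse=True)[:k]
    if not top:
        return None
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


# Toy data: user -> {movie: rating}
ratings = {
    "p": {"a": 5, "b": 3, "c": 4},
    "q": {"a": 4, "b": 2, "c": 3, "m": 5},  # same taste as p, shifted by one
    "r": {"a": 1, "b": 5, "c": 2, "m": 1},  # opposite taste
    "s": {"a": 5, "b": 3, "c": 5, "m": 4},
}
print(predict(ratings, "p", "m"))
```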
Implementation
• Mixing LSI and Collaborative Filtering
• Find the kNN in the reduced-dimension space, using Euclidean distance as the distance measure.
• Used SVDLIBC, which uses the Lanczos method for singular value decomposition
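The hybrid step might be sketched like this: each user is represented by a short vector of coordinates in the reduced space (for example derived from the truncated SVD factors), and neighbours are ranked by Euclidean distance there. How the presentation derives the user vectors and combines neighbour ratings is not stated, so the plain average and all names and values below are assumptions.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def predict(vectors, ratings, user, movie, k=2):
    """Average rating of the k users closest to `user` who rated `movie`."""
    raters = [u for u in vectors
              if u != user and movie in ratings.get(u, {})]
    raters.sort(key=lambda u: dist(vectors[user], vectors[u]))
    top = raters[:k]
    if not top:
        return None
    return sum(ratings[u][movie] for u in top) / len(top)


# Made-up 2-dimensional user coordinates in the reduced space.
vectors = {
    "p": (1.0, 0.0),
    "q": (0.9, 0.1),
    "r": (-1.0, 0.2),
    "s": (0.8, -0.1),
}
ratings = {"q": {"m": 5}, "r": {"m": 1}, "s": {"m": 4}}
print(predict(vectors, ratings, "p", "m"))  # q and s are nearest to p
```

Euclidean distance becomes workable here because each vector has only k coordinates instead of one per movie.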
• Computational Challenges:
• All the files in the training set are merged into one single large file, to reduce disk access and improve response time
• Converted the whole data set into sparse text format
• Also generated a large data set organised by user (user: movie, rating), in contrast to the provided format organised by movie (movie: user, rating)
• Using C++
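The re-keying from movie-organised to user-organised data can be illustrated with a short sketch. It assumes the input lists a movie ID followed by a colon and then one `user,rating,date` line per rating; the sample lines are invented. It is written in Python for brevity even though the presentation's implementation used C++.

```python
from collections import defaultdict


def parse_movie_keyed(lines):
    """Yield (movie, user, rating) from 'movie:' headers and 'user,rating,date' rows."""
    movie = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            movie = int(line[:-1])
        else:
            user, rating, _date = line.split(",")
            yield movie, int(user), int(rating)


def to_user_keyed(triples):
    """Group ratings by user: user -> [(movie, rating), ...]."""
    by_user = defaultdict(list)
    for movie, user, rating in triples:
        by_user[user].append((movie, rating))
    return by_user


sample = """1:
30,3,2005-09-06
40,5,2005-05-13
2:
30,4,2005-10-19
""".splitlines()

by_user = to_user_keyed(parse_movie_keyed(sample))
for user in sorted(by_user):
    print(f"{user}: {by_user[user]}")
```

Reading all movie blocks from one merged stream, as here, corresponds to the single-large-file idea; at full scale the grouping would likely need to be done on disk rather than in memory.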
• Future Extensions this Winter
• Plan to implement the Generalized Hebbian Algorithm, which should reduce computation time and make missing values easier to handle.
• Interested and motivated friends can join me this winter
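As a rough illustration of the planned direction, the Generalized Hebbian Algorithm (Sanger's rule) learns the leading principal directions one sample at a time, without forming or decomposing the full matrix. The sketch below runs it on dense, made-up 2-dimensional samples. How it would be adapted to sparse ratings with missing values is not described in the presentation and is not shown here; the learning rate, epoch count and starting weights are arbitrary assumptions.

```python
def gha_step(W, x, lr):
    """One Sanger's-rule update; W is a list of k weight vectors, x one sample."""
    dims = range(len(x))
    # Projection of the sample onto each current component.
    y = [sum(w[d] * x[d] for d in dims) for w in W]
    new_W = []
    for i, w in enumerate(W):
        # Reconstruction of x from components 0..i (computed from the old W).
        recon = [sum(y[j] * W[j][d] for j in range(i + 1)) for d in dims]
        new_W.append([w[d] + lr * y[i] * (x[d] - recon[d]) for d in dims])
    return new_W


# Toy samples: most variance lies along the first axis.
samples = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
W = [[0.5, 0.5], [0.5, -0.5]]  # arbitrary starting weights, k = 2
for _ in range(200):
    for x in samples:
        W = gha_step(W, x, lr=0.05)

print(W)  # first vector near (+-1, 0), second near (0, +-1)
```

Each update touches only one sample and the k weight vectors, which is where the hoped-for saving in computation time comes from.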
References
• [1] M. Brand. Fast Online SVD Revisions for Lightweight Recommender Systems. In Proc. SIAM International Conference on Data Mining, 2003.
• [2] M. W. Berry. Incremental Singular Value Decomposition of Uncertain Data. In Proceedings, European conference on the SIGIR. ACM, 1999.
• [3] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Application of Dimensionality Reduction in Recommender System - A Case Study. In ACM WebKDD Workshop, 2000.