Collaborative Filtering 101

Adnan Masood

About Me, aka Shameless Self Promotion

  • Sr. Software Engineer / Tech Lead for Green Dot Corp. (Financial Institution)

  • Design and Develop Connected Systems

  • Involved with SoCal Dev community, co-founded San Gabriel Valley .NET Developers Group. Published author and speaker.

  • MS. Computer Science, MCPD (Enterprise Developer), MCT, MCSD.NET

  • Doctoral Student - Areas of Interest: Machine learning, Bayesian Inference, Data Mining, Collaborative Filtering, Recommender Systems.

  • Contact at [email protected]

  • Read my Blog at

  • Doing a session in IASA 2008 in San Francisco on Aspect Oriented Programming; for details visit

Agenda: What this Presentation Covers

  • Defines Collaborative Filtering and its use in Recommendation Systems.

  • Background and current state of the applications of Collaborative Filtering algorithms and their feature sets.

  • Illustrative implementation of the algorithms, with examples.

  • Results on a large dataset using different algorithms.

  • Recommendations on what to use when doing collaborative filtering on a large-scale dataset.

  • Overview of SQL Server BI and Prediction Engine

What is Collaborative Filtering and What problem does it solve?

  • “Collaborative filtering simply means that people collaborate to help one another perform filtering by recording their reactions to documents they read. Such reactions may be that a document was particularly interesting (or particularly uninteresting). These reactions, more generally called annotations, can be accessed by others’ filters.” -Communications of the ACM – Dec. 1992

  • Collaborative Filtering (CF) finds items of interest to a user based on the preferences of other similar users. Assumes that human behavior is predictable.

  • Recommender Systems (or recommenders) suggest items of interest based on a user’s preferences, behavior and information about the items themselves -Recommenders Everywhere – WikiSym ’07, ACM

  • With the large amounts of data generated in e-commerce systems, the classical methods of recommendation are insufficient and cannot handle information overload. Modern automated recommendation systems are built using collaborative filtering to help deal with large-scale datasets.

  • Information overload problem - 20K movies on Netflix, 250K songs on Yahoo Music; total number of books on Amazon?

  • First ACM Recommender Systems Conference held October 19-20, 2007 -- Minneapolis, Minnesota, USA, by SIGCHI

Types of Recommendation Systems

  • Recommender systems use the opinions of a community of users to help individuals in that community more effectively identify content of interest from a potentially overwhelming set of choices [Resnick and Varian 1997].

Applications

  • Search

  • Social Networking

  • Product Recommendations

  • Demographic Targeted Advertisements

  • Fraud Detection

    • Pattern Detection / Clustering

  • Security

    • Firewall outlier analysis

    • Text Mining Outliers

Major Challenges in Recommender System Design

  • Scalability

  • Real-time Analysis and Prediction

  • Performance

  • Accuracy

  • Robustness

  • Growing Area of Research in KDD, Machine Learning and AI

Issues and Future Research Directions

  • K-NN Optimization

  • Explainability (D. Billsus and M. Pazzani, “A Personal News Agent that Talks, Learns and Explains,” Proc. Third Ann. Conf. Autonomous Agents, 1999.)

  • Hybrid Algorithms between Memory based and Model based techniques. [Pennock, David M. and Horvitz, Eric 1999]

  • Cold Start Problems (A.I. Schein, A. Popescul, L.H. Ungar, and D.M. Pennock, “Methods and Metrics for Cold-Start Recommendations,” Proc. 25th Ann. Int’l ACM SIGIR Conf., 2002.)

  • Privacy (N. Ramakrishnan, B.J. Keller, B.J. Mirza, A.Y. Grama, and G. Karypis, “Privacy Risks in Recommender Systems,” IEEE Internet Computing, vol. 5, no. 6, pp. 54-62, Nov./Dec. 2001.)

  • Error Method with Look Ahead

  • Boltzmann Machines

  • Vertical Niche Markets

Popular Recommendation Systems

  • Lotus Notes [Turnbull, 1998]

  • Mosaic system [Turnbull,1997]

  • PHOAKS (People Helping One Another Know Stuff) [Terveen et al., 1997]

  • Pointers [Maltz, 1995]

  • Siteseer [Turnbull, 1997]

  • Tapestry [Goldberg, 1992].

  • Yahoo [Turnbull, 1998]

  • The WebWatcher system [Joachims, 1996]

  • Do-I-Care [Turnbull, 1998; Collaborative Filtering workshop, 1996]

  • Fab recommendation system [Turnbull, 1998]

  • Firefly [Turnbull, 1997 and 1998]

  • GAB (group asynchronous browsing) [Wittenburg et al., 1998]

  • Grassroots system [Turnbull, 1998]

  • Resnick [Resnick, et al. 1994]

  • Let's browse/ Letizia, [Lieberman, 1996; Pryor, 1998]

Classification of Collaborative Filtering Algorithms

  • A popular classification of CF algorithms was proposed by Breese et al (Convergent algorithms for collaborative filtering, Proceedings of the 4th ACM conference on Electronic commerce) into Memory-based and Model-based methods.

  • Memory-based methods work on the principle of aggregating the labeled data and attempting to match recommenders to those seeking recommendations. The most common memory-based methods are based on the notion of nearest neighbor, using a variety of distance metrics.

    • Use the entire database of user ratings to make predictions.

    • Find users with similar voting histories to the active user.

    • Use these users’ votes to predict ratings for products not voted on by the active user

  • Model-based methods, on the other hand, try to learn a compact model from the training data, for example by learning the parameters of a parametric posterior distribution. From an operational point of view, memory-based methods potentially work with the entire training set, so their prediction cost scales linearly with the amount of training data, while model-based methods predict in constant time with respect to the training-set size.

    • Construct a model from the vote database.

    • Use the model to predict the active user’s ratings
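The three memory-based steps listed above (use the whole ratings database, find users with similar voting histories, combine their votes) can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the toy `ratings` data and the helper names `pearson` and `predict` are assumptions, and the weighting follows the common "active user's mean plus weighted deviations" form.

```python
from math import sqrt

# Hypothetical toy data (not from the slides): user -> {item: rating, 1-5 scale}.
ratings = {
    "ann":  {"m1": 5, "m2": 3, "m3": 4, "m4": 4},
    "bob":  {"m1": 3, "m2": 1, "m3": 2, "m4": 3, "m5": 3},
    "cara": {"m1": 4, "m2": 3, "m3": 4, "m4": 3, "m5": 5},
    "dan":  {"m1": 1, "m2": 5, "m3": 5, "m4": 2, "m5": 1},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation of two users over the items both have rated."""
    common = sorted(set(ratings[a]) & set(ratings[b]))
    if len(common) < 2:
        return 0.0
    ma = mean(ratings[a][i] for i in common)
    mb = mean(ratings[b][i] for i in common)
    num = sum((ratings[a][i] - ma) * (ratings[b][i] - mb) for i in common)
    den = sqrt(sum((ratings[a][i] - ma) ** 2 for i in common)
               * sum((ratings[b][i] - mb) ** 2 for i in common))
    return num / den if den else 0.0

def predict(active, item):
    """Active user's mean plus the similarity-weighted deviations of other users."""
    base = mean(ratings[active].values())
    num = norm = 0.0
    for other in ratings:
        if other == active or item not in ratings[other]:
            continue
        w = pearson(active, other)
        num += w * (ratings[other][item] - mean(ratings[other].values()))
        norm += abs(w)
    # The raw value can fall outside the 1-5 scale; a real system would clip it.
    return base + num / norm if norm else base

prediction = predict("ann", "m5")
print(round(pearson("ann", "bob"), 3), round(prediction, 2))
```

Every prediction scans all other users, which is the linear scaling with training data mentioned above.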

Classification of Collaborative Filtering Algorithms (cont.)

  • Memory-based and Model-based Algorithms (Breese et al., 1998)

  • Memory-based Algorithms

    • Mean Squared Differences

    • Pearson Correlation (Neighborhood based interpolation k-NN)

    • Vector Similarity

  • Model-Based Algorithms

    • Bayesian Network Models

    • Neural Network Models (Boltzmann Machines)

  • Other / Hybrid Algorithms

    • A hybrid memory- and model-based approach [Pennock, David M. and Horvitz, Eric 1999]

    • Singular Value Decomposition (SVD)

    • Probabilistic Latent Semantic Analysis
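Two of the memory-based measures named above, mean squared differences and vector similarity, can be sketched in a few lines. The function names and the toy rating dictionaries are assumptions made for illustration; vector similarity is taken here to be the cosine of the angle between two users' rating vectors, with unrated items counted as zero.

```python
from math import sqrt

def mean_squared_difference(a, b):
    """Average squared rating gap over co-rated items (lower means more similar)."""
    common = set(a) & set(b)
    if not common:
        return None  # no overlap, so no evidence either way
    return sum((a[i] - b[i]) ** 2 for i in common) / len(common)

def vector_similarity(a, b):
    """Cosine of the angle between two rating vectors (higher means more similar)."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical users: item -> rating.
frank = {"m1": 4, "m2": 4}
jane = {"m1": 4, "m2": 4}
joe = {"m1": 1, "m2": 5}

print(mean_squared_difference(frank, jane), vector_similarity(frank, jane))
print(mean_squared_difference(frank, joe), round(vector_similarity(frank, joe), 3))
```

Note that the two measures point in opposite directions: a small mean squared difference and a large cosine both indicate similar users.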

Algorithms and their Performance

Reference: The Netflix Prize by James Bennett and Stan Lanning, KDDCup’07, August 12, 2007, San Jose, California, USA.

Data Sets

Netflix Database

  • There are 17770 movies.

  • There are 480189 users.

  • CustomerIDs range from 1 to 2649429, with gaps.

  • Ratings are on a five star (integral) scale from 1 to 5.

  • YearOfRelease ranges from 1890 to 2005.

  • Training set consists of 100 million records. The qualifying dataset has 2817131 records and contains movie IDs from 1 to 9999. Predictions need to be submitted on this dataset.

  • The probe dataset has 1408395 records and contains movie IDs from 1 to 9999. It is meant to be used for checking the RMSE before proceeding to the qualifying dataset prediction.

  • Download Link

MovieLens Database

  • DataSet 1 Consists of 100,000 ratings for 1682 movies by 943 users.

  • The second one consists of approximately 1 million ratings for 3900 movies by 6040 users

  • Download Link:

  • UCIrvine Datasets

Experiment Details and Methodologies

  • Hardware

    • Cluster of 3 P-IV Machines with ~2 GB RAM along with a remote desktop laptop (controller)

    • ~ 1TB Storage (with backups)

  • DataSet

    • Netflix DataSet

    • “Netflix provides a large movie rating dataset consisting of over 100 million ratings (and their dates) from approximately 480,000 randomly-chosen users and 18,000 movies. The data were collected between October, 1998 and December, 2005 and represent the distribution of all ratings Netflix obtained during this time period. Given this dataset, the task is to predict the actual ratings of over 3 million unseen ratings from these same users over the same set of movies”. [Yew Jin Lim and Yee Whye Teh, “Variational Bayesian Approach to Movie Rating Prediction”, KDDCup.07 August 12, 2007, San Jose, California, USA]

  • Benchmarking

    • Comparison matrix calculated from time and accuracy (RMSE) results.
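Since accuracy is benchmarked by RMSE (root mean squared error), a short sketch of the metric may help. The function name `rmse` and the four sample ratings are illustrative assumptions, not Netflix data.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = ((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(sum(squared_errors) / len(actual))

# Hypothetical held-out ratings and the predictions made for them.
actual = [4, 3, 5, 1]
predicted = [3.5, 3.0, 4.0, 2.0]

score = rmse(predicted, actual)
print(score)
```

Because errors are squared before averaging, a few large misses cost more than many small ones.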

Averages and Mean Statistics

Reference: Gillic et al, 2006 – Stanford University

K-Nearest Neighbor: How does it work?

The technique uses individual user distributions to measure the distance between users, then makes predictions r(ui, mj) based on the ratings given to mj by users near ui. The intuition here is that if many users rate two movies the same, the movies should be considered similar. Conversely, if many users rate two movies differently, the movies should be considered different. (Don Gillick, UC Berkeley)

Calculate the "similarity" between each pair of users by comparing how they have rated common content. If Frank has rated something 4/5 stars and Jane has also rated it 4/5 stars, then these users would be considered similar. These calculations are very time consuming, as this is essentially the "handshake" problem: the calculation has to be performed for each unique combination of two users. The number of unique combinations is n (n - 1) / 2. For the Netflix challenge (n = 480,189 users), that is 115,290,497,766 unique combinations, roughly 115 billion.
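The pair count quoted above is easy to check. The sketch below applies n (n - 1) / 2 to the user count given for the Netflix data set; the constant and function names are illustrative.

```python
from math import comb

NETFLIX_USERS = 480_189  # user count quoted for the Netflix data set

def unique_pairs(n):
    """Number of distinct unordered user pairs: n * (n - 1) / 2."""
    return n * (n - 1) // 2

pairs = unique_pairs(NETFLIX_USERS)
same_as_comb = pairs == comb(NETFLIX_USERS, 2)  # "n choose 2" gives the same count
print(f"{pairs:,}", same_as_comb)
```

The count grows quadratically with the number of users, which is why a brute-force all-pairs similarity pass is so expensive at this scale.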

  • 1. Monsters

  • 2. Shrek (Full-screen)

  • 3. Shrek 2

  • 4. LOTR: The Two Towers

  • 5. Pirates of the Caribbean: The Curse of the Black Pearl

  • 6. The Incredibles

  • 7. The Sixth Sense

  • 8. The Shawshank Redemption: Special Edition

  • 9. LOTR: The Fellowship of the Ring

  • 10. Forrest Gump

  • 1. LOTR: The Two Towers

  • 2. LOTR: The Return of the King

  • 3. LOTR: The Fellowship of the Ring: Extended Edition

  • 4. LOTR: The Two Towers: Extended Edition

  • 5. Raiders of the Lost Ark

  • 6. LOTR: The Return of the King: Extended Edition

  • 7. Pirates of the Caribbean: The Curse of the Black Pearl

  • 8. The Matrix

  • 9. The Shawshank Redemption: Special Edition

  • 10. Braveheart

K-Nearest Neighbor: How does it work? An Example

Step 1: Content-based survey classification.

Now a new user rates a new movie with X1 = 3 and X2 = 7. Without another expensive survey, can we guess the classification of this new movie?

1. Determine the parameter K = number of nearest neighbors.

Suppose we use K = 3.

K-Nearest Neighbor: How does it work? An Example (cont.)

Step 2: Calculate the distance between the query-instance and all the training samples

The coordinate of the query instance is (3, 7). Instead of calculating the distance, we compute the squared distance, which is faster to calculate (no square root) and preserves the ranking of the neighbors.

K-Nearest Neighbor: How does it work? An Example (cont.)

  • Step 3. Sort the distances and determine the nearest neighbors based on the K-th minimum distance.

K-Nearest Neighbor: How does it work? An Example (cont.)

Step 4. Gather the categories of the nearest neighbors. Notice that in the second row, last column, the category of the nearest neighbor (Y) is not included, because the rank of this data point is greater than 3 (= K).

Step 5. Use a simple majority of the categories of the nearest neighbors as the prediction for the query instance. We have 2 “Not likely to be seen by an action fan” and 1 “Likely to be seen by an action fan”; since 2 > 1, we conclude that a new movie with X1 = 3 and X2 = 7 belongs to the former category.
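The five steps can be put together in a short sketch. The survey table from the slides did not survive in this transcript, so the four training rows below are hypothetical values chosen only to be consistent with the described outcome (query (3, 7), K = 3, a 2-to-1 vote, second row ranked fourth). The names `training`, `squared_distance` and `knn_classify` are likewise assumptions.

```python
from collections import Counter

LIKELY = "Likely to be seen by an action fan"
NOT_LIKELY = "Not likely to be seen by an action fan"

# Hypothetical survey rows (X1, X2, category); illustrative only.
training = [
    (7, 7, LIKELY),
    (7, 4, LIKELY),
    (3, 4, NOT_LIKELY),
    (1, 4, NOT_LIKELY),
]
query = (3, 7)  # the new movie from the example
K = 3           # Step 1: number of nearest neighbors

def squared_distance(p, q):
    """Step 2: squared Euclidean distance (no square root needed for ranking)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def knn_classify(rows, point, k):
    # Step 3: sort the training rows by distance to the query point.
    ranked = sorted(rows, key=lambda row: squared_distance(row[:2], point))
    # Step 4: gather the categories of the k nearest rows only.
    votes = Counter(label for _, _, label in ranked[:k])
    # Step 5: the simple majority is the prediction.
    return votes.most_common(1)[0][0], votes

label, votes = knn_classify(training, query, K)
print(label, dict(votes))
```

With these rows the squared distances are 16, 25, 9 and 13, so the second row ranks fourth and is left out of the vote.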

Singular Value Decomposition (SVD)

  • The user rating vectors can be represented by an $m \times n$ matrix $A$, with $m$ users and $n$ products, where $a_{ij}$ is the rating of user $i$ for product $j$. [Qu & Yang, 2000]

  • Through singular value decomposition, $A$ can be factored into $U S V^T$, where $U$ and $V$ are orthogonal matrices and $S$ is a zero matrix except for its diagonal entries, which are the singular values of $A$.

  • U is representative of the response of each user to certain features.

  • V is representative of the amount of each feature present in each product.

  • S is a matrix related to the feature importance in overall determination of the rating. The S matrix is a zero matrix, except for the diagonal entries which are defined as the singular values of A [Pryor, H. Michael,1998]

How does SVD work? An Example of the Inner Workings of the Algorithm

  • Movies

  • Pulp Fiction: The movie has excellent cinematic value and storyline but has long dialogues and conversation sequences.

  • From Dusk Till Dawn: The movie has lots of action and a decent storyline, and gets to the point fairly quickly, but isn't cinematic magic.

  • The Big Lebowski: Low budget but with excellent dialogues and quite an artistic niche. Not the best cinema work and continuity.

  • Children of Men: Excellent cinematography but a rather long storyline, sometimes not keeping the viewer captivated. Not of artistic value.

  • Reviewers

  • Andrea the Action fan - Likes action and short, well put together movies. Long stories and artsy material do not typically attract her, but she always appreciates good cinematography.

  • Arthur the Art Lover - Loves niche movies but also appreciates action; does not mind long movies as long as they have good artistic value.

  • Dave the director - A film school graduate who loves action, good camera work, storyline and dialogues. Not a big art fan.

  • Jim the average movie guy - Likes action and thrillers but detests long movies.

How does SVD work?

The Reviewer – Movie - Rating Matrix

Reviewer   Movie 1   Movie 2   Movie 3   Movie 4
A          5         4         2         6
B          3         7         5         2
C          6         4         1         4
D          ?         ?         ?         ?

A = U * W * V^T

How does SVD work? Predicting what a new user would like

W (the diagonal matrix called S earlier) is the main component for principal components and identifies the relative importance of each feature:

14.49 0.00 0.00 0.00

0.00 4.93 0.00 0.00

0.00 0.00 1.65 0.00
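The diagonal entries of W can be reproduced from the three known rows of the rating matrix. The sketch below is one possible way to do it in plain Python, using power iteration with deflation on A·Aᵀ; this is a stand-in technique chosen to keep the example dependency-free, not necessarily what the presenter used, and in practice a library routine such as `numpy.linalg.svd` would be the usual choice. Function names are assumptions.

```python
from math import sqrt

A = [[5, 4, 2, 6],   # reviewer A
     [3, 7, 5, 2],   # reviewer B
     [6, 4, 1, 4]]   # reviewer C

def gram(a):
    """Return a * a^T; its eigenvalues are the squared singular values of a."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in a] for r1 in a]

def top_eigenpair(m, iterations=200):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    n = len(m)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    value = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
    return value, v

def singular_values(a):
    m = gram(a)
    n = len(m)
    values = []
    for _ in range(n):
        value, v = top_eigenpair(m)
        values.append(sqrt(max(value, 0.0)))
        # Deflate: remove the component just found so the next one dominates.
        m = [[m[i][j] - value * v[i] * v[j] for j in range(n)] for i in range(n)]
    return values

sigma = singular_values(A)
print([round(s, 3) for s in sigma])
```

The three values agree with the diagonal of W shown above (14.49, 4.93, 1.65) to within rounding, and their squares sum to 237, the sum of the squared ratings.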

  • Now imagine that Jim rated the first movie 2


    2 = U41 S11 V11

    We solve for U41. To predict R2, R3 and R4, we substitute U41 back into the same equation (Rj = U41 S11 Vj1), which gives:

    P = [2 2.1554 1.1577 1.7312]

  • Now he has rated the second movie 7

    R1 = U41 S11 V11 + U42 S22 V12

    R2 = U41 S11 V21 + U42 S22 V22

    By solving for both U41 and U42, we can recalculate the predictions.

    P = [2 7 5.3660 1.0166]

    This is similar to reviewer B's ratings [3 7 5 2].
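The two prediction steps above can be reproduced with a short sketch. It assumes the new user's row is a combination of the top right singular vectors of the known 3 x 4 matrix, and that the unknown coefficients (U4f · Sff) are found by solving the equations for the movies already rated. The plain-Python power iteration and the names `right_singular_vectors`, `solve` and `fold_in` are illustrative assumptions, not the presenter's code.

```python
from math import sqrt

A = [[5, 4, 2, 6],   # reviewer A
     [3, 7, 5, 2],   # reviewer B
     [6, 4, 1, 4]]   # reviewer C

def right_singular_vectors(a, k, iterations=200):
    """Top-k right singular vectors via power iteration on a^T a with deflation.
    Assumes k does not exceed the rank of a."""
    n = len(a[0])
    m = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    vectors = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-zero start vector
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        value = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
        m = [[m[i][j] - value * v[i] * v[j] for j in range(n)] for i in range(n)]
        vectors.append(v)
    return vectors

def solve(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fold_in(known):
    """Predict all four ratings from the new user's first len(known) ratings."""
    k = len(known)
    vs = right_singular_vectors(A, k)
    # Unknowns c[f] = U4f * Sff; one equation per rated movie j:
    #   sum over f of c[f] * V[j][f] = known[j]
    c = solve([[vs[f][j] for f in range(k)] for j in range(k)], known)
    return [sum(c[f] * vs[f][j] for f in range(k)) for j in range(len(A[0]))]

after_one = fold_in([2])
after_two = fold_in([2, 7])
print([round(x, 4) for x in after_one])
print([round(x, 4) for x in after_two])
```

Using as many components as there are known ratings makes the system square, so the known ratings are reproduced exactly and only the remaining movies are genuinely predicted.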

Recommendations for Large Scale Recommender Systems

  • There is no silver bullet. The BellKor solution to the Netflix Prize used a modified k-NN, and its final solution (RMSE = 0.8712) blends 107 individual results.

  • Occam’s Razor – Simplicity is good on a smaller scale.

  • Algorithm performance on accuracy (low to high)

    • Averages, Bayesian, Multinomial Distribution (Co-Variance), k-NN (Pearson Correlation), Singular Value Decomposition, Specialized Hybrid Techniques

  • Algorithm performance on time-space (low to high)

    • Averages, Singular Value Decomposition, Specialized Hybrid Techniques, Multinomial Distribution (Co-Variance), Bayesian, k-NN (Pearson Correlation)

  • Algorithm performance on scalability (low to high)

    • Averages, k-NN (Pearson Correlation), Multinomial Distribution (Co-Variance), Specialized Hybrid Techniques, Bayesian, Singular Value Decomposition

  • Regardless of the algorithm, perform offline processing and cache the results for maximum performance and scalability.

  • Build a hybrid design to handle cold start, privacy and content control.

  • Use adaptive models so that recommendations improve progressively.
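The advice to process offline, cache the results and plan for cold start can be sketched as below. Everything here is an illustrative assumption: the toy `liked` data, the simple co-occurrence scoring used for the offline step (a placeholder for whichever algorithm is actually chosen), and the popularity fallback for users the cache has never seen.

```python
from collections import Counter

# Hypothetical interaction log: user -> set of items the user rated highly.
liked = {
    "u1": {"a", "b"},
    "u2": {"a", "c"},
    "u3": {"a", "b", "d"},
}

def build_cache(liked, top_n=2):
    """Offline step: precompute top-N unseen items for every known user."""
    cache = {}
    for user, items in liked.items():
        scores = Counter()
        for other, other_items in liked.items():
            overlap = len(items & other_items)
            if other == user or overlap == 0:
                continue
            # Items the similar user liked that this user has not seen yet.
            for item in sorted(other_items - items):
                scores[item] += overlap
        cache[user] = [item for item, _ in scores.most_common(top_n)]
    return cache

def popular_items(liked, top_n=2):
    """Cold-start fallback: the globally most-liked items."""
    counts = Counter(item for items in liked.values() for item in sorted(items))
    return [item for item, _ in counts.most_common(top_n)]

CACHE = build_cache(liked)       # rebuilt periodically, e.g. by a nightly job
FALLBACK = popular_items(liked)

def recommend(user):
    """Online step: a dictionary lookup, falling back to popular items."""
    return CACHE.get(user) or FALLBACK

print(recommend("u1"), recommend("brand_new_user"))
```

The online path does no similarity computation at all, which is what keeps response time flat as the data grows.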

SQL Server Data Mining

What's new in BI for SQL Server 2008

Lynn Langit

Room: 107




SQL Server DM Algorithms

  • Microsoft Association Algorithm

  • Microsoft Clustering Algorithm

  • Microsoft Decision Trees Algorithm

  • Microsoft Naive Bayes Algorithm

  • Microsoft Neural Network Algorithm (SSAS)

  • Microsoft Sequence Clustering Algorithm

  • Microsoft Time Series Algorithm

  • Microsoft Linear Regression Algorithm

  • Microsoft Logistic Regression Algorithm

SQL Server Prediction Queries

The following query retrieves report data indicating which customers are likely to purchase a bicycle, and the probability that they will do so.

    SELECT t.FirstName, t.LastName,
      (Predict ([Bike Buyer])) as [PredictedValue],
      (PredictProbability([Bike Buyer])) as [Probability]
    From [TM Decision Tree]
    PREDICTION JOIN
      OPENQUERY([Adventure Works DW],
        'SELECT [FirstName], [LastName], [CustomerKey], [MaritalStatus], [Gender], [YearlyIncome], [TotalChildren], [NumberChildrenAtHome], [HouseOwnerFlag], [NumberCarsOwned], [CommuteDistance] FROM [dbo].[DimCustomer] ') AS t
    ON [TM Decision Tree].[Marital Status] = t.[MaritalStatus]
      AND [TM Decision Tree].[Gender] = t.[Gender]
      AND [TM Decision Tree].[Yearly Income] = t.[YearlyIncome]
      AND [TM Decision Tree].[Total Children] = t.[TotalChildren]
      AND [TM Decision Tree].[Number Children At Home] = t.[NumberChildrenAtHome]
      AND [TM Decision Tree].[House Owner Flag] = t.[HouseOwnerFlag]
      AND [TM Decision Tree].[Number Cars Owned] = t.[NumberCarsOwned]
      AND [TM Decision Tree].[Commute Distance] = t.[CommuteDistance]
    WHERE (Predict ([Bike Buyer]))[email protected]
      AND (PredictProbability([Bike Buyer]))>@Probability

SQL Server Support for Prediction

    SELECT FLATTENED
      TopCount(Predict([Invoice Detail], INCLUDE_STATISTICS), $AdjustedProbability, 5)
    FROM [assoc1]
    NATURAL PREDICTION JOIN
    ( SELECT 'Female' AS [Gender], 25 AS [Age],
        ( SELECT 'Mountain bottle cage' AS [Product Name]
          UNION SELECT 'Hydration pack -70oz' AS [Product Name] -- specify Gender, Marital Status, Income
        ) AS [Invoice Detail]
    ) AS t

Questions?