
EigenRank : A Ranking-Oriented Approach to Collaborative Filtering



  1. EigenRank: A Ranking-Oriented Approach to Collaborative Filtering Nathan N. Liu & Qiang Yang SIGIR 2008 IDS Lab. Seminar Spring 2009 May 21st, 2009 Minsuk Kang (강민석) Center for E-Business Technology Seoul National University Seoul, Korea minsuk@europa.snu.ac.kr

  2. Contents • Introduction • Related Work • Rating Oriented Collaborative Filtering • Ranking Oriented Collaborative Filtering • Experiments • Conclusions

  3. Introduction • Recommender Systems • Content-based filtering • Analyzes content information associated with items and users • E.g. product descriptions, user profiles, etc. • Represents users and items using a set of features • Collaborative filtering • Does NOT require content information about items • Assumes that a user is interested in items preferred by other similar users (Figure: content-based filtering vs. collaborative filtering)

  4. Introduction • Collaborative Filtering Application Scenarios • Rating prediction • one individual item at a time, with a predicted rating • Top-N recommendation • an ordered list of the top-N recommended items (Screenshots: rating prediction in MovieLens; top-N list on Amazon)

  5. Introduction • Motivation • Most CF systems adopt a rating-oriented approach • predict potential ratings first, then rank items by them • Higher accuracy in rating prediction does NOT necessarily lead to better ranking effectiveness • Example • Same error for two prediction algorithms, but for "predicted 2" the predicted ranking is incorrect • Most existing methods predict ratings without considering the user's preferences over pairs of items

  6. Introduction • Overview • Ranking-oriented approach to CF • directly addresses the item ranking problem • without the intermediate step of rating prediction • Contributions • Similarity measure between two users' rankings • Kendall rank correlation coefficient • Methods for producing item rankings • Greedy order algorithm, random walk model

  7. Contents • Introduction • Related Work • Neighborhood-based Approach • Model-based Approach • Rating Oriented Collaborative Filtering • Ranking Oriented Collaborative Filtering • Experiments • Conclusions

  8. Neighborhood-based Approach • User-based Model • Estimate unknown ratings of a target user • based on ratings of neighboring users, using user-user similarity • Difficulties in the User-based Model • Raw ratings may contain biases • E.g. some users tend to give high ratings • Use user-specific means • User-item ratings data is sparse • dimensionality reduction • data-smoothing methods

  9. Neighborhood-based Approach • Item-based Model • similar, but uses item-item similarity • Less sensitive to the sparsity problem • # of items < # of users • Higher accuracy while allowing more efficient computation (Sarwar et al., 2001) (Screenshot: item-based model on Amazon)

  10. Model-based Approach • Model-based Approach • Use observed user-item ratings to train a compact model • Rating prediction via the model instead of directly manipulating data • Algorithms • Clustering methods • Aspect models • Bayesian networks • Learning to Rank • Rank items represented in some feature space • Methods Try to • Learn an item scoring function • Learn a classifier for classifying item pairs

  11. Contents • Introduction • Related Work • Rating Oriented Collaborative Filtering • Similarity Measure • Rating Prediction • Ranking Oriented Collaborative Filtering • Experiments • Conclusions

  12. Rating-based Similarity Measures • Pearson Correlation Coefficient (PCC) • similarity between two users • normalizes ratings using each user's average • Vector Similarity (VS) • another user-user similarity • views each user as a vector of ratings • cosine of the angle between the two vectors • Item-item similarity • adjusted cosine similarity is most effective
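The Pearson correlation between two users can be sketched as follows. This is a minimal illustration in plain Python; the dict-based data layout (item → rating) is an assumption, not something the slides specify.

```python
import math

def pearson_sim(ratings_u, ratings_v):
    """Pearson correlation over the items rated by both users.

    ratings_u, ratings_v: dicts mapping item -> rating (hypothetical layout).
    """
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0
    # Normalize by each user's mean over the co-rated items.
    mu_u = sum(ratings_u[i] for i in common) / len(common)
    mu_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mu_u) * (ratings_v[i] - mu_v) for i in common)
    den = (math.sqrt(sum((ratings_u[i] - mu_u) ** 2 for i in common))
           * math.sqrt(sum((ratings_v[i] - mu_v) ** 2 for i in common)))
    return num / den if den else 0.0
```

Vector similarity would replace the mean-centering with raw ratings and divide by the raw vector norms.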

  13. Rating Prediction • User-based Model • select the set of k most similar users • compute a weighted average of their ratings • Item-based Model • similar to the user-based model • uses the set of k items most similar to item i
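The user-based prediction step above (k nearest neighbors, mean-centered weighted average) can be sketched like this. All names and the data layout are illustrative assumptions:

```python
def predict_rating(user, item, ratings, sims, k=20):
    """User-based prediction: mean-centered weighted average over the
    k most similar neighbors who rated `item`.

    ratings: dict user -> {item: rating}; sims: dict (u, v) -> similarity.
    """
    mean_u = sum(ratings[user].values()) / len(ratings[user])
    # Neighbors who rated the item, ranked by similarity to `user`.
    neighbors = [v for v in ratings if v != user and item in ratings[v]]
    neighbors.sort(key=lambda v: sims.get((user, v), 0.0), reverse=True)
    num = den = 0.0
    for v in neighbors[:k]:
        s = sims.get((user, v), 0.0)
        mean_v = sum(ratings[v].values()) / len(ratings[v])
        num += s * (ratings[v][item] - mean_v)  # neighbor's bias-corrected rating
        den += abs(s)
    return mean_u + num / den if den else mean_u
```

The mean-centering implements the "use user-specific means" fix for rating bias mentioned on slide 8.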

  14. Contents • Introduction • Related Work • Rating Oriented Collaborative Filtering • Ranking Oriented Collaborative Filtering • Similarity Measure – Kendall Rank Correlation Coefficient • Preference Functions – Greedy Order & Random Walk Model • Experiments • Conclusions

  15. Similarity Measure • Motivation • PCC and VS are rating-based measures • In the ranking-based view, similarity is determined by users' preferences over items • E.g. for users 1 and 2, the rating values are different, but the preferences are very close • Kendall Rank Correlation Coefficient
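A Kendall-style rank correlation over two users' ratings can be sketched as below: count concordant vs. discordant item pairs among the commonly rated items. This is a simplified illustration; for brevity it skips tied pairs, whereas variants differ in how ties enter the denominator.

```python
from itertools import combinations

def kendall_sim(ratings_u, ratings_v):
    """Kendall-style similarity over item pairs rated by both users."""
    common = list(set(ratings_u) & set(ratings_v))
    concordant = discordant = 0
    for i, j in combinations(common, 2):
        du = ratings_u[i] - ratings_u[j]  # u's preference direction on (i, j)
        dv = ratings_v[i] - ratings_v[j]  # v's preference direction on (i, j)
        if du * dv > 0:
            concordant += 1
        elif du * dv < 0:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0
```

Two users with different rating scales but the same ordering get similarity 1.0, which is exactly the motivating example on this slide.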

  16. Preference Functions • Modeling a user's preference function Ψ • Given two items i and j, which item is more preferable, and by how much? • Ψ(i, j) > 0 means item i is more preferable • the magnitude of Ψ(i, j) indicates the strength of the preference • Characteristics • For the same item: Ψ(i, i) = 0 • Anti-symmetric: Ψ(i, j) = -Ψ(j, i) • NOT transitive: Ψ(i, j) > 0 and Ψ(j, k) > 0 do not imply Ψ(i, k) > 0

  17. Preference Functions • Deriving the Preference Function • Key challenge is to estimate preferences over item pairs the target user has NOT rated • Use the same idea as neighborhood-based CF • Find the set of neighbors of the target user who have rated both items
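A minimal sketch of the derivation above: estimate Ψ(i, j) from the rating differences of neighbors who rated both items. This is simplified for illustration; the full method additionally weights each neighbor by their similarity to the target user.

```python
def preference(i, j, neighbor_ratings):
    """Estimate Psi(i, j) as the average rating difference r(i) - r(j)
    over neighbors who rated BOTH items (unweighted, simplified sketch).

    neighbor_ratings: list of dicts, each mapping item -> rating.
    """
    diffs = [r[i] - r[j] for r in neighbor_ratings if i in r and j in r]
    return sum(diffs) / len(diffs) if diffs else 0.0
```

By construction this sketch is anti-symmetric (swapping i and j flips the sign) and zero for i = j, matching the characteristics on slide 16.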

  18. Preference Functions • Producing a Ranking • Given the preference function, we want a ranking of the items • a ranking that agrees with the pairwise preferences as much as possible • Ranking • ρ : a ranking of the items in item set I • ρ(i) > ρ(j) : item i is ranked higher than item j • Value function • measures how consistent ρ is with the preference function Ψ • our goal is to find the ρ that maximizes the value function • Optimal solution • NP-Complete problem: use a greedy algorithm

  19. Greedy Order Algorithm • Motivation • Find an approximately optimal ranking • Algorithm • Input: item set I, preference function Ψ • Output: ranking ρ • Each item's potential is higher when more items are less preferred than it • Find the item with the highest potential, rank it next, remove it, then iterate • Complexity is O(n²); the result achieves more than half of the optimal value
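The greedy order algorithm above can be sketched directly: compute each item's potential as the net preference mass flowing out of it, repeatedly emit the highest-potential item, and update the remaining potentials. Plain Python, with Ψ represented as a dict of pairwise preference strengths (an assumed encoding):

```python
def greedy_order(items, psi):
    """Greedy ordering: potential(i) = sum_j psi(i, j) - sum_j psi(j, i);
    repeatedly emit the item with the highest potential and update the rest.

    psi: dict (i, j) -> preference strength, 0.0 if absent. O(n^2) overall.
    """
    remaining = set(items)
    pot = {i: sum(psi.get((i, j), 0.0) - psi.get((j, i), 0.0)
                  for j in remaining if j != i)
           for i in remaining}
    ranking = []
    while remaining:
        best = max(remaining, key=lambda i: pot[i])  # highest potential first
        ranking.append(best)
        remaining.remove(best)
        for i in remaining:  # remove best's contribution from each potential
            pot[i] += psi.get((best, i), 0.0) - psi.get((i, best), 0.0)
    return ranking
```

For a transitive Ψ this recovers the exact optimum; the approximation guarantee matters when the pairwise preferences conflict.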

  20. Random Walk Model for Item Ranking • Random Walk based on User Preferences • Motivation • some users rated i > j, others rated j > k, but only a few rated all three of i, j, k • we want to infer the preference between i and k (implicit relationships) • use multi-step random walks • Markov chain model • "At each step the system may change its state from the current state to another state according to a probability distribution. The changes of state are called transitions." (Wikipedia) • Google PageRank • random walk on Web pages based on hyperlinks • the surfer randomly picks a hyperlink • the stationary distribution is used for PageRank • Model for item ranking • similarly, there are implicit links between items • a less preferred item j links to a more preferred item i, with a transition probability • the stationary distribution is used for item ranking

  21. Random Walk Model for Item Ranking • Random Walk based on User Preferences • Transition probability • probability of switching from the current item i to another item j • p(j | i) = exp(Ψ(j, i)) / Σ_k exp(Ψ(k, i)) • higher for items that are more preferred than i • depends on the user's preference function • Why the exp function? It keeps every probability non-negative

  22. Random Walk Model for Item Ranking • Compute the Item Rankings • Think of the PageRank algorithm you may know • We can use matrix notation • P : transition matrix, whose entry P(i, j) is the transition probability p(j | i) • π_t(i) : probability of being at item i after t walking steps • define π_{t+1}ᵀ = π_tᵀ P • get these probabilities using the power iteration method for solving the dominant eigenvector • Stationary probabilities π = lim π_t • Does it work? • Existence and uniqueness are guaranteed iff P is irreducible • entries of P are all non-negative

  23. Random Walk Model for Item Ranking • Personalization Vector (teleport) • To avoid reducibility of the stochastic matrix (Brin and Page, 1998) • Revised transition matrix • PageRank • the Web surfer sometimes "teleports" to other pages • teleports according to the probability distribution defined by the personalization vector v • ε controls how often the surfer teleports rather than following hyperlinks • Our model • uses a similar idea to define the personalization vector • teleports to items with high ratings more often • unrated items have equal probabilities
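The two slides above can be combined into a small power-iteration sketch with teleport. Assumptions for illustration: P is a row-stochastic matrix given as a list of lists, and the revised chain is the standard PageRank-style mix (1 - ε)·P plus ε·teleport-to-v:

```python
def stationary_distribution(P, v, eps=0.15, iters=100):
    """Power iteration on the teleport-revised chain.

    P: row-stochastic transition matrix (list of lists).
    v: personalization vector (teleport distribution).
    eps: teleport probability. Returns the stationary distribution,
    whose entries are then used as item-ranking scores.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # pi' = (1 - eps) * pi P + eps * v   (follow preference link, or teleport)
        pi = [(1 - eps) * sum(pi[i] * P[i][j] for i in range(n)) + eps * v[j]
              for j in range(n)]
    return pi
```

Teleport makes the revised matrix irreducible, so the stationary distribution exists and is unique regardless of the sparsity pattern of the observed preferences.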

  24. Contents • Introduction • Related Work • Rating Oriented Collaborative Filtering • Ranking Oriented Collaborative Filtering • Experiments • Conclusions

  25. Experiments • Issues • Is the ranking-oriented approach better than the rating-oriented one? • Which is better, the greedy order algorithm or the random walk model? • Is the ranking-oriented similarity measure (Kendall's) more effective?

  26. Experiments • Data Sets • Two movie ratings data sets • EachMovie and Netflix • Users who rated more than 40 different movies • 10,000 for training • 100 for parameter tuning • 500 for testing • Evaluation Protocol • For each user in the test set • 50% of ratings for model construction • 50% as hold-out data for evaluation

  27. Evaluation Metric • Which metric to use? • Rating-oriented CF • MAE (Mean Absolute Error) and RMSE (Root Mean Square Error) • focus on the difference between true and predicted ratings • Ranking-oriented CF • our emphasis is on improving item rankings • NDCG (Normalized Discounted Cumulative Gain) • evaluated over the top-k items of the ranked list • the discounting factor increases with position in the ranking
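NDCG as described above can be sketched as follows: each position's gain is an exponential function of the true rating, discounted logarithmically by position, and normalized by the ideal ordering. The exact gain/discount form (2^r - 1, log2(pos + 1)) is the common convention and is assumed here:

```python
import math

def ndcg_at_k(ranked_ratings, k):
    """NDCG over the top-k positions of a ranked list.

    ranked_ratings: true ratings of the items, in the order the system
    ranked them. Gain 2^r - 1 is discounted by log2(position + 1) and
    normalized by the ideal (sorted-by-rating) ordering.
    """
    def dcg(ratings):
        return sum((2 ** r - 1) / math.log2(pos + 2)
                   for pos, r in enumerate(ratings[:k]))
    ideal = dcg(sorted(ranked_ratings, reverse=True))
    return dcg(ranked_ratings) / ideal if ideal else 0.0
```

A perfectly ordered list scores 1.0; ranking a low-rated item above a high-rated one is penalized most at the top positions, which is why NDCG suits top-N recommendation better than MAE or RMSE.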

  28. Impact of Parameters • Impact of Neighborhood Size • the size of the neighborhood affects performance • Result • as the neighborhood size grows, NDCG improves up to about 100 neighbors, because more neighbors make the preference function more accurate • but it starts to decrease beyond 100, due to many non-similar users being included

  29. Impact of Parameters • Impact of ε • How does the frequency of the "teleport" operation affect performance? • Result • as ε increases, NDCG improves • but it should NOT be too big (0.8~0.9 works best)

  30. Comparisons with Other Algorithms • Issues • Is the ranking-oriented approach better than the rating-oriented one? • Which is better, the greedy order algorithm or the random walk model? • Is the ranking-oriented similarity measure (Kendall's) more effective? • Comparison • 4 rating-oriented settings, 6 ranking-oriented settings

  31. Comparisons with Other Algorithms • Result • Ranking-oriented is better than rating-oriented by about 8.8% in NDCG@1 • The random walk model outperformed all rating-oriented settings • The random walk model is slightly better than greedy order • The Kendall rank correlation coefficient is more effective for the ranking-oriented approach

  32. Conclusion • Ranking-oriented Framework for CF • item ranking without rating prediction as an intermediate step • extends neighborhood-based CF by modeling pairwise preferences • Two methods for computing item rankings • greedy order algorithm • random walk model (Diagram: Kendall rank corr. coeff. similarity measure → preference function → greedy order / random walk model)

  33. Thank you~
