
A Review of Information Filtering Part II: Collaborative Filtering



  1. A Review of Information Filtering, Part II: Collaborative Filtering Chengxiang Zhai Language Technologies Institute School of Computer Science Carnegie Mellon University

  2. Outline • A Conceptual Framework for Collaborative Filtering (CF) • Rating-based Methods (Breese et al. 98) • Memory-based methods • Model-based methods • Preference-based Methods (Cohen et al. 99 & Freund et al. 98) • Summary & Research Directions

  3. What is Collaborative Filtering (CF)? • Making filtering decisions for an individual user based on the judgments of other users • Inferring individual’s interest/preferences from that of other similar users • General idea • Given a user u, find similar users {u1, …, um} • Predict u’s preferences based on the preferences of u1, …, um

  4. CF: Applications • Recommender Systems: books, CDs, Videos, Movies, potentially anything! • Can be combined with content-based filtering • Example (commercial) systems • GroupLens (Resnick et al. 94): usenet news rating • Amazon: book recommendation • Firefly (purchased by Microsoft?): music recommendation • Alexa: web page recommendation

  5. CF: Assumptions • Users with a common interest will have similar preferences • Users with similar preferences probably share the same interest • Examples • “interest is IR” => “read SIGIR papers” • “read SIGIR papers” => “interest is IR” • Sufficiently large number of user preferences are available

  6. CF: Intuitions • User similarity • If Jamie liked the paper, I’ll like the paper • ? If Jamie liked the movie, I’ll like the movie • Suppose Jamie and I viewed similar movies in the past six months … • Item similarity • Since 90% of those who liked Star Wars also liked Independence Day, and, you liked Star Wars • You may also like Independence Day

  7. Collaborative Filtering vs. Content-based Filtering • Basic filtering question: Will user U like item X? • Two different ways of answering it • Look at what U likes • Look at who likes X • Can be combined => characterize X => content-based filtering => characterize U => collaborative filtering

  8. Rating-based vs. Preference-based • Rating-based: User’s preferences are encoded using numerical ratings on items • Complete ordering • Absolute values can be meaningful • But, values must be normalized to combine • Preferences: User’s preferences are represented by partial ordering of items • Partial ordering • Easier to exploit implicit preferences

  9. A Formal Framework for Rating • Objects: O = {o1, o2, …, oj, …, on} • Users: U = {u1, u2, …, ui, …, um} • Unknown rating function f: U x O → R, with Xij = f(ui, oj) (a partially observed user-object rating matrix) • The task • Assume known f values for some (u,o)’s • Predict f values for other (u,o)’s • Essentially function approximation, like other learning problems
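The framework above can be made concrete: assuming numpy, the unknown function f is just a partially observed user-object matrix, with `nan` marking the (u,o) pairs the system must predict (all numbers here are illustrative):

```python
import numpy as np

# A tiny user x object rating matrix; np.nan marks unknown f(u, o)
# values that the filtering system must predict.
X = np.array([
    [3.0, 1.5, np.nan, 2.0],    # user u1
    [2.0, np.nan, 1.0, 3.0],    # user u2
    [np.nan, 2.0, 1.0, np.nan], # user u3
])

known = ~np.isnan(X)
print(f"{known.sum()} known ratings, {(~known).sum()} to predict")
```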

  10. Where are the intuitions? • Similar users have similar preferences • If u ≈ u’, then for all o’s, f(u,o) ≈ f(u’,o) • Similar objects have similar user preferences • If o ≈ o’, then for all u’s, f(u,o) ≈ f(u,o’) • In general, f is “locally constant” • If u ≈ u’ and o ≈ o’, then f(u,o) ≈ f(u’,o’) • “Local smoothness” makes it possible to predict unknown values by interpolation or extrapolation • What does “local” mean?

  11. Two Groups of Approaches • Memory-based approaches • f(u,o) = g(u)(o) ≈ g(u’)(o) if u ≈ u’ • Find “neighbors” of u and combine g(u’)(o)’s • Model-based approaches • Assume structures/model: object cluster, user cluster, f’ defined on clusters • f(u,o) = f’(cu, co) • Estimation & probabilistic inference

  12. Memory-based Approaches (Breese et al. 98) • General ideas: • Xij: rating of object j by user i • ni: average rating of all objects by user i • Normalized ratings: Vij = Xij − ni • Memory-based prediction: Paj = na + κ Σi w(a,i) Vij, where κ is a normalizing constant • Specific approaches differ in w(a,i) -- the distance/similarity between user a and i
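A minimal sketch of this prediction rule, assuming illustrative data and a uniform similarity function; `kappa` plays the role of the normalizing constant in Breese et al. 98:

```python
import numpy as np

def predict(X, a, j, w, kappa=1.0):
    """Memory-based prediction of user a's rating of object j:
    p(a, j) = n_a + kappa * sum_i w(a, i) * (X_ij - n_i),
    where n_i is user i's mean rating and np.nan marks unknowns."""
    n = np.nanmean(X, axis=1)  # each user's average rating n_i
    total = 0.0
    for i in range(X.shape[0]):
        if i != a and not np.isnan(X[i, j]):
            total += w(a, i) * (X[i, j] - n[i])  # normalized rating V_ij
    return n[a] + kappa * total

X = np.array([[3.0, np.nan],
              [2.0, 2.0],
              [4.0, 2.0]])
# Uniform similarity; kappa = 0.5 normalizes the two neighbors' weights.
p = predict(X, 0, 1, lambda a, i: 1.0, kappa=0.5)
print(p)
```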

  13. User Similarity Measures • Pearson correlation coefficient (sum over commonly rated items) • Cosine measure • Many other possibilities!
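The two similarity measures might be sketched as follows, assuming `nan` marks unrated items: the Pearson variant restricts to commonly rated items as the slide notes, while the cosine variant treats unrated items as zeros:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation over items rated by both users (nan = unrated)."""
    m = ~np.isnan(x) & ~np.isnan(y)
    if m.sum() < 2:
        return 0.0
    xc, yc = x[m] - x[m].mean(), y[m] - y[m].mean()
    denom = np.sqrt((xc @ xc) * (yc @ yc))
    return float(xc @ yc / denom) if denom else 0.0

def cosine(x, y):
    """Cosine similarity treating unrated items as zeros."""
    x0, y0 = np.nan_to_num(x), np.nan_to_num(y)
    denom = np.linalg.norm(x0) * np.linalg.norm(y0)
    return float(x0 @ y0 / denom) if denom else 0.0

u1 = np.array([3.0, 1.5, np.nan, 2.0])
u2 = np.array([2.0, 1.0, 3.0, np.nan])
print(pearson(u1, u2), cosine(u1, u2))
```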

  14. Improving User Similarity Measures (Breese et al. 98) • Dealing with missing values: default ratings • Inverse User Frequency (IUF): similar to IDF • Case Amplification: use w(a,i)^p, e.g., p = 2.5
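Hedged sketches of the two reweighting tricks (parameter values illustrative):

```python
import numpy as np

def amplify(w, p=2.5):
    """Case amplification: w' = w * |w|**(p-1) keeps weights near +/-1
    and shrinks small (noisy) weights, while preserving their sign."""
    return w * abs(w) ** (p - 1)

def iuf(counts, n_users):
    """Inverse User Frequency, analogous to IDF: items rated by almost
    everyone get low weight in the similarity computation.
    counts: number of users who rated each item."""
    return np.log(n_users / np.asarray(counts, dtype=float))
```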

  15. Model-based Approaches (Breese et al. 98) • General ideas • Assume that data/ratings are explained by a probabilistic model with parameter θ • Estimate/learn model parameter θ based on data • Predict unknown ratings using E[xk+1 | x1, …, xk], which is computed using the estimated model • Specific methods differ in the model used and how the model is estimated

  16. Probabilistic Clustering • Clustering users based on their ratings • Assume ratings are observations of a multinomial mixture model with parameters p(C), p(xi|C) • Model estimated using standard EM • Predict ratings using E[xk+1 | x1, …, xk]
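A toy version of this clustering step, assuming users are summarized by counts of the rating values they used; the random seed and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def em_multinomial_mixture(X, k, iters=50):
    """Toy EM for a mixture of multinomials over rating-value counts.
    X: (n_users, n_values) matrix counting how often each user gave each
    rating value; k: number of user clusters.
    Returns mixing weights p(C) and per-cluster distributions p(x|C)."""
    n, v = X.shape
    pC = np.full(k, 1.0 / k)
    pxC = rng.dirichlet(np.ones(v), size=k)      # random init, shape (k, v)
    for _ in range(iters):
        # E-step: posterior p(C | user) from per-cluster log-likelihoods
        logp = X @ np.log(pxC.T + 1e-12) + np.log(pC + 1e-12)
        logp -= logp.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(logp)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate p(C) and p(x|C) from the soft assignments
        pC = post.mean(axis=0)
        pxC = post.T @ X + 1e-12
        pxC /= pxC.sum(axis=1, keepdims=True)
    return pC, pxC

# Two obvious user groups: low raters and high raters.
X = np.array([[5, 0, 0], [4, 1, 0], [0, 0, 5], [0, 1, 4]], dtype=float)
pC, pxC = em_multinomial_mixture(X, k=2)
```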

  17. Bayesian Network • Use BN to capture object/item dependency • Each item/object is a node • (Dependency) structure is learned from all data • Model parameters: p(xk+1 |pa(xk+1)) where pa(xk+1) is the parents/predictors of xk+1 (represented as a decision tree) • Predict ratings using E[xk+1 | x1, …, xk]

  18. Three-way Aspect Model (Popescul et al. 2001) • CF + content-based • Generative model • (u,d,w) as observations • z as hidden variable • Standard EM • Essentially clustering the joint data • Evaluation on ResearchIndex data • Found it’s better to treat (u,w) as observations

  19. Evaluation Criteria (Breese et al. 98) • Rating accuracy • Average absolute deviation over Pa = the set of items predicted • Ranking accuracy • Expected utility with exponentially decaying viewing probability • α (half-life) = the rank where the viewing probability = 0.5 • d = neutral rating
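The two criteria can be sketched as follows; `halflife` is the α parameter and `d` the neutral rating (test values illustrative):

```python
import numpy as np

def avg_abs_deviation(pred, true):
    """Average absolute deviation over the predicted items."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(true))))

def expected_utility(ranked_ratings, d=3.0, halflife=5):
    """Expected utility of a ranked list (Breese et al. 98):
    sum_j max(v_j - d, 0) / 2**((j-1)/(halflife-1)) for 1-based rank j,
    so the viewing probability halves every `halflife` positions."""
    return sum(max(v - d, 0.0) / 2 ** (j / (halflife - 1))
               for j, v in enumerate(ranked_ratings))  # j is 0-based here
```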

  20. Datasets

  21. Results - BN & CR+ are generally better than VSIM & BC - BN is best with more training data - VSIM is better with little training data - Inverse User Frequency is effective - Case amplification is mostly effective

  22. Summary of Rating-based Methods • Effectiveness • Both memory-based and model-based methods can be effective • The correlation method appears to be robust • Bayesian network works well with plenty of training data, but not very well with little training data • The cosine similarity method works well with little training data

  23. Summary of Rating-based Methods (cont.) • Efficiency • Memory based methods are slower than model-based methods in predicting • Learning can be extremely slow for model-based methods

  24. Preference-based Methods (Cohen et al. 99, Freund et al. 98) • Motivation • Explicit ratings are not always available, but implicit orderings/preferences might be • Only relative ratings are meaningful, even when ratings are available • Combining preferences has other applications, e.g., • Merging results from different search engines

  25. A Formal Model of Preferences • Instances: O = {o1, …, on} • Ranking function: R: (U x) O x O → [0,1] • R(u,v)=1 means u is strongly preferred to v • R(u,v)=0 means v is strongly preferred to u • R(u,v)=0.5 means no preference • Feedback: F = {(u,v)}, u is preferred to v • Minimize the loss L(R,F) = (1/|F|) Σ(u,v)∈F (1 − R(u,v)) over a hypothesis space of ranking functions

  26. The Hypothesis Space H • Without constraints on H, the loss is minimized by any R that agrees with F • Appropriate constraints for collaborative filtering • Compare this with

  27. The Hedge Algorithm for Combining Preferences • Iterative updating of w1, w2, …, wn • Initialization: wi is uniform • Updating: wi ← wi · β^Li, with β ∈ [0,1] • L=0 => weight stays • L is large => weight is decreased
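A one-round sketch of this multiplicative update (β and the losses are illustrative):

```python
import numpy as np

def hedge_update(w, losses, beta=0.8):
    """One Hedge round: each expert's weight is multiplied by
    beta**loss (beta in [0, 1]); zero loss keeps the weight,
    large loss shrinks it. Weights are renormalized to sum to 1."""
    w = np.asarray(w, dtype=float) * beta ** np.asarray(losses, dtype=float)
    return w / w.sum()

w = hedge_update([0.5, 0.5], losses=[0.0, 1.0], beta=0.5)
print(w)
```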

  28. Some Theoretical Results • The cumulative loss of Ra will not be much worse than that of the best ranking expert/feature • Preferences Ra => ordering ρ => R: L(R,F) <= DISAGREE(ρ,Ra)/|F| + L(Ra,F) • Need to find ρ that minimizes disagreement • General case: NP-complete

  29. A Greedy Ordering Algorithm • Use a weighted graph to represent preferences R • For each node, compute the potential value, i.e., outgoing_weights - incoming_weights • Rank the node with the highest potential value above all others • Remove this node and its edges, repeat • At least half of the optimal agreement is guaranteed
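A sketch of the greedy procedure, assuming the preference graph is given as a dict `weight[(u, v)]` of edge weights for ranking u above v:

```python
def greedy_order(nodes, weight):
    """Greedy ordering: repeatedly rank the node with the highest
    potential (total outgoing minus incoming edge weight among the
    remaining nodes) above all others, then remove it and its edges.
    weight[(u, v)] is the preference weight for ranking u above v."""
    remaining = set(nodes)
    order = []
    while remaining:
        def potential(v):
            out_w = sum(weight.get((v, u), 0.0) for u in remaining)
            in_w = sum(weight.get((u, v), 0.0) for u in remaining)
            return out_w - in_w
        best = max(remaining, key=potential)  # highest potential next
        order.append(best)
        remaining.remove(best)
    return order

order = greedy_order(['a', 'b', 'c'],
                     {('a', 'b'): 1.0, ('b', 'c'): 1.0, ('a', 'c'): 1.0})
print(order)
```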

  30. Improvement • Identify all the strongly connected components • Rank the components consistently with the edges between them • Rank the nodes within a component using the basic greedy algorithm

  31. Evaluation of Ordering Algorithms • Measure: “weight coverage” • Datasets = randomly generated small graphs • Observations • The basic greedy algorithm works better than a random permutation baseline • Improved version is generally better, but the improvement is insignificant for large graphs

  32. Metasearch Experiments • Task: known item search • Search for an ML researcher’s homepage • Search for a university homepage • Search expert = variant of query • Learn to merge results of all search experts • Feedback • Complete: known item preferred to all others • Click data: known item preferred to all above it • Leave-one-out testing

  33. Metasearch Results • Measures: compare combined preferences with individual ranking functions • Sign test: to see which system tends to rank the known relevant article higher • # queries with the known relevant item ranked above k • Average rank of the known relevant item • Learned system better than individual experts by all measures (not surprising, why?)

  34. Metasearch Results (cont.)

  35. Direct Learning of an Ordering Function • Each expert is treated as a ranking feature fi: O → R ∪ {⊥} (⊥ = unranked, allowing partial rankings) • Given preference feedback Φ: X x X → R • Goal: learn H that minimizes the loss • D(x0,x1): a distribution over X x X (actually a uniform dist. over pairs with feedback order) D(x0,x1) = c·max{0, Φ(x0,x1)}

  36. The RankBoost Algorithm • Iterative updating of D(x0,x1) • Initialization: D1 = D • For t=1,…,T: • Train weak learner using Dt • Get weak hypothesis ht: X → R • Choose αt > 0 • Update: Dt+1(x0,x1) = Dt(x0,x1)·exp(αt(ht(x0) − ht(x1)))/Zt • Final hypothesis: H(x) = Σt αt ht(x)
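The distribution update at the heart of one round might be sketched as follows (pair data illustrative; choosing αt and the weak learner are covered on the next slide):

```python
import numpy as np

def rankboost_update(D, h_x0, h_x1, alpha):
    """One RankBoost distribution update over feedback pairs (x0, x1),
    with x1 preferred to x0:
      D_{t+1}(x0, x1) ∝ D_t(x0, x1) * exp(alpha * (h(x0) - h(x1)))
    Pairs the weak hypothesis mis-orders (h(x0) >= h(x1)) gain weight;
    correctly ordered pairs lose weight."""
    D = np.asarray(D, dtype=float) * np.exp(
        alpha * (np.asarray(h_x0, dtype=float) - np.asarray(h_x1, dtype=float)))
    return D / D.sum()  # renormalize (divide by Z_t)

# Pair 0 is ordered correctly (h(x1) > h(x0)); pair 1 is mis-ordered.
D = rankboost_update([0.5, 0.5], h_x0=[0.0, 1.0], h_x1=[1.0, 0.0], alpha=1.0)
print(D)
```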

  37. How to Choose αt and Design ht? • Bound on the ranking loss • Thus, we should choose αt that minimizes the bound • Three approaches: • Numerical search • Special case: h is either 0 or 1 • Approximation of Z, then find analytic solution

  38. Efficient RankBoost for Bipartite Feedback • Bipartite feedback: every item in X1 is preferred to every item in X0 -- essentially binary classification • Complexity at each round: O(|X0||X1|) → O(|X0|+|X1|)
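The factorization that gives this speedup can be checked numerically (scores and α illustrative):

```python
import numpy as np

def pair_normalizer_naive(h0, h1, alpha):
    """O(|X0| * |X1|): sum exp(alpha * (h(x0) - h(x1))) over all pairs."""
    return sum(np.exp(alpha * (a - b)) for a in h0 for b in h1)

def pair_normalizer_bipartite(h0, h1, alpha):
    """O(|X0| + |X1|): for bipartite feedback the double sum factorizes:
      Z = (sum_x0 e^{alpha h(x0)}) * (sum_x1 e^{-alpha h(x1)})"""
    return (np.exp(alpha * np.asarray(h0)).sum() *
            np.exp(-alpha * np.asarray(h1)).sum())

z_naive = pair_normalizer_naive([0.0, 1.0], [2.0, 0.5], alpha=0.5)
z_fast = pair_normalizer_bipartite([0.0, 1.0], [2.0, 0.5], alpha=0.5)
print(z_naive, z_fast)
```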

  39. Evaluation of RankBoost • Meta-search: same as in (Cohen et al. 99) • Perfect feedback • 4-fold cross validation

  40. EachMovie Evaluation (table of dataset statistics: # users, # movies/user, # feedback movies)

  41. Performance Comparison: Cohen et al. 99 vs. Freund et al. 99

  42. Summary • CF is “easy” • The user’s expectation is low • Any recommendation is better than none • Making it practically useful • CF is “hard” • Data sparseness • Scalability • Domain-dependent

  43. Summary (cont.) • CF as a Learning Task • Rating-based formulation • Learn f: U x O -> R • Algorithms • Instance-based/memory-based (k-nearest neighbors) • Model-based (probabilistic clustering) • Preference-based formulation • Learn PREF: U x O x O -> R • Algorithms • General preference combination (Hedge), greedy ordering • Efficient restricted preference combination (RankBoost)

  44. Summary (cont.) • Evaluation • Rating-based methods • Simple methods seem to be reasonably effective • Advantage of sophisticated methods seems to be limited • Preference-based methods • More effective than rating-based methods according to one evaluation • Evaluation on meta-search is weak

  45. Research Directions • Exploiting complete information • CF + content-based filtering + domain knowledge + user model … • More “localized” kernels for instance-based methods • Predicting movies needs different “neighbor users” than predicting books • One suggestion: use items similar to the target item as features when finding neighbors

  46. Research Directions (cont.) • Modeling time • There might be sequential patterns in the items a user purchases (e.g., bread machine -> bread machine mix) • Probabilistic model of preferences • Making the preference function a probability function, e.g., P(A>B|U) • Clustering items and users • Minimizing preference disagreements

  47. References • Cohen, W.W., Schapire, R.E., and Singer, Y. (1999). "Learning to Order Things", Journal of AI Research, Volume 10, pages 243-270. • Freund, Y., Iyer, R., Schapire, R.E., and Singer, Y. (1999). "An Efficient Boosting Algorithm for Combining Preferences", Machine Learning Journal, 1999. • Breese, J.S., Heckerman, D., and Kadie, C. (1998). "Empirical Analysis of Predictive Algorithms for Collaborative Filtering", In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 43-52. • Popescul, A. and Ungar, L.H. (2001). "Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments", UAI 2001. • Good, N., Schafer, J.B., Konstan, J., Borchers, A., Sarwar, B., Herlocker, J., and Riedl, J. (1999). "Combining Collaborative Filtering with Personal Agents for Better Recommendations", Proceedings of AAAI-99, pp. 439-446.
