
A Review of Information Filtering Part II: Collaborative Filtering



  1. A Review of Information Filtering, Part II: Collaborative Filtering Chengxiang Zhai Language Technologies Institute School of Computer Science Carnegie Mellon University

  2. Outline • A Conceptual Framework for Collaborative Filtering (CF) • Rating-based Methods (Breese et al. 98) • Memory-based methods • Model-based methods • Preference-based Methods (Cohen et al. 99 & Freund et al. 98) • Summary & Research Directions

  3. What is Collaborative Filtering (CF)? • Making filtering decisions for an individual user based on the judgments of other users • Inferring individual’s interest/preferences from that of other similar users • General idea • Given a user u, find similar users {u1, …, um} • Predict u’s preferences based on the preferences of u1, …, um

  4. CF: Applications • Recommender Systems: books, CDs, Videos, Movies, potentially anything! • Can be combined with content-based filtering • Example (commercial) systems • GroupLens (Resnick et al. 94): usenet news rating • Amazon: book recommendation • Firefly (purchased by Microsoft?): music recommendation • Alexa: web page recommendation

  5. CF: Assumptions • Users with a common interest will have similar preferences • Users with similar preferences probably share the same interest • Examples • “interest is IR” => “read SIGIR papers” • “read SIGIR papers” => “interest is IR” • Sufficiently large number of user preferences are available

  6. CF: Intuitions • User similarity • If Jamie liked the paper, I’ll like the paper • ? If Jamie liked the movie, I’ll like the movie • Suppose Jamie and I viewed similar movies in the past six months … • Item similarity • Since 90% of those who liked Star Wars also liked Independence Day, and, you liked Star Wars • You may also like Independence Day

  7. Collaborative Filtering vs. Content-based Filtering • Basic filtering question: Will user U like item X? • Two different ways of answering it • Look at what U likes • Look at who likes X • Can be combined => characterize X => content-based filtering => characterize U => collaborative filtering

  8. Rating-based vs. Preference-based • Rating-based: User’s preferences are encoded using numerical ratings on items • Complete ordering • Absolute values can be meaningful • But, values must be normalized to combine • Preferences: User’s preferences are represented by partial ordering of items • Partial ordering • Easier to exploit implicit preferences

  9. A Formal Framework for Rating • Objects: O = {o1, o2, …, oj, …, on} • Users: U = {u1, u2, …, ui, …, um} • Unknown rating function f: U x O → R, with Xij = f(ui, oj) (a partially observed user-object rating matrix) • The task • Assume known f values for some (u,o)’s • Predict f values for other (u,o)’s • Essentially function approximation, like other learning problems
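The framework above can be made concrete: assuming numpy, the unknown function f is just a partially observed user-object matrix, with `nan` marking the (u,o) pairs the system must predict (all numbers here are illustrative):

```python
import numpy as np

# A tiny user x object rating matrix; np.nan marks unknown f(u, o)
# values that the filtering system must predict.
X = np.array([
    [3.0, 1.5, np.nan, 2.0],    # user u1
    [2.0, np.nan, 1.0, 3.0],    # user u2
    [np.nan, 2.0, 1.0, np.nan], # user u3
])

known = ~np.isnan(X)
print(f"{known.sum()} known ratings, {(~known).sum()} to predict")
```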

  10. Where are the intuitions? • Similar users have similar preferences • If u ≈ u’, then for all o’s, f(u,o) ≈ f(u’,o) • Similar objects have similar user preferences • If o ≈ o’, then for all u’s, f(u,o) ≈ f(u,o’) • In general, f is “locally constant” • If u ≈ u’ and o ≈ o’, then f(u,o) ≈ f(u’,o’) • “Local smoothness” makes it possible to predict unknown values by interpolation or extrapolation • What does “local” mean?

  11. Two Groups of Approaches • Memory-based approaches • f(u,o) = g(u)(o) ≈ g(u’)(o) if u ≈ u’ • Find “neighbors” of u and combine g(u’)(o)’s • Model-based approaches • Assume structures/model: object cluster, user cluster, f’ defined on clusters • f(u,o) = f’(cu, co) • Estimation & probabilistic inference

  12. Memory-based Approaches (Breese et al. 98) • General ideas: • Xij: rating of object j by user i • ni: average rating of all objects by user i • Normalized ratings: Vij = Xij − ni • Memory-based prediction: Paj = na + κ Σi w(a,i) Vij, where κ is a normalizing constant • Specific approaches differ in w(a,i) -- the distance/similarity between user a and i
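A minimal sketch of this prediction rule, assuming illustrative data and a uniform similarity function; `kappa` plays the role of the normalizing constant in Breese et al. 98:

```python
import numpy as np

def predict(X, a, j, w, kappa=1.0):
    """Memory-based prediction of user a's rating of object j:
    p(a, j) = n_a + kappa * sum_i w(a, i) * (X_ij - n_i),
    where n_i is user i's mean rating and np.nan marks unknowns."""
    n = np.nanmean(X, axis=1)  # each user's average rating n_i
    total = 0.0
    for i in range(X.shape[0]):
        if i != a and not np.isnan(X[i, j]):
            total += w(a, i) * (X[i, j] - n[i])  # normalized rating V_ij
    return n[a] + kappa * total

X = np.array([[3.0, np.nan],
              [2.0, 2.0],
              [4.0, 2.0]])
# Uniform similarity; kappa = 0.5 normalizes the two neighbors' weights.
p = predict(X, 0, 1, lambda a, i: 1.0, kappa=0.5)
print(p)
```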

  13. User Similarity Measures • Pearson correlation coefficient (sum over commonly rated items) • Cosine measure • Many other possibilities!
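The two similarity measures might be sketched as follows, assuming `nan` marks unrated items: the Pearson variant restricts to commonly rated items as the slide notes, while the cosine variant treats unrated items as zeros:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation over items rated by both users (nan = unrated)."""
    m = ~np.isnan(x) & ~np.isnan(y)
    if m.sum() < 2:
        return 0.0
    xc, yc = x[m] - x[m].mean(), y[m] - y[m].mean()
    denom = np.sqrt((xc @ xc) * (yc @ yc))
    return float(xc @ yc / denom) if denom else 0.0

def cosine(x, y):
    """Cosine similarity treating unrated items as zeros."""
    x0, y0 = np.nan_to_num(x), np.nan_to_num(y)
    denom = np.linalg.norm(x0) * np.linalg.norm(y0)
    return float(x0 @ y0 / denom) if denom else 0.0

u1 = np.array([3.0, 1.5, np.nan, 2.0])
u2 = np.array([2.0, 1.0, 3.0, np.nan])
print(pearson(u1, u2), cosine(u1, u2))
```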

  14. Improving User Similarity Measures (Breese et al. 98) • Dealing with missing values: default ratings • Inverse User Frequency (IUF): similar to IDF • Case Amplification: use w(a,i)^p, e.g., p = 2.5
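Hedged sketches of the two reweighting tricks (parameter values illustrative):

```python
import numpy as np

def amplify(w, p=2.5):
    """Case amplification: w' = w * |w|**(p-1) keeps weights near +/-1
    and shrinks small (noisy) weights, while preserving their sign."""
    return w * abs(w) ** (p - 1)

def iuf(counts, n_users):
    """Inverse User Frequency, analogous to IDF: items rated by almost
    everyone get low weight in the similarity computation.
    counts: number of users who rated each item."""
    return np.log(n_users / np.asarray(counts, dtype=float))
```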

  15. Model-based Approaches (Breese et al. 98) • General ideas • Assume that data/ratings are explained by a probabilistic model with parameter θ • Estimate/learn model parameter θ based on data • Predict unknown ratings using E[xk+1 | x1, …, xk], which is computed using the estimated model • Specific methods differ in the model used and how the model is estimated

  16. Probabilistic Clustering • Clustering users based on their ratings • Assume ratings are observations of a multinomial mixture model with parameters p(C), p(xi|C) • Model estimated using standard EM • Predict ratings using E[xk+1 | x1, …, xk]
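A toy version of this clustering step, assuming users are summarized by counts of the rating values they used; the random seed and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def em_multinomial_mixture(X, k, iters=50):
    """Toy EM for a mixture of multinomials over rating-value counts.
    X: (n_users, n_values) matrix counting how often each user gave each
    rating value; k: number of user clusters.
    Returns mixing weights p(C) and per-cluster distributions p(x|C)."""
    n, v = X.shape
    pC = np.full(k, 1.0 / k)
    pxC = rng.dirichlet(np.ones(v), size=k)      # random init, shape (k, v)
    for _ in range(iters):
        # E-step: posterior p(C | user) from per-cluster log-likelihoods
        logp = X @ np.log(pxC.T + 1e-12) + np.log(pC + 1e-12)
        logp -= logp.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(logp)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate p(C) and p(x|C) from the soft assignments
        pC = post.mean(axis=0)
        pxC = post.T @ X + 1e-12
        pxC /= pxC.sum(axis=1, keepdims=True)
    return pC, pxC

# Two obvious user groups: low raters and high raters.
X = np.array([[5, 0, 0], [4, 1, 0], [0, 0, 5], [0, 1, 4]], dtype=float)
pC, pxC = em_multinomial_mixture(X, k=2)
```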

  17. Bayesian Network • Use BN to capture object/item dependency • Each item/object is a node • (Dependency) structure is learned from all data • Model parameters: p(xk+1 |pa(xk+1)) where pa(xk+1) is the parents/predictors of xk+1 (represented as a decision tree) • Predict ratings using E[xk+1 | x1, …, xk]

  18. Three-way Aspect Model (Popescul et al. 2001) • CF + content-based • Generative model • (u,d,w) as observations • z as hidden variable • Standard EM • Essentially clustering the joint data • Evaluation on ResearchIndex data • Found it’s better to treat (u,w) as observations

  19. Evaluation Criteria (Breese et al. 98) • Rating accuracy • Average absolute deviation over Pa = the set of items predicted • Ranking accuracy • Expected utility with exponentially decaying viewing probability • α (half-life) = the rank where the viewing probability = 0.5 • d = neutral rating
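The two criteria can be sketched as follows; `halflife` is the α parameter and `d` the neutral rating (test values illustrative):

```python
import numpy as np

def avg_abs_deviation(pred, true):
    """Average absolute deviation over the predicted items."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(true))))

def expected_utility(ranked_ratings, d=3.0, halflife=5):
    """Expected utility of a ranked list (Breese et al. 98):
    sum_j max(v_j - d, 0) / 2**((j-1)/(halflife-1)) for 1-based rank j,
    so the viewing probability halves every `halflife` positions."""
    return sum(max(v - d, 0.0) / 2 ** (j / (halflife - 1))
               for j, v in enumerate(ranked_ratings))  # j is 0-based here
```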

  20. Datasets

  21. Results - BN & CR+ are generally better than VSIM & BC - BN is best with more training data - VSIM is better with little training data - Inverse User Frequency is effective - Case amplification is mostly effective

  22. Summary of Rating-based Methods • Effectiveness • Both memory-based and model-based methods can be effective • The correlation method appears to be robust • Bayesian network works well with plenty of training data, but not very well with little training data • The cosine similarity method works well with little training data

  23. Summary of Rating-based Methods (cont.) • Efficiency • Memory based methods are slower than model-based methods in predicting • Learning can be extremely slow for model-based methods

  24. Preference-based Methods (Cohen et al. 99, Freund et al. 98) • Motivation • Explicit ratings are not always available, but implicit orderings/preferences might be • Only relative ratings are meaningful, even when ratings are available • Combining preferences has other applications, e.g., • Merging results from different search engines

  25. A Formal Model of Preferences • Instances: O = {o1, …, on} • Ranking function: R: (U x) O x O → [0,1] • R(u,v)=1 means u is strongly preferred to v • R(u,v)=0 means v is strongly preferred to u • R(u,v)=0.5 means no preference • Feedback: F = {(u,v)}, u is preferred to v • Minimize the loss L(R,F) = (1/|F|) Σ(u,v)∈F (1 − R(u,v)) over a hypothesis space of ranking functions

  26. The Hypothesis Space H • Without constraints on H, the loss is minimized by any R that agrees with F • Appropriate constraints for collaborative filtering • Compare this with

  27. The Hedge Algorithm for Combining Preferences • Iterative updating of w1, w2, …, wn • Initialization: wi is uniform • Updating: wi ← wi · β^Li, with β ∈ [0,1] • L=0 => weight stays • L is large => weight is decreased
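A one-round sketch of this multiplicative update (β and the losses are illustrative):

```python
import numpy as np

def hedge_update(w, losses, beta=0.8):
    """One Hedge round: each expert's weight is multiplied by
    beta**loss (beta in [0, 1]); zero loss keeps the weight,
    large loss shrinks it. Weights are renormalized to sum to 1."""
    w = np.asarray(w, dtype=float) * beta ** np.asarray(losses, dtype=float)
    return w / w.sum()

w = hedge_update([0.5, 0.5], losses=[0.0, 1.0], beta=0.5)
print(w)
```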

  28. Some Theoretical Results • The cumulative loss of Ra will not be much worse than that of the best ranking expert/feature • Preferences Ra => ordering ρ => R: L(R,F) <= DISAGREE(ρ,Ra)/|F| + L(Ra,F) • Need to find ρ that minimizes disagreement • General case: NP-complete

  29. A Greedy Ordering Algorithm • Use a weighted graph to represent preferences R • For each node, compute the potential value, i.e., outgoing_weights - incoming_weights • Rank the node with the highest potential value above all others • Remove this node and its edges, repeat • At least half of the optimal agreement is guaranteed
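A sketch of the greedy procedure, assuming the preference graph is given as a dict `weight[(u, v)]` of edge weights for ranking u above v:

```python
def greedy_order(nodes, weight):
    """Greedy ordering: repeatedly rank the node with the highest
    potential (total outgoing minus incoming edge weight among the
    remaining nodes) above all others, then remove it and its edges.
    weight[(u, v)] is the preference weight for ranking u above v."""
    remaining = set(nodes)
    order = []
    while remaining:
        def potential(v):
            out_w = sum(weight.get((v, u), 0.0) for u in remaining)
            in_w = sum(weight.get((u, v), 0.0) for u in remaining)
            return out_w - in_w
        best = max(remaining, key=potential)  # highest potential next
        order.append(best)
        remaining.remove(best)
    return order

order = greedy_order(['a', 'b', 'c'],
                     {('a', 'b'): 1.0, ('b', 'c'): 1.0, ('a', 'c'): 1.0})
print(order)
```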

  30. Improvement • Identify all the strongly connected components • Rank the components consistently with the edges between them • Rank the nodes within a component using the basic greedy algorithm

  31. Evaluation of Ordering Algorithms • Measure: “weight coverage” • Datasets = randomly generated small graphs • Observations • The basic greedy algorithm works better than a random permutation baseline • Improved version is generally better, but the improvement is insignificant for large graphs

  32. Metasearch Experiments • Task: known item search • Search for an ML researcher’s homepage • Search for a university homepage • Search expert = variant of query • Learn to merge results of all search experts • Feedback • Complete: known item preferred to all others • Click data: known item preferred to all above it • Leave-one-out testing

  33. Metasearch Results • Measures: compare combined preferences with individual ranking functions • Sign test: to see which system tends to rank the known relevant article higher • # queries with the known relevant item ranked above k • Average rank of the known relevant item • Learned system better than individual experts by all measures (not surprising, why?)

  34. Metasearch Results (cont.)

  35. Direct Learning of an Ordering Function • Each expert is treated as a ranking feature fi: O → R ∪ {⊥} (⊥ = unranked, allowing partial rankings) • Given preference feedback Φ: X x X → R • Goal: learn H that minimizes the loss • D(x0,x1): a distribution over X x X (actually a uniform dist. over pairs with feedback order) D(x0,x1) = c·max{0, Φ(x0,x1)}

  36. The RankBoost Algorithm • Iterative updating of D(x0,x1) • Initialization: D1 = D • For t=1,…,T: • Train weak learner using Dt • Get weak hypothesis ht: X → R • Choose αt > 0 • Update: Dt+1(x0,x1) = Dt(x0,x1)·exp(αt(ht(x0) − ht(x1)))/Zt • Final hypothesis: H(x) = Σt αt ht(x)
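The distribution update at the heart of one round might be sketched as follows (pair data illustrative; choosing αt and the weak learner are covered on the next slide):

```python
import numpy as np

def rankboost_update(D, h_x0, h_x1, alpha):
    """One RankBoost distribution update over feedback pairs (x0, x1),
    with x1 preferred to x0:
      D_{t+1}(x0, x1) ∝ D_t(x0, x1) * exp(alpha * (h(x0) - h(x1)))
    Pairs the weak hypothesis mis-orders (h(x0) >= h(x1)) gain weight;
    correctly ordered pairs lose weight."""
    D = np.asarray(D, dtype=float) * np.exp(
        alpha * (np.asarray(h_x0, dtype=float) - np.asarray(h_x1, dtype=float)))
    return D / D.sum()  # renormalize (divide by Z_t)

# Pair 0 is ordered correctly (h(x1) > h(x0)); pair 1 is mis-ordered.
D = rankboost_update([0.5, 0.5], h_x0=[0.0, 1.0], h_x1=[1.0, 0.0], alpha=1.0)
print(D)
```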

  37. How to Choose αt and Design ht? • Bound on the ranking loss • Thus, we should choose αt that minimizes the bound • Three approaches: • Numerical search • Special case: h is either 0 or 1 • Approximation of Z, then find analytic solution

  38. Efficient RankBoost for Bipartite Feedback • Bipartite feedback: every item in X1 is preferred to every item in X0 -- essentially binary classification • Complexity at each round: O(|X0||X1|) → O(|X0|+|X1|)
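The factorization that gives this speedup can be checked numerically (scores and α illustrative):

```python
import numpy as np

def pair_normalizer_naive(h0, h1, alpha):
    """O(|X0| * |X1|): sum exp(alpha * (h(x0) - h(x1))) over all pairs."""
    return sum(np.exp(alpha * (a - b)) for a in h0 for b in h1)

def pair_normalizer_bipartite(h0, h1, alpha):
    """O(|X0| + |X1|): for bipartite feedback the double sum factorizes:
      Z = (sum_x0 e^{alpha h(x0)}) * (sum_x1 e^{-alpha h(x1)})"""
    return (np.exp(alpha * np.asarray(h0)).sum() *
            np.exp(-alpha * np.asarray(h1)).sum())

z_naive = pair_normalizer_naive([0.0, 1.0], [2.0, 0.5], alpha=0.5)
z_fast = pair_normalizer_bipartite([0.0, 1.0], [2.0, 0.5], alpha=0.5)
print(z_naive, z_fast)
```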

  39. Evaluation of RankBoost • Meta-search: same as in (Cohen et al. 99) • Perfect feedback • 4-fold cross validation

  40. EachMovie Evaluation (table of dataset statistics: # users, # movies/user, # feedback movies)

  41. Performance Comparison: Cohen et al. 99 vs. Freund et al. 99

  42. Summary • CF is “easy” • The user’s expectation is low • Any recommendation is better than none • Making it practically useful • CF is “hard” • Data sparseness • Scalability • Domain-dependent

  43. Summary (cont.) • CF as a Learning Task • Rating-based formulation • Learn f: U x O -> R • Algorithms • Instance-based/memory-based (k-nearest neighbors) • Model-based (probabilistic clustering) • Preference-based formulation • Learn PREF: U x O x O -> R • Algorithms • General preference combination (Hedge), greedy ordering • Efficient restricted preference combination (RankBoost)

  44. Summary (cont.) • Evaluation • Rating-based methods • Simple methods seem to be reasonably effective • Advantage of sophisticated methods seems to be limited • Preference-based methods • More effective than rating-based methods according to one evaluation • Evaluation on meta-search is weak

  45. Research Directions • Exploiting complete information • CF + content-based filtering + domain knowledge + user model … • More “localized” kernels for instance-based methods • Predicting movies needs different “neighbor users” than predicting books • One suggestion: use items similar to the target item as features when finding neighbors

  46. Research Directions (cont.) • Modeling time • There might be sequential patterns in the items a user purchases (e.g., bread machine -> bread machine mix) • Probabilistic model of preferences • Making the preference function a probability function, e.g., P(A>B|U) • Clustering items and users • Minimizing preference disagreements

  47. References • Cohen, W.W., Schapire, R.E., and Singer, Y. (1999). "Learning to Order Things", Journal of AI Research, Volume 10, pages 243-270. • Freund, Y., Iyer, R., Schapire, R.E., and Singer, Y. (1999). "An Efficient Boosting Algorithm for Combining Preferences", Machine Learning Journal, 1999. • Breese, J.S., Heckerman, D., and Kadie, C. (1998). "Empirical Analysis of Predictive Algorithms for Collaborative Filtering", In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 43-52. • Popescul, A. and Ungar, L.H. (2001). "Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments", UAI 2001. • Good, N., Schafer, J.B., Konstan, J., Borchers, A., Sarwar, B., Herlocker, J., and Riedl, J. (1999). "Combining Collaborative Filtering with Personal Agents for Better Recommendations", Proceedings of AAAI-99, pp. 439-446.
