A Review of Information Filtering Part II: Collaborative Filtering

Chengxiang Zhai

Language Technologies Institute

School of Computer Science

Carnegie Mellon University

Outline
  • A Conceptual Framework for Collaborative Filtering (CF)
  • Rating-based Methods (Breese et al. 98)
    • Memory-based methods
    • Model-based methods
  • Preference-based Methods (Cohen et al. 99 & Freund et al. 98)
  • Summary & Research Directions
What is Collaborative Filtering (CF)?
  • Making filtering decisions for an individual user based on the judgments of other users
  • Inferring an individual’s interests/preferences from those of other, similar users
  • General idea
    • Given a user u, find similar users {u1, …, um}
    • Predict u’s preferences based on the preferences of u1, …, um
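As a concrete (invented) illustration of this general idea, a minimal nearest-neighbor predictor might look like the sketch below; the crude similarity measure and all names are ours, not from the slides:

```python
def predict(ratings, user, item, k=2):
    """Predict `user`'s rating of `item` as the mean rating given by the
    k most similar users who have rated it.  `ratings` is a dict of the
    form {user: {item: rating}}; assumes at least one other user rated
    the item."""
    def sim(u, v):
        # Negative mean absolute rating difference on commonly rated
        # items: higher means more similar (a deliberately crude proxy).
        common = set(ratings[u]) & set(ratings[v])
        if not common:
            return float("-inf")
        return -sum(abs(ratings[u][o] - ratings[v][o]) for o in common) / len(common)

    neighbors = sorted(
        (v for v in ratings if v != user and item in ratings[v]),
        key=lambda v: sim(user, v), reverse=True)[:k]
    return sum(ratings[v][item] for v in neighbors) / len(neighbors)
```

Later slides replace this toy similarity with proper measures (Pearson correlation, cosine).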
CF: Applications
  • Recommender systems: books, CDs, videos, movies, potentially anything!
  • Can be combined with content-based filtering
  • Example (commercial) systems
    • GroupLens (Resnick et al. 94): usenet news rating
    • Amazon: book recommendation
    • Firefly (purchased by Microsoft?): music recommendation
    • Alexa: web page recommendation
CF: Assumptions
  • Users with a common interest will have similar preferences
  • Users with similar preferences probably share the same interest
  • Examples
    • “interest is IR” => “read SIGIR papers”
    • “read SIGIR papers” => “interest is IR”
  • A sufficiently large number of user preferences is available
CF: Intuitions
  • User similarity
    • If Jamie liked the paper, I’ll like the paper
    • ? If Jamie liked the movie, I’ll like the movie
    • Suppose Jamie and I viewed similar movies in the past six months …
  • Item similarity
    • Since 90% of those who liked Star Wars also liked Independence Day, and you liked Star Wars
    • You may also like Independence Day
Collaborative Filtering vs. Content-based Filtering
  • Basic filtering question: Will user U like item X?
  • Two different ways of answering it
    • Look at what U likes => characterize X => content-based filtering
    • Look at who likes X => characterize U => collaborative filtering
  • Can be combined

Rating-based vs. Preference-based
  • Rating-based: User’s preferences are encoded using numerical ratings on items
    • Complete ordering
    • Absolute values can be meaningful
    • But, values must be normalized to combine
  • Preference-based: User’s preferences are represented by a partial ordering of items
    • Partial ordering
    • Easier to exploit implicit preferences
A Formal Framework for Rating

Users U = {u1, …, um} rate objects O = {o1, …, on}, giving a partially observed rating matrix (e.g., a user’s row might read 3, 1.5, …, 2).

Unknown function f: U x O → R

The task
  • Assume known f values for some (u,o) pairs
  • Predict f values for the other (u,o) pairs
  • Essentially function approximation, like other learning problems
Where are the intuitions?
  • Similar users have similar preferences
    • If u ≈ u’, then for all o’s, f(u,o) ≈ f(u’,o)
  • Similar objects have similar user preferences
    • If o ≈ o’, then for all u’s, f(u,o) ≈ f(u,o’)
  • In general, f is “locally constant”
    • If u ≈ u’ and o ≈ o’, then f(u,o) ≈ f(u’,o’)
    • “Local smoothness” makes it possible to predict unknown values by interpolation or extrapolation
  • What does “local” mean?
Two Groups of Approaches
  • Memory-based approaches
    • f(u,o) = g(u)(o) ≈ g(u’)(o) if u ≈ u’
    • Find “neighbors” of u and combine g(u’)(o)’s
  • Model-based approaches
    • Assume structures/model: object cluster, user cluster, f’ defined on clusters
    • f(u,o) = f’(cu, co)
    • Estimation & Probabilistic inference
Memory-based Approaches (Breese et al. 98)
  • General ideas:
    • Xij: rating of object j by user i
    • ni: average rating of all objects by user i
    • Normalized ratings: Vij = Xij - ni
    • Memory-based prediction: Xaj = na + k Σi w(a,i) Vij, where k is a normalizing constant
  • Specific approaches differ in w(a,i), the distance/similarity between users a and i
User Similarity Measures
  • Pearson correlation coefficient (sum over commonly rated items): w(a,i) = Σj (Xaj - na)(Xij - ni) / sqrt(Σj (Xaj - na)² · Σj (Xij - ni)²)
  • Cosine measure: w(a,i) = Σj Xaj Xij / (‖Xa‖ ‖Xi‖)
  • Many other possibilities!
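The two pieces above can be sketched in code, assuming ratings stored as {user: {item: rating}} dicts; the function names are ours, and this is a simplified reading of Breese et al. 98, not their implementation:

```python
from math import sqrt

def pearson(ra, ri):
    """Pearson correlation over the items rated by both users."""
    common = set(ra) & set(ri)
    if len(common) < 2:
        return 0.0
    ma = sum(ra[o] for o in common) / len(common)
    mi = sum(ri[o] for o in common) / len(common)
    num = sum((ra[o] - ma) * (ri[o] - mi) for o in common)
    den = sqrt(sum((ra[o] - ma) ** 2 for o in common) *
               sum((ri[o] - mi) ** 2 for o in common))
    return num / den if den else 0.0

def predict_rating(ratings, a, j):
    """Normalized memory-based prediction: user a's mean rating plus a
    weight-normalized sum of other users' mean-centered ratings of item j."""
    n_a = sum(ratings[a].values()) / len(ratings[a])
    contrib, total_w = 0.0, 0.0
    for i in ratings:
        if i == a or j not in ratings[i]:
            continue
        w = pearson(ratings[a], ratings[i])
        n_i = sum(ratings[i].values()) / len(ratings[i])
        contrib += w * (ratings[i][j] - n_i)
        total_w += abs(w)
    return n_a + (contrib / total_w if total_w else 0.0)
```

Swapping in the cosine measure only changes the weight function w(a,i).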
Improving User Similarity Measures (Breese et al. 98)
  • Dealing with missing values: default ratings
  • Inverse User Frequency (IUF): similar to IDF
  • Case Amplification: use w(a,i)^p, e.g., p = 2.5
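The latter two refinements are easy to sketch (a simplified reading of Breese et al. 98; the function names are ours):

```python
from math import log

def inverse_user_frequency(ratings):
    """IUF weight per item, analogous to IDF: log(n_users / n_raters).
    Items everyone rates get weight ~0 and contribute little to similarity."""
    n = len(ratings)
    counts = {}
    for user_ratings in ratings.values():
        for item in user_ratings:
            counts[item] = counts.get(item, 0) + 1
    return {item: log(n / c) for item, c in counts.items()}

def amplify(w, p=2.5):
    """Case amplification: raise the weight to the power p, keeping its
    sign, so near-1 weights dominate weak ones."""
    return (1 if w >= 0 else -1) * abs(w) ** p
```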
Model-based Approaches (Breese et al. 98)
  • General ideas
    • Assume that data/ratings are explained by a probabilistic model with parameter θ
    • Estimate/learn the model parameter θ from the data
    • Predict unknown ratings using E[xk+1 | x1, …, xk], computed under the estimated model
  • Specific methods differ in the model used and how the model is estimated
Probabilistic Clustering
  • Clustering users based on their ratings
    • Assume ratings are observations of a multinomial mixture model with parameters p(C), p(xi|C)
    • Model estimated using standard EM
  • Predict ratings using E[xk+1 | x1, …, xk]
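As a rough sketch of the model-based idea, here is EM for a simplified Bernoulli (like/dislike) mixture standing in for the multinomial mixture; the binary simplification and all names are ours, not from the paper:

```python
import random

def em_cluster(X, k=2, iters=50, seed=0):
    """EM for a naive-Bayes mixture over binary like/dislike vectors.
    X: list of equal-length tuples of 0/1 (one vector per user).
    Returns (class priors pi, per-class item probabilities theta)."""
    rng = random.Random(seed)
    m = len(X[0])
    pi = [1.0 / k] * k                                   # p(C = c)
    theta = [[rng.uniform(0.3, 0.7) for _ in range(m)]   # p(x_j = 1 | C = c)
             for _ in range(k)]
    for _ in range(iters):
        # E-step: responsibilities p(C = c | x) for every user vector
        resp = []
        for x in X:
            w = []
            for c in range(k):
                p = pi[c]
                for j, v in enumerate(x):
                    p *= theta[c][j] if v else 1 - theta[c][j]
                w.append(p)
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: re-estimate priors and per-class item probabilities
        for c in range(k):
            nc = sum(r[c] for r in resp)
            pi[c] = nc / len(X)
            for j in range(m):
                theta[c][j] = sum(r[c] * x[j] for r, x in zip(resp, X)) / nc
    return pi, theta
```

Prediction then averages each class’s item probability, weighted by the user’s class responsibilities, mirroring E[xk+1 | x1, …, xk].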
Bayesian Network
  • Use BN to capture object/item dependency
    • Each item/object is a node
    • (Dependency) structure is learned from all data
    • Model parameters: p(xk+1 |pa(xk+1)) where pa(xk+1) is the parents/predictors of xk+1 (represented as a decision tree)
  • Predict ratings using E[xk+1 | x1, …, xk]
Three-way Aspect Model (Popescul et al. 2001)
  • CF + content-based
  • Generative model
  • (u,d,w) as observations
  • z as hidden variable
  • Standard EM
  • Essentially clustering the joint data
  • Evaluation on ResearchIndex data
  • Found it’s better to treat (u,w) as observations
Evaluation Criteria (Breese et al. 98)
  • Rating accuracy
    • Average absolute deviation over Pa, the set of items predicted for user a
  • Ranking accuracy
    • Expected utility, with an exponentially decaying viewing probability
    • Half-life α = the rank where the viewing probability drops to 0.5
    • d = neutral rating
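The two criteria can be sketched as follows; this is our reading of the definitions in Breese et al. 98, with d the neutral rating and alpha the half-life, and the function names are ours:

```python
def avg_absolute_deviation(pred, actual):
    """Mean |predicted - actual| rating over the items with predictions."""
    return sum(abs(pred[j] - actual[j]) for j in pred) / len(pred)

def expected_utility(ranked, votes, d=3.0, alpha=5.0):
    """Sum over the ranked list of max(vote - d, 0), discounted by an
    exponentially decaying viewing probability that halves every
    (alpha - 1) ranks, so rank alpha has viewing probability 0.5."""
    return sum(max(votes.get(j, d) - d, 0.0) / 2 ** ((rank - 1) / (alpha - 1))
               for rank, j in enumerate(ranked, start=1))
```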

Results (Breese et al. 98):
  • BN & CR+ are generally better than VSIM & BC
  • BN is best with more training data
  • VSIM is better with little training data
  • Inverse User Frequency is effective
  • Case amplification is mostly effective
Summary of Rating-based Methods
  • Effectiveness
    • Both memory-based and model-based methods can be effective
    • The correlation method appears to be robust
    • Bayesian network works well with plenty of training data, but not very well with little training data
    • The cosine similarity method works well with little training data
Summary of Rating-based Methods (cont.)
  • Efficiency
    • Memory-based methods are slower than model-based methods at prediction time
    • Learning can be extremely slow for model-based methods
Preference-based Methods (Cohen et al. 99, Freund et al. 98)
  • Motivation
    • Explicit ratings are not always available, but implicit orderings/preferences might be available
    • Only relative ratings are meaningful, even when ratings are available
    • Combining preferences has other applications, e.g.,
      • Merging results from different search engines
A Formal Model of Preferences
  • Instances: O={o1,…, on}
  • Ranking function R: (U x) O x O → [0,1]
    • R(u,v) = 1 means u is strongly preferred to v
    • R(u,v) = 0 means v is strongly preferred to u
    • R(u,v) = 0.5 means no preference
  • Feedback: F = {(u,v)}, u is preferred to v
  • Minimize the loss, i.e., the average disagreement with the feedback, Loss(R,F) = (1/|F|) Σ(u,v)∈F (1 - R(u,v)), over R in a hypothesis space H
The Hypothesis Space H
  • Without constraints on H, the loss is minimized by any R that agrees with F
  • Appropriate constraints are needed for collaborative filtering (e.g., R must be built from the given ranking experts)
  • Compare this with the rating-based formulation f: U x O → R
The Hedge Algorithm for Combining Preferences
  • Iterative updating of the expert weights w1, w2, …, wn
  • Initialization: wi is uniform
  • Updating: wi ← wi · β^Loss(Ri,F), with β ∈ [0,1]
  • Loss = 0 => weight stays the same
  • Loss is large => weight is decreased
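A minimal sketch of one Hedge round, assuming each expert Ri has already been scored with a loss in [0,1] on the feedback (the function names are ours):

```python
def hedge_update(weights, losses, beta=0.5):
    """One Hedge round: multiply each expert's weight by beta**loss
    (beta in [0,1]) and renormalize.  Zero loss leaves a weight alone;
    large loss shrinks it."""
    new = [w * beta ** l for w, l in zip(weights, losses)]
    z = sum(new)
    return [w / z for w in new]

def combined_pref(weights, experts, u, v):
    """Combined preference: the weighted sum of the experts' R_i(u, v)."""
    return sum(w * r(u, v) for w, r in zip(weights, experts))
```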
Some Theoretical Results
  • The cumulative loss of the combined preferences Ra will not be much worse than that of the best ranking expert/feature
  • Converting the combined preferences Ra into an ordering ρ gives L(ρ,F) <= DISAGREE(ρ,Ra)/|F| + L(Ra,F)
  • So we need to find the ρ that minimizes the disagreement
  • General case: NP-complete
A Greedy Ordering Algorithm
  • Basic greedy algorithm
    • Use a weighted graph to represent the preferences R
    • For each node, compute its potential value, i.e., outgoing_weights - incoming_weights
    • Rank the node with the highest potential value above all others
    • Remove this node and its edges; repeat
    • Guarantees at least half of the optimal agreement
  • Improved version
    • Identify all the strongly connected components
    • Rank the components consistently with the edges between them
    • Rank the nodes within a component using the basic greedy algorithm
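The basic greedy step can be sketched as follows, with preferences given as a dict of directed edge weights (names are ours):

```python
def greedy_order(nodes, pref):
    """Greedy ordering: repeatedly pick the node with the largest
    potential (total outgoing minus incoming preference weight among the
    remaining nodes), rank it above all the rest, and remove it."""
    remaining = set(nodes)
    order = []
    while remaining:
        best = max(remaining,
                   key=lambda u: sum(pref.get((u, v), 0) - pref.get((v, u), 0)
                                     for v in remaining if v != u))
        order.append(best)
        remaining.remove(best)
    return order
```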
Evaluation of Ordering Algorithms
  • Measure: “weight coverage”
  • Datasets = randomly generated small graphs
  • Observations
    • The basic greedy algorithm works better than a random permutation baseline
    • Improved version is generally better, but the improvement is insignificant for large graphs
Metasearch Experiments
  • Task: Known item search
    • Search for an ML researcher’s homepage
    • Search for a university homepage
  • Search expert = variant of query
  • Learn to merge results of all search experts
  • Feedback
    • Complete : known item preferred to all others
    • Click data : known item preferred to all above it
  • Leave-one-out testing
Metasearch Results
  • Measures: compare the combined preferences with each individual ranking function
    • Sign test: to see which system tends to rank the known relevant item higher
    • Number of queries with the known relevant item ranked above rank k
    • Average rank of the known relevant item
  • The learned system is better than every individual expert by all measures (not surprising, why?)
Direct Learning of an Ordering Function
  • Each expert is treated as a ranking feature fi: O → R ∪ {0} (allowing partial rankings)
  • Given a preference feedback function Φ: X x X → R
  • Goal: learn a ranking H that minimizes the loss
  • D(x0,x1): a distribution over X x X (actually a uniform distribution over the pairs with feedback order), D(x0,x1) = c · max{0, Φ(x0,x1)}
The RankBoost Algorithm
  • Iterative updating of D(x0,x1)
  • Initialization: D1 = D
  • For t = 1, …, T:
    • Train a weak learner using Dt
    • Get a weak hypothesis ht: X → R
    • Choose αt > 0
    • Update: Dt+1(x0,x1) = Dt(x0,x1) · exp(αt (ht(x0) - ht(x1))) / Zt
  • Final hypothesis: H(x) = Σt αt ht(x)
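A bare-bones sketch of the loop above, assuming binary weak hypotheses picked from a fixed pool by exhaustive search rather than a trained weak learner (a simplification; the names are ours, not Freund et al.'s):

```python
from math import exp, log

def rankboost(pairs, hypotheses, T=10):
    """`pairs` is a list of (x0, x1) with x1 preferred to x0; each
    hypothesis maps an instance to 0 or 1.  Returns the learned ranking
    function H(x) = sum_t alpha_t * h_t(x)."""
    D = {p: 1.0 / len(pairs) for p in pairs}   # distribution over pairs

    def r_of(h):
        # r measures how well h orders the pairs under the current weights
        return sum(d * (h(x1) - h(x0)) for (x0, x1), d in D.items())

    ensemble = []                              # (alpha_t, h_t) terms of H
    for _ in range(T):
        h = max(hypotheses, key=lambda g: abs(r_of(g)))
        r = r_of(h)
        if r == 0.0:
            break                              # no hypothesis helps anymore
        if abs(r) >= 1.0:                      # perfect ordering: cap alpha
            ensemble.append((1.0 if r > 0 else -1.0, h))
            break
        alpha = 0.5 * log((1 + r) / (1 - r))
        ensemble.append((alpha, h))
        # Shift weight onto the pairs h still mis-orders, then normalize
        D = {(x0, x1): d * exp(alpha * (h(x0) - h(x1)))
             for (x0, x1), d in D.items()}
        z = sum(D.values())
        D = {p: d / z for p, d in D.items()}
    return lambda x: sum(a * h(x) for a, h in ensemble)
```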
How to Choose αt and Design ht?
  • The ranking loss is bounded by the product of the normalizers Zt
  • Thus, we should choose the αt that minimizes the bound (i.e., minimizes Zt)
  • Three approaches:
    • Numerical search
    • Special case: h takes values in {0, 1}
    • Approximate Zt, then find an analytic solution
Efficient RankBoost for Bipartite Feedback


Bipartite feedback: the instances split into two disjoint sets X0 and X1, and every item in X1 is preferred to every item in X0

Essentially binary classification

Complexity at each round drops from O(|X0| · |X1|) to O(|X0| + |X1|)

Evaluation of RankBoost
  • Meta-search: same setup as in (Cohen et al. 99)
  • Perfect feedback
  • 4-fold cross validation
EachMovie Evaluation

[Results figure: prediction performance on EachMovie as a function of the number of users and the number of feedback movies.]

Summary
  • CF is “easy”
    • The user’s expectation is low
    • Any recommendation is better than none
    • Making it practically useful
  • CF is “hard”
    • Data sparseness
    • Scalability
    • Domain-dependent
Summary (cont.)
  • CF as a Learning Task
    • Rating-based formulation
      • Learn f: U x O -> R
      • Algorithms
        • Instance-based/memory-based (k-nearest neighbors)
        • Model-based (probabilistic clustering)
    • Preference-based formulation
      • Learn PREF: U x O x O -> R
      • Algorithms
        • General preference combination (Hedge), greedy ordering
        • Efficient restricted preference combination (RankBoost)
Summary (cont.)
  • Evaluation
    • Rating-based methods
      • Simple methods seem to be reasonably effective
      • Advantage of sophisticated methods seems to be limited
    • Preference-based methods
      • More effective than rating-based methods according to one evaluation
      • Evaluation on meta-search is weak
Research Directions
  • Exploiting complete information
    • CF + content-based filtering + domain knowledge + user model …
  • More “localized” kernels for instance-based methods
    • Predicting movies needs different “neighbor users” than predicting books
    • Suggestion: use items similar to the target item as features when finding neighbors
Research Directions (cont.)
  • Modeling time
    • There might be sequential patterns on the items a user purchased (e.g., bread machine -> bread machine mix)
  • Probabilistic model of preferences
    • Making the preference function a probability function, e.g., P(A>B|U)
    • Clustering items and users
    • Minimizing preference disagreements

References
  • Cohen, W.W., Schapire, R.E., and Singer, Y. (1999). "Learning to Order Things." Journal of AI Research, 10, 243-270.
  • Freund, Y., Iyer, R., Schapire, R.E., and Singer, Y. (1999). "An Efficient Boosting Algorithm for Combining Preferences." Machine Learning Journal.
  • Breese, J.S., Heckerman, D., and Kadie, C. (1998). "Empirical Analysis of Predictive Algorithms for Collaborative Filtering." Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 43-52.
  • Popescul, A. and Ungar, L.H. (2001). "Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments." UAI 2001.
  • Good, N., Schafer, J.B., Konstan, J., Borchers, A., Sarwar, B., Herlocker, J., and Riedl, J. (1999). "Combining Collaborative Filtering with Personal Agents for Better Recommendations." Proceedings of AAAI-99, pp. 439-446.