A Review of Information Filtering, Part II: Collaborative Filtering

Chengxiang Zhai

Language Technologies Institute

School of Computer Science

Carnegie Mellon University

Outline
  • A Conceptual Framework for Collaborative Filtering (CF)
  • Rating-based Methods (Breese et al. 98)
    • Memory-based methods
    • Model-based methods
  • Preference-based Methods (Cohen et al. 99 & Freund et al. 98)
  • Summary & Research Directions
What is Collaborative Filtering (CF)?
  • Making filtering decisions for an individual user based on the judgments of other users
  • Inferring an individual’s interests/preferences from those of other, similar users
  • General idea
    • Given a user u, find similar users {u1, …, um}
    • Predict u’s preferences based on the preferences of u1, …, um
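The general idea above can be sketched in a few lines. This is a minimal illustration only (toy data, cosine similarity over co-rated items, and a similarity-weighted average), not any specific published algorithm:

```python
import math

# Toy ratings: user -> {item: rating}. All names and values are made up.
ratings = {
    "u1": {"o1": 3, "o2": 1, "o3": 2},
    "u2": {"o1": 3, "o2": 1},
    "u3": {"o1": 1, "o2": 3, "o3": 1},
}

def cosine_sim(a, b):
    """Cosine similarity over the items both users rated."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    dot = sum(ratings[a][o] * ratings[b][o] for o in common)
    na = math.sqrt(sum(ratings[a][o] ** 2 for o in common))
    nb = math.sqrt(sum(ratings[b][o] ** 2 for o in common))
    return dot / (na * nb)

def predict(user, item):
    """Similarity-weighted average of the other users' ratings on `item`."""
    sims = [(cosine_sim(user, v), ratings[v][item])
            for v in ratings if v != user and item in ratings[v]]
    total = sum(s for s, _ in sims)
    return sum(s * r for s, r in sims) / total if total else None

print(predict("u2", "o3"))  # u2 never rated o3; neighbors u1 and u3 fill it in
```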
CF: Applications
  • Recommender systems: books, CDs, videos, movies, potentially anything!
  • Can be combined with content-based filtering
  • Example (commercial) systems
    • GroupLens (Resnick et al. 94): usenet news rating
    • Amazon: book recommendation
    • Firefly (purchased by Microsoft?): music recommendation
    • Alexa: web page recommendation
CF: Assumptions
  • Users with a common interest will have similar preferences
  • Users with similar preferences probably share the same interest
  • Examples
    • “interest is IR” => “read SIGIR papers”
    • “read SIGIR papers” => “interest is IR”
  • A sufficiently large number of user preferences is available
CF: Intuitions
  • User similarity
    • If Jamie liked the paper, I’ll like the paper
    • ? If Jamie liked the movie, I’ll like the movie
    • Suppose Jamie and I viewed similar movies in the past six months …
  • Item similarity
    • Since 90% of those who liked Star Wars also liked Independence Day, and, you liked Star Wars
    • You may also like Independence Day
Collaborative Filtering vs. Content-based Filtering
  • Basic filtering question: Will user U like item X?
  • Two different ways of answering it
    • Look at what U likes
    • Look at who likes X
  • Can be combined

=> characterize X => content-based filtering

=> characterize U => collaborative filtering

Rating-based vs. Preference-based
  • Rating-based: User’s preferences are encoded using numerical ratings on items
    • Complete ordering
    • Absolute values can be meaningful
    • But, values must be normalized to combine
  • Preference-based: User’s preferences are represented by a partial ordering of items
    • Partial ordering
    • Easier to exploit implicit preferences
A Formal Framework for Rating

Objects: O = {o1, o2, …, oj, …, on}
Users: U = {u1, u2, …, ui, …, um}

Ratings form a partially observed user-object matrix with entries Xij = f(ui,oj), e.g., values such as 3, 1.5, 2, 1; most entries are unknown (“?”).

The task
  • Assume the f values are known for some (u,o) pairs
  • Predict the f values for the other (u,o) pairs
  • Essentially function approximation, like other learning problems

Unknown function f: U × O → R
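As a concrete picture of the framework, the partially observed matrix can be represented directly; the entries below are made up:

```python
# The unknown function f: U x O -> R as a partially observed rating matrix.
# None marks the (u, o) pairs whose values we must predict.
X = [
    [3.0, 1.5, None, 2.0],   # u1
    [2.0, None, 1.0, None],  # u2
    [None, 3.0, None, 1.0],  # u3
]

known = [(i, j) for i, row in enumerate(X) for j, v in enumerate(row) if v is not None]
unknown = [(i, j) for i, row in enumerate(X) for j, v in enumerate(row) if v is None]
print(len(known), "known entries,", len(unknown), "to predict")
```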

Where are the intuitions?
  • Similar users have similar preferences
    • If u ≈ u’, then for all o: f(u,o) ≈ f(u’,o)
  • Similar objects have similar user preferences
    • If o ≈ o’, then for all u: f(u,o) ≈ f(u,o’)
  • In general, f is “locally constant”
    • If u ≈ u’ and o ≈ o’, then f(u,o) ≈ f(u’,o’)
    • “Local smoothness” makes it possible to predict unknown values by interpolation or extrapolation
  • What does “local” mean?
Two Groups of Approaches
  • Memory-based approaches
    • f(u,o) = g(u)(o) ≈ g(u’)(o) if u ≈ u’
    • Find “neighbors” of u and combine the g(u’)(o) values
  • Model-based approaches
    • Assume structure/model: object clusters, user clusters, f’ defined on clusters
    • f(u,o) = f’(cu, co)
    • Estimation & probabilistic inference
Memory-based Approaches (Breese et al. 98)
  • General ideas:
    • Xij: rating of object j by user i
    • ni: average rating over all objects rated by user i
    • Normalized ratings: Vij = Xij − ni
    • Memory-based prediction: predicted Xaj = na + κ Σi w(a,i)(Xij − ni), where κ is a normalizer making the |w(a,i)| sum to 1
  • Specific approaches differ in w(a,i), the distance/similarity between users a and i
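A sketch of this prediction rule, assuming a toy data set and externally supplied similarity weights w(a,i):

```python
# Sketch of the memory-based prediction of Breese et al. 98:
#   xhat(a,j) = n_a + kappa * sum_i w(a,i) * (x(i,j) - n_i)
# The similarity w(a,i) is left abstract here; any user-similarity works.
ratings = {                      # toy data, user -> {item: rating}
    "a": {"o1": 4.0, "o2": 2.0},
    "i1": {"o1": 4.0, "o2": 2.0, "o3": 5.0},
    "i2": {"o1": 2.0, "o2": 4.0, "o3": 1.0},
}

def mean_rating(u):
    vals = ratings[u].values()
    return sum(vals) / len(vals)

def predict(a, j, w):
    """w: dict of similarity weights w[(a, i)] for each other user i."""
    terms = [(w[(a, i)], ratings[i][j] - mean_rating(i))
             for i in ratings if i != a and j in ratings[i]]
    kappa = 1.0 / sum(abs(wi) for wi, _ in terms)   # normalizer
    return mean_rating(a) + kappa * sum(wi * v for wi, v in terms)

# Suppose user "a" is similar to i1 (w = 1) and anti-similar to i2 (w = -1):
print(predict("a", "o3", {("a", "i1"): 1.0, ("a", "i2"): -1.0}))
```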
User Similarity Measures
  • Pearson correlation coefficient (sums taken over the commonly rated items): w(a,i) = Σj (Xaj − na)(Xij − ni) / sqrt(Σj (Xaj − na)² · Σj (Xij − ni)²)
  • Cosine measure
  • Many other possibilities!
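A minimal Pearson-correlation sketch; here the user means are computed over the co-rated items, which is one common variant:

```python
import math

def pearson(ra, ri):
    """Pearson correlation over the items both users rated.
    ra, ri: dicts mapping item -> rating."""
    common = sorted(set(ra) & set(ri))
    if len(common) < 2:
        return 0.0
    ma = sum(ra[o] for o in common) / len(common)   # means over co-rated items
    mi = sum(ri[o] for o in common) / len(common)
    num = sum((ra[o] - ma) * (ri[o] - mi) for o in common)
    den = math.sqrt(sum((ra[o] - ma) ** 2 for o in common) *
                    sum((ri[o] - mi) ** 2 for o in common))
    return num / den if den else 0.0

# Perfectly correlated users (one rates everything twice as high):
print(pearson({"o1": 1, "o2": 2, "o3": 3}, {"o1": 2, "o2": 4, "o3": 6}))
```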
Improving User Similarity Measures (Breese et al. 98)
  • Dealing with missing values: default ratings
  • Inverse User Frequency (IUF): similar to IDF
  • Case Amplification: raise weights to a power, w(a,i)^p, e.g., p = 2.5
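Sketches of the two weight transforms; the function names and signatures are illustrative, not from Breese et al.:

```python
import math

def inverse_user_frequency(n_users, n_users_who_rated_item):
    """IUF, analogous to IDF: down-weight items that everyone rates."""
    return math.log(n_users / n_users_who_rated_item)

def amplify(w, p=2.5):
    """Case amplification: push |w| toward 0 or 1 while keeping the sign."""
    return math.copysign(abs(w) ** p, w)

print(amplify(0.9), amplify(-0.9), inverse_user_frequency(100, 100))
```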
Model-based Approaches (Breese et al. 98)
  • General ideas
    • Assume that data/ratings are explained by a probabilistic model with parameters θ
    • Estimate/learn the model parameters θ from the data
    • Predict an unknown rating using E[xk+1 | x1, …, xk], computed under the estimated model
  • Specific methods differ in the model used and how the model is estimated
Probabilistic Clustering
  • Clustering users based on their ratings
    • Assume ratings are observations of a multinomial mixture model with parameters p(C), p(xi|C)
    • Model estimated using standard EM
  • Predict ratings using E[xk+1 | x1, …, xk]
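A toy sketch of such a mixture model, simplified to binary ratings and two clusters (the full model in Breese et al. uses multinomial rating distributions); all data and initial parameters below are made up:

```python
import math, random

# Each tuple is one user's binary ratings on three items (toy data).
data = [(1, 1, 0), (1, 1, 0), (0, 0, 1), (0, 1, 0), (0, 0, 1)]
K, D = 2, len(data[0])
random.seed(0)
pi = [1.0 / K] * K                                   # p(C)
theta = [[random.uniform(0.3, 0.7) for _ in range(D)] for _ in range(K)]

def post(x):
    """Posterior p(C | ratings x), via Bayes' rule on the mixture."""
    lik = [pi[k] * math.prod(theta[k][d] if x[d] else 1 - theta[k][d]
                             for d in range(D)) for k in range(K)]
    z = sum(lik)
    return [l / z for l in lik]

for _ in range(50):                                  # standard EM
    r = [post(x) for x in data]                      # E-step
    for k in range(K):                               # M-step
        nk = sum(r[i][k] for i in range(len(data)))
        pi[k] = nk / len(data)
        theta[k] = [sum(r[i][k] * data[i][d] for i in range(len(data))) / nk
                    for d in range(D)]

def predict(d):
    """Marginal p(x_d = 1) under the fitted mixture (prior-weighted)."""
    return sum(pi[k] * theta[k][d] for k in range(K))

print(pi, predict(0))
```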
Bayesian Network
  • Use BN to capture object/item dependency
    • Each item/object is a node
    • (Dependency) structure is learned from all data
    • Model parameters: p(xk+1 | pa(xk+1)), where pa(xk+1) is the set of parents/predictors of xk+1 (represented as a decision tree)
  • Predict ratings using E[xk+1 | x1, …, xk]
Three-way Aspect Model (Popescul et al. 2001)
  • CF + content-based
  • Generative model
  • (u,d,w) as observations
  • z as hidden variable
  • Standard EM
  • Essentially clustering the joint data
  • Evaluation on ResearchIndex data
  • Found it’s better to treat (u,w) as observations
Evaluation Criteria (Breese et al. 98)
  • Rating accuracy
    • Average absolute deviation of predicted from actual ratings
    • Pa = the set of items predicted for user a
  • Ranking accuracy
    • Expected utility of the ranked list
    • Exponentially decaying viewing probability down the ranking
    • α (half-life) = the rank at which the viewing probability drops to 0.5
    • d = neutral rating
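The expected-utility metric can be sketched as follows, using the R-score formula as commonly stated for Breese et al. 98; the default d and alpha values are illustrative:

```python
def expected_utility(ranked_ratings, d=3.0, alpha=5.0):
    """R-score: R_a = sum_j max(v_aj - d, 0) / 2**((j - 1) / (alpha - 1)),
    where j is the 1-based rank, d the neutral rating, alpha the half-life
    (the rank where the viewing probability halves)."""
    return sum(max(v - d, 0.0) / 2 ** ((j - 1) / (alpha - 1))
               for j, v in enumerate(ranked_ratings, start=1))

# A ranking that front-loads the high ratings scores higher:
good_first = expected_utility([5, 4, 1, 1])
good_last = expected_utility([1, 1, 4, 5])
print(good_first, good_last)
```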
Results
  • BN & CR+ are generally better than VSIM & BC
  • BN is best with more training data
  • VSIM is better with little training data
  • Inverse User Frequency is effective
  • Case amplification is mostly effective

Summary of Rating-based Methods
  • Effectiveness
    • Both memory-based and model-based methods can be effective
    • The correlation method appears to be robust
    • Bayesian network works well with plenty of training data, but not very well with little training data
    • The cosine similarity method works well with little training data
Summary of Rating-based Methods (cont.)
  • Efficiency
    • Memory-based methods are slower than model-based methods at prediction time
    • Learning can be extremely slow for model-based methods
Preference-based Methods(Cohen et al. 99, Freund et al. 98)
  • Motivation
    • Explicit ratings are not always available, but implicit orderings/preferences might be available
    • Only relative ratings are meaningful, even when ratings are available
    • Combining preferences has other applications, e.g.,
      • Merging results from different search engines
A Formal Model of Preferences
  • Instances: O = {o1, …, on}
  • Ranking function R: (U ×) O × O → [0,1]
    • R(u,v) = 1 means u is strongly preferred to v
    • R(u,v) = 0 means v is strongly preferred to u
    • R(u,v) = 0.5 means no preference
  • Feedback: F = {(u,v)}, each pair meaning u is preferred to v
  • Minimize the loss over a hypothesis space of ranking functions: Loss(R, F) = (1/|F|) Σ(u,v)∈F (1 − R(u,v))

The Hypothesis Space H
  • Without constraints on H, the loss is minimized by any R that agrees with F
  • Appropriate constraint for collaborative filtering: restrict R to (weighted) combinations of the individual experts’ preference functions, R(u,v) = Σi wi Ri(u,v)
  • Compare this with the memory-based rating methods, which likewise combine other users’ normalized ratings
The Hedge Algorithm for Combining Preferences
  • Iteratively update the expert weights w1, w2, …, wn
  • Initialization: the wi are uniform
  • Updating: wi ← wi · β^Li, with β ∈ [0,1] and Li the loss of expert i
  • Li = 0 => weight stays the same
  • Li large => weight is sharply decreased
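One round of the Hedge update can be sketched as:

```python
def hedge_update(weights, losses, beta=0.5):
    """One round of Hedge: w_i <- w_i * beta**loss_i, then renormalize.
    losses lie in [0, 1]; beta in (0, 1] controls how aggressively
    poorly performing experts are discounted."""
    new = [w * beta ** l for w, l in zip(weights, losses)]
    z = sum(new)
    return [w / z for w in new]

w = [1 / 3] * 3                       # uniform initialization
w = hedge_update(w, [0.0, 0.5, 1.0])  # expert 0 was right, expert 2 wrong
print(w)
```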
Some Theoretical Results
  • The cumulative loss of RA (the combined preference function) will not be much worse than that of the best ranking expert/feature
  • Preferences RA => ordering ρ: L(ρ, F) ≤ DISAGREE(ρ, RA)/|F| + L(RA, F)
  • Need to find the ordering ρ that minimizes the disagreement
  • General case: NP-complete
A Greedy Ordering Algorithm
  • Represent the preferences R as a weighted directed graph
  • For each node, compute its potential value, i.e., total outgoing edge weight minus total incoming edge weight
  • Rank the node with the highest potential value above all others
  • Remove this node and its edges; repeat
  • Guarantees at least half of the optimal agreement
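A sketch of the greedy ordering; the preference graph is given as a dictionary of weighted edges (toy data):

```python
def greedy_order(nodes, pref):
    """Greedy ordering in the style of Cohen et al. 99: repeatedly pick the
    node maximizing potential = outgoing weight - incoming weight.
    pref: dict (u, v) -> weight, meaning u is preferred to v."""
    nodes = set(nodes)
    order = []
    while nodes:
        def potential(u):
            out_w = sum(w for (a, b), w in pref.items() if a == u and b in nodes)
            in_w = sum(w for (a, b), w in pref.items() if b == u and a in nodes)
            return out_w - in_w
        best = max(nodes, key=potential)   # highest potential goes next
        order.append(best)
        nodes.remove(best)                 # drop the node and its edges
    return order

# a beats b and c; b beats c: expect the order a, b, c
print(greedy_order({"a", "b", "c"},
                   {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0}))
```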
Improvement
  • Identify all the strongly connected components
  • Rank the components consistently with the edges between them
  • Rank the nodes within a component using the basic greedy algorithm
Evaluation of Ordering Algorithms
  • Measure: “weight coverage”
  • Datasets = randomly generated small graphs
  • Observations
    • The basic greedy algorithm works better than a random permutation baseline
    • Improved version is generally better, but the improvement is insignificant for large graphs
Metasearch Experiments
  • Task: Known item search
    • Search for an ML researcher’s homepage
    • Search for a university homepage
  • Search expert = variant of query
  • Learn to merge results of all search experts
  • Feedback
    • Complete : known item preferred to all others
    • Click data : known item preferred to all above it
  • Leave-one-out testing
Metasearch Results
  • Measures: compare combined preferences with individual ranking function
    • Sign test: which system tends to rank the known relevant item higher
    • Number of queries with the known relevant item ranked above rank k
    • Average rank of the known relevant item
  • The learned system beats every individual expert on all measures (not surprising; why?)
Direct Learning of an Ordering Function
  • Each expert is treated as a ranking feature fi: O → R ∪ {⊥} (⊥ allows items to be left unranked, i.e., partial rankings)
  • Given preference feedback Φ: X × X → R
  • Goal: learn a ranking function H that minimizes the loss
  • D(x0, x1): a distribution over X × X (in practice, a uniform distribution over the pairs with feedback), D(x0, x1) = c · max{0, Φ(x0, x1)}
The RankBoost Algorithm
  • Iteratively update D(x0, x1)
  • Initialization: D1 = D
  • For t = 1, …, T:
    • Train a weak learner using Dt
    • Get a weak hypothesis ht: X → R
    • Choose αt > 0
    • Update: Dt+1(x0, x1) = Dt(x0, x1) · exp(αt (ht(x0) − ht(x1))) / Zt, with Zt a normalizer
  • Final hypothesis: H(x) = Σt αt ht(x)
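A sketch of a single RankBoost round on the pair distribution, following the update rule above (toy data; the weak hypothesis is just a lookup):

```python
import math

def rankboost_update(D, h, alpha):
    """One RankBoost round on the pair distribution D.
    D: dict (x0, x1) -> weight, where x1 should be ranked above x0.
    h: weak ranking hypothesis, x -> float.
    Pairs the hypothesis gets right (h(x1) > h(x0)) are down-weighted;
    pairs it gets wrong gain weight for the next round."""
    new = {(x0, x1): d * math.exp(alpha * (h(x0) - h(x1)))
           for (x0, x1), d in D.items()}
    z = sum(new.values())                 # normalizer Z_t
    return {pair: d / z for pair, d in new.items()}

D = {("b", "a"): 0.5, ("c", "a"): 0.5}   # a preferred to b and to c
h = {"a": 1.0, "b": 0.0, "c": 2.0}.get   # right on (b, a), wrong on (c, a)
D = rankboost_update(D, h, alpha=1.0)
print(D)                                  # the misranked pair now dominates
```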
How to Choose t and Design ht ?
  • Bound on the ranking loss
  • Thus, we should choose t that minimizes the bound
  • Three approaches:
    • Numerical search
    • Special case: h is either 0 or 1
    • Approximation of Z, then find analytic solution
Efficient RankBoost for Bipartite Feedback

X0

Bipartite feedback:

Essentially binary classification

X1

Complexity at each round: O(|X0||X1|)  O(|X0|+|X1|)

Evaluation of RankBoost
  • Meta-search: same setup as in (Cohen et al. 99)
  • Perfect feedback
  • 4-fold cross validation
EachMovie Evaluation
  • Evaluated on the EachMovie dataset (statistics shown: number of users, movies per user, and feedback movies per user)

Summary
  • CF is “easy”
    • The user’s expectation is low
    • Any recommendation is better than none
    • Making it practically useful
  • CF is “hard”
    • Data sparseness
    • Scalability
    • Domain-dependent
Summary (cont.)
  • CF as a Learning Task
    • Rating-based formulation
      • Learn f: U x O -> R
      • Algorithms
        • Instance-based/memory-based (k-nearest neighbors)
        • Model-based (probabilistic clustering)
    • Preference-based formulation
      • Learn PREF: U x O x O -> R
      • Algorithms
        • General preference combination (Hedge), greedy ordering
        • Efficient restricted preference combination (RankBoost)
Summary (cont.)
  • Evaluation
    • Rating-based methods
      • Simple methods seem to be reasonably effective
      • Advantage of sophisticated methods seems to be limited
    • Preference-based methods
      • More effective than rating-based methods according to one evaluation
      • Evaluation on meta-search is weak
Research Directions
  • Exploiting complete information
    • CF + content-based filtering + domain knowledge + user model …
  • More “localized” kernels for instance-based methods
    • Predicting movies needs different “neighbor users” than predicting books
    • Suggestion: use items similar to the target item as features when finding neighbors
Research Directions (cont.)
  • Modeling time
    • There might be sequential patterns on the items a user purchased (e.g., bread machine -> bread machine mix)
  • Probabilistic model of preferences
    • Making the preference function a probability function, e.g., P(A>B|U)
    • Clustering items and users
    • Minimizing preference disagreements
References
  • Cohen, W.W., Schapire, R.E., and Singer, Y. (1999). "Learning to Order Things", Journal of AI Research, Volume 10, pp. 243-270.
  • Freund, Y., Iyer, R., Schapire, R.E., and Singer, Y. (1999). "An Efficient Boosting Algorithm for Combining Preferences", Machine Learning Journal.
  • Breese, J.S., Heckerman, D., and Kadie, C. (1998). "Empirical Analysis of Predictive Algorithms for Collaborative Filtering", Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 43-52.
  • Popescul, A. and Ungar, L.H. (2001). "Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments", UAI 2001.
  • Good, N., Schafer, J.B., Konstan, J., Borchers, A., Sarwar, B., Herlocker, J., and Riedl, J. (1999). "Combining Collaborative Filtering with Personal Agents for Better Recommendations", Proceedings of AAAI-99, pp. 439-446.