
Top-N Recommendation Algorithm Based on Item-Graph

Top-N Recommendation Algorithm Based on Item-Graph. Allen, Zhenjiang LIN CSE, CUHK June 7, 2007. Outline. 1. Top-N Recommendation Problem 2. Top-N Recommendation Algorithm 3. Item-Graph Model and GCP-based Method Item-Graph Model





Presentation Transcript


  1. Top-N Recommendation Algorithm Based on Item-Graph • Allen, Zhenjiang LIN • CSE, CUHK • June 7, 2007

  2. Outline • 1. Top-N Recommendation Problem • 2. Top-N Recommendation Algorithm • 3. Item-Graph Model and GCP-based Method • Item-Graph Model • Generalized Conditional Probability(GCP)-based Recommendation Algorithm • 4. Preliminary Experimental Results • 5. Conclusion and Future Work

  3. 1. Top-N Recommendation Problem • The Top-N Recommendation Problem • Given the preference information of users, recommend to a given user a set of N items that he or she might be interested in, based on the items he or she has already selected. • E-commerce system example: Amazon.com, customers vs. products. [Figure: user-item matrix and the active user's basket]

  4. Example: Amazon.com [Figure: the active user's basket and the resulting recommendations]

  5. 1. Top-N Recommendation Problem • Challenges in E-commerce Systems • Huge amounts of data: millions of users and/or items; • Results must be returned in real time; • Limited preference information for new users; • Volatile user preferences. • Contributions • Propose the Item-Graph model, • simple and incremental, • reflecting the relationships among items. • Develop the Generalized Conditional Probability-based top-N recommendation algorithm, • item-centric, • based on the Item-Graph model.

  6. 2. Top-N Recommendation Algorithm • Two main paradigms • Content-based: recommend items based on the content (textual information) of items. • Fab system [Balabanovic97], Syskill & Webert system [Pazzani97]. • Collaborative Filtering (CF): recommend items by collecting taste information from other users. • Collaboration between users (link information). • More popular than content-based recommendation, since in many domains (such as music and restaurants) it is hard to extract useful features from items. • Tapestry system [Goldberg92], Video Recommender [Hill95], Ringo [Shardanand95], GroupLens [Konstan97], Jester system [Goldberg01], Amazon [Linden03].

  7. 2. Top-N Recommendation Algorithm • CF algorithms classified by strategy of using data • Memory-based: make recommendations based on the entire collection of user preferences. • No pre-computing is needed, but these methods suffer from serious scalability problems. • E.g., Correlation-based [Resnick94], Cosine-based [Breese98]. • Model-based: use the collection of user preferences to learn a model, which is then used to make recommendations. • The model is built off-line, so these methods are more scalable. • E.g., Cluster models [Ungar98], Bayesian network model [Breese98], Association Rule Mining approach [Lin00].

  8. 2. Top-N Recommendation Algorithm • CF algorithms classified by strategy of using objects • User-centric: look for similar (like-minded) users first and then make recommendations. • Similarity between users is relatively dynamic. • Pre-computing user neighborhoods may lead to poor predictions. • Item-centric: look for similar (or related) items first and then make recommendations. • Similarity between items is relatively static. • Enables pre-computing of item-item similarity. • Therefore, more scalable. • The aim of our work • A model-based, item-centric CF top-N recommendation algorithm.

  9. 2. Top-N Recommendation Algorithm • Notations • Item set $I = \{I_1, I_2, \dots, I_m\}$. • User set $U = \{U_1, U_2, \dots, U_n\}$. • User-Item matrix $D = (D_{n,m})$. • Basket of the active user $B \subseteq I$. • Similarity score of x and y: sim(x, y). • Formal definition of the top-N recommendation problem • Given a user-item matrix D and a set of items B that have been purchased by the active user, identify an ordered set of items X such that $|X| \le N$ and $X \cap B = \emptyset$.

  10. 2. Top-N Recommendation Algorithm • Two classical item-item similarity measures • Cosine-based (symmetric): $\mathrm{sim}(I_i, I_j) = \cos(D_{*,i}, D_{*,j})$ (1) • Conditional Probability (CP)-based (asymmetric): $\mathrm{sim}(I_i, I_j) = P(I_j \mid I_i) \approx \mathrm{Freq}(I_i I_j) / \mathrm{Freq}(I_i)$ (2), where Freq(X) is the number of customers who have purchased the item set X. • The ranking score for item x: $\mathrm{RS}(x) = \sum_{b \in B} \mathrm{sim}(b, x)$ (3)
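The two similarity measures and the ranking score of Eqs. (1)-(3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy `BASKETS` data, the function names, and the alphabetical tie-breaking rule are hypothetical and do not come from the slides.

```python
import math
from typing import Callable, List, Set

# Hypothetical toy data: one set of purchased items per user (the rows of D).
BASKETS: List[Set[str]] = [
    {"a", "b"},
    {"a", "b", "c"},
    {"b", "c"},
    {"a", "d"},
]


def freq(items: Set[str]) -> int:
    """Freq(X): number of users who purchased every item in X."""
    return sum(1 for basket in BASKETS if items <= basket)


def cosine_sim(i: str, j: str) -> float:
    """Eq. (1) for 0/1 columns: Freq(ij) / sqrt(Freq(i) * Freq(j))."""
    denom = math.sqrt(freq({i}) * freq({j}))
    return freq({i, j}) / denom if denom else 0.0


def cp_sim(i: str, j: str) -> float:
    """Eq. (2): sim(i, j) = P(j | i), estimated as Freq(ij) / Freq(i)."""
    fi = freq({i})
    return freq({i, j}) / fi if fi else 0.0


def top_n(basket: Set[str], n: int, sim: Callable[[str, str], float]) -> List[str]:
    """Rank items outside the basket by RS(x) = sum of sim(b, x), Eq. (3)."""
    candidates = set().union(*BASKETS) - basket
    scores = {x: sum(sim(b, x) for b in basket) for x in candidates}
    # Ties are broken alphabetically; the source does not specify a tie rule.
    return sorted(scores, key=lambda x: (-scores[x], x))[:n]


print(top_n({"a"}, 2, cp_sim))
```

Note that `cp_sim("a", "c")` and `cp_sim("c", "a")` differ on this data, which is the asymmetry the slide refers to, while `cosine_sim` returns the same value in both directions.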

  11. 3. Item-Graph Model & GCP-based Method • Intuitions behind the Item-Graph • The similarity between two items is proportional to the number of times they have been co-purchased. • The similarity of item pairs is transmissible. • E.g., [Figure: items a, b, c connected by edges with weights 1 and 2] • Definition of the Item-Graph • Given a dataset $D = (D_{n,m})$, the Item-Graph is defined as a weighted, undirected graph G(V, E, W), where • V is the item set I. • An edge (x, y) ∈ E if and only if items x and y have been co-purchased. • The weight of edge (x, y) is the number of times items x and y have been co-purchased.

  12. 3. Item-Graph Model & GCP-based Method • Updating the Item-Graph is easy • Adding a new user's preference information T to the graph takes $O(|T|^2)$ operations, which consist of adding edges and/or increasing edge weights. • E.g., [Figure: Item-Graph on items a, b, c before and after adding the basket (a, b, c); edge weights 1, 2 before and 2, 3, 1 after] • Potential direct applications of the Item-Graph • Clustering the items. • Measuring item-item similarity. • Measuring importance of items.
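The incremental update can be illustrated with a small Python sketch. The class name `ItemGraph`, its methods, and the example baskets are assumptions made for illustration; the slides only specify that each co-purchased pair gets an edge whose weight counts the co-purchases.

```python
from collections import defaultdict
from itertools import combinations
from typing import DefaultDict, FrozenSet, Iterable


class ItemGraph:
    """Weighted, undirected graph: edge weight = number of co-purchases."""

    def __init__(self) -> None:
        # Missing edges implicitly have weight 0.
        self.weight: DefaultDict[FrozenSet[str], int] = defaultdict(int)

    def add_basket(self, basket: Iterable[str]) -> None:
        """Add one user's basket T: visits |T|(|T|-1)/2 pairs, i.e. O(|T|^2)."""
        for x, y in combinations(set(basket), 2):
            self.weight[frozenset((x, y))] += 1

    def w(self, x: str, y: str) -> int:
        """Weight of edge (x, y), or 0 if the items were never co-purchased."""
        return self.weight.get(frozenset((x, y)), 0)


# Hypothetical history: (a, b) bought together twice, (b, c) once.
graph = ItemGraph()
for basket in [("a", "b"), ("a", "b"), ("b", "c")]:
    graph.add_basket(basket)

# A new user buys (a, b, c): two edges gain weight and edge (a, c) is created.
graph.add_basket(("a", "b", "c"))
print(graph.w("a", "b"), graph.w("b", "c"), graph.w("a", "c"))
```

Building the graph from scratch is the same operation repeated once per user, so no separate batch procedure is needed in this sketch.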

  13. 3. Item-Graph Model & GCP-based Method • Ideas in the Generalized Conditional Probability-based method • According to the definition of the top-N recommendation problem, for any x in $I \setminus B$, we just need to compute the “basket-based” conditional probability P(x|B) = Freq(xB) / Freq(B). However, • Freq(xB) or Freq(B) may not exist, or • Freq(xB) or Freq(B) may be too small to be meaningful. • The CP-based method considers the sum of “1-item”-based conditional probabilities P(x|y) instead, where $x \in I \setminus B$, $y \in B$. • However, the “multi-item”-based conditional probabilities may also contribute to the recommendation. • E.g., suppose the ranking scores of x and y computed by the CP-based method are equal, and we also know P(x|B) > P(y|B). Which one should be ranked higher, x or y?

  14. 3. Item-Graph Model & GCP-based Method • The Generalized Conditional Probability (GCP)-based recommendation algorithm • The ranking score of item x is defined as the sum of all possible “multi-item”-based conditional probabilities, that is, $\mathrm{GCP}(x \mid B) = \sum_{S \subseteq B} P(x \mid S) \approx \sum_{S \subseteq B} \mathrm{Freq}(xS) / \mathrm{Freq}(S)$. (4) • However, the number of subsets of B is $2^{|B|}$. • Use $\mathrm{GCP}_d(x \mid B)$ instead (d = 2 in the following experiments): $\mathrm{GCP}_d(x \mid B) = \sum_{S \subseteq B,\ |S| \le d} P(x \mid S)$. (5) • Freq(xS) and Freq(S) can be extracted approximately from the Item-Graph.
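As a rough sketch of Eq. (5), the following Python computes the truncated score using exact frequencies counted from a list of baskets rather than from the Item-Graph. The toy data and names are hypothetical, and two choices are assumptions not stated in the slides: only non-empty subsets S are summed, and subsets with Freq(S) = 0 are skipped.

```python
from itertools import combinations
from typing import AbstractSet, List, Set

# Hypothetical toy data: one set of purchased items per user.
BASKETS: List[Set[str]] = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b"},
    {"b", "c"},
    {"a", "d"},
]


def freq(items: AbstractSet[str]) -> int:
    """Freq(X): number of users who purchased every item in X."""
    return sum(1 for basket in BASKETS if items <= basket)


def gcp_d(x: str, basket: Set[str], d: int = 2) -> float:
    """Sum of P(x | S) over non-empty subsets S of the basket with |S| <= d."""
    score = 0.0
    for size in range(1, d + 1):
        for subset in combinations(sorted(basket), size):
            s = frozenset(subset)
            fs = freq(s)
            if fs:  # assumption: subsets nobody purchased contribute nothing
                score += freq(s | {x}) / fs
    return score


# With d = 1 the score reduces to the CP-based ranking score of Eq. (3).
print(gcp_d("c", {"a", "b"}, d=1))
# With d = 2 the pair term P(c | ab) is added on top of the 1-item terms.
print(gcp_d("c", {"a", "b"}, d=2))
```

For a basket of size |B|, the loop visits on the order of |B|^d subsets instead of 2^|B|, which is the point of the truncation.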

  15. 3. Item-Graph Model & GCP-based Method • Extracting Freq(A) from the Item-Graph approximately • For an item set A, it may not be possible to obtain the exact Freq(A) from the Item-Graph. • Extract an approximate Freq(A) from the Item-Graph instead. • Find the complete sub-graph of A (denoted by CSG(A)) in the Item-Graph; running time $O(|A|^2)$. • Freq(A) ≈ minimal weight of edges in CSG(A). • E.g., [Figure: Item-Graph on items a, b, c with edge weights 2, 3, and 1] • for A = {a,b}, Freq(A) ≈ 3. • for B = {a,b,c}, Freq(B) ≈ 1. • P(c|ab) ≈ Freq(abc) / Freq(ab) ≈ 1 / 3.
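The minimum-edge-weight approximation can be checked against the worked example with a short Python sketch. The edge weights below are an assumption: the example fixes the weight of edge (a, b) at 3 and the minimum at 1, but which of the other two edges carries 2 and which carries 1 is a guess. The handling of single-item sets is also not specified in the slides, so this sketch rejects them.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set

# Hypothetical Item-Graph: edge (a, b) has weight 3 as in the example;
# the assignment of weights 2 and 1 to the other edges is an assumption.
WEIGHTS: Dict[FrozenSet[str], int] = {
    frozenset({"a", "b"}): 3,
    frozenset({"b", "c"}): 2,
    frozenset({"a", "c"}): 1,
}


def approx_freq(items: Set[str]) -> int:
    """Freq(A) ~ minimal edge weight in the complete sub-graph on A."""
    pairs = list(combinations(sorted(items), 2))
    if not pairs:
        # The edge-only graph stores no counts for single items.
        raise ValueError("need at least two items")
    # A missing edge means the pair was never co-purchased, so the estimate is 0.
    return min(WEIGHTS.get(frozenset(pair), 0) for pair in pairs)


def approx_cp(x: str, given: Set[str]) -> float:
    """P(x | S) ~ Freq(xS) / Freq(S), both read off the Item-Graph."""
    denom = approx_freq(given)
    return approx_freq(given | {x}) / denom if denom else 0.0


print(approx_freq({"a", "b"}), approx_freq({"a", "b", "c"}))
print(approx_cp("c", {"a", "b"}))
```

The estimate is an upper bound on the true frequency, since a set of items cannot be bought together more often than its least frequent pair.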

  16. 4. Preliminary Experimental Results • Dataset • MovieLens (http://www.grouplens.org/data) • A web-based movie recommender system; • Contains multi-valued ratings that indicate how much each user liked a particular movie; • Each user has rated at least 20 movies. • We treat a rating as an indication of whether the user has seen the movie (nonzero) or not (zero). Table 1: The characteristics of the MovieLens dataset [table not preserved]. ¹Density: the percentage of nonzero entries in the user-item matrix.

  17. 4. Preliminary Experimental Results-1 • Evaluation Design • Split the dataset into a training set and a test set by • randomly selecting one rated movie of each user to be part of the test set, and • using the remaining rated movies for training. • Cosine (COS)-based, CP-based, and GCP-based methods; results averaged over 10 runs. • Evaluation Metrics • Hit-Rate (HR): HR = (# of hits) / n (6) • Average Reciprocal Hit-Rate (ARHR): $\mathrm{ARHR} = \frac{1}{n} \sum_{i=1}^{h} \frac{1}{p_i}$ (7) • # of hits: the number of items in the test set that were also in the top-N lists. • h is the number of hits, which occurred at positions $p_1, p_2, \dots, p_h$ within the top-N lists (i.e., $1 \le p_i \le N$).
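Both metrics are straightforward to compute once each user has a top-N list and one held-out item. The following Python sketch assumes that n is the number of users with a held-out item; the function name and the toy lists are hypothetical.

```python
from typing import Dict, List, Tuple


def hit_rate_and_arhr(
    top_n_lists: Dict[str, List[str]],
    held_out: Dict[str, str],
) -> Tuple[float, float]:
    """Return (HR, ARHR) as in Eqs. (6) and (7).

    top_n_lists maps each user to an ordered recommendation list;
    held_out maps each user to the single test item hidden from training.
    """
    n = len(held_out)
    hits = 0
    reciprocal_sum = 0.0
    for user, item in held_out.items():
        recs = top_n_lists.get(user, [])
        if item in recs:
            hits += 1
            # Positions are 1-based: a hit at the top of the list counts 1.
            reciprocal_sum += 1.0 / (recs.index(item) + 1)
    return hits / n, reciprocal_sum / n


# Hypothetical example with three users and top-2 lists.
held_out = {"u1": "x", "u2": "y", "u3": "z"}
top_n_lists = {"u1": ["x", "q"], "u2": ["q", "y"], "u3": ["q", "r"]}
hr, arhr = hit_rate_and_arhr(top_n_lists, held_out)
print(hr, arhr)
```

HR treats every hit equally, while ARHR rewards hits that appear near the top of the list, so the two can rank methods differently.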

  18. 4. Preliminary Experimental Results-1 • Performance of Top-N Recommendation Algorithms [Figure] HR (left): x-axis: top-N items, y-axis: hit-rate of all users. ARHR (right): x-axis: top-N items, y-axis: average reciprocal hit-rate of all users. (For the GCP-based method, d = 2.)

  19. 4. Preliminary Experimental Results-2 • Testing the Parameter d in the GCP Method • Testing the effect of d (d = 1, 2, 3). • Evaluation: Online Shopping Simulation • Randomly select part of the user records to be the training set; • Use the remaining user records for testing. • STEP 0: Construct the item-graph from the training set; • STEP 1: for each user in the test set • randomly remove one item from the user’s basket and make recommendations based on the remaining items in the basket; • compute the rank of this item in the recommendation list; • update the item-graph. • STEP 2: Compute the HR and ARHR metrics.

  20. 4. Preliminary Experimental Results-2 • Performance of Top-N Recommendation Algorithms [Figure] HR (left): x-axis: top-N items, y-axis: hit-rate of all users. ARHR (right): x-axis: top-N items, y-axis: average reciprocal hit-rate of all users.

  21. 5. Conclusion and Future Work • Conclusion • Top-N Recommendation Problem and item-centric Algorithms • Cosine-based, conditional probability-based • Item-Graph model • Visualizing the relationship among items. • Easy to update. • Generalized Conditional Probability-based top-N recommendation algorithm • Item-centric & based on the Item-Graph model • Future Work • Clustering items and measuring item-item similarities based on the Item-Graph model • Speeding up the GCP method.

  22. References • [Balabanovic97] M. Balabanovic and Y. Shoham. Fab: Content-based, Collaborative Recommendation. Commun. ACM, 40(3):66-72, 1997. • [Breese98] J. S. Breese, D. Heckerman, and C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43-52, San Francisco, 1998. • [Deshpande04] M. Deshpande and G. Karypis. Item-based Top-N Recommendation Algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004. • [Lin00] W. Lin. Association Rule Mining for Collaborative Recommender Systems. Thesis submitted for the Degree of M.S. in Computer Science. • [Linden03] G. Linden, B. Smith, and J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1):76-80, 2003. • [Resnick94] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proc. Computer Supported Cooperative Work Conf., pages 175-186, 1994.
