Explore the shift from recommending single items to recommending packages, impacting applications like trip planning and social media. Learn about composite recommendation systems, related works, algorithms, experiments, and conclusions.
Breaking out of the Box of Recommendations: From Items to Packages • M. Xie1, L. Lakshmanan1, P. Wood2 • 1Univ. of British Columbia, 2Univ. of London • RecSys '10 • 2011. 01. 14. • Summarized and Presented by Sang-il Song, IDS Lab., Seoul National University
Introduction • Classical recommender systems provide recommendations consisting of single items • Several applications can benefit from a system capable of recommending packages of items • Trip planning • Recommending tweeters to follow • There may be a notion of compatibility among the items in a set, modeled in the form of constraints • No more than 3 museums in a package • The total distance covered in visiting all POIs in a package should be ≤ 10 km
Contents • Composite Recommendation • Related Works • System Architectures • Problem Statements • 0/1 knapsack problem • Algorithms • InsOpt-CR • Greedy-CR • Experiments • Conclusions / Discussion
Composite Recommendation • Each item has • A value (rating or score) • A cost • Given a maximum total cost (budget) • Find a set of items with the highest total value whose total cost is within the budget • Assumptions • Items can be accessed in non-increasing order of value • There are information sources providing the cost associated with each item • The number of items is very large, and access to their ratings can be relatively expensive
Related Works • CARD (RecSys '08), FlexRecs (SIGMOD '09) • Comprehensive frameworks • Users can specify their recommendation preferences using relational query languages extended with additional features or operators • A. Angel et al. (EDBT '09) • Finding packages of entities • CourseRank (RecSys '09) • Provides course recommendations to students • Based on the ratings given to courses by past students and subject to the constraints of degree requirements
System Architecture Figure 1. System Architecture • External Cost Source • Provides the cost of a given item • Can be a local database or a web service • Compatibility Checker • Checks whether a package satisfies compatibility constraints
Top-k Composite Recommendation • Find the top-k packages P1,…,Pk such that • Each Pi is feasible (the total cost of Pi ≤ budget B) • P1,…,Pk have the k highest total values among all feasible packages • v(P) ≤ v(Pi) for all feasible packages P ∉ {P1,…,Pk} • Top-1 composite recommendation problem (CompRec) • A variation of the 0/1 knapsack problem • Items can be accessed only in non-increasing order of value • Background information can be a histogram collected from the external cost source or something as simple as a minimum item cost cmin
0/1 Knapsack Problem • Maximize $\sum_{i=1}^{n} v(t_i)\, x_i$ • Subject to $\sum_{i=1}^{n} c(t_i)\, x_i \le B$, $x_i \in \{0, 1\}$ • NP-complete
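Notations • S = {t1,…,tn} : set of items • SSi,v : subset of {t1,…,ti} whose total value is exactly v and whose total cost is minimized • i ∈ {1, …, n} • v ∈ {1, …, n·v(t1)} • C(i,v) : the cost of SSi,v (∞ if no such subset exists) • It can be calculated by the following recursive function: $C(i+1, v) = \min\{\, C(i, v),\; c(t_{i+1}) + C(i, v - v(t_{i+1})) \,\}$ if $v(t_{i+1}) \le v$, and $C(i+1, v) = C(i, v)$ otherwise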
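The table C(i, v) can be filled bottom-up. The following is a minimal sketch, assuming the standard value-indexed knapsack recurrence (skip item i+1, or take it when its value does not exceed v) and positive integer item values. The function names `min_cost_table` and `best_value` are illustrative and do not come from the slides.

```python
from math import inf


def min_cost_table(values, costs):
    """Return C where C[i][v] is the minimum cost of a subset of the
    first i items whose total value is exactly v (inf if impossible).

    Assumes values are positive integers; costs may be any non-negative numbers.
    """
    n = len(values)
    top = n * max(values)  # no subset can exceed n times the largest value
    C = [[inf] * (top + 1) for _ in range(n + 1)]
    C[0][0] = 0  # the empty set has value 0 and cost 0
    for i in range(n):
        for v in range(top + 1):
            # Option 1: do not use item i.
            C[i + 1][v] = C[i][v]
            # Option 2: use item i, if its value fits into v.
            if values[i] <= v:
                C[i + 1][v] = min(C[i + 1][v], costs[i] + C[i][v - values[i]])
    return C


def best_value(values, costs, budget):
    """Largest total value reachable with total cost <= budget."""
    C = min_cost_table(values, costs)
    return max(v for v, cost in enumerate(C[len(values)]) if cost <= budget)
```

For example, `best_value([3, 2, 2], [4, 2, 2], 4)` picks the two items of value 2 rather than the single item of value 3. The running time is proportional to n times the value range, which is why the approach is pseudo-polynomial.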
Algorithm 1: MaxValBound • The value V* returned by MaxValBound is an upper bound on the value of the optimal solution (Lemma 1)
Algorithm 2: InsOpt-CR • One item is retrieved from the source at each iteration of the algorithm (line 3) • A pseudo-polynomial algorithm finds an optimal solution R0 over the items accessed so far (line 5) • If v(R0) ≥ ½ V*, the algorithm terminates (lines 7-8) • V* ≥ v(OPT) and v(R0) ≥ ½ V* ⇒ v(R0) ≥ ½ v(OPT)
Example • cmin = 0.5, vmin = 1 • After accessing the first 101 items, S = {t1,…,t101} • R0 = {t1} ∪ {t3,…,t101} • v(R0) = 200 ≥ ½ · 398 = ½ · V* • Example • Budget B = 199 • The value and cost of each item are as follows:
Instance Optimality • Definition • Let 𝓐 be a class of algorithms and 𝓘 a class of problem instances • Let cost(A, I) be a non-negative cost measure of running algorithm A ∈ 𝓐 on instance I ∈ 𝓘 • An algorithm A ∈ 𝓐 is instance optimal over 𝓐 and 𝓘 if, for every A′ ∈ 𝓐 and every I ∈ 𝓘, cost(A, I) ≤ c · cost(A′, I) + c′ for constants c and c′ • InsOpt-CR is instance optimal over 𝓐 and 𝓘 with an optimality ratio of one • This means that any other 2-approximation algorithm that can only access items in non-increasing order of their value must access at least as many items as InsOpt-CR (up to the additive constant c′)
Greedy Algorithms • Instance optimal algorithms rely on an exact algorithm for the knapsack problem, which may lead to high computational cost • Greedy-CR is not instance optimal
Example • cmin = 0.5, vmin = 1 • After accessing the first 101 items, S = {t1,…,t101} • RG = {t1} • v(RG) = 101 < ½ · 398 = ½ · V* • Greedy-CR continues accessing new items; it accesses another 98 items before it stops • Example • Budget B = 199 • The value and cost of each item are as follows:
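The slides do not spell out the internals of Greedy-CR, so the following is only a sketch of the textbook greedy 2-approximation for 0/1 knapsack, which avoids the exact pseudo-polynomial solver: take items in decreasing value/cost ratio until one does not fit, then return the better of that prefix and the single most valuable item. The function name `greedy_knapsack` and the `(name, value, cost)` tuple layout are assumptions made for illustration.

```python
def greedy_knapsack(items, budget):
    """Textbook greedy 2-approximation for 0/1 knapsack.

    items: list of (name, value, cost) tuples with cost > 0.
    Returns (chosen_names, total_value).
    """
    # Items that cannot fit on their own can never be part of a package.
    feasible = [item for item in items if item[2] <= budget]
    if not feasible:
        return [], 0

    # Take items in decreasing value/cost ratio until one does not fit.
    by_ratio = sorted(feasible, key=lambda item: item[1] / item[2], reverse=True)
    chosen, total_value, remaining = [], 0, budget
    for name, value, cost in by_ratio:
        if cost > remaining:
            break
        chosen.append(name)
        total_value += value
        remaining -= cost

    # The prefix alone can be arbitrarily bad; comparing it with the best
    # single item is what yields the factor-2 guarantee.
    best_single = max(feasible, key=lambda item: item[1])
    if best_single[1] > total_value:
        return [best_single[0]], best_single[1]
    return chosen, total_value
```

With items `[("a", 10, 10), ("b", 2, 1), ("c", 2, 1)]` and budget 10, the ratio-ordered prefix is `b, c` with value 4, so the single item `a` with value 10 is returned instead.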
Top-k Composite Recommendation • Extends the top-1 Composite Recommendation • Apply Lawler's procedure to InsOpt-CR • Lawler's procedure • General technique for finding the top-k answers to an optimization problem • Step 1. Compute an optimal solution $x = \langle x_1, \ldots, x_n \rangle$ • Step 2. Suppose the k-th solution $x^{(k)}$ was obtained with the values of $x_1, \ldots, x_s$ fixed. Create (n-s) new problems by fixing the remaining variables as follows: • (1) $x_{s+1} = 1 - x_{s+1}^{(k)}$ • (2) $x_{s+1} = x_{s+1}^{(k)}$, $x_{s+2} = 1 - x_{s+2}^{(k)}$ • … • (n-s) $x_{s+1} = x_{s+1}^{(k)}$, $x_{s+2} = x_{s+2}^{(k)}$, …, $x_n = 1 - x_n^{(k)}$
Lawler's procedure • Figure: the variables fixed in the optimal solution are kept, and the remaining search space is split into Problem 1, Problem 2, …, Problem (n-s) • Computational complexity: O(k·n·c(n)) • c(n): the cost of solving a single optimization problem
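The branching step of Lawler's procedure can be sketched as follows. This is an illustration only: it generates the (n-s) sets of variable fixings and leaves solving each subproblem to whatever optimizer is in use. The function name `lawler_subproblems` and the dictionary representation are assumptions, and indices are 0-based here whereas the slides count from 1.

```python
def lawler_subproblems(solution, s):
    """Split the search space around a 0/1 solution.

    solution: list of 0/1 values, the current solution.
    s: number of leading variables that were already fixed to obtain it.
    Returns a list of n - s dicts, each mapping variable index -> fixed value.
    """
    n = len(solution)
    already_fixed = {i: solution[i] for i in range(s)}
    problems = []
    for j in range(s, n):
        fixed = dict(already_fixed)
        # Variables between s and j keep their value from the current solution.
        for i in range(s, j):
            fixed[i] = solution[i]
        # Variable j is flipped, which excludes the current solution itself.
        fixed[j] = 1 - solution[j]
        problems.append(fixed)
    return problems
```

For `solution = [1, 0, 1]` and `s = 1`, the result is `[{0: 1, 1: 1}, {0: 1, 1: 0, 2: 0}]`. The subproblems are pairwise disjoint and together cover every assignment that agrees with the fixed prefix except the current solution, so the next-best answer is the best optimum among them.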
Boolean Compatibility Constraints • If a package fails the compatibility check, discard it and search for the next candidate package • The modified InsOpt-CR-Topk algorithm is still instance optimal
Experiments • Goals of the experiments • Evaluate the relative quality of InsOpt-CR and Greedy-CR compared to the optimal algorithm • Evaluate the relative efficiency of the algorithms with respect to the number of items accessed and the actual run time • Datasets • MovieLens (ratings for movies) • Cost: running time of a movie • TripAdvisor (ratings for POIs) • Cost: number of reviews • The more popular a POI is, the more likely it is to be crowded or to have expensive tickets • Synthetic datasets (correlated and uncorrelated)
Quality of Recommendation Packages • Table 1. Quality Comparison for Different Composite Recommendation Algorithms • The approximation algorithms do indeed return top-k composite packages whose value is guaranteed to be a 2-approximation of the optimal • The approximation algorithms often recommend packages with high average value
Normalized Discounted Cumulative Gain (NDCG) • A measure of the effectiveness of web search engine algorithms • Uses a graded relevance scale for documents • Assumptions • Highly relevant documents are more useful when appearing earlier in a search engine result list (have higher ranks) • Highly relevant documents are more useful than marginally relevant documents, which are in turn more useful than irrelevant documents
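A minimal sketch of how such a score is computed, assuming the common log2 discount (gain at rank r divided by log2(r + 1)); other variants of the gain and discount exist, and the slides do not say which one the experiments use. The function names `dcg` and `ndcg` are illustrative.

```python
from math import log2


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevances."""
    return sum(rel / log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

Both assumptions show up directly: a ranking already in descending relevance order, such as `[3, 2, 1]`, scores 1.0, while the same documents in the order `[1, 2, 3]` score lower because the highly relevant document is discounted at rank 3.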
NDCG Example • Result 1: D3, D5 are missing; NDCG = 1.11 • Result 2: D1 is missing; NDCG = 1.13 • Result 1 is closer to optimal
Quality of Recommended Packages • Figure 2. NDCG Score for Top-k Packages • The greedy algorithm can achieve a very similar overall top-k package quality compared to the instance optimal algorithm • Both approximation algorithms have a very small NDCG score
Efficiency Study • Figure 3. (a)-(d) Running Time for Different Datasets; (e)-(h) Access Cost for Different Datasets • Greedy-CR-Topk has excellent performance in terms of both running time and access cost, except on the correlated synthetic dataset
Conclusions • Recommending packages consisting of sets of items • Generating top-k packages • Compatible • Under a cost budget • Two 2-approximation algorithms • InsOpt-CR-Topk (instance optimal) • Greedy-CR-Topk (faster) • Experiments show that the two proposed algorithms • Produce high-quality packages • Are fast and practical
Discussion • Contribution • Composite Recommendation Modeling: budget • Proposing approximation algorithms with proofs • Good quality and fast • Issues • Is their cost model useful in practice? • The cost model is too simple and idealized • The proposed algorithms seem to be variations of knapsack problem solutions • The choice of cost in the experiments is questionable • No comparison with other algorithms • Baseline: worst case