Recommendation System PengBo Dec 4, 2010
Book Recommendation • Santiago Ramón y Cajal (Spain) • 1906 Nobel Prize in Physiology or Medicine: "principal representative and advocate of modern neuroscience" • "The most important problems are already solved" • "Excessive concern with applied science" • "Belief that one lacks the ability"
Outline Today • What: Recommendation Systems • How: • Collaborative Filtering (CF) algorithms • Evaluation of CF algorithms
The Problem • Beyond retrieval and classification, what other, more effective means are there?
This title is a textbook-style exposition on the topic, with its information organized very clearly into topics such as compression, indexing, and so forth. In addition to diagrams and example text transformations, the authors use "pseudo-code" to present algorithms in a language-independent manner wherever possible. They also supplement the reading with mg--their own implementation of the techniques. The mg C language source code is freely available on the Web.
Everyday Examples of Recommendation Systems… • Bestseller lists • Top 40 music lists • The "recent returns" shelf at the library • Many weblogs • "Read any good books lately?" • .... • Common insight: personal tastes are correlated: • If Mary and Bob both like X and Mary likes Y, then Bob is more likely to like Y • especially (perhaps) if Bob knows Mary
Correlation Between Two Random Variables • Mean: μ_X = E[X] • Standard deviation: σ_X = √(E[(X − μ_X)²]) • Pearson's correlation: ρ_{X,Y} = E[(X − μ_X)(Y − μ_Y)] / (σ_X σ_Y) • indicates the degree of linear dependence between the variables (computed in the sketch below)
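To make these concrete, here is a minimal sketch in plain Python (the rating lists and function name are illustrative, not from the slides):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations (population form).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Two users' ratings of the same five items: strongly correlated tastes.
print(pearson([5, 4, 1, 2, 3], [4, 5, 2, 1, 3]))  # 0.8
```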
Rec System: Applications • E-commerce • Product recommendations - Amazon • Corporate intranets • Recommendation, finding domain experts, … • Digital libraries • Finding pages/books people will like • Medical applications • Matching patients to doctors, clinical trials, … • Customer relationship management • Matching customer problems to internal experts
Recommendation Systems • Given a set of users and items • Items can be documents, products, other users, … • Recommend items to a user based on: • attributes of users and items • age, genre, price, … • the past behavior of this user and of other users • Who has viewed/bought/liked what? • to help people • make decisions • maintain awareness
Recommender systems are software applications that aim to support users in their decision-making while interacting with large information spaces. • Recommender systems help overcome the information overload problem by exposing users to the most interesting items, and by offering novelty, surprise, and relevance.
The Web, they say, is leaving the era of search and entering one of discovery. What's the difference? Search is what you do when you're looking for something. Discovery is when something wonderful that you didn't know existed, or didn't know how to ask for, finds you.
Ad Hoc Retrieval and Filtering • Ad hoc retrieval: the document collection stays fixed ("fixed size") while queries Q1, Q2, Q3, Q4, Q5 arrive one after another.
Ad Hoc Retrieval and Filtering • Filtering: the user's need stays fixed. A stream of documents is matched against each user's standing profile, yielding docs for User 1 and docs filtered for User 2.
Inputs - more detail • Explicit role/domain/content info: • content/attributes of documents • document taxonomies • role in an enterprise • interest profiles • Past transactions/behavior info from users: • which docs viewed, browsing history • search(es) issued • which products purchased • pages bookmarked • explicit ratings (movies, books, …) • This input space is large and extremely sparse.
The Recommendation Space • Users and Items • User-User links: derived from similar attributes, explicit connections • Item-Item links: derived from similar attributes, similar content, explicit cross-references • Observed preferences: ratings, purchases, page views, laundry lists, play lists
Definitions • recommendation system • a system that provides recommendations/predictions/opinions on items to a user • Rule-based systems use manual rules to do this • An item similarity/clustering system • uses item links • A classic collaborative filtering system • uses links between users and items • Commonly one has hybrid systems • that use all three kinds of links above
Link types • User attributes-based Recommendation • Male, 18-35: Recommend The Matrix • Item attributes-based Content Similarity • You liked The Matrix: recommend The Matrix Reloaded • Collaborative Filtering • People with interests like yours also liked Forrest Gump
Example - behavior only • U1 viewed d1, d2, d3. U2 views d1, d2. • Recommend d3 to U2.
Expert finding - simple example • U1 and U2 both viewed d1, d2, d3. • Recommend U1 to U2 as someone to talk to?
Simplest Algorithm: Naïve k Nearest Neighbors • U viewed d1, d2, d5. • Look at who else (say V, W) viewed d1, d2, or d5. • Recommend to U the most "popular" doc among those users, as sketched below.
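A minimal sketch of this naïve scheme, assuming a small hypothetical view history (all names and data are illustrative):

```python
from collections import Counter

# Hypothetical view history: user -> set of viewed docs.
views = {
    "U": {"d1", "d2", "d5"},
    "V": {"d1", "d2", "d3"},
    "W": {"d2", "d5", "d3", "d4"},
}

def naive_recommend(target, views, k=1):
    """Recommend the docs most popular among users who share a viewed doc with target."""
    seen = views[target]
    votes = Counter()
    for user, docs in views.items():
        if user == target or not (docs & seen):
            continue  # only users who viewed at least one doc in common
        votes.update(docs - seen)  # vote for docs the target has not seen
    return [doc for doc, _ in votes.most_common(k)]

print(naive_recommend("U", views))  # ['d3'] — viewed by both V and W
```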
Simple algorithm - shortcoming • It treats all other users equally. • In fact, past behavior data shows that different users (V, W) resemble U to different degrees. • How can we improve this? How do we weight each user's importance to U?
Matrix View • Users-Items matrix A • A_ij = 1 if user i viewed item j, • = 0 otherwise. • # of items viewed in common by pairs of users: the entries of AAᵀ
Voting Algorithm • r_i, the ith row vector of AAᵀ: • its jth entry is the # of items viewed by both user i and user j. • r_i A is a vector: • its kth entry gives a weighted vote count for item k. • Recommend the items with the highest vote counts (see the sketch below).
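A possible NumPy rendering of the matrix formulation on a toy 0/1 matrix (the data is illustrative; real matrices are huge and sparse, as the next slide notes):

```python
import numpy as np

# Rows = users (0..2), columns = items (0..4); A[i, j] = 1 if user i viewed item j.
A = np.array([[1, 1, 0, 0, 1],   # user 0 viewed items 0, 1, 4
              [1, 1, 1, 0, 0],   # user 1
              [0, 1, 1, 1, 1]])  # user 2

co_views = A @ A.T        # (i, j) entry: # of items viewed by both user i and user j
r = co_views[0]           # row vector r_0 for user 0
scores = r @ A            # kth entry: weighted vote count for item k
scores[A[0] == 1] = 0     # drop items user 0 has already viewed
print(scores.argmax())    # index of the top-voted unseen item (here: item 2)
```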
Voting Algorithm - implementation issues • Do not implement it directly as matrix operations: • use weight propagation on compressed adjacency lists (sketched below). • Maintain the "user views doc" information in a log: • typically, log into a database • update the vote-propagating structures periodically. • Keep only the largest few weights in r_i for efficiency: • only in the fast structures, not in the back-end database.
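One way the weight-propagation idea might look over adjacency lists, with plain dictionaries standing in for the compressed structures and the database log (all data illustrative):

```python
from collections import defaultdict

# Adjacency lists (hypothetical): user -> viewed docs, doc -> viewers.
user_docs = {"U": ["d1", "d2"], "V": ["d1", "d3"], "W": ["d2", "d3"]}
doc_users = defaultdict(list)
for u, docs in user_docs.items():
    for d in docs:
        doc_users[d].append(u)

def recommend(target, top_n=1):
    # Pass 1: propagate from target's docs to co-viewing users (co-view counts = r_i).
    weights = defaultdict(int)
    for d in user_docs[target]:
        for u in doc_users[d]:
            if u != target:
                weights[u] += 1
    # In production, keep only the largest few weights at this point.
    # Pass 2: propagate user weights back to their unseen docs (vote counts = r_i A).
    votes = defaultdict(int)
    seen = set(user_docs[target])
    for u, w in weights.items():
        for d in user_docs[u]:
            if d not in seen:
                votes[d] += w
    return sorted(votes, key=votes.get, reverse=True)[:top_n]

print(recommend("U"))  # ['d3']: voted for by both V and W
```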
Different setting/algorithm • User i gives ratings: • a real-valued rating V_ik for item k • Each user i has a ratings vector v_i • sparse, with many empty entries • Compute the correlation coefficient between each pair of users i, j: • w_ij measures how much the user pair i, j agrees
Predict user i's utility for item k • As in the voting algorithm, w_i V is a vector: • sum over users j such that V_jk is non-zero: • ∑_j w_ij V_jk • Recommend item k to user i according to this value.
Correlation Coefficient w_{a,i} • K-nearest neighbor • Cosine distance (from IR) • Pearson correlation coefficient (Resnick '94, GroupLens): • w_{a,i} = Σ_j (v_{a,j} − v̄_a)(v_{i,j} − v̄_i) / √( Σ_j (v_{a,j} − v̄_a)² · Σ_j (v_{i,j} − v̄_i)² )
Same algorithm, different scenario • Implicit (user views item) vs. explicit (user assigns rating to item) • Boolean vs. real-valued utility • In practice, user ratings on a form (say, a scale of 1-5) must be converted to real-valued utilities • This can be a fairly complicated mapping • Likeminds function (Greening white paper) • It requires understanding the user's interpretation of the form
Real data problems • Each user has their own rating bias (some rate everything high, others everything low)
User Nearest Neighbor Algorithm • v_{i,j} = vote of user i on item j • I_i = items for which user i has voted • Mean vote for i is: v̄_i = (1/|I_i|) Σ_{j∈I_i} v_{i,j} • User u,v similarity is: w(u,v) = Σ_j (v_{u,j} − v̄_u)(v_{v,j} − v̄_v) / √( Σ_j (v_{u,j} − v̄_u)² · Σ_j (v_{v,j} − v̄_v)² ), summing over j ∈ I_u ∩ I_v • Subtracting each user's mean vote avoids overestimating the similarity of users who happen to have rated a few items identically
User Nearest Neighbor Algorithm • Select user u's nearest-neighbor set V and compute u's predicted vote on item j as: • v̂_{u,j} = v̄_u + κ Σ_{v∈V} w(u,v)(v_{v,j} − v̄_v), where κ normalizes the weights • How about an Item Nearest Neighbor version?
Nearest-Neighbor CF • Basic principle: use the user's vote history to predict future votes/recommendations • based on "nearest neighbors" • A typical normalized prediction scheme: • p_{a,j} = v̄_a + κ Σ_i w(a,i)(v_{i,j} − v̄_i) • goal: predict the vote on item j based on other users, weighted towards those whose past votes are similar to target user a's (a sketch follows below)
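A compact sketch of the whole scheme under the formulas above: mean-centered Pearson weights plus the normalized prediction, with κ chosen as 1/Σ|w|. The rating matrix and names are illustrative:

```python
import numpy as np

# Ratings matrix (0 = not rated), rows = users, columns = items.
V = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 4, 4]], dtype=float)

def mean_vote(i):
    rated = V[i] > 0
    return V[i][rated].mean()

def pearson_w(a, i):
    """Similarity w(a, i) over the items both users rated."""
    common = (V[a] > 0) & (V[i] > 0)
    if common.sum() < 2:
        return 0.0
    da = V[a][common] - mean_vote(a)
    di = V[i][common] - mean_vote(i)
    denom = np.sqrt((da ** 2).sum() * (di ** 2).sum())
    return float(da @ di / denom) if denom else 0.0

def predict(a, j):
    """p(a, j) = mean(a) + kappa * sum_i w(a, i) * (v(i, j) - mean(i))."""
    num, norm = 0.0, 0.0
    for i in range(V.shape[0]):
        if i == a or V[i, j] == 0:
            continue  # only users who actually rated item j contribute
        w = pearson_w(a, i)
        num += w * (V[i, j] - mean_vote(i))
        norm += abs(w)  # kappa = 1 / sum of |weights|
    return mean_vote(a) + num / norm if norm else mean_vote(a)

print(predict(1, 1))  # predicted vote of user 1 on item 1
```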
Challenges of Nearest-Neighbor CF • What is the optimal weight calculation to use? • It requires fine-tuning of the weighting algorithm for the particular data set • What do we do when the target user has not voted enough to provide a reliable set of nearest neighbors? • One approach: use default votes (popular items) to populate the matrix for items that neither the target user nor the nearest neighbor has voted on • A different approach: model-based prediction using Dirichlet priors to smooth the votes • Other factors include relative vote counts for all items between users, thresholding, and clustering (see Sarwar, 2000)
Summary of Advantages of Pure CF • No expensive and error-prone user attributes or item attributes • Incorporates quality and taste • We want not just things that are similar, but things that are similar and good • Works on any rate-able item • One model applicable to many content domains • Users understand it • It's rather like asking your friends' opinions
Netflix Prize • Netflix: an online DVD-rental company • a collection of 100,000 titles and over 10 million subscribers • over 55 million discs, shipping 1.9 million a day on average • a training data set of over 100 million ratings that over 480,000 users gave to nearly 18,000 movies • Submitted predictions are scored against the true ratings by root mean squared error (RMSE)
Netflix Prize • prize of $1,000,000 • A trivial algorithm achieved an RMSE of 1.0540 • Netflix's own system, Cinematch, achieved an RMSE of 0.9514 on the quiz data, a 9.6% improvement • To win: improve 10% over Cinematch on the test set • a progress prize of $50,000 was granted every year for the best result so far • By June 2007, over 20,000 teams from over 150 countries had registered for the competition • On June 26, 2009, the team "BellKor's Pragmatic Chaos", a merger of teams "BellKor in BigChaos" and "Pragmatic Theory", achieved a 10.05% improvement over Cinematch (an RMSE of 0.8558)
Measuring collaborative filtering • How good are the predictions? • How much of previous opinion do we need? • How do we motivate people to offer their opinions?
Measuring recommendations • Typically, machine-learning methodology: • Get a dataset of opinions as ⟨User, Item, Grade⟩ tuples; mask "half" of the opinions • Train the system with the other half, then validate on the masked opinions • Studies vary the masked fraction • Compare various algorithms (correlation metrics)
Common Prediction Accuracy Metrics • Mean absolute error (MAE): MAE = (1/N) Σ_i |p_i − r_i| • Root mean square error (RMSE): RMSE = √( (1/N) Σ_i (p_i − r_i)² ) • where p_i is the predicted and r_i the true rating, over N predictions (both computed in the sketch below)
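Both metrics in a few lines of Python (the predicted/true rating pairs are made up for illustration):

```python
import math

def mae(pred, true):
    """Mean absolute error over paired predicted and true ratings."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def rmse(pred, true):
    """Root mean squared error; punishes large errors more heavily than MAE."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

predicted = [3.1, 4.6, 2.0, 4.9]
actual    = [3,   5,   1,   5]
print(mae(predicted, actual), rmse(predicted, actual))  # 0.4  ~0.543
```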
McLaughlin & Herlocker 2004 • Argues that the current well-known algorithms give a poor user experience • Nearest-neighbor algorithms are the most frequently cited and most widely implemented CF algorithms, and are consistently rated the top-performing algorithms in a variety of publications • But many of their top recommendations are terrible • These algorithms perform poorly where it matters most: in user recommendations
Characteristics of MAE • Assumes errors at all levels in the ranking have equal weight • Works well for measuring how accurately the algorithm predicts the rating of a randomly selected item • Seems inappropriate for the "Find Good Items" task, since it looks at all predictions, not just the top predictions • These limitations of the MAE metric have concealed the flaws of previous algorithms • Should we measure precision instead?
Precision of top k • The flaws were concealed because past evaluation was mainly on offline datasets, not with real users • Many unrated items exist but do not participate in the evaluation • Items can appear in the recommendation list yet not be counted in the precision calculation, because no test-data prediction exists for them
Improve the Precision Measure • Precision of top k has wrongly been computed over only the top k rated movies • Instead, treat non-rated items as disliked (an underestimate of the true precision) • This captures the fact that people pre-filter which movies they rate • In the precision calculation, non-rated items should be counted as non-relevant (see the sketch below)
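A sketch of the corrected measure, counting non-rated items as non-relevant; the like-threshold of 4 and all names are assumptions for illustration:

```python
def precision_at_k(recommended, ratings, k=10, like_threshold=4):
    """Fraction of the top-k recommendations the user actually rated as liked.

    Items the user never rated count as non-relevant (a deliberate
    underestimate, since people pre-filter what they choose to rate).
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if ratings.get(item, 0) >= like_threshold)
    return hits / k

ratings = {"m1": 5, "m3": 2, "m7": 4}          # user's known ratings
recommended = ["m1", "m9", "m7", "m3", "m4"]   # system's ranked list
print(precision_at_k(recommended, ratings, k=5))  # 0.4: only m1 and m7 count
```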