Delayed Feedback in Recommender Systems

Delayed Feedback in Recommender Systems, by Jia-Qi Yang (杨嘉祺)

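Introduction to Recommender Systems: The State of Deep-Learning-Based Recommender Systems
• Basic concepts
  • Click-through rate (CTR): the probability that a user clicks an ad or item after it is shown.
  • Conversion: a specific action the user takes after clicking, such as buying the item or making an in-game purchase.
  • Conversion rate (CVR): the probability that a user converts after clicking.
Jia-Qi Yang @ https://lamda.thyrixyang.com/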

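Introduction to Recommender Systems: Display Advertising
• Early advertising was either the same for every user, as in radio broadcasts, or tied to the content, as in television commercial breaks.
• The growth of the internet made personalized, user-targeted advertising possible.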

Introduction to Recommender Systems: Display Advertising
• Advertisers: place bids.
• Users: give feedback through their actions.
• Ad platform: shows each ad to the right users.

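Introduction to Recommender Systems: Display Advertising
• Advertisers can choose among several bidding modes
  • Cost per impression (CPM, conventionally billed per thousand impressions)
  • Cost per click (CPC)
  • Cost per conversion (CPA)
• Different bidding modes serve different needs
  • Brand advertising may use CPM billing to raise brand awareness
  • Product advertising can use CPA billing and pay only when something is sold
  • Websites can use CPC billing to drive traffic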

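Introduction to Recommender Systems: Display Advertising
• Optimizing the ad platform's profit
  • The platform's goal is to maximize its own profit. Under cost-per-conversion (CPA) billing, for example, the platform's revenue is the number of conversions * the bid.
  • The bid is usually set on the advertiser's side by a demand-side platform, so the quantity the ad platform has to optimize is the number of conversions.
  • The expected profit of showing ad a to user u can be taken as CTR(a, u)*CVR(a, u)*bid(a).
  • Given this formula, for each user u we only need to show the ad a with the highest expected profit. Personalized ad delivery therefore reduces to a prediction problem: predicting CTR and CVR.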
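The ranking rule on this slide can be sketched in a few lines. This is only an illustration: the `Ad` class, the candidate ads, and every number below are hypothetical, and in a real system the CTR and CVR values would come from trained models rather than constants.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    ctr: float  # predicted CTR(a, u) for the current user (hypothetical value)
    cvr: float  # predicted CVR(a, u) for the current user (hypothetical value)
    bid: float  # advertiser's CPA bid for one conversion (hypothetical value)


def expected_revenue(ad: Ad) -> float:
    """Expected platform revenue of one impression: CTR * CVR * bid."""
    return ad.ctr * ad.cvr * ad.bid


def pick_ad(candidates: list[Ad]) -> Ad:
    """Return the candidate with the highest expected revenue."""
    return max(candidates, key=expected_revenue)


candidates = [
    Ad("shoes", ctr=0.05, cvr=0.02, bid=30.0),
    Ad("phone", ctr=0.02, cvr=0.01, bid=400.0),
    Ad("snack", ctr=0.10, cvr=0.05, bid=2.0),
]
best = pick_ad(candidates)
```

Note that the ad with the highest CTR is not necessarily the one shown: a rarely clicked ad with a high bid can still have the highest expected revenue.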

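Introduction to Recommender Systems: Search Recommendation
• The main entry point of an e-commerce platform is keyword-based search
  • There is one more factor to consider: the keyword k
  • The optimization target becomes total sales
  • When user u searches for keyword k, the expected sales of recommending item g = CTR(u, g, k)*CVR(u, g, k)*item price.
  • Ranking by expected sales is a simple and effective strategy.
• In summary, the core problem in both search recommendation and display advertising reduces to predicting CTR and CVR, which is exactly what deep learning is good at.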

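Introduction to Recommender Systems: The Move to Deep Learning in Recommender Systems
• Deep learning has become the mainstream approach in recommender systems
  • Highly framework-driven: adding features and modifying models is easy
  • Efficient training: large-scale distributed training is straightforward
  • Efficient online updates: new data is fed to the model directly as mini-batches
  • Multi-objective joint training: for example, handling the CTR and CVR tasks together, or using add-to-cart signals
• CVR prediction is a core task in both display advertising and search recommendation
  • In display advertising, CVR(a, u) can be viewed as a neural network whose inputs are the ad attributes a and the user attributes u, and whose output is the conversion rate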

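Introduction to Recommender Systems: Timeliness and Online Updating in Recommender Systems
• The data distribution in a recommender system keeps changing
  • Item attributes: price, description, statistics
  • User attributes: browsing history, purchase history, click history
  • Large promotions such as Double 11 cause huge distribution shifts
• Stale data hurts model performance
  • Experiments show that a model that is never updated gradually degrades over time
  • A new model trained on all historical data performs worse than one trained only on a recent portion of the data
• Updating the model online in a streaming fashion is important for performance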
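A toy sketch of why streaming updates help under distribution shift. A one-feature logistic regression stands in for the deep model, and the abrupt label flip is an artificial assumption chosen to make the effect visible; the class `OnlineLogReg`, the helper `make_batch`, and all hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class OnlineLogReg:
    """Logistic regression updated one mini-batch at a time."""

    def __init__(self, dim, lr=0.5):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, batch):
        """One gradient step on a mini-batch of (features, label) pairs."""
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, y in batch:
            err = self.predict(x) - y  # gradient of the log loss w.r.t. the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(batch)
        self.w = [wi - self.lr * g / n for wi, g in zip(self.w, grad_w)]
        self.b -= self.lr * grad_b / n


def make_batch(flip, size=32):
    """Toy stream: label is 1 when x > 0, or the opposite after the shift."""
    batch = []
    for _ in range(size):
        x = random.uniform(-1.0, 1.0)
        y = 1 if (x > 0) != flip else 0
        batch.append(([x], y))
    return batch


random.seed(0)
model = OnlineLogReg(dim=1)

for _ in range(200):  # stream before the distribution shift
    model.update(make_batch(flip=False))
w_before = model.w[0]

for _ in range(200):  # stream after the shift; the model keeps updating
    model.update(make_batch(flip=True))
w_after = model.w[0]
```

A model frozen after the first phase would keep the positive weight and predict the wrong label on every post-shift sample, whereas the continuously updated model follows the new distribution.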

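Introduction to Recommender Systems: The Delayed Feedback Problem in CVR Prediction
• Differences between CTR prediction and CVR prediction
  • The positive rate in CVR prediction is an order of magnitude lower than in CTR prediction
  • CTR prediction has almost no delayed feedback: a user who scrolls past an ad almost never scrolls back to click it, and an ad is rarely viewed for more than a minute
  • Delayed feedback is the norm in CVR prediction: after seeing an item, users typically add it to the shopping cart, at which point the click has already happened, while the purchase follows anywhere from a few minutes to days or even months later
• Delayed feedback is an important problem in CVR prediction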

Introduction to Recommender Systems: The Delayed Feedback Problem in CVR Prediction

Optimizing the Joint Probability of Delay and Conversion: Delayed Feedback Model (DFM)
• Expected value of an impression: Price * CTR * CVR
• Modeling delayed feedback
  • A CVR model
  • The delay, modeled as an exponential distribution

Optimizing the Joint Probability of Delay and Conversion: Delayed Feedback Model (DFM)
• Optimizing DFM with maximum likelihood
  • Positive likelihood: observed conversions
  • Negative likelihood: true negatives, plus samples that have converted but whose conversion has not been observed yet
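The two likelihood terms can be sketched for a single click as follows. The slide's own formulas were not preserved, so this is a reconstruction from the two stated ingredients (a CVR model and an exponential delay); the function name `dfm_nll` and the parameter names `p`, `lam`, and `t` are assumptions made for illustration.

```python
import math


def dfm_nll(p, lam, converted, t):
    """Negative log-likelihood of one click under a DFM-style model.

    p:         predicted CVR f(x)
    lam:       rate of the exponential delay distribution (assumed symbol)
    converted: whether a conversion has been observed so far
    t:         the conversion delay if converted, else the time elapsed
               since the click
    """
    if converted:
        # Observed conversion: the user converts AND the delay equals t,
        # with density p * lam * exp(-lam * t).
        return -(math.log(p) + math.log(lam) - lam * t)
    # No conversion observed: either a true negative (1 - p), or a
    # conversion whose delay exceeds t, with probability p * exp(-lam * t).
    return -math.log((1.0 - p) + p * math.exp(-lam * t))
```

Right after the click (`t = 0`) an unobserved sample carries no evidence, so its loss is zero; as `t` grows, the term approaches the ordinary negative-label loss `-log(1 - p)`.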

Optimizing the Joint Probability of Delay and Conversion: Delayed Feedback Model (DFM)
• Only the CVR model f(x) is used during inference.
• The exponential delay distribution is used only to construct the unified likelihood.
• However, a better DFM likelihood may not lead to a better CVR model.
• It is unclear how to use newly arriving conversion labels in the DFM framework.

Correcting Bias by Inserting Positives After Conversion: Fake Negative Weighted (FNW)
• Fake negatives harm performance until they are corrected.
• Figure: timeline of one sample. A negative is inserted at click time and a positive is inserted at conversion time; performance drops in between, until the label is corrected.
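The bias created by this insertion scheme, and the importance weights that undo it, follow from simple counting. The sketch below derives them from the scheme as described on the slide (one negative per click, one extra positive per conversion); the function names are hypothetical and the code is not taken from the FNW authors.

```python
def fnw_observed_positive_rate(p):
    """Fraction of positives in the training stream when the true CVR is p.

    Every click contributes one negative; a fraction p of clicks later
    contributes one extra positive, so there are 1 + p samples per click.
    """
    return p / (1.0 + p)


def fnw_debias(b):
    """Invert the relation above: recover the true CVR from the biased rate b."""
    return b / (1.0 - b)


def fnw_weights(p):
    """Importance weights p(y|x) / b(y|x) for positive and negative samples."""
    b = fnw_observed_positive_rate(p)
    w_pos = p / b                  # simplifies to 1 + p
    w_neg = (1.0 - p) / (1.0 - b)  # simplifies to (1 - p) * (1 + p)
    return w_pos, w_neg
```

For a true CVR of 0.25, the stream shows a positive rate of 0.2, and the weights are 1.25 for positives and 0.9375 for negatives.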

Delayed Sampling + Delayed Positive Correction: Elapsed-Time Sampling Delayed Feedback Model (ES-DFM)
• Revisiting streaming sampling and sample types

Delayed Sampling + Delayed Positive Correction: Elapsed-Time Sampling Delayed Feedback Model (ES-DFM)
• The elapsed time of a negative sample carries valuable information.
  • The larger the elapsed time, the lower the probability that a conversion will still occur.
• Elapsed time means different things for different items:
  • Cheap items usually convert with a short delay.
  • Expensive items usually take a long time to convert.
  • Home decoration takes longer than daily necessities.
• How can elapsed time be utilized?

Delayed Sampling + Delayed Positive Correction: Elapsed-Time Sampling Delayed Feedback Model (ES-DFM)
• We do not use a negative sample immediately (as FNW does). Instead, we introduce an elapsed time, which is defined according to item type.
• We insert a delayed positive sample for each conversion that had not been observed by its elapsed time.
• There is a delayed feedback trade-off that can be controlled smoothly through the elapsed time:
  • The larger the elapsed time, the more accurate the labels, but the more stale the training data.
  • With a smaller elapsed time we can use fresher data, but the labels are less accurate.
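The sampling rule described above can be sketched as a small generator that emits the training samples produced by one click. The function name, the tuple layout, and the `kind` strings are hypothetical; in particular, a live system cannot distinguish a real negative from a fake one when the sample is emitted, so `kind` is included only to make the sample types visible.

```python
from typing import Iterator, Optional, Tuple

Sample = Tuple[float, int, str]  # (emit_time, label, kind)


def es_dfm_samples(click_time: float,
                   convert_time: Optional[float],
                   elapsed: float) -> Iterator[Sample]:
    """Yield the training samples generated by one click.

    convert_time is None when the click never converts; elapsed is the
    waiting window chosen for this item type.
    """
    observe_at = click_time + elapsed
    if convert_time is not None and convert_time <= observe_at:
        # Conversion arrived within the waiting window: a correct positive.
        yield (observe_at, 1, "positive")
        return
    # No conversion yet when the window closes: emit a negative.
    kind = "real negative" if convert_time is None else "fake negative"
    yield (observe_at, 0, kind)
    if convert_time is not None:
        # The conversion arrives later: insert a delayed positive.
        yield (convert_time, 1, "delayed positive")
```

With a 10-unit window, a click that converts after 5 units yields one positive, while a click that converts after 25 units yields a fake negative at time 10 followed by a delayed positive at time 25.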

Delayed Sampling + Delayed Positive Correction: Elapsed-Time Sampling Delayed Feedback Model (ES-DFM)

Delayed Sampling + Delayed Positive Correction: Elapsed-Time Sampling Delayed Feedback Model (ES-DFM)
After the delayed positives are added:
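The slide's equation was not preserved. Under the sampling scheme described earlier (one sample per click once the elapsed time e has passed, plus one delayed positive for every conversion whose delay h exceeds e), counting samples per click gives the observed label distribution below. Here q denotes the observed distribution and p(h > e) is shorthand for the probability, given x and y = 1, that the conversion arrives after the elapsed time. This is a derivation from those assumptions, not a copy of the original slide.

```latex
\begin{aligned}
q(y = 1 \mid x) &= \frac{p(y = 1 \mid x)}{1 + p(y = 1 \mid x)\, p(h > e)}, \\
q(y = 0 \mid x) &= \frac{p(y = 0 \mid x) + p(y = 1 \mid x)\, p(h > e)}{1 + p(y = 1 \mid x)\, p(h > e)}.
\end{aligned}
```

The denominator is the expected number of samples emitted per click. The numerator of the first line is the sum of on-time positives and delayed positives, which is simply p(y = 1 | x); the numerator of the second line adds the fake negatives to the real ones.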

Delayed Sampling + Delayed Positive Correction: Elapsed-Time Sampling Delayed Feedback Model (ES-DFM)
• By utilizing importance sampling, we can optimize the loss function under p(x, y) using samples from a different distribution q(x, y).
• The importance weights of ES-DFM have a clear interpretation in terms of the delayed positive probability and the real negative probability, as defined in Equations 15 and 16.
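A numerical sketch of how such importance weights undo the sampling bias. The definitions used here are assumptions reconstructed from the sampling scheme rather than the slide's lost Equations 15 and 16: `p_dp` is the probability that a click becomes a delayed positive, and `p_rn` is the probability that an observed negative is a real one. The function names are hypothetical.

```python
def es_dfm_distribution(p_pos, p_late):
    """Observed label distribution q(y|x) under elapsed-time sampling.

    p_pos:  true CVR p(y=1|x)
    p_late: p(h > e), the chance that a conversion arrives after the
            elapsed time
    """
    z = 1.0 + p_pos * p_late  # expected number of samples emitted per click
    q_pos = p_pos / z
    q_neg = (1.0 - p_pos + p_pos * p_late) / z
    return q_pos, q_neg


def es_dfm_weights(p_pos, p_late):
    """Importance weights p(y|x) / q(y|x) for positive and negative samples."""
    p_dp = p_pos * p_late                        # delayed positive probability
    p_rn = (1.0 - p_pos) / (1.0 - p_pos + p_dp)  # real negative probability
    return 1.0 + p_dp, (1.0 + p_dp) * p_rn


q_pos, q_neg = es_dfm_distribution(p_pos=0.1, p_late=0.5)
w_pos, w_neg = es_dfm_weights(p_pos=0.1, p_late=0.5)
```

Multiplying each observed probability by its weight recovers the true label distribution: `w_pos * q_pos` equals 0.1 and `w_neg * q_neg` equals 0.9, up to floating-point error. In practice the two probabilities are not known and have to be predicted by auxiliary models, which is where the bias discussed next comes from.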

Delayed Sampling + Delayed Positive Correction: Elapsed-Time Sampling Delayed Feedback Model (ES-DFM)
• We can prove that the predicted CVR f(x) converges to $f(x) = \dfrac{p(y = 1 \mid x)}{p(y = 1 \mid x) + p(y = 0 \mid x)\, f_{rn}(x) / p_{rn}(x)}$, where $p_{rn}(x) = \dfrac{p(y = 0 \mid x)}{p(y = 0 \mid x) + p(y = 1 \mid x)\, p(h > e)}$ is the real negative probability and $f_{rn}(x)$ is its learned estimate.
• If $f_{rn}$ is perfectly correct, we have $f_{rn} = p_{rn}$, so f(x) = p(y = 1|x) and there is no bias. In practice, however, $f_{rn}$ is learned from historical data, so some bias always exists.
• The bias is also related to p(y = 1|x): if the conversion rate is large in absolute terms, the bias introduced by $f_{rn}$ may be larger.
• The sampling distribution p(e|x) can be used to control the bias. If e is long, p(h > e) is smaller, so p(y = 0|x) + p(y = 1|x) p(h > e) is close to p(y = 0|x). The real negative probability $p_{rn}(x)$ is then closer to 1, since there are few fake negatives, and the observed negative rate stays close to p(y = 0|x).

Delayed Sampling + Delayed Positive Correction: Elapsed-Time Sampling Delayed Feedback Model (ES-DFM)
• Performance of ES-DFM

Delayed Sampling + Delayed Positive Correction: Elapsed-Time Sampling Delayed Feedback Model (ES-DFM)
• ES-DFM is more robust to label noise
  • We hypothesize that a method for the delayed feedback problem should not only correct wrong labels, but also reduce their negative effect before they can be corrected, or when the correction fails.
  • Figure 3 shows that ES-DFM is more robust to label disturbance than FNW and FSIW, and that the performance gap widens as the disturbance grows.

Delayed Sampling + Delayed Positive Correction: Elapsed-Time Sampling Delayed Feedback Model (ES-DFM)
• Verifying the delayed feedback trade-off
  • The best elapsed time c on the Criteo dataset is around 15 minutes, by which point about 35% of conversions can be observed. Both larger and smaller c reduce performance.
  • Performance decreases slowly for smaller c, which indicates that the bias introduced by the importance weighting model is small.
  • Performance decreases faster for larger c, which indicates that data freshness matters more as c increases; a c larger than 1 hour significantly harms performance.

Delayed Sampling + Delayed Positive Correction: Elapsed-Time Sampling Delayed Feedback Model (ES-DFM)
• Online performance: Taobao Search

A Delayed Feedback Model with Post-Click Actions: Post-click actions
• Can we get more information before the conversion?
• Besides conversion labels, real-world recommender systems record many events related to conversion.
• For example, after clicking through to an item, a user may add it to a shopping cart.
• Statistics show that such post-click behaviors are strongly related to conversions: about 12% of items placed in a shopping cart are eventually bought, while the proportion is less than 2% for items that never enter a cart.
• How can post-click actions be used to improve CVR prediction?

A Delayed Feedback Model with Post-Click Actions: Generalized Delayed Feedback Model (GDFM)
• CVR at time t
• Post-action distribution: depends on the sample features x, the conversion label y, and the revealing time δ. It does not depend on time t.
• Revealing time distribution
• Distribution of the features x at time t

A Delayed Feedback Model with Post-Click Actions: Generalized Delayed Feedback Model (GDFM)
• The data distribution varies with time t, which is a fundamental characteristic of the delayed feedback problem.
• Existing analyses of importance-sampling-based methods do not consider a varying distribution.
• GDFM explicitly models the revealing time distribution (the elapsed time in ES-DFM) as part of the data distribution.

A Delayed Feedback Model with Post-Click Actions: Train GDFM with stream data

A Delayed Feedback Model with Post-Click Actions: Narrowing the delayed feedback gap
• Since in GDFM we are using post-actions from instead of , we have a chance to do better. But will optimizing Eq. (4) improve the performance of q(y|x)?
• Lemma 3.1 highlights the benefit of utilizing post-actions: if the relationship between a post-action and the target is predictable (we can train a model q(a|x, y) to approximate p(a|x, y)) and informative (rank(M) = n), we can recover the target distribution even without the ground-truth label y.
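One way to read the lemma is as a linear system: the observable action distribution satisfies p(a|x) = Σ_y p(a|x, y) p(y|x), so if the matrix M with entries M[a][y] = p(a|x, y) has full rank, p(y|x) can be solved for. The sketch below does this for one binary action and a binary label. The matrix layout, the function name, and the numbers are assumptions for illustration, not the paper's definitions.

```python
def recover_label_dist(M, p_a):
    """Solve p(a|x) = M p(y|x) for a 2x2 matrix M, with M[a][y] = p(a|x, y)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if abs(det) < 1e-12:
        # Rank-deficient M: the action is equally likely for both labels,
        # so it carries no information about y.
        raise ValueError("M is rank deficient")
    # Cramer's rule for the two unknowns p(y=0|x) and p(y=1|x).
    p_y0 = (M[1][1] * p_a[0] - M[0][1] * p_a[1]) / det
    p_y1 = (M[0][0] * p_a[1] - M[1][0] * p_a[0]) / det
    return [p_y0, p_y1]


# Hypothetical numbers: the action a=1 (e.g. add to cart) happens with
# probability 0.1 when y=0 and with probability 0.8 when y=1.
M = [[0.9, 0.2],
     [0.1, 0.8]]
true_p_y = [0.95, 0.05]

# The action distribution that would be observed without any label.
p_a = [sum(M[a][y] * true_p_y[y] for y in range(2)) for a in range(2)]

recovered = recover_label_dist(M, p_a)
```

Here `recovered` matches `true_p_y` up to floating-point error, even though no conversion label was used. When both columns of M are identical the function raises `ValueError`, which corresponds to the rank condition in the lemma failing.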

A Delayed Feedback Model with Post-Click Actions: Measuring information carried by actions
• We propose to use conditional entropy to measure the information carried by actions. Empirically, we found that conditional entropy is also related to the sample complexity of estimating p(y|x) via p(a|x). We propose the following information weight:
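The information weight formula itself was not preserved in this transcript, so the sketch below only illustrates the quantity it is built on: the conditional entropy H(Y | A) of the label given the action, computed from a joint probability table. The function name and the table layout `joint[a][y]` are assumptions for illustration.

```python
import math


def conditional_entropy(joint):
    """H(Y | A) in bits, where joint[a][y] = p(a, y)."""
    h = 0.0
    for row in joint:
        p_a = sum(row)  # marginal probability of this action value
        for p_ay in row:
            if p_ay > 0.0:
                # p(a, y) * log2 p(y | a), accumulated with a minus sign
                h -= p_ay * math.log2(p_ay / p_a)
    return h


# The action tells us nothing about the label: H(Y | A) = 1 bit.
uninformative = [[0.25, 0.25],
                 [0.25, 0.25]]

# The action determines the label exactly: H(Y | A) = 0 bits.
deterministic = [[0.5, 0.0],
                 [0.0, 0.5]]
```

A lower conditional entropy means the action leaves less uncertainty about the conversion label, which is the sense in which an action is informative here.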

A Delayed Feedback Model with Post-Click Actions: Measuring temporal gap
• Even if we have unlimited samples from we are only able to recover instead of , which corresponds to a temporal gap.
• So we propose the following temporal weight
• Overall, we introduce a weight on loss Eq. (4) as follows:

A Delayed Feedback Model with Post-Click Actions: Reducing variance by delayed regularizer
• To reduce variance and stabilize training, we introduce a regularizer loss, Eq. (9), that constrains the update step during training.
• The overall loss function with revealing time is

A Delayed Feedback Model with Post-Click Actions: Training algorithm of GDFM

A Delayed Feedback Model with Post-Click Actions: Data analysis

A Delayed Feedback Model with Post-Click Actions: Experimental performance

Thank you!
