Applications of Machine Learning in Internet Advertising
庄宝童

Agenda
  • Introduction
  • Machine learning applications
    • Common utility
    • Advertiser
    • Publisher
    • User
  • Summary
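Why do we need Internet advertising?
  • Traffic (users) is a key asset of Internet companies
  • Internet content is free, so traffic must be monetized to sustain operations
  • Advertising as a share of revenue:
    • Google: 95% (2012, http://investor.google.com/financial/tables.html)
    • Facebook: 83% (2011)
    • Baidu: ?
    • Alibaba: ?
  • Characteristics: results are quantifiable and trackable, little operations or sales involvement is needed, and the cost per exposure is low
  • For Internet advertising companies, it is an ideal "money-printing machine" business model (吴军, 《浪潮之巅》)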
What kind of advertising do we need?

Find the best match between a given user in a given context and a suitable advertisement

-- Andrei Broder and Dr. Vanja 2011

[Figure: players and data flow in ad serving. Labels: User, Page, Publisher, Ad Network ("Pick best ads"), Ads, Advertisers, Bids, conversion, Response rates (click, conversion, ad-view), Statistical model, Auction ("Select argmax f(bid, rate)").]
Players in the ecosystem
  • Publisher’s utility: revenue, user engagement
  • Advertiser’s utility: ROI
  • User’s utility: relevance
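Mechanism design
  • Contract pricing (futures market), billed by CPM or CPT
  • Auction pricing (spot market)
    • GFP (generalized first price)
    • GSP (generalized second price)
    • VCG (Vickrey-Clarke-Groves)
  • Billing models
    • CPM (cost per mille, i.e. per thousand impressions): lowest risk for the publisher, e.g. brand advertising on Yahoo and Sina
    • CPC (cost per click): risk is shared between publisher and advertiser; Google AdWords, 百度凤巢 (Baidu), and most other systems fall into this category
    • CPA (cost per action): lowest risk for the advertiser, e.g. 淘宝客 (Taobao).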
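
To make the difference between the GFP and GSP auction formats listed above concrete, here is a minimal sketch that prices ad slots when ads are ranked purely by bid. The function names, the `reserve` parameter, and the sample bids are illustrative assumptions, not something taken from the slides. VCG is left out because it also needs per-slot click rates.

```python
def rank_by_bid(bids):
    """Return advertiser names sorted by bid, highest first."""
    return sorted(bids, key=bids.get, reverse=True)


def gfp_prices(bids, num_slots):
    """Generalized first price: each winner pays its own bid."""
    ranked = rank_by_bid(bids)[:num_slots]
    return [(name, bids[name]) for name in ranked]


def gsp_prices(bids, num_slots, reserve=0.0):
    """Generalized second price: each winner pays the next-highest bid.

    `reserve` (an assumed parameter) is charged when no lower bid exists.
    """
    ranked = rank_by_bid(bids)
    prices = []
    for i, name in enumerate(ranked[:num_slots]):
        next_bid = bids[ranked[i + 1]] if i + 1 < len(ranked) else reserve
        prices.append((name, next_bid))
    return prices


bids = {"a": 3.0, "b": 2.0, "c": 1.0}  # hypothetical per-click bids
gfp = gfp_prices(bids, num_slots=2)
gsp = gsp_prices(bids, num_slots=2)
```

With these bids, both formats award the two slots to "a" and "b"; only the prices differ, since under GSP each winner pays the bid of the advertiser ranked just below it.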
CPC ranking functions
  • Bid ranking: bid
    • Originated at goto.com (the predecessor of Overture, later acquired by Yahoo)
  • Revenue ranking: CTR * bid
    • Pioneered by Google
    • Core problem: CTR prediction
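
The two ranking functions can produce different orderings for the same ads. The sketch below is only an illustration; the ad records, field names, and numbers are made up.

```python
def bid_ranking(ads):
    """Order ads by bid alone."""
    return sorted(ads, key=lambda ad: ad["bid"], reverse=True)


def revenue_ranking(ads):
    """Order ads by expected revenue per impression: CTR * bid."""
    return sorted(ads, key=lambda ad: ad["ctr"] * ad["bid"], reverse=True)


# Hypothetical ads: bid is per click, ctr is a predicted click probability
ads = [
    {"id": "x", "bid": 2.0, "ctr": 0.01},
    {"id": "y", "bid": 1.0, "ctr": 0.05},
]
by_bid = [ad["id"] for ad in bid_ranking(ads)]
by_revenue = [ad["id"] for ad in revenue_ranking(ads)]
```

Ad "x" wins on bid alone, but "y" has the higher CTR * bid, which is why the quality of the CTR estimate matters so much under revenue ranking.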
Model

P(click | user, ad, context)

  • ad: creative, bid terms, landing page, campaign, advertiser, format (text/image/video), size, etc.
  • user: cookie, demographics, geo, behavioral, activity history
  • context: query, publisher, page content, session, time
Algorithms
  • Logistic regression + feature engineering (Google, Yahoo, Baidu, Facebook, etc.)
  • Microsoft: Bayesian probit regression
  • Google: boosting http://users.soe.ucsc.edu/~niejiazhong/slides/chandra.pdf
  • Taobao: mixture of logistic regressions
  • Trends: big data + nonlinear models / feature learning
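
As a rough sketch of the first approach (logistic regression over engineered sparse features), the code below estimates P(click | user, ad, context) with plain stochastic gradient descent. Everything here is an assumption made for illustration: the feature naming scheme, the learning rate, and the toy data. Production systems differ in many ways, including regularization and distributed training.

```python
import math


def featurize(user, ad, context):
    """Turn categorical attributes into sparse binary feature names."""
    feats = [f"user.{k}={v}" for k, v in user.items()]
    feats += [f"ad.{k}={v}" for k, v in ad.items()]
    feats += [f"ctx.{k}={v}" for k, v in context.items()]
    return feats


def predict(weights, feats):
    """P(click) = sigmoid(sum of the weights of the active features)."""
    z = sum(weights.get(f, 0.0) for f in feats)
    z = max(min(z, 35.0), -35.0)  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(weights, feats, label, lr=0.1):
    """One stochastic gradient step on the log loss."""
    error = predict(weights, feats) - label
    for f in feats:
        weights[f] = weights.get(f, 0.0) - lr * error


# Toy, made-up impressions: (user, ad, context, clicked)
samples = [
    ({"geo": "north"}, {"format": "text"}, {"hour": "9"}, 1),
    ({"geo": "south"}, {"format": "text"}, {"hour": "9"}, 1),
    ({"geo": "north"}, {"format": "image"}, {"hour": "9"}, 0),
    ({"geo": "south"}, {"format": "image"}, {"hour": "9"}, 0),
]

weights = {}
for _ in range(100):
    for user, ad, context, clicked in samples:
        sgd_step(weights, featurize(user, ad, context), clicked)

p_text = predict(weights, featurize({"geo": "north"}, {"format": "text"}, {"hour": "9"}))
p_image = predict(weights, featurize({"geo": "north"}, {"format": "image"}, {"hour": "9"}))
```

In this toy data only the ad format separates clicks from non-clicks, so the learned weight for `ad.format=text` ends up positive and the one for `ad.format=image` negative.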
Challenges
  • Sparsity: use natural hierarchies or auto-generated hierarchies
  • Missing data
  • Bias: position, ad category, etc.
  • Dynamic / seasonal effects
  • Spam / noisy data
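Features
  • Features:
    • Click feedback features (COEC, clicks over expected clicks)
    • Query features
    • Query-ad text matching features
  • Preprocessing:
    • Discretization (binning)
    • Feature crossing
    • Hierarchical features: handle sparsity (bias-variance trade-off)
    • Feature smoothing and transformation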
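
A small sketch of three of the items above: COEC as a position-normalized click feedback feature, discretization into bins, and feature crossing. The reference CTR per position, the bin boundaries, and the feature names are made-up values for illustration only.

```python
import bisect


def coec(impressions, position_ctr):
    """Clicks over expected clicks.

    impressions: list of (position, clicked) pairs for one ad or query-ad pair.
    position_ctr: reference CTR at each position, averaged over all ads.
    """
    clicks = sum(clicked for _, clicked in impressions)
    expected = sum(position_ctr[pos] for pos, _ in impressions)
    return clicks / expected if expected > 0 else 0.0


def discretize(value, boundaries):
    """Return the index of the bin that `value` falls into."""
    return bisect.bisect_right(boundaries, value)


def cross(name_a, value_a, name_b, value_b):
    """Combine two categorical features into one crossed feature."""
    return f"{name_a}={value_a}&{name_b}={value_b}"


position_ctr = {1: 0.10, 2: 0.05}               # made-up reference CTRs
impressions = [(1, 1), (1, 0), (2, 0), (2, 1)]  # (position, clicked)
score = coec(impressions, position_ctr)         # 2 clicks over 0.30 expected clicks
age_bin = discretize(27, [18, 25, 35, 50])      # boundaries are assumptions
crossed = cross("age_bin", age_bin, "ad_category", "sports")
```

A COEC above 1 means the ad is clicked more often than an average ad shown in the same positions, which removes much of the position bias from the raw CTR.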
Training
  • Training set
    • Stratified sampling of positive and negative examples, to address class imbalance in training
    • Instances: 1B
    • Features: 10B
  • Distributed training
    • MPI (Baidu, Taobao)
    • MapReduce (Google)
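
One common way to sample positives and negatives at different rates is to keep every click and only a fraction of the non-clicks, then correct the model's output afterwards. Whether the slides intend exactly this scheme is an assumption; the sketch below just shows the idea, with `keep_rate` and the synthetic log as hypothetical values.

```python
import random


def downsample_negatives(samples, keep_rate, rng):
    """Keep every positive and each negative with probability `keep_rate`."""
    return [s for s in samples if s[1] == 1 or rng.random() < keep_rate]


def recalibrate(p, keep_rate):
    """Map a probability learned on downsampled data back to the original scale."""
    return p / (p + (1.0 - p) / keep_rate)


rng = random.Random(0)
# Made-up log: 1% positives among 10,000 impressions, stored as (features, clicked)
samples = [({}, 1)] * 100 + [({}, 0)] * 9900
train = downsample_negatives(samples, keep_rate=0.1, rng=rng)
positives = sum(label for _, label in train)
sampled_ctr = positives / len(train)           # CTR as seen by the trainer
corrected_ctr = recalibrate(sampled_ctr, 0.1)  # close to the true 1%
```

The correction follows from the odds: dropping negatives at rate `keep_rate` multiplies the odds of a click by `1 / keep_rate`, so the inverse mapping divides them again.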
Evaluation
  • Offline evaluation
    • MSE, MAE
    • AUC
  • Online A/B test
    • Layered experiment platform (Google, "Overlapping Experiment Infrastructure: More, Better, Faster Experimentation")
    • Hypothesis tests for normally or binomially distributed samples
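
For illustration, the sketch below computes AUC directly from its pairwise definition and runs a two-proportion z-test of the kind used to compare CTR between two A/B buckets. The labels, scores, and click counts are invented, and the quadratic AUC loop is only suitable for small samples.

```python
import math


def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """z statistic for H0: both buckets have the same CTR (normal approximation)."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / se


offline_auc = auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
z = two_proportion_z(clicks_a=500, views_a=10000, clicks_b=560, views_b=10000)
significant = abs(z) > 1.96  # two-sided test at the 5% level
```

In this made-up experiment the treatment bucket shows a 12% relative CTR lift, yet the z statistic stays below 1.96, so the difference would not be declared significant at the 5% level.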
In practice
  • Real-time computation and performance
    • Simple, effective candidate selection
    • Precise computation
  • Online learning
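
One way to read the first item is as a two-stage pipeline: a cheap filter narrows the ad inventory to a small candidate set, and the expensive model scores only those candidates. The sketch below assumes a keyword inverted index for the first stage and a stand-in scoring function for the second; both are illustrative choices, not the design described in the slides.

```python
from collections import defaultdict


def build_index(ads):
    """Inverted index from keyword to the ids of ads that bid on it."""
    index = defaultdict(set)
    for ad_id, ad in ads.items():
        for keyword in ad["keywords"]:
            index[keyword].add(ad_id)
    return index


def select_candidates(index, query):
    """Cheap first stage: any ad sharing at least one keyword with the query."""
    candidates = set()
    for token in query.split():
        candidates |= index.get(token, set())
    return candidates


def serve(ads, index, query, score, top_k=2):
    """Second stage: run the expensive scorer on the candidates only."""
    candidates = select_candidates(index, query)
    ranked = sorted(candidates, key=lambda ad_id: score(query, ads[ad_id]), reverse=True)
    return ranked[:top_k]


def overlap_score(query, ad):
    """Stand-in scorer: bid times keyword overlap (a real system might use CTR * bid)."""
    return ad["bid"] * len(ad["keywords"] & set(query.split()))


# Hypothetical inventory
ads = {
    "a": {"keywords": {"running", "shoes"}, "bid": 1.5},
    "b": {"keywords": {"shoes"}, "bid": 2.0},
    "c": {"keywords": {"cars"}, "bid": 5.0},
}
index = build_index(ads)
result = serve(ads, index, "running shoes", overlap_score)
```

Ad "c" has the highest bid but never reaches the scorer, because it shares no keyword with the query.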
Explore/Exploit

[Figure: probability density of CTR for Ad 1 and Ad 2]

  • Ads with a low mean but high variance should still be given a chance to be shown
  • E.g. consider 2 ads (same bids)
    • Goal: select the most popular
    • CTR1 ~ (mean = .01, var = .1), CTR2 ~ (mean = .05, var ≈ 0)
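Common E&E algorithms
  • Upper confidence bound (UCB) policy
    • Mean + uncertainty estimate
      • mean + k * sd(estimator)
  • Thompson sampling
    • Draw a random sample from the posterior; well suited to Bayesian algorithms
  • Problems
    • The ad set is huge, so exploration is too costly
    • Unlike the classic multi-armed bandit problem, the ad set is dynamic and several ads are selected each time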
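
A minimal sketch of both policies for click/no-click feedback, assuming a uniform Beta(1, 1) prior on each ad's CTR. The prior, the value k = 2, and the click counts are assumptions chosen for illustration. The sketch also ignores the problems listed above: it handles a fixed ad set and picks one ad at a time.

```python
import math
import random


def posterior(clicks, views):
    """Beta posterior parameters under an assumed uniform Beta(1, 1) prior."""
    return 1 + clicks, 1 + views - clicks


def ucb_score(clicks, views, k=2.0):
    """mean + k * sd of the CTR estimate."""
    a, b = posterior(clicks, views)
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean + k * sd


def thompson_score(clicks, views, rng):
    """One random draw from the posterior over CTR."""
    a, b = posterior(clicks, views)
    return rng.betavariate(a, b)


# Made-up statistics: (clicks, views)
stats = {"ad1": (0, 30), "ad2": (500, 10000)}
by_mean = max(stats, key=lambda ad: stats[ad][0] / stats[ad][1])
by_ucb = max(stats, key=lambda ad: ucb_score(*stats[ad]))
rng = random.Random(0)
by_thompson = max(stats, key=lambda ad: thompson_score(*stats[ad], rng))
```

Picking by observed mean always chooses "ad2", while UCB prefers the barely shown "ad1" because its uncertainty is large. Thompson sampling makes the same trade-off at random: "ad1" wins whenever its posterior draw happens to exceed the draw for "ad2".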
Advertiser’s perspective
  • Keyword selection
  • Bid optimization
  • Smart pricing
  • Anti-fraud
  • Impression forecasting: time series
  • Smooth delivery: allocation algorithms
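CVR prediction
  • Uses:
    • Smart pricing: external traffic varies widely in quality, and advertisers have neither the time nor the ability to bid separately for each publisher, so bids need to be adjusted automatically according to click value (Google, "Smart pricing grows the pie") to protect the advertiser's ROI
    • DSP: real-time bidding
    • Rank function under CPA: ctr * cvr * bid
  • Approach: similar to CTR prediction, but harder
    • Conversion data is hard to obtain and even sparser
    • Advertisers define conversions differently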
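
The sketch below shows the CPA rank function as written above, plus one possible form of smart pricing in which a CPC bid is scaled by how well a traffic source converts relative to a baseline. The scaling rule, the cap at the original bid, and all numbers are assumptions for illustration. The slides do not specify how the adjustment is computed.

```python
def cpa_rank_score(ctr, cvr, cpa_bid):
    """Expected revenue per impression under CPA billing: ctr * cvr * bid."""
    return ctr * cvr * cpa_bid


def smart_price(cpc_bid, cvr_on_traffic, cvr_baseline):
    """Scale a CPC bid by the relative conversion value of a traffic source.

    The result is capped at the advertiser's original bid (an assumption of this sketch).
    """
    factor = min(cvr_on_traffic / cvr_baseline, 1.0)
    return cpc_bid * factor


score = cpa_rank_score(ctr=0.02, cvr=0.05, cpa_bid=40.0)
discounted = smart_price(cpc_bid=1.0, cvr_on_traffic=0.01, cvr_baseline=0.04)
```

Here a click from the weaker traffic source converts a quarter as often as the baseline, so the advertiser is charged as if the bid were a quarter of its nominal value.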
User’s perspective
  • User fatigue
  • User privacy
  • Behavioral targeting / retargeting
  • Query intent
  • Low-quality ad detection (Google, "Detecting Adversarial Advertisements in the Wild")
Publisher’s perspective
  • Revenue
  • User engagement