
Improving Click-Through Rate Prediction Accuracy in Online Advertising by Transfer Learning

Improving Click-Through Rate Prediction Accuracy in Online Advertising by Transfer Learning. Yuhan Su1, Zhongming Jin2, Ying Chen2, Xinghai Sun2, Yaming Yang2, Fangzheng Qiao2, Fen Xia2, Wei Xu1. 1Tsinghua University, 2Baidu, Inc.


Presentation Transcript


  1. Improving Click-Through Rate Prediction Accuracy in Online Advertising by Transfer Learning. Yuhan Su1, Zhongming Jin2, Ying Chen2, Xinghai Sun2, Yaming Yang2, Fangzheng Qiao2, Fen Xia2, Wei Xu1. 1Tsinghua University, 2Baidu, Inc.

  2. Advertisement

  3. Online ads revenue: three factors. Revenue = PV * CTR * ACP, where PV is the number of page views, CTR is the click-through rate (= #clicks / #views), and ACP is the average click price.
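The revenue decomposition above can be sketched as a one-line computation (the traffic numbers below are illustrative, not from the talk):

```python
def ad_revenue(page_views, clicks, avg_click_price):
    """Revenue = PV * CTR * ACP, where CTR = #clicks / #views."""
    ctr = clicks / page_views                   # click-through rate
    return page_views * ctr * avg_click_price   # equivalently: clicks * ACP

# Illustrative numbers: 1,000,000 views, 20,000 clicks, $0.50 average click price
revenue = ad_revenue(1_000_000, 20_000, 0.50)
# revenue == 10000.0
```

Since PV and ACP are largely fixed by traffic and the auction, CTR is the factor the prediction model can influence, which motivates the next slide.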

  4. CTR prediction: the central lever for optimizing revenue

  5. Challenge: small products lack data • Small, niche-market products • Newly developed products • These lack data for a CTR prediction model • Idea: use large-product data to help small products • But different products have different data distributions

  6. Transfer learning: from source to target • Source = large product, target = small product • The two have different distributions, and transfer learning bridges the gap between them

  7. Our contributions • An effective transfer learning approach for small-product CTR prediction • An efficient MapReduce implementation • Experiments on real ads data

  8. Related work • CTR prediction (models, features): prior work targets a single advertisement product, whereas we handle multiple products • Transfer learning (instance transfer, feature-representation transfer, parameter transfer, relational-knowledge transfer, deep transfer): few prior works operate on large ads data, whereas we handle a much larger dataset

  9. Baidu Alliance Ads system • (1) User surfs a website • (2) Website sends a request and related info to the ADX • (3) ADX sends info to products 1…n • (4) Products return bidding prices and materials • (5) ADX returns ads • (6) User sees the ads
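The exchange steps (3)-(6) amount to a simple fan-out/collect loop. A minimal sketch, with hypothetical function and data shapes (the real ADX protocol is not specified in the talk):

```python
def serve_ad(request, products):
    """Sketch of the Baidu Alliance flow: the ADX forwards the request
    info to each product, collects (bid, material) responses, and
    returns the highest-bidding ad material to the website."""
    bids = [product(request) for product in products]  # steps (3)-(4)
    price, material = max(bids)                        # pick the winning bid
    return material                                    # steps (5)-(6)

# Illustrative products that always return a fixed (bid, material) pair.
products = [lambda req: (0.30, "ad-A"), lambda req: (0.55, "ad-B")]
winner = serve_ad({"user": "u1"}, products)
# winner == "ad-B"
```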

  10. Our approach: framework • pre-train a target model; loop for N times { sample source data; combined training; data reweighting; } output the ensemble model;
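The loop above can be sketched in Python; the `train`, `sample_source`, and `reweight` bodies below are trivial stand-ins for the components detailed on slides 12-14, not the paper's actual implementation:

```python
def train(data):
    """Stand-in "model": just the mean label of the training set."""
    return sum(data) / len(data)

def sample_source(source, model):
    """Placeholder for gradient-proportional sampling (slide 12)."""
    return source[: len(source) // 2]

def reweight(target, sampled, model):
    """Placeholder for the reweighting rules (slide 13)."""
    pass

def transfer_learn(target, source, n_iters):
    models = [train(target)]                         # pre-train a target model
    for _ in range(n_iters):                         # loop for N times
        sampled = sample_source(source, models[-1])  # sample source data
        model = train(target + sampled)              # combined training
        reweight(target, sampled, model)             # data reweighting
        models.append(model)
    return models                                    # the ensemble of models

ensemble = transfer_learn([1.0, 0.0], [1.0, 1.0, 0.0, 0.0], n_iters=3)
# len(ensemble) == 4: the pre-trained model plus one model per iteration
```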

  11. Our approach: intuition • [Diagram: target and source data pass through four stages — initialization, sampling, reweighting, training]

  12. Our approach: sampling strategy • Source data sampling: the sampling probability is proportional to the gradient on the trained model • Intuition: the larger the gradient, the more the model needs this data instance
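A minimal sketch of gradient-proportional sampling, assuming a 1-D logistic model and per-instance Bernoulli draws; the exact gradient and scaling used in the paper are not given in the slides, so everything here is illustrative:

```python
import math
import random

def sample_source(source, model_w, alpha):
    """Keep each source instance (x, y) with probability proportional to
    the magnitude of its logistic-loss gradient under the current model,
    scaled by the sampling parameter alpha and capped at 1."""
    sampled = []
    for x, y in source:
        p_click = 1.0 / (1.0 + math.exp(-model_w * x))  # model's predicted CTR
        grad = abs(p_click - y) * abs(x)                # |gradient| on this instance
        prob = min(1.0, alpha * grad)                   # large alpha -> prob hits 1
        if random.random() < prob:
            sampled.append((x, y))
    return sampled

data = [(1.0, 1), (2.0, 0), (0.5, 1)]
everything = sample_source(data, model_w=0.0, alpha=1e6)  # huge alpha: all sampled
nothing = sample_source(data, model_w=0.0, alpha=0.0)     # zero alpha: none sampled
```

The two extreme calls mirror the sensitivity analysis on slides 21-22: alpha = 0 uses no source data, and a very large alpha drives every probability to 1.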

  13. Our approach: data reweighting • Correctly classified: the weight does not change • Target data misclassified: increase the data weight • Source data misclassified: decrease the data weight
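The three rules above can be written directly; the multiplicative factor `beta` is an assumption for illustration (the slides do not state how much the weights change):

```python
def reweight(instances, weights, predictions, beta=1.5):
    """Apply the slide's rules: correctly classified instances keep their
    weight, misclassified target instances are up-weighted, and
    misclassified source instances are down-weighted.
    'instances' holds (label, domain) pairs; beta > 1 is assumed."""
    new_weights = []
    for (y, domain), w, p in zip(instances, weights, predictions):
        correct = (p >= 0.5) == (y == 1)
        if correct:
            new_weights.append(w)            # correctly classified: unchanged
        elif domain == "target":
            new_weights.append(w * beta)     # target misclassified: increase
        else:
            new_weights.append(w / beta)     # source misclassified: decrease
    return new_weights

out = reweight([(1, "target"), (0, "source"), (1, "source")],
               [1.0, 1.0, 1.0],
               [0.2, 0.9, 0.8])
# out[0] == 1.5 (target, wrong), out[1] < 1 (source, wrong), out[2] == 1.0 (correct)
```

Down-weighting misclassified source instances is what lets the algorithm gradually discount source data that conflicts with the target distribution.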

  14. Our approach: model ensemble • As TrAdaBoost proves, if the algorithm runs for N iterations, the average weighted training loss on source data from the ⌈N/2⌉-th to the N-th iteration converges to zero • The output value lies in [0, 1]
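A sketch of the TrAdaBoost-style ensemble output, assuming (as the slides do not specify) that the per-iteration models from ⌈N/2⌉ to N are simply averaged; each model returns a probability, so the average stays in [0, 1]:

```python
import math

def ensemble_predict(models, x):
    """Average the predictions of the models from the ceil(N/2)-th
    iteration to the N-th iteration (1-indexed)."""
    n = len(models)
    later_half = models[math.ceil(n / 2) - 1:]  # iterations ceil(N/2)..N
    return sum(m(x) for m in later_half) / len(later_half)

# Illustrative models: constant probability predictors.
models = [lambda x: 0.2, lambda x: 0.4, lambda x: 0.6, lambda x: 0.8]
score = ensemble_predict(models, x=None)
# averages models 2..4 -> (0.4 + 0.6 + 0.8) / 3, i.e. about 0.6
```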

  15. Experiment settings • Environment: an internal MapReduce-like machine learning framework on 100 computing nodes • Metric: AUC (area under the ROC curve), in [0, 1] • Datasets: [table not captured in the transcript]
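For reference, the AUC metric used throughout the experiments equals the probability that a random positive instance is scored above a random negative one; a minimal pairwise implementation:

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison: the fraction of
    (positive, negative) pairs where the positive is scored higher
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

value = auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
# 3 of 4 pairs ranked correctly -> value == 0.75
```

This O(|pos| * |neg|) form is only for clarity; production systems use a sort-based O(n log n) computation.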

  16. Experiment results • Source and target have very different data distributions

  17. Experiment results • Directly combining source and target data does not work

  18. Experiment results • Our method achieves better AUC • Our method needs less training time (TrAdaBoost: 220 min vs. ours: 70 min)

  19. Parameter sensitivity: a small N keeps the ensemble from working • N: the total number of iterations, and hence the number of ensemble models • If N is too small, the ensemble has too few models and the algorithm does not work well

  20. Parameter sensitivity: a large N makes the model overfit • If N is too large, the ensemble has too many models and the algorithm tends to overfit

  21. Parameter sensitivity: zero alpha uses only target data • Alpha (α): the sampling parameter • If α is zero, no source data is used; the algorithm trains on target data only and ensembles the models, so it degenerates to something similar to AdaBoost

  22. Parameter sensitivity: a large alpha samples every source instance • If α is too large, the sampling probability exceeds 1, so every source data instance is sampled

  23. Data size ratio (#target / #source): neither too small nor too large • Fix the target size and vary the source size • Performance peaks at a ratio of about 0.8 • The data size ratio must be tuned carefully instead of over-utilizing the source data

  24. Promising directions and approach limitations • Promising directions: the approach can be used directly for average-click-price (ACP) prediction and for other similar transfer learning scenarios (e.g., user risk prediction) • Limitations: the current sampling strategy uses only gradient information, the sparsity of advertisement data is not taken into consideration, and efficient multiple-source transfer remains challenging

  25. Summary • An iterative transfer learning method for CTR prediction in online ads • A MapReduce-like implementation makes the approach scalable • Experiments on real data show its effectiveness and promise • syhmartin@yeah.net • http://iiis.tsinghua.edu.cn/en/2014311424/ • Thank you!
