
Handling Advertisements of Unknown Quality in Search Advertising

Handling Advertisements of Unknown Quality in Search Advertising. Sandeep Pandey Christopher Olston (CMU and Yahoo! Research). Sponsored Search. How does it work? Search engine displays ads next to search results Advertisers pay search engine per click Who benefits from it?


Presentation Transcript


  1. Handling Advertisements of Unknown Quality in Search Advertising Sandeep Pandey Christopher Olston (CMU and Yahoo! Research)

  2. Sponsored Search • How does it work? • Search engine displays ads next to search results • Advertisers pay search engine per click • Who benefits from it? • Main source of funding for search engines • Information flow from advertisers to users

  3. Sponsored Search • Click-through rate (CTR): given an ad and a query, CTR = probability that the ad receives a click • Optimal policy to maximize search engine’s revenue: display ads of highest (CTR × bid) value [Figure: results page showing search query results alongside sponsored search results]
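As a minimal sketch of the (CTR × bid) ranking rule on this slide, assuming the true CTRs are known: the ad names, CTRs, and bids below are made-up illustration values, and `top_ads` is a hypothetical helper, not something from the talk.

```python
def top_ads(ads, c):
    """Return the c ads with the highest expected revenue per impression (CTR x bid)."""
    return sorted(ads, key=lambda ad: ad["ctr"] * ad["bid"], reverse=True)[:c]


# Hypothetical ads for a single query phrase (values invented for illustration).
ads = [
    {"name": "a1", "ctr": 0.10, "bid": 1.00},  # expected revenue 0.10
    {"name": "a2", "ctr": 0.02, "bid": 8.00},  # expected revenue 0.16
    {"name": "a3", "ctr": 0.30, "bid": 0.20},  # expected revenue 0.06
]

print([ad["name"] for ad in top_ads(ads, 2)])  # ['a2', 'a1']
```

The ad with the lowest CTR ranks first here because its bid is high enough to give it the largest expected revenue per impression.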

  4. Challenges in Sponsored Search • Problem: CTRs initially unknown • Estimating CTRs requires going around the circle • Exploration/Exploitation Tradeoff: • explore ads to estimate CTRs • exploit known high-CTR ads to maximize revenue [Figure: cycle with stages labeled show ads, record clicks, refine CTR estimates, earn revenue]

  5. The Advertisement Problem • Problem: • Advertiser Ai submits ad ai,j for query phrase Qj • User clicks on ai,j -> Ai pays bi,j (the “bid value”) • Queries arrive one after another • Select ads to show for each query, in an online fashion • Constraints: • Show at most C ads per query • Advertisers have daily budgets: Ai pays at most di • Goal: Maximize search engine’s revenue [Figure: advertisers A1 to A3 with budgets d1 to d3, linked to query phrases Q1 to Q3 by ads a1,1, a2,1, a1,3, a3,2]

  6. Our Approach • Unbudgeted Advertisement Problem • Isomorphic to multi-armed bandit problem • Budgeted Advertisement Problem • Similar to bandit problem, but with additional budget constraints that span arms • Introduce Budgeted Multi-armed Multi-bandit problem (BMMP)

  7. Unbudgeted Advertisement Problem as Multi-armed Bandit Problem • Bandit: Classical example of online learning under the explore/exploit tradeoff • K arms. Arm i has an associated reward ri and unknown payoff probability pi • Pull C arms at each time instant to maximize the reward accrued over time • Isomorphism: query phrase ↔ bandit instance; ads ↔ arms; CTR ↔ payoff probability; bid ↔ reward [Figure: bandit arms with payoff probabilities p1, p2, p3]

  8. Policy for Unbudgeted Problem • Policy “MIX” (adapted from [Auer et al. ML’02]) • When query phrase Qj arrives • Compute the priority pi,j of each ad ai,j where pi,j = (ei,j + sqrt(2 ln nj / ni,j)) × bi,j • ei,j is the MLE of the CTR value of ai,j • bi,j is the price or bid value of ad ai,j • ni,j: # times ad ai,j has been shown in the past • nj: # times query Qj has been answered • Display the C highest-priority ads
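The MIX priority can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names, the `(clicks, shown, bid)` tuple layout, and the choice to give never-shown ads infinite priority are all hypothetical.

```python
import math


def mix_priority(clicks, shown, query_count, bid):
    """Priority (e + sqrt(2 ln n_j / n_ij)) * b for one ad on one query phrase."""
    if shown == 0:
        # Assumption: an ad that was never shown gets top priority so it is tried once.
        return math.inf
    ctr_estimate = clicks / shown  # MLE of the CTR
    bonus = math.sqrt(2.0 * math.log(query_count) / shown)  # exploration term
    return (ctr_estimate + bonus) * bid


def mix_select(stats, query_count, c):
    """stats maps ad id -> (clicks, shown, bid); returns the c highest-priority ad ids."""
    ranked = sorted(
        stats,
        key=lambda ad: mix_priority(stats[ad][0], stats[ad][1], query_count, stats[ad][2]),
        reverse=True,
    )
    return ranked[:c]
```

The exploration term shrinks as an ad accumulates impressions, so rarely shown ads keep getting occasional chances while well-measured ads compete mostly on estimated CTR × bid.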

  9. Budgeted Multi-armed Multi-Bandit problem (BMMP) • Finite set of bandit instances; each instance has a finite number of arms • Each arm has an associated type • Each type Ti has budget di • Upper limit on the total amount of reward that can be generated by the arms of type Ti • An external actor invokes a bandit instance at each time instant • the policy must choose C arms of the invoked instance

  10. Meta Policy for BMMP • Input: BMMP instance and policy POL for the conventional multi-armed bandit problem • Output: The following Policy BPOL • Run POL in parallel for each bandit instance Bi • Whenever Bi is invoked: • Discard arm(s) with depleted budget • If one or more arms was discarded, restart POLi • Let POLi decide which of the remaining arms to activate
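A rough sketch of the BPOL wrapper follows. Everything about the interface is an assumption made for illustration: `make_policy` stands for any conventional bandit policy exposing hypothetical `select(c)` and `update(arm, reward)` methods, `FirstArms` is a trivial stand-in rather than a real learning policy, and "depleted" is simplified to "remaining budget is zero".

```python
class FirstArms:
    """Trivial stand-in for POL (hypothetical): always picks the first c arms."""

    def __init__(self, arms):
        self.arms = list(arms)

    def select(self, c):
        return self.arms[:c]

    def update(self, arm, reward):
        pass  # a real policy would refine its payoff estimates here


class BudgetedPolicy:
    """Sketch of BPOL: one copy of POL per bandit instance, restarted on depletion."""

    def __init__(self, make_policy, instances, arm_type, budgets):
        self.make_policy = make_policy
        self.arms = {b: list(arms) for b, arms in instances.items()}
        self.arm_type = arm_type          # arm -> type
        self.remaining = dict(budgets)    # type -> remaining budget
        self.policies = {b: make_policy(arms) for b, arms in self.arms.items()}

    def invoke(self, instance, c):
        # Discard arms whose type has no budget left.
        live = [a for a in self.arms[instance] if self.remaining[self.arm_type[a]] > 0]
        if len(live) < len(self.arms[instance]):
            self.arms[instance] = live
            self.policies[instance] = self.make_policy(live)  # restart POL_i
        return self.policies[instance].select(c)

    def record(self, instance, arm, reward):
        # A type never yields more reward than its remaining budget.
        arm_type = self.arm_type[arm]
        paid = min(reward, self.remaining[arm_type])
        self.remaining[arm_type] -= paid
        self.policies[instance].update(arm, reward)
        return paid
```

For example, with instance `"Q1"` holding arms `"a"` (type `"A1"`, budget 1.0) and `"b"` (type `"A2"`), a reward of 1.0 on `"a"` exhausts `"A1"`, so the next invocation of `"Q1"` restarts its policy and offers only `"b"`.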

  11. Performance Guarantee of BPOL • OPT = algorithm that knows in advance: • Full sequence of bandit invocations • Payoff probabilities • Claim: bpol(N) >= opt(N)/2 – O(f(N)) • bpol(N): total expected reward of BPOL policy after N bandit invocations • opt(N): total expected reward of OPT • f(N): regret of POL after N invocations of the regular bandit problem

  12. Proof of Performance Guarantee • Divide the time instants into 3 categories: • 1: BPOL chooses an arm of higher expected reward than OPT • opt1(N) <= bpol1(N) • 2: BPOL chooses an arm of lower expected reward because OPT’s arm has run out of budget • opt2(N) <= bpol2(N) + (#types × max reward) • 3: otherwise • opt3(N) = O(f(N)) • Claim (follows from the above bounds) • opt(N) <= bpol(N) + bpol(N) + O(1) + O(f(N)) • bpol(N) >= opt(N)/2 – O(f(N))
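Written out, the final step sums the three per-category bounds. The sketch below assumes that opt(N) splits exactly across the three categories, that each of bpol1(N) and bpol2(N) is bounded by bpol(N), and that (#types × max reward) does not grow with N, so it is O(1) and can be absorbed into the O(f(N)) term:

```latex
\begin{align*}
\mathrm{opt}(N) &= \mathrm{opt}_1(N) + \mathrm{opt}_2(N) + \mathrm{opt}_3(N) \\
  &\le \mathrm{bpol}_1(N) + \bigl(\mathrm{bpol}_2(N) + \#\text{types} \cdot \text{max reward}\bigr) + O(f(N)) \\
  &\le \mathrm{bpol}(N) + \mathrm{bpol}(N) + O(1) + O(f(N)), \\
\mathrm{bpol}(N) &\ge \tfrac{1}{2}\,\mathrm{opt}(N) - O(f(N)).
\end{align*}
```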

  13. Advertisement Policies • BMIX: Output of our generic BPOL policy when given MIX as input • BMIX-E: Replace sqrt(2 ln nj / ni,j) in priority pi,j by sqrt(min(0.25, V(ni,j, nj)) × ln nj / ni,j), where V(ni,j, nj) = ei,j × (1 - ei,j) × sqrt(2 ln nj / ni,j) • Suggested in Auer et al. ML’02 • Purpose: Aggressive exploitation • BMIX-T: Replace bi,j in priority pi,j by bi,j × throttle(di′), where throttle(di′) = 1 - e^(-di′/di) and di′ is the remaining budget of advertiser Ai • Suggested in Mehta et al. FOCS’05 • Purpose: Delay the depletion of advertisers’ budgets • BMIX-ET: with both E and T modifications
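The two modifications can be sketched as small helper functions. These follow the formulas exactly as transcribed on this slide; the function and parameter names are hypothetical, and the code assumes `shown > 0`, `query_count >= 1`, and `daily_budget > 0`.

```python
import math


def throttle(remaining_budget, daily_budget):
    """BMIX-T bid multiplier: 1 - e^(-d'/d), which falls to 0 as the budget runs out."""
    return 1.0 - math.exp(-remaining_budget / daily_budget)


def bmix_e_bonus(ctr_estimate, shown, query_count):
    """BMIX-E exploration term, using V as written on the slide."""
    log_term = math.log(query_count) / shown
    v = ctr_estimate * (1.0 - ctr_estimate) * math.sqrt(2.0 * log_term)
    return math.sqrt(min(0.25, v) * log_term)
```

With a full budget the multiplier is 1 - 1/e (about 0.63), and it reaches 0 when nothing remains, so an advertiser close to depletion is shown less often and its budget lasts longer into the day.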

  14. Experiments • Simulations over real data • Data: • 85,000 query phrases from Yahoo! query log • Yahoo! ads with daily budget constraints • CTRs drawn from Yahoo!’s CTR distribution • Simulated user clicks using the CTR values • Time horizon = multiple days • Policies carried over the CTR estimates from one day to the next

  15. Results • GREEDY: select ads with highest current reward estimate (ei,j × bi,j) • Does not explore. Only exploits. *Revenue values scaled for confidentiality reasons

  16. Conclusion • Search advertisement problem • Exploration/exploitation tradeoff • Model as multi-armed bandit • Introduced new Bandit variant • Budgeted multi-armed multi-bandit problem (BMMP) • New policy for BMMP with performance guarantee • In paper: • Variable set of ads (ads come and go) • Prior CTR estimates
