Presentation Transcript


  1. Online Search and Advertising, Future and Present
Chris Burges, Microsoft Research
Saturday, Dec 13, 2008
Text Mining, Search and Navigation

  2. Contents
• Search and Advertising – some ideas
• Where are we headed?
• How to begin?
• Some new results on ranking: we can directly learn Information Retrieval measures
• Internet security and RSA: why worry?

  3. ~ Search and Advertising ~

  4. Why Search Works…
• Traditional: print, TV, radio, billboards, …
• Only very broadly targeted to demographics (some exceptions)
• Search is monetarily successful because advertising is more precisely targeted
• The Google model is giving, and will continue to give, traditional channels a run for their money

  5. Key Points
• The online experience will be more deeply engaging.

  6. What’s wrong with what we do now?
• Nothing, but… ten blue links + ads, ten years from now?
• Ads are ‘tacked on’ to the user experience.
• Paid Search / Contextual / Banner – all are still largely impersonal.
• But, Behavioral Targeting…

  7. How might ads be targeted better?
• I just bought a car – don’t show me more ads for cars
• I just bought a house – show me ads for furniture
• I like band X, but not Y
• In general, build a model of what I’m in the market for
• Per-user pricing, availability
• User-driven asks (show me all ads for Z)

  8. User Models
• User models can be used to enrich the online experience, not just advertising.
• Automated teaching
  • Need a model of the user’s understanding.
• Find other users with similar interests
• Tailor news presentation to the user’s interests

  9. Key Points
• The online experience will be more deeply engaging.
• We will need rich state models of users: likes, dislikes, ± interests, knowledge

  10. What About Search?

  11. Search: Somewhere in the Near Future
[Diagram: human–computer dialog over the indexed Web. Query intent distribution: 84% Info., 12% Nav., 4% Trans.; 78% Comm. Structured data supplies a distribution over intents; diversity; popular pages; aid for transactions; display.]

  12. [Figure]

  13. How do we get the information we need to build good models for users? Ask them!

  14. Key Points
• The online experience will be more deeply engaging.
• We will need rich state models of users: likes, dislikes, ± interests, knowledge, and more.
• Natural Language Processing will be key.

  15. Search Applications: And, Data Changes Everything
• Example: AskMSR (Brill, Dumais, Banko, ACL 2002)
• Commonly used resources for QA:
  • Part-of-speech tagger, parser, named-entity extractor, WordNet or other knowledge bases, passage or sentence retrieval, abduction, etc.
• AskMSR doesn’t use any of them
• Instead, AskMSR focuses on data:
  • There is a lot of data on the web – use it
  • Redundancy is a resource to be exploited
• Data-driven QA: simple techniques, lots of data
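The redundancy idea can be sketched in a few lines. This is a minimal illustration, not the AskMSR system: query rewriting and retrieval are omitted, and the snippets below are hard-coded stand-ins for real search results; only the n-gram voting step is shown.

```python
from collections import Counter
import re

def mine_answer(snippets, max_n=3):
    """Vote for the most redundant n-gram across snippets."""
    votes = Counter()
    for text in snippets:
        words = re.findall(r"[a-z']+", text.lower())
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                votes[" ".join(words[i:i + n])] += 1
    # Filter stop words so trivial grams don't win the vote.
    stop = {"the", "a", "of", "in", "is", "was", "on", "to"}
    candidates = {g: c for g, c in votes.items()
                  if not set(g.split()) & stop}
    return max(candidates, key=candidates.get)

# Toy snippets standing in for real search results (illustrative data):
snippets = [
    "Mount Everest is the tallest mountain on Earth.",
    "The tallest mountain is Mount Everest.",
    "Everest, the tallest mountain, lies in the Himalayas.",
]
print(mine_answer(snippets))
```

The point of the slide carries through even in this toy: no parser, no knowledge base, just counting over redundant text.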

  16. [Figure]

  17. Data Changes Everything
Banko and Brill, Mitigating…, HLT 2001

  18. Data Changes Everything
Banko and Brill, Scaling…, 2001

  19. Key Points
• The online experience will be more deeply engaging.
• We will need rich state models of users: likes, dislikes, ± interests, knowledge, and more.
• Natural Language Processing will be key.
• “Search” can be the engine under the hood for many different applications.
• It’s better to use tons of data with simple models than smaller datasets with complex models.

  21. How to proceed?
• Don’t know. But: Sam, a Search Chatbot.
• Provide an engaging chat experience
• Use Search to show images, URLs, videos, …
• Will build persistent user world models
• Will have its own world model
• Can show precisely targeted ads
• Will leverage social networks

  22. The Eliza Effect
• Eliza: J. Weizenbaum, 1966 (!)
• Demonstrated that extremely simple techniques can result in compelling dialog (sometimes, for some users)
• Users tend to anthropomorphize computer behavior
• This gives us an advantage

  23. Our Prime Directive in Building Sam: use as little supervision as possible.

  24. Let the Data do the Work
• anarchism – categories: anarchism; political ideologies; political philosophies; social philosophy
• autism – categories: autism; pervasive developmental disorders; childhood psychiatric disorders; communication disorders; neurological disorders
• albedo – categories: electromagnetic radiation; climatology; climate forcing; scattering, absorption and radiative transfer (optics); radiometry
• abudhabi – categories: abudhabi; capitals in asia; cities in the united arab emirates; coastal cities
• a – categories: latin letters; vowel letters
Robert Rounthwaite, TMSN

  25. Using Category Graphs to Drive Dialog
Use the ODP and Wikipedia hierarchies to construct a graph
• User: I like ferrets.
• Ferret: category: animals people keep as pets
• Animals people keep as pets: rabbits
• Sam: Do you like rabbits, too?
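The ferret → rabbits hop can be sketched directly. The toy graph and function below are hypothetical, standing in for real ODP/Wikipedia category data:

```python
# Toy category graph (hypothetical edges in the style of ODP/Wikipedia):
CATEGORY_OF = {
    "ferret": ["animals people keep as pets"],
    "rabbit": ["animals people keep as pets"],
    "hamster": ["animals people keep as pets"],
}
# Invert to map each category to its member items.
MEMBERS_OF = {}
for item, cats in CATEGORY_OF.items():
    for cat in cats:
        MEMBERS_OF.setdefault(cat, []).append(item)

def suggest_followup(liked_item):
    """Walk item -> shared category -> sibling item to drive a question."""
    for cat in CATEGORY_OF.get(liked_item, []):
        for sibling in MEMBERS_OF.get(cat, []):
            if sibling != liked_item:
                return f"Do you like {sibling}s, too?"
    return None

print(suggest_followup("ferret"))
```

The same one-hop walk generalizes to any node pair sharing a category, which is what makes the graph a cheap dialog driver.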

  26. Use Category Graphs to Build Models
“World model” for both the user and for Sam
• Attach a vector to each node, sparsely:
  • [like/dislike; interested/not; knows about; …]
  • Each component has a confidence level
• Leverage the graph structure to explore
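One minimal way to realize sparse per-node vectors with confidences; the class and field names below are invented for illustration, following the slide's [like/dislike; interested/not; knows about] sketch:

```python
from dataclasses import dataclass

@dataclass
class NodeState:
    # Each component is a (value, confidence) pair.
    like: tuple = (0.0, 0.0)      # sentiment in [-1, 1], confidence in [0, 1]
    interest: tuple = (0.0, 0.0)
    knows: tuple = (0.0, 0.0)

class WorldModel:
    """Sparse map: only nodes we have evidence about get a state."""
    def __init__(self):
        self.nodes = {}

    def observe_like(self, topic, sentiment, confidence):
        state = self.nodes.setdefault(topic, NodeState())
        state.like = (sentiment, confidence)

    def least_certain(self):
        # A known node with low confidence: worth exploring next.
        return min(self.nodes, key=lambda t: self.nodes[t].like[1])

user = WorldModel()
user.observe_like("ferrets", +1.0, 0.9)   # stated directly by the user
user.observe_like("rabbits", +0.5, 0.2)   # inferred via a shared category
print(user.least_certain())
```

The `least_certain` query is the hook for the "leverage graph structure to explore" bullet: the bot asks about low-confidence nodes near high-confidence ones.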

  27. Other Useful Sources of Data
[Figure: bipartite click graph linking queries q_i to URLs u_j]
Q. Mei, D. Zhou, K. Church, Query Suggestion Using Hitting Time, CIKM 2008
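The hitting-time idea from the cited CIKM paper can be sketched with a Monte Carlo random walk on a toy click graph. The graph and parameters below are made up, and the paper computes hitting times iteratively rather than by sampling; this is only meant to show the quantity being estimated:

```python
import random

# Hypothetical bipartite click graph: queries <-> clicked URLs.
EDGES = {
    "q:cheap flights": ["u:kayak.com", "u:expedia.com"],
    "q:airline tickets": ["u:expedia.com"],
    "u:kayak.com": ["q:cheap flights"],
    "u:expedia.com": ["q:cheap flights", "q:airline tickets"],
}

def hitting_time(start, target, trials=2000, max_steps=50, seed=0):
    """Monte Carlo estimate of the expected number of steps from start
    to target under a uniform random walk on the click graph."""
    rng = random.Random(seed)
    total, hits = 0, 0
    for _ in range(trials):
        node = start
        for step in range(1, max_steps + 1):
            node = rng.choice(EDGES[node])
            if node == target:
                total += step
                hits += 1
                break
    return total / hits if hits else float("inf")

# Queries with small hitting time to a seed query make good suggestions.
t = hitting_time("q:airline tickets", "q:cheap flights")
print(t)
```

Here the two queries share a clicked URL, so the walk reaches the target quickly; unrelated queries would have much larger (or unbounded) hitting times.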

  28. Temporal Querying Behavior
M. Richardson, ACM Transactions on the Web, 2008

  29. We Are Not Alone
• http://quizbot.trueknowledge.com/
• http://www.jabberwacky.com/
Jab: Do you like computers?
Me: Yes, but only when they are turned on.
Jab: I like ferraris, do you like them also?
Me: Not really. Why do you like them?
Jab: Yes.
Me: Why do you like them?
Jab: No, I like going to mars.

  30. One Possible Sentence Generator
• Inputs:
  • Sentiment
  • Distribution over topics under discussion
  • Features from recent sentences
  • Sentence or phrase database (with statistics)
  • Distributions over the user’s likes / interests, etc.
  • Close or popular nodes where the bot lacks knowledge of the user
  • Topic priors
• Output: ranked sentences
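A generator of this shape ultimately needs a ranker over candidate outputs. A linear-scoring sketch, where the feature names, weights, and feature values are all invented for illustration:

```python
def score_sentence(feats):
    """Linear score over hypothetical features; the weights are made up."""
    w = {"topic_overlap": 2.0, "sentiment_match": 1.0, "novelty": 0.5}
    return sum(w[k] * feats.get(k, 0.0) for k in w)

# Candidate sentences with hand-assigned feature values (illustrative only):
candidates = [
    ("Do you like rabbits, too?",
     {"topic_overlap": 0.9, "sentiment_match": 1.0, "novelty": 0.3}),
    ("Tell me about your day.",
     {"topic_overlap": 0.1, "sentiment_match": 0.5, "novelty": 0.8}),
]
ranked = sorted(candidates, key=lambda c: score_sentence(c[1]), reverse=True)
print(ranked[0][0])
```

In a real system the weights would be learned (e.g. by the learning-to-rank methods later in the talk) rather than hand-set.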

  31. New Challenges for Machine Learning
• How can we teach a chatbot to talk?
  • “Good / bad response” buttons: reinforcement learning?
  • ESP-like games to label data for learning to rank sentences?
  • Build natural sentences from phrases?
• How can we learn effective user models?
  • Combine models from multiple users to form good priors
  • Use active learning during chat to reduce uncertainty in the user’s model

  32. Demo
Joint work with Scott Imig and Silviu Cucerzan
S. Cucerzan, Large-Scale Named Entity Disambiguation Based on Wikipedia Data, Proc. 2007 Joint Conference on EMNLP and CoNLL

  33. ~ Some New Results on Ranking ~

  34. Empirical Optimality of λ-Rank
Joint work with:
• Pinar Donmez (CMU)
• Krysta Svore (MSR)
• Yisong Yue (Cornell)

  35. Some IR Measures
• Precision: P = (# relevant documents retrieved) / (# documents retrieved); Recall: R = (# relevant documents retrieved) / (# relevant documents)
• Average Precision: compute the precision at the position of each relevant document, and average over those positions
• Mean Average Precision: average AP over queries
• Mean Reciprocal Rank (TREC QA): 1 / (rank of the first relevant document), averaged over queries
• Mean NDCG: NDCG = (1/Z) Σ_i (2^{l_i} − 1) / log(1 + i), where l_i is the label of the document at rank i and Z normalizes so the ideal ordering scores 1, averaged over queries
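The listed measures are easy to pin down in code. A sketch using standard definitions; the 2^label − 1 gain and log2 position discount for NDCG follow common practice and are an assumption here:

```python
import math

def average_precision(labels):
    """labels: binary relevance values in ranked order."""
    hits, precisions = 0, []
    for i, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)  # precision at each relevant position
    return sum(precisions) / hits if hits else 0.0

def reciprocal_rank(labels):
    for i, rel in enumerate(labels, start=1):
        if rel:
            return 1.0 / i
    return 0.0

def ndcg(gains, k=None):
    """gains: graded relevance labels in ranked order, normalized by ideal DCG."""
    k = k or len(gains)
    def dcg(g):
        return sum((2**g[i] - 1) / math.log2(i + 2)
                   for i in range(min(k, len(g))))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0

ranked = [1, 0, 1, 0]              # binary labels for AP / MRR
print(average_precision(ranked))   # (1/1 + 2/3) / 2
print(reciprocal_rank(ranked))
print(ndcg([3, 2, 0, 1]))
```

Averaging each function over queries gives MAP, MRR, and mean NDCG respectively.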

  36. IR Measures, cont.
These measures:
• Depend only on the labels and the sorted order of the documents
• Viewed as a function of the scores output by some model, are everywhere either flat or discontinuous
• SVM MAP: Yue et al., SIGIR ’07
• Tao Qin, Tie-Yan Liu, Hang Li, MSR Tech Report 164 (2008)

  37. LambdaRank: Background

  38. The RankNet Cost
Modeled posteriors: P_ij ≡ P(d_i ranked above d_j)
Target posteriors: P̄_ij
Define o_ij ≡ s_i − s_j, the difference of the model scores
Cross entropy cost: C_ij = −P̄_ij log P_ij − (1 − P̄_ij) log(1 − P_ij)
Model output probabilities using the logistic: P_ij = 1 / (1 + e^{−o_ij})
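A sketch of the RankNet pairwise cost and its gradient, using a logistic of the score difference s_i − s_j as in the published RankNet formulation (variable names here are illustrative):

```python
import math

def ranknet_cost(s_i, s_j, p_target):
    """Cross entropy between the target posterior and the logistic of
    the score difference o_ij = s_i - s_j."""
    o = s_i - s_j
    p = 1.0 / (1.0 + math.exp(-o))     # modeled P(i ranked above j)
    return -p_target * math.log(p) - (1 - p_target) * math.log(1 - p)

def ranknet_grad(s_i, s_j, p_target):
    """dC/ds_i (and -dC/ds_j); the logistic makes this a simple difference."""
    o = s_i - s_j
    return 1.0 / (1.0 + math.exp(-o)) - p_target

# If i should rank above j (target posterior 1), the cost falls as the
# score gap s_i - s_j grows, and rises when the pair is inverted:
print(ranknet_cost(2.0, 0.0, 1.0))  # small
print(ranknet_cost(0.0, 2.0, 1.0))  # large
```

Note the cost is smooth in the scores, which is exactly what the raw IR measures on the previous slide are not.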

  39. [Figure]

  40. RankNet Cost ~ Pairwise Cost

  41. Pairwise Cost Revisited
Pairwise cost is fine if there are no errors, but:
[Figure: two example rankings of the same documents, with 13 and 11 pairwise errors respectively]

  42. LambdaRank
Instead of using a smooth approximation to the cost and taking derivatives, write down the derivatives directly. Then use these derivatives to train a model using gradient descent, as usual.

  43. The Lambda Function
The NDCG gain from swapping the members of a pair of docs, multiplied by the RankNet cost gradient as a smoother: for a pair in which document i is labeled higher than document j,
λ_ij = −|ΔNDCG_ij| / (1 + e^{s_i − s_j})
Let H_i (L_i) be the set of documents labeled higher (lower) than document i:
λ_i = Σ_{j∈L_i} λ_ij − Σ_{j∈H_i} λ_ji
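A sketch of the lambda computation for a single pair, following the published LambdaRank gradient; the gains and scores below are toy values:

```python
import math

def delta_ndcg(gains, i, j):
    """|Change in NDCG| if the documents at ranks i and j swap (0-indexed),
    using the 2^gain - 1 / log2 discount convention."""
    def disc(r):
        return 1.0 / math.log2(r + 2)
    ideal = sum((2**g - 1) * disc(r)
                for r, g in enumerate(sorted(gains, reverse=True)))
    return abs((2**gains[i] - 2**gains[j]) * (disc(i) - disc(j))) / ideal

def lambda_ij(s_i, s_j, dZ):
    """RankNet gradient as the smoother, scaled by the NDCG swap delta.
    Convention: i is the document labeled more relevant in the pair."""
    return -dZ / (1.0 + math.exp(s_i - s_j))

# Doc at rank 0 has gain 0 but the higher score; doc at rank 1 has gain 3.
gains = [0, 3, 1]
dZ = delta_ndcg(gains, 0, 1)
lam = lambda_ij(s_i=1.0, s_j=2.0, dZ=dZ)  # s_i: score of the gain-3 doc
# lam < 0: gradient descent (s_i -= eta * lam) pushes the better doc up.
print(dZ, lam)
```

Summing these pairwise lambdas over all pairs involving a document gives that document's total force, as on the slide above.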

  44. Lambda Functions for MAP, MRR
[Equations: the same construction, with the NDCG swap delta replaced by the corresponding change in MAP or MRR]

  45. Local Optimality
• Check that the gradient vanishes at the solution.
• Get a bound on the probability that we are not at a local max, using a one-sided Monte Carlo test: sample k random directions and check that none is an ascent direction.
• If ascent directions occupy a fraction μ of the sphere of directions, then P(we miss an ascent direction despite k trials) = (1 − μ)^k.
• How large must k be for this probability to fall below δ? Answer: k ≥ log δ / log(1 − μ).
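The one-sided Monte Carlo test can be sketched on a toy objective. The objective and all parameters below are stand-ins; the real test probes the trained model's test-set IR measure rather than a closed-form function:

```python
import math, random

def is_probably_local_max(f, w, k=1000, eps=1e-3, seed=0):
    """Sample k random directions on the sphere and report whether none
    gives ascent. If ascent directions occupy a fraction mu of the
    sphere, the chance all k samples miss is (1 - mu)**k."""
    rng = random.Random(seed)
    base = f(w)
    for _ in range(k):
        d = [rng.gauss(0, 1) for _ in w]          # isotropic direction
        norm = math.sqrt(sum(x * x for x in d))
        probe = [wi + eps * di / norm for wi, di in zip(w, d)]
        if f(probe) > base:                       # found an ascent direction
            return False
    return True

# Toy objective with a maximum at the origin (stand-in for test NDCG):
f = lambda w: -sum(x * x for x in w)
print(is_probably_local_max(f, [0.0, 0.0]))   # at the max: no ascent found
print(is_probably_local_max(f, [0.5, 0.0]))   # off the max: ascent exists
```

The test is one-sided: a False answer is conclusive, while a True answer only bounds the miss probability as on the slide.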

  46. Data Sets
• Artificial: 300 features, 50 URLs/query, 10k/5k/10k train/valid/test split
• Web 1: 420 features, 26 URLs/query, 10k/5k/10k split
• Web 2: 30k/5k/10k split

  47. Which function to choose?
• LocalGradient: finite element estimate of the gradient, with margin
• LocalCost: estimate the local gradient using neighbors + weighted RankNet cost
• SpringSmooth: smoother version of RankNetWeightPairs
• DiscreteGradient: finite element estimate using the optimal position

  48. [Plots: results on the 10K Web, 30K Web, and Artificial datasets]

  49. Sample Size Matters
• The number of pairs drops by more than a factor of 2 for MRR and MAP
• For MRR, the number of samples drops much further

  50. IR Measure Optimality - Conclusions
• Typically, IR practitioners would train models with small numbers of ‘smart’ features (e.g. BM25) and perform a grid search
• However, adding many weak features improves performance
• We have shown that the LambdaRank gradients optimize three IR measures directly
