
Learning to Rank: New Techniques and Applications


Presentation Transcript


  1. Learning to Rank: New Techniques and Applications Martin Szummer Microsoft Research Cambridge, UK

  2. Why learning to rank? • Current rankers use many features in complex combinations • Applications: web search ranking, enterprise search; image search; ad selection; merging multiple results lists • The good: uses training data to find combinations of features that optimize IR metrics • The bad: requires judged training data, which is expensive, subjective, not provided by end users, and quickly out of date

  3. This talk (in fact, the same recipe applied in three different settings) • Learning to rank with IR metrics: a single, simple yet competition-winning recipe. Works for NDCG, MAP, and Precision with linear or non-linear ranking functions (neural nets, boosted trees, etc.) • Semi-supervised ranking: a new technique that reduces the amount of judged training data required • Learning to merge: merging results lists from multiple query reformulations

  4. Ranking Background • Classification: determine the class of an item i (operates on individual items) • Ranking: determine the preference of item i versus item j (operates on pairs of items) • Ranking function: a score function s_i = f(x_i; w) of query-document features x_i with parameters w; for example, the linear function f(x_i; w) = w·x_i • The ranking function induces a preference: i ≻ j when s_i > s_j
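A minimal sketch (in Python, not from the slides) of the linear ranking function on slide 4: scores are inner products of query-document feature vectors with a weight vector, and item i is preferred to item j when its score is higher. The feature values and weights below are made up for illustration.

```python
import numpy as np

def score(X, w):
    """Score each row of the query-document feature matrix X with weights w."""
    return X @ w

def prefers(x_i, x_j, w):
    """True if item i is preferred to item j: f(x_i; w) > f(x_j; w)."""
    return float(x_i @ w) > float(x_j @ w)

# toy example: two documents described by three query-document features
w = np.array([0.5, -0.2, 1.0])     # illustrative parameters
x_i = np.array([1.0, 0.0, 2.0])
x_j = np.array([0.5, 1.0, 0.5])
print(prefers(x_i, x_j, w))        # True: x_i scores 2.5, x_j scores 0.55
```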

  5. From Ranking Function to the Ranking • Apply the ranking function to score every document, then sort: the ranking is the permutation that sorts the scores {s_i} in decreasing order • Above: a deterministic model of preference • Henceforth: a probabilistic model translates score differences into a probability of preference, P(i ≻ j) = 1 / (1 + exp(-(s_i - s_j))) (Bradley-Terry/Mallows)
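A small sketch of the Bradley-Terry style preference model on slide 5, assuming the common logistic form with a scale parameter sigma (the exact scaling used in the talk is an assumption here).

```python
import math

def p_prefer(s_i, s_j, sigma=1.0):
    """P(i preferred to j) = 1 / (1 + exp(-sigma * (s_i - s_j)))."""
    return 1.0 / (1.0 + math.exp(-sigma * (s_i - s_j)))

print(p_prefer(2.0, 1.0))   # ~0.73: i is probably preferred
print(p_prefer(1.0, 1.0))   # 0.5: no preference either way
```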

  6. Learning to Rank • Given query-document features x_i and preference pairs i ≻ j, determine the parameters w of the ranking function (score, then sort by {s_i}) • Maximize the likelihood of the preference pairs given in the training data, i.e. the product over training pairs i ≻ j of P(i ≻ j) = 1 / (1 + exp(-(s_i - s_j))), e.g. the RankNet model [Burges et al 2005]
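A hedged sketch of slide 6's maximum-likelihood training over preference pairs, in the RankNet style; the linear scoring function, learning rate, and plain gradient descent are illustrative choices, not the talk's exact setup.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def pairwise_nll(w, X, pairs):
    """Negative log-likelihood of pairs (i, j) meaning i preferred to j."""
    s = X @ w
    return -sum(np.log(sigmoid(s[i] - s[j])) for i, j in pairs)

def train(X, pairs, lr=0.1, steps=200):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        s = X @ w
        grad = np.zeros_like(w)
        for i, j in pairs:
            # gradient of -log sigmoid(s_i - s_j) w.r.t. w
            grad -= (1.0 - sigmoid(s[i] - s[j])) * (X[i] - X[j])
        w -= lr * grad
    return w

# toy data: doc 0 should outrank doc 1, doc 1 should outrank doc 2
X = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
pairs = [(0, 1), (1, 2)]
w = train(X, pairs)
print(X @ w)   # scores come out in decreasing order: s0 > s1 > s2
```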

  7. Learning to Rank for IR metrics • IR metrics such as NDCG, MAP, or Precision depend on: the sorted order of items, and the ranks of items (weight the top of the ranking more) • Recipe: express the metric as a sum of pairwise swap deltas; smooth it by multiplying by a Bradley-Terry term; optimize the parameters by gradient descent over a judged training set • LambdaRank and LambdaMART [Burges et al] are instances of this recipe; the latter won the Yahoo! Learning to Rank Challenge (2010)
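A sketch of the recipe on slide 7 in the LambdaRank style: each labelled pair contributes a force ("lambda") on the scores equal to the NDCG change from swapping the two documents, smoothed by a Bradley-Terry term. The gain and discount definitions and the sigma constant are standard choices assumed here, not taken from the slides.

```python
import numpy as np

def dcg_discounts(n):
    return 1.0 / np.log2(np.arange(2, n + 2))            # 1 / log2(rank + 1)

def lambdas(scores, labels, sigma=1.0):
    order = np.argsort(-scores)                           # current ranking
    rank_of = np.empty_like(order)
    rank_of[order] = np.arange(len(scores))
    gains = 2.0 ** labels - 1.0
    disc = dcg_discounts(len(scores))
    ideal = np.sort(gains)[::-1] @ disc                   # ideal DCG for normalisation
    lam = np.zeros_like(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] <= labels[j]:
                continue                                  # only pairs where i is more relevant
            swap = abs((gains[i] - gains[j]) *
                       (disc[rank_of[i]] - disc[rank_of[j]])) / ideal
            rho = 1.0 / (1.0 + np.exp(sigma * (scores[i] - scores[j])))
            lam[i] += sigma * swap * rho                  # push the more relevant doc up
            lam[j] -= sigma * swap * rho                  # and the less relevant doc down
    return lam

print(lambdas(np.array([0.2, 1.0, 0.1]), np.array([2, 0, 1])))
```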

  8. Example: Apply recipe to NDCG metric • Unpublished material. Email me if interested.

  9. Gradients - intuition • [Figure: forces on documents at ranks 1-5] • Gradients act as forces on doc pairs

  10. Semi-supervised Ranking [with Emine Yilmaz] • Train with judged AND unjudged query-document pairs

  11. Semi-supervised Ranking • Applications: (pseudo) relevance feedback; reducing the number of (expensive) human judgments; settings where judgments are hard to obtain (customers may not want to judge their collections); adaptation to a specific company in enterprise search; ranking for small markets and special-interest domains • Approach: preference learning; end-to-end optimization of ranking metrics (NDCG, MAP); multiple and completely unlabeled rank instances; scalability

  12. How to benefit from unlabeled data? • Unlabeled data gives information about the data distribution P(x). We must make assumptions about what the structure of the unlabeled data tells us about the ranking distribution P(R|x). • A common assumption is the cluster assumption: unlabeled data defines the extent of clusters, while labeled data determines the class/function value of each cluster.

  13. Semi-supervised learning: classification: similar documents ⇒ same class; regression: similar documents ⇒ similar function value; ranking: similar documents ⇒ similar preference, i.e. neither is preferred to the other • Similarity can be defined based on content, so it does not require judgments; it acts as a type of regularizer on the function we are learning • Differences from classification & regression: preferences provide weaker constraints than function values or classes
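One common way to encode "similar documents imply similar preference" as a regularizer is a graph-Laplacian-style penalty that pulls the scores of content-similar documents together; this is a generic sketch of the idea, not the talk's unpublished formulation.

```python
import numpy as np

def similarity_penalty(scores, sim):
    """sum_{i,j} sim[i, j] * (s_i - s_j)^2, with sim built from content features."""
    diff = scores[:, None] - scores[None, :]
    return np.sum(sim * diff ** 2)

# toy example: docs 0 and 1 are similar in content, doc 2 is not
scores = np.array([1.0, 0.9, -2.0])
sim = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 0.0]])
print(similarity_penalty(scores, sim))   # small: similar docs already get similar scores

# total objective (sketch): pairwise preference loss on judged pairs
# plus gamma * similarity_penalty over all (labeled and unlabeled) documents
```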

  14. Quantify Similarity • Similar documents ⇒ similar preference, i.e. neither is preferred to the other • Unpublished material. Email me if interested.

  15. Semi-supervised Gradients • [Figure]

  16. Experiments • Relevance feedback task: 1) the user issues a query and labels a few of the resulting documents from a traditional ranker (BM25); 2) the system trains a query-specific ranker and re-ranks • Data: TREC collection, 528,000 documents, 150 queries; 1000 documents per query, of which 2-15 are labeled • Features: ranking features (q, d): 22 features from LETOR; content features (d1, d2): TF-IDF distance between the top 50 words; neighbors in input space computed using either of the above • Note: at test time only ranking features are used; the method allows using features of type (d1, d2) and (q, d1, d2) at training that other algorithms cannot use • Ranking function f(): neural network, 3 hidden units; K=5 neighbors
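A rough sketch of how the content-similarity neighbours described on slide 16 might be built with scikit-learn: TF-IDF vectors (truncated to 50 terms as a stand-in for "top 50 words") and cosine-distance nearest neighbours. The toy corpus, the truncation, and the neighbour count (3 here rather than the slide's K=5) are illustrative assumptions, not the experiments' exact construction.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

docs = ["jupiter is the largest planet",
        "the mass of jupiter dwarfs the other planets",
        "bm25 is a classic ranking function",
        "learning to rank optimises ir metrics",
        "planets orbit the sun",
        "neural networks can be used as ranking functions"]

tfidf = TfidfVectorizer(max_features=50).fit_transform(docs)
nn = NearestNeighbors(n_neighbors=3, metric="cosine").fit(tfidf)
dist, idx = nn.kneighbors(tfidf)    # idx[i] lists each document's nearest documents
print(idx)
```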

  17. Relevance Feedback Task • [Results chart comparing: LambdaRank L&U Cont, LambdaRank L&U, LambdaRank L, TSVM L&U, RankBoost L&U, RankingSVM L, RankBoost L]

  18. Novel Queries Task • 90,000 training documents • 3500 preference pairs • 40 million unlabeled pairs

  19. Novel Queries Task • [Results chart comparing: Upper Bound, LambdaRank L&U Cont, LambdaRank L&U, LambdaRank L]

  20. Learning to Merge • Example application: users do not know the best way to express their web search query, and a single query may not be enough to reach all relevant documents (e.g. the user issues "wp7"; reformulations include "wp7 phone" and "microsoft wp7") • Solution: reformulate in parallel and merge the results • Task: learn a ranker that merges results from other rankers

  21. Merging Multiple Queries [with Sheldon, Shokouhi, Craswell] • Traditional approach: alter the query before retrieval • Merging: alter after retrieval • Prospecting: see the results first, then decide • Flexibility: any rewrite is allowed, with arbitrary features • Upside potential: can be better than any individual list • Cost: increased query load on the engine; use a cache to mitigate it

  22. LambdaMerge: learn to merge • A weighted mixture of ranking functions • Scoring features (per document, per list): dynamic rank score, BM25, Rank, IsTopN • Rewrite features: rewrite-difficulty (ListMean, ListStd, Clarity) and rewrite-drift (IsRewrite, RewriteRank, RewriteScore, Overlap@N) • [Architecture diagram: scoring and rewrite features from the lists of two reformulations, e.g. "jupiters mass" and "mass of jupiter", combined into a merged score]
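A hedged sketch of slide 22's "weighted mixture of ranking functions": each reformulation's list contributes a score computed from its scoring features, weighted by a gate computed from its rewrite features. The softmax gate, linear scorers, and all feature values below are illustrative assumptions; the actual LambdaMerge model (neural components trained with lambda gradients) is not reproduced here.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def merged_score(scoring_feats, rewrite_feats, w_score, w_gate):
    """scoring_feats: (n_rewrites, d_s) per-document features from each list;
    rewrite_feats: (n_rewrites, d_r) features describing each reformulation."""
    component = scoring_feats @ w_score        # one score per reformulation's list
    gate = softmax(rewrite_feats @ w_gate)     # how much to trust each list
    return float(gate @ component)

# toy example: a document retrieved by two reformulations of the same query
scoring_feats = np.array([[2.1, 0.3], [1.2, 0.8]])   # e.g. BM25, rank-based feature
rewrite_feats = np.array([[0.9, 0.1], [0.4, 0.6]])   # e.g. ListMean, Overlap@N
print(merged_score(scoring_feats, rewrite_feats,
                   w_score=np.array([1.0, 0.5]),
                   w_gate=np.array([2.0, -1.0])))
```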

  23. [Figure-only slide]

  24. [Figure-only slide]

  25. λ-Merge Results • [Results chart: Merged minus Original NDCG vs. Reformulation minus Original NDCG]

  26. Summary • Learning to rank is an indispensable tool • It requires judgments, but semi-supervised learning can help; crowd-sourcing is also a possibility; a research frontier is implicit judgments from clicks • Many applications beyond those shown: merging results from multiple local search engines or multiple language engines; ranking recommendations in collaborative filtering; many thresholding tasks (filtering) can be posed as ranking; ranking ads for relevance; elections • Use it!
