
Learning to Rank: New Techniques and Applications


Presentation Transcript


  1. Learning to Rank: New Techniques and Applications Martin Szummer Microsoft Research Cambridge, UK

  2. Why learning to rank? • Current rankers use many features in complex combinations • Applications: web search ranking, enterprise search; image search; ad selection; merging multiple results lists • The good: uses training data to find combinations of features that optimize IR metrics • The bad: requires judged training data, which is expensive, subjective, not provided by end users, and quickly out of date

  3. This talk (in fact, the same recipe applied in three different settings) • Learning to rank with IR metrics: a single, simple yet competition-winning recipe. Works for NDCG, MAP, and Precision with linear or non-linear ranking functions (neural nets, boosted trees, etc.) • Semi-supervised ranking: a new technique that reduces the amount of judged training data required • Learning to merge: merging results lists from multiple query reformulations

  4. Ranking Background • Classification: determine the class of an item i (operates on individual items) • Ranking: determine the preference of item i versus item j (operates on pairs of items) • Ranking function: a score function s_i = f(x_i; w) of query-document features x_i with parameters w; for example, the linear function f(x_i; w) = w·x_i • The ranking function induces a preference: i ≻ j when s_i > s_j
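A minimal sketch (in Python, not from the slides) of the linear ranking function on slide 4: scores are inner products of query-document feature vectors with a weight vector, and item i is preferred to item j when its score is higher. The feature values and weights below are made up for illustration.

```python
import numpy as np

def score(X, w):
    """Score each row of the query-document feature matrix X with weights w."""
    return X @ w

def prefers(x_i, x_j, w):
    """True if item i is preferred to item j: f(x_i; w) > f(x_j; w)."""
    return float(x_i @ w) > float(x_j @ w)

# toy example: two documents described by three query-document features
w = np.array([0.5, -0.2, 1.0])     # illustrative parameters
x_i = np.array([1.0, 0.0, 2.0])
x_j = np.array([0.5, 1.0, 0.5])
print(prefers(x_i, x_j, w))        # True: x_i scores 2.5, x_j scores 0.55
```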

  5. From Ranking Function to the Ranking • Apply the ranking function to score every document, then sort: the ranking is the permutation that sorts the scores {s_i} in decreasing order • Above: a deterministic model of preference • Henceforth: a probabilistic model translates score differences into a probability of preference, P(i ≻ j) = 1 / (1 + exp(-(s_i - s_j))) (Bradley-Terry/Mallows)
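A small sketch of the Bradley-Terry style preference model on slide 5, assuming the common logistic form with a scale parameter sigma (the exact scaling used in the talk is an assumption here).

```python
import math

def p_prefer(s_i, s_j, sigma=1.0):
    """P(i preferred to j) = 1 / (1 + exp(-sigma * (s_i - s_j)))."""
    return 1.0 / (1.0 + math.exp(-sigma * (s_i - s_j)))

print(p_prefer(2.0, 1.0))   # ~0.73: i is probably preferred
print(p_prefer(1.0, 1.0))   # 0.5: no preference either way
```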

  6. Learning to Rank • Given query-document features x_i and preference pairs i ≻ j, determine the parameters w of the ranking function (score, then sort by {s_i}) • Maximize the likelihood of the preference pairs given in the training data, i.e. the product over training pairs i ≻ j of P(i ≻ j) = 1 / (1 + exp(-(s_i - s_j))), e.g. the RankNet model [Burges et al 2005]
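A hedged sketch of slide 6's maximum-likelihood training over preference pairs, in the RankNet style; the linear scoring function, learning rate, and plain gradient descent are illustrative choices, not the talk's exact setup.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def pairwise_nll(w, X, pairs):
    """Negative log-likelihood of pairs (i, j) meaning i preferred to j."""
    s = X @ w
    return -sum(np.log(sigmoid(s[i] - s[j])) for i, j in pairs)

def train(X, pairs, lr=0.1, steps=200):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        s = X @ w
        grad = np.zeros_like(w)
        for i, j in pairs:
            # gradient of -log sigmoid(s_i - s_j) w.r.t. w
            grad -= (1.0 - sigmoid(s[i] - s[j])) * (X[i] - X[j])
        w -= lr * grad
    return w

# toy data: doc 0 should outrank doc 1, doc 1 should outrank doc 2
X = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
pairs = [(0, 1), (1, 2)]
w = train(X, pairs)
print(X @ w)   # scores come out in decreasing order: s0 > s1 > s2
```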

  7. Learning to Rank for IR metrics • IR metrics such as NDCG, MAP, or Precision depend on: the sorted order of items, and the ranks of items (weight the top of the ranking more) • Recipe: express the metric as a sum of pairwise swap deltas; smooth it by multiplying by a Bradley-Terry term; optimize the parameters by gradient descent over a judged training set • LambdaRank and LambdaMART [Burges et al] are instances of this recipe; the latter won the Yahoo! Learning to Rank Challenge (2010)
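A sketch of the recipe on slide 7 in the LambdaRank style: each labelled pair contributes a force ("lambda") on the scores equal to the NDCG change from swapping the two documents, smoothed by a Bradley-Terry term. The gain and discount definitions and the sigma constant are standard choices assumed here, not taken from the slides.

```python
import numpy as np

def dcg_discounts(n):
    return 1.0 / np.log2(np.arange(2, n + 2))            # 1 / log2(rank + 1)

def lambdas(scores, labels, sigma=1.0):
    order = np.argsort(-scores)                           # current ranking
    rank_of = np.empty_like(order)
    rank_of[order] = np.arange(len(scores))
    gains = 2.0 ** labels - 1.0
    disc = dcg_discounts(len(scores))
    ideal = np.sort(gains)[::-1] @ disc                   # ideal DCG for normalisation
    lam = np.zeros_like(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] <= labels[j]:
                continue                                  # only pairs where i is more relevant
            swap = abs((gains[i] - gains[j]) *
                       (disc[rank_of[i]] - disc[rank_of[j]])) / ideal
            rho = 1.0 / (1.0 + np.exp(sigma * (scores[i] - scores[j])))
            lam[i] += sigma * swap * rho                  # push the more relevant doc up
            lam[j] -= sigma * swap * rho                  # and the less relevant doc down
    return lam

print(lambdas(np.array([0.2, 1.0, 0.1]), np.array([2, 0, 1])))
```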

  8. Example: Apply recipe to NDCG metric • Unpublished material. Email me if interested.

  9. Gradients - intuition • [Figure: forces on documents at ranks 1-5] • Gradients act as forces on doc pairs

  10. Semi-supervised Ranking [with Emine Yilmaz] • Train with judged AND unjudged query-document pairs

  11. Semi-supervised Ranking • Applications: (pseudo) relevance feedback; reducing the number of (expensive) human judgments; settings where judgments are hard to obtain (customers may not want to judge their collections); adaptation to a specific company in enterprise search; ranking for small markets and special-interest domains • Approach: preference learning; end-to-end optimization of ranking metrics (NDCG, MAP); multiple and completely unlabeled rank instances; scalability

  12. How to benefit from unlabeled data? • Unlabeled data gives information about the data distribution P(x). We must make assumptions about what the structure of the unlabeled data tells us about the ranking distribution P(R|x). • A common assumption is the cluster assumption: unlabeled data defines the extent of clusters, while labeled data determines the class/function value of each cluster.

  13. Semi-supervised learning: classification: similar documents ⇒ same class; regression: similar documents ⇒ similar function value; ranking: similar documents ⇒ similar preference, i.e. neither is preferred to the other • Similarity can be defined based on content, so it does not require judgments; it acts as a type of regularizer on the function we are learning • Differences from classification & regression: preferences provide weaker constraints than function values or classes
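One common way to encode "similar documents imply similar preference" as a regularizer is a graph-Laplacian-style penalty that pulls the scores of content-similar documents together; this is a generic sketch of the idea, not the talk's unpublished formulation.

```python
import numpy as np

def similarity_penalty(scores, sim):
    """sum_{i,j} sim[i, j] * (s_i - s_j)^2, with sim built from content features."""
    diff = scores[:, None] - scores[None, :]
    return np.sum(sim * diff ** 2)

# toy example: docs 0 and 1 are similar in content, doc 2 is not
scores = np.array([1.0, 0.9, -2.0])
sim = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 0.0]])
print(similarity_penalty(scores, sim))   # small: similar docs already get similar scores

# total objective (sketch): pairwise preference loss on judged pairs
# plus gamma * similarity_penalty over all (labeled and unlabeled) documents
```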

  14. Quantify Similarity • Similar documents ⇒ similar preference, i.e. neither is preferred to the other • Unpublished material. Email me if interested.

  15. Semi-supervised Gradients • [Figure]

  16. Experiments • Relevance feedback task: 1) the user issues a query and labels a few of the resulting documents from a traditional ranker (BM25); 2) the system trains a query-specific ranker and re-ranks • Data: TREC collection, 528,000 documents, 150 queries; 1000 documents per query, of which 2-15 are labeled • Features: ranking features (q, d): 22 features from LETOR; content features (d1, d2): TF-IDF distance between the top 50 words; neighbors in input space computed using either of the above • Note: at test time only ranking features are used; the method allows using features of type (d1, d2) and (q, d1, d2) at training that other algorithms cannot use • Ranking function f(): neural network, 3 hidden units; K=5 neighbors
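A rough sketch of how the content-similarity neighbours described on slide 16 might be built with scikit-learn: TF-IDF vectors (truncated to 50 terms as a stand-in for "top 50 words") and cosine-distance nearest neighbours. The toy corpus, the truncation, and the neighbour count (3 here rather than the slide's K=5) are illustrative assumptions, not the experiments' exact construction.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

docs = ["jupiter is the largest planet",
        "the mass of jupiter dwarfs the other planets",
        "bm25 is a classic ranking function",
        "learning to rank optimises ir metrics",
        "planets orbit the sun",
        "neural networks can be used as ranking functions"]

tfidf = TfidfVectorizer(max_features=50).fit_transform(docs)
nn = NearestNeighbors(n_neighbors=3, metric="cosine").fit(tfidf)
dist, idx = nn.kneighbors(tfidf)    # idx[i] lists each document's nearest documents
print(idx)
```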

  17. Relevance Feedback Task • [Results chart comparing: LambdaRank L&U Cont, LambdaRank L&U, LambdaRank L, TSVM L&U, RankBoost L&U, RankingSVM L, RankBoost L]

  18. Novel Queries Task • 90,000 training documents • 3500 preference pairs • 40 million unlabeled pairs

  19. Novel Queries Task • [Results chart comparing: Upper Bound, LambdaRank L&U Cont, LambdaRank L&U, LambdaRank L]

  20. Learning to Merge • Example application: users do not know the best way to express their web search query, and a single query may not be enough to reach all relevant documents (e.g. the user issues "wp7"; reformulations include "wp7 phone" and "microsoft wp7") • Solution: reformulate in parallel and merge the results • Task: learn a ranker that merges results from other rankers

  21. Merging Multiple Queries [with Sheldon, Shokouhi, Craswell] • Traditional approach: alter the query before retrieval • Merging: alter after retrieval • Prospecting: see the results first, then decide • Flexibility: any rewrite is allowed, with arbitrary features • Upside potential: can be better than any individual list • Cost: increased query load on the engine; use a cache to mitigate it

  22. LambdaMerge: learn to merge • A weighted mixture of ranking functions • Scoring features (per document, per list): dynamic rank score, BM25, Rank, IsTopN • Rewrite features: rewrite-difficulty (ListMean, ListStd, Clarity) and rewrite-drift (IsRewrite, RewriteRank, RewriteScore, Overlap@N) • [Architecture diagram: scoring and rewrite features from the lists of two reformulations, e.g. "jupiters mass" and "mass of jupiter", combined into a merged score]
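A hedged sketch of slide 22's "weighted mixture of ranking functions": each reformulation's list contributes a score computed from its scoring features, weighted by a gate computed from its rewrite features. The softmax gate, linear scorers, and all feature values below are illustrative assumptions; the actual LambdaMerge model (neural components trained with lambda gradients) is not reproduced here.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def merged_score(scoring_feats, rewrite_feats, w_score, w_gate):
    """scoring_feats: (n_rewrites, d_s) per-document features from each list;
    rewrite_feats: (n_rewrites, d_r) features describing each reformulation."""
    component = scoring_feats @ w_score        # one score per reformulation's list
    gate = softmax(rewrite_feats @ w_gate)     # how much to trust each list
    return float(gate @ component)

# toy example: a document retrieved by two reformulations of the same query
scoring_feats = np.array([[2.1, 0.3], [1.2, 0.8]])   # e.g. BM25, rank-based feature
rewrite_feats = np.array([[0.9, 0.1], [0.4, 0.6]])   # e.g. ListMean, Overlap@N
print(merged_score(scoring_feats, rewrite_feats,
                   w_score=np.array([1.0, 0.5]),
                   w_gate=np.array([2.0, -1.0])))
```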

  23. [Figure-only slide]

  24. [Figure-only slide]

  25. λ-Merge Results • [Results chart: Merged minus Original NDCG vs. Reformulation minus Original NDCG]

  26. Summary • Learning to rank is an indispensable tool • It requires judgments, but semi-supervised learning can help; crowd-sourcing is also a possibility; a research frontier is implicit judgments from clicks • Many applications beyond those shown: merging results from multiple local search engines or multiple language engines; ranking recommendations in collaborative filtering; many thresholding tasks (filtering) can be posed as ranking; ranking ads for relevance; elections • Use it!
