
Rank Aggregation



Presentation Transcript


  1. Rank Aggregation

  2. Rank Aggregation: Settings • Multiple items • Web-pages, cars, apartments,…. • Multiple scores for each item • By different reviewers, users, according to different features… • Some aggregation function on the scores • Sum, Average, Max… • Goal: compute the top-k items

  3. Rank Aggregation Example

  4. Naïve Algorithm • Compute the aggregated score of every item • Find the best one, then the second best one, and so on up to the k-th best • Good for small-scale problems • Still not feasible at web scale…
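A minimal sketch of the naïve approach, assuming every item's scores are already collected in one dictionary. The names `naive_top_k` and `average` and the car data are illustrative assumptions, not part of the slides.

```python
import heapq


def average(values):
    """Aggregation function used here: the plain average of the scores."""
    return sum(values) / len(values)


def naive_top_k(scores, k, agg=average):
    """Aggregate every item, then pick the k best.

    scores: dict mapping item -> list of its individual scores (hypothetical layout).
    Returns a list of (item, aggregated score), best first.
    """
    aggregated = {item: agg(vals) for item, vals in scores.items()}
    return heapq.nlargest(k, aggregated.items(), key=lambda kv: kv[1])


# Made-up scores: [beauty, comfort] for four cars.
cars = {"A": [10, 7], "B": [4, 7.5], "C": [3, 8], "D": [2, 9]}
print(naive_top_k(cars, 2))  # [('A', 8.5), ('B', 5.75)]
```

Every item is touched once, so the work grows with the total number of items regardless of k.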

  5. Can we do any better? • An assumption to help us: each individual list comes sorted • Reasonable for search engines, user rankings… • Another assumption: monotonicity of the aggregation function (raising any individual score never lowers the aggregated score) • Now can we do any better?

  6. Fagin's algorithm (FA) • Do sorted access on all lists in parallel • For every item seen, do random access to the other lists to fetch all of its scores • Stop the sorted access once at least k items have been seen in all lists • Compute the aggregated score of every seen item, sort, and output the top k • Why is this enough?
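The steps of FA can be sketched as follows. This is an illustrative sketch, assuming every list holds the same items as (item, score) pairs sorted by descending score, and that random access is a dictionary lookup. The function name, the data layout, and the car data are assumptions made for the example.

```python
def average(values):
    return sum(values) / len(values)


def fagin_top_k(lists, k, agg):
    """Sketch of FA. Returns (top-k list of (item, score), sorted-access depth)."""
    m, n = len(lists), len(lists[0])
    lookup = [dict(lst) for lst in lists]  # stands in for random access
    seen_in = {}          # item -> set of list indices where it was seen
    seen_everywhere = 0   # number of items seen under sorted access in all lists
    depth = 0
    # Phase 1: sorted access in parallel until k items were seen in every list.
    while seen_everywhere < k and depth < n:
        for i, lst in enumerate(lists):
            item, _ = lst[depth]
            where = seen_in.setdefault(item, set())
            where.add(i)
            if len(where) == m:
                seen_everywhere += 1
        depth += 1
    # Phase 2: random access to complete the scores of every seen item.
    scored = {item: agg([lookup[i][item] for i in range(m)]) for item in seen_in}
    top = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return top, depth


# Made-up data: two criteria over four cars.
beauty = [("A", 10), ("B", 4), ("C", 3), ("D", 2)]
comfort = [("D", 9), ("C", 8), ("B", 7.5), ("A", 7)]
print(fagin_top_k([beauty, comfort], 1, average))  # ([('A', 8.5)], 3)
```

On this data FA reads three rows of each list before some item (C) has appeared in both, even though the winner A was already seen at depth 1.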

  7-12. Example (tables with columns Beauty, Comfort, Average; figures not included in the transcript), ending with the top-3 result • How do we know not to look further?

  13. Complexity • Probabilistic analysis on the order of items can be used to show better bounds (with good probability) • Can we do even better?

  14. Cost model • This is a very simple setting, so we can define a finer cost model than worst-case complexity • In a web context it is important to do so, since the scale is huge • We associate some cost Cs with every sorted access, and some cost Cr with every random access • Denote the cost of algorithm A on input instance I by cost(A,I)
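One way to write this cost out explicitly, assuming the total is simply the sum of the per-access costs. The symbols s and r (the numbers of sorted and random accesses that A performs on I) are introduced here for illustration and do not appear on the slide.

```latex
\mathrm{cost}(A, I) = s \cdot C_s + r \cdot C_r
```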

  15. Instance-optimality • An algorithm A is instance-optimal if for every input instance I, cost(A,I) = O(cost(A',I)) for every algorithm A' • A very strong notion • But we can realize it here!
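The O(·) in this definition can be unpacked as follows. This is a sketch under the usual reading of the notation: the constants c and c' are assumptions introduced here, and they must not depend on the instance I or on the competing algorithm A'.

```latex
\exists\, c, c' > 0 \;\; \forall A' \;\; \forall I : \quad
\mathrm{cost}(A, I) \;\le\; c \cdot \mathrm{cost}(A', I) + c'
```

Unlike worst-case optimality, the comparison is made on each instance separately, which is what makes the notion so strong.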

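  16. Threshold Algorithm (TA) • Idea: sometimes we can stop before seeing k objects in every list • Use a threshold on how good the score of an unseen object can be • The threshold is the aggregation of the lowest scores seen so far under sorted access, one from each list • Halt once k seen objects have an aggregated score at least as high as the threshold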
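A minimal sketch of TA under the same assumptions as before: lists of (item, score) pairs sorted by descending score, all holding the same items, with random access modeled as a dictionary lookup. The function name and the car data are made up for illustration.

```python
import heapq


def average(values):
    return sum(values) / len(values)


def threshold_top_k(lists, k, agg):
    """Sketch of TA. Returns (top-k list of (item, score), sorted-access depth)."""
    m, n = len(lists), len(lists[0])
    lookup = [dict(lst) for lst in lists]  # stands in for random access
    scored = {}                            # item -> full aggregated score
    for depth in range(n):
        last_seen = []                     # lowest score seen so far in each list
        for lst in lists:
            item, score = lst[depth]
            last_seen.append(score)
            if item not in scored:
                # Random access to the other lists completes the item's score.
                scored[item] = agg([lookup[j][item] for j in range(m)])
        # By monotonicity, no unseen item can score above the threshold.
        threshold = agg(last_seen)
        top = heapq.nlargest(k, scored.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth + 1
    return heapq.nlargest(k, scored.items(), key=lambda kv: kv[1]), n


# Made-up data: two criteria over four cars.
beauty = [("A", 10), ("B", 4), ("C", 3), ("D", 2)]
comfort = [("D", 9), ("C", 8), ("B", 7.5), ("A", 7)]
print(threshold_top_k([beauty, comfort], 1, average))  # ([('A', 8.5)], 2)
```

Here the threshold drops from 9.5 to 6 after the second row, below A's score of 8.5, so TA halts at depth 2 without waiting for any item to appear in both lists.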

  17-21. Example (tables with columns Beauty, Comfort, Average; figures not included in the transcript) • Threshold shown on successive slides: T=9.5, T=9.5, T=7, T=4 • One step less!

  22. Theorem • Assume that the aggregation function t is monotone. Let D be the class of all databases. Let A be the class of all algorithms that correctly find the top k answers for t for every database and that do not make wild guesses (that is, never do random access on an object not yet seen under sorted access). Then TA is instance optimal over A and D.

  23. Proof • Assume that algorithm A halts at depth d on a database D (that is, if di is the number of objects seen under sorted access to list i, then d = max di) • Assume that A sees a distinct objects (some possibly multiple times) • In particular, a >= d • Since A makes no wild guesses and sees a distinct objects, it must make at least a sorted accesses

  24. Claim: TA halts on D by depth a + k • Note that for each choice of d', TA sees at least d' objects by depth d' • This is because by depth d' it has made m*d' sorted accesses, and each object is accessed at most m times under sorted access • If there are at most k objects that A does not see, then TA halts by depth a + k (after having seen every object), and we are done

  25. Now assume that there are at least k + 1 objects that A does not see • Let Y be the output set of A • Since Y has size k, there is some object V that A does not see and that is not in Y • Let t be the threshold value when algorithm A halts, i.e. the aggregation of the lowest scores A has observed under sorted access in each list

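  26. Call object R big if its grade is at least t, otherwise small • Claim: Every R in Y is big • Proof: Modify the database so that the unseen object V has, in each list i, the lowest score that A observed in that list. A behaves identically on the modified database, so it still does not see V and does not output it. The grade of V there is t, so by correctness of A every R in Y must have grade at least t, and the claim follows • Now TA will have seen all elements of Y by depth d and will halt • d <= a and so we are done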

  27. Restricted Sorted Access • Some rankings are not available in sorted order • E.g. distances from a map site • Then we can revise TA to do sorted access only on the lists where it is possible • It is still instance-optimal! (Against algorithms that work under the same restrictions, of course)

  28. No Random Access • Maintain lower and upper bounds for every item (its worst and best possible grades) • Best is the aggregation of the scores we have seen for the item, with each missing score replaced by the lowest score seen so far in that list; Worst is the aggregation of the scores we have seen, with each missing score replaced by zero • Keep in the list the k items with the highest "worst" grades • Break ties by "best" grades • Halt when the list holds k items and the best grade of every item outside the list is no greater than the worst grade of the k-th item in the list
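The bookkeeping above can be sketched as follows. This is an illustrative sketch, assuming non-negative scores (so zero is a valid lower bound), lists of (item, score) pairs sorted by descending score, and the same items in every list. The function name and the car data are assumptions made for the example. Items not yet seen at all are treated as having a best grade equal to the aggregation of the lowest scores seen so far.

```python
def average(values):
    return sum(values) / len(values)


def nra_top_k(lists, k, agg):
    """Sketch of top-k with sorted access only.

    Returns (top-k items, sorted-access depth). Exact grades may stay unknown.
    """
    m, n = len(lists), len(lists[0])
    known = {}    # item -> {list index: score seen under sorted access}
    ranked = []
    for depth in range(n):
        bottoms = []  # lowest score seen so far in each list
        for i, lst in enumerate(lists):
            item, score = lst[depth]
            known.setdefault(item, {})[i] = score
            bottoms.append(score)

        def worst(item):
            # Missing scores replaced by zero.
            return agg([known[item].get(i, 0) for i in range(m)])

        def best(item):
            # Missing scores replaced by the lowest score seen in that list.
            return agg([known[item].get(i, bottoms[i]) for i in range(m)])

        # Order by "worst" grade, breaking ties by "best" grade.
        ranked = sorted(known, key=lambda it: (worst(it), best(it)), reverse=True)
        top, rest = ranked[:k], ranked[k:]
        if len(top) < k:
            continue
        kth_worst = worst(top[-1])
        # A completely unseen item can score at most agg(bottoms).
        unseen_best = agg(bottoms) if len(known) < n else float("-inf")
        if unseen_best <= kth_worst and all(best(it) <= kth_worst for it in rest):
            return top, depth + 1
    return ranked[:k], n


# Made-up data: two criteria over four cars.
beauty = [("A", 10), ("B", 9), ("C", 6), ("D", 3)]
comfort = [("A", 9), ("C", 8), ("B", 4), ("D", 2)]
print(nra_top_k([beauty, comfort], 2, average))  # (['A', 'C'], 3)
```

In this run the algorithm halts after three rows without ever reading D: any unseen item can score at most 5, which is below C's worst grade of 7.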

  29-35. Example (tables with columns Beauty, Comfort, Average; figures not included in the transcript) • Final slide: Score(D)<3
