Efficient computation of diverse query results

Presenting: Karina Koifman
Course: DB Seminar


Example

Yahoo! Autos


Maybe a better retrieval


Introduction

  • The paper addresses the problem of efficiently computing diverse query results in online shopping applications.


The Goal

  • The goal of diverse query answering is to return a representative set of top-k answers from all the tuples that satisfy the user's selection condition.


The Problem

  • A user issues a query for a product.

  • Only the most relevant answers are shown.

  • The answers contain many duplicates.


Agenda

  • Existing solutions

  • Definition of diversity

  • Impossibility results for diversity

  • Query processing techniques


Existing Solutions

Existing solutions are inefficient or do not work in all situations. Example:

  • Obtain all the query results and then pick a diverse subset from them. This does not scale to large data sets.


Existing Solutions

  • Web search engines: first retrieve c × k results, then pick a diverse subset from these.

  • This is more efficient than the previous method.

  • When many duplicate products are on sale, it is still inefficient and does not guarantee diversity.


Existing Solutions

  • Issuing multiple queries to obtain diverse results.


Pros and Cons

  • The good:

    • Diversity

  • The bad:

    • Hurts performance

    • Empty results

      • Example: there are no Honda Accord convertibles, so a query for them returns nothing.


Diversity Ordering

  • A diversity ordering of a relation R with attributes A, denoted by ≺, is a total ordering of the attributes in A.

  • Example: Make ≺ Model ≺ Color ≺ Year ≺ Description ≺ Id
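
A minimal sketch of how such an ordering can be used, assuming each tuple is a Python dict keyed by attribute name. The sample rows and the helper name build_prefix_tree are hypothetical and not taken from the paper; the tree simply nests tuples by Make, then Model, and so on.

```python
# Diversity ordering from the slide: Make ≺ Model ≺ Color ≺ Year ≺ Description ≺ Id
ORDERING = ["Make", "Model", "Color", "Year", "Description", "Id"]


def build_prefix_tree(rows, ordering=ORDERING):
    """Nest rows by attribute value, one tree level per attribute in the ordering."""
    tree = {}
    for row in rows:
        node = tree
        for attr in ordering:
            # Descend into (or create) the child for this attribute value.
            node = node.setdefault(row[attr], {})
    return tree


# Hypothetical sample data, for illustration only.
rows = [
    {"Make": "Honda", "Model": "Civic", "Color": "Green", "Year": 2007,
     "Description": "Low miles", "Id": 1},
    {"Make": "Honda", "Model": "Civic", "Color": "Blue", "Year": 2007,
     "Description": "Low price", "Id": 2},
    {"Make": "Honda", "Model": "Accord", "Color": "Red", "Year": 2006,
     "Description": "Good miles", "Id": 3},
    {"Make": "Toyota", "Model": "Prius", "Color": "Tan", "Year": 2007,
     "Description": "Low miles", "Id": 4},
]

tree = build_prefix_tree(rows)
```

Tuples that share a long prefix in the ordering end up deep in the same subtree, which is what makes them "similar" in the sense used later.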


The DB example


Similarity – SIM(X,Y)

Find a result set that minimizes
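
The quantity being minimized is not reproduced in this transcript. A minimal sketch, under the assumption that SIM(x, y) is 1 when two tuples agree on a given attribute and 0 otherwise, and that the objective is the sum of SIM over all pairs in the result set. The function names and sample tuples are hypothetical.

```python
from itertools import combinations


def sim(x, y, attr):
    """Assumed similarity: 1 if x and y agree on attribute index attr, else 0."""
    return 1 if x[attr] == y[attr] else 0


def total_similarity(result, attr):
    """Sum of pairwise similarities over a result set (lower means more diverse)."""
    return sum(sim(x, y, attr) for x, y in combinations(result, 2))


# Tuples are (Make, Model, Color); hypothetical data for illustration only.
all_civics = [("Honda", "Civic", "Green"),
              ("Honda", "Civic", "Blue"),
              ("Honda", "Civic", "Red")]
mixed = [("Honda", "Civic", "Green"),
         ("Honda", "Civic", "Blue"),
         ("Honda", "Accord", "Red")]

MODEL = 1  # index of the Model attribute
civic_total = total_similarity(all_civics, MODEL)
mixed_total = total_similarity(mixed, MODEL)
```

Under this assumption the mixed set scores lower at the Model level than the all-Civic set, so it would be preferred as the more diverse result.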


Example - Similarity


Prefix


A few more definitions

  • RES(R,Q) of size k

  • Given relation R and query Q, let maxval =


Impossibility Results

  • Intuition: the IR score of an item depends only on the item and possibly on statistics from the entire corpus, but diversity depends on the other items in the query result set.


Inverted Lists

Figure: the inverted lists for Honda and Car, and the merged inverted list for the query Honda cars.


Impossibility Results

  • Each item in an inverted list has a score, which can be either a global score (e.g., PageRank) or a value/keyword-dependent score (e.g., TF-IDF).

  • The items in each list are usually ordered by their score, so that top-k queries can be handled.

  • If we assume a scoring function f() that is monotonic, which is a normal assumption for traditional IR systems, then the paper proves that the result is either not diverse or too inefficient/infeasible to compute.


The DB example


The car indexing example


One-pass Algorithm

Let's say Q looks for descriptions containing ‘Low’, with k = 3.

Honda.Civic.Green.2007.‘Low miles’

We start with two Civics. We then know that we need only one more result, so we pick the next Civic.

Then we look for another result at the next level (Accord). There is none, because no Accord has ‘Low’ in its description (and there is no other model at that level).

Then we look for another result at the next level (make) and prune. The result is maximally diverse, so we stop here.

If we had a Ford, we would continue.

Ford.Focus.Black.07.‘Low miles’ (each branch labeled 0)
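
The walk-through above can be illustrated with a short sketch. This is not the one-pass algorithm itself, which works over the inverted lists; it is a simple in-memory illustration of what a maximally diverse answer looks like, assuming the k slots are spread as evenly as possible over the distinct values at each level of the diversity ordering. The function name diverse_top_k and the sample rows are hypothetical.

```python
def diverse_top_k(rows, k, depth=0):
    """Pick up to k rows, spreading picks evenly over the values at each level."""
    if k <= 0 or not rows:
        return []
    if k >= len(rows) or depth >= len(rows[0]):
        return rows[:k]

    # Group rows by their value at the current level (insertion order is kept).
    groups = {}
    for row in rows:
        groups.setdefault(row[depth], []).append(row)

    # Hand out the k slots round-robin, skipping groups that are already full.
    # This terminates because k < len(rows), so total capacity exceeds k.
    quota = {value: 0 for value in groups}
    remaining = k
    while remaining > 0:
        for value, members in groups.items():
            if remaining == 0:
                break
            if quota[value] < len(members):
                quota[value] += 1
                remaining -= 1

    # Recurse into each group with its share of the slots.
    picked = []
    for value, members in groups.items():
        picked.extend(diverse_top_k(members, quota[value], depth + 1))
    return picked


# Rows are (Make, Model, Color, Year, Description); hypothetical data.
cars = [
    ("Honda", "Civic", "Green", "2007", "Low miles"),
    ("Honda", "Civic", "Blue", "2007", "Low price"),
    ("Honda", "Civic", "Red", "2007", "Low miles"),
    ("Honda", "Accord", "Blue", "2007", "Good miles"),
    ("Ford", "Focus", "Black", "2007", "Low miles"),
]

matching = [car for car in cars if "Low" in car[4]]  # the query Q
result = diverse_top_k(matching, 3)
honda_only = diverse_top_k([c for c in matching if c[0] == "Honda"], 3)
```

With the Ford present, one of the three slots goes to it; without it, all three answers are Civics, as in the slides.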


Scored One-pass Algorithm

Give each car a score. The query then takes a score parameter, minScore: the smallest score in the result set.

Choose the next ID as follows:

The smallest ID such that score(id) >= root.minScore.

The algorithm then proceeds as before.
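
A minimal sketch of that selection rule, assuming integer IDs, a dict from ID to score, and that the search only considers IDs after the current one. All three are simplifying assumptions for illustration, and the function name next_id is hypothetical.

```python
def next_id(scores, current_id, min_score):
    """Smallest ID after current_id whose score is at least min_score, else None."""
    candidates = [item_id for item_id, score in scores.items()
                  if item_id > current_id and score >= min_score]
    return min(candidates) if candidates else None


# Hypothetical scores for four cars.
scores = {1: 0.2, 2: 0.9, 3: 0.5, 4: 0.7}

first = next_id(scores, 0, 0.5)   # skips ID 1, whose score is too low
second = next_id(scores, 2, 0.6)  # skips ID 3 for the same reason
```

Items whose score falls below minScore are skipped because they cannot improve the current result set.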


Probing Algorithm

Main idea: go over all the cars as if they were laid out on an axis.

K=3

K=2

K=1


Advantages of bidirectional exploration

  • “Honda” has only one child (Civic), so we find it quickly without exploring every option.

  • Each time we add a node to the diverse solution, we do not have to prune it, unlike in the OnePass algorithm.


WAND algorithm

  • WAND is an efficient method of obtaining top-k lists of scored results without explicitly merging the full inverted lists.

  • AND(X1, X2, ..., Xk) ≡ WAND(X1, 1, X2, 1, ..., Xk, 1, k)

  • OR(X1, X2, ..., Xk) ≡ WAND(X1, 1, X2, 1, ..., Xk, 1, 1)

  • To obtain the k best results, the operator uses upper bounds on each term's maximum contribution and a temporary threshold θ: WAND(X1, UB1, X2, UB2, ..., Xk, UBk, θ)
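
The two equivalences can be checked with a small sketch of the WAND predicate: it is true when the weights of the terms that are present sum to at least the threshold. This covers only the Boolean predicate, not the full top-k iteration over inverted lists; the function names are hypothetical.

```python
from itertools import product


def wand(terms, threshold):
    """terms is a list of (present, weight) pairs; true if present weights reach threshold."""
    return sum(weight for present, weight in terms if present) >= threshold


def and_via_wand(flags):
    """AND(X1..Xk) as WAND with unit weights and threshold k."""
    return wand([(flag, 1) for flag in flags], len(flags))


def or_via_wand(flags):
    """OR(X1..Xk) as WAND with unit weights and threshold 1."""
    return wand([(flag, 1) for flag in flags], 1)


# Exhaustively compare against Python's all() and any() for three terms.
for flags in product([False, True], repeat=3):
    assert and_via_wand(flags) == all(flags)
    assert or_via_wand(flags) == any(flags)
```

Raising the threshold between 1 and k moves the operator gradually from OR-like to AND-like behavior, which is what lets a temporary threshold θ tighten as better results are found.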


Scored Probing Algorithm

We use the WAND algorithm to obtain the top-k list.

The next step is to mark all the nodes that could be added as MIDDLE.

We also maintain a heap, used to find the node with the minimum child.

At each step we move nodes from tentative to useful.


Experiments

MultQ – rewriting the query as multiple queries and merging their results.

Naïve – retrieving all the results of a query.

Basic – returning just the first k answers, without diversity.

OnePass, Probe – the algorithms proposed in the paper.

U = unscored

S = scored


Conclusions

  • The paper formalized diversity in structured search and proposed inverted-list algorithms.

  • The experiments showed that the algorithms are scalable and efficient.

  • In particular, diversity can be implemented with little additional overhead compared to traditional approaches.


Extension of the algorithm

  • Assign higher weights to Hondas and Toyotas when compared to Teslas, so that the diverse results have more Hondas and Toyotas.


Questions?

Thank you!