Best-Effort Top-k Query Processing Under Budgetary Constraints

Michal Shmueli-Scheuer

(IBM Haifa Research Lab and UCI)

Yosi Mass, Haggai Roitman

Chen Li

Ralf Schenkel, Gerhard Weikum


Motivating Example

  • Mobile Applications
    • Highly impatient users, need fast results.
  • Mediation Systems
    • Achieve high query throughput.
  • Online Analytics (e.g. logs)
    • Achieve high query throughput.

[Figure: top-k queries are sent to an engine, which returns top-k results.]


Traditional top-k query

  • Pre-computed lists over multiple attributes.

  • Combine scores by some monotonic aggregation function.

  • Two access modes:

    • sorted access (Cs)

    • random access (Cr)

  • Objective: Compute the k objects with the highest scores.

[Figure: m lists over n objects, each sorted by score.]


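NRA algorithm (Fagin et al.)

[Figure: NRA run for a Top-2 query with f = SUM, showing the worst score and best score of each object, the last score read from each list (highi), the top-k threshold (mink) and the candidates.]

  • Stop when mink > best-score of candidates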
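The bookkeeping named on this slide (worst score, best score, highi, mink, candidates) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name `nra`, the round-robin access order, the treatment of missing scores as 0, and the extra check against objects not yet seen are all assumptions.

```python
def nra(lists, k):
    """No-random-access top-k over score lists sorted in descending order.

    lists: one list per attribute, each a list of (object, score) pairs.
    Returns (top-k objects, depth reached in every list). f = SUM.
    """
    m = len(lists)
    seen = {}            # object -> {list index: score seen there}
    high = [0.0] * m     # high_i: last score read from list i
    top, depth = [], 0
    while depth < max(len(lst) for lst in lists):
        for i, lst in enumerate(lists):      # one round of sorted accesses
            if depth < len(lst):
                obj, score = lst[depth]
                seen.setdefault(obj, {})[i] = score
                high[i] = score
            else:
                high[i] = 0.0                # assumption: missing scores are 0
        depth += 1
        # worst score: sum of known scores; best score: unknowns filled with high_i
        worst = {o: sum(s.values()) for o, s in seen.items()}
        best = {o: worst[o] + sum(high[i] for i in range(m) if i not in s)
                for o, s in seen.items()}
        top = sorted(seen, key=worst.get, reverse=True)[:k]
        if len(top) < k:
            continue
        mink = worst[top[-1]]
        # candidates plus a virtual unseen object whose best score is sum(high)
        rivals = [best[o] for o in seen if o not in top] + [sum(high)]
        if mink > max(rivals):               # stopping test from the slide
            break
    return top, depth


# Hypothetical data, not taken from the slides.
L1 = [("a", 0.9), ("b", 0.8), ("c", 0.1)]
L2 = [("a", 0.9), ("c", 0.7), ("b", 0.1)]
result, depth_reached = nra([L1, L2], k=1)
```

With this toy input the run stops after two rounds, because the worst score of "a" (1.8) exceeds the best score any other object can still reach (1.5).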



Top-k with Budget Constraints

  • Access Costs
    • Sorted access cost: Cs
    • Random access cost: Cr
  • Given budget B, maximize result quality
  • Example: Top-2, f = SUM, Cs=1, Cr=3
    • NRA: 12Cs = 12, precision = 0.5
    • TA: 7Cs + 7Cr = 28, precision = 0
    • Budget = 10 ?
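The cost figures on this slide follow from counting accesses and weighting them by Cs and Cr. A minimal sketch of that arithmetic (the function name `access_cost` is hypothetical):

```python
def access_cost(sorted_accesses, random_accesses, cs, cr):
    """Total cost of a run: each sorted access costs cs, each random access cr."""
    return sorted_accesses * cs + random_accesses * cr


CS, CR = 1, 3                           # costs from the slide
nra_cost = access_cost(12, 0, CS, CR)   # NRA: 12 sorted accesses -> 12
ta_cost = access_cost(7, 7, CS, CR)     # TA: 7 sorted + 7 random -> 28
budget = 10
print(nra_cost > budget, ta_cost > budget)   # both runs exceed the budget
```

Neither algorithm can run to completion with a budget of 10, which is the situation the slide's question addresses.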


Contributions

  • Sorted Accesses

    • Efficient Plan

    • Solution with Adaptive a

  • Sorted and Random Accesses

    • Efficient Plan

    • Solution with Adaptive a

  • Experiments


Results Under Limited Budget

[Figure: the results for a limited budget compared with the K results for an unlimited budget.]


Efficient Plan - Sorted Accesses

  • Assume that we know the k results for unlimited budget (REXACT).
  • Interesting positions: where the k objects appear in the lists.
  • Plan – {L1,4} {L2,2}

[Figure: two sorted lists; each entry is an object with its score (SL1 or SL2) in that list.
  L1: o8, o1 (position P1), o6, o5 (position P2)
  L2: o2, o4, o5 (position Q1), o3, o1 (position Q2)
  Top-2: o1, o5]
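Interesting positions can be computed directly once REXACT is known. The sketch below assumes 1-based positions and uses the list contents as read from the slide's figure; the function name `interesting_positions` is hypothetical.

```python
def interesting_positions(lists, r_exact):
    """For each list, the 1-based positions at which objects of r_exact appear."""
    return [[pos for pos, obj in enumerate(lst, start=1) if obj in r_exact]
            for lst in lists]


# Object order as read from the figure (scores omitted).
L1 = ["o8", "o1", "o6", "o5"]
L2 = ["o2", "o4", "o5", "o3", "o1"]
positions = interesting_positions([L1, L2], {"o1", "o5"})
print(positions)   # [[2, 4], [3, 5]]
```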


Efficient Plan - Sorted Accesses

  • Goal: find plan t, such that: [equation not captured in the transcript]
  • Denoted as ROPT
  • Plans for B=5
    • Plan: {L1,2} {L2,3}

[Figure: lists L1 (o8, o1, o6, o5) and L2 (o2, o4, o5, o3, o1) with positions P1, P2 marked in L1 and Q1, Q2 marked in L2.]
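One way to enumerate the plans that fit a budget is to restrict each list's depth to its interesting positions and keep the combinations whose sorted-access cost is within B. This is a sketch under assumptions: the positions [[2, 4], [3, 5]] are read from the figure, depth 0 (list not read) is allowed, Cs = 1, and the function name `feasible_plans` is hypothetical. Ranking the plans by precision is not shown, since the goal equation is not preserved in the transcript.

```python
from itertools import product


def feasible_plans(positions, budget, cs=1):
    """All depth combinations over interesting positions that fit the budget.

    positions: per list, its interesting positions (1-based depths).
    A depth of 0 means the list is not read at all.
    """
    choices = [[0] + sorted(p) for p in positions]
    return [plan for plan in product(*choices) if sum(plan) * cs <= budget]


plans = feasible_plans([[2, 4], [3, 5]], budget=5)
print(plans)   # [(0, 0), (0, 3), (0, 5), (2, 0), (2, 3), (4, 0)]
```

The plan {L1,2} {L2,3} from the slide appears here as (2, 3), which uses the budget of 5 exactly.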


Sorted Accesses

  • Observations:
    • Prefer high scores

[Figure: lists L1, L2, L3 with entries (O1, SL1), (O1, SL2), (O2, SL1), (O2, SL2), (O2, SL3).]


Observations – contd.

  • title=“war” description=“weapon”
  • Prefer large score reductions


Score Utilities

  • Score gain: [equation not captured in the transcript]
  • Score reduction: [equation not captured in the transcript]
  • y = 3

[Figure: a list sorted by score: (o2, 1), (o4, 0.9), (o5, 0.8), (o3, 0.7), (o1, 0.6).]


Optimization Problem

  • Bi-objective optimization problem:

    util(Li,x) = a * gain + (1-a) * reduction

  • Heuristics:
    • Fair Heuristic
    • Rank Heuristic

    where m is the number of lists
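The utility above is a weighted sum of the two objectives. The sketch below evaluates it for given gain and reduction values; how gain and reduction are computed is not preserved in the transcript, so they are plain inputs here. Picking the list with the highest utility (`best_list`) is an assumption made for illustration, not necessarily the authors' selection rule.

```python
def util(gain, reduction, a):
    """util(Li, x) = a * gain + (1 - a) * reduction, with 0 <= a <= 1."""
    return a * gain + (1 - a) * reduction


def best_list(candidates, a):
    """candidates: list name -> (gain, reduction) for its next accesses.

    Returns the name with the highest utility (hypothetical selection rule).
    """
    return max(candidates, key=lambda name: util(*candidates[name], a))


# Hypothetical values, not taken from the slides.
candidates = {"L1": (0.9, 0.1), "L2": (0.4, 0.8)}
print(best_list(candidates, a=0.8))   # L1: gain dominates
print(best_list(candidates, a=0.2))   # L2: reduction dominates
```

Shifting a from 0.8 to 0.2 flips the choice, which is the effect an adaptive a exploits.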


Adaptive a

[Figure: the weights of gain (a) and reduction (1-a) plotted over time.]


Adaptive a

  • top-k: o1 [ws,bs], o2 [ws,bs], o3 [0.8,bs]
  • candidates: o4 [0.6,bs], o6 [ws,bs]
  • d(o4) = 0.8 - 0.6 = 0.2
  • Theobald et al. VLDB04

[Figure: lists L1, L2, L3 with entries (O1, SL1), (O1, SL2), (O1, SL3) and the current positions hight1 and hight2.]


Adaptive a

[Figure: plot for a TREC query, k=100.]


Efficient Plan - Random Accesses

  • Observations:

    • Random accesses always occur after sorted accesses have finished.

schedule 1: {SA……RA……SA….}

schedule 2: {SA……SA……RA….}

precision(schedule1) = precision(schedule2)


Observations - contd.

  • Random accesses are only useful to objects in REXACT.

[Figure: top-k and candidate queues of objects with [ws,bs] intervals, next to list L2 with entries (o2, SL2), (o5, SL2), (o1, SL2). Labels: "o5, Not in REXACT", "Precision reduced", "Precision remains the same".]


Random Accesses

  • When to switch from SA to RA?
    • Gathering with Sorted: not enough good candidates, RA is wasted
    • Probing with Random: not enough RAs to prune the candidates

[Figure: timeline of the two phases.]


Random Accesses

  • Switch from Sorted to Random:

    R= (1- )*S

    S+R > B

    S – total cost of sorted accesses.

    R – total cost for random accesses.

  • Which items to access?

    • maximize expected score.


Experimental Data

  • TREC Terabyte

    • 25M webpages

    • 50 queries with average length of 3 words.

  • IMDB

    • 375,000 movies

    • 20 queries, each with 4 attributes: {Title, Genre, Actors, Description}

  • Synthetic data

    • Zipf, #lists = [2,6], #objects = [10000,1000000]

  • Aggregate Function: Sum


Evaluation Methods

  • percentage of optimal precision

    [Formula over the result sets Ralg, Ropt and Rexact.]

  • SME
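The formula for this metric is not preserved in the transcript. The sketch below shows one plausible reading, stated as an assumption: the precision of a result set R is |R ∩ Rexact| / |Rexact|, and the metric is the precision of Ralg relative to the precision of Ropt, expressed in percent. The function names are hypothetical.

```python
def precision(result, r_exact):
    """Fraction of the exact top-k that the result set contains (assumed definition)."""
    return len(set(result) & set(r_exact)) / len(r_exact)


def percent_of_optimal(r_alg, r_opt, r_exact):
    """Precision of the algorithm relative to the best precision achievable in budget."""
    p_opt = precision(r_opt, r_exact)
    return 100.0 * precision(r_alg, r_exact) / p_opt if p_opt else 0.0


# Hypothetical result sets, not taken from the slides.
r_exact = {"o1", "o2", "o3", "o4"}
r_opt = {"o1", "o2", "o3", "o9"}     # precision 0.75
r_alg = {"o1", "o2", "o8", "o9"}     # precision 0.50
score = percent_of_optimal(r_alg, r_opt, r_exact)
```

Normalizing by Ropt rather than by Rexact measures a heuristic against what any plan could achieve within the same budget.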


Results - Sorted Accesses

[Figure: TREC, k=100]

  • Less budget, more improvement.


Varied k

[Figure: IMDB, B=400]

  • Lower k, more improvement.


Number of Lists

[Figure: Zipf, k=100, B=4000]

  • More lists, more improvement.


Results - Random Accesses

[Figures: TREC, k=100, Cr=10 and TREC, k=100, Cr=100]


Related Works

  • Minimize budget for optimal results:

    • The algorithm computes the exact results with minimum cost. (Bast et al. VLDB06, Bruno et al. ICDE02, Chang et al. SIGMOD02)

    • Dual problem.

  • Anytime top-k:

    • The algorithm collects statistics during processing, which can be used to provide probabilistic guarantees at any time during processing. (Arai et al. VLDB07)

    • Does not do any optimizations.

  • Approximate top-k:

    • Approximate results with probabilistic guarantees. (Theobald et al. VLDB04, Fagin et al. 2001)


Conclusions

  • First attempt to deal with budget constraints.

  • For SA only, average precision is around 70%.

  • Tradeoff between RAs and SAs: when the cost of RA is relatively low, schedules with RAs improve.


Thank You!


Top-k query

  • Given a set of n objects and m scoring lists sorted in decreasing order, find the top-k objects according to a scoring function f.

  • top-k: a set T of k objects such that f(rj1,…,rjm) ≤ f(ri1,…,rim) for every object Xi in T and every object Xj not in T.

  • Assumption: The scoring function f is monotone

    • f(r1,…,rm) ≤ f(r1’,…,rm’) if ri ≤ ri’ for all i

  • Two access modes:

    • sorted access – Cs

    • random access – Cr

  • Objective: Compute top-k with the minimum cost
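As a reference point for the definition above, the exact top-k can be computed by brute force when every score of every object is available. This sketch ignores access costs entirely and is only a baseline for checking results; the function name `exact_top_k` and the sample scores are hypothetical, and SUM is used as the monotone function f.

```python
import heapq


def exact_top_k(scores, k, f=sum):
    """scores: object -> tuple of its m per-list scores; f: monotone aggregate."""
    return heapq.nlargest(k, scores, key=lambda obj: f(scores[obj]))


# Hypothetical data, not taken from the slides.
scores = {"o1": (0.9, 0.8), "o2": (0.5, 0.4), "o3": (0.7, 0.9)}
top2 = exact_top_k(scores, k=2)
print(top2)   # ['o1', 'o3']
```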


Sorted Accesses

  • Observations:

    • Objects with high scores have a higher potential to be part of the top-k.

    • Objects with “mediocre” scores do not help.

  • Prefer high scores

[Figure: lists L1, L2, L3 with entries (O1, SL1), (O1, SL2), (O1, SL3).]


Example

[Figure: query Q and a wireless zone; labels: "Wireless zone", "useless".]


Applications

  • Mobile Applications

    • Highly impatient users, need fast results.

  • Mediation Systems

    • Achieve high query throughput.

  • Online analytics (e.g. logs)

    • Achieve high query throughput.


Motivating Example

  • Query throughput
    • Given #queries per time unit
    • Allocate time for each query

[Figure: a user query passes through the mediator (engine) to the servers.]


Terminology

  • Sorted Access

  • Random Access

  • highi

  • Top-k queue

  • Candidates queue

  • mink

  • worstScore(d)

  • bestScore(d)


Efficient Offline Solution - Sorted

  • Goal: find trace t, such that: [equation not captured in the transcript]
  • Denoted as ROPT

[Figure: lists L1 (o8, o1, o6, o5) and L2 (o2, o4, o5, o3, o1) with positions P1 and P2 marked in each list; B=5.]


Efficient Offline Solution - Sorted

  • Goal: find trace t, such that: [equation not captured in the transcript]
  • Feasible for K up to 100, and m up to 10.

[Figure: lists L1 (o8, o1, o6, o5) and L2 (o2, o4, o5, o3, o1) with positions P1 and P2 marked in each list; B=5.]


Efficient Offline Solution - Sorted

  • Proof (by contradiction):

    • Assume that t does not exist, and choose a trace s that is within the budget and has optimal precision. Construct s` whose entries s`i are the largest positions in Pi that are less than or equal to si.

    • By construction, the score of any object in s is the same as in s`.
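The construction used in the proof, replacing each depth si by the largest interesting position not exceeding it, can be sketched with a binary search. The function name `snap_to_interesting`, the use of 0 when no interesting position fits, and the sample positions are assumptions made for illustration.

```python
from bisect import bisect_right


def snap_to_interesting(trace, positions):
    """Map each depth s_i to the largest interesting position in P_i that is <= s_i.

    positions: per list, interesting positions in ascending order.
    A result of 0 means no interesting position lies within that depth.
    """
    snapped = []
    for depth, pos in zip(trace, positions):
        idx = bisect_right(pos, depth)    # count of interesting positions <= depth
        snapped.append(pos[idx - 1] if idx else 0)
    return tuple(snapped)


positions = [[2, 4], [3, 5]]              # hypothetical P_1 and P_2
print(snap_to_interesting((3, 4), positions))   # (2, 3)
print(snap_to_interesting((1, 5), positions))   # (0, 5)
```

The snapped trace never costs more than the original one, which is why the search can be limited to interesting positions.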


Fair Heuristic

  • Assume budget = b

  • Runs in batches


Efficient Offline Solution - Random

  • Budget for RAs = (B - |t|*Cs)
  • best(o) - mink
  • (best(o) = worst(o) + RA)

[Figure: the top-k queue (o1, o2, o3, o4) and further entries (o5, o8, o7, o9, …, o10, o14, …), each shown with a score S; label: "d Rexact".]
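The budget left for random accesses after a sorted-access plan t follows from the formula B - |t|*Cs. The sketch below assumes that |t| is the total number of sorted accesses in the plan; the count of affordable random accesses at cost Cr each is an added convenience, not something stated on the slide, and both function names are hypothetical.

```python
def random_access_budget(budget, plan, cs):
    """B - |t| * Cs, with |t| taken as the total number of sorted accesses in plan t."""
    return budget - sum(plan) * cs


def affordable_random_accesses(budget, plan, cs, cr):
    """How many random accesses of cost cr fit into the remaining budget."""
    return max(0, random_access_budget(budget, plan, cs)) // cr


# Hypothetical numbers: B=10, plan reads 2 + 3 entries, Cs=1, Cr=3.
remaining = random_access_budget(10, (2, 3), cs=1)
count = affordable_random_accesses(10, (2, 3), cs=1, cr=3)
print(remaining, count)   # 5 1
```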


Motivation

  • Many applications work in budget-constrained environments. Still, they wish to perform top-k queries.

[Figure: a user query passes through the mediator (engine), which performs budget-aware query processing over the servers.]


Future work

  • Different access costs for different lists

  • Time-aware top-k

  • Top-k with budget constraints for P2P

