Caching dynamic skyline queries
Download
1 / 50

Caching Dynamic Skyline Queries - PowerPoint PPT Presentation


  • 150 Views
  • Uploaded on

Caching Dynamic Skyline Queries. D. Sacharidis 1 , P. Bouros 1 , T. Sellis 1,2 1 National Technical University of Athens 2 Institute for Management of Information Systems – R.C. Athena. Outline. Introduction Skyline (SL) and dynamic skyline queries (DSL) Related work

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Caching Dynamic Skyline Queries' - liora


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Caching dynamic skyline queries

Caching Dynamic Skyline Queries

D. Sacharidis1, P. Bouros1, T. Sellis1,2

1National Technical University of Athens

2Institute for Management of Information Systems – R.C. Athena

HDMS'08


Outline
Outline

  • Introduction

    • Skyline (SL) and dynamic skyline queries (DSL)

  • Related work

  • Evaluating dynamic skyline queries

    • Computing orthant skylines (OSL)

    • Computing dynamic skyline via caching

      • LRU, LFU, LPP cache replacement policies

  • Experimental evaluation

  • Conclusions and Future work

HDMS'08


Skyline queries sl
Skyline queries (SL)

  • Given a dataset of d-dimensional points

    • SL contains points not dominated by others

    • x dominates y iff x as good as y in all dimensions and strictly better in at least one

HDMS'08


Skyline queries sl1
Skyline queries (SL)

  • Given a dataset of d-dimensional points

    • SL contains points not dominated by others

    • x dominates y iff x as good as y in all dimensions and strictly better in at least one

Price

Distance from sea

  • Example

    • Dataset of hotels

    • Prefer cheap hotels close to the sea

HDMS'08


Skyline queries sl2
Skyline queries (SL)

  • Given a dataset of d-dimensional points

    • SL contains points not dominated by others

    • x dominates y iff x as good as y in all dimensions and strictly better in at least one

Skyline points

Price

Distance from sea

  • Example

    • Dataset of hotels

    • Prefer cheap hotels close to the sea

HDMS'08


Skyline queries sl3
Skyline queries (SL)

  • Given a dataset of d-dimensional points

    • SL contains points not dominated by others

    • x dominates y iff x as good as y in all dimensions and strictly better in at least one

Skyline points

Price

p1

Distance from sea

  • Example

    • Dataset of hotels

    • Prefer cheap hotels close to the sea

HDMS'08


Skyline queries sl4
Skyline queries (SL)

  • Given a dataset of d-dimensional points

    • SL contains points not dominated by others

    • x dominates y iff x as good as y in all dimensions and strictly better in at least one

Skyline points

Price

p1

p2

Distance from sea

  • Example

    • Dataset of hotels

    • Prefer cheap hotels close to the sea

HDMS'08


Dynamic skyline queries dsl
Dynamic skyline queries (DSL)

  • Extension of skyline queries

    • Given a query point q

    • DSL contains points not dynamically dominated by others w.r.t q

    • x dynamically dominates y iff x as preferable as y w.r.t. q in all dimensions and strictly more preferable w.r.t. q in at least one

  • Can be treated as static SL

    • Transform points w.r.t. q

HDMS'08


Dynamic skyline queries dsl1
Dynamic skyline queries (DSL)

  • Extension of skyline queries

    • Given a query point q

    • DSL contains points not dynamically dominated by others w.r.t q

    • x dynamically dominates y iff x as preferable as y w.r.t. q in all dimensions and strictly more preferable w.r.t. q in at least one

  • Can be treated as static SL

    • Transform points w.r.t. q

Query point q

Price

Distance from sea

  • Example

    • User defines “ideal” hotel q

HDMS'08


Dynamic skyline queries dsl2
Dynamic skyline queries (DSL)

  • Extension of skyline queries

    • Given a query point q

    • DSL contains points not dynamically dominated by others w.r.t q

    • x dynamically dominates y iff x as preferable as y w.r.t. q in all dimensions and strictly more preferable w.r.t. q in at least one

  • Can be treated as static SL

    • Transform points w.r.t. q

Dynamic Skyline points

Price

q

Distance from sea

  • Example

    • User defines “ideal” hotel q

HDMS'08


Dynamic skyline queries dsl3
Dynamic skyline queries (DSL)

  • Extension of skyline queries

    • Given a query point q

    • DSL contains points not dynamically dominated by others w.r.t q

    • x dynamically dominates y iff x as preferable as y w.r.t. q in all dimensions and strictly more preferable w.r.t. q in at least one

  • Can be treated as static SL

    • Transform points w.r.t. q

p4

p5

Dynamic Skyline points

Price

q

Distance from sea

  • Example

    • User defines “ideal” hotel q

HDMS'08


Intuition 1
Intuition (1)

  • Traditional SL algorithms need to run anew for each DSL query

  • Our idea

    • Exploit results from past queries to reduce processing cost for future DSL queries

    • Cache past queries

    • Decide which queries in cache are useful

HDMS'08


Intuition 2
Intuition (2)

Price

Distance from sea

HDMS'08


Intuition 21
Intuition (2)

  • 2 past DSL queries

    • qa, qb

  • Each query partitions space in 4quadrants

qa

Price

qb

Distance from sea

HDMS'08


Intuition 3
Intuition (3)

p4

  • A new query q arrives

  • Consider DSL for qa

    • p1 is contained DSL(qa)

    • p1 dominates p2, p3, p4

  • p1 lies in upper right quadrant w.r.t. qa

  • qa lies in upper right quadrant w.r.t. q

  • p1 dominates also p2, p3,p4 w.r.t. to q

    • Exclude p2, p3, p4 from dominance test for DSL(q)

p2

p1

p3

qa

q

Price

qb

Distance from sea

  • Shaded area denotes points dominated by p1

HDMS'08


Contribution in brief
Contribution in brief

  • Caching past DSL queries cannot reduce processing cost for future ones

    • We need more information about dominance relationships

  • Introduce orthantskylines (OSL) and examine their relationship with DSL

  • ExtendBitmap algorithm to compute OSL in parallel with DSL

  • Cache OSL to enhance DSL queries evaluation

    • Present 3 cache replacement policies

      • LRU, LFU, LPP

  • Experimental evaluation of caching mechanism

HDMS'08


Related work
Related work

  • Non-indexed methods

    • Block-Nested Loops (BnL)

    • Bitmap

    • Multidimensional Divide and Conquer (DnC)

    • Sort First Scan (SFS)

  • Index-based methods

    • B-tree

      • sort points according to the lowest valued coordinate

    • R-tree

      • Nearest neighbor based (NN)

      • Branch and bound (BBS)

HDMS'08


Related work1
Related work

  • Non-indexed methods

    • Block-Nested Loops (BnL)

    • Bitmap

    • Multidimensional Divide and Conquer (DnC)

    • Sort First Scan (SFS)

  • Index-based methods

    • B-tree

      • sort points according to the lowest valued coordinate

    • R-tree

      • Nearest neighbor based (NN)

      • Branch and bound (BBS)

HDMS'08


Bitmap
Bitmap

  • BnL variant

  • Suitable for domains with low cardinality and discrete

  • In brief

    • Computes a bitmap representation of the points in the dataset

    • Examines each point separately (dominance test)

      • Checks whether it is contained in the skyline or not

      • Exploits fast bitwise operations OR/AND

HDMS'08


Bitmap dominance test
Bitmap – Dominance test

  • For each point p

    • Define A = A1 & A2 & … & Ad

      • Denotes the points as good as p in all dimensions

    • Define B = B1 | B2 | … | Bd

      • Denotes the points strictly better than p in at least one dimension

    • Dominance test:

      • If C = A & B has all bits set to 0 then p is in SL

HDMS'08


Orthant skyline osl
Orthant skyline (OSL)

  • OSL provides more information about dominance relationships than DSL

    • Useful for pruning

  • Given a dataset of d-dimensional points and a query point q

    • Space partitioned in 2d orthants

    • o-th orthant skyline (OSL) of q contains points of the o-th orthantnot dynamically dominated by others inside orthant o w.r.t q

HDMS'08


Orthant skyline osl1
Orthant skyline (OSL)

Quadrant 1

Quadrant 0

  • OSL provides more information about dominance relationships than DSL

    • Useful for pruning

  • Given a dataset of d-dimensional points and a query point q

    • Space partitioned in 2d orthants

    • o-th orthant skyline (OSL) of q contains points of the o-th orthantnot dynamically dominated by others inside orthant o w.r.t q

Query point q

Price

Distance from sea

Quadrant 2

Quadrant 3

HDMS'08


Orthant skyline osl2
Orthant skyline (OSL)

Quadrant 1

Quadrant 0

  • OSL provides more information about dominance relationships than DSL

    • Useful for pruning

  • Given a dataset of d-dimensional points and a query point q

    • Space partitioned in 2d orthants

    • o-th orthant skyline (OSL) of q contains points of the o-th orthantnot dynamically dominated by others inside orthant o w.r.t q

Query point q

Price

Distance from sea

Quadrant 2

Quadrant 3

HDMS'08


Orthant skyline osl3
Orthant skyline (OSL)

Quadrant 1

Quadrant 0

  • OSL provides more information about dominance relationships than DSL

    • Useful for pruning

  • Given a dataset of d-dimensional points and a query point q

    • Space partitioned in 2d orthants

    • o-th orthant skyline (OSL) of q contains points of the o-th orthantnot dynamically dominated by others inside orthant o w.r.t q

Query point q

Quadrant 2 skyline points

Price

Distance from sea

Quadrant 2

Quadrant 3

HDMS'08


Osl and dsl relationship
OSL and DSL relationship

Quadrant 1

Quadrant 0

Price

q

Distance from sea

Quadrant 2

Quadrant 3

HDMS'08


Osl and dsl relationship1
OSL and DSL relationship

Quadrant 1

Quadrant 0

Price

q

Distance from sea

Quadrant 2

Quadrant 3

HDMS'08


Osl and dsl relationship2
OSL and DSL relationship

Quadrant 1

Quadrant 0

  • Map points from quadrants 1,2,3 to points inside quadrant 0

Price

q

Distance from sea

Quadrant 2

Quadrant 3

HDMS'08


Osl and dsl relationship3
OSL and DSL relationship

Quadrant 1

Quadrant 0

  • Map points from quadrants 1,2,3 to points inside quadrant 0

  • Compute DSL w.r.t. q

Price

q

Distance from sea

Quadrant 2

Quadrant 3

HDMS'08


Osl and dsl relationship4
OSL and DSL relationship

Quadrant 1

Quadrant 0

  • Map points from quadrants 1,2,3 to points inside quadrant 0

  • Compute DSL w.r.t. q

  • Union of all OSLs is superset of DSL w.r.t. to q

Price

q

Distance from sea

Quadrant 2

Quadrant 3

HDMS'08


Osl and dsl relationship5
OSL and DSL relationship

Quadrant 1

Quadrant 0

  • Map points from quadrants 1,2,3 to points inside quadrant 0

  • Compute DSL w.r.t. q

  • Union of all OSLs is superset of DSL w.r.t. to q

p1

p2

Price

q

Distance from sea

Quadrant 2

Quadrant 3

HDMS'08


Osl and dsl relationship6
OSL and DSL relationship

Quadrant 1

Quadrant 0

  • Map points from quadrants 1,2,3 to points inside quadrant 0

  • Compute DSL w.r.t. q

  • Union of all OSLs is superset of DSL w.r.t. to q

p2

Price

q

p3

Distance from sea

Quadrant 2

Quadrant 3

HDMS'08


Computing orthant skylines
Computing orthant skylines

  • Algorithm DBM

    • Extends Bitmap to compute DSL and OSLs at the same time

  • Method:

    • Compute bitmap representation

      • Transform each point coordinates w.r.t. to query q

    • Dominance test, point p, orthant o

      • p not in OSLo and not in DSL

      • p not in DSL, but in OSLo

      • p in DSL and in OSLo

HDMS'08


Dynamic skylines via caching
Dynamic skylines Via Caching

  • Cache OSLs instead of DSLs

    • Query cache contains (query point qj, OSLs)

    • OSLs encode by bitmaps

  • Algorithm cDBM

    • OSL contains information about dominance test inside orthant

    • Discard points inside orthants from dominance tests

  • Method:

    • Compute bitmap representation

    • For each point p consider its position (orthant) w.r.t. to cache queries qj

    • If p in the same orthant o w.r.t qj as qj w.r.t. q and p not in OSLo(qj) then exclude it from OSLo(q), DSL(q)

HDMS'08


Cache replacement policies
Cache Replacement Policies

  • General idea

    • Limited cache space

    • Identify least useful query point in cache

    • Replace it with new one

HDMS'08


Usage based policies
Usage-based policies

  • Only a few queries in cache are useful

  • Log cache query usage

  • Given a new query q

    • Consider as input the query point cache Q

    • Only query points in OSL of Q w.r.t. q are useful

    • Update cache - remove:

      • Least Recently Used (LRU) query point

      • Least Frequently Used (LFU) query point

HDMS'08


Usage based policies1
Usage-based policies

  • Only a few queries in cache are useful

  • Log cache query usage

  • Given a new query q

    • Consider as input the query point cache Q

    • Only query points in OSL of Q w.r.t. q are useful

    • Update cache - remove:

      • Least Recently Used (LRU) query point

      • Least Frequently Used (LFU) query point

qd

qb

qc

qa

Price

q

Distance from sea

HDMS'08


Usage based policies2
Usage-based policies

Redundant queries

  • Only a few queries in cache are useful

  • Log cache query usage

  • Given a new query q

    • Consider as input the query point cache Q

    • Only query points in OSL of Q w.r.t. q are useful

    • Update cache - remove:

      • Least Recently Used (LRU) query point

      • Least Frequently Used (LFU) query point

qd

qb

qc

qa

Price

q

Distance from sea

HDMS'08


Usage based policies3
Usage-based policies

Redundant queries

  • Only a few queries in cache are useful

  • Log cache query usage

  • Given a new query q

    • Consider as input the query points in cache Q

    • Only query points inOSL of Q w.r.t. q are useful

    • Update cache - remove:

      • Least Recently Used (LRU) query point

      • Least Frequently Used (LFU) query point

qd

qb

qc

qa

Price

q

Distance from sea

HDMS'08


Usage based policies4
Usage-based policies

Redundant queries

  • Only a few queries in cache are useful

  • Log cache query usage

  • Given a new query q

    • Consider as input the query points in cache Q

    • Only query points in OSL of Q w.r.t. q are useful

    • Update cache - remove:

      • Least Recently Used (LRU) query point

      • Least Frequently Used (LFU) query point

qd

qb

qc

qa

Price

q

Distance from sea

HDMS'08


Pruning power based policy
Pruning power-based policy

  • Usage-based policies do not indicate usefulness

  • Useful cached query

    • Great pruning power

      • Probability that a query can prune points of dataset from DSL computation

    • Depends on

      • Points dominated by query in an orthant j

      • Points contained in the antisymetric orthantof j

  • Update cache – remove

    • Query point with less pruning power (LPP)

qa

Price

q

Distance from sea

HDMS'08


Pruning power based policy1
Pruning power-based policy

  • Usage-based policies do not indicate usefulness

  • Useful cached query

    • Great pruning power

      • Probability that a query can prune points of dataset from DSL computation

    • Depends on

      • Points dominated by query in an orthant j

      • Points contained in the antisymetric orthantof j

  • Update cache – remove

    • Query point with less pruning power (LPP)

5:24

2:24

qa

3:24

74:24

Price

q

Distance from sea

HDMS'08


Pruning power based policy2
Pruning power-based policy

  • Usage-based policies do not indicate usefulness

  • Useful cached query

    • Great pruning power

      • Probability that a query can prune points of dataset from DSL computation

    • Depends on

      • Points dominated by query in an orthant j

      • Points contained in the antisymetric orthant of j

  • Update cache – remove

    • Query point with less pruning power (LPP)

5:74

2:34

qa

3:44

74:884

Price

q

Distance from sea

HDMS'08


Pruning power based policy3
Pruning power-based policy

  • Usage-based policies do not indicate usefulness

  • Useful cached query

    • Great pruning power

      • Probability that a query can prune points of dataset from DSL computation

    • Depends on

      • Points dominated by query in an orthant j

      • Points contained in the antisymetric orthant of j

  • Update cache – remove

    • Query point with less pruning power (LPP)

5:74

2:3176

qa

3:44

74:884

Price

q

Distance from sea

HDMS'08


Pruning power based policy4
Pruning power-based policy

  • Usage-based policies do not indicate usefulness

  • Useful cached query

    • Great pruning power

      • Probability that a query can prune points of dataset from DSL computation

    • Depends on

      • Points dominated by query in an orthant j

      • Points contained in the antisymetric orthant of j

  • Update cache – remove

    • Query point with less pruning power (LPP)

5:720

2:3176

qa

3:421

74:88222

Price

q

Distance from sea

HDMS'08


Pruning power based policy5
Pruning power-based policy

  • Usage-based policies do not indicate usefulness

  • Useful cached query

    • Great pruning power

      • Probability that a query can prune points of dataset from DSL computation

    • Depends on

      • Points dominated by query in an orthant j

      • Points contained in the antisymetric orthant of j

  • Update cache – remove

    • Query point with less pruning power (LPP)

5:720

2:3176

qa

3:421

74:88222

Price

q

Distance from sea

HDMS'08


Experimental evaluation
Experimental Evaluation

  • Synthetic datasets

    • Distribution types

      • Independent, correlated, anti-correlated

    • Number of points N

      • 10k, 20k, 50k, 100k,

    • Dimensionality

      • d = {2,3,4,5,6}

    • Domain size for dimension

      • |D| = {10,20,50}

  • Compare

    • Bitmap (NO-CACHE)

    • cDBM with LFU,LRU,LPP cache replacement policies

    • Query cache

      • |Q| = {10,20,30,40,50} past query points

      • Cache size is |Q|*N bits uncompressed

HDMS'08


Varying query cache size
Varying query cache size

Independent

Anti-correlated

  • Dataset: N = 50k points, with d = 4 dimensions of |D| = 20 domain size

  • LFU,LRU cache queries not representative for future ones

  • LPP caches queries with great pruning power

HDMS'08


Effect of distribution parameters
Effect of distribution parameters

Correlated vary N

Correlated vary d

  • Relative improvement in running time over NO-CACHE

  • Vary number of points N

    • d = 4 dimensions of |D| = 20 domain size

  • Vary number of dimensions d

    • N = 50k, |D| = 20

HDMS'08


Conclusions and future work
Conclusions and Future work

  • Conclusions

    • Introduced orthant skylines(OSLs) and discussed its relationship with DSL

    • Extended Bitmap to compute OSLs and DSL at the same time (DBM algorithm)

    • Proposed caching mechanism of OSLs to reducecost for future DSL queries

      • LRU, LFU, LPPcache replacement policies

    • Experimentally verified the efficiency of caching mechanism

  • Future work

    • Apply caching mechanism to index-based methods

    • Further increase pruning power of cached queries

HDMS'08


Questions

Questions ?

HDMS'08


ad