
On the interaction between multidimensional skylines and functional dependencies

This talk explores the interaction between multidimensional skylines and functional dependencies and how functional dependencies can help with full and partial materialization of skycubes.


Presentation Transcript


  1. On the interaction between multidimensional skylines and functional dependencies • Sofian Maabout, University of Bordeaux, CNRS • Joint work with Nicolas Hanusse, Patrick Kamnang Wanko, Carlos Ordonez

  2. Skyline query • O is in the skyline iff there is no other O’ better than O • Skyline = {a, b, c, d}: the hotels not dominated by any other hotel • Intuitively, skyline points represent the best tradeoffs
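The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the hotel values are hypothetical (price, distance to the beach), and we assume smaller is better on every dimension.

```python
def dominates(p, q):
    """p dominates q: at least as good everywhere, strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Names of the points that no other point dominates."""
    return {name for name, p in points.items()
            if not any(dominates(q, p) for q in points.values())}


# Hypothetical hotels: (price, distance to the beach), smaller is better.
hotels = {
    "a": (50, 8), "b": (60, 5), "c": (80, 3), "d": (120, 1),
    "e": (90, 6), "f": (130, 4),
}

print(sorted(skyline(hotels)))  # ['a', 'b', 'c', 'd']
```

Here e is dominated by b and f is dominated by c, so only a, b, c and d remain.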

  3. Multidimensional skylines • Users are allowed to ask queries using any combination of dimensions • CEO: Best hotels = offering a swimming pool and air conditioning • Student: Best hotels = cheapest and free wifi • Skycube = set of all possible skylines • How to optimize all these multidimensional skylines? • Precompute ALL of them → Full Skycube • Precompute a SUBSET of them → Partial Skycube
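A naive full skycube simply evaluates one skyline per non-empty subset of dimensions. The sketch below assumes a small hypothetical table and minimization on every dimension; it is a brute-force baseline, not the optimized algorithm of the talk.

```python
from itertools import combinations


def dominates(t, u, dims):
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)


def skyline(table, dims):
    return {n for n, t in table.items()
            if not any(dominates(u, t, dims) for u in table.values())}


def skycube(table, dims):
    """Map every non-empty subspace (as a string such as 'AB') to its skyline."""
    return {"".join(s): skyline(table, s)
            for k in range(1, len(dims) + 1)
            for s in combinations(dims, k)}


# Hypothetical table with three dimensions.
table = {
    "t1": dict(A=1, B=3, C=2),
    "t2": dict(A=2, B=1, C=3),
    "t3": dict(A=3, B=2, C=1),
    "t4": dict(A=3, B=3, C=3),
}

cube = skycube(table, "ABC")
print(len(cube))           # 7 subspaces for d = 3
print(sorted(cube["AB"]))  # ['t1', 't2']
```

With d dimensions this evaluates 2^d - 1 skylines, which is why the talk looks for ways to avoid computing or storing all of them.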

  4. This talk • How functional dependencies can help full and partial materialization of skycubes

  5. Skyline Queries and Data Quality • Discarding records with low quality is one dimension of data cleaning • Compare tuples wrt their respective quality parameters • Best tuples = those with the best tradeoff wrt quality parameters

  6. Skyline Queries and Data Quality • Zip → City • Phone → Name

  7. Skyline Queries and Data Quality • t1, t3 and t4 are involved in a Zip → City violation • t1 and t2 are involved in a Phone → Name violation • t1’s salary is less precise than t2’s

  8. Skyline Queries and Data Quality • Sky(#FDs, SU) = {t4, t5}

  9. Skylines are not monotone
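A small counterexample makes the non-monotonicity concrete. The table below is hypothetical and assumes minimization; it shows that neither Sky(A) ⊆ Sky(AB) nor Sky(AB) ⊆ Sky(A) holds in general.

```python
def dominates(t, u, dims):
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)


def skyline(table, dims):
    return {n for n, t in table.items()
            if not any(dominates(u, t, dims) for u in table.values())}


# Hypothetical table: t1 and t2 tie on A, and B breaks the tie.
table = {
    "t1": dict(A=1, B=2),
    "t2": dict(A=1, B=1),
    "t3": dict(A=2, B=0),
}

sky_a = skyline(table, "A")
sky_ab = skyline(table, "AB")
print(sorted(sky_a))   # ['t1', 't2']
print(sorted(sky_ab))  # ['t2', 't3']
```

t1 belongs to Sky(A) but is dominated by t2 once B is added, while t3 enters Sky(AB) without being in Sky(A).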

  10. Functional dependencies & multidimensional skylines • A  B BC A B A • Theorem: If X → Y then Sky(X) ⊆ Sky(XY)
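The theorem can be checked on a toy instance. In the hypothetical table below (minimization assumed), A → B holds and the inclusion Sky(A) ⊆ Sky(AB) follows, whereas A → C fails and the inclusion Sky(A) ⊆ Sky(AC) breaks. The helper `fd_holds` is an illustrative name, not something from the talk.

```python
def dominates(t, u, dims):
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)


def skyline(table, dims):
    return {n for n, t in table.items()
            if not any(dominates(u, t, dims) for u in table.values())}


def fd_holds(table, lhs, rhs):
    """True if tuples that agree on lhs also agree on rhs."""
    seen = {}
    for t in table.values():
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical table: A -> B holds, A -> C does not.
table = {
    "t1": dict(A=1, B=5, C=2),
    "t2": dict(A=1, B=5, C=1),
    "t3": dict(A=2, B=3, C=0),
}

print(fd_holds(table, "A", "B"), skyline(table, "A") <= skyline(table, "AB"))
print(fd_holds(table, "A", "C"), skyline(table, "A") <= skyline(table, "AC"))
```

The intuition: a tuple in Sky(X) can only be dominated on XY by a tuple with the same X values, and the FD forces such a tuple to have the same Y values too, so no strict improvement is possible.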

  11. Closed subspaces • X is closed iff X ↛ A for every A not in X • The minimal FDs satisfied by T are • C is closed • AB is not closed

  12. Example • Red: closed subspace
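A brute-force reading of this definition tests, for each subspace X, whether some attribute outside X is determined by X. The table below is hypothetical (it is not the table T of the slide), and the exhaustive enumeration is only meant to illustrate the definition, not the mining procedure used in the talk.

```python
from itertools import combinations


def fd_holds(rows, lhs, attr):
    """True if tuples that agree on lhs also agree on attr."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        if seen.setdefault(key, r[attr]) != r[attr]:
            return False
    return True


def closed_subspaces(rows, dims):
    """Subspaces X such that X determines no attribute outside X."""
    closed = []
    for k in range(1, len(dims) + 1):
        for s in combinations(dims, k):
            if not any(fd_holds(rows, s, a) for a in dims if a not in s):
                closed.append("".join(s))
    return closed


# Hypothetical table in which the only non-trivial minimal FD is AC -> B.
rows = [
    dict(A=1, B=1, C=1),
    dict(A=1, B=1, C=2),
    dict(A=2, B=1, C=1),
    dict(A=2, B=2, C=2),
]

print(closed_subspaces(rows, "ABC"))  # ['A', 'B', 'C', 'AB', 'BC', 'ABC']
```

AC is the only subspace that is not closed here, because AC → B holds; the full space is always closed.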

  13. Skycube computation • For partial materialization, just stop here

  14. Skycube computation • An efficient procedure is needed

  15. Mining Closed Subspaces • Intuitive idea: • For every A, find the maximal X s.t. X ↛ A • Every x ⊆ X is potentially closed • The intersection of these sets of x’s are the closed subspaces • We adapt N. Hanusse, SM: A parallel algorithm for computing borders. CIKM’11

  16. Mining Closed Subspaces • Maximal subspaces not determining B

  17. Subspace Closure • Let X be a subspace. • Let Closed = {Y | Y is closed} • Then, X+ = smallest Y ∈ Closed s.t. X ⊆ Y
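Given the set of closed subspaces, the closure is a simple lookup. The sketch below assumes a hypothetical list of closed subspaces over dimensions A, B, C (it always contains the full space, which guarantees that a closed superset exists).

```python
def closure(x, closed):
    """Smallest closed subspace that contains x."""
    candidates = [y for y in closed if set(x) <= set(y)]
    return min(candidates, key=len)


# Hypothetical closed subspaces (AC is not closed, e.g. because AC -> B holds).
closed = ["A", "B", "C", "AB", "BC", "ABC"]

print(closure("AC", closed))  # ABC
print(closure("A", closed))   # A
```

The smallest closed superset is unique because the intersection of two closed subspaces is itself closed.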

  18. Closed Subspaces • ABCD • BCD ABC ABD ACD • AD BC AB BD AC CD • A B C D

  19. Experiments • Our approach versus other proposals for fully computing the skycube: • QGS & QGL: Lee et al. VLDBJ’14 • BUS & TDS: Pei et al. TODS’06 • Orion: Raïssi et al. VLDB’10 • Our approach versus closed skycubes, a lossless compression technique: Raïssi et al. VLDB’10 • Assess query evaluation time

  20. Experiments: (1) compute all skylines, synthetic data sets • Independent • Correlated • Anti-correlated

  21. Experiments: (1) Full Skycube, synthetic data sets • Speedup = execution time of algorithm X / execution time of our algorithm FMC

  22. Experiments: (1) Full Skycube, real data

  23. Experiments: (2) query optimization, 1000 random skyline queries • 0.31% of the 2^20 queries are materialized • 49 ms to answer 1K skyline queries from the materialized ones, instead of 99.92 seconds from the underlying data • Speedup > 2000
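One plausible way such a speedup arises is that a query on X is evaluated over a small materialized skyline instead of the whole table. The sketch below is an assumption about the mechanism, not the authors' implementation: since X → X+ \ X, the theorem gives Sky(X) ⊆ Sky(X+), so Sky(X) can be computed using only the tuples of the materialized Sky(X+). The table and the choice X = A, X+ = AB are hypothetical.

```python
def dominates(t, u, dims):
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)


def skyline(table, dims, candidates=None):
    """Skyline on dims, restricted to the given candidate tuple names."""
    names = list(table) if candidates is None else list(candidates)
    return {n for n in names
            if not any(dominates(table[m], table[n], dims) for m in names)}


# Hypothetical table in which A -> B holds, so the closure of A is AB.
table = {
    "t1": dict(A=1, B=5),
    "t2": dict(A=1, B=5),
    "t3": dict(A=2, B=3),
    "t4": dict(A=3, B=4),
}

materialized = {"AB": skyline(table, "AB")}  # computed once, up front
answer = skyline(table, "A", candidates=materialized["AB"])
print(sorted(answer))  # ['t1', 't2']
```

The restriction is safe because any tuple dominated on X is dominated by some tuple of Sky(X), and all of Sky(X) lies inside the candidate set.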

  24. Experiments: (3) comparison with closed skycubes • Identify equivalent skylines and store just one copy → compression of the whole set of skylines • E.g., Sky(C), Sky(D) and Sky(CD) are equivalent

  25. Experiments: (3) comparison with closed skycubes • Number of materialized skylines (time to find and materialize them) • Synthetic correlated data, n=100K, d=20: MICS = 20 sec, Closed did not finish after 36 hours • More details in N. Hanusse, SM, P. Kamnang Wanko, C. Ordonez: Skycube Materialization Using the Topmost Skyline or Functional Dependencies. TODS’16

  26. Incomparability Dependencies • Definition: X ↬ Y iff t[X] = t’[X] ⇒ t[Y] and t’[Y] are incomparable • Theorem: Sky(X) satisfies X ↬ Y ⇔ Sky(X) ⊆ Sky(XY) • Property: X → Y ⇒ X ↬ Y

  27. Incomparability Dependencies • FDs do not detect Sky(B) ⊆ Sky(AB), while Sky(B) satisfies B ↬ A • IncoDs detect that Sky(B) ⊈ Sky(BC), because Sky(B) doesn’t satisfy B ↬ C
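An incomparability dependency can be tested pairwise. The sketch below reads "incomparable" as "neither tuple dominates the other on Y" (so equal Y values count as incomparable, which is what the property X → Y ⇒ X ↬ Y requires). The table and the helper name `satisfies_incod` are hypothetical, and minimization is assumed.

```python
from itertools import combinations


def dominates(t, u, dims):
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)


def skyline(table, dims):
    return {n for n, t in table.items()
            if not any(dominates(u, t, dims) for u in table.values())}


def satisfies_incod(table, names, x, y):
    """True if no two of the named tuples agree on x while one dominates on y."""
    for n, m in combinations(sorted(names), 2):
        t, u = table[n], table[m]
        if all(t[a] == u[a] for a in x) and (dominates(t, u, y) or dominates(u, t, y)):
            return False
    return True


# Hypothetical table: the FD B -> A fails (see t3, t4), yet Sky(B) satisfies B ↬ A.
table = {
    "t1": dict(A=1, B=1, C=2),
    "t2": dict(A=1, B=1, C=1),
    "t3": dict(A=5, B=2, C=0),
    "t4": dict(A=6, B=2, C=0),
}

sky_b = skyline(table, "B")
print(satisfies_incod(table, sky_b, "B", "A"), sky_b <= skyline(table, "AB"))
print(satisfies_incod(table, sky_b, "B", "C"), sky_b <= skyline(table, "BC"))
```

On this data the first line prints `True True` and the second `False False`, matching both directions of the theorem.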

  28. Prioritized Skyline • Expression = Sky(AB & CD) • First compute Sky(AB) • If t[AB] = t’[AB] and t ∈ Sky(AB), then t and t’ are compared wrt C and D • Kießling. Foundations of preferences in database systems. VLDB’02 • Chomicki et al. Preference elicitation in prioritized skyline queries. VLDBJ’12 • Ciaccia et al. Output-sensitive Evaluation of Prioritized Skyline Queries. SIGMOD’15

  29. Prioritized Skyline • Sky(AB) = {t1, t2, t3, t4} • t1[AB] = t2[AB] and t1 dominates t2 wrt CD • Sky(AB & CD) = {t1, t3, t4}
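The example above can be reproduced with a level-by-level dominance test. This is one reading of the prioritized semantics, offered as a sketch: two tuples are compared on the first priority level where they differ, and lower levels are consulted only when all higher ones tie. The table values are hypothetical, chosen to match the sets on the slide, and minimization is assumed.

```python
def dominates(t, u, dims):
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)


def prioritized_dominates(t, u, levels):
    """Compare on the first level where t and u differ."""
    for dims in levels:
        if any(t[d] != u[d] for d in dims):
            return dominates(t, u, dims)
    return False  # identical on every level


def prioritized_skyline(table, levels):
    return {n for n, t in table.items()
            if not any(prioritized_dominates(u, t, levels) for u in table.values())}


# Hypothetical table; t1 and t2 agree on AB, and t1 is better on CD.
table = {
    "t1": dict(A=1, B=3, C=1, D=1),
    "t2": dict(A=1, B=3, C=2, D=2),
    "t3": dict(A=2, B=2, C=5, D=5),
    "t4": dict(A=3, B=1, C=0, D=9),
    "t5": dict(A=3, B=3, C=0, D=0),
}

print(sorted(prioritized_skyline(table, ["AB"])))        # ['t1', 't2', 't3', 't4']
print(sorted(prioritized_skyline(table, ["AB", "CD"])))  # ['t1', 't3', 't4']
```

Note that t5 is excellent on CD but is already eliminated on AB, so the lower-priority dimensions never get a chance to rescue it.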

  30. Prioritized Skyline • Let the prioritized expression be X1 & … & Xi & … & Xm • If X1…Xi-1 → X and X ⊆ Xi, then it is equivalent to X1 & … & Xi\X & … & Xm • AB → C ⇒ Sky(AB & CD) = Sky(AB & D)

  31. Conclusion • Functional dependencies are helpful for both full and partial skycube materialization • Incomparability dependencies characterize skyline inclusions • Semantic optimization of prioritized skylines with FDs

  32. Some open questions • Is it possible to come up with a Chase-like procedure for semantic optimization of prioritized skylines? • What about order dependencies? • Incremental maintenance • Approximate skylines and approximate FDs • t[A] is preferred to s[A] iff s[A] – t[A] exceeds a given threshold • X → Y iff ∀ t, s: t[X] ~ s[X] ⇒ t[Y] ~ s[Y]
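The rewriting rule can be sketched as a pass over the priority levels that drops, from each level, the attributes already determined by the levels before it. The function names are hypothetical, the implied attributes are found with the standard attribute-closure computation, and dropping a level that becomes empty is an assumption of this sketch.

```python
def attribute_closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def simplify(levels, fds):
    """Remove from each level the attributes implied by the preceding levels."""
    result, prefix = [], set()
    for level in levels:
        implied = attribute_closure(prefix, fds)
        kept = "".join(a for a in level if a not in implied)
        if kept:  # assumption: a level left empty is dropped
            result.append(kept)
        prefix |= set(level)
    return result


fds = [("AB", "C")]                  # the FD AB -> C from the slide
print(simplify(["AB", "CD"], fds))   # ['AB', 'D']
```

Tuples that tie on AB also tie on C, so comparing them on CD is the same as comparing them on D alone.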

  33. Thanks • Questions
