
Optimal Workload-Based Weighted Wavelet Synopsis

Optimal Workload-Based Weighted Wavelet Synopsis. Yossi Matias, Daniel Urieli. School of Computer Science, Tel Aviv University. Outline: Motivation; Background & Contributions; Wavelet synopses; Optimal WB weighted wavelet synopses.

Presentation Transcript


  1. Optimal Workload-Based Weighted Wavelet Synopsis • Yossi Matias, Daniel Urieli • School of Computer Science, Tel Aviv University

  2. Outline • Motivation • Background & Contributions • Wavelet synopses • Optimal WB weighted wavelet synopses

  3. Outline • Motivation • Background & Contributions • Wavelet synopses • Optimal WB weighted wavelet synopses

  4. Approximate Query Processing • Operational database (GB/TB): SQL query → exact answer, long response times! • Compact data synopses (KB/MB): “transformed” query → approximate answer, FAST!!

  5. Goals • Develop data synopses • Most accurate answers • Using a small amount of memory • Massive data sets: efficient construction • Time • I/O

  6. Outline • Motivation • Background & Contributions • Wavelet synopses • Optimal WB weighted wavelet synopses

  7. Data synopses • Samples: random samples, stratified samples, congressional samples, reservoir sampling, backing samples, join synopses, sketches • [Olken-Rotem, Vitter, Alon-Matias-Szegedy, Gibbons-Matias-Poosala, Acharya et al…] • Used in commercial DB systems • Histograms: equi-depth, compressed, v-optimal, spline, multi-dimensional, dynamic, Max-diff, MHIST • [Poosala-Ioannidis, etc.] • Used in commercial DB systems • Wavelet synopses: basic, multi-dim, probabilistic, dynamic, extended • Adapt to the nature of the data effectively • [Matias-Vitter-Wang, Garofalakis-Gibbons, Chakrabarti et al, Roussopoulos-Kotidis…] • Workload-based wavelet synopses [Matias, Portman]

  8. Accuracy of various synopses

  9. Workload-based synopses • Future queries correlated to past queries • Can be thought of as taken from a probability distribution roughly determined by the workload • Workload based synopses: optimized for a given query workload • “Standard” synopses assume uniform workload

  10. Workload-based synopses – prior work • Workload-based sampling • Overcoming limitations of sampling for aggregation queries [Chaudhuri, Das, Datar, Motwani, and Narasayya] • Icicles: Self-tuning samples for approximate query answering [Ganti, Lee, Ramakrishnan] • Workload-based histograms • Self-tuning histograms [Aboulnaga and Chaudhuri] • ST-holes [Bruno et al.] • Hierarchical range histogram [Guha-Koudas-Srivastava-02] • Workload-based wavelets • By Yossi Matias and Leon Portman

  11. Workload-Based Wavelet synopses [MP03] • Adapt effectively to a given query workload (not only to the data) • Reduce the mean-squared absolute / relative error over a workload of queries • Order-of-magnitude improvement over prior wavelet synopses • Not necessarily optimal

  12. Contributions • Optimal Workload-based Weighted Wavelet (WWW) synopses • WB-MSE (Workload-Based Mean Squared Error) • WB-MRE (Workload-Based Mean-squared Relative Error) • Equivalently, minimize the expected squared, absolute or relative error over a point query • First to minimize the MRE over the data • WB-MRE with uniform distribution • Both WWW synopses are optimal enhanced wavelet synopses • A generalized definition which allows coefficients with arbitrary values • Optimal cost construction • Linear construction time • I/O optimal

  13. Techniques • Problem definition in terms of • Weighted norm • Weighted-inner-product • Weighted-inner-product-space • Weighted wavelets for building data synopses

  14. Outline • Motivation • Background & Contributions • Wavelet synopses • Optimal WB weighted wavelet synopses

  15. Haar wavelet decomposition
      Resolution | Averages                 | Detail Coefficients
      3          | [2, 2, 0, 2, 3, 5, 4, 4] | ----
      2          | [2, 1, 4, 4]             | [0, -1, -1, 0]
      1          | [1.5, 4]                 | [0.5, 0]
      0          | [2.75]                   | [-1.25]
      Resulting coefficients: [2.75, -1.25, 0.5, 0, 0, -1, -1, 0]
      • Wavelets: mathematical tool for hierarchical decomposition of functions/signals • Haar wavelets: simplest wavelet basis, easy to understand and implement • Recursive pairwise averaging and differencing at different resolutions. • A linear time algorithm.
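The pairwise averaging and differencing on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `haar_decompose` is hypothetical (it does not come from the slides), and the input length is assumed to be a power of two.

```python
def haar_decompose(data):
    """Unnormalized Haar decomposition by recursive pairwise
    averaging and differencing (input length: a power of two)."""
    averages = [float(x) for x in data]
    details_by_level = []  # finest resolution first
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Detail coefficient: half the difference of each pair.
        details_by_level.append([(a - b) / 2 for a, b in pairs])
        # Next, coarser resolution: the average of each pair.
        averages = [(a + b) / 2 for a, b in pairs]
    # Overall average first, then details from coarsest to finest.
    coeffs = averages[:]
    for details in reversed(details_by_level):
        coeffs.extend(details)
    return coeffs


coefficients = haar_decompose([2, 2, 0, 2, 3, 5, 4, 4])
print(coefficients)
```

Each level does work proportional to its length, and the lengths halve at every step, so the total work is linear in the size of the data.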

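  16. Wavelet error tree [MVW98] [Figure: error tree with the overall average 2.75 at the root; the detail coefficients -1.25, then 0.5 and 0, then 0, -1, -1, 0 on the levels below, each with a + (left) and a - (right) branch; the original data 2 2 0 2 3 5 4 4 at the leaves]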

  17. The Haar Basis [Figure: the Haar basis functions, drawn as + / - step functions; axis labels 1, -1, 0, 1]

  18. Wavelet error tree [MVW98] • How should we choose which coefficients to retain? [Figure: the same error tree, with root 2.75, detail coefficients -1.25, then 0.5 and 0, then 0, -1, -1, 0, over the original data 2 2 0 2 3 5 4 4]

  19. Parseval-based optimal thresholding • Given a vector with respect to some orthonormal basis • Goal: approximate the vector using only M << N basis coefficients • Then, choosing the largest M coefficients is optimal • Minimizes the L2 norm of the error vector
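A minimal sketch of Parseval-based thresholding, assuming the orthonormal variant of the Haar transform (pairs are combined with a factor of 1/sqrt(2) instead of 1/2, so that Parseval's formula applies directly). All function names are hypothetical, and the choice M = 4 is arbitrary. The last two lines illustrate the claim on the slide: the squared L2 error of the approximation equals the energy of the dropped coefficients, which is why keeping the M largest is optimal.

```python
import math


def haar_orthonormal(data):
    """Orthonormal Haar transform (input length: a power of two)."""
    sums = [float(x) for x in data]
    levels = []  # finest resolution first
    while len(sums) > 1:
        pairs = list(zip(sums[0::2], sums[1::2]))
        levels.append([(a - b) / math.sqrt(2) for a, b in pairs])
        sums = [(a + b) / math.sqrt(2) for a, b in pairs]
    coeffs = sums
    for details in reversed(levels):
        coeffs = coeffs + details
    return coeffs


def inverse_haar_orthonormal(coeffs):
    """Invert haar_orthonormal, one resolution level at a time."""
    sums = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(sums)]
        pos += len(sums)
        finer = []
        for s, d in zip(sums, details):
            finer.extend([(s + d) / math.sqrt(2), (s - d) / math.sqrt(2)])
        sums = finer
    return sums


def threshold_top_m(coeffs, m):
    """Keep the m coefficients of largest magnitude, zero the rest."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:m])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_orthonormal(data)
kept = threshold_top_m(coeffs, 4)
approx = inverse_haar_orthonormal(kept)

# Parseval: squared error of the approximation == energy of what was dropped.
sq_error = sum((d - a) ** 2 for d, a in zip(data, approx))
dropped_energy = sum((c - k) ** 2 for c, k in zip(coeffs, kept))
print(sq_error, dropped_energy)
```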

  20. Haar Wavelet Synopses - summary • Compute Haar wavelet decomposition of D • Coefficient thresholding: only M<<|D| = N coefficients can be kept • Parseval-based thresholding • optimal w.r.t. the MSE • Several other greedy heuristics exist

  21. Outline • Motivation • Background & Contributions • Wavelet synopses • Optimal WB weighted wavelet synopses

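  22. Workload Example: standard thresholding • Original data: 2 2 2 6 3 5 4 4 • Importance (workload): 0.001 0.001 0.001 0.001 0.249 0.249 0.249 0.249 • Haar coefficients: 3.5 -0.5 -1 0 0 -2 -1 0 • Normalized coefficients: 3.5 -0.5 -0.707 0 0 -1 -0.5 0 • Given a synopsis S built by standard thresholding, the approximated data is 2 2 2 6 4 4 4 4 • WL2(S) = 0.498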

  23. Workload Example: workload-based thresholding • Original data: 2 2 2 6 3 5 4 4 • Importance (workload): 0.001 0.001 0.001 0.001 0.249 0.249 0.249 0.249 • Haar coefficients: 3.5 -0.5 -1 0 0 -2 -1 0 • Normalized coefficients: 3.5 -0.5 -0.707 0 0 -1 -0.5 0 • Standard: 2 2 2 6 4 4 4 4 • Workload based: 2 2 4 4 3 5 4 4 • WL2(S) = 0.008

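  24. Error definition • D = (d1,…,dN) - our data. • $q_i$ - the point query (what is $d_i$?) • $\hat{d}_i$ - the approximated answer • abs-error: $e_i = |d_i - \hat{d}_i|$; rel-error: $e_i = |d_i - \hat{d}_i| / |d_i|$ • The purpose: reduce a norm of $E = (e_1,…,e_N)$ • For example: the MSE, $\frac{1}{N}\sum_{i=1}^{N} e_i^2$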

  25. Workload-based Error • A workload: W = (c1,…,cN), where ci is the probability that qi appears. • Given a workload W = (c1,…,cN) and the errors vector E = (e1,…,eN), we define the Weighted L2 Norm: $WL2(E) = \sum_{i=1}^{N} c_i e_i^2$ • When ci = 1/N: WL2(E) = MSE
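As a rough check of the workload example on slides 22 and 23, the sketch below computes the workload-weighted squared error. Two assumptions are made here and are not stated in the transcript: the weighted error is taken without a square root (which is what makes it equal to the MSE for a uniform workload), and the two approximated data vectors are a reading of the garbled slide text. The function name `weighted_squared_error` is hypothetical.

```python
def weighted_squared_error(data, approx, workload):
    """Sum of c_i * (d_i - approx_i)^2 over all point queries i."""
    return sum(c * (d - a) ** 2 for d, a, c in zip(data, approx, workload))


data = [2, 2, 2, 6, 3, 5, 4, 4]
workload = [0.001] * 4 + [0.249] * 4

# Assumed reading of slides 22-23 (the ordering in the transcript is scrambled).
standard = [2, 2, 2, 6, 4, 4, 4, 4]
workload_based = [2, 2, 4, 4, 3, 5, 4, 4]

err_standard = weighted_squared_error(data, standard, workload)
err_workload = weighted_squared_error(data, workload_based, workload)
print(round(err_standard, 3), round(err_workload, 3))

# With a uniform workload the same function gives the plain MSE.
uniform = [1 / len(data)] * len(data)
mse_standard = weighted_squared_error(data, standard, uniform)
```

Under these assumptions the two values agree with the WL2(S) figures quoted on the slides: the standard synopsis errs on two heavily queried values, while the workload-based one moves its errors to values that are rarely queried.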

  26. Our goal • Minimizing the WL2 norm of the errors vector E • For a given data set D and query workload W • Equivalently: minimizing the expected squared error over a point query taken from a given distribution

  27. Regular Haar transform • Given a data set D = (d0,…,dN-1) • D → Haar Transform (HT) → HT(D) → standard thresholding → wavelet synopsis

  28. Overview • Given a data set D = (d0,…,dN-1) and a workload vector W = (c0,…,cN-1) • W → Weighted Haar Basis, WHB(W) • D and WHB(W) → Weighted Haar Transform (WHT) → WHT(D) → standard thresholding → WB wavelet synopsis • Ingredients: Parseval’s formula, the WL2 norm, the weighted inner product, and the algorithm for computing the WH basis from the workload

  29. The weighted Haar basis • The Weighted Haar Basis would also look like [Figure: a step function taking the values x and -y over the interval from 0 to 1] but …

  30. Compute the Weighted Haar Basis • Recall the weight coefficients (the workload) W = (c0,…,cN-1) for D = (d0,…,dN-1) • Meaning it would look more like: [Figure: a basis function over the interval from 0 to 1, with the weights c0, c1, …, cN-1 marked along the axis]
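One way to build a basis that is orthonormal under the weighted inner product Σ c_i·u_i·v_i is sketched below. It is an illustration under stated assumptions, not necessarily the authors' algorithm: each basis function is assumed to take a value x on the left half of its support and -y on the right half, with x and y chosen so that the function is weighted-orthogonal to constants and has weighted norm 1. All weights are assumed strictly positive, the length a power of two, and every function name is hypothetical.

```python
import math


def weighted_haar_basis(weights):
    """Haar-like basis, orthonormal w.r.t. sum_i w_i * u_i * v_i."""
    n = len(weights)
    basis = [[1.0 / math.sqrt(sum(weights))] * n]  # constant function
    width = n
    while width > 1:
        half = width // 2
        for lo in range(0, n, width):
            left = sum(weights[lo:lo + half])
            right = sum(weights[lo + half:lo + width])
            # x*left == y*right (orthogonal to constants) and
            # x^2*left + y^2*right == 1 (unit weighted norm).
            x = math.sqrt(right / (left * (left + right)))
            y = math.sqrt(left / (right * (left + right)))
            vec = [0.0] * n
            vec[lo:lo + half] = [x] * half
            vec[lo + half:lo + width] = [-y] * half
            basis.append(vec)
        width = half
    return basis


def weighted_inner(u, v, weights):
    return sum(w * a * b for w, a, b in zip(weights, u, v))


def weighted_synopsis(data, weights, m):
    """Weighted transform, then keep the m largest coefficients."""
    basis = weighted_haar_basis(weights)
    coeffs = [weighted_inner(data, b, weights) for b in basis]
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    keep = order[:m]
    approx = [sum(coeffs[k] * basis[k][i] for k in keep) for i in range(len(data))]
    return coeffs, keep, approx


data = [2, 2, 2, 6, 3, 5, 4, 4]
workload = [0.001] * 4 + [0.249] * 4
coeffs, keep, approx = weighted_synopsis(data, workload, 4)

# Parseval in the weighted space: weighted error == dropped coefficient energy.
weighted_error = sum(c * (d - a) ** 2 for d, a, c in zip(data, approx, workload))
dropped_energy = sum(coeffs[k] ** 2 for k in range(len(coeffs)) if k not in keep)
print(weighted_error, dropped_energy)
```

Because the basis is orthonormal in the weighted space, ordinary largest-coefficient thresholding minimizes the workload-weighted squared error, which mirrors the pipeline on slide 28.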

  31. Experimental results WB-MSE VS. STANDARD

  32. Experimental results WB-MRE, ADAPTIVE, STANDARD

  33. Experimental results WB-MRE, ADAPTIVE

  34. Thank you!
