
Raghavendra Madala


ICICLES: Self-tuning Samples for Approximate Query Answering
By Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan
CSE6339 – Data exploration

Raghavendra Madala

- Introduction
- Icicles
- Icicle Maintenance
- Icicle-Based Estimators
- Quality Guarantee
- Performance Evaluation
- Conclusion

ICICLES: Self-tuning Samples for Approximate Query

Analysis of data in data warehouses is useful for decision support.

- OLAP systems provide interactive response times to aggregate queries
- Approximate query answering (AQUA) systems provide very fast alternatives to OLAP systems

Approximate query answering techniques include:

- Sampling-based
- Histogram-based
- Probabilistic-based
- Wavelet-based
- Clustering-based

A static sample is a uniform random sample of the relation. Its drawbacks:

- All tuples are assumed to be equally important
- OLAP queries follow a predictable, repetitive pattern, which a static sample ignores
- A static sample wastes precious main memory
- The join of random samples of base relations may not be a random sample of the join of the base relations. This observation is the basis for Join Synopsis by Gibbons

An icicle:

- Captures the data locality of aggregate queries on foreign-key joins
- Is expected to contain more tuples from regions that are accessed more frequently
- Uses the sample relation's space better, because more of the sample comes from actual result sets
- Is maintained by a dynamic algorithm that changes the sample to suit the queries being executed in the workload

An icicle is a uniform random sample of a multiset of tuples L (an extension of R), where L is the union of the relation R and all sets of tuples that were required to answer queries in the workload.
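To make the definition of L concrete, here is a minimal sketch that materializes the multiset for a toy relation. The function name `build_extension` and the representation of queries as Python predicates are assumptions for illustration only; a real system would never materialize L.

```python
from collections import Counter
from typing import Callable, List, Sequence


def build_extension(relation: Sequence[int],
                    workload: Sequence[Callable[[int], bool]]) -> List[int]:
    """Return the multiset L: R plus every tuple each query needed.

    Hypothetical helper: tuples are plain integers and each query is a
    predicate that selects the tuples required to answer it.
    """
    extension = list(relation)  # L starts as a copy of R
    for query in workload:
        # Append the tuples this query touched; duplicates are kept on purpose.
        extension.extend(t for t in relation if query(t))
    return extension


relation = [1, 2, 3, 4, 5]
workload = [lambda t: t <= 2, lambda t: t == 2]
extension = build_extension(relation, workload)
multiplicity = Counter(extension)  # how often each tuple occurs in L
```

Tuple 2 is needed by both queries, so it occurs three times in L and is three times as likely as tuple 5 to land in a uniform sample of L.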

The intuition is to incrementally maintain a sample, called an icicle.

We maintain an icicle such that the probability of a tuple being selected is proportional to the frequency with which it is required to answer queries (exactly).

Efficient incremental maintenance is possible for the following reasons:

- A uniform random sample of L (the extension of relation R) ensures that a tuple's selection into the icicle is proportional to its frequency
- Incremental maintenance of the icicle requires only the segment of R that satisfies each new query
- The reservoir sampling algorithm treats the tuples appended to L as a stream, so L itself never has to be stored
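The maintenance idea above can be sketched with classic reservoir sampling. This is a minimal illustration, not the authors' implementation: the class name `Icicle`, its method names, and the use of integer tuple ids are all assumptions made for the example.

```python
import random
from typing import Iterable, List, Optional


class Icicle:
    """Fixed-size uniform sample of the growing multiset L (hypothetical sketch)."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self.sample: List[int] = []
        self.seen = 0  # number of tuples appended to L so far, i.e. |L|
        self._rng = random.Random(seed)

    def _offer(self, tuple_id: int) -> None:
        """Reservoir step: treat one tuple appended to L as a stream element."""
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(tuple_id)
            return
        # The new tuple replaces a random slot with probability capacity / |L|.
        slot = self._rng.randrange(self.seen)
        if slot < self.capacity:
            self.sample[slot] = tuple_id

    def load_relation(self, relation: Iterable[int]) -> None:
        """Initialize from R, which gives an ordinary uniform sample."""
        for tuple_id in relation:
            self._offer(tuple_id)

    def observe_query(self, result_tuples: Iterable[int]) -> None:
        """Append only the tuples the new query needed; R is not rescanned."""
        for tuple_id in result_tuples:
            self._offer(tuple_id)
```

For example, after `load_relation(range(10_000))` and fifty calls to `observe_query(range(500))`, the first 500 tuples make up about 73% of L (25,500 of 35,000 entries), so most of the icicle is expected to come from that hot region.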

- An icicle is a non-uniform sample of the original data
- A frequency must therefore be maintained for every tuple
- Average, Count, and Sum each need a different estimation mechanism

- Average is the average over the distinct tuples in the sample that satisfy the query
- Count is the sum of the expected contributions of all tuples in the icicle that satisfy the query
- Sum is the product of Average and Count
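The three estimators above could be sketched as follows. The slides do not give the formula for a tuple's "expected contribution", so this sketch assumes a standard inverse-probability weighting: a sampled copy of a tuple with frequency f counts as |L| / (n · f) tuples of R, where n is the icicle size. The names `SampledTuple` and `estimate` are hypothetical.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class SampledTuple(NamedTuple):
    tuple_id: int    # identity of the tuple in R
    value: float     # the aggregated attribute
    frequency: int   # number of occurrences of this tuple in L


def estimate(icicle: List[SampledTuple],
             predicate: Callable[[SampledTuple], bool],
             extension_size: int) -> Tuple[Optional[float], float, float]:
    """Return (average, count, sum) estimates for the tuples satisfying predicate.

    Assumption: each sampled copy contributes extension_size / (n * frequency),
    which undoes the bias toward frequently queried tuples.
    """
    matching = [t for t in icicle if predicate(t)]
    if not matching:
        return None, 0.0, 0.0
    # Average: each distinct tuple counts once, however many copies were sampled.
    distinct = {t.tuple_id: t.value for t in matching}
    average = sum(distinct.values()) / len(distinct)
    # Count: sum of per-copy expected contributions.
    n = len(icicle)
    count = sum(extension_size / (n * t.frequency) for t in matching)
    # Sum: product of the two estimates.
    return average, count, average * count


icicle = [
    SampledTuple(1, 10.0, 2),
    SampledTuple(1, 10.0, 2),  # a second sampled copy of tuple 1
    SampledTuple(2, 20.0, 1),
    SampledTuple(3, 30.0, 1),
]
avg_all, count_all, sum_all = estimate(icicle, lambda t: True, extension_size=8)
```

With these toy numbers the two copies of tuple 1 contribute 1 each and the other two tuples contribute 2 each, giving a count estimate of 6, an average of 20, and a sum of 120.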

- Add a Frequency attribute to the relation R
- The frequency of each tuple is initially set to 1
- A tuple's frequency is incremented each time the tuple is used to answer a query
- The frequencies of the relevant tuples are updated only when the icicle is updated with a new query
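A minimal sketch of this bookkeeping, assuming the frequency attribute is held in a dictionary keyed by tuple id (the function names `init_frequencies` and `record_query` are hypothetical):

```python
from typing import Dict, Iterable


def init_frequencies(tuple_ids: Iterable[int]) -> Dict[int, int]:
    """Every tuple of R starts with frequency 1 (its single occurrence in R)."""
    return {tuple_id: 1 for tuple_id in tuple_ids}


def record_query(frequencies: Dict[int, int], result_ids: Iterable[int]) -> None:
    """Called when the icicle is updated with a new query: bump the tuples it used."""
    for tuple_id in result_ids:
        frequencies[tuple_id] += 1


frequencies = init_frequencies([1, 2, 3])
record_query(frequencies, [1, 2])  # first query needed tuples 1 and 2
record_query(frequencies, [2])     # second query needed tuple 2 only
```

Under this scheme the frequencies always sum to |L|, so the size of the extension does not need to be tracked separately.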

- When the queries in a workload exhibit data locality, the icicle contains more tuples from the frequently accessed subsets of the relation
- The accuracy of an estimate improves as the number of sample tuples used to compute it increases
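One standard way to see the second point, assuming the m sample tuples that satisfy a query behave like independent draws with variance σ² (a textbook simplification, not a bound taken from the slides), is the standard error of a sample mean:

```latex
\operatorname{SE}(\bar{x}) = \frac{\sigma}{\sqrt{m}}
```

Under that assumption, an icicle that holds four times as many tuples from the queried region halves the standard error of the estimated average.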

Plot definitions:

- Static sample: a uniform random sample of the relation
- Icicle: the icicle as it evolves with the workload
- Icicle-complete: the tuned icicle, run again on the same workload

SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LI, C, O, S, N, R
WHERE C_Custkey = O_Custkey AND O_Orderkey = LI_Orderkey AND LI_Suppkey = S_Suppkey
  AND C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey
  AND R_Name = [region] AND O_Orderdate >= Date[startdate] AND O_Orderdate <= 12-31-1998

Qworkload: template for generating workloads

SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LICOS-icicle, N, R
WHERE C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey
  AND R_Name = [region] AND O_Orderdate >= Date[startdate] AND O_Orderdate <= 12-31-1998

Template for obtaining approximate answers

- Icicles are a class of samples that are sensitive to workload characteristics
- They adapt quickly to a changing workload
- Icicles are useful when the workload focuses on relatively small subsets of the relation
- An icicle is a trade-off between accuracy and cost

- V. Ganti, M. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference 2000.
