ICICLES: Self-tuning Samples for Approximate Query Answering
By Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan
CSE6339 – Data exploration

Raghavendra Madala


In this presentation…

  • Introduction

  • Icicles

  • Icicle Maintenance

  • Icicle-Based Estimators

  • Quality Guarantee

  • Performance Evaluation

  • Conclusion

ICICLES: Self-tuning Samples for Approximate Query


Introduction

Analysis of data in data warehouses is useful for decision support

  • OLAP: systems provide interactive response times for aggregate queries

  • AQUA: approximate query answering systems provide much faster alternatives to OLAP systems by returning approximate rather than exact answers

Approaches

  • Sampling-based

  • Histogram-based

  • Probabilistic-based

  • Wavelet-based

  • Clustering-based

Join Synopsis

A join synopsis is a uniform random sample of a join:

  • All tuples are assumed to be equally important

  • OLAP queries follow a predictable, repetitive pattern that a uniform sample does not exploit

  • A uniform sample wastes precious main memory on tuples that queries rarely access

  • A join of random samples of the base relations may not be a random sample of the join of the base relations. This observation is the basis for join synopses, proposed by Gibbons and colleagues
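The last point can be checked with a small simulation. This is a minimal sketch, assuming a hypothetical orders/lineitem pair with a foreign key from lineitem to orders and independent Bernoulli sampling at rate p on each side (the relations, `p`, and the function names are illustrative). Joining the two samples keeps a join tuple only if both of its parents survive, so the expected size drops from p·|J| to p²·|J|, and lineitems of the same order are kept or dropped together, so inclusions are correlated rather than independent. Sampling only the fact table and joining it with the full dimension table, which is the idea behind join synopses, keeps each join tuple independently with probability p.

```python
import random

def bernoulli_sample(rows, p, rng):
    """Keep each row independently with probability p."""
    return [r for r in rows if rng.random() < p]

def fk_join(lineitems, orders):
    """Join lineitem rows (order_key, price) with order rows (order_key, date) on the foreign key."""
    by_key = {o[0]: o for o in orders}
    return [(li, by_key[li[0]]) for li in lineitems if li[0] in by_key]

def main(seed=0, p=0.1, n_orders=2000, items_per_order=5):
    rng = random.Random(seed)
    # Hypothetical base relations: every order has the same number of lineitems.
    orders = [(k, f"1998-01-{k % 28 + 1:02d}") for k in range(n_orders)]
    lineitems = [(k, rng.uniform(1, 100)) for k in range(n_orders) for _ in range(items_per_order)]
    full_join = fk_join(lineitems, orders)

    # (a) Join of independent samples of both base relations.
    join_of_samples = fk_join(bernoulli_sample(lineitems, p, rng), bernoulli_sample(orders, p, rng))
    # (b) Join-synopsis style: sample the fact table only, join with the full dimension table.
    synopsis = fk_join(bernoulli_sample(lineitems, p, rng), orders)
    return len(full_join), len(join_of_samples), len(synopsis)

if __name__ == "__main__":
    full, a, b = main()
    print(f"|join| = {full}, target sample size at p = 0.1: {0.1 * full:.0f}")
    print(f"join of samples: {a} tuples (about p^2 * |join|)")
    print(f"join synopsis:   {b} tuples (about p * |join|)")
```

With p = 0.1 the join of samples returns roughly a tenth of the tuples the join-synopsis approach returns for the same sampling rate.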

Why Icicles?

  • To capture the data locality of aggregate queries on foreign key joins

  • An icicle is expected to contain more tuples from regions of the relation that are accessed more frequently

  • The sample's space is better utilized if it contains more tuples from the result sets of actual queries

  • A dynamic algorithm changes the sample to suit the queries being executed in the workload

Icicles

An icicle is a uniform random sample of a multiset of tuples L, called the extension of R: the multiset union of relation R and all sets of tuples that were required to answer queries in the workload
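The definition of L can be made concrete with a small sketch. It assumes hypothetical integer tuple identifiers and query results given as lists of identifiers; L is R plus one extra copy of a tuple for every query that needed it, so a tuple touched by k queries appears k + 1 times. L is materialized here only for illustration; the helper names are hypothetical.

```python
import random
from collections import Counter

def build_extension(relation, query_results):
    """Multiset union of R with the tuple sets required by each workload query."""
    extension = Counter(relation)          # every tuple of R occurs once
    for result in query_results:
        extension.update(result)           # one more copy per query that needed the tuple
    return extension

def uniform_sample_of_extension(extension, n, rng):
    """Draw n copies uniformly without replacement from the materialized multiset L."""
    copies = list(extension.elements())
    return rng.sample(copies, n)

R = list(range(10))                        # hypothetical relation with tuple ids 0..9
workload = [[2, 3], [2, 3, 4], [2]]        # hypothetical query results (tuple ids)
L = build_extension(R, workload)
print(L[2], L[3], L[0])                    # 4 3 1
print(sum(L.values()))                     # |L| = 10 + 6 = 16
icicle = uniform_sample_of_extension(L, 5, random.Random(1))
```

Tuple 2 holds 4 of the 16 copies in L, so a uniform sample of L is biased toward it even though it occurs only once in R.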

Icicle Maintenance

The idea is to maintain this sample, called an icicle, incrementally.

We maintain an icicle such that the probability of a tuple being selected is proportional to the frequency with which it is required to answer queries exactly.
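The proportionality claim can be written out. Here f(t) denotes the number of copies of tuple t in L (one for its occurrence in R, plus one per query of the workload W that required it) and n is the icicle size; a uniform sample of n copies from L includes each copy with probability n/|L|, so:

$$
f(t) = 1 + \bigl|\{\, Q \in W : t \text{ is required to answer } Q \,\}\bigr|, \qquad |L| = \sum_{t \in R} f(t)
$$

$$
\mathbb{E}\bigl[\#\text{copies of } t \text{ in the icicle}\bigr] = f(t) \cdot \frac{n}{|L|} \;\propto\; f(t)
$$

The symbols f, W, and n are introduced here for illustration. Strictly, it is the expected number of copies of t, rather than the probability that at least one copy is selected, that is exactly proportional to f(t).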

Icicle Maintenance Algorithm

Efficient incremental maintenance is possible for the following reasons:

  • A uniform random sample of L (the extension of relation R) ensures that a tuple's selection in the icicle is proportional to its frequency

  • Each incremental update of the icicle requires only the tuples of R that satisfy the new query

  • The reservoir sampling algorithm processes each tuple appended to L as one element of a stream, so L never has to be materialized
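The maintenance loop can be sketched with classic reservoir sampling (Algorithm R). This is a minimal sketch, assuming tuples arrive as hypothetical ids: the icicle is first filled by streaming R, and after each query the tuples that answered it are streamed in as new copies appended to L. Only the running count |L| is kept. The class `Icicle` and its methods are illustrative, not the paper's interface.

```python
import random

class Icicle:
    """Fixed-size uniform sample of the growing multiset L, kept with reservoir sampling (Algorithm R)."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.sample = []          # the icicle: a list of tuple copies
        self.seen = 0             # |L| so far: number of copies streamed in
        self.rng = rng or random.Random()

    def append(self, t):
        """Stream one copy of tuple t (newly appended to L) through the reservoir."""
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(t)
        else:
            j = self.rng.randrange(self.seen)     # uniform in [0, seen)
            if j < self.capacity:                 # keep with probability capacity / seen
                self.sample[j] = t

    def load_relation(self, relation):
        """Initial icicle: a uniform sample of R itself (every tuple occurs once in L)."""
        for t in relation:
            self.append(t)

    def update_with_query(self, result_tuples):
        """Incremental step: only the tuples of R that answered the new query are streamed."""
        for t in result_tuples:
            self.append(t)

# Usage: a hypothetical relation of 1000 tuples and a workload that keeps hitting ids 0..49.
icicle = Icicle(capacity=100, rng=random.Random(42))
icicle.load_relation(range(1000))
for _ in range(20):
    icicle.update_with_query(range(50))
share = sum(1 for t in icicle.sample if t < 50) / len(icicle.sample)
print(f"|L| = {icicle.seen}, share of icicle from hot region = {share:.2f}")
```

The hot region is 5% of R but 1050 of the 2000 copies in L, so roughly half of the icicle ends up coming from it.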

Icicle Maintenance Algorithm

Icicle Maintenance Example

Icicle-Based Estimators

  • An icicle is a non-uniform sample of the original data

  • The frequency of every tuple must be maintained so that estimates can correct for this bias

  • Different estimation mechanisms are used for average, count, and sum

Estimators for Aggregate queries

  • Average: the average over the distinct tuples in the sample that satisfy the query

  • Count: the sum of the expected contributions of all tuples in the icicle that satisfy the query

  • Sum: the product of the average and count estimates
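The three estimators can be sketched as follows. The slides do not give the exact formula for a tuple's "expected contribution", so this sketch assumes a Hansen-Hurwitz-style weighting: since a tuple with frequency f(t) appears on average n·f(t)/|L| times in an icicle of size n, each sampled copy is weighted by |L|/(n·f(t)). The function name and argument layout are hypothetical.

```python
from collections import Counter

def estimate_aggregates(icicle, freq, value, extension_size, predicate):
    """Icicle-based COUNT / AVG / SUM estimates for one query (illustrative sketch).

    icicle         -- list of tuple ids in the sample (duplicates allowed)
    freq           -- tuple id -> f(t), its multiplicity in L
    value          -- tuple id -> aggregated attribute (e.g. an extended price)
    extension_size -- |L|
    predicate      -- tuple id -> bool, the query's selection condition
    """
    n = len(icicle)
    copies = Counter(t for t in icicle if predicate(t))
    if not copies:
        return 0.0, None, 0.0
    # COUNT: each sampled copy of t stands for |L| / (n * f(t)) tuples of R (assumed weighting).
    count = sum(k * extension_size / (n * freq[t]) for t, k in copies.items())
    # AVG: plain average over the distinct qualifying tuples in the sample.
    avg = sum(value[t] for t in copies) / len(copies)
    # SUM: product of the AVG and COUNT estimates.
    return count, avg, avg * count

icicle = [1, 1, 2, 3, 7]                       # hypothetical sample, n = 5
freq = {1: 4, 2: 2, 3: 1, 7: 1}                # f(t) from the frequency relation
value = {1: 10.0, 2: 30.0, 3: 20.0, 7: 50.0}
print(estimate_aggregates(icicle, freq, value, 20, lambda t: t < 5))
# → (8.0, 20.0, 160.0)
```

Tuple 3 (f = 1) contributes 4 to the count while each copy of tuple 1 (f = 4) contributes only 1, which is how the weighting undoes the icicle's bias toward frequently used tuples.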

Maintaining Frequency Relation

  • Add a frequency attribute to relation R

  • Initialize the frequency of each tuple to 1 (its single occurrence in R)

  • Increment a tuple's frequency each time the tuple is used to answer a query

  • Update the frequencies of the relevant tuples only when the icicle is updated with a new query
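The frequency bookkeeping above can be sketched in memory. This assumes a hypothetical `FrequencyRelation` class standing in for R extended with a frequency attribute, and query results given as lists of distinct tuple ids; a real system would store the attribute in the relation itself.

```python
from collections import Counter

class FrequencyRelation:
    """Hypothetical in-memory stand-in for relation R extended with a frequency attribute."""

    def __init__(self, relation):
        self.freq = Counter({t: 1 for t in relation})   # every tuple starts at 1 (its copy in R)

    def record_query(self, result_tuples):
        """Called only when the icicle is updated with a new query.

        Assumes each tuple id is listed once per query result.
        """
        for t in result_tuples:
            self.freq[t] += 1

    def extension_size(self):
        """|L| is the sum of all frequencies."""
        return sum(self.freq.values())

fr = FrequencyRelation(range(5))
fr.record_query([0, 1])
fr.record_query([1])
print(fr.freq[0], fr.freq[1], fr.freq[4], fr.extension_size())  # 2 3 1 8
```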

Quality Guarantees

  • When queries in the workload exhibit data locality, icicles contain more tuples from the frequently accessed subsets of the relation

  • The accuracy of an estimate improves as the number of tuples used to compute it increases
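The second point can be quantified under a simplifying assumption that the slides do not state: if the k tuples that qualify for a query behave like independent draws whose aggregated attribute has standard deviation σ, the standard error of an average estimate is

$$
\operatorname{SE}\bigl(\widehat{\mathrm{AVG}}\bigr) \approx \frac{\sigma}{\sqrt{k}}, \qquad \frac{\operatorname{SE}_{4k}}{\operatorname{SE}_{k}} = \frac{\sigma/\sqrt{4k}}{\sigma/\sqrt{k}} = \frac{1}{2}
$$

so an icicle that holds four times as many tuples from a frequently queried region roughly halves the error of estimates over that region. The symbols k and σ are illustrative, not the paper's notation.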

Performance Evaluation

Plot legend:

  • Static sample:

    Uniform random sample of the relation

  • Icicle:

    Icicle that evolves with the workload

  • Icicle-complete:

    Icicle tuned on the entire workload, then evaluated again on the same workload

Performance Evaluation

SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LI, C, O, S, N, R
WHERE C_Custkey = O_Custkey AND O_Orderkey = LI_Orderkey AND LI_Suppkey = S_Suppkey AND
C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey AND
R_Name = '[region]' AND O_Orderdate >= DATE '[startdate]' AND O_Orderdate <= DATE '1998-12-31'

Q_workload: template for generating workloads

SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LICOS-icicle, N, R
WHERE C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey AND
R_Name = '[region]' AND O_Orderdate >= DATE '[startdate]' AND O_Orderdate <= DATE '1998-12-31'

Template for obtaining approximate answers (LICOS-icicle is the icicle built over the join of LI, C, O, and S)
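The workload template can be instantiated programmatically. This is a minimal sketch, assuming a hypothetical generator: the region names are the five TPC-H region names, but the skew parameter `hot_fraction` and the start-date range are illustrative choices, not the paper's workload settings.

```python
import random
from datetime import date, timedelta

TEMPLATE = (
    "SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice) "
    "FROM LI, C, O, S, N, R "
    "WHERE C_Custkey = O_Custkey AND O_Orderkey = LI_Orderkey AND LI_Suppkey = S_Suppkey "
    "AND C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey "
    "AND R_Name = '{region}' AND O_Orderdate >= DATE '{startdate}' "
    "AND O_Orderdate <= DATE '1998-12-31'"
)

REGIONS = ["AFRICA", "AMERICA", "ASIA", "EUROPE", "MIDDLE EAST"]  # TPC-H region names

def generate_workload(num_queries, hot_region, hot_fraction, rng):
    """Instantiate the template; a hypothetical hot_fraction of queries hit hot_region to create locality."""
    queries = []
    for _ in range(num_queries):
        region = hot_region if rng.random() < hot_fraction else rng.choice(REGIONS)
        start = date(1998, 1, 1) + timedelta(days=rng.randrange(300))  # hypothetical start-date range
        queries.append(TEMPLATE.format(region=region, startdate=start.isoformat()))
    return queries

workload = generate_workload(10, "ASIA", 0.8, random.Random(7))
print(workload[0])
```

Raising `hot_fraction` concentrates the workload on one region, which is the kind of data locality under which icicles are expected to outperform a static sample.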

Performance Evaluation

Performance Evaluation

Conclusion

  • Icicles are a class of samples that are sensitive to workload characteristics

  • They adapt quickly to a changing workload

  • Icicles are most useful when the workload focuses on relatively small subsets of the relation

  • Icicles offer a trade-off between accuracy and cost

References

  • V. Ganti, M. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference 2000.

Thank you!
