1 / 26

Ripple Joins for Online Aggregation

Ripple Joins for Online Aggregation. By: Peter J. Haas and Joseph M. Hellerstein published in June 1999 : Presented By: Sthuti Kripanidhi. Overview. What the paper is all about Traditional Algorithms Online Aggregation Ripple Joins: Introduction How different is Ripple join

winona
Download Presentation

Ripple Joins for Online Aggregation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Ripple Joins for Online Aggregation By: Peter J. Haas and Joseph M. Hellerstein published in June 1999 : Presented By: Sthuti Kripanidhi CSE 6339 - Data Exploration

  2. Overview • What the paper is all about • Traditional Algorithms • Online Aggregation • Ripple Joins: Introduction • How different is Ripple join • Ripple Join variants • Aspect ratios • Future Work CSE 6339 - Data Exploration

  3. What the paper is about.. • The paper talks about a class of join algorithms called Ripple joins for the online processing of multi-table aggregation queries. • This paper tells how to join a bunch of tables and get the SUM, COUNT, or AVG in GROUP BY clauses showing approximate results immediately and the confidence interval of the results from the first few tuples retrieved. CSE 6339 - Data Exploration

  4. Traditional Algorithms • Traditional algorithms take a lot of time since they have to process the entire tables or relations • The users have to wait for a long time before the results are returned. • An better method is Online Aggregation. CSE 6339 - Data Exploration

  5. Online Aggregation • A running estimate of the final aggregates are continuously displayed to the user. • Quick results rather than minimize time for completion. • The proximity of the running estimate to the final result is also displayed to the user.(confidence interval). CSE 6339 - Data Exploration

  6. GUI CSE 6339 - Data Exploration

  7. Ripple Joins: Introduction • Generalize the traditional block nested loops and hash joins. • Non blocking • Square ripple join – samples are drawn at the same rate • Rectangular ripple join – samples out one relation at a higher rate than another. CSE 6339 - Data Exploration

  8. Ripple Join: Introduction • Typical query forms SELECT op(expression) FROM R1, R2, … , RK WHERE predicate GROUP BY columns; CSE 6339 - Data Exploration

  9. How different is Ripple join? • Traditional hash join blocks until the entire query output is finished. Ripple join reports approximate results after each sampling step, and allows user intervention. • In the inner loop, an entire table is scanned. Ripple join expands the sample set incrementally. • Ripple joins avoid complete scan of the relations. CSE 6339 - Data Exploration

  10. How Ripple Join works.. Assume ripple join of relations R and S • Select a random tuple r from R. • Join with previously selected S tuples. • Select a random tuple s from S. • Join with previously selected R tuples. • Join r and s. CSE 6339 - Data Exploration

  11. Ripple Join: Square two table join R S X N = 1 CSE 6339 - Data Exploration

  12. R S X X X X N = 2 CSE 6339 - Data Exploration

  13. R S X X X X X X X X X N = 3 CSE 6339 - Data Exploration

  14. Ripple Join Algorithm For(max=1 to infinity) { for(i=1 to max-1) if(predicate(R[i],s[max])) output(R[i],S[max]); for(i=1 to max) if(predicate(R[max],s[i])) output(R[max],S[i]); } CSE 6339 - Data Exploration

  15. Ripple Join Iterator • An iterator based DBMS invokes an iterator’s next() method each time an output tuple is needed. • The iterator needs to store the next position to be fetched from each of its inputs R and S. CSE 6339 - Data Exploration

  16. Pipelining • Can easily be pipelined for multiple binary joins • Cannot do three-table joins as two binary ripple joins. CSE 6339 - Data Exploration

  17. Ripple Join Variants • Block Ripple Join • Hash Ripple Join • Index Ripple Join CSE 6339 - Data Exploration

  18. Block Ripple Join • Takes disk blocks of R and S in turn (not tuples) • Read a disk block of R and scan against old S • Evict from memory • Read Block of S and compare with older R tuples. • Has I/O saving since each block is taken out at a time. CSE 6339 - Data Exploration

  19. Index and Hash Ripple Joins • Index Ripple Join • Identical to indexed-enhanced nested loop join • Hash Ripple Join • Used only for Equijoin queries. CSE 6339 - Data Exploration

  20. Statistical Considerations • Goal-to provide efficient, accurate, interactive estimation. • Estimator unbiased, consistent • Running average is biased but consistent • Capable of giving tight confidence intervals CSE 6339 - Data Exploration

  21. Aspect Ratios • Aspect ratio: how many tuples are retrieved from each base relation per sampling step. e.g. β1 = 1, β2 = 3, … • Ripple join adjusts the aspect ratio according to the sizes of the base relations. CSE 6339 - Data Exploration

  22. Why is it called Ripple Join? • The algorithm seems to ripple out from a corner of the join. • Acronym: "Rectangles of Increasing Perimeter Length" CSE 6339 - Data Exploration

  23. Performance CSE 6339 - Data Exploration

  24. Conclusions and Future Work • Complete implementation of online aggregation must be able to handle multi-table queries. • This paper introduces ripple joins, a family of join algorithms designed to meet the performance needs of online aggregation system. CSE 6339 - Data Exploration

  25. Though ripple joins are symmetric, it is still not clear how a query optimizer should choose among the ripple join variants, nor how it should order a sequence of ripple joins. CSE 6339 - Data Exploration

  26. References • Haas & Hellerstein, “Ripple Joins for Online Aggregation” (SIGMOD ’99) • Haas & Hellerstein, “Online Query Processing: A Tutorial” • P. J Haas, J.M Hellerstein and H.J Wang Online aggregation. In Proc. 1997 ACM SIGMOD Intl Conf. Management of data pages. CSE 6339 - Data Exploration

More Related