
Online Aggregation. Joseph M. Hellerstein, Peter J. Haas, Helen J. Wang


Presentation Transcript


  1. Online Aggregation. Joseph M. Hellerstein, Peter J. Haas, Helen J. Wang. Presented by Archana Vijayalakshmanan

  2. Contents • Introduction • Example • Advantages • Requirements • Approaches to building a system • System issues • Conclusion

  3. Online Aggregation: Motivation • SELECT AVG(grade) FROM ENROLL; • A “fancy” interface: [figure: query results panel showing AVG = 3.262574342]

  4. A Better Approach • Don’t process in batch! Online aggregation:

  5. Example Select AVG(grade) from ENROLL GROUP BY major;

  6. Advantages • stopping condition set on the fly! • statistical techniques are more sophisticated • can handle GROUP BY w/o a priori knowledge

  7. Requirements • Usability: • continuous output • non-blocking query plans • time/precision control • fairness/partiality • Performance: • time to accuracy • time to completion • pacing

  8. A Naive Approach SELECT running_avg(final_grade), running_confidence(final_grade), running_interval(final_grade) FROM grades; • No grouping • Can’t meet performance & usability needs: • no guarantee of continuous output • no guarantee of fairness (or control over partiality) • no control over pacing
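To make the three running functions in the naive query concrete, here is a minimal Python sketch of what they might compute for a single ungrouped stream. It assumes the tuples arrive in random order and uses a large-sample (normal approximation) interval; the function name `running_avg_with_interval` and the choice of interval are assumptions for illustration, not the formulas from the paper.

```python
import math
from statistics import NormalDist


def running_avg_with_interval(values, confidence=0.95):
    """Yield (n, running mean, interval half-width) after each value.

    Assumes `values` arrive in random order, so the prefix seen so far
    behaves like a random sample of the whole table.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n < 2:
            half_width = float("inf")  # no variance estimate yet
        else:
            sample_variance = m2 / (n - 1)
            half_width = z * math.sqrt(sample_variance / n)
        yield n, mean, half_width
```

A caller can stop consuming the generator as soon as the half-width is small enough, which is the "stopping condition set on the fly" idea. The sketch does nothing about grouping, fairness, or pacing, which are the gaps this slide lists.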

  9. Random Access to Data • Heap Scan • OK if clustering uncorrelated to agg & grouping attrs • Index Scan • can scan an index on attrs uncorrelated to agg or grouping • Sampling from indices • could introduce new sampling access methods (e.g. Olken’s work)

  10. Group By & Distinct • Can’t sort! • sorting blocks • sorting is unfair • Must use hash-based techniques • non-blocking, but they do not scale gracefully • Hybrid hashing • “Hybrid Cache” even better
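The non-blocking property of hash-based grouping can be sketched as follows. This is a plain in-memory hash table, not the hybrid hashing or "Hybrid Cache" schemes named on the slide, and the function name `online_group_avg` is hypothetical.

```python
def online_group_avg(rows):
    """Yield a snapshot of per-group running averages after every row.

    `rows` is an iterable of (group_key, value) pairs. Nothing is sorted
    and nothing blocks: each row updates one hash entry and output is
    available immediately.
    """
    sums = {}
    counts = {}
    for key, value in rows:
        sums[key] = sums.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
        yield {k: sums[k] / counts[k] for k in sums}
```

Because every group's state lives in memory, this version is the one that does not scale gracefully when the number of groups is large.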

  11. Index Striding • For fair Group By: • read tuples in round-robin fashion. (want random tuple from Group 1, random tuple from Group 2, ...) • each group is updated at appropriate rate. • gives info/speed match!
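The round-robin idea can be sketched in a few lines of Python. Here each group's index range is stood in for by an ordinary iterable; the function name `index_stride` and the dictionary input are assumptions for illustration, and a real system would read through an index on the grouping column.

```python
def index_stride(groups):
    """Yield (group_key, value) pairs, taking one value per group in turn.

    `groups` maps each group key to an iterable of that group's values
    (a stand-in for a per-group index cursor). Exhausted groups drop out.
    """
    cursors = {key: iter(values) for key, values in groups.items()}
    while cursors:
        # Copy the keys so exhausted groups can be removed while looping.
        for key in list(cursors):
            try:
                yield key, next(cursors[key])
            except StopIteration:
                del cursors[key]
```

With this access pattern a small group receives updates as often as a large one, instead of in proportion to its size. Visiting some groups more often than others per round would be one way to give a group a faster update rate.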

  12. Join Algorithms • Non-Blocking Joins • no sorting! • merge join OK, but watch for the sorted output • hybrid hash not great • symmetric pipeline hash • nested loops always good, can be too slow
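As an illustration of the symmetric pipeline hash join mentioned above, the sketch below builds a hash table on both inputs and probes the opposite table as each tuple arrives, so matches are emitted without waiting for either input to finish. The interleaved `stream` of `("L", row)` / `("R", row)` pairs and the key-function parameters are assumptions made to keep the example self-contained.

```python
from collections import defaultdict


def symmetric_hash_join(stream, key_left, key_right):
    """Yield (left_row, right_row) matches as soon as both sides have arrived.

    `stream` is an iterable of (side, row) pairs, where side is "L" or "R".
    """
    left_table = defaultdict(list)
    right_table = defaultdict(list)
    for side, row in stream:
        if side == "L":
            k = key_left(row)
            left_table[k].append(row)           # build on own side
            for match in right_table.get(k, ()):  # probe the other side
                yield row, match
        else:
            k = key_right(row)
            right_table[k].append(row)
            for match in left_table.get(k, ()):
                yield match, row
```

The price of never blocking is that both hash tables are kept in memory for the whole join.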

  13. Query Optimization • Avoid sorting • Blocking sub-operations: 2 components in cost function: • dead time (t_d): time spent doing “invisible” work -- tax this at a high rate! • output time (t_o): time spent producing output • Preference to plans that maximize user control, e.g., index striding
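One way to picture "taxing" dead time is a weighted sum of the two components. The slide does not give the actual cost function, so the linear form, the `dead_tax` weight of 10, and the plan dictionaries below are all hypothetical.

```python
def plan_cost(dead_time, output_time, dead_tax=10.0):
    """Hypothetical cost: dead time is charged `dead_tax` times as much as output time."""
    return dead_tax * dead_time + output_time


def choose_plan(plans, dead_tax=10.0):
    """Return the plan with the lowest taxed cost.

    Each plan is a dict with "name", "dead_time", and "output_time" keys.
    """
    return min(
        plans,
        key=lambda p: plan_cost(p["dead_time"], p["output_time"], dead_tax),
    )
```

Under this weighting, a plan that sorts first (long dead time, short output time) can lose to a pipelined plan that is slower overall but starts producing output almost immediately.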

  14. Extended Aggregate Functions • Basically, aggregate functions must provide running estimates • SUM, COUNT: straightforward • VAR, STD DEV: algorithms • return confidence intervals

  15. API • Current API uses built-in methods • e.g., StopGroup(cursor, groupval), speedUpGroup(cursor, groupval), slowDownGroup(cursor, groupval), setSkipFactor(cursor name, integer)

  16. Future Work • Better UI: online data visualization (Tioga DataSplash) • data viz = “graphical” aggregate • “drill down” and “roll up” facilities • Nested Queries • Control w/o Indices • Checkpointing/continuation • Tracking online queries • Extensions of statistical results

  17. References control.cs.berkeley.edu/online/olamd/olamd.PPT
