1 / 28

Incremental Maintenance for Non-Distributive Aggregate Functions

Incremental Maintenance for Non-Distributive Aggregate Functions. Themis Palpanas (U of Toronto) Richard Sidle Bobbie Cochrane Hamid Pirahesh. work done at IBM Almaden Research Center. Motivation. large amounts of data stored in databases often times data warehouses are used

savea
Download Presentation

Incremental Maintenance for Non-Distributive Aggregate Functions

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Incremental Maintenance for Non-Distributive Aggregate Functions Themis Palpanas (U of Toronto) Richard Sidle Bobbie Cochrane Hamid Pirahesh work done at IBM Almaden Research Center

  2. Motivation • large amounts of data stored in databases • often times data warehouses are used • consolidate data from many sources • offer more general and descriptive view of data • queried by business intelligence tools and decision support systems • produce expensive OLAP queries • these OLAP queries have nice properties: • based on same set of tables • perform similar aggregations Themis Palpanas - U of Toronto

  3. Motivation (cont’d) • can efficiently support such queries with Automatic Summary Tables (ASTs) • materialized queries defined over a set of base tables • precomputed once, used many times • answer complex queries fast • must maintain ASTs when base tables change • inserts, updates, deletes Themis Palpanas - U of Toronto

  4. base tables insert/update/delete Motivation (cont’d) AST AST definition

  5. Aggregate Functions • characterization of functions wrt insertion and deletion operations • updates are series of deletions and insertions • distributive aggregate functions • new value computed based on old value and value of operation • SUM() • non-distributive aggregate functions • above property does not hold • STDDEV() • MIN() (because of deletions) Themis Palpanas - U of Toronto

  6. Problem Statement • given ASTs with aggregate functions • distributive • SUM, COUNT • non-distributive • STDDEV, CORRELATION, REGRESSION, MIN/MAX, XMLAGG, … • when base tables change • incrementally maintain affected ASTs efficient maintenance of ASTs with non-distributive aggregate functions Themis Palpanas - U of Toronto

  7. Outline • Current Approach • Our Solution • Experimental Evaluation • Related Work • Conclusions Themis Palpanas - U of Toronto

  8. Propagate phase Apply phase combine old and new values base tables insert/update/delete Current Approach AST delta AST definition

  9. Current Approach (cont’d) • works for distributive • SUM, COUNT • does not work for non-distributive • STDDEV, CORRELATION, REGRESSION • MIN/MAX • XMLAGG • need new way to deal with these functions Themis Palpanas - U of Toronto

  10. Our Solution • selective recomputation • no longer enough to compute delta • must recompute some aggregation groups • minimize work to be done • choose which groups to recompute • optimize query plan Themis Palpanas - U of Toronto

  11. Propagate phase Apply phase base tables insert/update/delete Our Solution (cont’d) combine old and new values recompute affected groups AST delta AST definition

  12. Our Solution (cont’d) • the 5 steps • compute new aggregate values • change column derivation • recompute only affected groups • eliminate unnecessary operations • optimize for special cases Themis Palpanas - U of Toronto

  13. UDI LOJ AST prop Initial Query Plan • Query Graph Model (QGM) Themis Palpanas - U of Toronto

  14. UDI LOJ AST LOJ prop AST 1. Compute New Aggregate Values • compute delta for distributive functions • recompute non-distributive functions • get those values only for affected groups • duplicate computation for distributive functions! Themis Palpanas - U of Toronto

  15. 2. Change Column Derivation UDI • change column derivation • rewrite phase projects out unused columns • entire AST gets recomputed! LOJ AST LOJ non-distributive only prop AST distributive only Themis Palpanas - U of Toronto

  16. 2. Change Column Derivation • example AST: SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary) FROM employees GROUP BY dept_id • result of COUNT() computed from old propagate phase • results of MAX() and STDDEV() from AST definition Themis Palpanas - U of Toronto

  17. 3. Recompute Affected Groups • push join predicate down in AST • only affected groups are recomputed • special rules for super-aggregates • GROUPING SETS • ROLLUP • CUBE UDI non-distributive only LOJ AST LOJ distributive only AST* J J prop … T1 Tk Themis Palpanas - U of Toronto

  18. 3. Recompute Affected Groups • special treatment for ASTs with super-aggregates • predicates not pushdownable • caution not to compute totals of totals • build special join predicate • ensure correct aggregations • change rewrite rules • allow predicate pushdown through super aggregates • applicable only for special join predicate Themis Palpanas - U of Toronto

  19. 4. Remove Unnecessary Operations • outerjoin not always needed • when changes are only inserts • reroute columns from propagate phase through AST • remove outerjoin operator • same for updates not referencing AST grouping columns and predicates UDI LOJ all columns AST distributive only AST J J prop … T1 Tk Themis Palpanas - U of Toronto

  20. 4. Remove Unnecessary Operations • example AST: SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary) FROM employees GROUP BY dept_id • modification on base tables: UPDATE employees SET salary=10 WHERE age>40 • outerjoin operation will not be built • update does not refer to grouping column (dept_id), and no predicate in AST refers to updated column (salary) • certain that no tuples in AST will be deleted • only STDDEV() will be recomputed • the rest are not affected by changes Themis Palpanas - U of Toronto

  21. 5. Optimize for Special Cases • recomputation step not needed when • only insertions and only MIN/MAX functions • build predicate in apply phase • check if new min/max should replace old values • only deletions referring only to grouping columns of AST • can only cause entire groups to be deleted • handled in apply phase Themis Palpanas - U of Toronto

  22. 5. Optimize for Special Cases • example AST: SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary) FROM employees GROUP BY dept_id • modification on base tables: DELETE FROM employees WHERE dept_id>40 • selective recomputation step not needed • deletion refers only to grouping column (dept_id) • certain that entire groups will be deleted from AST • no other groups will be affected Themis Palpanas - U of Toronto

  23. Experimental Evaluation • prototype implementation in IBM DB2 UDB • star schema database • sales of products over 5 year time period • fact table: 10 million tuples • AST with non-distributive aggregate function • 240,000 tuples • workload simulates nightly updates • add/delete data for first day of month • add/delete data for second day of month • add/delete data for full month Themis Palpanas - U of Toronto

  24. Experimental Evaluation (cont’d) • deletions require 40-60% of full refresh time • optimized deletions require 1-4% of full refresh time Themis Palpanas - U of Toronto

  25. Experimental Evaluation (cont’d) • insertions/updates require 20-25% of full refresh time Themis Palpanas - U of Toronto

  26. Related Work • incremental view maintenance • differential refresh algorithms • Lindsay et al. 1986, Blakeley et al. 1986, Qian and Wiederhold 1991, Ceri and Widom 1991 • deferred incremental maintenance • Colby et al. 1996, Salem et al. 2000 • views with aggregation • Quass 1996, Mumick et al. 1997 Themis Palpanas - U of Toronto

  27. Conclusions • incremental maintenance for ASTs with non-distributive aggregate functions • support MIN/MAX, STDDEV, CORRELATION, REGRESSION, XMLAGG, … • efficient selective recomputation • recompute only affected groups • optimize query plan • customize for special cases • significant performance improvements Themis Palpanas - U of Toronto

  28. Future Work • examine use of work areas • temporary storage space • store intermediate values • maintenance without recomputation • STDDEV, MIN/MAX(?), … • very helpful for ASTs defined with super-aggregates • ASTs with HAVING clauses • do not know when groups will enter/leave AST Themis Palpanas - U of Toronto

More Related