1 / 41

A First Look at Materialized Query Table (MQT) in DB2 LUW

A First Look at Materialized Query Table (MQT) in DB2 LUW. Petrus Chan, IBM Toronto Lab, petrus@ca.ibm.com. Agenda. Motivation Sample scenario How MQT can help Different types of MQTs & Maintenance MQT Matching – what queries can benefit from MQTs Design Considerations Other Uses of MQT.

warreng
Download Presentation

A First Look at Materialized Query Table (MQT) in DB2 LUW

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A First Look at Materialized Query Table (MQT) in DB2 LUW Petrus Chan, IBM Toronto Lab, petrus@ca.ibm.com

  2. Agenda • Motivation • Sample scenario • How MQT can help • Different types of MQTs & Maintenance • MQT Matching – what queries can benefit from MQTs • Design Considerations • Other Uses of MQT

  3. Database Schema for Our Sample Scenario STORE Star Schema Store_id Region_id … PRODUCT SALES Product_id Class_id Group_id Family_id Line_id Division_id … Product_id Store_id Date_id Amount Quantity … TIME Date_id Month_id Quarter_id Year_id

  4. Dimensions Hierarchy Product Dimension Division Level 5 Line Level 4 Time Dimension Family Level 3 Year Quarter Group Level 2 Store Dimension Region Month Class Level 1 Day Store Product Level 0 Sales Fact

  5. Sample Query Q9: Total sales of each product group at each store for December 2008. SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND t.month_id = 200812 GROUP BY t.month_id, f.store_id, p.group_id

  6. Q9 Q9 Month=200812 Month=200812 Joe’s Query Joe’s Q Bob’s Q Group=Laser Printer Group=Laser Printer Store=McLean Compute Once & Reuse GB GB GB GB JOIN JOIN JOIN JOIN JOIN JOIN JOIN Dim2 Dim2 Dim2 JOIN Dim2 Bob’s Query Fact Fact Fact Dim1 Dim1 Dim1 Store=McLean MQT Fact Dim1 Benefits of MQTs

  7. What is an MQT? • MQT stands for Materialized Query Table. • Basic Idea: • Query results caching - pre-compute and store (in database tables) the results of frequently issued queries • Query optimizer automatically and transparently replace portions of future queries with direct references to the pre-computed (i.e., materialized) results. This is referred to as MQT routing or matching. • Standard technique in data warehousing, also known as aggregate tables and aggregate (re-)routing, or alternatively, as materialized views.

  8. Usages of MQTs – At a High Level • Aggregate data over one or more dimensions • Joins and aggregate data over a group of tables • Data from a commonly accessed subset of data, that is, from a "hot" horizontal or vertical database partition • Repartitioned data from a table, or part of a table, in a partitioned database environment

  9. How query can be routed to MQT? MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, f.store_id, p.group_id Q9: Original Query SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND t.month_id = 200812 GROUP BY t.month_id, f.store_id, p.group_id Q9: Rewritten Query SELECT m.month_id, m.store_id, m.group_id, m.total_sales FROM MQT_SalesByMthStrGrp m WHERE m.month_id = 200812 Route Query to MQT

  10. MQT Creation CREATE TABLE MQT_SalesByMthStrGrp AS • (SELECT t.month_id, • f.store_id, • p.group_id, • sum(amount) as total_sales • FROM sales f, product p, time t • WHERE f.date_id = t.date_id AND • f.product_id = p.product_id • GROUP BY t.month_id, f.store_id, p.group_id) DATA INITIALLY DEFERRED REFRESH DEFERRED; REFRESH TABLE MQT_SalesByMthStrGrp; RUNSTATS ON TABLE petrus.MQT_SalesByMthStrGrp WITH DISTRIBUTION AND DETAILED INDEXES ALL; -- Confirm that MQT is being used by -- (a) setting CURRENT REFRESH AGE to ANY -- (b) explaining the query used as -- the MQT’s definition • MQTs are created using the CREATE TABLE statement • Immediately after creation, the MQT is empty and not accessible. It is populated using the REFRESH TABLE statement • After running the REFRESH, the MQT becomes accessible

  11. Agenda • Motivation • Sample scenario • How MQT can help • Types of MQTs & Maintenance • MQT Matching – what queries can benefit from MQTs • Design Considerations • Other Uses of MQT

  12. Types of MQTs MQT User Maintained System Maintained Refresh DEFERRED Refresh IMMEDIATE WITHOUT staging table WITH staging table

  13. MQT Maintenance • Process of synchronizing the MQT data with changes to the underlying tables • Incremental Maintenance • Immediate Propagate Immediate Applied (IPIA) • Full Maintenance • Hybrid • Immediate Propagate Deferred Applied (IPDA)

  14. Combine old and new values Pair Delta to MQT base base base BaseTables tables tables tables Incremental Maintenance Immediate Propagate Immediate Apply • the ability to maintain the MQT without the need to access the modified base table, only using the delta information (newly inserted/updated/deleted data) and the MQT itself. • REFRESH IMMEDIDATE MQTs are maintained incrementally during: • INSERT/UPDATE/DELETE of base table processing • REFRESH TABLE processing after LOAD INSERT into the base table. • Immediate Propagate, Immediate Apply • For performance reasons, only incrementally maintainable MQTs can be defined as REFRESH IMMEDIATE (hence more restrictions in MQT definition) DeltaAggregate MQT Delta select/join Primary-key: grouping columns

  15. base base base BaseTables tables tables tables Full Maintenance MQT • MQTs are fully rebuilt by re-evaluating the query in the MQT definition • REFRESH DEFERRED MQTs (without staging table): • not maintained during INSERT/UPDATE/DELETE operations on the base table. • synchronized with base table using the REFRESH TABLE statement Aggregate select/join

  16. Combine old and new values Pair Delta to AST base base tables tables Hybrid - Immediate Propagate, Deferred Applied Immediate Propagate Deferred Apply • REFRESH DEFERRED MQTs with staging table • Delta changes to MQT are computed upon INSERT/UPDATE/DELETE of base tables • Delta aggregate is recorded in a separate staging table • Subsequent REFRESH of MQT can be done incrementally • Immediate Propagate, Deferred Apply • One staging table per MQT CREATE TABLE SALES_BY_YEAR_STG FOR SALES_BY_YEAR PROPAGATE IMMEDIATE; DeltaAggregate AST Net-effect Delta select/join BaseTables base tables Primary-key: grouping columns StagingTables Staging

  17. User Maintained MQT • User is responsible for maintaining MQT content • REFRESH TABLE disallowed • Insert/update/delete on MQT allowed • Must be REFRESH DEFERRED • Scenarios where user maintained MQTs are • For efficiency reasons, when the user is convinced that (s)he can implement MQT maintenance far more efficiently than the mechanisms used by DB2. • For leveraging existing “user maintained” summaries, where the user wants DB2 to automatically consider them for optimization for new ad hoc queries being executed against the base tables.

  18. Refresh Considerations

  19. Agenda • Motivation • Sample scenario • How MQT can help • Types of MQTs & Maintenance • MQT Matching – what queries can benefit from MQTs • Design Considerations • Other Uses of MQT

  20. MQT Matching MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id • Process to automatically use the MQT to derive results for a query for improved performance Q9: Original Query SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND t.month_id = 200812 GROUP BY t.month_id, f.store_id, p.group_id Q9: Rewritten Query SELECT m.month_id, m.store_id, m.group_id, m.total_sales FROM MQT_SalesByMthStrGrp m WHERE m.month_id = 200812 Route Query to MQT

  21. What queries can benefit from an MQT? • Queries • that matches an MQT perfectly! • with predicates that are a subset of those in MQT • with more table joins than MQT (re-join) • with fewer table joins than MQT (need RI constraint) • with columns not in MQT definition (need functional dependencies) • at a higher aggregation level than an MQT

  22. Q9 Q9 Month=200812 Month=200812 Joe’s Query Joe’s Q Bob’s Q Group=Laser Printer Group=Laser Printer Store=McLean Compute Once & Reuse GB GB GB GB JOIN JOIN JOIN JOIN JOIN JOIN JOIN Dim2 Dim2 Dim2 JOIN Dim2 Bob’s Query Fact Fact Fact Dim1 Dim1 Dim1 Store=McLean MQT Fact Dim1 Query with predicates that are a subset of those in MQT

  23. Query with more table joins than MQT If the extra table join is through a column in group-by list, match and re-join with tables not in MQT Rewritten Query: GB GB Original Query: Re-join Store MQT Store Time GB MQT: Sales Product Time Sales Product

  24. Query with more table joins than MQT (Example) Rewritten Query SELECT m.month_id, s.region_id, m.group_id, sum(m.total_sales) FROM MQT_SalesByMthStrGrp m, store s WHERE m.stored_id = s.store_id GROUP BY m.month_id, s.region_id, m.group_id Original Query SELECT t.month_id, s.region_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t, store s WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND f.store_id = s.store_id GROUP BY t.month_id, s.region_id, p.group_id Route Query to MQT MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id

  25. Queries with fewer joins than MQT • In general, we can’t route queries to an MQT that has more table joins because the extra joins affects the MQT content • However, if the extra joins in MQT are RI-joins, i.e. joins with RI parent table with join predicate of the form: child.fk = parent.pk They are “look-up” joins that do not add or delete rows • RI-join is common between Fact and Dimension tables. • Need to declare RI constraints (even just informational constraints) to tell DB2 about it GB MQT: Query: GB RI-join Product Sales Time Sales Time

  26. Queries with fewer joins than MQT (Example) Rewritten Query SELECT m.month_id, m.store_id, sum(m.total_sales) FROM MQT_SalesByMthStrGrp m GROUP BY m.month_id, m.store_id Original Query SELECT t.month_id, f.store_id, sum(amount) as total_sales FROM sales f, time t WHERE f.date_id = t.date_id GROUP BY t.month_id, f.store_id Route Query to MQT MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id alter table sales add constraint fk_product foreign key (product_id) references product; Sales (product_id int NOT NULL)

  27. Importance of NOT NULL Constraint on Foreign Key Columns • Declaring NOT NULL constraint on the foreign key column of the child table is important! • If a row of column SALES.product_id has null value: • the row will not qualify the join predicate f.product_id = p.product_id. • The MQT would be missing some rows that are needed by the query, hence incorrect to route. MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id

  28. Queries at higher aggregation level than an MQT Product Dimension • MQT for the Store-Month-Group slice • Queries that access data at or above that slice can be satisfied by the MQT with some additional aggregating. • Querying more detailed data below the slice, such as Day or Product, cannot be routed to this MQT. Level 5 Division Level 4 Line Time Dimension Level 3 Family Year Level 2 Quarter Group Store Dimension Level 1 Region Month Class Day Store Level 0 Product Sales Fact

  29. Queries at a higher level aggregation - Example MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id Functional Dependency TIME (month_id→quarter_id) Original Query SELECT t.quarter_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.quarter_id, f.store_id, p.group_id Rewritten Query SELECT t.quarer_id, m.store_id, m.group_id, sum(m.total_sales) FROM MQT_SalesByMthStrGrp m, time t WHERE m.month_id = t.month_id GROUP BY t.quarter_id, m.store_id, m.group_id Route Query to MQT

  30. Functional Dependencies Product Dimension • X functionally determines Y iff each X value is associated with precisely one Y value. • By definition, all columns of a relation are functionally dependent on the unique/primary key. • De-normalization introduces functional dependencies between non-key columns. • Between ID and descriptive columns • Example: group_id → group_desc • Between different levels of the hierarchy • Example: group_id → (family_id, line_id, division_id) Division Line Family Group Class Product

  31. Functional Dependencies in DB2 • User can help DB2 identifies the functional dependencies among non-key columns by declaring NOT ENFORCED FD constraints: ALTER TABLE PRODUCT ADD CHECK((group_desc) DETERMINED BY group_id) NOT ENFORCED ENABLE QUERY OPTIMIZATION • During MQT routing, DB2 will re-join with the table to pick up the “dependents”

  32. (year-region-division) GB FD re-join (month_id, year_id) TIME FD re-join (group_id, division_desc) FD re-join (store_id, region_desc) MQT (month-store-group) PRODUCT PRODUCT Functional Dependencies - Example • Query at year-region-division level: • Select t.year_id, s.region_desc, p.division_desc, • sum(amount) as total_sales • ... • GROUP BY t.year_id, s.region_desc, p.division_desc; MQT at the month-store- group aggregate level: • SELECT t.month_id, • t.year_id, • f.store_id, • s.region_id, s.region_desc, • p.group_id, • p.division_id, p.division_desc, • sum(amount) as total_sales • FROM sales f, product p, time t • ,stored s • WHERE f.date_id = t.date_id • AND f.product_id = p.product_id • AND f.store_id = s.store_id • GROUP BY t.month_id, • t.year_id, • s.store_id, • s.region_id, s.region_desc, • p.group_id • ,p.division_id, p.division_desc FD: TIME: month_id -> year_id UK: STORE store_id -> (region_id, region_desc) FD: PRODUCT: group_id -> (division_id, division_desc)

  33. Functional Dependencies in MQT • It is a good practice to declare functional dependencies to keep the minimum amount of dimensional information in the MQT. This will: • keep the size of the MQT small • Keep the MQT unaffected by dimension table updates - minimize the need for maintenance

  34. MQT Matching – Insurance Policy SQL Query • MQT matching performed during Query Rewrite • Rewrite transformations mostly rule/heuristic based • Insurance Policy: • Optimizer Cost based comparison between: • Access plan with MQT routing • Access plan WITHOUT MQT routing • Cheapest plan wins - against choosing a wrong heuristic Parser Query Graph Semantics With and without Aggregate MQT Query Rewrite Optimizer Code Generator

  35. Agenda • Motivation • Sample scenario • How MQT can help • Types of MQTs & Maintenance • MQT Matching – what queries can benefit from MQTs • Design Considerations • Other Uses of MQT

  36. Product Dim. Level 5 Division Level 4 Line Time Dim. Level 3 Family Year Level 2 Quarter Group Store Dim. Region Level 1 Month Class Day Store Level 0 Product Sales Fact Design Considerations • On usage of MQT • Space requirement – MQTs are materialized tables • On MQT definition • Level of aggregation vs. query processing time • Too high level – few queries can route to it • Too low level – costly compensation time • Workload-based vs. Model-based • Numbers of MQTs

  37. Refreshing with finer granularity MQTs • DB2 9 allows the use of existing MQTs to be used by higher level MQTs during REFRESH • Matching done during REFRESH • It is very important to aggregate in the right order • Example : • MQT1 : GROUP BY DATE, MONTH, YEAR • MQT2 : GROUP BY MONTH, YEAR • REFRESH MQT1 • REFRESH MQT2  uses MQT1

  38. Store Store Store Another Use of Replicated MQTs • In partitioned database, colocation of rows involved in different tables => efficient join (colocated joins) • In a typical star schema setup, the fact table can be co-located with at most one dimension table. • The remaining (smaller) dimension tables are frequently replicated to all nodes of the fact table • Replicated MQTs are used to achieve co-location of data for efficient join processing in partitioned database environment Store Product ProdID3 Product ProdID2 Product ProdID1 Sales ProdID3 Sales ProdID2 Sales ProdID1

  39. Summary • An MQT is a table that stores pre-computed query results. • Queries can be matched to an MQT to improve performance. • Different types of MQTs and their maintenance: • immediate, deferred (with/without staging table), user maintained • replicated • Queries can be matched to an MQT that has: • Predicates that are superset of those in queries • Finer grained aggregation level • More/few table joins • Not null constraint, foreign key constraints and functional dependencies can be defined to help MQT matching

  40. References • Redbooks (http://www.redbooks.ibm.com) • DB2 UDB's High Function Business Intelligence in e-Business (SG24-6546-00) • Related Papers • Wolfgang Lehner, Richard Sidle, Hamid Pirahesh, Roberta Cochrane: Maintenance of Automatic Summary Tables. SIGMOD Conference 2000: 512-513 • Markos Zaharioudakis, Roberta Cochrane, George Lapis, Hamid Pirahesh, Monica Urata: Answering Complex SQL Queries Using Automatic Summary Tables. SIGMOD Conference 2000: 105-116

  41. Thank You

More Related