1 / 56

Chapter 16

Chapter 16. Data Warehouse Technology and Management. Outline. Basic concepts and characteristics Business architectures and applications Data cube concepts and operators Relational DBMS features Maintaining a data warehouse. Comparison of Environments. Transaction processing

Download Presentation

Chapter 16

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 16 Data Warehouse Technology and Management

  2. Outline • Basic concepts and characteristics • Business architectures and applications • Data cube concepts and operators • Relational DBMS features • Maintaining a data warehouse

  3. Comparison of Environments • Transaction processing • Uses operational databases • Short-term decisions: fulfill orders, resolve complaints, provide staffing • Decision support processing • Uses integrated and summarized data • Medium and long-term decisions: capacity planning, store locations, new lines of business

  4. Definition and Characteristics • A central repository for summarized and integrated data from operational databases and external data sources • Key Characteristics • Subject-oriented • Integrated • Time-variant • Nonvolatile

  5. Data Comparison

  6. Architectures and Applications • Data warehouse projects • Top-down architectures • Bottom-up architecture • Applications and data mining

  7. Data Warehouse Projects • Large efforts with much coordination across departments • Enterprise data model • Important artifact of data warehouse project • Structure of data model • Meta data for data transformation • Top-down vs. bottom-up business architectures

  8. Two Tier Architecture

  9. Three Tier Architecture

  10. Bottom-up Architecture

  11. Applications

  12. Maturity Model • Guidance for investment decisions • Stages provide a framework to view an organization’s progress • Insights: difficulty moving between stages • Infant to child stages because of investment level • Teenager to adult because of strategic importance of data warehouse

  13. Data Mining • Discover significant, implicit patterns • Target promotions • Change mix and collocation of items • Requires large volumes of transaction data • Important application for data warehouses

  14. Data Cube Concepts and Operators • Basics • Dimension and measure details • Operators

  15. Data Cube Basics • Multidimensional arrangement of data • Users think about decision support data as data cubes • Terminology • Dimension: subject label for a row or column • Member: value of dimension • Measure: quantitative data stored in cells

  16. Data Cube Example

  17. Dimensions and Measures • Dimensions • Hierarchies: members can have sub members • Sparsity: many cells do not have data • Measures • Derived measures • Multiple measures in cells

  18. Time Series Data • Common data type in trend analysis • Reduce dimensionality using time series • Time series properties • Data type • Start date • Calendar • Periodicity • Conversion

  19. Slice Operator • Focus on a subset of dimensions • Set dimension to specific value: 1/1/2006

  20. Dice Operator • Focus on a subset of member values • Replace dimension with a subset of values • Dice operation often follows a slice operation

  21. Other Operators • Operators for hierarchical dimensions • Drill-down: add detail to a dimension • Roll-up: remove detail from a dimension • Recalculate measure values • Pivot: rearrange dimensions

  22. Operator Summary

  23. Relational DBMS Support • Data modeling • Dimension representation • GROUP BY extensions • Materialized views and query rewriting • Storage structures and optimization

  24. Relational Data Modeling • Dimension table: contains member values • Fact table: contains measure values • 1-M relationships from dimension to fact tables • Grain: most detailed measure values stored

  25. Star Schema Example

  26. Constellation Schema

  27. Snowflake Schema Example

  28. Handling M-N Relationships • Source data may have M-N relationships, not 1-M relationships • Adjust fact or dimension tables for a fixed number of exceptions • More complex solutions to support M-N relationships with a variable number of connections

  29. Time Representation • Timestamp • Time dimension table for organization specific calendar features • Two fact tables for international operations • Accumulating fact table for representation of multiple events

  30. Level of Historical Integrity • Primarily an issue for dimension updates • Type I: overwrite old values • Type II: version numbers for an unlimited history • Type III: new columns for a limited history

  31. Historical Integrity Example

  32. Dimension Representation • Star schema and variations lack dimension representation • Explicit dimension representation important to data cube operations and optimization • Proprietary extensions for dimension representation • Represent levels, hierarchies, and constraints

  33. Oracle Dimension Representation • Levels: dimension components • Hierarchies: may have multiple hierarchies • Constraints: functional dependency relationships

  34. CREATE DIMENSION Example CREATE DIMENSION StoreDim LEVEL StoreId IS Store.StoreId LEVEL City IS Store.StoreCity LEVEL State IS Store.StoreState LEVEL Zip IS Store.StoreZip LEVEL Nation IS Store.StoreNation LEVEL DivId IS Division.DivId HIERARCHY CityRollup ( StoreId CHILD OF City CHILD OF State CHILD OF Nation ) HIERARCHY ZipRollup ( StoreId CHILD OF Zip CHILD OF State CHILD OF Nation ) HIERARCHY DivisionRollup ( StoreId CHILD OF DivId JOIN KEY Store.DivId REFERENCES DivId ) ATTRIBUTE DivId DETERMINES Division.DivName ATTRIBUTE DivId DETERMINES Division.DivManager ;

  35. GROUP BY Extensions • ROLLUP operator • CUBE operator • GROUPING SETS operator • Other extensions • Ranking • Ratios • Moving summary values

  36. CUBE Example SELECT StoreZip, TimeMonth, SUM(SalesDollar) AS SumSales FROM Sales, Store, Time WHERE Sales.StoreId = Store.StoreId AND Sales.TimeNo = Time.TimeNo AND (StoreNation = 'USA' OR StoreNation = 'Canada') AND TimeYear = 2005 GROUP BY CUBE (StoreZip, TimeMonth)

  37. ROLLUP Example SELECT TimeMonth, TimeYear, SUM(SalesDollar) AS SumSales FROM Sales, Store, Time WHERE Sales.StoreId = Store.StoreId AND Sales.TimeNo = Time.TimeNo AND (StoreNation = 'USA' OR StoreNation = 'Canada') AND TimeYear BETWEEN 2005 AND 2006 GROUP BY ROLLUP (TimeMonth,TimeYear);

  38. GROUPING SETS Example SELECT StoreZip, TimeMonth, SUM(SalesDollar) AS SumSales FROM Sales, Store, Time WHERE Sales.StoreId = Store.StoreId AND Sales.TimeNo = Time.TimeNo AND (StoreNation = 'USA' OR StoreNation = 'Canada') AND TimeYear = 2005 GROUP BY GROUPING SETS((StoreZip, TimeMonth), StoreZip, TimeMonth, ());

  39. Variations of the Grouping Operators • Partial cube • Partial rollup • Composite columns • CUBE and ROLLUP inside a GROUPIING SETS operation

  40. Materialized Views • Stored view • Periodically refreshed with source data • Usually contain summary data • Fast query response for summary data • Appropriate in query dominant environments

  41. Materialized View Example CREATE MATERIALIZED VIEW MV1 BUILD IMMEDIATE REFRESH COMPLETE ON DEMAND ENABLE QUERY REWRITE AS SELECT StoreState, TimeYear, SUM(SalesDollar) AS SUMDollar1 FROM Sales, Store, Time WHERE Sales.StoreId = Store.StoreId AND Sales.TimeNo = Time.TimeNo AND TimeYear > 2003 GROUP BY StoreState, TimeYear;

  42. Query Rewriting • Substitution process • Materialized view replaces references to fact and dimension tables in a query • Query optimizer must evaluate whether the substitution will improve performance over the original query • More complex than query modification process for traditional views

  43. Query Rewriting Process

  44. Query Rewriting Matching • Row conditions: query conditions at least as restrictive as MV conditions • Grouping detail: query grouping columns at least as general as MV grouping columns • Grouping dependencies: query columns must match or be derivable by functional dependencies • Aggregate functions: query aggregate functions must match or be derivable from MV aggregate functions

  45. Query Rewriting Example -- Data warehouse query SELECT StoreState, TimeYear, SUM(SalesDollar) FROM Sales, Store, Time WHERE Sales.StoreId = Store.StoreId AND Sales.TimeNo = Time.TimeNo AND StoreNation IN ('USA','Canada') AND TimeYear = 2005 GROUP BY StoreState, TimeYear; -- Query Rewrite: replace Sales and Time tables with MV1 SELECT DISTINCT MV1.StoreState, TimeYear, SumDollar1 FROM MV1, Store WHERE MV1.StoreState = Store.StoreState AND TimeYear = 2005 AND StoreNation IN ('USA','Canada');

  46. Storage and Optimization Technologies • MOLAP: direct storage and manipulation of data cubes • ROLAP: relational extensions to support multidimensional data • HOLAP: combine MOLAP and ROLAP storage engines

  47. ROLAP Techniques • Bitmap join indexes • Star join optimization • Query rewriting • Summary storage advisors • Parallel query execution

  48. Maintaining a Data Warehouse • Data sources • Workflow representation • Optimizing the refresh process

  49. Data Sources • Cooperative: • Notification using triggers • Requires source system changes • Logged • Readily available • Extraneous data in logs • Queryable • Queries using timestamps • Requires timestamps in source data • Snapshot • Periodic dumps of source data • Significant processing for difference operations

  50. Maintenance Workflow

More Related