1 / 76

Chapter 16

Chapter 16. Data Warehouse Concepts and Design. Outline. Basic concepts and characteristics Business architectures and applications Data cube concepts and operators Relational database design for data warehouses Enterprise data warehouse development. Historical Perspective.

Download Presentation

Chapter 16

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 16 Data Warehouse Concepts and Design

  2. Outline • Basic concepts and characteristics • Business architectures and applications • Data cube concepts and operators • Relational database design for data warehouses • Enterprise data warehouse development

  3. Historical Perspective Failure of relational database technology for decision making Failure causes Lack of decision-making value of individual databases Infeasible to use a database for both operational support and decision making Missing features for summary data Data warehouse technology and organizational deployment developed in 1990s to address these issues.

  4. Comparison of Environments • Transaction processing • Uses operational databases • Short-term decisions: fulfill orders, resolve complaints, provide staffing • Decision support processing • Uses integrated and summarized data • Medium and long-term decisions: capacity planning, store locations, new lines of business

  5. Definition and Characteristics • A central repository for summarized and integrated data from operational databases and external data sources • Key Characteristics • Subject-oriented • Integrated • Time-variant • Nonvolatile

  6. Data Comparison

  7. Architectures and Applications • Data warehouse projects • Top-down architectures • Bottom-up architecture • Applications and data mining

  8. Data Warehouse Projects • Large efforts with much coordination across departments • Enterprise data model • Important artifact of data warehouse project • Structure of data model • Meta data for data transformation • Top-down vs. bottom-up business architectures

  9. Two Tier Architecture

  10. Three Tier Architecture

  11. Three Tier Architecture with Staging Area

  12. Bottom-up Architecture

  13. Applications

  14. Maturity Model Stages

  15. Maturity Model Importance • Guidance for investment decisions • Stages provide a framework to view an organization’s progress • Insights: difficulty moving between stages • Infant to child stages because of investment level • Teenager to adult because of strategic importance of data warehouse

  16. Data Mining • Discover significant, implicit patterns • Target promotions • Change mix and collocation of items • Requires large volumes of transaction data • Important application for data warehouses

  17. Data Cube Concepts and Operators • Basics • Dimension and measure details • Operators

  18. Data Cube Basics • Multidimensional arrangement of data • Users think about decision support data as data cubes • Terminology • Dimension: subject label for a row or column • Member: value of dimension • Measure: quantitative data stored in cells

  19. Data Cube Example

  20. Dimensions and Measures • Dimensions • Hierarchies: members can have sub members • Sparsity: many cells do not have data • Measures • Derived measures • Multiple measures in cells

  21. Measure Aggregation Properties Additive Summarized by addition across all dimensions Common measures such as sales, cost, and profit Semi-Additive Summarized in some but not all dimensions such as time Periodic measurements such as account balances and inventory levels Non-Additive Cannot be summarized by addition through any dimension Historical facts such as unit price for a sale

  22. Measure Aggregation Example • Dimensions • Course: course id, degree, department, and college • Student: student id, major, department, and college • Time: semester, academic year, academic decade • Measures: • Credit hours • Grade • Unit tuition • Tuition • Aggregation properties for measures: ?

  23. Time Series Data • Common data type in trend analysis • Reduce dimensionality using time series • Time series properties • Data type • Start date • Calendar • Periodicity • Conversion

  24. Slice Operator • Focus on a subset of dimensions • Set dimension to specific value: 1/1/2010 (Location  Product Slice for Time = 1/1/2010)

  25. Dice Operator • Focus on a subset of member values • Replace dimension with a subset of values • Dice operation often follows a slice operation

  26. Other Operators • Operators for hierarchical dimensions • Drill-down: add detail to a dimension • Roll-up: remove detail from a dimension • Recalculate measure values • Pivot: rearrange dimensions

  27. Drill-Down Example

  28. Operator Summary

  29. Motivation for Data Warehouse Design • Representation of long-term memory for organizational decision making • Schema design captures measures and variables important to decision making • Poor schema design can lead to decision making problems.

  30. Data Modeling Basics • Dimension table: contains member values • Fact table: contains measure values • 1-M relationships from dimension to fact tables • Grain: most detailed measure values stored

  31. Types of Fact Tables • Transaction • Most common • Usually additive measures • Snapshot • Periodic view of asset level • Usually semi-additive measures • Factless • Event occurrence • No measures, just FKs

  32. Star Schema Example

  33. Constellation Schema

  34. Snowflake Schema Example

  35. Summarizability Problems Motivation Intra dimension patterns and problems Fact-dimension patterns and problems

  36. Summarizability Motivation Ability to compute summary computations for a coarse level of detail from finer levels of details Violations of summarizability Incorrect results Erroneous decision making and user confusion Inability to use performance optimizations Relationships among dimension levels and dimension and fact tables

  37. Drill Down Incompleteness Example

  38. Roll-up Incompleteness Example

  39. Non Strict Example

  40. Incomplete Fact-Dimension Relationship

  41. Non Strict Dimension-Fact Relationship (a) Unit sales by salesperson (b) Shared unit sales by salesperson

  42. Dimension Summarizability Patterns

  43. Examples of Dimension Summarizability Patterns

  44. Dimension Non Summarizability Patterns

  45. Dimension Non Summarizability Examples

  46. Summary of Dimension Patterns

  47. Fact-Dimension Summarizability Patterns

  48. Examples of Fact- Dimension Summarizability Patterns

  49. Fact-Dimension Non Summarizability Patterns

  50. Examples of Fact-Dimension Non Summarizability Patterns

More Related