Dimensional Modeling - PowerPoint PPT Presentation

MikeCarlo
dimensional modeling n.
Skip this Video
Loading SlideShow in 5 Seconds..
Dimensional Modeling PowerPoint Presentation
Download Presentation
Dimensional Modeling

play fullscreen
1 / 35
Download Presentation
Dimensional Modeling
318 Views
Download Presentation

Dimensional Modeling

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Dimensional Modeling CS 543 – Data Warehousing

  2. From Requirements to Data Models CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  3. Logical Data Model • Logical data design includes identification of all data elements and the structures in which they are connected • Data elements • Data structures • Requirement gathering, and more specifically, information packages lead to the logical data design CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  4. Dimensional Modeling • A logical data design technique to structure the business dimensions and the metrics that are analyzed along these dimensions • Dimensional modeling • Is intuitive for business • Has proven to be efficient for queries and analyses • Information packages are the foundation of dimensional modeling CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  5. Fact Table CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  6. Dimension Tables CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  7. Desired Characteristics in the Model • The model should provide the best data access • The model should be query-centric • The model must be optimized for queries and analyses • The model must show that the dimension tables interact with the fact table • It must be structured in such a way that every dimension can interact equally with the fact table • The model should allow drill down and roll up along dimension hierarchies CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  8. The Star Schema CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  9. E-R Vs. Dimensional Modeling (1) • Entity-relationship modeling • Removes data redundancy • Ensures data consistency • Expresses microscopic relationships • Dimensional modeling • Captures critical measures • Views along dimensions • Intuitive for business users CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  10. E-R Vs. Dimensional Modeling (2) • DM rules more restrictive than for E-R modeling. • DM is a simpler logical model. • E-R representative power is greater due to variety of constructs supported. • DM looks like normalized E-R conceptual, except: • All relationships mandatory M-1. • Single path between any two relations. CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  11. Another Example: Retail Dimensions CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  12. Star Schema CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  13. Querying Against a Star Schema CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  14. Dimension Tables Characteristics • Dimension table key • Large number of attributes (wide) • Textual attributes • Attributes not directly related • Flattened out, not normalized • Ability to drill down/roll up • Multiple hierarchies • Less number of records CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  15. Fact Table Characteristics • Concatenated fact table key • Grain or level of data identified • Fully additive measures • Semi-additive measures • Large number of records • Only a few attributes • Sparsity of data • Degenerate dimensions CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  16. “Factless” Fact Table • If the metric or unit of analysis is occurrence or non-occurrence of an event, then the fact table will contain either 1 or nulls CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  17. Data Granularity (1) • Actual events are tied to actual transactions (e.g., sales). • This corresponds to the lowest grain or highest detail • Accumulated events are the effect of accumulated transactions (e.g., inventory on hand). • This corresponds to a higher grain and lesser detail • Allowable events represent the “ability” to perform a transaction (e.g., carried products, a.k.a. plan-o-gram). • Actual events are typically more sparse than allowable events (e.g., a store carries more products than it sells each day). CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  18. Data Granularity (2) • Low grain designs are easy to change (“graceful” change) • Low grain designs result in larger storage and maintenance costs CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  19. Keys (1) • Be careful in picking and using operational system keys as keys for the dimension tables • Avoid built-in meanings in the primary keys of the dimension tables • Do not use operational system keys as primary keys of dimension tables • Use surrogate keys (system generated keys) • Keep a mapping between the surrogate and primary keys • Include the operational system primary key as an attribute in the dimension table CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  20. Keys (2) Primary key options for fact tables • A single compound primary key whose length is the total length of the keys of the dimension tables • Foreign keys need to be stored as additional attributes in the fact table • Increases size of fact table • A concatenated primary key that is the concatenation of all the primary keys of the dimension tables • No need to store the foreign keys separately • A generated primary key independent of the keys of the dimension tables • Foreign keys need to be stored as additional attributes CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  21. Advantages of the Star Schema • Easy for users to understand • Optimizes navigation • More suitable for query processing • Star-join and star-index CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  22. Updates • Updates to the fact table • Addition of rows • Changes in row (adjustments in values) • Rarely, addition of attributes (new fact or metric) • Updates to dimension tables • Slow addition of rows • Slow addition of attributes • New dimensions CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  23. Updates to the Dimension Tables • Most dimensions are generally constant over time • Many dimensions, if not constant, change slowly over time • The key of the source record does not change • The description and other attributes change slowly over time • In the source OLTP systems, the new values overwrite the old values • Overwriting is not always the best option for dimension table attributes • The way updates are made depends on the type of change CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  24. CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  25. Type 1 Changes: Correction of Errors (2) • Overwrite the attribute value in the dimension table row with the new value • The old value of the attribute is not preserved • No other change are made in the dimension table row • The key of this row or any other key value are not affected • This type is easiest to implement CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  26. Type 1 Changes (2) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  27. Type 2 Changes: Preservation of History (1) • Properties • They usually relate to true changes in source systems • There is a need to preserve history in the data warehouse • This type of change partitions the history in the data warehouse • Every change for the same attribute must be preserved • Approach • Add a new dimension table row with the new value of the changed attribute • An effective data field may be added into the dimension table • There are no changes to the original row of the dimension table • The new row is inserted with a new surrogate key • The key of the original row is not affected CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  28. Type 2 Changes: Preservation of History (2) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  29. Type 3 Changes: Tentative Soft Revisions (1) • Properties • They usually apply to “soft” or tentative changes in the source systems • There is a need to keep track of history with old and new values of the changed attribute • They are used to compare performances across the transition • They provide the ability to track forward and backward • Approach • Add an “old” field in the dimension table for the affected attribute • Push down the existing value of the attribute from the “current” field to the “old” field • Keep the new value of the attribute in the “current” field • Also, you may add a “current” effective date field for the attribute • The key of the row is not affected • No new dimension row is needed CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  30. Type 3 Changes: Tentative Soft Revisions (2) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  31. Large Dimensions (1) • Large dimensions? • Large number of rows (deep) • Large number of attributes (wide) • Dimensions can become large because of frequent changes (what type?) and need to have many attributes for analysis • Consequence • Slow and inefficient • Solution • Proper logical and physical design • Indexes • Optimized algorithms CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  32. Large Dimensions (2) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  33. Multiple Hierarchies CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  34. Junk Dimensions • Dimensions for a DW are typically taken from operational source systems • However, source systems contain many additional attributes (such as flags, text, descriptions, etc) that may not be useful in a DW • What are the options • Discard all such fields in the source systems • Include them in the fact table • Include all of them as dimensions • Select some and add them to a single “junk” dimension table CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  35. The Snowflake Schema • Snowflaking is a method of normalizing the tables in a star schema CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS