dimensional modeling n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Dimensional Modeling PowerPoint Presentation
Download Presentation
Dimensional Modeling

Loading in 2 Seconds...

play fullscreen
1 / 35

Dimensional Modeling - PowerPoint PPT Presentation


  • 311 Views
  • Uploaded on

Dimensional Modeling. CS 543 – Data Warehousing. From Requirements to Data Models. Logical Data Model. Logical data design includes identification of all data elements and the structures in which they are connected Data elements Data structures

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Dimensional Modeling' - MikeCarlo


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
dimensional modeling

Dimensional Modeling

CS 543 – Data Warehousing

from requirements to data models
From Requirements to Data Models

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

logical data model
Logical Data Model
  • Logical data design includes identification of all data elements and the structures in which they are connected
    • Data elements
    • Data structures
  • Requirement gathering, and more specifically, information packages lead to the logical data design

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

dimensional modeling1
Dimensional Modeling
  • A logical data design technique to structure the business dimensions and the metrics that are analyzed along these dimensions
  • Dimensional modeling
    • Is intuitive for business
    • Has proven to be efficient for queries and analyses
  • Information packages are the foundation of dimensional modeling

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

fact table
Fact Table

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

dimension tables
Dimension Tables

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

desired characteristics in the model
Desired Characteristics in the Model
  • The model should provide the best data access
  • The model should be query-centric
  • The model must be optimized for queries and analyses
  • The model must show that the dimension tables interact with the fact table
  • It must be structured in such a way that every dimension can interact equally with the fact table
  • The model should allow drill down and roll up along dimension hierarchies

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

the star schema
The Star Schema

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

e r vs dimensional modeling 1
E-R Vs. Dimensional Modeling (1)
  • Entity-relationship modeling
    • Removes data redundancy
    • Ensures data consistency
    • Expresses microscopic relationships
  • Dimensional modeling
    • Captures critical measures
    • Views along dimensions
    • Intuitive for business users

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

e r vs dimensional modeling 2
E-R Vs. Dimensional Modeling (2)
  • DM rules more restrictive than for E-R modeling.
  • DM is a simpler logical model.
  • E-R representative power is greater due to variety of constructs supported.
  • DM looks like normalized E-R conceptual, except:
    • All relationships mandatory M-1.
    • Single path between any two relations.

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

another example retail dimensions
Another Example: Retail Dimensions

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

star schema
Star Schema

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

querying against a star schema
Querying Against a Star Schema

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

dimension tables characteristics
Dimension Tables Characteristics
  • Dimension table key
  • Large number of attributes (wide)
  • Textual attributes
  • Attributes not directly related
  • Flattened out, not normalized
  • Ability to drill down/roll up
  • Multiple hierarchies
  • Less number of records

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

fact table characteristics
Fact Table Characteristics
  • Concatenated fact table key
  • Grain or level of data identified
  • Fully additive measures
  • Semi-additive measures
  • Large number of records
  • Only a few attributes
  • Sparsity of data
  • Degenerate dimensions

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

factless fact table
“Factless” Fact Table
  • If the metric or unit of analysis is occurrence or non-occurrence of an event, then the fact table will contain either 1 or nulls

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

data granularity 1
Data Granularity (1)
  • Actual events are tied to actual transactions (e.g., sales).
    • This corresponds to the lowest grain or highest detail
  • Accumulated events are the effect of accumulated transactions (e.g., inventory on hand).
    • This corresponds to a higher grain and lesser detail
  • Allowable events represent the “ability” to perform a transaction (e.g., carried products, a.k.a. plan-o-gram).
  • Actual events are typically more sparse than allowable events (e.g., a store carries more products than it sells each day).

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

data granularity 2
Data Granularity (2)
  • Low grain designs are easy to change (“graceful” change)
  • Low grain designs result in larger storage and maintenance costs

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

keys 1
Keys (1)
  • Be careful in picking and using operational system keys as keys for the dimension tables
    • Avoid built-in meanings in the primary keys of the dimension tables
    • Do not use operational system keys as primary keys of dimension tables
    • Use surrogate keys (system generated keys)
  • Keep a mapping between the surrogate and primary keys
    • Include the operational system primary key as an attribute in the dimension table

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

keys 2
Keys (2)

Primary key options for fact tables

  • A single compound primary key whose length is the total length of the keys of the dimension tables
    • Foreign keys need to be stored as additional attributes in the fact table
    • Increases size of fact table
  • A concatenated primary key that is the concatenation of all the primary keys of the dimension tables
    • No need to store the foreign keys separately
  • A generated primary key independent of the keys of the dimension tables
    • Foreign keys need to be stored as additional attributes

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

advantages of the star schema
Advantages of the Star Schema
  • Easy for users to understand
  • Optimizes navigation
  • More suitable for query processing
  • Star-join and star-index

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

updates
Updates
  • Updates to the fact table
    • Addition of rows
    • Changes in row (adjustments in values)
    • Rarely, addition of attributes (new fact or metric)
  • Updates to dimension tables
    • Slow addition of rows
    • Slow addition of attributes
    • New dimensions

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

updates to the dimension tables
Updates to the Dimension Tables
  • Most dimensions are generally constant over time
  • Many dimensions, if not constant, change slowly over time
  • The key of the source record does not change
  • The description and other attributes change slowly over time
  • In the source OLTP systems, the new values overwrite the old values
  • Overwriting is not always the best option for dimension table attributes
  • The way updates are made depends on the type of change

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

type 1 changes correction of errors 2
Type 1 Changes: Correction of Errors (2)
  • Overwrite the attribute value in the dimension table row with the new value
  • The old value of the attribute is not preserved
  • No other change are made in the dimension table row
  • The key of this row or any other key value are not affected
  • This type is easiest to implement

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

type 1 changes 2
Type 1 Changes (2)

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

type 2 changes preservation of history 1
Type 2 Changes: Preservation of History (1)
  • Properties
    • They usually relate to true changes in source systems
    • There is a need to preserve history in the data warehouse
    • This type of change partitions the history in the data warehouse
    • Every change for the same attribute must be preserved
  • Approach
    • Add a new dimension table row with the new value of the changed attribute
    • An effective data field may be added into the dimension table
    • There are no changes to the original row of the dimension table
    • The new row is inserted with a new surrogate key
    • The key of the original row is not affected

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

type 2 changes preservation of history 2
Type 2 Changes: Preservation of History (2)

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

type 3 changes tentative soft revisions 1
Type 3 Changes: Tentative Soft Revisions (1)
  • Properties
    • They usually apply to “soft” or tentative changes in the source systems
    • There is a need to keep track of history with old and new values of the changed attribute
    • They are used to compare performances across the transition
    • They provide the ability to track forward and backward
  • Approach
    • Add an “old” field in the dimension table for the affected attribute
    • Push down the existing value of the attribute from the “current” field to the “old” field
    • Keep the new value of the attribute in the “current” field
    • Also, you may add a “current” effective date field for the attribute
    • The key of the row is not affected
    • No new dimension row is needed

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

type 3 changes tentative soft revisions 2
Type 3 Changes: Tentative Soft Revisions (2)

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

large dimensions 1
Large Dimensions (1)
  • Large dimensions?
    • Large number of rows (deep)
    • Large number of attributes (wide)
  • Dimensions can become large because of frequent changes (what type?) and need to have many attributes for analysis
  • Consequence
    • Slow and inefficient
  • Solution
    • Proper logical and physical design
    • Indexes
    • Optimized algorithms

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

large dimensions 2
Large Dimensions (2)

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

multiple hierarchies
Multiple Hierarchies

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

junk dimensions
Junk Dimensions
  • Dimensions for a DW are typically taken from operational source systems
  • However, source systems contain many additional attributes (such as flags, text, descriptions, etc) that may not be useful in a DW
  • What are the options
    • Discard all such fields in the source systems
    • Include them in the fact table
    • Include all of them as dimensions
    • Select some and add them to a single “junk” dimension table

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

the snowflake schema
The Snowflake Schema
  • Snowflaking is a method of normalizing the tables in a star schema

CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS