1 / 37

Creating the Dimensional Model

Creating the Dimensional Model. Data Warehouse Database Design Objectives. Easy to understand Optimum performance Adaptable. Data Warehouse Data Type. Description. Fact (detail, atomic, raw) base data. Business measures. Dimension data. Query drivers. Derived fact data.

keran
Download Presentation

Creating the Dimensional Model

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Creating the Dimensional Model

  2. Data Warehouse Database Design Objectives • Easy to understand • Optimum performance • Adaptable

  3. Data Warehouse Data Type Description Fact (detail, atomic, raw) base data Business measures Dimension data Query drivers Derived fact data Calculated data Summary (aggregate) data Pre-calculated data Metadata Warehouse map Data Warehouse Data Types

  4. Designing the Dimensional Model • Phase III: Defining the dimensional model

  5. Item_nbrItem_descQuantityDiscnt_priceUnit_priceOrder_amount…Item_nbrItem_descQuantityDiscnt_priceUnit_priceOrder_amount… Order History (Dimension table) Product (Dimension table) Customer (Dimension table) Channel (Dimension table) (Facttable) Star Dimensional Modeling

  6. Advantages of Using a Star Dimensional Model • Supports multidimensional analysis • Design improves performance • Optimizers yield better execution plans • Parallels end-user perceptions • Provides an extensible design • Broadens the choices for data access tools

  7. Fact Tables • Contain numerical metrics of the business • Can hold large volumes of data • Can grow quickly • Can contain base, derived, and summarized data • Are typically additive • Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables

  8. Factless Fact Tables • Event tracking • Coverage

  9. More on Factless Fact Tables Employee dimension Emp_PK Grade dimension Grade_PK Emp_FK Sal_FK Age_FK Ed_FK Grade_FK Salary dimension Sal_PK Education dimension Ed_PK Age dimension Age_PK PK = Primary Key & FK = Foreign Key

  10. Identify Base and Derived Measures • Identify candidate facts • Remove duplicate facts • Discover and document the underlying calculations • Cross reference base facts • Obtain final derived fact approval

  11. Base and Derived Data Payrolltable Emp_FK Month_FK Salary Comm Comp 101 05 1,000 0 1,000 102 05 1,500 100 1,600 103 05 1,000 200 1,200 104 05 1,500 1,000 2,500 Basedata Deriveddata

  12. Translating Business Measures into a Fact Table Business measures Facts Fact Business Measures Base Number of Items Number of Items Item Amount Base Amount Base Item Cost Cost Derived Profit Profit

  13. Fact Table Measures Nonadditive: Cannot be added along any dimension Additive: Added across all dimensions Semiadditive: Added along some dimensions

  14. Dimension Tables • Contain descriptors of the business • Contain relatively static data • Contain textual and discrete data • Are usually smaller than fact tables • Are joined to a fact table through a foreign key reference

  15. Translating Business Dimensions into Dimension Tables DimensionTables Business Dimensions Product Store SourceSystemInformation Customer Day

  16. Translating a Product Business Dimension into a Product Dimension Table Business Dimension Business DimensionProduct for Product Product_Id (Natural Key) Product Category Supplier Type Warehouse Promotion Catalog Product_name Product_desc Category Supplier_Id Product_status List_price Catalog_Id Product_type Product_code Promotion_Code Warehouse_location

  17. Date Dimension • Should contain the attributes required by all fact tables. • Can be categorized into 4 groups • Date formats • Calendar date attributes : day, month name, year • Fiscal attributes : fiscal week, fiscal period • Indicator columns : Boolean values such as it is a national holiday

  18. Slowly Changing Dimensions Business Dimension for Product Business DimensionProduct Product_Id (Natural Key) Product_PK (surrogate key) Product Category Supplier Type Warehouse Promotion Catalog Product_name Product_desc Category Supplier_Id Product_status List_price Catalog_Id Product_type Product_code Promotion_code Warehouse_location Valid_from_date Valid_to_date Where Product_key is a calculated number stored within the database

  19. Types of Database Keys • Primary keys (PKs) • Foreign keys (FKs) • Composite keys • Surrogate keys

  20. Using Surrogate Keys • Advantages of surrogate keys include: • Control over data • Reduced fact table size • Avoid using the following as data • warehouse keys: • OLTP Production (natural) keys • Smart keys (embedded meaning) Surrogate key Product key: 38972

  21. 1 1 Smith 200 030199 2 2 Jones 300 050599 3 3 Harvey 300 060599 22 1 Smith 400 061001 1 073258 Coffee Hot YUBN 032200 2 073258 Coffee Hot MAXH 110100 3 011172 Pop Cold SCHW 061001 4 011173 Tea Hot RRSE 061001 Surrogate Keys Example Emp_FK Salesperson_ID Salesperson_Name Manager_ID Emp_Change_Date Surrogate Keys Prod_FK Prod_ID Prod_Name Prod_Grouping Brand_Code Prod_Change_Date

  22. Cost Discount Margin Sales_amt Units Sales_channel_FK(surrogate key) Sales_product_FK (surrogate_key) Sales_promotion_FK (surrogate key) Sales_time Adding a Surrogate Key • Channel, product, promotion, and time surrogate keys are added. • These can be used to build aggregate tables. Business Dimension for Sales

  23. Bracketed Dimensions • Enhance performance and analytical capabilities • Create groups of values for attributes with many unique values, such as income ranges and age brackets • Minimize the need for full table scans by pre-aggregating data

  24. Bracket dimension Bracket_PK Income fact Customer_PKBracket_FK Customer dimension Customer_PKBracket_FK Bracket_PK Income (10Ks) Marital Status Gender Age 1 60-90 Single Male <21 2 60-90 Single Male 21-35 3 60-90 Single Male 35-55 4 60-90 Single Male >55 5 60-90 Single Female <21 6 60-90 Single Female 21-35 Bracketing Dimensions

  25. Models for Hierarchical Data • Analytical activities using hierarchies are supported through different models: • Business • Multiple • Multiple time • Hierarchical data is stored in dimension tables. • Dimensions can contain one or more hierarchies.

  26. Identifying Analytical Hierarchies Business hierarchies describe organizational structure and logical parent-child relationships within the data. Storedimension Organizationhierarchy Region Store IDStore DescLocationSizeTypeDistrict IDDistrict DescRegion IDRegion Desc District Store

  27. Multiple Hierarchies Storedimension Store IDStore DescLocationSizeTypeDistrict ID District DescRegion IDRegion DescCity IDCity DescCounty IDCounty DescState IDState Desc Organizationhierarchy Geography hierarchy Region State City County Store District Store

  28. Multiple Time Hierarchies Fiscaltimehierarchy Calendartimehierarchy Fiscal year Calendar year Fiscal quarter Calendar quarter Fiscal month Calendar month Fiscal week Calendar week

  29. Drilling Up and Drilling Down Market Hierarchy Group Region 1 Region 2 District 1 District 2 District 3 District 4 Store 1 Store 2 Store 3 Store 4 Store 5 Store 6

  30. Drilling Across Market hierarchy Group Region Region District District City hierarchy Store Store City City Stores> 20,000 sq. ft.

  31. Documenting the Granularity of Dimensions • Is an important design consideration • Determines the level of detail • Is determined by business needs Low-level grain (Transaction-level data) High-level grain (Summary data)

  32. Fiscal Year Fiscal Quarter Fiscal Month Fiscal Week Day Defining Time Granularity Fiscal Time Hierarchy Current dimension grain Future dimension grain

  33. A Star Dimensional Model • Denormalized Model Order History Product History_FKCustomer_FKProduct_FKChannel_FK Item_nbrItem_descQuantityDiscnt_priceUnit-priceOrder_amt… Product_PK History_PK . . . . . . . . Customer Channel Customer_PK Channel_PK . . . . . . . .

  34. Star Dimensional Model Characteristics • The model is easy for users to understand. • Primary keys represent a dimension. • Nonforeign key columns are values. • Facts are usually highly normalized. • Dimensions are completely denormalized. • Fast response to queries is provided. • Performance is improved by reducing table joins. • End users can express complex queries. • Support is provided by many front-end tools.

  35. Channel_PK Web_PK Channel_desc Snowflake Model Order History Product History_FKCustomer_FKProduct_FKChannel_FK Item_nbrItem_descQuantityDiscnt_priceUnit-priceOrder_amt… History_PK Product_PK . . . . . . . . Customer Channel Customer_PK . . . . Web Web_PK Web_url

  36. Constellation Configuration Atomicfact

  37. Updating the Meta Data • Dimensions and attributes detail (Primary key, attribute definition, and so on) • Facts and measures detail (Measure description, additivity, and so on) • Data source definitions (Business owner, platform, description, and so on) • Source to target data mappings (Data type, length, target column description, and so on)

More Related