Advanced Dimensional Modelling. SQLBits 8, 9 th April 2011, Brighton. Vincent Rainardi firstname.lastname@example.org Blog: dwbi1.wordpress.com. Advanced Dimensional Modelling. 1. Dimensions - Structure SCD Type 6 1 or 2 Dimensions When To Snowflake A Dimension with Only 1 Attribute
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
SQLBits 8, 9th April 2011, Brighton
a) One Dimension
b) Two Dimensions
select c. attribute, sum(f.measure1) from fact1 finner join dim_account a on f.account_key = a.account_keyinner join dim_customer c on a.customer_key = c.customer_keygroup by c. attribute
d) Two Dimensions with inter-dimension link
Weakness: maintain customer key in 2 places: fact table and dim account.
a.k.a. “Star with a Back Door”
e) One Dimension with Customer Key
Try to fix weakness of a:
unable to build a fact table with grain = customer.
Add a column in dim account: customer key
Not as popular as c) and d) in solving Dim Customer issue. It is “indecisive” :
trying to create Dim Customer but doesn’t want to create Dim Customer.
Disadvantage: Dim Customer is hidden inside Dim Account, making it:
a) more difficult to maintain (especially for a type 2), and
b) less modular/flexible
1. When the sub dim is used by several dims
City-Country-Region columns exist in DimBroker, DimPolicy, DimOffice and DimInsured
Replaced by Location/GeoKey pointing to DimLocation / DimGeography
Advantage: consistent hierarchy, i.e. relationship between City, Country & Region.
Weakness: we would lose flexibility. City to Country are more or less fixed, but the grouping of countries might be different between dimensions.
2. When the sub dim is used by both the main dim and the fact table(s)
The alternative is maintaining two full dimensions (star classic).
3. To make “base dim” and “detail dim”
4. To enrich a date attribute
Insurance classes, account types (banking), product lines, diagnosis, treatment (health care)
Policies for marine, aviation & property classes have different attributes.
Pull common attributes into 1 dim: DimBasePolicy
Put class-specific attributes into DimMarine, DimProperty, DimAviation
Ref: Kimball DW Toolkit 2nd edition page 213
Month, Quarter, Year, etc.
Like #1, a sub dim used by several dims.
Reasons for putting single attribute in its own dim:
Should we put the attribute in the fact table? (like DD = Degenerate Dim)
Probably, if the grain = fact table,and it’s short or it’s a number.
Exception: snapshot month (or day/week/quarter)
Snapshot month is used in periodic snapshot fact table. Snapshot month is in the form of an integer (201104 for April 2011). Doesn’t violate the 3 points above.
A dim with grain = the transaction fact table
Transaction, not accumulative or periodic snapshot
Most granular event in any business process
Query PerformanceDD columns are moved to a dim, away from the heavy traffic in fact tables. DW queries don’t touch those DD columns unless they need to– performance. DD attributes totalling 30 bytes, replaced by 4 bytes int column. Slimmer fact table, better for queries.
Periodic Snapshot Fact TableFor periodic snapshot fact table, saving is even greater. Monthly snapshot fact, 10 years / 120 months. Rather than specifying the DDs repeatedly 120x, they are specified once in the transaction dim. All that is left on the fact table is a slim 1 intcol: the transaction key.
Some fact tables have grains greater than the transactionA payment from a customer is posted into 4 accounts in the GL fact table. That single financial transaction becomes 4 fact rows but only has 1 row in the trans dim. Fact table with 10m rows, trans dim only 3 million rows.
Related TransactionsSome transactions are related, e.g. in retail, a purchase of a kitchen might need to be created as 2 related orders, because the worktop is made-to-order. Rather than creating a ‘related order’ column on the fact tables, it might be better (depends on how it’s used) to create it on the trans dim because: a) an order can consist of many fact rows (1 row per item) so the “related order number” will be duplicated across these fact rowsb) slimmer fact tablec) the transaction could be on many fact tables, not only one.
Should we have a PK?
Yes, if we need to be able to identify each fact row
Some experts totally disagree
Header - Detail
FK (no RI)
Single or Multi Column?
Single Column: Generated Identity Multi Column: Dimension Keys
Single-column PK is better than multi-column PK because :
1) A multi-column PK may not be unique. A single-column PK guarantees that the PK is unique, because it is an identity column.
2) A single-column PK is slimmer than a multi-column PK, better query performance. To do a self join in the fact table (e.g. to link the current fact row to the previous fact row), we join on a single integer column.
Example of not having a PK
If duplicate fact rows are allowed.
e.g. retail DW: Store Key, Date Key, Product Key, Customer Key
Same customer buying the same milk in the same shop on the same day twice
--- Order Line ID as DD to make it unique (not all EPOS has it)
What are they?
Base Fact Tables
Loading & indexing:
(actual & budget)
Why is it a problem
What to do
Calculate internally or buy data from outside
What is a Multi Valued Attribute?
An attribute which has more than 1 value per dimension row.
MV Attribute or MV Dimension?
Why do I need to know this?
Approaches to deal with MV Attributes
1. Lower the grain of the dim
Fact table requires that the product dimension is at Product Code grain, e.g. no sales info per size, but only per product code.
2. Put the MV attributes in a separate dim, link direct to the fact table
Often we don’t have the allocation information e.g. 50-50 or 30-70, we only know that product1has 2 sizes
3. Use a bridge table to link the 2 dims
4. Have several columns in the dim for that attribute
If the number of attributes is small and fixed, this is a popular approach. But if the number of attributes is large (e.g. >10) or if it’s variable (e.g. sometimes 2, sometimes 20), approach 2 and 3 above are more popular, and more appropriate.
5. Put the attribute in a snowflake sub dim
We can’t really do this, as it is 1 to many (1 row in the main dim corresponds to many rows in the sub dim). So we need a bridge table, which brings us back to approach 3.
6. Keep in one column using delimiters
e.g. “Small|Medium″. A crazy idea. More flexible than having several columns (approach 4) and simpler than approach 3 or 2.
If the purpose of the attribute is “display only” on a report (rather than analyse or slice & dice), there is an argument for using this approach, particularly if the number of attributes is small (e.g. 1 to 4).
Dims as of
(up to last night)
-1 null in source
-2 not in dim table
-3 not in dim table as dim was frozen to be resolved next batch
Real time partition
GBP, USD, EUR,
( 1 reporting
FX Fact Table
in original currency, DW currency or both
Concept of FX Rate Type/Profile