140 likes | 325 Views
Data Warehousing. DSCI 4103 Dr. Mennecke. Chapter 2. Steps in Designing a DW. Choose a business process model (BPM) Choose the grain of the business process Choose the dimensions Choose the measured facts that will populate each record of the fact table. Choose a Business Process Model.
E N D
Data Warehousing DSCI 4103 Dr. Mennecke Chapter 2
Steps in Designing a DW • Choose a business process model (BPM) • Choose the grain of the business process • Choose the dimensions • Choose the measured facts that will populate each record of the fact table
Choose a Business Process Model • A BPM is a view of the organization that considers the operational processes for which the operational system captures data. • Business processes are typically things like orders, shipments, inventory, sales, etc.
Choose the Grain of the Business Process • The grain is the fundamental level of data to be represented in the fact table for the business process that is being modeled • Grains can be is detailed as individual transactions or as broad as periodic summaries • Detailed grains are preferable to highly summarized grains because they offer more flexibility
Choose the Dimensions • Dimensions often fall out of the fields included in the fact table. However, selecting the right dimensions and dimensional fields is critical to the success of a DW • If a dimension requires the addition of more records to the fact table, then it violates the grain of the fact table and is suspect
Choose the Facts • Usually measured facts are numeric additive fields like quantity sold • Non-additive facts are fields such as unit price, ratios, and percentages
Points to Consider • Normalizing a facts table is important: • A fact table is the largest part of the DW (literally hundreds of millions of records), therefore as few fields as possible should be included in this table • Normalizing dimension tables is a waste of time: • Dimension tables are minuscule when compared to the fact table. • When dimension tables are normalized, it reduces the users ability to browse the data
Date Dimensions • A date dimension is important because DW are time variant. This means that they capture data at a moment in time. This implies that date must usually be part of a DW query.
Date fields • For example… • Date key • Date • Full date description • Day of week • Day of epoch • Week number of epoch • Month number of epoch • Day number in calendar month • Day number in calendar year • Day number in fiscal month • Day number in fiscal year • Selling season • Weekend indicator • Holiday indicator • Calendar quarter • Etc…
Degenerate dimensions • Fact table fields that are included in the table even though they are not linked to a dimension table • Occurs when the grain of the fact table corresponds to an individual transaction • For example, the POS transaction# is important to the transaction, but is described by the facts and figures represented elsewhere in the fact table (not in a dimension)
Drilling Up and Down • To drill down into a DW, more dimensions are added to the query • In other words, more fields are added to the query which results in more records being included in the dynaset • To drill up, dimensions are removed from the query • In other words, fewer fields are part of the query which results in fewer records being included in the dynaset
Normalizing Dimensions • Snowflaking is the process of normalizing denormalized dimensional tables. BrandDimension Product Dimension Package Type Dimension Storage Type Dimension
Normalization should be resisted • Presentation is more difficult with snowflake dimensions • Snowflake dimensions create difficulties for query optimizers • Disk space savings are insignificant • Snowflake dimensions inhibit a user’s ability to browse and query dimensions • Snowflake dimensions reduce index efficiency (DWs are highly indexed)
Surrogate keys • Every join between dimensions and the fact table in the data warehouse should be based on meaningless integer surrogates • Avoid using natural codes or keys • Keys should not be smart • Over time, smart keys and natural codes may change and loose their meaning