1 / 14

Data Warehousing

Data Warehousing. DSCI 4103 Dr. Mennecke. Chapter 2. Steps in Designing a DW. Choose a business process model (BPM) Choose the grain of the business process Choose the dimensions Choose the measured facts that will populate each record of the fact table. Choose a Business Process Model.

dewey
Download Presentation

Data Warehousing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehousing DSCI 4103 Dr. Mennecke Chapter 2

  2. Steps in Designing a DW • Choose a business process model (BPM) • Choose the grain of the business process • Choose the dimensions • Choose the measured facts that will populate each record of the fact table

  3. Choose a Business Process Model • A BPM is a view of the organization that considers the operational processes for which the operational system captures data. • Business processes are typically things like orders, shipments, inventory, sales, etc.

  4. Choose the Grain of the Business Process • The grain is the fundamental level of data to be represented in the fact table for the business process that is being modeled • Grains can be is detailed as individual transactions or as broad as periodic summaries • Detailed grains are preferable to highly summarized grains because they offer more flexibility

  5. Choose the Dimensions • Dimensions often fall out of the fields included in the fact table. However, selecting the right dimensions and dimensional fields is critical to the success of a DW • If a dimension requires the addition of more records to the fact table, then it violates the grain of the fact table and is suspect

  6. Choose the Facts • Usually measured facts are numeric additive fields like quantity sold • Non-additive facts are fields such as unit price, ratios, and percentages

  7. Points to Consider • Normalizing a facts table is important: • A fact table is the largest part of the DW (literally hundreds of millions of records), therefore as few fields as possible should be included in this table • Normalizing dimension tables is a waste of time: • Dimension tables are minuscule when compared to the fact table. • When dimension tables are normalized, it reduces the users ability to browse the data

  8. Date Dimensions • A date dimension is important because DW are time variant. This means that they capture data at a moment in time. This implies that date must usually be part of a DW query.

  9. Date fields • For example… • Date key • Date • Full date description • Day of week • Day of epoch • Week number of epoch • Month number of epoch • Day number in calendar month • Day number in calendar year • Day number in fiscal month • Day number in fiscal year • Selling season • Weekend indicator • Holiday indicator • Calendar quarter • Etc…

  10. Degenerate dimensions • Fact table fields that are included in the table even though they are not linked to a dimension table • Occurs when the grain of the fact table corresponds to an individual transaction • For example, the POS transaction# is important to the transaction, but is described by the facts and figures represented elsewhere in the fact table (not in a dimension)

  11. Drilling Up and Down • To drill down into a DW, more dimensions are added to the query • In other words, more fields are added to the query which results in more records being included in the dynaset • To drill up, dimensions are removed from the query • In other words, fewer fields are part of the query which results in fewer records being included in the dynaset

  12. Normalizing Dimensions • Snowflaking is the process of normalizing denormalized dimensional tables. BrandDimension Product Dimension Package Type Dimension Storage Type Dimension

  13. Normalization should be resisted • Presentation is more difficult with snowflake dimensions • Snowflake dimensions create difficulties for query optimizers • Disk space savings are insignificant • Snowflake dimensions inhibit a user’s ability to browse and query dimensions • Snowflake dimensions reduce index efficiency (DWs are highly indexed)

  14. Surrogate keys • Every join between dimensions and the fact table in the data warehouse should be based on meaningless integer surrogates • Avoid using natural codes or keys • Keys should not be smart • Over time, smart keys and natural codes may change and loose their meaning

More Related