data warehouse design l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Data Warehouse Design PowerPoint Presentation
Download Presentation
Data Warehouse Design

Loading in 2 Seconds...

play fullscreen
1 / 15

Data Warehouse Design - PowerPoint PPT Presentation


  • 239 Views
  • Uploaded on

Data Warehouse Design. Alan Schneider. Overview. Operational Design Review Data Warehouse Dimensional Model Star Schema Multiple Star Schemas Fact Tables Granularity Dimension Tables Time Hierarchical Drilling Aggregates / Summaries. Operational Design Review.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Data Warehouse Design' - Faraday


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
data warehouse design

Data Warehouse Design

Alan Schneider

overview
Overview
  • Operational Design Review
  • Data Warehouse Dimensional Model
    • Star Schema
    • Multiple Star Schemas
  • Fact Tables
    • Granularity
  • Dimension Tables
    • Time
  • Hierarchical Drilling
  • Aggregates / Summaries
operational design review
Operational Design Review
  • Decisional queries require the summation of hundreds to tens of thousands of figures stored in perhaps as many rows in the database.
  • Such processing on a fully normalized data structure is slow and cumbersome.
    • For example: If a business manager requires a Product Sales per Customer report the program code must access the Customer, Account, Account Type, Order, Order Line Item, and Product tables to compute the totals.
dimensional modeling
Dimensional Modeling
  • Refers to a set of data modeling techniques that have gained popularity and acceptance for data warehouse implementations.
  • Two types of tables are used in dimensional modeling:
    • Fact tables
    • Dimensional tables.
dimensional modeling star schema
Dimensional Modeling Star Schema
  • Is a Fact Table Plus Its Related Dimension Tables
    • Visually, a dimensional schema looks very much like a star, hence the use of the term star schema to describe dimensional models.
    • Fact tables reside at the center of the schema, and their dimensions are typically drawn around it,
    • The dimensions are Client, Time, Product and Organization. The fields in these tables are used to describe the facts in the Sales Fact table.
multiple star schemas snow flake
Multiple Star Schemas(Snow Flake)
  • A data warehouse will most likely have multiple star schemas, i.e., many Fact tables. Each schema is designed to meet a specific set of information needs.
  • Multiple schemas, each focusing on a different aspect of the business, are natural in a dimensional warehouse.
  • Equally normal is the use of the same Dimension table in more than one schema. The classic example of this is the Time dimension.
    • For example, a retail company that has one star schema to track profitability per store may make use of the same Time dimension table in the star schema that tracks profitability by product.
fact tables
Fact Tables
  • Fact tables are used to record actual facts or measures in the business. Facts are the numeric data items that are of interest to the business.
  • Below are examples of facts for different industries:
    • Retail. Number of units sold, sales amount
    • Telecommunications. Length of call in minutes, average number of calls
    • Banking. Average daily balance, transaction amount
    • Insurance. Claims amounts
    • Airline. Ticket cost, baggage weight
  • Are Fully Normalized
fact table granularity
Fact TableGranularity
  • The term Granularity is used to indicate the level of detail stored in the fact table. The granularity of the Fact table follows naturally from the level of detail of its related dimensions.
    • For example, if each Time record represents a day, each Product record represents a product, and each Organization record represents one branch, then the grain of a sales Fact table with these dimensions would likely be: sales per product per day per branch.
  • Proper identification of the granularity of each schema is crucial to the usefulness and cost of the warehouse.
fact table granularity too high
Fact Table GranularityToo High
  • Severely limits the ability of users to obtain additional detail.
    • For example, if each time record represented an entire year, there will be one sales fact record for each year, and it would not be possible to obtain sales figures on a monthly or daily basis.
fact table granularity too low
Fact Table GranularityToo Low
  • Results in an exponential increase in the size requirements of the warehouse.
    • For example, if each time record represented an hour, there will be one sales fact record for each hour of the day
      • 8,760 sales fact records for a year with 365 days for each combination of Product, Client, and Organization
    • If daily sales facts are all that are required, the number of records in the database can be reduced dramatically.
dimension tables
Dimension Tables
  • Establish the context of the facts.
  • Dimensional tables store fields that describe the facts.
  • Below are examples of dimensions for the same industries:
    • Retail. Store name, store zip code, product name, product category, day of week
    • Telecommunications. Call origin, call destination
    • Banking. Customer name, account number, data, branch, account officer
    • Insurance. Policy type, insured party
    • Airline. Flight number, flight destination, airfare class
  • Dimensions Are De-normalized: Product Example:
time dimension
Time Dimension
  • The One of the goals of the data warehouse is to offload historical data from the operational systems.
  • Each fact in the data warehouse must therefore be time-stamped.
    • This requirement is met through the Time dimension, which is always present in any warehouse schema.
    • Each record in the Time dimension represents a meaningful chunk of time for the enterprise.
      • Time dimension depends entirely on the business requirements.
hierarchical drilling
Hierarchical Drilling
  • As a result of denormalization of the dimensions, each dimension will quite likely have hierarchies that imply the grouping and structure.
  • For example:
    • Time dimension has a Day-Month-Quarter-Year hierarchy. Similarly,
    • Store dimension may have a City-Country-Region-All Stores hierarchy.
    • Product dimension may have a Product-Product Category-Product Department-All Products hierarchy.
  • When warehouse users drill up and down for detail, they typically drill up and down these dimensional hierarchies to obtain more or less detail about the business.
aggregates summaries
Aggregates / Summaries
  • Aggregates or Summaries are one of the most powerful concepts in data warehousing.
  • The proper use of aggregates dramatically improves the performance of the data warehouse in terms of query response times, and therefore improves the overall performance and usability of the warehouse.
    • For queries that require only high-level or summarized data.
    • Users run queries against aggregated data versus detailed data--significantly smaller number of records.
design summary
Design Summary
  • Dimensional modeling is simple
    • Dimensional modeling techniques make it possible for warehouse designers to create database schemas that business users can easily grasp and comprehend.
  • Dimensional modeling promotes data quality
    • The Star schema allows the warehouse to enforce referential integrity checks. Since the fact record key is a concatenation of the keys of its related dimensions--adds a line of defense against corrupted warehouse data.
  • Performance optimization is possible through aggregates
    • As the size of the warehouse increases, performance optimization becomes a pressing concern--aggregates are one of the most manageable ways by which query performance can be optimized.