1 / 25

Data Warehouse

This guide provides an overview of data warehousing, including its definition, characteristics, motivations, architecture, and design approaches.

jbeulah
Download Presentation

Data Warehouse

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehouse IMS5024 – presented by Eder Tsang

  2. Data Warehouse • A data warehouse is a system consisting of processes and databases used to provide the “data infrastructure” for EIS and DSS • “… a subject-oriented, integrated, timevariant, and non-volatile collection of data in support of management’s decisions” Inmon and Hackathorn (1994)

  3. Data warehouse - subject oriented • The data warehouse is organised by “data subjects” that are relevant to the organisation. – Customer, claim, shipment, product • This may be contrasted with the process orientation of many OLTP systems

  4. Data warehouse - integrated • Data in the warehouse is structured based on a corporate-wide model, spanning the functional boundaries of legacy systems • This includes naming standards, units of measurement and periodicity

  5. Data warehouse - time variant • Data is the data warehouse is characterised by the time-series nature of historical data • The data consists of a series of “snapshots” which are time-stamped and record values at a moment in time • This supports trend analysis of the data

  6. Data warehouse - non volatile • The data warehouse is not continuously updated (inserts, eletes and changes) like data in an OLTP system • Data in a data warehouse is periodically up-loaded at a scheduled time intervals (say daily)

  7. Motivations for data warehousing • Demands on OLTP data bases for query processing would be too great • Data warehousing is designed for efficient retrieval • Data in legacy systems is frequently inconsistent, of poor quality and stored in different formats • Reduce costs in providing data for decision making

  8. Motivations for data warehousing • Support for focus on complete business processes (BPR) • Support for new initiatives – CRM, Balanced Scorecard • Industry sources quote ROI’s averaging 401% over 3 years • Remain competitive

  9. Typical Data Warehouse Architecture

  10. An Actual Data Warehouse

  11. Data warehouse development • Requirements identification • Logical design, data modelling • Data extract, transform and load (ETL) • Warehouse architecture , technology and tools • Physical database design • Delivery systems • Operational policies

  12. Designing a data warehouse – data design There are two main approaches to data modelling or data warehouse design – entity relationship modelling and normalisation – dimensional modelling

  13. The design of databases using a traditional E-R approach • Entities and relationships • Normalisation 3NF

  14. Entity relationship schema

  15. Why do we normalise data? • Normalisation is a process for converting complex data structures into simple, stable data structures • Normalisation protects integrity of database by avoiding anomalies (update, delete, create) • Normalised data models are: • robust and stable • have minimum redundancy

  16. Dimensional Modeling (star schema)

  17. Dimensional Modeling (star schema) Components of dimensional model: – Fact Tables : contain measurements of business eg. Sales, purchase order, shipment – Dimension Tables : store the descriptions of the dimensions of the business eg. Product, customer, vendor, store

  18. Dimensional Modeling (star schema) • Each dimension table has a single primary key that corresponds exactly to one of the components of the multipart key in the fact table. • A fact table always expresses a many to many relationship (the key is composed of foreign keys • The most useful facts in a fact table are numeric and additive ( typically values are added up)

  19. Snowflake schema • Snowflake schema –all the tables are normalised • Star schemas are preferable to snowflake – fewer joins for information retrieval

  20. Dimensional Modelling vs E-R modelling • the purpose of dimensional modelling structure data for easy and efficient analysis • E-R modelling creates a single required to support organisation’s Whereas • DM creates individual models for business/decision interest eg. • model for sales info • model for Inventory info

  21. Entity relationship schema (3NF)

  22. Corresponding Star schema

  23. Corresponding snowflake schema

  24. Dimensional Modelling vs E-R modelling (Con’t) • OLTP and DW have different purpose: – operational vs informational • Normalisation protects integrity of database by avoiding anomalies (update, delete, create) • Data models for data warehouse do not have to be normalised – In contrast, data in DW does not change often – periodic additions of new data

  25. DM vs. E-R modeling debate (Kimball’s view) • OLTP systems are volatile – high rates of update transactions • In normalised models the goal is to reduce data redundancy and prevent update anomalies • Data in a data warehouse does not need to be normalised because it is periodically refreshed not updated by user transactions

More Related