Intro. to Data Warehouse - PowerPoint PPT Presentation

Intro to data warehouse
1 / 49

  • Uploaded on
  • Presentation posted in: General

Intro. to Data Warehouse. รศ.ดร. วรพจน์ กรีสุระเดช Worapoj Kreesuradej , Ph.D. Ass ociate Professor Data Mining & Data Exploration Laboratory (DME Lab), Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang , Web:

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

Download Presentation

Intro. to Data Warehouse

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Intro to data warehouse

Intro. to Data Warehouse

รศ.ดร. วรพจน์ กรีสุระเดช

WorapojKreesuradej, Ph.D.

Associate Professor

Data Mining & Data Exploration Laboratory (DME Lab),

Faculty of Information Technology,

King Mongkut's Institute of Technology Ladkrabang,



Intro to data warehouse


  • PaulrajPonniah, Data Warehousing Fundamentals, John Wiley & Sons, 2001.

  • Ralph Kimbal and Margy Ross, The Data Warehouse Toolkit, John Wiley and Sons, 2002.

Definition of dw

Definition of DW

  • “A collection of integrated, subject-oriented databases designed to supply the information required for decision-making.”- W. Inmon

  • A decision support database that is maintained separately from the organization’s operational databases.

  • A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format –E. Turban and etc.

R kimball s definition of a dw

R. Kimball’s definition of a DW

  • A data warehouse is a copy of transactional data specifically structured for querying and analysis.

Problem data management in large enterprises

Problem: Data Management in Large Enterprises

  • Vertical fragmentation of informational systems

  • Result of application (user)-driven development of operational systems

Sales Planning


Num. Control

Stock Mngmt

Debt Mngmt





Sales Administration




Problem data management in large enterprises1




Problem: Data Management in Large Enterprises

  • Two Approaches for accessing data:

    • Query-Driven (Lazy)

    • Warehouse (Eager)

The need for dw

The Need for DW

  • Query-driven (lazy, on-demand)



Integration System

. . .




. . .




Disadvantages of query driven approach

Disadvantages of Query-Driven Approach

  • Delay in query processing

  • Inefficient and potentially expensive for frequent queries

  • Competes with local processing at sources

The warehousing approach

The Warehousing Approach

  • Information integrated in advance

  • Stored in wh for direct querying and analysis





Integration System

. . .







. . .




Advantages of warehousing approach

Advantages of Warehousing Approach

  • High query performance

  • Doesn’t interfere with local processing at sources

  • Information copied at warehouse

    • Can modify, annotate, summarize, restructure, etc.

    • Can store historical information

    • Security, no auditing

Characteristics of dw

Characteristics of DW

Subject oriented

Subject Oriented

  • Data Warehouse is designed around

    “subjects” rather than processes

  • A company may have

    • Retail Sales System

    • Outlet Sales System

    • Catalog Sales System

  • DW will have a Sales Subject Area

Subject oriented1

OLTP Systems

Catalog Sales


Retail Sales System

Outlet Sales


Sales Subject Area

Data Warehouse

Subject-Oriented Sales Information

Subject Oriented



  • Heterogeneous Source Systems

  • Need to Integrate source data

  • For Example: Product codes could be different in different systems

  • Arrive at common code in DW



  • Information integrated in advance

  • Stored in DW for direct querying and analysis





Integration System

. . .







. . .




Non volatile


  • Operational update of data does not occur in the data warehouse environment.

    • Does not require transaction processing, recovery, and concurrency control mechanisms

    • Requires only two operations in data accessing:

      • initial loading of data and access of data.

Non volatile read mostly









Time variant

Time Variant

  • The time horizon for the data warehouse is significantly longer than that of operational systems.

    • Operational database: current value data.

    • Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)

Time variant1

Time Variant

  • Most business analysis has a time component

  • Trend Analysis (historical data is required)


2001 2002 2003 2004

Data warehousing process overview

Data Warehousing Process Overview

Data warehousing process overview1

Data Warehousing Process Overview

  • The major components of a data warehousing process

    • Data sources

    • Data extraction

    • Data loading

    • Comprehensive Database/Data Store

    • Data Mart

    • Metadata

    • Middleware tools /information delivery tools

Intro to data warehouse


  • Data Extraction

  • Data Cleaning and Transformation

    • Convert from legacy/host format to warehouse format

  • Load

    • Sort, summarize, consolidate, compute views, check integrity, build indexes, partition

The etl process

The ETL Process





DW Database




Intro to data warehouse

Data Staging Area

  • A storage area where extracted data is cleaned, transformed and deduplicated.

  • Initial storage for data

  • Need not be based on Relational model

  • Mainly sorting and Sequential processing

  • Does not provide data access to users

  • Analogy – kitchen of a restaurant

Intro to data warehouse

ETL Process

Issues & Challenges

  • Consumes 70-80% of project time

  • Heterogeneous Source Systems

  • Little or no control over source systems

  • Source systems scattered

  • Different currencies, measurement units

  • Ensuring data quality

Comprehensive database data store

Comprehensive Database/Data Store

  • Mostly a relational DB

    • Oracle, DB2, Sybase, SQL Server

  • New DB design for special purpose of DW (e.g., scale up, speed up, parallel processing)

Data warehouse design

Data Warehouse Design

  • OLTP Systems are Data Capture Systems

  • “DATA IN” systems

  • DW are “DATA OUT” systems



Dimensional modeling

Dimensional Modeling

  • Facts are stored in FACT Tables

  • Dimensions are stored in DIMENSION tables

  • Dimension tables contains textual descriptors of business

  • Fact and dimension tables form a Star Schema

  • “BIG” fact table in center surrounded by “SMALL” dimension tables

Star schema

Star Schema

Intro to data warehouse

Star Schema

Data mart

Data mart

  • Data mart = subset of DW for community users, e.g. accounting department

  • Sometimes exist as Multidimensional Database

  • Info mart = summarized data + report for community users

Meta data

Meta Data

  • Data about data

  • Needed by both information technology personnel and users

  • IT personnel need to know data sources and targets; database, table and column names; refresh schedules; data usage measures; etc.

  • Users need to know entity/attribute definitions; reports/query tools available; report distribution information; help desk contact information, etc.

Information delivery tools

Information Delivery Tools

  • Tools

    • Query & reporting

    • OLAP

    • Data mining, visualization, segmentation, clustering

    • New developments: text mining, web mining & personalization

    • Mining multimedia data

Information delivery tools1

Information Delivery Tools

  • Commercial tools

    • Crystal Report, Impromptu, WebFocus

  • Increasingly common mode of delivery: Web-enabled

Data warehouse architecture

Data Warehouse Architecture

  • Data Flow Architecture

  • System Architecture

Data flow architecture

Data Flow Architecture

Data flow architecture1

Data Flow Architecture

Data flow architecture2

Data Flow Architecture

  • Operational data stores (ODS)

    A type of database often used as an interim area for a data warehouse, especially for customer information files

  • MDB=Multidimensional databases

System architectures

System Architectures

  • Three parts of the data warehouse

    • The data warehouse that contains the data and associated software

    • Data acquisition (back-end) software that extracts data from legacy systems and external sources, consolidates and summarizes them, and loads them into the data warehouse

    • Client (front-end) software that allows users to access and analyze data from the warehouse

System architectures1

System Architectures

System architectures2

System Architectures

System architecture

System Architecture

System architecture1

System Architecture

Data warehouse development

Data Warehouse Development

  • Data warehouse development approaches

    • Inmon Model: EDW approach, Enterprise-wide warehouse, top down

    • Kimball Model: Data mart approach, Data mart, bottom up

  • Which model is best?

    • There is no one-size-fits-all strategy to data warehousing

    • When properly executed, both result in an enterprise-wide data warehouse, but with different architectures

The data mart strategy

The Data Mart Strategy

  • The most common approach

  • Begins with a single mart and architected marts are added over time for more subject areas

  • Relatively inexpensive and easy to implement

  • Can be used as a proof of concept for data warehousing

  • Can perpetuate the “silos of information” problem

  • Can postpone difficult decisions and activities

  • Requires an overall integration plan

The enterprise wide strategy

The Enterprise-wide Strategy

  • A comprehensive warehouse is built initially

  • An initial dependent data mart is built using a subset of the data in the warehouse

  • Additional data marts are built using subsets of the data in the warehouse

  • Like all complex projects, it is expensive, time consuming, and prone to failure

  • When successful, it results in an integrated, scalable warehouse

Dw lifecycle ralph kimball

DW Lifecycle (Ralph Kimball )

Data warehouse development1

Data Warehouse Development

  • Some best practices for implementing a data warehouse (Weir, 2002):

    • Project must fit with corporate strategy and business objectives

    • There must be complete buy-in to the project by executives, managers, and users

    • It is important to manage user expectations about the completed project

    • The data warehouse must be built incrementally

    • Build in adaptability

Data warehouse development2

Data Warehouse Development

  • Some best practices for implementing a data warehouse (Weir, 2002):

    • The project must be managed by both IT and business professionals

    • Develop a business/supplier relationship

    • Only load data that have been cleansed and are of a quality understood by the organization

    • Do not overlook training requirements

    • Be politically aware

  • Login