Chapter 2 data warehousing
1 / 38

Chapter 2: DATA WAREHOUSING - PowerPoint PPT Presentation

  • Uploaded on

Chapter 2: DATA WAREHOUSING. FUNDAMENTALS of DATABASE SYSTEMS , Fifth Edition. Who are my customers and what products are they buying?. Which customers are most likely to go to the competition ? . Introduction. What product promotions have the biggest impact on revenue?.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about ' Chapter 2: DATA WAREHOUSING' - calder

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Chapter 2 data warehousing




Who are my customers and what products are they buying?

Which customers are most likely to go to the competition ?


What product promotions

have the biggest impact on revenue?

What impact will new products/services

have on revenue and margins?

Fundamentals of Database Systems, Fifth Edition

Introduction cont
Introduction (cont.)

  • There is a great need for tools that provide decision makers with information to make decisions quickly and reliably based on historical data.

  • The above functionality is achieved by

    • data warehousing

      • it characterized by subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions.

Introduction cont1
Introduction (cont.)

  • online analytical processing (OLAP)

    • A term used to describe the analysis of complex data from the data warehouse.

  • and data mining.

    • The process of knowledge discovery

Fundamentals of Database Systems, Fifth Edition

Characteristics of data warehouses subject oriented
Characteristics of Data Warehouses- Subject oriented

  • Organized around major subjects, such as product, sales.

  • Focusing on the modelling and analysis of data for decision makers, not on daily operations or transaction processing.

  • Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision process.

Characteristics of data warehouses integrated
Characteristics of Data Warehouses- integrated

  • Constructed by integrating multiple, heterogeneous data sources.

  • Data cleaning and data integration techniques are applied.

Characteristics of data warehouses time variant
Characteristics of Data Warehouses- Time Variant

  • Data warehouse data : provide information from a historical perspective (e.g., past 5-10 years)

  • Every data in the data warehouse contains an element of time.

Characteristics of data warehouses non volatile
Characteristics of Data Warehouses- Non Volatile

  • Operational update of data doesn’t occur in the data warehouse environment.

    • Doesn't require transaction processing, recovery, and concurrency control mechanism.

    • Require only two operations in data accessing

      • Initial loading of data and quering.

What is a data warehouse a practitioners viewpoint
What is a Data Warehouse?A Practitioners Viewpoint

  • “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”

    Barry Devlin, IBM Consultant

What is a data warehouse
What is a Data Warehouse?







Data warehouse

Query and analysis tools

Data source in Chicago

Data source in New York

Data source in Taranto


Example of querying a cube
Example of Querying a Cube




Time Dimension
















Measures Dimension

From table and spreadsheet to data cubes
From table and spreadsheet to data cubes

  • A data warehouse is based on a multidimensional data model which views data in the form of data cube.

  • A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.

    • Dimension tables contains descriptions about the subject of the business.

      • such as item (item_name, brand, type) or time (day, week, month, quarter, year)

From table and spreadsheet to data cubes cont
From table and spreadsheet to data cubes (cont.)

  • Fact table contain a factual or quantitative data

    • Fact table also contains measures (such as dollars_sold) and keys to each of the related dimension tables.

Cube a lattice of cuboids
Cube: a lattice of cuboids

0-D (apex) cuboids

1-D cuboids

2-D cuboids

3-D cuboids

4-D (base) cuboids

Conceptual modeling of data warehouses
Conceptual Modeling of Data Warehouses

  • Modeling data warehouses: dimensions & measures

    • Star schema: a fact table in the middle connected to a set of dimension tables.

    • Snowflake schema: a refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension table, forming a shape similar to snowflake.

Conceptual modeling of data warehouses cont
Conceptual Modeling of Data Warehouses (cont.)

  • Fact constellations: multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation

Measure of data cube three categories
Measure of Data Cube: three categories

  • Distributive: if the result derived by applying the function to n aggregated values is the same as that derived by applying the function on all the data without portioning.

    • E.g., count(), min()

Measure of data cube three categories cont
Measure of Data Cube: three categories (cont.)

  • Algebraic: if it can be computed by an algebraic function with M arguments ( where M is abounded integer), each of which is obtained by applying a distributive aggregated function

    • E.g., avg()

  • Holistic: if there is no constant bound on the storage size needed to describe a sub aggregate

    • Mode(), rank()

Typical olap operations
Typical OLAP operations

  • Roll up ( drill-up) summarize data.

    • By climbing up hierarchy

  • Drill down ( roll down): reverse of roll-up

    • From higher level summary to lower level summary or detailed data.

  • Slice and dice: project and select

Typical olap operations cont
Typical OLAP operations (cont.)

  • Pivot ( rotate)

    • Reorient the cub, visualization, 3D to series of 2D planes

  • Other operations:

    • Drill across: involving ( across) more than one fact table

    • Drill through: through the bottom level of the cube to its back-end relational tables (using sql)

Design of data warehouse a business analysis framework
Design of Data Warehouse: A Business Analysis Framework

  • Four views regarding the design of data warehouse

    • Top down view: allow selection of the relevant information necessary for the data warehouse

    • Data source view: exposes the information being captured, stored, and managed by operational systems

    • Data warehouse view: consists of the fact table and dimension table

Design of data warehouse a business analysis framework cont
Design of Data Warehouse: A Business Analysis Framework (cont.)

  • Business query view: see perspectives of data in the warehouse from the view of end-user

Data warehouse design process
Data Warehouse Design Process (cont.)

  • Top-down, bottom-up approaches or combination of both

    • Top-down: starts with overall design and planning

    • Bottom-up: starts with experiments and prototypes

  • From software engineering point of view

    • Waterfall: structure and systematic analysis at each step before proceeding to next.

Data warehouse design process cont
Data Warehouse Design Process (cont.) (cont.)

  • Spiral : rapid generation of increasingly function systems, quick turn around.

Data warehouse design process cont1
Data Warehouse Design Process (cont.) (cont.)

  • Typical data warehouse design process:

    • Choose a business process to model. E.g., orders, invoice, etc

    • Choose the grain (atomic level of data) of the business process

    • Choose the dimension that will apply to each fact table record

    • Choose measure that will populate each fact table record

Three data warehouse models
Three Data Warehouse Models (cont.)

  • Enterprise warehouse

    • Collect all of the organization about subjects spanning the entire organization

  • Data Mart:

    • A subset of corporate- wide data that is of value to specific group of users.

  • Virtual warehouse

    • Set of views over operational databases

Data warehouse back end tools and utilities
Data Warehouse Back-End Tools and Utilities (cont.)

  • Data extraction

  • Data cleaning

  • Data transformation

  • Load

  • refresh