Chapter 2 data warehousing
This presentation is the property of its rightful owner.
Sponsored Links
1 / 38

Chapter 2: DATA WAREHOUSING PowerPoint PPT Presentation


  • 72 Views
  • Uploaded on
  • Presentation posted in: General

Chapter 2: DATA WAREHOUSING. FUNDAMENTALS of DATABASE SYSTEMS , Fifth Edition. Who are my customers and what products are they buying?. Which customers are most likely to go to the competition ? . Introduction. What product promotions have the biggest impact on revenue?.

Download Presentation

Chapter 2: DATA WAREHOUSING

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Chapter 2 data warehousing

Chapter 2:DATA WAREHOUSING

FUNDAMENTALS of DATABASE SYSTEMS, Fifth Edition


Introduction

Who are my customers and what products are they buying?

Which customers are most likely to go to the competition ?

Introduction

What product promotions

have the biggest impact on revenue?

What impact will new products/services

have on revenue and margins?

Fundamentals of Database Systems, Fifth Edition


Introduction cont

Introduction (cont.)

  • There is a great need for tools that provide decision makers with information to make decisions quickly and reliably based on historical data.

  • The above functionality is achieved by

    • data warehousing

      • it characterized by subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions.


Introduction cont1

Introduction (cont.)

  • online analytical processing (OLAP)

    • A term used to describe the analysis of complex data from the data warehouse.

  • and data mining.

    • The process of knowledge discovery

Fundamentals of Database Systems, Fifth Edition


Characteristics of data warehouses subject oriented

Characteristics of Data Warehouses- Subject oriented

  • Organized around major subjects, such as product, sales.

  • Focusing on the modelling and analysis of data for decision makers, not on daily operations or transaction processing.

  • Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision process.


Characteristics of data warehouses integrated

Characteristics of Data Warehouses- integrated

  • Constructed by integrating multiple, heterogeneous data sources.

  • Data cleaning and data integration techniques are applied.


Characteristics of data warehouses time variant

Characteristics of Data Warehouses- Time Variant

  • Data warehouse data : provide information from a historical perspective (e.g., past 5-10 years)

  • Every data in the data warehouse contains an element of time.


Characteristics of data warehouses non volatile

Characteristics of Data Warehouses- Non Volatile

  • Operational update of data doesn’t occur in the data warehouse environment.

    • Doesn't require transaction processing, recovery, and concurrency control mechanism.

    • Require only two operations in data accessing

      • Initial loading of data and quering.


Data warehouse vs operational databases

Data Warehouse vs. operational databases

9


Oltp vs olap

OLTP vs. OLAP


What is a data warehouse a practitioners viewpoint

What is a Data Warehouse?A Practitioners Viewpoint

  • β€œA data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”

    Barry Devlin, IBM Consultant


What is a data warehouse

What is a Data Warehouse?

Clean

Integrate

Transform

Load

Refresh

client

Data warehouse

Query and analysis tools

Data source in Chicago

Data source in New York

Data source in Taranto

client


3 d data cube

3-D data cube


Example of querying a cube

Example of Querying a Cube

1000

Q1

Q2

Time Dimension

Q3

Grapes

Cherries

Q4

Melons

Apples

ProductsDimension

Sales

Units

Ave

Units

Sales

Dollars

Net

Price

Measures Dimension


From table and spreadsheet to data cubes

From table and spreadsheet to data cubes

  • A data warehouse is based on a multidimensional data model which views data in the form of data cube.

  • A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.

    • Dimension tables contains descriptions about the subject of the business.

      • such as item (item_name, brand, type) or time (day, week, month, quarter, year)


From table and spreadsheet to data cubes cont

From table and spreadsheet to data cubes (cont.)

  • Fact table contain a factual or quantitative data

    • Fact table also contains measures (such as dollars_sold) and keys to each of the related dimension tables.


4 d data cube

4-D Data cube


Cube a lattice of cuboids

Cube: a lattice of cuboids

0-D (apex) cuboids

1-D cuboids

2-D cuboids

3-D cuboids

4-D (base) cuboids


Conceptual modeling of data warehouses

Conceptual Modeling of Data Warehouses

  • Modeling data warehouses: dimensions & measures

    • Star schema: a fact table in the middle connected to a set of dimension tables.

    • Snowflake schema: a refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension table, forming a shape similar to snowflake.


Conceptual modeling of data warehouses cont

Conceptual Modeling of Data Warehouses (cont.)

  • Fact constellations: multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation


Example of star schema

Example of Star Schema

Measures


Example of snowflake schema

Example of Snowflake Schema

Measures


Example of fact constellation

Example of Fact Constellation

Measures


Cube definition syntax in dmql

Cube definition syntax in DMQL


Defining star schema in dmql

Defining star schema in DMQL


Defining snowflake in dmql

Defining snowflake in DMQL


Defining fact constellation in dmql

Defining fact constellation in DMQL


Measure of data cube three categories

Measure of Data Cube: three categories

  • Distributive: if the result derived by applying the function to n aggregated values is the same as that derived by applying the function on all the data without portioning.

    • E.g., count(), min()


Measure of data cube three categories cont

Measure of Data Cube: three categories (cont.)

  • Algebraic: if it can be computed by an algebraic function with M arguments ( where M is abounded integer), each of which is obtained by applying a distributive aggregated function

    • E.g., avg()

  • Holistic: if there is no constant bound on the storage size needed to describe a sub aggregate

    • Mode(), rank()


Typical olap operations

Typical OLAP operations

  • Roll up ( drill-up) summarize data.

    • By climbing up hierarchy

  • Drill down ( roll down): reverse of roll-up

    • From higher level summary to lower level summary or detailed data.

  • Slice and dice: project and select


Typical olap operations cont

Typical OLAP operations (cont.)

  • Pivot ( rotate)

    • Reorient the cub, visualization, 3D to series of 2D planes

  • Other operations:

    • Drill across: involving ( across) more than one fact table

    • Drill through: through the bottom level of the cube to its back-end relational tables (using sql)


Design of data warehouse a business analysis framework

Design of Data Warehouse: A Business Analysis Framework

  • Four views regarding the design of data warehouse

    • Top down view: allow selection of the relevant information necessary for the data warehouse

    • Data source view: exposes the information being captured, stored, and managed by operational systems

    • Data warehouse view: consists of the fact table and dimension table


Design of data warehouse a business analysis framework cont

Design of Data Warehouse: A Business Analysis Framework (cont.)

  • Business query view: see perspectives of data in the warehouse from the view of end-user


Data warehouse design process

Data Warehouse Design Process

  • Top-down, bottom-up approaches or combination of both

    • Top-down: starts with overall design and planning

    • Bottom-up: starts with experiments and prototypes

  • From software engineering point of view

    • Waterfall: structure and systematic analysis at each step before proceeding to next.


Data warehouse design process cont

Data Warehouse Design Process (cont.)

  • Spiral : rapid generation of increasingly function systems, quick turn around.


Data warehouse design process cont1

Data Warehouse Design Process (cont.)

  • Typical data warehouse design process:

    • Choose a business process to model. E.g., orders, invoice, etc

    • Choose the grain (atomic level of data) of the business process

    • Choose the dimension that will apply to each fact table record

    • Choose measure that will populate each fact table record


Three data warehouse models

Three Data Warehouse Models

  • Enterprise warehouse

    • Collect all of the organization about subjects spanning the entire organization

  • Data Mart:

    • A subset of corporate- wide data that is of value to specific group of users.

  • Virtual warehouse

    • Set of views over operational databases


Data warehouse back end tools and utilities

Data Warehouse Back-End Tools and Utilities

  • Data extraction

  • Data cleaning

  • Data transformation

  • Load

  • refresh


  • Login