chapter 2 data warehousing
Skip this Video
Download Presentation

Loading in 2 Seconds...

play fullscreen
1 / 38

Chapter 2: DATA WAREHOUSING - PowerPoint PPT Presentation

  • Uploaded on

Chapter 2: DATA WAREHOUSING. FUNDAMENTALS of DATABASE SYSTEMS , Fifth Edition. Who are my customers and what products are they buying?. Which customers are most likely to go to the competition ? . Introduction. What product promotions have the biggest impact on revenue?.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Chapter 2: DATA WAREHOUSING' - calder

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
chapter 2 data warehousing



Who are my customers and what products are they buying?

Which customers are most likely to go to the competition ?


What product promotions

have the biggest impact on revenue?

What impact will new products/services

have on revenue and margins?

Fundamentals of Database Systems, Fifth Edition

introduction cont
Introduction (cont.)
  • There is a great need for tools that provide decision makers with information to make decisions quickly and reliably based on historical data.
  • The above functionality is achieved by
    • data warehousing
      • it characterized by subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions.
introduction cont1
Introduction (cont.)
  • online analytical processing (OLAP)
    • A term used to describe the analysis of complex data from the data warehouse.
  • and data mining.
    • The process of knowledge discovery

Fundamentals of Database Systems, Fifth Edition

characteristics of data warehouses subject oriented
Characteristics of Data Warehouses- Subject oriented
  • Organized around major subjects, such as product, sales.
  • Focusing on the modelling and analysis of data for decision makers, not on daily operations or transaction processing.
  • Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision process.
characteristics of data warehouses integrated
Characteristics of Data Warehouses- integrated
  • Constructed by integrating multiple, heterogeneous data sources.
  • Data cleaning and data integration techniques are applied.
characteristics of data warehouses time variant
Characteristics of Data Warehouses- Time Variant
  • Data warehouse data : provide information from a historical perspective (e.g., past 5-10 years)
  • Every data in the data warehouse contains an element of time.
characteristics of data warehouses non volatile
Characteristics of Data Warehouses- Non Volatile
  • Operational update of data doesn’t occur in the data warehouse environment.
    • Doesn't require transaction processing, recovery, and concurrency control mechanism.
    • Require only two operations in data accessing
      • Initial loading of data and quering.
what is a data warehouse a practitioners viewpoint
What is a Data Warehouse?A Practitioners Viewpoint
  • “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”

Barry Devlin, IBM Consultant

what is a data warehouse
What is a Data Warehouse?







Data warehouse

Query and analysis tools

Data source in Chicago

Data source in New York

Data source in Taranto


example of querying a cube
Example of Querying a Cube




Time Dimension
















Measures Dimension

from table and spreadsheet to data cubes
From table and spreadsheet to data cubes
  • A data warehouse is based on a multidimensional data model which views data in the form of data cube.
  • A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
    • Dimension tables contains descriptions about the subject of the business.
      • such as item (item_name, brand, type) or time (day, week, month, quarter, year)
from table and spreadsheet to data cubes cont
From table and spreadsheet to data cubes (cont.)
  • Fact table contain a factual or quantitative data
    • Fact table also contains measures (such as dollars_sold) and keys to each of the related dimension tables.
cube a lattice of cuboids
Cube: a lattice of cuboids

0-D (apex) cuboids

1-D cuboids

2-D cuboids

3-D cuboids

4-D (base) cuboids

conceptual modeling of data warehouses
Conceptual Modeling of Data Warehouses
  • Modeling data warehouses: dimensions & measures
    • Star schema: a fact table in the middle connected to a set of dimension tables.
    • Snowflake schema: a refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension table, forming a shape similar to snowflake.
conceptual modeling of data warehouses cont
Conceptual Modeling of Data Warehouses (cont.)
  • Fact constellations: multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation
measure of data cube three categories
Measure of Data Cube: three categories
  • Distributive: if the result derived by applying the function to n aggregated values is the same as that derived by applying the function on all the data without portioning.
    • E.g., count(), min()
measure of data cube three categories cont
Measure of Data Cube: three categories (cont.)
  • Algebraic: if it can be computed by an algebraic function with M arguments ( where M is abounded integer), each of which is obtained by applying a distributive aggregated function
    • E.g., avg()
  • Holistic: if there is no constant bound on the storage size needed to describe a sub aggregate
    • Mode(), rank()
typical olap operations
Typical OLAP operations
  • Roll up ( drill-up) summarize data.
    • By climbing up hierarchy
  • Drill down ( roll down): reverse of roll-up
    • From higher level summary to lower level summary or detailed data.
  • Slice and dice: project and select
typical olap operations cont
Typical OLAP operations (cont.)
  • Pivot ( rotate)
    • Reorient the cub, visualization, 3D to series of 2D planes
  • Other operations:
    • Drill across: involving ( across) more than one fact table
    • Drill through: through the bottom level of the cube to its back-end relational tables (using sql)
design of data warehouse a business analysis framework
Design of Data Warehouse: A Business Analysis Framework
  • Four views regarding the design of data warehouse
    • Top down view: allow selection of the relevant information necessary for the data warehouse
    • Data source view: exposes the information being captured, stored, and managed by operational systems
    • Data warehouse view: consists of the fact table and dimension table
design of data warehouse a business analysis framework cont
Design of Data Warehouse: A Business Analysis Framework (cont.)
  • Business query view: see perspectives of data in the warehouse from the view of end-user
data warehouse design process
Data Warehouse Design Process
  • Top-down, bottom-up approaches or combination of both
    • Top-down: starts with overall design and planning
    • Bottom-up: starts with experiments and prototypes
  • From software engineering point of view
    • Waterfall: structure and systematic analysis at each step before proceeding to next.
data warehouse design process cont
Data Warehouse Design Process (cont.)
  • Spiral : rapid generation of increasingly function systems, quick turn around.
data warehouse design process cont1
Data Warehouse Design Process (cont.)
  • Typical data warehouse design process:
    • Choose a business process to model. E.g., orders, invoice, etc
    • Choose the grain (atomic level of data) of the business process
    • Choose the dimension that will apply to each fact table record
    • Choose measure that will populate each fact table record
three data warehouse models
Three Data Warehouse Models
  • Enterprise warehouse
    • Collect all of the organization about subjects spanning the entire organization
  • Data Mart:
    • A subset of corporate- wide data that is of value to specific group of users.
  • Virtual warehouse
    • Set of views over operational databases
data warehouse back end tools and utilities
Data Warehouse Back-End Tools and Utilities
  • Data extraction
  • Data cleaning
  • Data transformation
  • Load
  • refresh