designing a data warehouse l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Designing a Data Warehouse PowerPoint Presentation
Download Presentation
Designing a Data Warehouse

Loading in 2 Seconds...

play fullscreen
1 / 33

Designing a Data Warehouse - PowerPoint PPT Presentation


  • 178 Views
  • Uploaded on

Designing a Data Warehouse. Issues in DW design. Data Warehouse. A read-only database for decision analysis Subject Oriented Integrated Time variant Nonvolatile consisting of time stamped operational and external data. . Highly tuned Real time Data Detailed records Current values

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

Designing a Data Warehouse


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
designing a data warehouse

Designing a Data Warehouse

Issues in DW design

data warehouse
Data Warehouse

A read-only database for decision analysis

  • Subject Oriented
  • Integrated
  • Time variant
  • Nonvolatile

consisting of time stamped operational and external data.

data warehouse vs operational databases
Highly tuned

Real time Data

Detailed records

Current values

Accesses small amounts of data in a predictable manner

Flexible access

Consistent timing

Summarized as appropriate

Historical

Access large amounts of data in unexpected ways

Data Warehouse vsOperational Databases
data warehouse purpose
Data Warehouse Purpose
  • Identify problems in time to avoid them
  • Locate opportunities you might otherwise miss
data warehouse new approach
Data Warehouse:New Approach

An old idea with a new interest because of:

Cheap Computing Power

Special Purpose Hardware

New Data Structures

Intelligent Software

warehousing problems
Warehousing Problems

Business Issues

Data Quantity

Data Accuracy

Maintenance

Ownership

Cost

warehousing problems8
Warehousing Problems

Business Issues

Database Issues

DBMS Software

Technology

Complexity

warehousing problems9
Warehousing Problems

Business Issues

Data Issues

Analysis Issues

User Interface

Intelligent Processing

three approaches
Three Approaches
  • Classical Enterprise Database

Contains operational data from all areas of the organization.

  • Data Mart

Extracted and managerial support data designed for departmental or EUC applications

  • Data Package

Data required for a specific application

three fundamental processes
Three Fundamental Processes
  • Data Acquisition
  • Data Storage
  • Data a
  • Access
data acquisition
Data Acquisition
  • Handles acquisition of data from legacy systems and outside sources.
  • Data is identified, copied, formatted and prepared for loading into the warehouse.
acquisition steps
Acquisition steps
  • Catalog the data
    • Develop an inventory of where it is and what it means.
  • Clean and prepare the data.
    • Extract from legacy files and reformat to make it usable.
  • Transport data from one location to another.
storage
Storage

The storage component holds the data so that the many different data mining, executive information and decision support systems can make use of it effectively.

the storage area
The Storage Area

Managed by

  • Relational databases
    • like those from Oracle Corp. or Informix Software Inc.
  • Specialized hardware
    • symmetric multiprocessor (SMP)
    • or massively parallel processor (MPP) machines
storage20
Storage
  • The majority of warehouse storage today is being managed by relational databases running on Unix platforms.
  • Oracle, Sybase Inc., IBM Corp. and Informix control 65 percent of the warehouse storage market. Meta Group Inc. (1996)
access
Access
  • Different end-user PCs and workstations draw data from the warehouse with the help of multidimensional analysis products, neural networks, data discovery tools or analysis tools.
  • These powerful, "smart" software products are the real driving force behind the viability of data warehousing.
access tools
Access Tools
  • Intelligent Agents and Agencies
  • Query Facilities and Managed Query Environments
  • Statistical Analysis
  • Data Discovery.

(decision support, artificial intelligence and expert systems)

  • OLAP
  • Data Visualization
hardware budget
Hardware Budget
  • A typical startup warehouse project allocates more than 60 percent of its budget for hardware and software to the creation of a powerful storage component, spending just 30 percent on data mining and user access technologies.
systems analysis budget
Systems Analysis Budget

Budgeting for systems analysis and development, however, follows a very different pattern.

  • More than 50 percent of development dollars are spent on building acquisition capabilities,
  • 30 percent fund the development of user solutions and
  • 20 percent are dedicated to the creation of databases in the storage component.
design issues
Design Issues

Relational and Multidimensional Models

  • Denormalized and indexed relational models more flexible
  • Multidimensional models simpler to use and more efficient
star schemas in a rdbms
Star Schemas in a RDBMS

In most companies doing ROLAP, the DBAs have created countless indexes and summary tables in order to avoid I/O-intensive table scans against large fact tables. As the indexes and summary tables proliferate in order to optimize performance for the known queries and aggregations that the users perform, the build times and disk space needed to create them has grown enormously, often requiring more time than is allotted and more space than the original data!

building a data warehouse from a normalized database
Building a Data Warehouse from a Normalized Database

The steps

  • Develop a normalized entity-relationship business model of the data warehouse.
  • Translate this into a dimensional model. This step reflects the information and analytical characteristics of the data warehouse.
  • Translate this into the physical model. This reflects the changes necessary to reach the stated performance objectives.
the business model
The Business Model

Identify the data structure, attributes and constraints for the client’s data warehousing environment.

  • Stable
  • Optimized for update
  • Flexible
business model
Business Model

As always in life, there are some disadvantages to 3NF:

  • Performance can be truly awful. Most of the work that is performed on denormalizing a data model is an attempt to reach performance objectives.
  • The structure can be overwhelmingly complex. We may wind up creating many small relations which the user might think of as a single relation or group of data.
structural dimensions
Structural Dimensions
  • The first step is the development of the structural dimensions. This step corresponds very closely to what we normally do in a relational database.
  • The star architecture that we will develop here depends upon taking the central intersection entities as the fact tables and building the foreign key => primary key relations as dimensions.
other dimensions
Other Dimensions
  • Categorical dimensions: generated groups (additional key components)
  • Partitioning dimensions: subtypes (planned vs. actual)
  • Informational dimensions: generate different types of data (messy).