chapter 5 l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Chapter 5 PowerPoint Presentation
Download Presentation
Chapter 5

Loading in 2 Seconds...

play fullscreen
1 / 56

Chapter 5 - PowerPoint PPT Presentation


  • 191 Views
  • Uploaded on

Data Warehouse Fundamentals. Chapter 5. Analyzing and Defining Business Requirements for a Data Warehouse. Paul K Chen. 1. Chapter 5- Objectives. Learn what is the definition of the business requirements Understand the role of business dimensions related to DW business requirements

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Chapter 5' - beverly


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
chapter 5

Data Warehouse Fundamentals

Chapter 5

Analyzing and Defining Business Requirements for a Data Warehouse

Paul K Chen

1

chapter 5 objectives
Chapter 5- Objectives
  • Learn what is the definition of the business requirements
  • Understand the role of business dimensions related to DW business requirements
  • Learn specifically the steps in defining and recording DW business requirements
  • Review methods for gathering requirements (JAD, Interviews and Sampling)
  • Discuss briefly architecture concepts impacted by business requirements
definition of the business requirements
Definition of The Business Requirements
  • The definition of requirements is the user’s statement of how he or she wants to do business, and the information required to support his or her new methods of operations.
definition of the business requirements4
Definition of The Business Requirements

The requirements can be broadly divided into two areas:

1 Functional requirements—written in user terminology since it is user operations that are being described.

2 Non-functional requirements –these are the limitations and demands imposed upon the computing solutions; such as architectural plan, data storage specifications and information system performance expectations.

requirements as the driving force for data warehousing
Requirements As the Driving Force for Data Warehousing
  • Understand why business requirements are the driving force
  • Discuss how requirements drive every development phase
  • Specifically Learn how requirements influence data design
  • Review the impact of requirements on architecture
  • Note the special considerations for ETL and metadata
  • Examine how requirements shape information delivery
business requirements as the driving force
Business Requirements As the Driving Force

Business Requirements

Maintenance

Planning &

Management

Deployment

Design

Architecture Infrastructure

Data Acquisition

Data Storage

Information Delivery

Construction

Architecture Infrastructure

Data Acquisition

Data Storage

Information Delivery

dimensional nature of business data
Dimensional Nature of Business Data

The business data of sales units (fact) is measured and analyzed

in three dimensional.

Product

Geography

Time

examples of business facts and dimensions
Examples of Business Facts and Dimensions

Manufacturing Company

Supermarket Chain

Supermarket Chain

Cust-ship-to

Time

Time

Promotion

Ship from

Sale Unit

Shipment

Ship Mode

Store

Product

Deal

Product

Airline Company

Insurance Business

Customer

Time

Agent

Time

Flight

Claim

Frequent

Flyer Flights

Claims

Fare Class

Insured

Party

Status

Airport

Status

Policy

defining and recording information requirements for a data warehouse
Defining and Recording Information Requirements for a Data Warehouse
  • Nine-Step Methodology includes the following steps:

Step 1: Choosing the process

Step 2: Choosing the grain

Step 3: Identifying and conforming the dimensions

Step 4: Choosing the facts

Step 5: Storing pre-calculations in the fact table

Step 6: Rounding out the dimension tables

Step 7: Choosing the duration of the database

Step 8: Tracking slowly changing dimensions

Step 9: Deciding the query priorities and the query modes.

step 1 choosing the process subject area
Step 1: Choosing The Process (Subject Area)
  • The process (function) refers to the subject matter of a particular data mart.
  • First data mart built should be the one that is most likely to be delivered on time, within budget, and to answer the most commercially important business questions.
subject area
Subject Area
  • Selecting the first subject area or areas to be populated

Use the enterprise level data model in selecting appropriate subject area(s)

  • Three Options:

-- Implement a single subject area (best option)

-- Implement a subset of a subject area

-- Implement a subset of several subject areas (most common)

  • Determine how much data should be loaded and its variety
step 2 choosing the grain
Step 2: Choosing The Grain
  • Decide what a record of the fact table is to represent.
  • Identify dimensions of the fact table. The grain decision for the fact table also determines the grain of each dimension table.
  • Also include time as a core dimension, which is always present in star schemas. Due to disk space constraint, data selected must be time relevant in terms of trend, predictability, and profitability for the enterprise.
step 3 identifying and conforming the dimensions
Step 3: Identifying And Conforming The Dimensions
  • Dimensions set the context for asking questions about the facts in the fact table.
  • If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a mathematical subset of the other.
  • A dimension used in more than one data mart is referred to as being conformed (shared).
step 4 choosing the facts
Step 4: Choosing The Facts
  • The grain of the fact table determines which facts can be used in the data mart.
  • Facts should be numeric and additive.
  • Unusable facts include:
    • non-numeric facts
    • non-additive facts
    • fact at different granularity from other facts in table.
fact criteria
Fact Criteria

Weight the Fact attributes based upon the following

criteria:

  • They exhibit measurable results to the Users and Management.
  • They are visible within the business and through management.
  • They are manageable.
subject area16
Subject Area

Subject areas are collections of like data that support

analysis of the major subjects in a business. Election

criteria:

  • They consist of two or more attributes.
  • They are essential to the successful operation of the target system or business area to meet client objectives.
  • They can be defined by governing business rules.
step 5 storing pre calculations in the fact table
Step 5: Storing Pre-Calculations In The Fact Table
  • Once the facts have been selected each should be re-examined to determine whether there are opportunities to use pre-calculations.
add derived data
ADD DERIVED DATA
  • Benefits

Less space used

Enhanced performance

Breaking_lease

Percentage_of_breaking_lease (< 3_months)

Percentage_of_breaking_lease (>3 but < 6 months)

Percentage_of_breaking_lease (>6 but <9 months)

Percentage_of_breaking_lease (>9 but <12 months)

Percentage_of_breaking_lease (> 12 months)

add summarization schemes
Add Summarization Schemes
  • Simple summation
  • Summation by group
  • Aggregation
  • Vertical summarization
simple summation add summarization schema
Simple Summation --Add Summarization Schema

Individual Daily sales

Date Product Qty Sales $

Jan 1 nuts 100 300

Jan 1 nuts 200 600

Jan 2 nuts 300 900

Jan 2 nuts 100 300

Jan 3 Nuts 50 150

Jan 3 Nuts 40 120

Daily Sales Summary

Date Product Qty Sales $

Jan 1 Nuts 300 900

Jan 2 Nuts 400 1,200

Jan 3 Nuts 90 20

summation by group
Summation By Group

Group data attributes based on usage and

stability.

  • Group stable and slowly changing data all in one table
  • Group unstable and frequently changing data all in another table
aggregation
Aggregation

Aggregation is used to create data marts.

For instance, a group of users frequently perform analysis comparing sales across geographic regions, broken by product line. If a data mart were created that stores the sales data already aggregated to the desired level, the users’ queries would be simpler.

aggregation23
Aggregation

Add up amounts by day

in sql: SELECT date, sum (amt) FROM SALE

GROUP BY date

p

sale

Store

date

amt

  • ans

date

sum

p1

p2

p1

p2

1

1

2

1

1

2

4

3

c1

c2

c3

c1

6

4

1

2

Roll Up

Drill Down

vertical summarization
Vertical Summarization

Summarization building upon a single dimensional

theme:

Monthly renters

  • Total # of all renters
  • Total # of new renters
  • Total rental income

Monthly sales

  • Staff name
  • Total sales $
  • Total houses sold
step 6 rounding out the dimension tables
Step 6: Rounding Out The Dimension Tables
  • Text descriptions are added to the dimension tables.
  • Text descriptions should be as intuitive and understandable to the users as possible.
  • Usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.
step 7 choosing the duration of the database
Step 7: Choosing The Duration Of The Database
  • Duration measures how far back in time the fact table goes. For ex. Insurance &Tax Considerations.
  • Very large fact tables raise at least two very significant data warehouse design issues.
    • Often difficult to source increasing old data.
    • It is mandatory that the old versions of the important dimensions be used, not the most current versions, known as the ‘Slowly Changing Dimension’ problem.
step 8 tracking slowly changing dimensions
Step 8: Tracking Slowly Changing Dimensions
  • Slowly changing dimension problem means that the proper description of the old dimension data must be used with old fact data.
  • Often, a generalized key must be assigned to important dimensions in order to distinguish multiple snapshots of dimensions over a period of time.
step 9 deciding the query priorities and the query modes
Step 9: Deciding The Query Priorities And The Query Modes
  • Most critical physical design issues affecting the end-user’s perception includes:
    • physical sort order of the fact table on disk
    • presence of pre-stored summaries or aggregations.
  • Additional physical design issues include administration, backup, indexing performance, and security.
criteria for assessing the dimensionality of a data warehouse
Criteria For Assessing The Dimensionality Of A Data Warehouse
  • Criteria proposed by Ralph Kimball to measure the extent to which a system supports the dimensional view of data warehousing.
  • Twenty criteria divided into three broad groups: architecture, administration, and expression.
architectural criteria
Architectural Criteria

Architectural criteria describes way the entire system

is organized.

  • Explicit declaration
  • Conformed dimensions and facts
  • Dimensional integrity
  • Open aggregate navigation
  • Dimensional symmetry
  • Dimensional scalability
  • Sparsity tolerance
administration criteria
Administration Criteria

Administration criteria are considered to be essential

to the ‘smooth running’ of a dimensionally-oriented

data warehouse.

  • Graceful modification
  • Dimensional replication
  • Changed dimension notification
  • Surrogate key administration
  • International consistency
expression criteria
Expression Criteria

Expression criteria are mostly analytic capabilities

that are needed in real-life situations.

  • Multiple-dimension hierarchies
  • Ragged-dimension hierarchies
  • Multiple valued dimensions
  • Slowly changing dimensions
  • Roles of a dimension
  • Hot-swappable dimensions
  • On-the-fly fact range dimension
  • On-the-fly behaviour dimension
business requirements use automaker sales as an example
Business Requirements (Use Automaker Sales as an example)

In order to get an idea of the data to be used by the sales and

Inventory department, a facilitation session was held with

15 key end users and the IT data warehouse team. The

following business questions were generated from that

meeting:

  • What is the sales trend in quantity and dollar amounts sold each Make, Model, Series and Color for a specific dealer, for each
matching user requirements to dw data requirements develop fact table
Matching User Requirements to DW Data Requirements (Develop Fact Table)

Primary Key

  • dealer_id
  • month_year
  • sales_area_id
  • make
  • model
  • series
determine dimensions attributes
Determine Dimensions & Attributes

Dimensions

  • sales_area_dim
  • sales_time_dim
  • dealer_dim

Attributes

  • dealer_mms_sales_qty
  • dealer_mms_sales_dollar_amt
  • dealer_ytd_mms_sales_qty
  • dealer_ytd_mms_sales_amt
  • dealer_inventory_qty
collecting the business requirements via jad sessions
Collecting The Business Requirements via JAD Sessions

JAD (Joint Application Development) vs. Traditional Way of

Gathering Requirements

  • JAD sessions (also called facilitated session) are used to gather information and feedback and confirm the results of requirements gathering.
  • JAD sessions replace the traditional way of conducting a series of interviews on a one-to-one basis with the users.

Advantages: Achieving consensus during the session when

multiple sources of information exist, raising and

addressing issues or assigning them for resolution, and

immediately confirming information.

jad session
JAD Session
  • JAD sessions are used to scope the project. Each session should last two to three day. They are very focused and fast-paced.
  • JAD sessions can be very formal and follow strict guidelines or be informal group sessions.
jad roles
JAD - Roles

Whether they are formal or informal, there are four

necessary roles to be filled:

  • Facilitator

The Facilitator is the session leader. It is the facilitator’s

responsibility to ensure that the objectives of the sessions

are met.

  • Scribes(s)

Scribes are responsible for recording the minutes of the

session and optionally constructing deliverables using an

automated tool as the session progresses.

jad roles40
JAD - Roles
  • User

The users provide knowledge specific to the scope of the

project.

  • Developers

Developers are the team members who will be building

the system.

jad session41
JAD Session

The session is divided into three segments:

  • Introduction: Welcoming remarks; description of the facilities such as rest room locations, messages, reviewing the agenda and setting expectations.
  • Conducting the session: To confirm deliverables set out in the session objectives.
  • Wrapping up the session: By summarizing progress towards the objectives; reviewing the agenda for the next one and obtaining feedback from the participants.
jad session42
JAD Session

Potential drawbacks

  • The commitment of a large block of time for all participants
  • Requirements collected could be less than satisfactory due to unpredictability of the JAD session or organizational culture not sufficiently developed to enable the concerted efforts required to be productive in a JAD setting.
five steps in interview preparation
Five Steps in Interview Preparation
  • Reading background material
  • Establishing interview objectives
  • Deciding when to interview
  • Preparing the interviewee
  • Deciding on question type and structure
two types of questions open end questions vs closed questions
Two Types of Questions – Open-End Questions vs. Closed Questions

Open-ended interview questions

Open describes the interviewee’s options for responding.

They are open.

Advantages:

  • Putting the interviewee at ease
  • Allowing more spontaneity

Disadvantages:

  • Possibly losing control of the interview
  • May not get the types of answers you want
two types of questions open end questions vs closed questions45
Two Types of Questions – Open-End Questions vs. Closed Questions

Closed interview questions

Such as “ How many subordinates do you have?

Benefits:

  • Getting to relevant data
  • Keeping control over the interview

Drawbacks:

  • Failing to obtain rich detail
  • Intimidating the interviewee
three basic ways of structuring interviews
Three Basic Ways of Structuring Interviews
  • Pyramid Structure: Starting from closed questions, then gradually expand into open territory.
  • Funnel Structure: The reverse of pyramid structure approach.
  • Diamond-Shaped: A combination of the two above structures.
the needs for sampling
The Needs for Sampling
  • Containing costs
  • Speeding up the data gathering
  • Improving effectiveness
  • Reducing bias
sampling design
Sampling Design

Four steps:

  • Determine the data to be collected or described
  • Determine the population to be sampled
  • Choose the type of sample
  • Decide on the sample size
kinds of information sought in investigation
Kinds of Information Sought in Investigation

Type of hard data (other than interviewing and observation) - Quantitative Data

  • Reports for decision making
  • Performance reports
  • Records
  • Data capture forms
kinds of information sought in investigation50
Kinds of Information Sought in Investigation

Qualitative Data

  • Memos
  • Signs in bulletin boards and in work areas
  • Corporate websites
  • Manuals
  • Policy handbooks
the architectural plan non functional requirements
The Architectural Plan (Non-Functional Requirements)
  • Client / Server Architecture
  • Data Warehouse Parallel Database Technology
  • RAID Technology

Relevant Architecture Concepts Impacted by

Requirements

client server architecture
Client / Server Architecture

Machine Configuration Example

slide53
Client / Server architecture provides flexibility to support different combinations of host machine configurations.

3 Tier hosting configuration for Typical Queries (or any process that makes a high volume of DB calls) suffers in performance due to Network Messaging overhead.

slide54

2 Tier hosting configuration supports the performance requirements of Typical Queries by eliminating 99.8% of Network Messaging overhead.

data warehouse parallel database technologies
Data Warehouse Parallel Database Technologies
  • Shared memory architecture (SMP)
    • All the servers share all the data
  • Shared nothing architecture (MPP)
    • Each server has its own partition of data
raid technology
RAID Technology
  • RAID 0: Sector interleave, no error checking (no redundancy)
  • RAID 1: Mirroring (duplicate copy)
  • RAID 2: Bit interleave with error correction codes on multiple drives
  • RAID 3: Bit interleave with error correction on single drive
  • RAID 4: Sector interleave with dedicated parity drive
  • RAID 5: Sector interleave, parity stored on all drives