Class agenda 02 13 2014
Download
1 / 30

Class Agenda: 02/13/2014 - PowerPoint PPT Presentation


  • 113 Views
  • Uploaded on

Class Agenda: 02/13/2014. Review Goals of assignments. Technology: SQL Server, Tableau Internal Data Project Questions about assignments Discuss process of data warehouse design Discuss issues in data warehouse design Contrast different approaches to data warehouse design

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Class Agenda: 02/13/2014' - gale


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Class agenda 02 13 2014
Class Agenda: 02/13/2014

  • Review Goals of assignments.

    • Technology: SQL Server, Tableau

    • Internal Data Project

  • Questions about assignments

  • Discuss process of data warehouse design

  • Discuss issues in data warehouse design

  • Contrast different approaches to data warehouse design

  • Design a data warehouse


Goals for data warehouse design
Goals for data warehouse design

  • Make complete and accurate information easily accessible.

  • Present information consistently.

  • Be adaptive and flexible to change.

  • Provide reasonable and expected performance for information to support decision making.

  • Protect/secure information.


How do we achieve those goals
How do we achieve those goals?

  • Use systems analysis and design techniques.

  • Have domain knowledge of required decision support systems.

  • Model the data in a variety of different forms.

  • Appropriate use (or non-use) of normalization.

  • Use an appropriate DBMS for implementation.


Three different general data models
Three different “general” data models

  • Transaction (operational) data model: Contains current data required by separate and/or integrated operational systems. Supports the transactional processing of the organization. Is frequently used to support day-to-day decision making. 3rd normal form. Does not usually contain external data.

  • Reconciled (enterprise data warehouse) data model: Contains detailed, current data intended to be the single, authoritative source for all decision support applications. Usually in 3rd normal form. May contain data generated externally from the organization.

  • Derived (data mart) data model: Contains data that are selected, formatted and aggregated for end-user decision support applications. Star or snowflake schema. May not be normalized. May contain data generated externally from the organization.


Reconciled and derived data models

Reconciled (EDW)

Independent of specific decisions

Centralized control; usually owned by IT

Historical

Not usually summarized

Normalized

Flexible

Many data sources

Long life

Starts large, becomes larger

Derived (Data Mart)

Specific decisions

One central subject

Usually accessed directly by users; usually decentralized into user area

Closely defined subject area

Detailed and/or summarized

Usually denormalized

Restrictive – few sources

Short life span

Starts small, becomes large

Reconciled and Derived Data Models


Two general approaches to design

Enterprise Data Warehouse (Bill Inmon)

Focus is on enterprise subjects that will be needed to support comprehensive decision making.

Emphasis on creating design that is consistent among subject areas.

Implementation is of a data mart.

Uses ERD for modeling.

Relies on comprehensive blueprint for interrelation of data.

Interrelated Data Marts

(Ralph Kimball)

Focus is on business subject area for data warehouse.

Emphasis on creating simple design that can be implemented quickly.

Implementation is of a data mart.

Uses “dimensional model” for modeling. Kind of like an ERD with UML-type aspects.

Relies on consistent interrelation of data by integration of existing data models.

Two general approaches to design


Compare contrast approaches
Compare/Contrast Approaches

  • Similarities:

    • Both focus on subject areas for development of data model.

    • Both require extensive input from data warehouse stakeholders.

    • Both produce a subject-oriented, non-volatile, time-related data warehouse.

    • Both try to quickly implement a prototype data mart.

  • Differences:

    • Inmon creates a more integrated and consistent data warehouse by attempting to design an enterprise-wide warehouse at the beginning of the first data warehouse project. This is called a “reconciled” DW design.

    • Kimball relies on future project teams referencing existing data warehouse models for new projects.


What do both approaches yield
What do both approaches yield?

  • A design for a data mart.

  • The design for a data mart is based on the concept of a data warehouse “cube.”

  • A cube is a logical construct containing a “fact” table that is accessed on multiple “dimension” tables.

  • A fact table contains values that a manager uses to make decisions.

  • A dimension table is used as a reference for the values in the fact table.


Process of data warehouse design
Process of data warehouse design

  • Identify the stakeholders that need data to support their decisions.

  • Define and describe the data needs of those stakeholders.

  • Define the subject area.

  • Choose (EDW and data mart) or just data mart, or some combination thereof.

  • Select the data of interest. May be internal, external. May be purchased. May be stored in a transaction database – may not. May be generated just for the data warehouse.

  • Identify the dimensions (master data/strong entities).

  • Add element of time.

  • Determine granularity level.

  • Identify the fact data.

  • Add derived data if necessary or desired.


How do you identify those people within an organization who require data to support their decision making processes?


Define and describe the data needs
Define and describe the data needs require data to support their decision making processes?

  • Usually termed “stakeholder analysis”.

  • Differing levels of decision making require differing sets of data.

    • Internal vs. external data.

    • Integrated vs. non-integrated data.

    • Detailed vs. summarized data.

  • Different stakeholders require different access mechanisms.

    • Online vs. reports.

    • Pre-formatted vs. ad-hoc availability of data.

  • Different stakeholders require different timing.

    • Online, real time vs. delay.

    • Relative size of delay/timeliness is always an issue.


Stakeholder analysis table example replica toys
Stakeholder Analysis Table Example – Replica Toys require data to support their decision making processes?


Define the subject area
Define the subject area require data to support their decision making processes?

  • Potential subject areas in common to many businesses:

    • Customers: people and organizations who acquire and/or use the company’s products.

    • Equipment: Machinery, devices, tools and their components.

    • Facilities: Real estate and their components.

    • Sales: Transactions that move a product from company to a customer.

    • Suppliers: Entities that provide a company with goods and services.

    • Products: Goods and services that the company, or its competitors, provide to customers.

    • Materials: Goods and services that the company uses to produce its products.

    • Financials: Information about money that is received, retained, expended, invested or in any way tracked by the company.

    • Human resources: Individuals who perform work for the company – may be employees, contracts, or simply positions.


Select the data of interest
Select the data of interest require data to support their decision making processes?

  • Use the existing transaction database model.

  • Identify and understand the necessary business decisions.

  • Identify external data that could help support decisions.

  • Use tables to help sort available attributes.



Transform operational data to dw
Transform operational data to DW years?

  • Transient vs. Periodic Data

    • Transient: Data in which changes to existing records are written over previous records, thus destroying the previous data content. (Type 1 change)

      • Most transaction systems are based on transient data.

      • Most data warehouses avoid transient data.

    • Periodic: Data that are never physically altered or deleted once they have been added to the data store. (Type 2 change)

      • Most data warehouses are based on periodic data.


Data warehouse periodic data
Data warehouse Periodic Data years?

  • Fact vs. dimension

    • A “fact” is a numeric measure.

      • Replica example: A registration is a “fact” along with the price that was paid for the purchase that spawned the registration.

      • Facts are “weak entities”

      • Facts are usually transactions

    • A “dimension” is reference information that relates to the fact.

      • Replica examples: customer, product model, feature, place of purchase.

      • Dimensions are “strong entities”

      • Dimensions are also considered the “master data” of an organization


Dimensions are different in dw land
Dimensions are different in DW-land years?

  • Slowly changing dimension: Dimension will change values over time. How to maintain knowledge of the past

  • Approaches:

    • Type 1: just replace old data with new (lose historical data)

    • Type 2: for each changing attribute, create a current value field and several old-valued fields (multivalued)

    • Type 3: create a new dimension table row each time the dimension object changes, with all dimension characteristics at the time of change. Most common approach.


Other dimensional issues
Other dimensional issues years?

  • Degenerative dimension: A dimension that has no interesting dimension attributes (e.g. serial number)

  • Multi-valued dimension: A dimension that needs to be qualified by a set of values (e.g. feature)

    • May have a related hierarchy

    • Example: group-> category -> family -> product




Conformed dimensions for growth
Conformed Dimensions for growth years?

  • Conformed dimension: One or more dimension tables associated with two or more fact tables. Dimensions must have the same meaning for all related fact tables. Very hard to achieve without good planning.

  • Goal of any data warehouse is to plan the dimensions so that they span business processes/decision areas.

  • Enhances consistency of facts.

  • Allows integration of diverse systems.

  • Helps a designer to create data warehouse systems incrementally.



Time is a dimension
Time is a dimension years?

  • Data warehouse is a historical model rather than a current “point in time” model.

  • Must have a way to incorporate changes that occur over time.

  • Important issues:

    • Fact table must include a time component.

    • Ranges of time vs. effective period in time

    • Time also relates to dimension tables

    • May have to deal with differing time periods. Examples are fiscal years, “holiday rush,” billing cycle, etc.



Fact tables
Fact tables years?

  • Measures:

    • Sale

    • Flag

    • Quantity

  • Can have a “factless” fact table


Determine granularity level
Determine granularity level years?

  • What are the benefits and drawbacks of a low level of granularity?

  • What are the benefits and drawbacks of a high level of granularity?

  • What factors should be considered when determining the level of granularity in the data warehouse?


Might have to derive facts
Might have to “derive” facts years?

  • Derived data includes any kind of calculated field.

  • Usually derive facts when there will be an overwhelming amount of data if not derived.

  • Examples: total sales; net sales amount; total funds raised; total cost of products.

  • Issues:

    • Must be identified, defined and agreed upon by data warehouse stakeholders.

    • Must be documented in metadata.

    • Must be consistent.


ad