populating data warehouse structures l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Populating Data Warehouse Structures PowerPoint Presentation
Download Presentation
Populating Data Warehouse Structures

Loading in 2 Seconds...

play fullscreen
1 / 25

Populating Data Warehouse Structures - PowerPoint PPT Presentation


  • 469 Views
  • Uploaded on

Populating Data Warehouse Structures. Examining the Star Schema. Sales Star Schema. Fact Table. Dimension Tables. Dimension Table. Implementing the Star Schema. 1. Extract Data From Multiple Sources 2. Integrate, Transform, and Restructure Data

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Populating Data Warehouse Structures' - adamdaniel


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
examining the star schema
Examining the Star Schema

Sales Star Schema

Fact Table

Dimension

Tables

Dimension Table

implementing the star schema
Implementing the Star Schema

1. Extract Data From Multiple Sources

2. Integrate, Transform, and Restructure Data

3. Load Data Into Dimension Tables and Fact Tables

the star schema data load

External

Files

Internal

Files

External

Files

The Star Schema Data Load

Heterogeneous

Data Sources

Polaris

Data Warehouse

Northwind

OLTP

Staging Area

Sales

Star

DTS

DTS

Financial

Inventory

Star

Extracting Data From Transforming Loading the Heterogeneous Sources Data Star Schema

DTS

DTS

verifying the dimension source data
Verifying the Dimension Source Data

Verifying Accuracy of Source Data

  • Integrating data from multiple sources
  • Applying business rules
  • Checking structural requirements

Correcting Invalid Data

  • Transforming data
  • Reassigning data values

Managing Invalid Data

  • Rejecting invalid data
  • Saving invalid data to a log
dimension data load examples

buyer_name

reg_id

Barr, Adam

2

Chai, Sean

4

Smith, Jane

2

Paper, Anne

4

buyer_code

buyer_code

buyer_first

buyer_last

buyer_last

buyer_last

reg_id

reg_id

reg_id

buyer_name

reg_id

U999

Adam

Barr

Barr

Barr

2

2

2

Barr, Adam

A123

2

A123

Sean

Chai

Chai

Chai

4

4

4

Chai, Sean

B456

4

Erin

B456

O’Melia

O’Melia

O’Melia

6

6

6

O’Melia, Erin

...

6

...

...

...

...

...

...

...

...

...

...

buyer_name

reg_id

Barr, Adam

II

Chai, Sean

IV

buyer_name

reg_id

Smith, Jane

2

Paper, Anne

4

Dimension Data Load Examples:

DTS

DTS

DTS

maintaining integrity of the dimension
Maintaining Integrity of the Dimension
  • Assigning a Surrogate Key to Each Record
    • Defines the dimension’s primary key
    • Relates to the foreign key fields of the fact table
  • Loading One Record Per Application Key
    • Maintains uniqueness in the dimension
    • Depends on how you manage changing dimension data
    • Maintains integrity of the fact table
managing changing dimension data
Managing Changing Dimension Data
  • Dimensions with Changing Column Values
    • Inserts of new data
    • Updates of existing data
  • Slowly-Changing Dimension Design Solutions
    • Type 1: Overwrite the dimension record
    • Type 2: Write another dimension record
    • Type 3: Add attributes to the dimension record
type 1 overwriting the dimension slide

Product Dimension

After

product key

product name

product size

product package

product dept

product cat

product subcat

...

001

Rice Puffs

10 oz.

Bag

Grocery

Dry Goods

Snacks

...

Before

12 oz.

001

Rice Puffs

12 Oz

Bag

Grocery

Dry Goods

Snacks

...

Existing record

is changed

Type 1: Overwriting the Dimension Slide
type 2 writing another dimension record

Product Dimension

Before

product key

product name

product size

product package

product dept

product cat

product subcat

effective_date

001

Rice Puffs

10 oz.

Bag

Grocery

Dry Goods

Snacks

05-01-1995

...

001

Rice Puffs

10 Oz

Bag

Grocery

Dry Goods

Snacks

05-01-1995

...

Type 2: Writing Another Dimension Record

After

Rice Puffs

12 Oz

Bag

Grocery

Dry Goods

Snacks

10-15-1998

...

731

12 oz.

10 oz.

Adds a new record

type 3 adding attributes in the dimension record

Before

After

Product Dimension

001

Rice Puffs

10 Oz

Bag

Grocery

Dry Goods

Snacks

05-01-1995

11 Oz

03-20-1994

(null)

(null)

...

001

Rice Puffs

12 oz.

Bag

Grocery

Dry Goods

Snacks

10-15-1998

10 oz.

05-01-1995

11 Oz

03-20-1994

...

product key

product name

product size

product package

product dept

product cat

product subcat

current product size date

previous product size

previous product size date

2nd previous product size

2nd previous product size date

...

12 oz

10 oz.

product size

10-15-1998

previous product size

previous product size date

11 oz.

03-20-1994

05-01-1995

11 oz.

03-20-1994

Additional information is storedin an existing record

Type 3: Adding Attributes in the Dimension Record
verifying the fact table source data
Verifying the Fact Table Source Data

Verifying Accuracy of Source Data

  • Integrating data from multiple sources
  • Applying business rules
  • Checking structural requirements

Correcting Invalid Data

  • Transforming data
  • Reassigning data values

Managing Invalid Data

  • Rejecting invalid data
  • Saving invalid data to a log
assigning foreign keys

time_dim

134 1/1/2000

Assigning Foreign Keys

DimensionTables

Source Data

customer_dim

customer id

product id

order date

quantity_sales

amount_sales

201 ALFI Alfreds

ALFI

123

123

1/1/2000

1/1/2000

400

400

10,789

10,789

product_dim

25 123 Chai

Sales Fact Data

cust_key

prod_key

time_key

quantity_sales

amount_sales

201

123

25

1/1/2000

134

400

400

10,789

defining measures

product_id

9GZ

1KJ

0ZA

customer_key

product_key

qty

...

100

512

32

238

207

48

437

338

9

customer_id

price

qty

total_sales

...

...

...

VINET

.55

32

17.60

ALFI

1.10

48

52.80

HANAR

.98

9

8.82

...

...

...

...

Defining Measures
  • Loading Measures from the Source System
  • Calculating Additional Measures

Source System Data

Fact Table Data

maintaining data integrity
Maintaining Data Integrity
  • Adhering to the Fact Table Grain
    • A fact table can only have one grain
    • You must load a fact table with data at the same level of detail as defined by the grain
  • Enforcing Column Constraints
    • NOT NULL constraints
    • FOREIGN KEY constraints
implementing staging tables
Implementing Staging Tables
  • Centralize and Integrate Source Data
  • Break Up Complex Data Transformations
  • Facilitate Error Recovery

market_stage

Staging Area

sales_stage

shipments_stage

inventory_stage

dts functionality
DTS Functionality
  • Accessing Heterogeneous Data Sources
  • Importing, Exporting, and Transforming Data
  • Creating Reusable Transformations and Functions
  • Automating Data Loads
  • Managing Metadata
  • Customizing and Extending Functionality
defining dts packages
Defining DTS Packages
  • Identifies Data Sources and Destinations
  • Defines Tasks or Actions
  • Implements Transformation Logic
  • Defines Order of Operations
identifying package components
Identifying Package Components
  • ConnectionsAccess Data Sources and Destinations
  • TasksDescribe Data Transformations or Functions
  • StepsDefine the Order of Task Operations or Workflow
  • Global Variables Store Data that Can Be Shared Across Tasks
creating packages
Creating Packages
  • Using the DTS Import / Export Wizard
    • Perform ad-hoc table and data transfers
    • Develop a prototype package
  • Using DTS Package Designer
    • Edit packages created with the DTS Import/Export Wizard
    • Create packages with a wide range of functionality
  • Programming DTS Applications
    • Directly access the functionality of the DTS Object Model
    • Requires Microsoft Visual Basic or Microsoft Visual C++
using dts to populate the sales star
Using DTS to Populate the Sales Star
  • Populating the Sales Star Dimensions
  • Populating the Sales Star Fact Table
populating the sales star dimensions
Populating the Sales Star Dimensions

product_dim

Product

Tab Delimited

Files

DTS

customer_dim

Northwind

OLTP

DTS

time_dim

SQL Server

Stored Procedure

DTS

populating the sales star fact table

product_dim

sales_stage

customer_dim

time_dim

Populating the Sales Star Fact Table

Sales Data

File

sales_stage

DTS

sales_fact

DTS

designing modular packages
Designing Modular Packages
  • Creating Modular Packages
    • Simplify complex workflows
    • Create more readable packages
    • Produce smaller packages that are easier to debug
  • Using Outer Packages
    • Execute multiple packages within a single package
    • Combine modular packages into logical workflows
    • Reuse modular packages in different workflows
    • Execute packages in parallel