240 likes | 597 Views
Transportation: Loading Warehouse Data. Chapter 12. Transporting data into the Warehouse. Loading moves the data into the warehouse Loading can be time-consuming: - Consider the load window. - Schedule the task; automates all processes.
E N D
Transportation: Loading Warehouse Data Chapter 12
Transporting data into the Warehouse • Loading moves the data into the warehouse • Loading can be time-consuming: - Consider the load window. - Schedule the task; automates all processes. • Initial load moves large volumes • Subsequent refresh moves smaller volumes • Business determines the cycle
Extract Processing Environment • After each time interval, build a new database • Run queries Operational database T1 T2 T3
Warehouse Processing Environment • Build a new database • After each time interval, add changes to database • Archive or pure oldest data • Run queries Operational database T1 T2 T3
First-Time Load • Single event that populates the data with historical data • Involves large volume of data • Employs distinct ETT tasks • Involves large amounts of processing after load Operational database T1 T2 T3
Refresh • Performed according to a business cycle • Simple task • Less data to load than first-time load • Less-complex ETT • Smaller amounts of postload processing Operational database T1 T2 T3
Building the Transportation Process Specification • Techniques and tools • File transfer methods • The load window • Time window for other tasks • First-time and refresh cycle • Connectivity bandwidth
Building the Transportation Process • Test the proposed techniques • Document proposed load • Gain agreement on the process • Monitor • Review • Revise
Granularity Low-level grain - Expensive, high level of processing, more disk, detail High-level grain - Cheaper, less processing, less disk, little detail • Important design and operational issue • Space requirements - Storage - Backup - Recovery - Load
Transportation Techniques • Tools • Utilities and 3GL • Gateways • Customized copy programs • Replication • FTP • Manual
Transportation Technique Considerations • Tools are comprehensive but costly. • Data-movement utilities are fast and powerful. • Gateways are not always the fastest method: - Access other databases - Supply dependent data marts - Support a distributed environment - Provide real-time access if needed
Using SQL*Loader to Load Data Control file Input files • Fastest load mechanism • Direct path • Parallel and unrecoverable • Direct-load INSERT (Oracle8) • Direct-path load API (Oracle8i) SQL*Loader Bad files Log files Discard files
Direct-Path Load API in Oracle8i • Allows ETT and other tools to load Oracle databases efficiently • Permits load behavior to be customized • Gives direct-path load performance • Provided complete access to all direct-load functionality using OCI
More Transportation Technique Considerations • Use customized programs as a last resort • Replication is limited by data-transfer rates
Postprocessing of Loaded Data Transform Extract Transport Create Indexes Generate keys Postprocessing of loaded data Summarize Filter
Indexing Data • Before load: fast index reenablement • During load: adds time to load window • After load: adds time to load window Index Operational databases Staging files Warehouse database
Unique Indexes • Disable constraints load • Enable constraints to create index Disable constraints Load data Enable constraints Create index Catch errors Reprocess
Creating Artificial Keys • Use generalized or derived keys • Maintain the uniqueness of a row • Use an administrative process to assign the key • Concatenate operational key with number: - Easy to maintain - Cumbersome keys - No clean value for retrieval
Creating Unique Keys for Records • Assign a number from a list: - No semantic meaning - Extract operations must reference table assign numbers Update metadata • Verdict 109908 1
Creating Summary Tables • CTAS • pCTAS Summary data Warehouse Data marts
Verifying Data Integrity • Load data into intermediate file • Compare target flash totals with totals before load Flash totals Counts and amounts File 1 Load Load File 1 File 2 File 2 Preserve, inspect, fix, then load Intermediate file Warehouse
Standard Quality Assurance Checks • Load status • Completion of the process • Completeness of the data • Data reconciliation • Violations • Reprocessing • Comparison of counts and amounts
Summary This lesson discussed the following topics: • First-time load considerations • Techniques for transporting data • Tasks involved in the postload processing stage