1 / 23

Business Intelligence Fundamentals: Data Loading

Business Intelligence Fundamentals: Data Loading. Ola Ekdahl IT Mentors. Dimension Table Load Fact Table Load Working with SCD’s SSIS SCD Task Load Data using Staging Tables Top 10 Best Practices. Agenda. Encapsulates the data flow engine. Data Flow Task. Extract. Transform. Load.

Download Presentation

Business Intelligence Fundamentals: Data Loading

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors

  2. Data Loading Dimension Table Load Fact Table Load Working with SCD’s SSIS SCD Task Load Data using Staging Tables Top 10 Best Practices Agenda

  3. Data Loading Encapsulates the data flow engine Data Flow Task Extract Transform Load

  4. Dimension source Data Loading Populating Dimension Tables Transform Correlaterecords New record? Y N Type 1change? Y Update changedcolumn(s) N Type 2change? Y Expire existingrecord Insert newrecord

  5. Factsource Data Loading Populating Fact Tables Transform Lookupdimension key Repeat for each dimension key Lookup failed? Y Insert newdimension record N Insert newrecord

  6. Data Loading • The wizard-based configuration promotes rapid ETL development • Supports • Type 0 (Fixed Attribute) • Type 1 (Changing Attribute) • Type 2 (Historical Attribute) • Inferred member management • Automatically constructs the downstream data flow • Handles the majority of slowly changing dimension scenarios Slowly Changing Dimension Transformation

  7. Dimension Source Data Loading Populating Dimension Tables Transform Correlaterecords New record? Y N Type 1change? Y Update changedcolumn(s) N Type 2change? Y Expire existingrecord Insert newrecord

  8. Data Cleansing

  9. Data Loading Existing record is updated History is not preserved SCD Type 1 LastName update to Valdez-Smythe

  10. Data Loading Existing record is ‘expired’ and new record inserted History is preserved Most common form of Slowly Changing Dimension SCD Type 2 SalesTerritoryKey update to 10

  11. Data Loading Select the target dimension table Configure the relationship between the source data and the dimension table Configuring the SCD Transformation Step 1 The relationship is established with the business key stored in the dimension table

  12. Data Loading • Select the columns and their change type: • Fixed (Type 0) • Changing (Type 1) • Historical (Type 2) Configuring the SCD Transformation Step 2

  13. Data Loading Configure the behavior if Fixed attributes change Configure whether Changing attributes should update the current record or all matching records Configuring the SCD Transformation Step 3

  14. Data Loading • Configure how Historical attributes identify current and expired records: • Single Boolean column, or • Start and End date columns Configuring the SCD Transformation Step 4 It is best practice to store Start and End dates

  15. Data Loading • If inferred members are stored in the dimension table, define how they are identified: • When all columns with a change type are null, or • By a single Boolean column Configuring the SCD Transformation Step 5

  16. Data Loading The Slowly Changing Dimension Transformation Based on your configuration, the wizard completes the downstream data flow

  17. Data Cleansing

  18. Data Loading Create a staging table that matches the logical structure of the target table. Load the bulk data into this table. This loading can be much faster (compared to loading directly into the target table) because the staging table has no indexes or constraints on it. More importantly, while the new data is being loaded, the existing data is fully available for all transactions without any impact, because the data load is taking place on a separate staging table. Create constraints and indexes on the staging table that are equivalent to those that exist on the target table. While you create constraints and indexes on the staging table, the existing data is fully available for all transactions without any impact, because this is taking place on the staging table. Execute the ALTER TABLE … SWITCH statement to move the data from the staging table to an empty partition of the target table. This is a metadata-only operation, and is very fast (usually under a second) when there are no long transactions on the target table. This way, all the loading and indexing happens outside the main table, and then the data quickly moves into the main table. One important thing to note here is that the ALTER TABLE … SWITCH operation requires a schema modification (Sch-M) lock on the table. Long running transactions can block the switch operation from acquiring the Sch-M lock and make it wait Staging Tables

  19. Data Loading In data warehouse applications, it is very common to partition data on the date dimension. This helps in periodically archiving or deleting the old data as new data comes in, with minimal effect on performance and availability. Depending on data volume, database designers pick yearly, quarterly, monthly, weekly, daily or even hourly partitions. One way to partition a table is to create one partition for each incremental load. For example, if you load data on a daily basis, create one partition for each day. By using this partitioning approach, you can use the technique described earlier in this paper (load and index data in a staging table, and then use ALTER TABLE … SWITCH) very efficiently. One partition is added for every incremental load, and the number of partitions increases over time. Partitions

  20. Data Loading • Query processing improvements • Partition-aware seeks • Parallel queryplan strategies • Partition-aligned indexed views • Switched togetherwith the partition • Easy-to-switchpartitions Optimize PerformancePartitioning Detail Data Day level Indexed View Month, Year level P1 Agg(P1) P2 Agg(P2) P3 Agg(P3) Switch new partition New Partition Aggregates for New Partition

  21. Data Cleansing

  22. Data Loading SQL Customer Advisory Team www.sqlcat.com Working With Partitions http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/loading_bulk_data_partitioned_table.mspx Thank You

  23. More recordings available at:

More Related