1 / 24

A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse

A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse. Vikas Ranjan CSCI 693. Agenda. Introduction Extract, Transform and Load (ETL) Approach Strengths/ Weaknesses of ETL Extract, Load and Transform (E-LT) Approach Strengths/ Weaknesses of E-LT

jerry
Download Presentation

A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

  2. Agenda • Introduction • Extract, Transform and Load (ETL) Approach • Strengths/ Weaknesses of ETL • Extract, Load and Transform (E-LT) Approach • Strengths/ Weaknesses of E-LT • Experiments • Results • Conclusion/ Future Work • References • Q & A

  3. Introduction • All Business Intelligence applications are data-centric. • Large volumes of data is stored and processed along with its history in the Data Warehouse. • The Data is extracted from various heterogeneous source systems and transformed as per the business requirements. • Data is used for analytical purposes- future forecasting, profitability analysis, trend analysis etc. to drive the business.

  4. Extract, Transform and Load (ETL) • Traditional approach of loading data into data warehouses. • Data is first pulled/ pushed from various heterogeneous sources like ERP, CRM, RDBMS, Flat Files. • Business rules are applied on the data in the Staging Area. • Transformed data is loaded into target database. • Often designed backward, thus only the relevant data is fetched.

  5. Strengths/ Weaknesses of ETL Strengths • Can perform complex operations in single data flow diagrams. • Mostly designed backward, thus only relevant data is loaded. • Used for building real-time data warehouses. • Robust tools are available like Informatica, Ab Initio, Data Stage . Weaknesses • Data transformation step of ELT is performed by ETL engine, therefore increasing processing time. • Data is moved over the network twice. • Since it is developed backward, more effort for future redesign.

  6. Extract, Load and Transform (E-LT) • Newer Approach of loading into target data warehouse. • The Data is extracted from various sources same way it is done via ETL. • This extracted data is loaded directly into target data warehouse. • The transformations and complex business rules are applied by native SQL drivers. • The processing is done by database engine rather than ETL engine.

  7. Strengths/ Weaknesses of E-LT Strengths • Since all the data is available, the future changes can be easily incorporated. • Once the data is loaded on the target platform, all transformations are placed on the RDBMS engine. This reduces network congestion. • Provides optimal performance as no extra hardware needed. Weaknesses • E-LT suffers from a limited availability of tools like Informatica Pushdown, Data Integrator etc. • Does not work well for complex business cases. • E-LT cannot be used to design near real-time enterprise data warehouse.

  8. Tools and Data Set Software / Hardware • OS Platform: Sun M9000 Server (www.sun.com). • Relational Database Management System: Teradata V2R6.2.1 (www.teradata.com). • ETL/E-LT Tool: Informatica PC 8.6.1 Hot Fix 9 Advanced Edition (www.informatica.com). Data Set • All the data used is experiments is obtained from the test database a telecom company. • All the experiments were conducted using Informatica as ETL and E-LT tool.

  9. Experiment 1 ETL vs. E-LT (Full Pushdown): Informatica has recently introduced both ETL and E-LT capabilities in its Power Center tool Job Name: Tax _Write_Off Job • Same Job was developed using ETL approach as well as E-LT approach. • Informatica pushed down all the code processing to the RDBMS engine.

  10. Experiment1 contd. ETL Job: Informatica Server handled all the code processing and generated its own internal SQL

  11. Experiment 1 contd. E-LT Job ( Full Pushdown): In the full pushdown, Informatica Server pushed all the code to RDBMS engine and Informatica Server worked purely as an E-LT tool

  12. Experiment 1 Results Approximately Five Times more performance gain using E-LT Full Pushdown

  13. Experiment 2 ETL vs. E-LT (Target Pushdown): • Job Name: Tax_Extract_ETL_tst Job • Same job was developed using ETL approach as well as E-LT approach with Target Pushdown only • The Pushdown to database engine happened at Target database only

  14. Experiment 2 contd. ETL Job: Informatica Server handled all the code processing and generated its own internal SQL

  15. Experiment 2 contd. E-LT Job (Target Pushdown):Informatica Server pushed code processing on target database to RDBMS engine

  16. Experiment 2 Results No Performance Differences since the Pushdown was on Target Database only

  17. Experiment 3 ETL vs. E-LT (Source Pushdown): • Job Name: Tax_Extract_ETL Job • Same job was developed using ETL approach as well as E-LT approach with Source Pushdown only • The Pushdown to database engine happened at Source database only

  18. Experiment 3 contd. ETL Job:Informatica Server handled all the code processing and generated its own internal SQL

  19. Experiment 3 contd. E-LT Job (Source Pushdown): Informatica Server pushed down code processing on source side to RDBMS engine

  20. Experiments 3 Results • No Performance Differences as Pushdown was on Source Database only • Source Pushdown E-LT does not work with Teradata Sequence Generator

  21. Results • Significant performance gains were obtained using full pushdown E-LT over ETL approach. • No performance gains using both source and target pushdown of E-LT approach and ETL approach as both used ETL Server and Database Engine resources( memory and CPU). • Code changes were needed to redesign existing ETL jobs to use E-LT power of database engine. • E-LT does not work in building Real Time Data warehouses

  22. Conclusion and Future Work • ETL works well for very complex transformations and active data warehouses • E-LT works well for small and medium-sized data marts and when the source and target are on the same database platform only • Future: Building data warehouse solutions using hybrid approach (combination of ETL and E-LT processes) .

  23. References [1] A. Simitsis, P. Vassiliadis, T. Sellis “Optimizing ETL Processes in Data Warehouses,” in Proc. 21st International Conference on Data Engineering, 2005, (ICDE 2005), pp. 564-575. [2] G.X. Zhou, Q.S. Xie, Y. Hu, “E-LT Integration to Heterogeneous Data Information for SMEs Networking based on E-HUB,” in Proc. Fourth International Conference on Natural Computation, 2008, IEEE, pp. 212-216. [3] I. William, S. Derek, and N. Genia, DW 2.0: The Architecture for the Next Generation of Data Warehousing. Burlington, MA: Morgan Kaufman, 2008, pp. 215-229. [4] R. J. Davenport, September 2007. [Online] ETL vs. ELT: A Subjective View. InSource IT Consulting Ltd., U.K. Available at: http://www.insource.co.uk/pdf/ETL_ELT.pdf [5] L. Troy, C. Pydimukkala, How to Use PowerCenter with Teradata to Load andUnload Data, Informatica Corporation [Online], Available at: www.myinformatica.com .

  24. Questions ???

More Related