
What is ETL?


Presentation Transcript


  1. What is ETL? • ETL: Extract, Transform and Load • The term is used to describe data migration or data conversion processes • ETL may be part of a regularly repeated business process • The amount and complexity of data have grown dramatically over the past decade, so ETL processes have become more complex and demanding
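
A minimal sketch of the extract-transform-load pattern in plain JDBC. The connection URLs, the table names (LEGACY_CUSTOMER, CUSTOMER) and the trivial transformation rule are illustrative assumptions, not part of the original deck:

    import java.sql.*;

    public class MinimalEtl {
        public static void main(String[] args) throws SQLException {
            // Hypothetical connections; a real project would read these from configuration
            try (Connection src = DriverManager.getConnection("jdbc:h2:mem:source");
                 Connection tgt = DriverManager.getConnection("jdbc:h2:mem:target")) {

                // Extract: read rows from the legacy source table
                try (Statement st = src.createStatement();
                     ResultSet rs = st.executeQuery("SELECT CUST_ID, CUST_NAME FROM LEGACY_CUSTOMER");
                     // Load: insert transformed rows into the target table
                     PreparedStatement ins = tgt.prepareStatement(
                             "INSERT INTO CUSTOMER (ID, NAME) VALUES (?, ?)")) {
                    while (rs.next()) {
                        ins.setLong(1, rs.getLong("CUST_ID"));
                        // Transform: trivial example rule - trim and upper-case the name
                        ins.setString(2, rs.getString("CUST_NAME").trim().toUpperCase());
                        ins.executeUpdate();
                    }
                }
            }
        }
    }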

  2. Phases of ETL Project • Requirements • Analysis • Design • Proof of Concept • Development • Testing • Execution • Verification

  3. Requirements • Scope of the data migration – what data is required in the target system and how is it going to be used? • Execution requirements – timeframe, sequence, geographic location, repeatability, acceptable system downtime, etc. • Source data retention period, backup and restore requirements • Requirements should be written with this in mind: • Data is the company’s most valuable asset. • Consequences of corrupt data are usually very costly.

  4. Analysis • Understanding the source data • A Data Dictionary to be used for designing the ETL process has to be created • Mission-critical task • Frequently underestimated (in importance and time) • All available resources should be used to do the analysis properly: • Available system documentation, including the Data Model and Data Dictionary • People • Reverse engineering
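
When no usable documentation exists, a basic data dictionary can be reverse engineered from the source database itself. A sketch using the standard JDBC DatabaseMetaData API; the connection URL is a placeholder:

    import java.sql.*;

    public class DataDictionary {
        public static void main(String[] args) throws SQLException {
            // Placeholder URL; point this at the actual source database
            try (Connection con = DriverManager.getConnection("jdbc:h2:mem:source")) {
                DatabaseMetaData md = con.getMetaData();
                // List every table visible to the connection, then describe its columns
                try (ResultSet tables = md.getTables(null, null, "%", new String[] {"TABLE"})) {
                    while (tables.next()) {
                        String table = tables.getString("TABLE_NAME");
                        System.out.println("TABLE " + table);
                        try (ResultSet cols = md.getColumns(null, null, table, "%")) {
                            while (cols.next()) {
                                System.out.printf("  %-30s %-15s size=%-6d nullable=%s%n",
                                        cols.getString("COLUMN_NAME"),
                                        cols.getString("TYPE_NAME"),
                                        cols.getInt("COLUMN_SIZE"),
                                        cols.getString("IS_NULLABLE"));
                            }
                        }
                    }
                }
            }
        }
    }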

  5. Design • Choice of methodology • Choice of technology • Design the target database • Design the ETL process • Data Mapping Document • Maps source data to the target database • Specifies transformation rules • Specifies generated data (not taken from the source) • Design the ETL verification process • Ensures that all requirements are addressed
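
The Data Mapping Document is often kept as a spreadsheet, but the same information can be expressed directly in code so the ETL process and its documentation cannot drift apart. A hedged sketch of one way to model a mapping entry; the table, column and rule names are invented for illustration:

    import java.util.List;
    import java.util.function.Function;

    // One row of a Data Mapping Document: where a target column comes from and how it is derived
    record ColumnMapping(String sourceColumn,               // null when the value is generated, not migrated
                         String targetColumn,
                         Function<String, String> rule,     // transformation applied to the source value
                         String comment) {}

    class CustomerMapping {
        // Illustrative mappings for a hypothetical CUSTOMER target table
        static final List<ColumnMapping> MAPPINGS = List.of(
            new ColumnMapping("CUST_NAME", "NAME", String::trim, "copy, trimmed"),
            new ColumnMapping("CUST_STATUS", "STATUS",
                    s -> "A".equals(s) ? "ACTIVE" : "INACTIVE", "decode legacy status flag"),
            new ColumnMapping(null, "LOAD_DATE",
                    s -> java.time.LocalDate.now().toString(), "generated at load time, no source column")
        );
    }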

  6. Proof of Concept • Helps to determine or estimate: • Feasibility of the concept • Development time • Performance, capacity and execution time • Whether the requirements can be met • Helps the team gain knowledge about the technology • Code produced in this phase can usually be re-used during the development phase

  7. Development • Includes: • Code and processes produced as per the Design and Data Mapping Document • Data verification scripts or programs as per the Test Plan • Execution scripts as per the Execution Plan • Unit testing – performed and documented by developers • Typical challenges: • Inadequate requirements and design documents • Developers unfamiliar with the technology
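
Unit tests over individual transformation rules are cheap to write and catch mapping errors long before execution. A sketch using JUnit 5; the status-decoding rule is a made-up example, not one from the deck:

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    class StatusRuleTest {
        // Example transformation rule: decode a legacy one-letter status flag
        static String decodeStatus(String legacy) {
            return "A".equals(legacy) ? "ACTIVE" : "INACTIVE";
        }

        @Test
        void activeFlagIsDecoded() {
            assertEquals("ACTIVE", decodeStatus("A"));
        }

        @Test
        void anyOtherFlagFallsBackToInactive() {
            assertEquals("INACTIVE", decodeStatus("X"));
            assertEquals("INACTIVE", decodeStatus(null));
        }
    }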

  8. Testing • Ensures that requirements are met • Test Plan is highly recommended • Types of testing: • Functional, stress, load, integration, connectivity, regression • Challenges: • Automation and repeatability (testing and verification scripts) • Creation of the Test Data • Extracting small data sets from large data volumes • Confidential data may not be made available for testing
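
One way to cut a small, repeatable test data set out of a large source without vendor-specific SQL is to sample every N-th row while copying, masking confidential columns on the way. A plain-JDBC sketch with hypothetical table and column names:

    import java.sql.*;

    public class TestDataSampler {
        // Copy roughly 1 row in 1000 from the source, masking the confidential column
        public static void sample(Connection src, Connection testDb) throws SQLException {
            try (Statement st = src.createStatement();
                 ResultSet rs = st.executeQuery("SELECT CUST_ID, CUST_NAME, SSN FROM LEGACY_CUSTOMER");
                 PreparedStatement ins = testDb.prepareStatement(
                         "INSERT INTO LEGACY_CUSTOMER (CUST_ID, CUST_NAME, SSN) VALUES (?, ?, ?)")) {
                long row = 0;
                while (rs.next()) {
                    if (row++ % 1000 != 0) continue;      // keep every 1000th row
                    ins.setLong(1, rs.getLong("CUST_ID"));
                    ins.setString(2, rs.getString("CUST_NAME"));
                    ins.setString(3, "***-**-0000");      // mask confidential data
                    ins.executeUpdate();
                }
            }
        }
    }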

  9. Execution • The execution plan should include: • Sequence of tasks • Time of execution and expected duration • Checkpoints and success criteria • Back-out plan and continuation of business • Resources involved • For mission-critical systems, down time could be limited or even entirely unacceptable • Execution should be controlled and verified
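
The execution plan itself can be kept as an ordered list of steps with a checkpoint recorded after each one, so a failed run stops at a known point and can hand over to the back-out plan. A minimal sketch; the step names and checkpoint mechanism are illustrative assumptions:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class ExecutionPlan {
        public static void main(String[] args) {
            // Ordered steps; in a real plan each step would invoke a script or program
            Map<String, Runnable> steps = new LinkedHashMap<>();
            steps.put("extract customers", () -> System.out.println("extracting..."));
            steps.put("load customers",    () -> System.out.println("loading..."));
            steps.put("verify row counts", () -> System.out.println("verifying..."));

            for (Map.Entry<String, Runnable> step : steps.entrySet()) {
                long start = System.currentTimeMillis();
                try {
                    step.getValue().run();
                    // Checkpoint: record success and duration so the run can be audited
                    System.out.printf("CHECKPOINT ok: %s (%d ms)%n",
                            step.getKey(), System.currentTimeMillis() - start);
                } catch (RuntimeException e) {
                    // Success criteria not met: stop here and switch to the back-out plan
                    System.err.println("CHECKPOINT failed: " + step.getKey() + " - " + e.getMessage());
                    break;
                }
            }
        }
    }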

  10. Verification • Confirms that the data migration was successful • The verification approach is determined during the design phase • Various methodologies and technologies can be used • Automated verification is highly recommended (driven by the requirements)
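
The simplest automated verification compares row counts (and, where needed, column aggregates) between source and target for every migrated table. A plain-JDBC sketch; the helper and table list are assumptions for illustration:

    import java.sql.*;
    import java.util.List;

    public class MigrationVerifier {
        // Compare row counts for each migrated table and report any mismatch
        public static boolean verify(Connection src, Connection tgt, List<String> tables)
                throws SQLException {
            boolean ok = true;
            for (String table : tables) {
                long srcCount = count(src, table);
                long tgtCount = count(tgt, table);
                if (srcCount != tgtCount) {
                    System.err.printf("MISMATCH %s: source=%d target=%d%n", table, srcCount, tgtCount);
                    ok = false;
                }
            }
            return ok;
        }

        private static long count(Connection con, String table) throws SQLException {
            try (Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM " + table)) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }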

  11. Why ETL Projects Fail • Underestimating the complexity of the project • Overlooking or neglecting phases of the project • Wrong choice of technology • Common misconceptions such as: • Expensive ETL tools will solve all problems • No or very little programming will be required • We don't need plans, or we don't have time for them, because we know exactly what we need to do

  12. Problems with Mainstream ETL Tools • Licenses and consultants are very expensive to maintain • Significant time is required to learn them • They usually require dedicated hardware • They cannot take advantage of the database vendors' proprietary technologies that are optimized for the fastest data migration • Complex tasks very often require integration with other technologies • Very limited performance • Only a small amount of the provided functionality is actually required for an ETL project • Very limited applicability for data analysis • Huge discrepancy between marketing promises and actual performance

  13. Malloc Inc. • Over 15 years in the IT consulting business, Malloc Inc. developed a proprietary ETL methodology and technology • It consists of two major modules: • Database Analyzer • G-DAO Framework • Major advantages: • Inexpensive, easier to learn, and performs better than mainstream ETL software • Any Java developer can master it and start using it within several days • It is proven and it works

  14. Database Analyzer • Produces ETL data analysis reports in various formats • Major uses: • Analyzing and understanding the source data and database attributes • Creating data mapping and transformation documents • Creating a data dictionary • Suggests ways to improve database design • A valuable source of information for Business Analysts, Data Architects, Developers, and Database Administrators

  15. Database Analyzer Reports • Intuitive, descriptive and easy to read • In HTML format • Can be imported and edited in major document editors such as MS Word

  16. Graphical User Interface • User-friendly GUI • Can also run in batch mode for lengthy analyses (large data sources)

  17. G-DAO Framework • Java code generator • Eliminates most of the manual work of developing the code required for the ETL process • Uses the analysis performed by Database Analyzer to produce code optimized for a particular database • The code may be used for purposes other than ETL (any kind of database access and data manipulation) • Takes advantage of the almost unlimited world of Java libraries (no proprietary language or interfaces)

  18. G-DAO Framework Advantages • Data can flow directly from source to target (no need for intermediate storage in files) • XA transactions are supported for all major databases • Functionality is limited only by the limitations of the JDBC driver and the Java language • Easy to learn and implement • No dedicated hardware required • Provides a platform for any kind of business application that requires data access
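
The "no intermediate files" point means rows are streamed from a source ResultSet straight into batched inserts on the target connection. This is not the actual G-DAO code (none is shown in the deck); it is a plain-JDBC illustration of the pattern with hypothetical table names:

    import java.sql.*;

    public class DirectCopy {
        // Stream rows from source to target in batches, never touching the file system
        public static void copy(Connection src, Connection tgt) throws SQLException {
            tgt.setAutoCommit(false);
            try (Statement st = src.createStatement();
                 ResultSet rs = st.executeQuery("SELECT ID, NAME FROM SOURCE_TABLE");
                 PreparedStatement ins = tgt.prepareStatement(
                         "INSERT INTO TARGET_TABLE (ID, NAME) VALUES (?, ?)")) {
                int pending = 0;
                while (rs.next()) {
                    ins.setLong(1, rs.getLong("ID"));
                    ins.setString(2, rs.getString("NAME"));
                    ins.addBatch();
                    if (++pending % 1_000 == 0) {
                        ins.executeBatch();   // push a full batch to the target
                        tgt.commit();
                        pending = 0;
                    }
                }
                ins.executeBatch();           // flush the remainder
                tgt.commit();
            }
        }
    }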

  19. Malloc Inc • Incorporated in 1993 in Toronto, Canada • Provided IT Consulting to: • Oracle Corporation • General Electric • Citibank • Royal Bank • Bank of Montreal • The Prudential • Standard Life • and many more • Most recent implementations of Database Analyzer and G-DAO Framework • Citibank • Royal Bank of Canada • http://www.mallocinc.com
