The ABCs of SSIS!. Glenda Gable Email: firstname.lastname@example.org LinkedIn: linkedin.com/in/tig313. Agenda.
This presentation shows the basics of SSIS to help with automating database tasks, such as maintenance, importing and exporting data, or ETL transactions. The discussion will start with an understanding of when SSIS should be used vs. when a database object, such as a view or stored procedure, should be used. You will also see how to create a basic package, how to use the built-in logging and configurations, and a view of a decent sized ETL used for creating a data warehouse for BI. Lastly, we will talk about the different ways SSIS is used, based on the role of the person using it and the importance of organizing the overall SSIS structure.
Microsoft Integration Services is a platform for building high performance data integration solutions, including extraction, transformation, and load (ETL) packages for data warehousing. Integration Services includes graphical tools and wizards for building and debugging packages; tasks for performing workflow functions such as FTP operations, executing SQL statements, and sending e-mail messages; data sources and destinations for extracting and loading data; transformations for cleaning, aggregating, merging, and copying data; a management service, the Integration Services service for administering package execution and storage; and application programming interfaces (APIs) for programming the Integration Services object model.
Origins and How to Develop in SSIS:
For SQL Server 2000, SSIS didn’t exist, instead, Data Transformation Services (DTS) was used. SSIS was introduced in SQL Server 2005. It isn’t just a new version of DTS, it was actually built from the ground up as a separate application. For SQL 2005, 2008 and 2008 R2, all development of SSIS packages happens in Business Intelligence Development Studio (BIDS), from within Visual Studio. For SQL 2012, SSIS development occurs in SQL Server Data Tools (SSDT).
SSIS uses server memory to do manipulation, whereas T-SQL uses the SQL engine. Some things are done much quicker and easier in T-SQL. For example, a JOIN statement in T-SQL is quicker than a lookup transformation in SSIS. However, there are ways of using T-SQL within SSIS to leverage both, but the performance should be compared to see what works best in your environment.
Hardware; SQL Server version
What does your hardware look like - can it support memory intensive transactions without impacting performance, or did your company spend more on disks? Also, which version are you using, for example SQL 2000 – SSIS doesn’t exist (psst... perfect excuse to upgrade if you can!).
Capabilities/Features ; Data Sources/Destinations
There are things that are much easier to do in either T-SQL or SSIS. Each has its own set of capabilities that should be leveraged. For example, importing files is much easier in SSIS. With this idea, knowing where the data is coming from and going into also helps decide which is a better option to use. With multiple data sources and types of sources, such as Oracle, XML, files, etc., SSIS is better equipped than T-SQL. Also, the complexity of the task can make an impact, T-SQL had advantages in this.
Just because SSIS has a graphical interface doesn’t mean it is always quicker to use for development, sometimes T-SQL is faster. For instance, if you have several tables to combine, it may take longer to drag and drop the sources, and configure the unions, than it would be to write the full select statement in T-SQL. Also, think about the likelihood that when an upgrade happens to SQL Server, things will need to be “fixed”. An example of this is when T-SQL code needs to be changed because of features being removed.
In SSIS, the C# script task allows soooo much more to be accomplished, both database and non-database related tasks, that it is a big consideration to make. T-SQL is limited to database tasks only. Another option within SSIS is built in logging features, which T-SQL doesn’t have.
Lastly, what is your environment going to be able to work with most comfortably?
Does your company have certain standards or best practices that will dictate which tool is to be used?
What do you feel comfortable receiving from a co-worker with no verbal instructions at all? This can be a scary thing to think about – how many developers/administrators comment or document as well as they should? It’s almost like documentation and testing are bad words in our industry.
Everyone has had something dropped in their laps that they didn’t know how to do (if you havent – you will – lol). If everyone is using something that is commonly known, it takes some of the pressure off when tasks get assigned with an URGENT deadline.
All in all… each situation should warrant a review of what is best to use. I tried to show some advantages of each, and possible thoughts to follow, without showing preferential treatment for one over the other.
The difference between the synchronous and asynchronous components are if the output requires a new buffer to be created, or if the existing buffer can be used.
Synchronous is faster, and MUCH preferred. However, most of the time a transformation is chosen due to its capabilities, not whether it is synchronous or not.
Slowly Changing Dimension
OLE DB Command
Data Mining Query
* Bolded components are used frequently
This is what I look like when I forget 1 small detail when working with SSIS!! I find I like having someone else to talk to, and I end up solving it myself half the time. Its ok to be frustrated
(at least that is what I keep telling myself)
This is an ETL that is what is used by my company. I started here in March, and have been making improvements here and there. I am in the planning stages for a major overhaul in organization of the packages and logging, then will be planning on optimizing the packages, starting with the low-hanging fruit first. This is not a display of best-practices, nor is it a scare tactic. Mostly, I just wanted to show how big an ETL process can be, for those not exposed to it before. Keep in mind, there are many ETLs that are MUCH bigger than this.
Todd McDermid - http://toddmcdermid.blogspot.com/2011/06/code-snippet-for-ssis-dataflow.html#!/2011/06/code-snippet-for-ssis-dataflow.html
James Serra - http://www.sqlservercentral.com/blogs/jamesserra/2011/08/29/when-to-use-t_2D00_sql-or-ssis-for-etl/
Vincent Rainardi - http://dwbi1.wordpress.com/2011/04/05/stored-procedure-vs-etl-tool/
Dan English - http://denglishbi.wordpress.com/2011/02/27/ssis-data-flow-vs-hand-coded-etl/
Martin Schoombee - http://did-i-say-that.com/2011/09/25/etl-for-data-warehouses-the-ssis-vs-t-sql-debate/