1 / 17

Best Practices in Loading Large Datasets

SQL Server Sri Lanka User Group Meeting Oct 2013. Best Practices in Loading Large Datasets. Asanka Padmakumara ( BSc,MCTS ). Agenda. Definition of “Large Dataset” Database and table structures for large data sets. Loading data using T-SQL Loading data using SSIS Q & A. A Large Dataset….

zoie
Download Presentation

Best Practices in Loading Large Datasets

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SQL Server Sri Lanka User Group Meeting Oct 2013 Best Practices in Loading Large Datasets Asanka Padmakumara (BSc,MCTS)

  2. Agenda • Definition of “Large Dataset” • Database and table structures for large data sets. • Loading data using T-SQL • Loading data using SSIS • Q & A

  3. A Large Dataset…. • A Collection of data sets: • large • complex • difficult to process~ Wikipedia • Large data set to you is depend on your hardware configuration.

  4. A Large table…. • A large table is one that does not perform as desired or one where the maintenance costs have gone beyond pre-defined maintenance periods. • if one user’s activities significantly affect another or if maintenance operations affect other user’s abilities. In effect, this even limits availability. ~Microsoft White Paper (Partitioned Tables and Indexes)

  5. Database and table structures • Try to have multiple file group across multiple disks. • Can backup only the file group. • To get Performance • Data on a separate I/O path. • Index on a separate I/O path • Partition the tables and Indexes • Partition means Break large table into parts. • Easy to insert-Switch partition in to table • Easy to delete- Move partition from table • Can rebuild index only of the partition

  6. Database and table structures

  7. Database and table structures • Have Clustered index for the most using column. • Have non Clustered indexes for other most using columns. • Try to have correct data types. • Int, smallint makes a different when no of rows are high.

  8. Loading Data with T-SQL • TSQL Commands and Utilities • BCP • BULK INSERT • OPENROWSET

  9. Loading Data with T-SQL • BCP • bulk copy is an utility program: bcp.exe • copies data between Microsoft SQL Server and a data file in a user-specified format • Can generate format file for data. • performance is improved if the data being imported is sorted according to the clustered index on the table • IN – Insert to table, Out- Export to file • Syntax: bcp AdventureWorks2012.sales.CurrencyRate out F:\DemoData\Currency.dat –c –usa –SSLLAPTOP266\SQL2012

  10. Loading Data with T-SQL • Bulk Insert • Transact-SQL statement.faster than the BCP utility • Unlike BCP, Can’t export data to files. Only insert • you can specify up to 1024 fields only, If more than that use BCP. • Syntax : BULK INSERT AdventureWorks2012.Sales.SalesOrderDetail FROM 'f:\orders\lineitem.tbl’

  11. Loading Data with T-SQL • OPENROWSET • Alternative to accessing tables in a linked server and is a one-time • SELECT a.* FROM OPENROWSET('SQLNCLI', 'Server=Seattle1;Trusted_Connection=yes;','SELECT Name FROM Department AS a;) • Use BULK keyword for use OPENROWSETfor bulk loading • SELECT a.* FROM OPENROWSET( BULK 'c:\testvalues.txt',FORMATFILE = 'c:\test\values.fmt') AS a; • IGNORE_CONSTRAINTS, IGNORE_TRIGGERS

  12. Loading Data with T-SQL • Considerations…. • Disable Index • Alter index [IXYourIndex] ON YourTable DISABLE • Disable constrain Do not disable clustered index. Table become read-only. • If you disable it have to rebuild index. • Enable Index • ALTER INDEX [IXYourIndex] ON YourTable REBUILD

  13. Deleting Data with T-SQL • Truncate instead of delete.(if possible) • (if not) Move the Partition to new table, then truncate it.-New table should be in same file group • Delete batch by batch avoid growing log file • delete is a single auto-commit transaction • Disable Triggers • Simple recovery mode to minimize growing log

  14. Loading data using SSIS • Buffer size • Can increase buffer size of dataflow task • Default size=10MB, can set up to=100MB • Cachelookup • If same table is looked up more than once inside SSIS package, cache that lookup • Vender specific providers will give you more performance. • Ex: Oracle provider instead of Microsoft Oledb

  15. Loading data using SSIS • Use raw file instead on temp table when using staging table • Avoid using Slowly changing dimension control. • Use merge statement instead of SCD control • Use parallel flow inside dataflow task to multi threading • Minimize usage of blocking ,Partiallyblocking transformations • Ex: Aggregate, Sort, Merge, Merge Join, and Union All

  16. Demo • Demo # 1 • Delete data from partitioned table. • Demo # 2 • Import/Export data using BCP

  17. Q & A

More Related