1 / 21

SQL Server Disaster Recovery 101

SQL Server Disaster Recovery 101. Dean Vitner dvitner@gmail.com. SQL Server UG Zagreb, 16.4.2014 . Transaction log. Every data modification is first written to tlog Before the transaction commits Delayed durability in 2014! Data pages are updated in memory

mahola
Download Presentation

SQL Server Disaster Recovery 101

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SQL Server Disaster Recovery 101 Dean Vitner dvitner@gmail.com SQL Server UG Zagreb, 16.4.2014.

  2. Transaction log • Every data modification is first written to tlog • Before the transaction commits • Delayed durability in 2014! • Data pages are updated in memory • CHECKPOINT writes dirty pages to disk • Both committed and uncommitted • Frequency depends on recovery interval • When TX rolls back, state of the data prior to update is read from tlog • Space in transaction log is being freed and reused • When exactly this happens, depends on recovery model used

  3. Transaction log and recovery • If SQL Server service shuts down, recovery will happen on the next startup • These can happen: • Dirty pages from uncommitted tx not written to disk • Dirty pages from committed tx written to disk • Dirty pages from uncommitted tx written to disk • Have to be rolled back (undo) • Dirty pages from committed tx not written to disk • Have to be rolled forward (redo) • Without the tlog, transactional consistency of the database cannot be guaranteed

  4. Recovery models • Three recovery models in SQL Server • SIMPLE recovery model • FULL • BULK-LOGGED • Model database is in FULL recovery model • If you want newly created databases be in SIMPLE, change recovery model for model database

  5. SIMPLE recovery model Full and differential backups only Transaction log is not backed up Space in tlog is freed when the data is committed and checkpointed to disk Eliminates the need for managing tlog space Restore only to the most recent backup Consider not using SIMPLE model on production system

  6. FULL recovery model • Full, differential, and log backups • Log backups are required • Space in tlog is freed when the data is committed and checkpointed to disk and the log is backed up • Provides full recoverability • Up to the point of failure • Point in time restore

  7. BULK-LOGGED recovery model • Same as the FULL model, but: • Bulk operation are only minimally logged • In general, only allocation and deallocation of extents is logged • Increased performance and lower log space consumption • Use FULL, but switch to BULK-LOGGED for bulk operations • Bulk insert and bcp • SELECT INTO • ALTER INDEX … REBUILD • etc

  8. Backup and restore • Full, differential, and transaction log backup • Log backup in full and bulk-logged model • Backup strategy • How much data I am permitted to lose? • How much time I am permitted to be down? • Use differential backups to minimize the restore time • Check your backups • If you can’t restore from backup.. • Use WITH CHECKSUM option • Backup chain • Use COPY_ONLY

  9. Full Backups • Creates starting point for all backups • Contains all used pages in db • Can run concurrently to transactions • ~2% processor overhead • Completely non-blocking backup • Transaction log • Database growth/shrink

  10. Full Backups - Operation Transaction Log Mark log Mark log Data changes Checkpoint Mark log Dump all used pages Mark log Copy portion of log between marks

  11. 00010 01110 00100 10001 01100 Bitmap Differential Backups • Contains only pages changed since last full backup • Differential bitmap • NOT an incremental backup

  12. TransactionLog • Backup the inactive portion of log • Committed transaction • Clears backed up portion on completion • Only allowed in full recovery or bulk_logged mode

  13. Partial Backups • Partial database • Primary filegroup • Read/write filegroups • Partial differential • Primary filegroup differential • Read/write filegroups differential

  14. 2014 backup and restore • Backup on-premise database to cloud BACKUP DATABASE AdventureWorks2012 TO URL = 'https://mystorageaccount.blob.core.windows.net/mycontainer/AdventureWorks2012.bak' WITH CREDENTIAL = 'mycredential' • Supports: • DATABASE, LOG, DIFFERENTIAL, COPY_ONLY, PARTIAL, COMPRESSION, CHECKSUM • Restore from cloud backup • SQL Server Managed Backup to Windows Azure • smart_admin • Backup encryption

  15. I/O errors • Three types • 823 (a hard I/O error) • Page can’t be read from disk • 824 (a soft I/O error) • Page was read, but is torn (inconsistent) • 825 (a read-retry error) • Page was read, but after retries • Logged in msdb..suspect_pages • Input into single-page restore operations

  16. PAGE_VERIFY • Page is composed of 16 512-byte sectors • It is possible to be not written entirely • Allows fast detection of problems • Three options • NONE • TORN_PAGE_DETECTION • Writes two control bits in each sector • CHECKSUM • Calculates and writes a checksum of data on the page • Written last, checked first • Failures cause 824 error

  17. DBCC CHECKDB • Checks the logical and physical integrity of all the objects in the specified database • DBCC CHECKDB db_name WITH ALL_ERRORMSGS, NO_INFOMSGS • How often? It depends. • DBCC CHECKDB could take a long time to run • Depends on • Size of database • Load • Number and type of corruptions • Run DBCC CHECKDB on restored database

  18. DBCC CHECKDB, contd • Did CHECKDB fail? • If it stops before completing successfully, something bad has happened that is preventing CHECKDB from running • This means there is no choice but to restore from a backup as CHECKDB cannot be forced to run (and hence repair) • Examples of fatal (to CHECKDB) errors • 7984 – 7988: corruption in critical system tables • 8967: invalid states within CHECKDB itself • 8930: corrupt metadata in the database such that CHECKDB could not run

  19. Are the corruptions only in non-clustered indexes? • If recommended repair level is REPAIR_REBUILD, then YES! • Otherwise, check all the index IDs in the errors – if they’re all greater than 1, then YES! • If YES, you *could* just rebuild the corrupt indexes • Depends on the error, and the size of the index • But, what caused the corruption? • If you just rebuild the indexes, the corruption will probably happen again (especially if caused by the IO subsystem) • Make sure you do root-cause analysis and take preventative measures

  20. Was there an un-repairable error found? • 8909, 8938, 8939 (page header corruption) errors where the type is ‘PFS’ • 8970 error: invalid data for the column type • 8992 error: CHECKCATALOG (metadata mismatch) error • Plus a few more obscure ones • E.g. an 8904 error (extent is allocated to two objects). This is usually repairable except in the case where the extent is marked as mixed and dedicated, and has pages allocated to multiple objects. • The repair is too complicated and/or destructive so is not attempted. • None of these can be automatically repaired

  21. Restore or repair? REPAIR_ALLOW_DATA_LOSS is destructive Lot of errors cannot be fixed automatically Restore is better if you have a good backup and proper backup strategy

More Related