1 / 44

Database Corruption

Database Corruption. Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software. Dealing With Database Corruption. Types Of Corruption. Corruption can be small or widespread User based corruption Internal system based corruption Physical

agnesgreen
Download Presentation

Database Corruption

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Database Corruption Be prepared, not scared. • Richard BanvilleFellow, OpenEdge DevelopmentProgress Software

  2. Dealing With Database Corruption

  3. Types Of Corruption • Corruption can be small or widespread • User based corruption • Internal system based corruption • Physical • Block level corruption • Hardware: Bad disk, memory, etc • Logical • Missing Data • Relational issues • Data access • Index issues

  4. Be Prepared • Modern Release (all facets of deployment) • Backups – perform regularly • Backup database AND application • Perform large backups with split mirrors • Run online backup with –Bp • TEST your backups with restore & access or hot stand-by • prorest – Validates data written successfully (not proper data written) • prorest –vf: Compares against original, but who wants to be down that long? • Use offsite storage • Run with AI enabled • Put AI files on separate disk/separate controller • AI management tool makes AI management easy prorest<db> -vp prorest<db> -vf

  5. Be Really Prepared • Keep hot standby • Continually roll forward AI files • OpenEdge Replication • Have a comprehensive recovery strategy • Audit changes • Plan for natural disasters • Plan for not so natural disasters • Document and test your recovery strategy • Educate at all levels of organization • Implement redundancy • Failover Clusters • Have a duplicate remote site

  6. Database Consistency Checking Seen these messages before? Index name incustomer for recid16991could not be deleted. Wrong key in index 10 for record 2010. Invalid size of an index entry.

  7. Database Consistency Checking Or how about these… Invalid RM block for area 10 rmdoins: pbk->free went negative dbkey4096 bkwrite: bktbldbk4096 not equal to bkbufdbk-1234 bkaddr called with negative blkaddr:  -1234

  8. Database Consistency Checking Stop shared memoryproblems before they happen • Memory overwrite protection • -MemCheck • Ensure block changes written to proper shm location Buffer 1 Buffer 2 Buffer 1 Buffer 2 Insert new key entry Oops! Miscalculation results in memory stomp of next block header.

  9. Database Consistency Checking Stop database corruption from becoming persistent • Physical block consistency checking • -DbCheck • Validates record and index blocks after each update operation • -AreaCheck “area name” • -Index Check “index name” • -TableCheck ‘table name” • Typically the result of a bug • Available for OLTP and roll forward

  10. Enabling Database Consistency Checking • Database startup parameter (-MemCheck, -DbCheck) • Managed via promon R&D, 4. Admin Functions • 8. Block level consistency check Current consistency check status: 1. -MemCheck: enabled 2. -DbCheck: enabled 3. –AreaCheck: disabled 4. -IndexCheck: disabled 5. -TableCheck: disabled Enter the option to enable/disable a consistency check:

  11. < 1% ~5% Database Consistency Checking Performance Impact • Memory checking: unnoticeable impact • Block level checking: still reasonable • On error, get .lg file to Progress Technical Support Current consistency check status: 1. -MemCheck: enabled 2. -DbCheck: enabled 3. –AreaCheck: disabled 4. -IndexCheck: disabled 5. -TableCheck: disabled Enter the option to enable/disable a consistency check:

  12. Identifying Problem Types and Reacting • There are many ways for data to get corrupted • Identifying corruption type • Key word association can help direct recovery effort • Understanding process can also help • Quickest way to recovery • Knowing the tools & which to use is key • Practice recovery efforts before needed • Let’s examine a few

  13. Index Issues • Index Messaging • Index <i> in <t> for recid <r> could not be deleted. (1422)  • Logical corruption: Missing entries or record not found • Index <i>, block <b>, element no. 1: bad compression size. (4423) • Physical corruption: Storage format of index is incorrect • How to proceed • Index • (ix, cx, ky) • Root block • B-tree • Key entry • Cursor

  14. Index Validation Tools proutil<db> -C idxcheck • Idxcheck online validation levels • Physical/Block corruption • Physical consistency • Logical/key entry corruption • Keys to records • Records to keys • Validate key order • Lock table option • New index rebuild may be faster!

  15. Index Validation & Repair Tools proutil<db> -C idxfix Index Fix Utility 1. Scan records for missing index entries.2. Scan indexes for invalid index entries.3. Both 1 and 2 above.4. Cross-reference check of multiple indexes for a table.5. Build indexes from existing indexes.6. Delete one record and it's index entries.7. Quit. Select one of the following: All (a/A) - Fix all the indexes Some (s/S) - Fix only some of the indexesBy Area (r/R) - Fix indexes in selected areasBy Schema (c/C) - Fix indexes by schema ownersBy Table (t/T) - Fix indexes in selected tablesBy Activation (v/V) - Fix selected active or inactive indexes Fix indexes on Scan. Is this correct? (y/n)

  16. Index Validation & Repair Tools proutil<db> -C idxfix • Online operation • Transactions are relatively small • Does not fix physical block corruption • One concurrent idxfix process per table Index Fix Utility 1. Scan records for missing index entries.2. Scan indexes for invalid index entries.3. Both 1 and 2 above.4. Cross-reference check of multiple indexes for a table.5. Build indexes from existing indexes.6. Delete one record and it's index entries.7. Quit.

  17. Using Index Fix: Record but no index entry • OLTP (.lg and screen): Index namein customerfor recid16691 could not be deleted. • 1. Scan records for missing index entries: Index 12 (customer, name): couldn't find key <RICHB> recid 16691. • Option #1: Add key entry to index • 1. Scan records for missing index entries. • Fix indexes on Scan. Yes • NOTE: 2. Scan indexes for invalid index entries. • Would NOT report an error! proutil<db> -C idxfix bbbb richb aaaa Field2 Field2 Field2 Field3 Field3 Field3 Field4 Field4 Field4 16689 16690 16691 11 12 10

  18. Using Index Fix: Record but no index entry • OLTP (.lg and screen): Index namein customerfor recid16691 could not be deleted. • 1. Scan records for missing index entries: Index 12 (customer, name): couldn't find key <RICHB> recid 16691. • Option #2: Delete record and its key entry in table’s other indexes • 6. Delete one record and it's index entries. • Type the recid to delete: 16691 • Type the area (number) for the recid(s): 8 • Look in the .st file to match area number and area name. proutil<db> -C idxfix richb bbbb aaaa Field2 Field2 Field2 Field3 Field3 Field3 Field4 Field4 Field4 16689 16690 Find first cust where recid(cust) = 16691. display cust 16691 11 12 10

  19. Using Index Fix: Record but no index entry • Often no runtime error reported. • 2. Scan indexes for invalid index entries: Index 12 (customer, name): found invalid key <RICHB> recid 16691. • Only option: remove invalid key entry • 2. Scan indexes for invalid index entries • Fix indexes on Scan. Yes • NOTE: 1. Scan records for missing index entries. • Would NOT report an error! proutil<db> -C idxfix 11 12 10

  20. Fixing Index Corruption (continued) • Missing key entries or record not found (logical corruption) • Index fix • Action based on record removal or index entry insert/delete • Index <i>, block <b>, element no. 1: bad compression size • Physical b-tree corruption • Must rebuild index to recover

  21. Index Repair Tools proutil<db> -C idxbuild • Offline utility • Performance improvements since 10.2b06 • Will repair: • Index block corruption (physical) • Orphan index blocks • Adds missing index entries • Assumes record data is correct • Flexible options (db, area, table, index) • Truncates existing BI file • Does not record idxbuild changes into BI file

  22. Index Build/Repair Tools proutil<db> -C idxactivate<i1> useindex<i2> • Builds and activates index • Online • One concurrent idxactivateprocess per table • Requires client schema re-cache • Transaction size based on “recs” parameter • Deactivate requires exclusive access • Repair logical and physical index corruption • Assumes valid record data • *** Static queries require recompile to consider new index

  23. Record Issues • Record Messaging • bffld: nxtfld: scan past last field. (16) Looking for field #5 but only 4 fields exist • Record continuation not found, fragment recid <r> area <a>. (10831) Pointer to next record fragment is invalid • How to proceed • Record • (rm, bf, rec) • recid • rowid • field Field1 Field2 Field3 Field4 Record Fragment 1

  24. Checking For Inconsistencies Online proutil<db> -C dbanalys |tabanalys • Reads record for statistics purposes • Physical Validation 5. Read or Validate Database Block(s) • Validation levels • 0: Block header info only • 1: Record header & record size • 2: Record overlap checking • Logical Validation w/schema 3. Record Validation 4. Record Version Validation dbtool<db>

  25. Record Repair Tools bffld: nxtfld: scan past last field. (16) • Online and multi-threaded 6. Record Fixup • Adds missing fields • Removes invalid “end-rec” indicator 6. Delete one record and it’s index entries dbtool<db> proutil<db> -C idxfix

  26. Record Repair Tools Record continuation not found, fragment recid <r> area <a>. 3. Remove Bad Record Fragment 14. Display Record Contents • Exclusive access • Truncate bi file Record Fragment 1 proutil<db> -C dbrpr

  27. More Record Repair Tools • Record continuation not found, fragment recid <r> area <a> Record Fragment 1 Warning: The use of dbrpr to fix problems in the database should be done with the assistance of Progress Technical Support.

  28. Dbrpr Record Fix-up Example – Last resort • Before you do anything: Validate current backup Options: proutil<db> -C dbrpr proutil<db> -C truncate bi • 1. Database Scan Menu • 2. Test One or More Indexes • 3. Remove Bad Record Fragment • 4. Dump Block • 5. Load Block • 6. Copy Bytes Between Files • 7. Load RM Dump File • 8. Reformat Block to a Free Block • 9. Change Current Working Area • 10. Display the Free Chain • 11. Display the RM Chain • 12. Display the Index Delete Chain • 13. Display Block Contents • 14. Display Record Contents • 15. Display Cluster Chain • 16. Scan/Fix block checksum

  29. Dbrpr Record Fix-up Example – Last resort Record continuation not found, fragment recid16896 area 8 3. • Before you do anything: Validate current backup • Validate bad record info 1. Database Scan Menu proutil<db> -C truncate bi proutil<db> -C dbrpr • 1. Report Bad Blocks • 3. Fix Bad Blocks • 4. Report Bad Records • 5. Delete Bad Records • 6. Dump Records to RM File • 7. Rebuild Free Chain • 8. Rebuild RM Chain • 9. Rebuild Index Delete Chain • 10. Change Current Working Area • 11. Fix Cluster Chains in Type II Area

  30. Dbrpr Record Fix-up Example – Last resort Record continuation not found, fragment recid16896 area 8 3. • Get a view of what you are going to delete: • 9. Change Current Working Area • 13. Display Block Contents • 1. Dump Data Block Details • 6. Start Dbkey • Delete partial record • 9. Change Current Working Area • 3. Remove Bad Record Fragment • Re-validate (see previous screen) proutil<db> -C truncate bi proutil<db> -C dbrpr

  31. Other Record Oriented Repair Tools proutil<db> -C dump <table> . -index <i> • Binary dump • Online & multi-threaded • Binary record format • May not fix individual record corruption • May fail when encountering physical corruption • Use selective binary dump to dump in ranges • -index defaults to primary index • Use different index if primary cannot be used • Use –index 0 if no valid index exists (Type II storage area)

  32. Other Record Repair Mechanisms • Dump records in “PUB” schema by rowid • Manual Ascii dump and load “repair” • Reload w/bulk load or ABL import • Specify index to use or TABLE-SCAN DEFINE VARIABLE ix AS INTEGER NO-UNDO. FIND _file "item". OUTPUT TO item.d. DO ix = 1 TO 10000: FIND item WHERE RECID(item) = ix NO-ERROR. IF AVAILABLE item AND ix <> INTEGER(_file._template) THEN EXPORT item. END. Make sure Large enough!

  33. Block Issues • Block and shared memory buffer messages • Wrong dbkeyin block. Found <x>, should be <y> in area <z>. (1124) • Read, write, modify, release • Most often O/S File System issue • Reboot often fixes this error – but why? • bkioWrite:UnknownO/S error during write, errno 2, fd <x>, len<y>, offset <z>, filename <s> database <t>. (14676) • Attempt to read block <n> which does not exist in area <a>. (201) • Often index rebuild will fix this error. (rebuild on area level) • bkio, bk, bm • Dbkey • Block • Buffer • Area • Extent

  34. Block Repair Tools • Checksum validation of dbkey <d> block type 4 in area <a> does not match data. Expected: <e> received <r>. (14410) 1. Report Bad Checksum 2. Fix Bad Checksum 16. Scan/Fix block checksum (Type II Area) • Ignore for free blocks (block type 4) • Validate database by other means prior to “fixing” • True corruption will require a database rebuild • dump and load • restore/roll forward • Master block: 1 • Record block: 3 • Free block: 4 • Index block: 5 proutil<db> -C dbrpr

  35. Block Chain Repair Tools RM chain count inconsistency. 20 Blocks indicated on record free chain (actually 5) RM block found not on RM chain, but flagged RM chain. RM block free chain link error <type> Block <number> with invalid chain type <number> on RM chain Free block marked on free chain but linked into RM chain   RM RM RM RM RM FREE RM RM RM RM RM RM RM RM RM

  36. Block Chain Repair Tools RM chain count inconsistency. RM block found not on RM chain, but flagged RM chain. <name> Block <number> with invalid chain type <number> on RM chain  1. Database Scan Menu 7. Rebuild Free Chain 8. Rebuild RM Chain 9. Rebuild Index Delete Chain 11. Fix Cluster Chains in Type II Area • Rebuild free chains/rm chains from dbrpr • Seek help from support proutil<db> -C dbrpr

  37. Recovery Manager Issues • Recovery Messages • ** The after-image file expected Tue Feb 26 16:47:29 2013. (832) ** Those dates don't match, so you have the wrong copy of one of them. (833) • Undo failed to reproduce the record in area <a> with rowid <r> and return code  -1. (10566) • Invalid block <x> for file <y>.a3, max is 1024 (2329) • How to proceed • Restore / roll forward • Switch to hot standby • Recovery (rl) • Redo • Undo • Retry • Before image • After image • ai, a<n> • bi, b<n> • Transaction (tm)

  38. Recovering From Recovery Failures I’ve got no backup & crash recovery won’t work? • Looks further back in BI. • Should no longer be needed but its worth a try! **** As a very last resort, force truncate • What are the side effects of skipping crash recovery? • -F: How bad could it be? • Dump and re-load into new database • Reconcile data contents and relationships after load • Backup & enable AI • Maintain hot standby proutil <db> -C truncate bi –G 120 proutil <db> -C truncate bi -F

  39. Structural Repair Those dates don't match, so you have the wrong copy of one of them. • Usually the result of an OS copy or move • Make sure all right pieces in place & .st file identifies them correctly • Does NOT repair corrupt database • Updates path names to those specified in .st file • Use “sparingly” • Does NOT repair corrupt database • Patches date mismatch & creates dummy extents • Use to recover what ever data remains when no backup exists prostrctrepair<db> <x>.st prostrctunlock<db> <x>.st

  40. Structural Repair rmx.db - Ooops! • Rebuild database “control area” (.db file) from .st file • Changes to control area are not logged • Cancelling a txn that changes control area may require builddb • May force re-base for OpenEdge Replication • Always have an up to date .st file prostrctbuilddb<db> <db>.st prostrctlist<db>

  41. Summary

  42. ? Questions

  43. October 6–9, 2013 • Boston #PRGS13 www.progress.com/exchange-pug Special low rate of $495 for PUG Challenge attendees with the code PUGAM And visit the Progress booth to learn more about the Progress App Dev Challenge!

More Related