
HANDLING FAILURES


dglisson

Presentation Transcript


  1. HANDLING FAILURES

  2. Warning • This is a first draft • I welcome your corrections

  3. One common objective • Maintaining the database in a consistent state • Here this means maintaining the integrity of the data • After a money transfer between two accounts, the amount debited from the first account should be equal to the amount credited to the second account • Assuming a no-fee transfer

  4. Two different problems • Handling outcomes of system failures: • Server crashes, power failures, … • Preventing inconsistencies resulting from concurrent queries/updates that interfere with each other • Next chapter

  5. Failure modes • Erroneous data entry: • Will impose constraints • Required range of values, … 10-digit phone numbers, ... • Will add triggers • Programs that execute when some condition occurs • Even more controls
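A minimal sketch of the two controls named on this slide, using Python's standard `sqlite3` module. The `customer` and `audit` tables, their columns, and the sample rows are hypothetical, chosen only to illustrate a 10-digit phone constraint, a range constraint, and a trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    name    TEXT NOT NULL,
    -- constraint: exactly 10 characters, all of them digits
    phone   TEXT NOT NULL
            CHECK (length(phone) = 10 AND phone NOT GLOB '*[^0-9]*'),
    -- constraint: required range of values
    balance INTEGER NOT NULL CHECK (balance >= 0)
);
CREATE TABLE audit (name TEXT, old_balance INTEGER, new_balance INTEGER);
-- trigger: a program that executes when some condition occurs
CREATE TRIGGER log_balance_change AFTER UPDATE OF balance ON customer
BEGIN
    INSERT INTO audit VALUES (OLD.name, OLD.balance, NEW.balance);
END;
""")

conn.execute("INSERT INTO customer VALUES ('Ann', '7135550100', 1200)")

try:
    # Malformed phone number: the CHECK constraint should reject this row.
    conn.execute("INSERT INTO customer VALUES ('Bob', '555-0100', 100)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

conn.execute("UPDATE customer SET balance = 700 WHERE name = 'Ann'")
audit_rows = conn.execute("SELECT * FROM audit").fetchall()
```

Constraints stop bad values at entry time, while the trigger records every balance change so that an erroneous update can be traced afterwards.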

  6. Failure modes • Media failures: • Disk failures • Complete • Irrecoverable read errors • Recovery • Use disk array redundancy (RAID) • Maintain an archive of DB • Replicate DB

  7. Failure modes • Catastrophic failure • Everything is lost • Recovery • Archive (if stored at another place) • Distributed replication • ...

  8. Failure modes • System failures • Power failures • Software errors • Could stop the system in the middle of a transaction • Need a recovery mechanism

  9. Transactions (I) • Any process that queries and modifies the database • Typically consists of multiple steps • Several of these steps may modify the DB • Main problem is partial execution of a transaction: • Money was taken from account A but not credited to account B

  10. Transactions (II) • Running the transaction again will rarely solve the problem • Would take the money from account A a second time • We need a mechanism allowing us to undo the effects of partially executed transactions • Roll back to safe previous state
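A small sketch of why rerunning does not help, assuming a toy in-memory `accounts` dictionary and illustrative amounts. The `transfer` function and its `crash_after_debit` flag are hypothetical names used only to simulate a crash between the two steps.

```python
def transfer(accounts, amount, crash_after_debit=False):
    """Move `amount` from A to B in two separate steps."""
    accounts["A"] -= amount
    if crash_after_debit:
        raise RuntimeError("simulated crash between debit and credit")
    accounts["B"] += amount


accounts = {"A": 1200, "B": 100}

try:
    transfer(accounts, 500, crash_after_debit=True)
except RuntimeError:
    pass
after_crash = dict(accounts)      # A was debited, B was never credited

transfer(accounts, 500)           # naive fix: run the transaction again
after_rerun = dict(accounts)      # A has now been debited twice
```

After the rerun the two balances add up to 800 instead of the original 1300, which is why the partial execution has to be undone rather than repeated.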

  11. General organization • Uses a log • Transaction manager interacts with • Query processor • Log manager • Buffer manager • Recovery manager will interact with buffer manager

  12. Involved entities • The "elements" of the database: • Tables? • Tuples? • The best choice is disk blocks/pages

  13. Correctness principle • If a transaction • executes in the absence of any other transactions or system errors, and • starts with the DB in a consistent state, • it will then leave the DB in a consistent state • We do not question the wisdom of authorized transactions

  14. The converse • Transactions are atomic: • Either executed as a whole or not at all • Partial executions are likely to leave the DB in an inconsistent state • Transactions that execute simultaneously are likely to leave the DB in an inconsistent state • Unless we take some precautions

  15. Primitive operations (I) • INPUT(X) • Read the block containing database element X and store it in a memory buffer • READ(X,t) • Copy value of element X to local variable t • May require an implicit INPUT(X)

  16. Primitive operations (II) • WRITE (X,t) • Copy value of local variable t to element X • May require an implicit INPUT(X) • OUTPUT(X) • Flush to disk the block containing X

  17. Example • Transaction T doubles the values of elements A and B: • A= A*2;B = B*2 • Integrity constraint A = B • Start with A = B = 8

  18. Steps • READ(A,t); t = t*2; WRITE(A,t); OUTPUT(A); • READ(B,t); t = t*2; WRITE(B,t); OUTPUT(B);
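The four primitives and the steps above can be sketched as a toy model. This is an illustration under simplifying assumptions: each element sits in its own block, `disk` and `buffer` are plain dictionaries, and the `Database` class and `double_both` function are hypothetical names.

```python
class Database:
    """Toy model: `disk` holds durable values, `buffer` holds in-memory copies."""

    def __init__(self, disk):
        self.disk = dict(disk)
        self.buffer = {}

    def input(self, x):
        # INPUT(X): bring the block holding X into a memory buffer.
        self.buffer[x] = self.disk[x]

    def read(self, x):
        # READ(X, t): return the buffered value, with an implicit INPUT(X) if needed.
        if x not in self.buffer:
            self.input(x)
        return self.buffer[x]

    def write(self, x, t):
        # WRITE(X, t): update the buffered copy only; the disk is untouched.
        if x not in self.buffer:
            self.input(x)
        self.buffer[x] = t

    def output(self, x):
        # OUTPUT(X): flush the buffered block to disk.
        self.disk[x] = self.buffer[x]


def double_both(db, crash_before_output_b=False):
    """Transaction T from the example; optionally stop before OUTPUT(B)."""
    t = db.read("A")
    db.write("A", t * 2)
    db.output("A")
    t = db.read("B")
    db.write("B", t * 2)
    if crash_before_output_b:
        return  # simulated crash: whatever is only in the buffer is lost
    db.output("B")


ok = Database({"A": 8, "B": 8})
double_both(ok)

crashed = Database({"A": 8, "B": 8})
double_both(crashed, crash_before_output_b=True)
```

In the crashed run the disk holds A = 16 and B = 8, which violates the integrity constraint A = B.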

  19. Undo logging

  20. Undo logging • Idea is to undo transactions that did not complete • Will keep on a log the previous values of all data blocks that are modified by the transaction • Will also note on log whether the transaction • completed successfully (COMMIT) • failed (ABORT)

  21. The log • Log records include • <Start T> • <Commit T> • Notes that T completed successfully • <Abort T> • Transaction failed, we need to undo all possible changes it made to the DB

  22. The undo log • Also includes • <T, X, v> • Transaction T changed DB element X and its former value is v

  23. An undo log • Will contain several interleaved transactions • <Start T1> • <T1, A, 50> • <Start T2> • <T1, B, 30> • <T2, C, "i"> • <T1, D, 30> • <Start T3> • <T3, E, "x"> • <Commit T1> • <T2, F, 0> • <T3, G, "z"> • <Commit T2> • …

  24. Undo logging rules • If a transaction T modifies DB element X, the log record <T, X, v> must be written to disk before the new value of X is written to disk • If T commits, its <COMMIT> record cannot be written to disk until after all database elements changed by T have been written to disk • And not much later than that!

  25. Example • <START T> • READ(A,t); t = t*2; WRITE(A,t) preceded by <T, A, 8> (original value) • READ(B,t); t = t*2; WRITE(B,t) preceded by <T, B, 8> (original value) • FLUSH LOG • OUTPUT(A) • OUTPUT(B) followed by <COMMIT>

  26. Another example (I) • Transferring cash from account A to account B • Start with • A = $1200 • B = $100 • Want to transfer $500

  27. Another example (II) • <START T> • READ(A,t); t = t - 500; WRITE(A,t) preceded by <T, A, 1200> (original value) • READ(B,t); t = t + 500; WRITE(B,t) preceded by <T, B, 100> (original value) • FLUSH LOG • OUTPUT(A) • OUTPUT(B) followed by <COMMIT>

  28. Important • You cannot commit the transaction until all physical writes to disk have successfully completed
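The ordering imposed by the undo logging rules can be sketched for the transfer example. This is a toy model, not a real log manager: `log_buffer` stands for log records still in memory, `log_disk` for records already flushed, and the function name and tuple layout are assumptions made for illustration.

```python
def transfer_with_undo_log(disk, amount):
    """Run the A-to-B transfer under undo-logging rules; return the on-disk log."""
    log_buffer, log_disk, buffer = [], [], {}

    def flush_log():
        # FLUSH LOG: force all pending log records to disk.
        log_disk.extend(log_buffer)
        log_buffer.clear()

    log_buffer.append(("START", "T"))
    for name, delta in (("A", -amount), ("B", +amount)):
        old = disk[name]                       # READ(X, t)
        log_buffer.append(("T", name, old))    # <T, X, former value> precedes WRITE
        buffer[name] = old + delta             # WRITE(X, t): memory only
    flush_log()                                # rule 1: update records reach disk first
    for name in ("A", "B"):
        disk[name] = buffer[name]              # OUTPUT(X)
    log_buffer.append(("COMMIT", "T"))         # rule 2: only after every OUTPUT
    flush_log()
    return log_disk


disk = {"A": 1200, "B": 100}
log = transfer_with_undo_log(disk, 500)
```

If the system stops anywhere before the final flush, the on-disk log holds the former values of A and B but no <COMMIT>, so recovery knows it must restore them.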

  29. Recovery using undo logging • Look at the transaction records on the log • Do they end with a <COMMIT>? • If the transaction is committed • Do nothing • else • Restore the initial state of the DB

  30. Why? • Since the transaction <COMMIT> marks the completion of all physical writes to the disk • We can safely ignore all committed transactions because they have safely completed • We must undo all other transactions because they could have left the DB in an inconsistent state

  31. Recovering from an undo log • Log: <Start T1> • <T1, A, 50> • <Start T2> • <T1, B, 30> • <T2, C, "i"> • <T1, D, 30> • <Start T3> • <T3, E, "x"> • <Commit T1> • <T2, F, 0> • <T3, G, "z"> • <Commit T2> • Transactions T1 and T2 have completed • Nothing to do • Transaction T3 never completed • Two actions to undo, latest first • Reset element G to its previous value "z" • Reset element E to its previous value "x"
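A sketch of undo recovery applied to the log on this slide. The tuple encoding of log records, the `undo_recover` function, and the on-disk values at crash time are assumptions made for illustration; only the former values in the log come from the slide.

```python
def undo_recover(log, disk):
    """Undo every update made by a transaction with no COMMIT or ABORT record."""
    finished = {rec[1] for rec in log if rec[0] in ("COMMIT", "ABORT")}
    undone = set()
    for rec in reversed(log):                  # latest change first
        if rec[0] == "UPDATE" and rec[1] not in finished:
            _, txn, element, old_value = rec
            disk[element] = old_value          # restore the former value
            undone.add(txn)
    # Record that the incomplete transactions were rolled back.
    log.extend(("ABORT", txn) for txn in sorted(undone))
    return undone


log = [
    ("START", "T1"), ("UPDATE", "T1", "A", 50),
    ("START", "T2"), ("UPDATE", "T1", "B", 30),
    ("UPDATE", "T2", "C", "i"), ("UPDATE", "T1", "D", 30),
    ("START", "T3"), ("UPDATE", "T3", "E", "x"),
    ("COMMIT", "T1"), ("UPDATE", "T2", "F", 0),
    ("UPDATE", "T3", "G", "z"), ("COMMIT", "T2"),
]
# Hypothetical disk contents at the time of the crash.
disk = {"A": 55, "B": 35, "C": "j", "D": 35, "E": "y", "F": 1, "G": "w"}

rolled_back = undo_recover(log, disk)
```

Only the elements touched by T3 are restored; everything written by T1 and T2 is left alone because their <COMMIT> records are on the log.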

  32. Checkpointing (I) • Quiescent checkpoints • Wait until all current transactions have committed then write <CHECKPOINT> • Very simple but slows down the DB while the checkpoint waits for all transactions to complete

  33. A quiescent checkpoint • Can safely ignore the part of the log before the checkpoint • Must look for uncommitted transactions after the checkpoint

  34. Checkpointing (II) • Non-quiescent checkpoints • Two steps • Start the checkpoint, noting all transactions that have not yet completed: <START CHECKPOINT(T1, T2, ...)> • Wait until all these transactions have committed, then write <END CHECKPOINT(T1, T2, ...)> • Does not slow down the DB

  35. A non-quiescent checkpoint • Before START CHECKPOINT: can safely ignore this part of the log • Between START CHECKPOINT and END CHECKPOINT: must look for uncommitted transactions • After END CHECKPOINT: must look for uncommitted transactions

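  36. Another non-quiescent checkpoint • Log contains START CHECKPOINT (T1, T2, …, Tn) with no END CHECKPOINT after it • Before START CHECKPOINT: cannot ignore this part of the log but can restrict the search to transactions (T1, T2, …, Tn) • After START CHECKPOINT: must look for uncommitted transactions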

  37. Purging the log • Can remove all log entries pertaining to transactions that started before • A quiescent checkpoint • The start of a non-quiescent checkpoint, once that checkpoint has ended
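One way to picture how checkpoints bound the work of undo recovery is a function that returns the oldest log position still worth examining. This is a sketch under assumptions: the record names `CKPT`, `START_CKPT` and `END_CKPT` and the function `undo_scan_start` are hypothetical, and checkpoints are assumed not to overlap.

```python
def undo_scan_start(log):
    """Return the index of the oldest log record undo recovery must examine."""
    seen_end = False
    for i in range(len(log) - 1, -1, -1):      # scan backwards from the end
        kind = log[i][0]
        if kind == "END_CKPT":
            seen_end = True
        elif kind == "CKPT":                   # quiescent checkpoint
            return i
        elif kind == "START_CKPT":
            if seen_end:
                return i                       # completed checkpoint: earlier part is safe
            # Incomplete checkpoint: go back to the earliest start
            # among the transactions listed in it.
            active = set(log[i][1])
            starts = [j for j in range(i)
                      if log[j][0] == "START" and log[j][1] in active]
            return min(starts, default=i)
    return 0                                   # no checkpoint at all: whole log


complete = [
    ("START", "T1"), ("UPDATE", "T1", "A", 5),
    ("START_CKPT", ("T1",)), ("START", "T2"),
    ("COMMIT", "T1"), ("END_CKPT",), ("UPDATE", "T2", "B", 7),
]
incomplete = complete[:5]                      # crash before END_CKPT was written
quiescent = [("START", "T1"), ("COMMIT", "T1"), ("CKPT",), ("START", "T2")]

positions = (undo_scan_start(complete),
             undo_scan_start(incomplete),
             undo_scan_start(quiescent))
```

Records older than the returned position are the ones that can be purged from the log.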

  38. Redo logging

  39. Redo logging • Idea is to redo transactions that did complete and not let other transactions modify the DB in any way • Will keep on a log the new values of all data blocks that the transaction plans to modify • Will also note on log whether the transaction • completed successfully (COMMIT) • failed (ABORT)

  40. The redo log • Log records include • <Start T> • <Commit T> • <Abort T> • <T, X, w> • Transaction T changed DB element X and its new value is w

  41. A redo log • Will contain several interleaved transactions • <Start T1> • <T1, A, 80> • <Start T2> • <T1, B, 20> • <T2, C, "i"> • <T1, D, 40> • <Start T3> • <T3, E, "x"> • <Commit T1> • <T2, F, 0> • <T3, G, "z"> • <Commit T2> • …

  42. Redo logging rules • If a transaction T modifies DB element X, the log record <T, X, w> holding the new value w must be written to disk before the transaction commits • If T commits, its <COMMIT> record must be written to disk before any database element changed by T can be written to disk

  43. Example • <START T> • READ(A,t); t = t - 500; WRITE(A,t) preceded by <T, A, 700> (new value) • READ(B,t); t = t + 500; WRITE(B,t) preceded by <T, B, 600> (new value) • FLUSH LOG • <COMMIT> (must be written to log before any OUTPUT) • OUTPUT(A) • OUTPUT(B)
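The same transfer under redo-logging rules can be sketched as follows. As before this is a toy model: `log_buffer` and `log_disk` stand for the in-memory and on-disk parts of the log, and the function name, tuple layout, and `crash_after_commit` flag are assumptions made for illustration.

```python
def transfer_with_redo_log(disk, amount, crash_after_commit=False):
    """Run the A-to-B transfer under redo-logging rules; return the on-disk log."""
    log_buffer, log_disk, buffer = [], [], {}

    def flush_log():
        # FLUSH LOG: force all pending log records to disk.
        log_disk.extend(log_buffer)
        log_buffer.clear()

    log_buffer.append(("START", "T"))
    for name, delta in (("A", -amount), ("B", +amount)):
        new = disk[name] + delta               # READ(X, t) and compute the new value
        log_buffer.append(("T", name, new))    # <T, X, new value> precedes WRITE
        buffer[name] = new                     # WRITE(X, t): memory only
    log_buffer.append(("COMMIT", "T"))
    flush_log()                                # COMMIT is on disk before any OUTPUT
    if crash_after_commit:
        return log_disk                        # simulated crash: no OUTPUT happened
    for name in ("A", "B"):
        disk[name] = buffer[name]              # OUTPUT(X)
    return log_disk


disk = {"A": 1200, "B": 100}
log = transfer_with_redo_log(disk, 500)

crashed_disk = {"A": 1200, "B": 100}
crashed_log = transfer_with_redo_log(crashed_disk, 500, crash_after_commit=True)
```

In the crashed run the disk still holds the old balances, yet the log already says the transaction committed and carries the new values needed to replay it.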

  44. Recovery using redo logging • Look at the transaction records on the log • Do they end with a <COMMIT>? • If the transaction is committed • Replay the transaction from the log • else • Do nothing • Just the opposite of what undo logging does!

  45. Why? • Since the transaction <COMMIT> now precedes any physical writes to the disk • We must replay all committed transactions because we do not know if the physical writes were actually completed before the crash. • We can ignore non-committed transactions because they did not modify the data on disk

  46. Recovering from a redo log • Log: <Start T1> • <T1, A, 60> • <Start T2> • <T1, B, 50> • <T2, C, "i"> • <T1, D, 5> • <Start T3> • <T3, E, "z"> • <Commit T1> • <T2, F, 6> • <T3, G, "u"> • <Commit T2> • Transactions T1 and T2 have completed • Must replay them • Transaction T3 never completed • Did not modify the DB • Can ignore it
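A sketch of redo recovery applied to the log on this slide. The tuple encoding, the `redo_recover` function, and the on-disk values at crash time are assumptions made for illustration; only the new values in the log come from the slide.

```python
def redo_recover(log, disk):
    """Replay, oldest first, every update made by a committed transaction."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in log:                            # oldest change first
        if rec[0] == "UPDATE" and rec[1] in committed:
            _, txn, element, new_value = rec
            disk[element] = new_value          # write the logged new value
    return committed


log = [
    ("START", "T1"), ("UPDATE", "T1", "A", 60),
    ("START", "T2"), ("UPDATE", "T1", "B", 50),
    ("UPDATE", "T2", "C", "i"), ("UPDATE", "T1", "D", 5),
    ("START", "T3"), ("UPDATE", "T3", "E", "z"),
    ("COMMIT", "T1"), ("UPDATE", "T2", "F", 6),
    ("UPDATE", "T3", "G", "u"), ("COMMIT", "T2"),
]
# Hypothetical disk contents at the time of the crash.
disk = {"A": 1, "B": 2, "C": "a", "D": 3, "E": "e", "F": 4, "G": "g"}

replayed = redo_recover(log, disk)
```

Elements E and G keep their on-disk values because T3 never committed and, under the redo rules, could not have written them to disk.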

  47. Important • You must flush all the buffer pages that were modified by the transactions that have already committed • And no other! • If you flush any buffer page that was modified by a transaction that did not yet commit, you will be in big trouble if the transaction aborts

  48. A non-quiescent checkpoint • Big flush takes place between START CHECKPOINT and END CHECKPOINT • Can now ignore all transactions that completed before the start of the checkpoint • Can safely ignore that part of the log

  49. Recovering after a checkpoint • Roll back to the most recent complete checkpoint • Replay all committed transactions that • Are in the list of in-progress transactions at the start of the checkpoint • Started after the start of the checkpoint • Can ignore all other transactions
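The selection rule on this slide can be sketched as a function that picks the transactions to replay. The record names `START_CKPT` and `END_CKPT`, the tuple layout, and the function `transactions_to_replay` are hypothetical; the sample log is invented for illustration.

```python
def transactions_to_replay(log):
    """Return the committed transactions that redo recovery must replay."""
    ends = [i for i, rec in enumerate(log) if rec[0] == "END_CKPT"]
    if ends:
        # Most recent complete checkpoint: the START_CKPT preceding the last END_CKPT.
        start_idx = max(i for i in range(ends[-1]) if log[i][0] == "START_CKPT")
        candidates = set(log[start_idx][1])    # in progress when the checkpoint started
    else:
        start_idx, candidates = -1, set()      # no complete checkpoint: whole log
    # Add every transaction that started after the start of the checkpoint.
    candidates |= {rec[1] for rec in log[start_idx + 1:] if rec[0] == "START"}
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    return candidates & committed


log = [
    ("START", "T1"), ("COMMIT", "T1"),
    ("START", "T2"),
    ("START_CKPT", ("T2",)),
    ("START", "T3"), ("COMMIT", "T2"),
    ("END_CKPT",),
    ("START", "T4"), ("COMMIT", "T3"),
]
to_replay = transactions_to_replay(log)
```

T1 committed before the checkpoint started, so its changes were flushed by the checkpoint and it is skipped; T4 is skipped because it never committed.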

  50. A new problem • What if the same block of the DB is modified • By a transaction that has already committed, • By another transaction that has not yet committed? • Should we flush the block or not? • No good answer
