CS411 Database Systems

CS411 Database Systems 12: Recovery
• Obama and Eric Schmidt: http://www.youtube.com/watch?v=k4RRi_ntQc8
• Sysadmin song: http://www.youtube.com/watch?v=udhd9fmOdCs
• 14th century sysadmin: http://www.youtube.com/watch?v=8UXAF-CUmIA

Bad things happen, but the DB contents must live on regardless. System crashes are the most common problem. We’ll worry about media failure later.

On restart, some transactions should be aborted; others must be durable.
(Figure: timelines of T1 through T5, cut off by a crash.)
• T1, T2, and T3 should be durable.
• T4 and T5 should be aborted.

Recovery has a big impact on buffer management.
Force T’s writes to disk at commit time?
• Poor response time.
• If not, how do we guarantee durability?
Steal dirty buffer pool pages from uncommitted Tns?
• If not, poor throughput.
• If so, what about atomicity? If T aborts, must undo T’s writes on disk!
            No Steal    Steal
Force       Trivial
No Force                Desired

The log helps us guarantee atomicity and durability.
• Append-only file with all info needed to REDO or UNDO every write
• Give it its own disk (why?)

Undo Logging (force, steal)

Undo logs don’t need to save after-images.
Log record types:
• <START T>: transaction T has begun
• <COMMIT T>: T has committed
• <ABORT T>: T has aborted
• <T, X, old_v>: T has updated element X, and its old value was old_v

Undo logging has 2.5 rules.
• U1: If T modifies X, then the log record <T, X, old_v> must be on disk before X is written to disk.
• U2: If T commits, then <COMMIT T> can’t be written to disk until all data changes by T are on disk (“early OUTPUTs”).
  - Buffer management rule, not a logging rule.
  - There may be many pages to force, & other Tns may want them in memory.
• U2.5: Need to do the right thing when a transaction aborts (what?)
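The ordering that U1 and U2 impose can be sketched in a toy model. This is only an illustration under stated assumptions: the class `UndoLogger`, its fields, and the use of Python lists and dicts as "disks" are all hypothetical, not part of the slides.

```python
class UndoLogger:
    """Toy undo logger: lists and dicts stand in for the log and data disks."""

    def __init__(self, data_disk):
        self.data_disk = data_disk   # element -> value "on disk"
        self.buffer = {}             # modified values not yet output
        self.log_buffer = []         # log records not yet flushed
        self.log_disk = []           # log records "on disk"
        self.dirty = {}              # transaction -> elements it modified

    def flush_log(self):
        self.log_disk.extend(self.log_buffer)
        self.log_buffer.clear()

    def start(self, t):
        self.log_buffer.append(("START", t))
        self.dirty[t] = set()

    def write(self, t, x, new_v):
        old_v = self.buffer.get(x, self.data_disk.get(x))
        self.log_buffer.append((t, x, old_v))   # before-image only
        self.buffer[x] = new_v
        self.dirty[t].add(x)

    def output(self, x):
        # U1: the <T, X, old_v> record reaches the log disk before X does.
        self.flush_log()
        self.data_disk[x] = self.buffer[x]

    def commit(self, t):
        # U2: all of T's data changes reach disk before <COMMIT T> does.
        for x in sorted(self.dirty.pop(t)):
            self.output(x)
        self.log_buffer.append(("COMMIT", t))
        self.flush_log()


db = UndoLogger({"A": 8, "B": 8})
db.start("T1")
db.write("T1", "A", 16)
db.write("T1", "B", 16)
db.commit("T1")
```

In this sketch `commit` is where the "force" cost shows up: every page T dirtied is output before the commit record is flushed.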

Crash recovery is easy with an undo log.
• Scan log, decide which transactions T completed:
  - <START T> … <COMMIT T> …: completed
  - <START T> … <ABORT T> …: completed
  - <START T> … (no <COMMIT T> or <ABORT T>): incomplete
• Starting from the end of the log, undo all modifications made by incomplete transactions.
The chance of crashing during recovery is relatively high! But undo recovery is idempotent: just restart it if it crashes.

Detailed algorithm for undo log recovery
From the last entry in the log to the first:
• <COMMIT T>: mark T as completed
• <ABORT T>: mark T as completed
• <T, X, v>: if T is not completed, then write X = v to disk; else ignore
• <START T>: ignore
So how should we handle ordinary Tn aborts?
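The backward scan above can be sketched as follows. The tuple encoding of log records (`("START", T)`, `("COMMIT", T)`, `("ABORT", T)`, `(T, X, old_v)`), the function name `undo_recover`, and the dict standing in for the disk are assumptions made for illustration.

```python
def undo_recover(log, disk):
    """Scan the log from last record to first, restoring the old value of
    every write made by a transaction with no COMMIT or ABORT record."""
    completed = set()
    for rec in reversed(log):
        kind = rec[0]
        if kind in ("COMMIT", "ABORT"):
            completed.add(rec[1])
        elif kind == "START":
            continue
        else:
            t, x, old_v = rec
            if t not in completed:
                disk[x] = old_v
    return disk


log = [
    ("START", "T1"), ("T1", "A", 1),
    ("START", "T2"), ("T2", "B", 2),   # B was 2 before T2's first write
    ("COMMIT", "T1"),
    ("T2", "B", 3),                    # B was 3 before T2's second write
]
disk = undo_recover(log, {"A": 10, "B": 9})
```

T2 wrote B twice, so the backward order matters: the scan restores 3 first and then 2, leaving the value B had before T2 started. Running the function a second time leaves the disk unchanged, which is the idempotence the slides rely on.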

Undo recovery practice
Which actions do we undo, in which order? What could go wrong if we undid them in a different order?
Log:
  …
  <T6,X6,v6>
  …
  …
  <START T5>
  <START T4>
  <T1,X1,v1>
  <T5,X5,v5>
  <T4,X4,v4>
  <COMMIT T5>
  <T3,X3,v3>
  <T2,X2,v2>

Scanning a year-long log is SLOW, and businesses lose money every minute their DB is down.
Solution: checkpoint the database periodically. Easy version:
• Stop accepting new transactions
• Wait until all current transactions complete
• Flush log to disk
• Write a <CKPT> log record, flush
• Resume transactions

During undo recovery, stop at the first checkpoint reached (scanning backward from the end of the log).
Log:
  … … <T9,X9,v9> … …   (other transactions, all completed)
  <CKPT>
  <START T2>
  <START T3>
  <START T5>
  <START T4>
  <T1,X1,v1>
  <T5,X5,v5>
  <T4,X4,v4>
  <COMMIT T5>
  <T3,X3,v3>
  <T2,X2,v2>
  (after <CKPT>: transactions T2, T3, T4, T5)

This “quiescent checkpointing” isn’t good enough for 24/7 applications. Instead:
• Write <START CKPT(T1,…,Tk)>, where T1,…,Tk are all active transactions
• Continue normal operation
• When all of T1,…,Tk have completed, write <END CKPT>

Example of undo recovery with nonquiescent checkpointing
Log regions:
  … … … … …                  earlier transactions plus T4, T5, T6
  <START CKPT T4, T5, T6>
  … … … …                    T4, T5, T6, plus later transactions
  <END CKPT>
  … … …                      later transactions
What would go wrong if we didn’t use <END CKPT>?

Crash recovery algorithm with undo log, nonquiescent checkpoints.
• Scan log backwards until the start of the latest completed checkpoint, deciding which transactions T completed:
  - <START T> … <COMMIT T> …: completed
  - <START T> … <ABORT T> …: completed
  - <START CKPT {T…}> … <COMMIT T> …: completed
  - <START CKPT {T…}> … <ABORT T> …: completed
  - <START T> … (no <COMMIT T> or <ABORT T>): incomplete
• Starting from the end of the log, undo all modifications made by incomplete transactions.
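A minimal sketch of this algorithm, assuming a well-formed log in which every `("END_CKPT",)` record is preceded by its `("START_CKPT", (T1, …, Tk))` record. The tuple encoding and the name `undo_recover_nq` are hypothetical.

```python
def undo_recover_nq(log, disk):
    """Undo recovery that scans back only to the start of the latest
    completed (START_CKPT ... END_CKPT) checkpoint."""
    ends = [i for i, r in enumerate(log) if r[0] == "END_CKPT"]
    stop = 0   # no completed checkpoint: scan the whole log
    if ends:
        stop = max((i for i in range(ends[-1]) if log[i][0] == "START_CKPT"),
                   default=0)
    completed = set()
    for rec in reversed(log[stop:]):
        kind = rec[0]
        if kind in ("COMMIT", "ABORT"):
            completed.add(rec[1])
        elif kind in ("START", "START_CKPT", "END_CKPT"):
            continue
        else:
            t, x, old_v = rec
            if t not in completed:
                disk[x] = old_v
    return disk


log = [
    ("START", "T1"), ("T1", "A", 1),
    ("START_CKPT", ("T1",)),
    ("START", "T2"), ("T2", "B", 2),
    ("COMMIT", "T1"),
    ("END_CKPT",),
    ("START", "T3"), ("T3", "C", 3),
]
disk = undo_recover_nq(log, {"A": 10, "B": 20, "C": 30})
```

Stopping at the checkpoint start is safe here because <END CKPT> is written only after every transaction on the active list has completed, so any incomplete transaction must have started after <START CKPT>.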

Redo Logging (no force, no steal)

Redo log entries are just slightly different from undo log entries.
• <START T>, <COMMIT T>, <ABORT T>: same as before
• <T, X, new_v>: T has updated element X, and its new value is new_v

Redo logging has one rule.
• R1: If T modifies X, then both <T, X, new_v> and <COMMIT T> must be written to disk before X is written to disk (“late OUTPUT”).
  - Don’t steal dirty buffer pages from uncommitted Tns!
  - Don’t have to force all those dirty data pages to disk before committing!
Implicit and reasonable assumption: log records reach disk in order; otherwise terrible things will happen.

Recovery is easy with a redo log.
• Decide which transactions T completed:
  - <START T> … <COMMIT T> …: committed
  - <START T> … <ABORT T> …: aborted
  - <START T> … (no <COMMIT T> or <ABORT T>): incomplete
• Read log from the beginning, redo all updates of committed transactions.
The chance of crashing during recovery is relatively high! But REDO recovery is idempotent: just restart it if it crashes.

Example of redo recovery
Which actions do we redo, in which order? What could go wrong if we redid them in a different order?
Log:
  <START T1>
  <T1,X1,v1>
  <START T2>
  <T2,X2,v2>
  <START T3>
  <T1,X3,v3>
  <COMMIT T2>
  <T3,X4,v4>
  <T1,X5,v5>
  …
  …
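One way to check an answer to this exercise is a small redo pass over the records shown. The sketch assumes the log ends at the last visible record (the trailing "…" entries are ignored) and uses a hypothetical tuple encoding and function name `redo_recover`.

```python
def redo_recover(log, disk):
    """Forward pass: reapply every update of a committed transaction."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    redone = []
    for rec in log:
        if rec[0] in ("START", "COMMIT", "ABORT"):
            continue
        t, x, new_v = rec
        if t in committed:
            disk[x] = new_v
            redone.append((t, x, new_v))
    return redone


log = [
    ("START", "T1"), ("T1", "X1", "v1"),
    ("START", "T2"), ("T2", "X2", "v2"),
    ("START", "T3"), ("T1", "X3", "v3"),
    ("COMMIT", "T2"),
    ("T3", "X4", "v4"), ("T1", "X5", "v5"),
]
disk = {}
redone = redo_recover(log, disk)
```

Under that assumption only T2 has a commit record, so only its write to X2 is redone; rule R1 guarantees that the uncommitted writes of T1 and T3 never reached disk.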

Nonquiescent checkpointing is trickier with a redo log than an undo log.
• Write a <START CKPT(T1,…,Tk)>, where T1,…,Tk are the active transactions
• Flush to disk all dirty data pages of transactions committed by the time the checkpoint started, while continuing normal operation (dirty = written)
• After that, write <END CKPT>

What exactly does “dirty” mean?
• When you are talking about buffer management and which buffers you can steal, a dirty page is a data page in memory that has been modified but not yet sent back to disk.
• When you are talking about concurrency control, a dirty page is a data page in memory that has been modified but not yet committed. A dirty read is a read of a dirty page.
Either way, the dirty pages are the ones that can get you in trouble.

Example of redo recovery with nonquiescent checkpointing
Log: … <START T1> … <COMMIT T1> … … <START CKPT T4, T5, T6> … … <END CKPT> … … <START CKPT T9, T10> …
1. Look for the last <END CKPT>.
2. Redo from <START T>, for committed T in {T4, T5, T6}. All data written by T1 is known to be on disk.
3. Normal redo for committed Tns that started after the <START CKPT T4, T5, T6> record.
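The three steps can be sketched as below, assuming a well-formed log and a hypothetical tuple encoding (`("START_CKPT", (T…))`, `("END_CKPT",)`, `(T, X, new_v)`). The function name `redo_recover_nq` and the sample log are illustrative, not the slide's log.

```python
CONTROL = {"START", "COMMIT", "ABORT", "START_CKPT", "END_CKPT"}

def redo_recover_nq(log, disk):
    """Redo committed transactions that were active at, or started after,
    the start of the last completed checkpoint."""
    committed = {r[1] for r in log if r[0] == "COMMIT"}
    ends = [i for i, r in enumerate(log) if r[0] == "END_CKPT"]
    start, candidates = 0, None            # None: every transaction qualifies
    if ends:
        # Step 1: last <END CKPT> and its matching <START CKPT>.
        ckpt = max(i for i in range(ends[-1]) if log[i][0] == "START_CKPT")
        active = set(log[ckpt][1])
        later = {r[1] for r in log[ckpt:] if r[0] == "START"}
        candidates = active | later
        # Step 2: begin at the earliest <START T> of an active transaction.
        starts = [i for i, r in enumerate(log)
                  if r[0] == "START" and r[1] in active]
        start = min(starts + [ckpt])
    redone = []
    for rec in log[start:]:                # Steps 2 and 3: forward redo
        if rec[0] in CONTROL:
            continue
        t, x, new_v = rec
        if t in committed and (candidates is None or t in candidates):
            disk[x] = new_v
            redone.append((x, new_v))
    return redone


log = [
    ("START", "T1"), ("T1", "A", 1), ("COMMIT", "T1"),
    ("START", "T4"), ("T4", "B", 2),
    ("START_CKPT", ("T4",)),
    ("T4", "C", 3),
    ("START", "T5"), ("T5", "D", 4),
    ("END_CKPT",),
    ("COMMIT", "T4"),
    ("T5", "D", 5),
]
disk = {"A": 1, "B": 0, "C": 0, "D": 0}
redone = redo_recover_nq(log, disk)
```

T1 committed before the checkpoint started, so its write is skipped; T5 never committed, so its writes are skipped too.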

But neither undo nor redo logging matches what we would like to have for buffer management.
            No Steal        Steal
Force       Trivial         Undo Logging
No Force    Redo Logging    Desired
Use undo/redo logging to attain this nirvana.

Redo/undo logs save both before-images and after-images.
• <START T>
• <COMMIT T>
• <ABORT T>
• <T, X, old_v, new_v>: T has written element X; its old value was old_v, and its new value is new_v

Undo/redo recovery has 1.5 rules.
• 1: “Write-ahead logging”: must force the log record for an update to disk before the corresponding data page goes to disk.
  - Item X can be updated on disk once <T wrote X> is on disk, even before <T commits> is on disk (i.e., early or late OUTPUT).
• 1.5: Need to do the right thing when a transaction aborts (what?)
As usual, T committed iff <T commits> is on disk.

Recovery is more complex with undo/redo logging.
• Redo all committed transactions, starting at the beginning of the log
• Undo all incomplete transactions, starting from the end of the log (“incomplete” = started, and neither committed nor aborted)
Log (REDO scans top to bottom, UNDO scans bottom to top):
  <START T1>
  <T1,X1,v1>
  <START T2>
  <T2,X2,v2>
  <START T3>
  <T1,X3,v3>
  <COMMIT T2>
  <T3,X4,v4>
  <T1,X5,v5>
  …
  …
How do we know these undos won’t undo some committed writes?
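A minimal sketch of the two passes, with update records encoded as hypothetical tuples `(T, X, old_v, new_v)`. It assumes that aborted transactions were already rolled back before their <ABORT T> record was written, and that concurrency control keeps a committed transaction from overwriting an element last written by an incomplete one; both are assumptions of this sketch.

```python
CONTROL = {"START", "COMMIT", "ABORT"}

def undo_redo_recover(log, disk):
    committed = {r[1] for r in log if r[0] == "COMMIT"}
    aborted = {r[1] for r in log if r[0] == "ABORT"}
    finished = committed | aborted
    # REDO: forward from the beginning, committed transactions only.
    for rec in log:
        if rec[0] not in CONTROL and rec[0] in committed:
            t, x, old_v, new_v = rec
            disk[x] = new_v
    # UNDO: backward from the end, incomplete transactions only.
    for rec in reversed(log):
        if rec[0] not in CONTROL and rec[0] not in finished:
            t, x, old_v, new_v = rec
            disk[x] = old_v
    return disk


log = [
    ("START", "T1"), ("T1", "A", 1, 10),
    ("START", "T2"), ("T2", "B", 2, 20),
    ("COMMIT", "T2"),
    ("T1", "C", 3, 30),
]
# Before the crash: A was stolen to disk early, B had not been forced yet.
disk = undo_redo_recover(log, {"A": 10, "B": 2, "C": 3})
```

The starting disk state shows why both images are needed: the stolen page A has to be undone, and the unforced page B has to be redone.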

Algorithm for non-quiescent checkpoint for undo/redo
Log: … <start checkpoint, active Tns are T1, T2, …> … <end checkpoint> …
• Write <start checkpoint, list of all active transactions> to log
• Flush log to disk
• Write to disk all dirty buffers, whether or not their transaction has committed (this implies some log records may need to be written to disk first, because of WAL)
• Write <end checkpoint> to log
• Flush log to disk
Pointers are one of many tricks to speed up future undos.

Algorithm for undo/redo recovery with nonquiescent checkpoint
Log: … <start checkpoint, A=active Tns> … <end checkpoint> …
• Backwards undo pass (end of log to start of last completed checkpoint)
  - C = transactions that committed after the checkpoint started
  - Undo actions of transactions that (are in A or started after the checkpoint started) and (are not in C)
• Undo remaining actions by incomplete transactions
  - Follow undo chains for transactions in A minus C (A = checkpoint active list)
• Forward pass (start of last completed checkpoint to end of log)
  - Redo actions of transactions in C

Examples: what to do at recovery time?
Log: … T1 wrote A, … … checkpoint start (T1 active) … T1 wrote B, … … checkpoint end … T1 wrote C, … … (no <T1 commit>)
=> Undo T1 (undo A, B, C)

Examples: what to do at recovery time?
Log: … T1 wrote A, … … checkpoint start (T1 active) … T1 wrote B, … … checkpoint end … T1 wrote C, … … T1 commit
=> Redo T1 (redo B, C)
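Both examples can be reproduced with a small sketch of undo/redo recovery with a nonquiescent checkpoint. The tuple encoding, the name `recover`, and the before/after values are hypothetical. The sketch scans the log prefix record by record instead of following undo chains, and it assumes a well-formed log and that aborted transactions were already rolled back.

```python
CONTROL = {"START", "COMMIT", "ABORT", "START_CKPT", "END_CKPT"}

def recover(log, disk):
    """Return (undone, redone): the elements touched by each pass."""
    ends = [i for i, r in enumerate(log) if r[0] == "END_CKPT"]
    ckpt, active = 0, set()
    if ends:   # start of the last completed checkpoint and its active list
        ckpt = max(i for i in range(ends[-1]) if log[i][0] == "START_CKPT")
        active = set(log[ckpt][1])
    committed = {r[1] for r in log if r[0] == "COMMIT"}
    finished = committed | {r[1] for r in log if r[0] == "ABORT"}

    # Backward undo pass; continue past the checkpoint start only while
    # some incomplete transaction from the active list has older writes.
    undone, pending = [], active - finished
    for i in range(len(log) - 1, -1, -1):
        rec = log[i]
        if rec[0] not in CONTROL and rec[0] not in finished:
            disk[rec[1]] = rec[2]          # restore the before-image
            undone.append(rec[1])
        elif rec[0] == "START":
            pending.discard(rec[1])
        if i <= ckpt and not pending:
            break

    # Forward redo pass from the checkpoint start.
    redone = []
    for rec in log[ckpt:]:
        if rec[0] not in CONTROL and rec[0] in committed:
            disk[rec[1]] = rec[3]          # reapply the after-image
            redone.append(rec[1])
    return undone, redone


base = [
    ("START", "T1"), ("T1", "A", 0, 1),
    ("START_CKPT", ("T1",)),
    ("T1", "B", 0, 2),
    ("END_CKPT",),
    ("T1", "C", 0, 3),
]
no_commit = recover(base, {"A": 1, "B": 2, "C": 3})
with_commit = recover(base + [("COMMIT", "T1")], {"A": 1, "B": 0, "C": 0})
```

Without the commit record, T1's writes to C, B, and A are undone in that order. With it, only B and C are redone, since the checkpoint already flushed A.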

Real world actions
E.g., dispense cash at ATM: Ti = a1 … aj … an, where some action aj dispenses $
Why are these a problem from a DB perspective?
“Solution”:
(1) try to make idempotent
(2) execute real-world actions after commit
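Both parts of the “solution” can be sketched together. Everything here is hypothetical: the `Atm` class, the action ids, and the in-memory `done` set, which a real system would have to keep durable and update atomically with the physical action.

```python
class Atm:
    """Toy device whose dispense action is idempotent per action id."""

    def __init__(self):
        self.done = set()      # ids of actions already carried out
        self.dispensed = 0

    def dispense(self, action_id, amount):
        if action_id in self.done:
            return False       # repeated request: do nothing
        self.dispensed += amount
        self.done.add(action_id)
        return True


def run_after_commit(atm, committed, pending_actions):
    """Carry out deferred real-world actions of committed transactions only."""
    for t, action_id, amount in pending_actions:
        if t in committed:
            atm.dispense(action_id, amount)


atm = Atm()
pending = [("T1", "a1", 100), ("T2", "a2", 50)]
run_after_commit(atm, {"T1"}, pending)
run_after_commit(atm, {"T1"}, pending)   # e.g. replayed after a restart
```

Replaying the pending list after a restart does not dispense twice, and the uncommitted T2 never dispenses at all.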

Physical disasters

These recovery algorithms won’t help you if your disk fails. Solution: careful replication!

Example 1: Triple modular redundancy
Keep 3 copies on separate disks
• Output(X) --> three outputs
• Input(X) --> three inputs + vote
(Figure: Copy 1, Copy 2, Copy 3)
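The output and vote steps can be sketched as follows, with three dicts standing in for the three disks. The function names `tmr_output` and `tmr_input` are hypothetical.

```python
from collections import Counter

def tmr_output(copies, x, value):
    """Output(X): write the value to all three copies."""
    for copy in copies:
        copy[x] = value

def tmr_input(copies, x):
    """Input(X): read all three copies and return the majority value."""
    value, votes = Counter(copy.get(x) for copy in copies).most_common(1)[0]
    if votes < 2:
        raise ValueError(f"no majority for {x!r}")
    return value


copies = [{}, {}, {}]
tmr_output(copies, "X", 5)
copies[2]["X"] = 9            # one copy silently goes bad
value = tmr_input(copies, "X")
```

The vote masks a single bad copy without needing to detect which copy is wrong.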

Example 2: Redundant writes, single reads
Keep N copies on separate disks
• Output(X) --> N outputs
• Input(X) --> Input one copy
  - if ok, done
  - else try another one
Assumes bad data can be detected (traditional but false)
(Figure: N disk copies)
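A minimal sketch of this scheme, assuming bad data is detected with a CRC-32 checksum stored next to each value. The checksum is this sketch's stand-in for "bad data can be detected", and the function names are hypothetical.

```python
import zlib

def write_copies(copies, x, value):
    """Output(X): store the value and its checksum on every copy."""
    for copy in copies:
        copy[x] = (value, zlib.crc32(value))

def read_one(copies, x):
    """Input(X): return the first copy whose checksum still matches."""
    for copy in copies:
        entry = copy.get(x)
        if entry is not None and zlib.crc32(entry[0]) == entry[1]:
            return entry[0]
    raise IOError(f"all copies of {x!r} are bad or missing")


copies = [{}, {}, {}]
write_copies(copies, "X", b"hello")
copies[0]["X"] = (b"jello", copies[0]["X"][1])   # corrupt the first copy
value = read_one(copies, "X")
```

Unlike the voting scheme, a read normally touches a single copy, at the price of trusting the detection mechanism.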

Example 3: DB dump + log
(Figure: backup database, active database, log)
If active database is lost,
• restore active database from backup
• bring up-to-date using redo entries in log

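When can log be discarded?
(Figure: log laid out along a time axis, with markers for the db dump, the last needed undo, and the checkpoint.)
• Log before the db dump: not needed for media recovery
• Log before the last needed undo: not needed for undo after system failure
• Log before the checkpoint: not needed for redo after system failure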

The real picture: what’s stored where
• LOG: LogRecords, each with an LSN (log sequence number) and fields prevLSN, XID, type, pageID, length, offset, before-image, after-image
• DB: data pages, each with a pageLSN (LSN of last write to that data page); master record
• RAM: Xact Table (lastLSN, status); Dirty Page Table (recLSN); flushedLSN
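The log record fields listed here map naturally onto a small record type, and prevLSN gives each transaction a backward chain through the log. The Python names below, the dict keyed by LSN, and the helper `undo_chain` are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    lsn: int                      # log sequence number of this record
    prev_lsn: Optional[int]       # previous record of the same transaction
    xid: str                      # transaction id
    rec_type: str                 # e.g. "update", "commit", "abort"
    page_id: Optional[int] = None
    offset: int = 0
    length: int = 0
    before_image: bytes = b""
    after_image: bytes = b""

def undo_chain(log, last_lsn):
    """Yield one transaction's records newest-first by following prevLSN,
    starting from the lastLSN kept in the transaction table."""
    lsn = last_lsn
    while lsn is not None:
        rec = log[lsn]
        yield rec
        lsn = rec.prev_lsn


log = {
    1: LogRecord(1, None, "T1", "update", page_id=7),
    2: LogRecord(2, None, "T2", "update", page_id=8),
    3: LogRecord(3, 1, "T1", "update", page_id=9),
}
t1_lsns = [rec.lsn for rec in undo_chain(log, 3)]
```

Following the chain visits only T1's records and skips T2's, which is how such pointers can shorten an undo.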

Summary of Logging/Recovery
• Recovery manager guarantees atomicity & durability, two of the ACID properties.
• Redo logging and undo logging are simple but make the system too slow in practice for serious applications.
• Use write-ahead logging with undo/redo logging to speed up the system (by allowing STEAL/NO-FORCE) without sacrificing correctness.

Summary, Cont.
• Checkpointing: A quick way to limit the amount of log to scan on recovery. Nonquiescent checkpoints are especially useful.
• Recovery works in 3 phases:
  - Analysis: Forward from checkpoint.
  - Redo: Forward.
  - Undo: Backward.
