1 / 28

§1. Recovery

§1. Recovery. Recovery (and concurrency) is tied to notion of transaction processing Transaction is a logical unit of work.

Download Presentation

§1. Recovery

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. §1. Recovery Recovery (and concurrency) is tied to notion of transaction processing Transaction is a logical unit of work. Specifically in DBMS context, it is a sequence of (usually) several database operations that transform a consistent state of the database into another consistent state (or do not change it). Consistencyis not necessarily preservedat intermediate points during processing. http://csiweb.ucd.ie/Staff/acater/comp30150.html

  2. Example: consider the relational-database tables: Supplier S# Sname City (S) S3 Acme Cork … … … Shipments S# P# Qty (SP) S3 P117 0144 … … … and the user request: “Change supplier no S3 to no S9” => must change both tables above -- “cascade” the update. http://csiweb.ucd.ie/Staff/acater/comp30150.html

  3. TRANEX1: PROC OPTIONS ( MAIN ) ; /* declarations omitted */ GET LIST ( SX , SY ); /* get values from user */ EXEC SQL UPDATE S SET S# = :SY WHERE S# = :SX ; EXEC SQL UPDATE SP SET S# = :SY WHERE S# = :SX ; RETURN ; END /* TRANEX1 */ ; http://csiweb.ucd.ie/Staff/acater/comp30150.html

  4. NB: This “simple update” involves 2 actual updates to database. In between, database is not consistent, SP records might exist with no S records corresponding. Must guard against doing 1 update but not the other, which would leave database inconsistent. Ideally one wants a guarantee that both updates will be carried out - but that is impossible. Many possible sources of failure: OS errors DBMS errors Program errors Operator mistake Hardware failures Power cuts Fire Sabotage etc http://csiweb.ucd.ie/Staff/acater/comp30150.html

  5. But all is not lost - you can be guaranteed recovery The key to recovery is redundancy : redundant information is stored in dumps and logs, and provides the ability to partially or completely reconstruct the database. Outline of commonest recovery mechanism: Dump total database regularly; Record all changes in log; If failure occurs, either • Restore DB from dump and redo completed txns if possible • Undo uncompleted txns Duplexing is also possible: keep 2 copies of database and update both simultaneously. But ensure that they have different failure modes. http://csiweb.ucd.ie/Staff/acater/comp30150.html

  6. With "transaction processing" you are guaranteed that if a failure occurs during updates, the work of transactions will either be redone (if properly finished) or undone. Transactions must appear atomic to the end-user -- executed in their entirety or not at all. Achieve this by commit and rollback operations. • Commit signals successful end-of-transaction: database is believed consistent, update can be made permanent -- “committed”. • Rollback = unsuccessful end-of-transaction; tells system that all updates within the transaction must be “rolled back” i.e. undone. http://csiweb.ucd.ie/Staff/acater/comp30150.html

  7. TRANEX2: PROC OPTIONS ( MAIN ) ; /* declarations omitted */ EXEC SQL WHENEVER SQLERROR GO TO UNDO ; GET LIST ( SX , SY ); /* get values from user */ EXEC SQL UPDATE S SET S# = :SY WHERE S# = :SX ; EXEC SQL UPDATE SP SET S# = :SY WHERE S# = :SX ; EXEC SQL COMMIT ; GO TO FINISH ; UNDO: EXEC SQL ROLLBACK ; FINISH: RETURN ; END /* TRANEX2 */ ; http://csiweb.ucd.ie/Staff/acater/comp30150.html

  8. In “TRANEX2” example we issue commit if we get through the updates successfully. But if either update fails it “raises an error condition” and a program-initiated rollback is issued to undo changes. Commit/Rollback operations may not appear in code but may be implicit - depends on implementation. Rollback of transactions in progress at the time of a crash should also happen automatically upon DBMS restart. Messages are also an issue: (see later). http://csiweb.ucd.ie/Staff/acater/comp30150.html

  9. Undoing updates System keeps log ( = journal ) of all update operations. Log records values of items before and after any change, identifying the item changed/ deleted/ inserted, and the id of the transaction. This can be used to restore database to consistent state. When a transaction commits, log also records that fact. Log may be stored online or online/offline combination. For busy multi-user systems the log can quickly become very large. http://csiweb.ucd.ie/Staff/acater/comp30150.html

  10. Synchronisation points (SYNCH points) Executing a Commit or Rollback operation establishes a synchpoint: it represents the boundary between two consecutive transactions - a point where the database is consistent. Synchpoints are only established by Commit, Rollback, and normal “program termination”. Once established: - all updates since previous synchpoint are committed or undone; - all database positioning is lost (see below - “cursors” are closed) - all record locks are released (see later - “concurrency”) http://csiweb.ucd.ie/Staff/acater/comp30150.html

  11. Cursors When SQL is embedded in a host language like COBOL (or C or C++ or Java or …), there is a need to bridge between the set-at-a-time nature of SQL and the record-at-a-time nature of COBOL (…). This is done using cursors - pointers that allow you to run through a set of records, pointing to each one in turn. Cursors allow the host language to process the records in the way natural for it; SQL on its own does not need them. http://csiweb.ucd.ie/Staff/acater/comp30150.html

  12. The process is illustrated in outline in the example, which is intended to retrieve supplier details (S#, SNAME, and STATUS) for all suppliers in the city given by the host variable Y. EXEC SQL DECLARE X CURSOR FOR /* define cursor X */ SELECT S#, SNAME, STATUS FROM S WHERE CITY = :Y; EXEC SQL OPEN X; /* execute the query /* DO WHILE (more-records-to-come); EXEC SQL FETCH X INTO :S#,:SNAME,:STATUS; /* fetch next supplier */ ........ /* and then do something! */ END; EXEC SQL CLOSE X; /*deactivate the cursor*/ http://csiweb.ucd.ie/Staff/acater/comp30150.html

  13. Commit / Rollback terminate transactions, not the program. A given program may carry out numerous consecutive transactions. A transaction is a unit of work; it is also a unit of recovery. If a program commits, then its updates must be guaranteed even if a crash occurs before the updates are flushed to disk. System restart after a crash should install updates in the database from entries in the log. Implication: one must write the log before Commit operation finishes - so-called “Write-Ahead Log Protocol” http://csiweb.ucd.ie/Staff/acater/comp30150.html

  14. System and Media Recovery Local failure -- affects only current transaction e.g. arithmetic overflow error Global failure -- affects all transactions in progress at time of failure. Include: • System failure ("soft crash") • no physical damage • Media failure ("hard crash") • database is physically damaged, eg disk head crash http://csiweb.ucd.ie/Staff/acater/comp30150.html

  15. Media failure - physical damage. • Restore database from backup, redo transactions that had completed. • No need to undo. • Can use standard dump/restore software. System failure -- no damage, but • The contents of memory are lost; • so the state of transactions is lost; • so transactions cannot be completed; • so they must be undone at restart time. • May also have to redo transactions that had been finished but not flushed-to-disk. • If the log is very large, restart can be very expensive. http://csiweb.ucd.ie/Staff/acater/comp30150.html

  16. T1 … T5 here are meant to be classes of transactions System uses checkpoints to help identify rapidly which transactions to undo, which transactions to redo. System "takes a checkpoint" at regular intervals: it flushes buffers to disk, and writes checkpoint record to physical log. Thus it records all transactions in progress at time of checkpoint. http://csiweb.ucd.ie/Staff/acater/comp30150.html

  17. Classes of transactions to be undone: those in checkpoint record (like T3) or begun after it (like T5) without Commit in log Classes of transactions to be redone: those in checkpoint record (like T2) or begun after it (like T4) that do have a Commit in log (but their changes are perhaps not flushed to disk) System restart will use log to undo, and then redo, appropriate transactions. Only then will normal activity begin. http://csiweb.ucd.ie/Staff/acater/comp30150.html

  18. Message handling Non-trivial in transaction processing. Eg: Transfer $100 from 68224 to 97636 "Transaction" should update database and issue message to user. If it does a Commit (or a voluntary Rollback), then an appropriate message should be sent. But in the event of a system failure, neither message should be displayed - just as if the transaction had never started. (One could display system-generated failure message.) So should not output messages until end-of-transaction. http://csiweb.ucd.ie/Staff/acater/comp30150.html

  19. TRANSFER: PROC; GET (FROM, TO, AMOUNT); /* input message */ FIND UNIQUE (ACCOUNT WHERE ACCOUNT# = FROM); /* now decrement the FROM balance */ ASSIGN (BALANCE - AMOUNT) TO BALANCE; IF BALANCE < 0 THEN DO; PUT ('INSUFFICIENT FUNDS'); /*output msg*/ /* undo update & terminate transaction */ ROLLBACK; END; ELSE DO; FIND UNIQUE (ACCOUNT WHERE ACCOUNT# = TO); /* now increment the TO balance*/ ASSIGN (BALANCE + AMOUNT) TO BALANCE; PUT ('TRANSFER COMPLETE'); /*output msg*/ /* commit update and terminate transaction*/ COMMIT; END; END /* TRANSFER */ http://csiweb.ucd.ie/Staff/acater/comp30150.html

  20. Place messages in pending queue, to be delivered on termination or discarded on failure. [In the case of a cash dispensing terminal, the delivery of your money is one message] Messages are handled by Data Communications Manager (DCM): when it receives input it places it in a queue. GET retrieves a copy from queue & logs it. PUT outputs to queue Commit/Rollback cause DCM to log messages, transmit them or clear input queues. DCM cancels output messages on failure. Log is used for redo. http://csiweb.ucd.ie/Staff/acater/comp30150.html

  21. Transaction structure - general format • - accept input • - perform database processing • - send output http://csiweb.ucd.ie/Staff/acater/comp30150.html

  22. Undo & Redo are idempotent : System can fail during Undo/Redo. Must ensure that • Undo(Undo(Undo...(x))) = Undo(x) • Redo(Redo(Redo...(x))) = Redo(x) ie that Undoing a change any number of times has the same effect as undoing it once; and similarly, Redoing it any number of times has the same effect as Redoing once. http://csiweb.ucd.ie/Staff/acater/comp30150.html

  23. 3 types of System Startup • Cold start: start from scratch. Normally only when 1st installed, but also possible after disasters (esp media failure) • Warm start: Startup after controlled shutdown. No need for Redo/Undo. • Emergency restart: a process invoked by operator after failure. Involves Redo/Undo and - perhaps - reloading the database. http://csiweb.ucd.ie/Staff/acater/comp30150.html

  24. Two phase Commit When a transaction commits, you are guaranteed that a recovery manager will be able to redo the transaction in the event of a failure. (force-write log). If transaction involves 2 (or more) systems - eg in a distributed DBMS - recovery is more difficult: there are 2 (or more) independent recovery mechanisms. The aim is still to preserve the "all or nothing" principle of transaction processing. This leads to two phase commit. Need to be able to exercise control over different systems, so need one system to act as coordinator component. http://csiweb.ucd.ie/Staff/acater/comp30150.html

  25. Individual transactions now send Commit/ Rollback to coordinator, which operates in 2 phases: • request all participants in the transaction to get ready to go, and send an acknowledgement to coordinator - OK or NOT OK • coordinator then broadcasts Commit to all participants, if all replies were OK; or Rollback if not all OK or if timeout occurred. http://csiweb.ucd.ie/Staff/acater/comp30150.html

  26. IN COORDINATOR: for each participant send “get ready to commit” to participant; wait for reply or timeout; if all participants replied ‘OK’ then forcewrite “broadcasting COMMIT” to coordinator log for each participant until acknowledgement received send “COMMIT” to participant wait for acknowledgement or timeout else forcewrite “broadcasting ROLLBACK” to coordinator log for each participant until acknowledgement received send “ROLLBACK” to participant wait for acknowledgement or timeout http://csiweb.ucd.ie/Staff/acater/comp30150.html

  27. IN PARTICIPANT: wait for “get ready to commit” message; force outstanding change records to local log; force “agree to commit” to local log; if errors occurred then send ‘NOT OK’ to coordinator else send ‘OK’ to coordinator wait for broadcast command from coordinator if command is “COMMIT” then commit changes to local resources if command is “ROLLBACK” then undo changes to local resources send acknowledgement to coordinator http://csiweb.ucd.ie/Staff/acater/comp30150.html

  28. Note 1) Timeout 2) Resources are held until global termination - termination on all systems. In the event of failure :- • in coordinator • before broadcasting: restart should issue Rollback • after: restart procedure issues Commit or Rollback as appropriate • in participant • before "agree to commit”: restart issues "NOT OK" (note that timeout will have occurred anyway) • after: ask coordinator to rebroadcast message, and undo/redo transaction locally http://csiweb.ucd.ie/Staff/acater/comp30150.html

More Related