
Recovery


dermot

Presentation Transcript


  1. Recovery

  2. Recovery • Lightweight Recoverable Virtual Memory • Rio Vista

  3. Introduction • failure • when a system does not perform in the manner defined • erroneous state • state that could lead the system to the failure • fault • anomalous physical condition • causes • design/manufacturing error • damage/fatigue • external disturbance • faults lead the system to an erroneous state, which may or may not result in a failure

  4. Failures • process failure • deadlock, timeout, protection violation, ... • OS should confine this failure to the process • system failure • software and hardware • amnesia failure: cannot recover the state just before the failure • pause failure: the state can be reinstated • halting failure: the system never restarts • disk failure • serious problem when it is the last backup storage • usually backed up by tape OR • mirrored (it will enhance read throughput anyway) • communication medium failure • does not cause total system failure

  5. Error Recovery • Forward Error Recovery • allow the process to proceed after fixing errors • difficult to remove all the errors (in software, procedures to cope with all kinds of error should be prepared, which is almost impossible) • Backward Error Recovery • the process should restart from the saved (or predefined) state • roll-back mechanism is needed • easy to cope with any kind of errors (it is not necessary to anticipate all kinds of errors) • overhead to restore previous state • checkpointing is needed • same error may occur again

  6. Backward Error Recovery • Operation-based approach • using a log, undo(roll-back) what has been done until an error-free state can be restored • write ahead log (for a write to X) • records in a log new value of X • updates X • State-based approach • checkpoint • a complete state of a process • at crash, rollback to the most recent safe state • needs many checkpoints • shadow page • copy of a page that is to be updated • updates are done only on the original page • at crash, goes back to the shadow page • at commit, keep using the original page
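
The operation-based approach can be illustrated with a minimal sketch. It assumes a toy in-memory key-value store; the class `UndoLogStore` and its method names are hypothetical and not from the slides. The slide's write-ahead log records the new value of X, which is what redo needs; this sketch logs the old value instead, because it illustrates undo.

```python
class UndoLogStore:
    """Toy key-value store with an operation-based undo log (illustrative only)."""

    def __init__(self):
        self.data = {}
        self.log = []  # entries: (key, key_existed_before, old_value)

    def write(self, key, value):
        # Log the old value *before* updating, so the write can be undone.
        self.log.append((key, key in self.data, self.data.get(key)))
        self.data[key] = value

    def commit(self):
        # The current state is declared error-free; older undo records are dropped.
        self.log.clear()

    def rollback(self):
        # Undo the logged writes, newest first, until the log is empty.
        while self.log:
            key, existed, old = self.log.pop()
            if existed:
                self.data[key] = old
            else:
                self.data.pop(key, None)


store = UndoLogStore()
store.write("x", 1)
store.commit()
store.write("x", 2)
store.write("y", 3)
store.rollback()  # store.data is back to the committed state
```

Undoing newest-first matters when the same key is written more than once: the oldest logged value is restored last.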

  7. Issues in Recovery(1) • failure and recovery of a process affect other processes that exchange data with the failed process • orphan message • when a process rolls back to the point before sending out a message • actions of other processes depending on the orphan message should be rolled back, too (domino effects) • lost message • node Y receives a message from X • Y rolls back to the point before receiving the message • effects are the same as when the message is lost

  8. Issues in Recovery(2) • livelocks • [Figure: processes X and Y exchanging messages n1 and m1; step 1: failure and rollback; step 2: orphan message and rollback] • Y sends out m1 and receives an orphan message n1, and rolls back • m1 becomes an orphan message • receiving m1, X rolls back

  9. Checkpoints • local checkpoint • snapshot of a single node • superscalar CPUs and out-of-order memory operations make checkpointing difficult • global checkpoint • strongly consistent set of checkpoints • all the checkpoints are inside a given interval • no information is exchanged between any processes during this interval • this is the last place any process should roll back to

  10. Checkpoints(2) • consistent set of checkpoints • a message recorded as “received” in a checkpoint should be recorded as “sent” in another checkpoint • no orphan message • a message recorded as “sent” may NOT be recorded as “received” in another checkpoint • possible lost message • simple to make this set • take a checkpoint after sending every message • or after sending N messages, for better efficiency at the cost of a higher chance of domino effects • lost messages can be dealt with as in other network protocols
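
The consistency condition above can be checked mechanically. The following is a minimal sketch, assuming each checkpoint records the sets of message ids it has sent and received; the data layout and the function names `find_orphans` and `find_lost` are hypothetical.

```python
def find_orphans(checkpoints):
    """Return ids recorded as received somewhere but recorded as sent nowhere.

    checkpoints: {node: {"sent": set_of_ids, "received": set_of_ids}}
    A set of checkpoints is consistent when this result is empty.
    """
    all_sent = set()
    for cp in checkpoints.values():
        all_sent |= cp["sent"]
    orphans = set()
    for cp in checkpoints.values():
        orphans |= cp["received"] - all_sent
    return orphans


def find_lost(checkpoints):
    """Return ids recorded as sent but not recorded as received (allowed, but lost)."""
    all_sent, all_received = set(), set()
    for cp in checkpoints.values():
        all_sent |= cp["sent"]
        all_received |= cp["received"]
    return all_sent - all_received


consistent = {
    "X": {"sent": {"m1"}, "received": set()},
    "Y": {"sent": set(), "received": set()},
}
inconsistent = {
    "X": {"sent": set(), "received": set()},
    "Y": {"sent": set(), "received": {"m1"}},
}
print(find_orphans(consistent), find_lost(consistent))
print(find_orphans(inconsistent))
```

In the first set m1 is only a lost message, so the set is still consistent; in the second, Y remembers receiving a message that X never sent, so m1 is an orphan.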

  11. Synchronous Checkpointing • Assumption • FIFO delivery of messages • no lost message • Operations • an initiating node P broadcasts a message • all the other nodes • take temporary checkpoints if necessary • reply OK to P • do not send any message until they hear from P • P broadcasts either • GO: if all the nodes reply OK to P • Fail: otherwise • Nodes make the temporary checkpoint permanent or discard it • start to send messages from this point
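
The two rounds of this protocol can be sketched as a small single-process simulation. Everything here is an assumption for illustration: the `Node` class, the `can_checkpoint` flag that models a node unable to reply OK, and the `coordinate` function that plays the role of P. Real message passing is replaced by direct method calls.

```python
class Node:
    """Toy participant in synchronous checkpointing (illustrative only)."""

    def __init__(self, name, can_checkpoint=True):
        self.name = name
        self.can_checkpoint = can_checkpoint
        self.state = 0          # application state to be saved
        self.tentative = None   # temporary checkpoint
        self.permanent = None   # last permanent checkpoint
        self.blocked = False    # True while sends are suspended

    def on_request(self):
        # Round 1: take a temporary checkpoint and stop sending messages.
        if not self.can_checkpoint:
            return False
        self.tentative = self.state
        self.blocked = True
        return True

    def on_decision(self, go):
        # Round 2: make the temporary checkpoint permanent (GO) or discard it (Fail).
        if go and self.tentative is not None:
            self.permanent = self.tentative
        self.tentative = None
        self.blocked = False


def coordinate(nodes):
    """Play the initiator P: collect replies, then broadcast GO or Fail."""
    replies = [node.on_request() for node in nodes]
    go = all(replies)
    for node in nodes:
        node.on_decision(go)
    return go


a, b = Node("a"), Node("b")
a.state, b.state = 5, 7
coordinate([a, b])  # both reply OK, so both checkpoints become permanent
```

Because nodes stay blocked between the two rounds, no message can cross the checkpoint line, which is why the resulting set is consistent.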

  12. Synchronous Checkpointing • advantages • easy recovery: all processes restart from the checkpoint • disadvantages • message overhead • hinders normal progress (no computational messages are allowed during checkpointing)

  13. Asynchronous Checkpointing • checkpoint at each node is made independently • no guarantee of a consistent set • recovery is complex because it must find the nearest consistent set • optimization: all incoming messages are logged after a checkpoint • recovery algorithm analyzes the log and finds the most recent consistent set of checkpoints

  14. Asynchronous Checkpointing(2) • Y crashes • Y restarts from the last checkpoint • sends ROLLBACK(Y,2) to X since the last checkpoint records that Y has sent 2 msgs to X • ROLLBACK(Y,1) to Z (red lines) • other nodes send back ROLLBACK msgs similarly (blue lines) • X sends out (X,2), (X,0) to Y and Z, respectively • each node chooses its chkpnt so as to prevent orphan msgs (red brackets) • number of received msgs from i recorded in the chkpnt < N, where ROLLBACK(i,N) msg has arrived • loop until a consistent set of checkpoints comes up • bounded by N (?) • [Figure: timelines of X, Y, and Z with bracketed checkpoints and a crash mark]

  15. Free Transactions with Rio Vista • crash taxonomy • hardware: not frequent • software: frequent due to bugs in OS • power: UPS • motivations • transactions are useful but have high overhead (disk accesses) • file cache is useful, but vulnerable to system crashes

  16. Traditional Approach: RVM • at the beginning of a transaction, RVM copies the page to undo log(shadow page) • user abort is serviced by the undo log • at commit, RVM reclaims undo space, and writes updated pages to redo log on disk • system/process failure is serviced by the redo log • at leisure time, database is updated from the redo log
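
The interplay of the undo log and the redo log can be sketched as follows. This is a toy model, not the RVM API: the class `ToyRVM` and its methods are hypothetical, dictionaries stand in for the on-disk database and redo log, and the undo copy is taken on the first write to a page rather than at the start of the transaction.

```python
class ToyRVM:
    """Toy model of undo/redo logging (illustrative only, not the RVM library)."""

    def __init__(self):
        self.database = {}  # stands in for the on-disk database
        self.redo_log = []  # stands in for the on-disk redo log
        self.memory = {}    # in-memory image, lost at a crash
        self.undo = None    # in-memory undo copies for the open transaction

    def begin(self):
        self.undo = {}

    def write(self, page, value):
        # Save the old contents once per page (assumption: copy on first write).
        if page not in self.undo:
            self.undo[page] = (page in self.memory, self.memory.get(page))
        self.memory[page] = value

    def abort(self):
        # User abort is serviced by the undo log.
        for page, (existed, old) in self.undo.items():
            if existed:
                self.memory[page] = old
            else:
                self.memory.pop(page, None)
        self.undo = None

    def commit(self):
        # Write the updated pages to the redo log, then reclaim undo space.
        self.redo_log.append({page: self.memory[page] for page in self.undo})
        self.undo = None

    def truncate(self):
        # "At leisure time": apply the redo log to the database.
        for record in self.redo_log:
            self.database.update(record)
        self.redo_log.clear()

    def recover(self):
        # After a crash, rebuild memory from the database plus the redo log.
        self.memory = dict(self.database)
        for record in self.redo_log:
            self.memory.update(record)
        self.undo = None


rvm = ToyRVM()
rvm.begin()
rvm.write("p", 1)
rvm.commit()
rvm.begin()
rvm.write("p", 2)  # crash before commit
rvm.recover()      # the uncommitted write is gone, the committed one survives
```

Each commit in this model costs a write to the redo log, which is the disk overhead that the Rio Vista slides aim to remove.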

  17. Rio file cache • protect cached data from system crashes • cache is as reliable as a disk • then, write ahead log for recovery is not needed • writes to disk can be delayed infinitely • OS errors can corrupt any part of the system • the issue is how to reduce the chances • at a crash • warm reboot process writes the cache to disk

  18. file cache vs disk • why do people view memory as more vulnerable than disk? • memory access is a simple write • an error in the address bits will overwrite the file cache • interface to access disk is complex and explicit • hardware controller is accessed only through device driver • calls to device drivers are checked for their arguments • it is extremely unlikely that accidental errors can forge the logic of device driver

  19. How to protect from system crashes? • prevent OS from accidentally overwriting the file cache • virtual memory mapping • turn off the write-permission bits in the page table for the pages in the file cache • unauthorized accesses will encounter protection violation • file cache module enables the bit before writing and disables the bit afterwards • the file cache is vulnerable to crashes while being written • disk has the same problem • solutions • verify after writes • use shadow copy for atomic writes
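
The enable-write, write, disable-write discipline can be mimicked in user code. This is only a simulation under stated assumptions: a boolean flag stands in for the write-permission bits in the page table, and the class `ProtectedCache` and its methods are hypothetical.

```python
from contextlib import contextmanager


class ProtectedCache:
    """Toy file cache that rejects writes outside an explicit write window."""

    def __init__(self):
        self._pages = {}
        self._writable = False  # stands in for the page-table write bit

    @contextmanager
    def write_window(self):
        # The file cache module enables writing, then always disables it again.
        self._writable = True
        try:
            yield
        finally:
            self._writable = False

    def store(self, page, data):
        if not self._writable:
            # Stands in for the protection violation raised by hardware.
            raise PermissionError("file cache is write-protected")
        self._pages[page] = data

    def load(self, page):
        return self._pages[page]


cache = ProtectedCache()
with cache.write_window():
    cache.store(0, b"block contents")
# A stray cache.store(...) here would raise PermissionError.
```

The window is still a period of vulnerability, as the slide notes; the sketch narrows it but does not remove it.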

  20. How to protect from system crashes? • some kernels bypass the address translations (TLB) • many systems can disable such bypasses • otherwise, code insertion (sandboxing) • check for every kernel write using physical address • 20-50% slower • memory-mapped file • kernel procedures that modify the memory-mapped file should be changed as above • faulty user program can still corrupt files to which it has write access

  21. Warm Reboot • Recovery needs to access many data structures • internal file cache lists • page tables (memory-mapped files) • all these data must be protected from crash but they are scattered inside the kernel • Registry • a separate physical memory region • contains all the information to recover the file cache • it is updated only when a buffer is replaced (reloaded)

  22. File System Modifications • writes to disk can be saved • most disk writes are reliability-induced • writes to disk are needed only when the file cache overflows • writing back dirty copies when the system is idle • reduces the time when a buffer is replaced

  23. Vista Recoverable Memory

  24. Recovery • operations • prepare undo log • writes directly to DB’s mapped image in Rio • these updates are persistent • at commit, discard the undo log • at abort, restore the undo log to the mapped DB • at recovery • Rio writes back Vista segments that were mapped at the time of crash • Vista examines each segment for uncommitted transactions • roll back (restore undo log) • recovery process should be idempotent • crash can happen while recovering
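
The operations above, including the idempotence requirement, can be sketched with a toy model. It assumes that both dictionaries below live in memory that survives a crash (the role Rio plays); the class `ToyVista` and its methods are hypothetical and not the Vista API.

```python
class ToyVista:
    """Toy model of undo-only transactions over persistent memory (illustrative)."""

    def __init__(self):
        # Assumption: both structures survive a crash, as Rio memory would.
        self.db = {}
        self.undo_log = []  # entries: (key, key_existed_before, old_value)

    def write(self, key, value):
        # Prepare the undo record first, then update the mapped image in place.
        self.undo_log.append((key, key in self.db, self.db.get(key)))
        self.db[key] = value

    def commit(self):
        # Updates are already persistent; committing only discards the undo log.
        self.undo_log.clear()

    def abort(self):
        self._restore()

    def recover(self):
        # A non-empty undo log after a crash means an uncommitted transaction.
        self._restore()

    def _restore(self):
        # Apply records newest first and clear the log only at the end, so a
        # crash in the middle can be followed by running the restore again.
        for key, existed, old in reversed(self.undo_log):
            if existed:
                self.db[key] = old
            else:
                self.db.pop(key, None)
        self.undo_log.clear()


vista = ToyVista()
vista.write("a", 1)
vista.commit()
vista.write("a", 2)  # crash before commit
vista.recover()      # "a" is restored to the committed value
vista.recover()      # running recovery again changes nothing
```

Restoring absolute old values, rather than applying inverse operations, is what makes repeated recovery safe in this sketch.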

  25. Persistent Heap • only transactions can use it • when they abort, all the used heap space is returned • undo records mentioned above are stored here • programs can store their original data structures • usually they convert them to record style when stored in a file • metadata for the heap is in user space • why? • needs protection from corruption • reduce the risk by using isolated range of addresses • software fault isolation • virtual memory protection

  26. Fault Tolerance with DSM • DSM maintains multiple copies of a page • if a copy is lost, it can be recovered from another copy • maintain at least two copies for each page • cope with a single failure • can be extended to cope with n failures • what about state information? • can be rebuilt
