Recovery in main memory databases



- Le Gruenwald, Jing Huang, Margaret H. Dunham et al. -

Engineering Intelligent Systems, Vol.4, No. 3, September 1996

이 인선 97/08/21



Introduction

  • General MMDB Architecture

    • Main Memory (MM) in RAM memory

    • Stable Memory(SM)

      • optional nonvolatile memory

      • used to hold log buffers(log tail)

      • avoids I/O when transactions commit

      • essential to performance

    • Archive Memory(AM) holds a backup of the entire database

  • focus on logging, checkpointing, reloading


MMDB Logging(1)

  • physical logging

    • the database state modified by an operation is logged

    • it is recommended for MMDB systems

  • logical logging

    • contains descriptions of higher level operations and records the state transition of the database

    • the idempotent property does not hold
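The idempotence point can be illustrated with a minimal sketch. The toy dictionary database and the two redo functions below are hypothetical and not taken from the paper; the logical operation is assumed to be an increment.

```python
# Hypothetical toy database: a dict from item name to value.
def redo_physical(db, record):
    """Physical redo: install the logged after-image."""
    item, after_image = record
    db[item] = after_image


def redo_logical(db, record):
    """Logical redo: re-execute the logged operation (assumed: an increment)."""
    item, delta = record
    db[item] += delta


physical_db = {"x": 10}
logical_db = {"x": 10}

# Replay each record twice, as can happen when recovery itself is interrupted.
for _ in range(2):
    redo_physical(physical_db, ("x", 15))
    redo_logical(logical_db, ("x", 5))

print(physical_db["x"])  # 15: replaying an after-image is harmless
print(logical_db["x"])   # 20: replaying the increment applies it twice
```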


MMDB Logging(2)

  • Logging rules

    • Write Ahead Rule

      • undo-log data must be written to a nonvolatile memory prior to the updating in the database

    • Commit rule

      • a transaction may commit only once its redo-log data is in nonvolatile storage

    • Logging After Writing (LAW)

      • the after image of an updated item should be written to the log after its corresponding update is propagated to the database

      • simplifies the log processing with a fuzzy checkpointing MMDB
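A rough sketch of how the three rules order the steps of one update. `ToyMMDB` and `stable_log` are hypothetical names; the list stands in for a nonvolatile log buffer, and "the database" is assumed to mean the in-memory copy.

```python
class ToyMMDB:
    """Toy model: `stable_log` stands in for the nonvolatile log buffer."""

    def __init__(self):
        self.db = {}
        self.stable_log = []

    def update(self, txn, item, new_value):
        # Write Ahead Rule: the undo (before-image) record reaches
        # nonvolatile memory before the database is changed.
        self.stable_log.append(("undo", txn, item, self.db.get(item)))
        self.db[item] = new_value
        # Logging After Writing: the after-image is logged only after
        # the update has been applied to the database.
        self.stable_log.append(("redo", txn, item, new_value))

    def commit(self, txn):
        # Commit rule: every redo record of txn is already in stable_log,
        # so commit needs no extra I/O in this model.
        self.stable_log.append(("commit", txn))


mmdb = ToyMMDB()
mmdb.update("T1", "x", 1)
mmdb.commit("T1")
kinds = [record[0] for record in mmdb.stable_log]
print(kinds)  # ['undo', 'redo', 'commit']
```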


MMDB Logging(3)

  • MMDB logging differs from DRDB (disk-resident database) logging in three ways

    • a nonvolatile log buffer should be used to satisfy WAL without requiring I/O prior to transaction commit

    • physical logging is recommended as it is easier to use with fuzzy checkpointing

    • to reduce the amount of the log needed to redo transactions after a system failure, the LAW policy should be followed


Checkpointing DRDB

  • Commit consistent checkpointing

    • periodically stop processing transactions

    • flush all dirty cache slots and mark the log

  • cache consistent checkpointing

  • fuzzy checkpointing

    • only flushes those dirty slots that have not been flushed since before the previous checkpoint

    • normal replacement activity will flush most cache slots that were dirty since before the previous checkpoint

    • the checkpoint won’t have much flushing to do and won’t delay active transactions for very long
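The flushing rule for fuzzy checkpointing can be sketched as follows. The cache model (slot id mapped to the logical time the slot became dirty) and the function name are assumptions made for illustration.

```python
def fuzzy_checkpoint(cache, prev_checkpoint_time):
    """Flush only slots that have been dirty since before the previous checkpoint.

    `cache` maps slot id -> time the slot became dirty (None if clean).
    Returns the list of flushed slot ids.
    """
    flushed = []
    for slot, dirty_since in cache.items():
        if dirty_since is not None and dirty_since < prev_checkpoint_time:
            flushed.append(slot)
            cache[slot] = None  # the slot is clean after the flush
    return flushed


# A was dirtied at t=3, B at t=12, C is clean; the previous checkpoint was at t=10.
cache = {"A": 3, "B": 12, "C": None}
print(fuzzy_checkpoint(cache, prev_checkpoint_time=10))  # ['A']
```

Slot B stays dirty and is left for normal replacement or the next checkpoint, which is why the checkpoint itself has little to flush.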


Checkpointing MMDBs(1)

  • Focuses on low-interference with normal transactions and supporting efficient recovery

  • Fuzzy checkpointing

    • Hagmann

      • first suggested using fuzzy checkpointing for MMDBs

      • “a crash recovery scheme for a memory-resident database system”, IEEE Transactions on Computers, Vol. C-35, No. 9, September 1986

      • the checkpointer does not need to obtain the locks on the data items to be checkpointed

      • the database is dumped in sections

      • after dumping a section, the checkpointer writes a log record to the log

      • a section must not overwrite its previous image (sliding monoplexed backups)
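A loose sketch of a sectioned, lock-free dump. The data layout is hypothetical: keeping the newest two images per section is a simplification of the "do not overwrite the previous image" rule, not a model of the actual sliding backup layout on disk.

```python
def dump_sections(memory, sections, backup, log):
    """Dump `memory` section by section without taking locks.

    `backup` maps section index -> list of images, newest last.
    """
    for idx, items in enumerate(sections):
        image = {k: memory[k] for k in items}      # unsynchronized copy
        backup.setdefault(idx, []).append(image)   # written beside the old image
        del backup[idx][:-2]                       # keep only the latest two images
        log.append(("section_dumped", idx))        # log record after each section


memory = {"a": 1, "b": 2}
sections = [["a"], ["b"]]
backup, log = {}, []

dump_sections(memory, sections, backup, log)
memory["a"] = 5
dump_sections(memory, sections, backup, log)
print(backup[0])  # [{'a': 1}, {'a': 5}]
```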


LAW with fuzzy checkpointing


Checkpointing MMDBs(2)

  • Salem and Garcia-Molina

    • “checkpointing memory-resident databases”(‘89)

    • compared the fuzzy checkpointing scheme with two non-fuzzy checkpointing schemes

    • fuzzy checkpointing is the most efficient one

    • ping-pong scheme

      • each dirty page is flushed twice

  • Lin and Dunham

    • “segmented fuzzy checkpointing for main memory databases”(‘94)

    • checkpoints one segment at a time in a round-robin fashion

    • automatically changes the segment boundaries based on the distribution of update operations
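A minimal sketch of round-robin segmented checkpointing. The class and its fields are hypothetical, and the automatic adjustment of segment boundaries is deliberately left out.

```python
class SegmentedCheckpointer:
    """Checkpoints one fixed segment per call, in round-robin order."""

    def __init__(self, segments):
        self.segments = segments   # list of lists of item names
        self.next_segment = 0

    def checkpoint(self, memory, backup):
        seg = self.next_segment
        for item in self.segments[seg]:
            backup[item] = memory[item]   # copy this segment only
        self.next_segment = (seg + 1) % len(self.segments)
        return seg


memory = {"a": 1, "b": 2, "c": 3}
backup = {}
checkpointer = SegmentedCheckpointer([["a", "b"], ["c"]])
order = [checkpointer.checkpoint(memory, backup) for _ in range(3)]
print(order)  # [0, 1, 0]
```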


Checkpointing MMDBs(3)





[Figure: redo log size in segmented fuzzy checkpointing]

  • Li et al

    • “checkpointing and recovery in partitioned main memory databases” (‘95)

    • the database is divided into partitions, each of which has its own log disks

    • the time to recover from a system failure is reduced









Checkpointing MMDBs(4)

  • Non-Fuzzy Checkpointing

    • overhead comes from locking the checkpointed objects to ensure transaction-consistency or action-consistency

    • Lehman and Carey

      • “a recovery algorithm for a high-performance memory-resident database system”(‘87)

      • transaction-consistent(at relation level)scheme

      • no need to maintain undo-log-records in nonvolatile storage

      • checkpointing increases data contention with normal transactions


Checkpointing MMDBs(5)

  • Salem and Garcia-Molina

    • “checkpointing memory-resident databases” (‘89)

    • discuss two non-fuzzy checkpointing approaches

      • the first(black and white) one aborts some update transactions

      • the second (Copy-On-Update) requires some update transactions to save the original values of the data items they update

      • both have severe impact on the system performance
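The copy-on-update idea can be sketched roughly as below. The class is hypothetical and simplified: it assumes a fixed set of items and saves the old value on the first update during a checkpoint, without tracking which items the checkpointer has already written.

```python
class CopyOnUpdateDB:
    """Toy copy-on-update checkpointing over a fixed set of items."""

    def __init__(self, data):
        self.data = dict(data)
        self.checkpoint_active = False
        self.saved = {}   # original values of items updated during the checkpoint

    def begin_checkpoint(self):
        self.checkpoint_active = True
        self.saved = {}

    def update(self, item, value):
        # The updating transaction pays the cost of saving the old value.
        if self.checkpoint_active and item not in self.saved:
            self.saved[item] = self.data[item]
        self.data[item] = value

    def end_checkpoint(self):
        # The snapshot reflects the state at begin_checkpoint:
        # saved originals take precedence over current values.
        snapshot = {k: self.saved.get(k, v) for k, v in self.data.items()}
        self.checkpoint_active = False
        self.saved = {}
        return snapshot


db = CopyOnUpdateDB({"x": 1, "y": 2})
db.begin_checkpoint()
db.update("x", 10)
db.update("x", 20)
snapshot = db.end_checkpoint()
print(snapshot)  # {'x': 1, 'y': 2}
```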

  • Jagadish et al

    • “recovering from main-memory lapses” (‘93)

    • propose an action-consistent checkpointing scheme

    • the undo-logs of active transactions are first written to the log, and then dirty pages are flushed to disk

    • during normal processing, the redo-logs of the committed transactions are written to the log

    • ping-pong update

    • this approach was originally used in Dali


Checkpointing MMDBs(6)

  • Log-driven checkpointing

    • applies the log to a previous dump to generate a new dump

    • originally used to generate remote backup of the database

    • is adopted in “incremental recovery in main memory database systems” (‘92)

    • with high transaction processing rate in MMDBs, the size of the log can increase rapidly

    • it is quite inefficient compared to fuzzy checkpointing
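A minimal sketch of log-driven checkpointing, assuming a hypothetical log format of `(item, after_image)` pairs in commit order.

```python
def log_driven_checkpoint(previous_dump, log):
    """Build a new dump by replaying the log over a copy of the previous dump."""
    new_dump = dict(previous_dump)
    for item, after_image in log:
        new_dump[item] = after_image
    return new_dump


previous_dump = {"x": 1, "y": 2}
log = [("x", 5), ("z", 9), ("x", 7)]
new_dump = log_driven_checkpoint(previous_dump, log)
print(new_dump)  # {'x': 7, 'y': 2, 'z': 9}
```

The cost is proportional to the length of the log rather than to the number of dirty pages, which is consistent with the inefficiency noted above when the log grows quickly.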


MMDB Reloading(1)

  • Issues

    • occurrence frequency of the reload process

      • on average, a system failure occurs once every few weeks

      • media failure, MM page faults

    • when the system should resume its execution after a failure

      • 28.43 minutes are needed to recover a 1 GB database [?]

      • if the system is not available at all during recovery, many transactions will be backlogged

    • reload prioritization

      • reload priority can be determined based on access frequency, transaction deadline(“MMDB reload algorithms”) or temporal data interval from real-time applications[?]


MMDB Reloading(2)

  • Existing reload schemes

    • simple reloading

      • the system can not be brought online until the entire database is memory-resident

    • concurrent reloading

      • Gruenwald

        • “mmdb reload algorithms” (‘91)

        • two processors(RP & DP), nonvolatile shadow memory(SM) and dual address translation mechanism in the MARS system

        • ordered reload with prioritization/ smart reload/ frequency reload

        • the differences lie in the structure of AM, utilization of data access frequency, reload prioritization, and reload granularity

        • the frequency reload yields the best transaction response time and system throughput
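Frequency-based prioritization can be sketched as a simple ordering of pages by access count. The function name, the per-page counters, and the tie-break on page id are assumptions, not details of the MARS algorithms.

```python
def reload_order(access_counts):
    """Return page ids ordered by descending access frequency (ties by page id)."""
    return sorted(access_counts, key=lambda page: (-access_counts[page], page))


access_counts = {"p1": 5, "p2": 40, "p3": 12}
order = reload_order(access_counts)
print(order)  # ['p2', 'p3', 'p1']
```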


MMDB Reloading(3)

  • Lehman and Carey

    • “a recovery algorithm for a high-performance memory-resident database system” (‘87)

    • after the system catalogs and their indices are reloaded then regular transaction processing is allowed to resume

  • Levy and Silberschatz

    • “incremental recovery in main memory database systems”, (‘92)

    • resumes transaction processing immediately after a system failure and recovers pages individually, on demand from post-crash transactions

    • Stale/fresh marking technique

    • in order to implement a page-based recovery, log records must be grouped together on a page basis during normal operation
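On-demand, page-based recovery can be sketched as follows. The class and the per-page log of after-images are hypothetical; a page counts as stale until its first post-crash access brings it into memory.

```python
class IncrementalRecovery:
    """Recovers each page on first access after a crash."""

    def __init__(self, backup, log_by_page):
        self.backup = backup             # page id -> page image in the archive
        self.log_by_page = log_by_page   # page id -> after-images, oldest first
        self.memory = {}                 # only fresh pages live here

    def read(self, page):
        if page not in self.memory:      # the page is still stale
            image = self.backup[page]
            for after_image in self.log_by_page.get(page, []):
                image = after_image      # replay this page's log records
            self.memory[page] = image    # the page is now fresh
        return self.memory[page]


recovery = IncrementalRecovery(
    backup={"p1": "old1", "p2": "old2"},
    log_by_page={"p1": ["v1", "v2"]},
)
print(recovery.read("p1"))      # v2
print(sorted(recovery.memory))  # ['p1']
```

Grouping log records by page during normal operation is what makes the per-page replay in `read` possible without scanning the whole log.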


Recovery with Existing MMDB Systems(1)

  • Dali from AT&T

    • the original recovery manager was implemented according to “recovering from main-memory lapses” (‘93)

      • logging only redo records during normal execution

      • segment-level action-consistent checkpoints

      • the checkpointer writes the relevant parts of the undo log to disk

      • recovery has only a single pass over the log

      • requires no special hardware to preserve the data

    • testing led to a restructuring of its recovery manager

      • “multi-level recovery in the Dali storage manager” (‘95)

      • multi-level logging, post-commit actions, dirty page detection, and fuzzy checkpoints


Recovery with Existing MMDB Systems(2)

  • Fast Path

    • supports the memory-resident data and disk-resident data

    • performs updates to memory resident data at commit time

    • no undo operations are required when a failure occurs

    • a group commit is adopted

    • a transaction-consistent backup copy of the database is refreshed at system shutdown or by infrequent checkpoints

    • two backup databases are kept and updated in ping-pong fashion
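Why commit-time updates remove the need for undo can be shown with a small sketch. `DeferredUpdateTxn` is a hypothetical class, not Fast Path's actual interface; group commit and logging are omitted.

```python
class DeferredUpdateTxn:
    """Buffers writes privately and applies them to the database only at commit."""

    def __init__(self, db):
        self.db = db
        self.pending = {}

    def write(self, item, value):
        self.pending[item] = value   # the shared database is untouched

    def read(self, item):
        return self.pending.get(item, self.db.get(item))

    def commit(self):
        self.db.update(self.pending)  # updates reach the database at commit time
        self.pending = {}

    def abort(self):
        self.pending = {}             # nothing to undo in the database


db = {"x": 1}
txn = DeferredUpdateTxn(db)
txn.write("x", 2)
txn.abort()
print(db)  # {'x': 1}
```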


Recovery with Existing MMDB Systems(3)

  • two real-time system examples

    • NEC Real-Time DBMS

    • Stone RTDB

  • NEC RTDBMS has several features to ensure high throughput and accurate predictability

    • no page fault

    • in-memory log buffer is nonvolatile

    • physical logging using deferred update

    • fuzzy checkpointing

    • no real-time characteristics such as transaction deadline and criticalness are utilized in the recovery components


Summary and Conclusion

  • Discussed 3 logging rules

    • nonvolatile log buffer should be used to satisfy WAL without requiring I/O prior to transaction commit

    • LAW should be followed to reduce the amount of log needed to redo transactions after a system failure

  • described three groups of checkpointing

  • identified 3 issues about reloading

    • data should be prioritized for reload purposes

  • future research

    • investigate how real-time requirements such as transaction deadline and temporal data intervals can be incorporated into MMDB recovery


a crash recovery scheme for a memory-resident database system

Robert B. Hagmann

IEEE Transactions on Computers, Vol. C-35, No. 9, September 1986



  • Presents a method of doing recovery that uses the existing techniques of fuzzy dumps and log compression

  • design requirement

    • small system example

      • 2 pages/transaction *100 transactions/s * 3600s /h * 8h = 5,760,000 pages written to the log

    • transactions must be short

    • checkpointed periodically every five minutes
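The small-system figure can be checked with a short calculation. The per-checkpoint-interval number is derived here from the five-minute period and is not stated on the slide.

```python
pages_per_txn = 2
txns_per_second = 100
seconds_per_hour = 3600
hours = 8

# Log volume over an eight-hour period, as on the slide.
pages_logged = pages_per_txn * txns_per_second * seconds_per_hour * hours
print(pages_logged)  # 5760000

# Derived figure: log pages written between two checkpoints five minutes apart.
checkpoint_interval_seconds = 5 * 60
pages_per_interval = pages_per_txn * txns_per_second * checkpoint_interval_seconds
print(pages_per_interval)  # 60000
```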

Overview(2)


  • The principal requirement of the system is “fast” recovery from a system crash

    • critical factor : transfer rate of the disk

    • can be improved by using several parallel processors

  • design overview

    • fuzzy dump

      • simply a copy of the database taken without any synchronization

    • If a DBMS uses a nonvolatile storage, some log compression can occur

    • else precommitting and group commits can be used to increase performance

Design details
