Recovery in Main Memory Databases

Recovery in Main Memory Databases - Le Gruenwald, Jing Huang, Margaret H. Dunham et al. - Engineering Intelligent Systems, Vol. 4, No. 3, September 1996 - 이 인선, 97/08/21

Introduction • General MMDB architecture • Main Memory (MM) in RAM • Stable Memory (SM) • optional nonvolatile memory • used to hold log buffers (the log tail) • avoids I/O when transactions commit • essential to performance • Archive Memory (AM) holds a backup of the entire database • the focus is on logging, checkpointing, and reloading

MMDB Logging(1) • physical logging • the state of the database modified by an operation is logged • it is recommended for MMDB systems • logical logging • contains descriptions of higher-level operations and records the state transitions of the database • the idempotent property does not hold, so redoing a logical record more than once can change the result

MMDB Logging(2) • Logging rules • Write Ahead Rule • undo-log data must be written to nonvolatile memory before the update is applied to the database • Commit rule • a DBMS may let a transaction commit only once its redo-log data is safely in nonvolatile storage • Logging After Writing (LAW) • the after image of an updated item should be written to the log after its corresponding update is propagated to the database • simplifies log processing in an MMDB that uses fuzzy checkpointing
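The three logging rules can be illustrated with a minimal Python sketch. Everything here is hypothetical: `StableLog` stands in for a nonvolatile log buffer, `TinyMMDB` is a toy in-memory store, and "the database" in the LAW rule is read as the in-memory copy. It is not the paper's implementation, only one way to order the log writes.

```python
class StableLog:
    """Hypothetical stand-in for a nonvolatile log buffer."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)


class TinyMMDB:
    """Toy main-memory database used only to show the rule ordering."""

    def __init__(self):
        self.data = {}          # volatile main-memory database
        self.log = StableLog()  # assumed to survive a crash

    def update(self, txn, key, new_value):
        # Write Ahead Rule: the before image (undo data) is logged first.
        self.log.append(("undo", txn, key, self.data.get(key)))
        self.data[key] = new_value
        # Logging After Writing: the after image (redo data) is logged
        # only once the update has been applied to the database.
        self.log.append(("redo", txn, key, new_value))

    def commit(self, txn):
        # Commit rule: the redo data is already in the stable log, so the
        # commit record can be added without waiting for disk I/O.
        self.log.append(("commit", txn))


db = TinyMMDB()
db.update("T1", "x", 10)
db.commit("T1")
kinds = [record[0] for record in db.log.records]
```

Here `kinds` lists the record types in the order the rules force them to be written.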

MMDB Logging(3) • MMDB logging differs from disk-resident database (DRDB) logging in three ways • a nonvolatile log buffer should be used to satisfy the Write Ahead Rule (WAL) without requiring I/O prior to transaction commit • physical logging is recommended because it is easier to use with fuzzy checkpointing • to reduce the amount of log needed to redo transactions after a system failure, the Logging After Writing (LAW) policy should be followed

Checkpointing DRDB • commit-consistent checkpointing • periodically stop processing transactions • flush all dirty cache slots and mark the log • cache-consistent checkpointing • fuzzy checkpointing • flushes only those dirty slots that have not been flushed since before the previous checkpoint • normal replacement activity will already have flushed most cache slots that were dirty since before the previous checkpoint • so the checkpoint has little flushing to do and does not delay active transactions for long

Checkpointing MMDBs(1) • Focuses on low interference with normal transactions and on supporting efficient recovery • Fuzzy checkpointing • Hagmann • first suggested using fuzzy checkpointing for MMDBs • “A crash recovery scheme for a memory-resident database system” • IEEE Transactions on Computers, Vol. C-35, No. 9, September 1986 • the checkpointer does not need to obtain locks on the data items to be checkpointed • the database is dumped in sections • after dumping a section, the checkpointer writes a log record to the log • a section must not overwrite its previous image (sliding monoplexed backups)
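The section-by-section dump can be sketched in Python as follows. This is a simplification under stated assumptions: it alternates between two complete backup copies so that a section never overwrites its previous image, which is not the sliding monoplexed layout named on the slide. The function name, the section map, and the log record tuples are all hypothetical.

```python
def fuzzy_checkpoint(memory, sections, backups, current, log):
    """Dump `memory` section by section into the backup copy that does
    NOT hold the previous checkpoint image, and return its index."""
    target = 1 - current
    log.append(("begin_ckpt", target))
    for name, keys in sections.items():
        # No locks are taken: values are copied as they are right now,
        # so the image may mix states from different transactions.
        backups[target][name] = {k: memory[k] for k in keys if k in memory}
        # A log record marks the completion of each dumped section.
        log.append(("section_done", target, name))
    log.append(("end_ckpt", target))
    return target


memory = {"a": 1, "b": 2, "c": 3}
sections = {"s0": ["a", "b"], "s1": ["c"]}
backups = [{}, {}]
log = []

current = fuzzy_checkpoint(memory, sections, backups, 0, log)
memory["a"] = 99  # an update between the two checkpoints
current = fuzzy_checkpoint(memory, sections, backups, current, log)
```

After the second call the older image of section `s0` is still intact in the other backup copy, which is the property the last bullet asks for.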

LAW with fuzzy checkpointing

Checkpointing MMDBs(2) • Salem and Garcia-Molina • “Checkpointing memory-resident databases” ('89) • compared the fuzzy checkpointing scheme with two non-fuzzy checkpointing schemes • fuzzy checkpointing is the most efficient one • ping-pong scheme • each dirty page is flushed twice • Lin and Dunham • “Segmented fuzzy checkpointing for main memory databases” ('94) • checkpoints one segment at a time in a round-robin fashion • automatically changes the segment boundaries based on the distribution of update operations
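The round-robin part of segmented checkpointing can be sketched as below. The class, its fields, and the idea of remembering a log position per segment are illustrative assumptions, and the automatic adjustment of segment boundaries is deliberately left out.

```python
from itertools import cycle


class SegmentedCheckpointer:
    """Hypothetical sketch: checkpoint one segment per call, round-robin."""

    def __init__(self, segments):
        self.segments = segments                 # segment name -> list of keys
        self._order = cycle(sorted(segments))    # fixed round-robin order
        self.backup = {}                         # segment name -> dumped image
        self.ckpt_lsn = {}                       # log position at last dump

    def checkpoint_next(self, memory, current_lsn):
        name = next(self._order)
        keys = self.segments[name]
        self.backup[name] = {k: memory[k] for k in keys if k in memory}
        # Assumption: redo for this segment would start from this position.
        self.ckpt_lsn[name] = current_lsn
        return name


segments = {"s0": ["a"], "s1": ["b"], "s2": ["c"]}
memory = {"a": 1, "b": 2, "c": 3}
ckpt = SegmentedCheckpointer(segments)
order = [ckpt.checkpoint_next(memory, lsn) for lsn in range(4)]
```

With three segments and four calls, the first segment is checkpointed a second time and its recorded log position moves forward.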

Checkpointing MMDBs(3) • [Figure: redo log size in segmented fuzzy checkpointing] • Li et al. • “Checkpointing and recovery in partitioned main memory databases” ('95) • the database is divided into partitions, each of which has its own log disks • the time to recover from a system failure is reduced

Checkpointing MMDBs(4) • Non-fuzzy checkpointing • overhead comes from locking the checkpointed objects to ensure transaction consistency or action consistency • Lehman and Carey • “A recovery algorithm for a high-performance memory-resident database system” ('87) • transaction-consistent (at relation level) scheme • no need to maintain undo-log records in nonvolatile storage • checkpointing increases data contention with normal transactions

Checkpointing MMDBs(5) • Salem and Garcia-Molina • “Checkpointing memory-resident databases” ('89) • discuss two non-fuzzy checkpointing approaches • the first (black and white) aborts some update transactions • the second (copy-on-update) requires some update transactions to store the original values of the data items they update • both have a severe impact on system performance • Jagadish et al. • “Recovering from main-memory lapses” ('93) • propose an action-consistent checkpointing scheme • the undo logs of active transactions are first written to the log, and then dirty pages are flushed to disk • during normal processing, the redo logs of committed transactions are written to the log • ping-pong update • this approach was originally used in Dali
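A minimal Python sketch of the copy-on-update idea follows. It assumes a single-threaded toy store in which the first update to an item during a checkpoint saves the item's original value; the class and method names are hypothetical and the sketch ignores locking and transaction boundaries entirely.

```python
_MISSING = object()  # marks keys that did not exist when the checkpoint began


class CopyOnUpdateDB:
    """Toy store: updates during a checkpoint preserve the old values."""

    def __init__(self, data):
        self.data = dict(data)
        self.checkpointing = False
        self.old_values = {}

    def begin_checkpoint(self):
        self.checkpointing = True
        self.old_values = {}

    def update(self, key, value):
        # Only the FIRST update of a key during a checkpoint saves a copy.
        if self.checkpointing and key not in self.old_values:
            self.old_values[key] = self.data.get(key, _MISSING)
        self.data[key] = value

    def end_checkpoint(self):
        # The snapshot uses the saved original where one exists.
        snapshot = {}
        for key, value in self.data.items():
            old = self.old_values.get(key, value)
            if old is not _MISSING:
                snapshot[key] = old
        self.checkpointing = False
        self.old_values = {}
        return snapshot


db = CopyOnUpdateDB({"x": 1, "y": 2})
db.begin_checkpoint()
db.update("x", 10)
db.update("x", 20)
db.update("z", 5)
snapshot = db.end_checkpoint()
```

The extra copy made on the update path is the cost that the slide describes as a severe impact on performance.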

Checkpointing MMDBs(6) • Log-driven checkpointing • applies the log to a previous dump to generate a new dump • originally used to generate a remote backup of the database • is adopted in “Incremental recovery in main memory database systems” ('92) • with the high transaction processing rate of MMDBs, the size of the log can increase rapidly • it is quite inefficient compared to fuzzy checkpointing

MMDB Reloading(1) • Issues • how often the reload process occurs • on average, a system failure occurs once every few weeks • media failure, MM page faults • when the system should resume execution after a failure • 28.43 minutes are needed to recover a 1 GB database [?] • if the system is not available at all during recovery, many transactions will be backlogged • reload prioritization • reload priority can be determined based on access frequency, transaction deadline (“MMDB reload algorithms”), or temporal data intervals from real-time applications [?]

MMDB Reloading(2) • Existing reload schemes • simple reloading • the system cannot be brought online until the entire database is memory-resident • concurrent reloading • Gruenwald • “MMDB reload algorithms” ('91) • two processors (RP & DP), nonvolatile shadow memory (SM), and a dual address translation mechanism in the MARS system • ordered reload with prioritization / smart reload / frequency reload • the differences lie in the structure of AM, utilization of data access frequency, reload prioritization, and reload granularity • the frequency reload yields the best transaction response time and system throughput
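Reload prioritization by access frequency can be sketched in a few lines of Python. The function name and the access counts are made up for illustration, and the sketch covers only the ordering decision, not the reload processors or the archive layout of the schemes above.

```python
def reload_order(access_counts):
    """Return unit names ordered from most to least frequently accessed.

    Ties are broken by name so that the order is deterministic.
    """
    return sorted(access_counts, key=lambda unit: (-access_counts[unit], unit))


# Hypothetical access counts gathered before the failure.
access_counts = {"orders": 120, "catalog": 900, "audit": 3}
order = reload_order(access_counts)
```

Reloading `catalog` first means that the data most transactions are likely to touch becomes memory-resident earliest.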

MMDB Reloading(3) • Lehman • “A recovery algorithm for a high-performance” • regular transaction processing is allowed to resume once the system catalogs and their indices are reloaded • Levy and Silberschatz • “Incremental recovery in main memory database systems” ('92) • resumes transaction processing immediately after a system failure and recovers pages individually according to the demands of post-crash transactions • stale/fresh marking technique • to implement page-based recovery, log records must be grouped together on a page basis during normal operation
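On-demand, page-based recovery with stale/fresh marking can be sketched as follows. This is a minimal illustration under assumptions: pages are plain dictionaries, the per-page log holds only redo records as `(key, value)` pairs, and all names are hypothetical rather than taken from the cited paper.

```python
class IncrementalRecovery:
    """Toy model: a page is recovered the first time a transaction reads it."""

    def __init__(self, backup_pages, log_by_page):
        self.backup = backup_pages      # page id -> dict (archive copy)
        self.log_by_page = log_by_page  # page id -> list of (key, value) redo records
        self.memory = {}                # recovered pages only
        self.fresh = set()              # pages not in this set are stale

    def read(self, page_id, key):
        if page_id not in self.fresh:
            self._recover(page_id)
        return self.memory[page_id][key]

    def _recover(self, page_id):
        # Start from the archive copy, then replay this page's log records.
        page = dict(self.backup.get(page_id, {}))
        for key, value in self.log_by_page.get(page_id, []):
            page[key] = value
        self.memory[page_id] = page
        self.fresh.add(page_id)


backup = {"p1": {"x": 1}, "p2": {"y": 2}}
log_by_page = {"p1": [("x", 5), ("x", 7)]}
rec = IncrementalRecovery(backup, log_by_page)
value = rec.read("p1", "x")
```

Because the log is already grouped by page, recovering `p1` never scans records that belong to `p2`, which stays stale until something asks for it.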

Recovery with Existing MMDB Systems(1) • Dali from AT&T • the original recovery manager was implemented according to “Recovering from main-memory lapses” ('93) • logs only redo records during normal execution • segment-level action-consistent checkpoints • the checkpointer writes the relevant parts of the undo log to disk • recovery needs only a single pass over the log • requires no special hardware to preserve the data • testing led to a restructuring of its recovery manager • “Multi-level recovery in the Dali storage manager” ('95) • multi-level logging, post-commit actions, dirty page detection, and fuzzy checkpoints

Recovery with Existing MMDB Systems(2) • Fast Path • supports both memory-resident data and disk-resident data • performs updates to memory-resident data at commit time • no undo operations are required when a failure occurs • group commit is adopted • a transaction-consistent backup copy of the database is refreshed during system shutdown or at infrequent checkpoints • two backup databases with ping-pong backups
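Why commit-time updates remove the need for undo can be shown with a small Python sketch of deferred update. It is a generic illustration, not Fast Path's actual design: the class, its private write set, and the log record format are assumptions, and group commit is not modeled.

```python
class DeferredUpdateTxn:
    """Toy transaction that touches the shared database only at commit."""

    def __init__(self, db, log):
        self.db = db        # shared in-memory database (a dict)
        self.log = log      # list standing in for the redo log
        self.writes = {}    # private write set

    def write(self, key, value):
        self.writes[key] = value

    def read(self, key):
        # A transaction sees its own pending writes first.
        return self.writes.get(key, self.db.get(key))

    def commit(self):
        # Redo data is logged before the write set is applied.
        self.log.append(("redo", dict(self.writes)))
        self.db.update(self.writes)

    def abort(self):
        # Nothing reached the database, so there is nothing to undo.
        self.writes.clear()


db, log = {"x": 1}, []

t1 = DeferredUpdateTxn(db, log)
t1.write("x", 2)
t1.abort()
after_abort = dict(db)

t2 = DeferredUpdateTxn(db, log)
t2.write("x", 3)
t2.commit()
```

A crash before `commit` leaves the database exactly as it was, so recovery only ever has to redo committed work.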

Recovery with Existing MMDB Systems(3) • two real-time system examples • NEC Real-Time DBMS • Stone RTDB • NEC RTDBMS has several features to ensure high throughput and accurate predictability • no page faults • the in-memory log buffer is nonvolatile • physical logging using deferred update • fuzzy checkpointing • real-time characteristics such as transaction deadline and criticality are not used in the recovery components

Summary and Conclusion • Discussed 3 logging rules • nonvolatile log buffer should be used to satisfy WAL without requiring I/O prior to transaction commit • LAW should be followed to reduce the amount of log needed to redo transactions after a system failure • described three groups of checkpointing • identified 3 issues about reloading • data should be prioritized for reload purposes • future research • investigate how real-time requirements such as transaction deadline and temporal data intervals can be incorporated into MMDB recovery

A Crash Recovery Scheme for a Memory-Resident Database System • Robert B. Hagmann • IEEE Transactions on Computers, Vol. C-35, No. 9, September 1986

Overview • Presents a method of doing recovery that uses the existing techniques of fuzzy dumps and log compression • design requirement • small system example • 2 pages/transaction * 100 transactions/s * 3600 s/h * 8 h = 5,760,000 pages written to the log • transactions must be short • the database is checkpointed periodically, every five minutes
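The log-volume figure in the small system example can be checked with a few lines of Python. The variable names are illustrative, and the per-interval figure is derived here from the slide's numbers rather than stated in the source.

```python
pages_per_txn = 2
txns_per_second = 100
seconds_per_hour = 3600
hours = 8

# Log pages written over the eight-hour period in the example.
log_pages = pages_per_txn * txns_per_second * seconds_per_hour * hours

# Derived figure: log pages written between two checkpoints taken
# five minutes apart, at the same update rate.
checkpoint_interval_s = 5 * 60
pages_between_checkpoints = pages_per_txn * txns_per_second * checkpoint_interval_s

print(log_pages, pages_between_checkpoints)
```

The first value reproduces the 5,760,000 pages on the slide; the second shows how much less log accumulates between checkpoints than over the whole period.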

Overview(2) • The principal requirement of the system is “fast” recovery from a system crash • critical factor: the transfer rate of the disk • can be improved by using several parallel processors • design overview • fuzzy dump • simply a copy of the database taken without any synchronization • if the DBMS uses nonvolatile storage, some log compression can occur • otherwise precommitting and group commits can be used to increase performance

Overview • Design details
