Advanced Topics in Databases

Advanced Topics in Databases Fahime Raja Niloofar Razavi Melody Siadaty Summer 2005 Technical Report II : Main memory Databases

Outline: • Introduction • MMDB vs. DRDB • Storage Issues • Reliability Issues • Data Structures • Concurrency Control • Commit Processing • Data Representation • Access Methods • T-tree definition & structure • Query Processing • Recovery

Introduction: • In a Main Memory DBMS (MMDBMS) there is no central role for I/O management. • Still, an important question in a MMDBMS is how to do transactions and recovery in an efficient way: •  Some of the proposed algorithms assume that a (small) stable subset of the main memory exists, • a piece of memory whose content will not be lost in a power outage through a battery backup. • These stable memories can be used to place, e.g., a redo log. • Others do not assume stable memories, and still use I/O to write transaction information to stable storage. • These algorithms hence do not eliminate I/O, but minimize it, as the critical path in a MMDBMS transaction only needs to write the log; not data pages from the buffer manager.

Introduction: • Motivations: • Requirement of short Access/Response time • Telecommunication applications • Applications handling high traffic of data, e.g. Router. • Real-Time applications • End of popularity of MMDBMS techniques in the early 1990s • MMDBMS were thereafter only considered of specific interest to real-time database applications, like above

Main Memory Databases Vs.Disk Resident Databases • M : the primary copy of data live permanently in main memory • D: the primary copy of data is permanently disk resident • M: there can be a backup copy ,resident on disk • D: data can be temporarily cached in main memory for access speed up.

Main Memory Databases Vs.Disk Resident Databases • Storage issues: • Access time of MM, orders of magnitude less than for disks (100 nsec vs. 10 msec) • MMDBMS footprints range between 200 KB and 2 MB • MM is normally volatile • stable memory still expensive! • Disks have high fixed dcost per access independent of the amount of the retrieved data (Block Oriented) • MM does not care of sequential access • MM data are more vulnerable to software errors, since they can be directly accessed by the processor.

Main Memory Databases Vs.Disk Resident Databases • Reliability issues: •  even if special hardware can enhance MM reliability, periodic backup is necessary • MM content is lost if system crashes • If a single memory board fails, the entire machine must be powered down •  loosing all data! • Whatever power backup for main memory is, in turn, is less reliable than passive magnetic media.

Main Memory Databases Vs.Disk Resident Databases • Data Structures: • MMDB are not DRDB with a very large cache! • DRDB: • Cached data are accessed through indexes designed for disk access. • Access is made through a buffer manager ,which, given the disk address, checks if the relevant block is in the MM-Cache and then copies it to the MM application working area . • MMDB: • data are accessed by directly referring to their memory address. The following pictures illustrate this fact. API for DRDB API for MMDB

MMDBMS Concurrency Control: • Lock duration is short : •  reduced contention • The advantage of small locking granules (fields or records) (when data are memory resident) ,is removed, • Very large lock granules (e.g., relations) are most appropriate (up to the entire database) for memory resident data. • Serial transaction processing • almost eliminate the need of concurrency control •  highly reduces cache flushes. • CC is still necessary when, • Mixed length transactions coexist • A multiprocessor system shares the DB among different units.

MMDBMS Concurrency Control: • Implementation : • Traditional: • a hash table that contains entries for the objects currently locked . • No lock information attached to data • MMDB: • Stuff some small number of locking information to data. • the first bit is set  the object is locked, • If it is locked and the second bit is set, • there are one or more waiting transaction • else it is free.

MMDBMS Commit Processing • necessary to have a backup copy and to keep a log of transaction activity • Before a transaction can commit, its activity records must be written to the log. • Logging can impact response time, since each transaction must wait for at least one stable write before committing. • Logging can also affect throughput if the log becomes a bottleneck • In MMDBs, the logging represents the only disk operation each transaction will require.

MMDBMS Commit Processing • Typical log length 400 bytes • 40 bytes for Begin/End • 360 bytes for Old/New values •  stable the log tail (< 100 pages ) in a small amount of stable main memory • Reduces response time • Doesn’t affect bottlenecks • Pre-committing: • releases transaction’s locks as soon as its log record is placed in the log, without waiting for the information to be propagated to the disk. • Doesn’t affect serialization • Because the log is sequential

MMDBMS Commit Processing • Doesn’t reduce response time • Enhances concurrency • Response time of others. • Group committing: • accumulates enough commit records to fill up a log page and then flushes it to disk • Reduces the total # of disk accesses • can be used to relieve a log bottleneck .

MMDB Data Representation: • Relational data are usually represented as flat files • Tuples are stored sequentially • Attribute values are embedded in the tuples. • Access is local • Space consuming due to duplicate values • Inefficient due to sequentially of computations • Need of indexes : • Relational tuples can be represented as a set of pointers to data values. • use of pointers is space efficient when large values appear multiple times in the database, • (the actual value needs to only be stored once.) • Pointers also simplify the handling of variable length fields since variable length data can be represented using pointers into a heap.

MMDB Access Methods: • Hashing: • Fast lookup & updating • Not space efficient • Doesn’t support range queries • Tree indexing: • With a single pointer get access both to an attribute value and the entire tuple • Pointers are of fixed short lengths • Suited for range queries

MMDB Access Methods: • The T-Tree : • Is a data structure whose ancestors are B-Trees and AVL-trees • It is binary like AVL-trees •  search is essentially binary • A T-node contains many elements like B-tree •  storage & update are done efficiently • Insertions and deletions usually move data between a single node • like in B-trees • Rebalancing is done by node rotation • Like in AVL-trees, but much less frequent.

MMDB Access Methods: T-Tree

MMDB Query Processing: • Query processing for DRDB focus on reducing disk access costs • Query processing for MMDB must focus on reducing processing costs • Operation costs vary from system to system • No general optimization technique • Implementation for relational operators should benefit of main memory data 7 index representation • Nested loop-join is preferred to sort-merge join: •  Although the sorted relations could be represented easily in a main memory database using pointer lists, there is really no need for this since much of the motivation for sorting is already lost .

MMDBMS Backup and Recovery: • Backups of memory resident databases must be maintained on disk or other stable storage to insure against loss of the volatile data . • 1.the procedure used during normal database operation to keep the backup up-to-date (Checkpointing), • 2.the procedure used to recover from a failure. (Failure Recovery) • loading blocks of the database “on demand” until all of the data has been loaded. • using disk striping or disk array • there must be independent paths from the disks to memory

MMDBMS Backup and Recovery: 3 major steps in a recovery process: • Logging • log information can be recorded at two different levels of abstraction, • physical logging: • records, for each update, the state of the modified database and the physical location (such as page id and offset) to which the update is applied. • logical logging: records the state transition of the database and the logical location affected by the update. • The Write Ahead Logging (WAL) is the most popular and accepted type of logging rules

MMDBMS Backup and Recovery: 2.Checkpointing • MMDB checkpointing approaches fall into three categories: • non-fuzzy checkpoint • the checkpointer first obtains locks on the data items to be checkpointed so that consistency of the database is preserved • fuzzy checkpoint • the checkpointer does not secure any locks. To synchronize with normal transactions, the checkpointer writes a record to the log after a checkpoint is completed. In addition, the database is generally dumped in segments. • log-driven checkpoint • the checkpointer applies the log to a previous dump to generate a new dump rather than dumping from the main-memory database.

MMDBMS Backup and Recovery: 3. Reloading • is the last and the one of the most important parts of the recovery process. • Reloading must ensure that the database is consistent after a crash. • The database is loaded from the last checkpointed image and the undo and redo operations are applied to restore it to the most recent consistent state. • The existing reloading schemes can be divided into two classes: • simple reloading • the system does not resume its normal operation until the entire database is brought back into the main memory • concurrent reloading • reload activities and transaction processing are performed in parallel.

Advanced Topics in Databases