PMIT-6102 Advanced Database Systems

PMIT-6102Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University

Lecture -10Concurrency Control

Outline • Transaction • Transaction state • Schedule • serializability, • time-stamping, and • shadow-database scheme. • Concurrency Control • Concurrency • Three classic problems and example of solution • Deadlock

Transaction State • Active, the initial state; the transaction stays in this state while it is executing • Partially committed, after the final statement has been executed. • Failed, after the discovery that normal execution can no longer proceed. • Aborted, after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted: • restart the transaction – only if no internal logical error • kill the transaction • Committed, after successful completion.

Transaction State (Cont.)

Shadow-database scheme • The recovery-management component of a database system implements the support for atomicity and durability. • The shadow-database scheme: • assume that only one transaction is active at a time. • a pointer called db_pointer always points to the current consistent copy of the database. • all updates are made on a shadow copy of the database, and db_pointer is made to point to the updated shadow copy only after the transaction reaches partial commit and all updated pages have been flushed to disk. • in case transaction fails, old consistent copy pointed to by db_pointer can be used, and the shadow copy can be deleted.

Shadow-database scheme The shadow-database scheme: • Assumes disks to not fail • Useful for text editors, but extremely inefficient for large databases: executing a single transaction requires copying the entire database.

Schedules • Schedules – sequences that indicate the chronological order in which instructions of concurrent transactions are executed • a schedule for a set of transactions must consist of all instructions of those transactions • must preserve the order in which the instructions appear in each individual transaction. • Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. The schedule in the figure is a serial schedule (Schedule 1 in the text), in which T1 is followed by T2.

Schedule (Cont.) • Let T1 and T2 be the transactions defined previously. The following schedule (Schedule 3 in the text) is not a serial schedule, but it is equivalent to Schedule 1. In both Schedule 1 and 3, the sum A + B is preserved.

Schedules (Cont.) • The following concurrent schedule (Schedule 4 in the text) does not preserve the value of the the sum A + B.

Serializability • A schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of: 1. conflict serializability • view serializability • Our simplified schedules consist of only read and write instructions.

Conflict Serializability • Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q. 1. li = read(Q), lj = read(Q). li and ljdon’t conflict.2. li = read(Q), lj = write(Q). They conflict.3. li = write(Q), lj = read(Q). They conflict4. li = write(Q), lj = write(Q). They conflict

Conflict Serializability (Cont.) • If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent. • We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule • Example of a schedule that is not conflict serializable: T3T4read(Q)write(Q)write(Q)We are unable to swap instructions in the above schedule to obtain either the serial schedule < T3, T4 >, or the serial schedule < T4, T3 >.

Conflict Serializability (Cont.) • Schedule 3 below can be transformed into Schedule 1, a serial schedule where T2 follows T1, by series of swaps of non-conflicting instructions. Therefore Schedule 3 is conflict serializable. Schedule 1 Schedule 3

View Serializability • Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalentif the following three conditions are met: 1. For each data item Q, if transaction Tireads the initial value of Q in schedule S, then transaction Ti must, in schedule S´, also read the initial value of Q. 2. For each data item Q if transaction Tiexecutes read(Q) in schedule S, and that value was produced by transaction Tj(if any), then transaction Ti must in schedule S´ also read the value of Q that was produced by transaction Tj . 3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the finalwrite(Q) operation in schedule S´.

View Serializability (Cont.) • A schedule S is view serializableif it is view equivalent to a serial schedule. • Every conflict serializable schedule is also view serializable. • Schedule 9 (from text) — a schedule which is view-serializable but not conflict serializable. • Every view serializable schedule that is not conflictserializable has blind writes.

Recoverableschedule • Recoverableschedule — if a transaction Tj reads a data items previously written by a transaction Ti , the commit operation of Ti appears before the commit operation of Tj. • The following schedule (Schedule 11) is not recoverable if T9commits immediately after the read • If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence database must ensure that schedules are recoverable.

Recoverableschedule • Cascading rollback – a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable)If T10 fails, T11 and T12 must also be rolled back. • Can lead to the undoing of a significant amount of work

Recoverableschedule • Cascadelessschedules — cascading rollbacks cannot occur; for each pair of transactions Tiand Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj. • Every cascadeless schedule is also recoverable

Concurrency Control • To ensure that it is, the system must control the interaction among the concurrent transactions. • This control is achieved through one of a variety of mechanisms called concurrency control schemes. • the most frequently used schemes are • two-phase locking • snapshot isolation.

Lock-Based Protocols • One way to ensure isolation is to require that data items be accessed in a mutually exclusive manner; that is, while one transaction is accessing a data item, no other transaction can modify that data item. • The most common method used to implement this requirement is to allow a transaction to access a data item only if it is currently holding a lock on that item.

A lock is a mechanism to control concurrent access to a data item Data items can be locked in two modes : 1. exclusive (X) mode. Data item can be both read as well as written. X-lock is requested using lock-X instruction. 2. shared (S) mode. Data item can only be read. S-lock is requested using lock-S instruction. Lock requests are made to concurrency-control manager. Transaction can proceed only after request is granted. Lock-Based Protocols

Lock-compatibility matrix A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive on the item no other transaction may hold any lock on the item. If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other transactions have been released. The lock is then granted. Lock-Based Protocols (Cont.)

Example of a transaction performing locking: T2: lock-S(A); read (A); unlock(A); lock-S(B); read (B); unlock(B); display(A+B) Locking as above is not sufficient to guarantee serializability — if A and B get updated in-between the read of A and B, the displayed sum would be wrong. A locking protocol is a set of rules followed by all transactions while requesting and releasing locks, indicating when a transaction may lock and unlock each of the data items. The set of all such schedules is a proper subset of all possible serializable schedules Lock-Based Protocols (Cont.)

If we do not use locking, or if we unlock data items too soon after reading or writing them, we may get inconsistent states. On the other hand, if we do not unlock a data item before requesting a lock on another data item, deadlocks may occur. Consider the partial schedule Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A. Such a situation is called a deadlock. To handle a deadlock one of T3 or T4 must be rolled back and its locks released. Pitfalls of Lock-Based Protocols

Starvation is also possible if concurrency control manager is badly designed. For example: A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same item. Concurrency control manager can be designed to prevent starvation. Pitfalls of Lock-Based Protocols (Cont.)

One protocol that ensures serializability is the two-phase locking protocol. This protocol requires that each transaction issue lock and unlock requests in two phases: Phase 1: Growing Phase transaction may obtain locks transaction may not release locks Phase 2: Shrinking Phase transaction may release locks transaction may not obtain locks The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e. the point where a transaction acquired its final lock). The Two-Phase Locking Protocol

Initially, a transaction is in the growing phase. The transaction acquires locks as needed. Once the transaction releases a lock, it enters the shrinking phase, and it can issue no more lock requests. Two-phase locking does not ensure freedom from deadlocks transactions T3 and T4 are two phase, but, in schedule 2 (Figure 15.7), they are deadlocked. The Two-Phase Locking Protocol (Cont.)

Cascading roll-back is possible under two-phase locking. Each transaction observes the two-phase locking protocol, but the failure of T5 after the read(A) step of T7 leads to cascading rollback of T6 and T7. The Two-Phase Locking Protocol (Cont.)

To avoid Cascading roll-back, follow a modified protocol called strict two-phase locking. Here a transaction must hold all its exclusive locks till it commits/aborts. Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit. The Two-Phase Locking Protocol (Cont.)

Review Problem with example and solved with Two Phase Locking

Three classic problems Although transactions execute correctly, results may interleavein diff ways => 3 classic problems. • Lost Update • Uncommitted Dependency • Inconsistent Analysis

Lost Update problem t is a tuple in a table retrieved by both users in the course of both transactions. Transaction A loses an update at time 4. The update at t3 by transaction A is lost (overwritten) at t4 by B.

Uncommitted Dependency One trans is allowed to retrieve/update) a tuple updated by another, but not yet committed. Trans A is dependent at time t2 on an uncommitted change made by Trans B, which is lost on Rollback.

Inconsistent Analysis Initially: Acc 1 = 40; Acc2 = 50; Acc3 = 30; Trans A sees inconsistent DB state after B updated Accumulator => performs inconsistent analysis.

Why these problems? • Retrieve : ‘read’ (R) • Update : ‘write’ (W). • interleaving two transactions => 3 PBS: RR – no problem WW – lost update WR – uncommitted dependency RW – inconsistent analysis

How to prevent such problems? • IF risk of interference = low => concurrency control schemes ~ two-phase locking ~ common approach • although it requires deadlock avoidance!! • Lock applies to a tuple : • exclusive (write; X) or • shared(read; S).

Lost Update ‘solved’ No update lost but => deadlock

Uncommitted Dependency solved A is prevented from seeing change at time t2. Because transaction B acquires an X-lock on p at time t1, transaction A has to wait until transaction B has either committed or rollbacked. Transaction A is prevented from seeing an uncommitted change at time t2. A resolution of the first version of the problem. Second version is similar (check this!).

Inconsistent Analysis ‘solved’

Deadlock • Deadlock occurs when 2 or more transaction are in a simultaneous wait state. • It is desirable to conceal deadlocks from the user.

Deadlock Resolution • The system must detect and break deadlocks by: • Choosing one trans as a victimand rolling it back. • Timing out the trans and returning an error. • automatically restarting the transaction hoping not to get deadlock again. • Return an error code back to the victim and leaving it up to program to handle situation.

Implementation of Locking • A Lock manager can be implemented as a separate process to which transactions send lock and unlock requests • The lock manager replies to a lock request by sending a lock grant messages (or a message asking the transaction to roll back, in case of a deadlock) • The requesting transaction waits until its request is answered • The lock manager maintains a datastructure called a lock table to record granted locks and pending requests • The lock table is usually implemented as an in-memory hash table indexed on the name of the data item being locked

Thank you

PMIT-6102 Advanced Database Systems