Database Principles

Database Principles College of Computer Science and Technology Chongqing University of Posts & Telecom.

Transactions • Serializability & Atomicity • Scheduling & Concurrency Control • Recovery

Transactions • Database systems are normally being accessed by many users or processes at the same time. (e.g. Airline reservations) • Both queries and modifications. • Unlike Operating Systems, which support interaction of processes, a DMBS needs to keep processes from troublesome interactions. (e.g. Banking)

Transactions • The solution to the problems of serialization and atomicity is to group database operations into transactions. • A transaction is a collection of one or more operations on the database that must be executed atomically, that is, either all operations are performed or none are. All-or-None原则

ACID Transactions • A DBMS is expected to support “ACID transactions”, which are: • Atomic: Either the whole process is done or none is. • Consistent: Database constraints are preserved. • Isolated: It appears to the user as if only one process executes at a time. • Durable: Effects of a process do not get lost if the system crashes.

Transactions in SQL • SQL supports transactions, often behind the scenes. • Each statement issued at the generic query interface is a transaction by itself. • In programming interfaces like Embedded SQL, a transaction begins the first time an SQL statement is executed and ends with the program or an explicit end.

COMMIT • The SQL statement COMMIT causes a transaction to complete. • It’s database modifications are now permanent in the database.

ROLLBACK • The SQL statement ROLLBACK also causes the transaction to end, but by aborting. • No effects on the database. • Failures like division by 0 can also cause rollback, even if the programmer does not request it.

Problems hiding in concurrent operations • Lose update （丢失修改） • Dirty Read （读脏数据） • Unrepeatable read（不可重复读）

Example: Lose update • You and your spouse each take $100 from different ATM’s at about the same time. • The DBMS better make sure one account deduction doesn’t get lost. • Compare: An OS allows two people to edit a document at the same time. If both write, one’s changes get lost.

T1 读A=16 A=A-1 写回A＝15 T2 读A=16 A=A-1 写回A＝15 如：并发操作引起的丢失修改 • 事务T1对数据的修改被事务T2的修改覆盖

Example: Dirty read • Dirty data is common term for data written by a transaction that has not yet committed. • A dirty read is a read of dirty data written by another transaction.

如：并发操作引起的读脏数据 T1 读C=1 C=C*2 写回C=2 ROLLBACK C恢复为1 T2 读C=2 • 事务T1 修改了某数据并写回磁盘，事务T2 读取了同一数据后，T1由于某种原因被撤销，被修改的值复原，此时T2读到的数据与数据库中的数据不一致

Example: Unrepeatable Read • The same query issued more than once is not quite guaranteed to get the same answer. • It is possible that a second or subsequent execution of the same query will retrieve phantom（幻影） tuples.

如：并发操作引起的不可重复读 T1 读A=1,B=2 求A+B=3 读A=1,B=4 求A+B=5 T2 读B=2 B=B*2 写回B=4 • 事务T1读取某一数据后，事务T2对其做了修改，当T1按同样条件再读时得到不同的值 • 事务T1读取某些数据后，事务T2删除(或插入)了一些记录，当T1按同样条件再读时发现少(或多)了一些记录

分析 • 产生上述三类问题的主要原因 • 并发操作影响了事务的隔离性，事务间相互干扰 • 并发控制的主要技术 • 封锁技术 (Locking) ★ • 时间戳技术(Timestamp) • 其他

Isolation Levels of Transactions • Isolation levels = choices about what interactions are allowed by transactions that execute at about the same time. • How a DBMS implements these isolation levels is highly complex, and a typical DBMS provides its own options.

Isolation Levels of Transactions Total 4 levels: • Serializable (all or none) • SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; • Read-uncommitted (dirty reads allowed) • SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; • Read-committed (only committed update can be seen) • SET TRANSACTION ISOLATION LEVEL READ COMMITTED; • Repeatable-read (tuples reappear when query repeated) • SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

Summary • Transaction • Properties of Transaction • Atomic, Consistent, Isolated, Durable. • Isolation Levels of Transactions

Transaction Scheduling &Concurrency Control

Transaction Scheduling • Schedules – sequences that indicate the chronological order in which instructions of concurrent transactions are executed • a schedule for a set of transactions must consist of all instructions of those transactions • must preserve the order in which the instructions appear in each individual transaction. • Serial Schedule – instruction sequences from one by one transactions

Serial Scheduling(1) • Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. • Initial status : A ($1,000), B($2,000)  total : $3,000 1) T1  T2 2) T2  T1 • A: 1,000  950  855 • B: 2,000  2,050  2,145 • 3,000  3,000 • A: 1,000  900  850 • B: 2,000  2,100  2,150 • 3,000  3,000 Database

Serial Scheduling(2) 1,000 1,000 3) T1 T2 T1 T24) T1  T2  T1  T2 1,000-50 1,000 950 950 1,000-50 950 - 95 900 865 2,000 2,000 950 2,000+50 2,000 2,050 2,000-100 2,050 2,050 2,050+95 2,050 2,100 • A: 1,000  950  855 • B: 2,000  2,050  2,145 • 3,000  3,000 • A: 1,000  900  950 • B: 2,000  2,050  2,100 • 3,000  3,050 Database

This is a protocol which ensures serializable schedules. Phase 1: Growing Phase transaction may obtain locks transaction may not release locks Phase 2: Shrinking Phase transaction may release locks transaction may not obtain locks The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e. the point where a transaction acquired its final lock). The Two-Phase Locking Protocol(2PL)

T1 Time T2 lock(x) read(x) x  x + 100 write(x) lock(y) unlock(x) read(y) y y + 100 write(y) unlock(y) lock(x) read(x) x  x * 2 write(x) lock(y) unlock(x) read(y) y y * 2 write(y) unlock(y) Example - 2PL scheduling T2: X = X * 2 Y = Y * 2 T1: X = X + 100 Y = Y + 100 Growing Phase Growing Phase Shrinking Phase Shrinking Phase

Two Phase lock(2PL) • There can be conflict serializable schedules that cannot be obtained if two-phase locking is used. • Other Concurrent Control protocol • Timestamp-Ordering • Optimistic • Multiversion Serializable schedules Serializable schedules by 2PL Serializable schedules by other CC protocol

Recovery

Example of Data Access Buffer Database Buffer Block A input(A) x Buffer Block B Y A output(B) B read(X) write(Y) x2 x1 y1 work area of T2 work area of T1 Memory DISK

A log is kept on stable storage. The log is a sequence of log records, and maintains a record of update activities on the database. Log Records When transaction Tistarts, it registers itself by writing a <Tistart> log record Before Tiexecutes write(X), a log record <Ti, X, V1, V2>is written, where V1 is the value of X before the write, andV2is the value to be written to X. Log record notes that Ti has performed a write on data item X. X had value V1before the write, and will have value V2after the write. When Tifinishes it last statement, the log record <Ticommit> is written. We assume for now that log records are written directly to stable storage (that is, they are not buffered) Recovery procedure has two operations instead of one: undo(Ti) restores the value of all data items updated by Tito their old values, going backwards from the last log record for Ti redo(Ti) sets the value of all data items updated by Tito the new values, going forward from the first log record for Ti Log-Based Recovery

Example of Data Access Buffer Database Buffer Block A input(A) x Buffer Block B Y A output(B) B read(X) write(Y) Log file x2 Log records <T1, Y, Y0, Y1> x1 y1 work area of T2 work area of T1 Memory DISK

Example - log • example transactions T0and T1(T0executes before T1): T0: read (A) T1: read (C) A = A - 50C = C - 100 Write (A) write (C) read (B) B = B + 50 write (B)

Example - recovery Below we show the log as it appears at three instances of time. Recovery actions in each case above are: • undo (T0): B  2000 and A  1000 • undo (T1) and redo (T0): • Redo (T0) : A 950 and B 2050 • Undo (T1) : C  700 • redo (T0) and redo (T1): • Redo(T0,,T1 ) A 950 and B 2050, C 600 Failure Failure Failure

Problems in recovery procedure as discussed earlier : searching the entire log is time-consuming we might unnecessarily redo transactions which have already output their updates to the database. Streamline recovery procedure by periodically performing checkpointing Output all log records currently residing in main memory onto stable storage. Output all modified buffer blocks to the disk. Write a log record < checkpoint> onto stable storage. Checkpoints

During recovery we need to consider only the most recent transaction Ti that started before the checkpoint, and transactions that started after Ti. Scan backwards from end of log to find the most recent <checkpoint> record Continue scanning backwards till a record <Ti start> is found. Need only consider the part of log following above start record. Earlier part of log can be ignored during recovery, and can be erased whenever desired. For all transactions (starting from Ti or later) with no <Ticommit>, execute undo(Ti). (Done only in case of immediate modification.) Scanning forward in the log, for all transactions starting from Tior later with a <Ticommit>, execute redo(Ti). Checkpoints (Cont.)

T1 can be ignored (updates already output to disk due to checkpoint) Redo T2 and T3. Undo T4 Example of Checkpoints T1, T2, T3, T4…. Tf Tc T1 T2 T3 T4 system failure checkpoint

Architecture of DBMS(p.11)

Architecture of ORACLE

Introduction of Commercial Database Management System

CRM ERP Data Warehouse MIS, ERP OLTP EDPS Database Database Database Database Modern Database World • Most (all?) computing applications use some type of a database • All data are handled by DBMS

Database Companies in the World First commercial system Main memory DBMS Free Software, Microsoft UNIX, open system

Thank you!!

Database Principles