
ICS 214B: Transaction Processing and Distributed Data Management


Presentation Transcript


  1. ICS 214B: Transaction Processing and Distributed Data Management Lecture 11: Concurrency Control and Distributed Commits Professor Chen Li

  2. Overview
  • Concurrency Control
  • Schedules and Serializability
  • Locking
  • Timestamp Control
  • Deadlocks

  3. In a centralized DB
  [Figure: transactions T1, T2, …, Tn all access one DB, subject to its consistency constraints]

  4. In a distributed DB
  [Figure: transactions T1 and T2 access data spread across nodes X, Y, and Z]

  5. Concepts (similar to a centralized DB)
  • Transaction: a sequence of ri(x), wi(x) actions
  • Conflicting actions (same item, at least one write): r1(A) w2(A); w1(A) w2(A); w1(A) r2(A)
  • Schedule: represents the chronological order in which actions are executed
  • Serial schedule: no interleaving of actions from different transactions
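As a small illustration (not part of the slides; the Action class and conflicts function below are mine), two actions conflict exactly when they touch the same item, come from different transactions, and at least one of them is a write:

from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    kind: str    # 'r' or 'w'
    txn: int     # transaction number, e.g. 1 for T1
    item: str    # data item, e.g. 'A'

def conflicts(p: Action, q: Action) -> bool:
    """Same item, different transactions, at least one write."""
    return p.item == q.item and p.txn != q.txn and 'w' in (p.kind, q.kind)

assert conflicts(Action('r', 1, 'A'), Action('w', 2, 'A'))      # r1(A), w2(A)
assert conflicts(Action('w', 1, 'A'), Action('w', 2, 'A'))      # w1(A), w2(A)
assert not conflicts(Action('r', 1, 'A'), Action('r', 2, 'A'))  # two reads never conflict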

  6. Example constraint: X = Y (X at node 1, Y at node 2)
  T1:                      T2:
  1. (T1) a ← X            5. (T2) c ← X
  2. (T1) X ← a+100        6. (T2) X ← 2c
  3. (T1) b ← Y            7. (T2) d ← Y
  4. (T1) Y ← b+100        8. (T2) Y ← 2d
  (The figure also shows the precedence relation among these actions.)

  7. Precedence: intra-transaction and inter-transaction
  Schedule S1:
  (node X)                  (node Y)
  1. (T1) a ← X
  2. (T1) X ← a+100
  5. (T2) c ← X             3. (T1) b ← Y
  6. (T2) X ← 2c            4. (T1) Y ← b+100
                            7. (T2) d ← Y
                            8. (T2) Y ← 2d
  If X = Y = 0 initially, then X = Y = 200 at the end.
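A tiny sanity check of the claim on this slide (illustrative only; only the per-item order of the actions matters for the final values, so the steps are grouped by node here):

db = {'X': 0, 'Y': 0}

a = db['X']; db['X'] = a + 100   # steps 1-2: T1 at node X
c = db['X']; db['X'] = 2 * c     # steps 5-6: T2 at node X
b = db['Y']; db['Y'] = b + 100   # steps 3-4: T1 at node Y
d = db['Y']; db['Y'] = 2 * d     # steps 7-8: T2 at node Y

assert db == {'X': 200, 'Y': 200}   # the constraint X = Y still holds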

  8. Enforcing Serializability
  • Locking
  • Timestamp Ordering Schedulers

  9. Locking rules in a centralized DB (2-phase locking)
  • Well-formed transactions
  • Legal schedulers
  • Two-phase transactions
  These rules guarantee serializable schedules.

  10. Strict 2PL
  • Hold all locks until the transaction commits
  • Called “strict 2-phase locking”
  • Strict 2PL automatically avoids cascading rollbacks
  [Figure: number of locks held vs. time: locks accumulate during execution and are all released at commit]
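As a rough illustration of these locking rules and the strict variant (not part of the slides; the LockManager class, the S/X modes, and the single-threaded setting are all assumptions), a strict 2PL lock table grants shared or exclusive locks before each access and releases nothing until release_all is called at commit or abort:

class LockManager:
    def __init__(self):
        self.locks = {}                      # item -> (mode, set of holders)

    def acquire(self, txn, item, mode):
        """mode is 'S' (read) or 'X' (write); returns True if granted."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == 'S' and held_mode == 'S':  # shared locks are compatible
            holders.add(txn)
            return True
        if holders == {txn}:                  # re-request / upgrade by the sole holder
            stronger = 'X' if 'X' in (mode, held_mode) else 'S'
            self.locks[item] = (stronger, holders)
            return True
        return False                          # caller must wait (or handle deadlock)

    def release_all(self, txn):
        """Strict 2PL: locks are released only here, at commit or abort."""
        for item in list(self.locks):
            mode, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]

lm = LockManager()
assert lm.acquire(1, 'A', 'X')        # T1 write-locks item A
assert not lm.acquire(2, 'A', 'S')    # T2 is blocked until T1 finishes
lm.release_all(1)                     # at T1's commit
assert lm.acquire(2, 'A', 'S')        # now T2 can read A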

  11. Two-phase locking in a distributed DB
  • Just like in a centralized system, but with multiple lock managers
  [Figure: T accesses & locks data through scheduler 1 (locks for D1, node 1) and scheduler 2 (locks for D2, node 2), and releases all locks at the end]

  12. Replicated data
  [Figure: item X is replicated at node 1 and node 2; T1 and T2 access it through scheduler 1 (locks for D1) and scheduler 2 (locks for D2)]

  13. Replicated data
  • Simplest scheme (read all, write all): if T wants to read (write) data item X, T obtains read (write) locks for X at all sites that have X
  • Better scheme (read one, write all):
    - If T wants to read X, T obtains a read lock at any one site that has X
    - If T wants to write X, T obtains write locks at all sites that have X
  • More sophisticated schemes are possible
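A sketch of how the read-one-write-all scheme could be coded (illustrative only; sites_holding and request_lock are hypothetical helpers that list the replica sites of X and request a lock at one site):

def lock_for_read(x, sites_holding, request_lock):
    """Read-one-write-all: a read lock at any single replica of X suffices."""
    for site in sites_holding(x):
        if request_lock(site, x, 'S'):
            return [site]
    return []                            # no replica reachable: give up

def lock_for_write(x, sites_holding, request_lock):
    """A write must lock X at every site that stores a replica."""
    granted = []
    for site in sites_holding(x):
        if not request_lock(site, x, 'X'):
            return []                    # a real system would wait or retry
        granted.append(site)
    return granted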

  14. Timestamp Ordering Schedulers
  • Basic idea:
    - assign each transaction a timestamp as it begins
    - if ts(T1) < ts(T2) < … < ts(Tn), then the scheduler produces a history equivalent to the serial order T1, T2, …, Tn

  15. T.O. rule
  If pi[x] and qj[x] are conflicting operations, then pi[x] is executed before qj[x] (pi[x] <S qj[x]) iff ts(Ti) < ts(Tj).

  16. Example: schedule S2, with ts(T1) < ts(T2)
  (Node X)                    (Node Y)
  (T1) a ← X                  (T2) d ← Y
  (T1) X ← a+100              (T2) Y ← 2d
  (T2) c ← X                  (T1) b ← Y    ← reject! abort T1
  (T2) X ← 2c                 (T1) Y ← b+100
  T1 must then be aborted at both nodes; and because T2 read the value of X that T1 wrote, aborting T1 forces T2 to abort at both nodes as well.

  17. Strict T.O.
  • Keep written items locked until it is certain that the writing transaction has succeeded (avoids cascading rollbacks)

  18. Example revisited (with strict T.O.), ts(T1) < ts(T2)
  (Node X)                    (Node Y)
  (T1) a ← X                  (T2) d ← Y
  (T1) X ← a+100              (T2) Y ← 2d
  (T2) c ← X   ← delayed      (T1) b ← Y    ← reject! abort T1
  T1 is aborted at both nodes; once its lock on X is released, the delayed (T2) c ← X and (T2) X ← 2c go ahead, so T2 does not have to be rolled back.

  19. Enforcing T.O.: for each data item X keep
  • MAX_R[X]: maximum timestamp of a transaction that read X
  • MAX_W[X]: maximum timestamp of a transaction that wrote X
  • rL[X]: number of transactions currently reading X (0, 1, 2, …)
  • wL[X]: number of transactions currently writing X (0 or 1)

  20. T.O. Scheduler, part 1: ri[X] arrives
  IF (ts(Ti) < MAX_W[X]) THEN {
      ABORT Ti
  } ELSE {
      IF (ts(Ti) > MAX_R[X]) THEN MAX_R[X] ← ts(Ti);
      IF (queue is empty AND wL[X] = 0) THEN {
          rL[X] ← rL[X] + 1;
          START READ OF X
      } ELSE
          add (r, Ti) to queue
  }

  21. T.O. Scheduler, part 2: wi[X] arrives
  IF (ts(Ti) < MAX_W[X] OR ts(Ti) < MAX_R[X]) THEN {
      ABORT Ti
  } ELSE {
      MAX_W[X] ← ts(Ti);
      IF (queue is empty AND wL[X] = 0 AND rL[X] = 0) THEN {
          wL[X] ← 1;
          WRITE X   // wait for Ti to finish
      } ELSE
          add (w, Ti) to queue
  }

  22. T.O. Scheduler, part 3: when an operation o (r or w) on X finishes
  oL[X] ← oL[X] - 1;
  NDONE ← TRUE;
  WHILE NDONE DO {
      let the head of the queue be (q, Tj)   // smallest timestamp
      IF (q = w AND rL[X] = 0 AND wL[X] = 0) THEN {
          remove (q, Tj); wL[X] ← 1; WRITE X   // wait for Tj to finish
      } ELSE IF (q = r AND wL[X] = 0) THEN {
          remove (q, Tj); rL[X] ← rL[X] + 1; START READ OF X
      } ELSE
          NDONE ← FALSE
  }
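Parts 1-3 can be read as a per-item state machine. The Python sketch below (an illustration only; the Item and TOScheduler names and the start_read / start_write / abort callbacks are assumptions, and everything is single-threaded) puts the three parts together, keeping the queue as a min-heap so its head is always the waiting operation with the smallest timestamp:

import heapq

class Item:
    """Per-item bookkeeping from slide 19."""
    def __init__(self):
        self.max_r = 0        # MAX_R[X]: largest ts that has read X
        self.max_w = 0        # MAX_W[X]: largest ts that has written X
        self.rl = 0           # rL[X]: transactions currently reading X
        self.wl = 0           # wL[X]: transactions currently writing X (0 or 1)
        self.queue = []       # heap of (ts, op, txn), smallest ts at the head

class TOScheduler:
    def __init__(self, start_read, start_write, abort):
        self.items = {}
        self.start_read = start_read      # callbacks into the data manager
        self.start_write = start_write
        self.abort = abort

    def _item(self, x):
        return self.items.setdefault(x, Item())

    def read(self, ts, txn, x):           # Part 1: r_i[X] arrives
        it = self._item(x)
        if ts < it.max_w:
            self.abort(txn); return
        it.max_r = max(it.max_r, ts)
        if not it.queue and it.wl == 0:
            it.rl += 1; self.start_read(txn, x)
        else:
            heapq.heappush(it.queue, (ts, 'r', txn))

    def write(self, ts, txn, x):          # Part 2: w_i[X] arrives
        if ts < self._item(x).max_w or ts < self._item(x).max_r:
            self.abort(txn); return
        it = self._item(x)
        it.max_w = ts
        if not it.queue and it.wl == 0 and it.rl == 0:
            it.wl = 1; self.start_write(txn, x)
        else:
            heapq.heappush(it.queue, (ts, 'w', txn))

    def done(self, op, x):                # Part 3: a read or write on X finished
        it = self._item(x)
        if op == 'r': it.rl -= 1
        else:         it.wl -= 1
        while it.queue:
            ts, q, txn = it.queue[0]      # head = smallest timestamp
            if q == 'w' and it.rl == 0 and it.wl == 0:
                heapq.heappop(it.queue); it.wl = 1; self.start_write(txn, x)
            elif q == 'r' and it.wl == 0:
                heapq.heappop(it.queue); it.rl += 1; self.start_read(txn, x)
            else:
                break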

  23. Starvation is possible
  • If a transaction is aborted, it must be retried with a new, larger timestamp
  [Figure: T with ts(T)=8 tries to read X but is aborted because MAX_W[X]=9; it retries with ts(T)=11 while MAX_R[X]=10, and may keep losing to newer transactions]

  24. Theorem: If S is a schedule representing an execution by a T.O. scheduler, then S is serializable.

  25. Improvement: the Thomas Write Rule
  [Figure: timeline with MAX_R[X] < ts(Ti) < MAX_W[X]; Ti wants to write X, but a write with a larger timestamp has already been recorded]

  26. Change in the T.O. Scheduler (Ti wants to write X, with MAX_R[X] < ts(Ti) < MAX_W[X])
  When wi[X] arrives:
  IF ts(Ti) < MAX_R[X] THEN
      ABORT Ti
  ELSE IF ts(Ti) < MAX_W[X] THEN
      IGNORE THIS WRITE (but tell Ti it was OK)
  ELSE
      process the write as before…
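In the scheduler sketch above, only the write path changes under the Thomas Write Rule; a hypothetical subclass makes the change explicit:

class TWRScheduler(TOScheduler):
    """Thomas Write Rule: an obsolete write is dropped instead of aborting Ti."""
    def write(self, ts, txn, x):
        it = self._item(x)
        if ts < it.max_r:
            self.abort(txn)            # a newer read already saw an older value
        elif ts < it.max_w:
            pass                       # ignore this write, but tell Ti it was OK
        else:
            super().write(ts, txn, x)  # process the write as before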

  27. 2PL vs. T.O.: Example 1 (ts(T1) < ts(T2))
  T1: r1[X] r1[Y] w1[Z]
  T2: w2[X]
  S: r1[X] w2[X] r1[Y] w1[Z]
  S could be produced with T.O. but not with 2PL (T1 would have to release its lock on X before w2[X] and then acquire locks on Y and Z afterwards, violating the two-phase rule).

  28. 2PL vs. T.O.: Example 2 (ts(T1) < ts(T2))
  T1: r1[X] r1[Y] w1[Z]
  T2: w2[Y]
  S: r1[X] w2[Y] r1[Y] w1[Z]
  S could be produced with 2PL but not with T.O. (the T.O. rule rejects r1[Y], since ts(T1) < ts(T2) = MAX_W[Y]).

  29. Relationship between 2PL and T.O.
  [Figure: Venn diagram: the serializable schedules contain both the 2PL schedules and the T.O. schedules; the two sets overlap, but neither contains the other]

  30. Distributed T.O. Scheduler
  [Figure: T accesses data through scheduler 1 (with a ts cache for D1, node 1) and scheduler 2 (with a ts cache for D2, node 2)]
  • Each scheduler is “independent”
  • At the end of the transaction, signal all schedulers involved to release all wL[X] locks

  31. Next: Deadlocks
  • If nodes use 2-phase locking, global deadlocks are possible
  [Figure: two local wait-for graphs (WFGs), T1 → T2 at one node and T2 → T1 at the other; neither has a cycle]

  32. Need to “combine” the local WFGs to discover global deadlocks
  e.g., send them to a central detection node
  [Figure: the local edges T1 → T2 and T2 → T1 combine into the global cycle T1 → T2 → T1]
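A sketch (not from the slides) of what the central detection node could do: union the local waits-for edges it receives and run a standard cycle check over the combined graph.

def has_cycle(edges):
    """edges: (waiter, holder) pairs collected from all nodes."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v):                     # depth-first search for a back edge
        color[v] = GRAY
        for w in graph[v]:
            if color[w] == GRAY or (color[w] == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)

# The two local WFGs from the slide:
assert not has_cycle([('T1', 'T2')])                 # node 1's local graph is acyclic
assert has_cycle([('T1', 'T2'), ('T2', 'T1')])       # but the union has a cycle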

  33. Deadlocks
  • Local vs. global
  • Deadlock detection
    - Waits-for graph
    - Timeouts
  • Deadlock prevention
    - Wound-wait
    - Wait-die
  • Covered in ICS 214A
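Wound-wait and wait-die are timestamp-based prevention schemes covered in ICS 214A; as a quick refresher (my own sketch), the decision each makes when a requester is blocked by a holder is:

# Deadlock prevention by timestamps (older transaction = smaller timestamp).
def wait_die(ts_requester, ts_holder):
    """Wait-die: an older requester waits; a younger requester dies
    (aborts and later restarts with its original timestamp)."""
    return 'wait' if ts_requester < ts_holder else 'abort requester'

def wound_wait(ts_requester, ts_holder):
    """Wound-wait: an older requester wounds (aborts) the holder;
    a younger requester waits."""
    return 'abort holder' if ts_requester < ts_holder else 'wait'

assert wait_die(5, 9) == 'wait'
assert wait_die(9, 5) == 'abort requester'
assert wound_wait(5, 9) == 'abort holder'
assert wound_wait(9, 5) == 'wait'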

  34. Summary
  • 2PL: the most popular; deadlocks are possible; useful in distributed systems
  • T.O.: aborts are more likely; no deadlocks; useful in distributed systems

  35. Next: reliable distributed database management
  • Dealing with failures
  • Distributed commit algorithms
  • The “two generals” problem

  36. Reliability
  • Correctness
    - Serializability
    - Atomicity
    - Persistence
  • Availability

  37. Types of failures
  • Processor failures: halt, delay, restart, berserk, …
  • Storage failures: atomic write, transient errors, disk crash
  • Communication (network) failures: lost messages, out-of-order messages, partitions

  38. Failure models
  • We cannot protect against everything:
    - unlikely failures (e.g., flooding in the Sahara)
    - failures that are too expensive to protect against (e.g., earthquakes)
  • We focus on the failures we know how to protect against (e.g., with message sequence numbers; stable storage)

  39. Failure model
  [Figure: events classified as desired vs. undesired and expected vs. unexpected; the failure model covers the undesired but expected events]

  40. Node models (1): fail-stop nodes
  [Figure: timeline of a node: perfect operation, then halted, then recovery, then perfect operation again]
  • On a failure, volatile memory is lost; stable storage is ok

  41. Node models (2): Byzantine nodes
  [Figure: nodes A, B, C alternate between perfect operation, arbitrary (Byzantine) failure, and recovery]
  • At any given time, at most some fraction f of the nodes have failed (typically f < 1/2 or f < 1/3)

  42. Network models (1): reliable network
  - In-order messages
  - No spontaneous messages
  - Timeout TD: if no ack arrives within TD seconds, the destination is down (not just paused)
  I.e., no lost messages, except those caused by node failures.
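In code, the reliable-network assumption might be used as in this sketch (send and wait_for_ack are hypothetical helpers): a missing ack within TD seconds can safely be taken to mean the destination has failed, never that it is merely slow.

def send_with_timeout(dest, msg, send, wait_for_ack, t_d):
    """Under the reliable-network model, a timeout acts as a failure detector."""
    send(dest, msg)
    if wait_for_ack(dest, timeout=t_d):
        return 'delivered'
    return 'destination down'   # the model rules out lost or merely slow messages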

  43. Variation of the reliable net: persistent messages
  • If the destination is down, the net will eventually deliver the message
  • Simplifies node recovery, but leads to inefficiencies; it just moves the problem one level lower down the stack
  • Not considered here

  44. Network models (2): partitionable network
  - In-order messages
  - No spontaneous messages
  - Nodes can have different views of which failures have occurred

  45. Scenarios
  • Reliable network, fail-stop nodes
    - No data replication (1)
    - Data replication (2)
  • Partitionable network, fail-stop nodes (3)

  46. No data replication (reliable network, fail-stop nodes)
  • Basic idea: node P controls item X
  [Figure: node P on the net, holding item X]
  - A single control point simplifies concurrency control and recovery
  - Note the availability hit: if P is down, X is unavailable too!

  47. “P controls X” means - P does concurrency control for X - P does recovery for X Notes 11

  48. Say transaction T wants to access X
  • PT is a process that represents T at this node
  [Figure: PT sends a request to the local DBMS, which manages X, the lock manager, and the LOG]

  49. Distributed commit problem
  [Figure: transaction T spans three nodes, executing actions a1, a2 at one node, a3 at another, and a4, a5 at a third]
  • Commit must be atomic: either all of T's actions take effect or none do

  50. Distributed commit problem
  • Commit must be atomic
  • Solution: two-phase commit (2PC)
    - Centralized 2PC
    - Distributed 2PC
    - Linear 2PC
    - Many other variants…
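A minimal sketch of centralized 2PC from the coordinator's side, as a preview of the next lecture (the send and collect_votes helpers are assumed; a real implementation also forces log records to stable storage and handles timeouts at each step):

def two_phase_commit(participants, send, collect_votes):
    # Phase 1: ask every participant to prepare and vote.
    for p in participants:
        send(p, 'PREPARE')
    votes = collect_votes(participants)      # one 'YES' or 'NO' per participant

    # Phase 2: commit only if everyone voted YES, otherwise abort everywhere.
    decision = 'COMMIT' if all(v == 'YES' for v in votes) else 'ABORT'
    for p in participants:
        send(p, decision)
    return decision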
