16 concurrency control and recovery only for dbs with updates review
Download
1 / 53

- PowerPoint PPT Presentation


  • 229 Views
  • Uploaded on

16. Concurrency Control and Recovery (only for DBs with updates…..!)- Review. Concurrency Control Transaction ACID Isolation Schedules Guaranteeing isolation Serializability Serializability ⇔ Isolation Locking Strict Two Phase Locking Strict 2PL ⇒ Serializable. Learning Objectives.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about '' - benjamin


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
16 concurrency control and recovery only for dbs with updates review l.jpg
16. Concurrency Control and Recovery(only for DBs with updates…..!)- Review

  • Concurrency Control

    • Transaction

    • ACID

    • Isolation

      • Schedules

      • Guaranteeing isolation

        • Serializability

        • Serializability ⇔ Isolation

        • Locking

        • Strict Two Phase Locking

        • Strict 2PL ⇒Serializable


Learning objectives l.jpg
Learning Objectives

  • Define ACID, schedule, isolated, equivalent, serializable, S2PL, conflict serializable, precedence graph, recoverable.

  • Know the implications on slide 25 and when the converses hold

  • Explain lock management, multiple granularity locks, phantoms, locking in BTrees, optimistic concurrency control


Example transaction l.jpg
Example Transaction

  • Transfer $100 from A to B

    • Read A; Verify A; Write A-100; then

    • Read B; Verify B; Write B+100

  • Are all 6 steps necessary?

  • Which steps require disc access?

  • When can an abort occur without damage?

    • Write is as in a program’s write

  • What damage can an abort cause?

  • How can you avoid such damage?


Transaction cont l.jpg
Transaction (cont.)

User (application developer) must indicate:

  • Begin transaction

  • read/write/modify statements intermixed with other programming language statements

  • plus either

    • commit - indicates successful completion or

    • abort - indicates program wants to roll back (erase the transaction)

  • All or nothing! (Atomic)


  • Supporting the acid properties of transactions l.jpg
    Supporting the ACID Properties of Transactions

    Recovery

    System

    • Atomicity: All actions in a transaction happen in their entirety or not at all.

    • Consistency: If the DB starts in a consistent state, (this notion is defined by the user; some of it may be enforced by integrity constraints) and if a transaction executes with no other queries active, then the DB ends up in a consistent state.

    • Isolation: Each transaction is isolated from other transactions. The effect on the DB is as if each transaction executed by itself.

    • Durability: If a transaction commits, its changes to the database state persist.

    Programmers

    Concurrency

    Control

    System

    Recovery

    System


    Isolation concurrency control l.jpg
    Isolation/Concurrency Control

    T1: BEGIN A+=100, B-=100 END

    T2: BEGIN A=1.06*A, B=1.06*B END

    • What is each of these transactions doing?

    • A schedule of T1 and T2 is an interleaving of the steps of these transactions so that each transaction’s order is preserved.


    Which of these is a schedule of t1 and t2 l.jpg
    Which of these is a Schedule of T1 and T2?

    T1: A+=100, B-=100

    T2: A=1.06*A, B=1.06*B

    T1: A+=100, B-=100

    T2: A=1.06*A, B=1.06*B

    T1: A+=100, B-=100

    T2: A=1.06*A, B=1.06*B

    T1: A+=100, B-=100

    T2: B=1.06*B, A=1.06*A

    T1: A+=100, B-=100

    T2: A=1.06*A,B=1.06*B


    Isolated schedules l.jpg
    Isolated Schedules

    • A schedule is isolated if its effect on the DB is as if each transaction executed by itself, serially.

    • Which of the schedules on the next page is isolated?

    • Hint: Calculate the effect of the schedule on a sample state of the DB, for example A has $1,000, B has $500. This won’t tell you the effect on all states, but it’s helpful information.


    Which schedules are isolated l.jpg
    Which Schedules are Isolated?

    T1: A+=100, B-=100

    T2: A=1.06*A, B=1.06*B

    T1: A+=100, B-=100

    T2: A=1.06*A, B=1.06*B

    • Goal of Concurrency Control subsystem: Guarantee only isolated schedules.

    T1: A+=100, B-=100

    T2: A=1.06*A, B=1.06*B

    T1: A+=100, B-=100

    T2: A=1.06*A,B=1.06*B

    T1: A+=100,B-=100

    T2: A=1.06*A, B=1.06*B


    Equivalent schedules l.jpg
    Equivalent Schedules

    • Two schedules are equivalent if given any starting DB state, they produce the same result.

    • Which of these schedules is equivalent?

    T1: A+=100, B-=100

    T2: A=1.06*A, B=1.06*B

    T1: A+=100, B-=100

    T2: A=1.06*A, B=1.06*B

    T1: A+=100, B-=100

    T2: A=1.06*A, B=1.06*B

    T1: A+=100, B-=100

    T2: A=1.06*A,B=1.06*B


    Serializable schedules l.jpg
    Serializable Schedules

    • A schedule is serializable if it is equivalent to a serial schedule.

    • Which of these schedules is serializable?

    T1: A+=100, B-=100

    T2: A=1.06*A, B=1.06*B

    T1: A+=100, B-=100

    T2: A=1.06*A, B=1.06*B

    T1: A+=100, B-=100

    T2: A=1.06*A, B=1.06*B

    T1: A+=100, B-=100

    T2: A=1.06*A,B=1.06*B


    The goal of concurrency control l.jpg
    The goal of Concurrency Control

    • Recall the goal of concurrency control: To ensure that all schedules are isolated

    • Theorem: A schedule is Serializable⇔ it is Isolated

      ⇒: Serializable ⇒ equivalent to some serial schedule, and in a serial schedule, each Xact. is isolated

      ⇐: If each xact runs alone, the schedule must be serial

    • But serializability is hard to verify

      • How can we, in real time, check each schedule?

    • So the Concurrency Control Subsystem needs more work.


    Locking l.jpg
    Locking

    • Transaction must get a lock – before it can read or update data

    • There are two kinds of locks: shared (S) locks and exclusive (X) locks

    • To read a record you MUST get an S lockTo modify or delete a record you MUST get an X lock

    • Lock info maintained by a “lock manager”


    How locks work l.jpg

    S

    X

    --

    ok

    ok

    ok

    --

    S

    ok

    ok no

    X

    ok nono

    How Locks Work

    • If a Xact has an S lock on a data object , new transactions can get S locks on that object, but not X locks.

    • If a Xact has an X lock, no other Xact can get any lock (S or X) on that data object.

    • If a transaction can’t get a lock, it waits (in a queue).

    lock on data item

    lock you want

    Lock compatibility


    Strict two phase locking protocol s2pl l.jpg
    Strict Two Phase Locking Protocol (S2PL)

    Strict 2PL is a way of managing locks during a transaction

    • A Xact gets (S and X) locks gradually, as needed

    • The Xact holds all locks until end of transaction (commit/abort)

    All locks

    are released

    at the end,

    upon commit or abort

    5

    # of locks

    held by a

    transaction T

    4

    3

    2

    1

    0

    time 


    Strict 2pl guarantees serializability l.jpg
    Strict 2PL guarantees serializability

    • Idea of the Proof: a Strict 2PL schedule is equivalent to the serial schedule in which each transaction runs instantaneously at the time that it commits

    • This is huge: A property of each transaction (S2PL) implies a property of any set of transactions (serializability)

      • No need to check serializability of any schedules

    • Real DBMSs use S2PL to enforce serializability

    • In reality, users can and do choose lower levels of concurrency for all but the most sensitive transactions


    17 concurrency control l.jpg
    17. Concurrency Control

    • Locks

      • Management

      • Deadlocks

        • Waits-for

      • Multiple Granularity

    • Phantoms

      • Predicate, Index locking

  • Locking in B+ Trees

  • Optimistic CC

    • Inefficiency of locking

    • Optimistic CC idea

    • Conflicts

      • Conflicting Actions

      • Conflict Equivalent

      • Conflict Serializable

      • Conf. Ser. ⇒ Serializable

    • Precedence Graph

      • Conf. Serializable ⇔Precedence graph is acyclic

    • Strict 2PL ⇒Recoverable

    • 2PL ¬⇒ Recoverable


    17 1 conflict serializable schedules l.jpg

    17. CC

    17.1 Conflict Serializable Schedules

    • Conflicting actions: Actions that access the same data and at least one of which is a write

      • Note that changing the order of these two actions might yield different results.

    • Two schedules are conflict equivalent if:

      • They involve the same actions of the same transactions in the same order

      • Every pair of conflicting actions is ordered the same way

    • Schedule S is conflict serializable if S is conflict equivalent to some serial schedule


    Which are conflict serializable l.jpg

    17. CC

    Which are conflict serializable?

    T1: R(A),W(A), R(B),W(B)

    T2: R(A),W(A), R(B),W(B)

    T1: R(A),W(A)

    T2: R(A), W(A), R(B)

    T1: R(B),W(A), W(B)

    T2: R(A), W(A), R(B)

    T1: R(A), W(A)

    T2: W(A)

    T3: W(A)


    Conflict serializable serializable l.jpg
    Conflict Serializable ⇒ Serializable

    • If two actions do not conflict, then commuting them results in an equivalent schedule.

    • Suppose S is conflict serializable. Then there is a sequence of commuting actions I = {I1,…,In} so that

      • Each of the Ii commutes nonconflicting actions

      • I applied to S is a serial schedule

    • Because of (a), I does not change the state of any database. Thus S, and I applied to S, are equivalent and I applied to S is serial (b), so S is serializable.


    Serializable does not imply conflict serializable l.jpg
    Serializable does NOT imply Conflict Serializable

    T1: R(A), W(A)

    T2: W(A)

    T3: W(A)

    • Equivalent to what serial schedule?

      • Therefore it is a serializable schedule

    • Why is it not conflict serializable? (for now just give an intuitive reason, later we will have a proof)


    Precedence graphs l.jpg

    17. CC

    A

    T1

    T2

    Precedence graph

    B

    Precedence graphs

    • Why is this graph not conflict serializable?

    • The cycle in the graph illustrates the problem. T1 must precede T2, and T2 must precede T1, in any conflict equivalent serial schedule.

    T1: R(A), W(A), R(B), W(B)

    T2: R(A), W(A), R(B), W(B)


    Precedence graph l.jpg

    17. CC

    Precedence Graph

    • Precedence graph: One node per Xact; edge from Ti to Tj if an action of Ti precedes and conflicts with an action of Tj.

    • Theorem: Schedule is conflict serializable if and only if its precedence graph is acyclic

      • ⇒If there is a cycle in the graph, it cannot be serializable (see previous page & generalize)

      • ⇐ If the graph is acyclic, the schedule is equivalent to a topologically sorted order of the actions.


    Example of acyclic graph l.jpg

    T1

    T2

    T4

    T3

    Example of acyclic graph

    • Is this graph acyclic?

    • What is a topological sort of it?

    • Is a schedule, for which this is a precedence graph, equivalent to a serial schedule?

      • Can we move all actions of T4 to occur before T2, without reversing conflicting actions?

      • How about T1 before T4?


    Summary l.jpg
    Summary

    Isolated Xact: same results as if it ran alone

    Each Xact in a schedule is Isolated

    

    Serializable Schedule: Same result as a serial schedule

    The schedule is Serializable

    The schedule is Conflict Serializable

    Conflict Serializable : Conflict Equivalent to a Serializable Schedule

    

    The Schedule’s Precedence Graph is Acyclic

    The schedule is consistent with Strict 2PL.

    Strict 2PL: There is a locking schedule where all locks are held until EOT

    Deadlock: There is a cycle in the Waitsfor graph.

    Deadlock is possible


    Strict 2pl recoverable l.jpg
    Strict 2PL⇒Recoverable

    • A schedule is recoverable if, during it, all transactions commit only after all transactions whose data they have read commit.

    • Why is recoverability desirable? Otherwise, T1 may read the data of T2 ( a so-called dirtyread), then T1 commit, then T2 abort and roll back. Then T1 has read a value that does not exist.


    Two phase locking 2pl l.jpg

    17. CC

    Two-Phase Locking (2PL)

    • Two-Phase Locking Protocol

      • Each Xact must obtain a S (shared) lock on object before reading, and an X (exclusive) lock on object before writing.

      • A transaction can not request additional locks once it releases any locks.

    • 2PL implies that all schedules have acyclic precedence graphs, so are serializable.

    • However, they are not recoverable, so Strict 2PL is used in practice.


    Lock management l.jpg

    17. CC

    Lock Management

    • Lock and unlock requests are handled by the lock manager

    • Lock table entry:

      • IDs of transactions currently holding a lock

      • Type of lock held (shared or exclusive)

      • Pointer to queue of lock requests

        • If there is an S lock on an object O and T1 requests an X lock, what happens? What if then T2 requests an S lock?

    • Locking and unlocking have to be atomic operations

      • How is this enforced?

    • Lock upgrade: transaction that holds a shared lock can be upgraded to hold an exclusive lock if no one else has a shared lock.


    Slide29 l.jpg

    Managing a new lock (simplified)

    New Lock

    Type of lock?

    S

    X

    Y

    Queue empty?

     lock?

    EnQ

    N

    N

    Y

    Grant X lock

    EnQ

     lock?

    N

    Y

    Grant S lock

    S

    Type of lock?

    X

    EnQ


    Slide30 l.jpg

    Managing a lock release (simplified)

    Release Lock

    Y

    Exit

     Other locks?

    N

    DeQ xact from Q, give it a lock

    Y

    Is there an S lock on top of the Q?

    S

    What type of lock was it?

    N

    X

    Exit


    Deadlocks l.jpg

    17. CC

    Deadlocks

    • Deadlock: Cycle of transactions waiting for locks to be released by each other.

    • Two ways of dealing with deadlocks:

      • Deadlock prevention

      • Deadlock detection


    Deadlock prevention l.jpg

    17. CC

    Deadlock Prevention

    • Theory

      • Assign priorities based on timestamps.

        • Older transactions get higher priority.

      • Assume Ti wants a lock that Tj holds. Two policies are possible:

        • Wait-Die: It Ti has higher priority, Ti waits for Tj; otherwise Ti aborts

        • Wound-wait: If Ti has higher priority, Tj aborts; otherwise Ti waits

      • If a transaction re-starts, make sure it has its original timestamp

    • Practice: http://dev.mysql.com/doc/refman/5.0/en/innodb-deadlocks.html


    Deadlock detection l.jpg

    17. CC

    Deadlock Detection

    • Create a waits-for graph:

      • Nodes are transactions

      • There is an edge from Ti to Tj if Ti is waiting for Tj to release a lock

    • Periodically check for cycles in the waits-for graph

    • Note that waits-for graph is opposite direction of precedence graph.


    Deadlock detection continued l.jpg

    17. CC

    T1

    T2

    T4

    T3

    Deadlock Detection (Continued)

    Example:

    T1: S(A), R(A), S(B)

    T2: X(B),W(B) X(C)

    T3: S(C), R(C) X(A)

    T4: X(B)

    T1

    T2

    T4

    T3


    Multiple granularity locks l.jpg

    17. CC

    Database

    Tables

    Pages

    Tuples

    Multiple-Granularity Locks

    • Hard to decide what granularity to lock (tuples vs. pages vs. tables).

    • Shouldn’t have to decide!

    • Data “containers” are nested:

    contains


    Solution new lock modes protocol l.jpg

    17. CC

    IS

    IX

    S

    X

    --

    Ö

    Ö

    Ö

    Ö

    Ö

    --

    IS

    Ö

    Ö

    Ö

    Ö

    IX

    Ö

    Ö

    Ö

    S

    Ö

    Ö

    Ö

    Ö

    X

    Solution: New Lock Modes, Protocol

    • Allow Xacts to lock at each level, but with a special protocol using new “intention” locks:

    • Before locking an item, Xact must set “intention locks” on all its ancestors.

    • IX(IS): Intend to X(S) lock a subset.

    • SIX: S & IX at the same time. Used to scan and update selected records.


    Multiple granularity lock protocol l.jpg

    17. CC

    Multiple Granularity Lock Protocol

    • Each Xact starts from the root of the hierarchy.

    • To get S or IS lock on a node, must hold IS or IX on parent node.

    • To get X or IX or SIX on a node, must hold IX or SIX on parent node.

    • Must release locks in bottom-up order.

    • Sometimes hard to decide granularity of locks. Can start small and use lock escalation.


    Examples l.jpg

    17. CC

    IS

    IX

    S

    X

    --

    Ö

    Ö

    Ö

    Ö

    Ö

    --

    IS

    Ö

    Ö

    Ö

    Ö

    IX

    Ö

    Ö

    Ö

    Ö

    S

    Ö

    Ö

    Ö

    X

    Examples

    • T1 scans R, and updates a few tuples:

      • T1 gets an SIX lock on R, then repeatedly gets an S lock on tuples of R, and occasionally upgrades to X on the tuples.

    • T2 uses an index to read only part of R:

      • T2 gets an IS lock on R, and repeatedly gets an S lock on tuples of R.

    • T3 reads all of R:

      • T3 gets an S lock on R.

      • OR, T3 could behave like T2; can use lock escalation to decide which.


    Dynamic databases phantoms l.jpg

    17. CC

    Dynamic Databases: Phantoms

    • If we allow updates, even Strict 2PL will not assure serializability:

      • T1 finds oldest sailor in each rank

      • T2 inserts(Rohi,1,27) and deletes John

      • Schedule is

    • This schedule is Strict 2PL, but not serializable!

      • Result of this schedule is (1,Pehr)(2,Lorr)

      • Result of T1;T2 is (1,Pehr)(2,John)

      • Result of T2;T1 is (1,Rohi)(2,Lorr)

    T1 rank 1 T1 rank 2

    T2 inserts Rohi, deletes John


    The problem l.jpg

    17. CC

    The Problem

    • When T1 retrieved the oldest sailor of rank 1, it locked each sailor of rank 1 with a read lock.

    • None of these locks applied to the new record (a phantom) inserted by T2.

    • We need a mechanism to prevent phantoms; to allow T1 to lock present and future sailors with rank 1.

    • There are two such mechanisms, index locking and predicate locking.


    Index locking l.jpg

    Data

    Index Locking

    Index

    r=1

    • If there is a dense index on the rating field using Alternative (2), T1 should lock the index page(s) containing the data entries with rating = 1.

      • If there are no records with rating = 1, T1 must lock the index page where such a data entry would be, if it existed!

    • If there is no suitable index, T1 must lock all pages, and lock the file/table to prevent new pages from being added, to ensure that no new records with rating = 1 are added.


    Predicate locking l.jpg

    17. CC

    Predicate Locking

    • Grant lock on all records that satisfy some logical predicate, e.g. age > 2*salary.

    • Index locking is a special case of predicate locking for which an index supports efficient implementation of the predicate lock.

      • What is the predicate in the sailor example?

    • In general, predicate locking has a lot of locking overhead.


    Locking in b trees l.jpg

    17. CC

    Locking in B+ Trees

    • How can we efficiently lock a B+ tree?

      • Btw, don’t confuse this with multiple granularity locking!

    • One solution: Ignore the tree structure, just lock pages while traversing the tree, following 2PL.

    • This has terrible performance!

      • Root node (and many higher level nodes) become bottlenecks because every tree access begins at the root. This single threads all updates to the tree.


    Two useful observations l.jpg

    17. CC

    Two Useful Observations

    • Higher levels of the tree only direct searches for leaf pages.

    • For inserts/deletes, a node on a path from root to modified leaf must be locked (in X mode, of course), only if a split can propagate up to it from the modified leaf.

    • We can exploit these observations to design efficient locking protocols that guarantee serializability even though they violate 2PL.


    A tree locking algorithm l.jpg

    17. CC

    A Tree Locking Algorithm

    • Search: Start at root and go down; repeatedly, S lock child then unlock parent.

    • Insert/Delete: Start at root and go down, obtaining X locks as needed. Once child is locked, check if it is safe:

      • If child is safe, release all locks on ancestors.

    • Safe node: Node such that the change will not propagate up beyond this node.

      • Inserts: Node is not full.

      • Deletes: Node is not half-empty.


    Example l.jpg

    ROOT

    Do:

    1) Search 38*

    2) Delete 38*

    3) Insert 45*

    4) Insert 25*

    A

    20

    Example

    35

    B

    35

    23

    38 44

    35

    35

    F

    C

    23

    38

    44

    H

    G

    I

    D

    E

    20*

    22*

    23*

    24*

    35*

    36*

    38*

    41*

    44*


    Optimistic cc kung robinson l.jpg

    17. CC

    Optimistic CC (Kung-Robinson)

    • Locking is a conservative approach in which conflicts are prevented. Disadvantages:

      • Lock management overhead.

      • Deadlock detection/resolution.

      • Lock contention for heavily used objects.

    • If conflicts are rare, we might be able to gain concurrency by not locking, and instead checking for conflicts before Xacts commit.

    • A version of this optimistic approach is used by PostgreSQL and Oracle


    Kung robinson model l.jpg

    17. CC

    Kung-Robinson Model

    • Xacts have three phases:

      • READ: Xacts read from the database, but make changes to private copies of objects.

      • VALIDATE: Check for conflicts.

      • WRITE: Make local copies of changes public.

    old

    modified

    objects

    ROOT

    new


    Validation l.jpg

    17. CC

    Validation

    • Each Xact is assigned a numeric id.

      • Just use a timestamp.

    • Xact ids assigned at end of READ phase, just before validation begins.

    • ReadSet(Ti): Set of objects read by Xact Ti.

    • WriteSet(Ti): Set of objects modified by Ti.


    Test 1 l.jpg

    17. CC

    Test 1

    • For all i and j such that TSi < TSj, check that Ti completes before Tj begins.

    Ti

    Tj

    R

    V

    W

    R

    V

    W


    Test 2 l.jpg

    17. CC

    Test 2

    • For all i and j such that Ti < Tj, check that:

      • Ti completes before Tj begins its Write phase +

      • WriteSet(Ti) ReadSet(Tj) is empty.

    Ti

    R

    V

    W

    Tj

    R

    V

    W

    Does Tj read dirty data? Does Ti overwrite Tj’s writes?


    Test 3 l.jpg

    17. CC

    Ti

    R

    V

    W

    Test 3

    • For all i and j such that Ti < Tj, check that:

      • Ti completes Read phase before Tj does +

      • WriteSet(Ti) ReadSet(Tj) is empty +

      • WriteSet(Ti) WriteSet(Tj) is empty.

    Tj

    R

    V

    W

    Does Tj read dirty data? Does Ti overwrite Tj’s writes?


    Overheads in optimistic cc l.jpg

    17. CC

    Overheads in Optimistic CC

    • Must record read/write activity in ReadSet and WriteSet per Xact.

      • Must create and destroy these sets as needed.

    • Must check for conflicts during validation, and must make validated writes ``global’’.

      • Critical section can reduce concurrency.

      • Scheme for making writes global can reduce clustering of objects.

    • Optimistic CC restarts Xacts that fail validation.

      • Work done so far is wasted; requires clean-up.


    ad