
Review Concurrency

Enterprise Systems architecture and infrastructure

DT211 4


Desirable Properties of Transactions

  • 1. Atomicity: A transaction is an atomic unit of processing; it is either performed in its entirety or not performed at all.

  • 2. Consistency preservation: A correct execution of the transaction must take the database from one consistent state to another.

  • 3. Isolation: A transaction should not make its updates visible to other transactions until it is committed; this property, when enforced strictly, solves the temporary update problem and makes cascading rollbacks of transactions unnecessary.

  • 4. Durability or permanency: Once a transaction changes the database and the changes are committed, these changes must never be lost because of subsequent failure.


Why concurrency control

Why Concurrency Control?

  • A Transaction: a logical unit of database processing that includes one or more access operations (read - retrieval; write - insert or update; delete).

  • Concurrency control is used to ensure the isolation property of concurrently executing transactions, via protocols such as locking, timestamping, and optimistic concurrency control.


Concurrency problems

  • Two general problems can occur when there is no proper concurrency control:

    • Lost update

    • Temporary update (dirty read)

  • Essentially, both break the isolation property of database transactions:

    • Updates are made visible to other transactions before they are committed to the database

    • There is a conflict in the schedule


Example 1: Lost Updates

Transactions

  • User X: Updating Customer A account with withdrawal of $50

  • User Y: Updating Customer A account with deposit of $25

  • Customer A's balance should be 100 - 50 + 25 = 75

    Time  USER X                                   USER Y
    t1    1. Read Cust A record (Balance = $100)
    t2                                             1. Read Cust A record (Balance = $100)
    t3    2. Bal = Bal - 50 (Balance = $50)
    t4                                             2. Bal = Bal + 25 (Balance = $125)
    t5    3. Write Cust A record (Balance = $50)
    t6                                             3. Write Cust A record (Balance = $125)

    (Time runs downward; User Y's write overwrites User X's update, which is lost.)
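The interleaving above can be replayed deterministically as straight-line code: because both users read the balance before either one writes, whichever write comes last silently overwrites the other. A minimal sketch (variable names are illustrative):

```python
# Deterministic replay of the lost-update interleaving above.
balance = 100            # Customer A's starting balance

x_read = balance         # t1: User X reads (100)
y_read = balance         # t2: User Y reads (100), before X writes
x_new = x_read - 50      # t3: User X computes 50
y_new = y_read + 25      # t4: User Y computes 125
balance = x_new          # t5: User X writes 50
balance = y_new          # t6: User Y writes 125; X's withdrawal is lost

print(balance)           # 125, not the correct 75
```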


Example 3: Temporary Update Problem

Transactions

  • An update to a product that fails and is rolled back

  • A product update that completes

    Time  USER X                                 USER Y
    t1    1. Read Prod A record (QOH = 35)
    t2    2. Update QOH (+100) (QOH = 135)
    t3    3. Write Prod A record (QOH = 135)
    t4                                           1. Read Prod A record (QOH = 135)
    t5                                           2. Update QOH (-30) (QOH = 105)
    t6    4. Failure: Rollback
    t7                                           3. Write Prod A record (QOH = 105)
    t8    Begin recovery...                      4. ... commit;
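The dirty read can be replayed the same way: User Y bases its update on a value that User X later rolls back. A minimal sketch (variable names are illustrative):

```python
# Deterministic replay of the dirty-read (temporary update) interleaving.
qoh = 35                 # quantity on hand, last committed value
before_x = qoh           # saved for rollback

qoh = qoh + 100          # X writes 135 (uncommitted)
y_read = qoh             # Y reads the dirty value 135
qoh = before_x           # X fails and rolls back to 35
qoh = y_read - 30        # Y writes 105, based on the dirty read

print(qoh)               # 105, though the correct result is 35 - 30 = 5
```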


Schedule Conflict

  • Two operations in a schedule are said to conflict if they satisfy all three of the following conditions:

  • 1) They belong to different transactions

  • 2) They access the same item X.

  • 3) At least one of the operations is a write_item(X).

  • Schedules without conflicts are classed as serial; more generally, a schedule is serializable if it is equivalent to some serial schedule.

  • A serializable schedule gives the benefits of concurrency without giving up correctness.
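The three conditions can be checked mechanically. A minimal sketch (the `(txn, action, item)` tuple shape is an illustrative assumption, not from the slides):

```python
def conflict(op1, op2):
    """Two schedule operations conflict iff they belong to different
    transactions, access the same item, and at least one is a write."""
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return t1 != t2 and x1 == x2 and "write" in (a1, a2)

print(conflict(("T1", "read", "X"), ("T2", "write", "X")))   # True
print(conflict(("T1", "read", "X"), ("T2", "read", "X")))    # False: both reads
print(conflict(("T1", "write", "X"), ("T1", "read", "X")))   # False: same transaction
```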


CLASSIFICATION OF CONCURRENCY CONTROL TECHNIQUES

  • 1. Locking data items to prevent multiple transactions from accessing the items concurrently; a number of locking protocols have been proposed.

  • 2. Use of timestamps. A timestamp is a unique identifier for each transaction, generated by the system.

  • 3. Optimistic Concurrency Control: based on the concept of validation or certification of a transaction after it executes its operations; these are sometimes called optimistic protocols. They proceed optimistically; back up and repair if needed

  • 4. Pessimistic protocols: do not proceed until it is known that no backing up will be needed.


Two-Phase Locking

  • A transaction is divided into two phases:

  • – growing: new locks may be acquired but none released

  • – shrinking: existing locks may be released but no new ones acquired


Two-Phase Locking (cont.)

  • If every transaction in a schedule follows the two-phase locking protocol, the schedule is guaranteed to be serializable, i.e. no concurrency problems will occur.

  • The two-phase locking protocol guarantees serializability; however, the use of locks can cause two additional problems: deadlock and starvation.
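The growing/shrinking discipline can be enforced with a tiny guard object. A sketch under stated assumptions (class and method names are illustrative; no blocking or lock-conflict detection is modelled, only the two-phase rule itself):

```python
class TwoPhaseTxn:
    """Enforces 2PL: once any lock is released, no new lock may be taken."""

    def __init__(self):
        self.held = set()
        self.shrinking = False   # becomes True at the first release

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after first unlock")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True    # the shrinking phase has begun
        self.held.remove(item)

t = TwoPhaseTxn()
t.lock("X"); t.lock("Y")   # growing phase
t.unlock("X")              # shrinking phase begins
# t.lock("Z") would now raise: no new locks once shrinking has started
```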


DEADLOCK PREVENTION:

  • Use of transaction timestamp TS(T)

  • Two protocols use TS(T) to prevent deadlock, by rolling back one of the transactions involved:

    • Wait-die: an older transaction may wait for a younger one; a younger transaction requesting an item held by an older one is rolled back (dies)

    • Wound-wait: an older transaction rolls back (wounds) a younger one holding the item it needs; a younger transaction waits for an older one to finish
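Both rules reduce to a timestamp comparison, with a smaller timestamp meaning an older transaction. A minimal sketch (function names are illustrative):

```python
def wait_die(requester_ts, holder_ts):
    # Older requester waits; younger requester dies (is rolled back).
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    # Older requester wounds (rolls back) the younger holder; younger waits.
    return "wound" if requester_ts < holder_ts else "wait"

print(wait_die(1, 5), wait_die(5, 1))       # wait die
print(wound_wait(1, 5), wound_wait(5, 1))   # wound wait
```

In both schemes the transaction that is rolled back keeps its original timestamp on restart, so it eventually becomes the oldest and cannot starve.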


Timestamping protocol

Transaction Timestamps: the time the transaction starts

Data Timestamps

  • Read-timestamp of a data item: the largest timestamp of any transaction that has read it

  • Write-timestamp of a data item: the largest timestamp of any transaction that has written (updated) it

  • Timestamping prevents deadlock and starvation


    Concurrency control using Timestamp

    • Basic timestamp methods

      • Write operation

        • When a transaction attempts a write operation on a data item X, it must first check that X has not been read or updated by a younger transaction; it proceeds if not, and is rolled back if it has.

      • Read operation

        • When a transaction attempts a read operation on a data item X, it must first check that X has not been updated by a younger transaction; it proceeds if not, and is rolled back if it has.


    Concurrency Control based on Timestamps

    • The basic idea or rules are as follows:

      • 1. Each transaction receives a timestamp when it is initiated at its site of origin.

      • 2. Each read or write operation which is required by a transaction has the timestamp of the transaction.

      • 3. For each data item x, the largest timestamp of a read operation and the largest timestamp of a write operation are recorded; they will be indicated as TRD(x) and TWR(x)

      • 4. Let T be the timestamp of a read operation on data item x. If T < TWR(x), the read operation is rejected and the issuing transaction restarted with a new timestamp; otherwise, the read is executed, and TRD(x) = max(TRD(x), T).

      • 5. Let T be the timestamp of a write operation on data item x. If T < TWR(x) or T < TRD(x), then the operation is rejected and the issuing transaction is restarted; otherwise, the write is executed, and TWR(x) = T.
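Rules 4 and 5 above can be sketched directly; the dictionaries TRD/TWR mirror the notation on this slide, and the default timestamp of 0 for never-accessed items is an assumption:

```python
# Basic timestamp-ordering checks (rules 4 and 5 above).
TRD, TWR = {}, {}   # largest read / write timestamp per data item

def read(item, ts):
    if ts < TWR.get(item, 0):
        return "restart"                       # rule 4: a younger txn already wrote
    TRD[item] = max(TRD.get(item, 0), ts)
    return "ok"

def write(item, ts):
    if ts < TWR.get(item, 0) or ts < TRD.get(item, 0):
        return "restart"                       # rule 5: a younger txn already accessed
    TWR[item] = ts
    return "ok"

print(write("x", 5))   # ok
print(read("x", 3))    # restart (rule 4: 3 < TWR = 5)
print(read("x", 7))    # ok (TRD becomes 7)
print(write("x", 6))   # restart (rule 5: 6 < TRD = 7)
```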


    Optimistic Concurrency Control

    Conditions

    • Small number of conflicts likely – no ‘hotspots’ for writing

      Technique

      Each transaction has up to 3 phases.

    • Read phase

      • Execution phase of transaction.

      • All writes are done in private buffer space.

        • RS(T): all items read by transaction T

        • WS(T): all items written by transaction T

    • Validation Phase

      • Check is performed to ensure that writes can be installed in the database without violating serializability.

    • Write Phase

      • If validation succeeds, actual updates are applied to the database.

      • If validation fails, the transaction is aborted.
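The validation phase can be sketched as a set-intersection test between RS(T) and the write sets of transactions that committed while T was running. This is a simplified backward-validation sketch with an illustrative function name:

```python
def validate(read_set, committed_write_sets):
    """T passes validation iff nothing T read was overwritten by a
    transaction that committed during T's read phase."""
    return all(not (read_set & ws) for ws in committed_write_sets)

# T read {A, B}; a concurrent transaction committed a write to {B}
print(validate({"A", "B"}, [{"B"}]))   # False -> abort and restart T
print(validate({"A"}, [{"B"}]))        # True  -> T enters its write phase
```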


    Question to consider

    • Explain and illustrate how each of the concurrency control methods prevents violation of the ACID properties.


    Review Recovery

    Denis Manley

    Enterprise Systems

    DT211 4


    RECOVERY TECHNIQUES ARE NEEDED BECAUSE TRANSACTIONS MAY FAIL

    • A computer failure or system crash: a hardware or software error occurs during transaction execution.

    • Concurrency control enforcement: the concurrency control method may decide to abort the transaction.

    • Disk failure: some disk blocks may lose their data because of a read or write malfunction, or because of a disk read/write head crash.

    • Physical problems and catastrophes: an open-ended list of problems that includes power or air-conditioning failure, fire, etc. Disaster recovery is also needed for such problems.


    The System Log

    • T is the system-generated transaction-id.

    • 1. [start_transaction,T]: Records that transaction T has started execution.

    • 2. [write_item,T,X,old_value,new_value]: Records that transaction T has changed the value of database item X from old_value to new_value.

    • 3. [read_item,T,X]: Records that transaction T has read the value of database item X

    • 4. [commit,T]: Records that transaction T has completed successfully, and affirms that its effect can be committed to the DB.

    • 5. [abort,T]: Records that transaction T has aborted.


    RECOVERY USING LOG RECORDS

    • If the system crashes, we can recover to a consistent database state by examining the log.

      • It is possible to undo the effect of these WRITE operations of a transaction T by tracing backward through the log and resetting all items changed by a WRITE operation of T to their old_values.

      • We can also redo the effect of the WRITE operations of a transaction T by tracing forward through the log and setting all items changed by a WRITE operation of T to their new values.


    Commit Point

    • A transaction T reaches a commit point when all its operations that access the DB have been executed successfully and the effect of all the transaction operations on the DB have been recorded in the log.


    Undo/redo

    • UNDO/REDO (Immediate update):

      • write-ahead to log on disk

      • update database anytime

      • commit allowed before database is completely updated

    • Goal: Maximize efficiency during normal operation.

      • Some extra work is required during recovery time.

    • Following a failure, the recovery procedure is:

      • Redo all transactions for which the log has both “start” and “commit” entries.

      • Undo all transactions for which the log has “start” entry but no “commit” entry.
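The redo-then-undo procedure can be sketched over an in-memory log whose tuples mirror the [start_transaction,T], [write_item,T,X,old_value,new_value] and [commit,T] entries from the system-log slide (the tuple encoding itself is an assumption):

```python
def recover(log, db):
    """Undo/redo recovery: redo committed transactions forward,
    undo uncommitted transactions backward."""
    committed = {t for (op, t, *rest) in log if op == "commit"}
    # Redo pass: scan forward, reapply writes of committed transactions.
    for op, t, *rest in log:
        if op == "write" and t in committed:
            x, old, new = rest
            db[x] = new
    # Undo pass: scan backward, restore old values of uncommitted writes.
    for op, t, *rest in reversed(log):
        if op == "write" and t not in committed:
            x, old, new = rest
            db[x] = old
    return db

log = [("start", "T1"), ("write", "T1", "B", 300, 400), ("commit", "T1"),
       ("start", "T2"), ("write", "T2", "A", 560, 570)]
# Disk state at the crash: T2's uncommitted write of A already reached disk.
print(recover(log, {"A": 570, "B": 300}))  # {'A': 560, 'B': 400}
```

With this log, T1 has both start and commit entries so its write of B is redone, while T2 has no commit entry so its write of A is undone back to the old value.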



    Example of undo/redo

    • We consider two transactions executed sequentially by the system.

    • T1: Read(A)              T2: Read(A)
          A ← A + 50                A ← A + 10
          Read(B)                   Write(A)
          B ← B + 100               Read(D)
          Write(B)                  D ← D - 10
          Read(C)                   Read(E)
          C ← 2C                    Read(B)
          Write(C)                  E ← E + B
          A ← A + B + C             Write(E)
          Write(A)                  D ← D + E
                                    Write(D)

    • The initial values are:

      A = 100, B = 300, C = 5, D = 60, E = 80


    Example (cont)

    • The Log

      1. <T1 starts>

      2. <T1, B, old: 300, new: 400>

      3. <T1, C, old: 5, new: 10>

      4. <T1, A, old: 100, new: 560>

      5. <T1 commits>

      6. <T2 starts>

      7. <T2, A, old: 560, new: 570>

      8. <T2, E, old: 80, new: 480>

      9. <T2, D, old: 60, new: 530>

      10. <T2 commits>

    • Output of B to the database can occur any time after entry 2 has been written to the log, and so on. Determine the recovery action when the last log entry written is entry 1, entries 1-4, entries 5-9, or entry 10.


    Example (cont)

    • Assume a system crash occurs and the log is examined. The action taken depends on the last log entry actually written to disk.


    No-Undo/Redo

    • NO-UNDO/REDO (Deferred update):

      • don’t change database until ready to commit

      • write-ahead to log on disk

      • change the database after commit is recorded in the log

    • Advantages

      • Faster during recovery: no undo.

      • No before images needed in log.

    • Disadvantages

      • Database outputs must wait.

      • Lots of extra work at commit time.


    Undo/No-Redo

    • UNDO/NO-REDO (Immediate update):

      • All changed data items need to be output to the disk before commit.

        • Requires that the write entry first be output to the (stable) log.

      • At commit:

        • Output (flush) all changed data items in the cache.

        • Add commit entry to log.

    • Advantages

      • No after images are needed in log.

      • No transactions need to be redone.

    • Disadvantages

      • Data must be flushed to disk for each committed write.

        • Implies lots of I/O traffic.


    No-Undo/No-Redo

    • NO-UNDO/NO-REDO (shadow paging):

      • No-undo → don't change the database during a transaction

      • No-redo → on commit, write changes to the database in a single atomic action

    • Advantages

      • Recovery is instantaneous.

      • No recovery code need be written.


    Shadow Paging

    [Figure: three snapshots of the page directory. The Master directory points to the last committed values of x, y and z; a new (shadow) directory points to new versions of x and y.]

    • During a transaction: updates are written to new pages; Master still points to the last committed values of x, y and z.

    • After preparing the new directory for commit: the new directory points to the new versions of x and y.

    • After committing: Master is replaced by the new directory in a single atomic action.


    Checkpointing

    • Checkpointing speeds up recovery by flushing pages to disk.

    • During execution, in addition to the activities of the previous method, periodically perform checkpointing:

      • Output the log buffers to the log.

      • Force database buffers to the disk.

      • Output an entry <checkpoint> on the log.

    • During recovery

      • Undo all transactions that have not committed.

      • Redo all transactions that have committed after checkpoint.


    Recovery with Checkpoints

    [Figure: a timeline with a checkpoint at time Tc and a system failure at time Tf. T1 finishes before the checkpoint; T2 and T3 commit between Tc and Tf; T4 is still active when the failure occurs.]

    • If the protocol is undo/redo then:

      • T1 is OK (its committed updates were forced to disk at the checkpoint).

      • T2 and T3 are redone.

      • T4 is undone.


    Question to consider

    • How might you determine the most appropriate recovery method for your database system?


    Denis Manley

    Enterprise Systems

    DT211 4

    Review Query Optimisation


    Selection

    • Primary key, points: σ FilmID = 000002 (Film)

    • Points: σ Title = 'Terminator' (Film)

    • Ranges: σ 1 < RentalPrice < 4 (Film)

    • Conjunctions: σ Type = 'M' ∧ Distributor = 'MGM' (Film)

      • Selection contains only tuples that satisfy all conjuncts

    • Disjunctions: σ PubDate < 1990 ∨ Distributor = 'MGM' (Film)

      • Selection contains tuples formed by the union of all tuples that satisfy the disjuncts


    Query Optimization

    • Transform the query into a faster, equivalent query

      query → equivalent query 1, equivalent query 2, ..., equivalent query n

    • Heuristic (logical) optimization

      • Query tree (relational algebra) optimization

      • Query graph optimization

    • Cost-based (physical) optimization


    Steps in typical Heuristics Optimisation

    Step 1: Decompose σ operations.

    Step 2: Move σ as far down the query tree as possible.

    Step 3: Rearrange leaf nodes to apply the most restrictive σ operations first.

    Step 4: Form joins from × and subsequent σ operations.

    Step 5: Decompose π and move it down the query tree as far as possible.

    Step 6: Identify candidates for combined operations.


    Query Tree Optimization Example

    • What are the names of customers living on Elm Street who have checked out “Terminator”?

    • SQL query:

      SELECT Name
      FROM Customer CU, CheckedOut CH, Film F
      WHERE F.Title = 'Terminator'
        AND F.FilmId = CH.FilmID
        AND CU.CustomerID = CH.CustomerID
        AND CU.Street = 'Elm'


    Canonical Query Tree

    π Name

    σ Title = 'Terminator' ∧ F.FilmId = CH.FilmID ∧ CU.CustomerID = CH.CustomerID ∧ CU.Street = 'Elm'

    F

    CH

    CU


    Apply Selections Early

    π Name

    σ F.FilmId = CH.FilmID

    σ CU.CustomerID = CH.CustomerID

    σ Title = 'Terminator'

    F

    σ Street = 'Elm'

    CH

    CU


    Apply More Restrictive Selections Early

    π Name

    σ CU.CustomerID = CH.CustomerID

    σ F.FilmId = CH.FilmID

    σ Street = 'Elm'

    CU

    σ Title = 'Terminator'

    CH

    F


    Form Joins

    π Name

    ⋈ CU.CustomerID = CH.CustomerID

    ⋈ F.FilmId = CH.FilmID

    σ Street = 'Elm'

    CU

    CH

    σ Title = 'Terminator'

    F


    Apply Projections Early

    π Name

    ⋈ CU.CustomerID = CH.CustomerID

    ⋈ F.FilmId = CH.FilmID

    π FilmID, CustomerID

    σ Street = 'Elm'

    π FilmID

    π FilmID, CustomerID

    CU

    σ Title = 'Terminator'

    CH

    F


    Example: Identify Combined Operations

    4

    π Name

    ⋈ CU.CustomerID = CH.CustomerID

    ⋈ F.FilmId = CH.FilmID

    π FilmID, CustomerID

    2

    3

    σ Street = 'Elm'

    π FilmID

    π FilmID, CustomerID

    1

    CU

    σ Title = 'Terminator'

    CH

    F


    Cost-Based Optimization

    • Use transformations to generate multiple candidate query trees from the canonical query tree.

    • Measuring Cost

      • Typically disk access is the predominant cost, and is also relatively easy to estimate.

      • Therefore number of block transfers from disk is used as a measure of the actual cost of evaluation.

      • It is assumed that all transfers of blocks have the same cost.

    • The cost of writing the final output to disk is not included.

    • Cost formulas estimate the cost of executing each operation in each candidate query tree.

    • The candidate query tree with the least total cost is selected for execution.


    Relevant Statistics

    • Per relation

      • Tuple size

      • Number of tuples (records): r

    • Per attribute

      • Attribute average size

      • Number of distinct values for attribute A: dA
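These statistics feed the standard uniform-distribution estimate: an equality selection on attribute A returns roughly r / dA tuples. A sketch, where the distinct-value counts are back-solved from the cost example that follows and are therefore assumptions, not given data:

```python
# Textbook selectivity estimate under a uniformity assumption:
# an equality selection on attribute A of relation R returns
# about r / d_A tuples (r = tuples in R, d_A = distinct values of A).

def est_equality_selection(r, d_a):
    return max(1, r // d_a)

# Distinct-value counts below are illustrative guesses.
print(est_equality_selection(5000, 5000))   # Film.Title assumed unique -> 1
print(est_equality_selection(40000, 5000))  # CheckedOut.FilmID -> 8 tuples
print(est_equality_selection(200, 20))      # Customer.Street -> 10 tuples
```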


    Cost Estimation Example

    4

    π Name

    ⋈ CU.CustomerID = CH.CustomerID

    ⋈ F.FilmId = CH.FilmID

    π FilmID, CustomerID

    2

    3

    σ Street = 'Elm'

    π FilmID

    π FilmID, CustomerID

    1

    CU

    σ Title = 'Terminator'

    CH

    F


    Operations 1 and 2: σ followed by a π

    • Operation 1 statistics

      • Relation statistics: rFilm = 5,000

      • Attribute statistics: sTitle = 1

    • Result relation size: 1 tuple.

    • Cost (in disk accesses): C1 = 1

    • Operation 2 statistics

      • Relation statistics: rCheckedOut = 40,000

      • Attribute statistics: sFilmID = 8

    • Result relation size: 8 tuples.

    • Cost: C2 = 8


    Operations 3 and 4: σ followed by a π, then ⋈

    • Operation 3 statistics

      • Relation statistics: rCustomer = 200

      • Attribute statistics: sStreet = 10

    • Result relation size: 10 tuples.

    • Cost: C3 = 10

    • Operation 4: a main-memory join on relations that are already in main memory.

    • Cost: C4 = 0

      Total cost: C1 + C2 + C3 + C4 = 1 + 8 + 10 + 0 = 19




    Summary

    • Query optimization is the heart of a relational DBMS.

    • Heuristic optimization is more efficient to generate, but may not yield the optimal query evaluation plan.

    • Cost-based optimization relies on statistics gathered on the relations.

    • Note: query optimisation is even more critical for large, distributed database systems.

