Optimistic Concurrency Control and Time-stamp Ordering

Optimistic Concurrency Control and Time-stamp Ordering PowerPoint PPT Presentation

  • Uploaded on
  • Presentation posted in: General

2. Optimistic Concurrency Control. The problems of 2PLThe overheads for locking and maintaining the lock tableDeadlock and distributed deadlock problemsLong lock holding time implies long blocking time (chain of blocking)Thrashing problem due to high data contention (Strict 2PL)Optimistic Concu

Download Presentation

Optimistic Concurrency Control and Time-stamp Ordering

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

1. 1 Optimistic Concurrency Control and Time-stamp Ordering Optimistic Concurrency Control (OCC) Forward Vs backward validation Basic Time-stamp ordering (TO) Conservative and Multiversion TO Comparisons

2. 2 Optimistic Concurrency Control The problems of 2PL The overheads for locking and maintaining the lock table Deadlock and distributed deadlock problems Long lock holding time implies long blocking time (chain of blocking) Thrashing problem due to high data contention (Strict 2PL) Optimistic Concurrency Control (OCC) An aggressive approach: delay data conflict checking until the transaction wants to commit while 2PL checks conflict before an operation is allowed to access a data item, I.e., a pessimistic approach) It assumes that the probability of data conflict is low (so it is called optimistic)

3. 3 Optimistic Concurrency Control OCC Allow a transaction (operation) to access a data item without getting any lock (permission) Checking (validation) is performed at the appropriate time (I.e., just before the commit of a transaction) to ensure that the schedule is serializable It resolves data conflicts by restarting (abort, undo and then redo) transactions (2PL resolves conflicts by blocking) For validation purposes, the system maintains a record for each transaction to define what are the data items that the transaction has accessed, I.e., the read set and write set of a transaction The design of validation conditions: similar to 2PL, it is more restrictive than using SG(H) although it adopts an optimistic approach for checking (when to perform checking and checking conditions)

4. 4 Optimistic Concurrency Control The execution of a transaction under OCC is divided into: read phase validation phase write phase Read Phase During the read phase, the operations of a transaction will be processed (TM->scheduler->DM). Operation execution order is the same as its arrival order to the scheduler (no blocking or delay at the scheduler) For write operations, the new values will not be written into the database immediately. They will be stored in a private workspace for the transaction (deferred update) (If a transaction reads the same data item, it will read the old value, I.e, the value before the update of the transaction) For read operations, they read (committed) data from the database

5. 5 Optimistic Concurrency Control Validation Phase When all the operations of a transaction have been processed, a validation test will be performed to ensure that if the transaction is allowed to commit, serializability of the schedule can still be maintained In case of a data conflict, one of the conflicting transactions will be restarted Write Phase After a transaction has passed the validation test, it enters into the write phase and then commits The new values (from write operations) in its private workspace will be copied to the database (permanent data updates) Other transactions observe the new values only after the write phase (permanent updates)

6. 6 Optimistic Concurrency Control Pessimistic execution Optimistic execution

7. 7 Optimistic Concurrency Control Purposes and principles of validation The validation test is based on the possible data conflicts of other transactions with the validation transaction To maintain serializability, the scheduler (at validation) needs to follow an assumed execution order of the conflicting transactions, I.e., Ti is always before Ti+1 for all their conflicts The question is therefore how to define the orders for the transactions, I.e., which one is Ti and which one is Ti+1 The execution order in OCC is assigned following their times when they enter the validation phase In this way, conceptually, we may consider the transactions are executed at the moment of validation

8. 8 Optimistic Concurrency Control We do not want to keep the whole serialization graph as this is too complex and could incur a heavy overhead to the system Basically, we check the read/write sets of the transactions when a transaction enters the validation phase We need to maintain what are the data items in the read set and write set of a transaction while the transaction is executing If any possibility of having a non-serializable schedule, the validating transaction (or the conflicting transactions) will be restarted (abort and execute again) In the design of validation algorithm, we also need to identify to which transactions the validating transaction needs to compare with (not all conflicts are significant, I.e., the transaction has completed before the start of the read phase of the validating transaction) The identification is based on transaction time-stamps

9. 9 Optimistic Concurrency Control START(T): For each active transactions, the system (scheduler) maintains when they start execution, START(Ti) VAL(T): For the set of transactions which are in validation phase or in the write phase, the system maintains when they enter the validation phase, VAL(Ti), which is also the time which Ti imagined to execute in hypothetical serial order of execution If VAL(Tk) < VAL(Ti) => Tk -> Ti Tk may affect the results of Ti but Ti will not affect the results of Tk since Tk is before Ti in the serialization graph if they have any data conflict Read/Write conflict: Tk reads first before Ti updates Write/Read conflict: Tk updates first and Ti reads the value from Tk Write/Write conflict: Tk updates first and then Ti updates overwrites Tk’s value FIN(T): The set of transactions that have finished the write phase (and committed)

10. 10 Optimistic Concurrency Control Validation Problem (1) We want to maintain the serializable order to be U->T since U enters the validation first U is in VAL(T) or FIN(T), I.e., U has validated FIN(U) > START(T), I.e., U did not finish before T started RS(T) ? WS(U) is not empty, I.e., conflicting item is x It becomes possible that U wrote x after T read x. However, in fact, U may not even have written x yet The dotted lines (in next slide) connect the events in real time with the time at which they would have occurred had transactions been executed at the moment they validated

11. 11 Optimistic Concurrency Control

12. 12 Optimistic Concurrency Control Validation Problem (2) U is in VAL(T) FIN(U) > VAL(T), I.e., U did not finish before T entered the validation phase WS(T) ? WS(U) is not empty, I.e., the conflicting item is x It becomes possible that T wrote x before U (a problem case). To ensure the correctness, we need to restart T Remember we want to maintain the serializable order to be U-> since U We require the last value of x after the completion of both T and U should come from T (the latter transaction)

13. 13 Optimistic Concurrency Control

14. 14 Optimistic Concurrency Control Validation rules While a transaction is executing (read phase), its read set and write set have to be maintained (I.e., by the scheduler) For validation of transaction Ti Rule 1: If all transactions Tk (ts(Tk) < ts(Ti)) have completed their write phase before Ti has started its read phase, validation successes (ts(Ta) is the time-stamp when Ta enters the validation phase) Rule 2: If all transactions Tk (ts(Tk) < ts(Ti)) which completes its write phase while Ti is in its read phase, validation successes if WS(Tk) ? RS(Ti) = ? Rule 3: If there is any transaction Tk (ts(Tk) < ts(Ti)) which completes its read phase before Ti completes its read phase, validate successes if WS(Tk) ? RS(Ti) = ? and WS(Tk) ? WS(Ti) = ?

15. 15 Optimistic Concurrency Control Rule 1: no interleaving in execution and Ti will not affect the results of Tk Rule 2: To ensure that none of the data to be updated by Tk have been read by Ti before the update (Ti has to read from Tk) Rule 3: In addition to Rule 2, to ensure that none of the data to be updated by Tk are update by Ti first (The last value of a conflicting data item should come from Ti) Note: It is more restrictive than SG(H), I.e., no concurrent access of data items in conflicting mode is allowed

16. 16 Optimistic Concurrency Control

17. 17 Optimistic Concurrency Control The validation of a data item has to be performed in a critical section until after its write operation (if it can pass the validation). Otherwise, it is difficult to determine the set of conflicting operations and transactions While a transaction is in validation phase, no transaction is allowed to access the data items in its read set and write set To prevent the deadlock problem, we may only allow one transaction in validation phase each time If both the validation phase and write phase are performed in a critical section, rule 3 can be simplified to: WS(Tk) ? RS(Ti) = ? Performing the write phase in a critical section can simplify the testing (no W/W conflict) but will degrade system concurrency since only one transaction can be in the validation and write phases

18. 18 Optimistic Concurrency Control Backward validation Vs forward validation Backward validation (last one): Check again with committed transactions (having a smaller validation time-stamp) Those transactions’ commit time (FIN) is smaller than the begin of the earliest transaction in the system can be removed (WHY?) Forward validation: validating transaction checks again with executing transactions Forward validation May restarts the executing transactions which are in conflict with the validating transactions and then the validating transaction will commit The validation of transaction Tj checks whether its write set overlaps with the any of the read sets of the active transactions. (W/W conflicts also if WP is not in critical section)

19. 19 Optimistic Concurrency Control Forward validation If there is any overlap, the validation fails. Then, either the validating transaction will be restarted or the active transactions will be restarted Usually, we restart the active transactions The benefit is fruitful restart The restart of a transaction results in the commit of a transaction

20. 20 Comparison between FV and BV FV: allows a higher flexibility in resolution of conflicts Restart the validation transaction or the executing transactions In general, a read set is larger than a write set Therefore, BV compares a possibly large read set against the old write sets. FV checks a small write set against the set of read sets of active transactions. The read set may not be large since the transactions are executing BV needs to store the write sets of committed transactions The restart cost is usually lower if an active transaction is restarted comparing to restarting a validating transaction

21. 21 OCC in distributed database system Each transaction Ti is sub-divided into a number of sub-transactions, Tij (j indicates at site j) At validation phase, a time-stamp is assigned to Ti and the time-stamp will propagate to all its sub-transactions Validation consists of two parts: local and global Local validation is the same as before Global validation is required to ensure global serializability, I.e., Ti,1 > Tk,1 at site 1, and Ti,2 > Tk,2 at site 2 No good method until now Some suggested methods require synchronization between the processes of a transaction at different sites (know the serialization order, or dependency at all other sites)

22. 22 Timestamp Ordering (TO) Similar to locking, detect data conflict while a transaction is executing (a pessimistic method) Similar to OCC, it resolves data conflict by restarting transactions (and blocking for dirty read) Each transaction is assigned a unique time-stamp at its start time The time-stamp is an integer and assigned in ascending order (I.e. using system clock) In a distributed database system, the site ID may be added to the time-stamp (lower order) to make it globally unique The system maintains to table of currently active transactions and their time-stamps Transaction manager attaches the transaction timestamp to each operation of the transaction The serialization orders of conflicting transactions follow their time-stamps, I.e., ts(Ti) < ts(Tj) => Ti -> Tj

23. 23 Timestamp Ordering (TO) Each data item is associated with two time-stamps: read-time-stamp (rts) and write-time-stamp (wts) Initially, the time-stamps of all the data items are zero After a transaction, T, has read a data item, x, rts(x) will set to be the time-stamp of T, ts(T) After a transaction, T, has update a data item, x, wts(x) will set to be the time-stamp of T, ts(T) In TO, the access of a data item has to be ordered according to its time-stamps TO rules: Given two conflicting operations Oij and Okl belonging to transaction Ti and Tk respectively. Oij is executed before Okl if only if ts(Ti) < ts(Tk) Why the rule can ensure that all schedules will be serializable?

24. 24 Timestamp Ordering (TO) A scheduler that enforces TO rules checks: Each new operation against conflicting operations that have already scheduled If it can pass all the rules (below), the operation is accepted (scheduled) Otherwise, it is rejected and the transaction issuing the operation has to be restarted with a larger time-stamp Rules in Basic TO: Rule1: A transaction’s write request to a data item is allowed only if that data item was last read and written by earlier transactions (smaller TS) Rule 2: A transaction’s read request to a data item is allowed only if that data item was last written by an earlier transaction (smaller TS)

25. 25 Timestamp Ordering (TO) Basic TO: For read operation Ri(x) : If ts(Ti) < wts(x) Then reject Ri(x) Else accept Ri(x) rts(x) ? ts(Ti) For write operation Wi(x) : If ts(Ti) < rts(x) and ts(Ti) < wts(x) Then reject Wi(x) Else accept Wi(x) wts(x) ? ts(Ti) How about if ts(Ti) > rts(x) and ts(Ti) < wts(x)?

26. 26 Timestamp Ordering (TO)

27. 27 Timestamp Ordering (TO)

28. 28 Timestamp Ordering (TO)

29. 29 Strict Timestamp Ordering (TO) Reading dirty data The read on an uncommitted data The writer may abort later and the consequence is a cascading abort problem To prevent chain of abort, we may need to assign a commit bit to each dirty data item If a data item is dirty, I.e., commit bit is false, the read or update of the data item has to be delayed (the read/write operation is blocked until the commit is true)

30. 30 Strict Timestamp Ordering (TO)

31. 31 Strict Timestamp Ordering (TO) Writing dirty data The a write operation is on a dirty data If the first one aborts after the second one, the second update may lose Solution: when T wants to write, do nothing (skip the write operation) Why? No other transaction V that should read have read T’s value of x got U’s value. If V tried to read x it would have aborted because of a too-late read. Further reads of x will want U’s value or a later value of x, not T’s value Thomas write rule: a write operation may be skipped if a later write-time operation is already processed

32. 32 Strict Timestamp Ordering (TO)

33. 33 Timestamp Ordering (TO) Advantages of TO: no deadlock no blocking Disadvantages of TO: potentially large number of restarts (the restart overhead of a distributed transaction can be very expensive since a transaction may have multiple sub-transactions at several sites) management of time-stamps for each data

34. 34 Conservative TO Conservative TO tries to minimize the number of transaction restarts The scheduler uses a conservative approach and delay an operation until there is no risk that no operation with a smaller time-stamp will arrive later How does the system know this? It needs to know the latest time-stamp of the transaction at a site? It buffers the operations of each transaction from each site until an order can be established Each scheduler maintains one queue for each sites The scheduler at site i stores all the operations that it receives from the TM at site j in queue Qij

35. 35 Conservative TO When an operation is received from a TM, it is placed in its appropriate queue in increasing time-stamp order The scheduler executes operations from these queues in increasing time-stamp order What will be the problem if Qij is empty? Transaction restart may occur To totally eliminate the probability of transaction, a site need to send dummy message to other sites to inform them what are the smallest TS at its site

36. 36 Multiversion TO The problem of Basic TO is that if a read or write operation arrives late, it has to be restarted and Conservative TO may create a lot of blocking To reduce the number of transaction restarts and do not want to introduce blocking, we may use multiversion, e.g., each data item has several versions from different write operations (at the expense of storage and multiversion data management) In multiversion TO, a write operation does not over-written the old version, it creates a new version with a new version number (the time-stamp of its transaction)

37. 37 Multiversion TO When a read operation arrives late, it can read from an old committed version Thus, read operations are always permitted There is no conflicts between different write operations as each write creates its own version For the write/read conflict, there may be a problem, e.g., the read time-stamp of the latest version is greater than the time-stamp of the requesting transaction

38. 38 Multiversion TO Conflict Rules A Ri(X) is translated into a read on one version of X. This is done by finding a version of X, I.e., Xv, such that ts(Xv) is the largest timestamp less than ts(Ti). A Wi(X) is translated into Wi(Xw) so that ts(Xw) = ts(Ti) and sent to the scheduler if and only if no other transaction with a time-stamp greater than ts(Ti) has read the value of a version of X such that ts(Xr) > ts(Xw) Multi-version TO needs to perform purging to remove old versions. How to determine which versions can be removed? No chance to be required by any transaction

39. 39 Timestamp Ordering (TO)

40. 40 Timestamp Ordering (TO)

41. 41 Comparison of CC Methods Pessimistic approach: Time-stamp ordering and 2PL Optimistic approach: OCC Blocking: 2PL and partly in TS Restart: OCC and TS Main overheads: 2PL: locking overhead, management of lock table, deadlock resolution TS: restart of transactions, checking for time-stamps OCC: maintenance of read set and write set, restart of transactions, validation test overhead

42. 42 Comparison of CC Methods Extension for DDBS 2PL: location of lock scheduler and detection of distributed deadlock TS: synchronization of ts (a unique time-stamp with total ordering) OCC: synchronization of ts (a unique time-stamp with total ordering) and how to ensure the performance of validation in critical section

  • Login