
Algorithms for Validating Transactional Data




  1. Algorithms for Validating Transactional Data • presented by: Dmitri Perelman

  2. Agenda • Intro • “Don’t touch my read-set” approach • “Precedence graphs” approach • On avoiding spare aborts • Your questions

  3. Modularity of TM algorithms • Validation algorithm: • assumes non-overlapping transactional operations (the transactions themselves do overlap) • responsible for guaranteeing the correctness criterion • the main topic of our lecture (figure: a non-serializable execution — T1: read(x,0), write(y,1); T2: read(y,0), write(x,1))

  4. Modularity of TM algorithms, cont. • Concurrency control: • supplies the “illusion” of non-overlapping transactional operations to the Validation module • depends on the required progress guarantees (wait-free, lock-free, obstruction-free, blocking) • may use locks, strong CPU primitives, helping techniques • beyond the scope of the lecture

  5. Modularity of TM algorithms, cont. • Contention manager: • called in cases where a set of transactions cannot all continue without violating the correctness criterion • “contention” is reported by the validation algorithm • beyond the scope of the lecture

  6. Correctness criteria (not formal) • Serializability – equivalence to some sequential history • equivalent: same invocations, same responses • no requirement on real-time order • only committed transactions matter • Linearizability – serializability + real-time order • Opacity – linearizability + all transactions matter (including live and aborted ones) (figure: example histories illustrating the three criteria)

  7. Measures for validation algorithms • The number of “unnecessary aborts” • A TM satisfies permissiveness¹ if it accepts every input pattern satisfying the correctness criterion (figure: do we need to abort T1?) • Time complexity • Space complexity • closely related to garbage collection rules ¹ R. Guerraoui et al. Permissiveness in Transactional Memories. DISC 2008

  8. Agenda • Intro • “Don’t touch my read-set” approach • “Precedence graphs” approach • On avoiding spare aborts • Your questions

  9. Part 1: “don’t touch my read-set” approach • We will now see several TMs providing opacity • These TMs make progress as long as no concurrent transaction writes to their read-set • if the read-set remains unchanged since the beginning of the transaction, a consistent snapshot is guaranteed

  10. DSTM – a straightforward approach • DSTM¹ – the first and most straightforward implementation of the “don’t touch my read-set” approach • Each object is accessed through a special “object handler” (figure: object handler → locator → transaction descriptor and status, previous and new data) ¹ M. Herlihy et al. STM for dynamic-sized data structures. PODC, 2003.

  11. DSTM – cont. • The write operation installs a new object locator • Whenever Ti accesses an object whose owner transaction is still active – abort • don’t touch my read-set! (figure: swapping the object locator from the committed owner’s descriptor to my active descriptor)
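A minimal Java sketch of the locator swap described above. The names (Locator, TxnDescriptor, openWrite) are hypothetical, not from the original DSTM code; a real implementation would also consult the contention manager instead of simply signalling a conflict, and would handle read acquisition.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a DSTM-style per-object locator: acquiring an object for writing
// swaps in a new locator that points to the writer's transaction descriptor.
class DstmSketch {
    enum Status { ACTIVE, COMMITTED, ABORTED }

    static class TxnDescriptor {
        volatile Status status = Status.ACTIVE;
    }

    static class Locator {
        final TxnDescriptor owner;
        final Object oldData;   // value that is valid if the owner aborted
        final Object newData;   // value that is valid if the owner committed
        Locator(TxnDescriptor owner, Object oldData, Object newData) {
            this.owner = owner; this.oldData = oldData; this.newData = newData;
        }
    }

    static class TmObject {
        final AtomicReference<Locator> locator;
        TmObject(Object initial) {
            TxnDescriptor committed = new TxnDescriptor();
            committed.status = Status.COMMITTED;
            locator = new AtomicReference<>(new Locator(committed, initial, initial));
        }

        // Open the object for writing on behalf of 'me'.
        // Returns the private copy to modify, or null on conflict / lost race.
        Object openWrite(TxnDescriptor me, Object newCopy) {
            Locator cur = locator.get();
            if (cur.owner == me) return cur.newData;    // already own the object
            if (cur.owner.status == Status.ACTIVE) {
                // Another active owner: real DSTM would call the contention
                // manager here; this sketch simply reports the conflict.
                return null;
            }
            Object current = (cur.owner.status == Status.COMMITTED)
                    ? cur.newData : cur.oldData;
            Locator next = new Locator(me, current, newCopy);
            return locator.compareAndSet(cur, next) ? newCopy : null;
        }
    }
}
```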

  12. DSTM – cont. • Reads are “invisible” • a read does not let other transactions know about it • the writing transaction cannot inform a concurrent reader about the contention • To avoid observing an inconsistent state: revalidate the read-set at each load operation • O(|read-set|) work for each read operation (figure: a writer racing with an invisible reader)
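A sketch of the incremental revalidation cost, assuming a simplified versioned object (VersionedObject) rather than DSTM's locator structure: because reads are invisible, every new read re-checks that all previously read versions are still current, giving O(|read-set|) work per read.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of invisible reads with per-read revalidation of the whole read-set.
class InvisibleReadSketch {
    static class VersionedObject {
        volatile long version = 0;
        volatile int value = 0;
    }

    private final Map<VersionedObject, Long> readSet = new HashMap<>();

    int read(VersionedObject o) {
        int v = o.value;
        readSet.put(o, o.version);
        if (!revalidate()) {
            throw new IllegalStateException("abort: read-set changed");
        }
        return v;
    }

    // Check that every object read so far still has the version we observed.
    private boolean revalidate() {
        for (Map.Entry<VersionedObject, Long> e : readSet.entrySet()) {
            if (e.getKey().version != e.getValue()) return false;
        }
        return true;
    }
}
```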

  13. TL2 – global clock + object versioning • Solves the problem of costly read-set revalidation • A global version clock (GVC) counts the number of updating committed transactions • when a writing transaction successfully commits, it increments GVC • Each object has an associated version value • o.version is equal to the value of GVC at the moment that version was written D. Dice et al. Transactional Locking II. DISC, 2006.

  14. TL2 – cont. • A transaction remembers the value of GVC at startup in a local rv variable • A read of object o checks that o.version ≤ rv; if not, abort (figure: two executions – in one the abort is necessary, in the other it is clearly a spare abort)
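A TL2-style read check in Java, a sketch only: the transaction samples the global version clock at start into rv and aborts on reading any object whose version is newer, without revalidating the whole read-set. The class and field names are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a TL2-style read validated against the start-time clock value rv.
class Tl2ReadSketch {
    static final AtomicLong GLOBAL_VERSION_CLOCK = new AtomicLong(0);

    static class TmObject {
        volatile long version = 0;
        volatile int value = 0;
    }

    final long rv = GLOBAL_VERSION_CLOCK.get();   // sampled at transaction start

    int read(TmObject o) {
        long preVersion = o.version;
        int v = o.value;
        long postVersion = o.version;
        // Abort if the object changed after we started, or changed while we read it.
        if (preVersion != postVersion || postVersion > rv) {
            throw new IllegalStateException("abort: object version newer than rv");
        }
        return v;
    }
}
```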

  15. TL2 – cont. • Writes are postponed until commit (“invisible writes”) • allows concurrent writers • Commit of a read-only transaction always returns success • Commit of an updating transaction: • revalidate the read-set (no need if rv = GVC) • increment GVC, write the new values, update the objects’ versions (figure: the reason for revalidating the read-set at commit)
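A sketch of the commit steps in that order, under a simplifying assumption: real TL2 acquires per-object versioned write locks, whereas here a single global lock stands in for them so that the ordering of the steps stays visible. All names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Single-lock sketch of a TL2-style commit with deferred (invisible) writes.
class Tl2CommitSketch {
    static final AtomicLong GVC = new AtomicLong(0);
    static final Object COMMIT_LOCK = new Object();   // stand-in for per-object locks

    static class TmObject {
        volatile long version = 0;
        volatile int value = 0;
    }

    final long rv = GVC.get();                          // clock value at start
    final Map<TmObject, Long> readSet = new HashMap<>();    // object -> version seen
    final Map<TmObject, Integer> writeSet = new HashMap<>(); // deferred writes

    boolean commit() {
        if (writeSet.isEmpty()) return true;            // read-only: always succeeds
        synchronized (COMMIT_LOCK) {
            long wv = GVC.incrementAndGet();            // claim the commit version
            if (wv != rv + 1) {                         // someone committed since rv:
                for (Map.Entry<TmObject, Long> e : readSet.entrySet()) {
                    if (e.getKey().version > rv) return false;   // revalidation failed
                }
            }
            for (Map.Entry<TmObject, Integer> e : writeSet.entrySet()) {
                e.getKey().value = e.getValue();        // install deferred writes
                e.getKey().version = wv;                // stamp with the new version
            }
            return true;
        }
    }
}
```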

  16. TL2 vs. DSTM • Invisible writes (TL2) allow more concurrency than the visible ones (DSTM). • Global Version Clock – no need to revalidate the read-set at every load operation. • This comes at the cost of being blocking (DSTM is obstruction-free)

  17. LSA – multi-versioning • Lazy snapshot algorithm (LSA) – similar to TL2 • But now the objects are multi-versioned: • each object keeps a list of versions • the writer installs a new version (instead of overwriting the old one) (figure: object handler pointing to versions vn, vn-1, …) T. Riegel et al. A Lazy Snapshot Algorithm with Eager Validation. DISC, 2006.

  18. LSA – validity ranges • As in TL2, each object version i keeps its installation time o.vi • The validity range of object version i is the time range [o.vi, o.vi+1) • The validity range (vr) of a transaction is the intersection of the validity ranges of the versions it has read • initialized to [GVC, ∞) • updated after every read • must stay non-empty

  19. LSA – validity ranges, cont. • Read operation: • traverses the version list from the latest version until it finds a suitable version to read • a version is suitable if its validity range has a nonempty intersection with Ti’s vr (figure: a reader choosing among versions) • The validity range of the latest version is not yet known • it is temporarily assigned [o.vj, GVC] and marked as open • open objects’ ranges may be extended on demand • a transaction is open while all the objects in its read-set are open
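A sketch of the version selection just described, under assumptions: validity ranges are modelled as half-open intervals [from, to), the open latest version gets to = Long.MAX_VALUE, and on-demand range extension is omitted. Names (Version, TmObject, vrFrom, vrTo) are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of LSA-style reads: pick the newest version whose validity range
// intersects the transaction's range, and narrow the transaction's range.
class LsaReadSketch {
    static class Version {
        final int value;
        final long from;   // installation time o.v_i
        final long to;     // o.v_{i+1}, or Long.MAX_VALUE for the open latest version
        Version(int value, long from, long to) {
            this.value = value; this.from = from; this.to = to;
        }
    }

    static class TmObject {
        final List<Version> versions = new ArrayList<>();   // oldest .. newest
    }

    long vrFrom;                     // transaction validity range, starts at [GVC, +inf)
    long vrTo = Long.MAX_VALUE;

    LsaReadSketch(long gvcAtStart) { this.vrFrom = gvcAtStart; }

    Integer read(TmObject o) {
        for (int i = o.versions.size() - 1; i >= 0; i--) {   // newest first
            Version v = o.versions.get(i);
            long lo = Math.max(vrFrom, v.from);
            long hi = Math.min(vrTo, v.to);
            if (lo < hi) {           // nonempty intersection: this version is suitable
                vrFrom = lo;
                vrTo = hi;
                return v.value;
            }
        }
        return null;                 // no suitable version: abort
    }
}
```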

  20. LSA – validity ranges, examples (figure: two examples – (a) the validity range is expanded on demand as GVC advances; (b) expanding the range is not possible, so the previous version is read)

  21. LSA – commit • Read-only transactions always commit successfully • If the transaction has a non-empty write-set: • the commit succeeds if the transaction succeeds in incrementing GVC • the incremented GVC value must fall within the transaction’s validity range • the transaction must be open for a successful commit • a transaction is open if no concurrent transaction has touched its read-set
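A sketch of that commit test only, assuming the validity range has already been extended as far as possible and leaving out version installation and locking: the commit timestamp obtained by incrementing the global clock must still lie inside the transaction's validity range.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the LSA commit test for an updating transaction.
class LsaCommitSketch {
    static final AtomicLong GVC = new AtomicLong(0);

    long vrFrom;                          // transaction validity range [vrFrom, vrTo)
    long vrTo = Long.MAX_VALUE;
    boolean hasWrites;

    boolean commit() {
        if (!hasWrites) return true;      // read-only transactions always commit
        long wv = GVC.incrementAndGet();  // claim the commit timestamp
        // The new versions would be installed at time wv, so wv must lie inside
        // the validity range; otherwise some read is no longer consistent: abort.
        return wv >= vrFrom && wv < vrTo;
    }
}
```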

  22. LSA commit – examples (figure: three executions) • T1 commits because it is open (T1 would have to abort in TL2) • T1 commits because it is read-only (any single-versioned algorithm would have to abort) • T1 aborts because it is not open (T2 has written to its read-set); opacity is not violated

  23. LSA – conclusions • Multi-versioning: all read-only transactions commit • Multi-versioning: an additional level of indirection for object data access • increases the number of cache misses • Expanding validity ranges on demand may be costly • O(|read-set|)

  24. An alternative to the global clock • A global clock may become a scalability limitation when the number of cores grows large • Thread Local Clock (TLC)¹ addresses this limitation • Each object timestamp is augmented with the tid of the writer • Each thread has a thread-local clock • incremented at the start of every transaction • Each thread has a local clocks array • entry i of the array keeps the last timestamp of thread i seen by this thread ¹ H. Avni and N. Shavit. Maintaining consistent transactional states without a global clock. SIROCCO, 2008.

  25. TLC – cont. • Write operation: • update the object’s timestamp & tid • Validating an object’s timestamp: • abort if the object’s timestamp is greater than the local array entry for the writer (figure: two executions with timestamp = 2, tid = 2 and different local-array contents)
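A sketch of the TLC check, assuming a fixed number of threads and illustrative names (Timestamp, localClocks): each object's timestamp carries the writer's thread id, and a reader aborts if that timestamp is newer than the last value it has recorded for that writer.

```java
// Sketch of TLC-style validation against a per-thread view of other clocks.
class TlcSketch {
    static final int NUM_THREADS = 8;     // assumed fixed number of threads

    static class Timestamp {
        final int tid;        // id of the writing thread
        final long clock;     // that thread's local clock value at write time
        Timestamp(int tid, long clock) { this.tid = tid; this.clock = clock; }
    }

    static class TmObject {
        volatile Timestamp stamp = new Timestamp(0, 0);
        volatile int value = 0;
    }

    // This thread's view of the other threads' clocks (the local clocks array).
    final long[] localClocks = new long[NUM_THREADS];

    int read(TmObject o) {
        Timestamp s = o.stamp;
        if (s.clock > localClocks[s.tid]) {
            // The writer advanced past what we have seen: record its new clock
            // value and abort; a retry will then accept the object.
            localClocks[s.tid] = s.clock;
            throw new IllegalStateException("abort: stale view of thread " + s.tid);
        }
        return o.value;
    }

    void write(TmObject o, int value, int myTid, long myClock) {
        o.value = value;
        o.stamp = new Timestamp(myTid, myClock);
    }
}
```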

  26. “Don’t touch my read-set” approach – limitations • Too conservative • The algorithms do not distinguish between the following scenarios (figure: two executions that are handled identically, although only one requires an abort)

  27. Agenda • Intro • “Don’t touch my read-set” approach • “Precedence graphs” approach • On avoiding spare aborts • Your questions

  28. Precedence graph (PG) • Nodes of the graph – transactions • Edges – transaction precedence information: • if Ti reads from Tj, there is an edge Tj → Ti • if Tj installs o.vn-1 and Ti installs o.vn, there is an edge Tj → Ti • if Tj reads o.vn-1 and Ti installs o.vn, there is an edge Tj → Ti (figure: object handle pointing to versions o.vn and o.vn-1, each with its writer and readers)

  29. Precedence graph – cont. • A path from Ti to Tj in the PG implies that Ti precedes Tj in the sequential history • A topological order of the graph gives a legal sequential history • If the PG does not contain cycles, the history is serializable • It is sufficient to keep the PG acyclic to ensure validity (a cycle check via topological sort is sketched below)
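A minimal sketch of that check using Kahn's topological sort (the class and method names are illustrative, not from the cited papers): the sort succeeds only if the graph has no cycle, and the order it produces is a legal sequential history of the transactions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of an acyclicity check / serialization order for a precedence graph.
class PrecedenceGraphSketch {
    final int n;                        // number of transactions (nodes)
    final List<List<Integer>> edges;    // edges.get(i) = transactions that must follow T_i

    PrecedenceGraphSketch(int n) {
        this.n = n;
        edges = new ArrayList<>();
        for (int i = 0; i < n; i++) edges.add(new ArrayList<>());
    }

    void addEdge(int from, int to) { edges.get(from).add(to); }

    // Returns a serialization order, or null if the graph contains a cycle.
    List<Integer> topologicalOrder() {
        int[] inDegree = new int[n];
        for (List<Integer> out : edges)
            for (int v : out) inDegree[v]++;
        Deque<Integer> ready = new ArrayDeque<>();
        for (int i = 0; i < n; i++)
            if (inDegree[i] == 0) ready.add(i);
        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int u = ready.remove();
            order.add(u);
            for (int v : edges.get(u))
                if (--inDegree[v] == 0) ready.add(v);
        }
        return order.size() == n ? order : null;   // null => cycle => not serializable
    }
}
```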

  30. Pure precedence graph solution • J. Napper and L. Alvisi presented a TM satisfying serializability based on precedence graphs • the main focus of the article was making the solution lock-free • Read operation: • looks for the latest version that does not introduce a cycle in the PG • Reads are invisible; the consistent snapshot is not revalidated until commit (not needed for serializability) • Writes install the new version after the latest one (postponed until commit) • The commit operation checks the precedence graph for acyclicity J. Napper and L. Alvisi. Lock-free serializable transactions. Tech. report, 2005.

  31. Napper & Alvisi TM – conclusions • Much more accurate than the “don’t touch my read-set” approach • But this comes at a high computational cost • cycle detection takes O(|V|²) • Two questions remained open: • garbage collection rules • path shortening techniques

  32. Agenda • Intro • “Don’t touch my read-set” approach • “Precedence graphs” approach • On avoiding spare aborts • Your questions

  33. TM providing opacity-permissiveness • Opacity-permissiveness – every input pattern satisfying opacity is accepted • e.g. the history r1(o1), w2(o1), c1, c2 • Unfortunately, no online TM can achieve it¹ (figure: what value should be returned to T2?) ¹ I. Keidar and D. Perelman. On avoiding spare aborts in TM. Tech. report, 2009.

  34. Strict online opacity-permissiveness • Strict online opacity-permissiveness – abort only if the transaction cannot continue without violating opacity • Unfortunately, this implies solving NP-complete problems

  35. Online opacity-permissiveness (not formal) • Intuitively, the NP-completeness derives from situations in which there are several ways to serialize already committed transactions • We require the TM to define the serialization order of the committed transactions • this order must persist in every extension of the run • all the TMs we have seen so far do so implicitly • however, in theory, a TM may leave the serialization point of committed transactions undefined (e.g. commutative transactions) (figure: what is the order of T1, T2, T3?)

  36. TM satisfying online opacity-permissiveness • The same idea of building a precedence graph • The write operation has the option to install its version before the latest one • as if the write had happened in the past • possible only for “blind writes” • The read operation looks for the latest possible version to read without creating a cycle • Write operations are postponed until commit

  37. TM satisfying OOP – commit • The commit operation should choose the “appropriate places” to install the new versions • a greedy algorithm will not work (figure: choosing to read from T2 leads to a spare abort)

  38. TM satisfying OOP – GC rules • Intuitively, a transaction’s node may be removed from the precedence graph once it can no longer participate in any cycle • For example, if the node has no incoming edges and will not gain new ones in the future • This lecture was not intended to be sadistic • interested students are welcome to read the tech report :)

  39. TM without spare aborts – open questions • Even after all the optimizations, avoiding spare aborts implies a high cost • and what is the inherent lower bound? • Weakening the correctness criterion can help • there exists a TM satisfying causal serializability that uses vector clocks • causal serializability is weaker than serializability • different processors may perceive some events in a different order as long as the individual views preserve the causality relation

  40. Thanks
