
Pessimistic Software Lock-Elision

Pessimistic Software Lock-Elision. Nir Shavit (joint work with Yehuda Afek and Alexander Matveev). Read-Write Locks. One of the most prevalent lock forms in concurrent applications; the 80/20 rule applies to reading vs. writing of data.


Presentation Transcript


  1. Pessimistic Software Lock-Elision Nir Shavit (joint work with Yehuda Afek and Alexander Matveev)

  2. Read-Write Locks • One of the most prevalent lock forms in concurrent applications • The 80/20 rule applies to reading vs. writing of data • Mutual exclusion between write calls, and between writes and read-only calls • Read-only calls are allowed to proceed in parallel with one another
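
As a reminder of the baseline that the rest of the talk tries to improve on, here is a minimal sketch of read-write-locked code using the C++17 standard library. The class `SharedTable` and the helper `shared_table_example` are hypothetical names chosen for illustration; they do not come from the slides.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>

// Hypothetical example type: a small table guarded by a read-write lock.
class SharedTable {
public:
    // Read-only call: takes the lock in shared mode, so any number of
    // readers may run in parallel with one another.
    std::optional<int> get(const std::string& key) const {
        std::shared_lock<std::shared_mutex> guard(lock_);
        auto it = data_.find(key);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }

    // Write call: takes the lock in exclusive mode, so it excludes both
    // other writers and all readers for its duration.
    void put(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> guard(lock_);
        data_[key] = value;
    }

private:
    mutable std::shared_mutex lock_;
    std::map<std::string, int> data_;
};

// Small single-threaded usage check.
inline void shared_table_example() {
    SharedTable table;
    table.put("x", 1);
    assert(table.get("x").value() == 1);
    assert(!table.get("missing").has_value());
}
```

Note that with a plain read-write lock a reader still has to wait whenever a writer holds the lock, which is the restriction PLE is meant to lift.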

  3. Coming Next Year: HTM and Hardware Lock Elision

  4. Speculative Lock Elision (SLE) • Rajwar and Goodman: speculative execution of locks by optimistic hardware transactions (Haswell) • Roy, Hand, and Harris: software implementation of SLE; transactions are executed speculatively in software • Diagram (Thread 1 and Thread 2): each thread starts its acquire with the lock elided, then speculates, trying to execute the critical sections concurrently using transactions, and finally starts its release; on failure, it reverts to the lock

  5. SLE: Good and Bad • Advantages: concurrency among writes and among reads and writes, as long as they do not share or contend for memory • Disadvantages: • Contention implies defaulting to the lock • Reads are delayed by writes • System calls and I/O cannot be used (they cause the transaction to fail) • Debugging is hard due to the speculative, non-deterministic behavior • Speculative execution breaks the lock semantics: you need to rewrite the code

  6. Pessimistic Lock Elision (PLE) • Non-speculatively replace read-write locks with pessimistic software transactions, in a way that: • Preserves the lock semantics • Requires no code rewriting • Allows I/O in transactions • Always allows read-write concurrency • Disadvantage: does not allow concurrency among writes • How important is this for RW-locked code?

  7. Pessimistic STM [MatveevShavit2011] • A commit-time privatizing STM in which all transactions execute once and never abort • Read-only transactions run in parallel with one another and with writes • To create PLE, we designed a new encounter-order version of this pessimistic STM that has wait-free read-only transactions

  8. Encounter Order Pessimistic STM • Quiescence mechanism [MatveevShavit2010] to tell when reads terminate • Write transactions execute sequentially (commits are serialized) by “passing a baton” • Writes maintain a public undo log • Wait-free reads collect a snapshot of the memory using the undo log
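
The slides do not say how the baton is implemented. As one way to picture serialized writers, the sketch below uses a ticket counter as a stand-in: each writer takes a ticket, runs only when its ticket is being served, and passes the baton by advancing the counter. The class `WriterBaton` and its methods are hypothetical, and the atomic fetch-and-add used here is a simplification for illustration, not a claim about what the real mechanism requires.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical ticket-based baton: writers run one at a time, in ticket order.
class WriterBaton {
public:
    // Join the queue of writers; the returned ticket fixes this writer's turn.
    std::uint64_t enqueue() {
        return next_ticket_.fetch_add(1, std::memory_order_relaxed);
    }

    // True once every earlier writer has passed the baton on.
    bool holds_baton(std::uint64_t ticket) const {
        return now_serving_.load(std::memory_order_acquire) == ticket;
    }

    // Spin until it is this writer's turn.
    void wait_for_baton(std::uint64_t ticket) const {
        while (!holds_baton(ticket)) {
            // busy-wait; a real implementation might pause or yield here
        }
    }

    // Called by the current holder at commit: hand the baton to the next writer.
    void pass_baton(std::uint64_t ticket) {
        assert(holds_baton(ticket));
        now_serving_.store(ticket + 1, std::memory_order_release);
    }

private:
    std::atomic<std::uint64_t> next_ticket_{0};
    std::atomic<std::uint64_t> now_serving_{0};
};
```

A writer would call `enqueue()`, then `wait_for_baton(ticket)`, run its transaction, and finish with `pass_baton(ticket)`. Because only the holder ever advances `now_serving_`, the hand-off itself is a plain store rather than a read-modify-write.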

  9. Pessimistic Read-Write Interaction • Write transactions must not write to locations being read by overlapping reads • Solution: • On a write, the old value is logged publicly before the new value is written • In the read phase, the logged values of concurrent writes are read • In the commit phase, the old values are discarded once the quiescence mechanism has ensured that no one is reading them
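
The read-write interaction described on this slide can be pictured with a deliberately simplified, single-threaded model. It is a sketch under stated assumptions, not the authors' algorithm: the class `UndoLogModel` and all of its members are hypothetical, there is only one writer, no atomics or memory ordering are modeled, and quiescence is reduced to "no registered reader remains" (the real mechanism presumably only needs to wait for reads that overlap the write).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical single-threaded model of a public undo log.
class UndoLogModel {
public:
    explicit UndoLogModel(std::size_t size) : memory_(size, 0) {}

    // Reader side: register so the writer can tell when reads have finished.
    void begin_read() { ++active_readers_; }
    void end_read() {
        assert(active_readers_ > 0);
        --active_readers_;
    }

    // A read that overlaps an uncommitted write returns the logged old value,
    // so the reader sees a consistent pre-write snapshot.
    int read(std::size_t addr) const {
        auto logged = undo_log_.find(addr);
        return logged != undo_log_.end() ? logged->second : memory_[addr];
    }

    // Writer side: log the old value publicly first, then write in place.
    void write(std::size_t addr, int value) {
        undo_log_.emplace(addr, memory_[addr]);  // keeps the oldest value if already logged
        memory_[addr] = value;
    }

    // Commit: discard the old values only once no reader can still need them.
    // A real writer would wait for quiescence instead of returning false.
    bool try_commit() {
        if (active_readers_ > 0) return false;
        undo_log_.clear();
        return true;
    }

private:
    std::vector<int> memory_;
    std::unordered_map<std::size_t, int> undo_log_;
    std::size_t active_readers_ = 0;
};
```

In this model a reader that started before the commit keeps seeing the old value of every location the writer has touched, while locations the writer has not touched are read directly from memory.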

  10. Why does this work well? • No need for CAS or even memory barriers in the common case • Even though logging is public, it is done by only one transaction at a time, so it is very easy to implement

  11. Applying Pessimistic Lock-Elision • Flow: a program with RW-locks is the input to the STM compiler (Intel STM Compiler with PLE transactions), whose output is a program with PLE; that program executes either on a processor with HLE (Intel’s Haswell), where HLE code is executed with software fallback to PLE, or on a standard processor, where PLE code is executed • Point 1: The semantics are not changed by the addition of PLE • Point 2: Concurrency between read and write critical sections • Point 3: HLE has limitations, but HLE + PLE does not • Point 4: PLE works on current processors

  12. Performance (chart region labels: HYPERTHREADS, NUMA, NORMAL) • We empirically evaluated our algorithm on an Intel 40-way machine with 2 Xeon E7-4870 chips in a NUMA setup. • PLE: Our fully pessimistic encounter-time STM • RW_Lock_Egress: An ingress-egress counter based reader-writer mutex implementation for Intel platform. • MCS-Lock: Michael and Scott's MCS Lock • RW_Lock_SPAA: The new RWLock proposal from SPAA 2012

  13. Three Ways to Elide Locks • Software-only lock elision: if you don’t have hardware support • A fallback (slow path) for the hardware HLE: Intel’s SLE • A fallback using HTM: Intel’s RTM

  14. If Your Machine Doesn’t Have Hardware Support • Automatically replace at compile time all read-write locked code with PLE STM code • As easy as STM in new C++ compiler • This will improve on your RW-locks because it will allow read-only calls to proceed in parallel with writes • Write calls are sequential, but they were sequential anyhow…

  15. If Your Machine Has SLE • There is an XTEST instruction which returns true if the thread is currently executing in SLE • Execute XTEST after the XACQUIRE instruction (the HLE transaction start instruction) • At compile time create a duplicate PLE code path. If the XTEST fails, then the duplicate PLE path is executed

  16. If Your Machine Has RTM • Two copies: one copy is the PLE code path, the other is the RTM code path • The RTM hardware fallback routine is the start of the PLE code path • After the XBEGIN, add a read (load) of the is_abort variable • The PLE code path first executes a small RTM transaction that updates is_abort • This causes all concurrently executing RTM transactions to fail

  17. Lock-Elision Theory • We are going to see a lot of use of lock elision in industry… • So, what are the inherent costs of lock-elision using STMs? • What are the inherent costs of pessimistic STM implementations? • Can we quantify the interaction between hardware and software transactions (or with locks)?

  18. Thanks
