
Read-Write Lock Allocation in Software Transactional Memory

Amir Ghanbari Bavarsad and Ehsan Atoofian, Lakehead University. Software transactional memory (STM) exploits a global clock to validate transactional data. Pros: reduces validation overhead. Cons: contention.


Presentation Transcript


  1. Read-Write Lock Allocation in Software Transactional Memory • Amir Ghanbari Bavarsad and Ehsan Atoofian • Lakehead University

  2. Transactional Memory • Software transactional memory (STM) exploits a global clock to validate transactional data • Pros: reduces validation overhead • Cons: contention on the global clock • Alternative: Read-Write Lock Allocation (RWLA) • Pros: no central clock • Cons: overhead if a transaction (TX) aborts • Speculative RWLA: changes the validation policy dynamically → speedup of up to 66% • [Figure: processors P1 … Pn, each with a cache ($), sharing the global clock]

  3. Outline • Background • RWLA • Speculative RWLA • Conclusion

  4. Counter in STM • Transaction T1: TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END();

  5. Validation in STM • Transactional data are validated using: • Global clock • Shared variable • Timestamp for transactions • Lock • Memory is mapped to the lock table • Each entry of the table holds a version # • [Figure: global clock; memory locations mapped to lock table entries, each holding a version #]

  6. Updating Global Clock & Lock • Increment the global clock • Version # = global_clock • [Figure: global clock; counter in memory mapped to a lock table entry holding its version #]

  7. Validation in STM • rv (read version) is set to global_clock • Transaction T1: TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END(); • [Figure: rv in the metadata for TX1, copied from the global clock]

  8. Successful Read Validation • rv >= version # • The most recent write to counter occurred before TM_BEGIN() • Transaction T1: TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END(); • [Figure: rv in the metadata for TX1; global clock]

  9. Failed Read Validation • rv < version # • The most recent write to counter occurred after TM_BEGIN() • Transaction T1: TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END(); • [Figure: rv in the metadata for TX1; global clock]
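
The clock-based validation walked through on slides 5 to 9 can be sketched in C11 roughly as follows. This is a minimal illustration, not TL2's actual code: every name here (`global_clock`, `lock_entry_t`, `tx_t`, `tx_begin`, `read_is_valid`, `commit_write`) is hypothetical, a lock entry is reduced to its version # alone, and a plain atomic fetch-and-add stands in for whatever clock-increment scheme the real framework uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this is not TL2's actual API. */
static _Atomic uint64_t global_clock;   /* shared global version clock */

typedef struct {
    _Atomic uint64_t version;           /* version # of the last committed write */
} lock_entry_t;

typedef struct {
    uint64_t rv;                        /* read version, sampled at TM_BEGIN */
} tx_t;

/* TM_BEGIN: rv is set to the current value of the global clock. */
void tx_begin(tx_t *tx) {
    assert(tx != NULL);
    tx->rv = atomic_load(&global_clock);
}

/* Read validation: succeeds when rv >= version #, i.e. the most recent
 * write to the location was committed before this transaction began. */
bool read_is_valid(const tx_t *tx, lock_entry_t *entry) {
    assert(tx != NULL && entry != NULL);
    return tx->rv >= atomic_load(&entry->version);
}

/* Commit of a write: increment the global clock, then stamp the lock
 * entry with the new clock value (version # = global_clock).
 * Assumption: a simple fetch-and-add is used for the increment. */
void commit_write(lock_entry_t *entry) {
    assert(entry != NULL);
    uint64_t wv = atomic_fetch_add(&global_clock, 1) + 1;
    atomic_store(&entry->version, wv);
}
```

In this sketch, a transaction that begins, sees another transaction commit a write to `counter`, and then validates its read fails the check, because its `rv` is now smaller than the entry's version #. Every committing writer touches `global_clock`, which is the shared cache line behind the coherence misses noted on the next slide.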

  10. Overhead of Validation • This method, called GV4, results in many cache coherence misses if transactions commit frequently • [Figure: processors P1 … Pn, each with a cache ($), sharing the global clock]

  11. Outline • Background • RWLA • Speculative RWLA • Conclusion

  12. Read-Write Lock Allocation (RWLA) • Lock • Memory is mapped to the lock table • Each entry of the table: • Lock bit • Read bits (Pn-1 … P1 P0) • [Figure: memory locations mapped to lock table entries, each with a lock bit and read bits Pn-1 … P1 P0]

  13. TM_READ • Transaction T1: TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END(); • [Figure: lock entry bits 0 0 0 ….. 0 0 0]

  14. TM_READ • Transaction T1: TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END(); • TM_READ(): lock bit is free? → Yes: set read bit in the corresponding lock entry • [Figure: lock entry with lock bit; bits 1 0 0 0 ….. 0 0 0]

  15. TM_READ • Transaction T1: TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END(); • TM_READ(): lock bit is free? → No: abort → Yes: set read bit in the corresponding lock entry • [Figure: lock entry bits 0 0 0 ….. 0 0 1]
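
The TM_READ flowchart on slides 13 to 15 might be realized along these lines. This is a sketch under stated assumptions rather than the authors' implementation: the lock bit and the read bits are packed into one 64-bit atomic word, the top bit is taken as the lock bit, bit `tid` is the read bit of thread `tid`, and the names `rwla_entry_t`, `rwla_read`, `LOCK_BIT`, and `MAX_THREADS` are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout (not taken from the slides): bit 63 is the lock bit,
 * bits 0..62 are read bits, one per thread. */
#define MAX_THREADS 63u
#define LOCK_BIT    (UINT64_C(1) << 63)

typedef struct {
    _Atomic uint64_t word;              /* lock bit plus read bits */
} rwla_entry_t;

/* TM_READ check: returns true if the read may proceed,
 * false if the transaction must abort. */
bool rwla_read(rwla_entry_t *entry, unsigned tid) {
    assert(tid < MAX_THREADS);
    uint64_t old = atomic_load(&entry->word);
    do {
        if (old & LOCK_BIT)
            return false;               /* lock bit is not free: abort */
        /* Lock bit is free: try to set this thread's read bit.
         * On failure, `old` is refreshed and the check is repeated. */
    } while (!atomic_compare_exchange_weak(&entry->word, &old,
                                           old | (UINT64_C(1) << tid)));
    return true;
}
```

Using a compare-and-swap instead of an unconditional OR means a read bit is never left set on an entry that a writer locked between the check and the update. No shared clock is read or written on this path.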

  16. TM_WRITE • Transaction T1: TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END(); • TM_WRITE: all read bits are clear? → No: abort • [Figure: lock entry bits 0 0 1 ….. 0 0 0]

  17. TM_WRITE • Transaction T1: TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END(); • TM_WRITE: all read bits are clear? → No: abort → Yes: acquire lock (abort if the acquire fails) • [Figure: lock entry bits 0 0 0 ….. 0 0 1]

  18. TM_WRITE • Transaction T1: TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END(); • TM_WRITE: all read bits are clear? → No: abort → Yes: acquire lock (abort if the acquire fails) • [Figure: lock entry bits 1 0 0 0 ….. 0 0 0]
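
The TM_WRITE flowchart on slides 16 to 18 could look roughly like this in C11. The packed-word layout and the names (`rwla_entry_t`, `rwla_write_lock`, `LOCK_BIT`, `MAX_THREADS`) are hypothetical. The sketch also assumes that the writer ignores its own read bit, since T1 reads `counter` before writing it; the slides do not state this explicitly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout (not taken from the slides): bit 63 is the lock bit,
 * bits 0..62 are read bits, one per thread. */
#define MAX_THREADS 63u
#define LOCK_BIT    (UINT64_C(1) << 63)

typedef struct {
    _Atomic uint64_t word;              /* lock bit plus read bits */
} rwla_entry_t;

/* TM_WRITE check: returns true if the lock was acquired,
 * false if the transaction must abort. */
bool rwla_write_lock(rwla_entry_t *entry, unsigned tid) {
    assert(tid < MAX_THREADS);
    uint64_t self = UINT64_C(1) << tid;
    uint64_t old  = atomic_load(&entry->word);

    /* Any read bit other than our own is set: abort.
     * Assumption: the writer's own read bit does not count. */
    if (old & ~LOCK_BIT & ~self)
        return false;

    /* Lock already held by another writer: acquire fails, abort. */
    if (old & LOCK_BIT)
        return false;

    /* Set the lock bit only if the word is unchanged since the checks;
     * a concurrent reader or writer makes the CAS fail, which is
     * treated as a failed acquire. */
    return atomic_compare_exchange_strong(&entry->word, &old,
                                          old | LOCK_BIT);
}
```

Because the check and the acquire are folded into one compare-and-swap, a reader that sets its bit after the check makes the acquire fail instead of going unnoticed. A real implementation would also clear read bits and the lock bit at commit or abort, which is omitted here.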

  19. Experimental Framework • Benchmarks: STAMP v0.9.7 • Run to completion • Measured statistics over 10 runs • TL2 as the STM framework • Two Intel Xeon E5660, 6-way CMP

  20. Performance of RWLA • [Figure: performance results for RWLA]

  21. Speculative RWLA • Conflicts occur frequently → select GV4 • Conflicts occur rarely → select RWLA • How to predict conflicts?

  22. Contention Predictor • Inputs: 1, x1 … xn • xi: global transaction history, bipolar value • wi: weight vector (w0, w1 … wn) • Output: y • Prediction: • y ≥ 0 → predict commit • y < 0 → predict abort • Update: • If the outcomes of the current TX and TXi agree/disagree → increment/decrement wi
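
The contention predictor on slide 22 reads like a perceptron, and a sketch under that assumption is given below. The slide does not spell out the output formula, so the usual weighted sum y = w0 + Σ wi·xi is assumed. The history length, the initial all-commit history, the update on every outcome, and the mapping from prediction to policy (predicted commit → RWLA, predicted abort → GV4, inferred from slide 21) are likewise assumptions, and all identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HISTORY_LEN 8                   /* assumption: n = 8 past transactions */

typedef struct {
    int w[HISTORY_LEN + 1];             /* w[0] is the bias weight w0 */
    int x[HISTORY_LEN + 1];             /* x[0] is fixed at 1; x[i] is +1 (commit) or -1 (abort) */
} predictor_t;

typedef enum { POLICY_GV4, POLICY_RWLA } policy_t;

void predictor_init(predictor_t *p) {
    assert(p != NULL);
    for (int i = 0; i <= HISTORY_LEN; i++) {
        p->w[i] = 0;
        p->x[i] = 1;                    /* assumption: history starts as all commits */
    }
}

/* y = w0 * 1 + sum over i of w[i] * x[i] (assumed perceptron output). */
int predictor_output(const predictor_t *p) {
    int y = 0;
    for (int i = 0; i <= HISTORY_LEN; i++)
        y += p->w[i] * p->x[i];
    return y;
}

/* y >= 0 predicts commit, y < 0 predicts abort. */
bool predict_commit(const predictor_t *p) {
    return predictor_output(p) >= 0;
}

/* Assumed mapping: predicted abort means frequent conflicts, so pick GV4. */
policy_t choose_policy(const predictor_t *p) {
    return predict_commit(p) ? POLICY_RWLA : POLICY_GV4;
}

/* Update: increment w[i] when the current outcome agrees with x[i],
 * decrement it otherwise, then shift the outcome into the history. */
void predictor_update(predictor_t *p, bool committed) {
    int t = committed ? 1 : -1;
    for (int i = 0; i <= HISTORY_LEN; i++)
        p->w[i] += (t == p->x[i]) ? 1 : -1;
    for (int i = HISTORY_LEN; i > 1; i--)
        p->x[i] = p->x[i - 1];
    p->x[1] = t;
}
```

A transaction would call `choose_policy` before it starts and `predictor_update` once its outcome is known. Hardware-style perceptron predictors often train only on mispredictions or low-confidence outputs; the slide mentions no such threshold, so this sketch updates on every outcome.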

  23. Performance of Speculative RWLA • Number of threads varies between 2 and 16 • On average, the performance improvement ranges from 21% in Bayes to 47% in Labyrinth • [Figure: performance results for Speculative RWLA]

  24. Conclusion • RWLA overcomes contention on the global clock • Applications react differently to GV4 and RWLA • Speculative RWLA changes the validation policy dynamically • Speculative RWLA improves the performance of STMs by up to 66%

  25. Thank You! Questions?
