
Extending Open64 with Transactional Memory features




  1. Extending Open64 with Transactional Memory features (Jiaqi Zhang, Tsinghua University)

  2. Contents • Background • Design • Implementation • Optimization • Experiment • Conclusion

  3. Transactional Memory Background • Trend toward concurrent programming • Current solution: locks • Flaws of locks: • The association between a lock and the data it protects is left to the programmer • Deadlock • Not composable

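  4. Transactional Memory Background
     class Account{ int balance; lock mylock; public: bool credit(int amount); bool debit(int amount); };
     bool Account::credit(int amount){ acquire(mylock); balance+=amount; release(mylock); return true; }
     bool Account::debit(int amount){ acquire(mylock); balance-=amount; release(mylock); return true; }
     • No extra locking: transfer(Account &a, Account &b, int amount){ a.credit(amount); b.debit(amount); } leaves an inconsistent state visible, since another thread can run between the credit and the debit
     • Locking both accounts: acquire(a.mylock); acquire(b.mylock); a.credit(amount); b.debit(amount); release(a.mylock); release(b.mylock); exposes implementation details (poor abstraction of class Account) and can deadlock when two transfers take the locks in opposite order
     • Transactional memory: atomic{ a.credit(amount); b.debit(amount); }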

  5. Transactional Memory Background • Current implementations • TM libraries: DSTM, DracoSTM, TL2, TinySTM, … • Explicit transactions through function calls: TM_INIT()/TM_SHUTDOWN(), TM_ATOMIC_BEGIN()/TM_ATOMIC_END(), TM_SHARED_READ()/TM_SHARED_WRITE()
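To show what "explicit transaction" means for the programmer, here is a minimal sketch of the slide 4 transfer written against library-style macros. The macro names come from the slide, but their argument shapes are assumptions, and the "library" behind them is a hypothetical stand-in: a single C11 spinlock that runs every transaction serially, not a real STM such as TL2.

```c
#include <stdatomic.h>

/* Hypothetical stand-in TM library: one global spinlock, so transactions
   simply run one at a time. A real STM would track read/write sets and
   abort on conflicts instead. */
static atomic_flag tm_global = ATOMIC_FLAG_INIT;

#define TM_INIT()               ((void)0)
#define TM_SHUTDOWN()           ((void)0)
#define TM_ATOMIC_BEGIN()       do { /* spin */ } while (atomic_flag_test_and_set(&tm_global))
#define TM_ATOMIC_END()         atomic_flag_clear(&tm_global)
#define TM_SHARED_READ(var)     (var)              /* assumed: read a shared lvalue  */
#define TM_SHARED_WRITE(var, v) ((var) = (v))      /* assumed: write a shared lvalue */

typedef struct { int balance; } Account;

/* Explicit transaction: the programmer marks the boundaries and routes
   every shared access through the library by hand. */
void transfer(Account *a, Account *b, int amount) {
    TM_ATOMIC_BEGIN();
    TM_SHARED_WRITE(a->balance, TM_SHARED_READ(a->balance) + amount); /* a.credit */
    TM_SHARED_WRITE(b->balance, TM_SHARED_READ(b->balance) - amount); /* b.debit  */
    TM_ATOMIC_END();
}
```

Calling `transfer(&a, &b, 30)` on balances 100 and 50 leaves 130 and 20. Forgetting a single `TM_SHARED_READ`/`TM_SHARED_WRITE` silently breaks atomicity, which is the burden compiler support aims to remove.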

  6. Transactional Memory Background • Current Implementations • Compilers • Intel C++ STM Compiler • Tanger • OpenTM • GCC

  7. Design • Programming Interfaces • #pragma tm atomic [clause], followed by a structured block; clauses: readonly, private(var list), shared(var list) • #pragma tm abort • #pragma tm function, followed by a function declaration • #pragma tm waiver, followed by a function declaration
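A small illustration of how the annotations might look in a C program. The placement of the `readonly` clause and the functions `bump`, `record` and `total_events` are assumptions for illustration; the exact grammar is only what the slide lists. A compiler without this extension ignores the unknown pragmas, so the sketch also builds and runs sequentially with an ordinary C compiler.

```c
/* Shared counters updated inside transactions (hypothetical example). */
static int hits, misses;

/* Called from inside a transaction, so it is marked for cloning and
   instrumentation. */
#pragma tm function
static void bump(int *counter) { (*counter)++; }

/* Writes shared data: an ordinary read-write transaction. */
void record(int hit) {
#pragma tm atomic
    {
        if (hit) bump(&hits);
        else bump(&misses);
    }
}

/* Only reads shared data, so the readonly clause applies. */
int total_events(void) {
    int t;
#pragma tm atomic readonly
    {
        t = hits + misses;
    }
    return t;
}
```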

  8. Design • TM runtime interfaces (TL2)

  9. Design • Wrapper functions • Goal: ease the integration of new TM libraries • Called by programmers: tm_init()/tm_finalize(), tm_thread_start()/tm_thread_end() • Inserted by the compiler: __tm_atomic_begin()/__tm_atomic_end(), __tm_shared_read()/__tm_shared_read_float(), __tm_shared_write()/__tm_shared_write_float(), __tm_local_write()/__tm_local_write_float() • More wrapper functions are needed for other data types and for additional TM semantics
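One way to read the wrapper layer is as a fixed interface that compiler-generated code calls, with the TM library behind it swappable. The sketch below assumes a table of function pointers (`tm_backend`) and `long`-typed read/write signatures; both are hypothetical, as is the trivial single-threaded backend used here. Only the wrapper names come from the slide, and the `_float` and `__tm_local_write` variants are omitted for brevity.

```c
/* Hypothetical backend interface: whichever TM library is plugged in
   provides these four operations. */
typedef struct {
    void (*begin)(void);
    void (*end)(void);
    long (*read)(long *addr);
    void (*write)(long *addr, long val);
} tm_backend;

/* Trivial single-threaded backend, for illustration only. */
static void seq_begin(void) {}
static void seq_end(void) {}
static long seq_read(long *addr) { return *addr; }
static void seq_write(long *addr, long val) { *addr = val; }
static const tm_backend seq_backend = { seq_begin, seq_end, seq_read, seq_write };

/* Selected backend; must be set by tm_init() before any transaction. */
static const tm_backend *tm_impl;

/* Called by programmers. */
void tm_init(void) { tm_impl = &seq_backend; }
void tm_finalize(void) { tm_impl = 0; }
void tm_thread_start(void) { /* per-thread backend setup would go here */ }
void tm_thread_end(void) { /* per-thread backend teardown would go here */ }

/* Called from compiler-generated code: thin forwards to the backend. */
void __tm_atomic_begin(void) { tm_impl->begin(); }
void __tm_atomic_end(void) { tm_impl->end(); }
long __tm_shared_read(long *addr) { return tm_impl->read(addr); }
void __tm_shared_write(long *addr, long val) { tm_impl->write(addr, val); }
```

Integrating another library then means writing a new `tm_backend` whose functions call into it; the compiler's output does not change. A real runtime might bind at link time instead, avoiding the indirect call on every barrier.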

  10. Design • Optimization • Eliminate redundant calls to runtime libraries

  11. Implementation • General Transformation

  12. Implementation • General Transformation • #pragma tm atomic • Simple statements • Control flow statements: IF, WHILE_DO
     Source:
       a = b+c;
       for(;i<10;i++){ … }
     Lowered code (WHIRL):
       setjmp(); __tm_atomic_begin();
       PARM #address of c; CALL <__tm_shared_read>; LDID <return_offset>; STID #tm_preg_num_0
       PARM #address of b; CALL <__tm_shared_read>; LDID <return_offset>; STID #tm_preg_num_1
       LDID #tm_preg_num_0; LDID #tm_preg_num_1; ADD; PARM
       PARM #address of a; CALL <__tm_shared_write>
       PARM #address of i; CALL <__tm_shared_read>; LDID <return_offset>; STID #tm_preg_num_0
       WHILE_DO
         LDID #tm_preg_num_0; INTCONST 9; LE
         BODY BLOCK
           …
           PARM #address of i; CALL <__tm_shared_read>; LDID <return_offset>; STID #tm_preg_num_0
         END_BLOCK
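To make the lowering concrete, here is a hand-written C equivalent of what the instrumented code does at run time. The runtime is a hypothetical toy: it buffers writes in a small redo log, and `__tm_atomic_end` forces one simulated abort so the `setjmp()`/`longjmp()` restart path actually runs. The `int` value and address-argument signatures of the wrappers are assumptions, not the project's real runtime.

```c
#include <setjmp.h>

#define TM_LOG_MAX 8

/* Hypothetical redo-log entry: a shared address and its buffered value. */
typedef struct { int *addr; int val; } tm_entry;

static jmp_buf tm_checkpoint;        /* restart point saved by setjmp() */
static tm_entry tm_log[TM_LOG_MAX];  /* uncommitted writes              */
static int tm_log_len;
static int tm_force_abort = 1;       /* abort the first attempt once    */
int tm_attempts;                     /* number of transaction starts    */

static void __tm_atomic_begin(void) { tm_log_len = 0; tm_attempts++; }

static int __tm_shared_read(int *addr) {
    for (int k = 0; k < tm_log_len; k++)   /* read-after-write hits the log */
        if (tm_log[k].addr == addr) return tm_log[k].val;
    return *addr;
}

static void __tm_shared_write(int *addr, int val) {
    for (int k = 0; k < tm_log_len; k++)   /* update an existing entry */
        if (tm_log[k].addr == addr) { tm_log[k].val = val; return; }
    tm_log[tm_log_len].addr = addr;        /* toy: no overflow check */
    tm_log[tm_log_len].val = val;
    tm_log_len++;
}

static void __tm_atomic_end(void) {
    if (tm_force_abort) {                  /* simulated conflict */
        tm_force_abort = 0;
        tm_log_len = 0;                    /* discard buffered writes */
        longjmp(tm_checkpoint, 1);         /* re-execute from setjmp() */
    }
    for (int k = 0; k < tm_log_len; k++)   /* commit */
        *tm_log[k].addr = tm_log[k].val;
    tm_log_len = 0;
}

int a, b = 2, c = 3, i;

/* Source-level picture of the lowered code for:
   #pragma tm atomic
   { a = b+c; for(;i<10;i++){ } } */
void tx_body(void) {
    setjmp(tm_checkpoint);
    __tm_atomic_begin();
    int t0 = __tm_shared_read(&c);         /* #tm_preg_num_0 */
    int t1 = __tm_shared_read(&b);         /* #tm_preg_num_1 */
    __tm_shared_write(&a, t0 + t1);
    t0 = __tm_shared_read(&i);             /* preg reused for the loop test */
    while (t0 <= 9) {                      /* i<10 lowered to LE 9 */
        __tm_shared_write(&i, t0 + 1);     /* i++ */
        t0 = __tm_shared_read(&i);
    }
    __tm_atomic_end();
}
```

After `tx_body()` returns, `a` is 5, `i` is 10 and `tm_attempts` is 2. The `setjmp()` call has to sit in the function that contains the transaction, because `longjmp()` may only return to a `setjmp()` whose function is still active.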

  13. Implementation • General Transformation

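  14. Implementation • Functions • Clone and instrument: for a function marked #pragma tm function, the compiler keeps the original void calculate() and emits an instrumented clone __tm_cloned__calculate(); calls inside a transaction are redirected to the clone
     Before:
       #pragma tm function
       void calculate(){ … }
       #pragma tm atomic
       { calculate(); }
     After:
       void calculate(){ … }                  //original
       void __tm_cloned__calculate(){ … }     //instrumented
       #pragma tm atomic
       { __tm_cloned__calculate(); }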
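A minimal sketch of the two versions that cloning produces for a tiny `calculate()`. The body of `calculate()`, the globals `result` and `input`, and the counting wrappers are hypothetical; the wrappers just touch memory directly and count calls, so the difference between the original and the clone is visible.

```c
/* Hypothetical wrappers; a real build would link the TM runtime instead. */
static int barrier_calls;
static int __tm_shared_read(int *addr) { barrier_calls++; return *addr; }
static void __tm_shared_write(int *addr, int val) { barrier_calls++; *addr = val; }

int result, input = 20;   /* shared globals (illustrative) */

/* Original: called outside transactions, no barriers. */
void calculate(void) { result = input * 2 + 2; }

/* Instrumented clone that "#pragma tm function" would make the compiler
   emit: every shared access goes through a wrapper. */
void __tm_cloned__calculate(void) {
    __tm_shared_write(&result, __tm_shared_read(&input) * 2 + 2);
}
```

Keeping the original lets non-transactional callers run at full speed; only call sites inside `#pragma tm atomic` pay for the barriers.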

  15. Implementation • Optimization • Transaction-local variables: detected by the front end

  16. Implementation • Optimization • Barrier-free variables: detected according to their storage class

  17. Implementation • Optimization

  18. Implementation • Optimization
     • Optimization opportunity detection strategy
       • Pthread parallel task: transaction local = declared in the tm atomic scope; barrier free = auto variables
       • Cloned transactional function: transaction local = declared in the function
       • OpenMP parallel task: transaction local = declared in the tm atomic scope; barrier free = declared in the microtask or marked in the OpenMP private clause
     • Checking readonly transactions
     • Limitations: conservative handling of pointers; programmers must participate in the optimization
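The effect of eliminating redundant runtime calls can be shown by counting barriers for one small transaction, `{ int tmp = scale * 2; total += tmp; }`, inside a Pthread task. The counting wrappers and function names are hypothetical. As an extra caution of this sketch, the auto variable `scale` is only read inside the transaction, so skipping its barrier cannot affect rollback.

```c
/* Hypothetical barrier wrappers that count how often they are called. */
static int tm_barrier_calls;
static int __tm_shared_read(int *addr) { tm_barrier_calls++; return *addr; }
static void __tm_shared_write(int *addr, int val) { tm_barrier_calls++; *addr = val; }

int total;   /* global: visible to all threads, always needs barriers */

/* Naive instrumentation: every access goes through a barrier (5 calls). */
void tx_naive(int scale) {
    /* #pragma tm atomic */
    {
        int tmp;
        __tm_shared_write(&tmp, __tm_shared_read(&scale) * 2);
        __tm_shared_write(&total, __tm_shared_read(&total) + __tm_shared_read(&tmp));
    }
}

/* Optimized: tmp is transaction local (declared in the atomic scope) and
   scale is an auto variable of the task, so both are accessed directly;
   only total keeps its barriers (2 calls). */
void tx_optimized(int scale) {
    /* #pragma tm atomic */
    {
        int tmp = scale * 2;
        __tm_shared_write(&total, __tm_shared_read(&total) + tmp);
    }
}
```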

  19. Preliminary Experiments • Comparison with a fine-grained lock-based application

  20. Preliminary Experiments • Comparison with a manually instrumented application

  21. Preliminary Experiments
     #pragma tm atomic private(feature)
     {
         int j;
         (*new_centers_len[index])++;
         for(j=0;j<nfeatures;j++){
             new_centers[index][j]+=feature[i][j];
         }
     }
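The critical region above, wrapped in a compilable function. The surrounding function `accumulate` and the data layout are assumptions: `new_centers_len[k]` points to the member count of cluster k and `new_centers[k][f]` holds its running feature sum. Note the parentheses in `(*new_centers_len[index])++`: without them, postfix `++` binds tighter than `*` and would advance the pointer instead of the count. Without the TM extension the pragma is ignored and the code runs sequentially.

```c
/* Adds point i to cluster `index` (hypothetical wrapper around the
   slide's critical region). */
void accumulate(int i, int index, int nfeatures, float **feature,
                int **new_centers_len, float **new_centers) {
#pragma tm atomic private(feature)
    {
        int j;                           /* transaction local */
        (*new_centers_len[index])++;     /* increment the count, not the pointer */
        for (j = 0; j < nfeatures; j++) {
            new_centers[index][j] += feature[i][j];
        }
    }
}
```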

  22. Conclusion & Future work • An infrastructure for TM on Open64 • Replaceable TM implementation • Optimization • More experiments on non-trivial applications are desired • Future work: nested transactions, signal processing, event handlers, indirect calls, dealing with legacy code, … • FastDB: 8 of 75 critical regions contain nested transactions • FastDB: 28 of 75 critical regions contain signal processing • PARSEC: 20 of 55 critical regions contain signal processing

  23. Thanks
