
Programming, Debugging, Profiling and Optimizing Transactional Memory Applications

PhD Thesis Proposal. Department of Computer Architecture, Universitat Politècnica de Catalunya – BarcelonaTech, Barcelona Supercomputing Center. Ferad Zyulkyarov. 01 July 2010.


Presentation Transcript


  1. Programming, Debugging, Profiling and Optimizing Transactional Memory Applications PhD Thesis Proposal Department of Computer Architecture Universitat Politècnica de Catalunya – BarcelonaTech Barcelona Supercomputing Center Ferad Zyulkyarov • 01 July 2010

  2. Publications
     • Ferad Zyulkyarov, Srdjan Stipic, Tim Harris, Osman Unsal, Adrian Cristal, Ibrahim Hur, Mateo Valero, Discovering and Understanding Performance Bottlenecks in Transactional Applications, PACT'10
     • Ferad Zyulkyarov, Tim Harris, Osman Unsal, Adrian Cristal, Mateo Valero, Debugging Programs that use Atomic Blocks and Transactional Memory, PPoPP'10
     • Vladimir Gajinov, Ferad Zyulkyarov, Osman Unsal, Adrian Cristal, Eduard Ayguade, Tim Harris, Mateo Valero, QuakeTM: Parallelizing a Complex Serial Application Using Transactional Memory, ICS'09
     • Ferad Zyulkyarov, Vladimir Gajinov, Osman Unsal, Adrian Cristal, Eduard Ayguade, Tim Harris, Mateo Valero, Atomic Quake: Using Transactional Memory in an Interactive Multiplayer Game Server, PPoPP'09
     • Ferad Zyulkyarov, Sanja Cvijic, Osman Unsal, Adrian Cristal, Eduard Ayguade, Tim Harris, Mateo Valero, WormBench - A Configurable Workload for Evaluating Transactional Memory Systems, MEDEA'09
     • Ferad Zyulkyarov, Milos Milovanovic, Osman Unsal, Adrian Cristal, Eduard Ayguade, Tim Harris, Mateo Valero, Memory Management for Transaction Processing Core in Heterogeneous Chip-Multiprocessors, OSHMA'09
     • Milos Milovanovic, Osman Unsal, Adrian Cristal, Ferad Zyulkyarov, Mateo Valero, Compiler Support for Using Transactional Memory in C/C++ Applications, INTERACT'07

  3. Work Plan [Timeline figure; phase durations as labeled: 12m, 11m, 21m, 10m, 15m, 9.5m, 7m, 2m; date marker: 01/10/2010]

  4. Transactional Memory atomic { statement1; statement2; statement3; statement4; ... }
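
The all-or-nothing behavior of an atomic block can be illustrated with a minimal single-threaded sketch of a deferred-update (redo-log) transaction. This is an assumption-laden illustration, not the TM system used in this work: the names `tx_t`, `tx_begin`, `tx_write`, `tx_read`, `tx_commit` and `tx_abort` are hypothetical, and conflict detection and isolation between threads are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define TX_MAX_WRITES 64  /* arbitrary capacity chosen for this sketch */

/* One buffered store: where to write and what to write (hypothetical layout). */
typedef struct {
    int *addr;
    int  value;
} tx_write_entry;

/* A transaction is just its write log in this sketch. */
typedef struct {
    tx_write_entry log[TX_MAX_WRITES];
    size_t count;
} tx_t;

/* Start a transaction with an empty write log. */
void tx_begin(tx_t *tx) { tx->count = 0; }

/* Buffer a store; shared memory is not modified until commit. */
void tx_write(tx_t *tx, int *addr, int value) {
    for (size_t i = 0; i < tx->count; i++) {
        if (tx->log[i].addr == addr) {   /* overwrite an earlier buffered store */
            tx->log[i].value = value;
            return;
        }
    }
    assert(tx->count < TX_MAX_WRITES);
    tx->log[tx->count].addr = addr;
    tx->log[tx->count].value = value;
    tx->count++;
}

/* Read a location, seeing the transaction's own buffered stores first. */
int tx_read(const tx_t *tx, const int *addr) {
    for (size_t i = 0; i < tx->count; i++) {
        if (tx->log[i].addr == addr) return tx->log[i].value;
    }
    return *addr;
}

/* Commit: apply every buffered store, then clear the log. */
void tx_commit(tx_t *tx) {
    for (size_t i = 0; i < tx->count; i++) {
        *tx->log[i].addr = tx->log[i].value;
    }
    tx->count = 0;
}

/* Abort: discard the log, so none of the stores ever become visible. */
void tx_abort(tx_t *tx) { tx->count = 0; }
```

Bracketing `statement1` ... `statement4` between `tx_begin` and `tx_commit` gives the effect the slide describes: either all of the stores appear in memory or, after `tx_abort`, none of them do.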

  5. The Big Questions • Is programming with TM easy? • Is TM competitive with locks? • Are existing development tools sufficient?

  6. Atomic Quake • Parallel Quake game server • All locks are replaced with atomic blocks • 27,400 LOC of C code in 56 files • Rich transactional application • 63 atomic blocks • Rich uses of atomic blocks • Library calls, I/O, error handling, memory allocation, failure atomicity • Various transactional characteristics • A workload to drive research in TM

  7. Is programming with TM easy? • Yes. • In large applications where we have many shared objects and want to provide efficient fine-grained synchronization • Example: region-based locking in tree data structures and graphs.

  8. Where Do Transactions Fit? Guarding different types of objects with separate locks.
     With locks:
         switch (object->type) { /* Lock phase */
         case KEY:    lock(key_mutex);    break;
         case LIFE:   lock(life_mutex);   break;
         case WEAPON: lock(weapon_mutex); break;
         case ARMOR:  lock(armor_mutex);  break;
         }

         pick_up_object(object);

         switch (object->type) { /* Unlock phase */
         case KEY:    unlock(key_mutex);    break;
         case LIFE:   unlock(life_mutex);   break;
         case WEAPON: unlock(weapon_mutex); break;
         case ARMOR:  unlock(armor_mutex);  break;
         }
     With an atomic block:
         atomic {
             pick_up_object(object);
         }

  9. Is TM Competitive with Locks? • No. • 4-5x slowdown on the single-threaded version. • But it promises to become competitive because of the good scalability obtained. [Figure annotations: Scales OK up to 4 threads. Sudden increase in aborts.]

  10. Are Existing Tools Sufficient? • No. • We need: • Richer language-level primitives and integration. • Mechanisms to handle I/O. • Dynamic error handling. • Debuggers. • Profilers.

  11. Unstructured Use of Locks
     Locks:
         for (i = 0; i < sv_tot_num_players / sv_nproc; i++) {
             <statements1>
             LOCK(cl_msg_lock[c - svs.clients]);
             <statements2>
             if (!c->send_message) {
                 <statements3>
                 UNLOCK(cl_msg_lock[c - svs.clients]);
                 <statements4>
                 continue;
             }
             <statements5>
             if (!sv.paused && !Netchan_CanPacket(&c->netchan)) {
                 <statements6>
                 UNLOCK(cl_msg_lock[c - svs.clients]);
                 <statements7>
                 continue;
             }
             <statements8>
             if (c->state == cs_spawned) {
                 if (frame_threads_num > 1) LOCK(par_runcmd_lock);
                 <statements9>
                 if (frame_threads_num > 1) UNLOCK(par_runcmd_lock);
             }
             UNLOCK(cl_msg_lock[c - svs.clients]);
             <statements10>
         }
     Atomic block:
         bool first_if = false;
         bool second_if = false;
         for (i = 0; i < sv_tot_num_players / sv_nproc; i++) {
             <statements1>
             atomic {
                 <statements2>
                 if (!c->send_message) {
                     <statements3>
                     first_if = true;
                 } else {
                     <statements5>
                     if (!sv.paused && !Netchan_CanPacket(&c->netchan)) {
                         <statements6>
                         second_if = true;
                     } else {
                         <statements8>
                         if (c->state == cs_spawned) {
                             if (frame_threads_num > 1) {
                                 atomic {
                                     <statements9>
                                 }
                             } else {
                                 <statements9>;
                             }
                         }
                     }
                 }
             }
             if (first_if) {
                 <statements4>;
                 first_if = false;
                 continue;
             }
             if (second_if) {
                 <statements7>;
                 second_if = false;
                 continue;
             }
             <statements10>
         }
     Callouts: • Extra variables and code • Complicated conditional logic • Solution: explicit “commit”

  12. Various Transactional Characteristics • Per-atomic-block runtime statistics from Atomic Quake. [Statistics table not preserved.] • Different execution frequency -> phased behavior. • Very small transactions. • Very large transactions. • Most frequent atomic block is read-only. • Control flow does not reach all atomic blocks.

  13. Debugging Transactional Applications • Existing debuggers are not aware of atomic blocks and transactional memory • New principles and approaches: • Debugging atomic blocks atomically • Debugging at the level of transactions • Managing transactions at debug-time • Extension for WinDbg to debug programs with atomic blocks

  14. Atomicity in Debugging • Step over atomic blocks as if each were a single instruction. • Abstracts whether atomic blocks are implemented with TM or lock inference. • Good for debugging synchronization errors at the granularity of atomic blocks vs. individual statements inside the atomic blocks. • Debugging becomes frustrating when a transaction aborts.
     Non-TM-aware debugger vs. TM-aware debugger, both shown stepping through the same code:
         <statement 1>
         <statement 2>
         atomic {
             <statement 3>
             <statement 4>
             <statement 5>
             <statement 6>
         }
         <statement 7>
         <statement 8>

  15. Isolation in Debugging • What if we want to debug wrong code within an atomic block? • Put a breakpoint inside the atomic block. • Validate the transaction. • Step within the transaction. • The user does not observe intermediate results of concurrently running transactions. • Switch the transaction to irrevocable mode after validation. atomic { <statement 1> <statement 2> <statement 3> <statement 4> }

  16. Debugging at the Level of Transactions • Assumes that atomic blocks are implemented with transactional memory. • Examine the internal state of the TM • Read/write set, re-executions, status • TM-specific watchpoints • Break when conflict happens • Filters • Concurrent work with Herlihy and Lev [PACT’09].

  17. TM-Specific Watchpoints
     Break when conflict happens AND Filter: Break if Address = reservation@04, Thread = T2
         atomic {
             <statement 1>
             <statement 2>
             <statement 3>
             <statement 4>
         }
     Conflict Information
         Conflicting Threads: T1, T2
         Address: 0x84D2F0
         Symbol: reservation@04
         Readers: T1
         Writers: T2
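
A conflict watchpoint with a filter can be sketched as a predicate evaluated each time the TM runtime reports a conflict. The sketch below is only an assumption about how such a filter might look; `conflict_event`, `conflict_filter`, `ANY_THREAD` and `should_break` are hypothetical names and are not part of WinDbg or of the debugger extension described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_THREAD (-1)  /* hypothetical wildcard: match every thread */

/* What the runtime is assumed to report on a conflict. */
typedef struct {
    uintptr_t address;   /* conflicting memory address */
    int reader_thread;   /* thread that read the location */
    int writer_thread;   /* thread that wrote the location */
} conflict_event;

/* User-supplied filter; address 0 means "any address" in this sketch. */
typedef struct {
    uintptr_t address;
    int thread;
} conflict_filter;

/* Break only when a conflict happened AND it passes the filter. */
bool should_break(const conflict_filter *f, const conflict_event *e) {
    assert(f != NULL && e != NULL);
    bool address_ok = (f->address == 0) || (f->address == e->address);
    bool thread_ok  = (f->thread == ANY_THREAD)
                   || (f->thread == e->reader_thread)
                   || (f->thread == e->writer_thread);
    return address_ok && thread_ok;
}
```

With the values from the slide (address 0x84D2F0, reader T1, writer T2), a filter on that address and thread T2 would trigger the breakpoint, while the same filter for an uninvolved thread would not.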

  18. Managing Transactions at Debug-Time • At the level of atomic blocks • Debug-time atomic blocks • Splitting atomic blocks • At the level of transactions • Changing the state of the TM system (i.e. adding and removing entries from the read/write set, changing the status, aborting) • Analogous to the functionality of existing debuggers to change the CPU state

  19. Example Debug Time Atomic Blocks <statement 1> <statement 2> <statement 3> <statement 4> <statement 5> <statement 6> <statement 7> <statement 8> <statement 9> <statement 10> <statement 11> <statement 12> <statement 13> <statement 14>

  20. Example Debug Time Atomic Blocks <statement 1> <statement 2> <statement 3> StartDebugAtomic <statement 4> <statement 5> <statement 6> <statement 7> <statement 8> <statement 9> EndDebugAtomic <statement 10> <statement 11> <statement 12> <statement 13> <statement 14> • The user marks the start and the end of the transaction.

  21. Issues of Profiling TM Programs • TM applications have unanticipated overheads • Problem raised by Pankratius [talk at ICSE’09] and Rossbach et al. [PPoPP’10] • Difficult to profile TM applications without profiling tools and without knowing the implementation of the TM system • Experience of optimizing QuakeTM, Gajinov et al. [ICS’09]

  22. Profiling TM Programs • Design principles • Report results at the level of source-language constructs • Abstract the underlying TM system • Low probe effect and overhead • Profiling techniques • Conflict point discovery • Identifying conflicting data structures • Visualizing transactions

  23. Conflict Point Discovery • Identifies the statements involved in conflicts • Provides contextual information • Finds the critical path

  24. Call Context
     Shared code:
         increment() { counter++; }

         probability80() {
             probability = random() % 100;
             if (probability < 80) {
                 atomic { increment(); }
             }
         }

         probability20() {
             probability = random() % 100;
             if (probability >= 80) {
                 atomic { increment(); }
             }
         }
     Thread 1 and Thread 2 (identical loops):
         for (int i = 0; i < 100; i++) {
             probability80();
             probability20();
         }
     Bottom-up view:
         + increment (100%)
         |---- probability80 (80%)
         |---- probability20 (20%)
     Top-down view:
         + main (100%)
         |---- probability80 (80%)
               |---- increment (80%)
         |---- probability20 (20%)
               |---- increment (20%)
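
The bottom-up view can be derived from recorded conflicts if each conflict carries the call stack at which it occurred. The following is a rough sketch under that assumption; the `conflict_sample` layout and the helper names are hypothetical and do not reflect the profiler's actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 8  /* arbitrary stack-depth limit for this sketch */

/* One recorded conflict with its call stack, outermost frame first,
 * e.g. { "main", "probability80", "increment" }. */
typedef struct {
    const char *frames[MAX_DEPTH];
    size_t depth;
} conflict_sample;

/* True when the innermost frame of the sample is `leaf`. */
static int leaf_is(const conflict_sample *s, const char *leaf) {
    assert(s->depth <= MAX_DEPTH);
    return s->depth > 0 && strcmp(s->frames[s->depth - 1], leaf) == 0;
}

/* Bottom-up root row: number of conflicts whose innermost frame is `leaf`. */
size_t count_leaf(const conflict_sample *samples, size_t n, const char *leaf) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (leaf_is(&samples[i], leaf)) hits++;
    }
    return hits;
}

/* Bottom-up child row: conflicts in `leaf` when called directly from `caller`. */
size_t count_leaf_from(const conflict_sample *samples, size_t n,
                       const char *leaf, const char *caller) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        const conflict_sample *s = &samples[i];
        if (leaf_is(s, leaf) && s->depth > 1 &&
            strcmp(s->frames[s->depth - 2], caller) == 0) {
            hits++;
        }
    }
    return hits;
}

/* Integer percentage of `part` in `whole`; 0 when `whole` is 0. */
unsigned percent(size_t part, size_t whole) {
    return whole ? (unsigned)(part * 100 / whole) : 0u;
}
```

Grouping by the innermost frame first is what makes the view bottom-up; a top-down view would group by the outermost frame and walk inward instead.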

  25. Aborts Graph (Bayes) • There are 15 atomic blocks and only one of them aborts the most. • Which atomic blocks cause AB3 to abort? [Graph labels, in order: AB1; AB2; Conf: 73%, Wasted: 63%; Conf: 20%, Wasted: 29%; AB3: 72% of wasted work.]
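
One way to picture the data behind an aborts graph is a pair of matrices indexed by the aborted atomic block and the atomic block that caused the abort. The sketch below is an assumption for illustration only: `aborts_graph`, `record_abort`, `conflict_share` and `wasted_share` are hypothetical names, and the exact definitions of the "Conf" and "Wasted" labels on the slide are not given in the transcript.

```c
#include <assert.h>
#include <string.h>

#define MAX_BLOCKS 16  /* arbitrary limit on atomic blocks for this sketch */

/* aborts[v][c]: how often block v was aborted by block c.
 * wasted[v][c]: work (e.g. cycles) discarded by those aborts. */
typedef struct {
    unsigned long aborts[MAX_BLOCKS][MAX_BLOCKS];
    unsigned long wasted[MAX_BLOCKS][MAX_BLOCKS];
} aborts_graph;

/* Reset all edge weights to zero. */
void graph_init(aborts_graph *g) { memset(g, 0, sizeof *g); }

/* Record one abort of `victim` caused by `cause`. */
void record_abort(aborts_graph *g, int victim, int cause,
                  unsigned long wasted_work) {
    assert(victim >= 0 && victim < MAX_BLOCKS);
    assert(cause >= 0 && cause < MAX_BLOCKS);
    g->aborts[victim][cause] += 1;
    g->wasted[victim][cause] += wasted_work;
}

/* Percentage of a row's total that belongs to column `cause`. */
static double share(const unsigned long row[MAX_BLOCKS], int cause) {
    unsigned long total = 0;
    for (int i = 0; i < MAX_BLOCKS; i++) total += row[i];
    return total ? 100.0 * (double)row[cause] / (double)total : 0.0;
}

/* Share of the victim's aborts that were caused by `cause`, in percent. */
double conflict_share(const aborts_graph *g, int victim, int cause) {
    assert(victim >= 0 && victim < MAX_BLOCKS);
    assert(cause >= 0 && cause < MAX_BLOCKS);
    return share(g->aborts[victim], cause);
}

/* Share of the victim's wasted work attributable to `cause`, in percent. */
double wasted_share(const aborts_graph *g, int victim, int cause) {
    assert(victim >= 0 && victim < MAX_BLOCKS);
    assert(cause >= 0 && cause < MAX_BLOCKS);
    return share(g->wasted[victim], cause);
}
```

Keeping abort counts and wasted work separate matters because a block that causes few aborts can still be responsible for most of the discarded work if it tends to abort long transactions.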

  26. Identifying Conflicting Objects
         1: List list = new List();
         2: list.Add(1);
         3: list.Add(2);
         4: list.Add(3);
         ...
         atomic {
             list.Replace(2, 33);
         }
     Per-Object View:
         + List.cs:1 “list” (42%)
         |--- ChangeNode (20%)
              +---- Replace (12%)
              +---- Add (8%)
     [Diagram labels: List with nodes 1, 2, 3; addresses 0x08, 0x10, 0x18, 0x20; GC Root 0x08; Object Addr 0x20; InstrAddr 0x446290; components GC, Memory Allocator, DbgEng; source location List.cs:1.]

  27. Transaction Visualizer (Genome) [Visualizer screenshot; labeled regions: Garbage Collection, Wait on barrier.] Aborts occur at the first and last atomic blocks in program order.

  28. Overhead and Probe Effect • Process data offline or during GC. [Two charts comparing Profiling Enabled (+) and Profiling Disabled (-). Labels, in order: Normalized Execution Time; Standard deviation for the difference 27%; Abort Rate in %; Standard deviation for the difference 3.88%.]

  29. Optimization Techniques • Moving statements • Atomic block scheduling • Checkpoints and nested atomic blocks • Pessimistic reads • Early release

  30. Moving Statements
     Will this code execute the same?
         atomic {
             counter++;
             <statement1>
             <statement2>
             <statement3>
         }
     versus
         atomic {
             <statement1>
             <statement2>
             <statement3>
             counter++;
         }
     No!

  31. Checkpoints atomic { <statement1> <statement2> <statement3> <statement4> <statement5> <statement6> <statement7> } [Figure annotations: Conflicts 2%, 15%, 4%, 79%. Insert checkpoint.]

  32. Checkpoints atomic { <statement1> <statement2> <statement3> <statement4> <statement5> <statement6> <checkpoint> <statement7> } [Figure annotations: Conflicts 2%, 15%, 4%, 79%. Insert checkpoint.] Reduced wasted work for the atomic block by 40%.
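
The effect of a checkpoint on wasted work can be approximated with a simple cost model: on a conflict, everything executed since the restart point is discarded, and the restart point is either the start of the atomic block or the most recent checkpoint. The sketch below rests on that assumed model with made-up per-statement costs; `wasted_work` and `NO_CHECKPOINT` are hypothetical names, and the model does not reproduce the 40% figure from the slide.

```c
#include <assert.h>
#include <stddef.h>

#define NO_CHECKPOINT ((size_t)-1)  /* hypothetical marker: block has no checkpoint */

/* Work discarded when a conflict is detected while executing statement
 * `conflict_at` (0-based). `checkpoint` is the index of the statement that
 * the checkpoint immediately precedes, or NO_CHECKPOINT. A checkpoint that
 * has not been reached yet does not help, so execution restarts from 0. */
unsigned long wasted_work(const unsigned long *cost, size_t n,
                          size_t conflict_at, size_t checkpoint) {
    assert(conflict_at < n);
    size_t restart = 0;
    if (checkpoint != NO_CHECKPOINT && checkpoint <= conflict_at) {
        restart = checkpoint;  /* roll back only to the checkpoint */
    }
    unsigned long wasted = 0;
    for (size_t i = restart; i <= conflict_at; i++) {
        wasted += cost[i];     /* everything since the restart point is redone */
    }
    return wasted;
}
```

For example, with seven statements costing 10, 10, 10, 10, 10, 10 and 40 units, a conflict in the last statement discards 100 units without a checkpoint but only 40 units with a checkpoint placed just before it, which is why the checkpoint is inserted ahead of the statement where most conflicts occur.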

  33. Conclusion • Study the programmability aspects of TM • New debugging principles and approaches for TM applications • New profiling techniques for TM applications • Profile-guided optimization approaches for TM applications

  34. The End
