DECS: A Dynamic Elimination-Combining Stack Algorithm



  1. DECS: A Dynamic Elimination-Combining Stack Algorithm. Gal Bar-Nissan, Danny Hendler, Adi Suissa. OPODIS 2011

  2. Stack data-structure
     • We focus on the stack data structure, which supports two operations:
        • push(v) – adds a new element (with value v) to the top of the stack
        • pop – removes the top element from the stack and returns it
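
As a concrete rendering of this interface, here is a minimal sketch in Java; the language, the interface name, and the null-on-empty convention are my own choices, not fixed by the slides:

```java
// Minimal stack interface matching the two operations on the slide.
// The name Stack and the null-on-empty convention are assumptions, not the paper's API.
public interface Stack<V> {
    void push(V v);   // adds a new element with value v to the top of the stack
    V pop();          // removes and returns the top element, or null if the stack is empty
}
```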

  3. Previous work – IBM/Treiber algorithm [1986]
     • Linked-list based
     • Shared top pointer
     [Figure: push and pop operations swinging the shared top pointer of the linked list]
     ✔ Non-blocking algorithm
     ✘ Poor scalability (essentially sequential)
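
For reference, a minimal Treiber-style stack can be sketched in Java as a linked list whose shared top pointer is updated with compare-and-swap (class and field names are mine; the slides are not tied to a language):

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal Treiber-style stack: a singly linked list whose shared top pointer is
// updated with CAS. Lock-free, but every operation contends on the single top
// pointer, hence the poor scalability noted on the slide.
public class TreiberStack<V> {
    private static final class Node<V> {
        final V value;
        Node<V> next;
        Node(V value) { this.value = value; }
    }

    private final AtomicReference<Node<V>> top = new AtomicReference<>();

    public void push(V v) {
        Node<V> newTop = new Node<>(v);
        Node<V> oldTop;
        do {
            oldTop = top.get();
            newTop.next = oldTop;
        } while (!top.compareAndSet(oldTop, newTop));  // retry on contention
    }

    public V pop() {
        Node<V> oldTop;
        Node<V> newTop;
        do {
            oldTop = top.get();
            if (oldTop == null) return null;           // empty stack
            newTop = oldTop.next;
        } while (!top.compareAndSet(oldTop, newTop));  // retry on contention
        return oldTop.value;
    }
}
```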

  4. Previous work – Flat-combining [Hendler, Incze, Shavit, Tzafrir, 2010]
     • A list of operations to be performed
     • Each thread adds its operation to the list
     • One of the threads acquires a global lock and performs the combined operation
     • Other threads spin and wait for their operation to be performed
     [Figure: a publication list of pending push and pop operations served by the lock holder]
     ✔ Minimizes synchronization
     ✘ Blocking algorithm
     ✘ Limited scalability (essentially sequential)
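
The combining idea can be sketched roughly as follows in Java. This is a simplified illustration under my own assumptions (a fixed-size publication array, a ReentrantLock, and a sequential ArrayDeque behind the lock), not the flat-combining authors' implementation:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

// Simplified flat-combining sketch over a sequential stack: threads publish
// requests, and whichever thread grabs the lock applies all pending requests.
public class FlatCombiningStack<V> {
    private enum Op { PUSH, POP }

    private static final class Request<V> {
        final Op op;
        final V arg;
        volatile V result;
        volatile boolean done;
        Request(Op op, V arg) { this.op = op; this.arg = arg; }
    }

    private static final int SLOTS = 64;                 // fixed publication array (assumption)
    @SuppressWarnings("unchecked")
    private final AtomicReference<Request<V>>[] pub = new AtomicReference[SLOTS];
    private final ReentrantLock lock = new ReentrantLock();
    private final java.util.Deque<V> stack = new java.util.ArrayDeque<>();  // sequential stack

    public FlatCombiningStack() {
        for (int i = 0; i < SLOTS; i++) pub[i] = new AtomicReference<>();
    }

    public void push(V v) { submit(new Request<>(Op.PUSH, v)); }
    public V pop()        { return submit(new Request<>(Op.POP, null)); }

    private V submit(Request<V> req) {
        int slot = (int) (Thread.currentThread().getId() % SLOTS);
        while (!pub[slot].compareAndSet(null, req)) slot = (slot + 1) % SLOTS;  // publish
        while (!req.done) {
            if (lock.tryLock()) {                         // become the combiner
                try { combine(); } finally { lock.unlock(); }
            }
            // otherwise spin until some combiner serves this request
        }
        return req.result;
    }

    private void combine() {
        for (AtomicReference<Request<V>> cell : pub) {
            Request<V> r = cell.get();
            if (r == null || r.done) continue;
            if (r.op == Op.PUSH) stack.push(r.arg);
            else r.result = stack.isEmpty() ? null : stack.pop();
            r.done = true;                                // release the waiting thread
            cell.compareAndSet(r, null);                  // clear the slot
        }
    }
}
```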

  5. Previous work – Elimination Backoff (HSY) [Hendler, Shavit, Yerushalmi, 2004]
     • Eliminates operations with reverse semantics
     • A thread attempts its operation:
        • On the central stack (IBM/Treiber algorithm)
        • Elimination backoff – eliminate with another thread
     [Figure: a push by one thread and a pop by another eliminate each other; a third thread's pop proceeds on the central stack]
     ✔ Non-blocking algorithm
     ✔ Provides parallelism – if workloads are symmetric
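
Reduced to a single exchange slot, the elimination idea might look like this in Java (the HSY algorithm uses an array of such slots with randomized slot choice and backoff; names here are mine):

```java
import java.util.concurrent.atomic.AtomicReference;

// A minimal single-slot elimination sketch (an illustration of the idea, not the
// HSY implementation). A pusher parks its value in the slot; a popper that finds a
// parked value takes it, and both operations complete without touching the stack.
public class EliminationSlot<V> {
    private final AtomicReference<V> slot = new AtomicReference<>();

    // Pusher: park the value for a short spin; returns true if a popper took it.
    public boolean tryEliminatePush(V v, int spins) {
        if (!slot.compareAndSet(null, v)) return false;   // slot busy, fall back to the stack
        for (int i = 0; i < spins; i++) {
            if (slot.get() != v) return true;             // a popper consumed our value
        }
        // Timed out: withdraw the value, unless a popper grabbed it at the last moment.
        return !slot.compareAndSet(v, null);
    }

    // Popper: try to grab a parked value; returns it, or null if no pusher showed up.
    public V tryEliminatePop() {
        V v = slot.get();
        if (v != null && slot.compareAndSet(v, null)) return v;
        return null;
    }
}
```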

  6. Our contributions
     • DECS – A Dynamic Elimination-Combining Stack algorithm
     • Dynamically employs either of two techniques:
        • Elimination
        • Combining
     • A non-blocking version (NB-DECS)

  7. DECS – Dynamic Elimination-Combining Stack
     • Employs IBM/Treiber's algorithm as a central stack
     • A thread attempts its operation:
        • On the central stack
        • Elimination-combining backoff – eliminate or combine with another thread
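
The control flow described on this slide can be pictured as a loop that alternates between the two attempts. The sketch below is only an illustration of that flow; the two suppliers are placeholders for the real steps, which the next slides detail:

```java
import java.util.function.BooleanSupplier;

// Sketch of the top-level control flow: keep alternating between an attempt on
// the central (Treiber) stack and an attempt to eliminate or combine with another
// thread through the elimination-combining layer.
final class DecsLoopSketch {
    static void runOperation(BooleanSupplier tryCentralStack,
                             BooleanSupplier tryEliminateOrCombine) {
        while (true) {
            if (tryCentralStack.getAsBoolean()) return;        // CAS on the central stack succeeded
            if (tryEliminateOrCombine.getAsBoolean()) return;  // eliminated, or served by a delegate
            Thread.onSpinWait();                               // back off briefly and retry
        }
    }
}
```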

  8. Elimination-Combining layer
     1. A thread (T1) attempts its operation on the central stack
     2. If that fails, it registers itself in a publication array
     3. It then chooses a random index from the publication array, and looks for another thread
     • If no other thread is found, the thread waits
     [Figure: T1's operation (op1) registered in the publication array next to the central stack]

  9. Elimination-Combining layer (cont'd)
     4. Another thread (T2) that fails on the central stack also registers in the array and tries to find another thread
     5. If it finds a thread whose operation has reverse semantics (op1 != op2), the two operations are eliminated
     [Figure: T1's and T2's operations eliminate each other without touching the central stack]

  10. Elimination-Combining layer (cont'd)
     6. If both threads have operations with identical semantics (op1 == op2), one thread delegates its operation to the other, which becomes its delegate thread
     [Figure: one of the two operations is delegated; the delegate thread will perform both on the central stack]
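
A simplified sketch of the collision case split described on slides 8–10 follows. This is illustration only: the paper's protocol pairs threads lock-free through the publication array, and the type and field names below are mine:

```java
// Case split of a collision in the elimination-combining layer: reverse semantics
// means the push's value flows straight to the pop; identical semantics means one
// operation is delegated to the other thread, which later performs both as a
// single multi-op on the central stack.
final class OpRecord<V> {
    enum Kind { PUSH, POP }
    final Kind kind;
    final V pushValue;       // value to push (meaningful for PUSH)
    V popResult;             // value handed back (meaningful for POP)
    boolean completed;       // set once the operation no longer needs the central stack
    OpRecord<V> delegatedTo; // non-null once this operation was handed to another thread

    OpRecord(Kind kind, V pushValue) { this.kind = kind; this.pushValue = pushValue; }
}

final class Collider<V> {
    // Called after two threads that failed on the central stack have paired up.
    void collide(OpRecord<V> active, OpRecord<V> passive) {
        if (active.kind != passive.kind) {
            // Elimination: transfer the pushed value directly to the pop.
            OpRecord<V> push = (active.kind == OpRecord.Kind.PUSH) ? active : passive;
            OpRecord<V> pop  = (push == active) ? passive : active;
            pop.popResult = push.pushValue;
            push.completed = true;
            pop.completed  = true;
        } else {
            // Combining: which side delegates is a detail of the real algorithm;
            // here, arbitrarily, the active operation is handed to the passive thread.
            active.delegatedTo = passive;
        }
    }
}
```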

  11. Multi-Push
     [Figure: T1 performs a multi-push on the central stack: its own push together with the pushes delegated to it by Ta and Tb]

  12. Multi-Pop
     • M = min{stack_size, multi_op_size}
     [Figure: T1 performs a multi-pop of M elements from the central stack, serving its own pop and the pops delegated to it by Ta, Tb and Tc]
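
A small sketch of the sizing rule on this slide: the combining thread removes M = min(stack_size, multi_op_size) elements, one per pending pop, and the remaining pops report an empty stack. The names and the null convention are mine, not the paper's MultiPop pseudocode:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustration of the multi-pop sizing rule: only the first M pops get a value;
// the rest see an empty stack (represented here by null).
final class MultiPopSketch {
    static <V> List<V> multiPop(Deque<V> centralStack, int multiOpSize) {
        int m = Math.min(centralStack.size(), multiOpSize);
        List<V> results = new ArrayList<>(multiOpSize);
        for (int i = 0; i < multiOpSize; i++) {
            results.add(i < m ? centralStack.pop() : null);  // null stands for "stack empty"
        }
        return results;
    }

    public static void main(String[] args) {
        Deque<Integer> stack = new ArrayDeque<>(List.of(3, 2, 1));  // 3 is on top
        System.out.println(multiPop(stack, 5));                     // [3, 2, 1, null, null]
    }
}
```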

  13. Multi-Eliminate
     [Figure: a multi-push held by T1 (with delegated pushes from Ta and Tb) collides with a multi-pop held by T2 (with delegated pops from Tc, Td and Te); pushes and pops are matched and eliminated pairwise, and unmatched operations retry]

  14. Data-structures

  15. Push & Pop operations

  16. MultiPop function

  17. Collide function

  18. ActiveCollide, Combine functions

  19. MultiEliminate function

  20. PassiveCollide

  21. Experimental Evaluation
     • Evaluated on an UltraSPARC T2+: an 8-core CPU, each core with 8 hardware threads → 64 hardware threads
     • Compared DECS with:
        • Treiber (with exponential backoff)
        • HSY (elimination backoff) algorithm
        • Flat-Combining (FC) stack

  22. Symmetric workload: 50% push – 50% pop
     [Figure: throughput vs. number of threads]

  23. Moderately asymmetric workload: 75% push – 25% pop
     [Figure: throughput vs. number of threads]

  24. Fully asymmetric workload: 100% push – 0% pop
     [Figure: throughput vs. number of threads]

  25. DECS summary
     ✔ Scalable
     ✔ Provides parallelism even for asymmetric workloads
     ✘ Blocking

  26. Non-blocking DECS
     • A non-blocking algorithm is more robust to thread failures
     • Similar to DECS, but threads that delegate an operation do not wait indefinitely
     • A thread stops waiting by signaling its delegate thread
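
One way to picture this handshake in Java (the states and names are mine, not NB-DECS's actual fields): the waiter times out and tries to flip its request from DELEGATED to CANCELLED, while the delegate claims the request with the opposite CAS before executing it; whichever CAS wins owns the operation:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a cancellable delegated operation: the delegating thread waits only up
// to a timeout, then signals the delegate by cancelling; the delegate must claim
// the request before applying it, and skips it if the claim fails.
final class CancellableRequest<V> {
    enum State { DELEGATED, DONE, CANCELLED }

    private final AtomicReference<State> state = new AtomicReference<>(State.DELEGATED);
    volatile V result;

    // Called by the waiting (delegating) thread.
    // Returns true if the delegate completed the operation, false if we cancelled it.
    boolean awaitOrCancel(long timeoutNanos) {
        long deadline = System.nanoTime() + timeoutNanos;
        while (System.nanoTime() - deadline < 0) {
            if (state.get() == State.DONE) return true;       // delegate finished our op
            Thread.onSpinWait();
        }
        // Timed out: cancel, unless the delegate completed at the last moment.
        return !state.compareAndSet(State.DELEGATED, State.CANCELLED);
    }

    // Called by the delegate thread right before applying the delegated operation.
    // Returns false if the waiter already cancelled; the delegate then skips the op.
    boolean tryClaimAndComplete(V value) {
        result = value;                                       // publish the result first
        return state.compareAndSet(State.DELEGATED, State.DONE);
    }
}
```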

  27. NB-DECS – example
     • A thread may stop waiting after some timeout
     [Figure: one of the delegated operations in T1's multi-push is cancelled after its owner times out, before T1 reaches the central stack]

  28. NB-DECS – overhead
     • Test-and-set validation of each element popped from the central stack
     • Elements must be popped from the central stack one by one
     • Test-and-set validation on eliminated operations
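
The validation step can be pictured as a per-cell test-and-set flag: a value may be used only by the single thread that wins the flag. This is illustrative naming under my own assumptions, not the paper's cell layout:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Each popped or eliminated cell carries a 'taken' flag; only the thread whose
// test-and-set succeeds may use the value. Claiming cells one at a time like this
// is the source of the overhead listed on the slide.
final class ValidatedCell<V> {
    final V value;
    private final AtomicBoolean taken = new AtomicBoolean(false);

    ValidatedCell(V value) { this.value = value; }

    // Returns the value exactly once: the first successful caller wins, others get null.
    V tryTake() {
        return taken.compareAndSet(false, true) ? value : null;
    }
}
```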

  29. Symmetric workload: 50% push – 50% pop
     [Figure: throughput vs. number of threads]

  30. Moderately asymmetric workload: 75% push – 25% pop
     [Figure: throughput vs. number of threads]

  31. Moderately asymmetric workload: 25% push – 75% pop
     [Figure: throughput vs. number of threads]
