Wait-Free Queues with Multiple Enqueuers and Dequeuers

Presentation Transcript

  1. Wait-Free Queues with Multiple Enqueuers and Dequeuers • Alex Kogan, Erez Petrank • Computer Science, Technion, Israel

  2. FIFO queues • One of the most fundamental and common data structures • [Figure: a queue holding 5, 3, 2, 9, with enqueue adding elements at the tail and dequeue removing them from the head]

  3. Concurrent FIFO queues • A concurrent implementation supports “correct” concurrent adding and removing of elements • correct = linearizable • Access to the shared memory must be synchronized • [Figure: several threads concurrently invoke enqueue and dequeue on a queue holding 3, 2, 9; one dequeue finds the queue empty]

  4. Non-blocking synchronization • No thread is blocked waiting for another thread to complete • e.g., no locks / critical sections • Progress guarantees: • Obstruction-freedom • progress is guaranteed only in the eventual absence of interference • Lock-freedom • among all threads trying to apply an operation, one will succeed • Wait-freedom • a thread completes its operation in a bounded number of steps

  5. Lock-freedom • Among all threads trying to apply an operation, one will succeed • opportunistic approach • make attempts until succeeding • global progress • all but one thread may starve • Many efficient and scalable lock-free queue implementations exist

  6. Wait-freedom • A thread completes its operation in a bounded number of steps • regardless of what other threads are doing • A highly desired property of any concurrent data structure • but commonly regarded as inefficient and too costly to achieve • Particularly important in several domains • real-time systems • systems operating under an SLA • heterogeneous environments

  7. Related work: existing wait-free queues • Limited concurrency • one enqueuer and one dequeuer [Lamport’83] • multiple enqueuers, one concurrent dequeuer [David’04] • multiple dequeuers, one concurrent enqueuer [Jayanti&Petrovic’05] • Universal constructions [Herlihy’91] • generic method to transform any (sequential) object into a lock-free/wait-free concurrent object • expensive, impractical implementations • (almost) no experimental results

  8. Related work: lock-free queue [Michael & Scott’96] • One of the most scalable and efficient lock-free implementations • Widely adopted by industry • part of the Java concurrency package • Relatively simple and intuitive implementation • Based on a singly-linked list of nodes • [Figure: list nodes 12, 4, 17 with head and tail pointers]

  9. MS-queue brief review: enqueue • [Figure: enqueue(9) on the list 12, 4, 17: one CAS links the new node 9 after the last node, a second CAS swings tail to it]

  10. MS-queue brief review: enqueue • [Figure: two concurrent enqueues (9 and 5) race: each tries to CAS the last node’s next pointer; only one CAS succeeds and the other must retry]

  11. MS-queue brief review: dequeue • [Figure: dequeue on the list 12, 4, 17: a CAS advances head past the dummy node and the value 12 is returned]
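
A minimal Java sketch of the MS-queue operations reviewed on slides 9-11, assuming the usual dummy-node representation; class and field names are illustrative, not the paper's code:

    import java.util.concurrent.atomic.AtomicReference;

    // Michael & Scott lock-free queue: head points to a dummy node.
    class MSQueue<T> {
        static class Node<T> {
            final T value;
            final AtomicReference<Node<T>> next = new AtomicReference<>(null);
            Node(T value) { this.value = value; }
        }

        private final AtomicReference<Node<T>> head;
        private final AtomicReference<Node<T>> tail;

        MSQueue() {
            Node<T> dummy = new Node<>(null);
            head = new AtomicReference<>(dummy);
            tail = new AtomicReference<>(dummy);
        }

        void enqueue(T value) {
            Node<T> node = new Node<>(value);
            while (true) {
                Node<T> last = tail.get();
                Node<T> next = last.next.get();
                if (last != tail.get()) continue;      // tail moved, re-read
                if (next == null) {
                    // First CAS: link the new node after the last node.
                    if (last.next.compareAndSet(null, node)) {
                        // Second CAS: swing tail (failure is harmless).
                        tail.compareAndSet(last, node);
                        return;
                    }
                } else {
                    // Tail is lagging behind: help advance it, then retry.
                    tail.compareAndSet(last, next);
                }
            }
        }

        T dequeue() {
            while (true) {
                Node<T> first = head.get();
                Node<T> last = tail.get();
                Node<T> next = first.next.get();
                if (first != head.get()) continue;     // head moved, re-read
                if (first == last) {
                    if (next == null) return null;     // queue is empty
                    tail.compareAndSet(last, next);    // help lagging tail
                } else {
                    T value = next.value;
                    // CAS advances head past the old dummy node.
                    if (head.compareAndSet(first, next)) return value;
                }
            }
        }
    }

Note how a failed CAS simply retries: some thread always makes progress, but a particular thread may retry forever, which is why this queue is lock-free rather than wait-free.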

  12. Our idea (in a nutshell) • Based on the lock-free queue by Michael & Scott • Helping mechanism • each operation is applied in a bounded time • “Wait-free” implementation scheme • each operation is applied exactly once

  13. Helping mechanism • Each operation is assigned a dynamic age-based priority • inspired by the Doorway mechanism used in the Bakery mutex • Each thread accessing the queue • chooses a monotonically increasing phase number • writes down its phase and operation info in a special state array • helps all threads with a non-larger phase to apply their operations • [Figure: state entry per thread with fields phase: long, pending: boolean, enqueue: boolean, node: Node]
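
A compilable sketch of the state array and helping loop described on this slide; OpDesc is an assumed name for the per-thread entry, and helpEnq/helpDeq are stubs that the later slides fill in:

    import java.util.concurrent.atomic.AtomicReferenceArray;

    class HelpingSketch {
        static class Node { int value; Node(int v) { value = v; } }

        // Immutable per-thread state entry; fields follow the slide.
        static class OpDesc {
            final long phase;      // age-based priority
            final boolean pending; // true while the operation is not applied
            final boolean enqueue; // operation type
            final Node node;       // node to insert / node being removed
            OpDesc(long phase, boolean pending, boolean enqueue, Node node) {
                this.phase = phase; this.pending = pending;
                this.enqueue = enqueue; this.node = node;
            }
        }

        final int numThreads;
        final AtomicReferenceArray<OpDesc> state;  // one entry per thread

        HelpingSketch(int numThreads) {
            this.numThreads = numThreads;
            state = new AtomicReferenceArray<>(numThreads);
            for (int i = 0; i < numThreads; i++)
                state.set(i, new OpDesc(-1, false, true, null));
        }

        // Help every pending operation whose phase is not larger than ours.
        void help(long phase) {
            for (int i = 0; i < numThreads; i++) {
                OpDesc desc = state.get(i);
                if (desc.pending && desc.phase <= phase) {
                    if (desc.enqueue) helpEnq(i, desc.phase);
                    else helpDeq(i, desc.phase);
                }
            }
        }

        void helpEnq(int tid, long phase) { /* see the enqueue slides */ }
        void helpDeq(int tid, long phase) { /* see the dequeue slides */ }
    }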

  14. Helping mechanism in action • [Figure: a state array for four threads with phases 4, 9, 3, 9 and per-thread pending, enqueue, and node fields]

  15. Helping mechanism in action • [Figure: a thread choosing phase 10 sees a pending operation with phase 9: “I need to help!”]

  16. Helping mechanism in action • [Figure: the same entry is no longer pending: “I do not need to help!”]

  17. Helping mechanism in action • [Figure: a thread choosing phase 11 decides per entry: “I need to help!” for a still-pending operation, “I do not need to help!” for a completed one]

  18. Helping mechanism in action • The number of operations that may linearize before any given operation is bounded • hence, wait-freedom • [Figure: the state array after helping has completed]

  19. Optimized helping • The basic scheme has two drawbacks: • the number of steps executed by each thread on every operation depends on n (the number of threads) • even when there is no contention • it creates scenarios where many threads help the same operations • e.g., when many threads access the queue concurrently • a large amount of redundant work • Optimization: help one thread at a time, in a cyclic manner (sketched below) • faster threads help slower peers in parallel • reduces the amount of redundant work
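
A sketch of the cyclic-helping optimization, written as a replacement for the help loop in the HelpingSketch class above; the per-thread nextToHelp index is an assumption about how the round-robin is kept:

    // Replacement for HelpingSketch.help: each thread helps at most one
    // peer per operation of its own, advancing round-robin, so a slow
    // peer is still helped eventually while redundant parallel helping
    // of the same operation is mostly avoided.
    int nextToHelp = 0;  // per-thread state (e.g., a ThreadLocal in practice)

    void helpOne(long myPhase) {
        OpDesc desc = state.get(nextToHelp);
        if (desc.pending && desc.phase <= myPhase) {
            if (desc.enqueue) helpEnq(nextToHelp, desc.phase);
            else helpDeq(nextToHelp, desc.phase);
        }
        nextToHelp = (nextToHelp + 1) % numThreads;  // move on to the next peer
    }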

  20. How to choose the phase numbers • Every time ti chooses a phase number, it is greater than the number chosen by any thread that made its choice before ti • defines a logical order on operations and provides wait-freedom • Like in the Bakery mutex: • scan through state • calculate the maximal phase value + 1 • requires O(n) steps • Alternative: use an atomic counter • requires O(1) steps (see the sketch below) • [Figure: scanning phases 4, 3, 5 in state yields 5 + 1 = 6]
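
Both phase-selection schemes side by side, as a sketch; the phases array stands in for the phase fields of the state entries:

    import java.util.concurrent.atomic.AtomicLong;
    import java.util.concurrent.atomic.AtomicLongArray;

    class PhaseSelection {
        // Bakery-style: scan every announced phase and go one above the
        // maximum. Costs O(n) reads but needs no extra shared object.
        static long maxPhasePlusOne(AtomicLongArray phases) {
            long max = -1;
            for (int i = 0; i < phases.length(); i++)
                max = Math.max(max, phases.get(i));
            return max + 1;
        }

        // Alternative: a shared fetch-and-increment counter yields a
        // unique, monotonically increasing phase in O(1) steps.
        static final AtomicLong counter = new AtomicLong();

        static long nextPhase() {
            return counter.incrementAndGet();
        }
    }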

  21. “Wait-free” design scheme • Break each operation into three atomic steps • can be executed by different threads • cannot be interleaved • Initial change of the internal structure • concurrent operations realize that there is an operation-in-progress • Updating the state of the operation-in-progress as being performed (linearized) • Fixing the internal structure • finalizing the operation-in-progress

  22. Internal structures • [Figure: the queue as a linked list (nodes 1, 2, 4) with head and tail pointers, alongside the state array with per-thread phase, pending, enqueue, and node fields]

  23. Internal structures • enqTid: int • holds the ID of the thread that performs / has performed the insertion of the node into the queue • [Figure: nodes 2 and 4 were enqueued by Thread 0 (enqTid = 0); node 1 was enqueued by Thread 1 (enqTid = 1)]

  24. Internal structures • deqTid: int • holds the ID of the thread that performs / has performed the removal of the node from the queue • [Figure: the node holding 1 was dequeued by Thread 1 (deqTid = 1); -1 marks nodes not yet dequeued] • (see the node sketch below)
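
The node layout from slides 22-24 as a Java sketch; field names follow the slides, with int values used for brevity:

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicReference;

    // Besides value and next, each node records which thread inserted
    // it (enqTid) and which thread removed it (deqTid); -1 in deqTid
    // means no thread has claimed the node for removal yet.
    class WFNode {
        final int value;
        final int enqTid;                                   // inserting thread
        final AtomicInteger deqTid = new AtomicInteger(-1); // set by CAS on removal
        final AtomicReference<WFNode> next = new AtomicReference<>(null);

        WFNode(int value, int enqTid) {
            this.value = value;
            this.enqTid = enqTid;
        }
    }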

  25. enqueue operation • Creating a new node • [Figure: Thread 2 allocates a new node holding 6 with enqTid = 2; the queue holds 12, 4, 17]

  26. enqueue operation • Announcing a new operation • [Figure: Thread 2 writes a new state entry with phase 10, pending = true, enqueue = true, and a reference to the new node]

  27. enqueue operation • Step 1: Initial change of the internal structure • [Figure: CAS links the new node after the last node in the list]

  28. enqueue operation • Step 2: Updating the state of the operation-in-progress as being performed • [Figure: CAS on Thread 2’s state entry flips pending from true to false]

  29. enqueue operation • Step 3: Fixing the internal structure • [Figure: CAS swings tail to the newly linked node]

  30. enqueue operation • Step 1: Initial change of the internal structure • [Figure: Thread 2’s new node 6 is linked after the last node but not yet finalized; Thread 0 prepares enqueue(3)]

  31. enqueue operation • Creating a new node • Announcing a new operation • [Figure: Thread 0 allocates a node holding 3 (enqTid = 0) and writes a state entry with phase 11, pending = true]

  32. enqueue operation • Step 2: Updating the state of the operation-in-progress as being performed • [Figure: the node after tail has enqTid = 2, identifying Thread 2’s half-done enqueue as the operation to help]

  33. enqueue operation • Step 2: Updating the state of the operation-in-progress as being performed • [Figure: CAS on Thread 2’s state entry flips pending to false, linearizing the enqueue]

  34. enqueue operation • Step 3: Fixing the internal structure • [Figure: CAS swings tail to Thread 2’s node]

  35. enqueue operation • Step 1: Initial change of the internal structure • [Figure: with tail fixed, CAS links Thread 0’s node 3 after the last node; the three steps then repeat for Thread 0’s operation]
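
A compact sketch of the three-step enqueue walked through above, combining the node and state-entry sketches from the earlier slides; the structure follows the paper's algorithm, but the method names (helpEnq, helpFinishEnq, isStillPending) and minor details are assumptions:

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicReference;
    import java.util.concurrent.atomic.AtomicReferenceArray;

    class WFQueueSketch {
        static class Node {
            final int value;
            final int enqTid;                                   // inserting thread
            final AtomicInteger deqTid = new AtomicInteger(-1); // removing thread
            final AtomicReference<Node> next = new AtomicReference<>(null);
            Node(int value, int enqTid) { this.value = value; this.enqTid = enqTid; }
        }

        static class OpDesc {
            final long phase; final boolean pending; final boolean enqueue; final Node node;
            OpDesc(long phase, boolean pending, boolean enqueue, Node node) {
                this.phase = phase; this.pending = pending;
                this.enqueue = enqueue; this.node = node;
            }
        }

        final AtomicReference<Node> head, tail;     // head is used by dequeue (below)
        final AtomicReferenceArray<OpDesc> state;

        WFQueueSketch(int numThreads) {
            Node dummy = new Node(0, -1);
            head = new AtomicReference<>(dummy);
            tail = new AtomicReference<>(dummy);
            state = new AtomicReferenceArray<>(numThreads);
            for (int i = 0; i < numThreads; i++)
                state.set(i, new OpDesc(-1, false, true, null));
        }

        boolean isStillPending(int tid, long phase) {
            OpDesc d = state.get(tid);
            return d.pending && d.phase <= phase;
        }

        // Runs until thread tid's announced enqueue has been applied;
        // executed by the owner and by any helper.
        void helpEnq(int tid, long phase) {
            while (isStillPending(tid, phase)) {
                Node last = tail.get();
                Node next = last.next.get();
                if (last != tail.get()) continue;
                if (next == null) {
                    if (isStillPending(tid, phase)) {
                        // Step 1: link the announced node after the last node.
                        if (last.next.compareAndSet(null, state.get(tid).node)) {
                            helpFinishEnq();
                            return;
                        }
                    }
                } else {
                    // Another enqueue is half-done: finish it, then retry.
                    helpFinishEnq();
                }
            }
        }

        // Steps 2 and 3: linearize the half-done enqueue, then fix tail.
        void helpFinishEnq() {
            Node last = tail.get();
            Node next = last.next.get();
            if (next != null) {
                int tid = next.enqTid;             // owner of the linked node
                OpDesc cur = state.get(tid);
                if (last == tail.get() && state.get(tid).node == next) {
                    // Step 2: flip pending to false (operation is applied).
                    state.compareAndSet(tid, cur, new OpDesc(cur.phase, false, true, next));
                    // Step 3: swing tail to the newly linked node.
                    tail.compareAndSet(last, next);
                }
            }
        }
    }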

  36. dequeue operation • [Figure: Thread 2 is about to dequeue from the queue 12, 4, 17; its state entry is idle]

  37. dequeue operation • Announcing a new operation • [Figure: Thread 2 writes a state entry with phase 10, pending = true, enqueue = false]

  38. dequeue operation • Updating state to refer to the first node • [Figure: CAS installs a reference to the current head (dummy) node in Thread 2’s state entry]

  39. dequeue operation • Step 1: Initial change of the internal structure • [Figure: CAS sets deqTid = 2 in the first node, claiming it for Thread 2]

  40. dequeue operation • Step 2: Updating the state of the operation-in-progress as being performed • [Figure: CAS on Thread 2’s state entry flips pending to false]

  41. dequeue operation • Step 3: Fixing the internal structure • [Figure: CAS advances head to the next node; the old dummy node can be reclaimed]
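
The matching dequeue path, sketched as two methods added to the WFQueueSketch class above; again the structure follows the paper's three steps, with names and minor details assumed:

    // Runs until thread tid's announced dequeue has been applied.
    void helpDeq(int tid, long phase) {
        while (isStillPending(tid, phase)) {
            Node first = head.get();
            Node last = tail.get();
            Node next = first.next.get();
            if (first != head.get()) continue;
            if (first == last) {
                if (next == null) {
                    // Queue is empty: linearize the dequeue as "empty".
                    OpDesc cur = state.get(tid);
                    if (last == tail.get() && isStillPending(tid, phase))
                        state.compareAndSet(tid, cur, new OpDesc(cur.phase, false, false, null));
                } else {
                    helpFinishEnq();   // finish a lagging enqueue first
                }
            } else {
                OpDesc cur = state.get(tid);
                if (!isStillPending(tid, phase)) break;
                if (first == head.get() && cur.node != first) {
                    // Point tid's state entry at the current dummy node.
                    if (!state.compareAndSet(tid, cur, new OpDesc(cur.phase, true, false, first)))
                        continue;
                }
                // Step 1: claim the dummy node for thread tid.
                first.deqTid.compareAndSet(-1, tid);
                helpFinishDeq();
            }
        }
    }

    // Steps 2 and 3: linearize the claimed dequeue, then advance head.
    void helpFinishDeq() {
        Node first = head.get();
        Node next = first.next.get();
        int tid = first.deqTid.get();          // thread that claimed this node
        if (tid != -1) {
            OpDesc cur = state.get(tid);
            if (first == head.get() && next != null) {
                // Step 2: flip pending to false; the dequeued value is later
                // read by the owner as state.get(tid).node.next.value.
                state.compareAndSet(tid, cur, new OpDesc(cur.phase, false, false, cur.node));
                // Step 3: advance head past the old dummy node.
                head.compareAndSet(first, next);
            }
        }
    }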

  42. Performance evaluation

  43. Benchmarks • Enqueue-Dequeue benchmark • the queue is initially empty • each thread iteratively performs enqueue and then dequeue • 1,000,000 iterations per thread • 50%-Enqueue benchmark • the queue is initialized with 1000 elements • each thread decides uniformly at random which operation to perform, with equal odds for enqueue and dequeue • 1,000,000 operations per thread

  44. Tested algorithms Compared implementations: • MS-queue • Base wait-free queue • Optimized wait-free queue • Opt 1: optimized helping (help one thread at a time) • Opt 2: atomic counter-based phase calculation • Measure completion time as a function of # threads

  45. Enqueue-Dequeue benchmark • TBD: add figures

  46. The impact of optimizations • TBD: add figures

  47. Optimizing further: false sharing • Created by accesses to the state array • Resolved by stretching the state entries with dummy pads (see the sketch below) • TBD: add figures
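
A common way to implement the padding mentioned above, shown as a sketch; the 64-byte cache-line size and the field layout are assumptions (the JDK's @Contended annotation achieves the same effect where available):

    // Each per-thread entry is stretched so neighboring entries do not
    // share a cache line; a write by one thread then invalidates only
    // its own line rather than its neighbors' entries.
    class PaddedEntry {
        long p0, p1, p2, p3, p4, p5, p6;   // padding before (~one cache line)
        volatile Object desc;              // the actual OpDesc reference
        long q0, q1, q2, q3, q4, q5, q6;   // padding after
    }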

  48. Optimizing further: memory management • Every attempt to update state is preceded by the allocation of a new record • these records can be reused when the attempt fails • (more) validation checks can be performed to reduce the number of failed attempts • When an operation is finished, remove the reference from state to the list node • helps the garbage collector

  49. Implementing the queue without GC • Apply the Hazard Pointers technique [Michael’04] (sketched below) • each thread is associated with hazard pointers • single-writer multi-reader registers • used by threads to point to objects they may access later • when an object should be deleted, a thread stores its address in a special stack • once in a while, it scans the stack and recycles objects only if no hazard pointer points to them • In our case, the technique can be applied with a slight modification in the dequeue method
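
A minimal sketch of the Hazard Pointers shape described on this slide, with one pointer per thread and an eager scan on every retirement for brevity; names and sizes are assumptions, and production implementations batch the scan:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicReferenceArray;

    class HazardPointers {
        final AtomicReferenceArray<Object> hazard;   // one slot per thread here
        final List<Object>[] retired;                // per-thread retired lists

        @SuppressWarnings("unchecked")
        HazardPointers(int numThreads) {
            hazard = new AtomicReferenceArray<>(numThreads);
            retired = new List[numThreads];
            for (int i = 0; i < numThreads; i++) retired[i] = new ArrayList<>();
        }

        // Publish intent to access obj; the caller must re-validate that
        // obj is still reachable after publishing.
        void protect(int tid, Object obj) { hazard.set(tid, obj); }
        void clear(int tid) { hazard.set(tid, null); }

        // Defer reclamation; an object is dropped from the retired list
        // (i.e., becomes collectible) only when no thread protects it.
        void retire(int tid, Object obj) {
            retired[tid].add(obj);
            retired[tid].removeIf(o -> !isHazardous(o));   // the "scan" step
        }

        private boolean isHazardous(Object o) {
            for (int i = 0; i < hazard.length(); i++)
                if (hazard.get(i) == o) return true;
            return false;
        }
    }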

  50. Summary • First wait-free queue implementation supporting multiple enqueuers and dequeuers • Wait-freedom incurs an inherent trade-off • it bounds the completion time of a single operation • but has a cost in the “typical” case • The additional cost can be reduced to a tolerable level • The proposed design scheme might be applicable to other wait-free data structures