
Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems

Håkan Sundell

Philippas Tsigas

Outline
  • Synchronization Methods
  • Priority Queues
  • Concurrent Priority Queues
    • Lock-Free Algorithm: Problems and Solutions
  • Experiments
  • Conclusions
Synchronization
  • Shared data structures need synchronization
  • Synchronization using Locks
    • Mutually exclusive access to whole or parts of the data structure

[Figure: two diagrams of processes P1, P2, and P3 accessing the shared data structure]

Blocking Synchronization
  • Drawbacks
    • Blocking
    • Priority Inversion
    • Risk of deadlock
  • Locks: Semaphores, spinning, disabling interrupts etc.
    • Reduced efficiency because of reduced parallelism
Non-blocking Synchronization
  • Lock-Free Synchronization
    • Optimistic approach
      • Assumes it is alone and prepares an operation that later takes effect (unless interfered with) in one atomic step, using hardware atomic primitives
      • Interference is detected via shared memory and the atomic primitives
      • Retries until no other operation interferes
        • Can cause starvation
Non-blocking Synchronization
  • Lock-Free Synchronization
    • Avoids problems with locks
    • Simple algorithms
    • Fast when having low contention
  • Wait-Free Synchronization
    • Always finishes in a finite number of its own steps.
      • Complex algorithms
      • Memory consuming
      • Less efficient on average than lock-free
Priority Queues
  • Fundamental data structure
  • Works on a set of <value,priority> pairs
  • Two basic operations:
    • Insert(v,p): Adds a new element to the priority queue
    • v=DeleteMin(): Removes the element <v,p> with the highest priority
Sequential Priority Queues
  • All implementations involve a search phase in either Insert or DeleteMin
    • Arrays. Maximum complexity O(N)
    • Ordered Lists. O(N)
    • Trees. O(log N)
      • Heaps. O(log N)
    • Advanced structures (e.g. combinations)
Randomized Algorithm: Skip Lists
  • William Pugh: ”Skip Lists: A Probabilistic Alternative to Balanced Trees”, 1990
    • Layers of ordered lists with different densities, achieves a tree-like behavior
    • Time complexity: O(log₂ N) – probabilistic!

[Figure: skip list with nodes 1–7 between Head and Tail sentinels; the upper layers contain 50% and 25% of the nodes]

Why Skip Lists for Concurrent Priority Queues?
  • Ordered lists are simpler than trees
    • Easier to make efficient concurrently
  • Search complexity is important
    • Skip lists are an alternative to trees
  • Lotan and Shavit: “Skiplist-Based Concurrent Priority Queues”, 2000
    • Implementation using multiple locks

[Figure: skip list with nodes 1–7, with one lock (L) per node and level in the multi-lock implementation]
Our Lock-Free Concurrent Skip List
  • Define node state to depend on the insertion status at lowest level as well as a deletion flag
  • Insert from lowest level going upwards
  • Set deletion flag. Delete from highest level going downwards

[Figure: skip list nodes 1–7, each with a deletion flag (D); a node of height 3 is inserted from level 1 upwards and, once its flag is set, deleted from the top level downwards]

Overlapping operations on shared data


  • Example: Insert operation- which of 2 or 3 gets inserted?
  • Solution: the Compare-And-Swap atomic primitive:
    CAS(p: pointer to word, old: word, new: word): boolean
    atomic do
      if *p = old then *p := new; return true;
      else return false;

[Figure: nodes 1 and 4, with two threads concurrently trying to insert 2 and 3 between them]

Dynamic Memory Management
  • Problem: System memory allocation functionality is blocking!
  • Solution (lock-free), IBM freelists:
    • Pre-allocate a number of nodes, link them into a dynamic stack structure, and allocate/reclaim using CAS

[Figure: freelist stack Head → Mem 1 → Mem 2 → … → Mem n; Allocate pops the head node, Reclaim pushes a used node back]

Concurrent Insert vs. Delete operations


  • Problem:- both nodes are deleted!
  • Solution (Harris et al): Use bit 0 of pointer to mark deletion status

[Figure: a) node 3 is inserted after node 2 while b) node 2 is concurrently deleted; c) with bit 0 of node 2's next pointer marked (*), the insert's CAS fails instead of attaching 3 to a deleted node]

The ABA problem
  • Problem: Because of concurrency (pre-emption in particular), the same pointer value does not always refer to the same node, so a CAS can succeed when it should fail!

[Figure: Step 1: list 1 → 6 → 7 → 4; Step 2: nodes have been reclaimed and reused, giving list 2 → 3 → 7 → 4 at the same addresses]

The ABA problem
  • Solution (Valois et al.): Add a reference count to each node, preventing nodes that are still of interest to some thread from being reclaimed until all threads have left the node

[Figure: with reference counts on each node, the old nodes cannot be reused while still referenced, so the stale CAS fails]

Helping Scheme
  • Threads need to traverse safely
  • Need to remove marked-to-be-deleted nodes while traversing – Help!
  • Find the previous node, finish the deletion, and continue traversing from the previous node

[Figure: a traversal reaches node 2 whose next pointer is marked (*); the thread helps complete the deletion and continues from the previous node]

Back-Off Strategy
  • For pre-emptive systems, helping is necessary for efficiency and lock-freeness
  • For really concurrent systems, overlapping CAS operations (caused by helping and others) on the same node can cause heavy contention
  • Solution: For every failed CAS attempt, back-off (i.e. sleep) for a certain duration, which increases exponentially
Our Lock-Free Algorithm
  • Based on Skip Lists
    • Treated as layers of ordered lists
  • Uses CAS atomic primitive
  • Lock-Free memory management
    • IBM Freelists
    • Reference counting
  • Helping scheme
  • Back-Off strategy
  • All together proved to be linearizable
Experiments
  • 1-30 threads on platforms with different levels of real concurrency
  • 10,000 Insert vs. DeleteMin operations by each thread, with 100 vs. 1,000 initial inserts
  • Compare with other implementations:
    • Lotan and Shavit, 2000
    • Hunt et al “An Efficient Algorithm for Concurrent Priority Queue Heaps”, 1996
Conclusions
  • Our work includes a Real-Time extension of the algorithm, using time-stamps and a time-stamp recycling scheme
  • Our lock-free algorithm is suitable for both pre-emptive as well as systems with full concurrency
    • Will be available as part of NOBLE software library, http://www.noble-library.org
  • See Technical Report for full details,http://www.cs.chalmers.se/~phs
Questions?
  • Contact Information:
    • Address: Håkan Sundell and Philippas Tsigas, Computing Science, Chalmers University of Technology
    • Email: {phs, tsigas}@cs.chalmers.se
    • Web: http://www.cs.chalmers.se/~phs/warp
The algorithm in more detail
  • Insert:
    • Create node with random height
    • Search position (Remember drops)
    • Insert or update on level 1
    • Insert on level 2 to top (unless already deleted)
    • If deleted then HelpDelete(1)
  • All of this while keeping track of references, help deleted nodes etc.
The algorithm in more detail
  • DeleteMin
    • Mark first node at level 1 as deleted, otherwise HelpDelete(1) and retry
    • Mark next pointers on level 1 to top
    • Delete on level top to 1 while detecting helping, indicate success
    • Free node
  • All of this while keeping track of references, help deleted nodes etc.
The algorithm in more detail
  • HelpDelete(level)
    • Mark next pointer at level to top
    • Find previous node (info in node)
    • Delete on level unless already helped, indicate success
    • Return previous node
  • All of this while keeping track of references, help deleted nodes etc.
Correctness
  • Linearizability (Herlihy 1991)
    • For an implementation to be linearizable, every concurrent execution must have an equivalent sequential execution that respects the partial order of the operations in the concurrent execution
Correctness
  • Define precise sequential semantics
  • Define abstract state and its interpretation
    • Show that state is atomically updated
  • Define linearizability points
    • Show that operations take effect atomically at these points with respect to sequential semantics
  • Creates a total order using the linearizability points that respects the partial order
    • The algorithm is linearizable
Correctness
  • Lock-freeness
    • At least one operation should always make progress
  • There are no cyclic loop dependencies, and all potentially unbounded loops are ”gate-kept” by CAS operations
    • The CAS operation guarantees that at least one CAS will always succeed
      • The algorithm is lock-free
Real-Time extension
  • DeleteMin operations should ignore nodes that are inserted after the DeleteMin operation started
    • Nodes are inserted together with a timestamp
    • Because timestamps are only used for relative comparisons, no need for a real-time clock
      • Generate timestamps with a monotonically increasing function
Real-Time extension
  • Timestamps are potentially unbounded and will overflow
    • Recycle ”wrapped-over” timestamp values by having TagFieldSize=MaxTag*2
  • Timestamps at nodes can stay forever (MaxTag => unlimited)
    • Every operation traverses one step through the Skiplist and updates ”too old” timestamps