Lock-Free Locality-Conscious Linked Lists Design

This article presents the design and implementation of a lock-free, locality-conscious linked list. It covers non-blocking algorithms, existing lock-free list designs, a list built from memory chunks, merges and splits via freezing, and empirical results. The structure of chunks and entries and the phases of the freeze process are described in detail.

Presentation Transcript


  1. Locality-Conscious Lock-Free Linked Lists
     Anastasia Braginsky & Erez Petrank

  2. Lock-Free Locality-Conscious Linked Lists
     Example keys: 3 7 9 12 18 25 26 31 40 52 63 77 89 92
     • List of constant-size "containers", with minimal and maximal bounds on the number of elements in a container
     • Traverse the list quickly to the relevant container
     • Lock-free, locality-conscious, fast access, scalable

  3. Non-blocking Algorithms
     • A non-blocking algorithm ensures progress in a finite number of steps.
     • A non-blocking algorithm is:
       ◦ wait-free if per-thread progress is guaranteed in a bounded number of steps
       ◦ lock-free if system-wide progress is guaranteed in a bounded number of steps
       ◦ obstruction-free if a single thread executing in isolation for a bounded number of steps will make progress
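
The lock-free guarantee can be illustrated with the usual compare-and-swap (CAS) retry loop. This is only a sketch: Python exposes no hardware CAS, so the hypothetical `AtomicCell` helper emulates one with an internal lock, and `lock_free_increment` and `run_demo` are names invented here. The point is the retry pattern: a CAS fails only because another thread's CAS succeeded, so the system as a whole has made progress.

```python
import threading

class AtomicCell:
    """Emulated atomic word; the lock only stands in for a hardware CAS."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the current value still equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True

def lock_free_increment(cell):
    """Retry until our CAS wins; each failure means another thread progressed."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + 1):
            return old + 1

def run_demo(threads=4, increments=1000):
    cell = AtomicCell(0)

    def worker():
        for _ in range(increments):
            lock_free_increment(cell)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cell.load()

print(run_demo())  # 4000
```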

  4. Existing Lock-Free List Designs
     • J. D. Valois, Lock-free linked lists using compare-and-swap, in Proc. PODC, 1995.
     • T. L. Harris, A pragmatic implementation of non-blocking linked-lists, in DISC 2001.
     • M. M. Michael, Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects, in IEEE Transactions on Parallel and Distributed Systems, 2004.
     • M. Fomitchev and E. Ruppert, Lock-free linked lists and skip lists, in Proc. PODC, 2004.

  5. Outline
     • Introduction
     • A list of memory chunks
     • Design of in-chunk list
     • Merges & Splits via freezing
     • Empirical results
     • Summary

  6. The List Structure
     • A list consists of
       ◦ a list of memory chunks
       ◦ a list in each chunk (chunk implementation)
     • When a chunk gets too dense or too sparse, update operations on that chunk are stopped (the chunk is frozen) and the chunk is, respectively, split or merged with its preceding chunk.
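
The two-level structure can be sketched as follows. This is a sequential illustration only, with no concurrency control and no freezing; the `Chunk` class and the `find_chunk` and `search` helpers are hypothetical names, and the sample keys are the ones shown in the talk's example diagram. It shows how a search first skips whole chunks and then walks only the entries of one chunk.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Chunk:
    # Sorted (key, data) pairs stand in for the in-chunk linked list.
    entries: List[Tuple[int, str]] = field(default_factory=list)
    next_chunk: Optional["Chunk"] = None

def find_chunk(head, key):
    """Skip whole chunks while the next chunk's smallest key is <= key."""
    chunk = head
    while (chunk.next_chunk is not None
           and chunk.next_chunk.entries
           and chunk.next_chunk.entries[0][0] <= key):
        chunk = chunk.next_chunk
    return chunk

def search(head, key):
    """Return the data stored under `key`, or None if the key is absent."""
    chunk = find_chunk(head, key)
    for k, data in chunk.entries:
        if k == key:
            return data
        if k > key:
            break
    return None

# HEAD -> Chunk A -> Chunk B -> NULL, as in the example diagram.
chunk_b = Chunk([(25, "A"), (67, "D"), (89, "M")])
chunk_a = Chunk([(3, "G"), (14, "K")], next_chunk=chunk_b)

print(search(chunk_a, 67))  # D
print(search(chunk_a, 5))   # None
```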

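  7. An Example of a List of Fixed-Sized Memory Chunks
     [Diagram: HEAD → Chunk A → Chunk B → NULL. Each chunk has a NextChunk pointer and an EntriesHead pointer to its entries. Chunk A holds (Key: 3, Data: G) and (Key: 14, Data: K); Chunk B holds (Key: 25, Data: A), (Key: 67, Data: D), and (Key: 89, Data: M).]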

  8. When No More Space for Insertion
     [Diagram: Chunk A has filled up with (Key: 3, Data: G), (Key: 6, Data: B), (Key: 9, Data: C), (Key: 12, Data: H), (Key: 14, Data: K) and is marked Freeze. Chunk B still holds (Key: 25, Data: A), (Key: 67, Data: D), (Key: 89, Data: M).]

  9. Split
     [Diagram: the frozen Chunk A, with keys 3, 6, 9, 12, 14, is split into two new chunks: Chunk C with (Key: 3, Data: G), (Key: 6, Data: B), (Key: 9, Data: C) and Chunk D with (Key: 12, Data: H), (Key: 14, Data: K). Chunk B holds keys 25, 67, 89.]

  11. When a Chunk Gets Sparse
     [Diagram: HEAD → Chunk C → Chunk D → Chunk B → NULL. Chunk D holds only (Key: 14, Data: K) and is the freeze master; Chunk C, with (Key: 3, Data: G), (Key: 6, Data: B), (Key: 9, Data: C), is the freeze slave. Chunk B holds (Key: 25, Data: A), (Key: 67, Data: D), (Key: 89, Data: M).]

  12. Merge
     [Diagram: the frozen Chunk C (freeze slave, keys 3, 6, 9) and Chunk D (freeze master, key 14) are merged into a new Chunk E holding (Key: 3, Data: G), (Key: 6, Data: B), (Key: 9, Data: C), (Key: 14, Data: K). Chunk B holds keys 25, 67, 89.]

  15. A List of Fixed-Sized Memory Chunks
     [Diagram: HEAD → Chunk A → Chunk B → NULL. Each chunk has a NextChunk pointer and an EntriesHead pointer. Chunk A holds (Key: 3, Data: G) and (Key: 14, Data: K); Chunk B holds (Key: 25, Data: A), (Key: 67, Data: D), and (Key: 89, Data: M).]

  16. The Structure of an Entry
     • 2 machine words
     • Freeze bit: marks chunk entries as frozen.
     • A ⊥ (bottom) value is not allowed as a key value; it means that the entry is not allocated.
     [Diagram: the KeyData word holds the key, the data, and a freeze bit (field widths labeled 32 bit and 31 bit); the NextEntry word holds a 62-bit next entry pointer, a delete bit, and a freeze bit.]
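
The two-word entry can be sketched with plain integer bit operations. The exact bit positions are assumptions made for this illustration: the key is taken to be the 32-bit field and the data the 31-bit field, the freeze bit is placed in the lowest bit of both words, the delete bit in bit 1 of the NextEntry word, and ⊥ is encoded as key 0. The function names are hypothetical.

```python
FREEZE_BIT = 1        # assumed: lowest bit of both words
DELETE_BIT = 1 << 1   # assumed: bit 1 of the NextEntry word
BOTTOM = 0            # assumed encoding of the ⊥ key (entry not allocated)

def pack_key_data(key, data, frozen=False):
    """Build the 64-bit KeyData word: 32-bit key, 31-bit data, freeze bit."""
    if not (0 <= key < 1 << 32 and 0 <= data < 1 << 31):
        raise ValueError("key or data does not fit its field")
    return (key << 32) | (data << 1) | int(frozen)

def unpack_key_data(word):
    """Return (key, data, frozen) from a KeyData word."""
    return word >> 32, (word >> 1) & ((1 << 31) - 1), bool(word & FREEZE_BIT)

def pack_next(pointer, deleted=False, frozen=False):
    """Build the 64-bit NextEntry word: 62-bit pointer, delete bit, freeze bit."""
    if not 0 <= pointer < 1 << 62:
        raise ValueError("pointer does not fit in 62 bits")
    return (pointer << 2) | (DELETE_BIT if deleted else 0) | int(frozen)

def unpack_next(word):
    """Return (pointer, deleted, frozen) from a NextEntry word."""
    return word >> 2, bool(word & DELETE_BIT), bool(word & FREEZE_BIT)

word = pack_key_data(key=14, data=9, frozen=True)
print(unpack_key_data(word))                    # (14, 9, True)
print(unpack_next(pack_next(5, deleted=True)))  # (5, True, False)
```

Packing the flags into the same word as the value they guard is what lets a single CAS check "not frozen, not deleted" and update the field at once.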

  17. The Structure of a Chunk
     • Head: dummy entry
     • Counter: 4
     • new pointer
     • NextChunk pointer
     • Merge Buddy pointer
     • Freeze State (2 bits)
     • An array of entries of size MAX, in this example:
       ◦ Key: 24 Data: 78 Deleted bit: 1
       ◦ Key: 23 Data: 53 Deleted bit: 1
       ◦ Key: 7 Data: 89
       ◦ Key: 14 Data: 9
       ◦ Key: 22 Data: 13
       ◦ Key: 11 Data: 13
       ◦ Key: ⊥
       ◦ Key: ⊥
       ◦ Key: ⊥
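
As a rough sketch, the chunk fields listed on this slide map onto a record like the one below. The Python field names, the use of `None` for ⊥, and the `live_count` helper are assumptions of this illustration, and the meaning of the 2-bit freeze state values is not given on the slide, so it is kept as a plain integer. The example rebuilds the slide's array and checks that the counter value 4 matches the four allocated entries that are not marked deleted.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BOTTOM = None  # stands in for the ⊥ key of an unallocated entry

@dataclass
class Entry:
    key: Optional[int] = BOTTOM
    data: int = 0
    deleted: bool = False
    frozen: bool = False

@dataclass
class Chunk:
    max_entries: int                                 # MAX, the array size
    entries: List[Entry] = field(default_factory=list)
    head: Entry = field(default_factory=Entry)       # dummy head entry
    counter: int = 0
    new: Optional["Chunk"] = None                    # the "new" pointer
    next_chunk: Optional["Chunk"] = None
    merge_buddy: Optional["Chunk"] = None
    freeze_state: int = 0                            # 2-bit field; 0 assumed to mean "not frozen"

    def __post_init__(self):
        # Pad the array with unallocated entries up to MAX.
        missing = self.max_entries - len(self.entries)
        self.entries.extend(Entry() for _ in range(missing))

def live_count(chunk):
    """Count allocated entries that are not marked deleted."""
    return sum(1 for e in chunk.entries if e.key is not BOTTOM and not e.deleted)

chunk = Chunk(
    max_entries=9,
    entries=[Entry(24, 78, deleted=True), Entry(23, 53, deleted=True),
             Entry(7, 89), Entry(14, 9), Entry(22, 13), Entry(11, 13)],
    counter=4,
)
print(len(chunk.entries), live_count(chunk), chunk.counter)  # 9 4 4
```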

  18. Initiating a Freeze
     • When a process p realizes that
       ◦ a chunk is full, or
       ◦ a chunk is sparse, or
       ◦ a chunk is in the process of being frozen,
     • then p starts a freeze, or p helps another process that has already started a freeze.

  19. The Freeze Process Starts by:
     • Going over all the entries in the array and setting their freeze bit
     • Finishing
       ◦ insertions of all currently allocated entries that are not yet in the list
       ◦ deletions of entries already marked as deleted but still in the list
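
The first step, setting the freeze bit on every entry, could look roughly like the loop below. It is a sketch under assumptions: the freeze bit is taken to be the lowest bit of each word, `AtomicCell` is a hypothetical helper that emulates CAS with a lock because Python has no hardware CAS, and completing pending insertions and deletions is left out. The retry is needed because a non-frozen word may still be changed concurrently between the load and the CAS.

```python
import threading

FREEZE_BIT = 1  # assumed position of the freeze bit: lowest bit of each word

class AtomicCell:
    """Emulated atomic word; the lock only stands in for a hardware CAS."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True

def mark_frozen(words):
    """Set the freeze bit of every word, retrying when a concurrent update wins."""
    for word in words:
        while True:
            old = word.load()
            if old & FREEZE_BIT:
                break  # already frozen, possibly by a helping thread
            if word.compare_and_swap(old, old | FREEZE_BIT):
                break

words = [AtomicCell(v) for v in (0, 6 << 1, 1, 8 << 1)]
mark_frozen(words)
print([w.load() for w in words])  # [1, 13, 1, 17]
```

Because the loop only ever sets a bit that stays set, several threads can run it on the same chunk at once, which is what allows one process to help another with a freeze.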

  20. Chunk List is Different from Known Lock-Free Linked Lists
     • Non-private insertion: an entry is visible when allocated, even before it is linked to the list.
     • Allows help with insertion.
     • Boundary conditions cause merges and splits.

  21. Entry Allocation
     Entries (k: key, d: data, f: freeze bit): [k:⊥ d:0 f:1] [k:⊥ d:0 f:0] [k:3 d:9 f:1] [k:4 d:2 f:1] [k:8 d:5 f:0]
     1. An entry is allocated at the beginning of the insertion process.
     2. Find a zeroed entry, with ⊥ key value.
     3. Allocate by swapping the KeyData word to the desired value.
        ◦ Upon a failure of the CAS command, go to 2.
        ◦ A frozen entry cannot be allocated.
     4. If no entry is found, a freeze starts.
     Next, use the allocated entry for list insertion.

  22. Entry Allocation (continued)
     Entries after the allocation (k: key, d: data, f: freeze bit): [k:⊥ d:0 f:1] [k:3 d:9 f:1] [k:4 d:2 f:1] [k:8 d:5 f:0] [k:6 d:2 f:0]
     The zeroed, non-frozen entry now holds k:6 d:2 f:0. The allocation steps are unchanged: find a zeroed entry with ⊥ key value, CAS its KeyData word to the desired value (on CAS failure, search again; a frozen entry cannot be allocated), and start a freeze if no entry is found. The allocated entry is then used for list insertion.
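
The allocation steps can be sketched as a scan with a CAS on the KeyData word. Assumptions of this illustration: ⊥ is encoded as key 0 so an unallocated, non-frozen entry is the all-zero word; the `pack` layout (key in the high bits, freeze bit lowest) is invented here; and `AtomicCell` emulates CAS with a lock because Python has no hardware CAS. A frozen ⊥ entry has its freeze bit set, so it is not equal to the zeroed word and is skipped.

```python
import threading

ZEROED = 0  # assumed encoding: ⊥ key, zero data, freeze bit clear

class AtomicCell:
    """Emulated atomic word; the lock only stands in for a hardware CAS."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True

def pack(key, data, frozen=False):
    """Assumed KeyData layout: key in the high bits, then data, freeze bit lowest."""
    return (key << 32) | (data << 1) | int(frozen)

def allocate_entry(words, key, data):
    """Return the index of the entry claimed for (key, data), or None if a freeze must start."""
    desired = pack(key, data)
    while True:
        for index, word in enumerate(words):
            if word.load() == ZEROED:
                if word.compare_and_swap(ZEROED, desired):
                    return index
                break  # CAS failed: go back and look for a zeroed entry again
        else:
            return None  # no zeroed entry found: the caller starts a freeze

# Array from the slide: frozen ⊥, free ⊥, keys 3 and 4 (frozen), key 8.
words = [AtomicCell(w) for w in (pack(0, 0, True), pack(0, 0), pack(3, 9, True),
                                 pack(4, 2, True), pack(8, 5))]
print(allocate_entry(words, 6, 2))  # 1
print(allocate_entry(words, 7, 1))  # None
```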

  23. Insertion Algorithm
     Entries (k: key, d: data, f: freeze bit): [k:⊥ d:0 f:1] [k:3 d:9 f:1] [k:4 d:2 f:1] [k:8 d:5 f:0] [k:6 d:2 f:0], with the previous and next entries of the new entry marked.
     1. Record the entry's next pointer value in savedNext.
     2. Find a location for adding the new entry.
        ◦ If the key already exists (in a different entry), free the allocated entry by clearing it and return.
     3. CAS the entry's next pointer from savedNext to the next entry in the list.
     4. CAS the previous entry's next pointer to the newly allocated entry.
        ◦ If any CAS fails, go to 1 (restarting from the beginning of the chunk).
     5. Increase the counter and return.
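
The five insertion steps can be sketched as below. This is a simplified illustration, not the authors' code: freeze and delete bits, helping by other threads, and hazard pointers are omitted; entries are linked by object reference rather than by array index; and `AtomicCell`, `Entry`, `Chunk`, `find`, `insert`, and `keys` are hypothetical names, with CAS emulated by a lock because Python has no hardware CAS.

```python
import threading

class AtomicCell:
    """Emulated atomic word; the lock only stands in for a hardware CAS."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True

class Entry:
    def __init__(self, key=None, data=None):
        self.key = key                # None stands in for the ⊥ key
        self.data = data
        self.next = AtomicCell(None)  # reference to the next Entry in the list

class Chunk:
    def __init__(self):
        self.head = Entry()           # dummy head entry
        self.counter = AtomicCell(0)

def find(chunk, key):
    """Return (prev, cur), where cur is the first entry with cur.key >= key."""
    prev = chunk.head
    cur = prev.next.load()
    while cur is not None and cur.key < key:
        prev, cur = cur, cur.next.load()
    return prev, cur

def insert(chunk, entry):
    """Link an already allocated entry into the sorted in-chunk list."""
    while True:
        saved_next = entry.next.load()                         # step 1
        prev, cur = find(chunk, entry.key)                     # step 2
        if cur is not None and cur.key == entry.key:
            entry.key = entry.data = None                      # key exists: clear the entry
            return False
        if not entry.next.compare_and_swap(saved_next, cur):   # step 3
            continue                                           # CAS failed: go to step 1
        if not prev.next.compare_and_swap(cur, entry):         # step 4
            continue                                           # CAS failed: go to step 1
        while True:                                            # step 5
            count = chunk.counter.load()
            if chunk.counter.compare_and_swap(count, count + 1):
                return True

def keys(chunk):
    """Collect the keys in list order."""
    out, cur = [], chunk.head.next.load()
    while cur is not None:
        out.append(cur.key)
        cur = cur.next.load()
    return out

chunk = Chunk()
results = [insert(chunk, Entry(k, d)) for k, d in ((8, 5), (3, 9), (4, 2), (6, 2), (6, 7))]
print(keys(chunk), chunk.counter.load(), results)
```

Restarting from the chunk's head after a failed CAS is cheap here because the retry only rescans one chunk, not the whole list.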

  26. Deletion
     • Standard implementation, except for taking care not to go below the minimum number of entries
     • The counter always holds a lower bound on the actual number of entries.
       ◦ increased after the actual insert
       ◦ decreased before the actual delete
     • Decrementing the counter below the minimum allowed number initiates a freeze.
     • A frozen entry cannot be marked as deleted.
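
The counter rule for deletion can be sketched as a guarded decrement. The names `try_reserve_delete` and `AtomicCell` are hypothetical, the minimum is a made-up value, and CAS is emulated with a lock because Python has no hardware CAS. The decrement happens before the entry is actually removed, so the counter never overstates the number of entries.

```python
import threading

class AtomicCell:
    """Emulated atomic word; the lock only stands in for a hardware CAS."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True

def try_reserve_delete(counter, minimum):
    """Decrement the counter before deleting; refuse if it would drop below `minimum`."""
    while True:
        count = counter.load()
        if count <= minimum:
            return False  # the caller initiates a freeze instead of deleting
        if counter.compare_and_swap(count, count - 1):
            return True   # the caller may now mark the entry as deleted

MIN_ENTRIES = 2  # hypothetical lower bound
counter = AtomicCell(3)
print(try_reserve_delete(counter, MIN_ENTRIES), counter.load())  # True 2
print(try_reserve_delete(counter, MIN_ENTRIES), counter.load())  # False 2
```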

  28. Freezing
     • Phase I: Marking entries with frozen bits
       ◦ Non-frozen entries can still change concurrently.
     • Phase II: List stabilization
       ◦ Everything is frozen; now finish all incomplete operations.
     • Phase III: Decision
       ◦ Split, merge, or copy.
     • Phase IV: Recovery
       ◦ Implementation of the above decision
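
The slide does not spell out how Phase III chooses between the three outcomes. One plausible rule, stated here purely as an assumption, is to count the entries left in the stabilized chunk: split when the chunk holds the maximum, merge when it holds no more than the minimum, and otherwise copy into a fresh chunk. The function name `decide` and the bounds are hypothetical.

```python
def decide(live_entries, minimum, maximum):
    """Phase III sketch: pick the recovery action for a stabilized, frozen chunk."""
    if live_entries >= maximum:
        return "split"   # no room left even after stabilization
    if live_entries <= minimum:
        return "merge"   # too sparse to stay on its own
    return "copy"        # e.g. full only because of entries marked deleted

MIN_ENTRIES, MAX_ENTRIES = 2, 5  # hypothetical bounds
for count in (5, 2, 3):
    print(count, decide(count, MIN_ENTRIES, MAX_ENTRIES))
```

The copy case covers a chunk whose array ran out of free slots while some of its entries were already deleted, so the live entries fit comfortably in one new chunk.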

  29. Phase IV: Recovery
     • Allocate the new chunk or chunks locally.
     • Copy the frozen data to the new chunk.
     • Execute the operation that initially caused the freeze.
     • Attach the new chunk to the frozen one.
     • Replace the frozen chunk(s) with the new chunk(s) in the entire List's data structure.
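
The copy step of recovery works on data that can no longer change, so it can be sketched as ordinary sequential code on thread-local chunks. The helpers `split_entries` and `merge_entries` are hypothetical, the even split point is an assumption, and the later steps (executing the pending operation, attaching the new chunks, and replacing the frozen ones) are omitted. The sample data reproduces the split and merge shown in the talk's diagrams.

```python
def split_entries(frozen):
    """Copy the entries of one frozen chunk into two new, sorted halves."""
    live = sorted(frozen)
    middle = (len(live) + 1) // 2
    return live[:middle], live[middle:]

def merge_entries(slave, master):
    """Copy the entries of two frozen chunks into one new, sorted chunk."""
    return sorted(slave + master)

# Frozen Chunk A from the split example, as (key, data) pairs in array order.
chunk_a = [(3, "G"), (6, "B"), (14, "K"), (9, "C"), (12, "H")]
chunk_c, chunk_d = split_entries(chunk_a)
print(chunk_c)  # [(3, 'G'), (6, 'B'), (9, 'C')]
print(chunk_d)  # [(12, 'H'), (14, 'K')]

# Merge example: freeze slave Chunk C and freeze master Chunk D (key 14 only).
chunk_e = merge_entries(chunk_c, [(14, "K")])
print(chunk_e)  # [(3, 'G'), (6, 'B'), (9, 'C'), (14, 'K')]
```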

  30. Remarks
     • Search can run on a frozen chunk (and is not delayed).
       ◦ Wait-free except for the use of the hazard pointer mechanism
     • A chunk can never be unfrozen.

  32. The Test Environment
     • Platform: SUN FIRE with UltraSPARC T1 8-core processor, each core running 4 hyper-threads
     • OS: Solaris 10
     • Chunk size set to the virtual page size, 8KB
       ◦ All accesses inside a chunk are on the same page.

  33. Workload
     • Each test had two stages:
       ◦ Stage I: insertions (only) of N random keys, in order to obtain a substantial list
         N: 10^3, 10^4, 10^5, 10^6
       ◦ Stage II: insertions, deletions, and searches in parallel
         N operations overall, of which 15% are insertions, 15% deletions, and 70% searches
     • Results are reported for runs of 32 concurrent threads.
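
A workload with this shape could be generated as in the sketch below. The key range, the seed, and the function name `make_workload` are assumptions of this illustration, as is the choice to fix the operation counts exactly and shuffle them rather than draw each operation at random; distributing the operations over 32 threads is left out.

```python
import random

def make_workload(n, key_range=1_000_000, seed=0):
    """Return (stage1, stage2) operation lists for a list of about n keys."""
    rng = random.Random(seed)
    # Stage I: insertions only, to build a substantial list.
    stage1 = [("insert", rng.randrange(key_range)) for _ in range(n)]
    # Stage II: n operations, 15% insertions, 15% deletions, 70% searches.
    inserts = deletes = n * 15 // 100
    searches = n - inserts - deletes
    ops = ["insert"] * inserts + ["delete"] * deletes + ["search"] * searches
    rng.shuffle(ops)
    stage2 = [(op, rng.randrange(key_range)) for op in ops]
    return stage1, stage2

stage1, stage2 = make_workload(1000)
mix = {name: sum(1 for op, _ in stage2 if op == name)
       for name in ("insert", "delete", "search")}
print(len(stage1), mix)  # 1000 {'insert': 150, 'delete': 150, 'search': 700}
```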

  34. Reference for Comparison
     • Michael's lock-free linked list, implemented in C according to the pseudo-code from
       ◦ M. M. Michael, Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects, in IEEE Transactions on Parallel and Distributed Systems, 2004.
       ◦ Uses hazard pointers.
     • A Java implementation of the lock-free linked list provided in the book "The Art of Multiprocessor Programming"
       ◦ Garbage collection is assumed.

  35. Comparison with Michael's List: Total Time
     [Two charts comparing the Original List and the Chunk List, titled "Stage I total time / N" and "Stage II total time / N". Each plots time (s) on a logarithmic scale against N = 1000, 10000, 100000, 1000000.]
     Chart annotations:
     • Already at 20000 we get the same performance.
     • Constantly better performance.
     • For substantial lists, more than 10 times faster.

  36. Comparison with Michael's List: Single Operation Average
     Chart annotations:
     • Better performance as lists become more substantial.
     • Again, constantly better performance.

  37. Comparison with Lock-Free List in Java: Total Times

  38. Comparison with Lock-Free List in Java: Single Operation Average

  40. Conclusion
     • New lock-free algorithm for a chunked linked list
     • Fast due to:
       ◦ skipping over chunks
       ◦ restarting from the beginning of a chunk
       ◦ being locality-conscious
     • May be useful for other structures that can use the chunks
     • Good empirical results for substantial lists

  41. Questions?

  42. Thank you!
