
Implementation of Locks



Presentation Transcript


  1. Implementation of Locks David Gregg Trinity College Dublin

  2. Process Synchronization
  • Need to be able to coordinate processes working on a common task
  • Lock variables (mutexes) are used to coordinate or synchronize processes
  • Need an architecture-supported arbitration mechanism to decide which processor gets access to the lock variable
    • A single bus provides an arbitration mechanism, since the bus is the only path to memory: the processor that gets the bus wins
  • Need an architecture-supported operation that locks the variable
    • Locking can be done via an atomic compare-and-swap operation (the processor can both read a location and set it to the locked state, i.e. test-and-set, in the same bus operation)
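The test-and-set operation described above can be sketched with C11's standard atomic_flag type, whose atomic_flag_test_and_set reads the flag and sets it in a single atomic step. The function names other than the C11 library calls are illustrative:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A test-and-set lock built on C11's atomic_flag:
   atomic_flag_test_and_set reads the flag and sets it
   in one atomic step, the read-and-lock primitive above. */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

/* Returns true if we acquired the lock (flag was previously clear). */
bool try_acquire(void)
{
    /* test_and_set returns the *previous* value: false means we won */
    return !atomic_flag_test_and_set(&lock_flag);
}

void release(void)
{
    atomic_flag_clear(&lock_flag);
}
```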

  3. Atomic Compare and Swap
  • Machine instruction used to synchronize threads
  • Threads may be running on different cores
  • Compare and swap acts like a simple C function (but executes atomically):

      int compare_and_swap(int *address, int testval, int newval)
      {
          int old_val = *address;
          if (old_val == testval) {
              *address = newval;
          }
          return old_val;
      }

  4. Atomic Compare and Swap
  • The compare-and-swap (CAS) operation completes atomically
  • There is no possibility of the memory location being modified between the CAS reading the location and writing to it
  • Can be used to implement locks
  • E.g., 0 means unlocked, 1 means locked:

      value = compare_and_swap(lock, 0, 1);
      if (value == 0)
          /* we have the lock */
      else
          /* we failed to acquire the lock */

  5. Atomic Compare and Swap
  • Lock algorithms can be built using compare-and-swap operations
  • Most locks use compare and swap to attempt to acquire the lock
  • Some specialist lock algorithms use other types of instructions
  • The major difference between locking algorithms is what to do if you fail to acquire the lock
  • Two main choices:
    • Go into a loop and repeatedly try to acquire the lock
      • Known as a spin lock
      • Keeps spinning until another thread releases the lock
    • Put the current thread to sleep and add it to a queue of threads waiting for the lock to be released

  6. Atomic Compare and Swap
  • Spin lock algorithms can be built using compare-and-swap operations:

      void acquire_lock(int *lock)
      {
          int old_val;
          do {
              old_val = compare_and_swap(lock, 0, 1);
          } while (old_val == 1);
      }

      void release_lock(int *lock)
      {
          fence();   /* maybe we need a memory fence on some architectures */
          *lock = 0;
      }

  7. Spin Lock Synchronization
  • Spin: read the lock variable; if it is not unlocked (=0), keep reading
  • When it reads as unlocked, try to lock the variable using swap: read the lock variable and set it to the locked value (1) in one atomic operation
  • If the swap read a 0, we succeeded: begin the update of shared data
  • If the swap read a 1, go back to spinning on the read
  • When the update of shared data is finished, unlock: set the lock variable to 0
  • The single winning processor will read a 0; all other processors will read the 1 set by the winning processor

  8. Memory fences
  • Modern superscalar processors typically execute instructions out of order
    • In a different order to the original program order
  • Loads and stores can be executed out of order
    • The processor may speculate that a load takes data from a different address to a store
    • Or the processor may actually have the load and store addresses, and thus be able to tell which conflict and which are independent
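As a concrete illustration, C11 exposes fences through atomic_thread_fence. A minimal sketch of the fence() called in the unlock code of slide 6 might look like this; the plain store through a volatile pointer is a simplification, and a real implementation would use an atomic release store:

```c
#include <stdatomic.h>

/* Sketch of the fence() used in release_lock above, using C11's
   atomic_thread_fence. A release fence prevents earlier loads and
   stores from being reordered after the store that publishes the
   unlocked value, so other cores see the shared data before they
   see the lock become free. */
void release_lock(int *lock)
{
    atomic_thread_fence(memory_order_release); /* drain earlier accesses */
    *(volatile int *)lock = 0;                 /* publish the unlock */
}
```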

  9. Memory fences
  • Each core typically has a load/store queue
  • Stores are written to the queue, and will be flushed to memory "eventually"
  • A load may take data directly from the load/store queue if there has been a recent store to the location
    • So a load instruction may never reach memory
  • If there is already a store to a memory location in the load/store queue, an additional store may overwrite the existing store

  10. Atomic Compare and Swap
  • Spin locks are usually faster than putting a waiting thread to sleep
    • Acquire the lock as soon as it is released
    • No need to interact with the operating system
    • Works well when the lock is seldom contended
  • But wasteful of resources and energy when the lock is contended
    • The spinning thread stays busy without doing useful work
    • Large number of memory operations on the bus
    • The disaster scenario is spinning while waiting for a thread that is not running

  11. Atomic Compare and Swap
  • Sleeping locks are usually slower
    • Need an operating system call to put the thread to sleep
    • Need to manage a queue of waiting threads
    • Every time the lock is released, you must check the list of waiters
  • Sleeping locks make better use of total resources
    • The core is released while the thread waits
    • Works well when the lock is often contended and there are more threads than cores
    • Especially important in virtualized systems
  • Many lock algorithms are hybrids
    • Spin for a little while to see if the lock is released quickly
    • Then go to sleep if it is not
    • Used by the Linux futex (fast userspace mutex) mechanism
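A spin-then-sleep hybrid can be sketched with C11 atomics. Here POSIX sched_yield stands in for a real sleep/wakeup path (a futex-based lock would make a system call to block instead), and SPIN_LIMIT is an illustrative constant, not a tuned value:

```c
#include <stdatomic.h>
#include <sched.h>   /* POSIX sched_yield, a stand-in for a real sleep */

/* Hypothetical hybrid lock: spin up to SPIN_LIMIT times, then start
   yielding the core instead of burning cycles. A futex-based lock
   would block in the kernel here; sched_yield merely illustrates
   handing the core back while we wait. */
#define SPIN_LIMIT 1000

void hybrid_acquire(atomic_int *lock)
{
    int spins = 0;
    int expected = 0;
    while (!atomic_compare_exchange_strong(lock, &expected, 1)) {
        expected = 0;          /* compare_exchange overwrote it on failure */
        if (++spins > SPIN_LIMIT)
            sched_yield();     /* give up the core for a while */
    }
}

void hybrid_release(atomic_int *lock)
{
    atomic_store(lock, 0);
}
```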

  12. Locking variants
  • There are many variants of these basic locks
  • Spin locks with "back-off"
    • If the lock is contended, the thread waits a while before trying again
    • Avoids the waiting threads constantly trying to access the cache line containing the lock
    • Can help where the lock is contended by several threads
  • Queueing spin locks
    • Make the queueing fairer
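A back-off spin lock along these lines might be sketched as follows with C11 atomics; the empty delay loop and the cap are illustrative values rather than tuned constants:

```c
#include <stdatomic.h>

/* Spin lock with exponential back-off: after each failed attempt the
   thread waits roughly twice as long before retrying, which reduces
   traffic on the cache line holding the lock when it is contended. */
#define MAX_DELAY (1u << 16)

void backoff_acquire(atomic_int *lock)
{
    unsigned delay = 1;
    int expected = 0;
    while (!atomic_compare_exchange_strong(lock, &expected, 1)) {
        expected = 0;                    /* reset for the next attempt */
        for (volatile unsigned i = 0; i < delay; i++)
            ;                            /* busy-wait away from the lock */
        if (delay < MAX_DELAY)
            delay *= 2;                  /* exponential back-off */
    }
}

void backoff_release(atomic_int *lock)
{
    atomic_store(lock, 0);
}
```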

  13. Atomic Increment
  • An alternative to atomic compare and swap
  • Atomically increments a value in memory
  • Can be used for multithreaded counting without locks
  • Atomic increment locks
    • Can build locks from atomic increment
    • Known as a ticket lock
    • Based on the ticket system in a passport office, the GNIB or Argos
  • Queueing spin lock
    • Multiple threads can be waiting for a lock
    • With a regular spin lock, all waiting threads compete for the lock when it becomes free
    • With a queueing spin lock, the waiting threads form an orderly FIFO queue
    • Guarantees fairness
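The lock-free counting mentioned above can be sketched with C11's atomic_fetch_add, which adds to a memory location and returns the previous value in one atomic step, so concurrent threads never lose an update:

```c
#include <stdatomic.h>

/* Lock-free counting with atomic increment: no lock is needed
   because the read-modify-write is a single atomic operation. */
atomic_int counter = 0;

/* Returns a unique count to each caller, even across threads. */
int next_count(void)
{
    return atomic_fetch_add(&counter, 1);  /* returns the old value */
}
```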

  14. Atomic Increment

      struct ticket_lock {
          int dispense;        /* next ticket to be handed out */
          int current_ticket;  /* ticket currently being served */
      };

      void acquire_lock(struct ticket_lock *lock)
      {
          /* atomic_increment returns the value before the increment */
          int my_ticket = atomic_increment(&(lock->dispense));
          while (my_ticket != lock->current_ticket)
              ;  /* spin until our ticket comes up */
      }

      void unlock(struct ticket_lock *lock)
      {
          lock->current_ticket++;
      }
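The same ticket lock can be sketched portably with C11 atomics, where atomic_fetch_add plays the role of atomic_increment (returning the pre-increment value):

```c
#include <stdatomic.h>

/* Ticket lock with C11 atomics: each arriving thread takes the next
   ticket with an atomic fetch-add and spins until current_ticket
   reaches its number, giving FIFO fairness among waiters. */
struct ticket_lock {
    atomic_int dispense;        /* next ticket to be handed out */
    atomic_int current_ticket;  /* ticket currently being served */
};

void ticket_acquire(struct ticket_lock *lock)
{
    int my_ticket = atomic_fetch_add(&lock->dispense, 1);
    while (atomic_load(&lock->current_ticket) != my_ticket)
        ;                       /* spin until our number comes up */
}

void ticket_release(struct ticket_lock *lock)
{
    atomic_fetch_add(&lock->current_ticket, 1);  /* serve the next waiter */
}
```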
