As multicore CPUs become more common, scalable synchronization primitives have wider use and ideas previously used in large-scale computation are worth re-opening for wider use. In this paper I explore one approach to scalable synchronization, a minimal-contention lock (M-lock). The key idea is to avoid spinning on a global variable but instead for each blocked task (process or thread) to spin on a local lock representing the task that immediately preceded it in attempting to acquire the lock. This creates an ordering based on the order in which tasks attempt to acquire the lock, preventing starvation. The only globally shared variable is a pointer to the next local lock to be contended for. Each contending task swaps the value of this pointer for a pointer to its own variable. It spins on the variable previously pointed to by the global pointer. Each waiting task spins on a lock only seen by itself and the owner of that lock variable. While a task is spinning, the lock variable can be held in its local cache until invalidated by the lock owner when it unsets the lock. Consequently, the amount of bus traffic is considerably less than with a spinlock, which has the pernicious feature that the task releasing the lock is delayed by all the other bus traffic arising from contention for the lock. An MCS lock has similar properties but is more complicated and requires more memory contention-causing operations. This paper outlines the design of the M-lock and provides a preliminary performance analysis.
 A preliminary study of minimal-contention locks
A preliminary study of minimal-contention locks