Preemption, pthread_spin_lock and atomic built-ins in C

According to this question here, using pthread_spin_lock to guard a critical section is dangerous: the thread holding the lock might be preempted by the scheduler out of the blue, leaving other threads contending for that resource spinning.
Suppose that I decide to switch from pthread_spin_lock to locks implemented via atomic built-ins and the compare-and-swap idiom: will this improve things, or will I still suffer from the same issue?
Since with pthreads there seems to be no way to disable preemption, is there something I can do if I use locks implemented via atomics, or anything I can have a look at?
I am interested in locking a small critical region.

pthread_mutex_lock typically has a fast path which uses an atomic operation to try to acquire the lock. In the event that the lock is not owned, this can be very fast. Only if the lock is already held does the thread enter the kernel via a system call. The kernel acquires a spin-lock, and then reattempts to acquire the mutex in case it was released since the first attempt. If this attempt fails, the calling thread is added to a wait queue associated with the mutex, and a context switch is performed. The kernel also sets a bit in the mutex to indicate that there is a waiting thread.
pthread_mutex_unlock also has a fast path. If the waiting thread flag is clear, it can simply release the lock. If the flag is set, the thread must enter the kernel via a system call so the waiting thread can be woken. Again, the kernel must acquire a spin lock so that it can manipulate its thread control data structures. In the event that there is no thread waiting after all, the lock can be released by the kernel. If there is a thread waiting, it is made runnable, and ownership of the mutex is transferred without it being released.
There are many subtle race conditions in this little dance, and hopefully it all works properly.
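As a rough sketch of the fast paths described above, here is a toy lock word using C11 atomics. This is a minimal illustration, not glibc's actual code; real implementations are considerably more careful:

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Toy lock word: 0 = unlocked, 1 = locked,
     * 2 = locked with a waiter recorded (the "waiting thread" bit). */
    typedef struct { atomic_int state; } toy_mutex;

    /* Fast path: a single CAS acquires an uncontended lock, no syscall. */
    static bool toy_mutex_trylock(toy_mutex *m)
    {
        int expected = 0;
        return atomic_compare_exchange_strong(&m->state, &expected, 1);
    }

    /* Fast-path unlock: if no waiter was recorded, one atomic write
     * releases the lock; otherwise the kernel must wake a waiter. */
    static void toy_mutex_unlock(toy_mutex *m)
    {
        if (atomic_exchange(&m->state, 0) == 2) {
            /* slow path: enter the kernel to wake a waiting thread */
        }
    }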
Since a thread that attempts to acquire a locked mutex is context switched out, it does not prevent the thread that owns the mutex from running, which gives the owner an opportunity to exit its critical section and release the mutex.
In contrast, a thread that attempts to acquire a locked spin-lock simply spins, consuming CPU cycles. This has the potential of preventing the thread that owns the spin-lock from exiting its critical section and releasing the lock. The spinning thread can be preempted when its timeslice has been consumed, allowing the thread that owns the lock to eventually regain control. Of course, this is not great for performance.
In practice, spin-locks are used where there is no chance that the thread can be preempted while it owns the lock. A kernel may set a per-cpu flag to prevent it from performing a context switch from an interrupt service routine (or it may raise the interrupt priority level to prevent interrupts that can cause context switches, or it may disable interrupts altogether). A user thread can prevent itself from being preempted (by other threads in the same process) by raising its priority. Note that, in a uniprocessor system, preventing the current thread from being preempted eliminates the need for the spin lock. Alternatively, in a multiprocessor system, you can bind threads to cpus (cpu affinity) so that they cannot preempt one another.
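For instance, binding the calling thread to a single CPU can be done with glibc's non-portable affinity API; the wrapper function below is just illustrative:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Pin the calling thread to the given CPU (glibc-specific API). */
    static int pin_to_cpu(int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }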
All locks ultimately require an atomic primitive (well, efficient locks; see here for a counter example). Mutexes can be inefficient if they are highly contended, causing threads to constantly enter the kernel and be context switched; especially if the critical section is smaller than the kernel overhead. Spin locks can be more efficient, but only if the owner cannot be preempted and the critical section is short. Note that the kernel must still acquire a spin lock when a thread attempts to acquire a locked mutex.
Personally, I would use atomic operations for things like shared counter updates, and mutexes for more complex operations. Only after profiling would I consider replacing mutexes with spin locks (and figure out how to deal with preemption). Note that if you intend to use condvars, you have no choice but to use mutexes.
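For the shared-counter case, a minimal sketch using the GCC/Clang __atomic built-ins the question asks about; no lock is needed at all:

    /* Shared counter updated lock-free with __atomic built-ins. */
    static long counter;

    void counter_inc(void)
    {
        __atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);
    }

    long counter_read(void)
    {
        return __atomic_load_n(&counter, __ATOMIC_RELAXED);
    }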

Related

Spin_lock and mutex lock order

I got a test question in an interview.
You need to grab both a spin_lock and a mutex in order to do something.
What is the correct order of acquisition? Why?
I have some thoughts about this but no strong opinion about the answer.
The reason you grab a lock at all is to protect a critical region against races, whether between CPUs on an SMP machine or from preemption-related corruption on a single CPU. So it matters very much whether your machine is SMP or single-CPU. It also matters what code sits inside the spin and mutex sections: is there kmalloc, vmalloc, mem_cache_alloc, alloc_bootmem, a function that touches __user memory, or even usleep?
spin_lock - the simplest lock, in /include/asm/spinlock.h. Only one thread can be inside a spin_lock at a time. Any other thread that tries to take the spin_lock will spin in place (on the same instruction) until the previous thread frees it. A spinning thread never goes to sleep, so at the same moment two or more threads can be doing something with the lock (one working, one spinning). That is impossible on a single-CPU machine, but it works very well on SMP. The code section inside a spin_lock should be small and fast. If your code has to work on different machines, check CONFIG_SMP and CONFIG_PREEMPT.
mutex - on the other hand, this works like a semaphore (/include/asm/semaphore.h) whose count is one. With a count of one, only one thread can enter the mutex-locked region. Any other thread that tries to take the lock will see the count at zero, because one thread is inside, and will go onto a wait queue. It is woken up when the mutex is released and the count is one again. A thread inside a mutex-locked region can sleep: it can call memory allocation functions and access userspace memory.
(SMP) So imagine you take the spinlock first and the mutex lock second. Only one thread can take the spin and then the mutex, but now the code inside the mutex-locked region must not sleep, which is bad, because the mutex sits inside the spin.
(SMP) If you take the mutex lock first and the spin lock second, the same holds: only one thread can enter the locked region. But between taking the mutex and taking the spinlock the code may sleep, and between spin_unlock and releasing the mutex it may sleep too. The spin lock now covers a smaller no-sleep region, and that is good.
TL;DR: Lock mutex first and then spin lock.
First you need to avoid such situations and be very careful to avoid deadlocks.
Then, you should consider effects of locking. Mutex may cause thread to block and sleep, while spin lock may cause thread to occupy processor in busy waiting loop. So it is general recommendation to keep critical sections that own a spin lock short in time which leads to following rule of thumb: do not sleep (i.e. by locking mutex) while owning a spin lock or you will waste CPU time.
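A minimal kernel-style sketch of that rule of thumb, assuming the standard Linux locking primitives (the function and lock names are hypothetical):

    #include <linux/mutex.h>
    #include <linux/spinlock.h>

    static DEFINE_MUTEX(big_lock);
    static DEFINE_SPINLOCK(small_lock);

    void do_something(void)
    {
        mutex_lock(&big_lock);    /* may sleep, so take it first */
        /* sleeping (e.g. kmalloc(..., GFP_KERNEL)) is still OK here */

        spin_lock(&small_lock);   /* atomic context starts here */
        /* short, non-sleeping critical section only */
        spin_unlock(&small_lock);

        /* sleeping is allowed again here */
        mutex_unlock(&big_lock);
    }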

Why is "sleeping" not allowed while holding a spinlock?

As far as I know, spinlocks should be held only for short durations, and they are the only choice in code such as interrupt handlers, where sleeping (preemption) is not allowed.
However, I do not know why there is such a "rule" that there SHOULD BE no sleeping at all while holding a spinlock. I know that it is not recommended practice (since it is detrimental to performance), but I see no reason why sleeping SHOULD NOT be allowed while holding a spinlock.
You cannot hold a spin lock while you acquire a semaphore, because you might have to sleep while waiting for the semaphore, and you cannot sleep while holding a spin lock (from "Linux Kernel Development" by Robert Love).
The only reason I can see is portability: on uniprocessors, spinlocks are implemented by disabling interrupts, and with interrupts disabled, sleeping is of course not allowed (but sleeping would not break code on SMP systems).
But I am wondering if my reasoning is correct or if there are any other reasons.
There are several reasons why, at least in Linux, sleeping in spinlocks is not allowed:
If thread A sleeps in a spinlock, and thread B then tries to acquire the same spinlock, a uniprocessor system will deadlock. Thread B will never go to sleep (because spinlocks don't have the waitlist necessary to awaken B when A is done), and thread A will never get a chance to wake up.
Spinlocks are used over semaphores precisely because they're more efficient - provided you do not contend for long. Allowing sleeping means that you will have long contention periods, erasing all the benefit of using a spinlock. Your system would be faster just using a semaphore in this case.
Spinlocks are often used to synchronize with interrupt handlers, by additionally disabling interrupts. This use case is not possible if you sleep (once you enter the interrupt handler, you cannot switch back to the thread to let it wake up and finish its spinlock critical section).
Use the right tool for the right job - if you need to sleep, semaphores and mutexes are your friends.
Actually, you can sleep with interrupts disabled or some other sort of exclusion active. If you don't, the condition for which you are sleeping could change state due to an interrupt and then you would never wake up. The sleep code would normally never be entered without an elevated priority or some other critical section that encloses the execution path between the decision to sleep and the context switch.
But for spinlocks, sleep is a disaster, as the lock stays set. Other threads will spin when they hit it, and they won't stop spinning until you wake up from the sleep. That could be an eternity compared to the handful of spins expected in the worst case at a spinlock, because spinlocks exist just to synchronize access to memory locations, they aren't supposed to interact with the context-switching mechanism. (For that matter, every other thread might eventually hit the spinlock and then you would have wedged every thread of every core of the entire system.)
You cannot when you use a spin lock as it is meant to be used. Spin locks are used where really necessary to protect critical regions and shared data structures. If you acquire one while also holding a semaphore, you lock access to whichever critical region (say) your lock is attached to (it is typically a member of a specific larger data structure), while allowing this process to possibly be put to sleep. If, say, an IRQ is raised while this process sleeps, and the IRQ handler needs access to the critical region still locked away, it blocks, and that must never happen to an IRQ. Obviously, you could make up examples where your spin lock isn't used the way it should be (a hypothetical spin lock attached to a nop loop, say); but that's simply not a real spin lock as found in Linux kernels.

Mutex access and system call

I know that in Linux, mutexes are implemented on top of futexes, and futexes use a compare-and-swap mechanism. Usually, to acquire a lock, a user-space thread does not need to make a system call, as the lock is resolved in user space.
Now my question is what happens when there is high contention and many threads are trying to lock a mutex at the same time. Does a system call occur then, so that the kernel decides which thread to grant the mutex, especially when thread priorities differ? I myself think so.
As long as there is no contention, there are no system calls made. If there is contention, then a system call is made to place the thread on a sleep queue, which will later be used to find the first thread to wake when the mutex becomes free. Additionally, in that syscall the value of the futex is adjusted so that the currently owning thread will not go through the user-land "fast path" unlock routine (which simply resets the futex back to a zero, or "unlocked", value) but will instead make another system call to check the sleep queue for waiting threads to pass lock ownership to. With more threads contending for a lock, there is of course a higher chance of contention being found, but again, if there is no contention, then there is no system call made.
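As a hedged illustration of the contended path, here is a minimal futex-based lock in the style of Ulrich Drepper's "Futexes Are Tricky" paper. The state encoding (0 = unlocked, 1 = locked, 2 = locked with waiters) matches the description above, but this is a sketch, not glibc's actual implementation:

    #define _GNU_SOURCE
    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* glibc provides no futex() wrapper, so go through syscall(2). */
    static long futex(atomic_int *uaddr, int op, int val)
    {
        return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
    }

    static void futex_lock(atomic_int *f)
    {
        int c = 0;
        /* fast path: 0 -> 1 with one CAS, no system call */
        if (atomic_compare_exchange_strong(f, &c, 1))
            return;
        /* slow path: mark the lock contended (2), then sleep */
        if (c != 2)
            c = atomic_exchange(f, 2);
        while (c != 0) {
            futex(f, FUTEX_WAIT, 2);   /* sleep while the word is 2 */
            c = atomic_exchange(f, 2);
        }
    }

    static void futex_unlock(atomic_int *f)
    {
        /* if no waiter was ever recorded, this is just an atomic write */
        if (atomic_exchange(f, 0) == 2)
            futex(f, FUTEX_WAKE, 1);   /* otherwise wake one sleeper */
    }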
Futexes only do a small number of loops before falling back to a syscall, so in the case of high lock contention there is a high chance that threads will fall back to a syscall.

When is pthread_spin_lock the right thing to use (over e.g. a pthread mutex)?

Given that pthread_spin_lock is available, when would I use it, and when should one not use it?
i.e. how would I decide whether to protect some shared data structure with a pthread mutex or a pthread spinlock?
The short answer is that a spinlock can be better when you plan to hold the lock for an extremely short interval (for example to do nothing but increment a counter), and contention is expected to be rare, but the operation is occurring often enough to be a potential performance bottleneck. The advantages of a spinlock over a mutex are:
On unlock, there is no need to check if other threads may be waiting for the lock and waking them up. Unlocking is simply a single atomic write instruction.
Failure to immediately obtain the lock does not put your thread to sleep, so it may be able to obtain the lock with much lower latency as soon as it does become available.
There is no risk of cache pollution from entering kernelspace to sleep or wake other threads.
Point 1 will always stand, but points 2 and 3 are of somewhat diminished usefulness if you consider that good mutex implementations will probably spin a decent number of times before asking the kernel for help waiting.
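To make the short answer concrete, here is a minimal sketch of guarding a trivial counter update with a pthread spinlock; the types and calls are the real pthread API, the surrounding functions are just illustrative:

    #include <pthread.h>

    static pthread_spinlock_t lock;
    static long hits;

    void hits_init(void)
    {
        /* PTHREAD_PROCESS_PRIVATE: not shared across processes */
        pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
    }

    void record_hit(void)
    {
        pthread_spin_lock(&lock);
        hits++;                    /* hold the lock only for this */
        pthread_spin_unlock(&lock);
    }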
Now, the long answer:
What you need to ask yourself before using spinlocks is whether these potential advantages outweigh one rare but very real disadvantage: what happens when the thread that holds the lock gets interrupted by the scheduler before it can release the lock. This is of course rare, but it can happen even if the lock is just held for a single variable-increment operation or something else equally trivial. In this case, any other threads attempting to obtain the lock will keep spinning until the thread that holds the lock gets scheduled and has a chance to release it. This may never happen if the threads trying to obtain the lock have higher priorities than the thread that holds the lock. That may be an extreme case, but even without different priorities in play, there can be very long delays before the lock owner gets scheduled again, and worst of all, once this situation begins, it can quickly escalate as many threads, all hoping to get the lock, begin spinning on it, tying up more processor time and further delaying the scheduling of the thread that could release the lock.
As such, I would be careful with spinlocks... :-)
The spinlock is a "busy waiting" lock. Its main advantage is that it keeps the thread active and won't cause a context switch, so if you know that you will only be waiting for a very short time (because your critical operation is very quick), then this may give better performance than a mutex. Conversely, a mutex will cause less demand on the system if the critical section takes a long time and a context switch is desirable.
TL;DR: It depends.
The safest method with a performance boost is a hybrid of the two: an adaptive mutex.
When your system has multiple cores, you spin for a few thousand cycles to capture the best case of low or no contention, then defer to a full mutex to yield to other threads when the lock stays contended for long.
Both POSIX threads (with the glibc-specific PTHREAD_MUTEX_ADAPTIVE_NP) and Win32 (with SetCriticalSectionSpinCount) offer adaptive mutexes, whereas many platforms don't have a POSIX spinlock API at all.
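As a rough illustration, this is how an adaptive mutex can be requested from glibc; PTHREAD_MUTEX_ADAPTIVE_NP is a glibc-specific extension, and the wrapper function is hypothetical:

    #define _GNU_SOURCE
    #include <pthread.h>

    static pthread_mutex_t m;

    /* Spin briefly in user space before sleeping in the kernel. */
    void init_adaptive(void)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
        pthread_mutex_init(&m, &attr);
        pthread_mutexattr_destroy(&attr);
    }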
Spinlocks are only of interest in an MP (multiprocessor) context. They are used to execute pseudo-atomic tasks. On a uniprocessor system, the principle is the following:
Lock the scheduler (if the task deals with interrupts, lock interrupts instead)
Do my atomic task
Unlock the scheduler
But on MP systems we have no guarantee that another core will not execute another thread that could enter our code section. To prevent this, the spinlock was created; its purpose is to hold off execution on the other cores, preventing concurrency issues. The critical section becomes:
Lock the scheduler
SpinLock (keep the other cores out)
My task
SpinUnlock
Unlock the scheduler
If the scheduler lock is omitted, then across a reschedule another thread could try to enter the section and will loop at 100% CPU waiting for the next reschedule. If that spinning thread is a high-priority one, it will produce a deadlock.
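For comparison, a hedged kernel-style sketch of the same idea: in mainline Linux the "lock the scheduler" step is folded into spin_lock() itself, which disables preemption on CONFIG_PREEMPT kernels:

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(lock);

    void pseudo_atomic_task(void)
    {
        spin_lock(&lock);     /* preemption off, other cores held out */
        /* short critical section */
        spin_unlock(&lock);   /* preemption back on */
    }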

Why spinlocks are used in interrupt handlers

I would like to know why spin locks are used instead of semaphores inside an interrupt handler.
Semaphores cause tasks to sleep on contention, which is unacceptable for interrupt handlers. Basically, for such a short and fast task (interrupt handling) the work carried out by the semaphore is overkill. Also, spinlocks can't be held by more than one task.
What's the problem with semaphores and mutexes, and why is a spinlock needed?
Can we use a semaphore or mutex in interrupt handlers? The answer is yes and no: you can use up and unlock, but you can't use down and lock, as these are blocking calls that put the process to sleep, and we are not supposed to sleep in interrupt handlers.
Note that this semaphore is not the System V IPC kind; it is just a synchronization technique. There are three functions for acquiring and releasing the semaphore:
down(): acquires the semaphore, putting the caller into an uninterruptible sleep if it is unavailable.
down_trylock(): takes the lock if it is available; if it is not, it returns immediately instead of sleeping.
up(): releases the semaphore.
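As a rough sketch of how these three calls are used (kernel-style C; the surrounding functions are hypothetical):

    #include <linux/semaphore.h>

    static struct semaphore sem;

    void setup(void)
    {
        sema_init(&sem, 1);      /* count of one: binary semaphore */
    }

    void process_context_side(void)
    {
        down(&sem);              /* may sleep: process context only */
        /* critical section */
        up(&sem);                /* never sleeps: safe anywhere */
    }

    int atomic_context_side(void)
    {
        if (down_trylock(&sem))  /* non-zero: lock not taken */
            return -1;           /* give up instead of sleeping */
        /* critical section */
        up(&sem);
        return 0;
    }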
So, what if we want to achieve synchronization in interrupt handlers? Use spinlocks.
What do spinlocks do?
A spinlock is a lock which never yields.
Similar to a mutex, it has two operations: lock and unlock.
If the lock is available, the process acquires it, continues into the critical section, and unlocks it once it is done. This is similar to a mutex. But what if the lock is not available? Here comes the interesting difference: with a mutex, the process sleeps until the lock is available, but in the case of a spinlock it goes into a tight loop, continuously checking the lock until it becomes available.
This is the spinning part of the spinlock. It was designed for multiprocessor systems, but with a preemptible kernel, even a uniprocessor system behaves like an SMP machine.
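A hedged sketch of the usual pattern (hypothetical driver code; the locking calls are the real kernel API): the interrupt handler can take a plain spin_lock, while process context must use spin_lock_irqsave so the handler cannot fire on the same CPU while the lock is held:

    #include <linux/interrupt.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(lock);
    static int shared_data;   /* touched by handler and process context */

    static irqreturn_t my_irq_handler(int irq, void *dev)
    {
        spin_lock(&lock);               /* never sleeps: safe here */
        shared_data++;
        spin_unlock(&lock);
        return IRQ_HANDLED;
    }

    void process_side(void)
    {
        unsigned long flags;
        spin_lock_irqsave(&lock, flags);   /* also masks local IRQs */
        shared_data = 0;
        spin_unlock_irqrestore(&lock, flags);
    }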
The problem is that interrupt handlers (IHs) are triggered asynchronously and unpredictably, outside the scope of any other activity running in the system. In fact, IHs run outside the concepts of threads and scheduling altogether. Because of this, all mutual-exclusion primitives that rely on the scheduler are unacceptable: their use in an IH can dramatically increase interrupt-handling latencies (when the IH runs in the context of a low-priority thread) and can produce deadlocks (when the IH runs in the context of the thread that holds the lock).
You can find a nice and detailed description of spinlocks at http://www.makelinux.net/ldd3/chp-5-sect-5.
