suspend pthread? - c

I want to implement a mutex lock.
From my understanding, mutex.lock() should work like
1) check lock owner
2) if lock is owned, put thread in waiting queue
3) suspend this thread until another thread sends a wake-up signal
However, there is nothing like pthread_suspend(), so how do I suspend the thread?
I found someone suggesting pthread_cond_wait(), but to use that function I have to set up a pthread_mutex lock first, and it doesn't make sense to use a pthread_mutex inside my own mutex.
Well, if my understanding of mutex is wrong, please correct me.
Thanks.

Mutexes, locks, and wait conditions are all different, distinct things. You need a mutex variable in order to implement both a lock and a wait condition.
A lock is a simple mechanism that prevents more than one thread from executing the same code at once by making all but one thread wait for the lock to become unlocked.
A wait condition is a slightly more complex structure that allows a thread to monitor a condition (usually a boolean flag) and only wake up when the flag has changed favourably.
In both cases, when a thread blocks (i.e. sleeps), the operating system's scheduling primitives automatically take care of descheduling the thread and using the available computing time elsewhere. Thread and task scheduling is not something you would normally have to worry about manually.

You can only make things that are at least as complex as the simplest pieces you have. If the simplest pieces you have are mutexes, then you can't make mutexes from the pieces you have. You can only make things at least as complex as a mutex or more so. If you have any pieces simpler than a mutex, tell us what they are, and we can tell you how to make a mutex out of them.
I suppose, if you want, you can make your own mutex out of pthread mutexes and condition variables. I'm not sure what the point is, but it's trivial to do. As you noted, you can use pthread_cond_wait to wait on your own kind of mutex.
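A minimal sketch of that idea, assuming nothing beyond standard pthreads. The names my_mutex_t, my_mutex_lock, my_mutex_unlock and run_demo are made up for illustration; the inner pthread mutex only guards the `locked` flag for a few instructions, while pthread_cond_wait does the actual suspending.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical type: a mutex built from a pthread mutex and a condition variable. */
typedef struct {
    pthread_mutex_t guard;    /* protects 'locked' */
    pthread_cond_t  released; /* signalled when 'locked' becomes false */
    bool            locked;
} my_mutex_t;

static my_mutex_t demo_lock = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
};
static long demo_counter;

static void my_mutex_lock(my_mutex_t *m) {
    pthread_mutex_lock(&m->guard);
    while (m->locked)                      /* loop also covers spurious wakeups */
        pthread_cond_wait(&m->released, &m->guard);
    m->locked = true;
    pthread_mutex_unlock(&m->guard);
}

static void my_mutex_unlock(my_mutex_t *m) {
    pthread_mutex_lock(&m->guard);
    m->locked = false;
    pthread_cond_signal(&m->released);     /* wake one waiter, if any */
    pthread_mutex_unlock(&m->guard);
}

static void *demo_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        my_mutex_lock(&demo_lock);
        demo_counter++;                    /* critical section */
        my_mutex_unlock(&demo_lock);
    }
    return NULL;
}

/* Runs two workers and returns the final counter value. */
static long run_demo(void) {
    pthread_t a, b;
    int rc;
    demo_counter = 0;
    rc = pthread_create(&a, NULL, demo_worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, demo_worker, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return demo_counter;
}
```

Calling run_demo() from main should return 200000 if the lock really serializes the increments.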
The reason the pthreads standard gives you a mutex is because it's about the most flexible of the possible synchronization primitives.

mutex.lock() should work like:
1) check lock owner
2) if lock is owned, put thread in waiting queue
3) suspend this thread until THE THREAD THAT OWNS THE LOCK sends a wake up signal. No other thread can release the lock.
These steps should be performed as an atomic operation so that the correct behaviour is followed for all threads acquiring/releasing the mutex, no matter how such calls may be interrupted and reentered from other threads.
'However, there is nothing like pthread_suspend(), then how do I do suspend?' - usually, you don't. The OS kernel provides synchronization primitives that can block threads that should not run. To implement a 'suspend' purely in user space, you can only spin-wait. That is a good strategy in a few cases (an underloaded multi-core box where the lock is only held for a very short time), but certainly not all, and it can lead to spectacularly disastrous livelocks across whole clusters of machines.
If you want a mutex, use an OS mutex - that's what any cross-platform library will do.

Related

preemption, pthread_spin_lock and atomic built-in

According to this question here, using pthread_spin_lock to lock a critical section is dangerous, as the thread might be interrupted by the scheduler out of the blue, and other threads contending for that resource might be left spinning.
Suppose that I decide to switch from pthread_spin_lock to locks implemented via atomic built-ins and the compare-and-swap idiom: will this improve things, or will I still suffer from this issue?
Since pthreads does not seem to offer anything to disable preemption, is there something I can do if I use locks implemented via atomics, or anything I can have a look at?
I am interested in locking a small critical region.
pthread_mutex_lock typically has a fast path which uses an atomic operation to try to acquire the lock. In the event that the lock is not owned, this can be very fast. Only if the lock is already held, does the thread enter the kernel via a system call. The kernel acquires a spin-lock, and then reattempts to acquire the mutex in case it was released since the first attempt. If this attempt fails, the calling thread is added to a wait queue associated with the mutex, and a context switch is performed. The kernel also sets a bit in the mutex to indicate that there is a waiting thread.
pthread_mutex_unlock also has a fast path. If the waiting thread flag is clear, it can simply release the lock. If the flag is set, the thread must enter the kernel via a system call so the waiting thread can be woken. Again, the kernel must acquire a spin lock so that it can manipulate its thread control data structures. In the event that there is no thread waiting after all, the lock can be released by the kernel. If there is a thread waiting, it is made runnable, and ownership of the mutex is transferred without it being released.
There are many subtle race conditions in this little dance, and hopefully it all works properly.
Since a thread that attempts to acquire a locked mutex is context switched out, it does not prevent the thread that owns the mutex from running, which gives the owner an opportunity to exit its critical section and release the mutex.
In contrast, a thread that attempts to acquire a locked spin-lock simply spins, consuming CPU cycles. This has the potential of preventing the thread that owns the spin-lock from exiting its critical section and releasing the lock. The spinning thread can be preempted when its timeslice has been consumed, allowing the thread that owns the lock to eventually regain control. Of course, this is not great for performance.
In practice, spin-locks are used where there is no chance that the thread can be preempted while it owns the lock. A kernel may set a per-cpu flag to prevent it from performing a context switch from an interrupt service routine (or it may raise the interrupt priority level to prevent interrupts that can cause context switches, or it may disable interrupts altogether). A user thread can prevent itself from being preempted (by other threads in the same process) by raising its priority. Note that, in a uniprocessor system, preventing the current thread from being preempted eliminates the need for the spin lock. Alternatively, in a multiprocessor system, you can bind threads to cpus (cpu affinity) so that they cannot preempt one another.
All locks ultimately require an atomic primitive (well, efficient locks; see here for a counter example). Mutexes can be inefficient if they are highly contended, causing threads to constantly enter the kernel and be context switched; especially if the critical section is smaller than the kernel overhead. Spin locks can be more efficient, but only if the owner cannot be preempted and the critical section is short. Note that the kernel must still acquire a spin lock when a thread attempts to acquire a locked mutex.
Personally, I would use atomic operations for things like shared counter updates, and mutexes for more complex operations. Only after profiling would I consider replacing mutexes with spin locks (and figure out how to deal with preemption). Note that if you intend to use condvars, you have no choice but to use mutexes.
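As a rough illustration of the "atomic operations for shared counter updates" advice, here is a sketch using C11 <stdatomic.h>. The names hits, count_hits and run_counter_demo are made up for the example; no lock is involved, so there is no owner that could be preempted while others spin.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_long hits;  /* shared counter, updated without any lock */

static void *count_hits(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++)
        /* Relaxed ordering is enough for a pure counter;
           pthread_join orders the final read below. */
        atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
    return NULL;
}

/* Starts four threads and returns the total they counted. */
static long run_counter_demo(void) {
    pthread_t t[4];
    atomic_store(&hits, 0);
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&t[i], NULL, count_hits, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&hits);
}
```

Anything that has to update several variables together does not fit this pattern and is where a mutex is the simpler choice.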

What is the `pthread_mutex_lock()` wake order with multiple threads waiting?

Suppose I have multiple threads blocking on a call to pthread_mutex_lock(). When the mutex becomes available, does the first thread that called pthread_mutex_lock() get the lock? That is, are calls to pthread_mutex_lock() in FIFO order? If not, what, if any, order are they in? Thanks!
When the mutex becomes available, does the first thread that called pthread_mutex_lock() get the lock?
No. One of the waiting threads gets a lock, but which one gets it is not determined.
FIFO order?
FIFO mutex is rather a pattern already. See Implementing a FIFO mutex in pthreads
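One common way to get FIFO order is a ticket scheme layered on a pthread mutex and condition variable. The sketch below is only an illustration of that pattern, not the code from the linked question; fifo_mutex_t and its functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical ticket-based FIFO mutex. */
typedef struct {
    pthread_mutex_t guard;
    pthread_cond_t  turn_changed;
    unsigned long   next_ticket;  /* next ticket to hand out */
    unsigned long   now_serving;  /* ticket currently allowed to hold the lock */
} fifo_mutex_t;

static void fifo_mutex_lock(fifo_mutex_t *m) {
    pthread_mutex_lock(&m->guard);
    unsigned long mine = m->next_ticket++;   /* arrival order is fixed here */
    while (mine != m->now_serving)
        pthread_cond_wait(&m->turn_changed, &m->guard);
    pthread_mutex_unlock(&m->guard);
}

static void fifo_mutex_unlock(fifo_mutex_t *m) {
    pthread_mutex_lock(&m->guard);
    m->now_serving++;
    /* Broadcast: every waiter rechecks, only the next ticket proceeds. */
    pthread_cond_broadcast(&m->turn_changed);
    pthread_mutex_unlock(&m->guard);
}

/* Single-threaded walk-through: two lock/unlock cycles use tickets 0 and 1. */
static unsigned long fifo_demo(void) {
    fifo_mutex_t m = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 };
    fifo_mutex_lock(&m);
    assert(m.next_ticket == 1 && m.now_serving == 0);
    fifo_mutex_unlock(&m);
    fifo_mutex_lock(&m);
    fifo_mutex_unlock(&m);
    return m.now_serving;
}
```

The broadcast wakes every waiter on each unlock, which is part of why FIFO ordering tends to cost more than letting the implementation pick.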
"If there are threads blocked on the mutex object referenced by mutex when pthread_mutex_unlock() is called, resulting in the mutex becoming available, the scheduling policy shall determine which thread shall acquire the mutex."
Aside from that, the answer to your question isn't specified by the POSIX standard. It may be random, or it may be in FIFO or LIFO or any other order, according to the choices made by the implementation.
FIFO ordering is about the least efficient mutex wake order possible. Only a truly awful implementation would use it. The thread that ran the most recently may be able to run again without a context switch, and the more recently a thread ran, the more of its data and code will be hot in the cache. Reasonable implementations try to give the mutex to the thread that held it the most recently most of the time.
Consider two threads that do this:
1) Acquire a mutex.
2) Adjust some data.
3) Release the mutex.
4) Go to step 1.
Now imagine two threads running this code on a single core CPU. It should be clear that FIFO mutex behavior would result in one "adjust some data" per context switch -- the worst possible outcome.
Of course, reasonable implementations generally do give some nod to fairness. We don't want one thread to make no forward progress. But that hardly justifies a FIFO implementation!

Implementing mutex in a user level thread library

I am developing a user-level thread library as part of a project. I came up with an approach to implement a mutex and would like to hear your views before going on with it. Basically, I need to implement just 3 functions in my library:
mutex_init, mutex_lock and mutex_unlock
I thought my mutex_t structure would look something like
typedef struct
{
    int available; // indicates whether the mutex is locked or unlocked
    queue listofwaitingthreads;
    gtthread_t owningthread;
} mutex_t;
In my mutex_lock function, I will first check if the mutex is available in a while loop. If it is not, I will yield the processor for the next thread to execute.
In my mutex_unlock function, I will check if the owner thread is the current thread. If it is, I will set available to 0.
Is this the way to go about it? Also, what about deadlock? Should I take care of those conditions in my user-level library, or should I leave it to the application programmers to write code properly?
This won't work, because you have a race condition. If 2 threads try to take the lock at the same time, both will see available == 0, and both will think they succeeded in taking the mutex.
If you want to do this properly, and without using an already-existing lock, you must use hardware operations like TAS (test-and-set), CAS (compare-and-swap), etc.
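A sketch of what closing that race with test-and-set could look like, using C11 atomic_flag (which compilers map to the hardware instruction). spin_mutex_t and the function names are invented for the example, and sched_yield() stands in for the "yield the processor" step from the question.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical TAS-based lock. */
typedef struct { atomic_flag held; } spin_mutex_t;

static spin_mutex_t tas_lock = { ATOMIC_FLAG_INIT };
static long tas_counter;

static void spin_mutex_lock(spin_mutex_t *m) {
    /* test_and_set atomically sets the flag and returns its previous value,
       so exactly one thread can observe "was clear" and win the lock. */
    while (atomic_flag_test_and_set_explicit(&m->held, memory_order_acquire))
        sched_yield();   /* let the owner run instead of burning the time slice */
}

static void spin_mutex_unlock(spin_mutex_t *m) {
    atomic_flag_clear_explicit(&m->held, memory_order_release);
}

static void *tas_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_mutex_lock(&tas_lock);
        tas_counter++;
        spin_mutex_unlock(&tas_lock);
    }
    return NULL;
}

/* Runs two workers and returns the final counter value. */
static long tas_demo(void) {
    pthread_t a, b;
    int rc;
    tas_counter = 0;
    rc = pthread_create(&a, NULL, tas_worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, tas_worker, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return tas_counter;
}
```

The check and the update happen in one indivisible step, which is exactly what the separate "read available, then write available" sequence lacks.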
There are algorithms that give you mutual exclusion without such hardware support, but they make some assumptions that are often false. For more details about this, I highly recommend reading Herlihy and Shavit's The Art of Multiprocessor Programming, chapter 7.
You shouldn't worry about deadlocks in this level - mutex locks should be simple enough, and there is some assumption that the programmer using them should use care not to cause deadlocks (advanced mutexes can check for self-deadlock, meaning a thread that calls lock twice without calling unlock in the middle).
Not only do you have to use atomic operations to read and modify the flag (as Eran pointed out), you also have to make sure that your queue can handle concurrent accesses. This is not completely trivial; it is something of a chicken-and-egg problem.
But if you'd really implement this by spinning, you wouldn't even need to have such a queue. The access order to the lock then would be mainly random, though.
Just yielding would probably not be enough either; it can be quite costly if threads hold the lock for more than a few processor cycles. Consider using nanosleep with a low time value for the wait.
In general, a mutex implementation should look like:
Lock:
    while (trylock()==failed) {
        atomic_inc(waiter_cnt);
        atomic_sleep_if_locked();
        atomic_dec(waiter_cnt);
    }
Trylock:
    return atomic_swap(&lock, 1);
Unlock:
    atomic_store(&lock, 0);
    if (waiter_cnt) wakeup_sleepers();
Things get more complex if you want recursive mutexes, mutexes that can synchronize their own destruction (i.e. freeing the mutex is safe as soon as you get the lock), etc.
Note that atomic_sleep_if_locked and wakeup_sleepers correspond to FUTEX_WAIT and FUTEX_WAKE ops on Linux. The other atomics are probably CPU instructions, but could be system calls or kernel-assisted userspace function code, as in the case of Linux/ARM and the 0xffff0fc0 atomic compare-and-swap call.
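A Linux-only sketch of a futex-backed mutex, for illustration. It does not use the waiter_cnt counter from the pseudocode; it follows the three-state scheme described in Ulrich Drepper's "Futexes Are Tricky" instead (0 = unlocked, 1 = locked, 2 = locked with possible waiters). futex_mutex_t and the helper names are hypothetical, and the cast from atomic_int to int assumes both have the same 32-bit representation.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* state: 0 = unlocked, 1 = locked, 2 = locked and someone may be sleeping */
typedef struct { atomic_int state; } futex_mutex_t;

static long futex_op(atomic_int *addr, int op, int val) {
    /* Assumption: atomic_int is laid out like a plain 32-bit int. */
    return syscall(SYS_futex, (int *)addr, op, val, NULL, NULL, 0);
}

static void futex_mutex_lock(futex_mutex_t *m) {
    int c = 0;
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;                               /* fast path: no system call */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);    /* announce a waiter */
    while (c != 0) {
        futex_op(&m->state, FUTEX_WAIT, 2);   /* sleeps only if state is still 2 */
        c = atomic_exchange(&m->state, 2);
    }
}

static void futex_mutex_unlock(futex_mutex_t *m) {
    if (atomic_fetch_sub(&m->state, 1) != 1) { /* state was 2: maybe sleepers */
        atomic_store(&m->state, 0);
        futex_op(&m->state, FUTEX_WAKE, 1);
    }
}

static futex_mutex_t fx_lock;   /* zero-initialized: unlocked */
static long fx_total;

static void *fx_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        futex_mutex_lock(&fx_lock);
        fx_total++;
        futex_mutex_unlock(&fx_lock);
    }
    return NULL;
}

/* Runs two workers and returns the final total. */
static long futex_demo(void) {
    pthread_t a, b;
    int rc;
    fx_total = 0;
    rc = pthread_create(&a, NULL, fx_worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, fx_worker, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return fx_total;
}
```

The uncontended lock and unlock never enter the kernel, which matches the fast-path behaviour described for pthread_mutex_lock.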
You do not need atomic instructions for a user-level thread library, as long as all the threads are user-level threads multiplexed onto a single kernel thread and the library only switches between them at well-defined points (for example, on yield). When your process is given a time slice, it runs multiple threads during that slice, but on the same processor, so no two threads are going to be in the library function at the same time. Since the mutex functions are already in the library, mutual exclusion is guaranteed. If the library preempts its threads from a timer signal, it also has to block that signal while inside the mutex functions.

Why spinlocks are used in interrupt handlers

I would like to know why spin locks are used instead of semaphores inside an interrupt handler.
Semaphores cause tasks to sleep on contention, which is unacceptable for interrupt handlers. Basically, for such a short and fast task (interrupt handling) the work carried out by the semaphore is overkill. Also, spinlocks can't be held by more than one task.
What's the problem with semaphores and mutexes, and why are spinlocks needed?
Can we use a semaphore or mutex in interrupt handlers? The answer is yes and no. You can use up and unlock, but you can't use down and lock, as these are blocking calls which put the process to sleep, and we are not supposed to sleep in interrupt handlers.
Note that the semaphore here is not the System V IPC mechanism; it's just a synchronization technique. There are three functions for working with the semaphore:
down(): acquire the semaphore, sleeping in an uninterruptible state if it is not available.
down_trylock(): try to acquire the semaphore; if it is not available, return without sleeping.
up(): release the semaphore.
So, what if we want to achieve the synchronization in interrupt handlers ? Use spinlocks.
What do spinlocks do?
A spinlock is a lock which never yields.
Similar to a mutex, it has two operations - lock and unlock.
If the lock is available, the process will acquire it, continue into the critical section, and unlock it once it's done. This is similar to a mutex. But what if the lock is not available? Here comes the interesting difference. With a mutex, the process will sleep until the lock is available. But in the case of a spinlock, it goes into a tight loop, where it continuously checks for the lock until it becomes available.
This is the spinning part of the spin lock. It was designed for multiprocessor systems. But with a preemptible kernel, even a uniprocessor system behaves like an SMP.
The problem is that interrupt handlers (IHs) are triggered asynchronously and unpredictably, outside the scope of any other activity running in the system. In fact, IHs run entirely outside the concept of threads and scheduling. Because of this, all mutual exclusion primitives which rely on the scheduler are unacceptable: using them in an IH can dramatically increase interrupt handling latency (when the IH runs in the context of a low-priority thread) and can produce deadlocks (when the IH runs in the context of the thread which holds the lock).
You can find a nice and detailed description of spinlocks at http://www.makelinux.net/ldd3/chp-5-sect-5.

pthreads mutex vs semaphore

What is the difference between semaphores and mutex provided by pthread library ?
Semaphores have a synchronized counter, and mutexes are just binary (true / false).
A semaphore is often used as a definitive mechanism for answering how many elements of a resource are in use -- e.g., an object that represents n worker threads might use a semaphore to count how many worker threads are available.
Truth is you can represent a semaphore by an INT that is synchronized by a mutex.
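A sketch of that construction, with one added assumption: a condition variable is used alongside the mutex so that waiters sleep instead of polling the int. int_sem_t and its functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical counting semaphore: an int guarded by a mutex. */
typedef struct {
    pthread_mutex_t guard;
    pthread_cond_t  nonzero;  /* signalled when count goes up */
    int             count;
} int_sem_t;

static void int_sem_wait(int_sem_t *s) {
    pthread_mutex_lock(&s->guard);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->guard);
    s->count--;
    pthread_mutex_unlock(&s->guard);
}

/* Non-blocking variant: returns false instead of waiting. */
static bool int_sem_trywait(int_sem_t *s) {
    pthread_mutex_lock(&s->guard);
    bool ok = s->count > 0;
    if (ok)
        s->count--;
    pthread_mutex_unlock(&s->guard);
    return ok;
}

static void int_sem_post(int_sem_t *s) {
    pthread_mutex_lock(&s->guard);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->guard);
}

/* Two available workers: both can be taken, a third cannot until one is returned. */
static int int_sem_demo(void) {
    int_sem_t workers = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 2 };
    int_sem_wait(&workers);
    int_sem_wait(&workers);
    bool third = int_sem_trywait(&workers);
    assert(!third);
    int_sem_post(&workers);
    bool after_post = int_sem_trywait(&workers);
    assert(after_post);
    return workers.count;
}
```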
I am going to talk about mutex vs binary semaphore. You obviously use a mutex to prevent data in one thread from being accessed by another thread at the same time.
(Assume that you have just called lock() and are in the process of accessing the data. This means that you don't expect any other thread (or another instance of the same thread code) to access the same data locked by the same mutex. That is, if the same thread code executing on a different thread instance hits the lock, then lock() should block its control flow.)
This also applies to a thread running different thread code that accesses the same data and is locked by the same mutex.
In this case, you are still in the process of accessing the data, and you may take, say, another 15 seconds to reach the mutex unlock (so that the other thread blocked in the mutex lock would unblock and be allowed to access the data).
Do you ever allow another thread to just unlock the same mutex and, in turn, allow the thread that is already waiting (blocked) in the mutex lock to unblock and access the data? (Hope you got what I am saying here.)
As per the agreed-upon universal definition:
with a "mutex" this can't happen. No other thread can unlock the lock held by your thread
with a "binary semaphore" this can happen. Any other thread can unlock the lock held by your thread
So, if you are very particular about using binary-semaphore instead of mutex, then you should be very careful in “scoping” the locks and unlocks, I mean, that every control-flow that hits every lock should hit an unlock call and also there shouldn’t be any “first unlock”, rather it should be always “first lock”.
The Toilet Example
Mutex:
Is a key to a toilet. One person can have the key - occupy the toilet - at a time. When finished, the person gives (frees) the key to the next person in the queue.
"Mutexes are typically used to serialise access to a section of re-entrant code that cannot be executed concurrently by more than one thread. A mutex object only allows one thread into a controlled section, forcing other threads which attempt to gain access to that section to wait until the first thread has exited from that section."
(A mutex is really a semaphore with value 1.)
Semaphore:
Is the number of free identical toilet keys.
For example, say we have four toilets with identical locks and keys. The semaphore count - the count of keys - is set to 4 at the beginning (all four toilets are free), then the count value is decremented as people come in. If all toilets are full, i.e. there are no free keys left, the semaphore count is 0. Now, when e.g. one person leaves the toilet, the semaphore is increased to 1 (one free key), and the key is given to the next person in the queue.
"A semaphore restricts the number of simultaneous users of a shared resource up to a maximum number. Threads can request access to the resource (decrementing the semaphore), and can signal that they have finished using the resource (incrementing the semaphore)."
Source
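The four-toilets example maps directly onto a POSIX unnamed semaphore. A single-threaded sketch (toilets_demo is a made-up name; sem_trywait is used so that the fifth visitor is turned away instead of blocking the demo):

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Returns the number of free keys after four people enter,
   a fifth is refused, and one person leaves. */
static int toilets_demo(void) {
    sem_t keys;
    int rc, free_keys;

    rc = sem_init(&keys, 0, 4);          /* four identical keys */
    assert(rc == 0);

    for (int i = 0; i < 4; i++) {        /* four people take a key each */
        rc = sem_trywait(&keys);
        assert(rc == 0);
    }

    rc = sem_trywait(&keys);             /* fifth person: no key left */
    assert(rc == -1 && errno == EAGAIN);

    sem_post(&keys);                     /* one person leaves */
    sem_getvalue(&keys, &free_keys);

    sem_destroy(&keys);
    return free_keys;
}
```

In a real program the fifth visitor would normally call sem_wait and sleep until a key is returned.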
A mutex is used to avoid race conditions between multiple threads,
whereas a semaphore is used as a synchronizing element across multiple processes.
A mutex can't be replaced with a binary semaphore, since one process waits for the semaphore while another process releases it. With a mutex, both acquisition and release are handled by the same owner.
The difference between the semaphore and the mutex is the difference between mechanism and pattern. The difference is in their purpose (intent) and how they work (behavior).
The mutex, barrier, and pipeline are parallel programming patterns. A mutex is used (intended) to protect a critical section and ensure mutual exclusion. A barrier makes the agents (threads/processes) wait for each other.
One feature (behavior) of the mutex pattern is that only the allowed agent(s) (process or thread) can enter a critical section, and only those agent(s) can voluntarily get out of it.
There are cases where a mutex allows a single agent at a time. There are cases where it allows multiple agents (multiple readers) and disallows some other agents (writers).
The semaphore is a mechanism that can be used (intended) to implement different patterns. It is (behavior) generally a flag (possibly protected by mutual exclusion). (One interesting fact is that even the mutex pattern can be used to implement a semaphore.)
In popular culture, semaphores are mechanisms provided by kernels, and mutexes are provided by user-space libraries.
Note that there are misconceptions about semaphores and mutexes. It is said that semaphores are used for synchronization and that mutexes have ownership. This is due to popular OS books. But the truth is that mutexes, semaphores, and barriers are all used for synchronization. The intent of a mutex is not ownership but mutual exclusion. This misconception gave rise to the popular interview question asking for the difference between mutexes and binary semaphores.
Summary,
intent
mutex, mutual exclusion
semaphore, implement parallel design patterns
behavior
mutex, only the allowed agent(s) enters critical section and only it(they) can exit
semaphore, enter if the flag says go, otherwise wait until someone changes the flag
From a design perspective, a mutex is more like the state pattern, where the algorithm that is selected by the state can change the state. The binary semaphore is more like the strategy pattern, where an external algorithm can change the state and eventually the algorithm/strategy selected to run.
These two articles explain mutexes vs semaphores in great detail.
Also, this Stack Overflow answer says something similar.
A semaphore is used more as a flag, for which you really don't need to bring in an RTOS / OS. A semaphore can be accidentally or deliberately changed by other threads (say, due to bad coding).
When your thread uses a mutex, it owns the resource. No other thread can access it until the resource is freed.
By default, pthread mutexes apply only to threads in a single process (sharing one between processes requires the PTHREAD_PROCESS_SHARED attribute and shared memory), whereas semaphores are commonly used between processes.
A mutex is like a semaphore with S=1.
You can control the number of concurrent accesses with a semaphore, but with a mutex only one process at a time can access the resource.
See the implementation of these two below (all functions are atomic):
Semaphore:
wait(S) {
    while (S <= 0)
        ; // busy wait
    S--;
}
signal(S) {
    S++;
}
Mutex:
acquire() {
    while (!available)
        ; // busy wait
    available = false;
}
release() {
    available = true;
}

Resources