If I have two threads and one global variable (one thread constantly loops to read the variable; the other constantly loops to write to it), would anything happen that shouldn't (e.g. exceptions or errors)? If it does, what is a way to prevent this? I was reading about mutex locks and that they allow one thread exclusive access to a variable. Does this mean that only that thread can read and write to it and no other?
Would anything happen that shouldn't?
It depends in part on the type of the variable. If the variable is, say, a string (a long array of characters), then if the writer and the reader access it at the same time, it is completely undefined what the reader will see.
This is why mutexes and other coordinating mechanisms are provided by pthreads.
Does this mean that only that thread can read and write to it and no other?
Mutexes ensure that at most one of the threads using the mutex has permission to proceed. All other threads using the same mutex will be held up until the first thread releases it. Therefore, if the code is written properly, only one thread at a time will be able to access the variable. If the code is not written properly, then:
one thread might access the variable without checking that it has permission to do so
one thread might acquire the mutex and never release it
one thread might destroy the mutex without notifying the other
None of these is desirable behaviour, but the mere existence of a mutex does not prevent any of these happening.
Nevertheless, your code could reasonably use a mutex carefully and then the access to the global variable would be properly controlled. While it has permission via the mutex, either thread could modify the variable, or just read the variable. Either will be safe from interference by the other thread.
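For illustration, a minimal sketch of such careful use, assuming the global is a plain int and both threads lock the same mutex around every access:

#include <pthread.h>
#include <stdio.h>

int shared_value = 0;                              /* the global variable */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  /* protects shared_value */

void *writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&lock);      /* acquire exclusive access */
        shared_value = i;
        pthread_mutex_unlock(&lock);    /* release it promptly */
    }
    return NULL;
}

void *reader(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&lock);
        int snapshot = shared_value;    /* read a consistent value */
        pthread_mutex_unlock(&lock);
        printf("%d\n", snapshot);
    }
    return NULL;
}

int main(void)
{
    pthread_t r, w;
    pthread_create(&w, NULL, writer, NULL);
    pthread_create(&r, NULL, reader, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return 0;
}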
Does this mean that only that thread can read and write to it and no other?
It means that only one thread can read or write to the global variable at a time.
The two threads will not race to access the global variable, nor will they access it at the same time at any point.
In short, access to the global variable is synchronized.
First: in C/C++, an unsynchronized read/write of a variable does not generate any exception or system error, BUT it can generate application-level errors -- mostly because you are unlikely to fully understand how the memory is accessed, and whether the access is atomic, unless you look at the generated assembler. A multi-core CPU is likely to create hard-to-debug race conditions when you access shared memory without synchronization.
Hence
Second: you should always use synchronization -- such as mutex locks -- when dealing with shared memory. A mutex lock is cheap, so it will not really impact performance if done right. Rule of thumb: hold the lock for as short a time as possible, such as just for the duration of reading/incrementing/writing the shared memory.
However, from your description, it sounds like one of your threads does nothing BUT wait for the shared memory to change state before doing something -- that is a bad multi-threaded design which costs unnecessary CPU burn, so
Third: look at using semaphores (sem_init/sem_wait/sem_post) for synchronization between your threads if you are trying to send a "message" from one thread to the other.
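For example, a hedged sketch of that message-passing pattern using a POSIX semaphore (the message variable and the 42 payload are just illustrative):

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

int message;                        /* the shared "mailbox" */
sem_t ready;                        /* counts messages ready for the consumer */

void *producer(void *arg)
{
    (void)arg;
    message = 42;                   /* write the shared memory first */
    sem_post(&ready);               /* then signal that it is ready */
    return NULL;
}

void *consumer(void *arg)
{
    (void)arg;
    sem_wait(&ready);               /* sleeps in the kernel instead of spinning */
    printf("got %d\n", message);    /* sem_wait/sem_post order the accesses */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    sem_init(&ready, 0, 0);         /* not shared between processes, initial count 0 */
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&ready);
    return 0;
}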
As others already said, when communicating between threads through "normal" objects you have to take care of race conditions. Besides mutexes and other lock structures that are relatively heavy weight, the new C standard (C11) provides atomic types and operations that are guaranteed to be race-free. Most modern processors provide instructions for such types and many modern compilers (in particular gcc on linux) already provide their proper interfaces for such operations.
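For example, a minimal sketch using C11 <stdatomic.h> (assuming a C11-capable compiler such as a recent gcc or clang): two threads increment a shared counter with no mutex and no data race.

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

_Atomic int counter = 0;            /* race-free without any mutex */

void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        atomic_fetch_add(&counter, 1);      /* atomic read-modify-write */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%d\n", atomic_load(&counter));  /* always 200000 */
    return 0;
}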
If the threads truly are only one producer and only one consumer, then (barring compiler bugs)
1) marking the variable as volatile, and
2) making sure that it is correctly aligned, so as to avoid interleaved fetches and stores
will allow you to do this without locking.
I have a question on the interplay between condition variables and associated mutex locks (it arose from a simplified example I was presenting in a lecture, confusing myself in the process). Two threads are exchanging data (let's say an int n indicating an array size, and a double *d which points to the array) via shared variables in the memory of their process. I use an additional int flag (initially flag = 0) to indicate when the data (n and d) is ready (flag = 1), a pthread_mutex_t mtx, and a condition variable pthread_cond_t cnd.
This part is from the receiver thread which waits until flag becomes 1 under the protection of the mutex lock, but afterward processes n and d without protection:
while (1) {
    pthread_mutex_lock(&mtx);
    while (!flag) {
        pthread_cond_wait(&cnd, &mtx);
    }
    pthread_mutex_unlock(&mtx);
    // use n and d
}
This part is from the sender thread which sets n and d beforehand without protection by the mutex lock, but sets flag while the mutex is locked:
n = 10;
d = malloc(n * sizeof(double));
pthread_mutex_lock(&mtx);
flag = 1;
pthread_cond_signal(&cnd);
pthread_mutex_unlock(&mtx);
It is clear that you need the mutex in the sender since otherwise you have a "lost wakeup call" problem (see https://stackoverflow.com/a/4544494/3852630).
My question is different: I'm not sure which variables have to be set (in the sender thread) or read out (in the receiver thread) inside the region protected by the mutex lock, and which variables don't need to be protected by the mutex lock. Is it sufficient to protect flag on both sides, or do n and d also need protection?
Memory visibility (see the rule below) between sender and receiver should be guaranteed by the call to pthread_cond_signal(), so the necessary pairwise memory barriers should be there (in combination with pthread_cond_wait()).
I'm aware that this is an unusual case. Typically my applications modify a task list in the sender and pop tasks from the list in the receiver, and the associated mutex lock protects the list operations on both sides. However, I'm not sure what would be necessary in the case above. Could the danger be that the compiler (which is not aware of the concurrent access to the variables) optimizes away the write and/or read access to the variables? Are there other problems if n and d are not protected by the mutex?
Thanks for your help!
David R. Butenhof: Programming with POSIX Threads, p. 89: "Whatever memory values a thread can see when it signals ... a condition variable can also be seen by any thread that is awakened by that signal ...".
At the low level of memory ordering, the rules aren't specified in terms of "the region protected by a mutex", but in terms of synchronization. Whenever two different threads access the same non-atomic object, then unless they are both reads, there must be a synchronization operation in between, to ensure that one of the accesses definitely happens before the other.
The way to achieve synchronization is to have one thread perform a release operation (such as unlocking a mutex) after accessing the shared variables, and have the other thread perform an acquire operation (such as locking a mutex) before accessing them, in such a way that program logic guarantees the acquire must have happened after the release.
And here, you have that. The sender thread does perform a mutex unlock after accessing n and d (last line of the sender code). And the receiver does perform a mutex lock before accessing them (inside pthread_cond_wait). The setting and testing of flag ensures that, when we exit the while (!flag) loop, the most recent lock of the mutex by the receiver did happen after the unlock by the sender. So synchronization is achieved.
The compiler and CPU must not perform any optimization that would defeat this, so in particular they can't optimize away the accesses to n and d, or reorder them around the synchronizing operations. This is usually ensured by treating the release/acquire operations as barriers. Any accesses to potentially shared objects that occur in program order before a release barrier must actually be performed and flushed to coherent memory/cache prior to anything that comes after the barrier (in this case, before any other thread may see the mutex as unlocked). If special CPU instructions are needed to ensure global visibility, the compiler must emit them. And vice versa for acquire barriers: any access that occurs in program order after an acquire barrier must not be reordered before it.
To say it another way, the compiler treats the release barrier as an operation that may potentially read all of memory; so all variables must be written out before that point, so that the actual contents of memory at that point will match what an abstract machine would have. Likewise, an acquire barrier is treated as an operation that may potentially write all of memory, and all variables must be reloaded from memory afterwards. The only exception would be local variables for which the compiler can prove that no other thread could legally know their address; those can be safely kept in registers or otherwise reordered.
It's true that, after the synchronizing lock operation, the receiver happened to unlock the mutex again, but that isn't relevant here. That unlock doesn't synchronize with anything in this particular program, and it has no impact on its execution. Likewise, for purposes of synchronizing access to n and d, it didn't matter whether the sender locked the mutex before or after accessing them. (Though it was important that the sender locked the mutex before writing to flag; that's how we ensure that any earlier reads of flag by the receiver really did happen before the write, instead of racing with it.)
The principle that "accesses to shared variables should be inside a critical region protected by a mutex" is just a higher-level abstraction that is one way to ensure that accesses by different threads always have a synchronizing unlock-lock pair in between them. And in cases where the variables could be accessed over and over again, you normally would want a lock before, and an unlock after, every such access, which is equivalent to the "critical section" principle. But this principle is not itself the fundamental rule.
That said, in real life you probably do want to follow this principle as much as possible, since it will make it easier to write correct code and avoid subtle bugs, and likewise make it easier for other programmers to verify that your code is correct. So even though it is not strictly necessary in this program for the accesses to n,d to be "protected" by the mutex, it would probably be wise to do so anyway, unless there is a significant and measurable benefit (performance or otherwise) to be gained by avoiding it.
The condition variable doesn't play a role in the race avoidance here, except insofar as the pthread_cond_wait locked the mutex. Functionally, it is equivalent to having the receiver simply do a tight spin loop of "lock mutex; test flag; unlock mutex", but without wasting all those CPU cycles.
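For illustration, a hypothetical polling version of the receiver from the question (same mtx and flag), functionally equivalent but wasteful:

for (;;) {
    pthread_mutex_lock(&mtx);
    int ready = flag;               /* read flag under the mutex */
    pthread_mutex_unlock(&mtx);
    if (ready)
        break;                      /* flag was set; n and d are now visible */
    /* otherwise spin and try again, burning CPU */
}
// use n and d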
And I think that the quote from Butenhof is mistaken, or at best misleading. My understanding is that pthread_cond_signal by itself is not guaranteed to be a barrier of any kind, and in fact has no memory ordering effect whatsoever. POSIX doesn't directly address memory ordering, but this is the case for the standard C equivalent cnd_signal. There would be no point in having pthread_cond_signal ensure global visibility unless you could make use of it by assuming that all those accesses were visible by the time the corresponding pthread_cond_wait returns. But since pthread_cond_wait can wake up spuriously, you cannot ever safely make such an assumption.
In this program, the necessary release barrier in the sender comes not from pthread_cond_signal, but rather from the subsequent pthread_mutex_unlock.
Is it sufficient to protect flag on both sides, or do n and d also need protection?
In principle, if you use your mutex, CV, and flag in such a way that the writer does not modify n, d, or *d after setting flag under mutex protection, and the reader cannot access n, d, or *d until after it observes the modification of flag (under protection of the same mutex), then you can rely on the reader observing the writer's last-written values of n, d, and *d. This is more or less a hand-rolled semaphore.
In practice, you should use whichever system synchronization objects you have chosen (mutexes, semaphores, etc.) to protect all the shared data. Doing so is easier to reason about and less prone to spawn bugs. Often, it's simpler, too.
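For the example in the question, following that advice could look like the sketch below, with the writes and reads of n and d moved inside the critical sections (local_n and local_d are just illustrative local copies):

Sender:

pthread_mutex_lock(&mtx);
n = 10;
d = malloc(n * sizeof(double));
flag = 1;
pthread_cond_signal(&cnd);
pthread_mutex_unlock(&mtx);

Receiver:

pthread_mutex_lock(&mtx);
while (!flag)
    pthread_cond_wait(&cnd, &mtx);
int local_n = n;                    /* copy the shared data while holding the lock */
double *local_d = d;
pthread_mutex_unlock(&mtx);
/* use local_n and local_d */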
I am new to thread programming. I know that mutexes are used to protect access to shared data in a multi-threaded program.
Suppose I have one thread with variable a and a second one with the pointer variable p that holds the address of a. Is the code safe if, in the second thread, I lock a mutex before I modify the value of a using the pointer variable? From my understanding it is safe.
Can you confirm? And also can you provide the reason why it is true or why it is not true?
I am working with C and pthreads.
The general rule when doing multithreading is that shared variables among threads that are read and written need to be accessed serially, which means that you must use some sort of synchronization primitive. Mutexes are a popular choice, but no matter what you end up using, you just need to remember that before reading from or writing to a shared variable, you need to acquire a lock to ensure consistency.
So, as long as every thread in your code agrees to always use the same lock before accessing a given variable, you're all good.
Now, to answer your specific questions:
Is the code safe if, in the second thread, I lock a mutex before I modify the value of a using the pointer variable?
It depends. How do you read a on the first thread? The first thread needs to lock the mutex too before accessing a in any way. If both threads lock the same mutex before reading or writing the value of a, then it is safe.
It's safe because the region of code between the mutex lock and unlock is exclusive (as long as every thread respects the rule that before doing Y, they need to acquire lock X), since only one thread at a time can have the lock.
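A minimal sketch of that arrangement (the thread bodies and the value 42 are made up for illustration):

#include <pthread.h>

int a = 0;                          /* the variable from the question */
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void *thread1(void *arg)            /* reads a directly */
{
    (void)arg;
    pthread_mutex_lock(&m);
    int local = a;                  /* safe: the writer cannot be mid-update here */
    pthread_mutex_unlock(&m);
    (void)local;
    return NULL;
}

void *thread2(void *arg)            /* writes a through the pointer p */
{
    int *p = arg;                   /* p holds the address of a */
    pthread_mutex_lock(&m);
    *p = 42;                        /* safe: same mutex as the reader */
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t2, NULL, thread2, &a);
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}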
As for this comment:
And if the mutex is locked before p is used, then both a and p are protected? The conclusion being that every memory reference present in a section where a mutex is locked is protected, even if the memory is indirectly referenced?
Mutexes don't protect memory regions or references, they protect a region of code. Whatever you do between locking and unlocking is exclusive; that's it. So, if every thread accessing or modifying either of a or p locks the same mutex before and unlocks afterwards, then as a side effect you have synchronized the accesses.
TL;DR Mutexes allow you to write code that never executes in parallel, you get to choose what that code does - a remarkably popular pattern is to access and modify shared variables.
I have an object (kind of a queue) which is accessed across threads. The queue object can be mutex-locked before it is used by either thread.
A simpler way to manage this is by bringing the lock inside the queue object itself - hence every API will lock the queue and release when the work is done. This way, threads don't have to manage additional mutex variables along with each queue.
Now my question is: sometimes there is only one thread accessing the queue (say it is a local variable). But since the queue now inherently locks its internal data structure and unlocks it before returning, will this be a costly affair?
How costly is the redundant mutex_lock and mutex_unlock operation - when there is no specific need of thread synchronization?
PS:
My question is slightly related to this one: How efficient is locking an unlocked mutex? What is the cost of a mutex?
But I am looking for a specific answer for my design and an understanding of why.
I am using C and the pthread library.
One way to handle this is to have your queue initialization take a parameter that indicates whether a lock should be acquired or not during queue operations. If a queue is being used by a single thread, it gets initialized such that it won't acquire/release locks (or uses a lock object where the acquire/release operations are nops).
See this answer for an example of how boost::pool does something along these lines (although in C++ and as a compile time configuration): https://stackoverflow.com/a/10188784/12711
A similar concept can be applied to C code at runtime, too.
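A hypothetical sketch of that runtime approach in C -- the queue_t, queue_init, and queue_push names are made up for illustration:

#include <pthread.h>
#include <stdbool.h>

typedef struct {
    bool use_lock;                  /* false when only one thread uses the queue */
    pthread_mutex_t lock;
    int items[64];                  /* trivial storage just for the sketch */
    int count;
} queue_t;

void queue_init(queue_t *q, bool use_lock)
{
    q->use_lock = use_lock;
    q->count = 0;
    if (use_lock)
        pthread_mutex_init(&q->lock, NULL);
}

void queue_push(queue_t *q, int value)
{
    if (q->use_lock)
        pthread_mutex_lock(&q->lock);
    if (q->count < 64)
        q->items[q->count++] = value;
    if (q->use_lock)
        pthread_mutex_unlock(&q->lock);
}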
First of all: neither the C library nor pthreads implements mutex locking itself - they call into the kernel to use OS primitives for that. This implies that the performance of mutexes will vary wildly with the underlying OS.
If you can reduce your portability spectrum to hardware supporting atomic compare-exchange or atomic increase-and-read (such as any x86 from this millennium) you can use atomic increase-and-read to create a threadsafe queue that does not need locking.
For the .Net platform I have such a beast at http://sourceforge.net/projects/dotnetlockless - it should be quite easy to port it to C.
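For C, a minimal sketch of one well-known lock-free variant -- a single-producer/single-consumer ring buffer using C11 acquire/release atomics; the spsc_queue name and fixed capacity are made up for illustration:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 256                    /* capacity, must be a power of two */

typedef struct {
    _Atomic size_t head;            /* next slot to write; only the producer advances it */
    _Atomic size_t tail;            /* next slot to read; only the consumer advances it */
    int buf[QCAP];
} spsc_queue;

bool spsc_push(spsc_queue *q, int value)    /* producer side */
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == QCAP)
        return false;                       /* queue is full */
    q->buf[head % QCAP] = value;
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

bool spsc_pop(spsc_queue *q, int *out)      /* consumer side */
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (head == tail)
        return false;                       /* queue is empty */
    *out = q->buf[tail % QCAP];
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

Each index is only ever advanced by one side, which is what makes the acquire/release pairing sufficient; with multiple producers or consumers you would need compare-exchange loops instead.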
In a comment on the question Automatically release mutex on crashes in Unix back in 2010, jilles claimed:
glibc's robust mutexes are so fast because glibc takes dangerous shortcuts. There is no guarantee that the mutex still exists when the kernel marks it as "will cause EOWNERDEAD". If the mutex was destroyed and the memory replaced by a memory mapped file that happens to contain the last owning thread's ID at the right place and the last owning thread terminates just after writing the lock word (but before fully removing the mutex from its list of owned mutexes), the file is corrupted. Solaris and will-be-FreeBSD9 robust mutexes are slower because they do not want to take this risk.
I can't make any sense of the claim, since destroying a mutex is not legal unless it's unlocked (and thus not in any thread's robust list). I also can't find any references searching for such a bug/issue. Was the claim simply erroneous?
The reason I ask and that I'm interested is that this is relevant to the correctness of my own implementation built upon the same Linux robust-mutex primitive.
I think I found the race, and it is indeed very ugly. It goes like this:
Thread A has held the robust mutex and unlocks it. The basic procedure is:
Put it in the "pending" slot of the thread's robust list header.
Remove it from the linked list of robust mutexes held by the current thread.
Unlock the mutex.
Clear the "pending" slot of the thread's robust list header.
The problem is that between steps 3 and 4, another thread in the same process could obtain the mutex, then unlock it, and (rightly) believing itself to be the final user of the mutex, destroy and free/munmap it. After that, if any thread in the process creates a shared mapping of a file, device, or shared memory and it happens to get assigned the same address, and the value at that location happens to match the pid of the thread that's still between steps 3 and 4 of unlocking, you have a situation whereby, if the process is killed, the kernel will corrupt the mapped file by setting the high bit of a 32-bit integer it thinks is the mutex owner id.
The solution is to hold a global lock on mmap/munmap between steps 2 and 4 above, exactly the same as in my solution to the barrier issue described in my answer to this question:
Can a correct fail-safe process-shared barrier be implemented on Linux?
The description of the race by FreeBSD pthread developer David Xu: http://lists.freebsd.org/pipermail/svn-src-user/2010-November/003668.html
I don't think the munmap/mmap cycle is strictly required for the race. The piece of shared memory might be put to a different use as well. This is uncommon but valid.
As also mentioned in that message, more "fun" occurs if threads with different privilege access a common robust mutex. Because the node for the list of owned robust mutexes is in the mutex itself, a thread with low privilege may corrupt a high privilege thread's list. This could be exploited easily to make the high privilege thread crash and in rare cases this might allow the high privilege thread's memory to be corrupted. Apparently Linux's robust mutexes are only designed for use by threads with the same privileges. This could have been avoided easily by making the robust list an array fully in the thread's memory instead of a linked list.
I am developing a user-level thread library as part of a project. I came up with an approach to implement mutexes, and I would like to hear your views before going on with it. Basically, I need to implement just 3 functions in my library
mutex_init, mutex_lock and mutex_unlock
I thought my mutex_t structure would look something like
typedef struct
{
    int available;              // indicates whether the mutex is locked or unlocked
    queue listofwaitingthreads;
    gtthread_t owningthread;
} mutex_t;
In my mutex_lock function, I will first check in a while loop whether the mutex is available. If it is not, I will yield the processor so the next thread can execute.
In my mutex_unlock function, I will check if the owning thread is the current thread. If it is, I will set available to 0.
Is this the way to go about it? Also, what about deadlock? Should I take care of those conditions in my user-level library, or should I leave it to application programmers to write code properly?
This won't work, because you have a race condition. If 2 threads try to take the lock at the same time, both will see available == 0, and both will think they succeeded in taking the mutex.
If you want to do this properly, and without using an already-existing lock, you must use hardware atomic operations like TAS (test-and-set), CAS (compare-and-swap), etc.
There are algorithms that give you mutual exclusion without such hardware support, but they make some assumptions that are often false in practice. For more details, I highly recommend reading Herlihy and Shavit's The Art of Multiprocessor Programming, chapter 7.
You shouldn't worry about deadlocks at this level - mutex locks should be simple enough, and there is an assumption that the programmer using them takes care not to cause deadlocks (advanced mutexes can check for self-deadlock, meaning a thread that calls lock twice without calling unlock in between).
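For reference, a minimal sketch of a TAS-based spinlock built on C11 atomics (the tas_lock_t name is made up for illustration; atomic_flag_test_and_set is the test-and-set primitive):

#include <stdatomic.h>
#include <sched.h>

typedef struct {
    atomic_flag locked;
} tas_lock_t;

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

void tas_lock(tas_lock_t *l)
{
    /* test-and-set: atomically set the flag and return its previous value */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        sched_yield();              /* lock is taken; let the owner run */
}

void tas_unlock(tas_lock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}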
Not only do you have to use atomic operations to read and modify the flag (as Eran pointed out), you also have to make sure that your queue can handle concurrent accesses. This is not completely trivial - a sort of chicken-and-egg problem.
But if you'd really implement this by spinning, you wouldn't even need such a queue. The access order to the lock would then be essentially random, though.
Probably just yielding would also not be enough; it can be quite costly if you have threads holding the lock for more than a few processor cycles. Consider using nanosleep with a low time value for the wait.
In general, a mutex implementation should look like:
Lock:
    while (trylock() == failed) {
        atomic_inc(waiter_cnt);
        atomic_sleep_if_locked();
        atomic_dec(waiter_cnt);
    }

Trylock:
    return atomic_swap(&lock, 1);

Unlock:
    atomic_store(&lock, 0);
    if (waiter_cnt) wakeup_sleepers();
Things get more complex if you want recursive mutexes, mutexes that can synchronize their own destruction (i.e. freeing the mutex is safe as soon as you get the lock), etc.
Note that atomic_sleep_if_locked and wakeup_sleepers correspond to FUTEX_WAIT and FUTEX_WAKE ops on Linux. The other atomics are probably CPU instructions, but could be system calls or kernel-assisted userspace function code, as in the case of Linux/ARM and the 0xffff0fc0 atomic compare-and-swap call.
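As a rough Linux-only sketch of that pseudocode (a hedged illustration using the raw futex syscall and C11 atomics, not production code; the my_mutex name is made up):

#define _GNU_SOURCE
#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

typedef struct {
    atomic_int lock;                /* 0 = unlocked, 1 = locked */
    atomic_int waiters;             /* threads sleeping (or about to sleep) */
} my_mutex;

int my_trylock(my_mutex *m)
{
    /* atomic_exchange is the "atomic_swap(&lock, 1)" from the pseudocode;
       it returns the old value, so nonzero means the lock was already held */
    return atomic_exchange_explicit(&m->lock, 1, memory_order_acquire);
}

void my_lock(my_mutex *m)
{
    while (my_trylock(m)) {
        atomic_fetch_add(&m->waiters, 1);
        /* FUTEX_WAIT sleeps only if the lock word still equals 1,
           so a wakeup between the check and the sleep is not lost */
        syscall(SYS_futex, &m->lock, FUTEX_WAIT, 1, NULL, NULL, 0);
        atomic_fetch_sub(&m->waiters, 1);
    }
}

void my_unlock(my_mutex *m)
{
    atomic_store_explicit(&m->lock, 0, memory_order_release);
    if (atomic_load(&m->waiters))
        syscall(SYS_futex, &m->lock, FUTEX_WAKE, 1, NULL, NULL, 0);
}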
You do not need atomic instructions for a user level thread library, because all the threads are going to be user level threads of the same process. So actually when your process is given the time slice to execute, you are running multiple threads during that time slice but on the same processor. So, no two threads are going to be in the library function at the same time. Considering that the functions for mutex are already in the library, mutual exclusion is guaranteed.