Setting global variables in threads - C

I need to have a string as a global variable, and multiple threads may set it. Do I have to use a mutex for this, or will the OS handle such actions?
Using a mutex affects the application's performance.
I am not concerned about the order in which the writes happen; I am only afraid of data corruption.
Could somebody advise me on this?

It sounds like you understand all of the concerns. If the global variable can be corrupted, you definitely need to protect it with a mutex. This will affect performance, since this part of the code is by definition now synchronous. That being said, you will want to lock the smallest section of code necessary, to minimize the time spent in the synchronized region.

What's your global variable? A pointer to the string buffer, or the buffer itself?
On many architectures (including AFAIR 32-bit x86) overwriting a single pointer is atomic.
This example might work:
#include <string.h>  /* strdup (POSIX) */

char * volatile global_var;  /* volatile qualifies the pointer itself */

void set_var(const char *str) {
    char *tmp = strdup(str);  /* private copy of the caller's string */
    global_var = tmp;         /* a single pointer store; note the old string leaks */
}

You can use Thread-Local Storage for this.
Unfortunately, it's not specified in the current C99 standard, but it will possibly be in C1X. For now, you can use compiler-specific implementations (GCC, ICC and Visual C have it).
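A minimal sketch of the compiler-specific route, assuming GCC's __thread extension (MSVC spells it __declspec(thread)); the buffer name is illustrative:

#include <string.h>

__thread char local_str[256];  /* each thread gets its own copy */

void set_local(const char *str) {
    strncpy(local_str, str, sizeof local_str - 1);
    local_str[sizeof local_str - 1] = '\0';
}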

As far as the standards are concerned, yes, you must use a mutex. Failure to do so results in undefined behavior. In practice, most machine architectures will have no problem with this. Future versions of the C standard (C1x) will have atomic types which, if used here, would definitely make the assignment safe without a lock (albeit possibly using an internal lock, on broken archs that lack real atomics).
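For reference, a hedged sketch of what those atomic types eventually became in C11's <stdatomic.h> (not available to the answerer at the time):

#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

static _Atomic(char *) global_str;

void set_var(const char *str) {
    char *copy = strdup(str);
    /* The whole pointer is swapped atomically; readers never see a torn value. */
    char *old = atomic_exchange(&global_str, copy);
    free(old);  /* only safe if no reader can still hold the old pointer */
}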

Related

multithreaded C/C++ variable no cache (Linux)

I use 2 pthreads, where one thread "notifies" the other one of an event, and for that there is a variable (a normal integer) which is set by the second thread.
This works, but my question is: is it possible that the update is not seen immediately by the first (reading) thread, meaning the cache is not updated directly? And if so, is there a way to prevent this behaviour, e.g. something like the volatile keyword in Java?
(The frequency at which the event occurs is approximately in the microsecond range, so a more or less immediate update needs to be enforced.)
Edit: second question: is it possible to enforce that the variable is held in the cache of the core where thread 1 runs, since that thread is reading it all the time?
It sounds to me as though you should be using a pthread condition variable as your signaling mechanism. This takes care of all the issues you describe.
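A minimal sketch of that suggestion, with illustrative names and error checking omitted:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static bool event_fired = false;

void notify(void) {                /* called by the writing thread */
    pthread_mutex_lock(&m);
    event_fired = true;
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);
}

void wait_for_event(void) {        /* called by the reading thread */
    pthread_mutex_lock(&m);
    while (!event_fired)           /* loop guards against spurious wakeups */
        pthread_cond_wait(&c, &m);
    event_fired = false;
    pthread_mutex_unlock(&m);
}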
It may not be immediately visible to the other processors, but not because of cache coherence. The biggest visibility problems will be due to your processor's out-of-order execution schemes or due to your compiler re-ordering instructions while optimizing.
In order to avoid both these problems, you have to use memory barriers. I believe that most pthread primitives are natural memory barriers, which means that you shouldn't expect loads or stores to be moved beyond the boundaries formed by the lock and unlock calls. The volatile keyword can also be useful to disable a certain class of compiler optimizations, which can help when writing lock-free algorithms, but it's not a substitute for memory barriers.
That being said, I recommend you don't do this manually; there are quite a few pitfalls associated with lock-free algorithms. Leaving these headaches to library writers should make you a happier camper (unless you're like me and you love headaches :) ). So my final recommendation is to ignore everything I said and use what vromanov or David Heffman suggested.
The most appropriate way to pass a signal from one thread to another should be to use the runtime library's signalling mechanisms, such as mutexes, condition variables, semaphores, and so forth.
If these have too high an overhead, my first thought would be that there was something wrong with the structure of the program. If it turned out that this really was the bottleneck, and restructuring the program was inappropriate, then I would use atomic operations provided by the compiler or a suitable library.
Using plain int variables, or even volatile-qualified ones is error prone, unless the compiler guarantees they have the appropriate semantics. e.g. MSVC makes particular guarantees about the atomicity and ordering constraints of plain loads and stores to volatile variables, but gcc does not.
A better way is to use atomic variables; for example, you can use libatomic. The volatile keyword is not enough.
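A hedged sketch of the atomic-flag approach, using GCC's __atomic builtins (one possible spelling; libatomic and C11 atomics are equivalent):

static int event_flag;

void signal_event(void) {
    __atomic_store_n(&event_flag, 1, __ATOMIC_RELEASE);
}

int poll_event(void) {
    return __atomic_load_n(&event_flag, __ATOMIC_ACQUIRE);
}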

C99: Restricted Pointers to Document Thread Safety?

This question isn't about the technical usage of restrict, but about its subjective usage. Although I might be mistaken as to how restrict technically works, in which case you should feel free to grill me for basing a question on a false premise.
Here are two examples of how I'm using restrict so far:
If I have a function that takes a pointer to a sequence of immutable chars, I don't mark it restrict, since other people are allowed to access the data via their own pointers at the same time the function is executing, e.g. from another parallel thread. The data isn't being modified, so no problem.
However, if the function takes a pointer to a sequence of mutable chars that it might modify, I mark it restrict, because the data absolutely should not be accessed in any way through any pointer (bar the argument the function uses, obviously) during the execution of the function, due to potentially inconsistent data. It also signals the possibility of the data being modified, so the coder knows not to read stale data and to use a memory barrier when accessing it, or whatever...
I don't code much C, so I could easily be wrong about what I'm assuming here. Is this correct usage of restrict? Is it worth doing in this scenario?
I'm also assuming that once the restrict-qualified pointer is popped off the stack when the function returns, the data can then freely be accessed via any other pointer again, and that the restriction only lasts as long as that pointer does. I know that this relies on the coder following the rules, since accessing restrict-qualified data via an 'unofficial' pointer is UB.
Have I got all of this right?
EDIT:
I'd just like to make clear that I already know it does absolutely nothing to prevent the users from accessing data via multiple threads, and I also know that C89 has no knowledge of what 'threads' even are.
But in any context where an argument can be modified via a reference, it's clear that it mustn't be accessed while the function is running. This doesn't do anything to enforce thread safety, but it does clearly document that if you modify the data through your own pointer during the execution of the function, you do so at your own risk.
Even if threading is taken completely out of the equation, you still allow for further optimizations in a scenario where it seems correct to me.
Even so, thanks for all your authoritative answers so far. Do I upvote all the answers that I liked, or just the one that I accept? What if more than one is accepted? Sorry, I'm new here, I'll look through the FAQ more thoroughly now...
restrict has nothing to do with thread safety. In fact, the existing C standards have nothing to say on the topic of threads at all; from the point of view of the spec, there is no such thing as a "thread".
restrict is a way to inform the compiler about aliasing. Pointers often make it hard for the compiler to generate efficient code, because the compiler cannot know at compile time whether two pointers actually refer to the same memory. Toy example:
#include <stdio.h>

void foo(int *x, int *y) {
    *x = 5;
    *y = 7;
    printf("%d\n", *x);
}
When the compiler processes this function, it has no idea whether x and y refer to the same memory location. Therefore it does not know whether it will print 5 or 7, and it has to emit code to actually read *x before calling printf.
But if you declare x as int *restrict x, the compiler can prove that this function prints 5, so it can feed a compile-time constant to the printf call.
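That is, the qualified version of the toy example looks like this:

#include <stdio.h>

void foo(int *restrict x, int *restrict y) {
    *x = 5;
    *y = 7;
    printf("%d\n", *x);  /* may now be folded to a constant 5 */
}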
Many such optimizations become possible with restrict, especially when you are talking about operations on arrays.
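For instance, a hedged sketch of the array case (an illustrative function of my own, not from the answer): with restrict the compiler may vectorize the loop without emitting run-time overlap checks.

#include <stddef.h>

void add_arrays(float *restrict dst, const float *restrict a,
                const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];  /* no aliasing assumed among dst, a, b */
}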
But none of this has anything to do with threads. To get multi-threaded applications right, you need proper synchronization primitives like mutexes, condition variables, and memory barriers... all of which are specific to your platform, and none of which have anything to do with restrict.
[edit]
To answer your question about using restrict as a form of documentation, I would say "no" to that, also.
You seem to be thinking that you should document when a variable can or cannot be concurrently accessed. But for shared data, the proper discipline is (almost always) to ensure that it is never concurrently accessed.
The relevant documentation for a variable is whether it is shared at all and which mutex protects it. Any code accessing that variable for any reason, even just to read it, needs to hold the mutex. The author should neither know nor care whether some other thread might or might not happen to be accessing the variable concurrently... Because that other thread will be obeying the same discipline, and the mutex guarantees there is no concurrent access.
This discipline is simple, it works, and it scales, which is why it is one of the dominant paradigms for sharing data between threads. (The other is message passing.) If you ever find yourself trying to reason "do I really need to lock the mutex this time?", you are almost certainly doing something wrong. It would be hard to overstate this point.
No, I don't think this is a good idiom for conveying any information about access from different threads. restrict is meant as an assertion about the aliasing of the different pointers that a particular piece of code handles. Threading is not part of the language, yet, and thread-safe access to data needs much more; restrict is not the right tool, and neither is volatile, which you sometimes see proposed as well. What you need are:
mutexes or other lock structures
memory fences that ensure data integrity
atomic operations and types
The upcoming standard C1x is supposed to provide such constructs. C99 doesn't.
The canonical example of restrict is memcpy(), for which the manpage on OS X 10.6 gives the prototype:
void* memcpy(void *restrict s1, const void *restrict s2, size_t n);
The source and destination regions in memcpy are not permitted to overlap; labelling them restrict expresses this restriction: the compiler can assume that no part of the source array aliases the destination, so it can do things like read a large chunk of the source and then write it into the destination.
Essentially, restrict is about assisting compiler optimizations by tagging pointers as not aliasing. In and of itself it doesn't help with thread safety; it doesn't automatically cause the object pointed to to be locked while the function is called.
See How to use the restrict qualifier in C and the Wikipedia article on restrict for a more lengthy discussion.
restrict is a hint to the compiler that the buffer accessed via a pointer is not aliased via another pointer in scope. So if you have a function like:
void foo(char *restrict datai, char *restrict dataj)
{
    /* We've "promised" the compiler that the datai and dataj buffers do not
       overlap. This gives the compiler the opportunity to generate "better"
       code, potentially mitigating load-store issues, amongst other things. */
}
Not using restrict is not enough to provide guarded access in a multi-threaded application, though. If you have a shared buffer simultaneously accessed by multiple threads via char * parameters in a read/write way, you will potentially need some kind of lock/mutex etc.; the absence of restrict does not imply thread safety.
Hope this helps.

Following pointers in a multithreaded environment

If I have some code that looks something like:
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    bool some_flag;
    pthread_cond_t c;
    pthread_mutex_t m;
} foo_t;

// I assume the mutex has already been locked, and will be unlocked
// some time after this function returns. For clarity. Definitely not
// out of laziness ;)
void check_flag(foo_t* f) {
    while(f->some_flag)
        pthread_cond_wait(&f->c, &f->m);
}
Is there anything in the C standard preventing an optimizer from rewriting check_flag as:
void check_flag(foo_t* f) {
    bool cache = f->some_flag;
    while(cache)
        pthread_cond_wait(&f->c, &f->m);
}
In other words, does the generated code have to follow the f pointer every time through the loop, or is the compiler free to pull the dereference out?
If it is free to pull it out, is there any way to prevent this? Do I need to sprinkle a volatile keyword somewhere? It can't be check_flag's parameter because I plan on having other variables in this struct that I don't mind the compiler optimizing like this.
Might I have to resort to:
void check_flag(foo_t* f) {
    volatile bool* cache = &f->some_flag;
    while(*cache)
        pthread_cond_wait(&f->c, &f->m);
}
In the general case, even if multi-threading wasn't involved and your loop looked like:
void check_flag(foo_t* f) {
    while(f->some_flag)
        foo(&f->c, &f->m);
}
the compiler would be unable to cache the f->some_flag test. That's because the compiler can't know whether or not a function (like foo() above) might change whatever object f is pointing to.
Under special circumstances (foo() is visible to the compiler, and all pointers passed to check_flag() are known not to be aliased or otherwise modifiable by foo()) the compiler might be able to optimize the check.
However, pthread_cond_wait() must be implemented in a way that would prevent that optimization.
See Does guarding a variable with a pthread mutex guarantee it's also not cached?:
You might also be interested in Steve Jessop's answer to: Can a C/C++ compiler legally cache a variable in a register across a pthread library call?
But how far you want to take the issues raised by Boehm's paper in your own work is up to you. As far as I can tell, if you want to take the stand that pthreads doesn't/can't make the guarantee, then you're in essence taking the stand that pthreads is useless (or at least provides no safety guarantees, which I think by reduction has the same outcome). While this might be true in the strictest sense (as addressed in the paper), it's also probably not a useful answer. I'm not sure what option you'd have other than pthreads on Unix-based platforms.
Normally, you should lock the pthread mutex before waiting on the condition object, since the pthread_cond_wait call releases the mutex (and reacquires it before returning). So your check_flag function should be rewritten like this to conform to the semantics of pthread conditions:
void check_flag(foo_t* f) {
    pthread_mutex_lock(&f->m);
    while(f->some_flag)
        pthread_cond_wait(&f->c, &f->m);
    pthread_mutex_unlock(&f->m);
}
Concerning the question of whether the compiler is allowed to optimize the reading of the flag field, this answer explains it in more detail than I can.
Basically, the compiler knows about the semantics of pthread_cond_wait, pthread_mutex_lock and pthread_mutex_unlock. It knows that it can't optimize memory reads in those situations (the call to pthread_cond_wait in this example). There is no notion of memory barrier here, just special knowledge of certain functions, and some rules to follow in their presence.
There is another thing protecting you from optimizations performed by the processor. Your average processor is capable of reordering memory accesses (reads/writes) provided that the semantics are preserved, and it does this all the time (as it allows increased performance). However, this breaks down when more than one processor can access the same memory address. A memory barrier is an instruction to the processor telling it that it may not move the reads and writes issued before the barrier past it; they must all be finished before execution proceeds beyond the barrier.
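As an aside, and purely as a sketch with GCC/Clang-specific constructs (the pthread calls above already imply equivalents), here is what the two kinds of barrier look like when written out by hand:

void barrier_examples(void) {
    __asm__ __volatile__("" ::: "memory");    /* compiler barrier only */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);  /* full compiler + CPU fence */
}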
As written, the compiler is free to cache the result as you describe or even in a more subtle way - by putting it into a register. You can prevent this optimization from taking place by making the variable volatile. But that is not necessarily enough - you should not code it this way! You should use condition variables as prescribed (lock, wait, unlock).
Trying to work around the library is bad, but it gets worse. Perhaps reading Hans Boehm's paper on the general topic from PLDI 2005 ("Threads Cannot be Implemented as a Library"), or many of his follow-on articles (which led up to the work on a revised C++ memory model), will put the fear of God in you and steer you back to the straight and narrow :).
Volatile is for this purpose. Relying on the compiler to know about pthread coding practices seems a little nuts to me, though; compilers are pretty smart these days. In fact, the compiler probably sees that you are looping to test a variable and won't cache it in a register for that reason, not because it sees you using pthreads. Just use volatile if you really care.
Kind of a funny little note: we have a VOLATILE #define that is either "volatile" (when we think the bug can't possibly be our code...) or blank. When we think we have a crash due to the optimizer killing us, we #define it to "volatile", which puts volatile in front of almost everything. We then test to see if the problem goes away. So far... the bugs have been the developer and not the compiler! Who'd have thought!? We have developed a high-performance "non-locking" and "non-blocking" threading library. We have a test platform that hammers it to the point of thousands of races per second. So far, we have never detected a problem needing volatile! So far gcc has never cached a shared variable in a register. Yeah... we are surprised too. We are still waiting for our chance to use volatile!

Why doesn't POSIX mmap return a volatile void*?

mmap returns a void*, not a volatile void*. If I'm using mmap to map shared memory, then another process could be writing to that memory, which means two subsequent reads from the same memory location can yield different values: the exact situation volatile is meant for. So why doesn't it return a volatile void*?
My best guess is that if you have a process that's exclusively writing to the shared memory segment, it doesn't need to look at the shared memory through volatile pointers because it will always have the right understanding of what's present; any optimizations the compiler does to prevent redundant reads won't matter since there is nothing else writing and changing the values under its feet. Or is there some other historical reason? I'm inclined to say returning volatile void* would be a safer default, and those wanting this optimization could then manually cast to void*.
POSIX mmap description: http://opengroup.org/onlinepubs/007908775/xsh/mmap.html
Implementing shared memory is only one small subset of the uses of mmap(). In fact the most common uses are creating private mappings, both anonymous and file-backed. This means that, even if we accepted your contention about requiring a volatile-qualified pointer for shared memory access, such a qualifier would be superfluous in the general case.
Remember that you can always add qualifiers to the pointed-to type without casting, but you can't remove them. So, with the current mmap() declaration, you can do both this:
volatile char *foo = mmap(); /* I need volatile */
and this:
char *bar = mmap(); /* But _I_ do not */
With your suggestion, the users in the common case would have to cast the volatile away.
The deeply-held assumption running through many software systems is that most programmers are sequential programmers. This has only recently started to change.
mmap has dozens of uses not related to shared memory. In the event that a programmer is writing a multithreaded program, they must take their own steps to ensure safety. Protecting each variable with a mutex is not the default. Likewise, mmap does not assume that another thread will make contentious accesses to the same shared-memory segment, or even that a segment so mapped will be accessible by another thread.
I'm also unconvinced that marking the return of mmap as volatile will have an effect on this. A programmer would still have to ensure safety in access to the mapped region, no?
Volatile would only cover a single read (which, depending on the architecture, might be 32 bits or something else) and is thus quite limiting. Often you'll need to write more than one machine word, and then you'll have to introduce some sort of locking anyway.
Even if it were volatile, you could easily have two processes reading different values from the same memory; all it takes is a third process writing to the memory in the nanosecond between the read from the first process and the read from the second process (unless you can guarantee that the two processes read the same memory within almost exactly the same clock cycles).
Thus it's pretty useless for mmap() to try to deal with these things; it is better left up to the programmer how to deal with access to the memory and to mark the pointer as volatile where needed. If the memory is shared, you will need all parties involved to be cooperative and aware of how they may update the memory in relation to each other, which is out of the scope of mmap, and something volatile will not solve.
I don't think volatile does what you think it does.
Basically, it just tells the compiler not to optimize the variable by storing its value in a register. This forces it to retrieve the value each time you reference it, which is a good idea if another thread (or whatever) could have updated it in the interim.
The function returns a void*, but that pointer value itself is not going to be updated behind your back, so making the pointer volatile is meaningless. Even if you assigned the value to a local volatile void*, nothing would be gained.
The type volatile void * is nonsensical anyway: you cannot dereference a void *, so a qualifier on the pointed-to type has nothing to apply to.
And since you need a cast to char * or whatever your data type anyway, perhaps that is the right place to specify volatility. Thus the API as defined nicely side-steps the responsibility of marking the memory as changeable-under-your-feet/volatile.
That said, from a big-picture POV, I agree with you: mmap should have a return type stating that the compiler should not cache this range.
It's probably done that way for performance reasons, providing nothing extra by default. If you know that on your particular architecture writes/reads won't be reordered by the processor, you may not need volatile at all (possibly in conjunction with other synchronization). EDIT: this was just an example; there may be a variety of other cases where you know that you don't need to force a reread every time the memory is accessed.
If you need to ensure that the addresses are read from memory each time they're accessed, cast volatile onto the return value yourself (const_cast in C++, or a plain cast in C).
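For example (a minimal sketch; fd is assumed to be an open shared-memory object):

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

volatile uint32_t *map_shared_u32(int fd, size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return NULL;
    return (volatile uint32_t *)p;  /* add the qualifier at the use site */
}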

Using C/Pthreads: do shared variables need to be volatile?

In the C programming language, with Pthreads as the threading library: do variables/structures that are shared between threads need to be declared as volatile? Assume that they might be protected by a lock or not (barriers perhaps).
Does the pthread POSIX standard have any say about this, is this compiler-dependent or neither?
Edit to add: Thanks for the great answers. But what if you're not using locks? What if you're using barriers, for example? Or code that uses primitives such as compare-and-swap to directly and atomically modify a shared variable...
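(For concreteness, the compare-and-swap style mentioned in the edit might look like this sketch, using a GCC __sync builtin; the builtin choice is my own assumption, not part of the question:)

static int shared;

/* Returns nonzero if shared still held `expected` and was swapped to `desired`. */
int try_update(int expected, int desired) {
    return __sync_bool_compare_and_swap(&shared, expected, desired);
}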
As long as you are using locks to control access to the variable, you do not need volatile on it. In fact, if you're putting volatile on any variable you're probably already wrong.
https://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/
The answer is absolutely, unequivocally, NO. You do not need to use 'volatile' in addition to proper synchronization primitives. Everything that needs to be done are done by these primitives.
The use of 'volatile' is neither necessary nor sufficient. It's not necessary because the proper synchronization primitives are sufficient. It's not sufficient because it only disables some optimizations, not all of the ones that might bite you. For example, it does not guarantee either atomicity or visibility on another CPU.
But unless you use volatile, the compiler is free to cache the shared data in a register for any length of time... if you want your data to be written to be predictably written to actual memory and not just cached in a register by the compiler at its discretion, you will need to mark it as volatile. Alternatively, if you only access the shared data after you have left a function modifying it, you might be fine. But I would suggest not relying on blind luck to make sure that values are written back from registers to memory.
Right, but even if you do use volatile, the CPU is free to cache the shared data in a write posting buffer for any length of time. The set of optimizations that can bite you is not precisely the same as the set of optimizations that 'volatile' disables. So if you use 'volatile', you are relying on blind luck.
On the other hand, if you use synchronization primitives with defined multi-threaded semantics, you are guaranteed that things will work. As a plus, you don't take the huge performance hit of 'volatile'. So why not do things that way?
I think one very important property of volatile is that it makes the variable be written to memory when modified and reread from memory each time it is accessed. The other answers here mix volatile and synchronization, and it is clear from answers other than this one that volatile is NOT a sync primitive (credit where credit is due).
Especially on register-rich machines (i.e., not x86), variables can live for quite long periods in registers, and a good compiler can cache even parts of structures or entire structures in registers. So you should use volatile, but for performance, also copy values to local variables for computation and then do an explicit write-back. Essentially, using volatile efficiently means doing a bit of load-store thinking in your C code.
In any case, you positively have to use some kind of OS-level provided sync mechanism to create a correct program.
For an example of the weakness of volatile, see my Dekker's algorithm example at http://jakob.engbloms.se/archives/65, which proves pretty well that volatile does not work to synchronize.
There is a widespread notion that the keyword volatile is good for multi-threaded programming.
Hans Boehm points out that there are only three portable uses for volatile:
volatile may be used to mark local variables in the same scope as a setjmp whose value should be preserved across a longjmp. It is unclear what fraction of such uses would be slowed down, since the atomicity and ordering constraints have no effect if there is no way to share the local variable in question. (It is even unclear what fraction of such uses would be slowed down by requiring all variables to be preserved across a longjmp, but that is a separate matter and is not considered here.)
volatile may be used when variables may be "externally modified", but the modification in fact is triggered synchronously by the thread itself, e.g. because the underlying memory is mapped at multiple locations.
A volatile sig_atomic_t may be used to communicate with a signal handler in the same thread, in a restricted manner. One could consider weakening the requirements for the sig_atomic_t case, but that seems rather counterintuitive.
If you are multi-threading for the sake of speed, slowing down code is definitely not what you want. For multi-threaded programming, there are two key issues that volatile is often mistakenly thought to address:
atomicity
memory consistency, i.e. the order of a thread's operations as seen by another thread.
Let's deal with (1) first. Volatile does not guarantee atomic reads or writes. For example, a volatile read or write of a 129-bit structure is not going to be atomic on most modern hardware. A volatile read or write of a 32-bit int is atomic on most modern hardware, but volatile has nothing to do with it. It would likely be atomic without the volatile. The atomicity is at the whim of the compiler. There's nothing in the C or C++ standards that says it has to be atomic.
Now consider issue (2). Sometimes programmers think of volatile as turning off optimization of volatile accesses. That's largely true in practice. But that's only the volatile accesses, not the non-volatile ones. Consider this fragment:
volatile int Ready;
int Message[100];

void foo( int i ) {
    Message[i/10] = 42;
    Ready = 1;
}
It's trying to do something very reasonable in multi-threaded programming: write a message and then send it to another thread. The other thread will wait until Ready becomes non-zero and then read Message. Try compiling this with "gcc -O2 -S" using gcc 4.0, or icc. Both will do the store to Ready first, so it can be overlapped with the computation of i/10. The reordering is not a compiler bug. It's an aggressive optimizer doing its job.
You might think the solution is to mark all your memory references volatile. That's just plain silly. As the earlier quotes say, it will just slow down your code. Worse yet, it might not fix the problem. Even if the compiler does not reorder the references, the hardware might. In this example, x86 hardware will not reorder it. Neither will an Itanium(TM) processor, because Itanium compilers insert memory fences for volatile stores. That's a clever Itanium extension. But chips like Power(TM) will reorder. What you really need for ordering are memory fences, also called memory barriers. A memory fence prevents reordering of memory operations across the fence, or in some cases, prevents reordering in one direction. Volatile has nothing to do with memory fences.
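(For what it's worth, a hedged sketch of the fence-based fix, using a GCC/Clang builtin that the article itself does not name; the fence keeps the Message store ordered before the Ready store for both compiler and CPU:)

volatile int Ready;
int Message[100];

void foo( int i ) {
    Message[i/10] = 42;
    __atomic_thread_fence(__ATOMIC_RELEASE);  /* prior stores complete first */
    Ready = 1;
}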
So what's the solution for multi-threaded programming? Use a library or language extension that implements the atomic and fence semantics. When used as intended, the operations in the library will insert the right fences. Some examples:
POSIX threads
Windows(TM) threads
OpenMP
TBB
Based on article by Arch Robison (Intel)
In my experience, no; you just have to properly mutex yourself when you write to those values, or structure your program such that the threads will stop before they need to access data that depends on another thread's actions. My project, x264, uses this method; threads share an enormous amount of data, but the vast majority of it doesn't need mutexes, because it's either read-only or a thread will wait for the data to become available and finalized before it needs to access it.
Now, if you have many threads that are all heavily interleaved in their operations (they depend on each others' output on a very fine-grained level), this may be a lot harder--in fact, in such a case I'd consider revisiting the threading model to see if it can possibly be done more cleanly with more separation between threads.
NO.
Volatile is only required when reading a memory location that can change independently of the CPU's read/write commands. In the situation of threading, the CPU is in full control of reads and writes to memory for each thread; therefore the compiler can assume the memory is coherent and optimize the CPU instructions to reduce unnecessary memory accesses.
The primary usage of volatile is for accessing memory-mapped I/O. In this case, the underlying device can change the value of a memory location independently of the CPU. If you do not use volatile under this condition, the CPU may use a previously cached memory value instead of reading the newly updated value.
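For illustration, a minimal sketch of that memory-mapped I/O case; the register address and names are hypothetical:

#include <stdint.h>

#define STATUS_REG_ADDR 0x4000A000u  /* hypothetical device register address */

static volatile uint32_t *const status_reg =
    (volatile uint32_t *)STATUS_REG_ADDR;

uint32_t read_status(void) {
    return *status_reg;  /* volatile forces a fresh read from the device */
}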
POSIX 7 guarantees that functions such as pthread_mutex_lock also synchronize memory
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11 "4.12 Memory Synchronization" says:
The following functions synchronize memory with respect to other threads:
pthread_barrier_wait()
pthread_cond_broadcast()
pthread_cond_signal()
pthread_cond_timedwait()
pthread_cond_wait()
pthread_create()
pthread_join()
pthread_mutex_lock()
pthread_mutex_timedlock()
pthread_mutex_trylock()
pthread_mutex_unlock()
pthread_spin_lock()
pthread_spin_trylock()
pthread_spin_unlock()
pthread_rwlock_rdlock()
pthread_rwlock_timedrdlock()
pthread_rwlock_timedwrlock()
pthread_rwlock_tryrdlock()
pthread_rwlock_trywrlock()
pthread_rwlock_unlock()
pthread_rwlock_wrlock()
sem_post()
sem_timedwait()
sem_trywait()
sem_wait()
semctl()
semop()
wait()
waitpid()
Therefore if your variable is guarded between pthread_mutex_lock and pthread_mutex_unlock, then it does not need further synchronization of the kind you might attempt to provide with volatile.
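In code, the guarantee quoted above means a plain variable is fine as long as every access is bracketed by the lock, e.g. (a minimal sketch):

#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static int shared_count = 0;   /* plain int; no volatile needed */

void increment(void) {
    pthread_mutex_lock(&m);    /* synchronizes memory (POSIX 4.12) */
    shared_count++;
    pthread_mutex_unlock(&m);  /* also synchronizes memory */
}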
Related questions:
Does guarding a variable with a pthread mutex guarantee it's also not cached?
Does pthread_mutex_lock contains memory fence instruction?
Volatile would only be useful if you need absolutely no delay between when one thread writes something and another thread reads it. Without some sort of lock, though, you have no idea of when the other thread wrote the data, only that it's the most recent possible value.
For simple values (int and float in their various sizes) a mutex might be overkill if you don't need an explicit synch point. If you don't use a mutex or lock of some sort, you should declare the variable volatile. If you use a mutex you're all set.
For complicated types, you must use a mutex. Operations on them are non-atomic, so you could read a half-changed version without a mutex.
Volatile means that we have to go to memory to get or set this value. If you don't set volatile, the compiled code might store the data in a register for a long time.
What this means is that you should mark variables that you share between threads as volatile, so that you don't have situations where one thread starts modifying the value but doesn't write its result back before a second thread comes along and tries to read the value.
Volatile is a compiler hint that disables certain optimizations. The output assembly of the compiler might have been safe without it but you should always use it for shared values.
This is especially important if you are NOT using the expensive thread-sync objects provided by your system. You might, for example, have a data structure that you can keep valid with a series of atomic changes. Many stacks that do not allocate memory are examples of such data structures, because you can add a value to the stack and then move the end pointer, or remove a value from the stack after moving the end pointer. When implementing such a structure, volatile becomes crucial to ensure that your atomic instructions are actually atomic.
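For the record, a very rough sketch of the kind of non-allocating stack described here (a Treiber-style push using a GCC CAS builtin; the builtin is my assumption, since the answer names no specific primitive), and real implementations must also handle the ABA problem and memory reclamation:

struct node { struct node *next; int value; };

static struct node *top;  /* shared stack head */

void push(struct node *n) {
    struct node *old;
    do {
        old = top;            /* the CAS below is a full barrier, so top is reread */
        n->next = old;
    } while (!__sync_bool_compare_and_swap(&top, old, n));
}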
The underlying reason is that the C language semantics are based upon a single-threaded abstract machine. The compiler is within its rights to transform the program as long as the program's 'observable behaviors' on the abstract machine stay unchanged. It can merge adjacent or overlapping memory accesses, redo a memory access multiple times (upon register spilling, for example), or simply discard a memory access, if it thinks the program's behavior, when executed in a single thread, doesn't change. Therefore, as you may suspect, the behavior does change if the program is actually supposed to execute in a multi-threaded way.
As Paul McKenney pointed out in a famous Linux kernel document:
It _must_not_ be assumed that the compiler will do what you want
with memory references that are not protected by READ_ONCE() and
WRITE_ONCE(). Without them, the compiler is within its rights to
do all sorts of "creative" transformations, which are covered in
the COMPILER BARRIER section.
READ_ONCE() and WRITE_ONCE() are defined as volatile casts on referenced variables. Thus:
int y;
int x = READ_ONCE(y);
is equivalent to:
int y;
int x = *(volatile int *)&y;
So, unless you make a 'volatile' access, you are not assured that the access happens exactly once, no matter what synchronization mechanism you are using. Calling an external function (pthread_mutex_lock, for example) may force the compiler to perform memory accesses to global variables. But this happens only when the compiler fails to figure out whether the external function changes these global variables or not. Modern compilers employing sophisticated inter-procedural analysis and link-time optimization can make this trick simply useless.
In summary, you should mark variables shared by multiple threads volatile or access them using volatile casts.
As Paul McKenney has also pointed out:
I have seen the glint in their eyes when they discuss optimization techniques that you would not want your children to know about!
But see what happens in C11/C++11.
Some people obviously are assuming that the compiler treats the synchronization calls as memory barriers. "Casey" is assuming there is exactly one CPU.
If the sync primitives are external functions and the symbols in question are visible outside the compilation unit (global names, exported pointer, exported function that may modify them) then the compiler will treat them -- or any other external function call -- as a memory fence with respect to all externally visible objects.
Otherwise, you are on your own. And volatile may be the best tool available for making the compiler produce correct, fast code. It generally won't be portable, though; when you need volatile, and what it actually does for you, depends a lot on the system and compiler.
No.
First, volatile is not necessary. There are numerous other operations that provide guaranteed multithreaded semantics that don't use volatile. These include atomic operations, mutexes, and so on.
Second, volatile is not sufficient. The C standard does not provide any guarantees about multithreaded behavior for variables declared volatile.
So being neither necessary nor sufficient, there's not much point in using it.
One exception would be particular platforms (such as Visual Studio) where it does have documented multithreaded semantics.
Variables that are shared among threads should be declared 'volatile'. This tells the compiler that when one thread writes to such variables, the write should be to memory (as opposed to a register).
