One thread reading and another writing to a volatile variable - thread-safe? (C)

In C I have a pointer that is declared volatile and initialized null.
void* volatile pvoid;
Thread 1 is occasionally reading the pointer value to check if it is non-null. Thread 1 will not set the value of the pointer.
Thread 2 will set the value of the pointer just once.
I believe I can get away without using a mutex or condition variable.
Is there any reason thread 1 will read a corrupted value or thread 2 will write a corrupted value?

To make this thread-safe, you have to make the reads and writes of the variable atomic; volatile alone is not safe in all timing situations. Under Win32 there are the Interlocked functions; under Linux you can build the equivalent yourself with assembly if you do not want to use the heavyweight mutexes and condition variables.
If you are not against the GPL, then http://www.threadingbuildingblocks.org and its atomic<> template look promising. The library is cross-platform.
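For illustration, here is a minimal Win32 sketch of the set-once pointer from the question, using the Interlocked functions mentioned above (publish and poll_pointer are hypothetical names; pvoid is the variable from the question):

#include <windows.h>

void *volatile pvoid = NULL;

/* Writer (thread 2): publish the pointer exactly once. */
void publish(void *p)
{
    /* Atomic store with a full memory barrier. */
    InterlockedExchangePointer(&pvoid, p);
}

/* Reader (thread 1): atomically fetch the current value. */
void *poll_pointer(void)
{
    /* A compare-exchange of NULL against NULL leaves the slot
       unchanged but returns its current contents atomically. */
    return InterlockedCompareExchangePointer(&pvoid, NULL, NULL);
}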

In the case where the value fits in a single register, such as a memory-aligned pointer, this is safe. In other cases, where it might take more than one instruction to read or write the value, the reading thread could get corrupted data. If you are not sure whether the read and write will take a single instruction in all usage scenarios, use atomic reads and writes.

It depends on your compiler, architecture and operating system. POSIX (since this question was tagged pthreads, I'm assuming we're not talking about Windows or some other threading model) and C don't give enough constraints for a portable answer to this question.
The safe assumption is of course to protect the access to the pointer with a mutex. However based on your description of the problem I wonder if pthread_once wouldn't be a better way to go. Granted there's not enough information in the question to say one way or the other.
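For reference, a minimal sketch of the pthread_once route (init_pointer and get_pointer are hypothetical names):

#include <pthread.h>
#include <stdlib.h>

static void *pvoid;                          /* set exactly once, below */
static pthread_once_t once = PTHREAD_ONCE_INIT;

static void init_pointer(void)               /* runs at most once, in one thread */
{
    pvoid = malloc(128);                     /* whatever initialization you need */
}

void *get_pointer(void)
{
    /* Safe from any thread: pthread_once guarantees init_pointer has
       completed before any caller proceeds past this line. */
    pthread_once(&once, init_pointer);
    return pvoid;
}

Note that this changes the semantics slightly: a reader blocks until initialization is done instead of polling for non-null.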

Unfortunately, you cannot portably make any assumptions about what is atomic in pure C.
GCC, however, does provide some atomic built-in functions that take care of using the proper instructions for many architectures for you. See Chapter 5.47 of the GCC manual for more information.
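As a sketch of those builtins applied to the set-once pointer from the question at the top (assuming GCC 4.1 or later; publish and peek are illustrative names):

#include <stddef.h>

void *volatile pvoid;   /* the set-once pointer */

/* Writer: install p only if the slot is still NULL; the builtin
   implies a full memory barrier. */
void publish(void *p)
{
    __sync_bool_compare_and_swap(&pvoid, NULL, p);
}

/* Reader: an atomic read expressed as a compare-and-swap that can
   never change the stored value. */
void *peek(void)
{
    return __sync_val_compare_and_swap(&pvoid, NULL, NULL);
}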

Well, this seems fine. The only problem shows up in a case like this:
let thread A be your checking thread and B the modifying one.
Checking for equality is not atomic at the machine level: the value is first copied into a register and then compared. Suppose thread A has copied the value into a register, and B then changes the variable. When control goes back to A, its comparison still uses the stale copy, so it reports the pointer as null even though it has just been set. In this program that is harmless (A simply sees the new value on its next check), but in general such stale reads might cause problems.
Use a mutex; it is simple enough, and you can be sure you don't have synchronization errors.

On most platforms, where a pointer value can be read or written in a single instruction, the pointer is either set or it isn't yet; a reader can't catch it mid-update and see a corrupted value. A mutex isn't needed on that kind of platform.

Related

Threading and Thread Safety in C

When there is a common set of global data that needs to be shared among several threaded processes, I typically have used a thread token to protect the shared resource:
Edit - 7/22/15 (to incorporate atomics as a viable option, per Jens' comments)
My [First] question is, in C, if I write my routines in such a way as to guarantee each thread accesses one, and only one element of an array:
Is there any reason to think that asynchronous and simultaneous access to different indices of the same unprotected array (as shown in the diagram) would be a problem?
Second question: Given an object that can be accessed as an atomic entity, even in the presence of asynchronous interrupts (C99 - 7.14 Signal handling), would using atomics be an effective method of thread protection for an otherwise unprotected variable?
Edit (Clarifications to address questions in comments to this point):
- Specifics for this application:
- Target OS: Windows 7/8/10
- Compiler: C99 compliant (cannot use C11, which includes the _Atomic() type specifier)
- H/W : Intel i7 family
This (the N1570 draft of the C11 standard)
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf sayeth:
NOTE 1 Two threads of execution can update and access separate memory
locations without interfering with each other
NOTE 13 Compiler transformations that introduce assignments to a
potentially shared memory location that would not be modified by the
abstract machine are generally precluded by this standard, since such
an assignment might overwrite another assignment by a different thread
in cases in which an abstract machine execution would not have
encountered a data race. This includes implementations of data member
assignment that overwrite adjacent members in separate memory
locations. We also generally preclude reordering of atomic loads in
cases in which the atomics in question may alias, since this may
violate the "visible sequence" rules.
The way I understand it, this would preclude quamrana's concerns and guarantee you that unprotected writes to separate memory locations should never result in undefined behavior if there is no data race.
In C it will depend on your platform, that is, your combination of compiler, processor architecture and operating system.
Your compiler can choose how to use the internal registers and instructions of the cpu to make the executable seem to perform the intent of the program. And C may know nothing about threads. It is usually the job of the operating system to provide a threading library.
There may be processors which might perform the write to an element of your array by reading a much larger patch of memory than just one element, then overwrite just the right bits that forms one element within internal registers and then writing the whole patch back. A single threaded program would work just fine, but two or more threads which interrupt each other could cause chaos in the array.
On the other hand it may work out just fine.
And as has been said, read-only access is always just fine.
Also, google is your friend. It found this stackoverflow question.
If each thread is accessing a different array element, and only the element it is "assigned", this shouldn't be a problem. Both scenarios above are essentially equivalent, since each array element has its own address.

multithreaded C/C++ variable no cache (Linux)

I use two pthreads, where one thread "notifies" the other one of an event; for that there is a variable (a normal integer) which is set by the second thread.
This works, but my question is: is it possible that the update is not seen immediately by the first (reading) thread, meaning the cache is not updated directly? And if so, is there a way to prevent this behaviour, e.g. like the volatile keyword in Java?
(The frequency at which the event occurs is approximately in the microsecond range, so a more or less immediate update needs to be enforced.)
/edit: second question: is it possible to enforce that the variable is held in the cache of the core where thread 1 runs, since that thread is reading it all the time?
It sounds to me as though you should be using a pthread condition variable as your signaling mechanism. This takes care of all the issues you describe.
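For completeness, a minimal sketch of the condition-variable pattern that answer suggests (notify and wait_for_event are illustrative names):

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int event_flag = 0;                /* the notify variable, now guarded */

void notify(void)                         /* called by the writing thread */
{
    pthread_mutex_lock(&lock);
    event_flag = 1;
    pthread_cond_signal(&cond);           /* wake the waiting thread */
    pthread_mutex_unlock(&lock);
}

void wait_for_event(void)                 /* called by the reading thread */
{
    pthread_mutex_lock(&lock);
    while (!event_flag)                   /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &lock);  /* atomically unlocks, sleeps, relocks */
    event_flag = 0;                       /* consume the event */
    pthread_mutex_unlock(&lock);
}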
It may not be immediately visible to the other processors, but not because of cache coherence. The biggest visibility problems will be due to your processor's out-of-order execution schemes or due to your compiler re-ordering instructions while optimizing.
In order to avoid both of these problems, you have to use memory barriers. I believe that most pthread primitives are natural memory barriers, which means that you shouldn't expect loads or stores to be moved beyond the boundaries formed by the lock and unlock calls. The volatile keyword can also help by disabling a certain class of compiler optimizations, which matters when writing lock-free algorithms, but it's not a substitute for memory barriers.
That being said, I recommend you don't do this manually; there are quite a few pitfalls associated with lock-free algorithms. Leaving these headaches to library writers should make you a happier camper (unless you're like me and you love headaches :) ). So my final recommendation is to ignore everything I said and use what vromanov or David Heffman suggested.
The most appropriate way to pass a signal from one thread to another should be to use the runtime library's signalling mechanisms, such as mutexes, condition variables, semaphores, and so forth.
If these have too high an overhead, my first thought would be that there was something wrong with the structure of the program. If it turned out that this really was the bottleneck, and restructuring the program was inappropriate, then I would use atomic operations provided by the compiler or a suitable library.
Using plain int variables, or even volatile-qualified ones is error prone, unless the compiler guarantees they have the appropriate semantics. e.g. MSVC makes particular guarantees about the atomicity and ordering constraints of plain loads and stores to volatile variables, but gcc does not.
A better way is to use atomic variables. For example, you can use libatomic; the volatile keyword is not enough.
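Assuming the answer means Hans Boehm's libatomic_ops library, here is a sketch of the notify flag written with it (AO_store_release and AO_load_acquire are that library's ordered store and load primitives):

#include <atomic_ops.h>

static volatile AO_t event_flag;          /* AO_t is a word-sized integer type */

void set_event(void)                      /* writer thread */
{
    AO_store_release(&event_flag, 1);     /* atomic store, release ordering */
}

int event_seen(void)                      /* reader thread */
{
    return AO_load_acquire(&event_flag) != 0;  /* atomic load, acquire ordering */
}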

Real dangers of 2+ threads writing/reading a variable

What are the real dangers of simultaneous read/write to a single variable?
If I use one thread to write a variable and another to read it in a while loop, and it does no harm if the variable is read mid-write so that an old value is used, what other dangers are there?
Can a simultaneous read/write cause a thread crash, and what happens at the low level when an exactly simultaneous read/write occurs?
If two threads access a variable without suitable synchronization, and at least one of those accesses is a write then you have a data race and undefined behaviour.
How undefined behaviour manifests is entirely implementation dependent. On most modern architectures, you won't get a trap or exception or anything from the hardware, and it will read something, or store something. The thing is, it won't necessarily read or write what you expected.
e.g. with two threads incrementing a variable, you can miss counts, as described in my article at devx: http://www.devx.com/cplus/Article/42725
For a single writer and a single reader, the most common outcome will be that reader sees a stale value, but you might also see a partially-updated value if the update requires more than one cycle, or the variable is split across cache lines. What happens then depends on what you do with it --- if it's a pointer and you get a partially updated value then it might not be a valid pointer, and won't point to what you intended it to anyway, and then you might get any kind of corruption or error due to dereferencing an invalid pointer value. This may include formatting your hard disk or other bad consequences if the bad pointer value just happens to point to a memory mapped I/O register....
In general you get unexpected results. Wikipedia defines two distinct race conditions:
A critical race occurs when the order in which internal variables are changed determines the eventual state that the state machine will end up in.
A non-critical race occurs when the order in which internal variables are changed does not alter the eventual state. In other words, a non-critical race occurs when moving to a desired state means that more than one internal state variable must be changed at once, but no matter in what order these internal state variables change, the resultant state will be the same.
So the output will not always get messed up; it depends on the code. It's good practice to always deal with race conditions, both for later code scaling and to prevent possible errors. Nothing is more annoying than not being able to trust your own data.
Two threads reading the same value is no problem at all.
The problem begins when one thread writes a non-atomic variable and another thread reads it. Then the result of the read is undefined, since a thread may be preempted (stopped) at any time; only operations on atomic variables are guaranteed not to be interrupted partway. Atomic operations are usually writes to int-sized variables.
If you have two threads accessing the same data, it is best practice, and usually unavoidable, to use locking (a mutex or semaphore).
Depends on the platform. For example, on Win32, read and write operations on aligned 32-bit values are atomic: you can't read half of a new value and half of an old value, and when you write, a reader gets either the whole new value or the whole old value. That's not true for all types or all platforms, of course.
Result is undefined.
Consider this code:
#include <stddef.h>

int counter = 0;                     /* global, shared by all threads */

void *thread_body(void *arg)
{
    for (int i = 0; i < 10; i++)
        counter = counter + 1;       /* unsynchronized read-modify-write */
    return NULL;
}
The problem is that if you have N threads, the result can be anything between 10 and N*10. This is because all threads might read the same value, increment it, and then write the value + 1 back. But you asked whether you can crash the program or hardware: it depends, though in most cases the wrong results are merely useless rather than fatal.
To solve this locking problem you need a mutex or a semaphore. A mutex locks code: in the example above you would lock the part of the code containing the line
counter = counter + 1;
whereas a semaphore locks a resource such as the variable counter. They are basically two tools for solving the same type of problem. Check for these tools in your thread library, and see the mutex sketch after the link below.
http://en.wikipedia.org/wiki/Mutual_exclusion
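A sketch of the mutex fix for the counter example above, assuming pthreads is the thread library in use:

#include <pthread.h>

int counter = 0;
pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

void *thread_body(void *arg)
{
    for (int i = 0; i < 10; i++) {
        pthread_mutex_lock(&counter_lock);    /* one thread at a time */
        counter = counter + 1;                /* the critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

With the lock in place, N threads always leave counter at exactly N*10.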
The worst that will happen depends on the implementation. There are so many completely independent implementations of pthreads, running on different systems and hardware, that I doubt anyone knows everything about all of them.
If p isn't a pointer-to-volatile then I think that a compiler for a conforming Posix implementation is allowed to turn:
while (*p == 0) {}
exit(0);
Into a single check of *p followed by an infinite loop that doesn't bother looking at the value of *p at all. In practice, it won't, so it's a question of whether you want to program to the standard, or program to undocumented observed behavior of the implementations you're using. The latter generally works for simple cases, and then you build on the code until you do something complicated enough that it unexpectedly doesn't work.
In practice, on a multi-CPU system that doesn't have coherent memory caches, it could be a very long time before that while loop ever sees a change made from a different CPU, because without memory barriers it might never update its cached view of main memory. But Intel has coherent caches, so most likely you personally won't see any delays long enough to care about. If some poor sucker ever tries to run your code on a more exotic architecture, they may end up having to fix it.
Back to theory, the setup you're describing could cause a crash. Imagine a hypothetical architecture where:
p points to a non-atomic type, like long long on a typical 32 bit architecture.
long long on that system has trap representations, for example because it has a padding bit used as a parity check.
the write to *p is half-complete when the read occurs
the half-write has updated some of the bits of the value, but has not yet updated the parity bit.
Bang, undefined behavior, you read a trap representation. It may be that Posix forbids certain trap representations that the C standard allows, in which case long long might not be a valid example for the type of *p, but I expect you can find a type for which trap representations are permitted.
If the variable being written and read cannot be updated or read atomically, then it is possible for the reader to pick up a corrupt, "partially updated" value.
You can see a partial update (e.g. you may see a long long variable with half of it coming from the new value and the other half coming from the old value).
You are not guaranteed to see the new value until you use a memory barrier (pthread_mutex_unlock() contains an implicit memory barrier).

Concurrent access to struct member

I'm using a 32-bit microcontroller (STR91x). I'm concurrently accessing (from an ISR and from the main loop) a struct member of type enum. Access is limited to writing that enum field in the ISR and checking it in the main loop. The enum's underlying type is not larger than an integer (32-bit).
I would like to make sure that I'm not missing anything and I can safely do it.
Provided that 32-bit reads and writes are atomic, which is almost certainly the case (you might want to make sure that your enum is word-aligned), then what you've described will be just fine.
As paxdiablo & David Knell said, generally speaking this is fine. Even if your bus is < 32 bits, chances are the instruction's multiple bus cycles won't be interrupted, and you'll always read valid data.
What you stated, and what we all know, but it bears repeating, is that this is fine for a single-writer, N-reader situation. If you had more than one writer, all bets are off unless you have a construct to protect the data.
If you want to make sure, find the compiler switch that generates an assembly listing and examine the assembly for the write in the ISR and the read in the main loop. Even if you are not familiar with ARM assembly, you could quickly and easily discern whether or not the reads and writes are atomic.
ARM supports 32-bit aligned reads that are atomic as far as interrupts are concerned. However, make sure your compiler doesn't try to cache the value in a register! Either mark it as a volatile, or use an explicit memory barrier - on GCC this can be done like so:
int tmp = yourvariable;
__sync_synchronize();   /* full memory barrier; the builtin takes no arguments */
Note, however, that current versions of GCC perform a full memory barrier for __sync_synchronize, rather than a barrier covering just the one variable, so volatile is probably better for your needs.
Further, note that your variable will be aligned automatically unless you are doing something weird (i.e., explicitly specifying the location of the struct in memory, or requesting a packed struct). Unaligned variables on ARM cannot be read atomically, so make sure it's aligned, or disable interrupts while reading.
Well, it depends entirely on your hardware but I'd be surprised if an ISR could be interrupted by the main thread.
So probably the only thing you have to watch out for is if the main thread could be interrupted halfway through a read (so it may get part of the old value and part of the new).
It should be a simple matter of consulting the specs to ensure that interrupts are only processed between instructions (this is likely since the alternative would be very complex) and that your 32-bit load is a single instruction.
An aligned 32 bit access will generally be atomic (unless it were a particularly ludicrous compiler!).
However the rock-solid solution (and one generally applicable to non-32 bit targets too) is to simply disable the interrupt temporarily while accessing the data outside of the interrupt. The most robust way to do this is through an access function to statically scoped data rather than making the data global where you then have no single point of access and therefore no way of enforcing an atomic access mechanism when needed.
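A sketch of that access-function idea; DISABLE_IRQ and ENABLE_IRQ are placeholders you would map to your toolchain's interrupt-control intrinsics:

/* Placeholders: e.g. __disable_irq()/__enable_irq() on many ARM compilers. */
#define DISABLE_IRQ()  __disable_irq()
#define ENABLE_IRQ()   __enable_irq()

enum comm_state { IDLE, BUSY, DONE };

static volatile enum comm_state shared_state;   /* written only by the ISR */

void my_isr(void)                 /* interrupt handler: the single writer */
{
    shared_state = DONE;
}

enum comm_state read_state(void)  /* the only access point outside the ISR */
{
    enum comm_state s;
    DISABLE_IRQ();                /* the ISR cannot preempt this window */
    s = shared_state;
    ENABLE_IRQ();
    return s;
}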

Using C/Pthreads: do shared variables need to be volatile?

In the C programming language and Pthreads as the threading library; do variables/structures that are shared between threads need to be declared as volatile? Assuming that they might be protected by a lock or not (barriers perhaps).
Does the pthread POSIX standard have any say about this, is this compiler-dependent or neither?
Edit to add: Thanks for the great answers. But what if you're not using locks; what if you're using barriers for example? Or code that uses primitives such as compare-and-swap to directly and atomically modify a shared variable...
As long as you are using locks to control access to the variable, you do not need volatile on it. In fact, if you're putting volatile on any variable you're probably already wrong.
https://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/
The answer is absolutely, unequivocally, NO. You do not need to use 'volatile' in addition to proper synchronization primitives. Everything that needs to be done are done by these primitives.
The use of 'volatile' is neither necessary nor sufficient. It's not necessary because the proper synchronization primitives are sufficient. It's not sufficient because it only disables some optimizations, not all of the ones that might bite you. For example, it does not guarantee either atomicity or visibility on another CPU.
But unless you use volatile, the compiler is free to cache the shared data in a register for any length of time... if you want your data to be written to be predictably written to actual memory and not just cached in a register by the compiler at its discretion, you will need to mark it as volatile. Alternatively, if you only access the shared data after you have left a function modifying it, you might be fine. But I would suggest not relying on blind luck to make sure that values are written back from registers to memory.
Right, but even if you do use volatile, the CPU is free to cache the shared data in a write posting buffer for any length of time. The set of optimizations that can bite you is not precisely the same as the set of optimizations that 'volatile' disables. So if you use 'volatile', you are relying on blind luck.
On the other hand, if you use synchronization primitives with defined multi-threaded semantics, you are guaranteed that things will work. As a plus, you don't take the huge performance hit of 'volatile'. So why not do things that way?
I think one very important property of volatile is that it makes the variable be written to memory when modified, and reread from memory each time it is accessed. The other answers here mix volatile and synchronization, and it is clear from some of them that volatile is NOT a sync primitive (credit where credit is due).
Especially on register-rich machines (i.e., not x86), variables can live for quite long periods in registers, and a good compiler can cache even parts of structures or entire structures in registers. So you should use volatile, but for performance, also copy values to local variables for computation and then do an explicit write-back. Essentially, using volatile efficiently means doing a bit of load-store thinking in your C code.
In any case, you positively have to use some kind of OS-level provided sync mechanism to create a correct program.
For an example of the weakness of volatile, see my Dekker's algorithm example at http://jakob.engbloms.se/archives/65, which proves pretty well that volatile does not work to synchronize.
There is a widespread notion that the keyword volatile is good for multi-threaded programming.
Hans Boehm points out that there are only three portable uses for volatile:
volatile may be used to mark local variables in the same scope as a setjmp whose value should be preserved across a longjmp. It is unclear what fraction of such uses would be slowed down, since the atomicity and ordering constraints have no effect if there is no way to share the local variable in question. (It is even unclear what fraction of such uses would be slowed down by requiring all variables to be preserved across a longjmp, but that is a separate matter and is not considered here.)
volatile may be used when variables may be "externally modified", but the modification in fact is triggered synchronously by the thread itself, e.g. because the underlying memory is mapped at multiple locations.
A volatile sigatomic_t may be used to communicate with a signal handler in the same thread, in a restricted manner. One could consider weakening the requirements for the sigatomic_t case, but that seems rather counterintuitive.
If you are multi-threading for the sake of speed, slowing down code is definitely not what you want. For multi-threaded programming, there are two key issues that volatile is often mistakenly thought to address:
atomicity
memory consistency, i.e. the order of a thread's operations as seen by another thread.
Let's deal with (1) first. Volatile does not guarantee atomic reads or writes. For example, a volatile read or write of a 129-bit structure is not going to be atomic on most modern hardware. A volatile read or write of a 32-bit int is atomic on most modern hardware, but volatile has nothing to do with it. It would likely be atomic without the volatile. The atomicity is at the whim of the compiler. There's nothing in the C or C++ standards that says it has to be atomic.
Now consider issue (2). Sometimes programmers think of volatile as turning off optimization of volatile accesses. That's largely true in practice. But that's only the volatile accesses, not the non-volatile ones. Consider this fragment:
volatile int Ready;
int Message[100];

void foo(int i) {
    Message[i/10] = 42;
    Ready = 1;
}
It's trying to do something very reasonable in multi-threaded programming: write a message and then send it to another thread. The other thread will wait until Ready becomes non-zero and then read Message. Try compiling this with "gcc -O2 -S" using gcc 4.0, or icc. Both will do the store to Ready first, so it can be overlapped with the computation of i/10. The reordering is not a compiler bug. It's an aggressive optimizer doing its job.
You might think the solution is to mark all your memory references volatile. That's just plain silly. As the earlier quotes say, it will just slow down your code. Worse yet, it might not fix the problem. Even if the compiler does not reorder the references, the hardware might. In this example, x86 hardware will not reorder it. Neither will an Itanium(TM) processor, because Itanium compilers insert memory fences for volatile stores. That's a clever Itanium extension. But chips like Power(TM) will reorder. What you really need for ordering are memory fences, also called memory barriers. A memory fence prevents reordering of memory operations across the fence, or in some cases, prevents reordering in one direction. Volatile has nothing to do with memory fences.
So what's the solution for multi-threaded programming? Use a library or language extension that implements the atomic and fence semantics. When used as intended, the operations in the library will insert the right fences. Some examples:
POSIX threads
Windows(TM) threads
OpenMP
TBB
Based on an article by Arch Robison (Intel).
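To make the Ready/Message fragment above concrete, here is one hedged fix using GCC's full-barrier builtin (a sketch; the reader side needs a matching fence between seeing Ready become non-zero and reading Message):

volatile int Ready;
int Message[100];

void foo(int i)
{
    Message[i/10] = 42;
    __sync_synchronize();   /* fence: the Message store is visible before Ready */
    Ready = 1;
}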
In my experience, no; you just have to properly mutex yourself when you write to those values, or structure your program such that the threads will stop before they need to access data that depends on another thread's actions. My project, x264, uses this method; threads share an enormous amount of data, but the vast majority of it doesn't need mutexes because it's either read-only or a thread will wait for the data to become available and finalized before it needs to access it.
Now, if you have many threads that are all heavily interleaved in their operations (they depend on each other's output on a very fine-grained level), this may be a lot harder; in fact, in such a case I'd consider revisiting the threading model to see if it can possibly be done more cleanly with more separation between threads.
NO.
Volatile is only required when reading a memory location that can change independently of the CPU's read/write commands. In the situation of threading, the CPU is in full control of reads/writes to memory for each thread; therefore the compiler can assume the memory is coherent and optimize the CPU instructions to reduce unnecessary memory accesses.
The primary usage for volatile is for accessing memory-mapped I/O. In this case, the underlying device can change the value of a memory location independently from CPU. If you do not use volatile under this condition, the CPU may use a previously cached memory value, instead of reading the newly updated value.
POSIX 7 guarantees that functions such as pthread_mutex_lock also synchronize memory
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11 "4.12 Memory Synchronization" says:
The following functions synchronize memory with respect to other threads:
pthread_barrier_wait()
pthread_cond_broadcast()
pthread_cond_signal()
pthread_cond_timedwait()
pthread_cond_wait()
pthread_create()
pthread_join()
pthread_mutex_lock()
pthread_mutex_timedlock()
pthread_mutex_trylock()
pthread_mutex_unlock()
pthread_spin_lock()
pthread_spin_trylock()
pthread_spin_unlock()
pthread_rwlock_rdlock()
pthread_rwlock_timedrdlock()
pthread_rwlock_timedwrlock()
pthread_rwlock_tryrdlock()
pthread_rwlock_trywrlock()
pthread_rwlock_unlock()
pthread_rwlock_wrlock()
sem_post()
sem_timedwait()
sem_trywait()
sem_wait()
semctl()
semop()
wait()
waitpid()
Therefore if your variable is guarded between pthread_mutex_lock and pthread_mutex_unlock then it does not need further synchronization as you might attempt to provide with volatile.
Related questions:
Does guarding a variable with a pthread mutex guarantee it's also not cached?
Does pthread_mutex_lock contains memory fence instruction?
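A minimal sketch of that guarantee in use; note there is no volatile anywhere (set_shared and get_shared are illustrative names):

#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static int shared;                  /* a plain int is enough */

void set_shared(int v)
{
    pthread_mutex_lock(&m);         /* synchronizes memory (POSIX 4.12) */
    shared = v;
    pthread_mutex_unlock(&m);
}

int get_shared(void)
{
    pthread_mutex_lock(&m);
    int v = shared;
    pthread_mutex_unlock(&m);
    return v;
}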
Volatile would only be useful if you need absolutely no delay between when one thread writes something and another thread reads it. Without some sort of lock, though, you have no idea of when the other thread wrote the data, only that it's the most recent possible value.
For simple values (int and float in their various sizes) a mutex might be overkill if you don't need an explicit synch point. If you don't use a mutex or lock of some sort, you should declare the variable volatile. If you use a mutex you're all set.
For complicated types, you must use a mutex. Operations on them are non-atomic, so you could read a half-changed version without a mutex.
Volatile means that we have to go to memory to get or set this value. If you don't set volatile, the compiled code might store the data in a register for a long time.
What this means is that you should mark variables that you share between threads as volatile so that you don't have situations where one thread starts modifying the value but doesn't write its result before a second thread comes along and tries to read the value.
Volatile is a compiler hint that disables certain optimizations. The output assembly of the compiler might have been safe without it but you should always use it for shared values.
This is especially important if you are NOT using the expensive thread sync objects provided by your system - you might for example have a data structure where you can keep it valid with a series of atomic changes. Many stacks that do not allocate memory are examples of such data structures, because you can add a value to the stack then move the end pointer or remove a value from the stack after moving the end pointer. When implementing such a structure, volatile becomes crucial to ensure that your atomic instructions are actually atomic.
The underlying reason is that the C language's semantics are based upon a single-threaded abstract machine, and the compiler is within its rights to transform the program as long as the program's 'observable behavior' on the abstract machine stays unchanged. It can merge adjacent or overlapping memory accesses, redo a memory access multiple times (upon register spilling, for example), or simply discard a memory access, if it thinks the program's behavior, when executed in a single thread, doesn't change. Therefore, as you may suspect, the behavior does change if the program is actually supposed to be executing in a multi-threaded way.
As Paul McKenney pointed out in a famous Linux kernel document:
It _must_not_ be assumed that the compiler will do what you want
with memory references that are not protected by READ_ONCE() and
WRITE_ONCE(). Without them, the compiler is within its rights to
do all sorts of "creative" transformations, which are covered in
the COMPILER BARRIER section.
READ_ONCE() and WRITE_ONCE() are defined as volatile casts on referenced variables. Thus:
int y;
int x = READ_ONCE(y);
is equivalent to:
int y;
int x = *(volatile int *)&y;
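A simplified sketch of how such macros can be defined for scalar types, assuming GCC-style __typeof__ (the real kernel versions handle more cases):

/* Force exactly one load/store of x, preventing merging or re-reading. */
#define READ_ONCE(x)      (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))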
So, unless you make a 'volatile' access, you are not assured that the access happens exactly once, no matter what synchronization mechanism you are using. Calling an external function (pthread_mutex_lock, for example) may force the compiler to perform memory accesses to global variables. But this happens only when the compiler fails to figure out whether the external function changes those global variables or not. Modern compilers employing sophisticated inter-procedural analysis and link-time optimization render this trick simply useless.
In summary, you should mark variables shared by multiple threads volatile or access them using volatile casts.
As Paul McKenney has also pointed out:
I have seen the glint in their eyes when they discuss optimization techniques that you would not want your children to know about!
But see what happens to C11/C++11.
Some people obviously are assuming that the compiler treats the synchronization calls as memory barriers. "Casey" is assuming there is exactly one CPU.
If the sync primitives are external functions and the symbols in question are visible outside the compilation unit (global names, exported pointer, exported function that may modify them) then the compiler will treat them -- or any other external function call -- as a memory fence with respect to all externally visible objects.
Otherwise, you are on your own. And volatile may be the best tool available for making the compiler produce correct, fast code. It generally won't be portable though, when you need volatile and what it actually does for you depends a lot on the system and compiler.
No.
First, volatile is not necessary. There are numerous other operations that provide guaranteed multithreaded semantics that don't use volatile. These include atomic operations, mutexes, and so on.
Second, volatile is not sufficient. The C standard does not provide any guarantees about multithreaded behavior for variables declared volatile.
So being neither necessary nor sufficient, there's not much point in using it.
One exception would be particular platforms (such as Visual Studio) where it does have documented multithreaded semantics.
Variables that are shared among threads should be declared 'volatile'. This tells the compiler that when one thread writes to such variables, the write should be to memory (as opposed to a register).
