Is setting a variable atomic in THESE conditions? (C)

I have a situation with a state variable, int state, that can take the values 0, 1, or 2,
and an infinite loop:
ret = BoolCompareAndSwap(state, 1, 2);
if (ret) {
    // Change something ...
    state = 0;
}
Would this state setting be atomic?
Assuming that to set a variable you must:
Load it from memory
Change the value
Store the new value
If some other thread came along and compared the variable, the write would appear atomic, since the actual value doesn't change until it is stored back to memory. Correct?

Strictly speaking, C compilers would still be standard-conforming if they wrote state bitwise. After the first few bits have been written, another thread can read any kind of garbage.
Most compilers do no such thing (with the possible exception of compilers for ancient 4-bit processors, or even narrower ones), because it would be a performance loss.
Also, and more practically relevant: if any other thread writes to state (instead of only reading it), that written value can get lost if you do not protect the described code against race conditions.
As a side note, the described state change (read, modify, write) is never atomic. The question of when that non-atomicity is vulnerable is valid, however, and it is what I tried to answer above.
More generally speaking, thinking through all possible combinations of concurrent access is a valid protection mechanism. It is, however, extremely costly in many ways (design effort, test effort, risk during maintenance...).
Going that way instead of using an appropriate protection mechanism is feasible only if those costs are in total smaller than the intended saving (possibly performance).
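As a point of comparison, here is a minimal sketch of the questioner's loop written with C11 atomics, which makes both the compare-and-swap and the final store well defined; it assumes BoolCompareAndSwap was meant to behave like a compare-and-swap that returns whether it succeeded:
#include <stdatomic.h>

atomic_int state = 0; /* takes the values 0, 1, 2 */

void worker(void)
{
    for (;;) {
        int expected = 1;
        /* Atomically move state from 1 to 2; fails if state isn't 1. */
        if (atomic_compare_exchange_strong(&state, &expected, 2)) {
            /* Change something ... */
            atomic_store(&state, 0); /* atomic, race-free store */
        }
    }
}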

Related

Why reading a variable modified by other threads can return neither the old value nor the new value

It has been mentioned in several places, for example in c++ what happens when in one thread write and in second read the same object? (is it safe?),
that if two threads operate on the same variable without atomics or a lock, reading the variable can return neither the old value nor the new value.
I don't understand why this can happen, and I cannot find an example of such a thing happening. I thought a load or store was always a single instruction that cannot be interrupted, so why can this happen?
For one example, C may be implemented on hardware which supports only 16-bit accesses to memory. In this case, loading or storing a 32-bit integer requires two load or store instructions. A thread performing these two instructions may be interrupted between their executions, and another thread may execute before the first thread is resumed. If that other thread loads, it may load one new part and one old part. If it stores, it may store both parts, and the first thread, when resumed, will see one old part and one new part. And other such mixes are possible.
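For illustration, here is a hypothetical sketch that tends to expose such torn reads when built for a 32-bit target (the unsynchronized accesses are a deliberate data race, so formally this is undefined behavior; that is precisely the point):
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Deliberately non-atomic 64-bit variable: a 32-bit target must use two
 * store instructions to update it, so a reader can observe a mix. */
volatile uint64_t shared = 0;

void *writer(void *arg)
{
    for (;;) /* alternate between all-zeros and all-ones bit patterns */
        shared = (shared == 0) ? UINT64_MAX : 0;
    return NULL;
}

void *reader(void *arg)
{
    for (;;) {
        uint64_t v = shared;
        if (v != 0 && v != UINT64_MAX) /* neither the old nor the new value */
            printf("torn read: %016llx\n", (unsigned long long)v);
    }
    return NULL;
}

int main(void)
{
    pthread_t w, r;
    pthread_create(&w, NULL, writer, NULL);
    pthread_create(&r, NULL, reader, NULL);
    pthread_join(w, NULL); /* never returns; both loops run forever */
    return 0;
}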
From a language-lawyer point of view (i.e., in terms of what the C or C++ specification says, without considering any particular hardware the program might be running on), operations are either defined or undefined. If an operation is undefined, the program is allowed to do literally anything it wants, because the standard's authors didn't want to constrain the performance of the language by forcing compiler writers to support any particular behavior for operations that the programmer should never allow to happen anyway.
From a practical standpoint, the most likely scenario (on common hardware) where you'd read a value that is neither-old-nor-new is the "word-tearing" scenario, where (broadly speaking) the other thread has written to part of the variable at the instant your thread reads from it, but not to the other part, so you get half of the old value and half of the new value.
It has been mentioned in several places, for example in c++ what happens when in one thread write and in second read the same object? (is it safe?), that if two threads operate on the same variable without atomics or a lock, reading the variable can return neither the old value nor the new value.
Correct. Undefined behavior is undefined.
I don't understand why this can happen, and I cannot find an example of such a thing happening. I thought a load or store was always a single instruction that cannot be interrupted, so why can this happen?
Because undefined behavior is undefined. There is no requirement that you be able to think of any way it can go wrong. Do not ever think that because you can't think of some way something can break, that means it can't break.
For example, say there's a function that has an unsynchronized read in it. The compiler could conclude that this function can therefore never be called. If it's the only function that could modify a variable, then the compiler could omit reads of that variable. For example:
int j = 12;
// This is the only code that modifies j
int q = some_variable_another_thread_is_writing;
j = 0;
// other code
if (j != 12) important_function();
Since the only code that modifies j reads a variable another thread is writing, the compiler is free to assume that code will never execute, thus j will always be 12, and thus the test of j and the call to important_function can be optimized out. Ouch.
Here's another example:
if (some_function()) j = 0;
else j = 1;
If the implementation thinks that some_function will almost always return true and can prove that some_function cannot access j, it is perfectly legal for it to optimize this to:
j = 0;
if (!some_function()) j++;
This will cause your code to break horribly if other threads mess with j without a lock or j is not a type defined to be atomic.
And do not ever think that some compiler optimization, though legal, will never happen. That has burned people over and over again as compilers get smarter.

embedded C - using "volatile" to assert consistency

Consider the following code:
// In the interrupt handler file:
volatile uint32_t gSampleIndex = 0; // declared 'extern'

void HandleSomeIrq()
{
    gSampleIndex++;
}

// In some other file
void Process()
{
    uint32_t localSampleIndex = gSampleIndex; // will this be optimized away?

    PrevSample    = RawSamples[(localSampleIndex + 0) % NUM_RAW_SAMPLE_BUFFERS];
    CurrentSample = RawSamples[(localSampleIndex + 1) % NUM_RAW_SAMPLE_BUFFERS];
    NextSample    = RawSamples[(localSampleIndex + 2) % NUM_RAW_SAMPLE_BUFFERS];
}
My intention is that PrevSample, CurrentSample and NextSample are consistent, even if gSampleIndex is updated during the call to Process().
Will the assignment to localSampleIndex do the trick, or is there any chance it will be optimized away even though gSampleIndex is volatile?
In principle, volatile is not enough to guarantee that Process only sees consistent values of gSampleIndex. In practice, however, you should not run into any issues if uint32_t is directly supported by the hardware. The proper solution is to use atomic accesses.
The problem
Suppose that you are running on a 16-bit architecture, so that the statement
localSampleIndex = gSampleIndex;
gets compiled into two instructions (one loading the upper half, one loading the lower half). The interrupt might then be taken between the two instructions, and you'll get half of the old value combined with half of the new value.
The solution
The solution is to access gSampleIndex using atomic operations only. I know of three ways of doing that.
C11 atomics
In C11 (supported since GCC 4.9), you declare your variable as atomic:
#include <stdatomic.h>
atomic_uint gSampleIndex;
You then take care to only ever access the variable using the documented atomic interfaces. In the IRQ handler:
atomic_fetch_add(&gSampleIndex, 1);
and in the Process function:
localSampleIndex = atomic_load(&gSampleIndex);
Do not bother with the _explicit variants of the atomic functions unless you're trying to get your program to scale across large numbers of cores.
GCC atomics
Even if your compiler does not support C11 yet, it probably has some support for atomic operations. For example, in GCC you can say:
volatile int gSampleIndex;
...
__atomic_add_fetch(&gSampleIndex, 1, __ATOMIC_SEQ_CST);
...
__atomic_load(&gSampleIndex, &localSampleIndex, __ATOMIC_SEQ_CST);
As above, do not bother with weak consistency unless you're trying to achieve good scaling behaviour.
Implementing atomic operations yourself
Since you're not trying to protect against concurrent access from multiple cores, just race conditions with an interrupt handler, it is possible to implement a consistency protocol using standard C primitives only. Dekker's algorithm is the oldest known such protocol.
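For the curious, here is a minimal sketch of Dekker's algorithm for two participants. It assumes a single core that performs memory accesses in program order (volatile keeps the compiler from caching or reordering the flag accesses); as written, it is not sufficient on multi-core hardware with weaker memory ordering:
volatile int flag[2] = {0, 0}; /* flag[i] != 0: participant i wants in */
volatile int turn = 0;         /* which participant must back off      */

void dekker_lock(int self)
{
    int other = 1 - self;
    flag[self] = 1;
    while (flag[other]) {              /* contention: other side wants in */
        if (turn != self) {
            flag[self] = 0;            /* back off...                     */
            while (turn != self) { }   /* ...until it's our turn          */
            flag[self] = 1;
        }
    }
}

void dekker_unlock(int self)
{
    turn = 1 - self;                   /* give priority to the other side */
    flag[self] = 0;
}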
In your function you access the volatile variable just once (and it's the only volatile one in that function), so you don't need to worry about code reorganization that the compiler may do (and that volatile prevents). What the standard says about these optimizations, at §5.1.2.3, is:
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
Note the last sentence: "...no needed side effects are produced (...accessing a volatile object)".
volatile alone will prevent the optimizations the compiler might otherwise do around that code. To mention a few: no instruction reordering with respect to other volatile variables, no expression elimination, no caching, no value propagation across functions.
BTW, I doubt any compiler could break your code (with or without volatile). Maybe the local stack variable will be elided, but the value will be kept in a register (it certainly won't repeatedly access the memory location). What you need volatile for is value visibility.
EDIT
I think some clarification is needed.
Let me safely assume you know what you're doing (you're working with interrupt handlers, so this shouldn't be your first C program): the CPU word size matches your variable type, and the memory is properly aligned.
Let me also assume your interrupt is not reentrant (some magic cli/sti stuff, or whatever your CPU uses for this), unless you're planning some hard debugging and tuning.
If these assumptions are satisfied, then you don't need atomic operations. Why? Because localSampleIndex = gSampleIndex is atomic (it's properly aligned, the word size matches, and it's volatile), and with ++gSampleIndex there isn't any race condition (HandleSomeIrq won't be called again while it's still executing). More than useless, atomics here would be wrong.
One may think: "OK, I may not need atomics, but why can't I use them? Even if those assumptions are satisfied, they're an extra, and they'll achieve the same goal." No, they won't. Atomics do not have the same semantics as volatile variables (and volatile seldom is, or should be, used outside memory-mapped I/O and signal handling). Volatile is (usually) useless combined with atomics (unless a specific architecture says otherwise), but there is a great difference: visibility. When you update gSampleIndex in HandleSomeIrq, the standard guarantees the value will be immediately visible to all threads (and devices); with atomic_uint, the standard only guarantees it will become visible in a reasonable amount of time.
To make it short and clear: volatile and atomic are not the same thing. Atomic operations are useful for concurrency; volatile is useful for lower-level stuff (interrupts, devices). If you're still thinking "hey, they do exactly what I need", please read a few useful links picked from the comments: cache coherency, and a nice read about atomics.
To summarize:
In your case you could use an atomic variable with a lock (to get both atomic access and value visibility), but no one on this earth would put a lock inside an interrupt handler (unless it is absolutely, definitely, doubtlessly, unquestionably needed, and from the code you posted that's not your case).

When do I need to use volatile in ISRs?

I am making embedded firmware where everything after initialization happens in ISRs. I have variables that are shared between them, and I am wondering in which cases they need to be volatile. I never block, waiting for a change in another ISR.
When can I be certain that actual memory is read or written to when not using volatile? Once every ISR?
Addendum:
This is for ARM Cortex-M0, but this isn't really a question about ISRs as much as it is about compiler optimization, and as such, the platform shouldn't really be important.
The question is entirely answerable, and the answer is simple:
Without volatile you (simplistically) can't assume that actual memory access will ever happen - the compiler is free to conclude that results are either entirely unused (if that is apparent in what it can see), or that they may be safely cached in a register, or computed out of order (as long as visible dependencies are maintained).
You need volatile to tell the compiler that the side effects of access may be important to something the optimizer is unable to analyze, such as an interrupt context or the context of a different "thread".
In effect volatile is how you say to the compiler "I know something you don't, so don't try to be clever here"
Beware that volatile does not guarantee atomicity of read-modify-write operations, or even of a read or write alone, where the data type (or its misalignment) requires multi-step access. In those cases, you risk getting not just a stale value but an entirely erroneous one.
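A hypothetical sketch of that read-modify-write hazard (the names are invented for the example):
#include <stdint.h>

volatile uint32_t events; /* event bits shared with an ISR */

void main_loop(void)
{
    /* Despite volatile, this compiles to load, OR, store: three steps. */
    events |= 0x01;
}

void some_isr(void)
{
    /* If this ISR fires between the main loop's load and store, the bit
     * set here is overwritten when the main loop's store completes. */
    events |= 0x02;
}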
It has already been mentioned that the actual write to memory/cache is not exactly predictable when using non-volatile variables.
It is also worth mentioning another aspect: a volatile variable might be cached, and might require a forced cache flush to be written to actual memory (depending on whether a write-through or a write-back policy is used).
Consider another case, where the volatile variable is not cached (placed in a non-cacheable area):
due to the presence of write buffers and bus bridges, it is sometimes not predictable when the real write happens to the intended register, and a dummy read is required to ensure that the write actually reached the real register/memory. This is particularly helpful for avoiding race conditions in interrupt clearing/masking.
Even though compilers are not supposed to be clever around volatile variables, they are free to do some optimizations with respect to volatile sequence points (optimization across sequence points is not permitted, but optimization between sequence points is permitted).
The variables that need volatile are:
1) Data shared between an ISR and the main program or other threads.
Preferably these are flags that gate access to various data structures.
// main() code;
disable_interrupts();
if (flag == 0) {
    flag = 1;
    enable_interrupts();
    Manipulate_data();
    flag = 0;
} else {
    enable_interrupts();
    Cope_with_data_unavailable();
}
2) Memory-mapped hardware registers.
This "memory" can change at any time due to hardware conditions, and the compiler needs to know that its values are not necessarily consistent. Without volatile, a naive compiler would sample fred only once, resulting in a potential endless loop:
volatile int *fred = (volatile int *)0x1234; // Hardware reg at address 0x1234
while (*fred);

multithreaded C/C++ variable no cache (Linux)

I use 2 pthreads, where one thread "notifies" the other one of an event, and for that there is a variable (a normal integer) which is set by the second thread.
This works, but my question is: is it possible that the update is not seen immediately by the first (reading) thread, meaning the cache is not updated directly? And if so, is there a way to prevent this behaviour, e.g. something like the volatile keyword in Java?
(The frequency with which the event occurs is approximately in the microsecond range, so a more or less immediate update needs to be enforced.)
/edit: 2nd question: is it possible to enforce that the variable is held in the cache of the core where thread 1 runs, since this thread is reading it all the time?
It sounds to me as though you should be using a pthread condition variable as your signaling mechanism. This takes care of all the issues you describe.
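A minimal sketch of that approach (the function names are illustrative, not from the question); the mutex/condition-variable pair supplies the needed memory barriers, so no volatile is required:
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool event_pending = false;

void notify(void) /* called by the writing (second) thread */
{
    pthread_mutex_lock(&lock);
    event_pending = true;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

void wait_for_event(void) /* called by the reading (first) thread */
{
    pthread_mutex_lock(&lock);
    while (!event_pending)               /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &lock);
    event_pending = false;
    pthread_mutex_unlock(&lock);
}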
It may not be immediately visible to the other processors, but not because of cache coherence. The biggest visibility problems will be due to your processor's out-of-order execution schemes, or due to your compiler reordering instructions while optimizing.
In order to avoid both these problems, you have to use memory barriers. I believe that most pthread primitives are natural memory barriers, which means you shouldn't expect loads or stores to be moved beyond the boundaries formed by the lock and unlock calls. The volatile keyword can also be useful to disable a certain class of compiler optimizations, which can be helpful when writing lock-free algorithms, but it's not a substitute for memory barriers.
That being said, I recommend you don't do this manually; there are quite a few pitfalls associated with lock-free algorithms. Leaving these headaches to library writers should make you a happier camper (unless you're like me and you love headaches :) ). So my final recommendation is to ignore everything I said and use what vromanov or David Heffman suggested.
The most appropriate way to pass a signal from one thread to another should be to use the runtime library's signalling mechanisms, such as mutexes, condition variables, semaphores, and so forth.
If these have too high an overhead, my first thought would be that there was something wrong with the structure of the program. If it turned out that this really was the bottleneck, and restructuring the program was inappropriate, then I would use atomic operations provided by the compiler or a suitable library.
Using plain int variables, or even volatile-qualified ones, is error prone unless the compiler guarantees they have the appropriate semantics. E.g. MSVC makes particular guarantees about the atomicity and ordering constraints of plain loads and stores to volatile variables, but gcc does not.
A better way is to use atomic variables; for example, you can use libatomic. The volatile keyword is not enough.
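A minimal sketch of the same notification done with C11 atomics instead of a plain int (names are illustrative):
#include <stdatomic.h>
#include <stdbool.h>

atomic_bool event_pending = false;

void notify_event(void) /* thread 2: set the flag */
{
    atomic_store(&event_pending, true);
}

void poll_event(void) /* thread 1: poll the flag */
{
    while (!atomic_load(&event_pending))
        ;                                /* the load cannot be hoisted out */
    atomic_store(&event_pending, false);
    /* ... handle the event ... */
}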

Real dangers of 2+ threads writing/reading a variable

What are the real dangers of simultaneous read/write to a single variable?
If I use one thread to write a variable and another to read it in a while loop, and there is no danger if the variable is read while being written (an old value is simply used), what else is a danger here?
Can a simultaneous read/write cause a thread to crash, and what happens at the low level when an exactly simultaneous read/write occurs?
If two threads access a variable without suitable synchronization, and at least one of those accesses is a write then you have a data race and undefined behaviour.
How undefined behaviour manifests is entirely implementation dependent. On most modern architectures, you won't get a trap or exception or anything from the hardware, and it will read something, or store something. The thing is, it won't necessarily read or write what you expected.
e.g. with two threads incrementing a variable, you can miss counts, as described in my article at devx: http://www.devx.com/cplus/Article/42725
For a single writer and a single reader, the most common outcome will be that reader sees a stale value, but you might also see a partially-updated value if the update requires more than one cycle, or the variable is split across cache lines. What happens then depends on what you do with it --- if it's a pointer and you get a partially updated value then it might not be a valid pointer, and won't point to what you intended it to anyway, and then you might get any kind of corruption or error due to dereferencing an invalid pointer value. This may include formatting your hard disk or other bad consequences if the bad pointer value just happens to point to a memory mapped I/O register....
In general you get unexpected results. Wikipedia defines two distinct kinds of race conditions:
A critical race occurs when the order in which internal variables are changed determines the eventual state that the state machine will end up in.
A non-critical race occurs when the order in which internal variables are changed does not alter the eventual state. In other words, a non-critical race occurs when moving to a desired state means that more than one internal state variable must be changed at once, but no matter in what order these internal state variables change, the resultant state will be the same.
So the output will not always get messed up; it depends on the code. It's good practice to always deal with race conditions, for later code scaling and for preventing possible errors. Nothing is more annoying than not being able to trust your own data.
Two threads reading the same value is no problem at all.
The problem begins when one thread writes to a non-atomic variable and another thread reads it. Then the results of the read are undefined, since a thread may be preempted (stopped) at any time. Only operations on atomic variables are guaranteed not to be interruptible; atomic operations are usually writes to int-sized variables.
If you have two threads accessing the same data, it is best practice (and usually unavoidable) to use locking (mutex, semaphore).
Depends on the platform. For example, on Win32, read and write operations on aligned 32-bit values are atomic: that is, you can't half-read a new value and half-read an old value, and if you write, then when someone comes to read, they get either the full new value or the full old value. That's not true for all values, or all platforms, of course.
The result is undefined.
Consider this code:
int counter = 0; // shared global

void *thread_fn(void *arg)
{
    for (int i = 0; i < 10; i++)
    {
        counter = counter + 1; // non-atomic read-modify-write
    }
    return NULL;
}
The problem is that if you have N threads, the result can be anything between 10 and N*10.
This is because all threads might read the same value, increase it, and then write that value + 1 back. But you asked whether you can crash the program or hardware.
It depends. In most cases, wrong results are merely useless.
To solve this locking problem you need a mutex or a semaphore.
A mutex is a lock for code. In the case above you would lock the part of the code at the line
counter = counter+1;
whereas a semaphore is a lock for the variable
counter
Basically the same thing, for solving the same type of problem.
Check for these tools in your thread library.
http://en.wikipedia.org/wiki/Mutual_exclusion
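For instance, a minimal sketch of the counter example above protected with a pthread mutex; with N threads, the final count is now exactly N*10:
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

void *thread_fn(void *arg)
{
    for (int i = 0; i < 10; i++) {
        pthread_mutex_lock(&counter_lock);   /* enter critical section  */
        counter = counter + 1;               /* safe read-modify-write  */
        pthread_mutex_unlock(&counter_lock); /* leave critical section  */
    }
    return NULL;
}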
The worst that will happen depends on the implementation. There are so many completely independent implementations of pthreads, running on different systems and hardware, that I doubt anyone knows everything about all of them.
If p isn't a pointer-to-volatile, then I think a compiler for a conforming POSIX implementation is allowed to turn:
while (*p == 0) {}
exit(0);
Into a single check of *p followed by an infinite loop that doesn't bother looking at the value of *p at all. In practice, it won't, so it's a question of whether you want to program to the standard, or program to undocumented observed behavior of the implementations you're using. The latter generally works for simple cases, and then you build on the code until you do something complicated enough that it unexpectedly doesn't work.
In practice, on a multi-CPU system that doesn't have coherent memory caches, it could be a very long time before that while loop ever sees a change made from a different CPU, because without memory barriers it might never update its cached view of main memory. But Intel has coherent caches, so most likely you personally won't see any delays long enough to care about. If some poor sucker ever tries to run your code on a more exotic architecture, they may end up having to fix it.
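As a hedged aside, one conforming fix would be to make the load atomic, as in the sketch below; this both prevents the hoisting and supplies the missing memory ordering (the function name is invented for the example):
#include <stdatomic.h>
#include <stdlib.h>

/* Assumes p now points to an _Atomic int instead of a plain int. */
void wait_for_nonzero(const atomic_int *p)
{
    while (atomic_load(p) == 0) { } /* re-reads *p on every iteration */
    exit(0);
}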
Back to theory, the setup you're describing could cause a crash. Imagine a hypothetical architecture where:
p points to a non-atomic type, like long long on a typical 32-bit architecture.
long long on that system has trap representations, for example because it has a padding bit used as a parity check.
the write to *p is half-complete when the read occurs
the half-write has updated some of the bits of the value, but has not yet updated the parity bit.
Bang, undefined behavior: you read a trap representation. It may be that POSIX forbids certain trap representations that the C standard allows, in which case long long might not be a valid example for the type of *p, but I expect you can find a type for which trap representations are permitted.
If the variable being written and read cannot be updated or read atomically, then it is possible for the reader to pick up a corrupt, "partially updated" value.
You can see a partial update (e.g. you may see a long long variable with half of it coming from the new value and the other half coming from the old value).
You are not guaranteed to see the new value until you use a memory barrier (pthread_mutex_unlock() contains an implicit memory barrier).

Resources