Are race conditions that write the same value safe? - c

Suppose I have some multi-threaded program using shared memory, in which multiple threads are at random times overwriting the value of some multi-byte variable (e.g. an int or a double), sometimes colliding with each other (i.e. a race condition), and reading the value from the same variable at random times too.
Assuming all the threads always write the same value to the memory address (e.g. each thread does x = 1000) - if a thread reads the variable at the exact moment that another thread(s) is/are overwriting it, is the variable guaranteed to have the correct value? or could the memory somehow get overwritten with something random?
That is, if all the threads always write x = 1000, can a thread read x and get something other than 1000?

Assuming all the threads always write the same value to the memory
address (e.g. each thread does x = 1000) - if a thread reads the
variable at the exact moment that another thread(s) is/are overwriting
it, is the variable guaranteed to have the correct value?
The C language specifications expressly decline to make such a guarantee by declaring the behavior of programs containing race conditions to be undefined. And you're right that without synchronization, the situation you describe is a race condition, notwithstanding whether the value being written is the same as the initial contents of the memory.
or could
the memory somehow get overwritten with something random?
The behavior is undefined. In principle, anything could happen, including the read seeing a value that was never stored at the location in question.
Note also that the race is not about any kind of objective simultaneity. Rather, it is about lack of synchronization that would prevent simultaneous access, regardless of whether any simultaneous access actually occurs.
In practice, you would probably find that on some implementations, under at least some circumstances, writes that do not change the contents of memory act as if they did not conflict with each other or with reads that happen after that value was first written to the location, where "happens after" is a technical term that depends in part on synchronization. I do not recommend depending on such behavior, however. Not even if it is documented.
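For comparison, here is a minimal sketch of how the pattern becomes well-defined with C11 atomics (assuming <stdatomic.h> is available; the names are illustrative):
#include <stdatomic.h>

static _Atomic int x;   /* the shared variable from the question */

void writer(void)
{
    /* Every thread may store 1000 concurrently; atomic stores never tear. */
    atomic_store_explicit(&x, 1000, memory_order_relaxed);
}

int reader(void)
{
    /* Guaranteed to return a value that was actually stored (here 0 or 1000). */
    return atomic_load_explicit(&x, memory_order_relaxed);
}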

The C Standard allows implementations that can cheaply offer stronger guarantees than it mandates to do so. Consequently, the Standard bends over backward to avoid requiring implementations to uphold any guarantee whose costs might sometimes exceed its benefits, leaving the question of how and when to uphold guarantees whose benefits would exceed their costs as a Quality of Implementation matter outside the Standard's jurisdiction.
Although it might seem that guaranteeing that writing an object with the value it already contains has no effect should cost nothing, upholding such a guarantee would sometimes require forgoing some potentially useful optimizations. As a simple example, consider the following function:
volatile int zz;

unsigned test(unsigned *p, unsigned *q)
{
    unsigned temp;
    *p = 0x1234;
    temp = *q;
    zz = 1;
    do {} while(zz);
    *p = 0x1235;
    return temp;
}
On some platforms, including the original 8088/8086, the most efficient way to process the code may be to replace the last assignment to *p with *p += 1; which could then be processed using an inc instruction. If the code were executed in two threads simultaneously, however, that could cause *p to be left holding the value 0x1236.
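In other words, the generated code might behave as if the source had been written as follows (a sketch of the transformation just described; in a single-threaded program the behavior is identical, since the loop only reads zz and *p is known to still hold 0x1234):
unsigned test(unsigned *p, unsigned *q)
{
    unsigned temp;
    *p = 0x1234;
    temp = *q;
    zz = 1;
    do {} while(zz);
    *p += 1;   /* one inc instruction instead of storing the constant 0x1235 */
    return temp;
}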
In many cases, upholding a guarantee that writing an object with the value it already contains has no effect would cost nothing, while treating race conditions involving such writes as benign would eliminate the cost of synchronization actions that those writes would otherwise require. Unfortunately, while the Standard allows implementations to offer guarantees beyond what it requires when doing so would be practical and useful, it provides no means of distinguishing implementations that offer such guarantees from those that don't.

Related

Are concurrent unordered writes with fencing to shared memory undefined behavior?

I have heard that it is undefined behavior to read/write the same location in memory concurrently, but I am unsure whether the same is true when there are no clear race conditions involved. I suspect that the C18 standard will state that it is undefined behavior on principle, due to the potential to create race conditions, but I am more interested in whether this still counts as undefined behavior at an application level when these instances are surrounded by fencing.
Setup
For context, say we have two threads A and B, set up to operate on the same location in memory. It can be assumed that the shared memory mentioned here is not used or accessible anywhere else.
// Prior to the creation of these threads, the current thread has exclusive ownership of the shared memory
pthread_t a, b;
// Create two threads which operate on the same memory concurrently
pthread_create(&a, NULL, operate_on_shared_memory, NULL);
pthread_create(&b, NULL, operate_on_shared_memory, NULL);
// Join both threads, giving the current thread exclusive ownership of the shared memory
pthread_join(a, NULL);
pthread_join(b, NULL);
// Read from memory now that the current thread has exclusive ownership
printf("Shared Memory: %d\n", shared_memory);
Write/Write
Each thread then runs operate_on_shared_memory, which mutates the value of shared_memory at the same time in both threads, with the caveat that both threads attempt to set the shared memory to the same unchanging constant. Even if it is a race condition, the race winner should not matter. Does this count as undefined behavior? If so, why?
int shared_memory = 0;

void *operate_on_shared_memory(void *_unused) {
    const int SOME_CONSTANT = 42;
    shared_memory = SOME_CONSTANT;
    return NULL;
}
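(For reference, a data-race-free variant of this function is sketched below, assuming C11 <stdatomic.h>; whether such a change is actually required is exactly what the answers address.)
#include <stdatomic.h>

_Atomic int shared_memory = 0;

void *operate_on_shared_memory(void *_unused) {
    const int SOME_CONSTANT = 42;
    /* A relaxed atomic store avoids the data race; pthread_join already
       provides the ordering the final printf needs. */
    atomic_store_explicit(&shared_memory, SOME_CONSTANT, memory_order_relaxed);
    return NULL;
}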
Optional Branching Write/Write
If the previous version does not count as undefined behavior, then what about this example, which first reads from shared_memory and then writes the constant to a second location in shared memory? The important part here is that whether one or both threads succeed in entering the if statement, the outcome should be the same.
int shared_memory = 0;
int other_shared_memory = 0;

void *operate_on_shared_memory(void *_unused) {
    const int SOME_CONSTANT = 42;
    if (shared_memory != SOME_CONSTANT) {
        other_shared_memory = SOME_CONSTANT;
    }
    shared_memory = SOME_CONSTANT;
    return NULL;
}
If this is undefined behavior, then why? If the only reason is that it introduces a race condition, is there any reason why I shouldn't deem it acceptable for one thread to potentially execute an extra machine instruction? Is it because the CPU or compiler may re-order memory operations? What if I were to put atomic_thread_fence at the start and end of the operate_on_shared_memory?
Context
GCC and Clang don't seem to have any complaints. I used C18 for this test, but I don't mind referring to a later standard if it is easier to reference.
$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
$ gcc -std=c18 main.c -pthread -g -O3 -Wall
If the value of an object is modified around the time it is read via a non-qualified lvalue, machine code generated by gcc may behave in a way that is inconsistent with any particular value the object held or could have held. Situations where such inconsistencies occur are likely rare, but I don't think there's any way of judging whether such issues could arise in the machine code generated from any particular source, except by inspecting the machine code in question.
For example, (godbolt link https://godbolt.org/z/T3jd6voax) the function:
unsigned test(unsigned short *p)
{
    unsigned short temp = *p;
    unsigned short result = temp - (temp>>15);
    return result;
}
will be processed by gcc 10.2.1, when targeting the Cortex-M0 platform, by code equivalent to:
unsigned test(unsigned short *p)
{
    unsigned short temp1 = *p;
    signed short temp2 = *(signed short*)p;
    unsigned short result = temp1 + (temp2 >> 15);
    return result;
}
Although there is no way the original function could return 65535, the revised function may do so if the value at *p changes from 0 to 65535, or from 65535 to 0, between the two reads.
Many compilers are designed in a way that inherently guarantees that an unqualified read of any word-size-or-smaller object will always yield a value the object has held or will hold in future, but it is unfortunately rare for compilers to explicitly document such things. The only compilers that wouldn't uphold such a guarantee are those that process code using some sequence of steps which differs from the one specified but is expected to behave identically, and compiler writers seldom see any reason to enumerate, much less document, all of the transformations that they could perform but don't.
As long as you don't plan to count every toggle of shared_memory and other_shared_memory, and you don't care if some modifications are skipped or done twice unnecessarily, it should work.
For example, if your code simply monitors and displays another system's activity to end users, it's fine: a mismatch lasting one microsecond isn't a serious issue.
If you plan to sample precisely two inputs and get an accurate array of results, or to do precise computations on the threads' results in shared memory, then you're going about it the wrong way.
Here, your UB is mostly that you can't guarantee that shared_memory isn't modified between the test and the assignment.
I've numbered two lines in your code:
void *operate_on_shared_memory(void *_unused) {
    const int SOME_CONSTANT = 42;
    /*1*/ if (shared_memory != SOME_CONSTANT) {
        other_shared_memory = SOME_CONSTANT;
    }
    /*2*/ shared_memory = SOME_CONSTANT;
    return NULL;
}
On the line marked 1: if, for example, you're toggling shared_memory between two values (SOME_CONSTANT and SOME_CONSTANT_2), then since the reads/writes aren't atomic, you MAY read something different from the two constants used.
On the line marked 2, it's the same: you can't be sure that you won't be interrupted by another write and end up with a value that is neither SOME_CONSTANT nor SOME_CONSTANT_2, but something else. Think about reading the upper part of one and the lower part of the other.
Also, you can "miss" a true condition on line #1, and therefore miss an update to other_shared_memory, or do it twice because the write at line #2 got messed up - so on the next test of line #1 the value will differ from SOME_CONSTANT and you'll do an unwanted update.
All this depends on several factors, like:
Are your writes/reads atomic anyway, despite not being explicitly atomic?
Can your threads really be interrupted between lines #1 and #2, or are you "protected" (humm...) by the scheduler/priorities?
Is the shared memory tolerant of multiple concurrent accesses, or will you lock up the chip that controls it if you make such an attempt?
You can't answer? That's why it's undefined behavior...
In your particular situation, it MAY work. Or not. Or it may fail on my machine while working on yours.
"Undefined behavior" is usually not properly understood. What it really means is: "You cannot predict nor guarantee what the behavior will be for ALL possible platforms".
Nothing more, nothing less. It's not a guarantee of having problems, it's the absence of a guarantee of NOT having them. I may sounds like a subtle difference, but in fact it's a huge one.
By "platform", we mean the tuple build with:
An execution machine, including all currently running softwares,
An operating system, including its version and installed components,
A compiler chain, including its version and switches,
A build system, including all possible flags passed to compiler chain.
But UB doesn't mean "your program will act randomly"... A given set of CPU instructions will always produce the same result (under the same initial conditions); there is no randomness here. Obviously, they can be the wrong instructions for the problem you wish to solve, but the result is reproducible. That's how we hunt bugs, BTW...
So, on a fixed platform, having UB means "you can't predict what will happen", and in no way "you'll face pure randomness". In fact, a LOT of programs even exploit UB, because the behavior is known on that particular platform and exploiting it is easier/cheaper/faster than doing things the proper way.
Or because, even if the standard doesn't fully define it, your compiler ends up doing the same thing as the others (e.g. converting an integer to a smaller signed integer isn't fully defined by the standard, and char is usually signed... Almost nobody cares.).
But once your code is written, you'll know what the behavior is: it won't be undefined anymore... For YOUR platform, and ONLY this platform. Update your OS or your compiler, launch another program that can mess with the scheduling, use a faster CPU, and you MUST test again to check whether the behavior is still the same. And that's why it's annoying to have UB: it can work NOW, and cause a tricky bug a bit later.
It's one of the major reasons why industrial software often uses "old" OSes and/or compilers: upgrading them carries a HIGH risk of triggering a bug, because an update corrected what was a real bug, but the project's code exploited that bug (maybe unknowingly!), and the updated software now crashes... Or worse, can destroy some hardware!
We're in 2022, and I still have a project that uses an embedded 2008 Linux, with GCC 3, VS2008, C++98, and WinXP/Qt4 on users' machines. The project is actively maintained - and trust me, it's a pain. But upgrade the software/platform? No way. Better to deal with known bugs than to discover new ones. Cheaper, too.
One of my specialties is porting software, mostly from "old" platforms to new ones (often with 10 years or more between the two). I've faced this kind of thing a LOT of times: it worked on the old platform, it breaks on the new one, and only because UB was exploited then, and now the behavior (still undefined...) is no longer the same.
I'm obviously not talking about changing the C/C++ standard or the machine's endianness, where you need to rewrite code anyway, or about dealing with new OS features (like UAC on Windows). I'm talking about "normal" code, compiled without any warning, that behaves differently now and then. And you can't imagine how frequent that is, since no compiler will warn you about either high-level UB (for example, non-thread-safe functions) or instruction-level UB (a simple cast or alias can fully hide it without ANY warning).

why reading a variable modified by other threads can be neither old value nor new value

It has been mentioned by several people, for example here: c++ what happens when in one thread write and in second read the same object? (is it safe?),
that if two threads are operating on the same variable without atomics and locks, reading the variable can return a value that is neither the old value nor the new value.
I don't understand why this can happen, and I cannot find an example of such a thing happening. I think a load or store is always a single instruction which will not be interrupted, so why can this happen?
For example, C may be implemented on hardware which supports only 16-bit accesses to memory. In this case, loading or storing a 32-bit integer requires two load or store instructions. A thread performing these two instructions may be interrupted between their executions, and another thread may execute before the first thread is resumed. If that other thread loads, it may load one new part and one old part. If it stores, it may store both parts, and the first thread, when resumed, will see one old part and one new part. And other such mixes are possible.
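Conceptually, the torn store looks like this (a sketch of hypothetical 16-bit-only hardware; the names are illustrative):
#include <stdint.h>

/* What a single 32-bit store compiles to when the hardware can only move
   16 bits at a time: two separate halfword stores, with a window between
   them in which a concurrent reader sees a mix of old and new halves. */
void store32(volatile uint16_t half[2], uint32_t value)
{
    half[0] = (uint16_t)(value & 0xFFFF);  /* low half written first */
    /* <-- a reader running here sees the new low half, old high half */
    half[1] = (uint16_t)(value >> 16);     /* high half written second */
}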
From a language-lawyer point of view (i.e. in terms of what the C or C++ spec says, without considering any particular hardware the program might be running on), operations are either defined or undefined, and if operations are undefined, then the program is allowed to do literally anything it wants to, because they don't want to constrain the performance of the language by forcing compiler writers to support any particular behavior for operations that the programmer should never allow to happen anyway.
From a practical standpoint, the most likely scenario (on common hardware) where you'd read a value that is neither-old-nor-new would be the "word-tearing" scenario; where (broadly speaking) the other thread has written to part of the variable at the instant your thread reads from it, but not to the other part, so you get half of the old value and half of the new value.
It has been mentioned by several people, for example here: c++ what happens when in one thread write and in second read the same object? (is it safe?), that if two threads are operating on the same variable without atomics and locks, reading the variable can return a value that is neither the old value nor the new value.
Correct. Undefined behavior is undefined.
I don't understand why this can happen, and I cannot find an example of such a thing happening. I think a load or store is always a single instruction which will not be interrupted, so why can this happen?
Because undefined behavior is undefined. There is no requirement that you be able to think of any way it can go wrong. Do not ever think that because you can't think of some way something can break, that means it can't break.
For example, say there's a function that has an unsynchronized read in it. The compiler could conclude that this function can therefore never be called. If it's the only function that could modify a variable, then the compiler could omit reads of that variable. For example:
int j = 12;

// This is the only code that modifies j:
int q = some_variable_another_thread_is_writing;
j = 0;

// other code ...
if (j != 12) important_function();
Since the only code that modifies j reads a variable another thread is writing, the compiler is free to assume that code will never execute, thus that j will always be 12, and thus that the test of j and the call to important_function can be optimized out. Ouch.
Here's another example:
if (some_function()) j = 0;
else j = 1;
If the implementation thinks that some_function will almost always return true and can prove some_function cannot access j, it is perfectly legal for it to optimize this to:
j = 0;
if (!some_function()) j++;
This will cause your code to break horribly if other threads mess with j without a lock or j is not a type defined to be atomic.
And do not ever think that some compiler optimization, though legal, will never happen. That has burned people over and over again as compilers get smarter.

Guaranteeing the order of execution without using volatile or memory barrier and locks

I have a question regarding the compiler changing the order of execution. I am trying to improve the performance of a multi-threaded program (written in C) by replacing the critical section with a signaling mechanism (through a semaphore).
I need to guarantee the order of execution here, and have been doing some research on this. I saw many questions on the order of execution within a function, but not much discussion on a function within a function.
Based on https://en.wikipedia.org/wiki/Sequence_point rule #4, would the code chunk below guarantee that p->a is evaluated before func2 is entered, since func2 takes p as an input (assuming the compiler adheres to the sequence point rules defined there)?
void func1(struct req *p) {
    p->a = x;
    func2(p);
}

void func2(struct req *p) {
    p->b = y;
    releaseSemaphore(s);
}
It is critical that p->b is set only after p->a is set, as another thread is in a loop processing various requests and identifies a valid request by whether p->b is set. Releasing the semaphore only triggers the task if it is idle (and waiting for the semaphore); if it is busy processing other requests, it will check p->b later, and we cannot guarantee that func1 is called only when that thread is idle.
No. Sequence point ordering does not extend across thread boundaries. That is the whole point of why we need memory ordering guarantees in the first place.
The sequence point ordering is always guaranteed (modulo as-if-rule) for the thread which executes the code. Any other thread might observe the writes of that thread in an arbitrary order. This means that even if Thread #1 can verify that it performs writes in a certain order, Thread #2 might still observe them in a different order. That is why volatile is also not enough here.
Technically this can be explained e.g. by caches. The writes by Thread #1 might go to a write buffer first, where they will still be invisible to Thread #2. Only once the write buffer is flushed back to main memory do they become visible, and the hardware is allowed to reorder the writes before flushing.
Note that just because the platform is allowed to reorder writes does not mean that it will. This is the dangerous part. Code that will run perfectly fine on one platform might break out of the blue when being ported to another. Using proper memory orderings guarantees that the code will work everywhere.
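A sketch of how that guarantee might look with C11 atomics (illustrative names; the point is the release/acquire pairing, which makes the write to p->a visible to any thread that sees the new p->b):
#include <stdatomic.h>

struct request {
    int a;
    _Atomic int b;   /* doubles as the "request is valid" flag */
};

void func1(struct request *p, int x, int y)
{
    p->a = x;
    /* Release store: any thread whose acquire load sees this value of
       p->b is guaranteed to also see the earlier write to p->a. */
    atomic_store_explicit(&p->b, y, memory_order_release);
}

int try_consume(struct request *p, int y)
{
    if (atomic_load_explicit(&p->b, memory_order_acquire) == y)
        return p->a;   /* safe: ordered after the release store */
    return -1;         /* hypothetical "no request yet" sentinel */
}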
The implementation can[1] change the ordering as long as this isn't done across function calls to other translation units.
Such reordering is orthogonal to multithreading, i.e. it is done in both single-threaded and multithreaded programs.
If function func2 is in the same translation unit as func1, the execution could proceed as if:
void func1(struct req *p)
{
    func2(p);
    p->a = x;
}
Use volatile iff you want to prevent[2] such reorderings. (Note that this is done to prevent the reordering mentioned above, not for other synchronization purposes. You will have to use atomic primitives for those.)
[1] (Quoted from: ISO/IEC 9899:201x 5.1.2.3 Program execution 10)
Alternatively, an implementation might perform various optimizations within each translation unit, such
that the actual semantics would agree with the abstract semantics only when making function calls across
translation unit boundaries.
[2] (Quoted from: ISO/IEC 9899:201x 6.7.3 Type qualifiers 7)
An object that has volatile-qualified type may be modified in ways unknown to the
implementation or have other unknown side effects. Therefore any expression referring
to such an object shall be evaluated strictly according to the rules of the abstract machine,
as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the
object shall agree with that prescribed by the abstract machine, except as modified by the
unknown factors mentioned previously.

Real dangers of 2+ threads writing/reading a variable

What are the real dangers of simultaneous read/write to a single variable?
If I use one thread to write a variable and another to read the variable in a while loop, and there is no danger if the variable is read while being written (an old value simply gets used), what else is a danger here?
Can a simultaneous read/write cause a thread crash, and what happens at the low level when an exactly simultaneous read/write occurs?
If two threads access a variable without suitable synchronization, and at least one of those accesses is a write then you have a data race and undefined behaviour.
How undefined behaviour manifests is entirely implementation dependent. On most modern architectures, you won't get a trap or exception or anything from the hardware, and it will read something, or store something. The thing is, it won't necessarily read or write what you expected.
e.g. with two threads incrementing a variable, you can miss counts, as described in my article at devx: http://www.devx.com/cplus/Article/42725
For a single writer and a single reader, the most common outcome will be that the reader sees a stale value, but you might also see a partially-updated value if the update requires more than one cycle, or if the variable is split across cache lines. What happens then depends on what you do with it --- if it's a pointer and you get a partially updated value, then it might not be a valid pointer, and it won't point to what you intended anyway; you might then get any kind of corruption or error due to dereferencing an invalid pointer value. This may include formatting your hard disk or other bad consequences if the bad pointer value just happens to point to a memory-mapped I/O register...
In general you get unexpected results. Wikipedia defines two distinct race conditions:
A critical race occurs when the order in which internal variables are changed determines the eventual state that the state machine will end up in.
A non-critical race occurs when the order in which internal variables are changed does not alter the eventual state. In other words, a non-critical race occurs when moving to a desired state means that more than one internal state variable must be changed at once, but no matter in what order these internal state variables change, the resultant state will be the same.
So the output will not always get messed up; it depends on the code. It's good practice to always deal with race conditions, for later code scaling and to prevent possible errors. Nothing is more annoying than not being able to trust your own data.
Two threads reading the same value is no problem at all.
The problem begins when one thread writes a non-atomic variable and another thread reads it. Then the results of the read are undefined, since a thread may be preempted (stopped) at any time and only operations on atomic variables are guaranteed not to be broken up. Atomic operations are usually writes to int-sized variables.
If you have two threads accessing the same data, it is best practice (and usually unavoidable) to use locking (a mutex or semaphore).
It depends on the platform. For example, on Win32, read and write operations on aligned 32-bit values are atomic; that is, you can't half-read a new value and half-read an old value, and if you write, then when someone comes to read, they get either the full new value or the full old value. That's not true for all values, or all platforms, of course.
The result is undefined.
Consider this code:
int counter = 0;   /* global */

void thread(void)
{
    for (int i = 0; i < 10; i++)
    {
        counter = counter + 1;
    }
}
The problem is that if you have N threads, the result can be anything between 10 and N*10.
This is because all threads might read the same value, increment it, and then write the value + 1 back. But you asked whether you can crash the program or hardware.
It depends. In most cases, the wrong results are merely useless.
To solve this locking problem you need a mutex or semaphore.
A mutex is a lock for code. In the case above you would lock the part of the code at the line
counter = counter + 1;
whereas a semaphore is a lock for the variable
counter
Basically they are the same kind of tool for solving the same type of problem.
Check for these tools in your thread library.
http://en.wikipedia.org/wiki/Mutual_exclusion
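For instance, a minimal sketch with a POSIX mutex (assuming pthreads), which makes the result exactly N*10:
#include <pthread.h>

int counter = 0;
pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

void *thread_func(void *unused)
{
    for (int i = 0; i < 10; i++) {
        pthread_mutex_lock(&counter_lock);
        counter = counter + 1;   /* the read-modify-write is now indivisible */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}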
The worst that will happen depends on the implementation. There are so many completely independent implementations of pthreads, running on different systems and hardware, that I doubt anyone knows everything about all of them.
If p isn't a pointer-to-volatile then I think that a compiler for a conforming Posix implementation is allowed to turn:
while (*p == 0) {}
exit(0);
Into a single check of *p followed by an infinite loop that doesn't bother looking at the value of *p at all. In practice, it won't, so it's a question of whether you want to program to the standard, or program to undocumented observed behavior of the implementations you're using. The latter generally works for simple cases, and then you build on the code until you do something complicated enough that it unexpectedly doesn't work.
In practice, on a multi-CPU system that doesn't have coherent memory caches, it could be a very long time before that while loop ever sees a change made from a different CPU, because without memory barriers it might never update its cached view of main memory. But Intel has coherent caches, so most likely you personally won't see any delays long enough to care about. If some poor sucker ever tries to run your code on a more exotic architecture, they may end up having to fix it.
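For comparison, here is a sketch of that wait loop written with C11 atomics, which is well-defined on every conforming implementation (illustrative names):
#include <stdatomic.h>

_Atomic int flag = 0;

void wait_for_flag(void)
{
    /* The acquire load forces the flag to be re-read on every iteration
       and makes writes performed before the matching release store
       visible once the loop exits. */
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0) {
        /* spin; a real program might yield or sleep here */
    }
}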
Back to theory, the setup you're describing could cause a crash. Imagine a hypothetical architecture where:
p points to a non-atomic type, like long long on a typical 32 bit architecture.
long long on that system has trap representations, for example because it has a padding bit used as a parity check.
the write to *p is half-complete when the read occurs
the half-write has updated some of the bits of the value, but has not yet updated the parity bit.
Bang, undefined behavior, you read a trap representation. It may be that Posix forbids certain trap representations that the C standard allows, in which case long long might not be a valid example for the type of *p, but I expect you can find a type for which trap representations are permitted.
If the variable being written and read cannot be updated or read atomically, then it is possible for the reader to pick up a corrupt, "partially updated" value.
You can see a partial update (e.g. you may see a long long variable with half of it coming from the new value and the other half coming from the old value).
You are not guaranteed to see the new value until you use a memory barrier (pthread_mutex_unlock() contains an implicit memory barrier).

Do I need a lock when only a single thread writes to a shared variable?

I have 2 threads and a shared float global. One thread only writes to the variable while the other only reads from it, do I need to lock access to this variable? In other words:
volatile float x;

void reader_thread() {
    while (1) {
        // Grab mutex here?
        float local_x = x;
        // Release mutex?
        do_stuff_with_value(local_x);
    }
}

void writer_thread() {
    while (1) {
        float local_x = get_new_value_from_somewhere();
        // Grab mutex here?
        x = local_x;
        // Release mutex?
    }
}
My main concern is that a load or store of a float may not be atomic, such that local_x in reader_thread ends up having a bogus, partially updated value.
Is this a valid concern?
Is there another way to guarantee atomicity without a mutex?
Would using sig_atomic_t as the shared variable work, assuming it has enough bits for my purposes?
The language in question is C using pthreads.
Different architectures have different rules, but in general, memory loads and stores of aligned, int-sized objects are atomic. Smaller and larger may be problematic. So if sizeof(float) == sizeof(int) you might be safe, but I still wouldn't depend on it in a portable program.
Also, the behavior of volatile isn't particularly well-defined... The specification uses it as a way to prevent optimizing away accesses to memory-mapped device I/O, but says nothing about its behavior on any other memory accesses.
In short, even if loads and stores are atomic on float x, I would use explicit memory barriers (though how varies by platform and compiler) instead of depending on volatile. Without the guarantee that loads and stores are atomic, you would have to use locks, which do imply memory barriers.
According to section 24.4.7.2 of the GNU C library documentation:
In practice, you can assume that int and other integer types no longer than int are atomic. You can also assume that pointer types are atomic; that is very convenient. Both of these assumptions are true on all of the machines that the GNU C library supports and on all POSIX systems we know of.
float technically doesn't count under these rules, although if a float is the same size as an int on your architecture, what you could do is make your global variable an int, and then convert it to a float with a union every time you read or write it.
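A sketch of that union trick, assuming sizeof(float) == sizeof(int) on the target and that int loads and stores are atomic there (the names are illustrative):
union float_bits { float f; int i; };
_Static_assert(sizeof(float) == sizeof(int), "float must be int-sized");

int shared_value;   /* the variable actually shared between threads */

void write_shared_float(float value)
{
    union float_bits u;
    u.f = value;
    shared_value = u.i;   /* relies on int stores being atomic, per the glibc docs */
}

float read_shared_float(void)
{
    union float_bits u;
    u.i = shared_value;
    return u.f;
}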
The safest course of action is to use some form of mutex to protect accesses to the shared variable. Since the critical sections are extremely small (reading/writing a single variable), you're almost certainly going to get better performance out of a light-weight mutex such as a spin lock, as opposed to a heavy-weight mutex that makes system calls to do its job.
I would lock it down. I'm not sure how large a float is in your environment, but it might not be read/written in a single instruction, so your reader could potentially read a half-written value. Remember that volatile doesn't say anything about the atomicity of operations; it simply states that the read will come from memory instead of being cached in a register or something like that.
The assignment is not atomic, at least with some compilers, in the sense that it takes more than a single instruction to perform. The following code was generated by Visual C++ 6.0; f1 and f2 are of type float.
4: f2 = f1;
00401036 mov eax,dword ptr [ebp-4]
00401039 mov dword ptr [ebp-8],eax
In the memory model introduced by C11 and later, the clear answer is yes: you need a lock or other means of synchronization, or else to declare the variable x as _Atomic float using <stdatomic.h>.
If a non-atomic variable is written by one thread, and either read or written by another, without appropriate synchronization to ensure that one access happens before the other in the precise sense defined in the standard, then a data race exists and the behavior of the program becomes undefined. (In particular, the bad effects need not be limited to just getting a bogus value when you read the variable; the program is allowed to crash, corrupt unrelated data, etc.)
Note that the presence of volatile is irrelevant. Declaring a variable volatile does not save you from UB when a data race otherwise exists, and if a data race is avoided by use of atomic_float or otherwise, then volatile is not needed.
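A minimal sketch of that fix (the generic atomic_store/atomic_load functions from <stdatomic.h> work on any _Atomic type):
#include <stdatomic.h>

_Atomic float x;

void writer_step(float local_x)
{
    atomic_store(&x, local_x);   /* well-defined even with a concurrent reader */
}

float reader_step(void)
{
    return atomic_load(&x);      /* never returns a torn value */
}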
Since it's a single word in memory you're changing, you should be fine with just the volatile declaration.
I don't think you're guaranteed to get the latest value when you read it, though, unless you use a lock.
In all probability, no. Since there is no chance of a write collision, the only concern is whether you could read the variable while it's half-written. It's hugely unlikely that your code will be run on a platform where writing a float doesn't happen in a single operation, if you're writing something with threads.
However, it's possible, because the definition of a float in C does not mandate that the underlying hardware storage be limited to the processor's word size. You could be compiling to machine code where, say, the sign and the mantissa are written in two different operations.
The real question, I think, is two questions: "What's the downside to having a mutex here?" and "What are the repercussions if I get a garbage read?"
Perhaps rather than a mutex, you should write an assert that determines whether the storage size of a float is smaller than or equal to the word size of the underlying CPU.
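A compile-time sketch of that check (using sizeof(void *) as a stand-in for the CPU word size, which is a common but not universal assumption):
/* Fails the build if a float is wider than a machine word,
   in which case single-instruction stores are unlikely. */
_Static_assert(sizeof(float) <= sizeof(void *),
               "float wider than a machine word: stores may tear");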
