Is int safe to read from multiple threads? - c

I have multiple threads reading the same int variable, and one thread writing to it.
I don't care about the race condition.
My only concern is whether reading and writing an int value at the same time is memory safe, that is, whether it can ever crash the application.

Yes, that should be all right. The only way I can envision that crashing is if one of the threads deallocates the memory backing that integer. For best results I would also make sure the integers are aligned at sizeof(int) boundaries. (Some CPUs cannot access integers at all without this alignment. Others provide weaker guarantees of atomicity for unaligned access.)

Yes, on x86 and x86-64, as long as the value you're reading is aligned properly. A 32-bit int needs to be aligned on a 4-byte boundary for reads and writes to be atomic, which will almost always be the case unless you go out of your way to create unaligned ints (say, by using a packed structure or by doing casting/pointer arithmetic with byte buffers).
You probably also want to declare your variable as volatile so that the compiler will generate code that will re-fetch the variable from memory every time it's accessed. That will prevent it from making optimizations such as caching it in a register when it might be altered by another thread.

On all Linux platforms that I know of, reads and writes of aligned ints are atomic and safe. You will never read a value that wasn't written (no word tearing). You will never cause a fault or crash.
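For what it's worth, the single-writer, many-readers pattern discussed in these answers can also be written with C11 <stdatomic.h>, which makes the no-tearing guarantee part of the language instead of a property of the CPU. This is only a minimal sketch, assuming a C11 compiler; shared_value, publish_value and read_value are made-up names, and the static_assert encodes the assumption that int atomics are lock-free on the target.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumption: int atomics are always lock-free on this target
   (true on x86 and x86-64). The build fails where that does not hold. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "int atomics are not always lock-free");

/* Hypothetical shared value: one writer thread, any number of readers. */
static atomic_int shared_value;

/* Called only by the writer thread. */
void publish_value(int v)
{
    atomic_store_explicit(&shared_value, v, memory_order_relaxed);
}

/* Callable from any reader thread; returns some value that was stored. */
int read_value(void)
{
    return atomic_load_explicit(&shared_value, memory_order_relaxed);
}
```

Relaxed ordering is enough here because the question only asks about tearing and crashes, not about ordering relative to other data.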

Related

Achieve atomic increment on 64bit variable on 32 bit system

I am trying to do an atomic increment on a 64bit variable on a 32 bit system. I am trying to use atomic_fetch_add_explicit(&system_tick_counter_us,1, memory_order_relaxed);
But the compiler emits a warning: large atomic operation may incur significant performance penalty; the access size (8 bytes) exceeds the max lock-free size (4 bytes) [-Watomic-alignment]
My question is how I can achieve atomicity without using critical sections.
how I can achieve atomicity without using critical sections?
On an object that is larger than a single memory "word"? You probably can't. End of story. Use a mutex. Or use atomic<...> and accept that the library will use a mutex on your behalf.
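To illustrate the "let the library do it" option in C rather than C++, here is a rough sketch. The names tick_counter and tick_add are invented for the example. ATOMIC_LLONG_LOCK_FREE is the standard macro that reports whether the implementation can do this without a lock; on a 32-bit target it may not be 2, and GCC or Clang may then need libatomic at link time.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumption for this sketch: unsigned long long is the 64-bit type. */
static_assert(sizeof(unsigned long long) == 8, "expected a 64-bit unsigned long long");

/* Hypothetical 64-bit tick counter. */
static atomic_ullong tick_counter;

/* Returns 1 if 64-bit atomics are always lock-free on this target, else 0. */
int ticks_are_lock_free(void)
{
#if ATOMIC_LLONG_LOCK_FREE == 2
    return 1;
#else
    return 0;   /* the library may fall back to an internal lock */
#endif
}

/* Atomically adds delta and returns the new counter value. */
unsigned long long tick_add(unsigned long long delta)
{
    return atomic_fetch_add_explicit(&tick_counter, delta,
                                     memory_order_relaxed) + delta;
}
```

The increment is correct either way; the macro only tells you whether you are paying for a hidden lock.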
You can't even do this with 32-bit data on systems that have to read the memory, modify the value, and store it back (read-modify-write, RMW). Almost all (if not all) RISC processors lack instructions that modify memory in a single step. This includes all ARM Cortex-M microcontrollers, RISC-V, and many other processors.
Many of them have special hardware mechanisms that can help achieve atomic access (or at least prevent other processes from accessing the data). Cortex-M cores have the LDREX and STREX instructions, and some have hardware mutexes or semaphores, but they still require the programmer to provide atomic (or at least mutually exclusive) access to the memory location.
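In C11 the usual way to reach those exclusive-access instructions is a compare-and-swap retry loop; as far as I know, GCC and Clang typically lower it to an LDREX/STREX pair on ARMv7-M, though that is an assumption about the toolchain, not something the standard promises. A small sketch with an invented helper, add_saturating, that does a read-modify-write no single fetch_add can express:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Adds delta to *counter without exceeding limit; returns the previous value.
   The loop retries whenever another thread or ISR changed *counter between
   our read and our write. */
uint32_t add_saturating(_Atomic uint32_t *counter, uint32_t delta, uint32_t limit)
{
    assert(counter != NULL);
    uint32_t old = atomic_load_explicit(counter, memory_order_relaxed);
    uint32_t desired;
    do {
        if (old >= limit)
            desired = old;            /* already at or above the cap */
        else if (delta > limit - old)
            desired = limit;          /* clamp to the cap */
        else
            desired = old + delta;
        /* On failure, old is refreshed with the current value. */
    } while (!atomic_compare_exchange_weak_explicit(counter, &old, desired,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
    return old;
}
```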
If you need to read the 64-bit value while the program is running then you probably can't do this safely without a mutex as others have said, but on the off-chance that you only need to read this value after all of the threads have finished, then you can implement this with an array of 2 32-bit atomic variables.
Since your system can only guarantee atomicity of this type on 4-byte memory regions, you should use those instead to maximize performance, for instance:
#include <stdio.h>
#include <stdint.h>     /* uint32_t, uint64_t */
#include <threads.h>
#include <stdatomic.h>
_Atomic uint32_t system_tick_counter_us[2];  /* [0] = low word, [1] = high word */
Then increment one of those two 4-byte atomic variables whenever you want to increment an 8-byte one, then check if it overflowed, and if it did, atomically increment the other. Keep in mind that atomic_fetch_add_explicit returns the value of the atomic variable before it was incremented, so it's important to check for the value that will cause the overflow, not zero.
/* fetch_add returns the old value, so UINT32_MAX means the low word just wrapped to 0 */
if (atomic_fetch_add_explicit(&system_tick_counter_us[0], 1, memory_order_relaxed) == UINT32_MAX)
    atomic_fetch_add_explicit(&system_tick_counter_us[1], 1, memory_order_relaxed);
However, as I mentioned, this can cause a race condition if the 64-bit value is constructed between system_tick_counter_us[0] overflowing and that same thread incrementing system_tick_counter_us[1]. If you can guarantee that all threads have finished executing the two lines above, this is a safe solution.
The 64-bit value can be constructed as ((uint64_t)system_tick_counter_us[1] << 32) | (uint64_t)system_tick_counter_us[0] once you're sure the memory is no longer being modified.
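Put together, the two-word scheme described in this answer might look roughly like the sketch below. The names tick_halves, tick_increment and tick_total_when_quiescent are mine, not from the question, and the combining function is only meaningful once no thread is still incrementing, exactly as the answer warns.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: 32-bit atomics are lock-free, as the compiler warning implies. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "sketch assumes lock-free 32-bit atomics");

/* Hypothetical split counter: [0] = low word, [1] = high word. */
static _Atomic uint32_t tick_halves[2];

/* Increments the 64-bit quantity one 32-bit word at a time. */
void tick_increment(void)
{
    uint32_t before = atomic_fetch_add_explicit(&tick_halves[0], 1,
                                                memory_order_relaxed);
    if (before == UINT32_MAX)   /* low word wrapped: carry into the high word */
        atomic_fetch_add_explicit(&tick_halves[1], 1, memory_order_relaxed);
}

/* Only valid when no thread is inside tick_increment(). */
uint64_t tick_total_when_quiescent(void)
{
    uint64_t hi = atomic_load_explicit(&tick_halves[1], memory_order_relaxed);
    uint64_t lo = atomic_load_explicit(&tick_halves[0], memory_order_relaxed);
    return (hi << 32) | lo;
}
```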

Using bit fields for AVR ports

I'd like to be able to use something like this to make access to my ports clearer:
typedef struct {
    unsigned rfid_en: 1;
    unsigned lcd_en: 1;
    unsigned lcd_rs: 1;
    unsigned lcd_color: 3;
    unsigned unused: 2;
} portc_t;
extern volatile portc_t *portc;
But is it safe? It works for me, but...
1) Is there a chance of race conditions?
2) Does gcc generate read-modify-write cycles for code that modifies a single field?
3) Is there a safe way to update multiple fields?
4) Is the bit packing and order guaranteed? (I don't care about portability in this case, so gcc-specific options to make it Do What I Mean are fine.)
1) Handling race conditions must be done with operating-system-level calls (which will indeed use read-modify-writes); GCC won't do that for you.
2) Same as above: no, GCC does not generate atomic read-modify-write instructions just because the object is volatile. However, a CPU will normally do the write itself atomically (simply because it's one instruction). This holds if the bit-field stays within an int, for example, but it is CPU/implementation dependent: some may guarantee this for values up to 8 bytes, while others only up to 4 bytes. Under that condition, bits can't get mixed up (i.e. a write containing a few bits from one thread and a few from another won't occur).
3) The only way to set multiple fields at the same time is to set the values in an intermediate variable and then assign that variable to the volatile.
4) The C standard specifies that bits are packed together (there might be exceptions when you start mixing types, but I've never seen that; everyone always uses unsigned ...).
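As a rough illustration of the intermediate-variable approach from point 3, here is a sketch using the struct from the question. portc_set_lcd is an invented helper. Whether the struct copy compiles to a single load and a single store is up to the compiler, so check the generated assembly, and note that an interrupt can still land between the read and the write.

```c
#include <assert.h>

typedef struct {
    unsigned rfid_en: 1;
    unsigned lcd_en: 1;
    unsigned lcd_rs: 1;
    unsigned lcd_color: 3;
    unsigned unused: 2;
} portc_t;

/* Assumption: all fields share one storage unit on this compiler. */
static_assert(sizeof(portc_t) <= sizeof(unsigned), "fields must share one storage unit");

/* Updates two fields with one read of the port and one write back. */
void portc_set_lcd(volatile portc_t *port, unsigned rs, unsigned color)
{
    portc_t shadow = *port;          /* read the whole register once */
    shadow.lcd_rs = rs & 1u;         /* modify the local copy only */
    shadow.lcd_color = color & 7u;
    *port = shadow;                  /* write everything back at once */
}
```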
Note: Defining something volatile does not cause a compiler to generate read-modify-writes. What volatile does is tell the compiler that an assignment to that pointer/address must always be made, and may not be optimised away.
Here's another post about the same subject matter. I found there to be quite a few other places where you can find more details.
The keyword volatile has nothing to do with race conditions, or with which thread is accessing the code. The keyword tells the compiler not to cache the value in registers. It tells the compiler to generate code so that every access goes to the location allocated to the variable, because each access may see a different value. This is the case with memory-mapped peripherals. It doesn't help if your MPU has its own cache. There are usually special instructions or un-cached areas of the memory map to ensure the location, and not a cached copy, is read.
As for being thread safe, just remember that even a memory access may not be thread safe if it is done in two instructions. E.g. in 8051 assembler, you have to get a 16-bit value one byte at a time. The instruction sequence can be interrupted by an IRQ or another thread before the second byte is read or written, potentially corrupting the value.

Read and Write atomic operation implementation in the Linux Kernel

Recently I've peeked into the Linux kernel implementation of an atomic read and write and a few questions came up.
First the relevant code from the ia64 architecture:
typedef struct {
    int counter;
} atomic_t;
#define atomic_read(v) (*(volatile int *)&(v)->counter)
#define atomic64_read(v) (*(volatile long *)&(v)->counter)
#define atomic_set(v,i) (((v)->counter) = (i))
#define atomic64_set(v,i) (((v)->counter) = (i))
For both read and write operations, it seems that the direct approach was taken to read from or write to the variable. Unless there is another trick somewhere, I do not understand what guarantees exist that this operation will be atomic in the assembly domain. I guess an obvious answer will be that such an operation translates to one assembly opcode, but even so, how is that guaranteed when taking into account the different memory cache levels (or other optimizations)?
In the read macros, the volatile type is used in a casting trick. Does anyone have a clue how this affects the atomicity here? (Note that it is not used in the write operation.)
I think you are misunderstanding the (very much vague) usage of the word "atomic" and "volatile" here. Atomic only really means that the words will be read or written atomically (in one step, and guaranteeing that the contents of this memory position will always be one write or the other, and not something in between). And the volatile keyword tells the compiler to never assume the data in that location due to an earlier read/write (basically, never optimize away the read).
What the words "atomic" and "volatile" do NOT mean here is that there's any form of memory synchronization. Neither implies ANY read/write barriers or fences. Nothing is guaranteed with regards to memory and cache coherence. These functions are basically atomic only at the software level, and the hardware can optimize/lie however it deems fit.
Now as to why simply reading is enough: the memory models for each architecture are different. Many architectures can guarantee atomic reads or writes for data aligned to a certain byte offset, or x words in length, etc., and this varies from CPU to CPU. The Linux kernel contains many defines for the different architectures that let it do without any atomic calls (CMPXCHG, basically) on platforms that guarantee atomic reads/writes (sometimes only in practice, even if their spec says they don't actually guarantee it).
As for the volatile, while there is no need for it in general unless you're accessing memory-mapped IO, it all depends on when/where/why the atomic_read and atomic_set macros are being called. Many compilers will (though it is not set in the C spec) generate memory barriers/fences for volatile variables (GCC, off the top of my head, is one. MSVC does for sure.). While this would normally mean that all reads/writes to this variable are now officially exempt from just about any compiler optimizations, in this case by creating a "virtual" volatile variable only this particular instance of a read/write is off-limits for optimization and re-ordering.
The reads are atomic on most major architectures, so long as they are aligned to a multiple of their size (and aren't bigger than the read size of a given type); see the Intel Architecture manuals. Writes, on the other hand, may be different: Intel states that under x86, single-byte writes and aligned writes may be atomic; under IPF (IA64), everything uses acquire and release semantics, which would make it guaranteed atomic, see this.
The volatile prevents the compiler from caching the value locally, forcing it to be retrieved wherever there is an access to it.
If you write for a specific architecture, you can make assumptions specific to it.
I guess IA-64 does compile these things to a single instruction.
The cache shouldn't be an issue, unless the counter crosses a cache line boundary. But if 4/8 byte alignment is required, this can't happen.
A "real" atomic instruction is required when a machine instruction translates into two memory accesses. This is the case for increments (read, increment, write) or compare&swap.
volatile affects the optimizations the compiler can do.
For example, it prevents the compiler from converting multiple reads into one read.
But on the machine instruction level, it does nothing.
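To make the effect of the cast concrete, here is a stripped-down sketch in the style of the kernel macros. The demo_ names are mine, not the kernel's. Because the read goes through a volatile lvalue, the compiler has to reload counter on every loop iteration instead of hoisting the load out of the loop; nothing here adds a fence or changes what the CPU does.

```c
#include <assert.h>

typedef struct {
    int counter;
} demo_atomic_t;

/* Same shape as the macros quoted in the question. */
#define demo_atomic_read(v)   (*(volatile int *)&(v)->counter)
#define demo_atomic_set(v, i) (((v)->counter) = (i))

/* Polls flag up to max_spins times; returns the first nonzero value seen,
   or 0 if it stayed zero. The volatile access forces a fresh load each time. */
int spin_until_nonzero(demo_atomic_t *flag, int max_spins)
{
    assert(max_spins >= 0);
    for (int i = 0; i < max_spins; i++) {
        int seen = demo_atomic_read(flag);
        if (seen != 0)
            return seen;
    }
    return 0;
}
```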

Concurrent access to struct member

I'm using 32-bit microcontroller (STR91x). I'm concurrently accessing (from ISR and main loop) struct member of type enum. Access is limited to writing to that enum field in the ISR and checking in the main loop. Enum's underlying type is not larger than integer (32-bit).
I would like to make sure that I'm not missing anything and I can safely do it.
Provided that 32 bit reads and writes are atomic, which is almost certainly the case (you might want to make sure that your enum's word-aligned) then that which you've described will be just fine.
As paxdiablo & David Knell said, generally speaking this is fine. Even if your bus is < 32 bits, chances are the instruction's multiple bus cycles won't be interrupted, and you'll always read valid data.
What you stated, and what we all know, but it bears repeating, is that this is fine for a single-writer, N-reader situation. If you had more than one writer, all bets are off unless you have a construct to protect the data.
If you want to make sure, find the compiler switch that generates an assembly listing and examine the assembly for the write in the ISR and the read in the main loop. Even if you are not familiar with ARM assembly, you should be able to discern quickly whether or not the reads and writes are atomic.
ARM supports 32-bit aligned reads that are atomic as far as interrupts are concerned. However, make sure your compiler doesn't try to cache the value in a register! Either mark it as a volatile, or use an explicit memory barrier - on GCC this can be done like so:
int tmp = yourvariable;
__sync_synchronize(yourvariable);
Note, however, that current versions of GCC perform a full memory barrier for __sync_synchronize, rather than just for the one variable, so volatile is probably better for your needs.
Further, note that your variable will be aligned automatically unless you are doing something weird (i.e., explicitly specifying the location of the struct in memory, or requesting a packed struct). Unaligned variables on ARM cannot be read atomically, so make sure it's aligned, or disable interrupts while reading.
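A bare-bones sketch of the volatile variant for the situation in the question (the ISR writes, the main loop only reads). The type and function names are invented, and link_isr stands in for whatever your real interrupt handler is.

```c
#include <assert.h>

typedef enum { LINK_DOWN = 0, LINK_UP = 1 } link_state_t;

/* Assumption from the question: the enum fits in one 32-bit word. */
static_assert(sizeof(link_state_t) <= 4, "enum must fit in a single 32-bit access");

struct device_status {
    volatile link_state_t link;   /* volatile: re-read from memory on every access */
};

static struct device_status status;

/* Hypothetical ISR body: the only writer. */
void link_isr(void)
{
    status.link = LINK_UP;
}

/* Main-loop check: a single aligned read, no write. */
int link_is_up(void)
{
    return status.link == LINK_UP;
}
```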
Well, it depends entirely on your hardware but I'd be surprised if an ISR could be interrupted by the main thread.
So probably the only thing you have to watch out for is if the main thread could be interrupted halfway through a read (so it may get part of the old value and part of the new).
It should be a simple matter of consulting the specs to ensure that interrupts are only processed between instructions (this is likely since the alternative would be very complex) and that your 32-bit load is a single instruction.
An aligned 32 bit access will generally be atomic (unless it were a particularly ludicrous compiler!).
However the rock-solid solution (and one generally applicable to non-32 bit targets too) is to simply disable the interrupt temporarily while accessing the data outside of the interrupt. The most robust way to do this is through an access function to statically scoped data rather than making the data global where you then have no single point of access and therefore no way of enforcing an atomic access mechanism when needed.
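A sketch of that access-function idea follows. irq_disable and irq_enable are placeholders I made up so the example runs on a host; on a real target you would substitute your vendor's or compiler's interrupt-masking intrinsics, and probably save and restore the previous interrupt state rather than blindly re-enabling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the target's interrupt mask control. */
static int irq_disabled_depth;
static void irq_disable(void) { irq_disabled_depth++; }
static void irq_enable(void)  { irq_disabled_depth--; }

/* File-scope data: reachable only through the functions below. */
static volatile uint64_t timestamp;

/* Called from the ISR, which cannot be preempted by the main loop. */
void timestamp_isr_update(uint64_t now)
{
    timestamp = now;
}

/* Called from the main loop: the 64-bit read takes more than one
   instruction on a 32-bit core, so mask interrupts around it. */
uint64_t timestamp_get(void)
{
    irq_disable();
    uint64_t copy = timestamp;
    irq_enable();
    assert(irq_disabled_depth == 0);   /* disable/enable calls are balanced */
    return copy;
}
```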

Does a CPU assign a value atomically to memory?

A quick question I've been wondering about for some time: does the CPU assign values atomically, or is it bit by bit (say, for a 32-bit integer)?
If it's bit by bit, could another thread accessing this exact location get a "part" of the to-be-assigned value?
Think of this:
I have two threads and one shared "unsigned int" variable (call it "g_uiVal").
Both threads loop.
One is printing "g_uiVal" with printf("%u\n", g_uiVal).
The second just increases this number.
Will the printing thread ever print something that was never a value of "g_uiVal", or only part of one?
In code:
unsigned int g_uiVal;
void thread_writer()
{
    g_uiVal++;
}
void thread_reader()
{
    while(1)
        printf("%u\n", g_uiVal);
}
Depends on the bus widths of the CPU and memory. In a PC context, with anything other than a really ancient CPU, accesses of up to 32 bits are atomic; 64-bit accesses may or may not be. In the embedded space, many (most?) CPUs are 32 bits wide and there is no provision for anything wider, so your int64_t is guaranteed to be non-atomic.
I believe the only correct answer is "it depends". On what you may ask?
Well for starters which CPU. But also some CPUs are atomic for writing word width values, but only when aligned. It really is not something you can guarantee at a C language level.
Many compilers offer "intrinsics" to emit correct atomic operations. These are extensions which act like functions, but emit the correct code for your target architecture to get the needed atomic operations. For example: http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html
You said "bit-by-bit" in your question. I don't think any architecture does operations a bit at a time, except with some specialized serial protocol busses. Standard memory read/writes are done with 8, 16, 32, or 64 bits of granularity. So it is POSSIBLE the operation in your example is atomic.
However, the answer is heavily platform dependent.
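As a small example of such intrinsics, here is the counter from the question using GCC's __atomic builtins (the newer family that superseded the __sync ones; Clang accepts them too). This is a sketch that assumes a GCC-compatible compiler, and the two function names are made up.

```c
#include <assert.h>

/* Assumption: the target can do lock-free atomics of this size. */
static_assert(__atomic_always_lock_free(sizeof(unsigned int), 0),
              "unsigned int atomics are not lock-free on this target");

static unsigned int g_uiVal;

/* Writer side: atomic increment, returns the new value. */
unsigned int counter_increment(void)
{
    return __atomic_add_fetch(&g_uiVal, 1u, __ATOMIC_RELAXED);
}

/* Reader side: atomic load, never a torn value. */
unsigned int counter_read(void)
{
    return __atomic_load_n(&g_uiVal, __ATOMIC_RELAXED);
}
```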
It depends on the CPU's capabilities. Can the hardware do an atomic 32-bit operation? Here's a hint: if the variable you are working on is larger than the native register size (e.g. a 64-bit int on a 32-bit system), it's definitely NOT atomic.
It depends on how the compiler generates the machine code. It could have turned your 32-bit variable access into 4x 8-bit memory reads.
It gets tricky if the address of what you are accessing is not aligned to the machine's natural word boundary. You can hit a cache fault or page fault.
It is VERY POSSIBLE that you would see a corrupt or unexpected value using the code example that you posted.
Your platform probably provides some method of doing atomic operations. In the case of a Windows platform, it is via the Interlocked functions. In the case of Linux/Unix, look at the atomic_t type.
To add to what has been said so far - another potential concern is caching. CPUs tend to work with the local (on die) memory cache which may or may not be immediately flushed back to the main memory. If the box has more than one CPU, it is possible that another CPU will not see the changes for some time after the modifying CPU made them - unless there is some synchronization command informing all CPUs that they should synchronize their on-die caches. As you can imagine such synchronization can considerably slow the processing down.
Don't forget that the compiler assumes single-thread when optimizing, and this whole thing could just go away.
POSIX defines the special type sig_atomic_t which guarantees that writes to it are atomic with respect to signals, which will make it also atomic from the point of view of other threads like you want. They don't specifically define an atomic cross-thread type like this, since thread communication is expected to be mediated by mutexes or other synchronization primitives.
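For reference, the signal-handler use that sig_atomic_t is designed for looks roughly like this. It is a sketch with invented names, and it only demonstrates the signal case; it says nothing about other threads.

```c
#include <assert.h>
#include <signal.h>

/* Flag written by the handler and read by normal code. */
static volatile sig_atomic_t got_signal;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;   /* a single store the handler is allowed to make */
}

/* Installs the handler for SIGINT. */
void install_handler(void)
{
    void (*previous)(int) = signal(SIGINT, on_signal);
    assert(previous != SIG_ERR);
    (void)previous;
}

/* Returns nonzero once the handler has run. */
int signal_seen(void)
{
    return got_signal != 0;
}
```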
Considering modern microprocessors (and ignoring microcontrollers), the 32-bit assignment is atomic, not bit-by-bit.
However, now completely off of your question's topic... the printing thread could still print something that is not expected because of the lack of synchronization in this example, of course, due to instruction reordering and multiple cores each with their own copy of g_uiVal in their caches.
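If you do need the reader to see data in a sensible order, C11 release/acquire ordering is one way to get it. A minimal sketch, with made-up names, of publishing a plain value behind an atomic flag:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static unsigned int payload;        /* plain data, written before the flag */
static atomic_bool payload_ready;   /* publication flag */

/* Writer: store the data, then release-store the flag. */
void publish(unsigned int value)
{
    payload = value;
    atomic_store_explicit(&payload_ready, true, memory_order_release);
}

/* Reader: acquire-load the flag; if it is set, the payload write is visible. */
bool try_consume(unsigned int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&payload_ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

The release store pairs with the acquire load, so a reader that sees the flag set is guaranteed to see the payload written before it. This sketch covers a single publication; repeated updates need more than one flag.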
