Atomicity of 16-bit operations in a 32-bit system

Considering a 32-bit system (such as an ARM RISC MCU), how can one ensure that 16-bit variables are written/read in an atomic way? Based on this doc, if I understood correctly, both 16-bit and 8-bit operations are atomic, but only assuming the memory is aligned. The question is: does the compiler always align the memory to 32-bit words (excluding cases like packed structures)?
The rationale here is to use uint16_t whenever possible instead of uint32_t for better code portability between 32-bit and 16-bit platforms. This is not about typedefing a type that is different on either platform (16 or 32 bit).

The compiler may align any (scalar) object as it pleases unless it is part of an array or similar; there's no restriction or guarantee from C. Arrays are guaranteed to be allocated contiguously, however, with no padding. And the first member of a struct/union is guaranteed to be aligned (since the address of the struct/union may be converted to a pointer to the type of the first member).
To get atomic operations, you have to use something like atomic_uint_fast16_t (stdatomic.h) if supported by the compiler. Otherwise any operation in C cannot be regarded as atomic no matter the type, period.
It is a common mistake to think "8 bit copy is atomic on my CPU so if I use 8 bit types my code is re-entrant". It isn't, because uint8_t x = y; is not guaranteed to be done in a single instruction, nor is it guaranteed to result in atomic machine code instructions.
And even if an atomic instruction is picked, you could still get real-time bugs from code like that. Example with pseudo-machine code:
Store the contents of y in register A.
An interrupt which only occurs once every full moon fires, changing y.
The old value of y is stored in x - yay, this is an atomic instruction!
Now x has the old, outdated value.
Correct real-time behavior would have been to either completely update x before the interrupt hit, or alternatively update it after the interrupt hit.
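A minimal sketch of the stdatomic.h approach mentioned above, assuming a C11 toolchain; the names here are illustrative, not from the question:

#include <stdatomic.h>
#include <stdint.h>

/* Shared between main-line code and an ISR or another thread. */
static atomic_uint_fast16_t shared_val;

void producer(void)
{
    atomic_store_explicit(&shared_val, 1234, memory_order_relaxed);
}

uint_fast16_t consumer(void)
{
    /* The load itself is atomic: no torn reads, regardless of alignment. */
    return atomic_load_explicit(&shared_val, memory_order_relaxed);
}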

The place where I work has a special agreement with all the relevant compiler suppliers that they are responsible for notifying our toolchain experts of cases where that assumption is not applicable.
This turned out to be necessary, because it otherwise cannot be reliably determined. Documentation does not say, and the standard even less.
Or that is what the tooling people told me.
So it seems that for you the answer is: Ask the supplier.

Related

Is it guaranteed that the padding bits of a "zeroed" structure will be zeroed in C?

This statement in the article puzzled me:
C permits an implementation to insert padding into structures (but not into arrays) to ensure that all fields have a useful alignment for the target. If you zero a structure and then set some of the fields, will the padding bits all be zero? According to the results of the survey, 36 percent were sure that they would be, and 29 percent didn't know. Depending on the compiler (and optimization level), it may or may not be.
It was not completely clear, so I turned to the standard. The ISO/IEC 9899 in §6.2.6.1 states:
When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.
Also in §6.7.2.1:
The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
I just remembered that I recently implemented, let's say, some kind of hack, where I used the undeclared part of the byte owned by a bit-field. It was something like:
/* This struct is always allocated on the heap and is zeroed. */
struct some_struct {
    /* initial part ... */
    enum {
        ONE,
        TWO,
        THREE,
        FOUR,
    } some_enum:8;
    unsigned char flag:1;
    unsigned char another_flag:1;
    unsigned int size_of_smth;
    /* ... remaining part */
};
The structure was not under my control, therefore I couldn't change it, but I had an acute need to pass some information through it. So I calculated the address of the corresponding byte like:
unsigned char *ptr = (unsigned char *)&some->size_of_smth - 1;
*ptr |= 0xC0; /* set flags */
Then later I checked flags the same way.
Also, I should mention that the target compiler and platform were defined, so it's not a cross-platform thing. However, the questions still stand:
Can I rely on the fact that the padding bits of a struct (on the heap) will still be zeroed after memset/kzalloc/whatever and after some subsequent use? (This post does not cover the topic in terms of the standard and the safeguards for further use of the struct.) And what about a struct zeroed on the stack like = {0}?
If yes, does it mean that I can safely use the "unnamed"/"not declared" part of a bit-field to transfer some info for my purposes everywhere (different platforms, compilers, ...) in C? (If I know for sure that no one crazy is trying to store anything in this byte.)
The short answer to your first question is "no".
While an appropriate call of memset(), such as memset(&some_struct_instance, 0, sizeof some_struct_instance), will set all bytes in the structure to zero, that change is not required to be persistent after "some use" of some_struct_instance, such as setting any of the members within it.
So, for example, there is no guarantee that some_struct_instance.some_enum = THREE (i.e. storing a value into a member) will leave any padding bits in some_struct_instance unchanged. The only requirement in the standard is that values of other members of the structure are unaffected. However, the compiler may (in emitted object code or machine instructions) implement the assignment using some set of bitwise operations, and be allowed to take shortcuts in a way that doesn't leave the padding bits alone (e.g. by not emitting instructions that would otherwise ensure the padding bits are unaffected).
Even worse, a simple assignment like some_struct_instance = some_other_struct_instance (which, by definition, is the storing of a value into some_struct_instance) comes with no guarantees about the values of padding bits. It is not guaranteed that the padding bits in some_struct_instance will be set to the same bitwise values as the padding bits in some_other_struct_instance, nor is there a guarantee that the padding bits in some_struct_instance will be unchanged. This is because the compiler is allowed to implement the assignment by whatever means it deems most "efficient" (e.g. copying memory verbatim, some set of member-wise assignments, or whatever) but - since the values of padding bits after the assignment are unspecified - is not required to ensure the padding bits are unchanged.
If you get lucky, and fiddling with the padding bits works for your purpose, it will not be because of any support in the C standard. It will be because of good graces of the compiler vendor (e.g. choosing to emit a set of machine instructions that ensure padding bits are not changed). And, practically, there is no guarantee that the compiler vendor will keep doing things the same way - for example, your code that relies on such a thing may break when the compiler is updated, when you choose different optimisation settings, or whatever.
Since the answer to your first question is "no", there is no need to answer your second question. However, philosophically, if you are trying to store data in padding bits of a structure, it is reasonable to assert that someone else - crazy or not - may potentially attempt to do the same thing, but using an approach that messes up the data you are attempting to pass around.
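To make the hazard concrete, here is a small sketch; the struct and member names are made up, not the ones from the question:

#include <string.h>

struct example {
    char c;   /* on many ABIs, three padding bytes follow this member */
    int  i;
};

void demo(struct example *dst, const struct example *src)
{
    memset(dst, 0, sizeof *dst);  /* every byte, padding included, is now zero     */
    dst->c = 1;                   /* padding bytes may now take unspecified values  */
    *dst = *src;                  /* after the copy, padding bytes are unspecified  */
}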
From the first words of the standard specification:
C permits an implementation to insert padding into structures (but not into arrays) to ensure that all fields have a useful alignment ...
These words mean that, in order to optimize (for speed, probably, but also to avoid architectural restrictions on data/address buses), the compiler can make use of hidden, not-used, bits or bytes. NOT-USED because they would be forbidden or costly to address.
This also implies that those bytes or bits should not be visible from a programming perspective, and it should be considered a programming error to try to access that hidden data.
About that added data, the standard says that its content is "unspecified", and there is really no better way to state what an implementation can do with it. Think of those bit-field declarations, where you can declare integers with any bit width: no normal hardware will permit reading/writing from memory in chunks smaller than 8 bits, so the CPU will always read or write at least 8 bits (sometimes even more). Why should a compiler (an implementation) take care of doing something useful with those other bits, which the programmer specified he does not care about? It makes no sense: the programmer didn't give a name to some memory location, but then wants to manipulate it?
The padding bytes between fields are pretty much the same matter as before: those added bytes are necessary, but the programmer is not interested in them - and SHOULD NOT change their mind later!
Of course, one can study an implementation and arrive at some conclusion like "padding bytes will always be zeroed" or something like that. This is risky (are you sure they will be always-always zeroed?) but, more important, it is totally useless: if you need more data in a structure, simply declare them! And you will have no problem, never, even porting the source to different platforms or implementations.
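A sketch of that advice applied to the question's struct; the extra member name is invented (and, of course, the question noted the struct couldn't be modified):

struct some_struct {
    /* initial part ... */
    enum { ONE, TWO, THREE, FOUR } some_enum:8;
    unsigned char flag:1;
    unsigned char another_flag:1;
    unsigned char extra_flags:2;   /* named storage instead of hijacking padding bits */
    unsigned int size_of_smth;
    /* ... remaining part */
};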
It is reasonable to start with the expectation that what is listed in the standard is correctly implemented. You're looking for further assurances for a particular architecture. Personally, if I could find documented details about that particular architecture, I would be reassured; if not, I would be cautious.
What constituted "cautious" would depend on how confident I needed to be. For example, building a detailed test set and running this periodically on my target architecture would give me a reasonable degree of confidence, but it's all about how much risk you want to take. If it's really, really important, stick to what they standards guarantee you; if it's less so, test and see if you can get enough confidence for what you need.

Casting element of uint8_t to 32-bit variable on 8-bit platform or 32-bit platform

Let's consider two examples:
1: 8 bit MCU/MPU/Platform - Little endian
uint8_t arr[5] = {0x1,0x2,0x3,0x4,0x5};//assume &arr[0] == 0x0
uint32_t *ui32 = (uint32_t*)&arr[1];
What is the value of *ui32? 0x2030405?
Is it necessary for a uint32_t variable to be placed at an address that is a multiple of 4 on this platform?
2: 32 bit MCU/MPU/Platform - Little endian
Pretty much the same example:
uint8_t arr[] = {0x1,0x2,0x3,0x4,0x5, 0x6, 0x7, 0x8}; //again assume &arr[0] == 0x0
uint32_t *ui32 = (uint32_t*)&arr[1];
What is the value of *ui32?
I know that 32-bit variables should reside at an address that is a multiple of 4.
Where I can find specification on this?
Language Lawyering
Your code contains undefined behavior and is non-portable. For example, on some UNIX workstations I’ve programmed on, memory accesses must be aligned to the size of the operand, so most but not all of the time, attempting to dereference (uint32_t*)&arr[1] would crash the program with SIGBUS, a hardware error caused by the memory bus. The compiler allows you to shoot yourself in the foot like that. Casting a pointer like you did violates the strict aliasing rules of C, which causes undefined behavior.
You can get around this issue by writing uint32_t x; memcpy( &x, &arr[1], sizeof(x) ), which the standard explicitly allows. From this point on, I'll be assuming you're doing the equivalent of this. If you were not using an offset into the array, you could also type-pun with fields of a union in C (although the rules are different in C++).
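A sketch of that memcpy() approach applied to the question's array; the helper name read_u32 is made up:

#include <stdint.h>
#include <string.h>

static uint32_t read_u32(const uint8_t *p)
{
    uint32_t x;
    memcpy(&x, p, sizeof x);   /* well-defined: copies the object representation */
    return x;
}

/* usage: uint32_t v = read_u32(&arr[1]); the numeric result still depends on endianness */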
By the standard, the elements of an array must be stored contiguously, with no padding between them. A memcpy() between some object x and an array of unsigned char[sizeof(x)] is legal, and the result is called its object representation.
Copying arbitrary bits to the object representation of any of the exact-width types in <stdint.h> with memcpy() is unspecified behavior, not undefined behavior. It is a well-formed program, and you will get some valid uint32_t out of it, even though the language standard does not say what that has to be. You aren’t giving the compiler permission to do whatever it wants, such as Kill All Humans. This is only because the standard does not permit the exact-width integral types to have any bits other than value bits, and therefore, they cannot have trap representations, invalid bit patterns that cause undefined behavior if copied into a value of that type. (The example in the standard is an implementation that stores a parity bit in every word.)
However, the other side of that guarantee is that the types uint8_t and uint32_t are not guaranteed to exist, and there have been a few architectures in the real world for which conforming versions of them could never exist. (However, unsigned char array[sizeof(uint_least32_t) + 1] is guaranteed to work.)
Tl;dr
A real-world little-endian implementation on which that code runs correctly would probably tell you that *ui32 is 0x05040302. Otherwise, we would call it something other than little-endian. However, some compilers put the onus on the programmer to follow the strict-aliasing rules carefully. They are known to produce optimized code that doesn't do what you expect if you write through either pointer.
1: 8 bit MCU/MPU/Platform - Little endian
uint8_t arr[5] = {0x1,0x2,0x3,0x4,0x5};//assume &arr[0] == 0x0
uint32_t *ui32 = (uint32_t*)&arr[1];
What is the value of *ui32?
C explicitly declares the effect of reading the value of *ui32 to be undefined in that case, on account of reading the value of an object (part of arr) via an lvalue of a different type.
0x2030405?
It is by no means guaranteed, yet not so uncommon in practice, that the value obtained by reading *ui32 would be that of interpreting the bit pattern comprising elements 1 - 4 of arr as that of a uint32_t, but what number that represents is unspecified. It is left to implementations to determine how to map physical bytes to logical ones.
However, if by "little-endian" you mean that the C implementation's uint32_t is represented by a four-8-bit-byte sequence in least-significant to most-significant order, and if you suppose that dereferencing the pointer indeed does successfully interpret the pointed-to bit pattern as that of a uint32_t, then the resulting value would be the same as that represented by the integer constant 0x05040302u.
Is it necessary for a uint32_t variable to be placed at an address that is a multiple of 4 on this platform?
You have not specified a platform, nor even a particularly narrow class of platforms. I would generally expect an 8-bit platform not to require 4-byte alignment for objects of type uint32_t, but C does not specify, and platforms and implementations may vary.
2: 32 bit MCU/MPU/Platform - Little endian
Pretty much the same example:
Exactly the same answer, except that it is more likely -- but by no means certain -- that 4-byte alignment would be required for objects of type uint32_t.
I know that 32-bit variables should reside at an address that is a multiple of 4.
Not necessarily. Some 32-bit platforms indeed do require it; some do not require it, but offer faster access for aligned objects; and some don't care at all.
Where I can find specification on this?
Such details of your C implementation of interest as are available at all would be found in that implementation's documentation. The underlying system's ABI and / or hardware documentation might serve as a secondary source.
Overall, however, the best recommendation is usually to avoid such questions altogether. Avoiding unspecified, implementation-defined, and especially undefined behaviors would allow you to rely wholly on the C standard to predict the behavior of your program.
8 bit MCU/MPU/Platform - Little endian
The answer will assume the platform somehow supports longer integers, even if the CPU might not, and that it is little-endian.
Note that, if a uC is truly 8-bit and has no notion of longer integers, then it does not make much sense to talk about its (byte) endianness. We could say, for instance, that it is both little-endian and big-endian (or that it is not any of those).
//assume &arr[0] == 0x0
This may be hinting that this is coming from some exercise about misaligned accesses.
What is the value of *ui32? 0x2030405? Is it necessary for a uint32_t variable to be placed at an address that is a multiple of 4 on this platform?
It depends on the platform and on the options of the compiler (e.g. if the compiler is assuming strict aliasing, then this is undefined behaviour to begin with).
However, since this is an 8-bit platform (and assuming you tell the compiler to do what you seem to want to do), a fair guess is that uint32_t has to be supported in software and that unaligned accesses are not a problem. Assuming the integer is kept in memory as little-endian (as explained above) by that software implementation, then yes, a good guess would be 0x05040302.
32 bit MCU/MPU/Platform - Little endian
What is the value of *ui32?
Again, in this case, it would depend on the platform/compiler. On some of them, there wouldn't even be a value, since the CPU would trap when you try to read such an address (since &arr[0] == 0, ui32 == 1, which is unaligned to e.g. 4).
I know that 32-bit variables should reside at an address that is a multiple of 4.
Typically, but it depends on the platform. Also, even if a platform supports unaligned accesses, they may be slower than aligned accesses (so you want them aligned anyhow).
Where I can find specification on this?
On top of the C specification, you would need to check your compiler's documentation and your architecture's manuals.
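Beyond the compiler and architecture manuals, a common way to sidestep the alignment, aliasing, and endianness questions entirely is to assemble the value from individual bytes. A sketch, not taken from the answers above:

#include <stdint.h>

/* Interprets the four bytes at p as a little-endian 32-bit value,
   independent of host endianness and alignment. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}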

Using bit fields for AVR ports

I'd like to be able to use something like this to make access to my ports clearer:
typedef struct {
    unsigned rfid_en: 1;
    unsigned lcd_en: 1;
    unsigned lcd_rs: 1;
    unsigned lcd_color: 3;
    unsigned unused: 2;
} portc_t;

extern volatile portc_t *portc;
But is it safe? It works for me, but...
1) Is there a chance of race conditions?
2) Does gcc generate read-modify-write cycles for code that modifies a single field?
3) Is there a safe way to update multiple fields?
4) Is the bit packing and order guaranteed? (I don't care about portability in this case, so gcc-specific options to make it Do What I Mean are fine.)
Handling race conditions must be done through operating-system-level calls (which will indeed use read-modify-writes); GCC won't do that.
Likewise, and no, GCC does not generate read-modify-write instructions for volatile. However, a CPU will normally do the write atomically (simply because it's one instruction). This holds true if the bit-field stays within an int, for example, but it is CPU/implementation dependent; some may guarantee this up to 8-byte values, while others only up to 4-byte values. So under that condition, bits can't be mixed up (i.e. a few written from one thread and others from another thread won't occur).
The only way to set multiple fields at the same time is to set these values in an intermediate variable and then assign this variable to the volatile (see the sketch after this answer).
The C standard specifies that bits are packed together (it seems that there might be exceptions when you start mixing types, but I've never seen that; everyone always uses unsigned ...).
Note: Defining something volatile does not cause a compiler to generate read-modify-writes. What volatile does is tell the compiler that an assignment to that pointer/address must always be made and may not be optimised away.
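A minimal sketch of the intermediate-variable approach from point 3 above, assuming the portc_t typedef from the question; whether the final store compiles to a single write is up to the compiler and target, not guaranteed by C:

void set_lcd_mode(volatile portc_t *port)
{
    portc_t tmp = *port;   /* read the current port state once                */
    tmp.lcd_en    = 1;
    tmp.lcd_rs    = 0;
    tmp.lcd_color = 5;     /* example value within the 3-bit field            */
    *port = tmp;           /* one volatile store of the whole packed register */
}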
Here's another post about the same subject matter. I found there to be quite a few other places where you can find more details.
The keyword volatile has nothing to do with race conditions or with which thread is accessing the variable. The keyword tells the compiler not to cache the value in registers. It tells the compiler to generate code so that every access goes to the location allocated to the variable, because each access may see a different value. This is the case with memory-mapped peripherals. This doesn't help if your MPU has its own cache. There are usually special instructions or un-cached areas of the memory map to ensure the location, and not a cached copy, is read.
As for being thread safe, just remember that even a memory access may not be thread safe if it is done in two instructions. E.g. in 8051 assembler, you have to get a 16-bit value one byte at a time. The instruction sequence can be interrupted by an IRQ or another thread, and the second byte can be read or written after the value has changed, leaving a potentially corrupted result.

Read and Write atomic operation implementation in the Linux Kernel

Recently I've peeked into the Linux kernel implementation of an atomic read and write and a few questions came up.
First the relevant code from the ia64 architecture:
typedef struct {
    int counter;
} atomic_t;

#define atomic_read(v)     (*(volatile int *)&(v)->counter)
#define atomic64_read(v)   (*(volatile long *)&(v)->counter)
#define atomic_set(v,i)    (((v)->counter) = (i))
#define atomic64_set(v,i)  (((v)->counter) = (i))
For both read and write operations, it seems that the direct approach was taken to read from or write to the variable. Unless there is another trick somewhere, I do not understand what guarantees exist that this operation will be atomic in the assembly domain. I guess an obvious answer will be that such an operation translates to one assembly opcode, but even so, how is that guaranteed when taking into account the different memory cache levels (or other optimizations)?
On the read macros, the volatile type is used in a casting trick. Anyone has a clue how this affects the atomicity here? (Note that it is not used in the write operation)
I think you are misunderstanding the (very much vague) usage of the word "atomic" and "volatile" here. Atomic only really means that the words will be read or written atomically (in one step, and guaranteeing that the contents of this memory position will always be one write or the other, and not something in between). And the volatile keyword tells the compiler to never assume the data in that location due to an earlier read/write (basically, never optimize away the read).
What the words "atomic" and "volatile" do NOT mean here is that there's any form of memory synchronization. Neither implies ANY read/write barriers or fences. Nothing is guaranteed with regards to memory and cache coherence. These functions are basically atomic only at the software level, and the hardware can optimize/lie however it deems fit.
Now as to why simply reading is enough: the memory models for each architecture are different. Many architectures can guarantee atomic reads or writes for data aligned to a certain byte offset, or x words in length, etc., and this varies from CPU to CPU. The Linux kernel contains many defines for the different architectures that let it do without any atomic calls (CMPXCHG, basically) on platforms that guarantee atomic reads/writes (sometimes only in practice, even if their spec says they don't actually guarantee it).
As for the volatile, while there is no need for it in general unless you're accessing memory-mapped IO, it all depends on when/where/why the atomic_read and atomic_write macros are being called. Many compilers will (though it is not required by the C spec) generate memory barriers/fences for volatile variables (GCC, off the top of my head, is one; MSVC does for sure). While this would normally mean that all reads/writes to this variable are officially exempt from just about any compiler optimizations, in this case, by creating a "virtual" volatile variable, only this particular instance of a read/write is off-limits for optimization and re-ordering.
The reads are atomic on most major architectures, so long as they are aligned to a multiple of their size (and aren't bigger than the read size of a given type); see the Intel Architecture manuals. Writes, on the other hand, may be different: Intel states that under x86, single-byte writes and aligned writes may be atomic; under IPF (IA64), everything uses acquire and release semantics, which would make it guaranteed atomic; see this.
The volatile prevents the compiler from caching the value locally, forcing it to be retrieved wherever there is an access to it.
If you write for a specific architecture, you can make assumptions specific to it.
I guess IA-64 does compile these things to a single instruction.
The cache shouldn't be an issue, unless the counter crosses a cache line boundary. But if 4/8-byte alignment is required, this can't happen.
A "real" atomic instruction is required when a machine instruction translates into two memory accesses. This is the case for increments (read, increment, write) or compare&swap.
volatile affects the optimizations the compiler can do.
For example, it prevents the compiler from converting multiple reads into one read.
But on the machine instruction level, it does nothing.
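As a small illustration of what the volatile cast in atomic_read() buys you (a sketch, not actual kernel code):

typedef struct { int counter; } atomic_t;
#define atomic_read(v)  (*(volatile int *)&(v)->counter)

/* Spins until another context (an interrupt handler, another CPU) sets the
   counter non-zero. The volatile cast forces a fresh load on every iteration;
   without it, the compiler could hoist the load and spin on a stale copy. */
void wait_for_flag(atomic_t *v)
{
    while (atomic_read(v) == 0)
        ;
}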

Does a CPU assign a value atomically to memory?

A quick question I've been wondering about for some time: does the CPU assign values atomically, or is it bit by bit (say, for example, a 32-bit integer)?
If it's bit by bit, could another thread accessing this exact location get a "part" of the to-be-assigned value?
Think of this:
I have two threads and one shared "unsigned int" variable (call it "g_uiVal").
Both threads loop.
One is printing "g_uiVal" with printf("%u\n", g_uiVal).
The second just increases this number.
Will the printing thread ever print something that is totally not or part of "g_uiVal"'s value?
In code:
#include <stdio.h>

unsigned int g_uiVal;

void thread_writer()
{
    while(1)
        g_uiVal++;
}

void thread_reader()
{
    while(1)
        printf("%u\n", g_uiVal);
}
Depends on the bus widths of the CPU and memory. In a PC context, with anything other than a really ancient CPU, accesses of up to 32 bits are atomic; 64-bit accesses may or may not be. In the embedded space, many (most?) CPUs are 32 bits wide and there is no provision for anything wider, so your int64_t is guaranteed to be non-atomic.
I believe the only correct answer is "it depends". On what, you may ask?
Well, for starters, which CPU. But also, some CPUs are atomic for writing word-width values, but only when aligned. It really is not something you can guarantee at the C language level.
Many compilers offer "intrinsics" to emit correct atomic operations. These are extensions which act like functions, but emit the correct code for your target architecture to get the needed atomic operations. For example: http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html
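For example, a sketch of the question's counter rewritten with GCC's legacy __sync builtins from that page (newer code would use the __atomic builtins or C11 <stdatomic.h>):

unsigned int g_uiVal;

void thread_writer(void)
{
    __sync_fetch_and_add(&g_uiVal, 1);        /* atomic read-modify-write increment */
}

unsigned int read_value(void)
{
    return __sync_fetch_and_add(&g_uiVal, 0); /* atomic read expressed as a no-op add */
}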
You said "bit-by-bit" in your question. I don't think any architecture does operations a bit at a time, except with some specialized serial protocol busses. Standard memory read/writes are done with 8, 16, 32, or 64 bits of granularity. So it is POSSIBLE the operation in your example is atomic.
However, the answer is heavily platform dependent.
It depends on the CPU's capabilities. Can the hardware do an atomic 32-bit operation? Here's a hint: if the variable you are working on is larger than the native register size (e.g. a 64-bit int on a 32-bit system), it's definitely NOT atomic.
It depends on how the compiler generates the machine code. It could have turned your 32-bit variable access into 4x 8-bit memory reads.
It gets tricky if the address of what you are accessing is not aligned across a machine's natural word boundary. You can hit a cache fault or page fault.
It is VERY POSSIBLE that you would see a corrupt or unexpected value using the code example that you posted.
Your platform probably provides some method of doing atomic operations. In the case of a Windows platform, it is via the Interlocked functions. In the case of Linux/Unix, look at the atomic_t type.
To add to what has been said so far - another potential concern is caching. CPUs tend to work with the local (on die) memory cache which may or may not be immediately flushed back to the main memory. If the box has more than one CPU, it is possible that another CPU will not see the changes for some time after the modifying CPU made them - unless there is some synchronization command informing all CPUs that they should synchronize their on-die caches. As you can imagine such synchronization can considerably slow the processing down.
Don't forget that the compiler assumes single-thread when optimizing, and this whole thing could just go away.
POSIX defines the special type sig_atomic_t, which guarantees that writes to it are atomic with respect to signals, which will make it also atomic from the point of view of other threads, like you want. They don't specifically define an atomic cross-thread type like this, since thread communication is expected to be mediated by mutexes or other synchronization primitives.
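For the signal case specifically, a minimal sketch of how sig_atomic_t is typically used:

#include <signal.h>

static volatile sig_atomic_t got_signal = 0;

static void on_sigint(int sig)
{
    (void)sig;
    got_signal = 1;   /* a plain store to a volatile sig_atomic_t is one of the
                         few operations guaranteed safe inside a signal handler */
}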
Considering modern microprocessors (and ignoring microcontrollers), the 32-bit assignment is atomic, not bit-by-bit.
However, now completely off of your question's topic... the printing thread could still print something that is not expected because of the lack of synchronization in this example, of course, due to instruction reordering and multiple cores each with their own copy of g_uiVal in their caches.
