Creating a counter with atomic_fetch_add_explicit - c

#include <stdatomic.h>
#include <stdio.h>
void request_number(request_t *request)
{
static atomic_int counter;
request->id = atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
printf("Request assigned ID %u\n", request->id);
}
In the above C snippet, I believe it's ok to use memory_order_relaxed, because even without memory fences the compiler will not re-arrange the call to printf and fetch of request->id, before the store of the value for request->id.
Is this correct? I'm fairly certain it is but wanted to be absolutely sure in case there were other things that need to be taken into account with atomics.

You are doing only one atomic operation and when you return from it you have your value. Everything else is done with the "normal" memory model as it would be for sequential code, like it always has been.
The ; at the end of the assignment is a sequence point. So your approach is perfectly fine. Indeed, the only thing that you need from your atomic operation here is that it is undivided, you don't need the sequencing guarantees of the "normal" atomic operations.
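As a minimal sketch of the same idea without the surrounding request_t type (the helper name next_request_id is hypothetical, not from the question), the counter can be wrapped in a function that only relies on the increment being undivided:

```c
#include <stdatomic.h>

/* Hypothetical helper: hands out unique, increasing IDs.
 * Only the atomicity of the increment matters here, so relaxed
 * ordering is enough; no other memory is being published. */
unsigned next_request_id(void)
{
    /* Static storage duration: the counter starts at zero. */
    static atomic_uint counter;

    /* Returns the value held before the increment. */
    return atomic_fetch_add_explicit(&counter, 1u, memory_order_relaxed);
}
```

Each caller, on any thread, gets a distinct value; the first call returns 0. If the ID were used to signal that other data is ready for another thread to read, relaxed ordering would no longer be sufficient.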

Related

Global variable reads inside tight loops in C

Say I have a tight loop in C, within which I use the value of a global variable to do some arithmetics, e.g.
double c;
// ... initialize c somehow ...
double f(double*a, int n) {
double sum = 0.0;
int i;
for (i = 0; i < n; i++) {
sum += a[i]*c;
}
return sum;
}
with c the global variable. Is c "read anew from global scope" in each loop iteration? After all, it could've been changed by some other thread executing some other function, right? Hence would the code be faster by taking a local (function stack) copy of c prior to the loop and only use this copy?
double f(double*a, int n) {
double sum = 0.0;
int i;
double c_cp = c;
for (i = 0; i < n; i++) {
sum += a[i]*c_cp;
}
return sum;
}
Though I haven't specified how c is initialized, let's assume it's done in some way such that the value is unknown at compile time. Also, c is really a constant throughout runtime, i.e. I as the programmer know that its value won't change. Can I let the compiler in on this information, e.g. using static double c in the global scope? Does this change the a[i]*c vs. a[i]*c_cp question?
My own research
Reading e.g. the "Global variables" section of this, it seems clear that taking a local copy of the global variable is the way to go. However, they want to update the value of the global variable, whereas I only ever want to read its value.
Using godbolt I fail to notice any real difference in the assembly for both c vs. c_cp and double c vs. static double c.
Any decently smart compiler will optimize your code so it will behave as your second code snippet. Using static won't change much, but if you want to ensure read on each iteration then use volatile.
Great point there about changes from a different thread. Compiler will maintain integrity of your code as far as single-threaded execution goes. That means that it can reorder your code, skip something, add something -- as long as the end result is still the same.
With multiple threads it is your job to ensure that things still happen in a specific order, not just that the end result is right. The way to ensure that is with memory barriers. It's a fun topic to read about, but one that is best avoided unless you're an expert.
Once everything is translated to machine code, you will get no difference whatsoever. If c is global, any access to c will reference the address of c or, most probably, in a tight loop c will be kept in a register, or in the worst case in the L1 cache.
On a Linux machine you can easily generate the assembly and examine the resultant code.
You can also run benchmarks.
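To make the volatile suggestion above concrete, here is a small sketch with two variants of the loop. The names c_volatile, scaled_sum and scaled_sum_volatile are made up for illustration. In the first function an optimizing compiler is free to load c once and keep it in a register; in the second it must emit a load of c_volatile on every iteration.

```c
#include <stddef.h>

double c = 2.0;                   /* plain global: the read may be hoisted */
volatile double c_volatile = 2.0; /* volatile: must be re-read on each access */

/* The compiler may treat this as if c were copied to a local first. */
double scaled_sum(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * c;
    }
    return sum;
}

/* Every iteration performs a real load of c_volatile. */
double scaled_sum_volatile(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * c_volatile;
    }
    return sum;
}
```

Note that volatile only forces the loads to happen; it does not make concurrent modification from another thread well defined. For that, atomics or locks are needed.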

Unpermitted operand to operator '++' [MISRA 2012 Rule 10.1, required]

I am trying to fix a MISRA warning in modules written by others. I observed that the ++ operation is being used on an enum.
I referred to an SE question that discusses the same topic. How do I resolve this error? Do I need to ask the module owner to change the implementation?
#include <stdio.h>
typedef enum
{
COMPARE = 0,
INCONSISTENT = 10,
WRITE,
READ,
FINISHED
}TestsType;
static TestsType CurrentTest;
void fun1(void)
{
if(READ != CurrentTest)
{
CurrentTest++;
}
else
{
CurrentTest = FINISHED;
}
}
int main(void) {
// your code goes here
CurrentTest = COMPARE;
fun1();
printf("%d", CurrentTest);
return 0;
}
I kept the enum like this in code purposefully to understand any impact. However, in actual code, it is as below.
typedef enum
{
COMPARE,
INCONSISTENT,
WRITE,
READ,
FINISHED
}TestsType;
Incrementing an enum is just wrong!
enums were added to the language as a better alternative to #define for a number of constants, and were considered ints in other respects (i.e. a const array of ints). To enforce anything more would require run-time checking.
As enum values don't have to be contiguous, incrementing them makes no sense when they're treated as integers. If a compiler does allow it, it thinks it's incrementing an int, which can mean your value doesn't correspond to any value in the enum afterwards.
So my advice is "don't do it" even if a particular compiler lets you. Rewrite it to something explicit.
If you want to cycle through a particular range of states represented by contiguous integers, you CAN use an enum but only if you make its values contiguous too. Put lots of warnings about the definition explaining not to tinker. Then increment an int representing the state, which can then be compared to the enum safely.
The whole point of using a standard like MISRA is to avoid risky code. And there's no question but that incrementing enums is risky.
If you've got some code that increments enums, and it works well (under all conditions), it's only because of a number of interlocked assumptions and conventions which probably aren't all written down and which almost certainly won't be obvious to (and honored by) a later maintenance programmer.
So, indeed, there is no simple fix for this. Any simple fix (which might get your MISRA checker to shut up) will likely leave the inherent risks in the practice all intact -- that is, you might satisfy the letter of MISRA, but not the spirit (which is obviously backwards).
So yes, you should require (not just suggest) that the module owner change the implementation.
What might the revised implementation look like? I think it should have one or more of the following aspects:
Use an int and some #defined constants.
Have a separate, encapsulated function to map from one state to the next.
Use an explicit transition table to map one state to the next.
If there is a large number of states, and if most of them follow in sequence, such that a +1 increment would nicely encapsulate this (more cleanly and reliably than a bunch of arbitrary state transitions), go ahead and use a +1 increment, but with some accompanying assertions to ensure that the various assumptions hold. For example:
enum state {
OFF = 0,
LOW = 3,
MEDIUM,
HIGH,
EXCEPTIONAL = 10
};
/* States LOW..HIGH are assumed to be contiguous. Make sure you keep them so! */
/* If (and only if) you add or subtract states to the contiguous list, */
/* make sure to also update N_CONTIGUOUS_STATES. */
#define N_CONTIGUOUS_STATES 3
enum state nextstate(enum state oldstate)
{
/* Normally performing arithmetic on enums is wrong. */
/* We're doing so here in a careful, controlled, constrained way, */
/* limited just to the values LOW..HIGH which we're calling "contiguous". */
assert((int)LOW + N_CONTIGUOUS_STATES - 1 == (int)HIGH);
if(oldstate >= LOW && oldstate < HIGH) {
return (enum state)((int)oldstate + 1);
} else {
/* perform arbitrary mappings between other states */
}
}
The intent here is both to document what's going on, and ensure that if a later maintenance programmer changes the enum definition in any way that breaks the assumption that there are some consecutive states between which straight incrementation is allowed, the assertion will fail.
...But I hasten to add that this is not a complete solution. An even more important guarantee to preserve is that every state transition is handled, and this is even easier to violate if a later maintenance programmer adds new states but forgets to update the transition mappings. One good way to have the compiler help you guarantee this is to use a switch statement, although this then just about forces you to make every transition explicit (that is, not to use the +1 shortcut):
enum state nextstate(enum state oldstate)
{
switch(oldstate) {
case OFF: return ... ;
case LOW: return MEDIUM;
case MEDIUM: return HIGH;
case HIGH: return ... ;
case EXCEPTIONAL: return ... ;
}
}
The advantage of using a switch is that modern compilers will warn you if you leave an enum value out of a switch like this.
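The "explicit transition table" option from the list above could look like the following sketch. It reuses the TestsType enum from the question; the names next_test_table and next_test are hypothetical, and the choice to stay in FINISHED once it is reached (and to fall back to FINISHED for out-of-range input) is an assumption, since the original code does not say what should happen there.

```c
#include <stddef.h>

typedef enum
{
    COMPARE,
    INCONSISTENT,
    WRITE,
    READ,
    FINISHED
} TestsType;

/* One entry per state; designated initializers make each mapping explicit,
 * so reordering or renumbering the enum cannot silently change a transition. */
static const TestsType next_test_table[] = {
    [COMPARE]      = INCONSISTENT,
    [INCONSISTENT] = WRITE,
    [WRITE]        = READ,
    [READ]         = FINISHED,
    [FINISHED]     = FINISHED   /* assumption: FINISHED is terminal */
};

#define N_TESTS (sizeof next_test_table / sizeof next_test_table[0])

TestsType next_test(TestsType current)
{
    /* Assumption: an out-of-range value is treated as FINISHED. */
    if ((size_t)current >= N_TESTS) {
        return FINISHED;
    }
    return next_test_table[current];
}
```

With this in place, fun1 in the question would reduce to CurrentTest = next_test(CurrentTest); and no arithmetic is performed on the enum. Whether a given MISRA checker accepts the casts and array indexing as written is something to verify against the checker itself.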

Is array value as loop stop condition read every time?

Example:
for (int i = 0; i < a[index]; i++) {
// do stuff
}
Would a[index] be read every time? If no, what if someone wanted to change the value at a[index] in the loop? I've never seen it myself, but does the compiler make such an assumption?
If the condition was instead i < val-2, would it be evaluated every time?
The compiler normally optimizes only when it can show that the result is not affected by other parts of the program. So if you modify the condition's operand inside the for loop, the compiler will not optimize the read away.
As mentioned, the compiler should read the array and check it before each iteration in your code snippet. You can optimize your code as follows, then it will read the array only once for loop condition checking.
int cond = a[index];
for (int i = 0; i < cond; i++) {
// do stuff
}
Well, maybe. A standards-compliant compiler will produce code that behaves as if it is read every time.
If index and/or the array are volatile-qualified, then they will be re-read every time.
If they are not, and the loop's content doesn't use them in a way that can be expected to modify their value, the optimiser may decide to use a cached result instead.
C does not store results of expressions in temporary variables. So, all expressions are evaluated in place. Note that any for loop can be changed to a while loop:
for ( def_or_expr1 ; expr2 ; expr3 ) {
...
}
becomes:
def_or_expr1;
while ( expr2 ) {
...
cont:
expr3;
}
Update: continue in the for loop would be the same as goto cont; in the while loop above. I.e. expr3 is evaluated for every iteration.
The compiler can basically apply any optimization it can prove does not change the program's observable behaviour. Describing full details would go too far here, but in general, it can (and will) optimize:
a[index] is not changed in the loop: read once before loop and keep in a temp (e.g. register).
a[index] is changed in the loop: update the temp (register) with the new value, avoiding memory access (and the index calculations).
For this, the compiler must assume the array is not changed outside the visible control flow. This is typically the file being compiled (with all included files). For modern systems using link time optimization (LTO), this can be the whole final program - minus dynamic libraries.
Note this is a very brief description. Actually, the C standard defines pretty clearly how a program has to be executed, and therefore what and how the compiler may optimize.
If the array is changed, for example by an interrupt handler or another thread, things become complicated. Depending on your target, you need anything from volatile or atomic operations (stdatomic.h, since C11) up to thread locks/mutexes/semaphores/etc. to control accesses to the shared resource.
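A small sketch can make the "as if it is read every time" rule visible. Both function names below are made up for illustration. The first loop changes a[index] while it runs, so the new bound must take effect; the second copies the bound into a local first, so later writes to the array no longer influence the loop.

```c
#include <stddef.h>

/* The condition observes the write made inside the body. */
int count_with_live_bound(int *a, size_t index)
{
    int iterations = 0;
    for (int i = 0; i < a[index]; i++) {
        iterations++;
        if (i == 2) {
            a[index] = 4;   /* shrink the bound mid-loop */
        }
    }
    return iterations;
}

/* The bound is fixed before the loop; the write no longer matters. */
int count_with_cached_bound(int *a, size_t index)
{
    int iterations = 0;
    int cond = a[index];
    for (int i = 0; i < cond; i++) {
        iterations++;
        if (i == 2) {
            a[index] = 4;
        }
    }
    return iterations;
}
```

Starting from a[index] == 10, the first function runs 4 iterations and the second runs 10. So caching the bound by hand is only equivalent when the body does not modify it, which is exactly the condition the optimizer has to prove before doing the same thing on its own.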

How to prevent the compiler from optimizing memory access to benchmark read() vs mmap() performance?

I would like to benchmark read() vs mmap() performance of a C program reading 10GB of data. If I have read or mmap'ed the data to a buffer, what should be done in order to make sure the data was actually read?
At the moment I use the following function after each single read() and after the one mmap() operation to make sure data is actually in memory:
void use_data(void *data, size_t length) {
volatile int c = 0;
for (size_t i = 0; i < length; i++) {
c += *((char *) data + i);
}
}
However, I feel this might even introduce overhead? Maybe one can even distinguish between read() and mmap():
In the read() case I think no explicit data access is needed, because the read() call will copy the data to a buffer anyway. In the case of mmap() however, I think some kind of summing up/counting need to be performed in order to make the kernel load every page.
Any recommendations?
You don't need to access the volatile variable for each byte you process. Sum all bytes into a local. Then, write the sum into a volatile variable.
In fact you don't need a volatile variable. You can use any opaque sink that the compiler cannot prove as unneeded. Writing the sum to a temp file would be guaranteed to work as well.
Note, that this is not just a hack to make the compiler cooperate. This is guaranteed to touch every byte (because it could influence the result). The result is needed for an external IO. This cannot be optimized away under the standard.
If alignment allows, sum in bigger units such as 32 or 64 bits. Use unsigned types to avoid UB on overflow. You want to be memory/IO bound, not ALU bound. You can create instruction-level parallelism by summing multiple independent streams using multiple local accumulator variables.
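The core of this advice might be sketched as follows: sum into a plain local with an unsigned type, then store the result once into a volatile sink. The names benchmark_sink and touch_all_bytes are hypothetical, and this version sums byte by byte rather than in wider units or multiple streams.

```c
#include <stddef.h>
#include <stdint.h>

/* A single volatile store at the end is enough to keep the loop alive. */
volatile uint64_t benchmark_sink;

uint64_t touch_all_bytes(const void *data, size_t length)
{
    const unsigned char *bytes = data;
    uint64_t sum = 0;   /* unsigned: wraparound is well defined */

    for (size_t i = 0; i < length; i++) {
        sum += bytes[i];
    }

    benchmark_sink = sum;   /* the result is observable, so every byte matters */
    return sum;
}
```

Because the accumulator is an ordinary local, the compiler can keep it in a register and vectorize the loop, which keeps the measurement closer to being memory or I/O bound than the original version with a volatile accumulator.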

atomic compare(not equal) and swap

I want to use atomic compare and swap, but instead of equal to, I want to swap only if the memory location is not equal to the old value. Is it possible in C?
How about this:
void compare_and_swap_if_not_equal(word_t const required_non_value, word_t const new_value, word_t* ptr_to_shared_variable) {
for (;;) {
word_t const snapshot_value = *ptr_to_shared_variable;
if (required_non_value == snapshot_value) {
break;
// or (sleep and) 'continue;', if you want to wait for the stored value to be different
// -- but you might of course miss a transient change to a different state and back.
} else {
if (compare_and_swap_if_equal(ptr_to_shared_variable, snapshot_value, new_value)) {
// we know the stored value was different; if this 'theory' still matches reality: swap! done!
break;
}
}
}
}
Untested. Uncompiled. The 'const' is used because I like it that way :). 'word_t' is a type placeholder; I don't know what the real type should be. And I don't know what 'compare_and_swap_if_equal' is called in stdatomic.h.
(added) atomic_compare_exchange_weak() is the ticket. It takes a pointer to the 'expected' argument, because on failure it writes the value it actually found back through that pointer. So you'll have to drop the 'const' on snapshot_value and modify the above code to
if (atomic_compare_exchange_weak(ptr_to_shared_variable, &snapshot_value, new_value)) ...
The 'weak' version should work in the above code; returning 'false' spuriously will only add another trip around the loop. Still uncompiled, untested; don't rely on this code at home.
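Put together with stdatomic.h, the loop above could look like this sketch for an atomic_int. The function name swap_if_not_equal and the boolean return value are additions for illustration, not part of the answer.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stores new_value into *target unless *target equals forbidden.
 * Returns true if the store happened, false if the forbidden value was seen. */
bool swap_if_not_equal(atomic_int *target, int forbidden, int new_value)
{
    int snapshot = atomic_load(target);

    while (snapshot != forbidden) {
        /* Succeeds only if *target still equals snapshot, which we
         * already know differs from forbidden. */
        if (atomic_compare_exchange_weak(target, &snapshot, new_value)) {
            return true;
        }
        /* On failure (real or spurious) snapshot has been refreshed with
         * the current value, so the loop condition re-checks it. */
    }
    return false;
}
```

Since a failed compare-exchange reloads snapshot, no separate atomic_load is needed inside the loop. As noted in the original code, a value that briefly changes to something else and back between two attempts will go unnoticed.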
It depends on your architecture, but in general it is not possible to do this in C.
Typically compare and swap is implemented with an instruction that atomically loads from a location in memory and stores a value to that location if the location in memory matches some existing value that you specify.
At least on x86 there is no provision for only doing this load if the values don't match. Also it's not clear why you would want to do something like that. Perhaps another architecture would support something like this, but that would be architecture dependent, not something that could be done in C in a portable way.
