How to implement TAS ("test and set")? - c

Does anyone know how to implement TAS?
The code between acquire and release should be protected against state corruption caused by other threads.
typedef volatile long lock_t;

void acquire(lock_t *lock) {
    while (TAS(lock))
        ;
}

void release(lock_t *lock) {
    *lock = 0;
}

lock_t REQ_lock = 0;

int main() {
    acquire(&REQ_lock);
    // non-atomic code to be protected
    release(&REQ_lock);
}

The C11 standard brings atomic types into the language.
Among them is the atomic_flag type, which has an associated function atomic_flag_test_and_set.
Here it is from the standard:
C11 working draft section 7.17.8.1:
Synopsis
    #include <stdatomic.h>
    _Bool atomic_flag_test_and_set(volatile atomic_flag *object);
    _Bool atomic_flag_test_and_set_explicit(volatile atomic_flag *object, memory_order order);
Description
    Atomically sets the value pointed to by object to true. Memory is affected according to the value of order. These operations are atomic read-modify-write operations.
Returns
    Atomically, the value of the object immediately before the effects.
And along with it, you'll want its sister operation, atomic_flag_clear.
Section 7.17.8.2:
Synopsis
    #include <stdatomic.h>
    void atomic_flag_clear(volatile atomic_flag *object);
    void atomic_flag_clear_explicit(volatile atomic_flag *object, memory_order order);
Description
    The order argument shall not be memory_order_acquire nor memory_order_acq_rel. Atomically sets the value pointed to by object to false. Memory is affected according to the value of order.
Returns
    The atomic_flag_clear functions return no value.

Yeah, no. One of the virtues of C is to bring you as close to the machine-language boundary as possible, but no further. This is further.
You could reach for atomic support, which some C augmentation standards provide, but it is a nest of problems.
Write TAS in assembly, and restrict the visibility of the variables it uses, for containment. For any given architecture it should be no more than a handful of lines of assembly. Your program will then have contained its dependency; that lack of containment is among the critical flaws in the last decade of C augmentations.

Related

Questions regarding (non-)volatile and optimizing compilers

I have the following C code:
/* the memory entry points to can be changed from another thread but
 * it is not declared volatile */
struct myentry *entry;

bool isready(void)
{
    return entry->status == 1;
}

bool isready2(int idx)
{
    struct myentry *x = entry + idx;
    return x->status == 1;
}

int main(void) {
    /* busy loop */
    while (!isready())
        ;
    while (!isready2(5))
        ;
}
As I note in the comment, entry is not declared as volatile even though the array it points to can be changed from another thread (or actually even directly from kernel space).
Is the above code incorrect / unsafe? My thinking is that no optimization could be performed in the bodies of isready, isready2 and since I repeatedly perform function calls from within main the appropriate memory location should be read on every call.
On the other hand, the compiler could inline these functions. Is it possible that it does it in a way that results in a single read happening (hence causing an infinite loop) instead of multiple reads (even if these reads come from a load/store buffer)?
And a second question: is it possible to prevent the compiler from doing these optimizations by casting to volatile only in certain places, like this?
void func(void)
{
    entry->status = 1;
    while (((volatile struct myentry *) entry)->status != 2)
        ;
}
Thanks.
If the memory entry points to can be modified by another thread, then the program has a data race and therefore the behaviour is undefined. This is still true even if volatile is used.
To have a variable accessed concurrently by multiple threads, in ISO C11, it must either be an atomic type, or protected by correct synchronization.
If using an older standard revision then there are no guarantees provided by the Standard about multithreading so you are at the mercy of any idiosyncratic behaviour of your compiler.
If using POSIX threads, there are no portable atomic operations, but it does define synchronization primitives.
See also:
Why is volatile not considered useful in multithreaded C or C++ programming?
The second question is a bugbear. I would suggest not doing it, because different compilers may interpret the meaning differently, and the behaviour is still formally undefined either way.

SPSC thread safe with fences

I just want my code as simple as possible and thread safe.
With C11 atomics
Regarding section 7.17.4 "Fences" of the ISO/IEC 9899:201x draft:
    ... X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value written by X or a value written by any side effect in the hypothetical release sequence X would head if it were a release operation.
Is this code thread safe (with w_i as "object M")?
Do w_i and r_i both need to be declared _Atomic?
If only w_i is _Atomic, can the main thread keep an old value of r_i in cache, consider the queue not full (while it is full), and write data?
What happens if I read an atomic without atomic_load?
I have made some tests, but all of my attempts seem to give the right results. However, I know my tests are not really conclusive for multithreading: I just run the program several times and look at the results.
Even if neither w_i nor r_i is declared _Atomic, my program works, but fences alone are not sufficient as far as the C11 standard is concerned, right?
typedef int rbuff_data_t;

struct rbuf {
    rbuff_data_t *buf;
    unsigned int bufmask;
    _Atomic unsigned int w_i;
    _Atomic unsigned int r_i;
};
typedef struct rbuf rbuf_t;

static inline int
thrd_tryenq(struct rbuf *queue, rbuff_data_t val) {
    size_t next_w_i;
    next_w_i = (queue->w_i + 1) & queue->bufmask;
    /* if ring full */
    if (atomic_load(&queue->r_i) == next_w_i) {
        return 1;
    }
    queue->buf[queue->w_i] = val;
    atomic_thread_fence(memory_order_release);
    atomic_store(&queue->w_i, next_w_i);
    return 0;
}

static inline int
thrd_trydeq(struct rbuf *queue, rbuff_data_t *val) {
    size_t next_r_i;
    /* if ring empty */
    if (queue->r_i == atomic_load(&queue->w_i)) {
        return 1;
    }
    next_r_i = (queue->r_i + 1) & queue->bufmask;
    atomic_thread_fence(memory_order_acquire);
    *val = queue->buf[queue->r_i];
    atomic_store(&queue->r_i, next_r_i);
    return 0;
}
I call these functions as follows.
The main thread enqueues some data:
while (thrd_tryenq(thrd_get_queue(&tinfo[tnum]), i)) {
    usleep(10);
    continue;
}
Other threads dequeue data:
static void *
thrd_work(void *arg) {
    struct thrd_info *tinfo = arg;
    int elt;
    atomic_init(&tinfo->alive, true);
    /* busy waiting when queue empty */
    while (atomic_load(&tinfo->alive)) {
        if (thrd_trydeq(&tinfo->queue, &elt)) {
            sched_yield();
            continue;
        }
        printf("Thread %zu deq %d\n",
               tinfo->thrd_num, elt);
    }
    pthread_exit(NULL);
}
With asm fences
Regarding a specific platform x86 with lfence and sfence,
If I remove all C11 code and just replace fences by
asm volatile ("sfence" ::: "memory");
and
asm volatile ("lfence" ::: "memory");
(My understanding of these statements is: a compiler fence preventing memory accesses from being reorganized/optimized away, plus a hardware fence.)
Do my variables then need to be declared volatile, for instance?
I have already seen this ring buffer code with only these asm fences and no atomic types, and I was really surprised; I want to know whether that code was correct.
I will only reply regarding C11 atomics; platform specifics are too complicated and should be phased out.
Synchronization between threads in C11 is only guaranteed through some system calls (e.g for mtx_t) and atomics. Don't even try to do it without.
That said, synchronization works via atomics; that is, visibility of side effects is guaranteed to propagate via the visibility of effects on atomics. E.g. for the simplest consistency model, sequential consistency, whenever thread T2 sees a modification that thread T1 has effected on an atomic variable A, all side effects before that modification in thread T1 are visible to T2.
So not all your shared variables need to be atomic; you only must ensure that your state is properly propagated via an atomic. In that sense fences buy you nothing when you use sequential or acquire-release consistency; they only complicate the picture.
Some more general remarks:
Since you seem to use the sequential consistency model, which is the default, the functional writing of atomic operations (e.g. atomic_load) is superfluous. Just evaluating the atomic variable is exactly the same.
I have the impression that you are attempting optimization much too early in your development. I think you should first do an implementation for which you can prove correctness. Then, if and only if you notice a performance problem, you should start to think about optimization. It is very unlikely that such an atomic data structure is a real bottleneck for your application. You'd have to have a very large number of threads all simultaneously hammering on your poor little atomic variable to see a measurable bottleneck here.

Why there shall be no more than one read access with volatile-qualified type within one sequence point?

Given the following code:
static volatile float32_t tst_mtr_dutycycle;
static volatile uint8_t tst_mtr_direction;
static volatile uint32_t tst_mtr_update;

void TST_MTR_Task(void)
{
    if (tst_mtr_update == 1U)
    {
        tst_mtr_update = 0;
        MTR_SetDC(tst_mtr_dutycycle, tst_mtr_direction);
    }
}
I found problems with MISRA C 2012 Rule-13.2 and I decided to make some research. I found here (http://archive.redlizards.com/docs/misrac2012-datasheet.pdf) that:
there shall be no more than one read access with volatile-qualified type within one sequence point
The thing here is that I haven't been able to find an example or explanation that makes clear why there shall be no more than one read access with volatile-qualified type within one sequence point.
I need to find a solution for the violating code but is not really clear to me what to do.
I know now that there shall be no more than one read access with volatile-qualified type within one sequence point. The question is, why? and I need to know why in order to implement a solution and to explain everybody here why I am changing the code.
Regards.
The justification for the rule is:
    (Required) The value of an expression and its persistent side effects shall be the same under all permitted evaluation orders.
If more than one volatile-qualified variable is read between sequence points, then it is unspecified which is read first. Reading a volatile variable is a side effect.
The solution is to explicitly order the reads:
void TST_MTR_Task(void)
{
    if (tst_mtr_update == 1U)
    {
        tst_mtr_update = 0;
        float32_t dutycycle = tst_mtr_dutycycle;
        uint8_t direction = tst_mtr_direction;
        MTR_SetDC(dutycycle, direction);
    }
}
There are no sequence points between the fetching of the arguments of a function call, so the order in which they are fetched is unspecified by the standard. On the other hand, the compiler has to maintain the order of accesses to volatile objects, so this is a contradiction.
Fetch the variables to non-volatile temps and use those for the function call:
float32_t t1 = tst_mtr_dutycycle;
uint8_t t2 = tst_mtr_direction;
MTR_SetDC(t1, t2);
Note this is actually an issue for standard C and not just related to MISRA-compliance.
As you appear to have multiple problems regarding standard compliance, you might want to keep the standard under your pillow.

When can a volatile variable be optimized away completely?

Consider this code example:
int main(void)
{
    volatile int a;
    static volatile int b;

    volatile int c;
    c = 20;

    static volatile int d;
    d = 30;

    volatile int e = 40;
    static volatile int f = 50;

    return 0;
}
Without volatile a compiler could optimize away all the variables since they are never read from.
I think a and b can be optimized away since they are completely unused, see unused volatile variable.
I think c and d can not be removed since they are written to, and writes to volatile variables must actually happen. e should be equivalent to c.
GCC does not optimize away f, but it also does not emit any instruction to write to it. 50 is set in a data section. LLVM (clang) removes f completely.
Are these statements true?
A volatile variable can be optimized away if it is never accessed.
Initialization of a static or global variable does not count as access.
Writes to volatile variables (even automatic ones) count as observable behaviour.
C11 (N1570) 5.1.2.3/6:
    The least requirements on a conforming implementation are:
    — Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
    — At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
    — The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.
    This is the observable behavior of the program.
The question is: does initialization (e, f) count as an "access"? As pointed out by Sander de Dycker, 6.7.3 says:
What constitutes an access to an object that has volatile-qualified type is implementation-defined.
which means it's up to the compiler whether or not e and f can be optimized away - but this must be documented!
Strictly speaking, any volatile variable that is accessed (read or written) cannot be optimized out according to the C Standard. The Standard says that an access to a volatile object may have unknown side effects and that accesses to volatile objects have to follow the rules of the C abstract machine (where all expressions are evaluated as specified by their semantics).
From the mighty Standard (emphasis mine):
    (C11, 6.7.3p7) "An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3."
Following that, even a simple initialization of a variable should be considered an access. Remember that the static specifier also causes the object to be initialized (to 0) and thus accessed.
Now compilers are known to behave differently with the volatile qualifier, and I guess a lot of them will just optimize out most of the volatile objects of your example program except the ones with the explicit assignment (=).

In C, how do you declare the members of a structure as volatile?

How do you declare a particular member of a struct as volatile?
Exactly the same as non-struct fields:
#include <stdio.h>

int main (int c, char *v[]) {
    struct _a {
        int a1;
        volatile int a2;
        int a3;
    } a;

    a.a1 = 1;
    a.a2 = 2;
    a.a3 = 3;
    return 0;
}
You can mark the entire struct as volatile by using "volatile struct _a {...}" but the method above is for individual fields.
Should be pretty straightforward according to this article:
    Finally, if you apply volatile to a struct or union, the entire contents of the struct/union are volatile. If you don't want this behavior, you can apply the volatile qualifier to the individual members of the struct/union.
I need to clarify volatile for C/C++ because there was a wrong answer here. I've been programming microcontrollers since 1994, where this keyword is very useful and often needed.
volatile will never break your code; it is never risky to use it. The keyword basically makes sure the variable is not optimized away by the compiler. The worst that should happen if you overuse this keyword is that your program will be a bit bigger and slower.
Here is when you NEED this keyword for a variable:
- You have a variable that is written to inside an interrupt function,
AND
- the same variable is read or written outside interrupt functions.
OR
- You have two interrupt functions of different priority that use the variable.
Otherwise, the keyword is not needed.
As for hardware registers, they should be treated as volatile even without the keyword, provided you don't do weird stuff in your program.
I just finished a data structure in which it was obvious where the volatile qualifier was required, but for a different reason than the ones stated above: simply because the struct requires a forceful locking mechanism, owing to (i) direct access and (ii) equivalent invocation.
Direct access deals with sustained RAM reading and writing. Equivalent invocation deals with interchangeable method flows.
I haven't had much luck with this keyword unless the compiler knows exactly what to do about it; that's my own personal experience. But I am interested in studying how it directly impacts a cross-platform compilation, such as between a low-level system call and a back-end database.