I'm feeling a bit overwhelmed when using multiple threads in embedded programming since each and every shared resource ends up with a getter/setter protected by a mutex.
I would really like to understand if a getter of the following sort
static float static_raw;
float get_raw() {
os_mutex_get(mutex, OS_WAIT_FOREVER);
float local_raw = static_raw;
os_mutex_put(mutex);
return local_raw ;
}
makes sense, or whether float assignment can be considered atomic, e.g. on ARM (unlike, e.g., 64-bit variables), which would make this superfluous.
I can understand something like this:
raw = raw > VALUE ? raw + compensation() : raw;
where the value is handled multiple times, but what about when reading or returning it?
Can you clear this up for me?
EDIT 1:
Regarding the second question below.
Let's assume we have a function that is "heavy" in terms of execution time; let's call it
void foo(int a, int b, int c)
where a,b,c are potentially values from shared resources.
When the foo function is called should it be enveloped by a mutex, locking it for plenty of time even if it just needs a copy of the value? e.g.
os_mutex_get(mutex, OS_WAIT_FOREVER);
foo(a,b,c);
os_mutex_put(mutex);
does it make any sense to do
os_mutex_get(mutex, OS_WAIT_FOREVER);
int la = a;
int lb = b;
int lc = c;
os_mutex_put(mutex);
foo(la,lb,lc);
locking only the copy of the variable instead of the full execution?
EDIT2:
Given there could exist getter and setter for "a", "b" and "c".
Is it problematic in terms of performance/or anything else in doing something like this?
int static_a;
void get_a(int* la){
os_mutex_get(mutex, OS_WAIT_FOREVER);
*la = static_a;
os_mutex_put(mutex);
}
or
int static_b;
int get_b(){
os_mutex_get(mutex, OS_WAIT_FOREVER);
int lb = static_b;
os_mutex_put(mutex);
return lb;
}
using them as
int main(void){
int la = 0;
get_a(&la);
foo(la,get_b());
}
I'm asking because I'm locking and relocking the same mutex sequentially, potentially for no reason.
if float assignment can be considered atomic
Nothing can be considered atomic in C unless you use C11 _Atomic or inline assembler. The underlying hardware is irrelevant: even if a word of a given size can be read in a single instruction on that hardware, there is no guarantee that a given C statement compiles to a single machine instruction.
does it make any sense to do
os_mutex_get(mutex, OS_WAIT_FOREVER);
int la = a;
int lb = b;
int lc = c;
os_mutex_put(mutex);
foo(a,b,c);
Assuming you mean foo(la,lb,lc);, then yes, it makes a lot of sense. This is how you should ideally use a mutex: minimize the code between locking and unlocking so that it is just raw variable copying and nothing else.
The C standard does not dictate anything about the atomicity of the assignment operator. You cannot consider an assignment atomic, as it is completely implementation dependent.
However, in C11 the _Atomic type qualifier (C11 §6.7.3, page 121) can be used (if supported by your compiler) to declare variables that are read and written atomically, so you could for example do the following:
static _Atomic float static_raw;
float get_raw(void) {
return static_raw;
}
Don't forget to compile with -std=c11 if you do so.
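As a minimal sketch of the same _Atomic approach with the accesses spelled out, the getter can be paired with a setter using the generic functions from <stdatomic.h>. The name set_raw is hypothetical (the question only shows a getter), and whether these operations are lock-free on a given target is implementation-defined.

```c
#include <stdatomic.h>

/* Shared value; every access below goes through an atomic operation. */
static _Atomic float static_raw;

/* Hypothetical setter: atomically replaces the stored value. */
void set_raw(float value) {
    atomic_store(&static_raw, value);
}

/* Getter: atomically reads the stored value, no mutex required. */
float get_raw(void) {
    return atomic_load(&static_raw);
}
```

This only covers a single load or store. A read-modify-write such as the raw = raw > VALUE ? ... line from the question still needs a lock or a compare-and-swap loop.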
Addressing your first edit:
When the foo function is called should it be enveloped by a mutex, locking it for plenty of time even if it just needs a copy of the value?
While it would be correct, it certainly would not be the best solution. If the function only needs a copy of the variables, then your second snippet is much better and should be the preferred solution:
os_mutex_get(mutex, OS_WAIT_FOREVER);
int la = a;
int lb = b;
int lc = c;
os_mutex_put(mutex);
foo(la,lb,lc);
If you lock the whole function, you'll block any other thread trying to acquire the lock for much longer than needed, slowing down everything. Locking before calling the function and passing copies of the values will instead only lock for the needed amount of time leaving much more free time to other threads.
Addressing your second edit:
Given there could exist getter and setter for "a", "b" and "c". Is it problematic in terms of performance/or anything else in doing something like this?
That code is correct. In terms of performance it would certainly be much better to have one mutex per variable, if you can. With only one mutex, any thread holding the mutex will "block" any other thread that is trying to lock it, even if they are trying to access a different variable.
If you cannot use multiple mutexes then it's a matter of choosing between these two options:
Lock inside the getters:
void get_a(int* la){
os_mutex_get(mutex, OS_WAIT_FOREVER);
*la = static_a;
os_mutex_put(mutex);
}
void get_b(int* lb){
os_mutex_get(mutex, OS_WAIT_FOREVER);
*lb = static_b;
os_mutex_put(mutex);
}
/* ... */
int var1, var2;
get_a(&var1);
get_b(&var2);
Lock outside the getters (leave the duty to the caller):
int get_a(void){
return static_a;
}
int get_b(void){
return static_b;
}
/* ... */
os_mutex_get(mutex, OS_WAIT_FOREVER);
int var1 = get_a();
int var2 = get_b();
os_mutex_put(mutex);
At this point you wouldn't even need to have getters and you could just do:
os_mutex_get(mutex, OS_WAIT_FOREVER);
int var1 = a;
int var2 = b;
os_mutex_put(mutex);
If your code frequently requests multiple values, then locking/unlocking outside the getters is better since it will cause less overhead. As an alternative, you could also keep the locking inside, but create different functions to retrieve multiple variables so that the mutex is only locked and released once.
On the other hand, if your code is only rarely requesting multiple values, then it's ok to keep the locking inside each getter.
It isn't possible to say ahead of time which solution is best; you should run different tests and see what works best for your scenario.
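The alternative mentioned above, a single function that retrieves several variables under one lock, might look like the sketch below. It assumes a POSIX pthread mutex as a stand-in for the RTOS os_mutex_get/os_mutex_put calls, and the names shared_snapshot, get_snapshot and set_all are hypothetical.

```c
#include <pthread.h>

/* One mutex guarding all three shared values (stand-in for the RTOS mutex). */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static int static_a, static_b, static_c;

/* Hypothetical container for a consistent copy of the shared values. */
struct shared_snapshot {
    int a, b, c;
};

/* Hypothetical setter: updates all three values under a single lock. */
void set_all(int a, int b, int c) {
    pthread_mutex_lock(&mutex);
    static_a = a;
    static_b = b;
    static_c = c;
    pthread_mutex_unlock(&mutex);
}

/* Copies all three values while holding the lock exactly once. */
struct shared_snapshot get_snapshot(void) {
    struct shared_snapshot snap;
    pthread_mutex_lock(&mutex);
    snap.a = static_a;
    snap.b = static_b;
    snap.c = static_c;
    pthread_mutex_unlock(&mutex);
    return snap;
}
```

A caller would take one snapshot and pass the copies on, e.g. struct shared_snapshot s = get_snapshot(); foo(s.a, s.b, s.c);, so the mutex is never held while foo runs. Unlike three separate getters, this also ensures the three values were read together.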
Suppose I want to make the instructions inside a function atomic.
I declared
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
as a global variable.
Instead of:
int main() {
myFoo();
...
}
void myFoo() {
pthread_mutex_lock(&mutex);
myGlobal++;
pthread_mutex_unlock(&mutex);
}
can I do:
int main() {
pthread_mutex_lock(&mutex);
myFoo();
pthread_mutex_unlock(&mutex);
...
}
void myFoo() {
myGlobal++;
}
so that every instruction in myFoo becomes atomic?
In the first example you are protecting myGlobal, and in the second you are protecting myFoo. Your code works as you expect (provided every call is made between lock/unlock), but you need to use the terms correctly or the meaning will be wrong.
No, it will not be atomic, but access to myFoo will be synchronized, meaning no other thread can execute that part of the code while another thread is using it.
The term "atomic operation" normally means that an operation runs without any interruption (and is sometimes taken to imply lock-free). For example, C11's atomic_flag provides such functionality. A mutex, on the other hand, creates mutual exclusion: it protects a part of your code from simultaneous access by different threads. The two terms are not interchangeable.
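To illustrate the distinction, here is a minimal sketch of the same increment done both ways. The function and variable names are hypothetical; the first version relies on C11 <stdatomic.h>, the second on a pthread mutex as in the question.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Atomic: the increment itself is one indivisible read-modify-write. */
static atomic_int atomic_counter;

int atomic_increment(void) {
    /* atomic_fetch_add returns the previous value, so add 1 for the new one. */
    return atomic_fetch_add(&atomic_counter, 1) + 1;
}

/* Mutual exclusion: the increment is an ordinary operation, but only one
   thread at a time can be between lock and unlock. */
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static int locked_counter;

int locked_increment(void) {
    pthread_mutex_lock(&counter_mutex);
    int result = ++locked_counter;
    pthread_mutex_unlock(&counter_mutex);
    return result;
}
```

The mutex version scales to protecting several statements or several variables at once; the atomic version only covers the single operation.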
Side note:
The only atomic type that is guaranteed to be lock-free is atomic_flag, in both C and C++. Others, such as atomic_int, may be implemented with a lock internally and are then not lock-free.
Your use of the term atomic is not really correct but I guess the question is more about whether the two code snippets will behave the same.
If myFoo is only called between lock/unlock, the answer is yes, they are the same.
However, in the second case you have lost protection of myFoo. Another thread could call myFoo without calling lock first which would cause problems.
So the second example is bad as it opens up for more mistakes. Stick to the first one, i.e. keep the lock/unlock inside the function.
Also notice:
Since myGlobal is a global variable, you can't make sure that the threads do not access it directly. There are several ways to avoid that. The example below shows a single function with a static variable. The function can be used to receive the static variable and do an increment if desired.
int myFoo(int doIncrement)
{
static int myStatic = 0;
int result;
pthread_mutex_lock(&mutex);
if (doIncrement) myStatic++;
result = myStatic;
pthread_mutex_unlock(&mutex);
return result;
}
Now the variable myStatic is hidden from all the threads and can only be accessed through myFoo.
int x = myFoo(1); // Increment and read
int y = myFoo(0); // Read only
The question is simple. Does/should a variable shared between multiple threads be volatile even when it is only accessed inside a critical section (i.e. under a mutex or semaphore) in C? Why or why not?
#include <pthread.h>
volatile int account_balance;
pthread_mutex_t flag = PTHREAD_MUTEX_INITIALIZER;
void debit(int amount) {
pthread_mutex_lock(&flag);
account_balance -= amount;//Inside critical section
pthread_mutex_unlock(&flag);
}
What about the example above, or the equivalent case with a semaphore?
Does/should a variable shared between multiple threads be volatile even when it is only accessed inside a critical section (i.e. under a mutex or semaphore) in C? Why or why not?
No.
volatile is logically irrelevant for concurrency, because it's not sufficient.
Actually, that's not really true - volatile is not irrelevant because it can hide concurrency problems in your code, so it works "most of the time".
All volatile does is tell the compiler "this variable can change outside the current thread of execution". Volatile in no way enforces any ordering, atomicity, or - critically - visibility. Just because thread 2 on CPU A changes int x doesn't mean thread 1 on CPU D can see the change at any specific time - it has its own cached value, and volatile means almost nothing with respect to memory coherence because it doesn't guarantee ordering.
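For contrast, a minimal sketch of what does provide ordering and visibility: a C11 atomic flag with release/acquire semantics that publishes a plain, non-volatile variable. The names payload, ready, publish and try_consume are hypothetical, and the sketch assumes one writer that publishes once.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain data, written before the flag is set */
static atomic_bool ready;  /* atomic flag that publishes the payload */

/* Writer: store the data, then publish it with a release store. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: copies the data only after observing the flag with an acquire
   load, which guarantees the earlier write to payload is visible. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire)) {
        return false;
    }
    *out = payload;
    return true;
}
```

Replacing atomic_bool with volatile bool here would keep the compiler from caching the flag but would give no guarantee that payload is visible when the flag is.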
The last comment at the bottom of the Intel article Volatile: Almost Useless for Multi-Threaded Programming says it best:
If you are simply adding 'volatile' to variables that are shared
between threads thinking that fixes your shared-data problem without
bothering to understand why it may not, you will eventually reap the
reward you deserve.
Yes, lock-free code can make use of volatile. Such code is written by people who can likely write tutorials on the use of volatile, multithreaded code, and other extremely detailed subjects regarding compilers.
No, volatile should not be used on shared variables which are accessed under the protection of pthreads synchronisation functions like pthread_mutex_lock().
The reason is that the synchronisation functions themselves are guaranteed by POSIX to provide all the necessary compiler barriers and synchronisation to ensure consistency (as long as you follow the POSIX rules on concurrent access - ie. that you have used pthreads synchronisation functions to ensure that no thread can be writing to a shared variable whilst another thread is writing to or reading from it).
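Applied to the question's example, that means the variable can be a plain int as long as every access happens with the mutex held. A minimal sketch follows; the starting balance of 100 and the get_balance reader are assumptions added for illustration.

```c
#include <pthread.h>

/* Plain int: no volatile, because every access is made under the mutex. */
static int account_balance = 100; /* assumed starting balance */
static pthread_mutex_t flag = PTHREAD_MUTEX_INITIALIZER;

void debit(int amount) {
    pthread_mutex_lock(&flag);
    account_balance -= amount; /* inside critical section */
    pthread_mutex_unlock(&flag);
}

/* Hypothetical reader: reads must take the same mutex as writes. */
int get_balance(void) {
    pthread_mutex_lock(&flag);
    int balance = account_balance;
    pthread_mutex_unlock(&flag);
    return balance;
}
```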
I have no idea why there's so much misinformation about volatile everywhere on the internet. The answer to your question is yes, you should make variables you use within a critical section volatile.
I'll give a contrived example. Let's say you want to run this function on multiple threads:
int a;
void inc_a(void) {
for (int i = 0; i < 5; ++i) {
a += 5;
}
}
Everybody, as it would seem, on this site will tell you that it's enough to put a += 5 in a critical section like so:
int a;
void inc_a(void) {
for (int i = 0; i < 5; ++i) {
enter_critical_section();
a += 5;
exit_critical_section();
}
}
As I said, it's contrived, but people will tell you this is correct, and it absolutely is not! If the compiler wasn't given prior knowledge as to what the critical section functions are, and what their semantic meaning is, there's nothing stopping the compiler from outputting this code:
int a;
void inc_a(void) {
register int eax = a;
for (int i = 0; i < 5; ++i) {
enter_critical_section();
eax += 5;
exit_critical_section();
}
a = eax;
}
This code produces the same output in a single threaded context, so the compiler is allowed to do that. But in a multithreaded context, this can output anything between 25 and 25 times the thread count. One way to solve this issue is to use an atomic construct, but that has performance implications, instead what you should do is make the variable volatile. That is, unless you want to be like the rest of this community and blindly put your faith in your C compiler.
I just want my code as simple as possible and thread safe.
With C11 atomics
Regarding part "7.17.4 Fences" of the ISO/IEC 9899/201X draft
X and Y , both operating on some atomic object M, such that A is
sequenced before X, X modifies M, Y is sequenced before B, and Y reads
the value written by X or a value written by any side effect in the
hypothetical release sequence X would head if it were a release
operation.
Is this code thread safe (with "w_i" as "object M")?
Do "w_i" and "r_i" both need to be declared as _Atomic?
If only w_i is _Atomic, can the main thread keep an old value of r_i in cache, consider the queue not full (while it is full), and write data?
What happens if I read an atomic without atomic_load?
I have made some tests, but all of my attempts seem to give the right results.
However, I know that my tests are not really valid with respect to multithreading: I just run my program several times and look at the result.
Even if neither w_i nor r_i is declared as _Atomic, my program works, but fences alone are not sufficient according to the C11 standard, right?
#include <stdatomic.h>
#include <stddef.h>
typedef int rbuff_data_t;
struct rbuf {
rbuff_data_t * buf;
unsigned int bufmask;
_Atomic unsigned int w_i;
_Atomic unsigned int r_i;
};
typedef struct rbuf rbuf_t;
static inline int
thrd_tryenq(struct rbuf * queue, rbuff_data_t val) {
size_t next_w_i;
next_w_i = (queue->w_i + 1) & queue->bufmask;
/* if ring full */
if (atomic_load(&queue->r_i) == next_w_i) {
return 1;
}
queue->buf[queue->w_i] = val;
atomic_thread_fence(memory_order_release);
atomic_store(&queue->w_i, next_w_i);
return 0;
}
static inline int
thrd_trydeq(struct rbuf * queue, rbuff_data_t * val) {
size_t next_r_i;
/*if ring empty*/
if (queue->r_i == atomic_load(&queue->w_i)) {
return 1;
}
next_r_i = (queue->r_i + 1) & queue->bufmask;
atomic_thread_fence(memory_order_acquire);
*val = queue->buf[queue->r_i];
atomic_store(&queue->r_i, next_r_i);
return 0;
}
I call these functions as follows:
The main thread enqueues some data:
while (thrd_tryenq(thrd_get_queue(&tinfo[tnum]), i)) {
usleep(10);
continue;
}
Other threads dequeue data:
static void *
thrd_work(void *arg) {
struct thrd_info *tinfo = arg;
int elt;
atomic_init(&tinfo->alive, true);
/* busy waiting when queue empty */
while (atomic_load(&tinfo->alive)) {
if (thrd_trydeq(&tinfo->queue, &elt)) {
sched_yield();
continue;
}
printf("Thread %zu deq %d\n",
tinfo->thrd_num, elt);
}
pthread_exit(NULL);
}
With asm fences
Regarding a specific platform, x86 with lfence and sfence:
If I remove all the C11 code and just replace the fences with
asm volatile ("sfence" ::: "memory");
and
asm volatile ("lfence" ::: "memory");
(My understanding of these statements is: a compiler fence to prevent memory accesses from being reorganized/optimized, plus a hardware fence.)
do my variables need to be declared as volatile, for instance?
I have already seen the ring buffer code above with only these asm fences and no atomic types, and I was really surprised; I want to know whether that code was correct.
I'll only answer regarding C11 atomics; platform specifics are too complicated and should be phased out.
Synchronization between threads in C11 is only guaranteed through some system calls (e.g. for mtx_t) and atomics. Don't even try to do it without.
That said, synchronization works via atomics; that is, visibility of side effects is guaranteed to propagate via the visibility of effects on atomics. E.g. for the simplest consistency model, sequential, whenever thread T2 sees a modification that thread T1 has effected on an atomic variable A, all side effects before that modification in thread T1 are visible to T2.
So not all your shared variables need to be atomic, you only must ensure that your state is properly propagated via an atomic. In that sense fences buy you nothing when you use sequential or acquire-release consistency, they only complicate the picture.
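As a rough sketch of that idea, the ring buffer can be written with no standalone fences at all: only the two indices are atomic, and the buffer contents are published through release stores and acquire loads on those indices. This assumes exactly one producer thread and one consumer thread; the spsc_ names and the fixed capacity are hypothetical and not taken from the question's code.

```c
#include <stdatomic.h>

#define SPSC_CAPACITY 8u /* must be a power of two; holds CAPACITY - 1 items */

struct spsc_ring {
    int buf[SPSC_CAPACITY];   /* plain data, published via the indices */
    _Atomic unsigned int w_i; /* written only by the producer */
    _Atomic unsigned int r_i; /* written only by the consumer */
};

void spsc_init(struct spsc_ring *q) {
    atomic_init(&q->w_i, 0u);
    atomic_init(&q->r_i, 0u);
}

/* Producer side: returns 0 on success, 1 if the ring is full. */
int spsc_tryenq(struct spsc_ring *q, int val) {
    unsigned int w = atomic_load_explicit(&q->w_i, memory_order_relaxed);
    unsigned int next = (w + 1u) & (SPSC_CAPACITY - 1u);
    if (next == atomic_load_explicit(&q->r_i, memory_order_acquire)) {
        return 1;
    }
    q->buf[w] = val;
    /* Release: the write to buf[w] becomes visible before the new index. */
    atomic_store_explicit(&q->w_i, next, memory_order_release);
    return 0;
}

/* Consumer side: returns 0 on success, 1 if the ring is empty. */
int spsc_trydeq(struct spsc_ring *q, int *val) {
    unsigned int r = atomic_load_explicit(&q->r_i, memory_order_relaxed);
    /* Acquire: pairs with the producer's release store on w_i. */
    if (r == atomic_load_explicit(&q->w_i, memory_order_acquire)) {
        return 1;
    }
    *val = q->buf[r];
    /* Release: the slot is handed back only after it has been read. */
    atomic_store_explicit(&q->r_i, (r + 1u) & (SPSC_CAPACITY - 1u),
                          memory_order_release);
    return 0;
}
```

Using the default sequentially consistent operations (plain atomic_load/atomic_store) in the same places would also be correct and is simpler to reason about; the explicit orders are shown only to make the pairing visible.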
Some more general remarks:
- Since you seem to use the sequential consistency model, which is the default, the functional notation for atomic operations (e.g. atomic_load) is superfluous. Just evaluating the atomic variable is exactly the same.
- I have the impression that you are attempting optimization much too early in your development. I think you should first do an implementation for which you can prove correctness. Then, if and only if you notice a performance problem, you should start to think about optimization. It is very unlikely that such an atomic data structure is a real bottleneck for your application. You'd have to have a very large number of threads that all simultaneously hammer on your poor little atomic variable to see a measurable bottleneck here.
With the following code
#include <unistd.h>
int a = getpagesize();
int main() {
return a;
}
I receive the following compilation error
3:1: error: initializer element is not constant
What is an "initializer element", and why does it need to be constant? Does that relate to the const qualifier?
The value used to initialize global variables needs to be determined at compile time. The return value of a function (in C, at least) won't be evaluated until run time. So something like:
int a = 4;
is OK, but:
int a = somefunction();
is not. In C++ you can have constexpr functions, but in C you can't.
If you must do something like this, you can always use:
int a;
int main(void) {
a = getpagesize();
/* Rest of your program */
}
Obviously you can't make your global const doing this (since you can only set the value of a const variable at initialization, and you can't initialize globals with functions). Frankly, there's probably no reason why you can't just call getpagesize() when you need it and forget about a global variable altogether - unless you call it a billion times, you won't notice the overhead. If you must have a global variable, then just don't make it const.
If immutability is an absolute requirement, and the problem is avoiding an expensive function call rather than avoiding function calls altogether, then one option is to replace it with an inexpensive function call, like so:
int poor_mans_global(void) {
static int a = -1;
if ( a == -1 ) {
a = getpagesize(); /* Only call the expensive function once */
}
return a;
}
and call poor_mans_global() instead of using your global variable. Note that this example is illustrative only, and doesn't imply that getpagesize() is an expensive function call.
A final option is to package all your code that needs access to this global into a separate translation unit, and make the global static, i.e. file scope rather than truly global. The benefits of const - which are never all that great to begin with, in C - decrease dramatically when you can exert tight control over which code gets to access that variable.
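A minimal sketch of that last option is shown below: the variable has file scope and other translation units only get a read accessor. The names page_size_init and page_size_get are hypothetical, and sysconf(_SC_PAGESIZE) is swapped in here as the POSIX way to obtain the same value as getpagesize().

```c
#include <unistd.h>

/* File scope: only code in this translation unit can touch the variable. */
static long page_size;

/* Hypothetical initializer; call once at start-up before the value is used. */
void page_size_init(void) {
    page_size = sysconf(_SC_PAGESIZE);
}

/* Read-only accessor: other translation units can read but never write. */
long page_size_get(void) {
    return page_size;
}
```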
How do you declare a particular member of a struct as volatile?
Exactly the same as non-struct fields:
#include <stdio.h>
int main (int c, char *v[]) {
struct _a {
int a1;
volatile int a2;
int a3;
} a;
a.a1 = 1;
a.a2 = 2;
a.a3 = 3;
return 0;
}
You can mark the entire struct as volatile by using "volatile struct _a {...}" but the method above is for individual fields.
Should be pretty straight forward according to this article:
Finally, if you apply volatile to a
struct or union, the entire contents
of the struct/union are volatile. If
you don't want this behavior, you can
apply the volatile qualifier to the
individual members of the
struct/union.
I need to clarify volatile for C/C++ because there was a wrong answer here. I've been programming microcontrollers since 1994, where this keyword is very useful and often needed.
volatile will never break your code; it is never risky to use it. The keyword basically makes sure accesses to the variable are not optimized away by the compiler. The worst that should happen if you overuse this keyword is that your program will be a bit bigger and slower.
Here is when you NEED this keyword for a variable :
- You have a variable that is written to inside an interrupt function.
AND
- This same variable is read or written to outside interrupt functions.
OR
- You have 2 interrupt functions of different priority that use the variable.
Otherwise, the keyword is not needed.
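The first case in the list above can be sketched on a hosted system by letting a signal handler stand in for the interrupt function. This is only an analogy, not microcontroller code: the names event_pending, event_setup and event_poll are hypothetical, and standard C only guarantees this pattern for the type volatile sig_atomic_t.

```c
#include <signal.h>

/* Written by the handler, read by the main flow: must be volatile so the
   compiler re-reads it instead of keeping a stale copy in a register. */
static volatile sig_atomic_t event_pending;

/* Stand-in for an interrupt service routine. */
static void on_signal(int signum) {
    (void)signum;
    event_pending = 1;
}

/* Installs the handler; returns 0 on success, -1 on failure. */
int event_setup(void) {
    return signal(SIGINT, on_signal) == SIG_ERR ? -1 : 0;
}

/* Returns 1 once if an event arrived since the last call, otherwise 0. */
int event_poll(void) {
    if (event_pending) {
        event_pending = 0;
        return 1;
    }
    return 0;
}
```

Note that volatile here only ensures the flag is actually re-read; a multi-byte value shared with an interrupt would additionally need interrupts masked (or an atomic type) while it is accessed.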
As for hardware registers, they should be treated as volatile even without the keyword if you don't do weird stuff in your program.
I just finished a data structure in which it was obvious where the volatile qualifier was required, but for a different reason than the ones stated above: It is simply because the struct requires a forceful locking mechanism because of (i) direct access and (ii) equivalent invocation.
Direct access deals with sustained RAM reading and writing.
Equivalent invocation deals with interchangeable method flows.
I haven't had much luck with this keyword unless the compiler knows exactly what to do about it. And that's my own personal experience. But I am interested in studying how it directly impacts a cross-platform compilation such as between a low-level system call and a back-end database.