Keep pthread variables local - c

Is there a way while using pthread.h on a Linux GCC to keep variables local to the thread-function:
int i = 42; // global instance of i
int main() {
pthread_t threads[2];
long t;
pthread_create(&threads[t], NULL, ThreadFunction, (void *) t;
pthread_create(&threads[t], NULL, ThreadFunction2, (void *) t;
}
I wonder whether there is a parameter at the POSIX function creating the new thread and keeping the variables local:
void *ThreadFunction(void *threadid)
{
int i=0;
i++; // this is a local instance of i
printf("i is %d", i); // as expected: 1
}
void *ThreadFunction2(void *threadid)
{
i += 3; // another local instance -> problem
}
Where afterwards i is 42. Even if I have defined an i previously I want this i not to be within my threads.

In gcc, you can make a global variable thread-local by using the __thread specifier:
__thread int i = 42;
Don't do that. There are better solutions, depending on what you want to do.
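For readers who want to see what the specifier actually does before choosing an alternative, here is a minimal sketch. It assumes GCC (or Clang) on Linux; the names counter, worker and tls_demo are made up for illustration and are not from the question.

```c
#include <pthread.h>
#include <stddef.h>

/* GCC extension: every thread gets its own copy of counter, starting at 42. */
static __thread int counter = 42;

static void *worker(void *arg)
{
    int *seen = arg;
    counter += 3;        /* changes only the calling thread's copy */
    *seen = counter;     /* report what this thread observed */
    return NULL;
}

/* Returns 0 if each worker saw 45 while the caller's copy stayed at 42. */
int tls_demo(void)
{
    pthread_t threads[2];
    int seen[2] = {0, 0};

    for (int k = 0; k < 2; k++)
        if (pthread_create(&threads[k], NULL, worker, &seen[k]) != 0)
            return -1;
    for (int k = 0; k < 2; k++)
        pthread_join(threads[k], NULL);

    return (counter == 42 && seen[0] == 45 && seen[1] == 45) ? 0 : 1;
}
```

Calling tls_demo() from main (and building with -pthread) should show that the workers never touch the calling thread's copy.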

Global variables are always available in the whole compilation unit (or even more compilation units if you use external declarations). This has nothing to do with threads, it's the default behavior of C/C++. The recommended solution is not to use globals - globals are evil. If you still need to use globals, you may want to prefix them, such as g_i. Another solution is to put your thread functions into another compilation unit (c file).

The sample code is wrong (by itself) and has undefined behavior. You are trying to read an uninitialized variable t four times - two times to index an array and two times in a cast expression - and depending on the (undefined) meaning of &threads[t], the function pthread_create may cause more UB.
Besides, it's obviously not the code you actually used, because the pthread_create calls are missing their closing parentheses.
Regarding the variable i: declaring a new variable i (i.e. int i = 0) in a local scope hides any i declared in an enclosing scope, so there should not be any problem using i as a local variable name inside the function.
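A possible corrected version of the question's code, as a sketch only: t is initialised before use, the calls are closed properly, and the thread function's local i hides the file-scope one. The helper name shadow_demo and the idea of returning the local value through the thread's exit value are assumptions added for illustration.

```c
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

int i = 42; /* file-scope i, visible to every thread */

static void *thread_function(void *threadid)
{
    (void)threadid;              /* unused in this sketch */
    int i = 0;                   /* hides the file-scope i inside this function */
    i++;
    return (void *)(intptr_t)i;  /* hand back the local value, which is 1 */
}

/* Returns 0 when both threads saw a local i of 1 and the global is still 42. */
int shadow_demo(void)
{
    pthread_t threads[2];
    void *result[2] = {NULL, NULL};

    for (intptr_t t = 0; t < 2; t++)   /* t is initialised before it is used */
        if (pthread_create(&threads[t], NULL, thread_function, (void *)t) != 0)
            return -1;
    for (int t = 0; t < 2; t++)
        pthread_join(threads[t], &result[t]);

    return (i == 42 && (intptr_t)result[0] == 1 && (intptr_t)result[1] == 1) ? 0 : 1;
}
```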

pthread has a notion of thread-local storage, and gcc offers an easy interface to it with the __thread storage class. Such variables suffer from all the problems of global variables, and then some. But sometimes they are handy, because every other solution is worse in context.

Related

Convert static local variable to global variable / Accessing static local variable from outside scope

I have this code and have no idea why it works in an online compiler (https://www.programiz.com/c-programming/online-compiler/) (copy the code below and run it in that online compiler to verify).
#include <stdio.h>
void foo(){
static int locl = 0;
locl++;
printf("accessing locl from INSIDE scope: %i\n", locl);
}
int main() {
for (int i=0; i<3; i++){
foo();
printf("accessing locl from OUTSIDE scope: %i\n",*((int *)((unsigned long long)(&foo) + 11979)));
}
return 0;
}
OUTPUT:
accessing locl from INSIDE scope: 1
accessing locl from OUTSIDE scope: 1
accessing locl from INSIDE scope: 2
accessing locl from OUTSIDE scope: 2
accessing locl from INSIDE scope: 3
accessing locl from OUTSIDE scope: 3
So the output is what I expect.
The variable locl is a static int that is incremented by 1 every time foo is called. But focus on this part, where I access the static local variable from outside its scope:
*((int *)((unsigned long long)(&foo) + 11979))
Where does the constant 11979 come from?
Is that constant universal when converting a static local variable to a global on another platform?
Is there a guarantee that this code will always execute successfully at runtime?
If not, is there a UNIVERSAL way to convert a static local variable to a global, i.e. to access a static local variable from outside its scope?
Writing code like that is a crime against nature. Exposing unprepared students to it is a crime against humanity.
Its author figured out that, for that specific code built with that specific compiler and that specific linker (including their versions), using the specific options that the online compiler uses, the locl object was stored 11,979 bytes away from where the code for the foo function was stored. The code (int *)((unsigned long long)(&foo) + 11979) converts the address of foo to an integer type, adds 11,979 to it, and converts the result to a pointer to an int.
This code may break if other variables are added, if other functions are added, if various other modifications to the source code are made, if the target system of the build is changed, if the compiler is changed, and so on. There is essentially nothing that guarantees code like this will work except that if you change absolutely nothing about it, it may behave the same as it did before, like avoiding any vibration or air movement near a coin balanced on its edge might guarantee the coin does not fall over.
It does demonstrate that inside a process, the memory assigned to that process is generally visible and can be manipulated in devious ways, which is something programmers should know because malicious people can take advantage of it.
Where does the constant 11979 come from? Is that constant universal
when converting a static local variable to a global on another
platform?
I have absolutely no idea what the magic number does :(
Is there a guarantee that this code will always execute successfully
at runtime?
No, it shows a different result in godbolt with gcc 12.1
If not, is there a UNIVERSAL way to convert a static local variable to
a global, i.e. to access a static local variable from outside its scope?
Yes, there is, returning a pointer to the static variable:
#include <stdio.h>
int *foo(void)
{
static int x = 0;
return &x;
}
int main(void)
{
*foo() = 42;
printf("%d\n", *foo());
return 0;
}
Output:
42
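If callers only need to read the counter, a variation is to return its value rather than a writable pointer. This is a sketch under that assumption; foo_count is a hypothetical name, not something from the question.

```c
/* Hypothetical replacement for foo(): the static local stays private,
   and callers learn its value from the return value instead of from
   a hard-coded memory offset. */
int foo_count(void)
{
    static int locl = 0;   /* keeps its value between calls */
    locl++;
    return locl;
}
```

Each call increments the counter and returns the new value, so the caller never needs to know where locl is stored.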

Using local variables with functions that take pointers in FreeRTOS

I need some clarification about using local variables in FreeRTOS when I need to pass them to other functions as pointers.
For example, I have a function that modifies the data that the pointer 'data' points to.
void modify_data(int * data){
*data = 10;
}
Can I use it like this?
void some_function(void){
int d; // local variable
modify_data(&d);
}
Or maybe I should make global variable?
int d;
void some_function(void){
modify_data(&d);
}
Or static variable?
void some_function(void){
static int d;
modify_data(&d);
}
My question in general is:
How to use (or replace) local variables with functions that take pointers in FreeRTOS?
Edit:
At this moment my understanding of this is:
Local variables within a function are of no use if I want to pass pointers to them to another function (or do anything with pointers to them), because task switching can change the memory location where a local variable is stored.
I have to declare variables as static or global if I want to do anything with pointers to them.
This is a bit annoying, because a lot of variables in my big program must then be declared globally, and passing pointers to global data makes no sense except for code readability.
I'm using FreeRTOS 10.2.1, CMSIS 1.02 and code runs on STM32 microcontroller.
For starters this statement
*data = (*data)++;
invokes undefined behavior.
As for your question: to change a variable from within another function, you need to pass it by reference, that is, indirectly through a pointer to it. For example:
void f( int *px )
{
*px = 10;
}
void g( void )
{
static int x;
f( &x );
}
It depends on what you want to do. If you want to use d outside of that function, you need to define it as a global; if you are only going to use it inside the function, declare it as a local.
So after some discussion in comments and some research I found the answer for my concerns.
Local variables can be used in any way, with ONE EXCEPTION: local variables declared in the main function (and in functions called by main) before the FreeRTOS scheduler is started.
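To illustrate the ordinary case, here is a plain C sketch with no FreeRTOS calls, so it can be compiled anywhere. It relies on the fact that each task keeps its own stack across context switches, so the address of a local stays valid until the function that declared it returns.

```c
/* Writes through the pointer it is given. */
static void modify_data(int *data)
{
    *data = 10;
}

/* Called from a task (or any other function): passing &d is fine. */
int some_function(void)
{
    int d = 0;          /* lives on the calling task's stack */
    modify_data(&d);    /* &d stays valid until some_function returns */
    return d;           /* the value is copied out, the address is not kept */
}
```

What must be avoided is keeping &d somewhere that outlives some_function, which is the same rule as in C without an RTOS.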
The source of my concern was a tutorial in which I read that local variables created in the "main context" may be corrupted or cease to exist. It was not explained clearly that this applies to the C main function, and I misunderstood everything, thinking that "main context" meant the context associated with the Main Stack Pointer, not just the main function.
FAQ on FreeRTOS website says:
The context of main() does not exist after the FreeRTOS scheduler has
started as, from that point, only RTOS tasks and interrupts have a
context. To maximise the amount of RAM available to the FreeRTOS
application, and as allowed by the C standard as the context of main()
no longer exists, some FreeRTOS ports re-use the stack allocated to
main as the system or interrupt stack. Therefore never allocate
variables or buffers that are needed or in any way accessed by the
FreeRTOS application on the stack used by main() because they are
likely to get overwritten.
So this simplified example would be OK:
int x;
int main(void)
{
x = 10;
createTasks();
vTaskStartScheduler();
}
But something like this will not work in FreeRTOS:
int * px; // this pointer will not be valid after vTaskStartScheduler()
int main(void)
{
int x = 10;
px = &x;
createTasks();
vTaskStartScheduler();
}
Someone might ask why I used a local variable in main and wanted to access it in the rest of the application without declaring it as a global variable. I was doing this because I had developed a specific/weird way of attaching my code to CMSIS/STM32Cube generated code (which forces the programmer to write in "user code regions"), and it worked until I started using FreeRTOS.

Can I insert a function inside a pthread_mutex_lock and unlock statements?

Suppose I want the instructions in a function to be atomic.
I declared
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
as a global variable.
Instead of:
int main() {
myFoo();
...
}
void myFoo() {
pthread_mutex_lock(&mutex);
myGlobal++;
pthread_mutex_unlock(&mutex);
}
can I do:
int main() {
pthread_mutex_lock(&mutex);
myFoo();
pthread_mutex_unlock(&mutex);
...
}
void myFoo() {
myGlobal++;
}
So that every instruction in myFoo becomes atomic?
In the first example you are protecting myGlobal, and in the second you are protecting myFoo. Your code works as you expect (provided every call is made between lock/unlock), but you need to use the terms correctly or the meaning will be wrong.
No, it will not be atomic, but access to myFoo will be synchronized, meaning no other thread can execute that part of the code while another thread is executing it.
The term atomic operation normally indicates that an instruction runs without any interruption (and is sometimes taken to imply lock-free). For example, C11's atomic_flag provides such functionality. A mutex, on the other hand, creates mutual exclusion: it protects a part of your code from simultaneous access by different threads. The two terms are not interchangeable.
Side note:
The only atomic type that is guaranteed to be lock-free is atomic_flag, in both C and C++. Others, such as atomic_int, may be implemented using a lock and so are not necessarily lock-free.
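For comparison with the mutex version, here is a sketch of an atomic counter using C11's stdatomic.h. The thread count, the iteration count and the names hits, bump and atomic_demo are assumptions chosen for illustration. ATOMIC_INT_LOCK_FREE is the standard macro that reports whether atomic_int is never (0), sometimes (1) or always (2) lock-free on the target.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define N_THREADS 4
#define N_ROUNDS  100000

static atomic_int hits;   /* static storage, so it starts at 0 */

static void *bump(void *arg)
{
    (void)arg;
    for (int k = 0; k < N_ROUNDS; k++)
        atomic_fetch_add(&hits, 1);   /* indivisible read-modify-write */
    return NULL;
}

/* Runs the workers and returns the final count, or -1 on error. */
int atomic_demo(void)
{
    pthread_t threads[N_THREADS];

    atomic_store(&hits, 0);
    for (int k = 0; k < N_THREADS; k++)
        if (pthread_create(&threads[k], NULL, bump, NULL) != 0)
            return -1;
    for (int k = 0; k < N_THREADS; k++)
        pthread_join(threads[k], NULL);
    return atomic_load(&hits);
}

/* 0 = never lock-free, 1 = sometimes, 2 = always (for atomic_int). */
int atomic_int_lock_free_level(void)
{
    return ATOMIC_INT_LOCK_FREE;
}
```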
Your use of the term atomic is not really correct but I guess the question is more about whether the two code snippets will behave the same.
If myFoo is only called between lock/unlock, the answer is yes, they are the same.
However, in the second case you have lost protection of myFoo. Another thread could call myFoo without calling lock first which would cause problems.
So the second example is bad as it opens up for more mistakes. Stick to the first one, i.e. keep the lock/unlock inside the function.
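A runnable sketch of the first variant, with the lock kept inside the function. The thread and iteration counts and the names worker and run_workers are assumptions added for the demonstration. Because every increment goes through myFoo, no caller can forget to take the lock.

```c
#include <pthread.h>
#include <stddef.h>

#define N_THREADS 4
#define N_ROUNDS  100000

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static long myGlobal = 0;

/* The lock lives inside the function, so every caller is protected. */
static void myFoo(void)
{
    pthread_mutex_lock(&mutex);
    myGlobal++;
    pthread_mutex_unlock(&mutex);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int k = 0; k < N_ROUNDS; k++)
        myFoo();
    return NULL;
}

/* Starts the workers, waits for them, and returns the final count. */
long run_workers(void)
{
    pthread_t threads[N_THREADS];

    myGlobal = 0;   /* no other thread is running yet */
    for (int k = 0; k < N_THREADS; k++)
        if (pthread_create(&threads[k], NULL, worker, NULL) != 0)
            return -1;
    for (int k = 0; k < N_THREADS; k++)
        pthread_join(threads[k], NULL);
    return myGlobal;
}
```

Without the lock, the same program would typically lose increments, because myGlobal++ is a separate read, add and write.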
Also notice:
Since myGlobal is a global variable, you can't be sure that threads do not access it directly. There are several ways to avoid that. The example below shows a single function with a static variable. The function can be used to read the static variable and, if desired, increment it.
int myFoo(int doIncrement)
{
static int myStatic = 0;
int result;
pthread_mutex_lock(&mutex);
if (doIncrement) myStatic++;
result = myStatic;
pthread_mutex_unlock(&mutex);
return result;
}
Now the variable myStatic is hidden from all the threads and can only be accessed through myFoo.
int x = myFoo(1); // Increment and read
int y = myFoo(0); // Read only

Selecting Static over Global. Why?

The output of the following code is 0.
int account=2;
int main()
{
static int account;
printf("%d",account);
return 0;
}
Why did it pick the static variable over the global variable? My understanding is that both global and static variables are stored in the heap and not on the function's stack, right? So what rule does it use to select one over the other?
If multiple variables exist with the same name at multiple scopes, the one in the innermost scope is the one that is accessible. Variables at higher scope are hidden.
In this case you have account defined in main. This hides the variable named account declared at file scope. The fact that the inner variable inside main is declared static doesn't change that.
While the static declaration on a local variable means that it is typically stored in the same place as a global variable, that has no bearing on which is visible when the names are the same.
Consider this small self explaining program:
#include <stdio.h>
int bar = 123; // global variable, can be accessed from other source files
static int blark; // global variable, but can be accessed only in the same
// source file
void foo()
{
static int bar; // static variable : will retain its value from
// one call of foo to the next
// most compilers will warn here:
// warning declaration of 'bar' hides global declaration
printf("foo() : bar = %d\n", bar); // won't use the global bar but the
// local static bar
bar++;
}
void quork()
{
int bar = 777; // local variable exists only during the execution of quork
// most compilers will warn here as well:
// warning declaration of 'bar' hides global declaration
printf("quork() : bar = %d\n", bar); // won't use the global bar but the
// local bar
bar++;
}
int main() {
foo();
foo();
printf("main() 1 : bar = %d\n", bar);
bar++;
quork();
quork();
foo();
printf("main() 2 : bar = %d\n", bar);
printf("blark = %d\n", blark);
}
Output:
foo() : bar = 0
foo() : bar = 1
main() 1 : bar = 123
quork() : bar = 777
quork() : bar = 777
foo() : bar = 2
main() 2 : bar = 124
blark = 0
Just to clarify for future readers, global and static variables are not stored in heap or stack memory.
https://www.geeksforgeeks.org/memory-layout-of-c-program/
They are stored in either the initialized data segment or the uninitialized data segment.
That's not the main question here, which was answered by dbush, but it is a misunderstanding in the original question.
Short answer: encapsulation.
static describes both the lifetime and the visibility of a variable, and its meaning changes depending on the context. In my opinion it is one of the more useful and important language features for encapsulation in C. Ignoring the complex relationship to extern, here's a simplified description:
static variables defined at the file level have program lifetime and compilation unit visibility. This means all functions in a .c file can access/modify the variable, but other .c files won't know about the variable. This is super useful for making sure variables used across functions within a compilation unit don't accidentally link with variables in other compilation units. Personally, I highly recommend making all file variables static by default. Only remove the static specifier if you really want another compilation unit to have access to it (although a getter function may be safer).
Variables declared static within a block scope (most importantly function scope) have program lifetime, and scope visibility. That means it functions as if you declared the variable globally in the file, but only code within that block scope can see it. It also means from one call to the next, the variable does not get destroyed and state can be transferred from call to call.
One really important property of static variables is that they are initialized to zero by default. File-scope variables share this property; automatic (non-static local) variables do not, and that is why your program prints the value 0. In trivial programs we often don't notice the difference because the stack hasn't been polluted by earlier calls yet, but it becomes critical for any program of size.
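A small sketch of the zero-initialization rule. The variable and function names are made up for illustration. Every object with static storage duration starts at zero (or as a null pointer) when it has no explicit initializer.

```c
#include <stddef.h>

static int file_scope_value;          /* static storage duration: starts at 0 */

/* Returns 1 when every uninitialised static object below reads as zero. */
int statics_start_at_zero(void)
{
    static int block_scope_value;     /* static storage duration: starts at 0 */
    static char *block_scope_ptr;     /* starts as a null pointer */

    /* An automatic variable declared here without an initializer would
       hold an indeterminate value, so it is deliberately not read. */
    return file_scope_value == 0
        && block_scope_value == 0
        && block_scope_ptr == NULL;
}
```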
The most common use for this that I have seen is to enable one-time initialization within a scope. They are also extremely useful for synchronization primitives like pthread_mutex_t. One time I even implemented a state-machine with function-scope static variable.
An example:
int started;//oops, anybody in the entire program can change this value, especially with such a common name!
static int lastCall;
int callCount(void)
{
// This is default-initialized to 0
static int functionStaticVariable;
//Increment each time I'm called
++functionStaticVariable;
//tell the outside world that I'm the one who was called last
lastCall = 1;
//return (a copy of) my internal state.
return functionStaticVariable;
}
char *getSharedMemory(unsigned int bytes)
{
// Here I cannot see functionStaticVariable, but I can see lastCall
//functionStaticVariable++; // this would cause a compilation failure
// static pointer is default-initialized to zero (i.e. NULL)
static char *sharedMemory;
if(sharedMemory == 0)
{
// This block only executes once, the first time the function is called.
// Actually this is a nice side-effect because it means if the function is never called we don't waste heap memory on an unused buffer
// Although we will probably never free this memory
sharedMemory = (char *)malloc(bytes);
}
// tell the outside world that this function has been called
lastCall = 2;//valid
//Woah, this is such a bad idea, but actually does _not_ return memory that gets invalidated
return sharedMemory;
}
Hopefully you can see that with this pattern you could protect a variable by placing it inside a function and optionally doing things like acquiring a mutex lock in order to allocate the memory. You could even implement the double-checked locking pattern this way.
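One way the mutex-protected variant might look, as a sketch rather than a definitive implementation. The name get_shared_buffer is hypothetical, and this version simply takes the lock on every call instead of attempting double-checked locking.

```c
#include <pthread.h>
#include <stdlib.h>

/* Returns the same buffer on every call; it is allocated on the first call,
   with the size requested by that first caller. */
char *get_shared_buffer(size_t bytes)
{
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static char *buffer;              /* null until the first successful call */
    char *result;

    pthread_mutex_lock(&lock);
    if (buffer == NULL)
        buffer = malloc(bytes);       /* may fail, in which case it stays null */
    result = buffer;
    pthread_mutex_unlock(&lock);

    return result;
}
```

Both the mutex and the pointer are invisible outside the function, so the only way to reach the buffer is through the locked path.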
I secretly wish that all C++ programmers learned good C encapsulation, because the language really encourages it. You can do an incredible amount by placing only functions that need to communicate with each other together in a compilation unit. In a non-OOP language, this can be very powerful.
Full details of static and extern are described by https://en.cppreference.com/w/c/language/storage_duration.
The pragmatic reasoning behind why the innermost variable declaration should be the one used: you're not always in control of what's outside your code. You want to be able to write a function that certainly works. If other programmers (say, in a larger team) could break your code just by the way they name variables in other parts of the code, programming would be more of a pain than it is now.

Why pthread_self is marked with attribute(const)?

In Glibc's pthread.h the pthread_self function is declared with the const attribute:
extern pthread_t pthread_self (void) __THROW __attribute__ ((__const__));
In GCC that attribute means:
Many functions do not examine any values except their arguments, and have no effects except the return value. Basically this is just slightly more strict class than the pure attribute below, since function is not allowed to read global memory.
I wonder how that's supposed to be? Since it does not take any argument, pthread_self is therefore allowed only to always return the same value, which is obviously not the case. That is, I would have expected pthread_self to read global memory, and therefore eventually be marked as pure instead:
Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure.
The implementation on x86-64 seems to be actually reading global memory:
# define THREAD_SELF \
({ struct pthread *__self; \
asm ("mov %%fs:%c1,%0" : "=r" (__self) \
: "i" (offsetof (struct pthread, header.self))); \
__self;})
pthread_t
__pthread_self (void)
{
return (pthread_t) THREAD_SELF;
}
strong_alias (__pthread_self, pthread_self)
Is this a bug or am I not seeing something?
The attribute was most likely added on the assumption that GCC would only use it locally (within a function), and would never be able to use it for inter-procedural optimizations. Today, some Glibc developers are questioning the correctness of the attribute, precisely because powerful inter-procedural optimization could potentially lead to miscompilation. Quoting a post by Torvald Riegel to the Glibc developers' mailing list:
The const attribute is specified as asserting that the function does not
examine any data except the arguments. __errno_location has no
arguments, so it would have to return the same values every time.
This works in a single-threaded program, but not in a multi-threaded
one. Thus, I think that strictly speaking, it should not be const.
We could argue that this magically is meant to always be in the context
of a specific thread. Ignoring that GCC doesn't define threads itself
(especially in something like NPTL which is about creating a notion of
threads), we could still assume that this works because in practice, the
compiler and its passes can't leak knowledge across a function used in
one thread and other one used in another thread.
(__errno_location() and pthread_self() both are marked with __attribute__((const)) and receive no arguments).
Here's a small example that could plausibly be miscompiled with powerful interprocedural analysis:
#include <pthread.h>
#include <errno.h>
#include <stdlib.h>
static void *errno_pointer;
static void *thr(void *unused)
{
if (!errno_pointer || errno_pointer == &errno)
abort();
return 0;
}
int main()
{
errno_pointer = &errno;
pthread_t t;
pthread_create(&t, 0, thr, 0);
pthread_join(t, 0);
}
(the compiler can observe that errno_pointer is static, it does not escape the translation unit, and the only store into it assigns the same "const" value, given by __errno_location(), that is tested in thr()). I've used this example in my email asking to improve documentation of pure/const attributes, but unfortunately it didn't get much traction.
I wonder how that's supposed to be?
This attribute is telling the compiler that in a given context pthread_self will always return the same value. In other words, the two loops below are exactly equivalent, and the compiler is allowed to optimize out the second (and all subsequent) calls to pthread_self:
// loop A
std::map<pthread_t, int> m;
for (int j = 0; j < 1000; ++j)
m[pthread_self()] += 1;
// loop B
std::map<pthread_t, int> m;
const pthread_t self = pthread_self();
for (int j = 0; j < 1000; ++j)
m[self] += 1;
The implementation on x86-64 seems to be actually reading global memory
No, it does not. It reads thread-local memory.
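The "same value in a given context" reading can be checked with a small sketch. The struct and function names are assumptions made up for illustration. Within one thread, two calls to pthread_self compare equal, while a different thread sees a different identifier.

```c
#include <pthread.h>
#include <stddef.h>

struct self_check {
    pthread_t main_id;   /* filled in by the creating thread */
    int stable;          /* set by the new thread */
    int differs;         /* set by the new thread */
};

static void *check_self(void *arg)
{
    struct self_check *c = arg;
    pthread_t first = pthread_self();
    pthread_t second = pthread_self();   /* the compiler may reuse the first result */

    c->stable = pthread_equal(first, second) != 0;        /* same thread, same id */
    c->differs = pthread_equal(first, c->main_id) == 0;   /* not the creator's id */
    return NULL;
}

/* Returns 0 when the id is stable within a thread and differs across threads. */
int self_demo(void)
{
    struct self_check c = { pthread_self(), 0, 0 };
    pthread_t t;

    if (pthread_create(&t, NULL, check_self, &c) != 0)
        return -1;
    pthread_join(t, NULL);
    return (c.stable && c.differs) ? 0 : 1;
}
```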

Resources