What happens to initialization of static variable inside a function - c

After stumbling onto this question and reading a little more here (C++, but this issue works the same in C and C++, AFAIK), I saw no mention of what is really happening inside the function.
#include <stdio.h>

void f(){
    static int c = 0;
    printf("%d\n",c++);
}
int main(){
    int i = 10;
    while(i--)
        f();
    return 0;
}
In this snippet, the lifetime of c is the entire execution of the program, so the line static int c = 0; has no meaning in later calls to f(), since c is already a defined (static) variable. The assignment part is also obsolete in later calls to f(), since it only takes place the first time.
So, what does the compiler do? Does it split f into two functions, f_init and f_the_real_thing, where f_init initializes and f_the_real_thing prints, then call f_init once and from then on call only f_the_real_thing?

The first assignment is not "obsolete" - it ensures c is zero the first time f() is called. Admittedly that is the default for statics: if no initialiser is specified, it will be initialised to zero. But a static int c = 42 will ensure c has the value 42 the first time the function is called, and the sequence of values will continue from there.
The static keyword means that the variable has static storage duration. It is only initialised once (so will have that value the first time the function is called) but changes then persist - any time the value is retrieved, the value retrieved will be the last stored in the variable.
All the compiler does is place the variable c into an area of memory that will exist - and hold whatever value it was last set to - for as long as the program is running. The specifics of how that is achieved depend on the compiler.
However, I have never seen a compiler that splits the logic of the function into multiple parts to accommodate the static.

Although the standard does not dictate how compilers must implement this behavior, most compilers do something much less sophisticated: they place c in a static memory segment and tell the loader to place zero at c's address. This way f comes straight to a pre-initialized c and proceeds to print and increment it as if the declaration line were not there.
In C++ the compiler may optionally add code that initializes c to a static initialization function, which initializes all static variables. In this case, no such call is required.
In essence, this amounts to c starting its lifetime before the first call to f. You can think of c's behavior as if it were a static variable outside f() with its visibility constrained to f()'s scope.

The C standard doesn't specify how the required behaviour for static storage duration must be implemented.
If you're curious about how your particular implementation handles this, then you can always check the generated assembly.
(Note that in your particular case, your code is vulnerable to concurrency issues, since c++ is not necessarily atomic; it is also vulnerable to int overflow, although i-- does act as an adequate termination condition.)

Related

Why can't a global variable be initialised from another global?

int a = 5;
int b = a; //error, a is not a constant expression
int main(void)
{
static int c = a; //error, a is not a constant expression
int d = a; //okay, a doesn't have to be a constant expression
return 0;
}
I don't understand what happens when a C compiler handles those variable declarations.
Why was C designed to be unable to handle int b = a?
The specific rule that applies here is C 2018 6.7.9 4:
All the expressions in an initializer for an object that has static or thread storage duration shall be constant expressions or string literals.
The primary reason for this arises from the way that C is typically implemented, with a compiler generating object modules (which are later linked into executables and run). In such a C implementation, initializing an object with static storage duration requires the compiler to know the initial value, whereas initializing an object with automatic storage duration does not require the compiler to know the initial value.
This is because, to initialize an object with static storage duration, the compiler must put the initial value into the object module being generated. To initialize an object with automatic storage duration, the compiler merely needs to generate instructions to get or calculate the initial value and store it in the object.
When int a = 5; and int b = a; appear outside of any function, a and b have static storage duration. At this point, the compiler could, in theory, see that a was initialized with the value five, so it has that value when used in int b = a;. However, there are some issues with this:
It requires the compiler to maintain knowledge about objects it is not currently required to maintain.
It can only be done in certain circumstances, as when the initializer for b uses only objects that have been initialized earlier in the same translation unit. Extending the language to require the compiler to support this would require more complicated rules.
It requires extending the language semantics to specify what values objects have before the program is executed at all.
Another possibility would be that, to make initializations like int b = a; work, the compiler does it by generating instructions to be executed when the program starts, possibly immediately before main is called. This adds complications, including:
If int b = a; appears in a translation unit other than the one containing main, how is the code necessary to initialize b inserted in main or in the program start-up code?
When there are multiple such initializations, some of which may depend on each other, how do you ensure they are executed in the desired order?
These problems could be solved, in theory, but C is intended to be a “simple” language.
This happens because variables declared outside any function (at file scope) or variables declared as static both get static storage duration. Such variables are initialized before main() is even called. And that early on in the execution, values that aren't pure constant expressions might not yet be available.
RAM-based hosted systems like PCs might even initialize such variables in one go, at the point where the executable is copied from the hard drive to RAM, which gives very fast initialization. That wouldn't be possible if you did something like static int x = func(); because then you definitely have to execute func() before setting x.
There's also the issue of static initialisation order. Suppose a and b were declared in separate files, which should get initialized first, a or b?
C++, unlike C, has decided to allow more complex forms of initialization, and as a result C++ has very complex initialization rules. It is also a very common C++ bug to write code that depends on the order of static initialization. There are valid arguments for and against both the C and the C++ way.

Is function pointer address always static?

If a function pointer scopes out before being used in another thread to run, will the pointer be invalid? Or are function pointers always valid since they point to executable code which doesn't "move around"?
I think my real question is whether what the pointer points to (the function) will ever change, or whether that value is static throughout the lifetime of the program.
Pseudo-code:
static void func(void) { printf("hi\n"); }
int main(void)
{
start_thread();
{
void (*f)(void) = func;
// edit: void run_on_other_thread(void (*f)(void));
run_on_other_thread(f); // non-blocking.
}
join_thread();
}
In the C base language, the values of function pointers never become invalid. They point to functions, and functions exist for the entire time a program is executing. The value of a pointer is valid for the entire program.
An object that contains a pointer may have a limited lifetime. (Note: The question mentioned scope, but scope is where in the source code an identifier is visible. Lifetime is when during program execution an object exists.) In the question void (*f)(void) = func;, f is an object with automatic storage duration. Once execution of the block it is defined in ends, f no longer exists, and references to it have undefined behavior. However, the value that was assigned to f is still a valid value. For example, if we define int x = 37;, and the lifetime of x ends, that does not mean you can no longer use the value 37 in a program. In this case, the value that f had, which is the address of func, is still valid. The address of func can continue to be used throughout the program’s execution.
The situations discussed in Xypron’s answer regarding dynamically linked functions or dynamically created functions would be extensions to the C language. In these situations, it is not the lifetime of the pointer object that is in question but rather the fact that the function itself is being removed from memory that causes the pointer to be no longer a valid pointer to the original function.
Whether a function pointer remains valid depends on its usage.
If it points to a function in the source code of your process it stays valid during the runtime of the process.
If you use a function pointer to point to a function in a dynamic link library, the pointer becomes invalid when unloading the library.
Code can be written that relocates itself. E.g. when the Linux kernel is started it relocates itself changing the addresses of functions.
You could call a runtime compiler which creates functions in memory during program execution possibly reusing the memory when an object goes out of scope.
As said it depends.

Global Variable Access Relative to Function Calls and Returns

I have been researching this topic and I can not find a specific authoritative answer. I am hoping that someone very familiar with the C spec can answer - i.e. confirm or refute my assertion, preferably with citation to the spec.
Assertion:
If a program consists of more than one compilation unit (separately compiled source file), the compiler must assure that global variables (if modified) are written to memory before any call to a function in another unit or before the return from any function. Also, in any function, the global must be read before its first use. Also after a call of any function, not in the same unit, the global must be read before use. And these things must be true whether the variable is qualified as "volatile" or not because a function in another compilation unit (source file) could access the variable without the compiler's knowledge. Otherwise, "volatile" would always be required for global variables - i.e. non-volatile globals would have no purpose.
Could the compiler treat functions in the same compilation unit differently than ones that aren't? All of the discussions I have found for the "volatile" qualifier on globals show all functions in the same compilation unit.
Edit: The compiler cannot know whether functions in other units use the global or not. Therefore I am assuming the above conditions.
I found these two other questions with information related to this topic but they don't address it head on or they give information that I find suspect:
Are global variables refreshed between function calls?
When do I need to use volatile in ISRs?
[..] in any function, the global must be read before its first use.
Definitely not:
static int variable;
void foo(void) {
variable = 42;
}
Why should the compiler bother generating code to read the variable?
The compiler must assure that global variables are written to memory before any function call or before the return from a function.
No, why should it?
void bar(void) {
return;
}
void baz(void) {
variable = 42;
bar();
}
bar is a pure function (should be determinable for a decent compiler), so there's no chance of getting any different behaviour when writing to memory after the function call.
The case of "before returning from a function" is tricky, though. But I think the general statement ("must") is false if we count inlined (static) functions, too.
Could the compiler treat functions in the same compilation unit differently than ones that aren't?
Yes, I think so: for a static function (whose address is never taken) the compiler knows exactly how it is used, and this information could be used to apply some more radical optimisations.
I'm basing all of the above on the C version of the As-If rule, specified in §5.1.2.3/6 (N1570):
The least requirements on a conforming implementation are:
Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.
This is the observable behavior of the program.
In particular, you might want to read the following "EXAMPLE 1".

Declaring a function level static variable inside an if block that is never hit

My understanding about static variables declared inside a function is:
If no initial value is specified, the static variable will reside in .bss, otherwise in .data
The memory for statics is allocated along with globals, i.e., well before execution enters main.
Are these two assumptions correct?
When the execution hits the function for the first time, statics are initialized to the user specified value (or zero in case no initial value is specified).
...and they retain their values across subsequent invocations of the function
But what if I declare my static variable inside an if block? I assume my third point should be updated to "when the execution hits the line where the static variable is declared, they're initialized to ..." - am I right?
Now, what if the if block in which they're declared is never hit (and the compiler is able to figure this out) - I understand that the variable will never be initialized; but does any memory get allocated for that variable?
I wrote two functions to try to figure out what's happening:
#include <stdio.h>
void foo()
{
    static int foo_var_out;
    if(0){
        static int foo_var_in_0;
        printf("%d %d\n", foo_var_out, foo_var_in_0);
    } else {
        static int foo_var_in_1;
        printf("%d %d\n", foo_var_out, foo_var_in_1);
    }
}
static void bar(int flag)
{
    static int bar_var_out;
    if(flag){
        static int bar_var_in_0;
        printf("%d %d\n", bar_var_out, bar_var_in_0);
    } else {
        static int bar_var_in_1;
        printf("%d %d\n", bar_var_out, bar_var_in_1);
    }
}
int main()
{
    foo();
    bar(0);
}
And I took the object dump:
$ gcc -o main main.c
$ objdump -t main | grep var
45:080495c0 l O .bss 00000004 foo_var_in_1.1779
46:080495c4 l O .bss 00000004 foo_var_out.1777
47:080495c8 l O .bss 00000004 bar_var_in_1.1787
48:080495cc l O .bss 00000004 bar_var_in_0.1786
49:080495d0 l O .bss 00000004 bar_var_out.1785
From the output it looks like foo_var_in_0 was not created at all (presumably because it is inside an explicit if(0)), whereas bar_var_in_0 was created (as it is possible for the caller to pass a non-zero value - although the only caller is explicitly passing zero).
I guess my question is: is it correct to assume that no memory was allocated for the variable foo_var_in_0 at all? I am asking about this specific case; am I reading the objdump correctly - or should I be doing something more to verify whether the variable will take some memory while the program is run?
In other words, if the line that declares a function level static variable is never hit, is the variable actually declared at all?
If it will not be created at all, is this according to the C standard (less likely), or a compile time optimization and at what level - how do I turn it ON/OFF (in gcc 4.1.1)?
I understand that one int is not a big deal to care about, but I am more interested in how it works; also, what if the variable was a big array of size, say 5000 elements of a 10 byte struct?
is it correct to assume that no memory was allocated for the variable foo_var_in_0 at all?
No, I don't think it would be correct to assume that. As far as I know, optimizations like this are not part of the standard.
If you know for a fact that your compiler does this and you want to assume it, go ahead. If you write anything that needs this to be the case, you might want to write a post-build test to make sure that it happened.
Probably, what you are seeing is a side-effect of the compiler just pruning out some code that it knew would never run. Meaning, it's not specifically looking to remove statics, but it did remove an entire branch, so any code in it just got removed as well.
The C standard does not prescribe where to place variables and stuff. It just prescribes that a conforming implementation shall have equivalent behaviour (to a reference behaviour which is specified by the standard), where "equivalent" is also defined by the standard.
So the simple answer is that it is an optimization, and how to turn it on/off depends on the particular compiler.
An implementation that does interprocedural analysis would probably be able to get rid of bar_var_in_0 as well.
Just to add to the correct answers from the others. Your assumptions about initialization of static variables are not correct.
Variables with static storage are always initialized. Either explicitly if you provide an initializer or implicitly from 0.
The initializer for such a variable must always be a compile time constant expression. So the value is computed at compile time and written directly into the object. (Well if it is all zero, some systems have tricks / special sections that avoid an explicit storage of the variable in the object file.)
So, no, there will be no initializer statement run when the variable is accessed the first time (how would the system know, by the value of another static variable?) but whenever you start the program the variable is already initialized.

using static keyword in C local scope to function

Is there any difference in these two? If so, what exactly is the difference? Assume they are in a C function that may be called multiple times.
(1) Declare and assign in the same statement:
static uint32_t value = x; // x varies and may be passed into function.
(2) Declare in one statement and assign in the next statement:
static uint32_t value;
value = x; // x varies;
Is value updated only the first time it is declared/initialized, or also on subsequent calls?
My understanding of (1) is that it is only set the first time that line is executed, so even if x changes the next time the line executes, value will remain the same. I am not sure about (2), but clarification on both will be very helpful.
EDIT: Compiler ARM(ADS1.20).
EDIT: A follow-up question on (2) from the answers given so far. Is the declaration (not the assignment) repeated on every call or just the first time?
The first should not compile; the static variable requires a constant initializer.
The second sets value each time the function is called, so there was no need to make it static in the first place.
If the first notation were correct (initializing value to 1, say), then it would be initialized once when the program starts and would thereafter only take new values when the code changed it. The second notation still sets value on each call to the function, and so renders the use of static pointless. (Well, if you try hard enough, you can devise scenarios under which the second version has a use for static. For example, if the function returns a pointer to it that other code then modifies, then it might be needed, but that is esoteric in the extreme and would be a pretty bad 'code smell'.)
(1) is only executed once, but for (2), value will be reassigned every time.
Static variables are initialized only once.
These are very different declarations.
The first one is declaring a static local variable and giving it an initial value (this should not actually compile given that x is not a constant). This will only occur once, before the function is ever executed. This is almost certainly the initialization you want.
The second declaration is updating the value every time the function is called. If you want the variable to always start the function with the same value this is the right approach. But if this is truly what you want, then why use a static at all? Just use a local variable.
Your intuition is right. In the second example value is set to x each time the method is called. Static variables need to be initialized and declared in one statement if you only want it to run once.
If you always want value to have the value x, don't declare it as static.
When compiled, the first one will be put into the ".data" section, which holds explicitly initialized data, while the second one will be put into the ".bss" section, which holds data that has no explicit initializer and is zero-filled at startup.
Running readelf -S xx.o shows the section sizes of the compiled object file.
example 1:
static int i;
void test(){
i = 2;
}
example 2:
static int i=1;
void test(){
i = 2;
}
Folks - In C, the first declaration is perfectly legal. It will compile, and could be used to initialize the value. You could combine the line of code from the second declaration to ensure it gets updated every subsequent function execution. This is commonly used, in particular in embedded programs where memory and resources are more scarce than computers or distributed applications.
The reason you would use a static is to ensure the variable has a data lifecycle that continues throughout program execution while limiting its access to only the function in which the static is declared (or to any function in the file, if the static declaration is at the top of the file); otherwise, the data will be lost every time the function exits. This is good programming practice to avoid inadvertent access to data objects that must be secured and restricted. That comment applies only to the C programming language; don't take it to apply to C++ (where it does in some instances) or Java. Static in Java has a completely different meaning. From what I've read in this thread, few seem to understand how the keyword static works in C, and are confusing the keyword static from other languages with its use in C. In C, static is a very important keyword that helps manage function and data access, and you can initialize a static variable with another variable provided it is within scope, and you can update that value throughout program execution, which is probably what you need to do anyway.

Resources