I have code similar to the following in our product. I expected the output to be '0 1 2 3', but the actual output is '1 1 1 1'.
for(i = 0 ;i < 5;i++){
int j;
if(i)
printf("%d ",j);
j = i;
}
My understanding is that the j is allocated on the stack only once during the entire period of 'for' loop and the same value is used during iterations. Also, if I move the declaration of j outside for loop, I'm getting the expected result. What am I missing here?
PS - When I run the same code on my personal machine, I am getting the expected output. But on production it is different.
First, to clear things about the storage duration of an automatic local variable, let me quote the C11 standard, chapter §6.2.4, (emphasis mine)
An object whose identifier is declared with no linkage and without the storage-class
specifier static has automatic storage duration, [...]
and,
For such an object that does not have a variable length array type, its lifetime extends
from entry into the block with which it is associated until execution of that block ends in
any way. (Entering an enclosed block or calling a function suspends, but does not end,
execution of the current block.) If the block is entered recursively, a new instance of the
object is created each time. The initial value of the object is indeterminate.
So, in your code, each iteration gets a new instance of j. Nothing is retained.
In your code,
int j; //not initialized
if(i)
printf("%d ",j); //this one here
you're trying to use an uninitialized automatic local variable j, which has an indeterminate value. It invokes undefined behavior.
As per C11, chapter §6.7.9
If an object that has automatic storage duration is not initialized explicitly, its value is
indeterminate
and related, for UB, annex §J.2
The value of an object with automatic storage duration is used while it is
indeterminate.
Once your code hits UB, the output cannot be justified, anyway.
OTOH, when you declare j outside the loop, its lifetime extends over the enclosing block (here, the function body). Then, unlike the above case, there is only one instance of j for all iterations of the loop.
As per the execution flow, the first time, i being 0, the if condition evaluates to false, printf() is skipped and j gets assigned. Then, in the next iteration, when you hit the printf(), j holds the value assigned in the previous iteration, and all is well thereafter.
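As a minimal sketch of the well-defined variant described above, the following moves j outside the loop and initializes it. The helper name collect_previous and the out[] buffer are assumptions made for illustration only; they are not part of the original code, which printed the values directly.

```c
#include <stdio.h>

/* Hypothetical helper: stores the values the loop would have printed
   into out[] and returns how many values were stored. */
int collect_previous(int n, int out[])
{
    int j = 0;      /* declared once, outside the loop, and initialized */
    int count = 0;

    for (int i = 0; i < n; i++) {
        if (i)
            out[count++] = j;   /* j still holds the previous iteration's i */
        j = i;
    }
    return count;
}
```

Calling collect_previous(5, out) and printing the stored values with printf("%d ", out[k]) gives the sequence the question expected, because there is now a single j whose value survives from one iteration to the next.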
For some clarity, I think the for loop would be converted under the hood to something like:
i = 0;
LoopStart:
if(!(i<5)){ goto LoopEnd;}
{
int j;
if(i)
printf("%d ",j);
j = i;
}
i++;
goto LoopStart;
LoopEnd:
Actual implementations would differ, but this serves to highlight this point:
The block is entered and exited for each iteration of the loop, meaning each auto in the block is created and destroyed 5 times in this example.
As others mentioned, this means you are using an uninitialized j each time in your printf.
As for why the code might work on some platforms / compilers: it's probably because j is allocated the same stack address each time, and the compiler doesn't clear the stack when it creates or destroys j, so it just so happens that the last value assigned to the old, dead j is accessible through the new, uninitialized one.
Related
Let's assume that I have a for loop, and a very large struct as a stack variable:
for (int x=0 ; x <10; x++)
{
MY_STRUCT structVar = {0};
…code using structVar…
}
Will every compiler actually zero out the struct at the start of every loop? Or do I need to use memset to zero it out?
This is a very large struct and I want to allocate it on the stack, and I need to make sure every member of it is zeroed out at the start of every iteration. So do I need to use memset?
I can manually inspect the executable that I compile, but I need to make sure if there is any standard for this, or it just depends on the compiler.
Note that this code does compile. I am using Visual Studio.
Will every compiler actually zero out the struct at the start of every loop?
Any compiler that conforms to the C Standard will do this. From this Draft C11 Standard (bold emphasis mine):
6.8 Statements and blocks
…
3 A block allows a set of declarations and statements to be grouped into
one syntactic unit. The initializers of objects that have automatic
storage duration, and the variable length array declarators of
ordinary identifiers with block scope, are evaluated and the values
are stored in the objects (including storing an indeterminate value in
objects without an initializer) each time the declaration is reached
in the order of execution, as if it were a statement, and within each
declaration in the order that declarators appear.
In the case of a for or while loop, a declaration/initializer inside the loop's scope block is reached repeatedly on each and every iteration of the loop.
See 6.2.4, paragraph 6:
If an initialization is specified for the object, it is performed each
time the declaration or compound literal is reached in the execution
of the block; otherwise, the value becomes indeterminate each time the
declaration is reached
Will every compiler actually zero out the struct at the start of every loop?
Yes, or it will produce machine code with equivalent functionality ("observable behavior") as if you had performed a zero-out.
As long as you initialize one single member in the struct, then the rest of them will get set to zero/null ("as if they had static storage duration"). Similarly, any padding bytes added to the struct by the compiler will get set to zero. This is guaranteed by the C standard ISO/IEC 9899:2018 6.7.9 §10, §19 and §21.
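To make the per-iteration zeroing concrete, here is a small sketch. The MY_STRUCT layout below is a made-up stand-in for the large struct in the question, and count_dirty_starts is a hypothetical helper name; only the `= {0}` initializer comes from the original code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the large struct in the question. */
typedef struct {
    int    id;
    double weight;
    char  *name;
    int    samples[256];
} MY_STRUCT;

/* Returns the number of iterations that started with a non-zero member. */
int count_dirty_starts(int iterations)
{
    int dirty = 0;

    for (int x = 0; x < iterations; x++) {
        MY_STRUCT structVar = {0};  /* evaluated every time this line is reached */

        if (structVar.id != 0 || structVar.weight != 0.0 ||
            structVar.name != NULL || structVar.samples[255] != 0)
            dirty++;

        /* Scribble over the object so stale data would show up next time. */
        structVar.id = x + 1;
        structVar.weight = 1.5;
        structVar.name = "stale";
        structVar.samples[255] = -1;
    }
    return dirty;
}
```

On a conforming compiler count_dirty_starts(10) should return 0, since every member without an explicit initializer is zeroed again on each pass.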
Generally, the place where the zero-out actually occurs in the resulting executable depends on how the data is used. If you for example zero the struct at the beginning of the loop body, then write to various members and print it all at the end of the loop body, the compiler doesn't have many other choices but to zero out everything on each lap of the loop. Example:
for (int x=0 ; x <10; x++)
{
MY_STRUCT structVar = {0};
...
structVar.foo = a;
structVar.bar = b;
printf("%d %d\n", structVar.foo, structVar.bar);
}
On the other hand, the compiler might in this case be smart enough to realize that the struct is just a pointless middle man and replace this all with the equivalent printf("%d %d\n", a, b);, meaning that the struct would be removed entirely from the machine code.
Overall, discussing optimizations like this can't be done without a specific use-case, compiler and target system.
Or do I need to use memset to zero it out?
No. MY_STRUCT structVar = {0}; is functionally 100% equivalent to memset(&structVar, 0, sizeof structVar);.
This is a very large struct and I want to allocate it on the stack
That's a different matter than initialization. It is indeed unwise to allocate large objects on the stack. In that case consider replacing it with for example this:
MY_STRUCT* structVar = malloc(sizeof *structVar);
for (int x=0 ; x <10; x++)
{
memset(structVar, 0, sizeof *structVar);
...
}
free(structVar);
For the variables (of automatic storage class) defined within the body of a loop, the variables are (notionally) recreated for each iteration of the loop and have indeterminate values for each iteration if not initialized.
Regarding use of the {0} initializer for an object with automatic storage duration, note the following:
As per 6.7.9/19 and 6.7.9/21, elements of the object that have no explicit initializer will be initialized implicitly the same as object of static storage duration. As per 6.7.9/10, those elements will be initialized to value zero or a null pointer as appropriate, and any padding will be initialized to zero bits.
Using memset(ptr, 0, size) sets size bytes from address ptr onwards to 0, but note the following:
For some unusual execution environments, a pointer object with all bytes zero might not represent a null pointer value (so might not compare equal to 0).
For some unusual execution environments, a floating point object with all bytes zero might not represent a valid floating point value or might not compare equal to 0.0.
In summary, using the {0} initializer is the most portable way to set all elements of the object to compare equal to 0, and to set all padding bits or bytes to 0. Using memset instead is generally OK except for some weird execution environments.
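The difference between the two approaches can be sketched as follows. The struct sample type and the three function names are assumptions chosen for illustration. On mainstream platforms both functions report all members as zero; the {0} form is the one the standard guarantees.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct mixing an integer, a floating-point value and a pointer. */
struct sample {
    int    n;
    double d;
    void  *p;
};

/* Returns 1 if every member compares equal to zero / null. */
int all_zero(const struct sample *s)
{
    return s->n == 0 && s->d == 0.0 && s->p == NULL;
}

int init_with_initializer(void)
{
    struct sample s = {0};      /* members become 0, 0.0 and a null pointer */
    return all_zero(&s);
}

int init_with_memset(void)
{
    struct sample s;
    memset(&s, 0, sizeof s);    /* all bytes zero; the same values only where
                                   all-bits-zero represents 0.0 and null */
    return all_zero(&s);
}
```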
#include <stdio.h>
int main()
{
for(int i=0;i<100;i++)
{
int count=0;
printf("%d ",++count);
}
return 0;
}
output of the above program is: 1 1 1 1 1 1..........1
Please take a look at the code above. I declared variable "int count=0" inside the for loop.
With my knowledge, the scope of the variable is within the block, so count variable will be alive up to for loop execution.
"int count=0" is executing 100 times, then it has to create the variable 100 times else it has to give the error (re-declaration of the count variable), but it's not happening like that — what may be the reason?
According to output the variable is initializing with zero every time.
Please help me to find the reason.
Such simple code can be visualised on http://www.pythontutor.com/c.html for easy understanding.
To answer your question, count gets destroyed when it goes outside its scope, that is the closing } of the loop. On next iteration, a variable of the same name is created and initialised to 0, which is used by the printf.
And if counting is your goal, print i instead of count.
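A short sketch of the difference, with hypothetical function names: declaring count inside the loop body gives a fresh, re-initialized object on every iteration, while declaring it before the loop gives one object that accumulates.

```c
/* One count object for the whole function: it accumulates across iterations. */
int count_iterations(int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        ++count;
    return count;
}

/* A new count object per iteration: it is re-initialized to 0 every time. */
int count_reset_each_time(int n)
{
    int last = 0;
    for (int i = 0; i < n; i++) {
        int count = 0;
        last = ++count;     /* always 1, as in the question's output */
    }
    return last;
}
```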
The C standard describes the C language using an abstract model of a computer. In this model, count is created each time the body of the loop is executed, and it is destroyed when execution of the body ends. By “created” and “destroyed,” we mean that memory is reserved for it and is released, and that the initialization is performed with the reservation.
The C standard does not require compilers to implement this model slavishly. Most compilers will allocate a fixed amount of stack space when the routine starts, with space for count included in this fixed amount, and then count will use that same space in each iteration. Then, if we look at the assembly code generated, we will not see any reservation or release of memory; the stack will be grown and shrunk only once for the whole routine, not grown and shrunk in each loop iteration.
Thus, the answer is twofold:
In C’s abstract model of computing, a new lifetime of count begins and ends in each loop iteration.
In most actual implementations, memory is reserved just once for count, although implementations may also allocate and release memory in each iteration.
However, even if you know your C implementation allocates stack space just once per routine when it can, you should generally think about programs in the C model in this regard. Consider this:
for (int i = 0; i < 100; ++i)
{
int count = 0;
// Do some things with count.
float x = 0;
// Do some things with x.
}
In this code, the compiler might allocate four bytes of stack space to use for both count and x, to be used for one of them at a time. The routine would grow the stack once, when it starts, including four bytes to use for count and x. In each iteration of the loop, it would use the memory first for count and then for x. This lets us see that the memory is first reserved for count, then released, then reserved for x, then released, and then that repeats in each iteration. The reservations and releases occur conceptually even though there are no instructions to grow and shrink the stack.
Another illuminating example is:
for (int i = 0; i < 100; ++i)
{
extern int baz(void);
int a[baz()], b[baz()];
extern void bar(void *, void *);
bar(a, b);
}
In this case, the compiler cannot reserve memory for a and b when the routine starts because it does not know how much memory it will need. In each iteration, it must call baz to find how much memory is needed for a and how much for b, and then it must allocate stack space (or other memory) for them. Further, since the sizes may vary from iteration to iteration, it is not possible for both a and b to start in the same place in each iteration—one of them must move to make way for the other. So this code lets us see that a new a and a new b must be created in each iteration.
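The variable length array case can be made observable with a sketch like the one below. The baz defined here is a made-up size source that returns 1, 2, 3, ... on successive calls, and record_vla_sizes is a hypothetical helper; the sketch also assumes a compiler that supports VLAs, which are optional from C11 onwards.

```c
#include <stddef.h>

/* Hypothetical size source: returns 1, 2, 3, ... on successive calls. */
static int baz_calls = 0;

int baz(void)
{
    return ++baz_calls;
}

/* Records the element counts of a and b for each iteration. */
void record_vla_sizes(int iterations, size_t a_len[], size_t b_len[])
{
    for (int i = 0; i < iterations; ++i) {
        int a[baz()], b[baz()];             /* sizes are evaluated anew each iteration */

        a_len[i] = sizeof a / sizeof a[0];  /* sizeof on a VLA is computed at run time */
        b_len[i] = sizeof b / sizeof b[0];
    }
}
```

Because the declarators are evaluated in order each time the declaration is reached, the recorded lengths grow from one iteration to the next, which would be impossible if a and b were single objects created once.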
int count=0 is executing 100 times, then it has to create the variable 100 times
No, it defines the variable count once, then assigns it the value 0 100 times.
Defining a variable in C does not involve any particular step or code to "create" it (unlike for example in C++, where simply defining a variable may default-construct it). Variable definitions just associate the name with an "entity" that represents the variable internally, and definitions are tied to the scope where they appear.
Assigning a variable is a statement which gets executed during the normal program flow. It usually has "observable effects", otherwise the compiler is allowed to optimize it out entirely.
OP's example can be rewritten in a completely equivalent form as follows.
for(int i=0;i<100;i++)
{
int count; // definition of variable count - defined once in this {} scope
count=0; // assignment of value 0 to count - executed once per iteration, 100 times total
printf("%d ",++count);
}
Eric has it correct. In much shorter form:
Typically compilers determine at compile time how much memory is needed by a function and the offsets in the stack to those variables. The actual memory allocations occur on each function call and memory release on the function return.
Further, when you have variables nested within {curly braces} once execution leaves that brace set the compiler is free to reuse that memory for other variables in the function. There are two reasons I intentionally do this:
The variables are large but only needed for a short time so why make stacks larger than needed? Especially if you need several large temporary structures or arrays at different times. The smaller the scope the less chance of bugs.
If a variable only has a sane value for a limited amount of time, and would be dangerous or buggy to use out of that scope, add extra curly braces to limit the scope of access so improper use generates immediate compiler errors. Using unique names for each variable, even if the compiler doesn't insist on it, can help keep the debugger, and your mind, less confused.
Example:
void your_function(int a)
{
{ // limit scope of stack_1
int stack_1 = 0;
for ( int ii = 0; ii < a; ++ii ) { // really limit scope of ii
stack_1 += some_calculation(ii, a);
}
printf("ii=%d\n", ii); // scope error
printf("stack_1=%d\n", stack_1); // good
} // done with stack_1
{
int limited_scope_1[10000];
do_something(a,limited_scope_1);
}
{
float limited_scope_2[10000];
do_something_else(a,limited_scope_2);
}
}
A compiler given code like:
void doSomething(int, int const*);
...
for (int i=0; i<100; i++)
{
int const j=(i & 1);
doSomething(i, &j);
}
could legitimately replace it with:
void doSomething(int, int const*);
...
int const __compiler_generated_0 = 0;
int const __compiler_generated_1 = 1;
for (int i=0; i<100; i+=2)
{
doSomething(i, &__compiler_generated_0);
doSomething(i+1, &__compiler_generated_1);
}
Although a compiler would typically allocate space on the stack once for j, when the function was entered, and then not reuse the storage during the loop (or even the function), meaning that j would have the same address on every iteration of the loop, there is no requirement that the address remain constant. While there typically wouldn't be an advantage to having the address vary on different iterations, compilers are allowed to exploit such situations should they arise.
Consider the program
#include <stdio.h>
int main(void) {
for (int curr = 0; curr < 3; curr++) {
int prev;
if (curr) {
printf("%d\n", prev); //valid; prev has 0 or 1
}
prev = curr;
}
}
Is it valid?
What's the lifetime and scope of prev?
1. There will be 3 distinct prevs with lifetime and scope inside the for loop. The distinct prevs may (but are not required to) share the same address. Program is not valid.
2. There will be 3 prevs with lifetime and scope inside the for loop. The prevs will share the same address, behaving as if defined with static. Program is valid.
3. There will be 1 prev, as if it was defined outside the for loop. Program is valid.
Note: question originated during discussion on this answer
Three different and distinct prevs that do not (need to) share address or value.
On every loop a different prev will be "created" and "deleted".
1
prev is treated as an automatic variable that has a scope inside the loop instance
This means that at each iteration all the automatic variables are released and then
re-acquired when the loop executes again.
Actually, by definition the for statement evaluates its increment expression (incrementing the variable, in this case) and tests its condition on every pass,
so even looking at the code you can see that the bracketed scope is finished
each time the loop body ends.
Any guarantee of the address at which the local variable is allocated (or a register holding the value) is void. It is totally implementation-dependent.
When using C89, or when using only the supplied looping constructs in later versions of C, it's impossible for code to reach a location above an automatic object's declaration within the lifetime of the object. C99, however, added the ability to use goto to transfer control from a point below a declaration to a point above within the lifetime of the object declared thereby. I'm not sure to what extent any non-contrived programs rely upon the fact that using goto to transfer control above the declaration of a non-VLA object does not end their lifetime, but the Standard requires that implementations make allowance for such behavior, e.g.
int test(void)
{
int pass=0;
int temp;
int *p;
int result;
firstPass:
if (pass)
{
temp = *p;
goto secondPass;
}
pass++;
int q=1;
p=&q;
q++;
goto firstPass;
secondPass:
return temp + q;
}
Here, the lifetime of q would start when code enters test, and extend throughout the execution of the function even if code branches to a point above the declaration. If execution reaches a declaration with an initializer, the value of the object is assigned at that time; if it reaches a declaration without an initializer, the value of the object becomes indeterminate at that time, but if code jumps over the declaration the object retains its value.
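A trimmed-down variant of the same idea, as a sketch: the function name value_seen_after_jump is hypothetical. The jump back above the declaration does not end q's lifetime, and since the declaration is not reached a second time, q keeps the value it had.

```c
#include <stddef.h>

int value_seen_after_jump(void)
{
    int pass = 0;
    int *p = NULL;

again:
    if (pass)
        return *p;      /* q is still alive: its lifetime is that of the enclosing block */

    pass++;
    int q = 1;          /* the initializer runs when execution reaches this line */
    p = &q;
    q++;                /* q is now 2 */
    goto again;         /* jumps above the declaration without ending q's lifetime */
}
```

The function returns 2: the second pass reads q through p before the declaration is reached again, so the value has not been reset.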
ISO/IEC 9899:TC3 is perfectly clear about that:
In the original post, prev has a constant address within the loop, but its value is indeterminate when evaluated. Therefore, the code exhibits undefined behavior.
I think this makes all of the answers in the original post (1-4) incorrect.
This is a question for my Programming Langs Concepts/Implementation class. Given the following C code snippet:
void foo()
{
int i;
printf("%d ", i++);
}
void main()
{
int j;
for (j = 1; j <= 10; j++)
foo();
}
The local variable i in foo is never initialized but behaves similarly to a static variable on most systems. Meaning the program will print 0 1 2 3 4 5 6 7 8 9. I understand why it does this (the memory location of i never changes) but the question in the homework asks to modify the code (without changing foo) to alter this behavior. I've come up with a solution that works and makes the program print ten 0's but I don't know if it's the "right" solution and to be honest I don't exactly know why it works.
Here is my solution:
void main()
{
int j;
void* some_ptr = NULL;
for (j = 1; j <= 10; j++)
{
some_ptr = malloc(sizeof(void*));
foo();
free(some_ptr);
}
}
My original thought process was that i wasn't changing locations because there was no other memory manipulation happening around the calls of foo, so allocating a variable should disrupt that, but since some_ptr is allocated on the heap and i is on the stack, shouldn't the allocation of some_ptr have no effect on i? My thought is that the compiler is playing some games with the optimization of that subroutine call. Could anyone clarify?
There cannot be a "right" solution. But there can be a class of solutions which work for a particular CPU architecture, ABI, compiler, and compiler options.
Changing the code to something like this will have the effect of altering the memory above the stack in a way which should affect many, if not most, environments in the targeted way.
void foo()
{
int i;
printf("%d ", i++);
}
void main()
{
int j;
int a [2];
for (j = 1; j <= 10; j++)
{
foo();
a [-5] = j * 100;
}
}
Output (gcc x64 on Linux):
0 100 200 300 400 500 600 700 800 900
a[-5] is the number of words of stack used for overhead and variables spanning the two functions. There is the return address, saved stack link value, etc. The stack likely looks like this when foo() writes to a[-5]:
i
saved stack link
return address
main's j
(must be something else)
main's a[]
I guessed -5 on the second try. -4 was my first guess.
When you call foo() from main(), the (uninitialized) variable i is allocated at a memory address. In the original code, it so happens that it is zero (on your machine, with your compiler, and your chosen compilation options, your environment settings, and given the current phase of the moon — it might change when any of these, or a myriad other factors, changes).
By calling another function before calling foo(), you allow the other function to overwrite the memory location that foo() will use for i with a different value. It isn't guaranteed to change; you could, by bad luck, replace the zero with another zero.
You could perhaps use another function:
static void bar(void)
{
int j;
for (j = 10; j < 20; j++)
printf("%d\n", j);
}
and calling that before calling foo() will change the value in i. Calling malloc() changes things too. Calling pretty much any function will probably change it.
However, it must be (re)emphasized that the original code is laden with undefined behaviour, and calling other functions doesn't make it any less undefined. Anything can happen and it is valid.
The variable i in foo is simply uninitialized, and uninitialized automatic variables have an indeterminate value upon entering the block. The fact that you saw it print a certain value is entirely a coincidence, and to write standard-conforming C, you should never rely on such behavior. You should always initialize automatic variables before using them.
From c11std 6.2.4p6:
For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate. If an initialization is specified for the object, it is performed each time the declaration or compound literal is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.
The reason the uninitialized value seems to keep its value from past calls is that it is on the stack and the stack pointer happens to have the same value every time the function is called.
The reason your code might be changing the value is that you started calling other functions: malloc and free. Their internal stack variables are using the same location as i in foo().
As for optimization, small programs like this are in danger of disappearing entirely. GCC or Clang might decide that since using an uninitialized variable is undefined behavior, the compiler is within its rights to completely remove the code. Or it might put i in a register set to zero. Then decide all printf calls output zero. Then decide that your entire program is simply a single puts("0000000000") call.
About the following code:
#include <stdio.h>
int lastval(void)
{
static int k = 0;
return k++;
}
int main(void)
{
int i = 0;
printf("I previously said %d\n", lastval());
i++;
i++;
i++;
i++;
i++;
printf("I previously said %d\n", lastval());
i++;
i++;
i++;
printf("I previously said %d\n", lastval());
i++;
i++;
i++;
printf("I previously said %d", lastval());
i++;
i++;
i++;
return 0;
}
Can anyone explain to me how static maintains its value? I thought it was because the stack frame for the function wasn't destroyed after the return, so I wrote this code to run it under gdb, and after doing backtraces after every single line, only main's stack frame shows up (it doesn't even list lastval when I do a backtrace sitting on a printf call, but anyway).
How is k actually stored? I know that it's not like a normal variable since the first k++ returns 1 instead of 0, and it's not like a global since I can't access k inside main for example, so what's going on?
on a local k, k++ // Always returns 0
on a global k = 0, k++ // returns 0, 1, 2
on a static k, k++ // returns 1, 2, 3
Can anyone help me to understand these 2 issues ?
A static definition inside a function is just like a static definition outside a function other than the scope of the symbol (where in your program you can refer to it). It has nothing to do with stack frames ... k isn't on the stack, it's in "static"/global/.data memory.
I know that's not like a normal variable since the first k++ returns 1 instead of 0
No, it returns 0, but k's value is then 1.
on a static k, k++ // returns 1, 2 ,3
No, that's not correct ... it returns 0, 1, 2 ...
and its not like a global since i cant access k inside main for example
That's simply how name scopes work; it has nothing to do with how k is stored.
Variable k is defined inside function lastval, so its scope is this function only. But since you defined it with the static keyword, its lifetime becomes equal to the lifetime of the program.
So, as per the definition of static, this variable k will be initialized only once and it will retain its last value, and that is what is happening in your case. An initialized static variable goes in the .data section of memory, so k will get memory in .data.
Static variables are stored like globals, except for their scope. Your program outputs 0 1 2 3 on my computer, so double-check it.
Not to be snarky, but any reference on C will explain this.
The static keyword used on a variable within a function allows the value to persist across function calls. It's stored like a global, but you can only access it by name within the scope of the function where it is defined.
Some typical uses of static variables in the standard C library occur in the implementations of rand(3) and strtok(3). Their use is commonly discouraged because it can lead to functions that are not reentrant and thus play havoc when multiple threads are used. POSIX defines a strtok_r function for use when reentrancy is required.
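The reentrancy concern can be sketched with a pair of counters. Both function names are hypothetical; the second follows the same pattern as strtok_r, where the caller supplies the state instead of the function hiding it in a static variable.

```c
/* Non-reentrant: hidden state shared by every caller. */
int next_id(void)
{
    static int k = 0;   /* initialized once; persists across calls */
    return k++;         /* yields the old value, then increments */
}

/* Reentrant variant: the caller owns the state. */
int next_id_r(int *state)
{
    return (*state)++;
}
```

Successive calls to next_id() yield 0, 1, 2, ... no matter who calls it, whereas each caller of next_id_r() can keep an independent sequence in its own int.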
Static variables are instantiated in the global memory space (not in a stack frame) before the program starts executing main.
C,
That variable is initialized before the program starts executing main.
C++,
That variable is initialized when the program encounters this line for the first time.
When you execute the "lastval()" function again, "static int k = 0" will not be executed again.