Consider the program
#include <stdio.h>
int main(void) {
for (int curr = 0; curr < 3; curr++) {
int prev;
if (curr) {
printf("%d\n", prev); //valid; prev has 0 or 1
}
prev = curr;
}
}
Is it valid?
What's the lifetime and scope of prev?
There will be 3 distinct prevs with lifetime and scope inside the for loop.
The distinct prevs may (but are not required to) share the same address.
Program is not valid.
There will be 3 prevs with lifetime and scope inside the for loop.
The prevs will share the same address, behaving as if defined with static.
Program is valid.
There will be 1 prev, as if it was defined outside the for loop.
Program is valid.
Note: question originated during discussion on this answer
Three different and distinct prevs that do not (need to) share address or value.
On every loop a different prev will be "created" and "deleted".
prev is treated as an automatic variable that has a scope inside the loop instance
This means that at each iteration all the automatic variables are released and then
re-acquired when the loop executes again.
On each pass, the for statement evaluates its increment expression (curr++, in this case) and then tests the condition again,
so even from the code you can see that the braced block is left
each time an iteration ends.
Any guarantee of the address at which the local variable is allocated (or a register holding the value) is void. It is totally implementation-dependent.
When using C89, or when using only the supplied looping constructs in later versions of C, it's impossible for code to reach a location above an automatic object's declaration within the lifetime of the object. C99, however, added the ability to use goto to transfer control from a point below a declaration to a point above it within the lifetime of the object declared there. I'm not sure to what extent any non-contrived programs rely upon the fact that using goto to transfer control above the declaration of a non-VLA object does not end its lifetime, but the Standard requires that implementations make allowance for such behavior, e.g.
int test(void)
{
int pass=0;
int temp;
int *p;
int result;
firstPass:
if (pass)
{
temp = *p;
goto secondPass;
}
pass++;
int q=1;
p=&q;
q++;
goto firstPass;
secondPass:
return temp + q;
}
Here, the lifetime of q would start when code enters test, and extend throughout the execution of the function even if code branches to a point above the declaration. If execution reaches a declaration with an initializer, the value of the object is assigned at that time; if it reaches a declaration without an initializer, the value of the object becomes indeterminate at that time; but if code jumps over the declaration, the object retains its value.
ISO/IEC 9899:TC3 is perfectly clear about that:
In the original post, prev has a constant address within the loop, but its value is indeterminate when evaluated. Therefore, the code exhibits undefined behavior.
I think this makes all of the answers in the original post (1-4) incorrect.
Related
This is a simple question, but I would just like to throw this out there and appreciate if anyone could validate if my understanding is correct or provide some more insight. My apologies in advance if this is a duplicate post.
Eg. In the code below:
(1) stack_overflow.c
int main() {
while (1) {
int some_int = 0;
char* some_pointer = some_function();
// ... some other code ....
}
}
(2) no_overflow.c
int main() {
int some_int;
char* some_pointer;
while (1) {
some_int = 0;
some_pointer = some_function();
// ... some other code ....
}
}
Am I correct in saying that in the 1st code snippet, this code would eventually cause a stack overflow because of the variables continually being declared inside of the infinite loop? (The infinite loop is intentional)
Whereas, in the second code snippet we would actually be reusing the same parts of memory, which is the desired outcome as it would be more efficient.
Would compiler optimisation be able to detect this and prevent the stack overflow? If so, which C compilers and optimisation levels would achieve this?
This is not the case.
Each time the body of the loop is entered, two variables are created on the stack. Those variables are then destroyed when the loop hits the ending }. After the loop condition is tested and the body is reentered, a new pair of variables are created.
Am I correct in saying that in the 1st code snippet, this code would eventually cause a stack overflow because of the variables continually being declared inside of the infinite loop?
No:
6.2.4 Storage durations of objects
...
6 For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate. If an initialization is specified for the object, it is performed each time the declaration or compound literal is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.
7 For such an object that does have a variable length array type, its lifetime extends from the declaration of the object until execution of the program leaves the scope of the declaration.35) If the scope is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate.
35) Leaving the innermost block containing the declaration, or jumping to a point in that block or an embedded block prior to the declaration, leaves the scope of the declaration.
C 2011 Online Draft
Each time through the loop new instances of some_int and some_pointer will be created at the beginning of the loop body and destroyed at the end of it - logically speaking, anyway. On most implementations I've used, storage for those items will be allocated once at function entry and released at function exit. However, that's an implementation detail, and while it's common you shouldn't rely on it being true everywhere.
If some_function dynamically allocates memory or some other resource that doesn't get freed before the end of the loop you could exhaust your dynamic memory pool or whatever, but it shouldn't cause a stack overflow as such.
Normally no, but it's not the best practice.
However, if some_function() allocates memory on every call and that memory is not freed within the same iteration, you will lose the only pointer to it on the next assignment and leak memory.
Please share the rest of the code for a clearer answer.
If I define a variable partway through a function (not at the beginning), is space allocated for it when the function is entered, or only when execution reaches the definition?
If it is allocated only when the definition is reached, does that save some overhead when the function returns early because of a problem?
like so:
if(!(Size && Packs))
{
ret = false;
return SendAck(ret);
}
uint8_t *pSrc = (uint8_t *)pRcv->data;
uint8_t crc = 0;
The Standard C Computing Model
The C standard specifies memory reservation requirements in terms of storage duration and lifetime. Objects declared inside functions without extern, static, or _Thread_local have automatic storage duration. (Other storage durations, not discussed here, are static, thread, allocated, and temporary.) This includes parameters of functions, since they are declared inside the function declaration.
Each declaration inside a function has an associated block. Blocks are groups of statements (sometimes just a single statement). A compound statement bracketed with { and } is a block, each selection statement (if, switch) and each loop statement (for, while, do) is a block, and each substatement of those statements is a block. The block associated with a declaration inside a function is the innermost block it is in. For a function parameter, its associated block is the compound statement that defines the function.
For an automatic object that is not a variable length array, its lifetime starts when execution enters the block it is in, and it ends when execution of the block ends. (Calling a function suspends execution of the block but does not end it.) So in:
{
Label:
foo();
int A;
}
A exists as soon as execution reaches Label, because execution of A’s block has started.
This means that, as soon as the block is entered, all automatic objects in it other than variable length arrays should have memory reserved for them.
This fact is generally of little use, as there is no way to refer to A at Label. However, if we do this:
{
int i = 0;
int *p;
Label:
if (0 < i)
*p += foo();
int A = 0;
p = &A;
if (++i < 3)
goto Label;
bar(A);
}
then we can use A at Label after the first iteration because p points to it. We could imagine motivation for code like this could arise in a loop that needs to treat its first iteration specially. However, I have never seen it used in practice.
For an automatic object that is a variable length array, its lifetime starts when execution reaches its declaration and ends when execution leaves the scope of the declaration. So with this code:
int N = baz();
{
int i = 0;
Label:
foo();
int A[N];
if (++i < 3)
goto Label;
}
A does not exist at Label. Its lifetime begins each time execution reaches the declaration int A[N]; and ends, in the first few iterations, when the goto Label; transfers execution out of the scope of the declaration or, in the last iteration, when execution of the block ends.
Practical Implementation
In general-purpose C implementations, automatic objects are implemented with a hardware stack. A region of memory is set aside to be used as a stack, and a particular processor register, called the stack pointer, keeps track of how much is in use by recording the address of the current “top” of stack. (For historic reasons, stacks usually start at high addresses and grow toward lower addresses, so the logical top of a stack is at the bottom of its used addresses.)
Because the lifetimes of automatic objects other than variable length arrays start when execution of their associated blocks begins, a compiler could implement this by adjusting the stack pointer whenever entering or ending a block. This can provide memory efficiency in at least two ways. Consider this code:
if (foo(0))
{
int A[100];
bar(A, x, 0);
}
if (foo(1))
{
int B[100];
bar(B, x, 1);
}
if (foo(2))
{
int C[1000];
bar(C, x, 2);
}
Because A and B do not exist at the same time, the compiler does not have to reserve memory for both of them when the function starts. It can adjust the stack pointer when each block is entered and ended. And for the large array C, the space might never be reserved at all; a compiler could choose to allocate space for 1000 int only if the block is actually entered.
However, I do not think GCC and Clang take advantage of this. In practice, I think they generally figure out the maximum space that will be needed at any one time in the function, allocate that much space on the stack, and use it throughout the function. This does include optimizations like using the same space for A and B, since they are never in use at the same time, but it does not include optimizing for the possibility that C is never used. However, I could be wrong; I have not checked on this compiler behavior lately.
In the cases of variable length arrays, the compiler generally cannot plan the memory use in advance, since it does not know the array size. So, for a variable length array, space has to be reserved for it on the stack when its declaration is reached.
Additionally, note that the compiler does not have to implement the computing model the C standard uses literally. It can make any optimizations that get the same observable behavior. This means that, for example, if it can tell part of an array is not used, it does not have to allocate memory for that at all. This means the answer to your question, “… will space be allocated to the defined variable first or will it be allocated when it runs to the defined statement?”, is that a compiler designer may choose either method, as long as the observable behavior of the resulting program matches what the C standard specifies.
The observable behavior includes data written to files, input/output interactions, and accesses to volatile objects.
I'm aware of the fact that you should never return the address of a local variable from a function. But while demonstrating the fact I faced a problem. Consider the following program:
int *test()
{
int x = 12;
int *p = &x;
return p;
}
int main()
{
int *p = test();
printf("%p",p);
return 0;
}
It prints an address like 0061fed0 as expected. But if I return the address of x directly using & i.e. if the test function is changed as follows:
int *test()
{
int x = 12;
return &x;
}
then the output becomes 00000000. So, can you please explain what is happening here? I'm using the gcc compiler that comes bundled with the Code::Blocks 17.12 IDE on Windows 10.
Regarding Duplicate Questions:
Suggested duplicate question:
C - GCC generates wrong instructions when returning local stack address
Explains the behaviour of using the address of operator directly but doesn't address the scenario where a pointer variable is used to return the address of a local variable, which is explained here in StoryTeller's answer: "The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime".
Strictly speaking, from the point of view of C language specification, that's a valid outcome.
6.2.4 Storage durations of objects (emphasis mine)
2 The lifetime of an object is the portion of program execution
during which storage is guaranteed to be reserved for it. An object
exists, has a constant address, and retains its last-stored value
throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes
indeterminate when the object it points to (or just past) reaches the
end of its lifetime.
Therefore the value that the function returns is indeterminate in either case. You cannot predict what it will be, or even use it in a meaningful way. A program that uses an indeterminate value has undefined behavior; anything can occur.
So what your compiler does is return null when you return the address of a local directly. It's a valid value according to the C language itself, since the object is dead soon anyway. But it has the benefit of likely crashing your program early in a modern hosted implementation, and allowing you to fix a bug.
I have code similar to the following in our product. I expected the output to be '0 1 2 3', but the actual output is '1 1 1 1'.
for(i = 0 ;i < 5;i++){
int j;
if(i)
printf("%d ",j);
j = i;
}
My understanding is that the j is allocated on the stack only once during the entire period of 'for' loop and the same value is used during iterations. Also, if I move the declaration of j outside for loop, I'm getting the expected result. What am I missing here?
PS - When I run the same code on my personal machine, I am getting the expected output. But on production it is different.
First, to clear things about the storage duration of an automatic local variable, let me quote the C11 standard, chapter §6.2.4, (emphasis mine)
An object whose identifier is declared with no linkage and without the storage-class
specifier static has automatic storage duration, [...]
and,
For such an object that does not have a variable length array type, its lifetime extends
from entry into the block with which it is associated until execution of that block ends in
any way. (Entering an enclosed block or calling a function suspends, but does not end,
execution of the current block.) If the block is entered recursively, a new instance of the
object is created each time. The initial value of the object is indeterminate.
So, in your code, each iteration gets a new instance of j. Nothing is retained.
In your code,
int j; //not initialized
if(i)
printf("%d ",j); //this one here
you're trying to use an uninitialized automatic local variable j, which has an indeterminate value. It invokes undefined behavior.
As per C11, chapter §6.7.9
If an object that has automatic storage duration is not initialized explicitly, its value is
indeterminate
and related, for UB, annex §J.2
The value of an object with automatic storage duration is used while it is
indeterminate.
Once your code hits UB, the output cannot be justified, anyway.
OTOH, when you declare j outside the loop, its lifetime spans the enclosing block (here, the function body). Then, unlike the case above, there will be only one instance of j for all iterations of the loop.
As per the execution flow, first time, i being 0, if will evaluate to false, printf() will be skipped and j will get initialized. Then, in next iteration, when you hit the printf(), j is initialized and it's all well thereafter.
For some clarity, I think the for loop would be converted under the hood to something like:
i = 0;
LoopStart:
if(!(i<5)){ goto LoopEnd;}
{
int j;
if(i)
printf("%d ",j);
j = i;
}
i++;
goto LoopStart;
LoopEnd:
Actual implementations would differ, but this serves to highlight this point:
The block is entered and exited for each iteration of the loop, meaning each auto in the block is created and destroyed 5 times in this example.
as others mentioned, this means you are using an uninitialized j each time in your printf.
As for why the code might work on some platforms or compilers: it's probably because j is allocated the same stack address each time, and the compiler doesn't clear the stack when it creates or destroys j, so it just so happens that the last value assigned to the old, dead j is accessible through the new, uninitialized one.
As we know, local variables have local scope and lifetime. Consider the following code:
int* abc()
{
int m;
return(&m);
}
void main()
{
int* p=abc();
*p=32;
}
This gives me a warning that a function returns the address of a local variable.
I see this as justification:
Local variable m is deallocated once abc() completes. So we are dereferencing an invalid memory location in the main function.
However, consider the following code:
int* abc()
{
int m;
return(&m);
int p=9;
}
void main()
{
int* p=abc();
*p=32;
}
Here I am getting the same warning. But I guess that m will still retain its lifetime when returning. What is happening? Please explain the error. Is my justification wrong?
First, notice that int p=9; will never be reached, so your two versions are functionally identical. The program will allocate memory for m and return the address of that memory; any code below the return statement is unreachable.
Second, the local variable m is not actually de-allocated after the function returns. Rather, the program considers the memory free space. That space might be used for another purpose, or it might stay unused and forever hold its old value. Because you have no guarantee about what happens to the memory once the abc() function exits, you should not attempt to access or modify it in any way.
As soon as the return statement is executed, control passes back to the caller and the lifetime of the called function's local variables ends. Hence, all local variables are popped off the stack. So the last statement in your second example is inconsequential and the warning is justified.
Logically, m no longer exists when you return from the function, and any reference to it is invalid once the function exits.
Physically, the picture is a bit more complicated. The memory cells that m occupied are certainly still there, and if you access those cells before anything else has a chance to write to them, they'll contain the value that was written to them in the function, so under the right circumstances it's possible for you to read what was stored in m through p after abc has returned. Do not rely on this behavior being repeatable; it is a coding error.
From the language standard (C99):
6.2.4 Storage durations of objects
...
2 The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address,25) and retains
its last-stored value throughout its lifetime.26) If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to reaches the end of its lifetime.
25) The term ‘‘constant address’’ means that two pointers to the object constructed at possibly different
times will compare equal. The address may be different during two different executions of the same
program.
26) In the case of a volatile object, the last store need not be explicit in the program.
Emphasis mine. Basically, you're doing something that the language definition explicitly calls out as undefined behavior, meaning the compiler is free to handle that situation any way it wants to. It can issue a diagnostic (which your compiler is doing), it can translate the code without issuing a diagnostic, it can halt translation at that point, etc.
The only way you can make m still valid memory (keeping the maximum resemblance with your code) when you exit the function, is to prepend it with the static keyword
int* abc()
{
static int m;
m = 42;
return &m;
}
Anything after an unconditional return is dead code that will never be executed.
int m is visible only inside abc(). Declare it as int *m instead, point it at storage that outlives the call (for example, memory obtained from malloc), and return it directly.