Will variable declaration inside infinite loop in c cause stack overflow - c

This is a simple question, but I would just like to throw this out there and appreciate if anyone could validate if my understanding is correct or provide some more insight. My apologies in advance if this is a duplicate post.
Eg. In the code below:
(1) stack_overflow.c
int main() {
while (1) {
int some_int = 0;
char* some_pointer = some_function();
// ... some other code ....
}
}
(2) no_overflow.c
int main() {
int some_int;
char* some_pointer;
while (1) {
some_int = 0;
some_pointer = some_function();
// ... some other code ....
}
}
Am I correct in saying that in the 1st code snippet, this code would eventually cause a stack overflow because of the variables continually being declared inside of the infinite loop? (The infinite loop is intentional)
Whereas, in the second code snippet we would actually be reusing the same parts of memory, which is the desired outcome as it would be more efficient.
Would compiler optimisation be able to detect this and prevent the stack overflow, if so which c compilers & optimisation levels would achieve this?

This is not the case.
Each time the body of the loop is entered, two variables are created on the stack. Those variables are then destroyed when the loop hits the ending }. After the loop condition is tested and the body is reentered, a new pair of variables are created.
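A rough way to check this on a particular implementation is to record where the loop-local variable lives on each pass. The sketch below is only illustrative: loop_local_address_span is a made-up helper, and uintptr_t is assumed to be available (it is optional in the standard but present on mainstream platforms). If every pass really consumed fresh stack space, the span would grow with the iteration count. On typical compilers it stays at zero, although the standard does not promise that.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: runs `iterations` loop passes, each declaring a
   block-scope int, and returns the distance in bytes between the lowest
   and the highest address that the variable occupied. */
size_t loop_local_address_span(unsigned long iterations)
{
    uintptr_t lowest = UINTPTR_MAX;
    uintptr_t highest = 0;

    if (iterations == 0)
        return 0;

    for (unsigned long i = 0; i < iterations; i++) {
        int some_int = 0;                         /* a new instance on every pass */
        /* Convert the address to an integer while the object is still alive. */
        uintptr_t where = (uintptr_t)(void *)&some_int;

        if (where < lowest)
            lowest = where;
        if (where > highest)
            highest = where;
    }
    return (size_t)(highest - lowest);
}
```

Calling loop_local_address_span(1000000UL) and printing the result is enough to see whether the stack grows with the number of iterations.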

Am I correct in saying that in the 1st code snippet, this code would eventually cause a stack overflow because of the variables continually being declared inside of the infinite loop?
No:
6.2.4 Storage durations of objects
...
6 For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate. If an initialization is specified for the object, it is performed each time the declaration or compound literal is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.
7 For such an object that does have a variable length array type, its lifetime extends from the declaration of the object until execution of the program leaves the scope of the declaration.35) If the scope is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate.
35) Leaving the innermost block containing the declaration, or jumping to a point in that block or an embedded block prior to the declaration, leaves the scope of the declaration.
C 2011 Online Draft
Each time through the loop new instances of some_int and some_pointer will be created at the beginning of the loop body and destroyed at the end of it - logically speaking, anyway. On most implementations I've used, storage for those items will be allocated once at function entry and released at function exit. However, that's an implementation detail, and while it's common you shouldn't rely on it being true everywhere.
If some_function dynamically allocates memory or some other resource that doesn't get freed before the end of the loop you could exhaust your dynamic memory pool or whatever, but it shouldn't cause a stack overflow as such.
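The resource concern can be sketched as follows. This is a minimal illustration, assuming some_function returns heap memory that the caller owns; the body of some_function, the live_allocations counter, and outstanding_after_loop are all invented for the example. Releasing the memory before the end of each pass keeps the number of outstanding allocations at zero, however long the loop runs.

```c
#include <stdlib.h>
#include <string.h>

static long live_allocations = 0;   /* illustrative bookkeeping only */

/* Hypothetical stand-in for some_function(): hands back heap memory
   that the caller is responsible for freeing. */
static char *some_function(void)
{
    char *p = malloc(16);
    if (p != NULL) {
        strcpy(p, "payload");
        live_allocations++;
    }
    return p;
}

/* Runs the loop body `iterations` times and reports how many
   allocations are still outstanding afterwards. */
long outstanding_after_loop(unsigned iterations)
{
    for (unsigned i = 0; i < iterations; i++) {
        char *some_pointer = some_function();

        /* ... use some_pointer ... */

        if (some_pointer != NULL) {
            free(some_pointer);      /* release before the pass ends */
            live_allocations--;
        }
    }
    return live_allocations;
}
```

Without the free call, the pointer would be lost when some_pointer is destroyed at the end of the pass, and the heap (not the stack) would eventually be exhausted.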

Normally no, but it is not the best practice.
However, if some_function() allocates memory on every call and that memory is not freed within the same iteration, you lose the pointer to the previous allocation and leak memory.
Please share the rest of the code for a clearer answer.

Related

What is the order of execution of the statements that define variables in functions in the c language?

If I define a variable anywhere in a function (not at the beginning), when the program is compiled and executed to this function, will space be allocated to the defined variable first or will it be allocated when it runs to the defined statement?
If it runs in order, Will it reduce some overhead when problems arise?
like so:
if(!(Size && Packs))
{
ret = false;
return SendAck(ret);
}
uint8_t *pSrc = (uint8_t *)pRcv->data;
uint8_t crc = 0;
The Standard C Computing Model
The C standard specifies memory reservation requirements in terms of storage duration and lifetime. Objects declared inside functions without extern, static, or _Thread_local have automatic storage duration. (Other storage durations, not discussed here, are static, thread, allocated, and temporary.) This includes parameters of functions, since they are declared inside the function declaration.
Each declaration inside a function has an associated block. Blocks are groups of statements (sometimes just a single statement). A compound statement bracketed with { and } is a block, each selection statement (if, switch) and each loop statement (for, while, do) is a block, and each substatement of those statements is a block. The block associated with a declaration inside a function is the innermost block it is in. For a function parameter, its associated block is the compound statement that defines the function.
For an automatic object that is not a variable length array, its lifetime starts when execution enters the block it is in, and it ends when execution of the block ends. (Calling a function suspends execution of the block but does not end it.) So in:
{
Label:
foo();
int A;
}
A exists as soon as execution reaches Label, because execution of A’s block has started.
This means that, as soon as the block is entered, all automatic objects in it other than variable length arrays should have memory reserved for them.
This fact is generally of little use, as there is no way to refer to A at Label. However, if we do this:
{
int i = 0;
int *p;
Label:
if (0 < i)
*p += foo();
int A = 0;
p = &A;
if (++i < 3)
goto Label;
bar(A);
}
then we can use A at Label after the first iteration because p points to it. Code like this could conceivably arise in a loop that needs to treat its first iteration specially. However, I have never seen it used in practice.
For an automatic object that is a variable length array, its lifetime starts when execution reaches its declaration and ends when execution leaves the scope of the declaration. So with this code:
int N = baz();
{
int i = 0;
Label:
foo();
int A[N];
if (++i < 3)
goto Label;
}
A does not exist at Label. Its lifetime begins each time execution reaches the declaration int A[N]; and ends, in the first few iterations, when the goto Label; transfers execution out of the scope of the declaration or, in the last iteration, when execution of the block ends.
Practical Implementation
In general-purpose C implementations, automatic objects are implemented with a hardware stack. A region of memory is set aside to be used as a stack, and a particular processor register, called the stack pointer, keeps track of how much is in use by recording the address of the current “top” of stack. (For historic reasons, stacks usually start at high addresses and grow toward lower addresses, so the logical top of a stack is at the bottom of its used addresses.)
Because the lifetimes of automatic objects other than variable length arrays start when execution of their associated blocks begins, a compiler could implement this by adjusting the stack pointer whenever entering or ending a block. This can provide memory efficiency in at least two ways. Consider this code:
if (foo(0))
{
int A[100];
bar(A, x, 0);
}
if (foo(1))
{
int B[100];
bar(B, x, 1);
}
if (foo(2))
{
int C[1000];
bar(C, x, 2);
}
Because A and B do not exist at the same time, the compiler does not have to reserve memory for both of them when the function starts. It can adjust the stack pointer when each block is entered and ended. And for the large array C, the space might never be reserved at all; a compiler could choose to allocate space for 1000 int only if the block is actually entered.
However, I do not think GCC and Clang take advantage of this. In practice, I think they generally figure out the maximum space that will be needed at any one time in the function, allocate that much space on the stack, and use it throughout the function. This does include optimizations like using the same space for A and B, since they are never in use at the same time, but it does not include optimizing for the possibility that C is never used. However, I could be wrong; I have not checked on this compiler behavior lately.
In the case of variable length arrays, the compiler generally cannot plan the memory use in advance, since it does not know the array size. So, for a variable length array, space has to be reserved for it on the stack when its declaration is reached.
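As a small illustration of that point, here is a sketch of a function whose array size is only known at run time. It assumes a compiler that supports variable length arrays (they are optional since C11), and sum_of_squares is a name invented for the example. The storage for squares can only be set aside once execution reaches the declaration, because n is not known before then.

```c
#include <stddef.h>

/* Hypothetical example: sums i*i for i in [0, n) using a VLA as scratch space. */
long sum_of_squares(size_t n)
{
    if (n == 0)
        return 0;                 /* a VLA must have a size greater than zero */

    long squares[n];              /* space is reserved here, not at function entry */

    for (size_t i = 0; i < n; i++)
        squares[i] = (long)(i * i);

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += squares[i];

    return total;                 /* the lifetime of squares ends with its scope */
}
```

Large values of n can exhaust the stack, so real code would normally cap n or fall back to malloc.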
Additionally, note that the compiler does not have to implement the computing model the C standard uses literally. It can make any optimizations that get the same observable behavior. This means that, for example, if it can tell part of an array is not used, it does not have to allocate memory for that at all. This means the answer to your question, “… will space be allocated to the defined variable first or will it be allocated when it runs to the defined statement?”, is that a compiler designer may choose either method, as long as the observable behavior of the resulting program matches what the C standard specifies.
The observable behavior includes data written to files, input/output interactions, and accesses to volatile objects.

C - Why variables created in a loop have the same memory address?

Just a simple example of my problem:
while(condition){
int number = 0;
printf("%p", &number);
}
That variable will always be in the same memory address. Why?
And what's the real difference between declaring it inside or outside the loop then?
Would I need to malloc the variable every iteration to get different addresses?
That variable will always be in the same memory address. Why?
It's not required to, but your code is so simple that it probably will be across all platforms. Specifically, because it's stored on the stack, it's always in the same place relative to your stack pointer. Keep in mind you're not allocating memory here (no new or malloc), you're just naming existing (stack-relative) memory.
And what's the real difference between declaring it inside or outside the loop then?
In this case, scope. The variable doesn't exist outside the braces where it lives in. Also outside of the braces, another variable can take its place if it fits in memory and the compiler chooses to do this.
Would I need to malloc the variable every iteration to get different addresses?
Yes, but I have yet to see a good use of malloc to allocate space for an int that a simple stack variable or a standard collection wouldn't do better.
That variable will always be in the same memory address. Why?
The compiler decides where the variable should be, within the operating system's constraints. It is much more efficient to keep the variable at the same address than to relocate it at every iteration, although relocation could, theoretically, happen.
You can't rely on it being in the same address every time.
And what's the real difference between declaring it inside or outside the loop then?
The difference is the lifetime of the variable. If it is declared within the loop, it only exists inside the loop; you can't access it after the loop ends.
When execution of the block ends the lifetime of the object ends and it can no longer be accessed.
Would I need to malloc the variable every iteration to get different addresses?
malloc is an expensive operation, so it does not make much sense to malloc the variable at every iteration. That said, again, the implementation decides where the memory is allocated; it may very well be at the same address or not.
Once again, you can't rely on the variable's location in the previous iteration to assert where it will be in the next one.
There is a difference in where the variables are stored: allocated variables will be on the heap, as opposed to the stack in the previous case.
It is being put into the same memory address to save memory.
The only real difference between declaring it within and without the loop is that the variable will no longer be within scope outside the loop if it was declared within the loop.
You would have to use malloc to get a different address each time. Also, you would have to leave the frees until after all the mallocs to get this guarantee.
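The point about keeping the frees until after all the mallocs can be sketched like this. The function name live_allocations_are_distinct is made up for the example. It keeps every block alive while comparing addresses, which is the only situation in which distinct addresses are guaranteed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical check: allocates `count` ints, keeps them all alive, and
   returns 1 if every address is different, 0 if two coincide, and -1 if
   an allocation failed. */
int live_allocations_are_distinct(size_t count)
{
    if (count == 0)
        return 1;

    int **blocks = calloc(count, sizeof *blocks);
    if (blocks == NULL)
        return -1;

    int result = 1;
    size_t allocated = 0;
    for (; allocated < count; allocated++) {
        blocks[allocated] = malloc(sizeof **blocks);
        if (blocks[allocated] == NULL) {
            result = -1;
            break;
        }
    }

    /* Compare while every block is still allocated. */
    for (size_t i = 0; result == 1 && i < allocated; i++)
        for (size_t j = i + 1; j < allocated; j++)
            if (blocks[i] == blocks[j])
                result = 0;

    /* Only now release the memory. */
    for (size_t i = 0; i < allocated; i++)
        free(blocks[i]);
    free(blocks);

    return result;
}
```

If each block were freed inside the allocation loop instead, malloc would be free to hand the same address back on the next pass.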
That variable will always be in the same memory address. Why?
The object that number designates has auto storage duration and only exists for the lifetime of the loop body, so logically speaking a new instance is created and destroyed on each loop iteration.
Practically speaking, it's easier to just re-use the same memory location for each loop iteration, which is what most (if not all) C compilers do. It's just not guaranteed to retain its last value from one iteration to the next (especially if you initialize it each iteration).
And what's the real difference between declaring it inside or outside the loop then?
The lifetime of the object (the period of program execution where storage is guaranteed to be reserved for it) changes from the body of the loop to the body of the function. The scope of the identifier (the region of program text where the identifier is visible) changes from the body of the loop to the body of the entire function.
Again, practically speaking, most compilers will allocate stack space for auto objects that are in blocks at function entry - for example, given the code
void foo( void )
{
int bar;
for ( bar = 0; bar < 10; bar++ )
{
int bletch = 2 * bar;
...
}
}
most compilers will generate instructions to reserve stack space for both bar and bletch at function entry, rather than waiting until loop entry to reserve space for bletch. It's just easier to set the stack pointer once and get it over with. Storage is guaranteed to be reserved for bletch over the lifetime of the loop body, but there's nothing in the language definition that says you can't reserve it before then.
However, if you have a situation like this:
void foo( void )
{
int bar;
for ( bar = 0; bar < 10; bar++ )
{
if ( bar % 2 == 0 ) // bar is even
{
int bletch = 2 * bar;
...
}
else
{
int blurga = 3 * bar + 1;
...
}
}
}
bletch and blurga cannot exist at the same time, so the compiler may only allocate space for one additional int object, and that same space will be used for either bletch or blurga depending on the value of bar.
There are compilers that, despite you declaring the variable in the inner loop, just allocate them at the entry to the function block.
Modern compilers tend to allocate all memory for local variables in a single shot at function entry, so that only represents a single stack pointer move, against several push pop instructions to get the same result.
Beyond that, there's another issue you have not considered. The variable in the inner loop is not visible outside the loop, and the memory used by it can be used for a different variable outside. You know that the memory address is always the same, but once you are out of scope you don't know whether the compiler has given that same address to another variable you use for something different. That's perfectly legal: your variable is automatic, so it ceases to exist as soon as you leave the block (the pair of curly brackets around the loop body).

Why do variables declared with the same name in different scopes get assigned the same memory addresses?

I know that declaring a char[] variable in a while loop is scoped, having seen this post: Redeclaring variables in C.
Going through a tutorial on creating a simple web server in C, I'm finding that I have to manually clear memory assigned to responseData in the example below, otherwise the contents of index.html are just continuously appended to the response and the response contains duplicated contents from index.html:
while (1)
{
int clientSocket = accept(serverSocket, NULL, NULL);
char httpResponse[8000] = "HTTP/1.1 200 OK\r\n\n";
FILE *htmlData = fopen("index.html", "r");
char line[100];
char responseData[8000];
while(fgets(line, 100, htmlData) != 0)
{
strcat(responseData, line);
}
strcat(httpResponse, responseData);
send(clientSocket, httpResponse, sizeof(httpResponse), 0);
close(clientSocket);
}
Correct by:
while (1)
{
...
char responseData[8000];
memset(responseData, 0, strlen(responseData));
...
}
Coming from JavaScript, this was surprising. Why would I want to declare a variable and have access to the memory contents of a variable declared in a different scope with the same name? Why wouldn't C just reset that memory behind the scenes?
Also... Why is it that variables of the same name declared in different scopes get assigned the same memory addresses?
According to this question: Variable declared interchangebly has the same pattern of memory address that ISN'T the case. However, I'm finding that this is occurring pretty reliably.
Not completely correct. You don't need to clear the whole responseData array - clearing its first byte is just enough:
responseData[0] = 0;
As Gabriel Pellegrino notes in the comment, a more idiomatic expression is
responseData[0] = '\0';
It explicitly defines a character via its code point of zero value, while the former uses an int constant zero. In both cases the right-side argument has type int which is implicitly converted (truncated) to char type for assignment. (Paragraph fixed thx to the pmg's comment.)
You could know that from the strcat documentation: the function appends its second argument string to the first one. If you need the very first chunk to get stored into the buffer, you want to append it to an empty string, so you need to ensure the string in the buffer is empty. That is, it consists of the terminating NUL character only. memset-ting the whole array is an overkill, hence a waste of time.
Additionally, using strlen on the array is asking for trouble. You can't know what the actual contents of the memory block allocated for the array are. If it was not used yet or was overwritten with some other data since your last use, it may contain no NUL character. Then strlen will run past the end of the array, causing Undefined Behavior. And even if it returns successfully, it will give you a length bigger than the size of the array. As a result memset will run past the end of the array too, possibly overwriting some vital data!
Use sizeof whenever you memset an array!
memset(responseData, 0, sizeof(responseData));
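Putting the two points together, a minimal sketch of the concatenation step might look like this. The helper join_lines and its parameters are invented for the example. It empties the buffer by writing a single terminator, then appends with strcat only while the next line still fits, which the original loop does not check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: concatenates `line_count` strings into `dest`,
   always starting from an empty string. Returns the resulting length.
   Stops early if the next line would not fit. */
size_t join_lines(char *dest, size_t dest_size,
                  const char *const lines[], size_t line_count)
{
    if (dest_size == 0)
        return 0;

    dest[0] = '\0';                    /* make the buffer an empty string */
    size_t used = 0;

    for (size_t i = 0; i < line_count; i++) {
        size_t len = strlen(lines[i]);
        if (len >= dest_size - used)   /* no room for the line plus terminator */
            break;
        strcat(dest, lines[i]);        /* safe: dest is a valid string with room left */
        used += len;
    }
    return used;
}
```

Because the first byte is reset on every call, reusing the same buffer for a second response does not duplicate the earlier contents.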
EDIT
In the above I tried to explain how to fix the issue with your code, but I didn't answer your questions. Here they are:
Why do variables (...) in different scopes get assigned the same memory addresses?
In terms of execution, each iteration of the while(1) { ... } loop indeed creates a new scope. However, each scope terminates before the new one is created, so the compiler reserves an appropriate block of memory on the stack and the loop re-uses it in every iteration. That also simplifies the compiled code: every iteration is executed by exactly the same code, which simply jumps at the end to the beginning. All instructions within the loop that access local variables use exactly the same addressing (relative to the stack) in each iteration. So, each variable in the next iteration has precisely the same location in memory as in all previous iterations.
I'm finding that I have to manually clear memory
Yes, automatic variables, allocated on the stack, are not initialized in C by default. We always need to explicitly assign an initial value before we use it – otherwise the value is undefined and may be incorrect (for example, a floating-point variable can appear not-a-number, a character array may appear not terminated, an enum variable may have a value out of the enum's definition, a pointer variable may not point at a valid, accessible location, etc.).
otherwise the contents (...) are just continuously appended
This one was answered above.
Coming from JavaScript, this was surprising
Yes, JavaScript apparently creates new variables at the new scope, hence each time you get a brand new array – and it is empty. In C you just get the same area of a previously allocated memory for an automatic variable, and it's your responsibility to initialize it.
Additionally, consider two consecutive loops:
void test()
{
int i;
for (i=0; i<5; i++) {
char buf1[10];
sprintf(buf1, "%d", i);
}
for (i=0; i<1; i++) {
char buf2[10];
printf("%s\n", buf2);
}
}
The first one prints a single-digit, character representation of five numbers into the character array, overwriting it each time - hence the last value of buf1[] (as a string) is "4".
What output do you expect from the second loop? Generally speaking, we can't know what buf2[] will contain, and printf-ing it causes UB. However we may suppose the same set of variables (namely a single 10-items character array) from both disjoint scopes will get allocated the same way in the same part of a stack. If this is the case, we'll get a digit 4 as an output from a (formally uninitialized) array.
This result depends on the compiler construction and should be considered a coincidence. Do not rely on it as this is UB!
Why wouldn't C just reset that memory behind the scenes?
Because it's not told to. The language was created to compile to efficient, compact code. It does as little 'behind the scenes' as possible. Among other things, it does not initialize automatic variables unless it's told to. That means you need to add an explicit initializer to a local variable declaration or add an initializing instruction (e.g. an assignment) before the first use. (This does not apply to global, module-scope variables; those are initialized to zeros by default.)
In higher-level languages some or all variables are initialized on creation, but not in C. That's its feature and we must live with it – or just not use this language.
With this line:
char responseData[8000];
You are saying to your compiler: Hey big C, give me an 8000-byte chunk and name it responseData.
At run time, unless you ask for it, no one will ever clean that memory or give you a "brand-new" chunk. That means the 8000-byte chunk you get on each execution can hold any possible combination of bits. What can happen is that you get the same memory region on every execution, and thus the same bits that were left in those 8000 bytes the first time. So, if you don't clean it, you have the impression that you're using the same variable, but you're not! You're just using the same (never cleaned) memory region.
I'd add that it's part of the programmer's responsibilities to clean, if you need to, the memory you're allocating, in dynamic or static way.
Why would I want to declare a variable and have access to the memory contents of a variable declared in a different scope with the same name? Why wouldn't C just reset that memory behind the scenes?
Objects with auto storage duration (i.e., block-scope variables) are not automatically initialized - their initial contents are indeterminate. Remember that C is a product of the early 1970s, and errs on the side of runtime speed over convenience. The C philosophy is that the programmer is in the best position to know whether something should be initialized to a known value or not, and is smart enough to do it themselves if needed.
While you're logically creating and destroying a new instance of responseData on each loop iteration, it turns out the same memory location is being reused each time through. We like to think that space is allocated for each block-scope object as we enter the block and released as we leave it, but in practice that's (usually) not the case - space for all block-scope objects within a function is allocated on function entry, and released on function exit (variable-length arrays are an exception, as noted at the end of this answer).
Different objects in different scopes may map to the same memory behind the scenes. Consider something like
void bletch( void )
{
if ( some_condition )
{
int foo = some_function();
printf( "%d\n", foo );
}
else
{
int bar = some_other_function();
printf( "%d\n", bar );
}
}
It's impossible for both foo and bar to exist at the same time, so there's no reason to allocate separate space for both - the compiler will (usually) allocate space for one int object at function entry, and that space gets used for either foo or bar depending on which branch is taken.
So, what happens with responseData is that space for one 8000-character array is allocated on function entry, and that same space gets used for each iteration of the loop. That's why you need to clear it out on each iteration, either with a memset call or with an initializer like
char responseData[8000] = {0};
As M.M points out in a comment, this isn't true for variable-length arrays (and potentially other variably modified types) - space for those is set aside as needed, although where that space is taken from isn't specified by the language definition. For all other types, though, the usual practice is to allocate all necessary space on function entry.
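The effect of the initializer can be checked with a short sketch. The function stale_iterations is a made-up name, and the 64-byte buffer stands in for the 8000-byte one. Because the initialization is performed every time the declaration is reached, the buffer starts each pass as an empty string even though the previous pass wrote into the same storage.

```c
#include <stdio.h>

/* Hypothetical check: counts the passes in which the buffer did NOT
   start out as an empty string. */
unsigned stale_iterations(unsigned iterations)
{
    unsigned stale = 0;

    for (unsigned i = 0; i < iterations; i++) {
        char responseData[64] = {0};          /* zeroed again on every pass */

        if (responseData[0] != '\0')
            stale++;                          /* leftovers from the last pass */

        /* Dirty the buffer so that a missing reset would be noticed. */
        snprintf(responseData, sizeof responseData, "iteration %u", i);
    }
    return stale;
}
```

Removing the = {0} initializer makes the first read of responseData[0] a read of an indeterminate value, which is exactly the situation in the question.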

The difference between the block and function scopes in C

What is the difference between the block and function scopes in C99 in terms of what happens on stack when a function / block is entered and left?
In theory, a compiler could generate code to allocate a stack frame on entry to any block that contains local variables. In such a case, there wouldn't be much difference at all.
In practice, most compilers compute the maximum size of local variables that could be used by any path through a function, then allocate that size of stack frame on entry. Variables in any block inside the function are simply different offsets from the stack pointer. Note that in such a case, two (or more) blocks may use the same addresses. For example, with source code like this:
void f(int x) {
if (x) {
long y;
}
else {
float z;
}
}
...chances are quite good that y and z will end up at the same address.
Like this:
void foo(int n) // <-- beginning of function scope
{ // <-- beginning of function body scope
int x = n;
for (;;)
{ // <-- beginning of block scope
int q = n;
x *= q;
} // <-- end of block scope
foo(x);
{ // <-- another block scope
int w = x;
}
} // <-- end of function body scope
// and of function scope
Nothing "happens" when a scope ends, but a variable only lives inside the scope where it is declared (with some arcane exceptions). It is up to the implementation to reuse the space of variables of previous, nested scopes that have ended.
The only thing an implementation is required to do when control enters either a function or a block scope is to behave as if new instances have been created of all data objects directly in that scope with "automatic storage duration." Behave as if means it can do something different as long as the program being compiled can't tell the difference (or could only tell the difference by doing something whose behavior is undefined). For instance, if a variable is declared at function scope but only used within one subblock, the compiler can collapse its live range to that subblock, and probably will, because this makes register allocation easier.
An implementation is not required to do anything when control exits a function or block scope. The lifetimes of all automatic-storage-duration objects directly in that scope end, but no program can tell that this has happened without triggering undefined behavior.
There is no requirement for a C implementation to have a stack, and a stack is not the only way to implement the above requirements. See for instance "Cheney on the M.T.A." and c2:SpaghettiStack.
C implementations that do have a stack will normally try to avoid adjusting the stack pointer in the middle of a function, for reasons too complicated to go into here. This can mean that a value with block scope survives on the stack longer than its declared lifetime, but it's still undefined behavior to access it. The compiler is allowed to recycle storage for values that are no longer in scope, but it is also allowed to recycle storage for values that are still in scope but will not be accessed anymore ("dead" in compiler jargon). Historically compilers have been much more aggressive about doing that for values in registers than for values in stack slots, but again, that's a distinction that doesn't necessarily exist on your implementation.

Ampersand bug and lifetime in c

As we know, local variables have local scope and lifetime. Consider the following code:
int* abc()
{
int m;
return(&m);
}
void main()
{
int* p=abc();
*p=32;
}
This gives me a warning that a function returns the address of a local variable.
I see this as justification:
Local variable m is deallocated once abc() completes. So we are dereferencing an invalid memory location in the main function.
However, consider the following code:
int* abc()
{
int m;
return(&m);
int p=9;
}
void main()
{
int* p=abc();
*p=32;
}
Here I am getting the same warning. But I guess that m will still retain its lifetime when returning. What is happening? Please explain the error. Is my justification wrong?
First, notice that int p=9; will never be reached, so your two versions are functionally identical. The program will allocate memory for m and return the address of that memory; any code below the return statement is unreachable.
Second, the local variable m is not actually de-allocated after the function returns. Rather, the program considers the memory free space. That space might be used for another purpose, or it might stay unused and forever hold its old value. Because you have no guarantee about what happens to the memory once the abc() function exits, you should not attempt to access or modify it in any way.
As soon as the return keyword is encountered, control passes back to the caller and the called function's locals go out of scope. Hence, all local variables are popped off the stack. So the last statement in your second example is inconsequential and the warning is justified.
Logically, m no longer exists when you return from the function, and any reference to it is invalid once the function exits.
Physically, the picture is a bit more complicated. The memory cells that m occupied are certainly still there, and if you access those cells before anything else has a chance to write to them, they'll contain the value that was written to them in the function, so under the right circumstances it's possible for you to read what was stored in m through p after abc has returned. Do not rely on this behavior being repeatable; it is a coding error.
From the language standard (C99):
6.2.4 Storage durations of objects
...
2 The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address,25) and retains its last-stored value throughout its lifetime.26) If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime.
25) The term "constant address" means that two pointers to the object constructed at possibly different times will compare equal. The address may be different during two different executions of the same program.
26) In the case of a volatile object, the last store need not be explicit in the program.
Emphasis mine. Basically, you're doing something that the language definition explicitly calls out as undefined behavior, meaning the compiler is free to handle that situation any way it wants to. It can issue a diagnostic (which your compiler is doing), it can translate the code without issuing a diagnostic, it can halt translation at that point, etc.
The only way you can make m still valid memory (keeping the maximum resemblance with your code) when you exit the function, is to prepend it with the static keyword
int* abc()
{
static int m;
m = 42;
return &m;
}
Anything after a return is a "dead branch" that won't be ever executed.
An int m declared inside the function is only valid locally. Instead, you could declare it as int *m, point it at dynamically allocated storage, and return that pointer directly.
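One way to read that last suggestion is to allocate the int dynamically, so that it outlives the call. The sketch below is an assumption about what was meant, not something stated in the answer, and abc_heap is a name invented to keep it apart from the original abc. The caller takes ownership of the memory and has to free it.

```c
#include <stdlib.h>

/* Hypothetical variant of abc(): returns a pointer to heap storage that
   remains valid after the function returns, or NULL on failure. */
int *abc_heap(void)
{
    int *m = malloc(sizeof *m);
    if (m != NULL)
        *m = 0;          /* give the object a defined initial value */
    return m;            /* the caller must call free() on this pointer */
}
```

A caller would check the result for NULL, write through it (for example *p = 32;), and call free(p) when done. Unlike the static version, each call yields a separate object.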
