I'm trying to learn c programming and can't understand how stacks work.
Everywhere I read I find that when a function is called stack frame is created in the stack which contains all the data for the function call- parameters, return address and local variables. And the stack frame is removed releasing the memory when the function returns.
But what if we had a compound statement inside the function which have its own variables. Is the memory for the local variables for block is also allocated inside the stack frame when the function call and released when it returns.
Example
int main(){
int a = 10;
if(int a<50){
int b=9;
}
else{
int c=10;
}
}
Is the memory for b and c is allocated with a when the function starts executing?
And deallocated when the function returns?
If so than there is no difference other than the visibility of the variable when declaring it in the beginning of the function or inside a another block in the function.
Please explain.
The C standard doesn't specify how such things are to be implemented. The C standard doesn't even mention a stack! A stack is a common way of implementing function calls but nothing in the standard requires a stack. All such things are implementation specific details. For the posted code, the standard only specifies when the variables are in scope.
So there is no general answer to your question. The answer depends on your specific system, i.e. processor, compiler, etc.
Provided that your system uses a stack (which is likely), the compiler may reserve stack space for all 3 variables or it may reserve space for 2 variables, i.e. one for awhile b and c share the other. Both implementations will be legal. The compiler is even allowed to place the variables directly in some registers so that nothing needs to be reserved on the stack.
You can check your specific system by looking at the generated assembly code.
A C implementation may implement this in multiple ways. Let’s suppose your example objects, a, b, and c, are actually used in your code in some way that results in the compiler actually allocating memory for them and not optimizing them away. Then:
The compiler could allocate stack space (by decreasing the top-of-stack pointer) for all of a, b, and c when the function starts, and release it when the function ends.
The compiler could allocate stack space for a when the function starts, then allocate space (again by decreasing the stack pointer) in the middle of the function when space for b or c is needed, then release that stack space as each block ends.
In a good modern compiler, the compiler is likely to analyze all the active lifetimes of the objects and find a somewhat optimal solution for using stack space in overlapping ways. By “active lifetime”, I mean the time from when the value of an object is set to the last time that value is needed (not the C standard’s definition of “lifetime”). For example, in int a = f(x); … g(a); h(y); a = f(y); … g(a);, there are actually two lifetimes for a, from its initial assignment to the first g(a) and from the assignment a = f(y); to the second g(a);. If the compiler needs memory to store a, it might use different memory for these two lifetimes.
Because of the above, what memory is used for which C object can get quite complicated. A particular memory location might be used for a at one time and for b at another. It may depend on loops and goto statements in your code. It also depends on whether the address of an object is taken—if the address is taken, the compiler may have to keep the object in one place, so that the address is consistent. (It might be able to get away without doing that, depending on how it can see the address is used.)
Basically, the compiler is free to use the stack, other memory, and registers in whatever way it chooses as long as the observable behavior of your program remains as it is defined by the C standard.
(The observable behavior is the input/output interactions of your program, the data written to files, and the accesses to volatile objects.)
Your example as stated is not valid since you have no brackets in the if-else statement. However, in the example below all variables are typically allocated when the function is entered:
int main(void)
{
int a = 10;
if (a < 50) {
int b = 9;
} else {
int c = 10;
}
}
As mentioned by user "500 - Internal Server Error", this is an implementation issue.
Related
If I have a for loop:
for (uint8_t x = 0; x < 100; ++x) {
char f[2000] = {0,};
}
what actually is going on here? Is this reusing the same memory address every time so it only actually allocates it once, or is it using a new memory location each loop? Does it depend on optimization levels? What about the implementation?
There is no guarantee that f is allocated in the same memory address every time around the loop, but it should be very hard to find a compiler that doesn't do it that way. I have seen f get trashed in the loop iteration, so yes it really does go out of scope at the closing brace.
A modern compiler will almost always gather up all the variable declarations within a function, analyze for overlapping runtime scope (not compile time--It's smart enough now to figure out when a variable isn't used any further down in the function most of the time), and allocate all the fixed* memory on the stack at function entry time. This runtime call is still much cheaper than malloc().
Stack allocation is done with a single instruction for all the variables: sub rbp, constant. Freeing the memory is done the same way: add rbp, constant. This buffer is almost half the size (4KB) where the compiler has to emit a call to the runtime to verify the stack has enough room. On Windows, this function is called _chkstk.
*Flexible arrays are allocated with a stretchy stack element near their first use.
When one iteration of the loop ends and a new one begins, the lifetime of the array f ends and the lifetime of a new array f starts. As far as the C standard is concerned, each is distinct from the other.
In practice, the compiler may use the same memory address for each iteration of f or it might not. This is not something that can be depended on. Certain optimizations may take advantage of the fact that the lifetime of f ends at the end of the loop body and do things that appear strange in non-compliant code.
In short, stick with what the C standard says is valid and don't make any assumptions about f.
In dynamic memory allocation, the memory which is used by malloc() and calloc() can be freed by using free().
In static memory allocation, when is the variable in main() freed from the program?
In a tutorial, I learned that when the whole programs is finished then after all variables are freed from RAM.
But someone tells me that when the program is long enough and variable is used early and after that if the variable has no use in the whole code then compiler will automatically free the variable before the end of program.
Can someone please clarify if both statements are correct or not?
The language guarantees that the lifetime of a static storage duration variable is the whole program. So it can be safely accessed at any time.
That being said, the standard only requires the code produced by a conformant compiler to behave as if all language rules were observed. That means that for optimization purposes a compiler is free to release the memory used by a static variable if it knows that the variable will not be used passed a point. Said differently it is neither required nor forbidden and as a programmer you should not even worry for that except for very specific low level optimization questions.
Example:
...
int main() {
static int arr[10240];
// uses arr
..
// no uses of arr passed this point - called B
...
}
The program shall behave as is the array arr existed from the beginning of the program until its end. But as long as arr is not used past point B, the compiler may reuse its memory, because there will be no change in any observable state.
This in only a low level optimization point allowed (but not required) by the as if rule.
As you can see in this question: Why does C not require a garbage collector?
Stephen C, the author of the answer says that:
The culture of these languages (C and C++) is to leave storage
management to the programmer.
Would the correct answer be when the process is terminated all memory used is freed? I think yes. C compiler does not look for garbage or non reachable variables, this is a progammer work.
But I have read that C or C++ garbage collectors exists like Java, they can be useful but remember, the implementation will be slower.
Again, I recommend to you read the question I have attached at the beggining for more information.
In a tutorial, I learned that ...
This tutorial is talking about what you as a programmer can see (unless you are debugging your program):
If you write a program, you can rely on the fact that you can read a static variable until the program has finished.
But someone tells me ...
This person is talking about what is really happening in the background (not visible to the programmer unless you are debugging) when you use an "optimizing" compiler.
... when is the variable ... freed from the program?
We have to distinguish between three types of variables:
Local variables
Local variables reside on the stack. When the function returns, all memory allocated on the stack during the execution of the function is automatically freed.
For this reason, local variables are freed when the function returns or even earlier.
In the following function:
void myFunction(void)
{
int a, b;
a = func1(); /* Line "1" */
func2();
func3(a); /* Line "2" */
b = func4(); /* Line "3" */
func5();
func6(b); /* Line "4" */
}
... an "optimizing" C compiler will detect that the variable a is not needed any longer when the variable b is set. For this reason, it may allocate only enough memory for one int variable. This is as if you only defined one variable (a_and_b) and the compiler replaced both a and b by a_and_b:
int a_and_b;
a_and_b = func1();
...
func6(a_and_b);
If you debug the program, the debugger will tell you that variable b does not exist when debugging lines "1"-"2"; and it will tell you that variable a does not exist when debugging lines "3"-"4".
For this reason, you might say that variable a is "freed" after line "2".
Global variables
Global variables reside in the .data or in the .bss section.
On modern desktop computers (on microcontrollers and on MS-DOS computers it is different) the operating system allocates this memory before the program is started and the operating system frees this memory after the program has finished.
Theoretically, it might be possible to optimize global variables the same way as local variables; this means: To use the same memory for two different global variables if one variable is set the first time after the other one is no longer needed.
However, this would be very complicated because a global variable can be accessed from any C source file in the project. For this reason, I doubt that many compilers and linkers have such a feature.
Static variables
Variables marked with the keyword static are normally also stored in .data and .bss sections - just like global ones.
However, because they can only be accessed from one C source file (or even from only one function), it is much easier to detect that such a variable is no longer used at a certain point in time.
For this reason, a compiler may optimize a static variable the same way as a local variable (so the memory is shared between two variables) or even replace a static variable by a local one (on the stack).
One Example:
int someFunction()
{
static int a;
int b;
a = func7();
b = func8(a); /* Line "5" */
return b + func9(); /* Line "6" */
}
In this case, the program behaves the same way if the variable a is not static. For this reason, we can replace static int by just int.
Now we see that a is no longer read after writing to b. We can replace the two variables a and b by one variable a_and_b.
If the "optimizing" compiler does this, you will see some message "variable does not exist" in the debugger if you stop your program in line "6" and want to read the value of a.
You may say that the variable a has been "freed" in line "5".
In other words, will compilers allocate enough space in the program stack to store all variables at the deepest level of block nesting in the current function or do they look at liveness and the scope of variables too?
void zoo(int num) {
if (num) {
int a = foo();
bar(a);
} else {
int b = foo();
bar(b);
}
}
For example the above code will be assigned different offsets on the stack for a and b, even though, if they were assigned only one offset (e.g. rbp - 8) it would have been legal too. My question is that will compilers like gcc and clang ever output assembly where multiple variables are assigned the same static offset?
Is there anything in the specifications about this?
I want to know if there is a unique mapping between source variables and the stack offsets present in a compiled assembly file.
There is, in general, no unique mapping between objects with automatic storage duration (“local” objects defined inside a function or block) and stack offsets. I have seen compiler-generated code reuse the same stack location for different objects, either because the use of one did not overlap the use of the other in the C code or because the compiler had moved one into a register for whatever purposes and no longer needed to use the stack location for it.
The C and C++ standards do not require implementations to implement their stack allocation in any particular way. They are free to reuse stack locations. They are also free to allocate all the stack space that might be needed1 or to wait to see if particular blocks are entered or not before further allocating stack space for the objects inside those blocks.
Note
1 Implementations that support variable-length arrays generally must wait until the size of the array can be determined before allocating space for it.
Think of the situation as follow.
int main(void)
{
...
int a=10; //a get into the stack.
int b=20; //b get into the stack.
int c=30; //c get into the stack.
...
}
As we know, the “stack segment” is satisfied with the storage approach of "stack data structure"; and here, the local variables a, b, and c are exactly stored in such a direction of memory, so in theory we can only access to the element at the top of the stack.
But what if we do something like this?
printf("b = %d",b);
Local variable b is in the middle of a and c, but we can get it.
So...can we say that we can directly get the element in the middle of the stack?
Here is the image of " a, b, and c stored in stack "
the local variables a, b, and c are exactly stored in such a direction of memory
I don't know from where you got this but this is not true, at least in modern compilers.
First of all, C itself doesn't specify anything about using a stack. How the function calls are implemented is implementation defined. Lots of common implementations use stack like data structure to implement function call in the sense that last called function will be returned first.
But this doesn't mean that the local variables are stored in stack like structure. There are lots of options to the compiler like:
It can eliminate the variable completely if that is not needed in run-time.
It can place the variables in register.
It can re-order variables.
In all of these cases the only thing that compiler guarantees that the observable behavior of the code is not changed.
Since it doesn't store variables in stack like data structure, it has no problem to access them in middle.
First of all C standard never utter stack` anywhere. Now moving from standard - implementors may implement such a way that local variables are stored in stack memory as part of function frame.
Now you are thinking - stack is always accessed on top but here we can access the variable b directly. Though it is in the middle somewhere. Nope you are wrong - the thing is the frame of the functions are which are stored in stack and popped off when done(though an implementation can do it other way also - but speaking in general). It is not the variables that are the unit of operations here.
By accessing b we are not violating any rule of stack data structure. The function frames are the ones which are accessed in LIFO manner not the variables inside those frames.
Also it's a bit out of context now a days to segregate like that. We can simply say that they have automatic storage duration. And that's it. They can be implemented in group of registers also(standard won't stop them). The function frames are the one which will likely to be have that stack data-structures behavior.
Usage of the stack to store automatic variables is only an implementation detail, nothing is required by the standard. But at a lower level it is indeed the most common implementation. It is because in processors, a special register (stack pointer) is used to store return addresses in function calls (instructions call and return). When automatic variables are also stored in that stack, it is trivial to reclaim there storage back at return time or at end of the block. But they are not individually pushed onto a stack: a frame pointer is used to store the memory zone for the current block (including a reference for upper frames) and the stack pointer is increased in one single operation for the size of the frame containing all the local variables. Then those variables are known by the offset to the current frame pointer. So they are known by their own address and not as elements of a stack.
If I type int x is it using sizeof(int) bytes of memory now? Is it not until x has a value?
What if x = b + 6...is x given a spot in memory before b is?
Yes, as soon as you declare a variable like:
int x;
memory is, generally, allocated on the stack. That being said, the compiler is very smart. If it notices you never use that variable, it may optimize it away.
If I type int x is it using sizeof(int) bytes of memory now? Is it not until x has a value?
Once you declare a variable like int x; it will be taking up space in memory (4 bytes in the case of an int). Giving it a value like x = 5 will just modify the memory that is already being taken up.
What if x = b + 6...is x given a spot in memory before b is?
For this statement to be valid, both x and b must have been declared before this statement. As for which one was allocated in memory first, that depends on what you did before this statement.
Example:
int x = 5;
int b = 6;
x = b + 6; //your code
In this case, x was allocated in memory before b.
Actually when you make a function call, the space required for all locally declared variables are already allocated.
When you compile your C function and convert it to assembly compiler adds procedure prolog which actually relocates the stack pointer for opening space for function parameters, return value, local variables and couple of more values to manage function call.
The order of allocation of local variables is compiler dependent and doesn't necessarily have to be in the order of declaration or usage.
When you use a variable before assigning any value. CPU just uses what was in the already allocated memory. It may be 0, it may be some garbage value or it may even be a value that your previous function call left there. Totally depends on your programs execution, operating system and compiler.
So it is one of the best exercises that always initialize what you have declared as soon as possible. Because you may use the variable by mistake before assigning a value. And if it contains the correct value that you intended to assign (lets say 0, which is more probable) than it will work for some time. But later all of a sudden you program may change the behavior that you didn't expect though it was working perfectly before. And debugging might be a pain because of your assumptions.
Depending on where the variable is defined it may be assigned space
for global and static variables at compile time;
for local variables at run time when the function is entered (not when the definition is encounterd -- #Faruxx was pointing that out);
for objects (not variables) which are allocated dynamically via malloc obviously at run time when malloc is executed. Malloc will typically request a lot more memory from the system (a page? 4k?) for the program's address space and slice it up into byte sized bits (cough) during subsequent calls.
A declaration like extern int size; will not allocate or occupy any memory but refers to a memory location which will be resolved only by the linker (if the actual definition is in another translation unit) to some memory reserved at compile time there.
Peformance:
Global variables will in principle impact startup performance because global memory is zero initialized. This is obviously a one time penalty and negligible except for extreme cases. A larger executable will also need longer to start.
Variables on the stack are completely performance penalty free. They are not initialized (exactly for these performance reasons) and the assembler code for incrementing the stack pointer couldn't care less about the increment. The maximum stack size is fixed at startup to a few kB or MB (and the program will typically crash with, you guess it, when it tries to increase the stack beyond its limit); so there is no potentially expensive interaction with the operating system when the stack grows.
Allocations on the heap carry a comparatively large performance penalty because they always involve a function call (malloc) which then actually needs to do some work, plus a potential operating system call (put a memory segment into the program's address space), plus the need to pair each malloc with a delete in long-running programs which must not leak. (When I say "comparatively large" I mean "compared to local variables"; on your average PC you don't need to think about it except in the most inner loop.)