Why do local variables use Stack in C/C++?
Technically, C does not use a stack. If you look at the C99 standard, you'll find no reference to the stack. It's probably the same for the C++ standard, although I haven't checked it.
Stacks are just implementation details used by most compilers to implement the C automatic storage semantics.
The question you're actually asking is, "why do C and C++ compilers use the hardware stack to store variables with auto extent?"
As others have mentioned, neither the C nor C++ language definitions explicitly say that variables must be stored on a stack. They simply define the behavior of variables with different storage durations:
6.2.4 Storage durations of objects
1 An object has a storage duration that determines its lifetime. There are three storage
durations: static, automatic, and allocated. Allocated storage is described in 7.20.3.
2 The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, and retains
its last-stored value throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to reaches the end of its lifetime.
3 An object whose identifier is declared with external or internal linkage, or with the
storage-class specifier static has static storage duration. Its lifetime is the entire
execution of the program and its stored value is initialized only once, prior to program
startup.
4 An object whose identifier is declared with no linkage and without the storage-class
specifier static has automatic storage duration.
5 For such an object that does not have a variable length array type, its lifetime extends
from entry into the block with which it is associated until execution of that block ends in
any way. (Entering an enclosed block or calling a function suspends, but does not end,
execution of the current block.) If the block is entered recursively, a new instance of the
object is created each time. The initial value of the object is indeterminate. If an
initialization is specified for the object, it is performed each time the declaration is
reached in the execution of the block; otherwise, the value becomes indeterminate each
time the declaration is reached.
C language standard, draft n1256.
No doubt that paragraph 5 was written with hardware stacks in mind, but there are oddball architectures out there that don't use a hardware stack, at least not in the same way as something like x86. The hardware stack simply makes the behavior specified in paragraph 5 easy to implement.
Local data storage – A subroutine frequently needs memory space for storing the values of local variables, the variables that are known only within the active subroutine and do not retain values after it returns. It is often convenient to allocate space for this use by simply moving the top of the stack by enough to provide the space. This is very fast compared to heap allocation. Note that each separate activation of a subroutine gets its own separate space in the stack for locals.
Stack allocation is much faster since all it really does is move the stack pointer. Using memory pools, you can get comparable performance out of heap allocation, but that comes with some added complexity and its own headaches.
With the heap there is another layer of indirection, since you have to go from
stack -> heap (following a pointer) before you reach the object. Also, the stack is
local to each thread and is inherently thread safe, whereas the heap is free-for-all
memory.
It depends on the implementation where variables are stored.
Some computers might not even have a "stack" :D
Other than that, it is usual to do some housekeeping when calling functions, to keep track of the return address and maybe a few other things. Instead of creating a separate housekeeping mechanism for local variables, many compiler implementations reuse the existing one, the stack, with only minimal changes.
Local variables are local to frames in the call stack.
Using a stack allows recursion.
Because the stack is the part of memory that is automatically discarded when the scope ends. This is why local variables are sometimes called "automatic". The local variables of one call are insulated from recursive or multithreaded calls to the same function.
Local variables are limited to the scope in which they can be accessed.
Using a stack lets control jump from one scope to another and, on return, continue with the local variables that were present initially.
When there is a jump, the local variables are pushed (saved) on the stack and the jump is executed. On returning to the scope, they are popped back off.
I noticed something that I'd like an answer to regarding pointers and variables, it's not really a problem but I'd like to understand the details. To put it simply, to bypass using global variables, I've been starting to have functions return pointers to the data inside the function. And sometimes, that data needs to be static, other times it doesn't.
For example, let's say I have a function that creates a 2d char array, populates it with 'a' and returns a pointer pointing to that array. When the caller tries to use that pointer to access and modify the memory where the 2d array was, random data is there instead of the 2d array's contents. I found out that declaring the 2d array as static fixes this, because the array's contents are saved outside of its scope.
Now on the other hand, let's say I have a function that declares a struct, and initializes it with values. The function then returns a pointer to that struct variable. The caller should have nonsense data when trying to access the struct's values through the pointer but interestingly, it doesn't. I would think that like in the case of the 2d array, after the function call, since the struct is not static, the data at that memory should be freed. However, it's not like that, instead I can access all the struct's elements through the pointer despite it not being static.
Overall, after a function is finished, the caller can access and modify the data of variables created in that function as long as that data was static (in the 2d array case) but sometimes the data of variables is retained outside of scope despite not being labeled static (struct case). Why?
Your observations about data in a two-dimensional array of char or a struct appearing to persist or not after a function returns are the results of happenstance, not behavior defined by the C standard, and you may not rely on them. They were merely artifacts of how your C compiler behaved in particular circumstances. They may change in other circumstances and may not be relied on.
Whenever you provide an object to be used after a function returns (as by returning a pointer to the object), it must not have automatic storage duration. It may have static storage duration, allocated storage duration, or thread storage duration.
First, let us clarify some terminology. This is important for understanding the concepts. The C standard does not use the term “global variables”. It generally does not use “variable” to describe objects and does not use “global” for them at all. What you think of as a variable is, in the C standard, an identifier and an object. The identifier is the name of the object, and the object is a region of data storage that can represent values.
Whether the C standard guarantees an object can be used at a certain time depends on the object’s lifetime. Lifetime is when during program execution the object exists in the C model of computing. Lifetime is determined in part by its storage duration. The storage duration depends on how and where the object was defined or created.
Lifetime is also affected by the scope of an object’s identifier. Scope is where in the source code an identifier is visible (can be used). There are relationships between scope and lifetime, but they are distinctly different things.
There are four storage durations (and a special temporary storage duration, which I will not discuss here):
If an identifier is declared with _Thread_local, its object has thread storage duration. Its lifetime starts when the thread is created and ends when execution of the thread ends. So it can be used after a function returns as long as its thread is still executing.
Otherwise, if an identifier is declared with static or with extern or outside of any block or list of function parameters (effectively outside of any function), its object has static storage duration. It exists for the entire execution of the program, so it can be used any time during execution.
Otherwise, for any identifier for an object (rather than a type definition, function, or such), its object has automatic storage duration. It is associated with the statement block it is declared in. (A block is a sequence of statements inside braces, { ... }. This can be the main block that defines a function or a block nested within it.) Its lifetime ends when execution of the associated block ends. The C standard makes no guarantee about what happens when you attempt to use an automatic object after execution of its block ends.
Note that when a function calls a subroutine, execution of the function, including the blocks within it, is suspended temporarily, but it is not ended. (Execution ends when the function returns, or special routines like abort, exit, or longjmp are called.) This means that the object still exists while the subroutines are executing. This is true even though the source code in the subroutines has a different scope than the calling function.
Because of the above, statements that say you cannot use objects outside of their scope are false. Scope is not the determining factor in whether an object may be accessed. Lifetime is.
For the fourth storage duration:
Objects with allocated storage duration are created by malloc, calloc, realloc, and aligned_alloc and do not have names (identifiers). The lifetime of an allocated object extends from when it is allocated to when it is deallocated. So, if a function allocates an object and returns a pointer to it, that pointer may be used to access the object until the object is deallocated.
From Programming Language Pragmatics, by Scott
Object lifetimes generally correspond to one of three principal
storage allocation mechanisms, used to manage the object’s space:
Static objects are given an absolute address that is retained throughout the program’s execution.
Stack objects are allocated and deallocated in last-in, first-out order, usually in conjunction with subroutine calls and returns.
Heap objects may be allocated and deallocated at arbitrary times. They require a more general (and expensive) storage management
algorithm.
For example, in C, static objects must be initialized with constant expressions (expressions which can be evaluated at compile time).
I am not sure whether it is the case in other languages and even what other languages also have static objects.
In general, must static objects be initialized? When initialized, must they be initialized with expressions which can be evaluated at compile time?
By initialization, I mean either explicit or implicit (i.e. automatically done by the language implementation), as opposed to being left uninitialized.
So to rephrase my question: generally, can static objects be left uninitialized by either programs or compilers?
Thanks.
A static variable will be initialized to "zero" automatically, unless you explicitly initialize it.
Other than that and the life-time or linkage part, it's no different than any other variable, which means you can initialize it the same way you initialize any other variable.
#include <stdio.h>

void function(void)
{
    int stackVar = 10;
    printf("Stack variable = %d\n", stackVar);
}

int main(void)
{
    function();
    return 0;
}
What happens to the stack frame of function when it returns?
This is undefined behaviour (as opposed to implementation-defined or unspecified). This means that the program is free to misbehave, or not, in any way it pleases.
This is spelled out in 6.2.4 Storage durations of objects:
1 An object has a storage duration that determines its lifetime. There are three storage
durations: static, automatic, and allocated. Allocated storage is described in 7.20.3.
2 The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, and retains
its last-stored value throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to reaches the end of its lifetime.
3 An object whose identifier is declared with external or internal linkage, or with the
storage-class specifier static has static storage duration. Its lifetime is the entire
execution of the program and its stored value is initialized only once, prior to program
startup.
4 An object whose identifier is declared with no linkage and without the storage-class
specifier static has automatic storage duration.
5 For such an object that does not have a variable length array type, its lifetime extends
from entry into the block with which it is associated until execution of that block ends in
any way. (Entering an enclosed block or calling a function suspends, but does not end,
execution of the current block.) If the block is entered recursively, a new instance of the
object is created each time. The initial value of the object is indeterminate. If an
initialization is specified for the object, it is performed each time the declaration is
reached in the execution of the block; otherwise, the value becomes indeterminate each
time the declaration is reached.
Firstly, you've edited the question dramatically, so other answers are (somewhat unfairly) no longer relevant. Still, to answer the current question:
What happens to the stack frame of function when it returns?
It seems to me you lack a general feel for how the stack operates. So - going a bit crazy here - but will try an analogy that might make it "click". You can imagine the stack frame as being like waves on the beach. The more deeply nested function calls get, and the more data those functions have in parameters and local variables, the more memory is in use. That's like waves reaching further up the beach. As scopes exit the memory is effectively released - the use to which that memory was put is forgotten. So too do waves recede. Still, throughout the lifetime of the program as different sequences of functions enter and exit, the same memory (level of the beach) is reused (under water) and forgotten (not under water). The bits furthest up the beach tend to be covered least often and for short durations, while some stays underwater until the weakest point of low tide... similarly things like recursive functions that aren't tail-recursion optimised can use a lot of memory briefly, but the stack variables created directly in main() stay there until program termination.
This is undefined behavior.
You are returning the address of a local variable from the function, and the stack frame is destroyed (the variable goes out of scope) when the function returns.
If that memory has not been overwritten, you will get the same value; otherwise you will get garbage.
You are invoking undefined behavior.
When you return from the function, the stack frame is destroyed (goes out of scope) and it might be you receive the value you left in the function, but this would be a coincidence.
See Wikipedia for some examples and here for an article.
Yesterday I had an interview where the interviewer asked me about the storage classes where variables are stored.
My answer was:
Local variables are stored on the stack.
Register variables are stored in registers.
Global and static variables are stored in the data segment.
Dynamically allocated memory is stored on the heap.
The next question he asked me was: why are they stored in those specific memory areas? Why is a local variable not stored in a register (even though I may need an auto variable that is used very frequently in my program)? And why are global or static variables not stored on the stack?
Then I was clueless. Please help me.
Because the storage area determines the scope and the lifetime of the variables.
You choose a storage specification depending on your requirement, i.e:
Lifetime: The duration you expect the particular variable needs to be alive and valid.
Scope: The scope(areas) where you expect the variable to be accessible.
In short, each storage area provides a different functionality and you need various functionality hence different storage areas.
The C language does not define where any variables are stored, actually. It does, however, define three storage durations: static, automatic, and allocated (often called dynamic).
Static variables are created during program initialization (prior to main()) and remain in existence until program termination. File-scope ('global') and static variables fall under the category. While these commonly are stored in the data segment, the C standard does not require this to be the case, and in some cases (eg, C interpreters) they may be stored in other locations, such as the heap.
Automatic variables are local variables declared in a function body. They are created when or before program flow reaches their declaration, and destroyed when they go out of scope; new instances of these variables are created for recursive function invocations. A stack is a convenient way to implement these variables, but again, it is not required. You could implement automatics in the heap as well, if you chose, and they're commonly placed in registers as well. In many cases, an automatic variable will move between the stack and registers during its lifetime.
Note that the register annotation for automatic variables is a hint - the compiler is not obligated to do anything with it, and indeed many modern compilers ignore it completely.
Finally, dynamic objects (there is no such thing as a dynamic variable in C) refer to values created explicitly using malloc, calloc or other similar allocation functions. They come into existence when explicitly created, and are destroyed when explicitly freed. A heap is a convenient place to put these - or rather, one defines a heap based on the ability to do this style of allocation. But again, the compiler implementation is free to do whatever it wants. If the compiler can perform static analysis to determine the lifetime of a dynamic object, it might be able to move it to the data segment or stack (however, few C compilers do this sort of 'escape analysis').
The key takeaway here is that the C language standard only defines how long a given value is in existence for. And a minimum bound for this lifetime at that - it may remain longer than is required. Exactly how to place this in memory is a subject in which the language and library implementation is given significant freedom.
It is actually just an implementation detail that is convenient.
A compiler could, if it wanted to, create local variables on the heap.
It is just easier to create them on the stack, since when leaving a function you can adjust the stack pointer with a simple add or subtract (depending on the growth direction of the stack) and so automatically free the used space for the next function. Creating locals on the heap, however, would mean more housekeeping work.
Another point is that local variables need not be created on the stack at all; they can be stored and used just in a register if the compiler thinks that's more appropriate and has enough registers to do so.
Local variables are stored in registers in most cases. Because registers are pushed onto and popped from the stack when you make function calls, it looks as if they are on the stack.
There is actually no such thing as register variables, because register is just a rarely used keyword in C that asks the compiler to try to keep the variable in a register. I think that most compilers just ignore this keyword.
That is why he asked you more: he was not sure whether you understand the topic deeply. The fact is that register variables are virtually on the stack.
In embedded systems we have different types of memory to use: read-only non-volatile (ROM, PROM), read-write non-volatile (EEPROM, NVRAM, flash), and volatile (RAM, including SRAM). We also have different requirements for our data: cannot change and must persist across power cycles, can change and must persist across power cycles, or can change at any time. We have different sections because we have to map these requirements onto the available types of memory as well as possible.
Advanced Programming in the UNIX Environment by W. Richard Stevens states:
"What are the states of the automatic variables and register
variables in the main function?"
with regard to what happens when you longjmp back to main (or another function) from somewhere lower down the stack.
It goes on to say:
"It depends. Most implementations do not try to roll back these
automatic variables and register variables, but all that the standards
say is that their values are indeterminate. If you have an automatic
variable that you don't want rolled back, define it with the
volatile attribute. Variables that are declared global or static
are left alone when longjmp is executed."
It seems like he's saying that normal stack-variables will not have their values set back to what they were at the time of the setjmp - but then the rest of the function couldn't rely on its stack variables after the longjmp back to it which seems crazy, so I'm guessing I'm wrong.
Can someone define "automatic variables" for me and explain what specifically isn't set back to its original value and why that is?
All it's saying is that if
you have an automatic (function-local non-static) variable that's not declared volatile; and
you change the value of the variable between setjmp and longjmp
then after the longjmp the value of that variable becomes indeterminate.
I believe this has to do with the possibility of such variables residing in CPU registers rather than in RAM, and the associated difficulty of preserving the values of such variable across the longjmp.
Here is a quote from the gcc manual:
If you use longjmp, beware of automatic variables. ISO C says that
automatic variables that are not declared volatile have undefined
values after a longjmp. And this is all GCC promises to do, because
it is very difficult to restore register variables correctly, and one
of GCC's features is that it can put variables in registers without
your asking it to.
If the potential loss of variable values is a problem in your use case, declare the relevant variables as volatile.
"Automatic variables" is an old term for ordinary (not declared with register or static) local variables, which goes back to terminology used in the C standard and the original meaning of the auto keyword. See sections 6.2.4 and 6.7.1 of the standard
As for this:
but then the rest of the function couldn't rely on its stack variables after the longjmp back to it which seems crazy
The idea is that you're not supposed to modify them in the first place if you're going to longjmp because then you can't know what's going to happen.
The reason is that longjmp may restore state such as processor registers, which automatic variables may have been mapped to. There is no guarantee that they will be on "the stack" or in memory at all, and even if they do exist in memory, some operations may not access the memory directly (unless the variable is declared volatile) but may instead access a processor register into which the value has already been loaded.
Your question is kind of odd because it implies you would want them to be restored [i.e. your modifications in intervening functions to be erased] - in general this caveat is warning that they may be restored by accident when it's not expected. "Not restored" doesn't mean "unusable" [though the standard DOES declare them unusable because it might restore a cached register but not the memory so you'll get inconsistent results], it means "has the value a later function wrote to it (because you passed the address intending for it to be written to)".
Automatic variables are regular function-local variables. They are called automatic because they are (typically) allocated on the stack and you do not have to manage their memory yourself.
See http://en.wikipedia.org/wiki/Automatic_variable for a more in-depth description.
I can't define an "automatic variable", but maybe this will help you understand what happens during a setjmp:
(some of) the CPU registers are saved into a buffer (the jmp_buf).
and during a longjmp:
the value of the CPU registers is reset to the saved value.
nothing else! (here is an example)
So during the longjmp, you just come back higher in the stack, with all your variables saved in memory untouched, and some (not all) of the registers, in particular the stack pointer and the instruction pointer, reset to the values they had during the setjmp.
auto means anything that's local to the function and hasn't specifically been defined as static. It's probably worth noting that the standard specifies pretty much the behavior he states (§7.13.2/3):
All accessible objects have values, and all other components of the abstract machine
have state, as of the time the longjmp function was called, except that the values of
objects of automatic storage duration that are local to the function containing the
invocation of the corresponding setjmp macro that do not have volatile-qualified type
and have been changed between the setjmp invocation and longjmp call are
indeterminate.