When/where are local arrays allocated? - c

https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-and-C.html describes automatic allocation of local variables. I understand that local variables are commonly allocated on the stack. I can imagine how an int might be allocated on the stack; just push its value. But how might an array be allocated?
For example, if you declare an array char str[10];, does that 10 bytes of space go on the stack, or is it allocated somewhere else, and only the str pointer is pushed to the stack? If the latter, where is the 10 bytes of space allocated?
Furthermore, when exactly are local variables, including arrays, allocated? I commonly see heap allocation referred to as "dynamic allocation", implying that automatic variables are not dynamically allocated. But automatic variables may be declared within flow-of-control constructs and function bodies, so the compiler can't possibly know before runtime exactly how much space will be occupied by automatic variables. So automatic variables must also be dynamically allocated, right?
Edit: I would like to emphasize the first half of this question. I am most interested in understanding when and where the space for local arrays is allocated. On the stack? Somewhere else?
Edit 2: I made a mistake when I originally included the C++ tag for this question. I meant to ask only about the C language and its implementations. I apologize for any confusion.

In the C 2018 standard, clause 6.2.4, paragraphs 6 and 7 tell us about the lifetimes of objects with automatic storage duration. Paragraph 6 covers such objects that are not variable length arrays:
… its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time.
Thus, if we have this code:
{
    PointA;
    int x = 3;
    PointB;
}
then x exists in the C model as soon as execution reaches PointA—its block was entered, and that is when the lifetime of x begins. However, although x already exists at PointA, its value is indeterminate. The initialization only occurs when the definition is reached.
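To make this concrete, here is a minimal sketch (the function name skip_initializer is made up for illustration). C allows a goto to jump past the declaration of an object that is not a variable length array. Because the lifetime of x began when the block was entered, x can be assigned and read after the jump even though its initializer never ran:

```c
/* Hypothetical example: x exists for the whole block, but its
   initializer only runs if execution reaches the declaration. */
int skip_initializer(void)
{
    goto after;     /* jumps over the declaration; legal in C for non-VLA objects */

    int x = 3;      /* never executed, so x is never set to 3 */

after:
    x = 7;          /* fine: the lifetime of x began at block entry */
    return x;       /* returns 7 */
}
```

Reading x before the assignment at the label would use an indeterminate value, which is why the sketch assigns first.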
Paragraph 7 tells us about variable length arrays:
… its lifetime extends from the declaration of the object until execution of the program leaves the scope of the declaration.
So, if we have this code:
{
    PointA;
    int x[n]; // n is some variable.
    PointB;
}
then x does not exist at PointA. Its lifetime begins when int x[n]; is reached.
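A small sketch of this, assuming a compiler that supports VLAs (the function name vla_bytes is made up). The storage for x is sized when the declaration is reached, which is also why sizeof applied to a VLA is evaluated at run time:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reports how many bytes an n-element VLA occupies. */
size_t vla_bytes(size_t n)
{
    assert(n > 0);      /* a VLA must have a positive length */
    int x[n];           /* the lifetime of x begins here, not at block entry */
    return sizeof x;    /* n * sizeof(int), computed at run time */
}
```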
Keep in mind this existence is only in terms of C’s abstract model of computing. Compilers are allowed to optimize code as long as the observable results (such as output of the program) are the same. So the actual code generated by a compiler might not create x when the block is entered. (It might not create x at all; it could be optimized away completely.)

For example, if you declare an array char str[10];, does that 10 bytes of space go on the stack, or is it allocated somewhere else, and only the str pointer is pushed to the stack? If the latter, where is the 10 bytes of space allocated?
In general, the array's storage is allocated on the stack, just like any other local variable. This is compiler- and target-specific. Even on an x86_64 machine, a 4-billion-byte array is probably not allocated on the stack. I'd expect one of: a compile error, a link error, a runtime error, or it works somehow. In the last case, the implementation might call new[] or malloc() and leave a pointer to the array on the stack in place of the array.
Notice that the array and its storage are the same thing: str names the 10 bytes themselves, and there is no separate pointer object. Your wording "allocated somewhere else, and only the str pointer" might therefore indicate confusion. The allocation and the name for it are not independent data.
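One way to check this on a given implementation is the sketch below (the function name is made up). It relies only on two facts the language guarantees: sizeof applied to the array yields the size of the whole array, and the address of the array equals the address of its first element.

```c
/* Hypothetical check: returns 1 if str is the 10 bytes themselves. */
int array_is_its_own_storage(void)
{
    char str[10];

    /* sizeof sees the whole array, not a pointer */
    int size_ok = (sizeof str == 10);

    /* the array object starts exactly where its first element starts */
    int same_address = ((void *)&str == (void *)&str[0]);

    return size_ok && same_address;
}
```

If str were a separate pointer object, sizeof str would be the size of a pointer and &str would differ from &str[0].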

What you ask depends on the language implementation (the compiler). To answer your question, this is (a simplified overview of) what compilers usually do for compiled languages (like C/C++):
When the compiler finishes parsing a function, it keeps a symbol table of all local variables declared in this function, even those declared "syntactically" during the instruction flow of the function (like local loop variables). Later, when it needs to generate the final (assembly) code, it generates the necessary instructions to reserve (by pushing, or just by moving the stack pointer) sufficient space for all local variables. So, local loop variables, for instance, are not allocated when the loop starts execution. Rather, they are allocated at the beginning of the execution of the function containing the loop. The compiler also adds instructions to remove this allocated stack space before returning from the function.
So automatic variables, like your char array, are allocated entirely on the stack in this (common) scenario.
[EDIT] Variable length arrays (before C99)
The discussion above was for arrays having lengths known at compile time like this:
void f () {
    char n[10];
    ....
}
If we stay in C language terms (before C99), variable-length arrays (arrays whose lengths are not known at compile-time, but rather at runtime) are declared as a pointer like this:
void f() {
    char *n;
    ... //array is later allocated using some kind of memory allocation construct
}
This, in fact, just declares a pointer to the array. The size of a pointer is known to the compiler. So, as I said above, the compiler is able to reserve the necessary storage for the pointer on the stack (just the pointer, not the real array) regardless of what the size of the array will be at runtime. When execution reaches the line that allocates the array (using malloc, for instance), the array storage is dynamically allocated on the heap, and its address is stored in the local automatic variable n. In languages without garbage collection, the reserved storage must be freed (deallocated) from the heap manually (i.e. the programmer should add an instruction to do so when the array is no longer needed). This is not necessary for constant-sized arrays (which are allocated on the stack), because the compiler removes the stack frame before returning from the function, as I said earlier.
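The pointer-plus-malloc pattern described above can be sketched as follows. The function name and the summing logic are made up for illustration, and count is assumed to be at least 1:

```c
#include <stdlib.h>

/* Hypothetical helper: returns 0 + 1 + ... + (count - 1),
   or -1 if the allocation fails. */
long sum_with_heap_array(size_t count)
{
    /* Only the pointer n lives in automatic storage. */
    int *n = malloc(count * sizeof *n);
    if (n == NULL)
        return -1;

    long total = 0;
    for (size_t i = 0; i < count; i++) {
        n[i] = (int)i;      /* the elements live on the heap */
        total += n[i];
    }

    free(n);                /* the programmer must release the heap storage */
    return total;
}
```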
[EDIT2]
C99 variable length arrays cannot be given a fixed-size slot in the stack frame at compile time. The compiler must add code to the resulting machine code that handles their creation and destruction at runtime.

Related

How stack structure works with compound statements inside a function?

I'm trying to learn c programming and can't understand how stacks work.
Everywhere I read I find that when a function is called stack frame is created in the stack which contains all the data for the function call- parameters, return address and local variables. And the stack frame is removed releasing the memory when the function returns.
But what if we had a compound statement inside the function that has its own variables? Is the memory for the block's local variables also allocated inside the stack frame when the function is called, and released when it returns?
Example
int main(){
    int a = 10;
    if(int a<50){
        int b=9;
    }
    else{
        int c=10;
    }
}
Is the memory for b and c allocated along with a when the function starts executing?
And deallocated when the function returns?
If so, then there is no difference, other than the visibility of the variable, between declaring it at the beginning of the function and declaring it inside another block in the function.
Please explain.
The C standard doesn't specify how such things are to be implemented. The C standard doesn't even mention a stack! A stack is a common way of implementing function calls but nothing in the standard requires a stack. All such things are implementation specific details. For the posted code, the standard only specifies when the variables are in scope.
So there is no general answer to your question. The answer depends on your specific system, i.e. processor, compiler, etc.
Provided that your system uses a stack (which is likely), the compiler may reserve stack space for all 3 variables or it may reserve space for 2 variables, i.e. one for a, while b and c share the other. Both implementations will be legal. The compiler is even allowed to place the variables directly in some registers so that nothing needs to be reserved on the stack.
You can check your specific system by looking at the generated assembly code.
A C implementation may implement this in multiple ways. Let’s suppose your example objects, a, b, and c, are actually used in your code in some way that results in the compiler actually allocating memory for them and not optimizing them away. Then:
The compiler could allocate stack space (by decreasing the top-of-stack pointer) for all of a, b, and c when the function starts, and release it when the function ends.
The compiler could allocate stack space for a when the function starts, then allocate space (again by decreasing the stack pointer) in the middle of the function when space for b or c is needed, then release that stack space as each block ends.
In a good modern compiler, the compiler is likely to analyze all the active lifetimes of the objects and find a somewhat optimal solution for using stack space in overlapping ways. By “active lifetime”, I mean the time from when the value of an object is set to the last time that value is needed (not the C standard’s definition of “lifetime”). For example, in int a = f(x); … g(a); h(y); a = f(y); … g(a);, there are actually two lifetimes for a, from its initial assignment to the first g(a) and from the assignment a = f(y); to the second g(a);. If the compiler needs memory to store a, it might use different memory for these two lifetimes.
Because of the above, what memory is used for which C object can get quite complicated. A particular memory location might be used for a at one time and for b at another. It may depend on loops and goto statements in your code. It also depends on whether the address of an object is taken—if the address is taken, the compiler may have to keep the object in one place, so that the address is consistent. (It might be able to get away without doing that, depending on how it can see the address is used.)
Basically, the compiler is free to use the stack, other memory, and registers in whatever way it chooses as long as the observable behavior of your program remains as it is defined by the C standard.
(The observable behavior is the input/output interactions of your program, the data written to files, and the accesses to volatile objects.)
Your example as stated is not valid since you have no brackets in the if-else statement. However, in the example below all variables are typically allocated when the function is entered:
int main(void)
{
    int a = 10;
    if (a < 50) {
        int b = 9;
    } else {
        int c = 10;
    }
}
As mentioned by user "500 - Internal Server Error", this is an implementation issue.

Does the C compiler allocate memory for a variable? [duplicate]

I was wondering at which stage memory gets allocated for a variable.
Is it in the compilation stage or is it at the execution time?
Yes, and yes, both.
For a global variable (declared at file scope), the compiler reserves memory in the executable image. So this is compile-time.
For an automatic variable (declared in a function), the compiler adds instructions to allocate the variable on the stack. So this is run-time.
int a; // file scope
int f(void)
{
    int b; // function scope
    ...
}
Notes:
The compiler emits one set of instructions that allocates all local variables of a function at once. Generally, there is no per-variable overhead (there can be exceptions that I don't discuss here). These instructions are executed every time the function is called.
The compiler does not allocate storage for your strings. This is an error beginners often make. Consider:
char *s; // a pointer to a string
scanf("%s", s); // no, the compiler will not allocate storage for the string to read.
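For contrast, a minimal sketch of a version that does provide storage. The function name, the 32-byte buffer size, and the use of sscanf on an in-memory string (instead of scanf on standard input) are choices made for this illustration:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the length of the first
   whitespace-delimited word in input, or 0 if there is none. */
size_t first_word_length(const char *input)
{
    char s[32];     /* the array declaration is what reserves the storage */

    /* %31s leaves room for the terminating null byte */
    if (sscanf(input, "%31s", s) != 1)
        return 0;

    return strlen(s);
}
```

Here the array s supplies the 32 bytes that the read writes into, which is exactly what the bare pointer in the broken version lacks.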
It depends on the kind of variable/object.
Globally and statically allocated variables are known at compile time and their offsets in the data segment are baked into the program. So in a way, they got allocated at compile time.
Variables local to function scope are allocated on the stack. You could say that the compiler knew about them and the kind of storage they needed, but obviously they got allocated (in the sense of coming into existence) at run time, during a function call.
Another interesting object is the heap allocated object, which can be created with malloc/calloc in C and new or related mechanisms in C++. They are allocated at run time in the heap section.
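The three kinds of objects above can be told apart by how they behave. A rough sketch, with all function names made up for illustration:

```c
#include <stdlib.h>

/* Static storage: one object for the whole program run,
   so the count survives between calls. */
int count_calls_static(void)
{
    static int count = 0;
    return ++count;
}

/* Automatic storage: a fresh object on every call,
   so the count always restarts. */
int count_calls_automatic(void)
{
    int count = 0;
    return ++count;
}

/* Allocated storage: exists from malloc until free.
   Returns -1 if the allocation fails. */
int heap_value(void)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    *p = 5;
    int value = *p;
    free(p);
    return value;
}
```

Calling count_calls_static twice yields 1 and then 2, while count_calls_automatic yields 1 both times.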
There's a third kind of memory which is allocated dynamically by using malloc() and friends.
This memory is taken from the so-called heap, while automatic variables (the ones you have in functions) are taken from the so-called stack.
Then, if you're having a variable with an initializer (e.g. int i = 5;) that never changes value, a compiler may figure that out and not allocate memory at all. Instead it would just use 5 wherever you use that variable in your code.

where are the variable stored that are initialized in main function in c?

In C language, I know that when a variable is dynamically initialized using malloc, it is stored in heap area. But where is the memory allocated when declaration of below kind is done and variables are initialized later.
int a[26];
or
int a[n]; //n is a variable and each element in array a is later initialized through a for loop.
My initial understanding is that like in java, here also all variables declared in main function are stored in stack area. My doubt is- Say, there is a function that takes the address of array "a" and changes its contents. To change the contents of "a", it should be able to access each element's address in "a". As the function itself is getting executed in the stack space on the top of main function, it cannot access array "a"'s contents directly. So, my doubt is where is array "a"'s memory allocated?
Usually, int a[n]; is called a variable length array, and the storage allocation is compiler dependent.
For example, gcc allocates VLAs in stack memory.
FWIW, the local variables are also usually stored in stack memory (minus the compiler optimization, if any).
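On the doubt in the question: a called function can access the caller's array through its address, because the caller's storage stays in place while the callee runs. A minimal sketch (both function names are made up):

```c
#include <stddef.h>

/* Writes i * i into each element of the caller's array. */
void fill_squares(int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (int)(i * i);
}

/* The array lives in this function's automatic storage;
   the callee modifies it through the pointer it receives. */
int caller_sees_changes(void)
{
    int a[26];
    fill_squares(a, 26);
    return a[5];    /* 25, written by fill_squares */
}
```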
Arrays can be almost any length; they can be used to store thousands or even millions of objects, but the size must be decided when the array is created. Each item in the array is accessed by an index, which is just a number that indicates the position or slot where the object is stored in the array.
Array size stored in the computer physical memory

How does the process of memory allocation of dynamic variables?

When a function is called, a space in memory is reserved for local variables (formal parameters and those declared within the function's scope).
I understand this for ANSI C, because it requires that variables be declared at the beginning of a block.
However, in the case of the following C code compiled with GCC, will the z variable have its space allocated at the beginning of the block, or only when y is equal to 42?
void foo(int x) {
    int y;
    scanf("%d%*c", &y);
    if (y != 42)
        return;
    int z;
    return;
}
Is the behavior the same for other higher level languages such as Python and Ruby, with similar code?
This is typically implemented by reserving space on the stack for all variables that are declared in the method. It would certainly be possible to do it dynamically, but that would require each "potential" variable to internally be represented as a pointer (since its address cannot be known in advance), and the overhead would almost certainly not be worth it. If you really want "dynamic" variables, you can implement it yourself with pointers and dynamic memory allocation.
Java and C# do the same thing: they reserve space for the total collection of local variables.
I don't really know about Python or Ruby, but in these languages, there is no such thing as a primitive data type: all values are references and stored on the heap. As such, it is entirely possible that the storage space for the value referred to by a variable won't appear until the variable "declaration" is executed (although "declaration" isn't really a thing in dynamic languages; it's more of an assignment to a variable that happens not to exist yet). Note, though, that the variable itself also requires storage space (it's a pointer, after all). However, the variables of dynamic languages are often implemented as hashmaps, so the variables themselves may also dynamically appear and disappear.

Variable-length arrays are created on heap, but we cannot free them?

I've verified that variable-length arrays are created on heap (see code below), but we cannot use a free operation to free them (cause a fault trap 6).
I was taught that the heap is managed by user and thus we have to explicitly free anything on heap if we don't need them. So who will be responsible to free these memories? Is this a defect of C?
Code that shows variable-length arrays are created on heap. (Platform: Mac OS X, gcc, 64bits)
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
    int n = atoi(argv[1]);
    int a1[10];
    int a2[n];
    int a3[10];
    printf("address for a1:%p,address for a2:%p, address for a3:%p\n",a1,a2,a3);
    printf("a1-a2: %lx, a1-a3: %lx\n",(a1-a2),(a1-a3));
    //free(t); // will cause fault trap 6
    return 0;
}
The result is:
$ ./run 10
address for a1:0x7fff5d095aa0,address for a2:0x7fff5d0959e0, address for a3:0x7fff5d095a70
a1-a2: 30, a1-a3: c
It's obvious that a1 and a3 are consecutive and thus on the stack, but a2 has a lower address and is thus on the heap.
Variable length arrays have automatic storage duration and have either block scope or function prototype scope.
So this array
int a2[n];
has automatic storage duration and the block scope of the function main. It was not created in the heap. It is the compiler that generates the corresponding code to free the allocated memory for the array when the control will exit the block scope.
According to the C Standard (6.2.4 Storage durations of objects)
7 For such an object that does have a variable length array type, its lifetime extends from the declaration of the object until execution of the program leaves the scope of the declaration. If the scope is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate.
You may apply function free only to objects that were allocated using one of the memory allocation functions like malloc, calloc or realloc.
How an implementation allocates storage for VLAs is implementation defined. VLAs have automatic storage duration and you should not try to free() them.
You should treat it just like any other local variable for all practical purposes.
You only free() whatever the memory you allocated using malloc() family functions.
VLAs are not supported by all implementations; they are a conditional feature.
The macro
__STDC_NO_VLA__
is used to test whether VLAs are supported by the implementation (if it is defined as 1, VLAs are not supported).
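A hedged sketch of how such a test might look in code. The function name and the malloc fallback are choices made for this illustration, and n is assumed to be at least 1. Note that the macro was introduced in C11, so older compilers leave it undefined whether or not they support VLAs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns 0 + 1 + ... + (n - 1) using a temporary
   array, or -1 if the fallback heap allocation fails. */
long sum_below(size_t n)
{
    assert(n > 0);
    long total = 0;

#ifdef __STDC_NO_VLA__
    /* No VLA support: fall back to the heap. */
    int *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;
#else
    /* VLA: automatic storage, released when the function returns. */
    int buf[n];
#endif

    for (size_t i = 0; i < n; i++)
        buf[i] = (int)i;
    for (size_t i = 0; i < n; i++)
        total += buf[i];

#ifdef __STDC_NO_VLA__
    free(buf);      /* only the malloc'd version needs an explicit free */
#endif
    return total;
}
```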
In my opinion, VLAs should not be used, mainly because:
They are optional in C11
Allocation failure is not portably detectable
Is this a defect of C?
Absolutely not. You made several wrong assumptions, which lead you to a wrong conclusion.
First, let's get terminology straight: "stack" is called automatic storage area; "heap" is called dynamic storage area. The C standard does not make any claims about any of the things listed below:
Relative order of addresses in automatic and dynamic areas
Relative order of addresses of items within the same storage area
Presence or absence of gaps between allocations within the same area
This makes it impossible to determine if a variable is in an automatic or in a dynamic area simply by looking at numeric addresses, without making a guess. In particular, what appears "obvious" to you has nothing to do with what is actually happening.
So who will be responsible to free these memories?
You are responsible for calling free on everything that you allocated in the dynamic storage area. You do not allocate your variable-length array in the dynamic storage area*, hence you are not responsible for calling free on it.
* If a compiler implementation were to allocate a VLA in the dynamic storage area, the compiler would be responsible for calling free on that pointer.
