How does the process of memory allocation of dynamic variables? - c

When a function is called, a space in memory is reserved for local variables (formal parameters and those declared within the function's scope).
I understand that in ANSI C, because it is required that the variables are declared at the beginning of a block.
However, in the case of the following C code compiled with GCC, will the z variable will have its space allocated at the beginning of the block or only when y is equal to 42?
void foo(int x) {
int y;
scanf("%d%*c", &y);
if (y != 42)
return;
int z;
return;
}
Is the behavior the same for other higher level languages such as Python and Ruby, with similar code?

This is typically implemented by reserving space on the stack for all variables that are declared in the method. It would certainly be possible to do it dynamically, but that would require each "potential" variable to internally be represented as a pointer (since its address cannot be known in advance), and the overhead would almost certainly not be worth it. If you really want "dynamic" variables, you can implement it yourself with pointers and dynamic memory allocation.
Java and C# do the same thing: they reserve space for the total collection of local variables.
I don't really know about Python or Ruby, but in these languages, there is no such thing as a primitive data type: all values are references and stored on the heap. As such, it is entirely possible that the storage space for the value referred to by a variable won't appear until the variable "declaration" is executed (although "declaration" isn't really a thing in dynamic languages; it's more of an assignment to a variable that happens do not exist yet). Note, though, that the variable itself also requires storage space (it's a pointer, after all) - however, the variables of dynamic languages are often implemented as hashmaps, so the variables themselves may also dynamically appear and disappear.

Related

How stack structure works with compound statements inside a function?

I'm trying to learn c programming and can't understand how stacks work.
Everywhere I read I find that when a function is called stack frame is created in the stack which contains all the data for the function call- parameters, return address and local variables. And the stack frame is removed releasing the memory when the function returns.
But what if we had a compound statement inside the function which have its own variables. Is the memory for the local variables for block is also allocated inside the stack frame when the function call and released when it returns.
Example
int main(){
int a = 10;
if(int a<50){
int b=9;
}
else{
int c=10;
}
}
Is the memory for b and c is allocated with a when the function starts executing?
And deallocated when the function returns?
If so than there is no difference other than the visibility of the variable when declaring it in the beginning of the function or inside a another block in the function.
Please explain.
The C standard doesn't specify how such things are to be implemented. The C standard doesn't even mention a stack! A stack is a common way of implementing function calls but nothing in the standard requires a stack. All such things are implementation specific details. For the posted code, the standard only specifies when the variables are in scope.
So there is no general answer to your question. The answer depends on your specific system, i.e. processor, compiler, etc.
Provided that your system uses a stack (which is likely), the compiler may reserve stack space for all 3 variables or it may reserve space for 2 variables, i.e. one for awhile b and c share the other. Both implementations will be legal. The compiler is even allowed to place the variables directly in some registers so that nothing needs to be reserved on the stack.
You can check your specific system by looking at the generated assembly code.
A C implementation may implement this in multiple ways. Let’s suppose your example objects, a, b, and c, are actually used in your code in some way that results in the compiler actually allocating memory for them and not optimizing them away. Then:
The compiler could allocate stack space (by decreasing the top-of-stack pointer) for all of a, b, and c when the function starts, and release it when the function ends.
The compiler could allocate stack space for a when the function starts, then allocate space (again by decreasing the stack pointer) in the middle of the function when space for b or c is needed, then release that stack space as each block ends.
In a good modern compiler, the compiler is likely to analyze all the active lifetimes of the objects and find a somewhat optimal solution for using stack space in overlapping ways. By “active lifetime”, I mean the time from when the value of an object is set to the last time that value is needed (not the C standard’s definition of “lifetime”). For example, in int a = f(x); … g(a); h(y); a = f(y); … g(a);, there are actually two lifetimes for a, from its initial assignment to the first g(a) and from the assignment a = f(y); to the second g(a);. If the compiler needs memory to store a, it might use different memory for these two lifetimes.
Because of the above, what memory is used for which C object can get quite complicated. A particular memory location might be used for a at one time and for b at another. It may depend on loops and goto statements in your code. It also depends on whether the address of an object is taken—if the address is taken, the compiler may have to keep the object in one place, so that the address is consistent. (It might be able to get away without doing that, depending on how it can see the address is used.)
Basically, the compiler is free to use the stack, other memory, and registers in whatever way it chooses as long as the observable behavior of your program remains as it is defined by the C standard.
(The observable behavior is the input/output interactions of your program, the data written to files, and the accesses to volatile objects.)
Your example as stated is not valid since you have no brackets in the if-else statement. However, in the example below all variables are typically allocated when the function is entered:
int main(void)
{
int a = 10;
if (a < 50) {
int b = 9;
} else {
int c = 10;
}
}
As mentioned by user "500 - Internal Server Error", this is an implementation issue.

When/where are local arrays allocated?

https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-and-C.html describes automatic allocation of local variables. I understand that local variables are commonly allocated on the stack. I can imagine how an int might be allocated on the stack; just push its value. But how might an array be allocated?
For example, if you declare an array char str[10];, does that 10 bytes of space go on the stack, or is it allocated somewhere else, and only the str pointer is pushed to the stack? If the latter, where is the 10 bytes of space allocated?
Furthermore, when exactly are local variables, including arrays, allocated? I commonly see heap allocation referred to as "dynamic allocation", implying that automatic variables are not dynamically allocated. But automatic variables may be declared within flow-of-control constructs and function bodies, so the compiler can't possibly know before runtime exactly how much space will be occupied by automatic variables. So automatic variables must also be dynamically allocated, right?
Edit: I would like to emphasize the first half of this question. I am most interested in understanding when and where the space for local arrays is allocated. On the stack? Somewhere else?
Edit 2: I made a mistake when I originally included the C++ tag for this question. I meant to ask only about the C language and its implementations. I apologize for any confusion.
In the C 2018 standard, clause 6.2.4, paragraphs 6 and 7 tell us about the lifetimes of objects with automatic storage duration. Paragraph 6 covers such objects that are not variable length arrays:
… its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time.
Thus, if we have this code:
{
PointA;
int x = 3;
PointB;
}
then x exists in the C model as soon as execution reaches PointA—its block was entered, and that is when the lifetime of x begins. However, although x already exists at PointA, its value is indeterminate. The initialization only occurs when the definition is reached.
Paragraph 7 tells us about variable length arrays:
… its lifetime extends from the declaration of the object until execution of the program leaves the scope of the declaration.
So, if we have this code:
{
PointA;
int x[n]; // n is some variable.
PointB;
}
then x does not exist at PointA. Its lifetime begins when int x[n]; is reached.
Keep in mind this existence is only in terms of C’s abstract model of computing. Compilers are allowed to optimize code as long as the observable results (such as output of the program) are the same. So the actual code generated by a compiler might not create x when the block is entered. (It might not create x at all; it could be optimized away completely.)
For example, if you declare an array char str[10];, does that 10 bytes of space go on the stack, or is it allocated somewhere else, and only the str pointer is pushed to the stack? If the latter, where is the 10 bytes of space allocated?
In general, the array's storage is allocated on the stack, just like any other local variable. This is compiler and target-specific. Even on a x86_64 machine, a 4 billion byte array is probably not allocated on the stack. I'd expect one of: a compile error, a link error, a runtime error, or it works somehow. In the last alternative, it might call new[] or malloc() and leave the pointer to the array on the stack in place of the array.
Notice that the array's allocation and its pointer are the same thing, so your addition of allocated somewhere else, and only the str pointer wording might indicate confusion. The allocation occurs and the name for it are not independent data.
What you ask for is depends on the language implementation (the compiler). To answer your question, this is (a simplified overview of) what compilers usually do for compiled languages (like C/C++):
When the compiler finishes parsing a function, it keeps a symbol table of all local variables declared in this function, even those declared "syntactically" during the instruction flow of the function (like local loops variables). Later, when it needs to generate the final (assembly) code, it generates the necessary instructions to push (or just moves the stack pointer) a sufficient space for all local variables. So, local loop variables, for instance, are not allocated when the loop starts execution. Rather, they are allocated at the beginning of the execution of the function containing the loop. The compiler also adds instructions to remove this allocated stack space before returning from the function.
So, automatic variables, like your char array, is totally allocated on the stack in this (common) scenario.
[EDIT] Variable length arrays (before C99)
The discussion above was for arrays having lengths known at compile time like this:
void f () {
char n[10];
....
}
If we stay in C language terms (before C99), variable-length arrays (arrays whose lengths are not known at compile-time, but rather at runtime) are declared as a pointer like this:
void f() {
char *n;
... //array is later allocated using some kind of memory allocation construct
}
This, in fact, just declares a pointer to the array. Pointers size is known to the compiler. So, as I said above, the compiler will be able to reserve the necessary storage for the pointer on the stack (just the pointer, not the real array) regardless of what will be the size of the array at runtime. When the execution reaches the line that allocates the array (using malloc, for instance), the array storage is dynamically allocated on the heap, and its address is stored in the local automatic variable n. In languages without garbage collection, this requires freeing (deallocating) the reserved storage from the heap manually (i.e. the programmer should add an instruction to do it in the program when the array is no longer needed). This is not necessary for constant-sized array (that are allocated on the stack) because the compiler removes the stack frame before returning from the function, as I said earlier.
[EDIT2]
C99 variable length arrays cannot be declared on the stack. The compiler must add some code to the resulting machine code that handles its dynamic creation and destruction at runtime.

Is the stack offset assigned to local stack variables ever reused, e.g. in case it becomes dead or goes out of scope?

In other words, will compilers allocate enough space in the program stack to store all variables at the deepest level of block nesting in the current function or do they look at liveness and the scope of variables too?
void zoo(int num) {
if (num) {
int a = foo();
bar(a);
} else {
int b = foo();
bar(b);
}
}
For example the above code will be assigned different offsets on the stack for a and b, even though, if they were assigned only one offset (e.g. rbp - 8) it would have been legal too. My question is that will compilers like gcc and clang ever output assembly where multiple variables are assigned the same static offset?
Is there anything in the specifications about this?
I want to know if there is a unique mapping between source variables and the stack offsets present in a compiled assembly file.
There is, in general, no unique mapping between objects with automatic storage duration (“local” objects defined inside a function or block) and stack offsets. I have seen compiler-generated code reuse the same stack location for different objects, either because the use of one did not overlap the use of the other in the C code or because the compiler had moved one into a register for whatever purposes and no longer needed to use the stack location for it.
The C and C++ standards do not require implementations to implement their stack allocation in any particular way. They are free to reuse stack locations. They are also free to allocate all the stack space that might be needed1 or to wait to see if particular blocks are entered or not before further allocating stack space for the objects inside those blocks.
Note
1 Implementations that support variable-length arrays generally must wait until the size of the array can be determined before allocating space for it.

Which memory locations to use for variable storage

Higher level languages such as javascript don't give the programmer a
choice as to where variables are stored. But C does. My question is:
are there any guidelines as to where to store variables, eg dependent
on size, usage, etc.
As far as I understand, there are three possible locations to store
data (excluding code segment used for actual code):
DATA segment
Stack
Heap
So transient small data items should be stored on the stack?
What about data items which must be shared between functions. These
items could be stored on the heap or in the data segment. How do you
decide which to choose?
You're looking through the wrong end of the telescope. You don't specify particular memory segments in which to store a variable (particularly since the very concept of a "memory segment" is highly platform-dependent).
In C code, you decide a variable's lifetime, visibility, and modifiability based on what makes sense for the code, and based on that the compiler will generate the machine code to store the object in the appropriate segment (if applicable)
For example, any variables declared at file scope (outside of any function) or with the keyword static will have static storage duration, meaning they are allocated at program startup and held until the program terminates; these objects may be allocated in a data segment or bss segment. Variables declared within a function or block without the static keyword have automatic storage duration, and are (typically) allocated on the stack.
String literals and other compile-time constant objects are often (but not always!) allocated in a readonly segment. Numeric literals like 3.14159 and character constants like 'A' are not objects, and do not (typically) have memory allocated for them; rather, those values are embedded directly in the machine code instructions.
The heap is reserved for dynamic storage, and variables as such are not stored there; instead, you use a library call like malloc to grab a chunk of the heap at runtime, and assign the resulting pointer value to a variable allocated as described above. The variable will live in either the stack or a data segment, while the memory it points to lives on the heap.
Ideally, functions should communicate solely through parameters, return values, and exceptions (where applicable); functions should not share data through an external variable (i.e., a global). Function parameters are usually allocated on the stack, although some platforms may pass parameters via registers.
You should prefer local/stack variables to global or heap variables when those variables are small, used often and in a relatively small/limited scope. That will give the compiler more opportunities to optimize the code using them as it'll know they aren't going to change between function calls unless you pass around pointers to them.
Also, the stack is usually relatively small and allocating large structures or arrays on it may lead to stack overflows, especially so in recursive code.
Another thing to consider is the use of global variables in multithreaded programs. You want to minimize chances of race conditions and one strategy for that is maiking functions thread-safe and re-enterant by not using any global resources in them directly (if malloc() is thread-safe, if errno is per-thread, etc you can use them, of course).
Btw, using local variables instead of global variables also improves code readability as the variables are located close to the place where they're used and you can quickly find out their type and where and how they're used.
Other than that, if your code is correct, there shouldn't be much practical difference between making variables local or global or in the heap (of course, malloc() can fail and you should remember about it:).
C only allows you to specify where data is stored indirectly... via the scope of the variable and/or allocation. i.e., a local variable to a function is typically a stack variable unless it is declared static in which case it will likely be DATA/BSS. Variables created dynamically via new/malloc will typically be heap.
However, there's no guarantee of any of that... only the implication of it.
That said, the one thing that is guaranteed to be a bad idea is to declare large local variables in functions... common source of strange errors and stack overflows. Very large arrays and structures are best suited to dynamic allocation and keep the pointers in local/global as required.

Why we must initialize a variable before using it? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
What happens to a declared, uninitialized variable in C? Does it have a value?
Now I'm reading Teach Yourself C in 21 Days. In chapter 3, there is a note like this:
DON'T use a variable that hasn't been initialized. Results can be
unpredictable.
Please explain to me why this is the case. The book provide no further clarification about it.
Because, unless the variable has static storage space, it's initial value is indeterminate. You cannot rely on it being anything as the standard does not define it. Even statically allocated variables should be initialized though. Just initialize your variables and avoid a potential headache in the future. There is no good reason to not initialize a variable, plenty of good reasons to do the opposite.
On a side note, don't trust any book that claims to teach you X programming language in 21 days. They are lying, get yourself a decent book.
When a variable is declared, it will point to a piece of memory.
Accessing the value of the variable will give you the contents of that piece of memory.
However until the variable is initialised, that piece of memory could contain anything. This is why using it is unpredictable.
Other languages may assist you in this area by initialising variables automatically when you assign them, but as a C programmer you are working with a fairly low-level language that makes no assumptions about what you want to do with your program. You as the programmer must explicitly tell the program to do everything.
This means initialising variables, but it also means a lot more besides. For example, in C you need to be very careful that you de-allocate any resources that you allocate once you're finished with them. Other languages will automatically clean up after you when the program finished; but in C if you forget, you'll just end up with memory leaks.
C will let you get do a lot of things that would be difficult or impossible in other languages. But this power also means that you have to take responsibility for the housekeeping tasks that you take for granted in those other languages.
You might end up using a variable declared outside the scope of your current method.
Consider the following
int count = 0;
while(count < 500)
{
doFunction();
count ++;
}
...
void doFunction() {
count = sizeof(someString);
print(count);
}
In C, using the values of uninitialized automatic variables (non-static locals and parameter variables) that satisfy the requirement for being a register variable is undefined behavior, because such variables values may have been fetched from a register and certain platforms may abort your program if you read such uninitialized values. This includes variables of type unsigned char (this was added to a later C spec, to accommodate these platforms).
Using values of uninitialized automatic variables that do not satisfy the requirement for being a register variable, like variables that have their addresses taken, is fine as long as the C implementation you use hasn't got trap representations for the variable type you use. For example if the variable type is unsigned char, the Standard requires all platforms not to have trap representations stored in such variables, and a read of an indeterminate value from it will always succeed and is not undefined behavior. Types like int or short don't have such guarantees, so your program may crash, depending on the C implementation you use.
Variables of static storage duration are always initialized if you don't do it explicitly, so you don't have to worry here.
For variables of allocated storage duration (malloc ...), the same applies as for automatic variables that don't satisfy the requirements for being a register variable, because in these cases the affected C implementations need to make your program read from memory, and won't run into the problems in which register reads may raise exceptions.
In C the value of an uninitialized variable is indeterminate. This means you don't know its value and it can differ depending on platform and compiler. The same is also true for compounds such as struct and union types.
Why? Sometimes you don't need to initialize a variable as you are going to pass it to a function that fills it and that doesn't care about what is in the variable. You don't want the overhead of initialization imposed upon you.
How? Primitive types can be initialized from literals or function return values.
int i = 0;
int j = foo();
Structures can be zero intialized with the aggregate intializer syntax:
struct Foo { int i; int j; }
struct Foo f = {0};
At the risk of being pedantic, the statement
DON'T use a variable that hasn't been initialized.
is incorrect. If would be better expressed:
Do not use the value of an uninitialised variable.
The language distinguishes between initialisation and assignment, so the first warning is, in that sense over cautious - you need not provide an initialiser for every variable, but you should have either assigned or initialised a variable with a useful and meaningful value before you perform any operations that subsequently use its value.
So the following is fine:
int i ; // uninitialised variable
i = some_function() ; // variable is "used" in left of assignment expression.
some_other_function( i ) ; // value of variable is used
even though when some_function() is called i is uninitialised. If you are assigning a variable, you are definitely "using" it (to store the return value in this case); the fact that it is not initialised is irrelevant because you are not using its value.
Now if you adhere to
Do not use the value of an uninitialised variable.
as I have suggested, the reason for this requirement becomes obvious - why would you take the value of a variable without knowing that it contained something meaningful? A valid question might then be "why does C not initialise auto variables with a known value. And possible answers to that would be:
Any arbitrary compiler supplied value need not be meaningful in the context of the application - or worse, it may have a meaning that is contrary to the actual state of the application.
C was deliberately designed to have no hidden overhead due to its roots as systems programming language - initialisation is only performed when explicitly coded, because it required additional machine instructions and CPU cycles to perform.
Note that static variables are always initialised to zero. .NET languages, such as C# have the concept of an null value, or a variable that contains nothing, and that can be explicitly tested and even assigned. A variable in C cannot contain nothing, but what it contains can be indeterminate, and therefore code that uses its value will behave non-deterministically.

Resources