All about C memory management [closed] - c

I want to move on from writing small one-, two- or three-file C programs to slightly larger C projects. For that I really want to get the memory management right. I know that similar questions have been asked, but they don't quite answer mine. I know some of the theory already and have put it to use. Therefore I'd like to present what I know, or think I know, so that you can correct me or add information I am missing.
There is the stack, static memory and the heap
static int n; // goes to static memory, thread lifetime
char *array = malloc(256 * sizeof(*array)); // goes to heap, needs free(array)
func(int n) {
float f; // goes to stack, dies with func return
static double d; // thread lifetime again
}
Static memory can't overflow since it's set for all static variables,
however heap and stack can overflow, heap overflows when non allocated memory is accessed in most cases, stack is set to ~1MB Windows or ~8MB Linux, if that is exhausted (I got a message "core dumped" on my Ubuntu for setting up an array of structs for every pixel of an image on the stack)
Does static, stack and heap memory behave like this in every scope?
I know heap memory does. But does a global non static array of structs go on the stack? What if I have a static array in file b where there is no main?
Goes on to static memory right? And what is if a function has a local static variable with initialized value? Is it initialized everytime i call the function? and do functions take from the stack?
How to avoid exhausting the stack in large programs with long lifed stack variables and plenty of them? And most of all, where do string literals go?? They are supposed to be pointers not arrays, but if i change the pointer what happens to the original string? One more thing: It always bothers me how it looks straight up bad practise to write code like
if(!strcmp(a, "comparethis")) do ...
or
fprintf(stderr, "There was a problem .... %d", something);
EDIT
Where is the difference in terms of memory management?
char arr[3] = {'A', 'B', 'C'};
char arr[3] = "ABC";
char *arr = "ABC";
EDIT END
Is it good to include string literals anyway or rather read them from a file or whatnot?
Finally, sorry if my grammar failed me here and there but this was written in a fast manner. Thanks for reading and please no hate. Outline my mistakes, don't judge them, if I can ask that much. I wanna improve and fast.
Have a good day anyway.

There is the stack, static memory and the heap
Not exactly. The C standard defines storage durations rather than memory regions: automatic, static and allocated (called dynamic storage in this answer); C11 adds thread storage duration. The stack is a possible (in fact the common) implementation for automatic storage, as is the heap for dynamic storage. These definitions govern the lifetime of objects:
automatic objects have their life bound by their containing block
static objects come to life at the beginning of the program, and their life extends to the end of the program
dynamic variables are created by malloc (and associated) function(s) from the standard library and shall be destroyed by free.
Provided these rules are obeyed, an implementation is free to physically store objects wherever it wants. In particular, some implementations were known to store automatic arrays on the heap and automatically destroy them at block exit.
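As a rough illustration of the three lifetimes listed above, here is a minimal sketch. The names copy_text, copies_made and call_count are made up for this example and do not come from the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int call_count;   /* static storage: exists for the whole run, starts at 0 */

/* Returns a heap copy of text, or NULL on failure.
   The caller owns the copy and must free() it. */
char *copy_text(const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text) + 1;   /* automatic storage: gone when the call returns */
    char *copy = malloc(len);        /* allocated storage: lives until free() */
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len);
    call_count++;
    return copy;
}

/* Reports how many copies were made so far. */
int copies_made(void)
{
    return call_count;
}
```

The pointer returned by copy_text stays valid after the function returns because the block it points to has allocated storage; len, by contrast, no longer exists at that point.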
Static memory can't overflow since it's set for all static variables
In the days of MS-DOS, the small memory model required all static variables to fit in a single 64 KB segment. If you wanted more, you could not compile in that mode. The failure could be a compile error if a single compilation unit exceeded the limit, or a link error if only the total size exceeded 64 KB.
stack is set to ~1MB Windows or ~8MB Linux
On common toolchains, compiler or linker options let you change the stack size; on Linux the limit for the main thread is usually a run-time setting (ulimit -s).
Now for your questions:
does a global non static array of structs go on the stack?
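No. An array defined at file scope (a global) has static storage duration even without the static keyword; at file scope, static only changes the linkage. If you meant a non-static local array, it has automatic storage, and where it lives can depend on the implementation: provided it is automatically destroyed when leaving the block, it can even be stored on the heap.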
What if I have a static array in file b where there is no main? Goes on to static memory right?
Yes it has to be static, whether the translation unit contains a main or not
And what is if a function has a local static variable with initialized value? Is it initialized everytime i call the function?
As it is static, it is created and initialized before the program starts running, and it keeps its value between calls. It is not reinitialized on later calls.
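A small sketch of that behaviour, using a made-up function name (next_id) purely for illustration:

```c
#include <assert.h>

/* Hypothetical helper: returns 100, 101, 102, ... on successive calls. */
int next_id(void)
{
    static int next = 100;   /* initialized once, not on every call */
    assert(next >= 100);     /* the value only grows from its initial value */
    return next++;
}
```

Calling next_id() three times yields 100, 101 and 102: the initializer is applied only once, and the variable keeps its value between calls.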
and do functions take from the stack?
What do you mean here? Common implementations do use the stack for a function's return address and all its automatic variables.
How to avoid exhausting the stack in large programs with long lifed stack variables and plenty of them?
You can increase the stack size at build time, or use a different design. For example, iterative algorithms consume less stack than recursive ones, and large or long-lived objects can be given static or allocated storage instead of automatic storage.
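For the per-pixel array of structs mentioned in the question, one possible design is to take the storage from the heap instead of the stack. This is only a sketch: struct pixel and alloc_image are hypothetical names, and the real pixel layout is not known from the question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct pixel { unsigned char r, g, b; };   /* hypothetical per-pixel struct */

/* Returns a zeroed width*height image on the heap, or NULL on failure.
   The caller releases it with free(). */
struct pixel *alloc_image(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    if (height > SIZE_MAX / width)          /* width * height would overflow */
        return NULL;
    return calloc(width * height, sizeof(struct pixel));
}
```

An image of several megabytes obtained this way does not count against the stack limit, and a failed allocation is reported as NULL rather than as a crash.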
And most of all, where do string literals go?
A string literal has static storage duration. Some implementations store them in a read-only segment, but there is no requirement for it. It is simply undefined behaviour to try to modify a string literal.
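In practice this means a literal can safely be returned from a function, and anything that needs to change the text should work on a copy. A minimal sketch, with made-up names (status_text, capitalized_status):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returning a literal is safe: it has static storage duration. */
const char *status_text(int code)
{
    return code == 0 ? "ok" : "error";
}

/* Copies the literal into caller-provided storage and edits the copy only. */
void capitalized_status(int code, char *buf, size_t size)
{
    const char *text = status_text(code);
    assert(buf != NULL && strlen(text) < size);   /* room for text plus '\0' */
    strcpy(buf, text);
    buf[0] = (char)toupper((unsigned char)buf[0]);
}
```

Declaring the pointer as const char * lets the compiler reject accidental writes to the literal at compile time.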
Where is the difference in terms of memory management?
char arr[3] = {'A', 'B', 'C'};
char arr[3] = "ABC";
char *arr = "ABC";
First and second ones both declare a non const array of 3 characters initialized with A, B and C, and no terminating null.
The third one is quite different: it declares a pointer to a (null-terminated) string literal. That means that arr[0] = 'X'; is undefined behaviour because it modifies a literal. And sizeof(arr) is not the length of the string literal but sizeof(char *).
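The difference between the array forms and the pointer form can be checked with sizeof. The function names below (array_form, pointer_form) are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Array form: copies the modified array into out and returns sizeof the array. */
size_t array_form(char out[3])
{
    char arr[3] = {'A', 'B', 'C'};   /* modifiable array, exactly 3 bytes, no '\0' */
    arr[0] = 'X';                    /* fine: the array is ours to change */
    memcpy(out, arr, sizeof arr);
    return sizeof arr;               /* 3 */
}

/* Pointer form: returns the size of the pointer, not of the text. */
size_t pointer_form(void)
{
    const char *ptr = "ABC";         /* points at a literal with static duration */
    assert(ptr[3] == '\0');          /* the literal itself is null terminated */
    return sizeof ptr;               /* sizeof(char *) */
}
```

array_form reports 3 on every implementation, while pointer_form reports whatever the size of a pointer is on the target (commonly 4 or 8).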
Is it good to include string literals anyway or rather read them from a file or whatnot?
I cannot understand the question. A string literal is available inside the program. A string stored in a file requires access to the file, so at least the file name has to be a string literal.

Related

How stack structure works with compound statements inside a function?

I'm trying to learn c programming and can't understand how stacks work.
Everywhere I read I find that when a function is called stack frame is created in the stack which contains all the data for the function call- parameters, return address and local variables. And the stack frame is removed releasing the memory when the function returns.
But what if we have a compound statement inside the function that has its own variables? Is the memory for the block's local variables also allocated inside the stack frame when the function is called, and released when it returns?
Example
int main(){
int a = 10;
if(int a<50){
int b=9;
}
else{
int c=10;
}
}
Is the memory for b and c allocated together with a when the function starts executing?
And deallocated when the function returns?
If so, then there is no difference, other than the visibility of the variable, between declaring it at the beginning of the function and declaring it inside another block in the function.
Please explain.
The C standard doesn't specify how such things are to be implemented. The C standard doesn't even mention a stack! A stack is a common way of implementing function calls but nothing in the standard requires a stack. All such things are implementation specific details. For the posted code, the standard only specifies when the variables are in scope.
So there is no general answer to your question. The answer depends on your specific system, i.e. processor, compiler, etc.
Provided that your system uses a stack (which is likely), the compiler may reserve stack space for all 3 variables, or it may reserve space for 2 variables, i.e. one for a, while b and c share the other. Both implementations are legal. The compiler is even allowed to place the variables directly in registers so that nothing needs to be reserved on the stack.
You can check your specific system by looking at the generated assembly code.
A C implementation may implement this in multiple ways. Let’s suppose your example objects, a, b, and c, are actually used in your code in some way that results in the compiler actually allocating memory for them and not optimizing them away. Then:
The compiler could allocate stack space (by decreasing the top-of-stack pointer) for all of a, b, and c when the function starts, and release it when the function ends.
The compiler could allocate stack space for a when the function starts, then allocate space (again by decreasing the stack pointer) in the middle of the function when space for b or c is needed, then release that stack space as each block ends.
In a good modern compiler, the compiler is likely to analyze all the active lifetimes of the objects and find a somewhat optimal solution for using stack space in overlapping ways. By “active lifetime”, I mean the time from when the value of an object is set to the last time that value is needed (not the C standard’s definition of “lifetime”). For example, in int a = f(x); … g(a); h(y); a = f(y); … g(a);, there are actually two lifetimes for a, from its initial assignment to the first g(a) and from the assignment a = f(y); to the second g(a);. If the compiler needs memory to store a, it might use different memory for these two lifetimes.
Because of the above, what memory is used for which C object can get quite complicated. A particular memory location might be used for a at one time and for b at another. It may depend on loops and goto statements in your code. It also depends on whether the address of an object is taken—if the address is taken, the compiler may have to keep the object in one place, so that the address is consistent. (It might be able to get away without doing that, depending on how it can see the address is used.)
Basically, the compiler is free to use the stack, other memory, and registers in whatever way it chooses as long as the observable behavior of your program remains as it is defined by the C standard.
(The observable behavior is the input/output interactions of your program, the data written to files, and the accesses to volatile objects.)
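One way to observe this on a particular system, besides reading the assembly, is to compare the addresses of two variables in sibling blocks. This is only a sketch with an invented function name; the result is implementation specific and may change with the compiler and its optimization settings.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the compiler gave b and c the same address, 0 otherwise.
   Either answer is legal. */
int blocks_share_storage(void)
{
    uintptr_t addr_b, addr_c;
    {
        int b = 9;
        addr_b = (uintptr_t)&b;   /* record the address while b is alive */
    }
    {
        int c = 10;
        addr_c = (uintptr_t)&c;   /* record the address while c is alive */
    }
    assert(addr_b != 0 && addr_c != 0);
    return addr_b == addr_c;
}
```

Taking the addresses forces the compiler to give both objects a memory location, which by itself can change what it would otherwise have done with them.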
Your example as stated is not valid since you have no brackets in the if-else statement. However, in the example below all variables are typically allocated when the function is entered:
int main(void)
{
int a = 10;
if (a < 50) {
int b = 9;
} else {
int c = 10;
}
}
As mentioned by user "500 - Internal Server Error", this is an implementation issue.

When/where are local arrays allocated?

https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-and-C.html describes automatic allocation of local variables. I understand that local variables are commonly allocated on the stack. I can imagine how an int might be allocated on the stack; just push its value. But how might an array be allocated?
For example, if you declare an array char str[10];, does that 10 bytes of space go on the stack, or is it allocated somewhere else, and only the str pointer is pushed to the stack? If the latter, where is the 10 bytes of space allocated?
Furthermore, when exactly are local variables, including arrays, allocated? I commonly see heap allocation referred to as "dynamic allocation", implying that automatic variables are not dynamically allocated. But automatic variables may be declared within flow-of-control constructs and function bodies, so the compiler can't possibly know before runtime exactly how much space will be occupied by automatic variables. So automatic variables must also be dynamically allocated, right?
Edit: I would like to emphasize the first half of this question. I am most interested in understanding when and where the space for local arrays is allocated. On the stack? Somewhere else?
Edit 2: I made a mistake when I originally included the C++ tag for this question. I meant to ask only about the C language and its implementations. I apologize for any confusion.
In the C 2018 standard, clause 6.2.4, paragraphs 6 and 7 tell us about the lifetimes of objects with automatic storage duration. Paragraph 6 covers such objects that are not variable length arrays:
… its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time.
Thus, if we have this code:
{
PointA;
int x = 3;
PointB;
}
then x exists in the C model as soon as execution reaches PointA—its block was entered, and that is when the lifetime of x begins. However, although x already exists at PointA, its value is indeterminate. The initialization only occurs when the definition is reached.
Paragraph 7 tells us about variable length arrays:
… its lifetime extends from the declaration of the object until execution of the program leaves the scope of the declaration.
So, if we have this code:
{
PointA;
int x[n]; // n is some variable.
PointB;
}
then x does not exist at PointA. Its lifetime begins when int x[n]; is reached.
Keep in mind this existence is only in terms of C’s abstract model of computing. Compilers are allowed to optimize code as long as the observable results (such as output of the program) are the same. So the actual code generated by a compiler might not create x when the block is entered. (It might not create x at all; it could be optimized away completely.)
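A minimal sketch of the variable length array case, assuming a compiler that supports VLAs (they are required in C99 and optional since C11). The function name sum_with_vla and the 1024-element cap are choices made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Sums 1..n using a scratch VLA and reports the array's size through *bytes. */
long sum_with_vla(size_t n, size_t *bytes)
{
    assert(n > 0 && n <= 1024 && bytes != NULL);   /* keep the array small */
    /* x does not exist yet at this point */
    int x[n];                 /* the lifetime of x starts at this declaration */
    *bytes = sizeof x;        /* evaluated at run time: n * sizeof(int) */
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        x[i] = (int)(i + 1);
        total += x[i];
    }
    return total;             /* x ceases to exist when its scope is left */
}
```

Because the size is only known at run time, sizeof x is not a compile-time constant here, unlike sizeof applied to an ordinary array.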
For example, if you declare an array char str[10];, does that 10 bytes of space go on the stack, or is it allocated somewhere else, and only the str pointer is pushed to the stack? If the latter, where is the 10 bytes of space allocated?
In general, the array's storage is allocated on the stack, just like any other local variable, although this is compiler and target specific. Even on an x86_64 machine, a 4 billion byte array is probably not allocated on the stack. I'd expect one of: a compile error, a link error, a runtime error, or it works somehow. In the last case, the implementation might call malloc() (or new[] in C++) and leave the pointer to the array on the stack in place of the array.
Notice that an array and its name are not separate pieces of data: str names the 10 bytes themselves, and no separate pointer object is stored. So your wording, allocated somewhere else, and only the str pointer, might indicate confusion.
What you ask depends on the language implementation (the compiler). To answer your question, this is (a simplified overview of) what compilers usually do for compiled languages (like C/C++):
When the compiler finishes parsing a function, it keeps a symbol table of all local variables declared in this function, even those declared "syntactically" during the instruction flow of the function (like local loop variables). Later, when it generates the final (assembly) code, it emits the instructions needed to reserve sufficient space for all local variables, by pushing or simply by moving the stack pointer. So local loop variables, for instance, are not allocated when the loop starts executing. Rather, they are allocated at the beginning of the execution of the function containing the loop. The compiler also adds instructions to release this stack space before returning from the function.
So automatic variables, like your char array, are allocated entirely on the stack in this (common) scenario.
[EDIT] Variable length arrays (before C99)
The discussion above was for arrays having lengths known at compile time like this:
void f () {
char n[10];
....
}
If we stay in C language terms (before C99), arrays whose lengths are not known at compile time, but only at run time, are handled through a pointer like this:
void f() {
char *n;
... //array is later allocated using some kind of memory allocation construct
}
This, in fact, just declares a pointer to the array. The size of a pointer is known to the compiler. So, as I said above, the compiler can reserve the necessary storage for the pointer on the stack (just the pointer, not the real array) regardless of the array's size at run time. When execution reaches the line that allocates the array (using malloc, for instance), the array storage is dynamically allocated on the heap, and its address is stored in the local automatic variable n. In languages without garbage collection, this requires freeing (deallocating) the reserved storage manually (i.e. the programmer should add an instruction to do it when the array is no longer needed). This is not necessary for constant-sized arrays (which are allocated on the stack) because the compiler removes the stack frame before returning from the function, as I said earlier.
[EDIT2]
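The space for a C99 variable length array cannot be reserved at a fixed size in the stack frame at compile time. The compiler must add code to the resulting machine code that handles its creation and destruction at run time, typically by adjusting the stack pointer once the size is known.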

Is the stack offset assigned to local stack variables ever reused, e.g. in case it becomes dead or goes out of scope?

In other words, will compilers allocate enough space in the program stack to store all variables at the deepest level of block nesting in the current function or do they look at liveness and the scope of variables too?
void zoo(int num) {
if (num) {
int a = foo();
bar(a);
} else {
int b = foo();
bar(b);
}
}
For example the above code will be assigned different offsets on the stack for a and b, even though, if they were assigned only one offset (e.g. rbp - 8) it would have been legal too. My question is that will compilers like gcc and clang ever output assembly where multiple variables are assigned the same static offset?
Is there anything in the specifications about this?
I want to know if there is a unique mapping between source variables and the stack offsets present in a compiled assembly file.
There is, in general, no unique mapping between objects with automatic storage duration (“local” objects defined inside a function or block) and stack offsets. I have seen compiler-generated code reuse the same stack location for different objects, either because the use of one did not overlap the use of the other in the C code or because the compiler had moved one into a register for whatever purposes and no longer needed to use the stack location for it.
The C and C++ standards do not require implementations to implement their stack allocation in any particular way. They are free to reuse stack locations. They are also free to allocate all the stack space that might be needed [1] or to wait to see if particular blocks are entered or not before further allocating stack space for the objects inside those blocks.
Note
[1] Implementations that support variable-length arrays generally must wait until the size of the array can be determined before allocating space for it.

Does the C compiler allocate memory for a variable? [duplicate]

I was wondering at which stage memory gets allocated to a variable.
Is it in the compilation stage or is it at the execution time?
Yes, and yes, both.
For a global variable (declared at file scope), the compiler reserves memory in the executable image. So this is compile-time.
For an automatic variable (declared in a function), the compiler adds instructions to allocate the variable on the stack. So this is run-time.
int a; // file scope
int f(void)
{
int b; // function scope
...
Notes:
The compiler emits one instruction (or one set of instructions) to allocate all local variables of a function at once. Generally, there is no overhead per variable (there can be exceptions I don't discuss now). These instructions are executed every time the function is called.
The compiler does not allocate storage for your strings. This is an error beginners often make. Consider:
char *s; // a pointer to a string, but it does not point anywhere yet
scanf("%s", s); // no, the compiler will not allocate storage for the string to read.
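A corrected variant supplies the storage itself and limits how much may be written into it. The sketch below uses sscanf on an in-memory string so it can be tried without typing input; the function name first_word and the 32-byte capacity are arbitrary choices for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Reads the first whitespace-delimited word of input into word.
   Returns 1 if a word was read, 0 otherwise. */
int first_word(const char *input, char word[32])
{
    assert(input != NULL && word != NULL);
    /* %31s leaves room for the terminating '\0' in a 32-byte buffer */
    return sscanf(input, "%31s", word) == 1;
}
```

The caller provides the array, for example char w[32]; first_word("hello world", w);, so the storage for the string exists before anything is written to it.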
It depends on the kind of variable/object.
Globally and statically allocated variables are known at compile time and their offsets in the data segment are baked into the program. So in a way, they got allocated at compile time.
Variables local to function scope are allocated on the stack. You could say that the compiler knew about them and the kind of storage they needed, but obviously they got allocated (in the sense of came into existence) at run time, during a function call.
Another interesting object is the heap allocated object, which can be created with malloc/calloc in C and new or related mechanisms in C++. They are allocated at run time in the heap section.
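That function-local variables come into existence per call, at run time, is easiest to see with recursion, where several instances of the same variable are alive at once. A minimal sketch with an invented function name:

```c
#include <assert.h>

/* Returns depth + (depth - 1) + ... + 0, computed recursively. */
int depth_sum(int depth)
{
    assert(depth >= 0);
    int local = depth;                  /* a fresh instance for every call */
    if (depth == 0)
        return 0;
    int below = depth_sum(depth - 1);   /* inner calls have their own local */
    return local + below;               /* this call's local is unchanged */
}
```

With depth_sum(4), five instances of local exist at the deepest point of the recursion, one per active call.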
There's a third kind of memory which is allocated dynamically by using malloc() and friends.
This memory is taken from the so-called heap, while automatic variables (the ones you have in functions) are taken from the so-called stack.
Then, if you're having a variable with an initializer (e.g. int i = 5;) that never changes value, a compiler may figure that out and not allocate memory at all. Instead it would just use 5 wherever you use that variable in your code.

How does memory allocation in malloc differs from that of an array?

I have written a code like this:
int * ptr;
ptr = (int*) malloc(5 * sizeof(int));
Please tell me how memory is allocated to "ptr" using this malloc function?
How is it different from memory allocation in an integer array?
1. Answer in terms of the language
It differs in the storage duration. If you use an integer array, it could be at the file scope (outside any function):
int arr[5];
This has static storage duration, which means the object is alive for the whole execution time of the program.
The other possibility is inside a function:
void foo(void)
{
int arr[5];
}
This variant has automatic storage duration, so the object is only alive as long as the program execution is inside this function (more generally: inside the scope of the variable which is the enclosing pair of braces: { ... })
If you use malloc() instead, the object has dynamic storage duration. This means you decide for how long the object is alive. It stays alive until you call free() on it.
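The practical consequence shows up when data has to leave a function. In this sketch (function names are made up), the automatic array can only hand back a computed value, while the malloc'd block itself can be returned and must later be freed by the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Automatic storage: arr is released when the function returns,
   so only the computed value may leave the function. */
int sum_automatic(void)
{
    int arr[5] = {1, 2, 3, 4, 5};
    assert(sizeof arr == 5 * sizeof(int));   /* sizeof sees the whole array */
    int total = 0;
    for (size_t i = 0; i < sizeof arr / sizeof arr[0]; i++)
        total += arr[i];
    return total;
}

/* Dynamic storage: the block outlives the call; the caller must free() it. */
int *make_heap_array(void)
{
    int *ptr = malloc(5 * sizeof *ptr);
    if (ptr == NULL)
        return NULL;
    for (int i = 0; i < 5; i++)
        ptr[i] = i + 1;
    return ptr;
}
```

Returning the address of arr from sum_automatic would be a bug, since the array no longer exists once the function has returned.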
2. Answer in terms of the OS
malloc() is typically implemented on the heap. This is an area of your address space where your program can dynamically request more memory from the OS.
In contrast, with an object with automatic storage duration, typical implementations place it on the stack (where a new frame is created for each function call) and with static storage duration, it will be in a data segment of your program already from the start.
This part of the answer is intentionally a bit vague; C implementations exist for a vast variety of systems, and while stack, heap and data segments are used in many modern operating systems with virtual memory management, they are by no means required -- take for example embedded platforms and microcontrollers where your program might run without any operating system. So, writing portable C code, you should (most of the time) only be interested in what the language specifies.
The C standard is very clear on this.
The memory pointed to by ptr has dynamic storage duration.
The memory associated with an int foo[5]; declared inside a function, say, has automatic storage duration.
The key difference between the two is that in the dynamic case you need to call free on ptr to release the memory once you're done with it, whereas foo will be released automatically once it goes out of scope.
The mechanism for acquiring the memory in either case is intentionally left to the compiler.
For further research, Google the terms that I've italicised.
