Dynamic allocation with malloc - c

In what ways is
int main(int argc, char **argv)
{
int N = atoi(argv[1]);
int v[N];
...some code here...
}
better than
int main(int argc, char **argv)
{
int count = atoi(argv[1]);
int *N = malloc(count * sizeof(int));
...some code here...
free(N);
}
?
Is it less error-prone?
P.S: I know where the two will typically be allocated.
Edit: Initialized count.

It's not "better" per se. It does a different thing:
int v[count];
is a variable-length array in C99 and later. That means it has a lifetime: The moment it goes out of scope (the function in which it was created is finished), it's forgotten.
int *N = malloc(count * sizeof(int));
on the other hand allocates that memory on the heap. That means it lives as long as you do not explicitly free it (or the program ends, which is at the end of your main, anyway, so this might not be the best code to illustrate the difference on)!
Now, having to remember that you have allocated some memory that you need to deallocate later is a hassle. Programmers get that wrong all the time, and that leads to either memory leaks (you forgetting to free memory you're not using anymore, leading to the memory used by a program growing over time) or use-after-free bugs (you using memory after you've already freed it, which has undefined results and can lead to serious bugs, including security problems).
On the other hand, your stack space is small. Like 4 kB or 8 kB. Try passing 1048576 to your program as argv[1]! That's a meager 1 million ints you're creating. And that probably means you're allocating 4 MB of RAM. 4 MB, compared to the Gigabytes of RAM your computer most likely has, that's simply not much data, right?
But your int v[1048576] would plainly fail, because that's more than a function is allowed to have "scratchpad" space for itself. So, in general, for variable-length things where you cannot know before that they are really small, your second solution with malloc is "better", because it actually works.
General remark: Memory management in C is hard. That's why languages (C++ and others) have come up with solutions where you can have something that can be made aware of its own lifetime, so that you cannot forget to delete something, because the object itself knows when it is no longer used.

Related

Why does malloc need to be used for dynamic memory allocation in C?

I have been reading that malloc is used for dynamic memory allocation. But if the following code works...
int main(void) {
int i, n;
printf("Enter the number of integers: ");
scanf("%d", &n);
// Dynamic allocation of memory?
int int_arr[n];
// Testing
for (int i = 0; i < n; i++) {
int_arr[i] = i * 10;
}
for (int i = 0; i < n; i++) {
printf("%d ", int_arr[i]);
}
printf("\n");
}
... what is the point of malloc? Isn't the code above just a simpler-to-read way to allocate memory dynamically?
I read on another Stack Overflow answer that if some sort of flag is set to "pedantic", then the code above would produce a compile error. But that doesn't really explain why malloc might be a better solution for dynamic memory allocation.
Look up the concepts for stack and heap; there's a lot of subtleties around the different types of memory. Local variables inside a function live in the stack and only exist within the function.
In your example, int_array only exists while execution of the function it is defined in has not ended, you couldn't pass it around between functions. You couldn't return int_array and expect it to work.
malloc() is used when you want to create a chunk of memory which exists on the heap. malloc returns a pointer to this memory. This pointer can be passed around as a variable (eg returned) from functions and can be used anywhere in your program to access your allocated chunk of memory until you free() it.
Example:
'''C
int main(int argc, char **argv){
int length = 10;
int *built_array = make_array(length); //malloc memory and pass heap pointer
int *array = make_array_wrong(length); //will not work. Array in function was in stack and no longer exists when function has returned.
built_array[3] = 5; //ok
array[3] = 5; //bad
free(built_array)
return 0;
}
int *make_array(int length){
int *my_pointer = malloc( length * sizeof int);
//do some error checking for real implementation
return my_pointer;
}
int *make_array_wrong(int length){
int array[length];
return array;
}
'''
Note:
There are plenty of ways to avoid having to use malloc at all, by pre-allocating sufficient memory in the callers, etc. This is recommended for embedded and safety critical programs where you want to be sure you'll never run out of memory.
Just because something looks prettier does not make it a better choice.
VLAs have a long list of problems, not the least of which they are not a sufficient replacement for heap-allocated memory.
The primary -- and most significant -- reason is that VLAs are not persistent dynamic data. That is, once your function terminates, the data is reclaimed (it exists on the stack, of all places!), meaning any other code still hanging on to it are SOL.
Your example code doesn't run into this problem because you aren't using it outside of the local context. Go ahead and try to use a VLA to build a binary tree, then add a node, then create a new tree and try to print them both.
The next issue is that the stack is not an appropriate place to allocate large amounts of dynamic data -- it is for function frames, which have a limited space to begin with. The global memory pool, OTOH, is specifically designed and optimized for this kind of usage.
It is good to ask questions and try to understand things. Just be careful that you don't believe yourself smarter than the many, many people who took what now is nearly 80 years of experience to design and implement systems that quite literally run the known universe. Such an obvious flaw would have been immediately recognized long, long ago and removed before either of us were born.
VLAs have their place, but it is, alas, small.
Declaring local variables takes the memory from the stack. This has two ramifications.
That memory is destroyed once the function returns.
Stack memory is limited, and is used for all local variables, as well as function return addresses. If you allocate large amounts of memory, you'll run into problems. Only use it for small amounts of memory.
When you have the following in your function code:
int int_arr[n];
It means you allocated space on the function stack, once the function will return this stack will cease to exist.
Image a use case where you need to return a data structure to a caller, for example:
Car* create_car(string model, string make)
{
Car* new_car = malloc(sizeof(*car));
...
return new_car;
}
Now, once the function will finish you will still have your car object, because it was allocated on the heap.
The memory allocated by int int_arr[n] is reserved only until execution of the routine ends (when it returns or is otherwise terminated, as by setjmp). That means you cannot allocate things in one order and free them in another. You cannot allocate a temporary work buffer, use it while computing some data, then allocate another buffer for the results, and free the temporary work buffer. To free the work buffer, you have to return from the function, and then the result buffer will be freed to.
With automatic allocations, you cannot read from a file, allocate records for each of the things read from the file, and then delete some of the records out of order. You simply have no dynamic control over the memory allocated; automatic allocations are forced into a strictly last-in first-out (LIFO) order.
You cannot write subroutines that allocate memory, initialize it and/or do other computations, and return the allocated memory to their callers.
(Some people may also point out that the stack memory commonly used for automatic objects is commonly limited to 1-8 mebibytes while the memory used for dynamic allocation is generally much larger. However, this is an artifact of settings selected for common use and can be changed; it is not inherent to the nature of automatic versus dynamic allocation.)
If the allocated memory is small and used only inside the function, malloc is indeed unnecessary.
If the memory amount is extremely large (usually MB or more), the above example may cause stack overflow.
If the memory is still used after the function returned, you need malloc or global variable (static allocation).
Note that the dynamic allocation through local variables as above may not be supported in some compiler.

Repeatedly allocate memory without freeing it

The following code shows an example that repeatedly allocates memory without first calling free. Instead, it frees **sign after the loop.
#include <stdio.h>
#include <stdlib.h>
float ** fun(int nloc)
{
float **sign;
int i,nt=100,it,k;
sign=(float **)calloc(nloc,sizeof(float *));
for (i=0;i<nloc;i++)
sign[i] = (float *)calloc(nt,sizeof(float *));
for (it=0;it<nt;it++){
for (k=0;k<nloc;k++){
sign[k][it]=it*0.2;
}
}
return sign;
}
int main(int argc, char *argv[])
{
int i,isrc,n=3,nloc=1;
float **sign=NULL;
for (isrc=0;isrc<n;isrc++){
sign = fun(nloc);
}
for (i=0;i<nloc;i++){
free(sign[i]);
}
free(sign);
exit(0);
}
This is a correct code snippet. My question is: why is it legal that we can allocate memory for a pointer in each iteration without having to free it first?
[Supplementary message]:
Hi all, I think there's one case we cannot free memory in the loop. If buffer=p and p is defined outside the loop, like:
float *buffer, *p;
/* Elements of p calculated */
for (...){
/* allocate memory for **buffer */
buffer = p;
free(buffer)
/* if free here will cause p lost */
}
If free buffer at the end of each loop, it may cause p lost because buffer and p share the same memory address.
why is it legal that we can allocate memory for a pointer in each iteration without having to free it first?
The responsibility of freeing dynamically allocated memory is left on the programmer. It is legal because the compiler does not enforce it, although there are code checking tools that can flag this problem.
Freeing dynamically allocated memory should be done in the reverse order of allocation. For ex:
for (i=0;i<nloc;i++)
free(sign[i]);
free(sign);
It is legal because in C, you as the programmer are responsible for memory management. There is a very good reason for this. To quote another thread
Garbage collection requires data structures for tracking allocations
and/or reference counting. These create overhead in memory,
performance, and the complexity of the language. C++ is designed to be
"close to the metal", in other words, it takes the higher performance
side of the tradeoff vs convenience features. Other languages make
that tradeoff differently. This is one of the considerations in
choosing a language, which emphasis you prefer.
Not only is performance and code size a consideration, but different systems have difference addressing schemes for memory. It is also for this reason that C is easy to port and maintain across platforms, given that there is no garbage collection to alter.
EDIT: Some answers mention freeing memory space as opposed to the pointer itself, it is worth further specifying what that means: free() simply marks the allocated space as available, it is not 'freed' or erased, nor does any other operation occur on that memory space. It is then still incumbent on the programmer to delete the address that was assigned to the pointer variable.
My question is: why is it legal that we can allocate memory for a pointer in each iteration without having to free it first?
Short answer
C trades away safety to gain simplicity and performance.
Longer answer
You don't free the pointer. You free the memory block the pointer is pointing at.
Think about what malloc (and calloc) does. They both allocate a piece of memory, and if the allocation is successful they return a pointer to that memory block. The allocation functions (like all functions) has no insight, nor control whatsoever of what you do with the return value. The functions does not even see the pointer that you are assigning the return value to.
It would be fairly complex (relatively speaking, C has a pretty simple structure) to implement a protection against it. First, consider this code:
int * a = malloc(1);
int * b = a;
a = malloc(1);
free(b);
free(a);
This code has no memory leaks, even though we did the precise thing you asked about. We reassigned a before calling free upon the memory from the first malloc. It works fine, because we have saved the address in b.
So in order to disallow reassigning pointers that points to a memory block that no other pointer points at, the runtime environment would need to keep track of this, and it is not entirely trivial. And it would also need to create extra code to handle all this. Since this check needs to be done at runtime, it may affect performance.
Also, remember that even though C may seem very low level today, it was considered a high level language when it came.

Questions about dynamic memory allocation in C

What is the difference between
int size;
int *arr;
scanf("%i", &size);
arr = malloc(size * sizeof(*arr));
and
int size;
scanf("%i", &size);
int arr[size];
When I want to allocate memory for 2 big numbers i would use the
following code:
unsigned long *big_nums;
big_nums = malloc(2 * sizeof(*big_nums));
I would access the first big bumber using big_nums[0] an the seond
one with big_nums[1]. Let's say unsigned long is 4 bytes big,
then the code would allocate 2 * 4 = 8 bytes. Let's say I do
something like this:
unsigned long *big_nums;
big_nums = malloc(7);
Using big_nums[0] is clear for me, but how about big_nums[1]? Will
it cause some kind of segmentation fault error or what?
There are two places to get memory from: the stack and the heap. The stack is where you allocate short lived things, and the heap is for allocating long term things.
malloc() allocates from the heap, and int arr[size] allocates from the stack.
When your function exits, arr[size] will be disposed of automatically, but malloc() will not. This leads to what's called "memory leaks".
big_nums = malloc(7);
will indeed be an error if you access big_nums[1]. In general the standard says behavior is "undefined" which means it could work, or might not.
For Q#1: The second version will (try to) allocate a variable-length array on the stack. In C99 onwards, this is possible; but in traditional C variable-length arrays don't exist, and you must roll them yourself using malloc.
For Q#2: You will be allowed to make that error. And when you write to the second element of the array, you will overwrite one byte that does not "belong" to you.
My guess is that in most cases, nothing bad will happen because malloc(7) will secretly be equivalent to malloc(8). But there is NO GUARANTEE of this. Anything could happen, including a segfault or something worse.
By the way, if you have two separate questions, it would be best to write them up as two separate questions. You get more points way.

Why is not freeing memory bad practice?

int a = 0;
int *b = malloc (sizeof(int));
b = malloc (sizeof(int));
The above code is bad because it allocates memory on the heap and then doesn't free it, meaning you lose access to it. But you also created 'a' and never used it, so you also allocated memory on the stack, which isn't freed until the scope ends.
So why is it bad practice to not free memory on the heap but okay for memory on the stack to not be freed (until the scope ends)?
Note: I know that memory on the stack can't be freed, I want to know why its not considered bad.
The stack memory will get released automatically when the scope ends. The memory allocated on the heap will remain occupied unless you release it explicitly. As an example:
void foo(void) {
int a = 0;
void *b = malloc(1000);
}
for (int i=0; i<1000; i++) {
foo();
}
Running this code will decrease the available memory by 1000*1000 bytes required by b, whereas the memory required by a will always get released automatically when you return from the foo call.
Simple: Because you'll leak memory. And memory leaks are bad. Leaks: bad, free: good.
When calling malloc or calloc, or indeed any *alloc function, you're claiming a chunk of memory (the size of which is defined by the arguments passed to the allocating function).
Unlike stack variables, which reside in a portion of memory the program has, sort of, free reign over, the same rules don't apply to heap memory. You may need to allocate heap memory for any number of reasons: the stack isn't big enough, you need an array of pointers, but have no way of knowing how big this array will need to be at compile time, you need to share some chunk of memory (threading nightmares), a struct that requires the members to be set at various places (functions) in your program...
Some of these reasons, by their very nature, imply that the memory can't be freed as soon as pointer to that memory goes out of scope. Another pointer might still be around, in another scope, that points to the same block of memory.
There is, though, as mentioned in one of the comments, a slight drawback to this: heap memory requires not just more awareness on the programmers part, but it's also more expensive, and slower than working on the stack.
So some rules of thumb are:
You claimed the memory, so you take care of it... you make sure it's freed when you're done playing around with it.
Don't use heap memory without a valid reason. Avoiding stack overflow, for example, is a valid reason.
Anyway,
Some examples:
Stack overflow:
#include <stdio.h>
int main()
{
int foo[2000000000];//stack overflow, array is too large!
return 0;
}
So, here we've depleted the stack, we need to allocate the memory on the heap:
#include <stdio.h>
#include <stdlib.h>
int main()
{
int *foo= malloc(2000000000*sizeof(int));//heap is bigger
if (foo == NULL)
{
fprintf(stderr, "But not big enough\n");
}
free(foo);//free claimed memory
return 0;
}
Or, an example of an array, whose length depends on user input:
#include <stdio.h>
#include <stdlib.h>
int main()
{
int *arr = NULL;//null pointer
int arrLen;
scanf("%d", &arrLen);
arr = malloc(arrLen * sizeof(int));
if (arr == NULL)
{
fprintf(stderr, "Not enough heap-mem for %d ints\n", arrLen);
exit ( EXIT_FAILURE);
}
//do stuff
free(arr);
return 0;
}
And so the list goes on... Another case where malloc or calloc is useful: An array of strings, that all might vary in size. Compare:
char str_array[20][100];
In this case str_array is an array of 20 char arrays (or strings), each 100 chars long. But what if 100 chars is the maximum you'll ever need, and on average, you'll only ever use 25 chars, or less?
You're writing in C, because it's fast and your program won't use any more resources than it actually needs? Then this isn't what you actually want to be doing. More likely, you want:
char *str_array[20];
for (int i=0;i<20;++i) str_array[i] = malloc((someInt+i)*sizeof(int));
Now each element in the str_array has exactly the amount of memory I need allocated too it. That's just way more clean. However, in this case calling free(str_array) won't cut it. Another rule of thumb is: Each alloc call has to have a free call to match it, so deallocating this memory looks like this:
for (i=0;i<20;++i) free(str_array[i]);
Note:
Dynamically allocated memory isn't the only cause for mem-leaks. It has to be said. If you read a file, opening a file pointer using fopen, but failing to close that file (fclose) will cause a leak, too:
int main()
{//LEAK!!
FILE *fp = fopen("some_file.txt", "w");
if (fp == NULL) exit(EXIT_FAILURE);
fwritef(fp, "%s\n", "I was written in a buggy program");
return 0;
}
Will compile and run just fine, but it will contain a leak, that is easily plugged (and it should be plugged) by adding just one line:
int main()
{//OK
FILE *fp = fopen("some_file.txt", "w");
if (fp == NULL) exit(EXIT_FAILURE);
fwritef(fp, "%s\n", "I was written in a bug-free(?) program");
fclose(fp);
return 0;
}
As an asside: if the scope is really long, chances are you're trying to cram too much into a single function. Even so, if you're not: you can free up claimed memory at any point, it needn't be the end of the current scope:
_Bool some_long_f()
{
int *foo = malloc(2000000000*sizeof(int));
if (foo == NULL) exit(EXIT_FAILURE);
//do stuff with foo
free(foo);
//do more stuff
//and some more
//...
//and more
return true;
}
Because stack and heap, mentioned many times in the other answers, are sometimes misunderstood terms, even amongst C programmers, Here is a great conversation discussing that topic....
So why is it bad practice to not free memory on the heap but okay for memory on the stack to not be freed (until the scope ends)?
Memory on the stack, such as memory allocated to automatic variables, will be automatically freed upon exiting the scope in which they were created.
whether scope means global file, or function, or within a block ( {...} ) within a function.
But memory on the heap, such as that created using malloc(), calloc(), or even fopen() allocate memory resources that will not be made available to any other purpose until you explicity free them using free(), or fclose()
To illustrate why it is bad practice to allocate memory without freeing it, consider what would happen if an application were designed to run autonomously for very long time, say that application was used in the PID loop controlling the cruise control on your car. And, in that application there was un-freed memory, and that after 3 hours of running, the memory available in the microprocessor is exhausted, causing the PID to suddenly rail. "Ah!", you say, "This will never happen!" Yes, it does. (look here). (not exactly the same problem, but you get the idea)
If that word picture doesn't do the trick, then observe what happens when you run this application (with memory leaks) on your own PC. (at least view the graphic below to see what it did on mine)
Your computer will exhibit increasingly sluggish behavior until it eventually stops working. Likely, you will be required to re-boot to restore normal behavior.
(I would not recommend running it)
#include <ansi_c.h>
char *buf=0;
int main(void)
{
long long i;
char text[]="a;lskdddddddd;js;'";
buf = malloc(1000000);
strcat(buf, "a;lskdddddddd;js;dlkag;lkjsda;gkl;sdfja;klagj;aglkjaf;d");
i=1;
while(strlen(buf) < i*1000000)
{
strcat(buf,text);
if(strlen(buf) > (i*10000) -10)
{
i++;
buf = realloc(buf, 10000000*i);
}
}
return 0;
}
Memory usage after just 30 seconds of running this memory pig:
I guess that has to do with scope 'ending' really often (at the end of a function) meaning if you return from that function creating a and allocating b, you will have freed in a sense the memory taken by a, and lost for the remainder of the execution memory used by b
Try calling that function a a handful of times, and you'll soon exhaust all of your memory. This never happens with stack variables (except in the case of a defectuous recursion)
Memory for local variables automatically is reclaimed when the function is left (by resetting the frame pointer).
The problem is that memory you allocate on the heap never gets freed until your program ends, unless you explicitly free it. That means every time you allocate more heap memory, you reduce available memory more and more, until eventually your program runs out (in theory).
Stack memory is different because it's laid-out and used in a predictable pattern, as determined by the compiler. It expands as needed for a given block, then contracts when the block ends.
So why is it bad practice to not free memory on the heap but okay for memory on the stack to not be freed (until the scope ends)?
Imagine the following:
while ( some_condition() )
{
int x;
char *foo = malloc( sizeof *foo * N );
// do something interesting with x and foo
}
Both x and foo are auto ("stack") variables. Logically speaking, a new instance for each is created and destroyed in each loop iteration1; no matter how many times this loop runs, the program will only allocate enough memory for a single instance of each.
However, each time through the loop, N bytes are allocated from the heap, and the address of those bytes is written to foo. Even though the variable foo ceases to exist at the end of the loop, that heap memory remains allocated, and now you can't free it because you've lost the reference to it. So each time the loop runs, another N bytes of heap memory is allocated. Over time, you run out of heap memory, which may cause your code to crash, or even cause a kernel panic depending on the platform. Even before then, you may see degraded performance in your code or other processes running on the same machine.
For long-running processes like Web servers, this is deadly. You always want to make sure you clean up after yourself. Stack-based variables are cleaned up for you, but you're responsible for cleaning up the heap after you're done.
1. In practice, this (usually) isn't the case; if you look at the generated machine code, you'll (usually) see the stack space allocated for x and foo at function entry. Usually, space for all local variables (regardless of their scope within the function) is allocated at once.

malloc non-deterministic behaviour

#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *arr = (int*)malloc(10);
int i;
for(i=0;i<100;i++)
{
arr[i]=i;
printf("%d", arr[i]);
}
return 0;
}
I am running above program and a call to malloc will allocate 10 bytes of memory and since each int variable takes up 2 bytes so in a way I can store 5 int variables of 2 bytes each thus making up my total 10 bytes which I dynamically allocated.
But on making a call to for-loop it is allowing me to enter values even till 99th index and storing all these values as well. So in a way if I am storing 100 int values it means 200 bytes of memory whereas I allocated only 10 bytes.
So where is the flaw with this code or how does malloc behave? If the behaviour of malloc is non-deterministic in such a manner then how do we achieve proper dynamic memory handling?
The flaw is in your expectations. You lied to the compiler: "I only need 10 bytes" when you actually wrote 100*sizeof(int) bytes. Writing beyond an allocated area is undefined behavior and anything may happen, ranging from nothing to what you expect to crashes.
If you do silly things expect silly behaviour.
That said malloc is usually implemented to ask the OS for chunks of memory that the OS prefers (like a page) and then manages that memory. This speeds up future mallocs especially if you are using lots of mallocs with small sizes. It reduces the number of context switches that are quite expensive.
First of all, in the most Operating Systems the size of int is 4 bytes. You can check that with:
printf("the size of int is %d\n", sizeof(int));
When you call the malloc function you allocate size at heap memory. The heap is a set aside for dynamic allocation. There's no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time. Because your program is small and you have no collision in the heap you can run this for with more values that 100 and it runs too.
When you know what are you doing with malloc then you build programs with proper dynamic memory handling. When your code has improper malloc allocation then the behaviour of the program is "unknown". But you can use gdb debugger to find where the segmentation will be revealed and how the things are in heap.
malloc behaves exactly as it states, allocates n number bytes of memory, nothing more. Your code might run on your PC, but operating on non-allocated memory is undefined behavior.
A small note...
Int might not be 2 bytes, it varies on different architectures/SDKs. When you want to allocate memory for n integer elements, you should use malloc( n * sizeof( int ) ).
All in short, you manage dynamic memory with other tools that the language provides ( sizeof, realloc, free, etc. ).
C doesn't do any bounds-checking on array accesses; if you define an array of 10 elements, and attempt to write to a[99], the compiler won't do anything to stop you. The behavior is undefined, meaning the compiler isn't required to do anything in particular about that situation. It may "work" in the sense that it won't crash, but you've just clobbered something that may cause problems later on.
When doing a malloc, don't think in terms of bytes, think in terms of elements. If you want to allocate space for N integers, write
int *arr = malloc( N * sizeof *arr );
and let the compiler figure out the number of bytes.

Resources