Today I was helping a friend of mine with some C code, and I've found some strange behavior that I couldn't explain him why it was happening. We had TSV file with a list of integers, with an int each line. The first line was the number of lines the list had.
We also had a c file with a very simple "readfile". The first line was read to n, the number of lines, then there was an initialization of:
int list[n]
and finally a for loop of n with a fscanf.
For small n's (till ~100.000), everything was fine. However, we've found that when n was big (10^6), a segfault would occur.
Finally, we changed the list initialization to
int *list = malloc(n*sizeof(int))
and everything when well, even with very large n.
Can someone explain why this occurred? what was causing the segfault with int list[n], that was stopped when we start using list = malloc(n*sizeof(int))?
There are several different pieces at play here.
The first is the difference between declaring an array as
int array[n];
and
int* array = malloc(n * sizeof(int));
In the first version, you are declaring an object with automatic storage duration. This means that the array lives only as long as the function that calls it exists. In the second version, you are getting memory with dynamic storage duration, which means that it will exist until it is explicitly deallocated with free.
The reason that the second version works here is an implementation detail of how C is usually compiled. Typically, C memory is split into several regions, including the stack (for function calls and local variables) and the heap (for malloced objects). The stack typically has a much smaller size than the heap; usually it's something like 8MB. As a result, if you try to allocate a huge array with
int array[n];
Then you might exceed the stack's storage space, causing the segfault. On the other hand, the heap usually has a huge size (say, as much space as is free on the system), and so mallocing a large object won't cause an out-of-memory error.
In general, be careful with variable-length arrays in C. They can easily exceed stack size. Prefer malloc unless you know the size is small or that you really only do want the array for a short period of time.
int list[n]
Allocates space for n integers on the stack, which is usually pretty small. Using memory on the stack is much faster than the alternative, but it is quite small and it is easy to overflow the stack (i.e. allocate too much memory) if you do things like allocate huge arrays or do recursion too deeply. You do not have to manually deallocate memory allocated this way, it is done by the compiler when the array goes out of scope.
malloc on the other hand allocates space in the heap, which is usually very large compared to the stack. You will have to allocate a much larger amount of memory on the heap to exhaust it, but it is a lot slower to allocate memory on the heap than it is on the stack, and you must deallocate it manually via free when you are done using it.
int list[n] stores the data in the stack, while malloc stores it in the heap.
The stack is limited, and there is not much space, while the heap is much much bigger.
int list[n] is a VLA, which allocates on the stack instead of on the heap. You don't have to free it (it frees automatically at the end of the function call) and it allocates quickly but the storage space is very limited, as you have discovered. You must allocate larger values on the heap.
This declaration allocates memory on the stack
int list[n]
malloc allocates on the heap.
Stack size is usually smaller than heap, so if you allocate too much memory on the stack you get a stackoverflow.
See also this answer for further information
Assuming you have a typical implementation in your implementation it's most likely that:
int list[n]
allocated list on your stack, where as:
int *list = malloc(n*sizeof(int))
allocated memory on your heap.
In the case of a stack there is typically a limit to how large these can grow (if they can grow at all). In the case of a heap there is still a limit, but that tends to be much largely and (broadly) constrained by your RAM+swap+address space which is typically at least an order of magnitude larger, if not more.
If you are on linux, you can set ulimit -s to a larger value and this might work for stack allocation also.
When you allocate memory on stack, that memory remains till the end of your function's execution. If you allocate memory on heap(using malloc), you can free the memory any time you want(even before the end of your function's execution).
Generally, heap should be used for large memory allocations.
When you allocate using a malloc, memory is allocated from heap and not from stack, which is much more limited in size.
int array[n];
It is an example of statically allocated array and at the compile time the size of the array will be known. And the array will be allocated on the stack.
int *array(malloc(sizeof(int)*n);
It is an example of dynamically allocated array and the size of the array will be known to user at the run time. And the array will be allocated on the heap.
Related
I was curious with this:
What is the diference between:
const int MAX_BUF = 1000;
char* Buffer = malloc(MAX_BUF);
and:
char Buffer[MAX_BUF];
Case 1: In
char Buffer[MAX_BUF];
Buffer is an array of size MAX_BUF. The allocation technique is called VLA.
Case 2: In
const int MAX_BUF = 1000;
char* Buffer = malloc(MAX_BUF);
Buffer is a pointer which is allocated a memory of size MAX_BUF which is 1000.
and, an array is not the same as a pointer, and C-FAQ has a Very Good collection detailing the reasons.
The major difference, in terms of usability and behaviour are:
(1) is on stack, usually Note, while (2) is on heap, always.
(1) has fixed size once allocated, (2) can be resized.
(1) is allocated when the enclosing function is called and has the block scope OTOH, (2) is allocated memory dynamically, at runtime and the returned memory has a lifetime which extends from the allocation until the deallocation.
(1) allocated memory need not be managed by programmer, while in (2) all malloc()d memory should be free()d. [Courtesy: Giorgi]
Note: Wiki
For example, the GNU C Compiler allocates memory for VLAs on the stack.
I will add a bit info in terms of memory management, in addition to what others said.
1) The main difference is here:
const int MAX_BUF = 1000;
char* Buffer = malloc(MAX_BUF);
You need to manage the allocated memory manually, e.g., free Buffer when you are done using it. Forgetting to free it (or freeing it twice) may lead to trouble.
2) With the second case:
char Buffer[MAX_BUF];
You don't need to free anything. It will get destroyed automatically. Hence you avoid the task of handling the memory - which is good.
You should try to evaluate always which approach you need.
Some points.
Since second is allocated on stack, the first approach is taken also when large array needs to be created - since more memory is usually available on the heap.
Also if you create array using second approach for example in the method, the life time of the object will be that method - you will not be able to use that array outside that method. Whereas with dynamic allocation that is not the case.
char* Buffer = malloc(MAX_BUF);
creates a char pointer Buffer, dynamically allocates MAX_BUF bytes of memory via the malloc and makes Buffer point to the start of the allocated space. This memory is allocated on the heap.
char Buffer[MAX_BUF];
creates an array Buffer of size MAX_BUF which can hold a maximum of MAX_BUF characters. Note that you are creating a Variable Length Array (a feature introduced in C99) since MAX_BUF is a variable. This array may be created on the stack.
The most notable difference is scope. The VLA array will only be valid inside the scope where it is declared, while a dynamic array will be available everywhere in the program until you call free().
In practice, VLAs may be faster than dynamic memory, in case the compiler use stack allocation for the VLA. It is however not specified by the C standard where a VLA is allocated.
(A compiler could in theory allocate a VLA on the heap, but then the compiler would also be responsible for the clean-up. I don't think any such solutions exist. Every compiler I've used always declare VLAs on the stack.)
This means that VLAs are unsuitable to hold large amounts of data: you would risk stack overflow. This is not a concern when you are using dynamic memory however.
VLAs don't have the same portability as dynamic arrays, since awfully old compilers don't support VLAs. In theory, new C11 compilers don't have to suport VLAs either, though at this point I know of no compiler has been stupid enough to drop that support.
Comparison/summary:
VLAs should be used when there are small amounts of local data, because they have fast allocation time and automatic clean-up.
Dynamic arrays should be used when there are large amounts of data, to prevent stack overflow.
Dynamic arrays should be used when the data needs to persist after the execution of a function and be available elsewhere in the program.
Dynamic arrays should be used when you have exceptional and/or irrational portability requirements.
From what I understand, the malloc function takes a variable and allocates memory as asked. In this case, it will ask the compiler to prepare memory in order to fit the equivalence of twenty double variables. Is my way of understanding it correctly, and why must it be used?
double *q;
q=(double *)malloc(20*sizeof(double));
for (i=0;i<20; i++)
{
*(q+i)= (double) rand();
}
You don't have to use malloc() when:
The size is known at compile time, as in your example.
You are using C99 or C2011 with VLA (variable length array) support.
Note that malloc() allocates memory at runtime, not at compile time. The compiler is only involved to the extent that it ensures the correct function is called; it is malloc() that does the allocation.
Your example mentions 'equivalence of ten integers'. It is very seldom that 20 double occupy the same space as 10 int. Usually, 10 double will occupy the same space as 20 int (when sizeof(int) == 4 and sizeof(double) == 8, which is a very commonly found setting).
It's used to allocate memory at run-time rather than compile-time. So if your data arrays are based on some sort of input from the user, database, file, etc. then malloc must be used once the desired size is known.
The variable q is a pointer, meaning it stores an address in memory. malloc is asking the system to create a section of memory and return the address of that section of memory, which is stored in q. So q points to the starting location of the memory you requested.
Care must be taken not to alter q unintentionally. For instance, if you did:
q = (double *)malloc(20*sizeof(double));
q = (double *)malloc(10*sizeof(double));
you will lose access to the first section of 20 double's and introduce a memory leak.
When you use malloc you are asking the system "Hey, I want this many bytes of memory" and then he will either say "Sorry, I'm all out" or "Ok! Here is an address to the memory you wanted. Don't lose it".
It's generally a good idea to put big datasets in the heap (where malloc gets your memory from) and a pointer to that memory on the stack (where code execution takes place). This becomes more important on embedded platforms where you have limited memory. You have to decide how you want to divvy up the physical memory between the stack and heap. Too much stack and you can't dynamically allocate much memory. Too little stack and you can function call your way right out of it (also known as a stack overflow :P)
As the others said, malloc is used to allocate memory. It is important to note that malloc will allocate memory from the heap, and thus the memory is persistent until it is free'd. Otherwise, without malloc, declaring something like double vals[20] will allocate memory on the stack. When you exit the function, that memory is popped off of the stack.
So for example, say you are in a function and you don't care about the persistence of values. Then the following would be suitable:
void some_function() {
double vals[20];
for(int i = 0; i < 20; i++) {
vals[i] = (double)rand();
}
}
Now if you have some global structure or something that stores data, that has a lifetime longer than that of just the function, then using malloc to allocate that memory from the heap is required (alternatively, you can declare it as a global variable, and the memory will be preallocated for you).
In you example, you could have declared double q[20]; without the malloc and it would work.
malloc is a standard way to get dynamically allocated memory (malloc is often built above low-level memory acquisition primitives like mmap on Linux).
You want to get dynamically allocated memory resources, notably when the size of the allocated thing (here, your q pointer) depends upon runtime parameters (e.g. depends upon input). The bad alternative would be to allocate all statically, but then the static size of your data is a strong built-in limitation, and you don't like that.
Dynamic resource allocation enables you to run the same program on a cheap tablet (with half a gigabyte of RAM) and an expensive super-computer (with terabytes of RAM). You can allocate different size of data.
Don't forget to test the result of malloc; it can fail by returning NULL. At the very least, code:
int* q = malloc (10*sizeof(int));
if (!q) {
perror("q allocation failed");
exit(EXIT_FAILURE);
};
and always initialize malloc-ed memory (you could prefer using calloc which zeroes the allocated memory).
Don't forget to later free the malloc-ed memory. On Linux, learn about using valgrind. Be scared of memory leaks and dangling pointers. Recognize that the liveness of some data is a non-modular property of the entire program. Read about garbage collection!, and consider perhaps using Boehm's conservative garbage collector (by calling GC_malloc instead of malloc).
You use malloc() to allocate memory dynamically in C. (Allocate the memory at the run time)
You use it because sometimes you don't know how much memory you'll use when you write your program.
You don't have to use it when you know thow many elements the array will hold at compile time.
Another important thing to notice that if you want to return an array from a function, you will want to return an array which was not defined inside the function on the stack. Instead, you'll want to dynamically allocate an array (on the heap) and return a pointer to this block:
int *returnArray(int n)
{
int i;
int *arr = (int *)malloc(sizeof(int) * n);
if (arr == NULL)
{
return NULL;
}
//...
//fill the array or manipulate it
//...
return arr; //return the pointer
}
Today I was helping a friend of mine with some C code, and I've found some strange behavior that I couldn't explain him why it was happening. We had TSV file with a list of integers, with an int each line. The first line was the number of lines the list had.
We also had a c file with a very simple "readfile". The first line was read to n, the number of lines, then there was an initialization of:
int list[n]
and finally a for loop of n with a fscanf.
For small n's (till ~100.000), everything was fine. However, we've found that when n was big (10^6), a segfault would occur.
Finally, we changed the list initialization to
int *list = malloc(n*sizeof(int))
and everything when well, even with very large n.
Can someone explain why this occurred? what was causing the segfault with int list[n], that was stopped when we start using list = malloc(n*sizeof(int))?
There are several different pieces at play here.
The first is the difference between declaring an array as
int array[n];
and
int* array = malloc(n * sizeof(int));
In the first version, you are declaring an object with automatic storage duration. This means that the array lives only as long as the function that calls it exists. In the second version, you are getting memory with dynamic storage duration, which means that it will exist until it is explicitly deallocated with free.
The reason that the second version works here is an implementation detail of how C is usually compiled. Typically, C memory is split into several regions, including the stack (for function calls and local variables) and the heap (for malloced objects). The stack typically has a much smaller size than the heap; usually it's something like 8MB. As a result, if you try to allocate a huge array with
int array[n];
Then you might exceed the stack's storage space, causing the segfault. On the other hand, the heap usually has a huge size (say, as much space as is free on the system), and so mallocing a large object won't cause an out-of-memory error.
In general, be careful with variable-length arrays in C. They can easily exceed stack size. Prefer malloc unless you know the size is small or that you really only do want the array for a short period of time.
int list[n]
Allocates space for n integers on the stack, which is usually pretty small. Using memory on the stack is much faster than the alternative, but it is quite small and it is easy to overflow the stack (i.e. allocate too much memory) if you do things like allocate huge arrays or do recursion too deeply. You do not have to manually deallocate memory allocated this way, it is done by the compiler when the array goes out of scope.
malloc on the other hand allocates space in the heap, which is usually very large compared to the stack. You will have to allocate a much larger amount of memory on the heap to exhaust it, but it is a lot slower to allocate memory on the heap than it is on the stack, and you must deallocate it manually via free when you are done using it.
int list[n] stores the data in the stack, while malloc stores it in the heap.
The stack is limited, and there is not much space, while the heap is much much bigger.
int list[n] is a VLA, which allocates on the stack instead of on the heap. You don't have to free it (it frees automatically at the end of the function call) and it allocates quickly but the storage space is very limited, as you have discovered. You must allocate larger values on the heap.
This declaration allocates memory on the stack
int list[n]
malloc allocates on the heap.
Stack size is usually smaller than heap, so if you allocate too much memory on the stack you get a stackoverflow.
See also this answer for further information
Assuming you have a typical implementation in your implementation it's most likely that:
int list[n]
allocated list on your stack, where as:
int *list = malloc(n*sizeof(int))
allocated memory on your heap.
In the case of a stack there is typically a limit to how large these can grow (if they can grow at all). In the case of a heap there is still a limit, but that tends to be much largely and (broadly) constrained by your RAM+swap+address space which is typically at least an order of magnitude larger, if not more.
If you are on linux, you can set ulimit -s to a larger value and this might work for stack allocation also.
When you allocate memory on stack, that memory remains till the end of your function's execution. If you allocate memory on heap(using malloc), you can free the memory any time you want(even before the end of your function's execution).
Generally, heap should be used for large memory allocations.
When you allocate using a malloc, memory is allocated from heap and not from stack, which is much more limited in size.
int array[n];
It is an example of statically allocated array and at the compile time the size of the array will be known. And the array will be allocated on the stack.
int *array(malloc(sizeof(int)*n);
It is an example of dynamically allocated array and the size of the array will be known to user at the run time. And the array will be allocated on the heap.
I basically have this piece of code.
char (* text)[1][80];
text = calloc(2821522,80);
The way I calculated it, that calloc should have allocated 215.265045 megabytes of RAM, however, the program in the end exceeded that number and allocated nearly 700mb of ram.
So it appears I cannot properly know how much memory that function will allocate.
How does one calculate that propery?
calloc (and malloc for that matter) is free to allocate as much space as it needs to satisfy the request.
So, no, you cannot tell in advance how much it will actually give you, you can only assume that it's given you the amount you asked for.
Having said that, 700M seems a little excessive so I'd be investigating whether the calloc was solely responsible for that by, for example, a program that only does the calloc and nothing more.
You might also want to investigate how you're measuring that memory usage.
For example, the following program:
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
int main (void) {
char (* text)[1][80];
struct mallinfo mi;
mi = mallinfo(); printf ("%d\n", mi.uordblks);
text = calloc(2821522,80);
mi = mallinfo(); printf ("%d\n", mi.uordblks);
return 0;
}
outputs, on my system:
66144
225903256
meaning that the calloc has allocated 225,837,112 bytes which is only a smidgeon (115,352 bytes or 0.05%) above the requested 225,721,760.
Well it depends on the underlying implementation of malloc/calloc.
It generally works like this - there's this thing called the heap pointer which points to the top of the heap - the area from where dynamic memory gets allocated. When memory is first allocated, malloc internally requests x amount of memory from the kernel - i.e. the heap pointer increments by a certain amount to make that space available. That x may or may not be equal to the size of the memory block you requested (it might be larger to account for future mallocs). If it isn't, then you're given at least the amount of memory you requested(sometimes you're given more memory because of alignment issues). The rest is made part of an internal free list maintained by malloc. To sum it up malloc has some underlying data structures and a lot depends on how they are implemented.
My guess is that the x amount of memory was larger (for whatever reason) than you requested and hence malloc/calloc was holding on to the rest in its free list. Try allocating some more memory and see if the footprint increases.
int numbers*;
numbers = malloc ( sizeof(int) * 10 );
I want to know how is this dynamic memory allocation, if I can store just 10 int items to the memory block ? I could just use the array and store elemets dynamically using index. Why is the above approach better ?
I am new to C, and this is my 2nd day and I may sound stupid, so please bear with me.
In this case you could replace 10 with a variable that is assigned at run time. That way you can decide how much memory space you need. But with arrays, you have to specify an integer constant during declaration. So you cannot decide whether the user would actually need as many locations as was declared, or even worse , it might not be enough.
With a dynamic allocation like this, you could assign a larger memory location and copy the contents of the first location to the new one to give the impression that the array has grown as needed.
This helps to ensure optimum memory utilization.
The main reason why malloc() is useful is not because the size of the array can be determined at runtime - modern versions of C allow that with normal arrays too. There are two reasons:
Objects allocated with malloc() have flexible lifetimes;
That is, you get runtime control over when to create the object, and when to destroy it. The array allocated with malloc() exists from the time of the malloc() call until the corresponding free() call; in contrast, declared arrays either exist until the function they're declared in exits, or until the program finishes.
malloc() reports failure, allowing the program to handle it in a graceful way.
On a failure to allocate the requested memory, malloc() can return NULL, which allows your program to detect and handle the condition. There is no such mechanism for declared arrays - on a failure to allocate sufficient space, either the program crashes at runtime, or fails to load altogether.
There is a difference with where the memory is allocated. Using the array syntax, the memory is allocated on the stack (assuming you are in a function), while malloc'ed arrays/bytes are allocated on the heap.
/* Allocates 4*1000 bytes on the stack (which might be a bit much depending on your system) */
int a[1000];
/* Allocates 4*1000 bytes on the heap */
int *b = malloc(1000 * sizeof(int))
Stack allocations are fast - and often preferred when:
"Small" amount of memory is required
Pointer to the array is not to be returned from the function
Heap allocations are slower, but has the advantages:
Available heap memory is (normally) >> than available stack memory
You can freely pass the pointer to the allocated bytes around, e.g. returning it from a function -- just remember to free it at some point.
A third option is to use statically initialized arrays if you have some common task, that always requires an array of some max size. Given you can spare the memory statically consumed by the array, you avoid the hit for heap memory allocation, gain the flexibility to pass the pointer around, and avoid having to keep track of ownership of the pointer to ensure the memory is freed.
Edit: If you are using C99 (default with the gnu c compiler i think?), you can do variable-length stack arrays like
int a = 4;
int b[a*a];
In the example you gave
int *numbers;
numbers = malloc ( sizeof(int) * 10 );
there are no explicit benefits. Though, imagine 10 is a value that changes at runtime (e.g. user input), and that you need to return this array from a function. E.g.
int *aFunction(size_t howMany, ...)
{
int *r = malloc(sizeof(int)*howMany);
// do something, fill the array...
return r;
}
The malloc takes room from the heap, while something like
int *aFunction(size_t howMany, ...)
{
int r[howMany];
// do something, fill the array...
// you can't return r unless you make it static, but this is in general
// not good
return somethingElse;
}
would consume the stack that is not so big as the whole heap available.
More complex example exists. E.g. if you have to build a binary tree that grows according to some computation done at runtime, you basically have no other choices but to use dynamic memory allocation.
Array size is defined at compilation time whereas dynamic allocation is done at run time.
Thus, in your case, you can use your pointer as an array : numbers[5] is valid.
If you don't know the size of your array when writing the program, using runtime allocation is not a choice. Otherwise, you're free to use an array, it might be simpler (less risk to forget to free memory for example)
Example:
to store a 3-D position, you might want to use an array as it's alwaays 3 coordinates
to create a sieve to calculate prime numbers, you might want to use a parameter to give the max value and thus use dynamic allocation to create the memory area
Array is used to allocate memory statically and in one go.
To allocate memory dynamically malloc is required.
e.g. int numbers[10];
This will allocate memory statically and it will be contiguous memory.
If you are not aware of the count of the numbers then use variable like count.
int count;
int *numbers;
scanf("%d", count);
numbers = malloc ( sizeof(int) * count );
This is not possible in case of arrays.
Dynamic does not refer to the access. Dynamic is the size of malloc. If you just use a constant number, e.g. like 10 in your example, it is nothing better than an array. The advantage is when you dont know in advance how big it must be, e.g. because the user can enter at runtime the size. Then you can allocate with a variable, e.g. like malloc(sizeof(int) * userEnteredNumber). This is not possible with array, as you have to know there at compile time the (maximum) size.