As far as I know, the C compiler (I am using GCC 6) scans the code in order to:
Find syntax issues;
Allocate memory for the program (the static allocation concept);
So why does this code work?
#include <stdio.h>
int main(void){
int integers_amount; // each int has 4 bytes
printf("How many integers do you wanna store? \n");
scanf("%d", &integers_amount);
int array[integers_amount];
printf("Size of array: %zu\n", sizeof(array)); // Should be 4 times integers_amount
for(int i = 0; i < integers_amount; i++){
int integer;
printf("Type the integer: \n");
scanf("%d", &integer);
array[i] = integer;
}
for(int j = 0; j < integers_amount; j++){
printf("Integer typed: %d \n", array[j]);
}
return 0;
}
My point is:
How does the C compiler infer the size of the array during compilation time?
I mean, the array was declared, but its size is not yet known at compilation time. I really believed that the compiler allocated the needed amount of memory (in bytes) at compilation time; that is the concept of static allocation, as a matter of fact.
From what I could see, the allocation for the variable 'array' is done during runtime, only after the user has informed the 'size' of the array. Is that correct?
I thought that dynamic allocation was used to use the needed memory only (let's say that I declare an integer array of size 10 because I don't know how many values the user will need to hold there, but I ended up only using 7, so I have a waste of 12 bytes).
If the size is only known at runtime, I can allocate just the memory needed. However, that doesn't seem to be what malloc is for here, because from the code we can see that a plain array is also only allocated during runtime.
Can I have some help understanding that?
Thanks in advance.
How does the C compiler infer the size of the array during compilation time?
It's what's called a variable length array, or VLA for short. The size is determined at runtime, but only once: you cannot resize the array afterwards. Some compilers even warn you about the usage of such arrays, because they are typically stored on the stack, which has a very limited size, so they can potentially cause a stack overflow.
From what I could see, the allocation for the variable 'array' is done during runtime, only after the user has informed the 'size' of the array. Is that correct?
Yes, that is correct. That's why these can be dangerous: the compiler doesn't know the size of the array at compile time, so if it's too large there is nothing it can do to avoid problems. For that reason standard C++ does not allow VLAs.
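To make the runtime sizing concrete, here is a minimal sketch. The helper name vla_bytes and the cap MAX_VLA_ELEMS are made up for illustration; the cap is an arbitrary number, not a real stack limit. It rejects sizes that are invalid or suspiciously large before the VLA is declared, and it shows that sizeof on a VLA is computed at run time.

```c
#include <stddef.h>

#define MAX_VLA_ELEMS 1024   /* arbitrary cap chosen for this sketch */

/* Returns the size in bytes of a VLA holding n ints, or 0 if n is rejected. */
size_t vla_bytes(int n)
{
    if (n <= 0 || n > MAX_VLA_ELEMS)
        return 0;            /* a VLA size must be positive; large sizes risk the stack */

    int array[n];            /* storage is reserved here, at run time */
    return sizeof array;     /* for a VLA, sizeof is evaluated at run time */
}
```

With a typical 4-byte int, vla_bytes(7) gives 28, which matches what the program in the question prints for 7 integers.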
let's say that I declare an integer array of size 10 because I don't know how many values the user will need to hold there, but I ended up only using 7, so I have a waste of 12 bytes
Unlike a fixed size array, a variable length array can have its size determined at runtime, but once that size is set you can no longer change it. For that you have dynamic memory allocation (discussed below), if you are really set on having the exact size needed and not one byte more.
Anyway, if you are expecting an outside value to set the size of the array, odds are that it is the size you need. If not, there is nothing you can do aside from the mentioned dynamic memory allocation. In any case, it's better to have a little wasted space than too little space.
Can I have some help understanding that?
There are three concepts I find relevant to the discussion:
Fixed size arrays, i.e. int array[10]:
Their size is defined at compile time, they cannot be resized, and they are useful if you already know the size they should have.
Variable length arrays, i.e. int array[size], size being a non constant variable:
Their size is defined at runtime, but can only be set once. They are useful if the size of the array is dependent on external values, e.g. a user input or some value retrieved from a file.
Dynamically allocated arrays: i.e. int *array = malloc(sizeof *array * size), size may or may not be a constant:
These are used when your array will need to be resized, or if it's too large to store on the stack, which has limited size. You can change its size at any point in your code using realloc, which may simply resize the array or, as @Peter reminded, may allocate a new array and copy the contents of the old one over.
Variables defined inside functions, like array in your snippet (main is a function like any other!), have "automatic" storage duration; typically, this translates to them being on the "stack", a universal concept for a first in/last out storage which gets built and unbuilt as functions are entered and exited.
The "stack" simply is an address which keeps track of the current edge of unused storage available for local variables of a function. The compiler emits code for moving it "forward" when a function is entered in order to accommodate the memory needs of local variables and to move it "backward" when the program flow leaves the function (the double quotes are there because the stack may as well grow towards smaller addresses).
Typically these stack adjustments upon entering into and returning from functions are computed at compile time; after all, the local variables are all visible in the program code. But in principle, nothing keeps a program from changing the stack pointer "on the fly". Very early on, Unixes made use of this and provided a function which dynamically allocates space on the stack, called alloca(). The FreeBSD man page says: "The alloca() function appeared in Version 32V AT&T UNIX" (which was released in 1979).
alloca behaves very much like malloc, except that the storage is lost when the current function returns and that it is subject to the usual stack size restrictions.
So the first part of the answer is that your array does not have static storage duration. The memory where local variables will reside is not known at compile time (for example, a function with local variables in it may or may not be called at all, depending on run-time user input!). If it were, your astonishment would be entirely justified.
The second part of the answer is that array is a variable length array, a fairly new feature of the C programming language which was only added in 1999. It declares an object on the stack whose size is not known until run time (leading to the anti-paradigmatic consequence that sizeof(array) is not a compile time constant!).
One could argue that variable length arrays are only syntactic sugar around an alloca call; but alloca is, although widely available, not part of any standard.
I have been writing a program where I have a 2d array that changes size if the user wants, as follows:
#include <stdlib.h>
#include <stdio.h>
int max_length = 1024;
int memory_length = 16;
int block_length = 64;
void process_input(int memory[memory_length][block_length], char* user_input) {
...
}
int main(void) {
printf("Not sure what to do? Enter 'help'\n");
while (0 == 0) {
int memory[memory_length][block_length];
char user_input[max_length];
printf(">> ");
fgets(user_input, max_length, stdin);
printf("\n");
process_input(memory, user_input);
if (user_input[0] == 'e' && user_input[1] == 'n' && user_input[2] == 'd') {
break;
}
printf("\n");
}
return 0;
}
NOTE: The process_input() function that I made allows the user to play around with the values inside the array 'memory' as well as change the value of memory_length or block_length, hence then changing the length of the array. After the user is done the cycle repeats with a fresh array.
I can use the 2d array perfectly fine, passing it to any function. However, one day I discovered that there are functions such as malloc() that allow you to dynamically allocate memory through a pointer. This then made me question:
Should I re-write my whole very complicated program to use malloc and other 'memory functions', or is it okay to keep it this way?
Also as a side question that might be answered by answering the main question:
Every time I declare the 2d array, do the previous contents of the array get freed, or do I keep filling up my memory like an amateur?
Finally if there is anything else that you may notice in the code or in my writing please let me know.
Thanks.
Should I re-write my whole very complicated program to use malloc and other 'memory functions', or is it okay to keep it this way?
Probably rewrite it indeed. int memory[memory_length][block_length]; in main() is a variable-length array (VLA). It is allocated with automatic storage and takes its size from those size variables at the point where its declaration is encountered; it can't be resized after that.
For some reason a lot of beginners seem to think you can resize the VLA by changing the variables originally used to determine its size, but no such magic relation between the VLA and those variables exists. See: How to declare variable-length arrays correctly?
The only kind of array in C that allows run-time resizing is one which was allocated dynamically. The only alternative to that is to allocate an array "large enough" and then keep track of how much of the array you actively are using - but it will sit there in memory (and that is likely no big deal).
However, it is not recommended to allocate huge arrays with automatic storage, since those usually end up on the stack and can cause stack overflows. Use either static storage duration or allocated storage (with malloc etc).
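A small sketch of the point about the size variables. The function name vla_elems_after_change is made up for illustration, and the caller is assumed to pass n greater than zero. The size variable is changed after the declaration, and the array keeps the element count it had when it was declared.

```c
#include <stddef.h>

/* Declares a VLA of n ints, then changes n; returns the array's element count. */
size_t vla_elems_after_change(size_t n)
{
    int array[n];                           /* size is fixed here, from the current n */
    n *= 2;                                 /* later changes to n do not touch array */
    return sizeof array / sizeof array[0];  /* still the original n */
}
```

Calling vla_elems_after_change(4) returns 4, not 8.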
Every time I declare the 2d array, do the previous contents of the array get freed, or do I keep filling up my memory like an amateur?
You can only declare it once. In case you do so inside a local scope, with automatic storage duration, it does indeed get cleared up every time you leave the scope in which it was declared. But that also means that it can't be used outside that scope.
Finally if there is anything else that you may notice in the code or in my writing please let me know.
Yes, get rid of the global variables. There is no reason to use them in this example, avoid them like the plague. For example a function using an already allocated array might pass the sizes along, like in this example:
void process_input (size_t memory_length,
size_t block_length,
int memory[memory_length][block_length],
char* user_input)
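As a sketch of how a function with that kind of signature can use the sizes it receives (sum_2d is a made-up example, not part of the original program):

```c
#include <stddef.h>

/* Hypothetical example: sums every element of a rows x cols array.
   The size parameters must come before the array parameter that uses them. */
long sum_2d(size_t rows, size_t cols, int grid[rows][cols])
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += grid[r][c];
    return total;
}
```

A caller that owns int grid[2][3] would call sum_2d(2, 3, grid), so no global size variables are needed.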
In C, local variables, i.e. variables declared within a function, are typically allocated on the stack. Space for fixed-size locals is usually reserved once, when the function is entered. The fact that you can declare variables within a while loop can lead to some confusion: for fixed-size variables, the loop does not somehow allocate the memory again and again. A variable-length array is a special case, because its lifetime runs from its declaration to the end of the enclosing block, so each iteration gets a fresh array and the previous one is released.
The memory allocated for all local variables is released when the function returns.
The main reason that you might want to declare a variable inside a loop (besides convenience) is to limit the scope of the variable. In your code above, you cannot access the "memory" variable outside of the while loop. You can easily check this for yourself. Your compiler should raise an error.
Whether the stack or the heap contains more memory depends on your computer architecture. In an embedded system you can often specify whether to allocate more or less memory to the heap or the stack. On a computer with virtual memory, such as a PC, the heap is limited mainly by available memory, swap space, and the address space, while the stack is usually capped by an operating system or linker setting.
Allocating arrays on the heap is not as simple as it might seem. Single dimensional arrays work just as you might imagine, but things get more complicated with multidimensional arrays, so it is probably better to stick with either a locally or statically declared array in your case.
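For reference, one way to get a heap-allocated 2D array that can still be indexed as grid[r][c] is a pointer to a row of cols ints. This is only a sketch: heap_2d_sum is a made-up function, it relies on variably modified types (C99, optional since C11), and it does not guard against rows * cols overflowing.

```c
#include <stdlib.h>

/* Hypothetical demo: allocates rows x cols ints in one block, fills them with
   0, 1, 2, ... and returns their sum, or -1 on bad input or allocation failure. */
long heap_2d_sum(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0)
        return -1;

    /* grid points to the first of `rows` rows, each row being int[cols] */
    int (*grid)[cols] = malloc(rows * sizeof *grid);
    if (grid == NULL)
        return -1;

    long total = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            grid[r][c] = (int)(r * cols + c);
            total += grid[r][c];
        }
    }

    free(grid);              /* a single free releases the whole block */
    return total;
}
```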
I have read that the size of heap and stack cannot be computed at compile time and needs to be evaluated at runtime.
I can think of this code which allocates heap based on user input and needs the runtime:
int size;
scanf("%d", &size);
void *ptr= malloc(size);
But aren't all the stack variables already present in a function? given their data type (int, char, long etc.) why can't the compiler calculate the size?
With C99, it is possible to create a variable length array (VLA) on the stack. Those arrays have a dynamic size based on runtime parameters or calculated expressions. In those cases, it is not possible to calculate the stack size until runtime.
For example:
int f(int n) {
// Size based on input
int x[n] ;
// Size based on a computed expression
int m = n+5000 ;
int y[m] ;
return 0 ;
}
Needless to say, if the stack usage of a single function cannot be calculated, it is not possible to calculate the stack size of a complete program.
Stack memory is allocated at the entry of the function. That's why the stack size depends on the sequence of function calls, which is not defined at compile time (e.g. any if, switch can change the function call sequence)
Why can't the size of stack be determined at compile time?
But aren't all the stack variables already present in a function?
2 things that prevent compile time stack size computation:
Variable length arrays allow for a run-time determined amount of memory usage of the "stack". Other non-standard functions like alloca() do so also.
Recursion allows for a run-time determined depth of function calls and thus a run-time determined amount of memory usage even if each function's memory usage was constant. @Weather Vane
Were it not for these 2 and maybe others, code could be analysed at compile time to determine stack usage and max function depth, and then possibly not even use a "stack" in the classic sense, storing all "stack" memory in a fixed space. Some compilers provide for this. e.g.
recursive functions need the stack size to be dynamic I believe.
And also some versions of C allow variable length arrays
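A tiny sketch of the recursion point, with a made-up function sum_to. Each call has a fixed, compile-time known frame, but how many frames are live at once depends on n, which is only known at run time. (An optimising compiler may turn this particular function into a loop, so treat it as an illustration of the principle.)

```c
/* Sums 1..n recursively. Without optimisation each call adds one stack frame,
   so the total stack use grows with n, a run-time value. */
unsigned long sum_to(unsigned n)
{
    if (n == 0)
        return 0;
    return n + sum_to(n - 1);
}
```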
How does the compiler determine the size of the array below at compile time?
int n;
scanf("%d",&n);
int a[n];
How is it different from dynamic allocation (other than that memory for a dynamic array is allocated on the heap)?
If possible, please explain in terms of the activation record (stack memory image) how this array is allocated memory.
The array's size isn't determined at compile time; it's determined at run time. At the time that the array is allocated, n has a known value. In typical implementations where automatic variables are allocated on the program stack, the stack pointer will be adjusted to make room for that many ints. It becomes part of the stack frame and will be automatically reclaimed when it goes out of scope.
This code was not valid in C90; C90 required that all variables be declared at the beginning of the block, so mixing declarations and code like this was not permitted. Variable-length arrays and mixed code and declarations were introduced in C99.
In C the proper name for the allocation type is automatic. In computing jargon the term stack is sometimes used synonymously.
The storage of a is valid from the point of definition int a[n]; up until the end of the enclosing scope (i.e. the end of the current function, or earlier).
It is just the same as int a[50]; , except that a different number of ints than 50 may be allocated.
A drawback of using automatic arrays (with or without runtime sizes) is that there is no portable way to protect against stack overflow. (Actually, stack overflow from automatic variables is something the C standard does not address at all, but it is a real problem in practice).
If you were to use dynamic allocation (i.e. malloc and friends) then it will let you know if there is insufficient memory by returning NULL, whereas stack overflows are nasty.
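As a sketch of that difference, here is a made-up helper make_ints that reports failure through NULL, which the caller can check. The overflow guard is an extra precaution added for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns room for n ints, or NULL if the request
   is empty, would overflow size_t, or cannot be satisfied. */
int *make_ints(size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof(int))
        return NULL;                 /* reject before calling malloc */

    return malloc(n * sizeof(int));  /* NULL here means out of memory */
}
```

The caller writes something like int *p = make_ints(n); if (p == NULL) { /* recover */ } and later calls free(p). There is no comparable check for int a[n].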
People seem to say how malloc is so great when using arrays and you can use it in cases when you don't know how many elements an array has at compile time(?). Well, can't you do that without malloc? For example, if we knew we had a string that had max length 10 doesn't the following do close enough to the same thing?... Besides being able to free the memory that is.
char name[sizeof(char)*10];
and
char *name = malloc(sizeof(char)*10);
The first creates an array of chars on the stack. The length of the array will be sizeof(char)*10, but seeing as char is defined by the standard as being 1 in size, you could just write char name[10];
If you want an array, big enough to store 10 ints (defined per standard to be at least 2 bytes in size, but most commonly implemented as 4 bytes big), int my_array[10] works, too. The compiler can work out how much memory will be required anyways, no need to write something like int foo[10*sizeof(int)]. In fact, the latter will be unpredictable: depending on sizeof(int), the array will store at least 20 ints, but is likely to be big enough to store 40.
Anyway, the latter snippet calls a function, malloc, which will attempt to allocate enough memory to store 10 chars on the heap. The memory is not initialized, so it'll contain junk.
Memory on the heap is slightly slower, and requires more attention from you, who is writing the code: you have to free it explicitly.
Again: char is guaranteed to be size 1, so char *name = malloc(10); will do here, too. However, when working with heap memory, I (and I'm not alone in this) prefer to allocate the memory like so: some_ptr = malloc(10*sizeof *some_ptr); Using *some_ptr is like saying "10 times the size of whatever type this pointer points to". If you happen to change the type later on, you don't have to refactor all malloc calls.
General rule of thumb, to answer your question "can you do without malloc", is that you don't use malloc, unless you have to.
Stack memory is faster, and easier to use, but it is less abundant. This site was named after a well-known issue you can run into when you've pushed too much onto the stack: it overflows.
When you run your program, the system will allocate a chunk of memory that you can use freely. This isn't much, but plenty for simple computations and calling functions. Once you run out, you'll have to resort to allocating memory from the heap.
But in this case, an array of 10 chars: use the stack.
Other things to consider:
An array is a contiguous block of memory
A pointer doesn't know/can't tell you how big a block of memory was allocated (sizeof(an_array)/sizeof(type) vs sizeof(a_pointer))
An array's declaration does not require the use of sizeof. The compiler works out the size for you: <type> my_var[10] will reserve enough memory to hold 10 elements of the given type.
An array decays into a pointer, most of the time, but that doesn't make them the same thing
pointers are fun, if you know what you're doing, but once you start adding functions, and start passing pointers to pointers to pointers, or a pointer to a pointer to a struct, that has members that are pointers... your code won't be as jolly to maintain. Starting off with an array, I find, makes it easier to come to grips with the code, as it gives you a starting point.
this answer only really applies to the snippets you gave. If you're dealing with an array that grows over time, then realloc is to be preferred. If you're declaring this array in a recursive function that runs deep, then again, malloc might be the safer option, too
Check this link on differences between array and pointers
Also take a look at this question + answer. It explains why a pointer can't give you the exact size of the block of memory you're working on, and why an array can.
Consider that an argument in favour of arrays wherever possible
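A short sketch of the sizeof point from the list above. Both function names are made up. In the scope where the array is declared, sizeof sees the whole array; once the array has been passed to a function, only a pointer is left.

```c
#include <stddef.h>

/* In the declaring scope, sizeof yields the size of the whole array. */
size_t elems_in_scope(void)
{
    int values[10] = {0};
    return sizeof values / sizeof values[0];   /* element count: 10 */
}

/* In a callee the argument is just a pointer; the array length is gone. */
size_t size_seen_by_callee(const int *values)
{
    return sizeof values;                      /* size of the pointer itself */
}
```

This is why functions that take a pointer usually need the element count passed as a separate argument.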
char name[sizeof(char)*10]; // better to use: char name[10];
Statically allocates a vector of sizeof(char)*10 char elements, at compile time. The sizeof operator is useless because if you allocate an array of N elements of type T, the size allocated will already be sizeof(T)*N, you don't need to do the math. Stack allocated and no free needed. In general, you use char name[10] when you already know the size of the object you need (the length of the string in this case).
char *name = malloc(sizeof(char)*10);
Allocates 10 bytes of memory in the heap. Allocation is done at run time, you need to free the result.
char name[sizeof(char)*10];
The first one is allocated on the stack, once it goes out of scope memory gets automatically freed. You can't change the size of the first one.
char *name = malloc(sizeof(char)*10);
The second one is allocated on the heap and should be freed with free. It will stick around otherwise for the lifetime of your application. You can reallocate memory for the second one if you need.
The storage duration is different:
An array created with char name[size] exists for the entire duration of program execution (if it is defined at file scope or with static) or for the execution of the block it is defined in (otherwise). These are called static storage duration and automatic storage duration.
An array created with malloc(size) exists for just as long as you specify, from the time you call malloc until the time you call free. Thus, it can be made to use space only while you need it, unlike static storage duration (which may be too long) or automatic storage duration (which may be too short).
The amount of space available is different:
An array created with char name[size] inside a function uses the stack in typical C implementations, and the stack size is usually limited to a few megabytes (more if you make special provisions when building the program, typically less in kernel software and embedded systems).
An array created with malloc may use gigabytes of space in typical modern systems.
Support for dynamic sizes is different:
An array created with char name[size] with static storage duration must have a size specified at compile time. An array created with char name[size] with automatic storage duration may have a variable length if the C implementation supports it (this was mandatory in C 1999 but is optional in C 2011).
An array created with malloc may have a size computed at run-time.
malloc offers more flexibility:
Using char name[size] always creates an array with the given name, either when the program starts (static storage duration) or when execution reaches the block or definition (automatic).
malloc can be used at run-time to create any number of arrays (or other objects), by using arrays of pointers or linked lists or trees or other data structures to create a multitude of pointers to objects created with malloc. Thus, if your program needs a thousand separate objects, you can create an array of a thousand pointers and use a loop to allocate space for each of them. In contrast, it would be cumbersome to write a thousand char name[size] definitions.
First things first: do not write
char name[sizeof(char)*10];
You do not need the sizeof as part of the array declaration. Just write
char name[10];
This declares an array of 10 elements of type char. Just as
int values[10];
declares an array of 10 elements of type int. The compiler knows how much space to allocate based on the type and number of elements.
If you know you'll never need more than N elements, then yes, you can declare an array of that size and be done with it, but:
You run the risk of internal fragmentation; your maximum number of bytes may be N, but the average number of bytes you need may be much smaller than that. For example, let's say you want to store 1000 strings of max length 255, so you declare an array like
char strs[1000][256];
but it turns out that 900 of those strings are only 20 bytes long; you're wasting a couple of hundred kilobytes of space [1]. If you split the difference and stored 1000 pointers, then allocated only as much space as was necessary to store each string, then you'd wind up wasting a lot less memory:
char *strs[1000];
...
strs[i] = strdup("some string"); // strdup calls malloc under the hood
...
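strdup is a POSIX function (it only became part of standard C in C23), so here is a sketch of what such a copy does, using a made-up stand-in called dup_string. It allocates exactly as many bytes as the string needs, terminator included.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strdup: copies text into a heap block of
   exactly strlen(text) + 1 bytes. Returns NULL if malloc fails. */
char *dup_string(const char *text)
{
    size_t len = strlen(text) + 1;   /* + 1 for the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, text, len);
    return copy;                     /* the caller is responsible for free() */
}
```

A 20-byte string then costs about 21 bytes plus the pointer and allocator overhead, instead of a full 256-byte row.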
Stack space is also limited relative to heap space; you may not be able to declare arbitrarily large arrays (as auto variables, anyway). A request like
long double huge[10000][10000][10000][10000];
will probably cause your code to crash at runtime, because the default stack size isn't large enough to accommodate it [2].
And finally, most situations fall into one of three categories: you have 0 elements, you have exactly 1 element, or you have an unlimited number of elements. Allocating large enough arrays to cover "all possible scenarios" just doesn't work. Been there, done that, got the T-shirt in multiple sizes and colors.
1. Yes, we live in the future where we have gigabytes of address space available, so wasting a couple of hundred KB doesn't seem like a big deal. The point is still valid, you're wasting space that you don't have to.
2. You could declare very large arrays at file scope or with the static keyword; this will allocate the array in a different memory segment (neither stack nor heap). The problem is that you only have that single instance of the array; if your function is meant to be re-entrant, this won't work.
When should I use malloc instead of a normal array definition in C?
I can't understand the difference between:
int a[3]={1,2,3};
int array[sizeof(a)/sizeof(int)];
and:
array=(int *)malloc(sizeof(int)*sizeof(a));
In general, use malloc() when:
the array is too large to be placed on the stack
the lifetime of the array must outlive the scope where it is created
Otherwise, use a stack allocated array.
int a[3]={1,2,3};
int array[sizeof(a)/sizeof(int)];
If used as local variables, both a and array would be allocated on the stack. Stack allocation has its pros and cons:
pro: it is very fast - it only takes one register subtraction operation to create stack space and one register addition operation to reclaim it back
con: stack size is usually limited (and also fixed at link time on Windows)
In both cases the number of elements in each array is a compile-time constant: 3 is obviously a constant, while sizeof(a)/sizeof(int) can be computed at compile time since both the size of a and the size of int are known at the time when array is declared.
When the number of elements is known only at run-time or when the size of the array is too large to safely fit into the stack space, then heap allocation is used:
array=(int *)malloc(sizeof(int)*sizeof(a));
As already pointed out, this should be malloc(sizeof(a)) since the size of a is already the number of bytes it takes and not the number of elements and thus additional multiplication by sizeof(int) is not necessary.
Heap allocation and deallocation are relatively expensive operations (compared to stack allocation), and this should be carefully weighed against the benefits they provide, e.g. in code that gets called many times in tight loops.
Modern C compilers support the C99 version of the C standard, which introduced so-called variable-length arrays (or VLAs), which resemble similar features available in other languages. A VLA's size is specified at run time, like in this case:
void func(int n)
{
int array[n];
...
}
array is still allocated on the stack as if memory for the array has been allocated by a call to alloca(3).
You definitely have to use malloc() if you don't want your array to have a fixed size. Depending on what you are trying to do, you might not know in advance how much memory you are going to need for a given task, or you might need to dynamically resize your array at runtime; for example, you might enlarge it if there is more data coming in. The latter can be done using realloc() without data loss.
Instead of declaring an array as in your original post, you should just declare a pointer to int, like this:
int* array; // this variable will just contain the address of an int-sized block in memory
int length = 5; // how long do you want your array to be;
array = malloc(sizeof(int) * length); // this allocates the memory needed for your array and sets the pointer created above to first block of that region;
int newLength = 10;
array = realloc(array, sizeof(int) * newLength); // increase the size of the array while leaving its contents intact;
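One caveat with assigning the result of realloc straight back to the same pointer: if realloc fails it returns NULL and the original block is still allocated, so the only pointer to it would be overwritten. A common pattern is to go through a temporary. The helper below, resize_ints, is a made-up name for this sketch.

```c
#include <stdlib.h>

/* Hypothetical helper: resizes *array to hold new_length ints.
   Returns 0 on success; returns -1 and leaves *array untouched on failure. */
int resize_ints(int **array, size_t new_length)
{
    if (new_length == 0)
        return -1;           /* realloc with size 0 is best avoided */

    int *tmp = realloc(*array, new_length * sizeof **array);
    if (tmp == NULL)
        return -1;           /* the old block is still valid and still owned by the caller */

    *array = tmp;            /* old contents are preserved up to the smaller size */
    return 0;
}
```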
Your code is very strange.
The answer to the question in the title is probably something like "use automatically allocated arrays when you need quite small amounts of data that is short-lived, heap allocations using malloc() for anything else". But it's hard to pin down an exact answer, it depends a lot on the situation.
Not sure why you are showing first an array, then another array that tries to compute its length from the first one, and finally a malloc() call which tries to do the same.
Normally you have an idea of the number of desired elements, rather than an existing array whose size you want to mimic.
The second line is better as:
int array[sizeof a / sizeof *a];
No need to repeat a dependency on the type of a, the above will define array as an array of int with the same number of elements as the array a. Note that this only works if a is indeed an array.
Also, the third line should probably be:
array = malloc(sizeof a);
No need to get too clever (especially since you got it wrong) about the sizeof argument, and no need to cast malloc()'s return value.
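Putting the corrected call in context, a minimal sketch (clone_a is a made-up function name) that allocates a heap block the same size as a and copies the contents over:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical demo: returns a heap copy of the three-element array
   from the question, or NULL if malloc fails. */
int *clone_a(void)
{
    int a[3] = {1, 2, 3};
    int *array = malloc(sizeof a);   /* sizeof a is already 3 * sizeof(int) bytes */
    if (array != NULL)
        memcpy(array, a, sizeof a);
    return array;                    /* the caller must free() the result */
}
```

Note that sizeof a only gives the full size here because a is a real array in this scope; with a pointer it would give the pointer's size.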