Why use malloc() when I can just define a variable-length array?

I was reading about creating arrays dynamically in C. The proper way, as described there, was:
int n;
scanf("%d", &n);
int *arr = (int*)malloc(n*sizeof(int));
But then I wondered whether I could just do something like this:
int n, i, sum=0;
scanf("%d", &n);
int arr[n];
And I compiled and ran it without any error. So, my question is why should I use malloc()? Does this have something to do with the old and new C versions?

There are at least five benefits to using malloc over variable length arrays.
Most notably, objects created with malloc persist after execution of the current block ends. This means that such objects can be returned (by pointer) to callers of functions. This use is frequent in real-world applications. Arrays created as variable-length arrays cease to exist when execution of their block ends.
Arrays created with malloc can be resized with realloc. Variable-length arrays cannot be resized.
As of the 2011 C standard, variable-length arrays are optional for C implementations to support. A general-purpose C implementation of any quality will support them, but the fact they are optional means code that is intended to be portable must either not use variable-length arrays or must guard against the lack of support by testing the preprocessor macro __STDC_NO_VLA__ and providing alternate code.
Commonly, variable-length arrays are much more limited in size than arrays allocated with malloc. Variable-length arrays are generally implemented using stack space, and stacks are typically limited to some not-large number of mebibytes (although that can generally be increased when building an executable). For objects created with malloc, gibibytes of memory may be available in modern systems.
If creation of an array does fail with malloc, NULL will be returned, and the programmer can easily write code to detect that and deal with it. If creation of a variable-length array fails, the common behavior is for the operating system to terminate the program with some memory error. (Various C implementations may provide means to intercept this error, but it is considerably more of a nuisance than testing the malloc return value for NULL, and it is not portable.)
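A minimal sketch of the first and last points above: a malloc'd array outlives the function that created it, and a failed allocation is reported as NULL instead of crashing. The helper name make_sequence is hypothetical and not from the original answer.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns a heap array holding 0..n-1, or NULL if
 * n is 0, the byte count would overflow, or the allocation fails.
 * The caller owns the result and must free() it. */
int *make_sequence(size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof(int))
        return NULL;

    int *arr = malloc(n * sizeof *arr);
    if (arr == NULL)            /* failure is detectable, not fatal */
        return NULL;

    for (size_t i = 0; i < n; i++)
        arr[i] = (int)i;
    return arr;                 /* still valid after this function returns */
}
```

A caller would test the result against NULL before using it and call free() on it when done. Returning a pointer to a variable-length array declared inside the function would instead leave the caller with a dangling pointer.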

Stack-allocated arrays are not equivalent to buffers in the free-store (the heap, the memory area that malloc and calloc use).
Assuming the array exists on the stack (the usual implementation for an automatic variable), your array cannot exceed the maximum stack size for your platform. On Linux the default limit is commonly 8 MiB for the main thread, and the default for threads created with pthreads depends on the C library and can be smaller. On Windows the default is 1 MiB.
Because of scope and object lifetime: pointers to elements in an array that exist on the stack cannot live longer than the array they point into, which means you cannot return pointers to those arrays and elements after the scope they're declared in expires.
VLAs are optional in C11. In C++ they're not part of the spec at all (i.e. they're vendor extensions) so your code won't be portable.
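The portability point can be sketched with the __STDC_NO_VLA__ macro, assuming a C99 or later compiler. The function name sum_of_squares and the cap of 1000 elements are illustrative assumptions, not part of the answers above.

```c
#include <stdlib.h>

/* Hypothetical function: computes 1*1 + 2*2 + ... + n*n through a
 * scratch buffer. Returns -1 if n is out of range or no buffer can be
 * obtained. The cap of 1000 is an arbitrary choice for this sketch. */
long sum_of_squares(int n)
{
    if (n <= 0 || n > 1000)
        return -1;

#ifdef __STDC_NO_VLA__
    /* The implementation says it has no VLAs: use the heap instead. */
    long *buf = malloc((size_t)n * sizeof *buf);
    if (buf == NULL)
        return -1;
#else
    long buf[n];                /* VLA, released when the function returns */
#endif

    long total = 0;
    for (int i = 0; i < n; i++) {
        buf[i] = (long)(i + 1) * (i + 1);
        total += buf[i];
    }

#ifdef __STDC_NO_VLA__
    free(buf);
#endif
    return total;
}
```

The cost of the guard is visible here: two code paths to maintain, which is one reason portable code often just uses malloc throughout.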

Related

Dynamically allocating array with the [] operator instead of using malloc?

I am sure this has been asked a million times but I couldn't find an answer that explained it. I have been told to never do this but I haven't really understood why. Why doesn't something like this count as dynamically allocating memory and why is it that bad?
int a;
scanf("%d",&a);
int arr[a];
This is not dynamic allocation but a variable length array.
The lifetime of such an array is its enclosing scope, just like an array with a fixed size, so you don't need to worry about deallocation. These arrays typically reside on the stack, so the size of the stack does put limitations on how big a VLA (or any array) can be.
If you find that a VLA is too big for the stack, you can always fall back to using malloc.
Dynamically allocating array with the [] operator instead of using malloc?
The [] here is used to define an array. [] here, is not an operator.
int arr[a];
I have been told to never do this but I haven't really understood why.
This mantra applies to many things in C (and life): do not use something until you understand how to use it.
Why doesn't something like this count as dynamically allocating memory
It is a form of dynamic memory allocation, but it usually uses a different memory pool and has a different lifetime than memory obtained via malloc().
... and why is it that bad?
int arr[a]; is a variable length array (VLA) and has the following issues:
Not always available. VLAs are required in C99 and are often, but not always, supported in later versions. Research __STDC_NO_VLA__.
With int a, the behavior is undefined (UB) when a <= 0.
When a is too large, there is no standard mechanism to detect insufficient memory.
VLAs are better used in controlled cases with small sizes.
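One way to act on these issues, as a rough sketch that assumes the compiler supports VLAs: validate the size first, use a VLA only below a small bound, and fall back to malloc otherwise. The names checked_sum and fill_and_sum and the VLA_MAX_ELEMS threshold are hypothetical.

```c
#include <stdlib.h>

#define VLA_MAX_ELEMS 256       /* hypothetical threshold for this sketch */

/* Fills buf[0..n-1] with 1..n and returns the sum of the elements. */
static long fill_and_sum(int *buf, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++) {
        buf[i] = i + 1;
        total += buf[i];
    }
    return total;
}

/* Returns 1 + 2 + ... + n, or -1 if n is invalid or memory runs out. */
long checked_sum(int n)
{
    if (n <= 0)                 /* a VLA with a non-positive size is UB */
        return -1;

    if (n <= VLA_MAX_ELEMS) {
        int small[n];           /* bounded, so stack use is predictable */
        return fill_and_sum(small, n);
    }

    int *big = malloc((size_t)n * sizeof *big);
    if (big == NULL)            /* unlike a VLA, failure is detectable */
        return -1;
    long total = fill_and_sum(big, n);
    free(big);
    return total;
}
```

With a bound this small, a fixed array of VLA_MAX_ELEMS elements would serve equally well, which is the argument some of the answers in this collection make against VLAs.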
First of all, let me tell you that this is called variable length array, and it's an optional feature. Better not to rely on this feature (or compiler support).
On the other hand, allocator functions (malloc(), free() and family) are standard compliant and any conforming compiler / library will support these functions.
That said, the differences are mentioned in other answers; the primary ones concern scope and portability.
Why doesn't something like this count as dynamically allocating memory
That the size of the object is determined at runtime is not sufficient for dynamic allocation. Dynamic allocation requires direct or indirect use of one of the functions that specifically performs dynamic memory allocation, with malloc, calloc, and realloc being the main examples. Your example ...
int arr[a];
... does not do this, so the object is not dynamically allocated. Your arr is a more-or-less ordinary local variable with variable-length array type.
Semantically, dynamically allocated objects have "allocated" storage duration, which means that they exist and retain their last-stored value until explicitly deallocated. Your arr instead has "automatic" storage duration, which means that it ceases to exist when execution of its innermost containing block terminates.
and why is it that bad?
Opinions vary. VLAs were a required feature in C99, but support was made optional in C11. If you use VLAs in your code, then, you limit its portability. Standard C++ does not support them either (a proposal for runtime-sized arrays was considered around C++14 but was not adopted, though some compilers accept VLAs as an extension), which may be an issue for some.
However, the main risk cited is that the usual and natural implementation of VLAs is to allocate them on the stack, and stack size is typically a lot more limited than heap space. In a naive example such as yours, it is easy to create a stack overflow, and this may depend on user input, thus making it both difficult to test and a potential security risk.
I do use VLAs in my code from time to time, but usually in ways that are not subject to the stack-busting risk described above. I generally look skeptically on admonitions to "never do that", but indeed I would never write the exact code you present for a production application.
This is an example of a variable length array. Its lifetime is the same as any other auto variable (including fixed-length arrays), so memory for it will be released once you exit its enclosing scope.
Unlike fixed-length arrays, VLAs cannot be declared at file scope (outside of any function) or with the static keyword, nor can they be declared with an initializer. Because of how they are typically managed, they cannot be arbitrarily large. Despite the name, they cannot be resized once they are defined - the "variable" in variable length only means that their size can be different each time they are defined.
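The meaning of "variable" can be illustrated with a short sketch, assuming VLA support and positive sizes. The function names vla_bytes and total_elements are made up for the example.

```c
#include <stddef.h>

/* Hypothetical helper: reports how many bytes a VLA of n ints occupies.
 * n must be positive. */
size_t vla_bytes(int n)
{
    int arr[n];                 /* the size is fixed here for arr's lifetime */
    return sizeof arr;          /* evaluated at run time for a VLA */
}

/* Defines a VLA three times, with 1, 2 and 3 elements, and adds up the
 * element counts. Each iteration gets a fresh array of a new size. */
size_t total_elements(void)
{
    size_t total = 0;
    for (int n = 1; n <= 3; n++) {
        double scratch[n];      /* created anew on every pass */
        total += sizeof scratch / sizeof scratch[0];
    }
    return total;
}
```

Nothing in either function can make arr or scratch larger once it exists; getting a different size means reaching the definition again.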
VLAs are useful but limited. They're great when you need some temporary working storage that's local to a function and doesn't need to be too big (not much bigger than a megabyte or so).
They were only introduced in C99 and made optional in C11, so support for them may be a little spotty.

How are Variable Length Array instructions generated?

Variable length arrays are supported in C:
int main(){
int num = 5;
int arr[num];
return 0;
}
I understand that arr is allocated during runtime. How is this accomplished?
Does it call a C runtime function to allocate the bytes? Since the allocation size is not known at compile time, instructions for a fixed stack allocation should not exist.
As a side question, is it good practice to use them over malloc and heap allocation, as VLAs are not officially supported in C++?
Edit:
Seems like it may be implemented using alloca which allocates on the stack frame.
How VLA allocation is accomplished is up to the individual implementation - the ones I'm familiar with allocate from the stack, but they don't have to.
VLAs are useful, but only in very limited circumstances. Since their size isn't known until runtime, they can't be members of struct types, they can't have static storage durations, and their sizes may be limited. If you need temporary storage that isn't too big (on the order of a few kilobytes or so) and you don't know how big it is ahead of time and it doesn't need to persist beyond the current scope, then VLAs can be handy and are easier to deal with than dynamic memory.
However, as of 2011 VLA support is optional, so I wouldn't come to rely on them too heavily.
Not only, as you correctly say
VLAs are not officially supported in C++.
but they were also relegated to an "optional feature" in C11 (though they were added only in C99!). This is actually a reason for not using them, for the sake of having portable code.
Its memory allocation details are unfortunately implementation dependent. VLAs are usually allocated on the stack by most compilers as automatic storage variables (according to my research and my personal experience), but they can also be allocated on the heap.
Having arrays allocated on the stack can lead to stack overflow issues, especially in embedded environments. I suggest visiting this question (about VLAs not being supported in the C++ standards); in particular, this answer by @Quuxplusone is really interesting (emphasis is mine):
Variable-length arrays in C99 were basically a misstep. In order to support VLAs, C99 had to make the [...] concessions to common sense.
Less importantly in the C++ world, but extremely important for C's target audience of embedded-systems programmers, declaring a VLA means chomping an arbitrarily large chunk of your stack. This is a guaranteed stack-overflow and crash. (Anytime you declare int A[n], you're implicitly asserting that you have 2GB of stack to spare. After all, if you know "n is definitely less than 1000 here", then you would just declare int A[1000].)
As far as I can see, their main advantage is having arrays of variable length with local scope, something that cannot be achieved with its alternatives:
/* Fixed length array, either global or local */
int arr[100];
/* Dynamic allocation */
int * arr = malloc (100 * sizeof (int));
In most cases, anyway, the developer either
Knows the maximum size that the VLA can have. So why not allocate it statically with a fixed length?
Has no control over the maximum size, so they will have to perform a sanity check to avoid stack overflows. So why not limit its size with a fixed-length allocation?
As a side question, is it good practice to use them over malloc and heap allocation,
No, consider instead using flexible array members as the last member of your variable-sized structs. Check that malloc did not fail at runtime.
In embedded programming, VLAs are useful when you can guarantee that they are small enough, since you don't want to blow up your call stack. At a minimum, precede int arr[num]; with something like assert(num>0 && num<100); or better yet, add such a check at runtime. You could use tools like Frama-C to prove that statically.
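A flexible array member, as suggested above, might look like the following sketch. The type int_vec and the constructor int_vec_new are hypothetical names chosen for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical variable-sized record with a flexible array member. */
struct int_vec {
    size_t len;
    int data[];                 /* must be the last member of the struct */
};

/* Allocates a zero-filled int_vec with room for len elements.
 * Returns NULL if the byte count would overflow or malloc fails. */
struct int_vec *int_vec_new(size_t len)
{
    if (len > (SIZE_MAX - sizeof(struct int_vec)) / sizeof(int))
        return NULL;

    struct int_vec *v = malloc(sizeof *v + len * sizeof v->data[0]);
    if (v == NULL)
        return NULL;

    v->len = len;
    for (size_t i = 0; i < len; i++)
        v->data[i] = 0;
    return v;                   /* header and elements share one block */
}
```

Because the length and the elements live in a single allocation, one free(v) releases everything, and the object can be returned to callers like any other malloc'd block.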

Creating a variable-sized array without malloc

I'm writing a program in C in which I want to read an array length and create an array of that size. However, C does not support variable-lengthed arrays, so I was wondering how I could do this. I do not want to change my compiler settings.
I was thinking about somehow using preprocessor directive to my advantage, but I have not been able to do so. Pretty much, I have an integer variable containing my desired size, and I would like to declare the array with 0's. Also, I do not want to use malloc/other dynamic array methods.
This might seem basic, but I have been struggling to do this for some time. If it matters, I am receiving the array size through I/O.
There are several possible solutions, none of which satisfy all of your requirements.
A call to malloc is the obvious solution; that's what it's for. You've said you don't want to use malloc, but you haven't explained why.
C does support variable-length arrays -- more or less. VLAs did not exist in C90, were introduced in C99, and were made optional in C11. So if you want portable code, you can't assume that they're supported. If they are, you can do something like this:
int size;
// get value of size from input
int vla[size];
There are some restrictions. If there isn't enough memory (stack size can be more restrictive than heap size), the behavior is undefined. On the other hand, the same is true for ordinary fixed-size arrays, and VLAs can let you allocate a smaller amount of memory rather than assuming a fixed upper bound. VLAs exist only at block scope, so the object will cease to exist when control leaves the enclosing block (typically when the function returns).
You could define an array (probably at file scope, outside any function definition) that you know is big enough for your data. You'll have to specify some upper bound. For example, you can define int arr[10000]; and then reject any input bigger than 10,000. You could then use an initial subset of that array for your data.
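The fixed-upper-bound approach could be sketched as follows. The bound of 10000 comes from the text above, while the names MAX_ITEMS and use_prefix are assumptions made for the example.

```c
#include <stddef.h>

#define MAX_ITEMS 10000         /* upper bound taken from the text above */

/* File scope: zero-initialized at program start and not on the stack. */
static int arr[MAX_ITEMS];

/* Hypothetical helper: stores 1..n in the first n slots of arr.
 * Returns 0 on success, or -1 if n exceeds the fixed bound. */
int use_prefix(size_t n)
{
    if (n > MAX_ITEMS)
        return -1;              /* reject input bigger than the bound */
    for (size_t i = 0; i < n; i++)
        arr[i] = (int)(i + 1);
    return 0;
}
```

Since objects with static storage duration start out as zeros, this also covers the requirement of an array initialized with 0's, at the price of always reserving the full 10000 elements.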
You say you want to create a "variably-sized array", but you "do not want to use malloc/other dynamic array methods". It sounds like you want to create a dynamic array, but you don't want to create a dynamic array. It's like saying you want to drive a screw, but you don't want to use a screwdriver.
May I ask: why are you allergic to malloc()?
The reason I ask is that many attempts to define a safe profile for C propose that malloc is the source of all evil. In that case:
int *arr;
/* MAP_ANONYMOUS is needed for a mapping that is not backed by a file (fd is -1) */
arr = mmap(0, sizeof *arr * N, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
What you can do is read the array length, and then generate the source code of a program:
fprintf(outfile, "int main(void) { static int arr[%d]; ...}\n", size);
Then execute the compiler on the generated program (e.g. using the system function), and run the resulting executable.
Any language which supports variable length arrays uses a dynamic memory allocation mechanism underneath to implement the functionality. C does not have syntactic sugar that supports true variable length arrays, but it provides all the mechanics needed to mimic one.
malloc, realloc, free, and others can easily be used to handle dynamic allocations and deallocations for arrays of any size and element type. You can allocate data in memory and use a pointer to return the reference to caller functions or pass it to other functions. (C VLAs, on the other hand, are of limited use and cannot be returned to the caller if allocated on the stack.)
So, your best option (unless you are in embedded software development) is to start using C dynamic memory allocation.
However, C does not support variable-lengthed arrays,
Wrong. This is perfectly valid C code:
#include <stdio.h>
int main(void)
{
int size;
scanf("%d", &size);
int arr[size];
}
It's called a VLA (variable length array) and has been a part of C since 1999. However, it has been optional since C11, but big compilers like clang and gcc will never remove them. At least not in the foreseeable future.

C function: is this dynamic allocation? initializing an array with a changing length

Suppose I have a C function:
void myFunction(..., int nObs){
int myVec[nObs] ;
...
}
Is myVec being dynamically allocated? nObs is not constant whenever myFunction is called. I ask because I am currently programming with this habit, and a friend was having errors with his program where the culprit is he didn't dynamically allocate his arrays. I want to know whether my habit of programming (initializing like in the above example) is a safe habit.
Thanks.
To answer your question, it's not considered dynamic allocation because it's on the stack. Before this was allowed, you could on some platforms simulate the same variable-length allocation on the stack with the function alloca, but that was not portable. VLAs are (if you program for C99).
It's compiler-dependent. I know it's ok with gcc, but I don't think the C89 spec allows it. I'm not sure about newer C specs, like C99. Best bet for portability is not to use it.
It is known as a "variable length array". It is dynamic in the sense that its size is determined at run-time and can change from call to call, but it has auto storage class like any other local variable. I'd avoid using the term "dynamic allocation" for this, since it would only serve to confuse.
The term "dynamic allocation" is normally used for memory and objects allocated from the heap and whose lifetimes are determined by the programmer (by new/delete, malloc/free), rather than by the object's scope. Variable length arrays are allocated and destroyed automatically as they come in and out of scope like any other local variable with auto storage class.
Variable length arrays are not universally supported by compilers; in particular, Microsoft's Visual C++ compiler does not support them. Standard C++ does not support them either.
With respect to it being a "safe habit", apart from the portability issue, there is the obvious potential to overflow the stack should nObs be sufficiently large a value. You could to some extent protect against this by making nObs a smaller integer type uint8_t or uint16_t for example, but it is not a very flexible solution, and makes bold assumptions about the size of the stack, and objects being allocated. An assert(nObs < MAX_OBS) might be advisable, but at that point the stack may already have overflowed (this may be OK though since an assert() causes termination in any case).
[edit]
Using variable length arrays is probably okay if the size is not externally determined as in your example.
[/edit]
On the whole, the portability and the stack safety issues would suggest that variable length arrays are best avoided IMO.

Why does C need arrays if it has pointers?

If we can use pointers and malloc to create and use arrays, why does the array type exist in C? Isn't it unnecessary if we can use pointers instead?
Arrays are faster than dynamic memory allocation.
Arrays are "allocated" at "compile time" whereas malloc allocates at run time. Allocating takes time.
Also, C does not mandate that malloc() and friends are available in free-standing implementations.
Edit
Example of array
#define DECK_SIZE 52
int main(void) {
int deck[DECK_SIZE];
play(deck, DECK_SIZE);
return 0;
}
Example of malloc()
int main(void) {
size_t len = 52;
int *deck = malloc(len * sizeof *deck);
if (deck) {
play(deck, len);
}
free(deck);
return 0;
}
In the array version, the space for the deck array was reserved by the compiler when the program was created (but, of course, the memory is only reserved/occupied while the program is running). In the malloc() version, space for the deck array has to be requested at every run of the program.
Arrays can never change size, malloc'd memory can grow when needed.
If you only need a fixed number of elements, use an array (within the limits of your implementation).
If you need memory that can grow or shrink during the running of the program, use malloc() and friends.
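As a rough illustration of memory that grows while the program runs, here is a sketch of an append operation built on realloc. The int_buf type, the int_buf_push function, and the doubling policy are assumptions made for the example, not something the answer prescribes.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable buffer of ints. Start with all fields zero. */
struct int_buf {
    int *items;
    size_t len;                 /* elements in use */
    size_t cap;                 /* elements allocated */
};

/* Appends value, doubling the capacity when the buffer is full.
 * Returns 0 on success, -1 on failure (the buffer stays valid). */
int int_buf_push(struct int_buf *b, int value)
{
    if (b->len == b->cap) {
        if (b->cap > SIZE_MAX / (2 * sizeof *b->items))
            return -1;          /* doubling would overflow size_t */
        size_t new_cap = b->cap ? b->cap * 2 : 4;
        int *p = realloc(b->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;          /* old block is untouched, nothing leaks */
        b->items = p;
        b->cap = new_cap;
    }
    b->items[b->len++] = value;
    return 0;
}
```

The result of realloc goes into a temporary first: assigning it straight to b->items would lose the only pointer to the old block if the call failed. When the buffer is no longer needed, the caller passes b->items to free().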
It's not a bad question. In fact, early C had no array types.
Global and static arrays are allocated at compile time (very fast). Other arrays are allocated on the stack at runtime (fast). Allocating memory with malloc (to be used for an array or otherwise) is much slower. A similar thing is seen in deallocation: dynamically allocated memory is slower to deallocate.
Speed is not the only issue. Array types are automatically deallocated when they go out of scope, so they cannot be "leaked" by mistake. You don't need to worry about accidentally freeing something twice, and so on. They also make it easier for static analysis tools to detect bugs.
You may argue that there is the function _alloca() which lets you allocate memory from the stack. Yes, there is no technical reason why arrays are needed over _alloca(). However, I think arrays are more convenient to use. Also, it is easier for the compiler to optimise the use of an array than a pointer with an _alloca() return value in it, since it's obvious what a stack-allocated array's offset from the stack pointer is, whereas if _alloca() is treated like a black-box function call, the compiler can't tell this value in advance.
EDIT, since tsubasa has asked for more details on how this allocation occurs:
On x86 architectures, the ebp register normally refers to the current function's stack frame, and is used to reference stack-allocated variables. For instance, you may have an int located at [ebp - 8] and a char array stretching from [ebp - 24] to [ebp - 9]. And perhaps more variables and arrays on the stack. (The compiler decides how to use the stack frame at compile time. C99 compilers allow variable-size arrays to be stack allocated, this is just a matter of doing a tiny bit of work at runtime.)
In x86 code, pointer offsets (such as [ebp - 16]) can be represented in a single instruction. Pretty efficient.
Now, an important point is that all stack-allocated variables and arrays in the current context are retrieved via offsets from a single register. If you call malloc there is (as I have said) some processing overhead in actually finding some memory for you. But also, malloc gives you a new memory address. Let's say it is stored in the ebx register. You can't use an offset from ebp anymore, because you can't tell what that offset will be at compile time. So you are basically "wasting" an extra register that you would not need if you used a normal array instead. If you malloc more arrays, you have more "unpredictable" pointer values that magnify this problem.
Arrays have their uses and should be used when you can: static allocation helps make programs more stable, and it is sometimes a necessity when you must ensure that memory leaks cannot happen.
They exist because some requirements require them.
In a language such as BASIC, you have certain commands that are allowed, and this is known, due to the language construct. So, what is the benefit of using malloc to create the arrays, and then fill them in from strings?
If I have to define the names of the operations anyway, why not put them into an array?
C was written as a general purpose language, which means that it should be useful in any situation, so they had to ensure that it had the constructs to be useful for writing operating systems as well as embedded systems.
An array is a shorthand way to specify pointing to the beginning of a malloc for example.
But imagine trying to do matrix math by using pointer manipulations rather than vec[x] * vec[y]. It would be very prone to hard-to-find errors.
See this question discussing space hardening and C. Sometimes dynamic memory allocation is just a bad idea, I have worked with C libraries that are completely devoid of malloc() and friends.
You don't want a satellite dereferencing a NULL pointer any more than you want air traffic control software forgetting to zero out heap blocks.
It's also important (as others have pointed out) to understand what is part of C and what extends it into various uniform standards (e.g. POSIX).
Arrays are a nice syntax improvement compared to dealing with pointers. You can make all sorts of mistakes unknowingly when dealing with pointers. What if you move too many spaces across the memory because you're using the wrong byte size?
Explanation by Dennis Ritchie about C history:
Embryonic C
NB existed so briefly that no full description of it was written. It supplied the types int and char, arrays of them, and pointers to them, declared in a style typified by
int i, j;
char c, d;
int iarray[10];
int ipointer[];
char carray[10];
char cpointer[];
The semantics of arrays remained exactly as in B and BCPL: the declarations of iarray and carray create cells dynamically initialized with a value pointing to the first of a sequence of 10 integers and characters respectively. The declarations for ipointer and cpointer omit the size, to assert that no storage should be allocated automatically. Within procedures, the language's interpretation of the pointers was identical to that of the array variables: a pointer declaration created a cell differing from an array declaration only in that the programmer was expected to assign a referent, instead of letting the compiler allocate the space and initialize the cell.
Values stored in the cells bound to array and pointer names were the machine addresses, measured in bytes, of the corresponding storage area. Therefore, indirection through a pointer implied no run-time overhead to scale the pointer from word to byte offset. On the other hand, the machine code for array subscripting and pointer arithmetic now depended on the type of the array or the pointer: to compute iarray[i] or ipointer+i implied scaling the addend i by the size of the object referred to.
These semantics represented an easy transition from B, and I experimented with them for some months. Problems became evident when I tried to extend the type notation, especially to add structured (record) types. Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early Unix systems might be described in C as
struct {
int inumber;
char name[14];
};
I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?
The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.
To summarize in my own words: if name above were just a pointer, every instance of that struct would contain an additional pointer, destroying the perfect mapping of it to an external object (like a directory entry).
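The layout argument can be checked with a small sketch. The tag names dir_entry and dir_entry_ptr and the helper name_is_embedded are hypothetical; only the member list comes from the quoted text.

```c
#include <stddef.h>

/* The directory entry from the quote: the 14 name bytes are part of
 * the struct itself. */
struct dir_entry {
    int inumber;
    char name[14];
};

/* Hypothetical alternative: with a pointer member, the struct holds an
 * address and the characters live somewhere else. */
struct dir_entry_ptr {
    int inumber;
    char *name;
};

/* Returns 1 if e->name, converted to a pointer, points at the member's
 * own offset inside *e. No pointer is stored; it is created on use. */
int name_is_embedded(const struct dir_entry *e)
{
    const char *base = (const char *)e;
    return e->name == base + offsetof(struct dir_entry, name);
}
```

Here sizeof applied to the name member of struct dir_entry yields 14, while for struct dir_entry_ptr it yields the size of a pointer, so only the first type can describe the bytes of an on-disk entry directly.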

Resources