Related
I'm writing a program in C which I want to read an array length, and create an array of that size. However, C does not support variable-lengthed arrays, so I was wondering how I could do this. I do not want to change my compiler settings.
I was thinking about somehow using preprocessor directive to my advantage, but I have not been able to do so. Pretty much, I have an integer variable containing my desired size, and I would like to declare the array with 0's. Also, I do not want to use malloc/other dynamic array methods.
This might seem basic, but I have been struggling to do this for some time. If it matters, I am receiving the array size through I/O.
There are several possible solutions, none of which satisfy all of your requirements.
A call to malloc is the obvious solution; that's what it's for. You've said you don't want to use malloc, but you haven't explained why.
C does support variable-length arrays -- more or less. VLAs did not exist in C90, were introduced in C99, and were made optional in C11. So if you want portable code, you can't assume that they're supported. If they are, you can do something like this:
int size;
// get value of size from input
int vla[size];
There are some restrictions. If there isn't enough memory (stack size can be more restrictive than heap size), the behavior is undefined. On the other hand, the same is true for ordinary fixed-size arrays, and VLAs can let you allocate a smaller amount of memory rather than assuming a fixed upper bound. VLAs exist only at block scope, so the object will cease to exist when control leaves the enclosing block (typically when the function returns).
You could define an array (probably at file scope, outside any function definition) that you know is big enough for your data. You'll have to specify some upper bound. For example, you can define int arr[10000]; and then reject any input bigger than 10,000. You could then use an initial subset of that array for your data.
You say you want to create a "variably-sized array", but you "do not want to use malloc/other dynamic array methods". It sounds like you want to create a dynamic array, but you don't want to create a dynamic array. It's like saying you want to drive a screw, but you don't want to use a screwdriver.
May I ask: why are you allergic to malloc()?
The reason I ask is that many attempts to define a safe profile for C propose that malloc is the source of all evil. In that case:
int *arr;
arr = mmap(0, sizeof *arr * N, PROT_READ|PROT_WRITE, MAP_PRIVATE, -1, 0);
What you can do is read the array length, and then generate the source code of a program:
fprintf(outfile, "int main(void) { static int arr[%d]; ...}\n", size);
Then execute the compiler on the generated program (e.g. using the system function), and run the resulting executable.
Any language which supports variable length arrays uses dynamic memory allocation mechanism underneath to implement the functionality. 'C' does not have a syntactic sugar which supports true variable length arrays, but it provides all the mechanics needed to mimic one.
malloc, realloc, free, and others can easily be used to handle dynamic allocations and deallocations for arrays of any size and types of elements. You can allocate data in memory and use a pointer to return the reference to caller functions or pass to other functions. ('C' VLAs on the other hand are of limited use and cannot be returned to the caller if allocated on stack).
So, your best option (unless you are in embedded software development) is to start using 'c' dynamic memory allocation.
However, C does not support variable-lengthed arrays,
Wrong. This is perfectly valid C code:
#include <stdio.h>
int main(void)
{
int size;
scanf("%d", &size);
int arr[size];
}
It's called VLA (variable length array) and has been a part of C since 1999. However, it's optional from C11, but big compilers like clang and gcc will never remove them. At least not in the foreseeable future.
This is my problem in essence. In the life of a function, I generate some integers, then use the array of integers in an algorithm that is also part of the same function. The array of integers will only be used within the function, so naturally it makes sense to store the array on the stack.
The problem is I don't know the size of the array until I'm finished generating all the integers.
I know how to allocate a fixed size and variable sized array on the stack. However, I do not know how to grow an array on the stack, and that seems like the best way to solve my problem. I'm fairly certain this is possible to do in assembly, you just increment stack pointer and store an int for each int generated, so the array of ints would be at the end of the stack frame. Is this possible to do in C though?
I would disagree with your assertion that "so naturally it makes sense to store the array on the stack". Stack memory is really designed for when you know the size at compile time. I would argue that dynamic memory is the way to go here
C doesn't define what the "stack" is. It only has static, automatic and dynamic allocations. Static and automatic allocations are handled by the compiler, and only dynamic allocation puts the controls in your hands. Thus, if you want to manually deallocate an object and allocate a bigger one, you must use dynamic allocation.
Don't use dynamic arrays on the stack (compare Why is the use of alloca() not considered good practice?), better allocate memory from the heap using malloc and resize it using realloc.
Never Use alloca()
IMHO this point hasn't been made well enough in the standard references.
One rule of thumb is:
If you're not prepared to statically allocate the maximum possible size as a
fixed length C array then you shouldn't do it dynamically with alloca() either.
Why? The reason you're trying to avoid malloc() is performance.
alloca() will be slower and won't work in any circumstance static allocation will fail. It's generally less likely to succeed than malloc() too.
One thing is sure. Statically allocating the maximum will outdo both malloc() and alloca().
Static allocation is typically damn near a no-op. Most systems will advance the stack pointer for the function call anyway. There's no appreciable difference for how far.
So what you're telling me is you care about performance but want to hold back on a no-op solution? Think about why you feel like that.
The overwhelming likelihood is you're concerned about the size allocated.
But as explained it's free and it gets taken back. What's the worry?
If the worry is "I don't have a maximum or don't know if it will overflow the stack" then you shouldn't be using alloca() because you don't have a maximum and know it if it will overflow the stack.
If you do have a maximum and know it isn't going to blow the stack then statically allocate the maximum and go home. It's a free lunch - remember?
That makes alloca() either wrong or sub-optimal.
Every time you use alloca() you're either wasting your time or coding in one of the difficult-to-test-for arbitrary scaling ceilings that sleep quietly until things really matter then f**k up someone's day.
Don't.
PS: If you need a big 'workspace' but the malloc()/free() overhead is a bottle-neck for example called repeatedly in a big loop, then consider allocating the workspace outside the loop and carrying it from iteration to iteration. You may need to reallocate the workspace if you find a 'big' case but it's often possible to divide the number of allocations by 100 or even 1000.
Footnote:
There must be some theoretical algorithm where a() calls b() and if a() requires a massive environment b() doesn't and vice versa.
In that event there could be some kind of freaky play-off where the stack overflow is prevented by alloca(). I have never heard of or seen such an algorithm. Plausible specimens will be gratefully received!
The innards of the C compiler requires stack sizes to be fixed or calculable at compile time. It's been a while since I used C (now a C++ convert) and I don't know exactly why this is. http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html provides a useful comparison of the pros and cons of the two approaches.
I appreciate your assembly code analogy but C is largely managed, if that makes any sense, by the Operating System, which imposes/provides the task, process and stack notations.
In order to address your issue dynamic memory allocation looks ideal.
int *a = malloc(sizeof(int));
and dereference it to store the value .
Each time a new integer needs to be added to the existing list of integers
int *temp = realloc(a,sizeof(int) * (n+1)); /* n = number of new elements */
if(temp != NULL)
a = temp;
Once done using this memory free() it.
Is there an upper limit on the size? If you can impose one, so the size is at most a few tens of KiB, then yes alloca is appropriate (especially if this is a leaf function, not one calling other functions that might also allocate non-tiny arrays this way).
Or since this is C, not C++, use a variable-length array like int foo[n];.
But always sanity-check your size, otherwise it's a stack-clash vulnerability waiting to happen. (Where a huge allocation moves the stack pointer so far that it ends up in the middle of another memory region, where other things get overwritten by local variables and return addresses.) Some distros enable hardening options that make GCC generate code to touch every page in between when moving the stack pointer by more than a page.
It's usually not worth it to check the size and use alloc for small, malloc for large, since you also need another check at the end of your function to call free if the size was large. It might give a speedup, but this makes your code more complicated and more likely to get broken during maintenance if future editors don't notice that the memory is only sometimes malloced. So only consider a dual strategy if profiling shows this is actually important, and you care about performance more than simplicity / human-readability / maintainability for this particular project.
A size check for an upper limit (else log an error and exit) is more reasonable, but then you have to choose an upper limit beyond which your program will intentionally bail out, even though there's plenty of RAM you're choosing not to use. If there is a reasonable limit where you can be pretty sure something's gone wrong, like the input being intentionally malicious from an exploit, then great, if(size>limit) error(); int arr[size];.
If neither of those conditions can be satisfied, your use case is not appropriate for C automatic storage (stack memory) because it might need to be large. Just use dynamic allocation autom don't want malloc.
Windows x86/x64 the default user-space stack size is 1MiB, I think. On x86-64 Linux it's 8MiB. (ulimit -s). Thread stacks are allocated with the same size. But remember, your function will be part of a chain of function calls (so if every function used a large fraction of the total size, you'd have a problem if they called each other). And any stack memory you dirty won't get handed back to the OS even after the function returns, unlike malloc/free where a large allocation can give back the memory instead of leaving it on the free list.
Kernel thread stack are much smaller, like 16 KiB total for x86-64 Linux, so you never want VLAs or alloca in kernel code, except maybe for a tiny max size, like up to 16 or maybe 32 bytes, not large compared to the size of a pointer that would be needed to store a kmalloc return value.
From my understanding of C it seems that you are supposed to use malloc(size) whenever you are trying to initialize, for instance, an array whose size you do not know of until runtime.
But I was wondering why the function malloc() returns a pointer to the location of the variable and why you even need that.
Basically, why doesn't C just hide it all from you, so that whenever you do something like this:
// 'n' gets stdin'ed from the user
...
int someArray[n];
for(int i = 0; i < n; i++)
someArray[i] = 5;
you can do it without ever having to call malloc() or some other function? Do other languages do it like this (by hiding the memory properties/location altogether)? I feel that as a beginner this whole process of dealing with the memory locations of variables you use just confuse programmers (and since other languages don't use it, C seems to make a simple initialization process such as this overly complicated)...
Basically, what I'm trying to ask is why malloc() is even necessary, because why the language doesn't take care of all that for you internally without the programmer having to be concerned about or having to see memory. Thanks
*edit: Ok, maybe there are some versions of C that I'm not aware of that allows you to forgo the use of malloc() but let's try to ignore that for now...
C lets you manage every little bit of your program. You can manage when memory gets allocated; you can manage when it gets deallocated; you can manage how to grow a small allocation, etc.
If you prefer not to manage that and let the compiler do it for you, use another language.
Actually C99 allows this (so you're not the only one thinking of it). The feature is called VLA (VAriable Length Array).
It's legal to read an int and then have an array of that size:
int n;
fscanf("%d", &n);
int array[n];
Of course there are limitations since malloc uses the heap and VLAs use the stack (so the VLAs can't be as big as the malloced objects).
*edit: Ok, maybe there are some versions of C that I'm not aware of that allows you to forgo the use of malloc() but let's try to ignore
that for now...
So we can concentrate on the flame ?
Basically, what I'm trying to ask is why malloc() is even necessary,
because why the language doesn't take care of all that for you
internally without the programmer having to be concerned about or
having to see memory.
The very point of malloc(), it's raison d'ĂȘtre, it's function, if you will, is to allocate a block of memory. The way we refer to a block of memory in C is by its starting address, which is by definition a pointer.
C is close to 40 years old, and it's not nearly as "high level" as some more modern languages. Some languages, like Java, attempt to prevent mistakes and simplify programming by hiding pointers and explicit memory management from the programmer. C is not like that. Why? Because it just isn't.
Basically, what I'm trying to ask is why malloc() is even necessary, because why the language doesn't take care of all that for you internally without the programmer having to be concerned about or having to see memory. Thanks
One of the hallmarks of C is its simplicity (C compilers are relatively easy to implement); one way of making a language simple is to force the programmer to do all his own memory management. Clearly, other languages do manage objects on the heap for you - Java and C# are modern examples, but the concept isn't new at all; Lisp implementations have been doing it for decades. But that convenience comes at a cost in both compiler complexity and runtime performance.
The Java/C# approach helps eliminate whole classes of memory-management bugs endemic to C (memory leaks, invalid pointer dereferences, etc.). By the same token, C provides a level of control over memory management that allows the programmer to achieve high levels of performance that would be difficult (not impossible) to match in other languages.
If the only purpose of dynamic allocation were to allocate variable-length arrays, then malloc() might not be necessary. (But note that malloc() was around long before variable-length arrays were added to the language.)
But the size of a VLA is fixed (at run time) when the object is created. It can't be resized, and it's deallocated only when you leave the scope in which it's declared. (And VLAs, unlike malloc(), don't have a mechanism for reporting allocation failures.)
malloc() gives you a lot more flexibility.
Consider creating a linked list. Each node is a structure, containing some data and a pointer to the next node in the list. You might know the size of each node in advance, but you don't know how many nodes to allocate. For example, you might read lines from a text file, creating and appending a new node for each line.
You can also use malloc() along with realloc() to create a buffer (say, an array of unsigned char) whose size can be changed after you created it.
Yes, there are languages that don't expose pointers, and that handle memory management for you.
A lot of them are implemented in C.
Maybe the question should be "why do you need something like int array[n] when you can use pointers?"
After all, pointers allow you to keep an object alive beyond the scope it was created in, you can use pointer to slice and dice arrays (for example strchr() returns a pointer to a string), pointers are light-weight objects, so it's cheap to pass them to functions and return them from functions, etc.
But the real answer is "that's how it is". Other options are possible, and the proof is that there are other languages that do other things (and even C99 allows different things).
C is treated as highly developed low-level language, basically malloc is used in dynamic arrays which is a key component in stack & queue. for other languages that hides the pointer part from the developer are not well capable of doing hardware related programming.
The short answer to your question is to ponder this question: What if you also need to control exactly when the memory is de-allocated?
C is a compiled language, not an interpreted one. If you don't know n at compile time, how is the compiler supposed to produce a binary?
Suppose I have a C function:
void myFunction(..., int nObs){
int myVec[nObs] ;
...
}
Is myVec being dynamically allocated? nObs is not constant whenever myFunction is called. I ask because I am currently programming with this habit, and a friend was having errors with his program where the culprit is he didn't dynamically allocate his arrays. I want to know whether my habit of programming (initializing like in the above example) is a safe habit.
Thanks.
To answer your question, it's not considered dynamic allocation because it's in the stack. Before this was allowed, you could on some platforms simulate the same variable length allocation on the stack with a function alloca, but that was not portable. This is (if you program for C99).
It's compiler-dependent. I know it's ok with gcc, but I don't think the C89 spec allows it. I'm not sure about newer C specs, like C99. Best bet for portability is not to use it.
It is known as a "variable length array". It is dynamic in the sense that its size is determined at run-time and can change from call to call, but it has auto storage class like any other local variable. I'd avoid using the term "dynamic allocation" for this, since it would only serve to confuse.
The term "dynamic allocation" is normally used for memory and objects allocated from the heap and whose lifetime are determined by the programmer (by new/delete, malloc/free), rather than the object's scope. Variable length arrays are allocated and destroyed automatically as they come in and out of scope like any other local variable with auto storage class.
Variable length arrays are not universally supported by compilers; particularly VC++ does not support C99 (and therefore variable length arrays), and there are no plans to do so. Neither does C++ currently support them.
With respect to it being a "safe habit", apart from the portability issue, there is the obvious potential to overflow the stack should nObs be sufficiently large a value. You could to some extent protect against this by making nObs a smaller integer type uint8_t or uint16_t for example, but it is not a very flexible solution, and makes bold assumptions about the size of the stack, and objects being allocated. An assert(nObs < MAX_OBS) might be advisable, but at that point the stack may already have overflowed (this may be OK though since an assert() causes termination in any case).
[edit]
Using variable length arrays is probably okay if the size is either not externally determined as in your example.
[/edit]
On the whole, the portability and the stack safety issues would suggest that variable length arrays are best avoided IMO.
If we can use pointers and malloc to create and use arrays, why does the array type exist in C? Isn't it unnecessary if we can use pointers instead?
Arrays are faster than dynamic memory allocation.
Arrays are "allocated" at "compile time" whereas malloc allocates at run time. Allocating takes time.
Also, C does not mandate that malloc() and friends are available in free-standing implementations.
Edit
Example of array
#define DECK_SIZE 52
int main(void) {
int deck[DECK_SIZE];
play(deck, DECK_SIZE);
return 0;
}
Example of malloc()
int main(void) {
size_t len = 52;
int *deck = malloc(len * sizeof *deck);
if (deck) {
play(deck, len);
}
free(deck);
return 0;
}
In the array version, the space for the deck array was reserved by the compiler when the program was created (but, of course, the memory is only reserved/occupied when the program is being run), in the malloc() version, space for the deck array has to be requested at every run of the program.
Arrays can never change size, malloc'd memory can grow when needed.
If you only need a fixed number of elements, use an array (within the limits of your implementation).
If you need memory that can grow or shrink during the running of the program, use malloc() and friends.
It's not a bad question. In fact, early C had no array types.
Global and static arrays are allocated at compile time (very fast). Other arrays are allocated on the stack at runtime (fast). Allocating memory with malloc (to be used for an array or otherwise) is much slower. A similar thing is seen in deallocation: dynamically allocated memory is slower to deallocate.
Speed is not the only issue. Array types are automatically deallocated when they go out of scope, so they cannot be "leaked" by mistake. You don't need to worry about accidentally freeing something twice, and so on. They also make it easier for static analysis tools to detect bugs.
You may argue that there is the function _alloca() which lets you allocate memory from the stack. Yes, there is no technical reason why arrays are needed over _alloca(). However, I think arrays are more convenient to use. Also, it is easier for the compiler to optimise the use of an array than a pointer with an _alloca() return value in it, since it's obvious what a stack-allocated array's offset from the stack pointer is, whereas if _alloca() is treated like a black-box function call, the compiler can't tell this value in advance.
EDIT, since tsubasa has asked for more details on how this allocation occurs:
On x86 architectures, the ebp register normally refers to the current function's stack frame, and is used to reference stack-allocated variables. For instance, you may have an int located at [ebp - 8] and a char array stretching from [ebp - 24] to [ebp - 9]. And perhaps more variables and arrays on the stack. (The compiler decides how to use the stack frame at compile time. C99 compilers allow variable-size arrays to be stack allocated, this is just a matter of doing a tiny bit of work at runtime.)
In x86 code, pointer offsets (such as [ebp - 16]) can be represented in a single instruction. Pretty efficient.
Now, an important point is that all stack-allocated variables and arrays in the current context are retrieved via offsets from a single register. If you call malloc there is (as I have said) some processing overhead in actually finding some memory for you. But also, malloc gives you a new memory address. Let's say it is stored in the ebx register. You can't use an offset from ebp anymore, because you can't tell what that offset will be at compile time. So you are basically "wasting" an extra register that you would not need if you used a normal array instead. If you malloc more arrays, you have more "unpredictable" pointer values that magnify this problem.
Arrays have their uses, and should be used when you can, as static allocation will help make programs more stable, and are a necessity at times due to the need to ensure memory leaks don't happen.
They exist because some requirements require them.
In a language such as BASIC, you have certain commands that are allowed, and this is known, due to the language construct. So, what is the benefit of using malloc to create the arrays, and then fill them in from strings?
If I have to define the names of the operations anyway, why not put them into an array?
C was written as a general purpose language, which means that it should be useful in any situation, so they had to ensure that it had the constructs to be useful for writing operating systems as well as embedded systems.
An array is a shorthand way to specify pointing to the beginning of a malloc for example.
But, imagine trying to do matrix math by using pointer manipulations rather than vec[x] * vec[y]. It would be very prone to difficult to find errors.
See this question discussing space hardening and C. Sometimes dynamic memory allocation is just a bad idea, I have worked with C libraries that are completely devoid of malloc() and friends.
You don't want a satellite dereferencing a NULL pointer any more than you want air traffic control software forgetting to zero out heap blocks.
Its also important (as others have pointed out) to understand what is part of C and what extends it into various uniform standards (i.e. POSIX).
Arrays are a nice syntax improvement compared to dealing with pointers. You can make all sorts of mistakes unknowingly when dealing with pointers. What if you move too many spaces across the memory because you're using the wrong byte size?
Explanation by Dennis Ritchie about C history:
Embryonic C
NB existed so briefly that no full description of it was written. It supplied the types int and char, arrays of them, and pointers to them, declared in a style typified by
int i, j;
char c, d;
int iarray[10];
int ipointer[];
char carray[10];
char cpointer[];
The semantics of arrays remained exactly as in B and BCPL: the declarations of iarray and carray create cells dynamically initialized with a value pointing to the first of a sequence of 10 integers and characters respectively. The declarations for ipointer and cpointer omit the size, to assert that no storage should be allocated automatically. Within procedures, the language's interpretation of the pointers was identical to that of the array variables: a pointer declaration created a cell differing from an array declaration only in that the programmer was expected to assign a referent, instead of letting the compiler allocate the space and initialize the cell.
Values stored in the cells bound to array and pointer names were the machine addresses, measured in bytes, of the corresponding storage area. Therefore, indirection through a pointer implied no run-time overhead to scale the pointer from word to byte offset. On the other hand, the machine code for array subscripting and pointer arithmetic now depended on the type of the array or the pointer: to compute iarray[i] or ipointer+i implied scaling the addend i by the size of the object referred to.
These semantics represented an easy transition from B, and I experimented with them for some months. Problems became evident when I tried to extend the type notation, especially to add structured (record) types. Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early Unix systems might be described in C as
struct {
int inumber;
char name[14];
};
I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?
The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.
To summarize in my own words - if name above were just a pointer, any of that struct would contain an additional pointer, destroying the perfect mapping of it to an external object (like an directory entry).