Assert the allocation of a variable-length array - c

I apologize for the possible duplicate (I have not been able to find an answer to this):
Do we need to ensure that the allocation of a variable-length array has completed successfully?
For example:
void func(int size)
{
    int arr[size];

    if (arr == NULL)
    {
        // Exit with a failure
    }
    else
    {
        // Continue as planned
    }
}
It seems obvious that the answer is yes, but the syntax arr == NULL feels a bit unusual.
Thanks
UPDATE:
I admit that I haven't verified that the code above even compiles (I am assuming that it does).
If it doesn't compile, then it means that there is no way to assert the allocation of a variable-length array.
Hence, I assume that if the allocation fails, then the program crashes immediately.
This would be a very awkward case, as it makes sense for a program to crash after an illegal memory access (read or write), but not after a non-successful memory allocation.
Or perhaps the allocation itself will not cause anything, but as soon as I access the array at an entry that "falls" outside the stack, I might get a memory access violation (as with a stack overflow)...?
To be honest, I can't even see how VLAs are allocated on the stack if any more local variables follow them (other VLAs in particular), so I would appreciate an answer on that issue as well.

This question proceeds from a slightly flawed first premise. You cannot check if an array is NULL because, as is a popular discussion topic, an array is not a pointer in C. An array is the storage object, in-place.
You cannot get to code where the array name is accessible without the array having been allocated. A local array is exactly the same as any other local variable: its existence is inherent and assumed for the surrounding code to be running at all, and there's no notion in the language for checking whether any given variable slot has been "allocated" at all (as the comments on the question note, "the stack" is a notion below the level C operates on - the language assumes it "happens", by unspecified magic). It has to assume this much always succeeds in order for the code to make sense on a most basic level.
What happens in the case where the array couldn't be allocated is therefore the same as whatever happens when the runtime can't allocate space for any other local variable - the situation is inherently undefined and undefinable, because an assumption made by the C language abstract machine was violated. The language has no (fully formal) concepts that can even express this, let alone check for it or recover from it, so testing for it is similarly out of scope. Like a stack overflow, this is basically guaranteed to lead to a fatal crash.
This does not make VLAs useless, for several reasons:
Many uses of VLAs aren't going to be life-threateningly huge. Perhaps the only use of the variation is to choose a number between 3 and 5? This is no worse for space than using a few more scalar locals.
Just as avoiding infinite recursion requires the programmer to prove certain properties that a C compiler doesn't, similarly you should design your program with at least a weak bound on the amount of space VLAs will be allowed to consume at any given time. For instance, you can prove to yourself that no VLA functions are ever recursive, or called from a recursive function, and none of them use more than e.g. 10K space - that's plenty useful and should be safe.
You can view VLAs as an optimisation to allow you to save space where you otherwise would have had to allocate a statically-sized local array (e.g. in the first example, always allocating 5 instead of 3). As long as you know, and design around, the static upper bound, they are effectively guaranteed to make your program safer from overflow, by providing an option to not always use as much space when it isn't required.
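To make the last two points concrete, here is a minimal sketch of the "design around a known bound" pattern, assuming a hypothetical limit MAX_LOCAL and a made-up function process(): sizes within the bound go on the stack as a VLA, and anything larger falls back to the heap, where failure can actually be checked.
#include <stdlib.h>

#define MAX_LOCAL 1024                          /* hypothetical bound we are willing to put on the stack */

void process(size_t n)
{
    if (n >= 1 && n <= MAX_LOCAL)
    {
        int buf[n];                             /* VLA: small enough that, by design, it fits */
        buf[0] = 0;                             /* ... use buf ... */
    }
    else
    {
        int *buf = malloc(n * sizeof *buf);     /* fall back to the heap, which can be checked */
        if (buf == NULL)
            return;                             /* allocation failure is detectable here */
        /* ... use buf ... */
        free(buf);
    }
}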

Related

Why does C not define minimum size for an array?

The C standard defines a lot of lower/upper limits (translation limits) that an implementation must satisfy for each translation. Why is there no such minimum limit defined for an array size? The following program will compile fine, but is likely to produce a runtime error/segfault and invokes undefined behaviour.
int main()
{
    int a[99999999];
    int i;

    for (i = 0; i < 99999999; i++)
        a[i] = i;
    return 0;
}
A possible reason could be that local arrays are allocated in automatic storage, and the limit depends on the size of the stack frame allocated. But why not have a minimum limit like the other limits defined by C?
Let's forget about the undefined cases like above. Consider the following:
int main()
{
    int a[10];
    int i;

    for (i = 0; i < 10; i++)
        a[i] = i;
    return 0;
}
In the above, what guarantees that the local array (albeit a very small one) is going to work as expected and won't cause undefined behaviour due to allocation failure?
It's unlikely that an allocation for such a small array would fail on any modern system, but the C standard doesn't define any requirement to be satisfied, and compilers don't (at least GCC doesn't) report allocation failures. Only a runtime error/undefined behaviour is a possibility. The hard part is that nobody can tell whether an arbitrarily sized array is going to cause undefined behaviour due to allocation failure.
Note that I am aware I can use dynamic arrays (via malloc & friends) for this purpose and have better control over allocation failures. I am more interested in why there's no such limit defined for local arrays. Also, global arrays are stored in static storage and increase the executable size, which compilers can handle.
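For contrast with the local-array version above, a minimal heap-based sketch (the size 99999999 is taken from the first example; the error handling shown is just one possible policy) makes the allocation failure an observable NULL rather than undefined behaviour:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = 99999999;
    int *a = malloc(n * sizeof *a);              /* heap allocation instead of a huge local array */

    if (a == NULL)
    {
        fprintf(stderr, "allocation failed\n");  /* the failure is reported, not undefined */
        return 1;
    }
    for (size_t i = 0; i < n; i++)
        a[i] = (int)i;
    free(a);
    return 0;
}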
Because C, the language, should not be imposing limitations on your available stack size. C operates in many (many) different environments. How could it possibly come up with a reasonable number? Hell, automatic storage duration != stack; a stack is an implementation detail. C, the language, says nothing of a "stack".
The environment decides this stuff, and for good reason. What if a certain environment implements automatic storage duration via an alternative method which imposes no such limitation? What if a breakthrough in hardware occurs and all of a sudden modern machines do not require such a limitation?
Should we rev the standard in such an event? We would have to if C, the language, specified such implementation details.
You've already answered your own question; it's due to stack limitation.* Even this might not work:
void foo(void) {
    int a;
    ...
}
if the ... is actually a recursive call to foo.
In other words, this is nothing to do with arrays, as the same problem affects all local variables. The standard couldn't enforce a requirement, because in practice that would translate into a requirement for an infinite-sized stack.
* Yes, I know the C standard(s) don't talk about stacks. But that's the implicit model, in the sense that the standard was really a formalisation of the implementations that existed at the time.
The MINIMUM limit is an array of 1 element. Why would you have a "limit" for that? Of course, if you call a function recursively forever, an array of 1 may not fit on the stack, or the call that calls the function the next time around may not fit on the stack. The only way to solve that would be for the compiler to know the size of the stack, but the compiler doesn't actually know at that stage how big the stack is. Never mind the problems of extremely complex call hierarchies where several different functions call into the same function, possibly with recursion and/or several layers of rather large consumers of stack: how do you size the stack for that? The worst possible case may never be encountered, because other things dictate that it doesn't happen; for example, the worst case in one function occurs only when an input file is empty, while the worst case in another function occurs when there is lots of data stored in that same file. There are lots and lots of variations like this. It's just too unreliable to determine, so sooner or later it would become guesswork or produce lots of false positives.
Consider a program with thousands of functions, all of which call the same logging function that needs a 200 byte array on the stack for temporarily storing the log output. It's called from just about every function from main upwards.
The MAXIMUM for a local variable depends on the size of the stack, which, like I said above, is not something the compiler knows when compiling your code [the linker MAY know, but that's later on]. For global arrays and those allocated on the heap, the limit is "how much memory your process can get", so there's no upper limit there.
There's just no easy way to determine this. And many of the limits provided by the standard are there to guarantee that code can be compiled by "any compiler" as long as your code follows the rules. Being compiled and being able to run to completion are two different things.
int main()
{
    while (1);
}
will never run to completion - but it will compile in every compiler I know of, and most won't say a thing about there being an infinite loop - it's your choice to do that.
It's also your choice to put large arrays on the stack. And it could well be that the linker is given several gigabytes of stack, in which case it'll be fine - or the stack is 200K, and you can't have an array of 50000 integers...

defining a simple array of integers

Why can I do this?
int array[4];                    // I assume this creates an array with 4 indexes
array[25] = 1;                   // I have assigned an index larger than the int declaration
NSLog(@"result: %i", array[25]); // this prints "1" to the screen
Why does this work, if the index exceeds the declaration? What is the significance of the number in the declaration if it has no effect on what you can actually do with the array?
Thanks.
You are getting undefined behavior. It could print anything, it could crash, it could burst into singing (okay that isn't likely but you get the idea).
If it happens to write to a location that is mapped with the adequate permissions it will work. Until one day when it won't because of a different layout.
It is undefined. Some OSes will give you a segmentation fault, while others tolerate this. Anyhow, exceeding the array's size should be avoided.
An array is really just a contiguous, allocated block of memory; in most expressions its name decays to a pointer to the start of that block.
In this case, you have allocated 4 ints worth of memory.
So if you went array[2] it would think "the memory at array + sizeof(int) * 2"
Change the 2 to 25, and you're just looking at 25 ints' worth of memory past the start. Since there are no checks to verify you're in bounds (either when assigning or printing), it works.
There's nothing to say something else isn't allocated there, though!
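A small sketch of the indexing arithmetic described in this answer; note that nothing in it checks bounds, which is exactly the point:
#include <stdio.h>

int main(void)
{
    int array[4] = {10, 20, 30, 40};

    /* array[2] is defined to mean *(array + 2): the address of the first element
       plus 2 * sizeof(int) bytes, with no bounds check anywhere */
    printf("%d %d\n", array[2], *(array + 2));   /* prints "30 30" */

    /* array[25] would compute an address 25 ints past the start; actually reading
       or writing it is undefined behaviour, so it is deliberately not done here */
    return 0;
}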
The number in the declaration determines how much memory should be reserved, in that case 4 * sizeof(int).
Writing to memory out of bounds is possible but not recommended. In C you can access any point in memory available to your program, but if you write to memory that isn't reserved for that purpose, you can cause memory corruption. Array names decay to pointers (but not the other way around).
The behavior depends on the compiler, the platform and some randomness. Don't do it.
It's doing very bad things. If the array is declared locally to a function, you're probably writing on stack locations above the current function's stack frame. If it is declared globally, you're probably writing on memory allocated to adjacent variables. Either way, it is "working" by pure luck.
It is possible your C compiler had padded your array for memory alignment purposes, and by luck your array overrun just happens to still be within the rounded-up allocation. Don't rely on it though.
This is unsafe programming. It really should be avoided because it may not crash your program, and a crash is really the best thing you could hope for. It could give you garbage results instead. These are unpredictable and could really screw up your program, and since you don't know anything is wrong (because it isn't crashing), it will ruin the integrity of your data. Since there is no try/catch in C, you really should check inputs. Remember that scanf returns an int.
C by design does not perform array bounds checking. It was designed as a systems level language, and the overhead of explicit run-time bounds checking could be prohibitive in a great many cases. Consequently C will allow 'dangerous' code and must be used carefully. If you need something 'safer' then C# or Java may be more appropriate, but there is a definite performance hit.
Automated static analysis tools may help, and there are run-time bounds checking tools for use in development and debugging.
In C an array is a contiguous block of memory. By accessing the array out of bounds, you are simply accessing memory beyond the end of the array. What accessing such memory will do is non-deterministic: it may be junk, it may belong to an adjacent variable, or it may belong to the variables of the calling function or above. It may be a return address for the current function or a calling function. In a memory-protected OS such as Windows or Linux, if you access memory so far out of bounds that it is no longer within the address range assigned to the process, a fault exception will occur.
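Since the language itself won't check bounds, any checking has to be written explicitly. The helper below, checked_get(), is a hypothetical illustration of that kind of defensive access, not a standard function:
#include <stdbool.h>
#include <stddef.h>

/* hypothetical helper: fetch arr[i] only when i is in range */
static bool checked_get(const int *arr, size_t len, size_t i, int *out)
{
    if (i >= len)
        return false;   /* out of bounds: report it instead of invoking undefined behaviour */
    *out = arr[i];
    return true;
}

/* usage:
   int a[4] = {1, 2, 3, 4};
   int v;
   if (checked_get(a, 4, 25, &v)) { ... use v ... } else { ... handle the error ... } */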

Why variables start out with random values in C

I think this is wrong; they should start as NULL and not with a random value. In the case that you have a pointer with a random memory address as its default value, it could be a very dangerous thing, no?
The variables start out uninitialized because that's the fastest way - why waste the CPU cycles on initialization if you're going to write another value there anyway?
If you want a variable to be initialized after creation, just initialize it. :)
About it being a dangerous thing: Every good compiler will warn you if you try to use a variable without initialization.
No. C is a very efficient language, one that has traditionally been faster than a lot of other languages. One of the reasons for this is that it doesn't do too much on its own. The programmer controls this.
In the case of initialization, C variables are not initialized to a random value. Rather, they are not initialized and so they contain whatever was at the memory location before.
If you wanted to initialize a variable to, say, 1 in your program, then it would be inefficient if the variable had already been initialized to zero or null. That would mean it was initialized twice.
Execution speed and overhead (or lack thereof) are the main reasons why. C is notorious for letting you walk off the proverbial cliff because it always assumes that the user knows better than it does.
Note that if you declared the variable as static it actually is guaranteed to be initialized to 0.
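A small sketch contrasting the two storage durations mentioned above (the names are made up for illustration): objects with static storage duration start at zero, while an automatic local holds an indeterminate value until you assign it.
#include <stdio.h>

int file_scope_value;          /* static storage duration: guaranteed to start at 0 */

void demo(void)
{
    static int calls;          /* static storage duration: also starts at 0 */
    int local;                 /* automatic storage duration: indeterminate until assigned */

    calls++;
    local = calls * 2;         /* must be assigned before it is read */
    printf("calls=%d file_scope_value=%d local=%d\n", calls, file_scope_value, local);
}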
Variables start out with a random value because you are just handed a block of memory and told to deal with it yourself. It has whatever value that block of memory had beforehand. Why should the program waste time setting the value to some arbitrary default when you are likely going to set it yourself later?
The design choice is performance, and it is one of the many reasons why C isn't the preferred language for most projects.
This has nothing to do with "if C were being designed today" or with efficiency of one initialization. Instead think of something like
void foo()
{
    struct bar *ptrs[10000];
    /* do something where only a few indices end up actually getting used */
}
Any language that forces useless initialization on you is doomed to be slow as hell for algorithms that can make use of sparse arrays where you don't care about the majority of the values, and have an easy way of knowing which values you care about.
If you don't like my example with such a large object on the stack, substitute malloc instead. It has the same semantics with regard to initialization.
In either case, if you want zero-initialization, you can get it with {0} or calloc.
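A brief sketch of those two zero-initialization options, reusing the struct bar from the example above (the function name and the comment about all-bits-zero are my own assumptions):
#include <stdlib.h>

struct bar;                                             /* incomplete type is fine: we only store pointers */

void zero_init_examples(size_t n)
{
    struct bar *on_stack[10000] = {0};                  /* {0}: every element becomes a null pointer */

    struct bar **on_heap = calloc(n, sizeof *on_heap);  /* calloc zero-fills the allocated block */
    if (on_heap == NULL)
        return;

    /* on typical platforms all-bits-zero is also a null pointer, so on_heap[i] reads as NULL */

    free(on_heap);
    (void)on_stack;
}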
It was a design choice made many years ago, probably for efficiency reasons.
Statically allocated variables (globals and statics) are initialized to 0 if there's no explicit initialization - this could be justified even taking efficiency into account because it only occurs once. I'd guess the thinking was that for automatic variables (locals) that are allocated each time a scope is entered, implicit initialization was considered something that might cost too much and therefore should be left to the programmer's responsibility.
If C were being designed today, I wouldn't be surprised if that design decision were changed - especially since compilers are intelligent enough today to be able to optimize away an initialization that gets overwritten before any other use (or potential use).
However, there are so many C compiler toolchains that follow the spec of not initializing automatically, it would be foolish for a compiler to perform implicit initialization to a 'useful' value (like 0 or NULL). That would just encourage people targeting that tool chain to write code that didn't work correctly on other tool chains.
However, compilers can initialize local variables, and they often do. It's just that they initialize the locals to a value that's not generally useful (in particular, one that doesn't set a pointer to the null pointer). That kind of initialization isn't useful for writing your program logic against, and it's not intended for that. It's intended to cause deterministic and reproducible errors, so that if you erroneously use values that have been set by implicit initialization, you'll be able to find them easily in test/debug.
Usually this compiler behavior is turned on only for debug builds; I could see an argument being made for turning it on in release builds as well - particular if the release build can still optimize it away when the compiler can prove that the implicit initialized value is never used.

Is it bad practice to declare an array mid-function

In an effort to only ask what I'm really looking for here: I'm really only concerned with whether or not it's considered bad practice to declare an array like below, where the size could vary. If it is... I would generally malloc() instead.
void MyFunction()
{
    int size;

    //do a bunch of stuff
    size = 10; //but could have been something else

    int array[size];
    //do more stuff...
}
Generally yes, this is bad practice, although the newer standards allow you to use this syntax. In my opinion you must allocate (on the heap) the memory you want to use and release it once you're done with it. Since there is no portable way of checking whether the stack is big enough to hold that array, you should use a method that can actually be checked - like malloc/calloc & free. In the embedded world, stack size can be an issue.
If you are worried about fragmentation you can create your own memory allocator, but this is a totally different story.
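For reference, a heap-based version of the function from the question, roughly along the lines this answer suggests; the early return on failure is just one possible way to handle the error:
#include <stdlib.h>

void MyFunction(void)
{
    int size;

    /* do a bunch of stuff */
    size = 10;                                  /* but could have been something else */

    int *array = malloc(size * sizeof *array);  /* heap allocation instead of a VLA */
    if (array == NULL)
    {
        return;                                 /* unlike a VLA, this failure is detectable */
    }

    /* do more stuff... */

    free(array);                                /* release it once you're done with it */
}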
That depends. The first (the malloc version) clearly isn't what I'd call "proper", and the second (the VLA) is proper only under rather limited circumstances.
In the first, you shouldn't cast the return from malloc in C -- doing so can cover up the bug of accidentally omitting inclusion of the correct header (<stdlib.h>).
In the second, you're restricting the code to C99 or a gcc extension. As long as you're aware of that, and it works for your purposes, it's all right, but hardly what I'd call an ideal of portability.
As far as what you're really asking: with the minor bug mentioned above fixed, the first is portable, but may be slower than you'd like. If the second is portable enough for your purposes, it'll normally be faster.
For your question, I think each has its advantages and disadvantages.
Dynamic Allocation:
Slow, but you can detect when there is no memory to be given to your program by checking the pointer.
Stack Allocation:
Only in C99, and it is blazingly fast, but in case of stack overflow you are out of luck.
In summary, when you need a small array, reserve it on the stack. Otherwise, use dynamic memory wisely.
The argument against VLAs runs that because of the absolute badness of overflowing the stack, by the time you've done enough thinking/checking to make them safe, you've done enough thinking/checking to use a fixed-size array:
1) In order to safely use VLAs, you must know that there is enough stack available.
2) In the vast majority of cases, the way that you know there's enough stack is that you know an upper bound on the size required, and you know (or at least are willing to guess or require) a lower bound on the stack available, and the one is smaller than the other. So just use a fixed-size array.
3) In the vast majority of the few cases that aren't that simple, you're using multiple VLAs (perhaps one in each call to a recursive function), and you know an upper bound on their total size, which is less than a lower bound on available stack. So you could use a fixed-size array and divide it into pieces as required.
4) If you ever encounter one of the remaining cases, in a situation where the performance of malloc is unacceptable, do let me know...
It may be more convenient, from the POV of the source code, to use VLAs. For instance you can use sizeof (in the defining scope) instead of maintaining the size in a variable, and that business with dividing an array into chunks might require passing an extra parameter around. So there's some small gain in convenience, sometimes.
It's also easier to miss that you're using a humongous amount of stack, yielding undefined behavior, if instead of a rather scary-looking int buf[1920*1024] or int buf[MAX_IMG_SIZE] you have an int buf[img->size]. That works fine right up to the first time you actually handle a big image. That's broadly an issue of proper testing, but if you miss some possible difficult inputs, then it won't be the first or last test suite to do so. I find that a fixed-size array reminds me either to put in fixed-size checks of the input, or to replace it with a dynamic allocation and stop worrying whether it fits on the stack or not. There is no valid option to put it on the stack and not worry whether it fits...
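As a sketch of point 2 in the list above - a fixed-size buffer plus an explicit input check - consider something like the following, where MAX_SAMPLES and process_samples() are made-up names and the bound is assumed to be small enough for the stack:
#include <stddef.h>

#define MAX_SAMPLES 4096                /* hypothetical bound, chosen small enough for the stack */

int process_samples(const int *in, size_t n)
{
    int buf[MAX_SAMPLES];               /* always the worst case: it either always fits or never does */

    if (n > MAX_SAMPLES)
        return -1;                      /* reject oversized input instead of silently blowing the stack */

    for (size_t i = 0; i < n; i++)
        buf[i] = in[i];
    /* ... work with buf ... */
    return 0;
}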
two points from a UNIX/C perspective -
malloc is only slow when you force it to call brk(), meaning that for reasonably sized arrays it is the same as allocating stack space for a variable. By the way, method #2 (via alloca, in the libc code I have seen) also invokes brk() for huge objects, so it is a wash. Note: with #2 & #1 you still have to invoke, directly or indirectly, a memset-type call to zero the bytes in the array. This is just a side note to the real issue (IMO):
The real issue is memory leaks. alloca cleans up after itself when the function returns, so #2 is less likely to cause a problem. With malloc/calloc you have to call free() or start a leak.

C function: is this dynamic allocation? initializating an array with a changing length

Suppose I have a C function:
void myFunction(..., int nObs){
    int myVec[nObs];
    ...
}
Is myVec being dynamically allocated? nObs is not constant every time myFunction is called. I ask because I am currently programming with this habit, and a friend was having errors with his program where the culprit was that he didn't dynamically allocate his arrays. I want to know whether my habit of programming (initializing like in the above example) is a safe one.
Thanks.
To answer your question, it's not considered dynamic allocation because it's on the stack. Before this was allowed, you could on some platforms simulate the same variable-length allocation on the stack with the function alloca, but that was not portable. This is portable (if you program for C99).
It's compiler-dependent. I know it's ok with gcc, but I don't think the C89 spec allows it. I'm not sure about newer C specs, like C99. Best bet for portability is not to use it.
It is known as a "variable length array". It is dynamic in the sense that its size is determined at run-time and can change from call to call, but it has auto storage class like any other local variable. I'd avoid using the term "dynamic allocation" for this, since it would only serve to confuse.
The term "dynamic allocation" is normally used for memory and objects allocated from the heap and whose lifetime are determined by the programmer (by new/delete, malloc/free), rather than the object's scope. Variable length arrays are allocated and destroyed automatically as they come in and out of scope like any other local variable with auto storage class.
Variable length arrays are not universally supported by compilers; in particular, VC++ does not support C99 (and therefore variable length arrays), and there are no plans to do so. Neither does C++ currently support them.
With respect to it being a "safe habit", apart from the portability issue, there is the obvious potential to overflow the stack should nObs be sufficiently large a value. You could to some extent protect against this by making nObs a smaller integer type uint8_t or uint16_t for example, but it is not a very flexible solution, and makes bold assumptions about the size of the stack, and objects being allocated. An assert(nObs < MAX_OBS) might be advisable, but at that point the stack may already have overflowed (this may be OK though since an assert() causes termination in any case).
[edit]
Using variable length arrays is probably okay if the size is not externally determined, as in your example.
[/edit]
On the whole, the portability and the stack safety issues would suggest that variable length arrays are best avoided IMO.
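If you do keep the VLA, one way to act on the advice above is to validate nObs against a bound of your own choosing before the array ever comes into existence; MAX_OBS here is a made-up limit:
#define MAX_OBS 1024                    /* made-up bound chosen to be comfortably small for the stack */

int myFunction(int nObs)
{
    if (nObs <= 0 || nObs > MAX_OBS)
        return -1;                      /* refuse before the VLA is declared, not after */

    int myVec[nObs];
    myVec[0] = 0;                       /* ... use myVec ... */
    return 0;
}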
