What's the point of VLA anyway? - c

I understand what variable length arrays are and how they are implemented. This question is about why they exist.
We know that VLAs are only allowed within function blocks (or prototypes) and that they basically cannot be anywhere but on the stack (assuming the normal implementation): C11, 6.7.6.2-2:
If an identifier is declared as having a variably modified type, it shall be an ordinary
identifier (as defined in 6.2.3), have no linkage, and have either block scope or function
prototype scope. If an identifier is declared to be an object with static or thread storage
duration, it shall not have a variable length array type.
Let's take a small example:
void f(int n)
{
int array[n];
/* etc */
}
there are two cases that need to be taken care of:
n <= 0: f has to guard against this, otherwise the behavior is undefined: C11, 6.7.6.2-5 (emphasis mine):
If the size is an expression that is not an integer constant expression: if it occurs in a
declaration at function prototype scope, it is treated as if it were replaced by *; otherwise,
each time it is evaluated it shall have a value greater than zero. The size of each instance
of a variable length array type does not change during its lifetime. Where a size
expression is part of the operand of a sizeof operator and changing the value of the
size expression would not affect the result of the operator, it is unspecified whether or not
the size expression is evaluated.
n > stack_space_left / element_size: There is no standard way of finding how much stack space is left (since there is no such thing as a stack as far as the standard is concerned). So this test is impossible. The only sensible solution is to have a predefined maximum possible size for n, say N, to make sure stack overflow doesn't occur.
In other words, the programmer must make sure 0 < n <= N for some N of choice. However, the program should work for n == N anyway, so one might as well declare the array with constant size N rather than variable length n.
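A minimal sketch of the bounded pattern described above. The names MAX_N, sum_with_vla and sum_first_n are hypothetical, and the limit of 256 is an arbitrary assumption, not a recommendation:

```c
#include <assert.h>

enum { MAX_N = 256 };   /* hypothetical upper bound N, chosen arbitrarily */

/* Fills a scratch VLA with 0..n-1 and returns the sum of its elements.
   Precondition: 0 < n <= MAX_N. */
static long sum_with_vla(int n)
{
    assert(n > 0 && n <= MAX_N);
    int scratch[n];                 /* safe only because n is bounded */
    long total = 0;
    for (int i = 0; i < n; ++i) {
        scratch[i] = i;
        total += scratch[i];
    }
    return total;
}

/* Guarded entry point: returns -1 instead of creating an invalid VLA. */
long sum_first_n(int n)
{
    if (n <= 0 || n > MAX_N)
        return -1;
    return sum_with_vla(n);
}
```

Replacing `int scratch[n]` with `int scratch[MAX_N]` would leave the guard and the results unchanged, which is exactly the observation made above.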
I am aware that VLAs were introduced to replace alloca (as also mentioned in this answer), but in effect they are the same thing (allocate variable size memory on stack).
So the question is why did alloca and consequently VLA exist and why weren't they deprecated? The only safe way to use VLAs seems to me to be with a bounded size, in which case using a normal array with the maximum size is always a viable solution.

For reasons that are not entirely clear to me, almost every time the topic of C99 VLA pops up in a discussion, people start talking predominantly about the possibility of declaring run-time-sized arrays as local objects (i.e. creating them "on the stack"). This is rather surprising and misleading, since this facet of VLA functionality - support for local array declarations - happens to be a rather auxiliary, secondary capability provided by VLA. It does not really play any significant role in what VLA can do. Most of the time, the matter of local VLA declarations and their accompanying potential pitfalls is forced into the foreground by VLA critics, who use it as a "straw man" intended to derail the discussion and bog it down among barely relevant details.
The essence of VLA support in C is, first and foremost, a revolutionary qualitative extension of the language's concept of type. It involves the introduction of a fundamentally new kind of type: variably modified types. Virtually every important implementation detail associated with VLA is actually attached to its type, not to the VLA object per se. It is the very introduction of variably modified types into the language that makes up the bulk of the proverbial VLA cake, while the ability to declare objects of such types in local memory is nothing more than an insignificant and fairly inconsequential icing on that cake.
Consider this: every time one declares something like this in one's code
/* Block scope */
int n = 10;
...
typedef int A[n];
...
n = 5; /* <- Does not affect `A` */
size-related characteristics of the variably modified type A (e.g. the value of n) are finalized at the exact moment when the control passes over the above typedef-declaration. Any changes in the value of n made further down the line (below this declaration of A) don't affect the size of A. Stop for a second and think about what it means. It means that the implementation is supposed to associate with A a hidden internal variable, which will store the size of the array type. This hidden internal variable is initialized from n at run time when the control passes over the declaration of A.
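The capture of the size at the typedef can be observed directly. This is only a small sketch; the function name captured_length is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the element count of a variably modified typedef whose size
   expression is changed after the typedef has been evaluated. */
size_t captured_length(int n)
{
    assert(n > 0);                   /* a VLA size must be positive */
    typedef int A[n];                /* size of A is fixed here, from the current n */
    n = 5;                           /* does not affect A */
    return sizeof(A) / sizeof(int);  /* evaluated at run time */
}
```

Calling captured_length(10) yields 10, not 5, because sizeof(A) reads the hidden size that was stored when control passed over the typedef.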
This gives the above typedef-declaration a rather interesting and unusual property, something we haven't seen before: this typedef-declaration generates executable code (!). Moreover, it doesn't just generate executable code, it generates critically important executable code. If we somehow forget to initialize the internal variable associated with such typedef-declaration, we'll end up with a "broken"/uninitialized typedef alias. The importance of that internal code is the reason why the language imposes some unusual restrictions on such variably modified declarations: the language prohibits passing control into their scope from outside of their scope
/* Block scope */
int n = 10;
goto skip; /* Error: invalid goto */
typedef int A[n];
skip:;
Note once again that the above code does not define any VLA arrays. It simply declares a seemingly innocent alias for a variably modified type. Yet, it is illegal to jump over such typedef-declaration. (We are already familiar with such jump-related restrictions in C++, albeit in other contexts).
A code-generating typedef, a typedef that requires run-time initialization, is a significant departure from what typedef is in the "classic" language. (It also happens to pose a significant hurdle in the way of adopting VLA in C++.)
When one declares an actual VLA object, in addition to allocating the actual array memory the compiler also creates one or more hidden internal variables, which hold the size(s) of the array in question. One has to understand that these hidden variables are associated not with the array itself, but rather with its variably modified type.
One important and remarkable consequence of this approach is as follows: the additional information about array size, associated with a VLA, is not built directly into the object representation of the VLA. It is actually stored beside the array, as "sidecar" data. This means that the object representation of a (possibly multidimensional) VLA is fully compatible with the object representation of an ordinary classic compile-time-sized array of the same dimensionality and the same sizes. For example
void foo(unsigned n, unsigned m, unsigned k, int a[n][m][k]) {}
void bar(int a[5][5][5]) {}
int main(void)
{
unsigned n = 5;
int vla_a[n][n][n];
bar(vla_a);
int classic_a[5][6][7];
foo(5, 6, 7, classic_a);
}
Both function calls in the above code are perfectly valid and their behavior is fully defined by the language, despite the fact that we pass a VLA where a "classic" array is expected, and vice versa. Granted, the compiler cannot control the type compatibility in such calls (since at least one of the involved types is run-time-sized). However, if desired, the compiler (or the user) has everything necessary to perform the run-time check in debug version of code.
(Note: As usual, parameters of array type are always implicitly adjusted into parameters of pointer type. This applies to VLA parameter declarations exactly as it applies to "classic" array parameter declarations. This means that in the above example parameter a actually has type int (*)[m][k]. This type is unaffected by the value of n. I intentionally added a few extra dimensions to the array to maintain its dependence on run-time values.)
Compatibility between VLA and "classic" arrays as function parameters is also supported by the fact that the compiler does not have to accompany a variably modified parameter with any additional hidden information about its size. Instead, the language syntax forces the user to pass this extra information in the open. In the above example the user was forced to first include parameters n, m and k into function parameter list. Without declaring n, m and k first, the user would not have been able to declare a (see also the above note about n). These parameters, explicitly passed into the function by the user, will bring over the information about the actual sizes of a.
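To make the note about the adjusted parameter type concrete, here is a hedged sketch. The helper names plane_bytes and plane_bytes_of_classic are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* a is adjusted to int (*)[m][k]; n is not part of its type. */
size_t plane_bytes(unsigned n, unsigned m, unsigned k, int a[n][m][k])
{
    (void)n;                 /* only m and k influence the type of a */
    assert(a != NULL);
    return sizeof *a;        /* m * k * sizeof(int), computed at run time */
}

/* Passes a "classic" array where a variably modified parameter is expected. */
size_t plane_bytes_of_classic(void)
{
    int classic[5][6][7];
    return plane_bytes(5, 6, 7, classic);
}
```

The sizes arrive only through the explicitly passed m and k; nothing hidden travels with the pointer itself.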
For another example, by taking advantage of VLA support we can write the following code
#include <stdio.h>
#include <stdlib.h>
void init(unsigned n, unsigned m, int a[n][m])
{
for (unsigned i = 0; i < n; ++i)
for (unsigned j = 0; j < m; ++j)
a[i][j] = rand() % 100;
}
void display(unsigned n, unsigned m, int a[n][m])
{
for (unsigned i = 0; i < n; ++i)
for (unsigned j = 0; j < m; ++j)
printf("%2d%s", a[i][j], j + 1 < m ? " " : "\n");
printf("\n");
}
int main(void)
{
int a1[5][5] = { 42 };
display(5, 5, a1);
init(5, 5, a1);
display(5, 5, a1);
unsigned n = rand() % 10 + 5, m = rand() % 10 + 5;
int (*a2)[n][m] = malloc(sizeof *a2);
init(n, m, *a2);
display(n, m, *a2);
free(a2);
}
This code is intended to draw your attention to the following fact: this code makes heavy use of valuable properties of variably modified types. It is impossible to implement elegantly without VLA. This is the primary reason why these properties are desperately needed in C to replace the ugly hacks that were used in their place previously. Yet at the same time, not even a single VLA is created in local memory in the above program, meaning that this popular vector of VLA criticism is not applicable to this code at all.
Basically, the last two examples above are a concise illustration of what the point of VLA support is.

Looking at the comments and the answers, it seems to me that VLAs are useful when you know that normally your input is not too big (similar to knowing your recursion is probably not too deep), but you don't actually have an upper bound, and you would generally ignore the possible stack overflow (similar to ignoring them with recursion) hoping they don't happen.
It may not be an issue at all either, for example if you have unlimited stack size.
That said, here's another use for them I have found which doesn't actually allocate memory on stack, but makes working with dynamic multi-dimensional arrays easier. I'll demonstrate by a simple example:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
size_t n, m;
scanf("%zu %zu", &n, &m);
int (*array)[n][m] = malloc(sizeof *array);
for (size_t i = 0; i < n; ++i)
for (size_t j = 0; j < m; ++j)
(*array)[i][j] = i + j;
free(array);
return 0;
}

Despite all the points you mentioned about VLA, the best part of VLA is that the compiler automatically handles the storage management and the complexities of index calculations for arrays whose bounds are not compile-time constants.
If you want local dynamic memory allocation then the only option is VLA.
I think this could be the reason that VLA was adopted in C99 (and made optional in C11).
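As a rough illustration of the index calculations mentioned above, the sketch below compares manual row-major indexing with indexing through a pointer to a variably modified row type. The function name same_element is hypothetical, and overflow of rows * cols is deliberately ignored here:

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if grid[i][j] and flat[i * cols + j] are the same element,
   0 if not, and -1 if the allocation fails. */
int same_element(size_t rows, size_t cols, size_t i, size_t j)
{
    assert(rows > 0 && cols > 0 && i < rows && j < cols);

    int *flat = malloc(rows * cols * sizeof *flat);
    if (flat == NULL)
        return -1;
    for (size_t k = 0; k < rows * cols; ++k)
        flat[k] = (int)k;

    /* View the same memory as rows of cols ints; no VLA object is created. */
    int (*grid)[cols] = (int (*)[cols])flat;
    int same = (&grid[i][j] == &flat[i * cols + j]) &&
               (grid[i][j] == flat[i * cols + j]);

    free(flat);
    return same;
}
```

With the variably modified pointer type, the multiplication by cols is generated by the compiler rather than written by hand at every access.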
One thing I want to make clear is that there are some notable differences between alloca and VLA. This post points out the differences:
The memory alloca() returns is valid as long as the current function persists. The lifetime of the memory occupied by a VLA is valid as long as the VLA's identifier remains in scope.
You can alloca() memory in a loop for example and use the memory outside the loop, a VLA would be gone because the identifier goes out of scope when the loop terminates.
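The VLA half of that difference can be sketched in standard C (alloca is non-standard, so it is left out here). The function name total_elements is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Each iteration creates a fresh VLA of a different length; its lifetime
   ends at the closing brace of the loop body. */
size_t total_elements(int rounds)
{
    assert(rounds >= 0);
    size_t total = 0;
    for (int i = 1; i <= rounds; ++i) {
        int buf[i];                          /* new object every iteration */
        total += sizeof buf / sizeof buf[0]; /* i elements this time */
    }                                        /* buf no longer exists here */
    return total;
}
```

Because each buf is released at the end of its iteration, the stack usage is bounded by the largest single array rather than by the sum of all of them.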

Your argument seems to be that since one has to bound check the size of the VLA, why not just allocate the maximum size and be done with the runtime allocation.
That argument overlooks the fact that memory is a limited resource in the system, shared between many processes. Memory wastefully allocated in one process is not available to any other (or perhaps it is, but at the expense of swapping to disk).
By the same argument we would not need to malloc an array at run time when we could statically allocate the maximum size that could be needed. In the end heap exhaustion is only slightly preferable to stack overflow.

Stack allocation (and so VLA allocation) is VERY fast; it just requires a quick modification of the stack pointer (typically a single CPU instruction). No need for expensive heap allocation/deallocation.
But, why not just use a constant size array instead?
Let's suppose you are writing high-performance code and you need a variable-size buffer, let's say between 8 and 512 elements. You can just declare a 512-element array, but if most of the time you only require 8 elements, then overallocating can hurt performance by degrading cache locality in stack memory. Now imagine this function has to be called millions of times.
As another example, imagine your function (with a local VLA) is recursive, and you know beforehand that at any moment the total size of all recursively allocated VLAs is limited (i.e. the arrays have variable size, but the sum of all sizes is bounded). In this case, if you use the maximum possible size as a fixed local array size, you may allocate much more memory than otherwise required, making your code slower (due to cache misses) and even causing stack overflows.

VLAs do not have to allocate any memory at all, and they are not limited to stack memory. They are very handy in many aspects of programming.
Some examples
Used as function parameters.
int foo(size_t cols, int (*array)[cols])
{
    //access as normal 2D array
    printf("%d", array[5][6]);
    /* ... */
    return 0;
}
Allocate 2D(or more) array dynamically
int foo(size_t rows, size_t cols)
{
    int (*array)[cols] = malloc(rows * sizeof(*array));
    /* ... */
    //access as normal 2D array
    printf("%d", array[5][6]);
    /* ... */
    free(array);
    return 0;
}

Related

Static array size vs array size through input in C

I was trying to find out the reason why arrays have static size in C, and so far I know that dynamic allocation can impact how long it takes for code to execute. Also, I know that another reason would be that its size needed to be fixed at compile time, but here comes the problem:
What happens when I have something like
int n;
scanf("%d", &n);
int arr[n];
What's the difference if my array size were a static value, just like
int arr[3];
What's the difference if my array size were a static value ... like int arr[3]?
You'll notice the difference when someone types in "1234567890", in which case the variable-length array will overflow the stack. Of course, if you try to define an overly large static array, you'll also get a stack overflow.
The performance question surely can't matter, since any code in which you parse formatted I/O is not going to be in your performance-critical tight loop. If you were to get n from a function parameter, then there are probably cases where the compiler's ability to know the size of your data could impact its ability to apply various optimizations. For example, if you were to call memset() on your fixed-size array, the compiler might well be able to replace that with one or a few machine instructions rather than the full code of memset(), efficient though it might be.
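One way to avoid the oversized-input problem mentioned above is to validate the length before it ever reaches the array declaration. This is only a sketch: parse_length is a hypothetical helper, and it assumes the input has already been read into a string with any trailing newline removed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parses text as a decimal array length. Returns the length if it lies in
   [1, max_len], otherwise -1 (not a number, trailing junk, or out of range). */
int parse_length(const char *text, int max_len)
{
    assert(text != NULL && max_len > 0);

    char *end;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE)
        return -1;
    if (value < 1 || value > max_len)
        return -1;
    return (int)value;
}
```

A caller would declare `int arr[n];` only after parse_length has returned a positive n.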
int arr[n]; is not dynamic allocation. Dynamic allocation happens when you use the malloc family of functions.
What's the difference if my array size were a static value, just like
int arr[3];
There is almost no difference from the performance perspective (a few more machine code instructions).
Bear in mind that VLAs can only be defined in function (or, more generally, block) scope (i.e. they have to have automatic storage duration). They also cannot be initialized when defined.
In this code snippet
int n;
scanf("%d", &n);
int arr[n];
a variable length array is declared. Its size is determined at run time according to the entered value of the variable n.
Note that the value of the variable n shall be greater than zero.
You may not initialize such an array in its declaration. And such an array may be declared only in block scope. That is, the array shall have automatic storage duration.
In this declaration
int arr[3];
an array with a fixed size is declared. You may initialize it in its declaration, for example
int arr[3] = { 0 };
Such an array may be declared in a file scope or a block scope.
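Since a variable length array cannot carry an initializer, it has to be filled after its declaration. A minimal sketch (the function name zeroed_sum is hypothetical):

```c
#include <assert.h>
#include <string.h>

/* Creates a VLA of n ints, clears it with memset, and returns the sum of
   its elements (0 if the clearing worked). */
int zeroed_sum(int n)
{
    assert(n > 0);                  /* the size shall be greater than zero */
    int arr[n];                     /* no initializer allowed here */
    memset(arr, 0, sizeof arr);     /* sizeof arr is n * sizeof(int) at run time */

    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += arr[i];
    return sum;
}
```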

Integer array size in C without using dynamic memory allocation

I need to declare an array of structures with size symbolnum, but because symbolnum is variable C will produce an error when I write the following code:
extern int symbolnum;
struct SymbTab stab[symbolnum];
I already tried:
extern int symbolnum;
const int size = symbolnum;
struct SymbTab stab[size];
Is there a way to achieve this without using dynamic memory allocation functions like malloc() or initializing the size of array using a very big number?
The size of global arrays must be fixed at compile time. VLAs are only allowed inside of functions.
If the value of symbolnum isn't known until runtime, you'll need to declare stab as a pointer type and allocate memory for it dynamically.
Alternatively, if the array doesn't take up more than a few tens of KB, you can define the VLA in the main function and set a global pointer to it.
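The global-pointer idea could look roughly like this. It is a sketch under assumptions: the struct fields, the 1024-element cap, and the names run_with_table and sum_ivals are all hypothetical, and run_with_table stands in for main so that the example can be called with different sizes.

```c
#include <assert.h>
#include <stddef.h>

struct SymbTab { int iVal; double dVal; };  /* hypothetical fields */

static struct SymbTab *stab;     /* points into run_with_table()'s frame */
static size_t stab_len;

/* Stand-in for code elsewhere in the program that uses the global table. */
static long sum_ivals(void)
{
    long total = 0;
    for (size_t i = 0; i < stab_len; ++i)
        total += stab[i].iVal;
    return total;
}

/* Plays the role of main(): owns the VLA for as long as it is needed. */
long run_with_table(size_t symbolnum)
{
    assert(symbolnum > 0 && symbolnum <= 1024);  /* arbitrary stack-safety cap */
    struct SymbTab table[symbolnum];
    for (size_t i = 0; i < symbolnum; ++i) {
        table[i].iVal = (int)i;
        table[i].dVal = 1.0 * (double)i;
    }
    stab = table;
    stab_len = symbolnum;
    long total = sum_ivals();
    stab = NULL;                 /* the VLA dies when this function returns */
    stab_len = 0;
    return total;
}
```

The pointer must never be used after the owning function has returned, which is why it is reset before the return.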
C11 and later permit variable-length arrays as an optional feature. C99 permitted them as a mandatory feature. However, in no case are VLAs permitted at file scope, which appears to be what you are trying to achieve.
And file-scope VLAs wouldn't make sense in light of C semantics anyway. Objects declared at file scope have static storage duration, meaning that they come into existence at or before the beginning of program execution, and live until program termination. That means the array length is needed before the variable can take anything other than its initial value (which is zero or an integer constant expression), so one might as well just use that initial value directly.
Additionally, some C implementations (notably MSVC) never implemented VLAs even when C99 was the current standard, and have no intention to do so now that the feature is optional in the current standard.
So,
Is there a way to achieve this without using dynamic memory allocation
functions like malloc() or initializing the size of array using a very
big number?
It depends a bit on your exact needs (and on your C implementation), but likely not. One possibility would be to use a local VLA in main(), or in some other function whose execution encompasses the whole need for the array. If your C implementation supports VLAs, then you could declare one there and pass around a pointer to it. But note well that if the upper bound on the number of elements you need is really "a very large number" then this is unlikely to be suitable. VLAs are typically allocated on the stack, which puts a relatively tight bound on how large they can be.
"Is there a way to achieve this without using dynamic memory allocation functions like malloc() or initializing the size of array
using a very big number?"
If you can use variable length array (VLA), then yes it can be done. The following illustrates one way...
With a struct definition in global space, (eg, top of .c file, or in .h file) local array instances of that struct can be created using a VLA, keeping in mind the stipulations mentioned in the link for using VLA. The VLA struct array can then be passed as a function parameter, to either be used in the called function, or to be updated and returned, just as any other function parameter is used. Here is a simple example:
//define either at top of .c file in file global space
//or in a header file that is included in any .c. Then
//the typedef SymbTab_s can be used to create instances where needed
//
#include <stdio.h>
#include <string.h>
typedef struct SymbTab{
    int iVal;
    double dVal;
} SymbTab_s;
void populate_struct(size_t symbolnum, SymbTab_s *stab);
int main(void)
{
    size_t symbolnum = 0;//(using size_t). note, as you describe, this
                         //comes after flex analysis normally, but for
                         //this demo user input is used for simple
                         // example of dynamically sizing array.
    printf("Enter symbolnum of struct array:\n");
    if (scanf("%zu", &symbolnum) != 1 || symbolnum == 0)
        return 1; //a VLA must have a size greater than zero
    SymbTab_s stab[symbolnum]; //dynamically sized array of SymbTab_s
    memset(stab, 0, sizeof(SymbTab_s) * symbolnum); //initialize new array
    populate_struct(symbolnum, stab); //pass as function argument and update values
    //demo updated values
    for(size_t i=0; i < symbolnum; i++)
    {
        printf("%d, %lf\n", stab[i].iVal, stab[i].dVal);
    }
    return 0;
}
//simple function to demo form of parameters
//note size parameter is passed first
void populate_struct(size_t symbolnum, SymbTab_s *stab)
{
    for(size_t i=0; i < symbolnum; i++)
    {
        stab[i].iVal = (int)i;
        stab[i].dVal = 1.0*i;
    }
}

Passing parameters to a function to efficiently create array allocated on the stack

I have a function that needs external parameters and afterwards creates variables that are heavily used inside that function. E.g. the code could look like this:
void abc(const int dim);
void abc(const int dim) {
double arr[dim] = { 0.0 };
for (int i = 0; i != dim; ++i)
arr[i] = i;
// heavy usage of the arr
}
int main() {
const int par = 5;
abc(par);
return 0;
}
But I am getting a compiler error, because the allocation on the stack needs compile-time constants. When I tried allocating manually on the stack with _malloca, the time performance of the code worsened (compared to the case when I declare the constant par inside the abc() function). And I don't want the array arr to be on the heap, because it is supposed to contain only small amount of values and it is going to get used quite often inside the function. Is there some way to combine the efficiency while keeping the possibility to pass the size parameter of an array to the function?
EDIT: I am using MSVC compiler and I received an error C2131: expression did not evaluate to a constant in VC 2017.
If you're using a modern C compiler that implements all of C99, or C11 with the optional variable-length array support, this would work with one small modification:
void abc(const int dim);
void abc(const int dim) {
double arr[dim];
for (int i = 0; i != dim; ++i)
arr[i] = i;
// heavy usage of the arr
}
int main(void) {
const int par = 5;
abc(par);
return 0;
}
I.e. double arr[dim] would work - it doesn't have a compile-time constant size, but it is enough to know its size at runtime. However, such a VLA cannot be initialized.
Unfortunately MSVC is not a modern C compiler; at MS they don't want to implement VLAs themselves, and I even suspect they're a big part of why VLAs were made optional in C11. So you'd need to define the array in main and then pass a pointer to it to the function abc; or, if the size is globally constant, use an actual compile-time constant, i.e. a #define.
However, you're not showing the actual code that you're having performance problems with. It might very well be that the compiler can produce optimized output if it knows the number of iterations - if that is true, then the "globally defined size" might be the only way to get excellent performance.
Unfortunately the Microsoft Compiler does not support variable length arrays.
If the array is not too large you could allocate by the largest possible size needed and pass a pointer to that stack array and a dimension to the function. This approach could help limit the number of allocations.
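A sketch of the largest-possible-size approach, which needs no VLA support at all. MAX_DIM, its value of 64, and the wrapper call_abc are assumptions made for illustration; the body of abc is a placeholder for the real work:

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_DIM = 64 };   /* hypothetical largest size ever needed */

/* Works on the first dim elements of a caller-provided buffer. */
static double abc(double *arr, int dim)
{
    assert(arr != NULL && dim > 0);
    double sum = 0.0;
    for (int i = 0; i != dim; ++i) {
        arr[i] = i;
        sum += arr[i];       /* stands in for "heavy usage of the arr" */
    }
    return sum;
}

/* Owns the fixed-size stack buffer; returns -1.0 if dim is out of range. */
double call_abc(int dim)
{
    double buffer[MAX_DIM];
    if (dim < 1 || dim > MAX_DIM)
        return -1.0;
    return abc(buffer, dim);
}
```

The buffer size is a compile-time constant, so the declaration is accepted by compilers without VLA support, while the loop still runs over only dim elements.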
Another option is to implement a simple heap-allocated global pool for functions of this type to use. The pool would allocate a large contiguous chunk on the heap and then you can get a pointer to your reservation in the pool. The benefit of this approach is that you will not have to worry about over-allocation on the stack causing a segmentation fault (which can happen with variable length arrays).

Why can't I create an array with size determined by a global variable?

Why does the array a not get initialized by global variable size?
#include<stdio.h>
int size = 5;
int main()
{
int a[size] = {1, 2, 3, 4, 5};
printf("%d", a[0]);
return 0;
}
The compilation error is shown as
variable-sized object may not be initialized
According to me, the array should get initialized by size.
And what would be the answer if I insist on using global variable (if it is possible)?
In C99, 6.7.8/3:
The type of the entity to be
initialized shall be an array of
unknown size or an object type that is
not a variable length array type.
6.6/2:
A constant expression can be evaluated
during translation rather than runtime
6.6/6:
An integer constant expression
shall have integer type and shall only
have operands that are integer
constants, enumeration constants,
character constants, sizeof
expressions whose results are integer
constants, and floating constants that
are the immediate operands of casts.
6.7.5.2/4:
If the size is an integer constant
expression and the element type has a
known constant size, the array type is
not a variable length array type;
otherwise, the array type is a
variable length array type.
a has variable length array type, because size is not an integer constant expression. Thus, it cannot have an initializer list.
In C90, there are no VLAs, so the code is illegal for that reason.
In C++ there are also no VLAs, but you could make size a const int. That's because in C++ you can use const int variables in ICEs. In C you can't.
Presumably you didn't intend a to have variable length, so what you need is:
#define size 5
If you actually did intend a to have variable length, I suppose you could do something like this:
int a[size];
int initlen = size;
if (initlen > 5) initlen = 5;
memcpy(a, (int[]){1,2,3,4,5}, initlen*sizeof(int));
Or maybe:
int a[size];
for (int i = 0; i < size && i < 5; ++i) {
a[i] = i+1;
}
It's difficult to say, though, what "should" happen here in the case where size != 5. It doesn't really make sense to specify a fixed-size initial value for a variable-length array.
You don't need to tell the compiler what size the array is if you're giving an initializer. The compiler will figure out the size based on how many elements you're initializing it with.
int a[] = {1,2,3,4,5};
Then you can even let the compiler tell you the size by getting the total size of the array in bytes sizeof(a) and dividing it by the size of one element sizeof(a[0]):
int size = sizeof(a) / sizeof(a[0]);
The compiler cannot assume that the value of size is still 5 by the time main() gets control. If you want a true constant in an old-style C project, use:
#define size 5
size is a variable, and C does not allow you to declare (edit: C99 allows you to declare them, just not initialize them like you are doing) arrays with variable size like that. If you want to create an array whose size is a variable, use malloc or make the size a constant.
It looks like your compiler is not C99 compliant. Speaking of which, which compiler are you using? If it's gcc, you need to pass the switch '-std=c99'. If you are using a pre-C99 compiler, that statement is illegal; in that case, do this:
int main() {
int a[5]={1,2,3,4,5};
printf("%d",a[0]);
return 0;
}
In pre-C99 standard compilers, use a constant instead of a variable.
Edit: You can find out more about the C99 standard here... and here....
The compiler needs to know the size of the array while declaring it.
Because the size of an array doesn't change after its declaration.
If you put the size of the array in a variable, you can imagine that the value of that variable will change when the program is executed.
In this case, the compiler will be forced to allocate extra memory to this array.
In this case, this is not possible because the array is a static data structure allocated on the stack.
I hope that this will help.
#include<stdio.h>
/* int size=5; */
#define size 5 /* use this instead*/
/*OR*/
int a[size]={1,2,3,4,5}; /* this*/
int main()
{
int a[size]={1,2,3,4,5};
printf("%d",a[0]);
return 0;
}
int size means that size is a variable, and C does not allow variable-size arrays.
I am using VS2008 where using
const int size=5;
allows
int a[size]={1,2,3,4,5};
You cannot create arrays with a globally variable size for the same reason you cannot create an array with a size determined by a variable in general. The reason is that C++ enables manual memory management, which, let's be honest, is the reason why we learn this language, and when we allocate memory, we need to keep in mind the advantages and drawbacks of its two types and what we can do with them.
Stack memory literally implements the stack data structure. It has a fixed size (a few megabytes as far as I know), and when you put any data on the stack, you push it to the top. The stack fits perfectly for scoped variables that are supposed to live only for a restricted amount of time, which means they disappear as soon as execution reaches the closing }. Stacks use functions as the unit for grouping variables, and whenever we call any function (including main), it pushes the total amount of memory this function uses, which is the sum of all variables it allocates. This total size is called the stack frame, and it must be constant. Otherwise, if we allocate an array dynamically on the stack:
int size;
scanf("%d", &size);
int array[size];
We don't know how much space we need to reserve for the array and, correspondingly, for its stack frame, which makes this a forbidden operation. You can, however, size it with a constant or a constant expression:
constexpr int getSize(int n){
return n * 2; //This can be anything, I just multiplied to get arbitrary value.
}
const int size = 45;
int constantArray[size]; //Works
int constantExpressionArray[getSize(2)]; //Also works
The reason why constants work is that they are always guaranteed to have the same, known value; furthermore, they need not take up space in memory, because compilers can simply substitute the constant's value wherever it is used. Constant expressions are also functions that are executed at compile time, which means all values must be known, and the compiler then just substitutes the result (which must always be the same) into the call. This assures the compiler that in both cases the sizes of the arrays are always going to be the same (45 for the first one and 4 for the second one).
If you want to have an array with a size that varies each time you execute it, you should either allocate it dynamically:
int* dynamicArray = new int[variableSize];
or use std::vector:
std::vector<int> dynamicArray(variableSize);
In the latter case the vector is created with variableSize value-initialized elements (its capacity is at least that large), and it reallocates only when it grows beyond its capacity. If you use it responsibly, you will end up simply using the array underneath it and will only suffer from the performance penalties imposed by dynamically allocating memory and jumping between the functions.

Is there a standard function in C that would return the length of an array?

Is there a standard function in C that would return the length of an array?
Often the technique described in other answers is encapsulated in a macro to make it easier on the eyes. Something like:
#define COUNT_OF( arr) (sizeof(arr)/sizeof(0[arr]))
Note that the macro above uses a small trick of putting the array name in the index operator ('[]') instead of the 0 - this is done in case the macro is mistakenly used in C++ code with an item that overloads operator[](). The compiler will complain instead of giving a bad result.
However, also note that if you happen to pass a pointer instead of an array, the macro will silently give a bad result - this is one of the major problems with using this technique.
I have recently started to use a more complex version that I stole from Google Chromium's codebase:
#define COUNT_OF(x) ((sizeof(x)/sizeof(0[x])) / ((size_t)(!(sizeof(x) % sizeof(0[x])))))
In this version if a pointer is mistakenly passed as the argument, the compiler will complain in some cases - specifically if the pointer's size isn't evenly divisible by the size of the object the pointer points to. In that situation a divide-by-zero will cause the compiler to error out. Actually at least one compiler I've used gives a warning instead of an error - I'm not sure what it generates for the expression that has a divide by zero in it.
That macro doesn't close the door on using it erroneously, but it comes as close as I've ever seen in straight C.
If you want an even safer solution for when you're working in C++, take a look at Compile time sizeof_array without using a macro which describes a rather complex template-based method Microsoft uses in winnt.h.
No, there is not.
For constant size arrays you can use the common trick Andrew mentioned, sizeof(array) / sizeof(array[0]) - but this works only in the scope the array was declared in.
sizeof(array) gives you the size of the whole array, while sizeof(array[0]) gives you the size of the first element.
See Michael's answer on how to wrap that in a macro.
For dynamically allocated arrays you either keep track of the size in an integral type or make it 0-terminated if possible (i.e. allocate 1 more element and set the last element to 0).
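The 0-terminated variant could look roughly like the sketch below. The helper names zalloc and zlen are made up for illustration, and the approach assumes that 0 never occurs as a real value in the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocates room for n values plus one terminator.
   calloc zero-fills the block, so the terminator is already in place
   and the array initially reads as empty. The caller must free() it. */
int *zalloc(size_t n)
{
    return calloc(n + 1, sizeof(int));
}

/* Hypothetical helper: counts the elements before the terminating 0.
   Only valid if 0 is never stored as a real value. */
size_t zlen(const int *a)
{
    size_t n = 0;
    assert(a != NULL);
    while (a[n] != 0)
        n++;
    return n;
}
```

After int *a = zalloc(3); and storing three non-zero values in a[0] to a[2], zlen(a) returns 3. Note that finding the length costs a scan of the array, which is why keeping the size in a separate variable is usually preferred.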
sizeof array / sizeof array[0]
The number of elements in an array x can be obtained by:
sizeof(x)/sizeof(x[0])
You need to be aware that arrays, when passed to functions, decay into pointers, which do not carry the size information. In reality, for ordinary fixed-size arrays the size information is never available at run time, since sizeof is calculated at compile time, but you can act as if it is available where the array is visible (i.e., where it hasn't decayed).
When I pass arrays to a function that I need to treat as arrays, I always ensure two arguments are passed:
the length of the array; and
the pointer to the array.
So, whilst the array can be treated as an array where it's declared, it's treated as a size and pointer everywhere else.
I tend to have code like:
#define countof(x) (sizeof(x)/sizeof(x[0]))
: : :
int numbers[10];
a = fn (countof(numbers),numbers);
then fn() will have the size information available to it.
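As a slightly fuller sketch of the same idea, here is what the receiving side might look like. The function name sum_ints is invented for this example; the original only calls it fn().

```c
#include <assert.h>
#include <stddef.h>

#define countof(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical stand-in for fn(): it gets the element count and a
   pointer, so it never needs to apply sizeof to the decayed array. */
long sum_ints(size_t len, const int *data)
{
    long total = 0;
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        total += data[i];
    return total;
}
```

A caller that can still see the array writes sum_ints(countof(numbers), numbers); the count is computed where the array type is known and travels with the pointer from there on.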
Another trick I've used in the past (a bit messier in my opinion but I'll give it here for completeness) is to have an array of a union and make the first element the length, something like:
typedef union {
int len;
float number;
} tNumber;
tNumber number[10];
: : :
number[0].len = 5;
a = fn (number);
then fn() can access the length and all the elements and you don't have to worry about the array/pointer dichotomy.
This has the added advantage of allowing the length to vary (i.e., the number of elements in use, not the number of units allocated). But I tend not to use this anymore since I consider the two-argument array version (size and data) better.
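For illustration, a function consuming such a length-prefixed array might be sketched as follows. The name sum_numbers is an assumption made for this example, as is the convention that the values live in elements 1 through len.

```c
#include <assert.h>
#include <stddef.h>

typedef union {
    int len;       /* meaningful only in element 0 */
    float number;  /* meaningful in elements 1..len */
} tNumber;

/* Hypothetical consumer: reads the count from element 0, then sums
   the values stored in elements 1..len. */
float sum_numbers(const tNumber *number)
{
    float total = 0.0f;
    assert(number != NULL && number[0].len >= 0);
    for (int i = 1; i <= number[0].len; i++)
        total += number[i].number;
    return total;
}
```

Keep in mind that element 0 is used up by the length, so tNumber number[10] has room for at most nine values.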
I created a macro that returns the size of an array, but yields a compiler error if used on a pointer. Do however note that it relies on gcc extensions. Because of this, it's not a portable solution.
#define COUNT(a) (__builtin_choose_expr( \
__builtin_types_compatible_p(typeof(a), typeof(&(a)[0])), \
(void)0, \
(sizeof(a)/sizeof((a)[0]))))
int main(void)
{
int arr[5];
int *p;
int x = COUNT(arr);
// int y = COUNT(p);
}
If you remove the comment, this will yield: error: void value not ignored as it ought to be
The simple answer, of course, is no. But the practical answer is "I need to know anyway," so let's discuss methods for working around this.
One way to get away with it for a while, as mentioned about a million times already, is with sizeof():
int i[] = {0, 1, 2};
...
size_t i_len = sizeof(i) / sizeof(i[0]);
This works, until we try to pass i to a function, or take a pointer to i. So what about more general solutions?
The accepted general solution is to pass the array length to a function along with the array. We see this a lot in the standard library:
void *memcpy(void *restrict s1, const void *restrict s2, size_t n);
This will copy n bytes from s2 to s1, allowing us to use n to ensure that our buffers never overflow. This is a good strategy: it has low overhead, and it actually generates some efficient code (compare to strcpy(), which has to check for the end of the string and has no way of "knowing" how many iterations it must make, and poor confused strncpy(), which has to check both; both can be slower, and either could be sped up by using memcpy() if you happen to have already calculated the string's length for some reason).
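A minimal sketch of how a known length lets memcpy() do the work safely. The helper copy_bounded and its parameter names are hypothetical; it assumes the source length has already been computed and that the two buffers do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies at most dst_size - 1 bytes of src into
   dst and always null-terminates. src_len is the already-known length
   of src, so no scan for the terminator is needed.
   Returns the number of bytes copied (excluding the terminator). */
size_t copy_bounded(char *dst, size_t dst_size,
                    const char *src, size_t src_len)
{
    size_t n;
    assert(dst != NULL && src != NULL && dst_size > 0);
    n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    memcpy(dst, src, n);  /* buffers must not overlap */
    dst[n] = '\0';
    return n;
}
```

With char buf[4], calling copy_bounded(buf, sizeof buf, "hello", 5) copies three bytes and leaves "hel" in buf, because the destination size caps the copy.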
Another approach is to encapsulate your code in a struct. The common hack is this:
typedef struct _arr {
size_t len;
int arr[0];
} arr;
If we want an array of length 5, we do this:
arr *a = malloc(sizeof(*a) + sizeof(int) * 5);
a->len = 5;
However, this is a hack that is only moderately well-defined (a zero-length array is a compiler extension; C99 lets you write int arr[] instead, which is a standard flexible array member) and is rather labor-intensive. A "better-defined" way to do this is:
typedef struct _arr {
size_t len;
int *arr;
} arr;
But then our allocations (and deallocations) become much more complicated. The benefit of either of these approaches is, of course, that the arrays you make will carry their lengths around with them. It's slightly less memory-efficient, but it's quite safe. If you choose one of these paths, be sure to write helper functions so that you don't have to manually allocate and deallocate (and work with) these structures.
If you have an object a of array type, the number of elements in the array can be expressed as sizeof a / sizeof *a. If you allowed your array object to decay to pointer type (or had only a pointer object to begin with), then in the general case there's no way to determine the number of elements in the array.
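The helper functions suggested above might be sketched like this, using the C99 flexible array member form. All names here (int_array, int_array_new, int_array_get, int_array_free) are invented for the example and are not from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct int_array {
    size_t len;
    int data[];  /* C99 flexible array member */
} int_array;

/* Allocates header and elements in one block, zero-filled.
   Returns NULL on overflow of the size computation or on
   allocation failure. */
int_array *int_array_new(size_t len)
{
    int_array *a;
    if (len > (SIZE_MAX - sizeof *a) / sizeof(int))
        return NULL;
    a = calloc(1, sizeof *a + len * sizeof(int));
    if (a != NULL)
        a->len = len;
    return a;
}

/* Bounds-checked read; the length travels with the array. */
int int_array_get(const int_array *a, size_t i)
{
    assert(a != NULL && i < a->len);
    return a->data[i];
}

/* One free() releases both the header and the elements. */
void int_array_free(int_array *a)
{
    free(a);
}
```

Because the length and the elements share one allocation, creation and destruction stay a single call each, which is the main convenience of this layout over the pointer-member version.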

Resources