Is using array arguments in C considered bad practice? - c

When declaring a function that accesses several consecutive values in memory, I usually use array arguments like
f(int a[4]);
It works fine for my purposes. However, I recently read the opinion of Linus Torvalds.
So I wonder if the array arguments are today considered obsolete? More particularly,
is there any case where the compiler can utilize this information (array size) to check out-of-bound access, or
is there any case where this technique brings some optimization opportunities?
In any case, what about pointers to arrays?
void f(int (*a)[4]);
Note that this form is not prone to "sizeof" mistakes. But what about efficiency in this case? I know that GCC generates the same code (link). Is that always so? And what about further optimization opportunities in this case?

If you write
void f(int a[4]);
that has exactly the same meaning to the compiler as if you wrote
void f(int *a);
This is why Linus has the opinion that he does. The [4] looks like it defines the expected size of the array, but it doesn't. Mismatches between what the code looks like it means and what it actually means are very bad when you're trying to maintain a large and complicated program.
(In general I advise people not to assume that Linus' opinions are correct. In this case I agree with him, but I wouldn't have put it so angrily.)
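A small sketch of that equivalence, for anyone who wants to check it with their own compiler. The function names fill and fill_takes_pointer are made up for illustration, and the _Generic check assumes a C11 compiler:

```c
#include <stddef.h>

/* All three prototypes declare the same function; the compiler accepts
   them side by side because the array forms are adjusted to int *. */
void fill(int a[4], size_t n, int value);
void fill(int a[], size_t n, int value);
void fill(int *a, size_t n, int value);

void fill(int *a, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        a[i] = value;
}

/* Returns 1 when the first parameter of fill really has type int *. */
int fill_takes_pointer(void)
{
    return _Generic(&fill,
                    void (*)(int *, size_t, int): 1,
                    default: 0);
}
```

Note that fill can be called with an array of any length, not just 4; the [4] in the first prototype constrains nothing.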
Since C99, there is a variation that does mean what it looks like it means:
void f(int a[static 4]);
That is, all callers of f are required to supply a pointer to an array of at least four ints; if they don't, the program has undefined behavior. This can help the optimizer, at least in principle (e.g. maybe it means the loop over a[i] inside f can be vectorized).
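As a minimal illustration of the static form (sum4 is a hypothetical name, not something from the question):

```c
/* The caller promises at least 4 valid ints behind a; passing fewer
   (or a null pointer) is undefined behavior. */
int sum4(const int a[static 4])
{
    return a[0] + a[1] + a[2] + a[3];
}
```

Some compilers can diagnose an obviously too-small array or a literal null pointer at the call site, but that is a quality-of-implementation matter, not something the standard requires.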
Your alternative construct
void f(int (*a)[4]);
gives the parameter a a different type ('pointer to array of 4 int' rather than 'pointer to int'). The array-notation equivalent of this type is
void f(int a[][4]);
Written that way, it should be immediately clear that that declaration is appropriate when the argument to f is a two-dimensional array whose inner size is 4, but not otherwise.
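To make the type difference concrete, here is a rough sketch with invented names (row_sum, grid_sum). The same parameter type serves both a single 4-element array passed by address and a row of a two-dimensional array:

```c
#include <stddef.h>

/* row points to one whole array of 4 ints. */
int row_sum(int (*row)[4])
{
    int total = 0;
    for (size_t i = 0; i < 4; i++)
        total += (*row)[i];
    return total;
}

/* grid has the same type, int (*)[4], written in array notation. */
int grid_sum(int grid[][4], size_t rows)
{
    int total = 0;
    for (size_t r = 0; r < rows; r++)
        total += row_sum(&grid[r]);
    return total;
}
```

A caller with int one[4] has to write row_sum(&one), while a caller with int g[2][4] can pass g directly to grid_sum.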
sizeof issues are another can of worms; my recommendation is to avoid needing to use sizeof on function arguments at almost any cost. Do not contort the parameter list of a function to make sizeof come out "right" inside the function; that makes it harder to call the function correctly, and you probably call the function a lot more times than you implement it.

Unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element in the array.
When you pass an array expression as an argument to a function:
int arr[100];
...
foo( arr );
what the function actually receives is a pointer to the first element of the array, not a copy of the array. The behavior is exactly the same as if you had written
foo( &arr[0] );
There's a rule that function parameters of type T a[N] or T a[] are "adjusted" to T *a, so if your function declaration is
void foo( int a[100] )
it will be interpreted as though you wrote
void foo( int *a )
There are a couple of significant consequences of this:
Arrays are implicitly passed "by reference" to functions, so changes to the array contents in the function are reflected in the caller (unlike literally every other type);
You can't use sizeof to determine how many elements are in the passed array because there's no way to get that information from a pointer. If your function needs to know the physical size of the array in order to use it properly, then you must pass that length as a separate parameter.
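One common way to act on both points is to compute the element count in the caller, where the array type is still visible, and pass it along. The macro ARRAY_LEN and the function count_nonzero below are illustrative names only:

```c
#include <stddef.h>

/* Only valid on a real array, never on a pointer parameter. */
#define ARRAY_LEN(x) (sizeof (x) / sizeof (x)[0])

/* The function receives a pointer, so the length comes separately. */
size_t count_nonzero(const int *a, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] != 0)
            count++;
    return count;
}
```

A call then looks like count_nonzero(arr, ARRAY_LEN(arr)) at the point where arr is declared as an array.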
In my own code, I do not use array-style declarations in function parameter lists - what the function receives is a pointer, so I use pointer-style declarations. I can see the argument for using array-style declarations, mostly as a matter of documentation (this function is expecting an array of this size), but I think it's valuable to reinforce the pointer-ness of the parameter.
Note that you have the same problem with pointers to arrays - if I call
foo( &arr );
then the prototype for foo needs to be
void foo( int (*a)[100] );
But that's also the same prototype as if I had called it as
int bar[10][100];
foo( bar );
Just like you cannot know whether an int * parameter points to a single int or to the first in a sequence of ints, you can't know whether a pointer-to-array parameter points to a single 100-element array, or to the first in a sequence of 100-element arrays.
The need to pass the length separately is also why the gets function was deprecated after C99 and removed from the standard library in C2011 - there's no way to tell it the size of the target buffer, so it will happily write input past the end of the array and clobber whatever follows. That's why it was such a popular malware exploit.

Related

Why can arrays be assigned directly?

Consider this code snippet:
void foo(int a[], int b[]){
static_assert(sizeof(a) == sizeof(int*));
static_assert(sizeof(b) == sizeof(int*));
b = a;
printf("%d", b[1]);
assert(a == b); // This also works!
}
int a[3] = {[1] = 2}, b[1];
foo(a, b);
Output (no compilation error):
2
I can't get the point why b = a is valid. Even though arrays may decay to pointers, shouldn't they decay to const pointers (T * const)?
They can't.
Arrays cannot be assigned to. There are no arrays in the foo function. The syntax int a[] in a function parameter list means to declare that a has type "pointer to int". The behaviour is exactly the same as if the code were void foo(int *a, int *b). (C11 6.7.6.3/7)
It is valid to assign one pointer to another. The result is that both pointers point to the same location.
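A stripped-down version of the question's code shows what the assignment actually does. The name second_after_rebind is made up for this sketch:

```c
/* a and b are ordinary int * locals here, so b = a only rebinds the
   local pointer; the caller's arrays are not copied or altered. */
int second_after_rebind(int a[], int b[])
{
    b = a;
    return b[1];
}
```

After the call, the caller's second array still holds its original contents, because nothing was ever written through b.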
Even though arrays may decay to pointers, shouldn't they decay to const pointers (T * const)?
The pointer that results from array "decay" is an rvalue. The const qualifier is only meaningful for lvalues (C11 6.7.3/4). (The term "decay" refers to conversion of the argument, not the adjustment of the parameter).
Quoting C11, chapter §6.7.6.3, Function declarators (including prototypes)
A declaration of a parameter as ‘‘array of type’’ shall be adjusted to ‘‘qualified pointer to
type’’, where the type qualifiers (if any) are those specified within the [ and ] of the
array type derivation. [...]
So, a and b are actually pointers, not arrays.
There's no assignment to any array type happening here, hence there's no problem with the code.
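The "type qualifiers within the [ and ]" part of that quote is what lets you ask for a const pointer explicitly. A small sketch, assuming a C99 or later compiler (first_plus_last is a hypothetical name):

```c
#include <stddef.h>

/* int a[const] is adjusted to int * const a: the pointer itself cannot
   be reassigned, but the elements it points to can still be modified. */
int first_plus_last(int a[const], size_t n)
{
    /* a = NULL;   would be rejected: a is const-qualified */
    a[0] += 1;
    return a[0] + a[n - 1];
}
```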
Yes, it would have made sense for array parameters declared with [] to be adjusted to const-qualified pointers. However, const did not exist when this behavior was established.
When the C language was being developed, it made sense to pass an array by passing its address, or, more specifically, the address of the first element. You certainly did not want to copy the entire array to pass it. Passing the address was an easy way to make the array known to the called function. (The semantics for the reference types we see in C++ had not been invented yet.) To make that easy for programmers, so that they could write foo(ArrayA, ArrayB) instead of foo(&ArrayA[0], &ArrayB[0]), the mechanism of converting an array to a pointer to its first element was invented. (Per M.M. and The Development of the C Language by Dennis M. Ritchie, this notation for parameters already existed in C’s predecessor language, B.)
That is fine, you have hidden the conversion. But that is only where the function is called. In the called routine, the programmer who is thinking about passing an array is going to write void foo(int ArrayA[], int ArrayB[]). But since we are actually passing pointers, not arrays, these need to be changed to int *ArrayA and int *ArrayB. So the notion that parameters declared as arrays are automatically adjusted to pointers was created.
As you observe, this leaves the programmer able to assign values to the parameters, which changes the apparent base address of the array. It would have made sense for a parameter declared as int ArrayA[] to be adjusted to int * const ArrayA, so that the value of the parameter ArrayA could not be changed. Then it would act more like an array, whose address also cannot be changed, so this better fits the goal of pretending to pass arrays even though we are passing addresses.
However, at the time, const did not exist, so this was not possible, and nobody thought of inventing const at that time (or at least did not work on it enough to get it adopted into the language).
Now there is a large amount of source code in the world that works with the non-const adjustment. Changing the specification of the C language now would cause problems with the existing code.

How many ways are there to pass char array to function in C?

foo(char *s)
foo(char *s[ ])
foo(char s[ ])
What is the difference in all these ?
Is there any way in which I will be able to modify the elements of the array which is passed as argument, just as we pass int or float using & and value of actual arguments gets modified?
It is not possible in C to pass an array by value. Each working solution that you listed (which unfortunately excludes #2) will not only let you, but force you to modify the original array.
Because of argument decay, foo(char* s) and foo(char s[]) are exactly equivalent to one another. In both cases, you pass the array with its name:
char array[4];
foo(array); // regardless of whether foo accepts a char* or a char[]
The array, in both cases, is converted into a pointer to its first element.
The pointer-to-array solution is less common. It needs to be prototyped this way (notice the parentheses around *s):
void foo(char (*s)[]);
Without the parentheses, you're asking for an array of char pointers.
In this case, to invoke the function, you need to pass the address of the array:
foo(&array);
You also need to dereference the pointer from foo each time you want to access an array element:
void foo(char (*s)[])
{
char c = (*s)[3];
}
Just like that, it's not especially convenient. However, it is the only form that allows you to specify an array length, which you may find useful. It's one of my personal favourites.
void foo(char (*s)[4]);
The compiler will then warn you if the array you try to pass does not have exactly 4 characters. Additionally, sizeof still works as expected. (The obvious downside is that the array must have the exact number of elements.)
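For illustration, a minimal sketch of that pointer-to-array form in use (buffer_size and fill_x are invented names):

```c
#include <stddef.h>

/* sizeof *s is the size of the whole 4-char array, not of a pointer. */
size_t buffer_size(char (*s)[4])
{
    return sizeof *s;
}

/* Every element access goes through (*s). */
void fill_x(char (*s)[4])
{
    for (size_t i = 0; i < sizeof *s; i++)
        (*s)[i] = 'x';
}
```

With char array[4]; the calls are buffer_size(&array) and fill_x(&array); passing the address of a char[5] should draw an incompatible-pointer diagnostic.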
Scenario: You're calling the function with an array as argument.
In that case,
foo(char *s)
and
foo(char s[])
are considered equivalent. They both expect to be called with a char array.
OTOH, foo(char *s[ ]), is different, as it takes the address of a char * array.
foo(char *s) is pretty straight-forward, it is just a pointer, which could potentially be a pointer to the first element of an array.
foo(char s[]) is a declaration of an array of char, but because of a (stupid) rule in the C standard, this gets translated to a pointer to char. Similarly, if you declare any array size, you still get a pointer to char.
Mentioned rule can be found in C11 6.7.6.3 Function declarators:
A declaration of a parameter as ‘‘array of type’’ shall be adjusted to
‘‘qualified pointer to type’’
It would perhaps have been much better if this strange, confusing "would-be-array" syntax was banned entirely, as it is 100% superfluous and fills no other purpose but to confuse beginners who try to join the "C club". But then C was never a rational language...
Some will say that the rationale is that you should never be allowed to pass arrays by value in C, for performance reasons. Well, tough luck, because you still can, despite this rule.
You can "hack the C standard" and pass an array by value:
typedef struct
{
char array[10];
} by_val_t;
foo (by_val_t arr); // bad practice but possible
Now there's really never a reason why you would want to pass an array by value, this is very bad program design and doesn't make any sense. I merely included this example here to prove that the rule in 6.7.6.3 completely lacks any rationale.
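Purely to demonstrate that the struct wrapper really does copy the array, here is a small sketch. The by_val_t type mirrors the one above (repeated so the snippet stands alone); modify_copy is a made-up function:

```c
typedef struct
{
    char array[10];
} by_val_t;

/* arr is a full copy of the caller's struct, array included, so the
   write below never reaches the caller's data. */
char modify_copy(by_val_t arr)
{
    arr.array[0] = 'Z';
    return arr.array[0];
}
```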
You could pass an array pointer as a parameter to a function. This is where things turn advanced, as this is quite obscure and has very limited practical use:
foo(char(*s)[])
If you specify an array size here, you actually get a bit of added type safety because compilers tend to warn about implicit conversions between array pointers of different size.
But note that array pointers are mainly just there to make the C language more consistent. There's very few reasons why you would ever want to pass an array pointer to a function (here is one somewhat complex example where I apparently found a valid use for it).
foo(char *s), foo(char *s[ ]), foo(char s[ ]) , what is the difference in all these ?
Of these, foo(char *s) and foo(char s[ ]) both expect a char array as the argument.
The exception is foo(char *s[ ]), which expects an array of pointers to char as the argument.
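A short sketch of the third form in use, with a hypothetical function total_length that walks an array of strings much like argv:

```c
#include <stddef.h>
#include <string.h>

/* char *s[] is adjusted to char **s: a pointer to the first element of
   an array whose elements are themselves char pointers. */
size_t total_length(char *s[], size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(s[i]);
    return total;
}
```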

Why is the syntax "int (*)[*]" necessary in C?

I was just looking something up in ISO/IEC 9899 when I stumbled on this:
6.7.6 Type names
[...]
Semantics
2 In several contexts, it is necessary to specify a type. This is accomplished using a type
name, which is syntactically a declaration for a function or an object of that type that
omits the identifier.
3 EXAMPLE The constructions
(a) int
(b) int *
(c) int *[3]
(d) int (*)[3]
(e) int (*)[*]
(f) int *()
(g) int (*)(void)
(h) int (*const [])(unsigned int, ...)
name respectively the types (a) int, (b) pointer to int, (c) array of three pointers to int, (d) pointer to an
array of three ints, (e) pointer to a variable length array of an unspecified number of ints, (f) function
with no parameter specification returning a pointer to int, (g) pointer to function with no parameters
returning an int, and (h) array of an unspecified number of constant pointers to functions, each with one
parameter that has type unsigned int and an unspecified number of other parameters, returning an
int.
What most confused me was:
(e) pointer to a variable length array of an unspecified number of ints
The others I can understand more or less. But what is the use of a pointer to a VLA of unspecified number of 'ints'?
And is there even a need for compilers to support the syntax of
int foo[*];
?
EDIT for clarification
This question primarily asks: "Is it even necessary for a compiler to support this?"
While this post, ANSI-C grammar - array declarations like [*] et alii, clearly improved my knowledge, there is still no answer to this: why does the compiler need to know whether the parameter of the prototype is just an address of unknown size, as with simply writing int foo[], or whether it will be of unspecified size?
So is it really necessary to support this?
And if not, why does the standard even define this semantic?
Why does the compiler need to know whether the parameter of the prototype is just an address of unknown size, as with simply writing int foo[], or whether it will be of unspecified size?
The compiler doesn't need to "know" anything, it's a tool.
The difference between int (*)[*] and int[] is about the same as between int (*)[5] and int[]. If you agree that the latter pair is not interchangeable, then the former isn't either.
In pre-C99, the way to specify an array of unknown number of T elements is T[]. This is an incomplete type, which means you cannot have an array of T[]. There is no T[][]. Inside a function declarator, T[] means the same as T*. OTOH T[*] is a variable-length array, which is different from an array of unknown number of elements. You can have an array of variable-size arrays, i.e. there is T[*][*]. The syntax you are asking about is necessary to support this variable-size-array type. Luckily you are not asking why we need different types, because the answer would be really long-winded, but here's my stab at it.
The purpose of types is two-fold. First, types are needed for object code generation (things like a++ typically generate different object code, depending on the type of a). Second, types are needed for type-checking (things like a++ may be allowed or not depending on the type of a).
The [*] types are only allowed in function declarators that are not parts of function definitions. So code generation is not relevant here. This leaves us with type checking. Indeed,
int foo(int, int (*)[*]);
int bar(int, int (*)[5]);
int main ()
{
int a;
int aa[5];
int aaa[5][5];
foo(1, &a); // incorrect, `&a` is `int*`, `int*` and `int (*)[*]` are different
bar(1, &a); // incorrect, `&a` is `int*`, `int*` and `int (*)[5]` are different
foo(5, aa); // incorrect, `aa` is `int*` (!), `int*` and `int (*)[*]` are different
bar(5, aa); // incorrect, `aa` is `int*` (!), `int*` and `int (*)[5]` are different
foo(5, &aa); // correct
bar(5, &aa); // correct
foo(5, aaa); // correct
bar(5, aaa); // correct
}
If we agree on which calls to bar are correct and which are not, we must also agree on the calls to foo.
The only remaining question is, why int foo(int m, int (*)[m]); is not enough for this purpose? It probably is, but the C language does not force the programmer to name formal parameters in function declarators where parameter names are not needed. [*] allows this small freedom in case of VLAs.
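To see the two spellings side by side, here is a rough sketch. The function total is hypothetical, and a compiler with VLA support is assumed:

```c
/* Prototype: no parameter names, so the bound is written as [*]. */
int total(int, int (*)[*]);

/* Definition: the parameters are named, so the bound can refer to n. */
int total(int n, int (*a)[n])
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += (*a)[i];
    return sum;
}
```

The prototype and the definition declare compatible types; a call such as total(5, &aa) with int aa[5] type-checks against either.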
I am going to answer your question strictly as asked:
This question primarily asks: "Is it even necessary for a compiler to support this?"
For a C99 compiler, yes: it is part of the standard so a C99-conforming compiler must support it. The question of what int foo[*]; is useful for is quite orthogonal to the question of whether it must be supported. All compilers claiming to conform to C99 that I tested supported it (but I am not sure what it is useful for, either).
For a C11 compiler, good news! Variable-Length Arrays have been made a “conditional feature”. You can implement a C11-compliant compiler without Variable-Length Arrays as long as it defines __STDC_NO_VLA__:
6.10.8.3 Conditional feature macros
…
__STDC_NO_VLA__ The integer constant 1, intended to indicate that the implementation does not support variable length arrays or variably modified types.
If I pass an array with more than one dimension to a function, and if the function parameters used to express the number of elements in a given dimension of the array come after the array parameter itself, the [*] syntax may be used. In the case of an array with more than two dimensions, and if the array parameter, again, precedes the element count parameters, this syntax must be used, as array decay only ever occurs once. After all, you can't very well use int (*)[][] or int [][][] because the standard requires that in int [A][B] and int [A][B][C][D], only A may be omitted due to the array decaying to a pointer. If you use pointer notation in the function parameter, you're allowed to use int (*)[], but this makes very little sense to me, especially since:
sizeof ptr[0] and sizeof *ptr are both illegal -- how should the compiler determine the size of an array that has an indeterminate element count? Instead you must find it at runtime using N * sizeof **ptr or sizeof(int [N]). This also means that any arithmetic operations on ptr, such as the usage of ++ptr, are illegal since they rely upon the size information, which cannot be calculated. Type casts may be used to get around this, but it is easier just to use a local variable with the proper type information. Then again, why not just use the [*] syntax and include the proper type information from the start?
sizeof ptr[0][0] is illegal, but sizeof (*ptr)[0] is not -- array indexing is still performed even when simply getting info like size, so it is like writing sizeof (*(ptr + 0))[0], which is illegal because you cannot apply arithmetic operations to an incomplete type as previously mentioned.
Someone who has never encountered this issue before might think [] can be replaced by *, yielding int ** instead of int (*)[], which is incorrect because that sub-array has not decayed. Array decay only occurs once.
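A sketch of the workaround described above, using a pointer to an array of unknown bound plus a separately passed count. The names array_bytes and sum_unknown are invented, and VLA support is assumed for the typed local:

```c
#include <stddef.h>

/* sizeof *ptr would not compile (incomplete type), but **ptr is a
   plain int, so the total size can be computed from the count. */
size_t array_bytes(int (*ptr)[], size_t n)
{
    return n * sizeof **ptr;
}

/* A local pointer with full type information restores sizeof and
   pointer arithmetic. */
int sum_unknown(int (*ptr)[], size_t n)
{
    int (*typed)[n] = ptr;
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (*typed)[i];
    return sum;
}
```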
I noted that the [*] syntax is unnecessary if the parameters used as element counts came first, which is true, but when is the last time anybody saw any of the following?
void foo (int a, int b, int c, int arr[a][b][c]);
void bar (int a, int b, int c, int arr[][b][c]);
void baz (int a, int b, int c, int (*arr)[b][c]);
So to answer your question:
if a function is able to operate upon multidimensional arrays of various lengths (or a pointer to a 1-D array),
and
the parameters denoting element count are listed after the array parameter itself,
the [*] syntax may be required. I actually encourage usage of [*] since [] comes with problems when size information is required.
In C99 it is possible to declare arrays using variable dimensions, providing the variable has a (positive integer) value at the time the declaration is made. It turns out that this carries over to the declaration of arrays in function parameters as well, which can be particularly useful for multi-dimensional arrays.
For example, in the prototype:
int arrayFunction( int nRows, int nCols, double x[ nRows ],
double y[ nRows ][ nCols ] );
the variable dimension on x is informative to the human but not necessary for the computer, since we could have declared it as x[ ]. However the nCols dimension on y is very useful, because otherwise the function would have to be written for arrays with pre-determined row sizes, and now we can write a function that will work for arrays with any row length.
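As a sketch of what a function with that parameter shape might look like (the body and the name weighted_sum are assumptions of mine, not part of the quoted prototype; VLA support is required):

```c
/* nRows and nCols come first so they are in scope when the array
   parameter types are declared. */
double weighted_sum(int nRows, int nCols, double x[nRows],
                    double y[nRows][nCols])
{
    double total = 0.0;
    for (int r = 0; r < nRows; r++)
        for (int c = 0; c < nCols; c++)
            total += x[r] * y[r][c];   /* row stride comes from nCols */
    return total;
}
```

The same function works for a 2x3 matrix and a 50x7 matrix, which is exactly what a fixed inner dimension such as y[][30] would not allow.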
For two array types to be compatible, both must have compatible element types, and if both size specifiers are present and are integer constant expressions, then both sizes must have the same value. A VLA is always compatible with another array type if they both have the same element type. If the two array types are used in a context that requires them to be compatible, it is undefined behavior if the dimension sizes are unequal at run time
It might be useful if you want to work with "jagged" arrays, when the size of the rows of that matrix, while unknown at compile-time, will be initialized at run-time and then remain the same during the whole execution time. But to make sure you stay in bounds for each row, you will have to store the actual sizes of that array somehow, separately for each row if you want it "jagged", because sizeof will not give you the row length when all you hold is a pointer to storage allocated at run time (it returns the size of the pointer; sizeof is evaluated at run time only for genuine VLA objects).

Why are C's arrays first dimension ignored by the compiler as a function parameter?

I know that in C if we were to write:
void myFunction(int x[30]) // 30 is ignored by the compiler
void myFunction(int x[][30]) // 30 is not ignored here but if I put say '40' in the first dimension
// it would be ignored.
Why is it that the first dimension is ignored by the compiler?
void myFunction(int x[30])
is equivalent to
void myFunction(int *x)
i.e, when arrays are used as parameters to function then array names are treated by compiler as pointer to first element of array. In this case the length of first dimension is of no use.
As a result, you have to pass the size of the array explicitly to the function.
In the context of a function parameter declaration, both T a[] and T a[N] are interpreted as T *a; that is, all three declare a as a pointer to T. This goes along with the fact that, unless it is the operand of the sizeof or the unary & operator, an expression of type "N-element array of T" will be converted to an expression of type "pointer to T" and its value will be the address of the first element of the array.
It's not that the dimension is being ignored, it's that it's not meaningful in this context.
Since the C function does not check whether an array reference is in bounds, and since it does not allocate any space for it, the dimension has no use there. It only calculates an offset from the pointer (start of the array) and it already knows how to do that (based on the size of int).
When you specify more than one dimension, it needs to know that dimension only so it can calculate the proper offset for an array reference.
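The offset calculation can be observed directly. In this sketch (byte_offset is a made-up helper), the distance from the start of the array to x[i][j] depends on the inner dimension 30 but never on the outer one:

```c
#include <stddef.h>

/* Distance in bytes from the start of the array to element x[i][j].
   The compiler needs the 30 to scale i; no outer bound is involved. */
size_t byte_offset(int x[][30], size_t i, size_t j)
{
    return (size_t)((char *)&x[i][j] - (char *)x);
}
```

For any valid i and j this equals (i * 30 + j) * sizeof(int).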
It is not ignored/useless according to the language. It may be ignored by the compiler.
If inside myFunction, you write:
... x[29] ...
you get a valid program.
If you write
... x[30] ...
your program has undefined behavior. The compiler may or may not check for this.
The fact that compiler can't always check everything is the price one pays for having a language as close to the machine as C is.

Can I use arrays as a function parameter in C99?

The C99 standard says the following in 6.7.5.3/7:
A declaration of a parameter as ‘‘array of type’’ shall be adjusted to ‘‘qualified pointer to
type’’, where the type qualifiers (if any) are those specified within the [ and ] of the
array type derivation.
Which I understand as:
void foo(int * arr) {} // valid
void foo(int arr[]) {} // invalid
However, gcc 4.7.3 will happily accept both function definitions, even when compiled with gcc -Wall -Werror -std=c99 -pedantic-errors. Since I am not a C expert, I am unsure if maybe I misinterpreted what the standard is saying.
I also noticed that
size_t foo(int arr[]) { return sizeof(arr); }
will always return sizeof(int *) instead of the array size, which confirms my belief that int arr[] is handled as int * and gcc is just trying to make me feel more comfortable.
Can someone shed some light on this issue? Just for reference, this question arose from this comment.
Some context:
First of all, remember that when an expression of type "N-element array of T" appears in a context where it isn't the operand of the sizeof or unary & operator, or isn't a string literal being used to initialize another array in a declaration, it will be converted to an expression of type "pointer to T" and its value will be the address of the first element in the array.
That means when you pass an array argument to a function, the function will receive a pointer value as a parameter; the array expression is converted to a pointer type before the function is called.
That's all well and good, but why is arr[] allowed as a pointer declaration? I can't say that this is the reason for sure, but I suspect it's a holdover from the B language, from which C was derived. In fact, pretty much everything hinky or unintuitive about arrays in C is a holdover from B.
B was a "typeless" language; you didn't have different types for floats, integers, text, whatever. Everything was stored as fixed-size words, or "cells", and memory was treated as a linear array of cells. When you declared an array in B, as in
auto arr[10];
the compiler would set aside 10 cells for the array, and then set aside an additional 11th cell that would store an offset to the first element of the array, and that additional cell would be bound to the variable arr. As in C, array indexing in B was computed as *(arr + i); you'd take the value stored in arr, add an offset i, and dereference the result. Ritchie retained most of these semantics, with the huge exception of no longer setting aside storage for the pointer to the first element of the array; instead, that pointer value would be computed from the array expression itself when the code was translated. This is why array expressions are converted to pointer types, why &arr and arr give the same value, if different types (the address of the array and the address of the first element of the array are the same) and why an array expression cannot be the target of an assignment (there's nothing to assign to; no storage has been set aside for a variable independent of the array elements).
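The "same value, different types" point about arr and &arr can be checked with a few lines. The function names here are invented for the sketch:

```c
#include <stddef.h>

/* arr (after conversion) and &arr refer to the same address. */
int same_address(void)
{
    int arr[10];
    return (void *)arr == (void *)&arr;
}

/* arr + 1 advances by one element. */
size_t element_step(void)
{
    int arr[10];
    return (size_t)((char *)(arr + 1) - (char *)arr);
}

/* &arr + 1 advances by the whole array, because &arr has type
   int (*)[10] rather than int *. */
size_t array_step(void)
{
    int arr[10];
    return (size_t)((char *)(&arr + 1) - (char *)&arr);
}
```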
Now here's the fun bit; in B, you'd declare a "pointer" as
auto ptr[];
This had the effect of allocating the cell to store the offset to the first element of the array and binding it to ptr, but ptr didn't point anywhere in particular; you could assign it to point to various locations. I suspect that notation was held over for a couple of reasons:
Most of the guys who worked on the initial version of C were familiar with it;
It sort of emphasizes that the parameter represents an array in the caller;
Personally, I would have preferred that Ritchie had used * to designate pointers everywhere, but he didn't (or, alternately, use [] to designate a pointer in all contexts, not just a function parameter declaration). I will normally recommend that everyone use * notation for function parameters instead of [], simply because it more accurately conveys the type of the parameter, but I can understand why people would prefer the second notation.
Both your valid and invalid declarations are internally equivalent, i.e., the compiler converts the latter to the former.
What your function sees is the pointer to the first element of the array.
PS. The alternative would be to push the whole array on the stack, which would be grossly inefficient from both time and space viewpoints.

Resources