Declare an array in your include file omitting the first dimension size:
extern float mvp[][4];
Then define the array following the previous declaration in a translation unit:
float mvp[4][4];
No problem. Until you try to get the size of that array in a file which includes the first declaration. Then you would get:
error: invalid application of 'sizeof' to an incomplete type 'float [][4]'
I understand that arrays decay into pointers to their first element when used as lvalue, and that array declarations in function prototypes are actually pointers in disguise, but that is not the case here. The first declaration does not declare a pointer; it declares an "incomplete array type", different from:
extern float (*mvp)[4];
When declaring variables, the compiler just references a "dummy" base address offset and the associated type, which the linker will resolve.
I wonder why this "incomplete array type" (which cannot be incremented like a pointer to array, but is also not fully an array since its size cannot be retrieved) would be allowed to exist?
Why not implicitly convert it to a pointer (just a base address offset) or even better, why not throw an error for omitting the size in the first dimension?
Quoting this
If expression in an array declarator is omitted, it declares an array of unknown size. Except in function parameter lists (where such arrays are transformed to pointers) and when an initializer is available, such type is an incomplete type (note that VLA of unspecified size, declared with * as the size, is a complete type)
So really, the type is incomplete and waiting to be completed later by a later declaration or tentative definition.
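A minimal sketch of that completion, squeezed into a single translation unit for illustration (in the question the two declarations live in a header and a separate .c file). The helper names row_bytes, element and total_bytes are made up for the example:

```c
#include <assert.h>
#include <stddef.h>

/* As in the header: an array of unknown size whose elements are float[4]. */
extern float mvp[][4];

/* The element type is complete, so sizeof on one row works. */
size_t row_bytes(void)
{
    return sizeof mvp[0];        /* sizeof(float[4]) */
}

/* Indexing also works: only the row size is needed to find an element. */
float element(size_t row, size_t col)
{
    return mvp[row][col];
}

/* sizeof mvp at this point would be rejected: the type is still incomplete. */

/* The definition completes the type from here to the end of the file. */
float mvp[4][4] = {
    {1, 0, 0, 0},
    {0, 1, 0, 0},
    {0, 0, 1, 0},
    {0, 0, 0, 1},
};

static_assert(sizeof mvp == 16 * sizeof(float), "type is complete now");

size_t total_bytes(void)
{
    return sizeof mvp;           /* allowed: sizeof(float[4][4]) */
}
```

A file that only includes the header never sees the completing definition, which is why sizeof mvp fails there.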
Using extern doesn't make things exist; it is just used to state that something may exist in a different translation unit. sizeof can only be applied to complete types. This has nothing to do with array-to-pointer decay. extern float (*mvp)[4] has a complete type: it is a pointer to an array of 4 floats. extern float mvp[][4] is incomplete: it is a 2D array of floats in which one of the dimensions is unspecified. These are two very different things. In either case mvp can be used as an array, with the correct syntax, but you can only use sizeof when the compiler can actually determine the size.
Also, float mvp[][4] is an array; it's just that its size is unknown. What makes it an array is that its memory is laid out like an array.
It is possible to declare all dimensions of the extern array:
extern float mvp[4][4];
It is just an option to leave the external declaration incomplete and let the definition worry about the dimension. It is useful exactly because the size is not part of its external interface! Should the outermost size change from one compilation to another, a translation unit that merely uses the object need not be recompiled.
For this to work, there should probably be a sentinel value that ends the array / a variable that would tell how many elements there are, otherwise it is not very useful.
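One way to do the "variable that tells how many elements there are" variant is sketched below. The names primes, primes_count and sum_primes are hypothetical, and the header part and the defining part are shown in one file only to keep the example self-contained:

```c
#include <assert.h>
#include <stddef.h>

/* What a header might expose: the array without its size, plus a count. */
extern const int primes[];          /* hypothetical name */
extern const size_t primes_count;   /* hypothetical name */

/* A client that sees only the two declarations above. */
int sum_primes(void)
{
    int total = 0;
    for (size_t i = 0; i < primes_count; i++)
        total += primes[i];
    return total;
}

/* The defining file: the initializer fixes the size, the count follows it. */
const int primes[] = {2, 3, 5, 7, 11};
const size_t primes_count = sizeof primes / sizeof primes[0];

static_assert(sizeof primes / sizeof primes[0] == 5,
              "five initializers give five elements");
```

Adding an element to the initializer changes primes_count in the defining file only; the client code above does not depend on the size.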
Why not implicitly convert it to a pointer (just a base address offset) or even better, why not throw an error for omitting the size in the first dimension?
It cannot be converted to a pointer because the declaration is not a definition. It just tells that such an object does exist. The definition of that object exists independent of the external declaration. The actual object that is being declared here is an array, not a pointer.
It is just that in case of arrays the external declaration can declare the outermost dimension or can omit it.
As for the claim that
arrays decay into pointers to their first element when used as lvalue
that is quite wrong. An array expression is an lvalue, and when it decays it is no longer an lvalue - the only case where it stays as an lvalue is as the operand of &.
Related
I have heard many people saying that when we want to pass a 1D array to a function fun the following prototypes are equivalent:
1. int fun(int a[]);
2. int fun(int a[10]);
3. int fun(int *a);
I have even heard people say that the 1st and 2nd one are internally converted to the 3rd one in C. I guess that this is true because doing something like sizeof(a) in the definition of the function declared in 2 gives the size of a pointer in bytes (and not 10*sizeof(int)).
Now that being said, I have seen texts claiming that to pass a 2D array to a function the following are equivalent:
1. int fun(int a[][10]);
2. int fun(int (*a)[10]);
And here again I have heard people say that in C the 1st one is internally converted to the second one. If that is true, the following should have been equivalent right?
1. int fun(int a[][]);
2. int fun(int (*a)[]);
Unfortunately, the first one produces a compilation error, but the second one does not:
1 | int fun(int a[][]);
| ^
t.c:2:13: note: declaration of ‘a’ as multidimensional array must have bounds for all dimensions except the first
This makes me feel that C is treating a in the first as a multidimensional array, each of whose elements is an integer array of incomplete type (namely int[]).
While in the second one, a is just a pointer to an array of integers (with size not specified or incomplete type). And the two are indeed different and one format is not equivalent to the other...
Can anyone explain in detail what actually happens in C in each of these cases?
First, the rules for declarations say that the element type of an array must be complete, per C 2018 6.7.6.2 1. So int a[][] gets a compiler error since the first [] specifies an array whose elements would be int [], which is incomplete.
After the declaration is analyzed, a declaration of a function parameter to be an array is adjusted to be a pointer, per C 2018 6.7.6.3 7.
int (*a)[] is allowed with no error because there is no rule that a pointer must point to a complete type. (If arithmetic is performed on the pointer, then the pointed-to type must be complete, per C 2018 6.5.6 2 and 3.)
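As an illustration of what such a pointer can and cannot do, here is a small sketch (sum_through and demo_sum are names invented for the example). The length has to travel separately because it is not part of the pointed-to type:

```c
#include <assert.h>
#include <stddef.h>

/* a points to an array of int whose length is not part of the type. */
int sum_through(int (*a)[], size_t n)
{
    assert(a != NULL);
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += (*a)[i];   /* dereference first, then index: no size needed */
    /* a + 1 would be rejected: the size of the pointed-to type is unknown. */
    return total;
}

int demo_sum(void)
{
    int v[3] = {1, 2, 3};
    return sum_through(&v, 3);   /* int (*)[3] is compatible with int (*)[] */
}
```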
The relevant rule is found in C17 6.7.6.3/7:
A declaration of a parameter as ‘‘array of type’’ shall be adjusted to ‘‘qualified pointer to type’’, where the type qualifiers (if any) are those specified within the [ and ] of the array type derivation
This is sometimes informally referred to as "array decay" and is similar to the rule where an array identifier is used in an expression.
I have even heard people say that the 1st and 2nd one are internally converted to the 3rd one in C
Yes that is correct, as per the above quoted rule. All 3 declarations in your first example are 100% equivalent.
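A quick way to convince yourself, sketched with C11 _Generic (param_is_pointer and local_is_array are hypothetical helper names). The three prototypes can be repeated in one file without a conflict only because they declare the same type:

```c
#include <assert.h>
#include <stddef.h>

/* Three spellings of one prototype; after adjustment the types are identical. */
int fun(int a[]);
int fun(int a[10]);
int fun(int *a);

int fun(int a[10])
{
    assert(a != NULL);
    return a[0];
}

/* &a has type int ** here, so the parameter is a pointer object. */
int param_is_pointer(int a[10])
{
    return _Generic(&a, int **: 1, default: 0);
}

/* For a real array, &arr has type int (*)[10] instead. */
int local_is_array(void)
{
    int arr[10] = {0};
    return _Generic(&arr, int (*)[10]: 1, default: 0);
}
```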
Now that being said, I have seen texts claiming that to pass a 2D array to a function the following are equivalent: /--/
And here again I have heard people say that in C the 1st one is internally converted to the second one.
Correct.
int fun(int a[][]); is an invalid declaration because it declares an array whose elements have the incomplete array type int[]. C does not allow declarations of arrays with elements of incomplete type.
C17 6.7.6.2/1
The element type shall not be an incomplete or function type.
We can however leave the outermost dimension with incomplete array type since it gets adjusted to a pointer anyway, making its size irrelevant. But that can't be done for the inner dimensions as per the above rule that elements of arrays must be of a complete type. (We can't have arrays of incomplete struct types either.)
int (*a)[] is valid since it's a pointer to an incomplete array type. Similarly, C allows us to use pointers to incomplete struct types, but we cannot declare objects of an incomplete struct type.
In some programs involving 2d array, written in C, I noted that row size is not mentioned and the compiler is also not throwing any error regarding this. But when I tried this by mentioning the row size but not the column size, the compiler throws an error.
Eg:
int arr[][5]; // correct
int arr[5][]; //compiler throws error
What's the reason?
We can define a 2-D array in C as:
A [][n];
where n is some constant
We must include the number of columns in the array because this specifies the size of each row. The two-dimensional array can be viewed as an array of rows. Once the compiler knows the size of a row in the array (which is defined by the value in the second square bracket, n here), it is able to correctly determine the beginning of each row.
In other words, it is needed to compute the relative offset of the item you're actually accessing.
We have offset = (row*colwidth + col)
The offsets are computed by the compiler using the size of the row, which happens to be the number/count of the columns.
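The formula can be checked with a short sketch. COLS and flat_index are names chosen for the example; the function measures how far arr[row][col] lies from the start of the array, in elements:

```c
#include <assert.h>
#include <stddef.h>

enum { COLS = 5 };   /* hypothetical row length, the n in A[][n] */

/* Rows are packed back to back, which is what makes the formula work. */
static_assert(sizeof(int[COLS]) == COLS * sizeof(int), "no padding in a row");

/* Element index of arr[row][col], measured from the start of the array. */
size_t flat_index(int (*arr)[COLS], size_t row, size_t col)
{
    const char *base = (const char *)arr;
    const char *elem = (const char *)&arr[row][col];
    return (size_t)(elem - base) / sizeof(int);
}
```

For any in-range row and col, the result equals row * COLS + col, which the compiler could not compute without knowing COLS.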
6.7.6.2 Array declarators
Constraints
1 In addition to optional type qualifiers and the keyword static, the [ and ] may delimit
an expression or *. If they delimit an expression (which specifies the size of an array), the
expression shall have an integer type. If the expression is a constant expression, it shall
have a value greater than zero. The element type shall not be an incomplete or function
type. The optional type qualifiers and the keyword static shall appear only in a
declaration of a function parameter with an array type, and then only in the outermost
array type derivation.
...
Semantics
...
4 If the size is not present, the array type is an incomplete type...
C 2011 Online Draft
Emphasis added. Given an array declaration
T a[];
the type of a is incomplete - it's "unknown size array of T". However, per the constraint above, T itself must be a complete type. If T is an array type, its size must be known, a la R [N]:
R a[][N]; // a is an unknown-size array of N-element arrays of R
This is why the compiler accepts
int arr[][5];
since, while we don't yet know how many elements will be in arr, we know how big each of those elements will be (5 * sizeof (int)). Note that arr must be given a size before it can actually be used. The converse,
int arr[5][];
says that arr is a 5-element array of unknown-size arrays of int. We know how many elements we need, but we don't know how big those elements are going to be.
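For the accepted form, one way the missing first dimension gets filled in is an initializer. A small sketch (row_count is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>

/* Two brace groups, so the omitted first dimension becomes 2. */
int arr[][5] = {
    {1, 2, 3, 4, 5},
    {6, 7, 8, 9, 10},
};

/* The element size was known all along: each row is an int[5]. */
static_assert(sizeof arr[0] == 5 * sizeof(int), "each row is int[5]");

size_t row_count(void)
{
    return sizeof arr / sizeof arr[0];
}
```

There is no similar way to fill in int arr[5][], because the row size would have to be known before the first element could even be laid out.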
Now, why does C make this restriction? I can't provide an authoritative answer for that, but I suspect it has to do with the relationship between array and pointer operations in C. Remember that the expression a[i] is defined as *(a + i) - that is, take the address a and offset i elements (not bytes!!) from that address and dereference the result. That only works if the size of the element type is known.
It should be possible to model an array of N elements of unknown size, but I suspect that such a model is cumbersome enough that it's more trouble to implement than it's worth.
I'm using a Keil C51 compiler to program an 8051 microcontroller. For some reason my code didn't run - I managed to track down the bug, but I still have difficulties understanding it. Why is the first code wrong, compared to the other one? It's worth noting that the compiler didn't throw any error; the code just didn't even start on the microcontroller.
Wrong code:
file1.h
extern STRUCT_TYPEDEF array_var[];
file2.c
// Global variable initialization
STRUCT_TYPEDEF array_var[] = some_struct.array2_var;
After changing these to:
file1.h
extern STRUCT_TYPEDEF *array_var;
file2.c
// Global variable initialization
STRUCT_TYPEDEF *array_var = &some_struct.array2_var[0];
it started working.
Also, this portion of code was referenced only in functions like "array_var[0].property = ...", but none of these functions were ever called from the application.
some_struct variable is declared in yet another module.
Why could it behave like that? Is there some difference between [] and * I don't know about?
EDIT1:
It is said that pointers and arrays are different things... but then, how does the "[]" syntax differ from "*"? I thought compiler would just convert it to a pointer in case the square brackets are empty (like it does with the function arguments). I also thought providing an array would result in giving me the address of the first element.
Now, everyone is saying pointers and arrays are different - but I can't find any information about what exactly is different in them. How does compiler see it when I give an array as rvalue instead of a pointer to its first element?
STRUCT_TYPEDEF array_var[] = some_struct.array2_var;
is not a valid way to initialize an array in a declaration. An array initializer must be a brace-enclosed list of initializers, such as
T arr[] = { init1, init2, init3 };
You cannot initialize an array with another array [1], nor can you assign one array to another this way:
T foo[] = { /* list of initializers */ };
T bar[] = foo; // not allowed
T bar[N];
...
bar = foo; // also not allowed
If you want to copy the contents of some_struct.array2_var to array_var, you must use a library function like memcpy:
memcpy( array_var, some_struct.array2_var, sizeof some_struct.array2_var );
You must also declare array_var with a size; you can't leave it incomplete if you want to use it. If you know ahead of time how big it needs to be, it's easy:
STRUCT_TYPEDEF array_var[SIZE];
...
memcpy( array_var, some_struct.array2_var, sizeof some_struct.array2_var );
If you don't know ahead of time how big it needs to be, then you'll either have to declare it as a variable-length array (which won't work if it needs to be at file scope or otherwise have static storage duration), or you can allocate the memory dynamically:
STRUCT_TYPEDEF *array_var = NULL;
...
array_var = malloc( sizeof some_struct.array2_var );
if ( array_var )
{
memcpy( array_var, some_struct.array2_var, sizeof some_struct.array2_var );
}
This all assumes that some_struct.array2_var is an array declared like
STRUCT_TYPEDEF array2_var[SIZE];
If it's also just a pointer, then you'll have to keep track of the array size some other way.
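Put together, the fixed-size variant might look like the sketch below. The struct layout, the SIZE value and the copy_array helper are assumptions made up for the example, since the question does not show them:

```c
#include <assert.h>
#include <string.h>

enum { SIZE = 3 };                                 /* assumed size */
typedef struct { int property; } STRUCT_TYPEDEF;   /* assumed layout */

struct holder { STRUCT_TYPEDEF array2_var[SIZE]; };
struct holder some_struct = { { {10}, {20}, {30} } };

/* A separate array object with its own storage. */
STRUCT_TYPEDEF array_var[SIZE];

static_assert(sizeof array_var == sizeof some_struct.array2_var,
              "source and destination must have the same size");

/* Copies the contents; array_var does not alias some_struct afterwards. */
void copy_array(void)
{
    memcpy(array_var, some_struct.array2_var, sizeof array_var);
}
```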
EDIT
If you want array_var to simply point to the first element of some_struct.array2_var, you'd do the following:
STRUCT_TYPEDEF *array_var = some_struct.array2_var;
Except when it is the operand of the sizeof or unary & operators, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array. The code above is exactly equivalent to
STRUCT_TYPEDEF *array_var = &some_struct.array2_var[0];
[1] Except for string literals, such as char message[] = "Hello"; the string literal "Hello" is an array expression, but the language treats it as a special case.
This ...
extern STRUCT_TYPEDEF array_var[];
... is a declaration of an array of unknown size and with external linkage. Because the size is not specified, that declaration leaves array_var with an "incomplete type"; that prevents some uses of that variable until and unless its type is completed by another declaration in the same translation unit. For example, it cannot be the operand of the sizeof operator.
This ...
STRUCT_TYPEDEF array_var[] = some_struct.array2_var;
... claims to be a definition of array_var, on account of providing an initializer. The initializer is not of the correct form for a variable of array type, however. An array initializer consists of a comma delimited sequence of one or more array elements, inside mandatory curly braces ({}). Just as C does not support whole-array assignment, it does not support array values as array initializers.
In contrast, this ...
extern STRUCT_TYPEDEF *array_var;
... is a declaration of a pointer with external linkage. It has a complete type. And this ...
STRUCT_TYPEDEF *array_var = &some_struct.array2_var[0];
... is a valid definition of the variable, with a suitable initializer. Because array values decay to pointers in this context, as in most (but not all) others, it is equivalent to this:
STRUCT_TYPEDEF *array_var = some_struct.array2_var;
In comparing this to the original code, it is essential to understand that although they have a close association, pointers and arrays are completely separate types.
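A sketch of that difference, using a made-up layout for STRUCT_TYPEDEF and some_struct (neither is shown in the question) and hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int property; } STRUCT_TYPEDEF;   /* assumed layout */
struct holder { STRUCT_TYPEDEF array2_var[4]; };
struct holder some_struct;

/* A pointer object holding the address of the first element. */
STRUCT_TYPEDEF *array_var = some_struct.array2_var;

static_assert(sizeof some_struct.array2_var == 4 * sizeof(STRUCT_TYPEDEF),
              "the array's size covers all of its elements");

/* sizeof sees two different types: an array and a pointer. */
size_t array_bytes(void)   { return sizeof some_struct.array2_var; }
size_t pointer_bytes(void) { return sizeof array_var; }

/* Indexing looks the same, but the pointer only refers to the array. */
int write_through_pointer(int value)
{
    array_var[1].property = value;
    return some_struct.array2_var[1].property;
}
```

The pointer can also be reassigned to point elsewhere, which an array name never can.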
Also, this portion of code was referenced only in functions like "array_var[0].property = ...", but none of these functions were ever called from the application.
Whether the variable is ever accessed normally has no bearing on whether the compiler is willing to accept the code.
Is there some difference between [] and * I don't know about?
Apparently so, since the question seems to assume that there is no difference.
The two forms can be used interchangeably for declaring function parameters. In that context, both declare the parameter as a pointer. This is a notational and code clarity convenience made possible by the fact that array values appearing as function arguments decay to pointers. You can never actually pass an array as a function argument -- when an argument designates an array, a pointer is passed instead.
As described above, however, the two forms are not equivalent for declaring an ordinary variable.
I know that in C if we were to write:
void myFunction(int x[30]) // 30 is ignored by the compiler
void myFunction(int x[][30]) // 30 is not ignored here but if I put say '40' in the first dimension
// it would be ignored.
Why is it that the first dimension is ignored by the compiler?
void myFunction(int x[30])
is equivalent to
void myFunction(int *x)
i.e., when an array is declared as a function parameter, the compiler treats the name as a pointer to the first element of the array. In this case the length of the first dimension is of no use.
This means you have to pass the size of the array to the function explicitly.
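A minimal sketch of passing the size along (sum and count are names chosen for the example):

```c
#include <assert.h>
#include <stddef.h>

/* The function only receives a pointer, so the caller supplies the count. */
int sum(const int *x, size_t count)
{
    assert(x != NULL || count == 0);
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += x[i];
    return total;
}
```

At the call site the array is still an array, so the count can be computed there, for example sum(data, sizeof data / sizeof data[0]).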
In the context of a function parameter declaration, both T a[] and T a[N] are interpreted as T *a; that is, all three declare a as a pointer to T. This goes along with the fact that, unless it is the operand of the sizeof or the unary & operator, an expression of type "N-element array of T" will be converted to an expression of type "pointer to T" and its value will be the address of the first element of the array.
It's not that the dimension is being ignored, it's that it's not meaningful in this context.
Since the C function does not check whether an array reference is in bounds, and since it does not allocate any space for it, the dimension has no use there. It only calculates an offset from the pointer (start of the array) and it already knows how to do that (based on the size of int).
When you specify more than one dimension, it needs to know that dimension only so it can calculate the proper offset for an array reference.
It is not ignored/useless according to the language. It may be ignored by the compiler.
If inside myFunction, you write:
... x[29] ...
you get a valid program.
If you write
... x[30] ...
your program has undefined behavior if the array passed by the caller has only 30 elements. The compiler may or may not check for this.
The fact that compiler can't always check everything is the price one pays for having a language as close to the machine as C is.
The C99 standard says the following in 6.7.5.3/7:
A declaration of a parameter as ‘‘array of type’’ shall be adjusted to ‘‘qualified pointer to
type’’, where the type qualifiers (if any) are those specified within the [ and ] of the
array type derivation.
Which I understand as:
void foo(int * arr) {} // valid
void foo(int arr[]) {} // invalid
However, gcc 4.7.3 will happily accept both function definitions, even when compiled with gcc -Wall -Werror -std=c99 -pedantic-errors. Since I am not a C expert, I am unsure if maybe I misinterpreted what the standard is saying.
I also noticed that
size_t foo(int arr[]) { return sizeof(arr); }
will always return sizeof(int *) instead of the array size, which confirms my belief that int arr[] is handled as int * and gcc is just trying to make me feel more comfortable.
Can someone shed some light on this issue? Just for reference, this question arose from this comment.
Some context:
First of all, remember that when an expression of type "N-element array of T" appears in a context where it isn't the operand of the sizeof or unary & operator, or isn't a string literal being used to initialize another array in a declaration, it will be converted to an expression of type "pointer to T" and its value will be the address of the first element in the array.
That means when you pass an array argument to a function, the function will receive a pointer value as a parameter; the array expression is converted to a pointer type before the function is called.
That's all well and good, but why is arr[] allowed as a pointer declaration? I can't say that this is the reason for sure, but I suspect it's a holdover from the B language, from which C was derived. In fact, pretty much everything hinky or unintuitive about arrays in C is a holdover from B.
B was a "typeless" language; you didn't have different types for floats, integers, text, whatever. Everything was stored as fixed-size words, or "cells", and memory was treated as a linear array of cells. When you declared an array in B, as in
auto arr[10];
the compiler would set aside 10 cells for the array, and then set aside an additional 11th cell that would store an offset to the first element of the array, and that additional cell would be bound to the variable arr. As in C, array indexing in B was computed as *(arr + i); you'd take the value stored in arr, add an offset i, and dereference the result. Ritchie retained most of these semantics, with the huge exception of no longer setting aside storage for the pointer to the first element of the array; instead, that pointer value would be computed from the array expression itself when the code was translated. This is why array expressions are converted to pointer types, why &arr and arr give the same value, if different types (the address of the array and the address of the first element of the array are the same) and why an array expression cannot be the target of an assignment (there's nothing to assign to; no storage has been set aside for a variable independent of the array elements).
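The "same value, different types" point can be seen in a short sketch (the three function names are invented for the example). Both expressions yield the same address, but pointer arithmetic steps by a different amount:

```c
#include <assert.h>
#include <stddef.h>

/* arr (after conversion) and &arr refer to the same address. */
int same_address(void)
{
    int arr[10] = {0};
    return (void *)arr == (void *)&arr;
}

/* arr + 1 advances by one element. */
size_t element_step(void)
{
    int arr[10] = {0};
    return (size_t)((char *)(arr + 1) - (char *)arr);
}

/* &arr + 1 advances by the whole array. */
size_t array_step(void)
{
    int arr[10] = {0};
    static_assert(sizeof arr == 10 * sizeof(int), "ten ints");
    return (size_t)((char *)(&arr + 1) - (char *)&arr);
}
```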
Now here's the fun bit; in B, you'd declare a "pointer" as
auto ptr[];
This had the effect of allocating the cell to store the offset to the first element of the array and binding it to ptr, but ptr didn't point anywhere in particular; you could assign it to point to various locations. I suspect that notation was held over for a couple of reasons:
Most of the guys who worked on the initial version of C were familiar with it;
It sort of emphasizes that the parameter represents an array in the caller;
Personally, I would have preferred that Ritchie had used * to designate pointers everywhere, but he didn't (or, alternately, use [] to designate a pointer in all contexts, not just a function parameter declaration). I will normally recommend that everyone use * notation for function parameters instead of [], simply because it more accurately conveys the type of the parameter, but I can understand why people would prefer the second notation.
Both your valid and invalid declarations are internally equivalent, i.e., the compiler converts the latter to the former.
What your function sees is the pointer to the first element of the array.
PS. The alternative would be to push the whole array on the stack, which would be grossly inefficient from both time and space viewpoints.