array [] vs pointer * - why will the first code fail in C? - c

I'm using a Keil C51 compiler to program an 8051 microcontroller. For some reason my code didn't run. I managed to track down the bug, but I still have difficulty understanding it. Why is the first version wrong compared to the second? It's worth noting that the compiler didn't report any error; the code just never started on the microcontroller.
Wrong code:
file1.h
extern STRUCT_TYPEDEF array_var[];
file2.c
// Global variable initialization
STRUCT_TYPEDEF array_var[] = some_struct.array2_var;
After changing these to:
file1.h
extern STRUCT_TYPEDEF *array_var;
file2.c
// Global variable initialization
STRUCT_TYPEDEF *array_var = &some_struct.array2_var[0];
it started working.
Also, this portion of code was referenced only in functions like "array_var[0].property = ...", but none of these functions were ever called from the application.
some_struct variable is declared in yet another module.
Why could it behave like that? Is there some difference between [] and * I don't know about?
EDIT1:
It is said that pointers and arrays are different things... but then, how does the "[]" syntax differ from "*"? I thought compiler would just convert it to a pointer in case the square brackets are empty (like it does with the function arguments). I also thought providing an array would result in giving me the address of the first element.
Now, everyone is saying pointers and arrays are different - but I can't find any information about what exactly is different in them. How does compiler see it when I give an array as rvalue instead of a pointer to its first element?

STRUCT_TYPEDEF array_var[] = some_struct.array2_var;
is not a valid way to initialize an array in a declaration. An array initializer must be a brace-enclosed list of initializers, such as
T arr[] = { init1, init2, init3 };
You cannot initialize an array with another array [1], nor can you assign one array to another this way:
T foo[] = { /* list of initializers */ };
T bar[] = foo; // not allowed
T bar[N];
...
bar = foo; // also not allowed
If you want to copy the contents of some_struct.array2_var to array_var, you must use a library function like memcpy:
memcpy( array_var, some_struct.array2_var, sizeof some_struct.array2_var );
You must also declare array_var with a size; you can't leave it incomplete if you want to use it. If you know ahead of time how big it needs to be, it's easy:
STRUCT_TYPEDEF array_var[SIZE];
...
memcpy( array_var, some_struct.array2_var, sizeof some_struct.array2_var );
If you don't know ahead of time how big it needs to be, then you'll either have to declare it as a variable-length array (which won't work if it needs to be at file scope or otherwise have static storage duration), or you can allocate the memory dynamically:
STRUCT_TYPEDEF *array_var = NULL;
...
array_var = malloc( sizeof some_struct.array2_var );
if ( array_var )
{
memcpy( array_var, some_struct.array2_var, sizeof some_struct.array2_var );
}
This all assumes that some_struct.array2_var is an array declared like
STRUCT_TYPEDEF array2_var[SIZE];
If it's also just a pointer, then you'll have to keep track of the array size some other way.
EDIT
If you want array_var to simply point to the first element of some_struct.array2_var, you'd do the following:
STRUCT_TYPEDEF *array_var = some_struct.array2_var;
Except when it is the operand of the sizeof or unary & operators, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array. The code above is exactly equivalent to
STRUCT_TYPEDEF *array_var = &some_struct.array2_var[0];
[1] Except for string literals, as in char message[] = "Hello";. The string literal "Hello" is an array expression, but the language treats initializing a character array from it as a special case.

This ...
extern STRUCT_TYPEDEF array_var[];
... is a declaration of an array of unknown size and with external linkage. Because the size is not specified, that declaration leaves array_var with an "incomplete type"; that prevents some uses of that variable until and unless its type is completed by another declaration in the same translation unit. For example, it cannot be the operand of the sizeof operator.
This ...
STRUCT_TYPEDEF array_var[] = some_struct.array2_var;
... claims to be a definition of array_var, on account of providing an initializer. The initializer is not of the correct form for a variable of array type, however. An array initializer consists of a comma delimited sequence of one or more array elements, inside mandatory curly braces ({}). Just as C does not support whole-array assignment, it does not support array values as array initializers.
In contrast, this ...
extern STRUCT_TYPEDEF *array_var;
... is a declaration of a pointer with external linkage. It has a complete type. And this ...
STRUCT_TYPEDEF *array_var = &some_struct.array2_var[0];
... is a valid definition of the variable, with a suitable initializer. Because array values decay to pointers in this context, as in most (but not all) others, it is equivalent to this:
STRUCT_TYPEDEF *array_var = some_struct.array2_var;
In comparing this to the original code, it is essential to understand that although they have a close association, pointers and arrays are completely separate types.
Also, this portion of code was referenced only in functions like "array_var[0].property = ...", but none of these functions were ever called from the application.
Whether the variable is ever accessed normally has no bearing on whether the compiler is willing to accept the code.
Is there some difference between [] and * I don't know about?
Apparently so, since the question seems to assume that there is no difference.
The two forms can be used interchangeably for declaring function parameters. In that context, both declare the parameter as a pointer. This is a notational and code clarity convenience made possible by the fact that array values appearing as function arguments decay to pointers. You can never actually pass an array as a function argument -- when an argument designates an array, a pointer is passed instead.
As described above, however, the two forms are not equivalent for declaring an ordinary variable.

Related

C array declaration syntax

Declare an array in your include file omitting the first dimension size:
extern float mvp[][4];
Then define the array following the previous declaration in a translation unit:
float mvp[4][4];
No problem. Until you try to get the size of that array in a file which includes the first declaration. Then you would get:
error: invalid application of 'sizeof' to an incomplete type 'float [][4]'
I understand that arrays decays into pointers to their first element when used as lvalue, that array declarations in function prototypes are actually pointers in disguise but here it's not the case. But the first declaration does not declare a pointer, it declares an "incomplete array type" different from:
extern float (*mvp)[4];
When declaring variables, the compiler just references a "dummy" base address offset and the associated type that the linker will resolve.
I wonder why this "incomplete array type" – which cannot be incremented like a pointer to array but is also not fully an array since its size cannot be retrieved – would be allowed to exist ?
Why not implicitly convert it to a pointer (just a base address offset) or even better, why not throw an error for omitting the size in the first dimension ?
Quoting this
If expression in an array declarator is omitted, it declares an array of unknown size. Except in function parameter lists (where such arrays are transformed to pointers) and when an initializer is available, such type is an incomplete type (note that VLA of unspecified size, declared with * as the size, is a complete type)
So really, the type is incomplete and waiting to be completed later by a later declaration or tentative definition.
Using extern doesn't make things exist; it is just used to state that something may exist in a different translation unit. sizeof can only be used on complete types. This has nothing to do with array-to-pointer decay. extern float (*mvp)[4] has a complete type: it is a pointer to an array of 4 floats. extern float mvp[][4] is incomplete: it is a 2D array of floats in which the outermost dimension is unspecified. These are two very different things. In either case mvp can be indexed like an array, using the correct syntax, but you can only use sizeof if the compiler can actually determine the size.
Also, float mvp[][4] is an array; it's just that its size is unknown. What makes it an array is that its memory is laid out like an array.
It is possible to declare all dimensions of the extern array:
extern float mvp[4][4];
It is just an option to leave the external declaration incomplete and let the definition worry about the dimension. It is useful exactly because the size is not part of its external interface! Should the outermost size change from one compilation to another, a translation unit that merely uses the object need not be recompiled.
For this to work, there should probably be a sentinel value that ends the array / a variable that would tell how many elements there are, otherwise it is not very useful.
Why not implicitly convert it to a pointer (just a base address offset) or even better, why not throw an error for omitting the size in the first dimension?
It cannot be converted to a pointer because the declaration is not a definition. It just tells that such an object does exist. The definition of that object exists independent of the external declaration. The actual object that is being declared here is an array, not a pointer.
It is just that in case of arrays the external declaration can declare the outermost dimension or can omit it.
As for the claim that
arrays decays into pointers to their first element when used as lvalue
that is quite wrong. An array expression is an lvalue, and when it decays it is no longer an lvalue - the only case where it stays as an lvalue is as the operand of &.

Length of arbitrary array type in C function

I'd like to replace the following macro with an actual function in C.
#define ARRAY_LENGTH(a) (sizeof(a)/sizeof((a)[0]))
Keep your macro. Replacing it is a mistake. When you pass an array to a function it decays into a pointer and you lose size information.
As others have said, you can't.
When you pass an argument to a function, the value of that expression is copied into a new object.
One problem is functions can't have arrays as arguments. Array declarations in function prototypes are converted to pointer declarations.
Similarly, the expression denoting the array that you're passing will be converted to a pointer to the first element of the array.
Another problem standing in your way is that C has no generic functions. There is no way to provide a function with an "array of T", where T can be any type you like, aside from using a void * parameter and passing size information separately.
Function-like macros are expanded at a different stage, however. They're expanded by the preprocessor; imagine copying and pasting the code for the macro everywhere it's mentioned, substituting the arguments, prior to compilation. That's what your compiler does with macros.
For example, when you write printf("%zu\n", ARRAY_LENGTH(foo)); it replaces this with: printf("%zu\n", (sizeof(foo)/sizeof((foo)[0])));.
P.S. sizeof is not a function; it's an operator... Coincidentally, it is one of the few (the others being the & address-of operator and the newly adopted _Alignof operator) which don't cause the array expression to be converted to a pointer expression.
int arraySz(void **a)
{
return(sizeof(a[])/sizeof(a[][]));
}
However a would have to be pointing to an existing rectangular array, not just be a pointer to a pointer, or return value from malloc().

Can I use arrays as a function parameter in C99?

The C99 standard says the following in 6.7.5.3/7:
A declaration of a parameter as ‘‘array of type’’ shall be adjusted to ‘‘qualified pointer to
type’’, where the type qualifiers (if any) are those specified within the [ and ] of the
array type derivation.
Which I understand as:
void foo(int * arr) {} // valid
void foo(int arr[]) {} // invalid
However, gcc 4.7.3 will happily accept both function definitions, even when compiled with gcc -Wall -Werror -std=c99 -pedantic-errors. Since I am not a C expert, I am unsure if maybe I misinterpreted what the standard is saying.
I also noticed that
size_t foo(int arr[]) { return sizeof(arr); }
will always return sizeof(int *) instead of the array size, which confirms my belief that int arr[] is handled as int * and gcc is just trying to make me feel more comfortable.
Can someone shed some light on this issue? Just for reference, this question arose from this comment.
Some context:
First of all, remember that when an expression of type "N-element array of T" appears in a context where it isn't the operand of the sizeof or unary & operator, or isn't a string literal being used to initialize another array in a declaration, it will be converted to an expression of type "pointer to T" and its value will be the address of the first element in the array.
That means when you pass an array argument to a function, the function will receive a pointer value as a parameter; the array expression is converted to a pointer type before the function is called.
That's all well and good, but why is arr[] allowed as a pointer declaration? I can't say that this is the reason for sure, but I suspect it's a holdover from the B language, from which C was derived. In fact, pretty much everything hinky or unintuitive about arrays in C is a holdover from B.
B was a "typeless" language; you didn't have different types for floats, integers, text, whatever. Everything was stored as fixed-size words, or "cells", and memory was treated as a linear array of cells. When you declared an array in B, as in
auto arr[10];
the compiler would set aside 10 cells for the array, and then set aside an additional 11th cell that would store an offset to the first element of the array, and that additional cell would be bound to the variable arr. As in C, array indexing in B was computed as *(arr + i); you'd take the value stored in arr, add an offset i, and dereference the result. Ritchie retained most of these semantics, with the huge exception of no longer setting aside storage for the pointer to the first element of the array; instead, that pointer value would be computed from the array expression itself when the code was translated. This is why array expressions are converted to pointer types, why &arr and arr give the same value, if different types (the address of the array and the address of the first element of the array are the same) and why an array expression cannot be the target of an assignment (there's nothing to assign to; no storage has been set aside for a variable independent of the array elements).
Now here's the fun bit; in B, you'd declare a "pointer" as
auto ptr[];
This had the effect of allocating the cell to store the offset to the first element of the array and binding it to ptr, but ptr didn't point anywhere in particular; you could assign it to point to various locations. I suspect that notation was held over for a couple of reasons:
Most of the guys who worked on the initial version of C were familiar with it;
It sort of emphasizes that the parameter represents an array in the caller;
Personally, I would have preferred that Ritchie had used * to designate pointers everywhere, but he didn't (or, alternately, use [] to designate a pointer in all contexts, not just a function parameter declaration). I will normally recommend that everyone use * notation for function parameters instead of [], simply because it more accurately conveys the type of the parameter, but I can understand why people would prefer the second notation.
Both your valid and invalid declarations are internally equivalent, i.e., the compiler converts the latter to the former.
What your function sees is the pointer to the first element of the array.
PS. The alternative would be to push the whole array on the stack, which would be grossly inefficient from both time and space viewpoints.

For struct variables s1, s2, why can I initialize "s1={25,3.5}", assign s2 as "s1=s2", but then can't use "s1={59,3.14}"?

In C we are allowed to assign the value of one structure variable to another if they are of the same type. In accordance with that, in my following program I am allowed to use s1=s2 when both are struct variables of the same type. But why then am I not allowed to use s1={59,3.14} after that?
I know we can't assign a string "Test" to a character array arr other than in the initialization statement, because the string "Test" decays to type char* during assignment and hence there is a type mismatch error. But in my program, {59,3.14} doesn't decay to any pointer, does it? Why then is it not allowed to be assigned to s1 even though it is of the same type, especially since it is allowed during initialization? What is the difference between s2 and {59,3.14} such that one is allowed to be assigned to s1 but the other is not?
#include<stdio.h>
int main(void)
{
struct test1
{
int a;
float b;
} s1= {25,3.5},s2= {38,9.25};
printf("%d,%f\n",s1.a,s1.b);
s1=s2; // Successful
printf("%d,%f\n",s1.a,s1.b);
s1= {59,3.14}; // ERROR: expected expression before '{' token
printf("%d,%f\n",s1.a,s1.b);
}
The C grammar strictly distinguishes between assignment and initialization.
For initialization it is clear what the type on the right side ought to be: the type of the object that is declared. So the initializer notation is unambiguous; { a, b, c } are the fields in declaration order.
For assignment things are less clear. An assignment expression X = Y first evaluates both subexpressions (X and Y), looks at their types and then does the necessary conversions, if possible, from the type of Y to the type of X. An expression of the form { a, b, c } has no type, so the mechanism doesn't work.
The construct that yoones uses in his answer is yet another animal, called a compound literal. This is a way of creating an unnamed auxiliary object of the specified type. You may use it in initializations or any other place where you'd want to use a temporary object. The storage class and lifetime of a compound literal are deduced from the context where it is used. If it is in function scope, it is automatic (on the "stack"), as a normal variable declared in the same block would be, only that it doesn't have a name. If it is used in file scope (initialization of a "global" variable, e.g.) it has static storage duration and a lifetime that is the whole duration of the program execution.
You need to write it this way: s1 = (struct test1){59, 3.14}; to let the compiler know that it should consider your {...} to be of type struct test1. This looks like a cast, but it is a compound literal.
Put another way, the data between the braces doesn't have a type on its own, which is why you need to specify one with the parenthesized type name.
Edit:
The compiler needs to know the expected type for each struct's fields. This is needed to know the right number of bytes for each argument, for padding, etc. Otherwise it could as well copy the value 59 (which is meant to be an int) as a char since it's a value that fits in one byte.

What does "int *a = (int[2]){0, 2};" exactly do?

I was very surprised when I saw this notation. What does it do and what kind of C notion is it?
This is a compound literal as defined in section 6.5.2.5 of the C99 standard.
It's not part of the C++ language, so it's not surprising that C++ compilers don't compile it. (or Java or Ada compilers for that matter)
The value of the compound literal is that of an unnamed object initialized by the
initializer list. If the compound literal occurs outside the body of a function, the object
has static storage duration; otherwise, it has automatic storage duration associated with
the enclosing block.
So no, it won't destroy the stack. The compiler allocates storage for the object.
Parentheses are put around the type and it is then followed by an initializer list - it's not a cast, as a bare initialiser list has no meaning in C99 syntax; instead, it is a postfix operator applied to a type which yields an object of the given type. You are not creating { 0, 2 } and casting it to an array, you're initialising an int[2] with the values 0 and 2.
As to why it's used, I can't see a good reason for it in your single line, although it might be that a could be reassigned to point at some other array, and so it's a shorter way of doing the first two lines of:
int default_a[] = { 0, 2 };
int *a = default_a;
if (some_test) a = get_another_array();
I've found it useful for passing temporary unions to functions
// fills an array of unions with a value
kin_array_fill ( array, ( kin_variant_t ) { .ref = value } );
This is a c99 construct, called a compound literal.
From the May 2005 committee draft section 6.5.2.5:
A postfix expression that consists of a parenthesized type name followed by a brace-enclosed list of initializers is a compound literal. It provides an unnamed object whose value is given by the initializer list.
...
EXAMPLE 1 The file scope definition
int *p = (int []){2, 4};
initializes p to point to the first element of an array of two ints, the first having the value two and the second, four. The expressions in this compound literal are required to be constant. The unnamed object has static storage duration.
Allocates, on the stack, space for [an array of] two ints.
Populates [the array of] the two ints with the values 0 and 2, respectively.
Declares a local variable of type int* and assigns to that variable the address of [the array of] the two ints.
(int[2]) tells the compiler that the following expression should be cast to int[2]. This is required since {0, 2} can be cast to different types, like long[2]. The cast occurs at compile time, not at runtime.
The entire expression creates an array in memory and sets a to point to this array.
{0, 2} is the notation for an array consisting of 0 and 2.
(int[2]) casts it to an array (don't know why).
int * a = assigns it to the int pointer a.

Resources