Why is sizeof(array_name) the size of the array while sizeof(&a[0]) is the size of a pointer, even though passing an array name to a function passes only the location of the beginning of the array?
In most expressions, when an array is used, it is automatically converted to a pointer to its first element.
There is a special rule for sizeof: When an array is an operand of sizeof, it is not automatically converted to a pointer. Therefore, sizeof array_name gives the size of the array, not the size of a pointer.
This rule also applies to the unary & operator: &array_name is the address of the array (its type is "pointer to array"), not the address of a pointer.
Also, if an array is a string literal used to initialize an array, it is not converted to a pointer. The string literal is used to initialize the array.
The rule for this is C 2018 6.3.2.1 3:
Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
Array parameters to functions don't exist. They are adjusted to pointers (if the array is multidimensional, only the outermost, leftmost dimension is). It's just syntactic sugar inherited from the B language. void f(int *X) == void f(int X[]);
(Same for function params to functions. void g(void X(void)) == void g(void (*X)(void))).
I want to understand where exactly in code an array gets converted to a pointer. For example:
void foo( int* pData, int len){}
int main(void){
char data[] = "Hello world";
foo( (int*)data, sizeof(data));
return 0;
}
I know that an array decays to a pointer to its first element when it is assigned to a pointer. However, in the example above, I cast the array data to int* before passing it to the function and assigning it to a pointer. Does the conversion/decay to a pointer occur at the cast? If so, is it true that the cast has the same effect as the assignment operator with respect to array conversion/decay? Also, would sizeof(data) be equal to the size of an address or the size of the array?
Thank you for help!
The conversion of arrays to pointers in C is spelled out in section 6.3.2.1p3 of the C standard:
Except when it is the operand of the sizeof operator, the
_Alignof operator, or the unary & operator, or is a string
literal used to initialize an array, an expression that has
type "array of type" is converted to an expression with type "pointer
to type" that points to the initial element of the array
object and is not an lvalue. If the array object has
register storage class, the behavior is undefined.
This means that the array is immediately converted to a pointer anywhere it is used except for the cases listed above.
So applying the above to (int*)data, data is the operand of the cast operator. Since this operator is not one of the ones listed above, data in this expression is converted from char [12] to char *, then the cast converts the char * to an int *.
Also, as mentioned above, the array is not converted when passed to sizeof. This means sizeof(data) evaluates to the size of char [12] which is 12.
Outwith the declaration, you can consider data to be equivalent to a pointer to the start of the array, but with the following exceptions:
sizeof(data) will give you the size of the array in bytes.
_Alignof(data) will be the same as _Alignof(*data) (both give the alignment of the array's element type; standard _Alignof takes a type name, so applying it to an expression relies on a compiler extension)
&data has the same value as just data, but has the type char (*)[sizeof(data)], so arithmetic will use the full size of the array. E.g. &data+1 will give you the address after the whole array rather than the address of the second element. See How come an array's address is equal to its value in C?
you can't change its value (i.e. in this sense it is equivalent to a char *const).
When you call your function, you are taking the value of data (i.e. the address of the start of the array) and casting it to an int *. The resulting int * behaves like any other int *, and the exceptions don't apply; this is the 'decay'.
Suppose int A[5] is declared; then the variable A will be a pointer to A[0]. That means A is just a pointer that stores the base address of the array A[5]. Then how come sizeof(A) gives 20 as the answer?
A isn't a pointer to the first element. It is an int[5], or a five element array of ints (the size is part of the type). It can decay into a pointer to the first element when you do stuff like pass it to a function taking a pointer.
Contrary to what you might have heard, arrays are not the same as pointers.
In most contexts, the name of an array will decay into a pointer to the first element, such as being passed to a function or as the subject of pointer arithmetic.
One of the cases where this decay does not happen is when the array is the subject of the sizeof operator. In that case the operator returns the full size of the array in bytes.
This is detailed in section 6.3.2.1p3 of the C standard:
Except when it is the operand of the sizeof operator, the _Alignof
operator, or the unary & operator, or is a string literal used to
initialize an array, an expression that has type "array of type" is
converted to an expression with type "pointer to type" that points
to the initial element of the array object and is not an lvalue. If
the array object has register storage class, the behavior is
undefined.
Section 6.5.3.4p4, which details the sizeof operator, additionally states:
When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is
1. When applied to an operand that has array type, the result is the total number of bytes in the array. When applied to an operand
that has structure or union type, the result is the total number of
bytes in such an object, including internal and trailing padding.
If you had something like this:
int A[5];
int *B;
B = A;
printf("sizeof(B)=%zu\n", sizeof(B));
You would get the size of an int * on your system, most likely either 4 or 8.
How does char s[] act as a pointer while it looks like an array declaration?
#include<stdio.h>
void test(char s[]);
int main()
{
char s[10]="test";
char a[]=s;
test(s);
char p[]=s;
return 0;
}
void test(char s[])
{
printf(s);
}
In the context of a function parameter declaration (and only in that context), T a[N] and T a[] are the same as T *a; they declare a as a pointer to T, not an array of T.
Chapter and verse:
6.7.6.3 Function declarators (including prototypes)
...
7 A declaration of a parameter as ‘‘array of type’’ shall be adjusted to ‘‘qualified pointer to
type’’, where the type qualifiers (if any) are those specified within the [ and ] of the
array type derivation. If the keyword static also appears within the [ and ] of the
array type derivation, then for each call to the function, the value of the corresponding
actual argument shall provide access to the first element of an array with at least as many
elements as specified by the size expression.
Anywhere else, T a[]; declares a as an array with an as-yet-unspecified size. At this point the declaration is incomplete, and a cannot be used anywhere until a size has been specified, either by specifying it explicitly:
T a[N];
or using an initializer:
T a[] = { /* comma-separated list of initial values */ };
Chapter and verse, again:
6.7.6.2 Array declarators
...
3 If, in the declaration ‘‘T D1’’, D1 has one of the forms:
D[ type-qualifier-list_opt assignment-expression_opt ]
D[ static type-qualifier-list_opt assignment-expression ]
D[ type-qualifier-list static assignment-expression ]
D[ type-qualifier-list_opt * ]
and the type specified for ident in the declaration ‘‘T D’’ is ‘‘derived-declarator-type-list
T’’, then the type specified for ident is ‘‘derived-declarator-type-list array of T’’.142)
(See 6.7.6.3 for the meaning of the optional type qualifiers and the keyword static.)
4 If the size is not present, the array type is an incomplete type. If the size is * instead of
being an expression, the array type is a variable length array type of unspecified size,
which can only be used in declarations or type names with function prototype scope;143)
such arrays are nonetheless complete types. If the size is an integer constant expression
and the element type has a known constant size, the array type is not a variable length
array type; otherwise, the array type is a variable length array type. (Variable length
arrays are a conditional feature that implementations need not support; see 6.10.8.3.)
142) When several ‘‘array of’’ specifications are adjacent, a multidimensional array is declared.
143) Thus, * can be used only in function declarations that are not definitions (see 6.7.6.3).
...
6.7.9 Initialization
...
22 If an array of unknown size is initialized, its size is determined by the largest indexed
element with an explicit initializer. The array type is completed at the end of its
initializer list.
So, why are arrays as function parameters treated differently than arrays as regular objects? This is why:
6.3.2.1 Lvalues, arrays, and function designators
...
3 Except when it is the operand of the sizeof operator, the _Alignof operator, or the
unary & operator, or is a string literal used to initialize an array, an expression that has
type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points
to the initial element of the array object and is not an lvalue. If the array object has
register storage class, the behavior is undefined.
Under most circumstances, an expression of type "N-element array of T" is converted ("decays") to an expression of type "pointer to T". If you pass an array expression as an argument to a function, like so:
int foo[10];
...
bar( foo );
what the function bar actually receives is a pointer to int, not a 10-element array of int, so the prototype for bar can be written
void bar( int *f );
"But why..." I hear you starting to ask. I'm getting to it, really.
C was derived from an earlier language called B (go figure), and in B the following things were true:
Pointer objects were declared using empty bracket notation - auto p[];. This syntax was kept for pointer function parameter declarations in C.
Array declarations set aside memory not only for the array elements, but also for an explicit pointer to the first element, which was bound to the array variable (i.e., auto v[10] would set aside 11 memory cells, 10 for the array contents, and the remaining one to store an offset to the first element).
The array subscript operation a[i] was defined as *(a + i) -- you'd offset i elements from the base address stored in the variable a and dereference the result (B was a "typeless" language, so all scalar objects were the same size).
For various reasons, Ritchie got rid of the explicit pointer to the first element of the array, but kept the definition of the subscript operation. So in C, when the compiler sees an array expression that isn't the operand of the sizeof or unary & operators, it replaces that expression with a pointer expression that evaluates to the address of the first element of the array; that way the *(a + i) operation still works the way it did in B, without actually setting aside any storage for that pointer value. However, it means arrays lose their "array-ness" in most circumstances, which will bite you in the ass if you aren't careful.
You can't assign to arrays, only copy to them or initialize them with a valid initializer (and another array is not a valid initializer).
And when you declare a function like
void test(char s[]);
it's actually the same as declaring it
void test(char *s);
What happens when you call a function taking an "array" is that the array decays to a pointer to the first element, and it's that pointer that is passed to the function.
So the call
test(s);
is the same as
test(&s[0]);
Regarding the function declaration, declaring a function taking an array of arrays is not the same as declaring a function taking a pointer to a pointer. See e.g. this old answer of mine as an explanation of why.
So if you want a function taking an array of arrays, like
void func2(char a[][X]);
it's not the same as
void func2(char **a);
Instead it's the same as
void func2(char (*a)[X]);
For more "dimensions" it doesn't change anything, e.g.
void func3(char a[][X][Y]);
is the same as
void func3(char (*a)[X][Y]);
char a[] declares an array, not a pointer, so the initialization is invalid.
s is an array, too, but in the context of an expression it evaluates to a pointer to its first element.
char a[] is only valid if you have a following initializer list. It has to be an array of characters. As a special case for strings, C allows you to type an array of characters as "str", rather than {'s','t','r','\0'}. Either initializer is fine.
In the code char a[]=s;, s is an array type and not a valid initializer, so the code will not compile.
void test(char s[]) is another special case, because arrays passed as parameters always get replaced by the compiler with a pointer to the first element. Don't confuse this with array initialization, even though the syntax is similar.
For example, I have an array of structs a, as below:
struct my_struct {
int b;
int num;
};
struct bigger_struct {
struct my_struct a[10];
};
struct bigger_struct *some_var;
I know that the name of an array, when used as a value, implicitly refers to the address of the first element of the array (which is how the array subscript operator works, at least).
Can I go the other way around? That is, if I write
some_var->a->b, it should be equivalent to some_var->a[0]->b, am I right? I have tested this and it seems to work, but is it semantically 100% correct?
Is some_var->a->b equivalent to some_var->a[0]->b?
No, it is equivalent to some_var->a[0].b.
The exact specification of the array-to-pointer conversion is actually quite straightforward:
Except when it is the operand of the sizeof operator or the unary & operator, or is a
string literal used to initialize an array, an expression that has type array of type is converted to an expression with type pointer to type that points to the initial element of the array object and is not an lvalue (C99 6.3.2.1).
some_var->a has the type struct my_struct[10], which is an array type, and since it is not the operand of the sizeof or unary & operator and is not a string literal, it is converted to a pointer to the initial element of the array.
Assuming that you have allocated memory for some_var, it's safe to do some_var->a->b (as arrays decay into pointers).
some_var->a[0].b and some_var->a->b are equivalent,
because a decays to a pointer to a[0], so both expressions name member b of the first element. (some_var->a[0]->b does not compile: a[0] is a struct, not a pointer.)
Why do I get an error if I increment a string array, while it works if I pass the array to a function and increment it there? Isn't a string array object already a pointer to the array elements?
e.g.
void foo(char *a){
printf("%c", *a);
a++; // this works
printf("%c", *a);
}
int main(){
char a[] = "ciao";
a++; // I get the error
foo(a);
return 1;
}
thanks!
Because arrays are not pointers. They may decay into pointers under certain circumstances (such as passing to a function, as you can see in your code) but, while they remain an array, you cannot increment them.
What you can do is create a pointer from the array, such as by changing:
foo(a);
to:
foo(&(a[1]));
which will pass an explicit pointer to the second character instead of an implicit pointer to the first character that happens with foo(a);.
Section 6.3.2.1 of C99 ("Lvalues, arrays, and function designators"), paragraph 3 has the definitive reason why you can't do what you're trying to do:
Except when it is the operand of the sizeof operator or the unary & operator, or is a
string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue.
It's that "not an lvalue" that's stopping you. You cannot change it if it's not an lvalue (so named because they typically appear on the left of assignment statements).
The reason you can do it in your first function is section 6.7.5.3 ("Function declarators"), paragraph 7:
A declaration of a parameter as "array of type" shall be adjusted to "qualified pointer to type"
In other words, the parameter in the function is an lvalue pointer, which can be changed.
The type of your foo's a is a pointer, which you can increment.
The type of your main's a is an array, which you cannot increment.
When you call foo, the address of your array's first element is passed as a new variable of pointer type. You can increment this without the original a being affected.
You can try defining a as a pointer instead (the string literal it points to must not be modified):
char* a = "ciao";