Is pointer = &buffer[0] redundant? - c

I was recently reading through some source code and read the following at the beginning of a function:
char buffer[ 1000 ];
char *pointer;
pointer = &buffer[0];
I guess I don't understand this. Why not just write:
pointer = buffer;
Is there some secret meaning I am missing here?

Some people may find it easier to understand depending on the occasion.
Someone might say that when you use pointer = buffer; you intend to use the pointer as the buffer,
while if you use pointer = &buffer[0]; you intend to use the pointer as a pointer or an item of the buffer.
It just happens that those 2 cases point to the same address.

Both expressions give the same result value. So in your given case it is mainly a question of preferred style.
But there is a difference if you use the expressions in, for example, a function call. A static code analysis tool should complain about
memcpy(&buffer[0], src, 2 * sizeof(buffer[0]));
because you state that you are writing two elements into one array element. But the tool should not complain about
memcpy(&buffer, src, 2 * sizeof(buffer[0]));
or
memcpy(buffer, src, 2 * sizeof(buffer[0]));
because you now say that you want to write into the complete array.
Relevant parts in the standard:
6.3.2.1 Lvalues, arrays, and function designators
3 Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type “array of type” is converted to an expression with type “pointer to type” that points to the initial element of the array object and is not an lvalue. [...]
6.5.3.2 Address and indirection operators
Semantics 3 The unary & operator yields the address of its operand. If the operand has type “type”, the result has type “pointer to type”. [...] Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. Otherwise, the result is a pointer to the object or function designated by its operand.
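To make the distinction concrete, here is a minimal sketch of the three spellings. The helper name same_start_address is made up for illustration. It only shows that the addresses agree while the pointed-to types differ, which is the information a static analysis tool can use.

```c
#include <assert.h>

/* Hypothetical helper: compares the three spellings discussed above. */
int same_start_address(void)
{
    char buffer[1000];

    char *as_decayed = buffer;          /* char *, via array-to-pointer conversion */
    char *as_element = &buffer[0];      /* char *, address of the first element */
    char (*as_array)[1000] = &buffer;   /* char (*)[1000], address of the whole array */

    /* The pointed-to types differ, and so do their sizes. */
    assert(sizeof *as_element == 1);
    assert(sizeof *as_array == sizeof buffer);

    /* All three compare equal once converted to a common pointer type. */
    return as_decayed == as_element && (void *)as_array == (void *)as_element;
}
```

Calling same_start_address() should return 1. Note that pointer = &buffer; would not compile cleanly for a char * pointer, because the types are incompatible.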

They're absolutely the same. I also prefer the simpler version
pointer = array; // implicit conversion from array to address of its 1st element
pointer = &array[0]; // explicitly set pointer to the address of array's 1st element
In some cases, depending on how you're going to use the pointer, the explicit version may be more self-documented.

From the C standard, 6.5.2.1:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
So,
&buffer[0]
can be written as
&(*(buffer + 0))
Note that the operator & is used to get the address and the operator * is used for dereferencing. These operators cancel the effect of each other when used one after another. So, it is equivalent to
(buffer + 0)
which is nothing but
buffer
So, &buffer[0] is equivalent to buffer.

Related

Dereferencing an uninitialized pointer to pass into sizeof()

In a recent post, I realised that when allocating a structure variable, passing the dereferenced pointer to sizeof() is deemed better practice than passing the structure type. This is basically because the former is more resilient to code changes than the latter.
This suggests that in the following code, method 1 is deemed better practice than method 2.
#include <stdlib.h>
typedef struct X_ {
    int x;
    int y;
    int z;
} X;
int main(void) {
    X* obj1 = malloc(sizeof(*obj1)); // ----> method 1
    X* obj2 = malloc(sizeof(X)); // ----> method 2
    free(obj1);
    free(obj2);
    return 0;
}
The question is, how valid is it to dereference obj1 in method 1 ? Inside malloc, obj1 is still unconstructed/uninitialized memory which suggests that dereferencing of obj1 happening inside sizeof() shouldn't be valid.
Let me make a guess what makes method 1 valid. Is this because since sizeof() is a compile time operation dereferencing obj1 gets translated into method 2 by the compiler?
Could someone please elaborate the technical validity of this by referring to the relevant C standards?
A sizeof expression whose operand is not a variable length array is an unevaluated expression. So this expression
sizeof(*obj1)
is well-formed.
From the C Standard (6.5.3.4 The sizeof and _Alignof operators)
2 The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
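The "operand is not evaluated" rule can be observed directly. The following is a small sketch with a hypothetical helper name. It relies only on the quoted paragraph: for a non-VLA operand, sizeof uses the type and never runs the expression.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the value of counter after two sizeof uses. */
int sizeof_leaves_operand_unevaluated(void)
{
    int counter = 0;
    int *uninitialized;                 /* deliberately never assigned */

    size_t a = sizeof(counter++);       /* operand not evaluated: counter stays 0 */
    size_t b = sizeof *uninitialized;   /* no dereference happens; only the type int is used */

    assert(a == sizeof(int));
    assert(b == sizeof(int));
    return counter;
}
```

The function returns 0, since the increment inside sizeof never runs. Some compilers warn about the side effect in an unevaluated operand, which is the expected diagnostic here.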
As for your question about the best way to specify the argument of malloc
X* obj1 = malloc(sizeof(*obj1)); // ----> method 1
X* obj2 = malloc(sizeof(X)); // ----> method 2
if the type X is visible at the point where malloc is called, as in this case
X* obj1 = malloc(sizeof(*obj1)); // ----> method 1
then this approach is preferable.
However, if the type is not visible, as for example in
obj1 = malloc(sizeof(*obj1)); // ----> method 1
then I prefer to specify the type explicitly, like
obj1 = malloc(sizeof( X ));
Otherwise, for example, this code snippet
p = malloc( sizeof( *p ) );
q = malloc( sizeof( *q ) );
does not give the reader enough information. The reader will need to scroll back and forth through the source to find the declarations of p and q to determine their types.
The question is, how valid is it to dereference obj1 in method 1?
It's 100% valid. You could also write it without parentheses: sizeof *obj1.
From N1570 ISO/IEC 9899:201x §6.5.3.4 The sizeof and _Alignof operators
2 The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
In fact, one can say it's the preferable method: if you change the type of the object for some reason, it's easy to forget to also change the sizeof argument. Using the dereferenced pointer avoids this potential silent error.
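As a sketch of the pattern the answers recommend, the allocation below is sized from the pointer itself. The helper new_x is a made-up name; the struct is the one from the question.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct X_ {
    int x;
    int y;
    int z;
} X;

/* Hypothetical helper: allocates one X, sized from the pointer rather than the type name. */
X *new_x(void)
{
    X *obj = malloc(sizeof *obj);       /* stays correct if the type of obj changes later */

    assert(sizeof *obj == sizeof(X));   /* both spellings yield the same size */
    if (obj != NULL) {
        obj->x = obj->y = obj->z = 0;   /* first real access, after the null check */
    }
    return obj;
}
```

The caller owns the result and releases it with free(). If obj were later changed to point to a different struct type, only the declaration would need editing.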

Do pointers support "array style indexing"?

(Self-answered Q&A - this matter keeps popping up)
I assume that the reader is aware of how pointer arithmetic works.
int arr[3] = {1,2,3};
int* ptr = arr;
...
*(ptr + i) = value;
Teachers/C books keep telling me I shouldn't use *(ptr + i) like in the above example, because "pointers support array style indexing" and I should be using ptr[i] = value; instead. No argument there - much easier to read.
But looking through the C standard, I find nothing called "array style indexing". In fact, the operator [] is not expecting either operand to be an array, but instead a pointer or an integer!
6.5.2.1 Array subscripting
Constraints
One of the expressions shall have type ‘‘pointer to complete object type’’, the other expression shall have integer type, and the result has type ‘‘type’’.
Why does the array subscripting operator not expect an array? Is the standard wrong? Is my teacher/C book confused?
You should indeed be using ptr[i] over *(ptr + i) for readability reasons. But apart from that, the [] operator is, strictly speaking, actually never used with an array operand.
Arrays, when used in an expression, "decay" into a pointer to the first element (with some exceptions). C17 6.3.2.1/3:
Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue.
Meaning that whenever you type arr[i], the operand arr gets replaced by a pointer to the first element inside that array. This is informally referred to as "array decaying". More info here: What is array decaying?
So whenever you use the [] operator, you use it on a pointer. Always.
The C standard says that this operator is guaranteed to be equivalent to the pointer arithmetic (C17 6.5.2.1/2):
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
So whenever we type arr[i], it actually gets silently replaced by *(arr+i), where arr is still a pointer to the first element.
And this is why the description you quoted tells you that either operand could be a pointer and the other an integer: it doesn't matter if we type *(arr+i) or *(i+arr), because pointer addition is commutative and both are equivalent code.
Which in turn allows us to write obfuscated "joke" code like i[arr], which is actually valid C and fully equivalent to arr[i]. But don't write such code in real applications.
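The equivalence can be checked with a short sketch. The helper name subscript_spellings_agree is invented for this example; the array and pointer are the ones from the question.

```c
#include <assert.h>

/* Hypothetical helper: writes through three equivalent spellings of element access. */
int subscript_spellings_agree(void)
{
    int arr[3] = {1, 2, 3};
    int *ptr = arr;                 /* arr converts to &arr[0] */

    ptr[0] = 10;                    /* the readable form */
    *(ptr + 1) = 20;                /* what ptr[1] is defined to mean */
    2[ptr] = 30;                    /* *(2 + ptr): valid, but only a curiosity */

    assert(&arr[1] == arr + 1);     /* & and the implied * cancel out */
    return arr[0] + arr[1] + arr[2];
}
```

All three writes land in arr, so the function returns 60.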

sizeof operator on array name when offset is added

I was curious about sizeof(arrayName + offset). It gives me sizeof(pointer). Though an array name is actually a constant pointer in C, sizeof(arrayName) gives the size of the array in bytes. So I guess the compiler treats (arrayName + offset) as a pure pointer even for sizeof(), and hence the only exception when using the array name would be sizeof(arrayName).
Is the behavior of sizeof(arrayName + offset) well defined? I am using the MinGW 32-bit compiler.
Also, is there any way to know the size of a partial array other than by using simple math like (sizeof(arrayName) - offset*sizeof(arrayName[0]))?
Isn't sizeof(arrayName) an inconsistent language construct in C/C++? For all other purposes, arrayName is treated as an address. And when we pass an array to a function, this behavior may lead to bugs, and beginners always have issues with it.
An array name is converted to a pointer to its first element in all but three cases:
The operand of the address-of operator &
The operand of the sizeof operator.
The operand of the _Alignof operator.
This is detailed in section 6.3.2.1 of the C standard:
3 Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
In the case of sizeof(arrayName + offset), the operand of sizeof is the expression arrayName + offset. The type of this expression is a pointer type, since arrayName is converted to a pointer in order to perform pointer arithmetic with offset. So the sizeof expression evaluates to the size of a pointer.
In the case of sizeof(arrayName), the operand of sizeof is an array, so it evaluates to the size of the array in bytes.
Both of these behaviors are well defined by the C standard.
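A minimal sketch of both cases follows, plus the pointer a callee receives. The names bytes_remaining and size_seen_by_callee, and the 10-element int array, are assumptions made for the example. The size of a partial array still comes from the arithmetic given in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes from element offset to the end of a 10-element int array. */
size_t bytes_remaining(size_t offset)
{
    int values[10] = {0};

    assert(offset <= sizeof values / sizeof values[0]);

    /* sizeof values is the whole array; sizeof(values + offset) is only a pointer. */
    assert(sizeof values == 10 * sizeof(int));
    assert(sizeof(values + offset) == sizeof(int *));

    /* The size of the tail still has to be computed by hand. */
    return sizeof values - offset * sizeof values[0];
}

/* A parameter written as int param[10] would be adjusted to int *,
   so inside a function sizeof only ever sees the pointer. */
size_t size_seen_by_callee(int *param)
{
    return sizeof param;
}
```

For example, bytes_remaining(3) yields 7 * sizeof(int). Passing an array to size_seen_by_callee yields sizeof(int *), which is why the element count is usually passed alongside the pointer.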

Is the operand of `sizeof` evaluated with a VLA?

An argument in the comments section of this answer prompted me to ask this question.
In the following code, bar points to a variable length array, so the sizeof is determined at runtime instead of compile time.
int foo = 100;
double (*bar)[foo];
The argument was about whether or not using sizeof evaluates its operand when the operand is a variable length array, making sizeof(*bar) undefined behavior when bar is not initialized.
Is it undefined behavior to use sizeof(*bar) because I'm dereferencing an uninitialized pointer? Is the operand of sizeof actually evaluated when the type is a variable length array, or does it just determine its type (how sizeof usually works)?
Edit: Everyone seems to be quoting this passage from the C11 draft. Does anyone know if this is the wording in the official standard?
Yes, this causes undefined behaviour.
In N1570 6.5.3.4/2 we have:
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
Now we have the question: is the type of *bar a variable length array type?
Since bar is declared as pointer to VLA, dereferencing it should yield a VLA. (But I do not see concrete text specifying whether or not it does).
Note: Further discussion could be had here, perhaps it could be argued that *bar has type double[100] which is not a VLA.
Supposing we agree that the type of *bar is actually a VLA type, then in sizeof *bar, the expression *bar is evaluated.
bar is indeterminate at this point. Now looking at 6.3.2.1/1:
if an lvalue does not designate an object when it is evaluated, the behavior is undefined
Since bar does not point to an object (by virtue of being indeterminate), evaluating *bar causes undefined behaviour.
Two other answers have already quoted N1570 6.5.3.4p2:
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
According to that paragraph from the standard, yes, the operand of sizeof is evaluated.
I'm going to argue that this is a defect in the standard; something is evaluated at run time, but the operand is not.
Let's consider a simpler example:
int len = 100;
double vla[len];
printf("sizeof vla = %zu\n", sizeof vla);
According to the standard, sizeof vla evaluates the expression vla. But what does that mean?
In most contexts, evaluating an array expression yields the address of the initial element -- but the sizeof operator is an explicit exception to that. We might assume that evaluating vla means accessing the values of its elements, which has undefined behavior since those elements have not been initialized. But there is no other context in which evaluation of an array expression accesses the values of its elements, and absolutely no need to do so in this case. (Correction: If a string literal is used to initialize an array object, the values of the elements are evaluated.)
When the declaration of vla is executed, the compiler will create some anonymous metadata to hold the length of the array (it has to, since assigning a new value to len after vla is defined and allocated doesn't change the length of vla). All that has to be done to determine sizeof vla is to multiply that stored value by sizeof (double) (or just to retrieve the stored value if it stores the size in bytes).
sizeof can also be applied to a parenthesized type name:
int len = 100;
printf("sizeof (double[len]) = %zu\n", sizeof (double[len]));
According to the standard, the sizeof expression evaluates the type. What does that mean? Clearly it has to evaluate the current value of len. Another example:
size_t func(void);
printf("sizeof (double[func()]) = %zu\n", sizeof (double[func()]));
Here the type name includes a function call. Evaluating the sizeof expression must call the function.
But in all of these cases, there's no actual need to evaluate the elements of the array object (if there is one), and no point in doing so.
sizeof applied to anything other than a VLA can be evaluated at compile time. The difference when sizeof is applied to a VLA (either an object or a type) is that something has to be evaluated at run time. But the thing that has to be evaluated is not the operand of sizeof; it's just whatever is needed to determine the size of the operand, which is never the operand itself.
The standard says that the operand of sizeof is evaluated if that operand is of variable length array type. That's a defect in the standard.
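The run-time part can be sketched as follows, assuming a compiler that supports VLAs (they are optional since C11, see __STDC_NO_VLA__). The names next_len, vla_object_bytes, vla_type_bytes and call_count are hypothetical. The array elements are never read, which is the point of the argument above.

```c
#include <assert.h>
#include <stddef.h>

static int calls = 0;               /* counts how often next_len() runs */

/* Hypothetical size function standing in for func() above. */
size_t next_len(void)
{
    ++calls;
    return 3;
}

/* sizeof applied to a VLA object: the length is only known at run time. */
size_t vla_object_bytes(size_t len)
{
    assert(len > 0);                /* a VLA length must be positive */
    double vla[len];                /* elements are never read or written */
    return sizeof vla;
}

/* sizeof applied to a VLA type name: the size expression is evaluated. */
size_t vla_type_bytes(void)
{
    return sizeof(double[next_len()]);
}

/* Reports how many times the size expression has run so far. */
int call_count(void)
{
    return calls;
}
```

Here vla_object_bytes(100) gives 100 * sizeof(double), and one call to vla_type_bytes() runs next_len() once, because the result depends on its return value.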
Getting back to the example in the question:
int foo = 100;
double (*bar)[foo] = NULL;
printf("sizeof *bar = %zu\n", sizeof *bar);
I've added an initialization to NULL to make it even clearer that dereferencing bar has undefined behavior.
*bar is of type double[foo], which is a VLA type. In principle, *bar is evaluated, which would have undefined behavior since bar does not point to an object. But again, there is no need to dereference bar. The compiler will generate some code when it processes the type double[foo], including saving the value of foo (or foo * sizeof (double)) in an anonymous variable. All it has to do to evaluate sizeof *bar is to retrieve the value of that anonymous variable. And if the standard were updated to define the semantics of sizeof consistently, it would be clear that evaluating sizeof *bar is well defined and yields 100 * sizeof (double) without having to dereference bar.
Indeed, the Standard seems to imply that the behaviour is undefined.
Re-quoting N1570 6.5.3.4/2:
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
I think the wording from the Standard is confusing: "the operand is evaluated" does not mean that *bar will be evaluated. Evaluating *bar does not in any way help compute its size. sizeof(*bar) does need to be computed at run time, but the code generated for this has no need to dereference bar; it will more likely retrieve the size information from a hidden variable holding the result of the size computation at the time of bar's instantiation.

Resources