I was curious about sizeof(arrayName + offset). It gives me sizeof(pointer). Although an array name is (as I understand it) a constant pointer in C, sizeof(arrayName) gives the size of the whole array in bytes. So I guess the compiler treats (arrayName + offset) as a plain pointer even inside sizeof(), and hence the only exception when using the array name is sizeof(arrayName).
Is the behavior of sizeof(arrayName + offset) well defined, or is it compiler-specific? I am using the 32-bit MinGW compiler.
Also, is there any way to get the size of part of an array other than simple math like sizeof(arrayName) - offset*sizeof(arrayName[0])?
Isn't sizeof(arrayName) an inconsistent language construct in C/C++? For every other purpose, arrayName is treated as an address. And when we pass an array to a function, this behavior can lead to bugs; beginners often struggle with it.
An array name is converted to a pointer to its first element in all but three cases:
The operand of the address-of operator &
The operand of the sizeof operator.
The operand of the _Alignof operator.
This is detailed in section 6.3.2.1 of the C standard, paragraph 3:
Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
In the case of sizeof(arrayName + offset), the operand of sizeof is the expression arrayName + offset. The type of this expression is a pointer type, since arrayName is converted to a pointer in order to perform pointer arithmetic with offset. So the sizeof expression evaluates to the size of a pointer.
In the case of sizeof(arrayName), the operand of sizeof is an array, so it evaluates to the size of the array in bytes.
Both of these behaviors are well defined by the C standard.
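A small sketch of both cases, plus one way to compute the size of the remaining part of an array (the question's "partial array" size). The array demo and the names whole_bytes, offset_expr_bytes, tail_bytes_of_demo and TAIL_BYTES are hypothetical, chosen only for illustration. Note that TAIL_BYTES is just the question's subtraction wrapped in a macro; it only works where the real array is in scope, not on a pointer parameter.

```c
#include <stddef.h>

/* Hypothetical example array; any complete array type behaves the same way. */
int demo[10];

/* Operand of sizeof is the array itself: no decay, so 10 * sizeof(int). */
size_t whole_bytes(void)
{
    return sizeof demo;
}

/* Operand is demo + 3: demo decays to int *, so the result is sizeof(int *).
   (Some compilers warn about exactly this, which is a hint it is rarely intended.) */
size_t offset_expr_bytes(void)
{
    return sizeof(demo + 3);
}

/* Hypothetical helper: bytes from element `off` to the end of `arr`.
   Valid only when `arr` is an actual array, never a pointer. */
#define TAIL_BYTES(arr, off) (sizeof(arr) - (size_t)(off) * sizeof((arr)[0]))

size_t tail_bytes_of_demo(size_t off)
{
    return TAIL_BYTES(demo, off);
}
```

Called as tail_bytes_of_demo(3), this yields 7 * sizeof(int), the space occupied by demo[3] through demo[9].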
(Self-answered Q&A - this matter keeps popping up)
I assume that the reader is aware of how pointer arithmetic works.
int arr[3] = {1,2,3};
int* ptr = arr;
...
*(ptr + i) = value;
Teachers/C books keep telling me I shouldn't use *(ptr + i) like in the above example, because "pointers support array style indexing" and I should be using ptr[i] = value; instead. No argument there - much easier to read.
But looking through the C standard, I find nothing called "array style indexing". In fact, the operator [] is not expecting either operand to be an array, but instead a pointer or an integer!
6.5.2.1 Array subscripting
Constraints
One of the expressions shall have type ‘‘pointer to complete object type’’, the other expression shall have integer type, and the result has type ‘‘type’’.
Why does the array subscripting operator not expect an array? Is the standard wrong? Is my teacher/C book confused?
You should indeed be using ptr[i] over *(ptr + i) for readability reasons. But apart from that, the [] operator is, strictly speaking, actually never used with an array operand.
Arrays, when used in an expression, always "decay" into a pointer to the first element (with some exceptions). C17 6.3.2.1/3:
Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue.
Meaning that whenever you type arr[i], the operand arr gets replaced by a pointer to the first element inside that array. This is informally referred to as "array decaying". More info here: What is array decaying?
So whenever you use the [] operator, you use it on a pointer. Always.
The C standard says that this operator is guaranteed to be equivalent to the pointer arithmetic (C17 6.5.2.1/2):
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
So whenever we type arr[i], it actually gets silently replaced by *(arr+i), where arr has already decayed into a pointer to the first element.
And this is why the description you quoted tells you that either operand could be a pointer and the other an integer. Because obviously it doesn't matter if we type *(arr+i) or *(i+arr) - that's equivalent code.
Which in turn allows us to write obfuscated "joke" code like i[arr], which is actually valid C and fully equivalent to arr[i]. But don't write such code in real applications.
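A minimal sketch of the equivalence described above, with hypothetical helper names. Every form reads the same element, because each one is defined as *(pointer + integer) and the array operand has already decayed to a pointer by the time [] sees it.

```c
#include <stddef.h>

/* Hypothetical helper: reads element i of arr three ways. Per C17 6.5.2.1/2
   they are the same expression, so they must agree. Inside the function,
   arr is already a pointer (array parameters are adjusted to pointers). */
int subscript_forms_agree(const int arr[], size_t i)
{
    int a = arr[i];         /* ordinary subscript              */
    int b = *(arr + i);     /* what arr[i] is defined to mean  */
    int c = i[arr];         /* *(i + arr): valid, but obscure  */
    return a == b && b == c;
}

/* Hypothetical demo: [] on a decayed array name and on a plain pointer. */
int local_array_demo(void)
{
    int arr[3] = {1, 2, 3};
    int *ptr = arr;          /* arr decays to &arr[0]            */
    ptr[2] = 30;             /* same as *(ptr + 2) = 30          */
    return arr[2] + 2[arr];  /* both read the element just set: 60 */
}
```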
I was recently reading through some source code and read the following at the beginning of a function:
char buffer[ 1000 ];
char *pointer;
pointer = &buffer[0];
I guess I don't understand this. Why not just write:
pointer = buffer;
Is there some secret meaning I am missing here?
Some people may find it easier to understand depending on the occasion.
Someone might say that when you use pointer = buffer; you intend to use the pointer as the buffer, while if you use pointer = &buffer[0]; you intend to use the pointer as a pointer to a single item of the buffer.
It just happens that those two cases point to the same address.
Both expressions give the same result value. So in your given case it is mainly a question of preferred style.
But there is a difference if you use the expressions, for example, in a function call. A static code analysis tool should complain about
memcpy(&buffer[0], src, 2 * sizeof(buffer[0]));
because you state that you are writing two elements into one array element. But the tool should not complain about
memcpy(&buffer, src, 2 * sizeof(buffer[0]));
or
memcpy(buffer, src, 2 * sizeof(buffer[0]));
because you now say that you want to write into the complete array.
Relevant parts in the standard:
6.3.2.1 Lvalues, arrays, and function designators
3 Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type “array of type” is converted to an expression with type “pointer to type” that points to the initial element of the array object and is not an lvalue. [...]
6.5.3.2 Address and indirection operators
Semantics (paragraph 3): The unary & operator yields the address of its operand. If the operand has type “type”, the result has type “pointer to type”. [...] Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. Otherwise, the result is a pointer to the object or function designated by its operand.
They're absolutely the same. I also prefer the simpler version
pointer = array; // implicit conversion from array to address of its 1st element
pointer = &array[0]; // explicitly set pointer to the address of array's 1st element
In some cases, depending on how you're going to use the pointer, the explicit version may be more self-documented.
From the C standard, 6.5.2.1:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
So,
&buffer[0]
can be written as
&(*(buffer + 0))
Note that the operator & is used to get the address and the operator * is used for dereferencing. These operators cancel the effect of each other when used one after another. So, it is equivalent to
(buffer + 0)
which is nothing but
buffer
So, &buffer[0] is equivalent to buffer.
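The same cancellation of & and * holds for every index, not just 0. A sketch with a hypothetical helper that checks &arr[i] == arr + i for i from 0 up to and including n; the one-past-the-end case is allowed because, per 6.5.3.2/3, neither the & nor the implied * is evaluated, so nothing is dereferenced.

```c
#include <stddef.h>

/* Hypothetical helper: verifies &arr[i] == arr + i for every i in [0, n].
   &arr[i] is &*(arr + i), and & and * cancel, so no element is accessed;
   that is why i == n (one past the end) is still fine in C. */
int address_of_subscript_matches(char *arr, size_t n)
{
    for (size_t i = 0; i <= n; i++) {
        if (&arr[i] != arr + i)   /* &arr[i] == &*(arr + i) == arr + i */
            return 0;
    }
    return 1;
}
```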
This question is inspired by answers to this question.
The following code has potential for undefined behavior:
uint64_t arr[1]; // Uninitialized
if(arr[0] == 0) {
    // ...
}
The C standard specifies that an uninitialized variable with automatic storage duration has an indeterminate value, which is either an unspecified value or a trap representation. It also specifies that the uintN_t types have no padding bits and that their size and range of values are well defined, so a trap representation for uint64_t is not possible.
So I conclude that the uninitialized value itself does not cause undefined behavior. What about reading it?
6.3.2.1 Lvalues, arrays, and function designators
...
(paragraph 2) Except when it is the operand of the sizeof operator, the _Alignof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue); this is called lvalue conversion. ... [irrelevant text removed] ... If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
(paragraph 3) Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
Question: Does subscripting array count as taking the address of an object?
The following text seems to imply that subscripting an array requires conversion to a pointer, which seems impossible to do without taking its address:
6.5.2.1 Array subscripting
Constraints
One of the expressions shall have type ‘‘pointer to complete object type’’, the other expression shall have integer type, and the result has type ‘‘type’’.
Semantics
A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).
This makes §6.3.2.1 paragraph 3 seem weird. How could an array have register storage class at all, if subscripting requires conversion to a pointer?
Yes, array subscripting counts as taking the address, as per the part you quoted in 6.5.2.1. The expression E1 must have its address taken.
Therefore the special case of UB in 6.3.2.1 does not apply to array indexing. If array indices are used, it is not relevant whether the array could have been declared with register storage class or not (an object that has its address taken cannot be declared with register storage class).
You are correct in assuming that reading an uninitialized stdint.h type with indeterminate value, which has its address taken, does not invoke undefined behavior (guaranteed by C11 7.20.1.1), but merely unspecified behavior. The value could be anything and it can be non-deterministic between several reads, but it cannot be a trap.
"Reading an uninitialized variable is always UB" is a widespread but incorrect myth.
Further information with normative sources in this answer.
I know of only one case: when arrays are passed to a function, they decay into a pointer. Can anybody elaborate on all the cases in which arrays decay to pointers?
C 2011 6.3.2.1 3:
Except when it is the operand of the sizeof operator,… or the unary & operator, or is a string literal used to initialize an array, an expression that has type “array of type” is converted to an expression with type “pointer to type” that points to the initial element of the array object and is not an lvalue.
In other words, arrays usually decay to pointers. The standard lists the cases when they do not.
One might think that arrays act as arrays when you use them with subscripts, such as a[3]. However, what happens here is actually:
a is converted to a pointer.
The subscript operator acts on the pointer and the subscript to produce an lvalue designating the object. (In particular, a[3] is evaluated as *((a)+(3)). That is, a is converted to a pointer, 3 is added to the pointer, and the * operator is applied.)
Note: The C 2011 text includes “the _Alignof operator.” This was corrected in the C 2018 version of the standard, and I have elided it from the quote above. The operand of _Alignof is always a type; you cannot actually apply it to an object. So it was a mistake for the C 2011 standard to include it in 6.3.2.1 3.
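A sketch of the cases discussed above, using hypothetical helper names. One nuance worth keeping in mind: inside the called function, an array-typed parameter such as int a[10] is adjusted to int * by the rules for function declarators (C11 6.7.6.3), while the argument expression decays at the call site under 6.3.2.1 3. The other helpers show the listed exceptions where no decay happens: sizeof, unary &, and a string literal initializing an array.

```c
#include <stddef.h>

/* The parameter is declared as an array but is adjusted to int *,
   so sizeof gives the pointer size (compilers often warn here). */
size_t param_size(int a[10])
{
    return sizeof a;              /* sizeof(int *), not 10 * sizeof(int) */
}

/* a[3] spelled out as the standard defines it: *((a)+(3)). */
int fourth(const int *a)
{
    return *((a) + (3));
}

/* Exception: a string literal initializing an array is copied, not decayed. */
size_t literal_init_size(void)
{
    char s[] = "hello";
    return sizeof s;              /* 6, including the terminating '\0' */
}

/* Exception: unary & yields int (*)[10]; the array does not decay. */
size_t address_of_size(void)
{
    int arr[10];
    int (*p)[10] = &arr;
    return sizeof *p;             /* 10 * sizeof(int) */
}
```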