Do pointers support "array style indexing"? - c

(Self-answered Q&A - this matter keeps popping up)
I assume that the reader is aware of how pointer arithmetic works.
int arr[3] = {1,2,3};
int* ptr = arr;
...
*(ptr + i) = value;
Teachers/C books keep telling me I shouldn't use *(ptr + i) like in the above example, because "pointers support array style indexing" and I should be using ptr[i] = value; instead. No argument there - much easier to read.
But looking through the C standard, I find nothing called "array style indexing". In fact, the [] operator does not expect either operand to be an array. Instead, it expects one operand to be a pointer and the other an integer!
6.5.2.1 Array subscripting
Constraints
One of the expressions shall have type ‘‘pointer to complete object type’’, the other expression shall have integer type, and the result has type ‘‘type’’.
Why does the array subscripting operator not expect an array? Is the standard wrong? Is my teacher/C book confused?

You should indeed be using ptr[i] over *(ptr + i) for readability reasons. But apart from that, the [] operator is, strictly speaking, actually never used with an array operand.
Arrays, when used in an expression, "decay" into a pointer to the first element, with a few exceptions. C17 6.3.2.1/3:
Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue.
Meaning that whenever you type arr[i], the operand arr gets replaced by a pointer to the first element inside that array. This is informally referred to as "array decaying". More info here: What is array decaying?
So whenever you use the [] operator, you use it on a pointer. Always.
The C standard says that this operator is guaranteed to be equivalent to the pointer arithmetic (C17 6.5.2.1/2):
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
So whenever we type arr[i], it is treated exactly as if we had written *(arr+i), where arr has already decayed to a pointer to the first element.
And this is why the constraint you quoted says that one operand must be a pointer and the other an integer, without saying which is which: pointer addition is commutative, so *(arr+i) and *(i+arr) are equivalent code.
This in turn allows us to write obfuscated "joke" code like i[arr], which is actually valid C and fully equivalent to arr[i]. But don't write such code in real applications.

Related

Relationship matrix-pointer

I was wondering why in C this is possible:
int MATRICE[20][20];
int *p; p = MATRICE[19]; is equal to p = &(MATRICE[19][0]);
I tried to interpret it this way: I consider the label "MATRICE" as a constant pointer and as an array of pointers, so when it comes to the 20th pointer (MATRICE[19]), it points at the same thing that &MATRICE[19][0] points to.
Is my idea correct?
Arrays used in expressions with rare exceptions are converted to pointers to their first elements.
From the C Standard (6.3.2.1 Lvalues, arrays, and function designators)
3 Except when it is the operand of the sizeof operator or the unary &
operator, or is a string literal used to initialize an array, an
expression that has type ‘‘array of type’’ is converted to an
expression with type ‘‘pointer to type’’ that points to the initial
element of the array object and is not an lvalue. If the array object
has register storage class, the behavior is undefined.
This expression
MATRICE[19]
yields a one-dimensional array of type int[20], which is implicitly converted to a pointer to its first element, of type int *, when it is used as the right-hand side of the assignment in this code snippet
int *p; p = MATRICE[19];

sizeof operator on array name when offset is added

I was curious about sizeof(arrayName + offset). It gives me sizeof(pointer). Though an array name is actually a constant pointer in C, sizeof(arrayName) gives the size of the array in bytes. So I guess the compiler treats (arrayName + offset) as a pure pointer even for sizeof(), and hence the only exception while using the array name would be sizeof(arrayName).
Is this behavior sizeof(arrayName + offset) well defined by the compiler? I am using MinGW 32 bit compiler.
Also is there any way we can know the size of partial array other than by using simple math like (sizeof(arrayName) - offset*sizeof(arrayName[0]))?
Is sizeof(arrayName) not an inconsistent language construct in C/C++? For all other purposes, arrayName is treated as an address. And when we pass an array to a function, this behavior may lead to bugs, and beginners always have issues with this.
An array name is converted to a pointer to its first element in all but three cases:
The operand of the address-of operator &
The operand of the sizeof operator.
The operand of the _Alignof operator.
This is detailed in section 6.3.2.1 of the C standard:
3 Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string
literal used to initialize an array, an expression that has
type "array of type" is converted to an expression with type "pointer
to type" that points to the initial element of the array
object and is not an lvalue. If the array object has
register storage class, the behavior is undefined.
In the case of sizeof(arrayName + offset), the operand of sizeof is the expression arrayName + offset. The type of this expression is a pointer type, since arrayName is converted to a pointer in order to perform pointer arithmetic with offset. So the sizeof expression evaluates to the size of a pointer.
In the case of sizeof(arrayName), the operand of sizeof is an array, so it evaluates to the size of the array in bytes.
Both of these behaviors are well defined by the C standard.

What does address of a, which is an array, returns?

I thought when you try to get the address of an array, it returns the address of the first element it holds.
int *j;
int a[5]={1,5,4,7,8};
Now j=&a[0]; works perfectly fine.
Even j=a also does the same function.
But when I do j=&a it throws an error saying: cannot convert 'int (*)[5]' to 'int*' in assignment
Why does it happen? &a should be the first element of the array a, so it should give &a[0].
But instead it throws an error. Can somebody explain why?
The C standard says the following regarding how arrays are used in expressions (taken from C99 6.3.2.1/3 "Lvalues, arrays, and function designators"):
Except when it is the operand of the sizeof operator or the unary &
operator, or is a string literal used to initialize an array, an
expression that has type ‘‘array of type’’ is converted to an
expression with type ‘‘pointer to type’’ that points to the initial
element of the array object
This is commonly known as "arrays decay to pointers".
So the sub-expression a in the following larger expressions evaluates to a pointer to int:
j=&a[0]
j=a
In the simpler expression, j=a, that pointer is simply assigned to j.
In the more complex expression, j=&a[0], the 'index' operator [] is applied to the pointer (which is an operation equivalent to *(a + 0)) and the 'address-of' operator is applied to that, resulting in another pointer to int that gets assigned to j.
In the expression j=&a, the address-of operator is applied directly to the array name, and we hit one of the exceptions in the above quoted clause: "Except when it is the operand of ... the unary & operator".
Now when we look at what the standard says about the unary & (address-of) operator (C99 6.5.3.2/3 "Address and indirection operators"):
The unary & operator returns the address of its operand. If the
operand has type "type", the result has type "pointer to type".
Since a has type "array of 5 int" (int [5]), the result of applying & to it directly has type "pointer to array of 5 int" (int (*)[5]), which is not assignable to int*.
The types of a and &a are not the same, even though both evaluate to the same address, i.e., the base address of the array a.
j = a;
The array name a here gets converted to a pointer to its first element.
Try to see what values you get via these statements to understand where the difference lies:
printf("%p\n", (void *)(a + 1));
printf("%p\n", (void *)(&a + 1));
C is a strongly typed language. An assignment such as j = &a; is allowed only if both sides have the same type or the compiler can implicitly convert the right-hand side to the type of j. In your case, the type of j is int * while the type of &a is int (*)[5]. There is no implicit conversion from int (*)[5] to int *, and the compiler is telling you exactly that.
a is an array of 5 ints. The pointer to a is a pointer to an array of five integers, or int (*)[5]. This is not compatible with an int * because of pointer arithmetic: If you increment a variable of type int *, the address in the variable increases by 4 (assuming 4 byte integers), so that it points to the next integer. If you increment a variable that points to an array of 5 integers, the address in the variable increases by 20 (again assuming 4 byte integers), so that it points to the next array of five integers.
Perhaps what's confusing is that the value given by a and &a is the same, as you said. The value is the same but the type is different, and the difference is most obvious when you do arithmetic on the pointers.
I hope that helps.

Does the bracket [] operator only have a single use?

I initially thought that it has a different use for pointers and for arrays. In the former case, it adds whatever is in brackets to the pointer and then dereferences the sum; in the latter case it would just yield the ith element of an array.
Then I realized that an array variable returns the pointer to the first element, so the operator does the same thing in each case: offset and dereference.
Does the bracket [] operator indeed only have a single use in C?
[] is called array subscript operator, but syntactically it's used on a pointer. An array is converted to a pointer to the first element in this usage (and many others). So, yes, [] is the same for arrays and pointers.
C11 §6.5.2.1 Array subscripting
Constraints
One of the expressions shall have type ‘‘pointer to complete object type’’, the other
expression shall have integer type, and the result has type ‘‘type’’.
Semantics
A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).
Whether it does "one thing" depends on what you think "one thing" means.
In C, the operator is defined like so
e1[e2] means *(e1+e2)
That's it. One thing. Or is it? Suppose a is an array and i is an integer. We can write:
a[3]
a[i]
3[a]
i[a]
and suppose p is a pointer and i is an integer. We can write
p[3]
p[i]
3[p]
i[p]
Arrays or pointers. Two things? Not really. You know that when we use the plus operator where one of the two operands is "an array" you are really doing pointer arithmetic.
The second part of your question - can it be used for things other than pointer arithmetic - is basically no in C, but yes in C++, because in C++ we can overload this operator. However sometimes you will see [] in type expressions, but that is probably not what you are asking about because in that case, we aren't really using it as an operator (we're using it as a type operator, which is different).
