Does the bracket [] operator only have a single use? - c

I initially thought that it has a different use for pointers and for arrays. In the former case, it adds whatever is in brackets to the pointer and then dereferences the sum; in the latter case it would just yield the ith element of an array.
Then I realized that an array variable returns the pointer to the first element, so the operator does the same thing in each case: offset and dereference.
Does the bracket [] operator indeed only have a single use in C?

[] is called the array subscript operator, but its operand is actually a pointer. In this usage (and many others), an array is converted to a pointer to its first element. So, yes, [] is the same for arrays and pointers.
C11 §6.5.2.1 Array subscripting
Constraints
One of the expressions shall have type ‘‘pointer to complete object type’’, the other
expression shall have integer type, and the result has type ‘‘type’’.
Semantics
A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).
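A minimal sketch of that equivalence, assuming a hypothetical helper named same_element (the name and the sample array are illustrative, not from the standard). It checks that subscripting the array, subscripting a pointer to its first element, and the explicit offset-and-dereference form all designate the same object:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when every spelling of "element i"
   designates the same object. */
bool same_element(size_t i)
{
    int arr[4] = {10, 20, 30, 40};
    int *p = arr;                  /* arr converts to &arr[0] here */

    assert(i < 4);                 /* stay inside the array */

    return &arr[i] == &p[i]        /* array operand vs pointer operand */
        && &p[i] == p + i          /* the address of *(p + i) is p + i */
        && arr[i] == *(arr + i);   /* same value through either form */
}
```

Calling same_element with any index from 0 to 3 should return true.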

Whether it does "one thing" depends on what you think "one thing" means.
In C, the operator is defined like so
e1[e2] means *(e1+e2)
That's it. One thing. Or is it? Suppose a is an array and i is an integer. We can write:
a[3]
a[i]
3[a]
i[a]
and suppose p is a pointer and i is an integer. We can write
p[3]
p[i]
3[p]
i[p]
Arrays or pointers. Two things? Not really. When one operand of the plus operator is "an array", the array is converted to a pointer and you are really doing pointer arithmetic.
The second part of your question, whether it can be used for things other than pointer arithmetic, is basically no in C but yes in C++, because in C++ we can overload this operator. You will also see [] in type expressions, but that is probably not what you are asking about: there it is not an operator acting on operands but part of the declaration syntax.

Related

Do pointers support "array style indexing"?

(Self-answered Q&A - this matter keeps popping up)
I assume that the reader is aware of how pointer arithmetic works.
int arr[3] = {1,2,3};
int* ptr = arr;
...
*(ptr + i) = value;
Teachers/C books keep telling me I shouldn't use *(ptr + i) like in the above example, because "pointers support array style indexing" and I should be using ptr[i] = value; instead. No argument there - much easier to read.
But looking through the C standard, I find nothing called "array style indexing". In fact, the operator [] is not expecting either operand to be an array, but instead a pointer or an integer!
6.5.2.1 Array subscripting
Constraints
One of the expressions shall have type ‘‘pointer to complete object type’’, the other expression shall have integer type, and the result has type ‘‘type’’.
Why does the array subscripting operator not expect an array? Is the standard wrong? Is my teacher/C book confused?
You should indeed be using ptr[i] over *(ptr + i) for readability reasons. But apart from that, the [] operator is, strictly speaking, actually never used with an array operand.
Arrays, when used in an expression, "decay" into a pointer to the first element, with a few exceptions. C17 6.3.2.1/3:
Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue.
Meaning that whenever you type arr[i], the operand arr gets replaced by a pointer to the first element inside that array. This is informally referred to as "array decaying". More info here: What is array decaying?
So whenever you use the [] operator, you use it on a pointer. Always.
The C standard says that this operator is guaranteed to be equivalent to the pointer arithmetic (C17 6.5.2.1/2):
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
So whenever we type arr[i], it actually gets silently replaced by *(arr+i), where arr has already been converted to a pointer to the first element.
And this is why the description you quoted tells you that either operand could be a pointer and the other an integer. Because obviously it doesn't matter if we type *(arr+i) or *(i+arr) - that's equivalent code.
Which in turn allows us to write obfuscated "joke" code like i[arr], which is actually valid C and fully equivalent to arr[i]. But don't write such code in real applications.
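As a small illustration of that last point, here is a sketch with a hypothetical helper, reversed_subscript, that reads an element using the swapped spelling. It is only meant to show that the compiler accepts it and that it yields the same element, not to recommend the style:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reads element i of arr using the reversed
   spelling i[arr], which is defined as *((i) + (arr)). */
int reversed_subscript(const int *arr, size_t i)
{
    assert(arr != NULL);
    return i[arr];   /* legal but obfuscated; prefer arr[i] */
}
```

For an array int v[3] = {1, 2, 3}, reversed_subscript(v, 2) reads the same element as v[2].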

sizeof operator on array name when offset is added

I was curious about sizeof(arrayName + offset). It gives me sizeof(pointer). I understand an array name to be a constant pointer in C, yet sizeof(arrayName) gives the size of the array in bytes. So I guess the compiler treats (arrayName + offset) as a pure pointer even for sizeof(), and hence the only exception when using the array name would be sizeof(arrayName).
Is the behavior of sizeof(arrayName + offset) well defined? I am using the MinGW 32-bit compiler.
Also, is there any way to get the size of a partial array other than simple math like (sizeof(arrayName) - offset*sizeof(arrayName[0]))?
Isn't sizeof(arrayName) an inconsistent language construct in C/C++? For all other purposes, arrayName is treated as an address. When we pass an array to a function, this behavior may lead to bugs, and beginners always have issues with it.
An array name is converted to a pointer to its first element in all but three cases:
The operand of the address-of operator &
The operand of the sizeof operator.
The operand of the _Alignof operator.
This is detailed in section 6.3.2.1 of the C standard:
3 Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
In the case of sizeof(arrayName + offset), the operand of sizeof is the expression arrayName + offset. The type of this expression is a pointer type, since arrayName is converted to a pointer in order to perform pointer arithmetic with offset. So the sizeof expression evaluates to the size of a pointer.
In the case of sizeof(arrayName), the operand of sizeof is an array, so it evaluates to the size of the array in bytes.
Both of these behaviors are well defined by the C standard.
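A short sketch of both behaviors, plus the "simple math" for a partial array from the question. The names COUNT, bytes_from and size_of_sum are assumptions made up for this illustration. Note that some compilers warn about sizeof applied to a decayed array, which is exactly the situation being shown:

```c
#include <assert.h>
#include <stddef.h>

#define COUNT 8   /* assumed array length for the illustration */

/* Bytes remaining in a COUNT-element int array starting at index offset.
   The parameter is a pointer to the whole array so that sizeof still
   sees the array type; a plain int * would only yield a pointer size. */
size_t bytes_from(int (*arr)[COUNT], size_t offset)
{
    assert(offset <= COUNT);
    return sizeof *arr - offset * sizeof((*arr)[0]);
}

/* sizeof applied to "array + offset" measures a pointer, not the array. */
size_t size_of_sum(void)
{
    int arr[COUNT];
    return sizeof(arr + 1);   /* arr decays first, so this is sizeof(int *) */
}
```

With this sketch, bytes_from(&a, 3) on an 8-element array gives the size of the last five elements, while size_of_sum() gives the pointer size on the platform in use.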

Accessing variables in allocated memory

Let's say I want to allocate memory for 3 integers:
int *pn = malloc(3 * sizeof(*pn));
Now, to assign values to them, I do:
pn[0] = 5550;
pn[1] = 11;
pn[2] = 70000;
To access the 2nd value I do:
pn[1]
But the [n] operator is just a shortcut for *(a+n). That would mean I access the byte n positions after a. But an int is 4 bytes long, so shouldn't I do
*(a+sizeof(*a)*n)
instead? How does it work?
No, the compiler takes care of that. There are special rules in pointer arithmetic, and that is one of them.
If you really only want to increment it by one byte, you have to cast the pointer to a pointer to a type which is one byte long (for example char).
Good question, but C will automatically multiply the offset by the size of the pointed-to type. In other words, when you access
p[n]
for a pointer declared as
T *p;
you will implicitly access the byte address (char *)p + sizeof(T) * n.
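A minimal sketch of that scaling, using a hypothetical helper byte_distance_int that measures how many bytes separate p from p + n for an int pointer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: distance in bytes between p and p + n.
   Both pointers must stay within (or one past the end of) the same
   array for the arithmetic to be defined. */
size_t byte_distance_int(const int *p, size_t n)
{
    assert(p != NULL);
    const char *start = (const char *)p;
    const char *end   = (const char *)(p + n);  /* compiler scales n */
    return (size_t)(end - start);               /* n * sizeof(int) */
}
```

For an int array v, byte_distance_int(v, 1) equals sizeof(int), not 1.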
We can use the C99 standard to find out what is going on:
6.5.2.1 Array subscripting
Constraints
1 One of the expressions shall have type ‘‘pointer to object type’’, the other expression shall
have integer type, and the result has type ‘‘type’’.
Semantics
2 A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
And from 6.5.6/8 (Additive operators), about the rules for the + operator:
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object. If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
Thus, all of this applies to your case: pn[1] works exactly as you wrote it, and you don't need any special construction or extra dereferencing, because pointer arithmetic does the scaling for you:
pn[1] => *((pn)+(1))
Or, in terms of byte pointers (to simplify the description of what is going on), this operation is similar to:
pn[1] => *(int *)(((char *)pn) + (1 * sizeof(*pn)))
Moreover, you can access this element with 1[pn] and the result will be the same.
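To make this concrete, here is a sketch that stores the question's three values in allocated memory and reads the second one back through each spelling. The function name second_value_agrees and its return convention are assumptions for illustration only:

```c
#include <stdlib.h>

/* Hypothetical demo: returns 1 if every spelling reads the same value,
   0 if they differ, and -1 if the allocation fails. */
int second_value_agrees(void)
{
    int *pn = malloc(3 * sizeof *pn);
    if (pn == NULL)
        return -1;

    pn[0] = 5550;
    pn[1] = 11;
    pn[2] = 70000;

    int by_subscript = pn[1];
    int by_pointer   = *(pn + 1);
    /* Byte-wise form: step sizeof *pn bytes, then convert back to int *. */
    int by_bytes     = *(int *)((char *)pn + 1 * sizeof *pn);

    int ok = by_subscript == 11 && by_pointer == 11
          && by_bytes == 11 && 1[pn] == 11;

    free(pn);
    return ok;
}
```

Only the byte-wise form needs an explicit sizeof; the other spellings leave the scaling to the compiler.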
You should not. The rules for what happens when you add an int to a pointer are not obvious, so rather than trusting intuition, read what the language standards say about such cases, for example a reference on pointer arithmetic for C or for C++.
In short: a non-void pointer is "measured" in units of the size of the pointed-to type.

What is the proper input datatype for the operator []?

When accessing an array we use the operator [] like so:
int a[5];
...
a[b] = 12;
What is the proper data type for the variable b above?
I've found that a[b] is equivalent to *(a + b), which makes me think that I would want b to be void* or size_t but, I'm not sure.
From the C standard (ISO/IEC 9899:TC2), Sec. 6.5.2.1 Array subscripting:
A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
One of a and b must be a pointer, and the other must be any integer type. A proof follows.
Since a[b] is identical to (*((a)+(b))) per C 2011 (n1570) 6.5.2.1 2, a and b may be any types such that the latter expression is defined.
Per 6.5.3.2 2, the operand of the unary * operator must have pointer type. Therefore, the result of (a)+(b) must have pointer type.
Per 6.5.6, the binary + operator accepts various combinations of types, but the only one that yields a pointer type is the combination of a pointer and an integer, as described in 6.5.6 8.
According to 6.5.6 8, an integer may be added to a pointer, and the result has the type of the pointer operand. Clause 6.5.6 makes no distinction about the order of the operands of +, so they may be in either order. Thus, either of a and b may be the pointer, and the other will be the integer.
b is known as the index, which is a number, so you can treat it as an integer. a[b] accesses the element at index b of array a (counting from zero).
*(a + b) gives the value at byte address (char *)a + b * sizeof(element type of the array).
Several types work, because any integer type is accepted.
A char, short or unsigned short is a valid index, and so is an int or unsigned int. No conversion to one particular type is required for the pointer addition.
Even if a[b] is the same as *(a + b), b cannot be a pointer, because a is already a pointer. The binary operator + doesn't accept two pointers.
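A small sketch of the "any integer type" point, using a hypothetical function sum_with_mixed_indices. The index is an unsigned char rather than a plain char only because some compilers warn about plain char subscripts, whose signedness is implementation-defined:

```c
#include <stddef.h>

/* Hypothetical demo: the same array read with indices of several
   integer types. Returns the sum of the values read. */
int sum_with_mixed_indices(void)
{
    int a[5] = {10, 20, 30, 40, 50};

    unsigned char c = 1;
    short         s = 2;
    unsigned long u = 3;
    size_t        z = 4;
    ptrdiff_t     d = -2;   /* negative is fine relative to a pointer
                               that is not at the start of the array */
    int *mid = &a[2];

    /*     20   + 30   + 40   + 50   + 10 */
    return a[c] + a[s] + a[u] + a[z] + mid[d];
}
```

A negative index is only valid when the resulting element still lies inside the array, as mid[-2] does here.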

Resources