Why is this code involving arrays and pointers behaving as it does? - c

I was asked what the output of the following code is:
int a[5] = { 1, 3, 5, 7, 9 };
int *p = (int *)(&a + 1);
printf("%d, %d", *(a + 1), *(p - 1));
1) 3, 9
2) Error
3) 3, 1
4) 2, 1
The answer is option 1.
It is easy to see that *(a + 1) is 3.
But what about int *p = (int *)(&a + 1); and *(p - 1)?

The answer to this could be either "1) 3,9" or "2) Error" (or more specifically undefined behavior) depending on how you read the C standard.
First, let's take this:
&a + 1
The & operator takes the address of the array a giving us an expression of type int(*)[5] i.e. a pointer to an array of int of size 5. Adding 1 to this treats the pointer as pointing to the first element of an array of int [5], with the resulting pointer pointing to just after a.
Also, even though &a points to a singular object (in this case an array of type int [5]) we can still add 1 to this address. This is valid because 1) a pointer to a singular object can be treated as a pointer to the first element of an array of size 1, and 2) a pointer may point to one element past the end of an array.
Section 6.5.6p7 of the C standard states the following regarding treating a pointer to an object as a pointer to the first element of an array of size 1:
For the purposes of these operators, a pointer to an object
that is not an element of an array behaves the same as a pointer
to the first element of an array of length one with the type of the
object as its element type.
And section 6.5.6p8 says the following regarding allowing a pointer to point to just past the end of an array:
When an expression that has integer type is added to or
subtracted from a pointer, the result has the type of the pointer
operand. If the pointer operand points to an element of an array
object, and the array is large enough, the result points to an element
offset from the original element such that the difference of the
subscripts of the resulting and original array elements equals the
integer expression. In other words, if the expression P points to the
i-th element of an array object, the expressions (P)+N
(equivalently, N+(P)) and (P)-N (where N has the value n) point to,
respectively, the i+n-th and i−n-th elements of the array object,
provided they exist. Moreover, if the expression P points to the
last element of an array object, the expression (P)+1 points one past
the last element of the array object, and if the expression Q
points one past the last element of an array object, the
expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to
elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an
overflow; otherwise, the behavior is undefined. If the result points
one past the last element of the array object, it shall not be used as
the operand of a unary * operator that is evaluated.
Now comes the questionable part, which is the cast:
(int *)(&a + 1)
This converts the pointer of type int(*)[5] to type int *. The intent here is to change the pointer which points to the end of the 1-element array of int [5] to the end of the 5-element array of int.
However, the C standard isn't clear on whether this conversion and the subsequent operation on the result are allowed. It does allow conversion from one object pointer type to another and back, assuming the pointer is properly aligned. While alignment shouldn't be an issue here, using this pointer is iffy.
So this pointer is assigned to p:
int *p = (int *)(&a + 1);
Which is then used as follows:
*(p - 1)
If we assume that p validly points to one element past the end of the array a, subtracting 1 from it results in a pointer to the last element of the array. The * operator then dereferences this pointer to the last element, yielding the value 9.
So if we assume that (int *)(&a + 1) results in a valid pointer, then the answer is 1) 3,9; otherwise the answer is 2) Error.

In the line
int *p = (int *)(&a + 1);
note that &a is being written, not a. This is important.
If simply a had been written, then the array would have decayed to a pointer to the first element, i.e. to &a[0]. However, since the expression &a was used instead, the result of this expression has the same value as if a or &a[0] had been used, but the type is different: The type is a pointer to an array of 5 int elements, instead of a pointer to a single int element.
According to the rules on pointer arithmetic, incrementing a pointer by 1 will increase the memory address by the size of the object that it is pointing to. Since the pointer is not pointing to a single element, but to an array of 5 elements, the memory address will be incremented by 5 * sizeof(int). Therefore, after incrementing the pointer, the value of (but not type of) the pointer will be equivalent to &a[5], i.e. one past the end of the array.
After casting this pointer to int * and assigning the result to p, the expression p is fully equivalent to &a[5] (both in value and in type).
Therefore, the expression *(p - 1) is equivalent to *(&a[5] - 1), which is equivalent to *(&a[4]), or simply a[4].
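The chain of equivalences above can be checked with a small sketch. The function names last_via_array_pointer and one_past_matches are made up for illustration, and the code assumes, as this answer does, that the result of the cast may be used like &a[5]:

```c
#include <stdbool.h>

/* Reads the last element of a by going one whole array forward,
   then one int back. */
int last_via_array_pointer(void)
{
    int a[5] = { 1, 3, 5, 7, 9 };
    int (*whole)[5] = &a;          /* pointer to the entire array, type int (*)[5] */
    int *p = (int *)(whole + 1);   /* address just past a, now typed as int * */
    return *(p - 1);               /* steps back one int, landing on a[4] */
}

/* Checks that the converted pointer compares equal to a + 5, i.e. &a[5]. */
bool one_past_matches(void)
{
    int a[5] = { 1, 3, 5, 7, 9 };
    int *p = (int *)(&a + 1);
    return p == a + 5;
}
```

Under that assumption, last_via_array_pointer() returns 9 and one_past_matches() returns true.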

This:
&a + 1;
is taking the address of a, an array, and adding 1, which advances the address by the size of one a, i.e. 5 integers. Then p - 1 "backs down" by one integer, ending up at the final element of a.

Normally whenever arrays are used in expressions, they "decay" into a pointer to the first element. There are a few exceptions to this rule and one such exception is the & operator.
&a therefore yields a pointer to the array of type int (*)[5]. Then &a + 1 is pointer arithmetic on such a type, meaning the pointer address is increased by the size of one int [5]. We end up pointing just beyond the array, but C actually allows us to do that as long as we don't de-reference that location.
Then the pointer is converted to int * with a cast, which we can do too - C allows pretty much any manner of wild pointer conversions as long as we don't de-reference the result or cause misalignment etc.
p - 1 does pointer arithmetic on type int and the actual type of data in the array is also int, so we are allowed to de-reference that location. We end up at the last item of the array.

Related

Is it safe to keep a pointer out-of-bounds without dereferencing it? [duplicate]

Is it safe in C to keep a pointer out-of-bounds (without dereferencing it) for further arithmetic?
void f(int *array)
{
    int *i = array - 1; // OOB
    while(...) {
        ++i;
        ...
    }
}
void g(int *array, int *end /* past-the-end pointer: OOB */)
{
    while(array != end) {
        ...
        ++array;
    }
}
I imagine some extreme cases, if the address is the first of memory or the last one...
Moving pointer to one element past the last element is allowed, but moving further or moving before the first element is not allowed.
Quote from N1570 6.5.6 Additive operators (point 8):
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object. If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
A pointer may point to one element past the last element of the array, and pointer arithmetic may be done between that pointer and a pointer to an element of the array.
Such a pointer cannot be dereferenced, but it can be used in pointer arithmetic. For example, the following is valid:
char arr[10];
char *p1, *p2;
p1 = arr + 10;
p2 = arr + 5;
int diff = p1 - p2;
printf("diff=%d\n", diff); // prints 5
A pointer may not point before the first element.
This is spelled out in section 6.5.6p8 of the C standard:
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
Note the portion stating that a pointer may be created to point to one element past the end of the array; nothing allows a pointer to point anywhere before the start of the array.
As others have pointed out, you are allowed to point one past the end. But remember that it is NOT allowed to point one element before the first. So be careful if you write algorithms that traverse arrays backwards, because this snippet is invalid:
void foo(int *arr, int *end) {
    while(end-- != arr) { // Ouch, bad idea...
        // Code
    }
    // Here, end would be arr - 1, one element before the array
}
That's because, when end points at the same element as arr, the condition will be false, but after that, end is decremented once more and will point to one element before the array, thus invoking undefined behavior.
Do note that apart from that, the code works fine. To fix the bug, you can do this instead:
void foo(int *arr, int *end) {
    while(end != arr) {
        end--; // Move end-- to inside the loop, in the very beginning
        // Code
    }
    // And here, end is equal to arr, which is perfectly fine
}
The code in the loop will work exactly as before. The only difference is that end will not be decremented the last time.
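As a concrete illustration of the fixed loop, here is a minimal sketch that sums an array from back to front. The names sum_backwards, sum_backwards_indexed and demo_backwards_sums_agree are invented for this example, and the index-based variant is just one alternative way to avoid ever forming a pointer before the array:

```c
#include <stddef.h>

/* Sums the elements from end[-1] down to arr[0]. */
long sum_backwards(const int *arr, const int *end)
{
    long total = 0;
    while (end != arr) {
        end--;             /* decrement first, so end never goes below arr */
        total += *end;
    }
    return total;          /* here end == arr */
}

/* Index-based alternative: no pointer before the array is ever formed. */
long sum_backwards_indexed(const int *arr, size_t n)
{
    long total = 0;
    for (size_t i = n; i-- > 0; ) {   /* i takes the values n-1 down to 0 */
        total += arr[i];
    }
    return total;
}

/* Small demo: both versions should give 1 + 2 + 3 + 4 for this array. */
int demo_backwards_sums_agree(void)
{
    int v[4] = { 1, 2, 3, 4 };
    return sum_backwards(v, v + 4) == 10 && sum_backwards_indexed(v, 4) == 10;
}
```

Passing the same pointer for arr and end describes an empty range, so the loop body never runs and the sum is 0.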

If 'a' is an integer array in C, why are 'a + 1' and '&a + 1' different?

#include <stdio.h>
int main(){
    int a[2] = {0, 1};
    printf("%d\n", a);
    printf("&a + 1 %d\n", &a + 1);
    printf("a + 1 %d\n", a + 1);
    return 0;
}
The result is as follows:
6422232
&a + 1 6422240
a + 1 6422236
Why are &a + 1 and a + 1 different?
In &a+1, &a takes the address of the array. This yields a pointer to an array, so adding one adds the size of one array. This is because each type of pointer has its own unit of measurement—adding one always adds one of the pointed-to objects.
In a+1, a is the array itself. C automatically converts an array to the address of the first element. So, a yields a pointer to an element, so adding one adds the size of one element.
(In &a+1, a was not automatically converted to a pointer to the first element. Using & with an array is an exception to the conversion. See note 1 below.)
Notes
1. The automatic conversion of an array to a pointer to its first element occurs in most situations. It does not occur when the array is the argument of sizeof, &, or _Alignof or when the array is a string literal used to initialize an array.
2. In C’s model, a pointer uses units of whatever type of object it points to. So saying “adds the size of one array” is a bit imprecise. However, if we are talking about navigating storage using valid pointers in an array of objects, moving from one object to another traverses a number of bytes equal to the size of the object.
3. You should not print pointers with %d. The behavior of that is not defined by the C standard. To print a pointer, convert it to void * and print with %p:
printf("%p\n", (void *) a);
printf("&a + 1 %p\n", (void *) (&a + 1));
printf("a + 1 %p\n", (void *) (a + 1));
Note that %p does not necessarily produce the actual memory address. The C standard allows an implementation to produce some other representation of the pointer. In good C implementations without complicated memory models, %p will print the memory address of the pointed-to object. However, this is a quality-of-implementation feature.
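To compare the two step sizes without depending on how pointers are printed, one option is to measure the distances in bytes. This is only a sketch; element_step and array_step are hypothetical helper names:

```c
#include <stddef.h>

/* Bytes between a and a + 1: a decays to int *, so the step is one int. */
size_t element_step(void)
{
    int a[2] = { 0, 1 };
    return (size_t)((char *)(a + 1) - (char *)a);
}

/* Bytes between &a and &a + 1: &a has type int (*)[2],
   so the step is one whole array. */
size_t array_step(void)
{
    int a[2] = { 0, 1 };
    return (size_t)((char *)(&a + 1) - (char *)&a);
}
```

With a 4-byte int these return 4 and 8, which matches the differences between the addresses shown in the question.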
Yes, although a and &a have the same value, their types are different, and in pointer arithmetic the type matters.
For example, in a + 1, a decays into a pointer to its first element (int *), which is then advanced by sizeof(int) bytes.
Whereas &a is a case where decay does not happen. So &a has type int (*)[2] (pointer to an array of 2 int elements), and when you add 1 to it, it moves by the size of the whole array, 2 * sizeof(int) bytes.
This is why they are different.
To be clear, an array name is not itself a pointer, but in most expressions it is converted to a pointer to the array's first element. &a + 1 and a + 1 differ because &a + 1 advances the address by the size of the whole array, while a + 1 advances it by the size of one element, giving the address of the second element.

Printf and Array

I was asked this question as a class exercise:
int A[] = {1,3,5,7,9,0,2,4,6};
printf("%d\n", *(A+A[1]-*A));
I couldn't figure it out on paper, so I went ahead, compiled a simple test program, and found that printf("%d",*A) always gives me 1 for the output.
But I still do not understand why this is the case, hence it would be great if someone can explain this.
A is treated like a pointer to the first element of the array of integers.
A[1] is the value of the element at index 1 of that array, which is 3 (indexes are 0-based).
*A is the value to which A points, which is the zeroth element of the array, so 1.
So
A[1] - *A == 3 - 1 == 2
Now we have
*(A + 2)
That's where pointer arithmetic kicks in. Since A is a pointer to integer, A+2 points to the item at index 2 (0-based) in that array and *(A+2) gets its value.
So the answer is 5.
Also please note for future reference that pointer to an integer and array of integers are somewhat different things in C, but for the purposes of this discussion they are the same thing.
Break it down into its constituent parts:
A by itself is the memory address of the array, which is also equivalent to &A[0], the memory address of the first element of the array.
A[1] is the value stored in the second element of the array, which is 3.
*A dereferences the memory address of the array, which is equivalent to A[0], the value stored in the first element of the array, which is 1.
So, do some substitutions:
*(A+A[1]-*A)
= *(A+(A[1])-(A[0]))
= *(A+3-1)
= *(A+2)
The notation *(Array+index) is the same as the notation Array[index]. Under the hood, they both take the starting address of the array, increment it by the number of bytes of the array element type (in this case, int) multiplied by the index, and then dereference the resulting address. So *(A+2) is the same as A[2], which is 5.
Arrays used in expressions are automatically converted into pointers pointing at the first elements of the arrays except for some exceptions such as operands of sizeof or unary & operators.
E1[E2] is defined to be equivalent to *((E1) + (E2))
The + and - operators applied to a pointer move the pointer forward and backward.
In this case, *A is equivalent to *(A + 0), which is equivalent to A[0] and it will give you the first element of the array.
The expression *(A+A[1]-*A) will
Get the pointer to the first element, which points at 1, via A
Move the pointer to A[1] (3) elements ahead via +A[1], so the pointer now points at 7
Move the pointer to *A (1) element before what is pointed via -*A, so the pointer now points at 5
Dereference the pointer via the unary * operator, so the expression is evaluated to 5
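The steps listed above can be written out one at a time. This is a sketch for illustration; evaluate_puzzle and evaluate_puzzle_direct are made-up names, and the intermediate pointer p does not appear in the original exercise:

```c
/* Evaluates *(A + A[1] - *A) one step at a time, left to right. */
int evaluate_puzzle(void)
{
    int A[] = { 1, 3, 5, 7, 9, 0, 2, 4, 6 };
    int *p = A;       /* A decays to a pointer to A[0], which holds 1 */
    p = p + A[1];     /* forward by 3 elements: now at A[3], which holds 7 */
    p = p - *A;       /* back by 1 element: now at A[2], which holds 5 */
    return *p;
}

/* The original one-line expression, for comparison. */
int evaluate_puzzle_direct(void)
{
    int A[] = { 1, 3, 5, 7, 9, 0, 2, 4, 6 };
    return *(A + A[1] - *A);
}
```

Both functions return 5, the value of A[2].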
An array variable in C, when used in most expressions, acts as a pointer to the initial memory location of the array. So if you dereference the array, you will always get the value at the first position.
If you add 1 to the array before dereferencing, as in *(A+1), you will get the second position.
You can get any position from the array using the same method:
*(A) is the first position
*(A+1) is the second position
*(A+2) is the third position
and so on...
If you declare the int array as int* A, allocate the memory, and assign the values yourself, it is usually easier to visualize how this works.
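A minimal sketch of that suggestion follows, using a real pointer and malloc instead of an array. The function name heap_element, the three stored values, and the -1 error convention are assumptions made for this example:

```c
#include <stdlib.h>

/* Allocates three ints, stores 1, 3, 5 through pointer arithmetic, and
   returns the value at the requested position (or -1 on bad index or
   allocation failure). */
int heap_element(size_t index)
{
    if (index >= 3) {
        return -1;
    }
    int *A = malloc(3 * sizeof *A);
    if (A == NULL) {
        return -1;
    }
    *(A)     = 1;   /* first position,  same as A[0] */
    *(A + 1) = 3;   /* second position, same as A[1] */
    *(A + 2) = 5;   /* third position,  same as A[2] */
    int value = *(A + index);
    free(A);
    return value;
}
```

Here A really is a pointer variable, so *(A + index) is plain pointer arithmetic with no array-to-pointer conversion involved.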

How does C retrieve the address of a row for a 2d array

Can someone explain to me how C retrieves the correct memory address for a row when you only use one subscript to access a 2d array?
Example -
int array2D[2][2] = {1,2,3,4};
printf ( "starting address of row2 = %p" , array2D[1]);
I understand that subscripting in C is really pointer addition, so for a 1d array the array name points to element 0. In this case, if I wanted element 1, the compiler would take the starting address (say 4000) and add 4 to it (assuming a 4-byte int), so that what is returned is the item at memory address 4004.
My understanding is that when you populate a 2d array, as in my example, they are allocated sequentially so I would have
1 2
3 4
at addresses
4000 4004
4008 4012
So how does C work out that in this case array2D[1] should point to 4008 and not 4004? Does it run a sizeof() operator or have I misunderstood a fundamental here?
Thanks in advance
C knows how long each row is, so it does the multiplication to find the row.
int x[][3] = {{1,2,3},{4,5,6}};
then &x[1][0] is the address of x[0][0] plus 3 * sizeof(int) bytes.
That's why in a multidimensional C array declaration, all but the first dimension must be specified.
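Here is a small sketch of that rule applied to a function parameter. The names element_at and row_one_follows_row_zero are hypothetical; the point is that the row length 3 is part of the parameter type while the first dimension is left out:

```c
#include <stddef.h>

/* The row length (3) is part of the type, so the compiler knows how far
   apart the rows are; the first dimension can be omitted. */
int element_at(int x[][3], size_t row, size_t col)
{
    return x[row][col];
}

/* Checks that row 1 starts exactly 3 ints after row 0. */
int row_one_follows_row_zero(void)
{
    int x[][3] = { {1, 2, 3}, {4, 5, 6} };
    ptrdiff_t bytes = (char *)&x[1][0] - (char *)&x[0][0];
    return bytes == (ptrdiff_t)(3 * sizeof(int)) && element_at(x, 1, 0) == 4;
}
```

If the 3 were missing from the parameter declaration, the compiler would have no way to compute where row 1 begins, so such a declaration is rejected.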
Pointer arithmetic depends on the type of the element being pointed to. Given a pointer p to type T, p + 1 points to the next element of type T, not necessarily the next byte following p. If T is char, then p + 1 points to the next char object after p, which starts at the byte immediately following p; if T is char [10], then p + 1 points to the next 10-element array of char after p, which starts at the 10th byte following p.
The type of the expression array2D is "2-element array of 2-element array of int", which "decays" to type "pointer to 2-element array of int", or int (*)[2] (see note 1). Thus the expression array2D[1] is interpreted as *(array2D + 1). Since array2D points to an object of type int [2], array2D + 1 points to the next 2-element array of int following array2D, which is 2 * sizeof (int) bytes away from array2D.
1. Except when it is the operand of the sizeof or unary & operators, or is a string literal being used to initialize another array in a declaration, an expression of type "N-element array of T" will be converted to an expression of type "pointer to T" and its value will be the address of the first element in the array.
This is going to be a bit long-winded, but bear with me still.
Array subscripting is just a shorthand: (p[N]) equals (*(p + N)) in all contexts for pointer types (both are invalid expressions for void*, though).
Now, if p has an array type, it decays to a pointer type in an expression like (*(p + N)); an int[2][2] decays into a pointer of type int (*)[2] (i.e. a pointer to an int[2]).
Pointer arithmetic takes types into account; we need to convert things to char* to visualize what the compiler does for us:
T *p;
p[N] equals *(p + N) equals *(T*)((unsigned char*)p + N * sizeof *p)
Now, if T were an int[2] (to equal the situation we described above), then sizeof *p would be sizeof(int[2]), i.e. 2 * sizeof(int).
This is how subscripting works in so-called multidimensional arrays.
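The byte-level formula above can be tried out on the array from the question. This is only a sketch: manual_row_matches and row_size are invented names, and N is assumed to be 0 or 1 so that the row exists:

```c
#include <stddef.h>

/* Recomputes the address of row N by hand, in bytes, and compares it
   with what ordinary subscripting gives. N must be 0 or 1 here. */
int manual_row_matches(size_t N)
{
    int array2D[2][2] = { {1, 2}, {3, 4} };
    int (*p)[2] = array2D;                          /* decayed: pointer to int[2] */
    unsigned char *bytes = (unsigned char *)p + N * sizeof *p;
    int (*row)[2] = (int (*)[2])bytes;              /* back to the real type */
    return row == &p[N];
}

/* Size of one row, i.e. the distance between consecutive rows in bytes. */
size_t row_size(void)
{
    return sizeof(int[2]);
}
```

With a 4-byte int, row_size() is 8, which is why array2D[1] in the question sits at 4008 rather than 4004.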
sizeof(array2D[1]) == 8 (assuming a 4-byte int).
If the address of array2D is 4000,
then the address of array2D[1] is 4000 + sizeof(array2D[1]) == 4000 + 8 == 4008.

Will the prototype of a[1][2] be this: int **a?

a[1][2] is expanded by the compiler like this: *( *(a+1) +2 ). So if a has this declaration: int **a,
the foregoing expression should be evaluated like this:
1. Get the address of a from the symbol table. Note that it is a pointer to a pointer.
2. Add 1 to it, so that it points to the location just after the one a points to.
3. Dereference it. I think this is undefined behavior, because we don't know whether a+1 is valid and we access it arbitrarily.
4. If we are lucky enough to successfully get the value *(a+1), add 2 to it.
5. Finally, dereference (*(a+1) +2 ). Will we be lucky now?
I read this in Expert C Programming, Chapter 10. Is this correct?
New answer, after edited question:
For a[1][2] to be valid, given that a is defined as int **a;, both of these must be true:
a must point at the first of two sequential int * objects;
The second of those int * objects must point at the first of three sequential int objects.
The simplest way to arrange this is:
int x[3];
int *y[2] = { 0, x };
int **a = y;
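Wrapped in a function, that arrangement can be read through a[1][2]. This is a sketch; the function name read_a_1_2 and the values stored in x are illustrative and not part of the answer:

```c
#include <stddef.h>

/* Builds the arrangement described above and reads a[1][2]. */
int read_a_1_2(void)
{
    int x[3] = { 10, 20, 30 };   /* three sequential int objects */
    int *y[2] = { NULL, x };     /* two sequential int * objects; y[1] points at x */
    int **a = y;                 /* a points at the first of them */
    return a[1][2];              /* *(*(a + 1) + 2), i.e. x[2] */
}
```

Only a[1] is dereferenced here, so it does not matter that y[0] is a null pointer.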
Original answer:
If the expression a[1][2] is valid, then there are many distinct possibilities for the type of a (even neglecting qualifiers like const):
1. type **a; (pointer to pointer to type)
2. type *a[n]; (array of n pointers to type)
3. type (*a)[n]; (pointer to array of n type)
4. type a[m][n]; (array of m arrays of n type)
Precisely how the expression is evaluated depends on which of these types a actually has.
First a + 1 is calculated. If a is itself a pointer (either case 1 or case 3), then the value of a is directly loaded. If a is an array (case 2 or case 4), then the address of the first element of a is loaded (which is identical to the address of a itself).
This pointer is now offset by 1 object of the type that it points to. In case 1 and case 2, it would be offset by 1 "pointer to type" object; in case 3 and case 4, it would be offset by 1 "array of n type" object, which is the same as offsetting by n type objects.
The calculated (offset) pointer is now dereferenced. In cases 1 and 2, the result has type "pointer to type", in cases 3 and 4 the result has type "array of n type".
Next *(a + 1) + 2 is calculated. As in the first case, if *(a + 1) is a pointer, then the value is used directly (this time, cases 1 and 2). If *(a + 1) is an array (cases 3 and 4), then the address of the first element of that array is taken.
The resulting pointer (which, at this point, always has type "pointer to type") is now offset by 2 type objects. The final offset pointer is now dereferenced, and the type object is retrieved.
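To see the four cases side by side, here is a sketch in which each declaration is set up so that [1][2] names the same int. The variable names m, pa, rows and pp and the function four_cases_agree are made up for illustration, with type taken to be int and n taken to be 3:

```c
/* Returns 1 if indexing [1][2] reaches the value 6 through all four
   kinds of declaration. */
int four_cases_agree(void)
{
    int m[2][3] = { {1, 2, 3}, {4, 5, 6} };  /* case 4: array of 2 arrays of 3 int */
    int (*pa)[3] = m;                         /* case 3: pointer to array of 3 int */
    int *rows[2] = { m[0], m[1] };            /* case 2: array of 2 pointers to int */
    int **pp = rows;                          /* case 1: pointer to pointer to int */

    return m[1][2] == 6
        && pa[1][2] == 6
        && rows[1][2] == 6
        && pp[1][2] == 6;
}
```

The source expression is identical in all four lines of the return statement, but cases 1 and 2 load a stored pointer for the first subscript while cases 3 and 4 only compute an address.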
Let's say the definition of a looks something like this:
int a[2][2] = {{1, 2}, {3, 4}};
Here's what the storage that the symbol a names looks like:
[ 1 ][ 2 ][ 3 ][ 4 ]
In C, when you perform arithmetic on a pointer, the actual amount by which the pointer value is incremented or decremented is based on the size of the type pointed to. The element type of the first dimension of a is int[2], so when we ask C to calculate the pointer value (a + 1), it takes the location named by a and advances it by the size of int[2], which results in a pointer referring to the row that starts at the integer value 3. Dereferencing this pointer does not load an integer; it yields that row, an int[2], which in turn decays to a pointer to its first element. Adding 2 to that pointer moves it two int objects forward, which is one past the end of the row (and, here, of the whole array). Dereferencing that pointer is out of bounds, so a[1][2] has undefined behavior for this definition of a.
So now let's say the array contains pointers:
char const *one = "one",
    *two = "two",
    *three = "three",
    *four = "four";
char const * a[2][2] = {{one, two}, {three, four}};
Add 1 to a and then dereference it, and you get the row {three, four}, which decays to a pointer to its first element, the char pointer referring to the string "three". Adding 2 to that row pointer steps past both elements of the row, so a[1][2] reads beyond the array, which is undefined behavior. By contrast, a[1][0] + 2 is a pointer referring to the shorter string "ree", and dereferencing that gives the char value 'r'.

Resources