I've recently got into some pieces of code doing some questionable 2D arrays indexing operations. Considering as an example the following code sample:
int a[5][5];
a[0][20] = 3;
a[-2][15] = 4;
a[5][-3] = 5;
Are the indexing operations above subject to undefined behavior?
It's undefined behavior, and here's why.
Multidimensional array access can be broken down into a series of single-dimensional array accesses. In other words, the expression a[i][j] can be thought of as (a[i])[j]. Quoting C11 §6.5.2.1/2:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
This means the above is identical to *(*(a + i) + j). Following C11 §6.5.6/8 regarding addition of an integer and pointer (emphasis mine):
If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
In other words, if a[i] is not a valid index, the behavior is immediately undefined, even if "intuitively" a[i][j] seems in-bounds.
So, in the first case, a[0] is valid, but the following [20] is not, because the type of a[0] is int[5]. Therefore, index 20 is out of bounds.
In the second case, a[-1] is already out-of-bounds, thus already UB.
In the last case, however, the expression a[5] points to one past the last element of the array, which is valid as per §6.5.6/8:
... if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object ...
However, later in that same paragraph:
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
So, while a[5] is a valid pointer, dereferencing it will cause undefined behavior, which is caused by the final [-3] indexing (which, is also out-of-bounds, therefore UB).
array indexing with negative indexes is undefined behaviour. Sorry, that a[-3] is the same as *(&a - 3) in most architectures/compilers, and accepted without warning, but the C language allows you to add negative integers to pointers, but not use negative values as array indexes. Of curse this is not even checked at runtime.
Also, there are some issues to be acquainted for when defining arrays in front to pointers. You can leave unspecified just the first subindex, and no more, like in:
int a[][3][2]; /* array of unspecified size, definition is alias of int (*a)[3][2]; */
(indeed, the above is a pointer definition, not an array, just print sizeof a)
or
int a[4][3][2]; /* array of 24 integers, size is 24*sizeof(int) */
when you do this, the way to evaluate the offset is different for arrays than for pointers, so be carefull. In case of arrays, int a[I][J][K];
&a[i][j][k]
is placed at
&a + i*(sizeof(int)*J*K) + j*(sizeof(int)*K) + k*(sizeof(int))
but when you declare
int ***a;
then a[i][j][k] is the same as:
*(*(*(&a+i)+j)+k), meaning you have to dereference pointer a, then add (sizeof(int **))*i to its value, then dereference again, then add (sizeof (int *))*j to that value, then dereference it, and add (sizeof(int))*k to that value to get the exact address of the data.
BR
Related
I learned by this question that incrementing a NULL pointer or incrementing past the end of an array isn't well-defined behavior:
int* pointer = 0;
pointer++;
int a[3];
int* pointer2 = &a[0];
pointer2 += 4;
But what If the pointer pointing to a invalid place is only used for comparison and memory at his location is never accessed?
Example:
void exampleFunction(int arrayLen, char** strArray)
{
for(char** str = strArray; str < strArray + arrayLen; str++) //or even str < &strArray[arrayLen]
{
//here *str is always a pointer to the first char of my string
}
}
Here I compare my pointer to a pointer one element past the end of the array. Is this well-defined behavior?
Comparing to a pointer one step past the end of an array is well defined. However, your pointer and pointer2 examples are undefined, even if you do literally nothing with those pointers.
A pointer may point to one element past the end of the array. That pointer may not be dereferenced (otherwise that would be undefined behavior) but it can be compared to another pointer within the array.
Section 6.5.6 of the C standard says the following regarding pointer addition (emphasis added):
8 If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.
Section 6.5.8 says the following regarding pointer comparisons (emphasis added):
5 When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two
pointers to object types both point to the same object, or both point
one past the last element of the same array object, they compare
equal. If the objects pointed to are members of the same aggregate
object, pointers to structure members declared later compare greater
than pointers to members declared earlier in the structure, and
pointers to array elements with larger subscript values compare
greater than pointers to elements of the same array with lower
subscript values. All pointers to members of the same union object
compare equal. If the expression P points to an element of an array
object and the expression Q points to the last element of the same
array object, the pointer expression Q+1 compares greater than P. In
all other cases, the behavior is undefined.
In the case of pointer1, it starts out pointing to NULL. Incrementing this pointer invokes undefined behavior because it don't point to a valid object.
For pointer2, it is increased by 4, putting it two elements past the end of the array, not one, so this is again undefined behavior. Had it been increased by 3, the behavior would be well defined.
Is the behavior of the following program undefined?
#include <stdio.h>
int main(void)
{
int arr[2][3] = { { 1, 2, 3 },
{ 4, 5, 6 }
};
int *ptr1 = &arr[0][0]; // pointer to first elem of { 1, 2, 3 }
int *ptr3 = ptr1 + 2; // pointer to last elem of { 1, 2, 3 }
int *ptr3_plus_1 = ptr3 + 1; // pointer to one past last elem of { 1, 2, 3 }
int *ptr4 = &arr[1][0]; // pointer to first elem of { 4, 5, 6 }
// int *ptr_3_plus_2 = ptr3 + 2; // this is not legal
/* It is legal to compare ptr3_plus_1 and ptr4 */
if (ptr3_plus_1 == ptr4) {
puts("ptr3_plus_1 == ptr4");
/* ptr3_plus_1 is a valid address, but is it legal to dereference it? */
printf("*ptr3_plus_1 = %d\n", *ptr3_plus_1);
} else {
puts("ptr3_plus_1 != ptr4");
}
return 0;
}
According to §6.5.6 ¶8:
Moreover, if the expression P points to the last element of an
array object, the expression (P)+1 points one past the last
element of the array object.... If both the pointer operand and the
result point to elements of the same array object, or one past the
last element of the array object, the evaluation shall not produce an
overflow; otherwise, the behavior is undefined. If the result points
one past the last element of the array object, it shall not be used as
the operand of a unary * operator that is evaluated.
From this, it would appear that the behavior of the above program is undefined; ptr3_plus_1 points to an address one past the end of the array object from which it is derived, and dereferencing this address causes undefined behavior.
Further, Annex J.2 suggests that this is undefined behavior:
An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).
There is some discussion of this issue in the Stack Overflow question, One-dimensional access to a multidimensional array: well-defined C?. The consensus here appears to be that this kind of access to arbitrary elements of a two-dimensional array through one-dimensional subscripts is indeed undefined behavior.
The issue, as I see it, is that it is not even legal to form the address of the pointer ptr3_plus_2, so it is not legal to access arbitrary two-dimensional array elements in this way. But, it is legal to form the address of the pointer ptr3_plus_1 using this pointer arithmetic. Further, it is legal to compare the two pointers ptr3_plus_1 and ptr4, according to §6.5.9 ¶6:
Two pointers compare equal if and only if both are null pointers, both
are pointers to the same object (including a pointer to an object and
a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer
to one past the end of one array object and the other is a pointer to
the start of a different array object that happens to immediately
follow the first array object in the address space.
So, if it both ptr3_plus_1 and ptr4 are valid pointers that compare equal and that must point to the same address (the object pointed to by ptr4 must be adjacent in memory to the object pointed to by ptr3 anyway, since array storage must be contiguous), it would seem that *ptr3_plus_1 is as valid as *ptr4.
Is this undefined behavior, as described in §6.5.6 ¶8 and Annex J.2, or is this an exceptional case?
To Clarify
It seems unambiguous that it is undefined behavior to attempt to access the element one past the end of the final row of a two-dimensional array. My interest is in the question of whether it is legal to access the first element of the intermediate rows by forming a new pointer using a pointer to an element from the previous row and pointer arithmetic. It seems to me that a different example in Annex J.2 could have made this more clear.
Is it possible to reconcile the clear statement in §6.5.6 ¶8 that an attempted dereference of a pointer to the location one past the end of an array leads to undefined behavior with the idea that the pointer past the end of the first row of a two-dimensional array of type T[][] is also a pointer of type T * that points to an object of type T, namely the first element of an array of type T[]?
So, if it both ptr3_plus_1 and ptr4 are valid pointers that compare equal and that must point to the same address
They are.
it would seem that *ptr3_plus_1 is as valid as *ptr4.
It is not.
The pointers are equal, but not equivalent. The trivial well-known example of the distinction between equality and equivalence is negative zero:
double a = 0.0, b = -0.0;
assert (a == b);
assert (1/a != 1/b);
Now, to be fair, there is a difference between the two, as positive and negative zero have a different representation, ptr3_plus_1 and ptr4 on typical implementations have the same representation. This is not guaranteed, and on implementations where they would have different representations, it should be clear that your code might fail.
Even on the typical implementations, while there are good arguments to be made that the same representation implies equivalent values, to the best of my knowledge, the official interpretation is that the standard does not guarantee this, therefore programs cannot rely on it, therefore implementations can assume programs do not do this and optimise accordingly.
A debugging implementation might use "fat" pointers. For example, a pointer may be represented as a tuple (address, base, size) to detect out-of-bounds access. There is absolutely nothing wrong or contrary to the standard about such representation. So any pointer arithmetic that brings the pointer outside the range of [base, base+size] fails, and any dereference outside of [base, base+size) also fails.
Note that base and size are not the address and the size of the 2D array but rather of the array that the pointer points into (the row in this case).
It might sound trivial in this case, but when deciding whether a certain pointer construction is UB or not, it is useful to mentally run your example through this hypothetical implementation.
I want to know in c language, if I have an array whose length is 3, if i try to access the 4th element of the array, which memory address will it point to?
I have read similar problems of accessing array element out of bound, they said that this is a typical undefined behavior which is unsafe, but will there be any common rules for where the 4th element would point to?.
For example, which memory address would the array[3] refer to with declaration given here?
int a = 10;
int array[3] = {1, 2, 3};
int b = 20;
printf("%d", array[3]); // access the 4th element here
May it point to a or b or array[x] or it's totally random?
The key point here is : if i declared variable A after variable B (especially when they are global variable or static variable), will they be stored continuously in memory? Or it's totally depend on compiler?
First, to add some clarity to the why part of the undefined behavior that you wrote about, as per C11 spec, you are allowed to write an expression which gives you the address of the element one past the last element, that's fine, but you can not dereference it. The later invokes undefined behavior.
Considering array[3] being the same as *(array + 3), quoting the C11 standard, chapter §6.5.6, Additive operators
[....] If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
and
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
So, to sum this up, (array + 3) will point to the next location of the last element in the array, (i.e., &array[2]) but whether that address will be that of the address of a or b or random memory location, is compiler dependent, entirely.
May it point to a or b or array[x] or it's totally random?
TOTALLY RANDOM :)
This will be totally undefined..
In fact, since those variables are local, they will porbably be sitting on the stack. But depending on your system and/or compiler, some of them (for example a and b in this case) might be stored in a register instead of in memory.
So, your &array[3] will probably be pointing somewhere in the stackspace, pointing to an int further then &array[2], but it is undefined to what value it will point, and accessing it will/should be illegal.
Suppose I want to get the last element of an automatic array whose size is unknown. I know that I can make use of the sizeof operator to get the size of the array and get the last element accordingly.
Is using *((*(&array + 1)) - 1) safe?
Like:
char array[SOME_SIZE] = { ... };
printf("Last element = %c", *((*(&array + 1)) - 1));
int array[SOME_SIZE] = { ... };
printf("Last element = %d", *((*(&array + 1)) - 1));
etc
No, it is not.
&array is of type pointer to char[SOME_SIZE] (in the first example given). This means &array + 1 points to memory immediately past the end of array. Dereferencing that (as in (*(&array+1)) gives undefined behaviour.
No need to analyse further. Once there is any part of an expression that gives undefined behaviour, the whole expression does.
I don't think it is safe.
From the standard as #dasblinkenlight quoted in his answer (now removed) there is also something I would like to add:
C99 Section 6.5.6.8 -
[...]
if the expression P points to the last element of an array object, the expression (P)+1 points [...]
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
So as it says , we should not do this *(&array + 1) as it will go one past the last element of array and so * should not be used.
As also it is well known that dereferencing pointers pointing to an unauthorized memory location leads to undefined behaviour .
I believe it's undefined behavior for the reasons Peter mentions in his answer.
There is a huge debate going on about *(&array + 1). On the one hand, dereferencing &array + 1 seems to be legal because it's only changing the type from T (*)[] back to T [], but on the other hand, it's still a pointer to uninitialized, unused and unallocated memory.
My answer relies on the following:
C99 6.5.6.7 (Semantics of additive operators)
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
Since &array is not a pointer to an object that is an element of an array, then according to this, it means that the code is equivalent to:
char array_equiv[1][SOME_SIZE] = { ... };
/* ... */
printf("Last element = %c", *((*(&array_equiv[0] + 1)) - 1));
That is, &array is a pointer to an array of 10 chars, so it behaves the same as a pointer to the first element of an array of length 1 where each element is an array of 10 chars.
Now, that together with the clause that follows (already mentioned in other answers; this exact excerpt is blatantly stolen from ameyCU's answer):
C99 Section 6.5.6.8 -
[...]
if the expression P points to the last element of an array object, the expression (P)+1 points [...]
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
Makes it pretty clear that it is UB: it's equivalent to dereferencing a pointer that points one past the last element of array_equiv.
Yes, in real world, it probably works, as in reality the original code doesn't really dereference a memory location, it's mostly a type conversion from T (*)[] to T [], but I'm pretty sure that from a strict standard-compliance point of view, it is undefined behavior.
It is probably safe, but there are some caveats.
Suppose we have
T array[LEN];
Then &array is of type T(*)[LEN].
Next, &array + 1 is again of type T(*)[LEN], pointing just past the end of the original array.
Next, *(&array + 1) is of type T[LEN], which may be implicitly converted to T*, still pointing just past the end of the original array. (So we did NOT dereference an invalid memory location: the * operator is not evaluated).
Next, *(&array + 1) - 1 is of type T*, pointing at the last array location.
Finally, we dereference this (which is legitimate if the array length is not zero): *(*(&array + 1) - 1) gives the last array element, a value of type T.
Note that the only time we actually dereference a pointer is in this last step.
Now, the potential caveats.
First, *(&array + 1) formally appears like an attempt to dereference a pointer that points to an invalid memory location. But it really isn't. That's the nature of array pointers: this formal dereference only changes the type of the pointer, does not actually result in an attempt to retrieve value from the referenced location. That is, array is of type T[LEN] but it may be implicitly converted to type &T, pointing to the first element of the array; &array is a pointer to type T[LEN], pointing at the beginning of the array; *(&array+1) is again of type T[LEN] which may be implicitly converted to type &T. At no point is a pointer actually dereferenced.
Second, &array + 1 may in fact be an invalid address, but it really isn't: My C++11 reference manual tells me explicitly that "Taking a pointer to the element one beyond the end of an array is guaranteed to work", and a similar statement is also made in K&R, so I believe it has always been standard behavior.
Finally, in case of a zero-length array, the expression dereferences the memory location just before the array, which may be unallocated/invalid. But this issue would also arise if one used a more conventional approach using sizeof() without testing for nonzero length first.
In short, I do not believe there is anything undefined or implementation-dependent about this expression's behavior.
Imho that might work but is probably unwise. You should carefully review your sw design and ask yourself why you want the last entry of the array. Is the content of the array completely unknown to you or is it possible to define the structure in terms of c structs and unions. If that is the case stay away from complex pointer operations in a char array for example and define the data properly in you c code, in structs and unions where ever possible.
So instead of :
printf("Last element = %c", *((*(&array + 1)) - 1));
It could be :
printf("Checksum = %c", myStruct.MyUnion.Checksum);
This clarifies your code. The last letter in your array means nothing to a person not familiar with whats in this array. myStruct.myUnion.Checksum makes sense to anyone. Studying the myStruct structure could explain the whole data structure to anyone. Please use something like that if it can be declared in such a way. If you are in the rare situation you can not, study above answers, they make good sense i think
a)
If both the pointer operand and the result [of P + N] point to
elements of the same array object, or one past the last element of the
array object, the evaluation shall not produce an overflow;
[...]
if the expression P points either to an element of an array
object or one past the last element of an array object, and the
expression Q points to the last element of the same array object, the
expression ((Q)+1)−(P) has the same value as ((Q)−(P))+1 and as
−((P)−((Q)+1)), and has the value zero if the expression P points one
past the last element of the array object, even though the expression
(Q)+1 does not point to an element of the array object.
This states that computations using array elements one past the last element is actually completely fine. As some people here have written that the use of non-existent objects for computations is already illegal, I thought I include that part.
Then we need to take care about this part:
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is
evaluated.
There is one important part that the other answers omitted and that is:
If the pointer operand points to an element of an array object
This is not the fact. The pointer operand we dereference is not a pointer to an element of an array object, it is a pointer to a pointer. So this whole clause is completely irrelevant. But, there is also stated:
For the purposes of these [additive] operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
What does this mean?
It means our pointer to a pointer is actually again a pointer to an array - of length[1]. And now we can close the loop, because as the first paragraph states, we are allowed to make calculations with one past the array, so we are allowed to make calculations with the array as if it would be an array of length[2]!
In a more graphical way:
ptr -> (ptr to int[10])[0] -> int[10]
-> (ptr to int[10])[1]
So, we are allowed to make calculations with (ptr to int[10])[1], even though it is technically outside the array of length[1].
b)
The steps that happen are:
array ptr of type int[SOME_SIZE] to the first element array
&array ptr to a ptr of type int[SOME_SIZE] to the first element of array
+ 1 ptr, one more than the ptr of type int[SOME_SIZE]) to the first element array, to a ptr of type int
This is NOT yet a pointer to int[SOME_SIZE+1], according to C99 Section 6.5.6.8. This is NOT yet ptr + SOME_SIZE + 1
* We dereference the pointer to the pointer. NOW, after the dereferencing, we have a pointer according to C99 Section 6.5.6.8, which is past the element of the array and which is not allowed to be dereferenced. This pointer is allowed to exist and we are allowed to use operators on it, except the unary * operator. But we don't use that one on that pointer yet.
-1 Now we subtract one from the ptr of type int to one after the last element of the array, letting ptr point to the last element of the array.
* dereferencing a ptr to int to the last element of the array, which is legal.
c)
And last, but not least:
If it would be illegal, then the offsetof macro would be illegal, too, which is defined as:
((size_t)(&((st *)0)->m))
I'm very confused about this question in C.
if a[i] is equivalent to *(a+i). What is the equivalent of a[j][i]?
I know the (a+i) is incrementing the memory address of the first element of the array by the value of i and then using the * operator to dereference that address to obtain the value. However, I am confused about multidimensional arrays. In memory, the values are stored just like a single dimensional array but I don't understand how I can increment the memory address by using the variable i or j like in the single dimensional array example.
for some reason printing *a in single dimensional array will print the first element of the array whereas *a in a multidimensional array will print a random number. Why is this so?
Any help is greatly appreciated.
if a[i] is equivalent to *(a+i). What is the equivalent of a[j][i]?
a[j][i]
is similar to
*(*(a+ j) + i)
Now If you want to know how it is?
Then let see
You already know that
a[j]=*(a+j) -------------------------> res 1
Now
a[j][i] = *(a[j]+i); --------------------------> res2
After that replace the res1 in res2. So it become
a[j][i] = *(*(a+ j) + i) ----------------------> res3
The literal answer to the question is simple: a[i] is defined to be always identical to *(a+i) by the standard, and therefore, a[j][i] is guaranteed to be always identical to *(*(a+j)+i). However, that by itself does not help us to understand what is going on; it just transforms one compound expression to another.
a[j][i] (and by extension, *(*(a+j)+i)) does very different things depending on the type of a. This is because, depending on the types, there may be implicit array-to-pointer conversions that are not apparent.
In C, a value of array type T[x] is implicitly converted to an rvalue of pointer type T* in many contexts, some of which include the left side of the subscript operator, as well as an operand in addition. So if you do either a[i] or *(a+i), and a is an expression of array type, in both cases it is converted to a pointer to its first element (like &a[0]) and it's the pointer that participates in the operation. Thus you can see how *(a+i) makes sense.
If a had type T[x][y], it would be a "true" multidimensional array, which is a C array whose elements are themselves C arrays (of a certain compile-time-constant size). In this case, if you consider *(*(a+j)+i), what is happening is 1) a is converted to a pointer to its first element (which is a pointer to an array, of type T(*)[y]), 2) that pointer is incremented and dereferenced, producing a value of array type (the jth subarray of a), 3) that array is then converted to a pointer to its first element (a pointer of type T*), which is then 4) incremented and dereferenced. This finally produces the ith element of the jth element of a, what you usually think of as a[j][i].
However, a could also have type, say, T**. This is usually used to implement "fake" multidimensional arrays, which is an array of pointers, which then in turn point to the first element of some array. This allows you to have "rows" that can have different sizes (thus the multidimensional array need not be "rectangular"), and sizes not fixed at compile time. The "rows", as well as the main pointer array, do not have to be stored contiguously. In this case, if you consider *(*(a+j)+i), what is happening is 1) a is incremented and dereferenced, producing a value of pointer type (the jth element of a), 2) that pointer is then incremented and dereferenced. This finally produces the ith element of the array referred to by the jth element of the main pointer array. Note that in this case there are no implicit array-to-pointer conversions.
Generally array follows pointer concepts.For 1D array one time dereference is enough to get value.But in multidimensional array ,to get values we have dereference 2times in 2D array, 3times in 3D array like that.
In a[j][i]=*(*(a+j)+i)
A multidimensional array in C is contiguous. The following:
int a[4][5];
consists of 4 int[5]s laid out next to each other in memory.
An array of pointers:
int *a[4];
is jagged. Each pointer can point to (the first element of) a separate array of a different length.
a[i][j] is equivalent to ((a+i)+j). See the C11 standard, section 6.5.2.1:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))
Thus, a[i][j] is equivalent to (*(a+i))[j], which is equivalent to ((a+i)+j).
This equivalence exists because in most contexts, expressions of array type decay to pointers to their first element (C11 standard, 6.3.2.1). a[i][j] is interpreted as the following:
a is an array of arrays, so it decays to a pointer to a[0], the first subarray.
a+i is a pointer to the ith subarray of a.
a[i] is equivalent to *(a+i), dereferencing a pointer to the ith subarray of a. Since this is an expression of array type, it decays to a pointer to a[i][0].
a[i][j] is equivalent to *(*(a+i)+j), dereferencing a pointer to the jth element of the ith subarray of a.
Note that pointers to arrays are different from pointers to their first element. a+i is a pointer to an array; it is not an expression of array type, and it does not decay, whether to a pointer to a pointer or to any other type.
and for some reason printing *a in single dimensional array will print the first element of the array whereas *a in a multidimensional array will print a random number. Why is this so?
for two dimensional array you need to dereference it accordingly ,
printf("\nvalue : %d",**a);
to print the first element of the array.