So I have a function that returns a pointer to an element in an array A. I have another function that takes that pointer as a parameter. However, I need that function to be able to deal with the possibility that it may be passed a completely arbitrary pointer.
Is there a way to be able to detect if a pointer points somewhere within a structure? In this case, my array A?
I've seen similar questions regarding C++, but not with C.
The only portable way is to use an equality test against all of the possible valid values for the pointer. For example:
int A[10];
bool points_to_A(int *ptr)
{
for (int i = 0; i < 10; ++i)
if ( ptr == &A[i] )
return true;
return false;
}
Note that using a relational operator (e.g. <), or subtraction, with two pointers is undefined behaviour unless the two pointers actually do point to elements of the same array (or one past the end).
In section §6.5.8 Relational operators, the C11 standard (ISO/IEC 9899:2011) says:
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than
P. In all other cases, the behavior is undefined.
If you know that the pointer is within range of the array, then comparisons work. If the pointer is outside the range, you can't be sure that the comparisons will work. In practice, it usually does, but the standard explicitly says that the comparison yields undefined behaviour.
Note that for an array SomeType array[20];, the address &array[20] is guaranteed to be valid and to compare reliably with any address from &array[0] through &array[19]. You need to decide whether you want to count that as being in your array.
Subject to the observation that the standard does not guarantee that it will work, then, you can compare two int pointers:
int within_int_array(int *array, size_t num_ints, int *ptr)
{
return ptr >= array && ptr < array + num_ints;
}
These days, you should be increasingly cautious about invoking undefined behaviour. Compilers do nasty things to programs that use undefined behaviour, and technically, you have no recourse since the standard says "undefined behaviour".
Related
Is the behavior of the following program undefined?
#include <stdio.h>
int main(void)
{
int arr[2][3] = { { 1, 2, 3 },
{ 4, 5, 6 }
};
int *ptr1 = &arr[0][0]; // pointer to first elem of { 1, 2, 3 }
int *ptr3 = ptr1 + 2; // pointer to last elem of { 1, 2, 3 }
int *ptr3_plus_1 = ptr3 + 1; // pointer to one past last elem of { 1, 2, 3 }
int *ptr4 = &arr[1][0]; // pointer to first elem of { 4, 5, 6 }
// int *ptr_3_plus_2 = ptr3 + 2; // this is not legal
/* It is legal to compare ptr3_plus_1 and ptr4 */
if (ptr3_plus_1 == ptr4) {
puts("ptr3_plus_1 == ptr4");
/* ptr3_plus_1 is a valid address, but is it legal to dereference it? */
printf("*ptr3_plus_1 = %d\n", *ptr3_plus_1);
} else {
puts("ptr3_plus_1 != ptr4");
}
return 0;
}
According to §6.5.6 ¶8:
Moreover, if the expression P points to the last element of an
array object, the expression (P)+1 points one past the last
element of the array object.... If both the pointer operand and the
result point to elements of the same array object, or one past the
last element of the array object, the evaluation shall not produce an
overflow; otherwise, the behavior is undefined. If the result points
one past the last element of the array object, it shall not be used as
the operand of a unary * operator that is evaluated.
From this, it would appear that the behavior of the above program is undefined; ptr3_plus_1 points to an address one past the end of the array object from which it is derived, and dereferencing this address causes undefined behavior.
Further, Annex J.2 suggests that this is undefined behavior:
An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).
There is some discussion of this issue in the Stack Overflow question, One-dimensional access to a multidimensional array: well-defined C?. The consensus here appears to be that this kind of access to arbitrary elements of a two-dimensional array through one-dimensional subscripts is indeed undefined behavior.
The issue, as I see it, is that it is not even legal to form the address of the pointer ptr3_plus_2, so it is not legal to access arbitrary two-dimensional array elements in this way. But, it is legal to form the address of the pointer ptr3_plus_1 using this pointer arithmetic. Further, it is legal to compare the two pointers ptr3_plus_1 and ptr4, according to §6.5.9 ¶6:
Two pointers compare equal if and only if both are null pointers, both
are pointers to the same object (including a pointer to an object and
a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer
to one past the end of one array object and the other is a pointer to
the start of a different array object that happens to immediately
follow the first array object in the address space.
So, if it both ptr3_plus_1 and ptr4 are valid pointers that compare equal and that must point to the same address (the object pointed to by ptr4 must be adjacent in memory to the object pointed to by ptr3 anyway, since array storage must be contiguous), it would seem that *ptr3_plus_1 is as valid as *ptr4.
Is this undefined behavior, as described in §6.5.6 ¶8 and Annex J.2, or is this an exceptional case?
To Clarify
It seems unambiguous that it is undefined behavior to attempt to access the element one past the end of the final row of a two-dimensional array. My interest is in the question of whether it is legal to access the first element of the intermediate rows by forming a new pointer using a pointer to an element from the previous row and pointer arithmetic. It seems to me that a different example in Annex J.2 could have made this more clear.
Is it possible to reconcile the clear statement in §6.5.6 ¶8 that an attempted dereference of a pointer to the location one past the end of an array leads to undefined behavior with the idea that the pointer past the end of the first row of a two-dimensional array of type T[][] is also a pointer of type T * that points to an object of type T, namely the first element of an array of type T[]?
So, if it both ptr3_plus_1 and ptr4 are valid pointers that compare equal and that must point to the same address
They are.
it would seem that *ptr3_plus_1 is as valid as *ptr4.
It is not.
The pointers are equal, but not equivalent. The trivial well-known example of the distinction between equality and equivalence is negative zero:
double a = 0.0, b = -0.0;
assert (a == b);
assert (1/a != 1/b);
Now, to be fair, there is a difference between the two, as positive and negative zero have a different representation, ptr3_plus_1 and ptr4 on typical implementations have the same representation. This is not guaranteed, and on implementations where they would have different representations, it should be clear that your code might fail.
Even on the typical implementations, while there are good arguments to be made that the same representation implies equivalent values, to the best of my knowledge, the official interpretation is that the standard does not guarantee this, therefore programs cannot rely on it, therefore implementations can assume programs do not do this and optimise accordingly.
A debugging implementation might use "fat" pointers. For example, a pointer may be represented as a tuple (address, base, size) to detect out-of-bounds access. There is absolutely nothing wrong or contrary to the standard about such representation. So any pointer arithmetic that brings the pointer outside the range of [base, base+size] fails, and any dereference outside of [base, base+size) also fails.
Note that base and size are not the address and the size of the 2D array but rather of the array that the pointer points into (the row in this case).
It might sound trivial in this case, but when deciding whether a certain pointer construction is UB or not, it is useful to mentally run your example through this hypothetical implementation.
So I have a function that returns a pointer to an element in an array A. I have another function that takes that pointer as a parameter. However, I need that function to be able to deal with the possibility that it may be passed a completely arbitrary pointer.
Is there a way to be able to detect if a pointer points somewhere within a structure? In this case, my array A?
I've seen similar questions regarding C++, but not with C.
The only portable way is to use an equality test against all of the possible valid values for the pointer. For example:
int A[10];
bool points_to_A(int *ptr)
{
for (int i = 0; i < 10; ++i)
if ( ptr == &A[i] )
return true;
return false;
}
Note that using a relational operator (e.g. <), or subtraction, with two pointers is undefined behaviour unless the two pointers actually do point to elements of the same array (or one past the end).
In section §6.5.8 Relational operators, the C11 standard (ISO/IEC 9899:2011) says:
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than
P. In all other cases, the behavior is undefined.
If you know that the pointer is within range of the array, then comparisons work. If the pointer is outside the range, you can't be sure that the comparisons will work. In practice, it usually does, but the standard explicitly says that the comparison yields undefined behaviour.
Note that for an array SomeType array[20];, the address &array[20] is guaranteed to be valid and to compare reliably with any address from &array[0] through &array[19]. You need to decide whether you want to count that as being in your array.
Subject to the observation that the standard does not guarantee that it will work, then, you can compare two int pointers:
int within_int_array(int *array, size_t num_ints, int *ptr)
{
return ptr >= array && ptr < array + num_ints;
}
These days, you should be increasingly cautious about invoking undefined behaviour. Compilers do nasty things to programs that use undefined behaviour, and technically, you have no recourse since the standard says "undefined behaviour".
The output of the following c program is 1. Can someone please explain?
#include<stdio.h>
#include<string.h>
int main(){
int a = 5,b = 10,c;
int *p = &a,*q = &b;
c = p - q;
printf("%d" , c);
return 0;
}
The program invokes undefined behavior. Pointer subtraction has to be done with pointers to elements of the same array.
From the C Standard:
(C99, 6.5.6p9) "When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object [...]"
You are returning the differences in memory location of two successive stack-allocated pointers using int pointer arithmetic as your unit of measurement.
Technically the program behaviour is undefined. Rather pointless therefore to say any more.
The variables were allocated one after the other in the stack. Say on a machine with 4-byte sized integers the addresses may be say 1000 and 1004; the difference is 4 bytes; pointer arithmetic says that the difference should return the value in elements (not bytes), so number of elements (ints) between the address is 1 (integer).
However, this is valid only within the same array or one element past it, which isn't the case in your example, and thus is undefined.
C++11 §5.7/6:
When two pointers to elements of the same array object are subtracted,
the result is the difference of the subscripts of the two array
elements ... Unless both pointers point to elements of the same array
object, or one past the last element of the array object, the behavior
is undefined.
EDIT: Thanks ouah for making me lookup the C++ standard on the matter.
I quote from "The C Programming Language" by Kernighan & Ritchie:
Any pointer can be meaningfully compared for equality or inequality with zero. But the behavior is undefined for arithmetic or comparisons with pointers that do not point to members of the same array. (There is one exception: the address of the first element past the end of an array can be used in pointer arithmetic.)
Does this mean I cannot rely on == for checking equality of different pointers? What are the situations in which this comparison leads to a wrong result?
One example that comes to my mind is Harvard architecture with separate address spaces for code and for data. In computers of that architecture the compiler can store constant data in the code memory. Since the two address spaces are separate, a pointer to an address in the code memory could be numerically equal to a pointer in the data memory, without pointing to the same address.
The equality operator is defined for all valid pointers, and the only time it can give a "false positive" is when one pointer points to one element past the end of an array, and the other happens to point (or points by virtue of a structure definition) to another object stored just past the array in memory.
I think your mistake is treating K&R as normative. See the C99 standard (nice html version here: http://port70.net/~nsz/c/c99/n1256.html), 6.5.9 on the equality operator. The issue about comparisons being undefined only applies to relational operators (see 6.5.8):
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
I interpret this as following:
short a[9];
int b[12];
short * c = a + 9;
Here it is valid to say that
c > a
because c results from a via pointer arithmetic,
but not necessarily that
b == c
or
c <= b
or something alike, because they result from different arrays, whose order and alignment in memory is not defined.
You cannot use pointer comparison for comparing pointers that point into different arrays.
So:
int arr[5] = {1, 2, 3, 4, 5};
int * p = &arr[0];
int anotherarr[] = {1, 2};
int * pf = &anotherarr[0];
You cannot do if (p == pf) since p and pf do not point into the same array. This will lead to undefined behaviour.
You can rely on pointer comparison if they point within the same array.
Not sure about the arithmetic case myself.
You can do == and != with pointers from different arrays.
<, <=, >, >= is not defined.
Following an hot comment thread in another question, I came to debate of what is and what is not defined in C99 standard about C arrays.
Basically when I define a 2D array like int a[5][5], does the standard C99 garantee or not that it will be a contiguous block of ints, can I cast it to (int *)a and be sure I will have a valid 1D array of 25 ints.
As I understand the standard the above property is implicit in the sizeof definition and in pointer arithmetic, but others seems to disagree and says casting to (int*) the above structure give an undefined behavior (even if they agree that all existing implementations actually allocate contiguous values).
More specifically, if we think an implementation that would instrument arrays to check array boundaries for all dimensions and return some kind of error when accessing 1D array, or does not give correct access to elements above 1st row. Could such implementation be standard compilant ? And in this case what parts of the C99 standard are relevant.
We should begin with inspecting what int a[5][5] really is. The types involved are:
int
array[5] of ints
array[5] of arrays
There is no array[25] of ints involved.
It is correct that the sizeof semantics imply that the array as a whole is contiguous. The array[5] of ints must have 5*sizeof(int), and recursively applied, a[5][5] must have 5*5*sizeof(int). There is no room for additional padding.
Additionally, the array as a whole must be working when given to memset, memmove or memcpy with the sizeof. It must also be possible to iterate over the whole array with a (char *). So a valid iteration is:
int a[5][5], i, *pi;
char *pc;
pc = (char *)(&a[0][0]);
for (i = 0; i < 25; i++)
{
pi = (int *)pc;
DoSomething(pi);
pc += sizeof(int);
}
Doing the same with an (int *) would be undefined behaviour, because, as said, there is no array[25] of int involved. Using a union as in Christoph's answer should be valid, too. But there is another point complicating this further, the equality operator:
6.5.9.6
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. 91)
91) Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. If prior invalid pointer operations (such as accesses outside array bounds) produced undefined behavior, subsequent comparisons also produce undefined behavior.
This means for this:
int a[5][5], *i1, *i2;
i1 = &a[0][0] + 5;
i2 = &a[1][0];
i1 compares as equal to i2. But when iterating over the array with an (int *), it is still undefined behaviour, because it is originally derived from the first subarray. It doesn't magically convert to a pointer into the second subarray.
Even when doing this
char *c = (char *)(&a[0][0]) + 5*sizeof(int);
int *i3 = (int *)c;
won't help. It compares equal to i1 and i2, but it isn't derived from any of the subarrays; it is a pointer to a single int or an array[1] of int at best.
I don't consider this a bug in the standard. It is the other way around: Allowing this would introduce a special case that violates either the type system for arrays or the rules for pointer arithmetic or both. It may be considered a missing definition, but not a bug.
So even if the memory layout for a[5][5] is identical to the layout of a[25], and the very same loop using a (char *) can be used to iterate over both, an implementation is allowed to blow up if one is used as the other. I don't know why it should or know any implementation that would, and maybe there is a single fact in the Standard not mentioned till now that makes it well defined behaviour. Until then, I would consider it to be undefined and stay on the safe side.
I've added some more comments to our original discussion.
sizeof semantics imply that int a[5][5] is contiguous, but visiting all 25 integers via incrementing a pointer like int *p = *a is undefined behaviour: pointer arithmetics is only defined as long as all pointers invoved lie within (or one element past the last element of) the same array, as eg &a[2][1] and &a[3][1] do not (see C99 section 6.5.6).
In principle, you can work around this by casting &a - which has type int (*)[5][5] - to int (*)[25]. This is legal according to 6.3.2.3 §7, as it doesn't violate any alignment requirements. The problem is that accessing the integers through this new pointer is illegal as it violates the aliasing rules in 6.5 §7. You can work around this by using a union for type punning (see footnote 82 in TC3):
int *p = ((union { int multi[5][5]; int flat[25]; } *)&a)->flat;
This is, as far as I can tell, standards compliant C99.
If the array is static, like your int a[5][5] array, it's guaranteed to be contiguous.