== for pointer comparison - c

I quote from "The C Programming Language" by Kernighan & Ritchie:
Any pointer can be meaningfully compared for equality or inequality with zero. But the behavior is undefined for arithmetic or comparisons with pointers that do not point to members of the same array. (There is one exception: the address of the first element past the end of an array can be used in pointer arithmetic.)
Does this mean I cannot rely on == for checking equality of different pointers? What are the situations in which this comparison leads to a wrong result?

One example that comes to my mind is Harvard architecture with separate address spaces for code and for data. In computers of that architecture the compiler can store constant data in the code memory. Since the two address spaces are separate, a pointer to an address in the code memory could be numerically equal to a pointer in the data memory, without pointing to the same address.

The equality operator is defined for all valid pointers, and the only time it can give a "false positive" is when one pointer points to one element past the end of an array, and the other happens to point (or points by virtue of a structure definition) to another object stored just past the array in memory.
I think your mistake is treating K&R as normative. See the C99 standard (nice html version here: http://port70.net/~nsz/c/c99/n1256.html), 6.5.9 on the equality operator. The issue about comparisons being undefined only applies to relational operators (see 6.5.8):
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
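To make the distinction concrete, here is a minimal sketch, assuming two unrelated global arrays. The names first, second, is_start_of_first and is_in_upper_part_of_first are made up for illustration. The equality test accepts any valid pointer, while the relational test is only meaningful if the caller guarantees that the pointer comes from first.

```c
#include <stdbool.h>

int first[4];
int second[4];

/* Equality (6.5.9): defined for any two valid pointers, related or not. */
bool is_start_of_first(const int *p)
{
    return p == &first[0];
}

/* Relational (6.5.8): only defined when p points into first (or one past
 * its end). The caller has to guarantee that precondition; the function
 * cannot check it portably with < or >=. */
bool is_in_upper_part_of_first(const int *p_in_first)
{
    return p_in_first >= &first[2];
}
```

Passing &second[1] to is_start_of_first is fine and yields false; passing it to is_in_upper_part_of_first would be undefined behaviour.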

I interpret this as follows:
short a[9];
int b[12];
short * c = a + 9;
Here it is valid to say that
c > a
because c results from a via pointer arithmetic,
but not necessarily that
b == c
or
c <= b
or something alike, because they result from different arrays, whose order and alignment in memory is not defined.

You cannot use pointer comparison for comparing pointers that point into different arrays.
So:
int arr[5] = {1, 2, 3, 4, 5};
int * p = &arr[0];
int anotherarr[] = {1, 2};
int * pf = &anotherarr[0];
You cannot do if (p == pf) since p and pf do not point into the same array. This will lead to undefined behaviour.
You can rely on pointer comparison if they point within the same array.
Not sure about the arithmetic case myself.

You can do == and != with pointers from different arrays.
<, <=, >, >= are not defined.

Related

What are the restrictions in comparing two pointers?

int a=40,b=34;
int *ip1,*ip2;
ip1 = &a;
ip2 = &b;
printf("\n Equal condition of two pointers=%d", (ip1 == ip2)); //no error
char name1[20], name2[20];
char *p1 = name1;
char *p2 = name2;
if(p1 > p2) /*Error*/
Why is there an error/warning for the relational operation but none for the equality comparison?
You can only perform relational operations ( <, >, <=, >= ) on pointers from the same array or same aggregate object. Otherwise, it causes undefined behavior.
Quoting C11, chapter §6.5.8, Relational operators, paragraph 5
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
In your code,
if(p1>p2)
is an attempt to compare two pointers that point neither into the same array object nor to members of the same aggregate object. So, it triggers the warning.
However, the equality operators carry no such constraint, so an expression like (ip1==ip2) is perfectly OK.
You may use the equality operators (== and !=) to compare pointers that point to different objects or to elements of different arrays. If the pointers do not point to the same object, they compare unequal.
More precisely (the C Standard, 6.5.9 Equality operators)
6 Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an object
and a subobject at its beginning) or function, both are pointers to
one past the last element of the same array object, or one is a
pointer to one past the end of one array object and the other is a
pointer to the start of a different array object that happens to
immediately follow the first array object in the address space.
Consider the following example.
#include <stdio.h>
int main(void)
{
    struct A
    {
        int x;
        int y;
    } a;
    printf( "&a.x + 1 == &a.y is %d\n", &a.x + 1 == &a.y );
    return 0;
}
If there is no padding between the data members x and y of the structure then the output will be equal to 1 because each data member can be considered as an array with one element and the "array" y immediately follows the "array" x.
However, you may not use the relational operators (<, >, <=, >=) on pointers that do not point to elements of the same array, or one past the end of that array.
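If all you need is some consistent ordering of unrelated pointers (for sorting or as a map key, say), a commonly seen workaround is to compare the addresses as integers. This is only a sketch under assumptions: uintptr_t is an optional type, the pointer-to-integer mapping is implementation-defined, and on segmented or Harvard-style machines the resulting order need not mean anything. The function name compare_addresses is hypothetical.

```c
#include <stdint.h>

/* Returns -1, 0 or 1 according to the integer values of the two addresses.
 * Unlike a < b on unrelated pointers, this is not undefined behaviour,
 * but the ordering itself is implementation-defined. */
int compare_addresses(const void *a, const void *b)
{
    uintptr_t ua = (uintptr_t)a;
    uintptr_t ub = (uintptr_t)b;
    return (ua > ub) - (ua < ub);
}
```

For example, compare_addresses(name1, name2) can replace the p1 > p2 test from the question when only a stable order is wanted, not a statement about memory layout.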

Is it UB to access an element one past the end of a row of a 2d array?

Is the behavior of the following program undefined?
#include <stdio.h>
int main(void)
{
    int arr[2][3] = { { 1, 2, 3 },
                      { 4, 5, 6 }
    };
    int *ptr1 = &arr[0][0];      // pointer to first elem of { 1, 2, 3 }
    int *ptr3 = ptr1 + 2;        // pointer to last elem of { 1, 2, 3 }
    int *ptr3_plus_1 = ptr3 + 1; // pointer to one past last elem of { 1, 2, 3 }
    int *ptr4 = &arr[1][0];      // pointer to first elem of { 4, 5, 6 }
    // int *ptr3_plus_2 = ptr3 + 2; // this is not legal
    /* It is legal to compare ptr3_plus_1 and ptr4 */
    if (ptr3_plus_1 == ptr4) {
        puts("ptr3_plus_1 == ptr4");
        /* ptr3_plus_1 is a valid address, but is it legal to dereference it? */
        printf("*ptr3_plus_1 = %d\n", *ptr3_plus_1);
    } else {
        puts("ptr3_plus_1 != ptr4");
    }
    return 0;
}
According to §6.5.6 ¶8:
Moreover, if the expression P points to the last element of an
array object, the expression (P)+1 points one past the last
element of the array object.... If both the pointer operand and the
result point to elements of the same array object, or one past the
last element of the array object, the evaluation shall not produce an
overflow; otherwise, the behavior is undefined. If the result points
one past the last element of the array object, it shall not be used as
the operand of a unary * operator that is evaluated.
From this, it would appear that the behavior of the above program is undefined; ptr3_plus_1 points to an address one past the end of the array object from which it is derived, and dereferencing this address causes undefined behavior.
Further, Annex J.2 suggests that this is undefined behavior:
An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).
There is some discussion of this issue in the Stack Overflow question, One-dimensional access to a multidimensional array: well-defined C?. The consensus there appears to be that this kind of access to arbitrary elements of a two-dimensional array through one-dimensional subscripts is indeed undefined behavior.
The issue, as I see it, is that it is not even legal to form the address of the pointer ptr3_plus_2, so it is not legal to access arbitrary two-dimensional array elements in this way. But, it is legal to form the address of the pointer ptr3_plus_1 using this pointer arithmetic. Further, it is legal to compare the two pointers ptr3_plus_1 and ptr4, according to §6.5.9 ¶6:
Two pointers compare equal if and only if both are null pointers, both
are pointers to the same object (including a pointer to an object and
a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer
to one past the end of one array object and the other is a pointer to
the start of a different array object that happens to immediately
follow the first array object in the address space.
So, if both ptr3_plus_1 and ptr4 are valid pointers that compare equal and that must point to the same address (the object pointed to by ptr4 must be adjacent in memory to the object pointed to by ptr3 anyway, since array storage must be contiguous), it would seem that *ptr3_plus_1 is as valid as *ptr4.
Is this undefined behavior, as described in §6.5.6 ¶8 and Annex J.2, or is this an exceptional case?
To Clarify
It seems unambiguous that it is undefined behavior to attempt to access the element one past the end of the final row of a two-dimensional array. My interest is in the question of whether it is legal to access the first element of the intermediate rows by forming a new pointer using a pointer to an element from the previous row and pointer arithmetic. It seems to me that a different example in Annex J.2 could have made this more clear.
Is it possible to reconcile the clear statement in §6.5.6 ¶8 that an attempted dereference of a pointer to the location one past the end of an array leads to undefined behavior with the idea that the pointer past the end of the first row of a two-dimensional array of type T[][] is also a pointer of type T * that points to an object of type T, namely the first element of an array of type T[]?
So, if both ptr3_plus_1 and ptr4 are valid pointers that compare equal and that must point to the same address
They are.
it would seem that *ptr3_plus_1 is as valid as *ptr4.
It is not.
The pointers are equal, but not equivalent. The trivial well-known example of the distinction between equality and equivalence is negative zero:
double a = 0.0, b = -0.0;
assert (a == b);
assert (1/a != 1/b);
Now, to be fair, there is a difference between the two cases: positive and negative zero have different representations, whereas ptr3_plus_1 and ptr4 have the same representation on typical implementations. This is not guaranteed, however, and on implementations where they have different representations, it should be clear that your code might fail.
Even on the typical implementations, while there are good arguments to be made that the same representation implies equivalent values, to the best of my knowledge, the official interpretation is that the standard does not guarantee this, therefore programs cannot rely on it, therefore implementations can assume programs do not do this and optimise accordingly.
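If the goal is simply to walk the 2D array with a single linear index, one way to stay within the rules is to let the array type do the row arithmetic instead of stepping an int * across row boundaries. A minimal sketch, assuming the 2x3 layout from the question; ROWS, COLS and element_at are names introduced here for illustration.

```c
#include <stddef.h>

enum { ROWS = 2, COLS = 3 };

/* Maps a linear index i in [0, ROWS * COLS) to arr[row][col].
 * Every subscript stays inside its own row, so no pointer is ever
 * stepped past the end of the row it was derived from. */
int element_at(int (*arr)[COLS], size_t i)
{
    return arr[i / COLS][i % COLS];
}
```

With the question's array, element_at(arr, 3) reads arr[1][0] directly rather than going through a pointer one past the end of arr[0].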
A debugging implementation might use "fat" pointers. For example, a pointer may be represented as a tuple (address, base, size) to detect out-of-bounds access. There is absolutely nothing wrong or contrary to the standard about such representation. So any pointer arithmetic that brings the pointer outside the range of [base, base+size] fails, and any dereference outside of [base, base+size) also fails.
Note that base and size are not the address and the size of the 2D array but rather of the array that the pointer points into (the row in this case).
It might sound trivial in this case, but when deciding whether a certain pointer construction is UB or not, it is useful to mentally run your example through this hypothetical implementation.
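To run the example through such an implementation by hand, here is a rough sketch of the (address, base, size) idea for int pointers. It is not how any particular compiler or sanitizer does it; fat_ptr, fat_from_array, fat_add and fat_load are hypothetical names, and size is assumed to fit in ptrdiff_t.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int   *addr;  /* where the pointer currently points */
    int   *base;  /* first element of the array it was derived from */
    size_t size;  /* number of elements in that array */
} fat_ptr;

fat_ptr fat_from_array(int *array, size_t size)
{
    fat_ptr p = { array, array, size };
    return p;
}

/* Pointer arithmetic: the result must stay within [base, base+size]. */
bool fat_add(fat_ptr p, ptrdiff_t delta, fat_ptr *out)
{
    ptrdiff_t index = p.addr - p.base;   /* always in [0, size] */
    ptrdiff_t limit = (ptrdiff_t)p.size;
    if (delta < -index || delta > limit - index)
        return false;                    /* would leave the array */
    p.addr = p.base + (index + delta);
    *out = p;
    return true;
}

/* Dereference: only allowed within [base, base+size). */
bool fat_load(fat_ptr p, int *out)
{
    size_t index = (size_t)(p.addr - p.base);
    if (index >= p.size)
        return false;                    /* one past the end */
    *out = *p.addr;
    return true;
}
```

Starting from fat_from_array(arr[0], 3), adding 3 succeeds (that is ptr3_plus_1), loading through the result fails, and adding 4 fails outright, which mirrors the answer's reading of the standard.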

Can I use inequalities with void * in C? [duplicate]

So I have a function that returns a pointer to an element in an array A. I have another function that takes that pointer as a parameter. However, I need that function to be able to deal with the possibility that it may be passed a completely arbitrary pointer.
Is there a way to be able to detect if a pointer points somewhere within a structure? In this case, my array A?
I've seen similar questions regarding C++, but not with C.
The only portable way is to use an equality test against all of the possible valid values for the pointer. For example:
int A[10];
bool points_to_A(int *ptr)
{
    for (int i = 0; i < 10; ++i)
        if ( ptr == &A[i] )
            return true;
    return false;
}
Note that using a relational operator (e.g. <), or subtraction, with two pointers is undefined behaviour unless the two pointers actually do point to elements of the same array (or one past the end).
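The same equality-only idea can be generalised to an array of any length and made to report which element was hit. This is just a sketch; index_in_array is a made-up name, and the cost is linear in the array length.

```c
#include <stddef.h>

/* Returns the index of the element that ptr points to, or -1 if ptr does
 * not point to any element of array[0..count-1]. Only == is used, so the
 * check is well defined even for a completely unrelated ptr. */
ptrdiff_t index_in_array(const int *array, size_t count, const int *ptr)
{
    for (size_t i = 0; i < count; ++i)
        if (ptr == &array[i])
            return (ptrdiff_t)i;
    return -1;
}
```

A call such as index_in_array(A, 10, ptr) then doubles as the membership test: a non-negative result means ptr points into A.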
In section §6.5.8 Relational operators, the C11 standard (ISO/IEC 9899:2011) says:
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
If you know that the pointer is within range of the array, then comparisons work. If the pointer is outside the range, you can't be sure that the comparisons will work. In practice, they usually do, but the standard explicitly says that the comparison yields undefined behaviour.
Note that for an array SomeType array[20];, the address &array[20] is guaranteed to be valid and to compare reliably with any address from &array[0] through &array[19]. You need to decide whether you want to count that as being in your array.
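The one-past-the-end guarantee is what makes the usual end-pointer loop legal. A small sketch, with sum_ints as an illustrative name: the end pointer is formed and compared, but never dereferenced.

```c
#include <stddef.h>

/* Sums count ints starting at array. The pointer array + count is one past
 * the last element: it may be formed and compared, but not dereferenced. */
long sum_ints(const int *array, size_t count)
{
    const int *end = array + count;
    long total = 0;
    for (const int *p = array; p != end; ++p)
        total += *p;
    return total;
}
```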
Subject to the observation that the standard does not guarantee that it will work, then, you can compare two int pointers:
int within_int_array(int *array, size_t num_ints, int *ptr)
{
    return ptr >= array && ptr < array + num_ints;
}
These days, you should be increasingly cautious about invoking undefined behaviour. Compilers do nasty things to programs that use undefined behaviour, and technically, you have no recourse since the standard says "undefined behaviour".

Is subtraction of pointers not pointing to different elements of same array valid in C?

Is something such as below guaranteed to work according to C Standards? I vaguely remember reading that this is not valid?
int * a;
int * b;
a = (int*) 100;
b = (int*) 200;
printf("%d\n", b-a);
Will this give me 25?
From the C spec, Appendix J.2 Undefined behaviour:
Pointers that do not point into, or just beyond, the same array object are subtracted (6.5.6).
6.5.6 Additive operators, paragraph 9 says:
When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements.
There you have it - your example causes undefined behaviour. That said, on most systems it will work just fine. You probably do want to change your printf format to %td to indicate that you're printing a ptrdiff_t type.
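For contrast, here is a sketch of the well-defined case: both pointers come from the same array, and the result is formatted with %td. The helper names element_distance and format_distance are made up for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Difference in elements. Both pointers must point into the same array
 * (or one past its end); otherwise the subtraction is undefined. */
ptrdiff_t element_distance(const int *from, const int *to)
{
    return to - from;
}

/* Writes the distance into buf using %td, the conversion for ptrdiff_t.
 * Returns what snprintf returns. */
int format_distance(char *buf, size_t buflen, const int *from, const int *to)
{
    return snprintf(buf, buflen, "%td", element_distance(from, to));
}
```

With int arr[50], element_distance(&arr[0], &arr[25]) is 25 regardless of sizeof(int), because the result is a difference of subscripts, not of bytes.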
This is undefined behavior.
For one, those pointers don't point to memory that you own.
You can only subtract pointers that point inside the same array (or one position after the end of the array).
Of course, it will most likely work on most compilers, and you get 25 because sizeof(int) == 4 on your platform. If they were char *, you'd get 100. (possibly, or it could crash, that's the beauty of UB).
Even though the standard does not promise defined behavior, the result is correct.
An integer needs 4 bytes on your architecture. Thus, the difference gives you the number of integer values by which the two pointers are apart.
You can use the difference as an index or offset in an array.
For the same reason
int *p = (int*) 100;
p++;
will result in p=104.
Of course it's undefined. The difference between two arbitrary pointers (viewed as integers) is not even guaranteed to be a multiple of your object's size.
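The arithmetic behind that remark can be checked on plain integers, without ever forming a pointer. A sketch, assuming an element size of 4 bytes as in the question; gap_is_whole_elements is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Treats the two addresses purely as integers (low <= high) and reports
 * whether the byte gap between them is a whole number of elements. */
bool gap_is_whole_elements(uintptr_t low, uintptr_t high, size_t elem_size)
{
    return (high - low) % elem_size == 0;
}
```

For the addresses 100 and 200 the gap is 100 bytes, which is 25 elements of 4 bytes each. For 100 and 201 the gap is 101 bytes, which is not a multiple of 4, so no element count describes it.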

Resources