It looks like GCC with some optimization thinks two pointers from different translation units can never be equal, even when they actually are.
Code:
main.c
#include <stdint.h>
#include <stdio.h>
int a __attribute__((section("test")));
extern int b;
void check(int cond) { puts(cond ? "TRUE" : "FALSE"); }
int main() {
    int * p = &a + 1;
    check(
        (p == &b)
        ==
        ((uintptr_t)p == (uintptr_t)&b)
    );
    check(p == &b);
    check((uintptr_t)p == (uintptr_t)&b);
    return 0;
}
b.c
int b __attribute__((section("test")));
If I compile it with -O0, it prints
TRUE
TRUE
TRUE
But with -O1
FALSE
FALSE
TRUE
So p and &b hold the same value, but the compiler optimized their comparison away, assuming they can never be equal.
I can't figure out which optimization does this.
It doesn't look like strict aliasing, because the pointers are of the same type, and the -fstrict-aliasing option doesn't produce this effect.
Is this documented behaviour? Or is this a bug?
There are three aspects in your code which result in general problems:
Conversion of a pointer to an integer is implementation-defined. There is no guarantee that converting two pointers yields bit-identical results.
uintptr_t is only guaranteed to survive a round trip: a pointer converted to it and back yields a pointer that compares equal to the original. Nothing more. The integer values themselves are not guaranteed to compare equal; e.g. there could be unused bits with arbitrary values. See the standard, 7.20.1.4.
And (briefly) two pointers can only compare equal if they point into the same array or one past its end (last entry plus one), or at least one of them is a null pointer. In any other case they compare unequal. For the exact details, see the standard, 6.5.9p6.
Finally, there is no guarantee how variables are placed in memory by the toolchain (typically the linker for static variables, the compiler for automatic variables). Only arrays and structs (i.e. composite types) guarantee the ordering of their elements.
For your example, 6.5.9p7 also applies. It basically treats a pointer to a non-array object, for comparison purposes, like a pointer to the first entry of an array of size 1. This does not cover a pointer incremented past the object, like &a + 1. What matters is the object the pointer is based on: that is object a for pointer p and object b for pointer &b. The rest can be found in paragraph 6.
None of your variables is an array (last part of paragraph 6), so the pointers need not compare equal, even for &a + 1 == &b. The last "TRUE" might arise from gcc assuming the uintptr_t comparison returns true.
gcc is known to optimise aggressively while strictly following the standard. Other compilers are more conservative, but that results in less optimised code. Please don't try "solving" this by disabling optimisation or other hacks; fix it using well-defined behaviour. It is a bug in the code.
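For illustration, here is a minimal sketch of a well-defined variant (my own example, not the OP's code): when both objects are elements of the same array, the "one past the last element" case of 6.5.9p6 guarantees the comparison.
#include <stdio.h>
static int arr[2];                 /* arr[0] plays the role of a, arr[1] of b */
int main(void) {
    int *p = &arr[0] + 1;          /* one past arr[0], i.e. &arr[1] */
    puts(p == &arr[1] ? "TRUE" : "FALSE");   /* guaranteed to print TRUE */
    return 0;
}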
p == &b is a pointer comparison and is subject to the following rules from the C Standard (6.5.9 Equality operators, point 6):
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
(uintptr_t)p == (uintptr_t)&b is an arithmetic comparison and is subject to the following rules (6.5.9 Equality operators, point 4):
If both of the operands have arithmetic type, the usual arithmetic conversions are performed. Values of complex types are equal if and only if both their real parts are equal and also their imaginary parts are equal. Any two values of arithmetic types from different type domains are equal if and only if the results of their conversions to the (complex) result type determined by the usual arithmetic conversions are equal.
These two excerpts require very different things from the implementation. And it is clear that the C specification places no requirement on an implementation to mimic the behavior of the former kind of comparison in cases where the latter kind is invoked and vice versa. The implementation is only required to follow this rule (7.18.1.4 Integer types capable of holding object pointers in C99 or 7.20.1.4 in C11):
The [uintptr_t] type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer.
(Addendum: The above quote isn't applicable in this case, because the conversion from int* to uintptr_t does not involve void* as an intermediate step. See Hadi's answer for an explanation and citation on this. Still, the conversion in question is implementation-defined and the two comparisons you are attempting are not required to exhibit the same behavior, which is the main takeaway here.)
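To make the distinction concrete, here is a small sketch (my own, not from the question) of the two conversion routes; only the first goes through void* and is therefore covered by the round-trip wording of 7.20.1.4:
#include <stdint.h>
int main(void) {
    int x;
    int *p = &x;
    uintptr_t u1 = (uintptr_t)(void *)p;  /* via void*: round trip back to void* is guaranteed (6.3.2.3p1, 7.20.1.4) */
    uintptr_t u2 = (uintptr_t)p;          /* direct: 6.3.2.3p6, implementation-defined */
    (void)u1; (void)u2;
    return 0;
}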
As an example of the difference, consider two pointers that point at the same address of two different address spaces. Comparing them as pointers shouldn't return true, but comparing them as unsigned integers might.
&a + 1 is an integer added to a pointer, which is subject to the following rules (6.5.6 Additive operators, point 8):
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object. If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
I believe that this excerpt shows that pointer addition (and subtraction) is defined only for pointers within the same array object or one past the last element. And because (1) a is not an array and (2) a and b aren't members of the same array object, it seems to me that your pointer math operation invokes undefined behavior and your compiler takes advantage of it to assume that the pointer comparison returns false. Again as pointed out in Hadi's answer (and in contrast to what my original answer assumed at this point), pointers to non-array objects can be considered pointers to array objects of length one, and thus adding one to your pointer to the scalar does qualify as pointing to one past the end of the array.
Therefore your case seems to fall under the last part of the first excerpt mentioned in this answer, making your comparison well-defined to evaluate to true if and only if the two variables are linked in sequence and in ascending order. Whether this is true for your program is left unspecified by the standard and it's up to the implementation.
While one of the answers has already been accepted, the accepted answer (and all other answers, for that matter) is critically wrong, as I'll explain before answering the question. I'll be quoting from the same C standard, namely n1570.
Let's start with &a + 1. In contrast to what @Theodoros and @Peter have stated, this expression has defined behavior. To see this, consider section 6.5.6 paragraph 7, "Additive operators", which states:
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
and paragraph 8 (in particular, the emphasized part):
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i−n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.
The expression (uintptr_t)p == (uintptr_t)&b has two parts. The conversion from a pointer to a uintptr_t is NOT defined by section 7.20.1.4 (in contrast to what @Olaf and @Theodoros have said):
The following type designates an unsigned integer type with the
property that any valid pointer to void can be converted to this type,
then converted back to pointer to void, and the result will compare
equal to the original pointer:
uintptr_t
It's important to recognize that this rule applies only to valid pointers to void. However, in this case, we have a valid pointer to int. A relevant paragraph can be found in section 6.3.2.3 paragraph 1:
A pointer to void may be converted to or from a pointer to any object
type. A pointer to any object type may be converted to a pointer to
void and back again; the result shall compare equal to the original
pointer.
This means that (uintptr_t)(void*)p is allowed according to this paragraph and 7.20.1.4. But (uintptr_t)p and (uintptr_t)&b are ruled by section 6.3.2.3 paragraph 6:
Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
integer type.
Note that uintptr_t is an integer type as stated in section 7.20.1.4 mentioned above and therefore this rule applies.
The second part of (uintptr_t)p == (uintptr_t)&b is the comparison for equality. As previously discussed, since the result of the conversion is implementation-defined, the result of the equality test is also implementation-defined. This applies irrespective of whether the pointers themselves are equal or not.
Now I'll discuss p == &b. The third point in @Olaf's answer is wrong and @Theodoros's answer is incomplete regarding this expression. Section 6.5.9 "Equality operators" paragraph 7:
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
and paragraph 6:
Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an object
and a subobject at its beginning) or function, both are pointers to
one past the last element of the same array object, or one is a
pointer to one past the end of one array object and the other is a
pointer to the start of a different array object that happens to
immediately follow the first array object in the address space.
In contrast to what @Olaf has said, comparing pointers using the == operator never results in undefined behavior (which may occur only when using relational operators such as <=, according to section 6.5.8 paragraph 5, which I'll omit here for brevity). Now, since p points to the next int relative to a, it will be equal to &b only when the linker has placed b at that location in the binary. Otherwise, they are unequal. So this is implementation-dependent (the relative order of a and b is unspecified by the standard). Since the declarations of a and b use a language extension, namely __attribute__((section("test"))), the relative locations are indeed implementation-dependent by J.5 and 3.4.2 (omitted for brevity).
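As a practical aside (not part of the conformance argument), one can print the addresses the toolchain actually chose, for instance by adding something like this to the question's main; the textual form of %p output is itself implementation-defined, so this is for inspection only:
printf("&a = %p, &a + 1 = %p, &b = %p\n",
       (void *)&a, (void *)(&a + 1), (void *)&b);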
We conclude that the results of check(p == &b) and check((uintptr_t)p == (uintptr_t)&b) are implementation-dependent. So the answer depends on which version of which compiler you are using. I'm using gcc 4.8 and by compiling with default options except for the level of optimization, the output I get in both -O0 and -O1 cases is all TRUE.
According to C11 6.5.9/6 and C11 6.5.9/7, the test p == &b must give 1 if a and b are adjacent in the address space.
Your example shows that GCC appears to not fulfill this requirement of the Standard.
Update 26/Apr/2016: My original answer contained suggestions about modifying the code to remove other potential sources of UB and isolate this one condition.
However, it's since come to light that the issues raised by this thread are under review - N2012.
One of their recommendations is that p == &b should be unspecified, and they acknowledge that GCC does in fact not implement the ISO C11 requirement.
So I have removed the remaining text from my answer, as it is no longer necessary to prove a "compiler bug", since the non-conformance (whether you want to call it a bug or not) has been established.
Re-reading your program I see that you are (understandably) baffled by the fact that in the optimized version
p == &b
is false, while
(uintptr_t)p == (uintptr_t)&b;
is true. The last line indicates that the numerical values are indeed identical; how can p == &b then be false??
I must admit that I have no idea. I am convinced that it is a gcc bug.
After a discussion with M.M I think I can make the following case if the conversion to uintptr_t goes through an intermediate void pointer (you should include that in your program and see whether it changes anything):
Because both steps in the conversion chain int* -> void* -> uintptr_t are guaranteed to be reversible, unequal int pointers can logically not result in equal uintptr_t values.1 (Those equal uintptr_t values would have to convert back to equal int pointers, altering at least one of them and thus violating the value-preserving conversion rule.) In code (I'm not aiming for equality here, just demonstrating the conversions and comparisons):
int a, b, *ap = &a, *bp = &b;
assert(ap != bp);
void *avp = ap, *bvp = bp;
uintptr_t ua = (uintptr_t)avp, ub = (uintptr_t)bvp;
// Now the following holds:
// if ap != bp then *necessarily* ua != ub.
// This is violated by the OP's case (sans the void* step).
// The round trips are guaranteed to recover the original pointers:
assert((int *)(void *)ua == ap);
assert((int *)(void *)ub == bp);
1 This assumes that the uintptr_t doesn't carry hidden information in the form of padding bits which are not evaluated in an arithmetic comparison but possibly in a type conversion. One can check that through CHAR_BIT, UINTPTR_MAX, sizeof(uintptr_t) and some bit fiddling.
For a similar reason it's conceivable that two uintptr_t values compare different but convert back to the same pointer (namely if there are bits in uintptr_t not used for storing a pointer value, and the conversion does not zero them). But that is the opposite of the OP's problem.
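A rough sketch of the check the footnote mentions, counting the value bits of uintptr_t against its object size (a heuristic that only detects padding bits, not a full guarantee):
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
int main(void) {
    unsigned value_bits = 0;
    uintptr_t max = UINTPTR_MAX;
    while (max != 0) { value_bits++; max >>= 1; }   /* count value bits */
    unsigned object_bits = (unsigned)(sizeof(uintptr_t) * CHAR_BIT);
    printf("value bits: %u, object bits: %u, padding bits: %u\n",
           value_bits, object_bits, object_bits - value_bits);
    return 0;
}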
Related
I noticed this warning from Clang:
warning: performing pointer arithmetic on a null pointer
has undefined behavior [-Wnull-pointer-arithmetic]
In details, it is this code which triggers this warning:
int *start = ((int*)0);
int *end = ((int*)0) + count;
The constant literal zero converted to any pointer type yields a null pointer, which does not point to any area of memory but still has the pointer-to-type needed to do pointer arithmetic.
Why would arithmetic on a null pointer be forbidden when doing the same on a non-null pointer obtained from an integer different than zero does not trigger any warning?
And more importantly, does the C standard explicitly forbid null pointer arithmetic?
Also, this code will not trigger the warning, but this is because the pointer is not evaluated at compile time:
int *start = ((int*)0);
int *end = start + count;
But a good way of avoiding the undefined behavior is to explicitly cast an integer value to the pointer:
int *end = (int *)(sizeof(int) * count);
The C standard does not allow it.
6.5.6 Additive operators (emphasis mine)
8 When an expression that has integer type is added to or
subtracted from a pointer, the result has the type of the pointer
operand. If the pointer operand points to an element of an array
object, and the array is large enough, the result points to an element
offset from the original element such that the difference of the
subscripts of the resulting and original array elements equals the
integer expression. In other words, if the expression P points to the
i-th element of an array object, the expressions (P)+N (equivalently,
N+(P)) and (P)-N (where N has the value n) point to, respectively, the
i+n-th and i-n-th elements of the array object, provided they exist.
Moreover, if the expression P points to the last element of an array
object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element
of an array object, the expression (Q)-1 points to the last element of
the array object. If both the pointer operand and the result point to
elements of the same array object, or one past the last element of the
array object, the evaluation shall not produce an overflow; otherwise,
the behavior is undefined. If the result points one past the last
element of the array object, it shall not be used as the operand of a
unary * operator that is evaluated.
For the purposes of the above, a pointer to a single object is considered as pointing into an array of 1 element.
Now, ((int*)0) does not point at an element of an array object, simply because a pointer holding a null pointer value does not point at any object. This is stated at:
6.3.2.3 Pointers
3 If a null pointer constant is converted to a pointer type, the
resulting pointer, called a null pointer, is guaranteed to compare
unequal to a pointer to any object or function.
So you can't do arithmetic on it. The warning is justified, because as the second highlighted sentence mentions, we are in the case of undefined behavior.
Don't be fooled by the fact that the offsetof macro is possibly implemented like that. The standard library is not bound by the constraints placed on user programs; it can employ deeper knowledge. But doing this in our code is not well defined.
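For illustration only, here is the kind of null-pointer trick that paragraph alludes to, next to the well-defined alternative; the first computation relies on undefined behaviour that merely happens to work on common implementations:
#include <stddef.h>
#include <stdio.h>
struct s { char c; int i; };
int main(void) {
    size_t off_ub = (size_t)&((struct s *)0)->i;  /* the classic trick: undefined behaviour in user code */
    size_t off_ok = offsetof(struct s, i);        /* the well-defined way */
    printf("%zu %zu\n", off_ub, off_ok);
    return 0;
}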
When the C Standard was written, the vast majority of C implementations would, for any non-void* pointer value p, uphold the invariants that p+0 and p-0 both yield p, and p-p will yield zero. More generally, operations like a size-zero memcpy or fwrite that operate on a buffer of size N would ignore the buffer address when N was zero. Such behavior would allow programmers to avoid having to write code to handle corner cases. For example, code to output a packet with an optional payload passed via address and length arguments would naturally process (NULL,0) as an empty payload.
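A sketch of that corner case (send_packet is a hypothetical name of mine): under the Standard's strict reading, as opposed to the historical behaviour described above, code that wants to accept (NULL, 0) as an empty payload has to guard the library call, because memcpy with a null pointer is undefined even when the length is zero:
#include <stddef.h>
#include <string.h>
void send_packet(const void *payload, size_t len) {
    unsigned char buf[256];
    if (len > sizeof buf)
        return;                      /* payload too large for this sketch */
    if (len != 0)                    /* guard keeps the memcpy well defined for (NULL, 0) */
        memcpy(buf, payload, len);
    /* ... transmit header plus buf ... */
}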
Nothing in the published Rationale for the C Standard suggests that implementations whose target platforms would naturally behave in such fashion shouldn't continue to work as they always had. There were, however, a few platforms where it may have been expensive to uphold such behavioral guarantees in cases where p is null.
As with most situations where the vast majority of C implementations would process a construct identically, but implementations might exist where such treatment would be impractical, the Standard characterizes the addition of zero to a null pointer as Undefined Behavior. The Standard allows implementations to, as a form of "conforming language extension", define the behavior of constructs in cases where it imposes no requirements, and it allows conforming (but not strictly conforming) programs to make use of them. According to the published Rationale, the stated intention was that support for such "popular extensions" be regarded as a "quality of implementation" issue to be decided by the marketplace. Implementations that could support them at essentially zero cost would do so, but implementations where such support would be expensive would be free to support such constructs or not based upon their customers' needs.
If one is using a compiler that targets commonplace platforms, and is designed to process the widest range of useful programs reasonably efficiently, then the extended semantics surrounding pointer arithmetic may allow one to write code more efficiently than would otherwise be possible. If one is targeting a compiler that does not value compatibility with quality compilers, however, one should recognize that it may treat the Standard's allowance for quirky hardware as an invitation to behave nonsensically even on commonplace hardware. Of course, one should also be aware that such compilers may behave nonsensically in corner cases where adherence with the Standard would require them to forego optimizations that are unsound but would "usually" be safe.
Does the C standard require pointers to be (integer) numbers?
One may argue that yes, because of pointer arithmetic...
But on the other hand, operations like -- or ++ may be understood as "previous memory location" and "next memory location", depending on how they are described in the standard, and an actual implementation may use any representation to hold pointer data (as long as the mentioned operations are implemented)...
Another question comes to mind - does C require arrays/buffers etc. to be contiguous, i.e. is the next element stored at the next memory location (++p, where p is a pointer)? I ask because you can often see implementations online that seem to assume that it does.
No, pointers need not be plain numbers.
If you read the standard, there are provisions for that:
Two pointers to unrelated objects (meaning not part of a bigger object, remember structs and arrays) may not be compared, except for equality.
6.5.8 Relational operators
[...]
5 When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript
values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
Two pointers to unrelated objects may not be subtracted.
6.5.6 Additive operators
[...]
9 When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined. In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object, the expression (P)-(Q) has the value i−j provided the value fits in an object of type ptrdiff_t. Moreover, if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)-(P) has the same
value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object.91)
There may not be a way to represent a pointer as a number, as no suitable type might exist. Thus, trying to convert might result in Undefined Behavior.
Any specific implementation defining a behavior does not mean it isn't UB according to the standard.
6.3.2.3 Pointers
[...]
6 Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
7.18.1.4 Integer types capable of holding object pointers
1 The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
intptr_t
The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
uintptr_t
These types are optional.
That's just off the top of my head, I'm sure there's more.
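As an aside on that last quote, whether the optional types exist at all can be detected at compile time, since the corresponding limit macros are defined only when the types are; a minimal sketch:
#include <stdint.h>
#include <stdio.h>
int main(void) {
#ifdef UINTPTR_MAX
    puts("uintptr_t exists: object pointers can round-trip through it");
#else
    puts("no integer type here is guaranteed to hold an object pointer");
#endif
    return 0;
}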
All quotes from n1256 (C99 draft).
Arrays have always been required to be contiguous.
To answer your second question: array elements are stored in contiguous memory locations. That's why you can use pointer arithmetic to move between elements.
Consider the code snippet below:
int *p;
/* Let's say p points to address 100
and sizeof(int) is 4 bytes. */
int *q = p+1;
unsigned long r = q-p;
/* r results in 1, hence for r = q-p
something is happening similar to r=(104-100)/4 */
Is there a real division by sizeof(datatype) going on at runtime when two pointers of the same type are subtracted, or is there some other mechanism through which pointer subtraction works?
The C standard states the following regarding pointer subtraction (section 6.5.6p9):
When two pointers are subtracted, both shall point to elements of the
same array object, or one past the last element of the array
object; the result is the difference of the subscripts of the
two array elements. The size of the result is
implementation-defined, and its type (a signed integer type) is
ptrdiff_t defined in the <stddef.h> header. If the result is not
representable in an object of that type, the behavior is
undefined. In other words, if the expressions P and Q point to,
respectively, the i-th and j-th elements of an array object, the
expression (P)-(Q) has the value i−j provided the value fits in
an object of type ptrdiff_t. Moreover,
if the expression P points either to an element of an array object or
one past the last element of an array object, and the expression Q
points to the last element of the same array object, the expression
((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as
-((P)-((Q)+1)) , and has the value zero if the expression P points one past the last element of the array object, even
though the expression (Q)+1 does not point to an element of the array
object. 106)
Footnote 106 states:
Another way to approach pointer arithmetic is first to convert the
pointer(s) to character pointer(s): In this scheme the integer
expression added to or subtracted from the converted pointer is first
multiplied by the size of the object originally pointed to,
and the resulting pointer is converted back to the original
type. For pointer subtraction, the result of the difference
between the character pointers is similarly divided by the size of
the object originally pointed to. When viewed in this way, an
implementation need only provide one extra byte (which may
overlap another object in the program) just after the end of the
object in order to satisfy the "one past the last element"
requirements.
So the footnote states that pointer subtraction may be implemented by subtracting the raw pointer values and dividing by the size of the pointed-to object. It doesn't have to be implemented this way, however.
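A small sketch (my own example) of the footnote's "character pointer" view: the element difference equals the byte difference divided by the element size.
#include <stddef.h>
#include <stdio.h>
int main(void) {
    int arr[10];
    int *p = &arr[2], *q = &arr[7];
    ptrdiff_t elems = q - p;                  /* 5 */
    ptrdiff_t bytes = (char *)q - (char *)p;  /* 5 * sizeof(int) */
    printf("%td %td\n", elems, bytes / (ptrdiff_t)sizeof(int));
    return 0;
}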
Note also that the standard requires that pointer subtraction is performed between pointers pointing to elements of the same array object (or one element past the end). If they don't then the behavior is undefined. In practice, if you're working on a system with a flat memory model you'll probably still get the "expected" values but you can't depend on that.
See @dbush's answer for the explanation of how pointer subtraction works.
If, instead, you are programming something low-level, say a kernel, driver, debugger or similar and you need to have actual subtraction of addresses, cast the pointers to char *:
(char *)q - (char *)p
The result will be of type ptrdiff_t, an implementation-defined signed integer type.
Of course, this is not defined/portable C, but will work on most architectures/environments.
Inspired by this answering this question, I dug a little into the C11 and C99 standards for the use of equality operators on pointers (the original question concerns relational operators). Here's what C11 has to say (C99 is similar) at §6.5.9.6:
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.94)
Footnote 94 says (and note that footnotes are non-normative):
Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. If prior invalid pointer operations (such as accesses outside array bounds) produced undefined behavior, subsequent comparisons also produce undefined behavior.
The body of the text and the non-normative note appear to be in conflict. If one takes the 'if and only if' from the body of the text seriously, then in no other circumstances than those set out should equality be returned, and there is no room for UB. So, for instance this code:
uintptr_t a = 1;
uintptr_t b = 1;
void *ap = (void *)a;
void *bp = (void *)b;
printf ("%d\n", ap <= bp); /* UB by §6.5.8.5 */
printf ("%d\n", ap < bp); /* UB by §6.5.8.5 */
printf ("%d\n", ap == bp); /* false by §6.5.9.6 ?? */
should print zero, as ap and bp are not pointers to the same object or function, nor do they satisfy any of the other conditions set out.
In §6.5.8.5 (relational operators) the behaviour is more clear (my emphasis):
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
Questions:
I am correct that there is some ambiguity as to when equality operators with pointers are permitted UB (comparing the footnote and the body of the text)?
If there is no ambiguity, when precisely can comparison of pointers with equality operators be UB? For instance, is it always UB if at least one pointer is artificially created (per above)? What if one pointer refers to memory that has been free()d? Given the footnote is non-normative, can one conclude there is never UB, in the sense that all 'other' comparisons must yield false?
Does §6.5.9.6 really mean that equality comparison of meaningless but bitwise equal pointers should always be false?
Note this question is tagged language-lawyer; I am not asking what in practice compilers do, as I believe I already know the answer to that (compare them using the same technique as comparing integers).
Am I correct that there is some ambiguity as to when equality operators with pointers are UB?
No, because this passage from §6.5.9(3):
The == and != operators are analogous to the relational operators except for their lower precedence.
Implies that the following from §6.5.8(5) also applies to the equality operators:
When two pointers are compared [...] In all other cases, the behavior is undefined.
If there is no ambiguity, when precisely can comparison of pointers with equality operators be UB?
There is undefined behaviour in all cases for which the standard does not explicitly define the behaviour.
Is it always UB if at least one pointer is artificially created, i.e. converted from an arbitrary integer?
§6.3.2.3(5):
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
What if one pointer refers to memory that has been freed?
§6.2.4(2):
The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime.
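A short sketch of that freed-pointer case: after free, both pointer objects hold indeterminate values, so even an equality test on them is not well defined.
#include <stdlib.h>
int main(void) {
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return 0;                  /* allocation failed; nothing to demonstrate */
    int *q = p;
    free(p);
    /* if (p == q) ... */          /* p and q are now indeterminate: not well defined */
    return 0;
}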
can one conclude there is never UB, in the sense that all 'other' comparisons must yield false?
No. The standard defines under what conditions two pointers must compare equal, and under what conditions two pointers must compare not equal. Any equality comparisons between two pointers that falls outside both of those two sets of conditions invokes undefined behaviour.
Does §6.5.9(6) really mean that equality comparison of meaningless but bitwise equal pointers should always be false?
No, it is undefined.
There are tons of code like this one:
#include <stdio.h>
int main(void)
{
    int a[2][2] = {{0, 1}, {2, -1}};
    int *p = &a[0][0];
    while (*p != -1) {
        printf("%d\n", *p);
        p++;
    }
    return 0;
}
But based on this answer, the behavior is undefined.
N1570. 6.5.6 p8:
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i−n-th elements of the array object, provided they exist. Moreover,
if the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary
* operator that is evaluated.
Can someone explain this in detail?
The array whose base address (pointer to first element) is assigned to p is of type int[2]. This means the address in p can legally be dereferenced only at locations *p and *(p+1), or if you prefer subscript notation, p[0] and p[1]. Furthermore, p+2 is guaranteed to be legally evaluated as an address, and comparable to other addresses in that sequence, but can not be dereferenced. This is the one-past address.
The code you posted violates the one-past rule by dereferencing p once it passes the last element in the array in which it is homed. That the array in which it is homed is buttressed up against another array of similar dimension is not relevant to the formal definition cited.
That said, in practice it works; but, as is often said, observed behavior is not, and should never be considered, defined behavior. Just because it works doesn't make it right.
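For comparison, here is a sketch that stays within the rules by iterating row by row; it simply prints all four elements instead of stopping at the sentinel:
#include <stdio.h>
int main(void) {
    int a[2][2] = {{0, 1}, {2, -1}};
    for (size_t i = 0; i < 2; i++)                  /* walk the rows */
        for (int *p = a[i]; p != a[i] + 2; p++)     /* stay inside each int[2] */
            printf("%d\n", *p);
    return 0;
}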
The object representation of pointers is opaque, in C. There is no prohibition against pointers having bounds information encoded. That's one possibility to keep in mind.
More practically, implementations are also able to achieve certain optimizations based on assumptions which are asserted by rules like these: Aliasing.
Then there's the protection of programmers from accidents.
Consider the following code, inside a function body:
struct {
    char c;
    int i;
} foo;
char * cp1 = (char *) &foo;
char * cp2 = &foo.c;
Given this, cp1 and cp2 will compare as equal, but their bounds are nonetheless different. cp1 can point to any byte of foo and even to "one past" foo, but cp2 can only point to "one past" foo.c, at most, if we wish to maintain defined behaviour.
In this example, there might be padding between the foo.c and foo.i members. While the first byte of that padding coincides with "one past" the foo.c member, cp2 + 2 would point farther into that padding. The implementation can notice this during translation and, instead of producing a program, it can advise you that you might be doing something you didn't think you were doing.
By contrast, if you read the initializer for the cp1 pointer, it intuitively suggests that it can access any byte of the foo structure, including padding.
In summary, this can produce undefined behaviour during translation (a warning or error) or during program execution (by encoding bounds information); there's no difference, standard-wise: The behaviour is undefined.
You can cast your pointer into a pointer to array to ensure the correct array semantics.
This code is indeed not defined, but is supported as a C extension by every compiler in common use today.
However, the correct way of doing it would be to cast the pointer into a pointer to array, like so:
((int (*)[2])p)[0][0]
to get the zeroth element or say:
((int (*)[2])p)[1][1]
to get the last.
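Putting the idea together as a complete sketch, keeping the question's array:
#include <stdio.h>
int main(void) {
    int a[2][2] = {{0, 1}, {2, -1}};
    int *p = &a[0][0];
    int (*rows)[2] = (int (*)[2])p;   /* view the storage as rows again */
    printf("%d %d\n", rows[0][0], rows[1][1]);   /* first and last elements */
    return 0;
}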
To be strict, the reason I think this is illegal is that you are breaking strict aliasing: pointers to different types may not point to the same address (variable).
In this case you are creating a pointer to an array of ints and a pointer to an int and pointing them at the same address. This is not allowed by the standard, as the only pointer type that may alias another is char *, and even that is rarely used properly.