I have implemented an AVL tree in C. Only later did I read that pointer comparison is only valid between objects in the same array. In my implementation, I do certain equality tests. For example, to test whether a node is a right child of a parent I might test node==node->parent->right. However, the nodes are allocated as needed, not in a contiguous chunk. Is this behavior defined? How would you write this code instead if it is not?
For equality and inequality, in the standard (ISO/IEC 9899:2011) §6.5.9 Equality Operators ¶6 says:
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
There's no undefined behaviour in comparing pointers to unrelated objects for equality or inequality.
By contrast, §6.5.8 Relational Operators ¶5 says:
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
This means that when you compare pointers with >, >=, < or <= and they do not point into the same object (for the definition of 'same object' given in painstaking detail in the quote), the behaviour is undefined.
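Applied to the AVL tree from the question, the right-child test could be sketched as below. The struct layout and the name is_right_child are assumptions for illustration, since the question does not show its node type. Only == and != are used, so it does not matter that the nodes come from separate allocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node layout; the question does not show its struct. */
struct avl_node {
    struct avl_node *parent;
    struct avl_node *left;
    struct avl_node *right;
    int key;
};

/* True if node is the right child of its parent.
   Only equality operators are used, which are defined for pointers
   to unrelated objects (C11 6.5.9p6). The root has no parent, so it
   is never reported as a right child. */
static bool is_right_child(const struct avl_node *node)
{
    assert(node != NULL);
    return node->parent != NULL && node->parent->right == node;
}
```

The extra null check on the parent is a design choice for the sketch: it keeps the helper usable on the root node.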
Does the C standard require pointers to be (integer) numbers?
One may argue that yes, because of pointer arithmetic...
But on the other hand operations like -- or ++ may be understood as previous memory location, next memory location, depending on how they are described in the standard, and actual implementation may use any representation to hold pointer data (as long as mentioned operations are implemented)...
Another question comes to mind - does C require arrays/buffers etc. to be contiguous, i.e. next element is stored in next memory location (++p where p is a pointer)? I ask because you can often see implementations online that seem to assume that it does.
No, pointers need not be plain numbers.
If you read the standard, there are provisions for that:
Two pointers to unrelated objects (meaning not part of a bigger object, remember structs and arrays) may not be compared, except for equality.
6.5.8 Relational operators
[...]
5 When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
Two pointers to unrelated objects may not be subtracted.
6.5.6 Additive operators
[...]
9 When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined. In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object, the expression (P)-(Q) has the value i−j provided the value fits in an object of type ptrdiff_t. Moreover, if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object.
There may not be a way to represent a pointer as a number, as no suitable type might exist. Thus, trying to convert might result in Undefined Behavior.
Any specific implementation defining a behavior does not mean it isn't UB according to the standard.
6.3.2.3 Pointers
[...]
6 Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
7.18.1.4 Integer types capable of holding object pointers
1 The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
intptr_t
The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
uintptr_t
These types are optional.
That's just off the top of my head, I'm sure there's more.
All quotes from n1256 (C99 draft).
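If you do need some ordering between pointers to unrelated objects (say, as keys in a sorted container), one common workaround is to compare their uintptr_t conversions instead of the pointers themselves. This is only a sketch: it assumes the implementation provides the optional uintptr_t type, the function names are made up for illustration, and the resulting order is implementation-defined and need not say anything about memory layout.

```c
#include <assert.h>
#include <stdint.h>

/* Checks the one guarantee the standard gives for uintptr_t:
   converting a valid void pointer there and back compares equal. */
static void check_round_trip(void *p)
{
    assert((void *)(uintptr_t)p == p);
}

/* Orders two pointers by their integer conversions: -1, 0 or 1.
   No relational operator is ever applied to the pointers themselves,
   so 6.5.8p5 does not come into play. The order is consistent within
   one implementation but otherwise carries no portable meaning. */
static int compare_by_integer_value(const void *p, const void *q)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t b = (uintptr_t)q;
    return (a > b) - (a < b);
}
```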
Arrays have always been required to be contiguous.
To answer your second question: array elements are stored in contiguous memory locations. That's why you can use pointer arithmetic to move between elements.
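A small sketch of what contiguity means in practice: viewed as bytes, consecutive elements are exactly sizeof(element) apart. The helper name stride_in_bytes is made up for this example, and the caller is assumed to pass an index i that is a valid element of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Distance in bytes between element i and element i + 1 of arr.
   Both char pointers point into (or one past) the same array object,
   so the subtraction is defined by 6.5.6p9. Requires i to be a valid
   index of arr. */
static ptrdiff_t stride_in_bytes(const int *arr, size_t i)
{
    assert(arr != NULL);
    const char *here = (const char *)(arr + i);
    const char *next = (const char *)(arr + i + 1);
    return next - here;
}
```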
Where does the ISO C11 standard state that comparing two pointers (with <, >, <=, >=) that do not point to the same array is undefined behavior?
Well, 6.5.8p5 from C11 draft is pretty clear:
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
That "other case" is for example a case where two pointers point to different arrays.
Note that there is still ongoing work about pointer provenance and hopefully future standard will clear the edge cases.
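One practical consequence: the usual range check p >= arr && p < arr + n is itself undefined when p points into some other object. A strictly conforming alternative, sketched here with a made-up helper name, uses only equality comparisons at the cost of a linear scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if p points to one of the n elements of arr.
   Uses only ==, which is defined for unrelated pointers, instead of
   the relational operators, which would be undefined if p pointed
   into a different object. Runs in O(n). */
static bool points_into(const int *p, const int *arr, size_t n)
{
    assert(arr != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (p == arr + i) {
            return true;
        }
    }
    return false;
}
```

Note that a one-past-the-end pointer of an unrelated array may compare equal to arr + 0 if the two arrays happen to be adjacent, so this sketch is meant for pointers that actually point to objects.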
Inspired by this answer to this question, I dug a little into the C11 and C99 standards for the use of equality operators on pointers (the original question concerns relational operators). Here's what C11 has to say (C99 is similar) at §6.5.9.6:
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.94)
Footnote 94 says (and note that footnotes are non-normative):
Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. If prior invalid pointer operations (such as accesses outside array bounds) produced undefined behavior, subsequent comparisons also produce undefined behavior.
The body of the text and the non-normative note appear to be in conflict. If one takes the 'if and only if' from the body of the text seriously, then in no other circumstances than those set out should equality be returned, and there is no room for UB. So, for instance this code:
uintptr_t a = 1;
uintptr_t b = 1;
void *ap = (void *)a;
void *bp = (void *)b;
printf ("%d\n", ap <= bp); /* UB by §6.5.8.5 */
printf ("%d\n", ap < bp); /* UB by §6.5.8.5 */
printf ("%d\n", ap == bp); /* false by §6.5.9.6 ?? */
should print zero, as ap and bp are neither pointers to the same object or function, nor any of the other bits set out.
In §6.5.8.5 (relational operators) the behaviour is more clear (my emphasis):
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
Questions:
Am I correct that there is some ambiguity as to when equality operators with pointers are permitted UB (comparing the footnote and the body of the text)?
If there is no ambiguity, when precisely can comparison of pointers with equality operators be UB? For instance, is it always UB if at least one pointer is artificially created (per above)? What if one pointer refers to memory that has been free()d? Given the footnote is non-normative, can one conclude there is never UB, in the sense that all 'other' comparisons must yield false?
Does §6.5.9.6 really mean that equality comparison of meaningless but bitwise equal pointers should always be false?
Note this question is tagged language-lawyer; I am not asking what compilers do in practice, as I believe I already know the answer to that (compare them using the same technique as comparing integers).
Am I correct that there is some ambiguity as to when equality operators with pointers are UB?
No, because this passage from §6.5.9(3):
The == and != operators are analogous to the relational operators except for their lower precedence.
implies that the following from §6.5.8(5) also applies to the equality operators:
When two pointers are compared [...] In all other cases, the behavior is undefined.
If there is no ambiguity, when precisely can comparison of pointers with equality operators be UB?
There is undefined behaviour in all cases for which the standard does not explicitly define the behaviour.
Is it always UB if at least one pointer is converted from an arbitrary integer?
§6.3.2.3(5):
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
What if one pointer refers to memory that has been freed?
§6.2.4(2):
The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime.
can one conclude there is never UB, in the sense that all 'other' comparisons must yield false?
No. The standard defines under what conditions two pointers must compare equal, and under what conditions two pointers must compare not equal. Any equality comparison between two pointers that falls outside both of those sets of conditions invokes undefined behaviour.
Does §6.5.9(6) really mean that equality comparison of meaningless but bitwise equal pointers should always be false?
No, it is undefined.
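Whichever reading of the standard you prefer, the freed-pointer case is easy to sidestep in practice: never keep the indeterminate value around. A minimal sketch (the helper name release_and_clear is an assumption, not a standard function) frees through a pointer-to-pointer and stores a null pointer, so any later equality test involves a null pointer, which 6.5.9 covers explicitly.

```c
#include <assert.h>
#include <stdlib.h>

/* Frees *pp and replaces it with a null pointer, so the caller never
   holds a pointer whose value became indeterminate (6.2.4p2).
   Calling it again on the same variable is harmless because
   free(NULL) does nothing. */
static void release_and_clear(int **pp)
{
    assert(pp != NULL);
    free(*pp);
    *pp = NULL;
}
```

This only helps for the one variable passed in; other copies of the same pointer value still become indeterminate.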
I wonder what defines if the start of a memory object is at lower or higher addresses than the end of an object. For example:
char buffer[10];
char* p = &buffer[0];
printf("%p\n",p); //0x7fff064a6276
p = &buffer[9];
printf("%p\n",p); //0x7fff064a627f
In this example the start of object is at a lower address than the end. Even though the stack grows towards lower addresses.
Why does the layout go in the reverse direction of the stack growth?
What defines this direction? Language? OS? Compiler? CPU architecture? ...
Is it always the case that the end of the object is at a higher address than the beginning?
One part of the standard that is relevant is in §6.3.2.3 Pointers (under §6.3 Conversions):
¶7 … When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
Another relevant portion is §6.7.2.1 Structure and union specifiers:
¶15 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
The definition of addition (and subtraction) is partly relevant (§6.5.6 Additive operators):
¶8 When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
¶9 is a similar paragraph defining the behaviour of subtraction.
And then there's §6.5.2.1 Array subscripting:
¶2 A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).
From these, you know that the address of an object converted to a char * must point to the lowest byte address holding the object. In practice, this means that the 'object pointer' address of the object also points to the lowest address. The rule in no way enforces that the data in an int type must be little-endian or big-endian; both are valid.
You also know that the first element in a structure is at a lower address within the structure than later elements.
Most compilers will allocate space on the stack for all of the local variables in one block, with the start of the arrays at the lowest address going upwards.
You need to go to a deeper subroutine to see the addresses "going down".
For example, call another subroutine which also has a local buffer. You will find its memory 'lower' in the address space (as the stack has gotten bigger) than the local array in the parent routine.
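The two ordering guarantees quoted above (later array elements and later struct members compare greater) can be checked directly, because in both cases the pointers belong to the same aggregate and the relational operators are therefore defined. The struct and function names below are invented for this sketch. Comparing the addresses of buffers in two different stack frames with < would not be covered by these rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pair {
    int first;
    int second;
};

/* Members declared later have higher addresses (6.7.2.1p15), and the
   comparison is defined because both belong to the same struct. */
static bool later_member_is_higher(const struct pair *p)
{
    assert(p != NULL);
    return &p->first < &p->second;
}

/* Elements with larger subscripts compare greater (6.5.8p5),
   regardless of which way the stack grows. Needs n >= 2. */
static bool later_element_is_higher(const char *buf, size_t n)
{
    assert(buf != NULL && n >= 2);
    return &buf[0] < &buf[n - 1];
}
```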
As answered elsewhere, calling functions like memcpy with invalid or NULL pointers is undefined behaviour, even if the length argument is zero. In the context of such a function, especially memcpy and memmove, is a pointer just past the end of the array a valid pointer?
I'm asking this question because a pointer just past the end of an array is legal to obtain (as opposed to, e.g., a pointer two elements past the end of an array) but you are not allowed to dereference it, yet footnote 106 of ISO 9899:2011 indicates that such a pointer points into the address space of the program, a criterion required for a pointer to be valid according to §7.1.4.
Such usage occurs in code where I want to insert an item into the middle of an array, requiring me to move all items after the insertion point:
void make_space(type *array, size_t old_length, size_t index)
{
    memmove(array + index + 1, array + index, (old_length - index) * sizeof *array);
}
If we want to insert at the end of the array, index is equal to old_length and array + index + 1 points just past the end of the array, but the number of copied elements is zero.
Passing the past the end pointer to the first argument of memmove has several pitfalls, probably resulting in a nasal demon attack.
Strictly speaking, there is no ironclad guarantee that this is well defined.
(Unfortunately, there is not much information about the "past the last element" concept in the standard.)
Note: Sorry about having the other direction now...
The question basically is whether the "one past the end pointer" is a valid first function argument for memmove if 0 bytes are moved:
T array[length];
memmove(array + length, array + length - 1u, 0u);
The requirement in question is the validity of the first argument.
N1570, 7.1.4, 1
If a function argument is described as being an array, the pointer actually passed to the function shall have a value such that all address computations and accesses to objects (that would be valid if the pointer did point to the first element of such an array) are in fact valid.
If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined.
Making the argument valid if the pointer
is not outside the address space,
is not a null pointer,
is not a pointer to const memory
and if the argument type
is not of array type.
1. Address space
N1570, 6.5.6, 8
Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object.
N1570, 6.5.6, 9
Moreover, if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object. 106
106 Another way to approach pointer arithmetic is first to convert the pointer(s) to character pointer(s): In this scheme the integer expression added to or subtracted from the converted pointer is first multiplied by the size of the object originally pointed to, and the resulting pointer is converted back to the original type. For pointer subtraction, the result of the difference between the character pointers is similarly divided by the size of the object originally pointed to.
When viewed in this way, an implementation need only provide one extra byte (which may overlap another object in the program) just after the end of the object in order to satisfy the "one past the last element" requirements.
Even though the footnote is not normative (as pointed out by Lundin), we have an explanation here that "an implementation need only provide one extra byte".
Although I can't prove it by quoting, I suspect that this is a hint that the standard means to require the implementation to include memory inside the program's address space at the location pointed to by the past-the-end pointer.
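For contrast, this is the use of a past-the-end pointer that is uncontroversially fine: forming it and comparing it for equality, never dereferencing it. The function name sum_range is made up for this sketch, and the caller is assumed to pass two pointers into (or one past) the same array with begin not after end.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the elements in the half-open range [begin, end).
   end may be the one-past-the-end pointer of the array: it is only
   compared with !=, never dereferenced (6.5.6p8). */
static long sum_range(const int *begin, const int *end)
{
    assert(begin != NULL && end != NULL);
    long total = 0;
    for (const int *p = begin; p != end; ++p) {
        total += *p;
    }
    return total;
}
```

Whether the same pointer may also be handed to a library function such as memmove is the separate question discussed here.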
2. Null Pointer
The past the end pointer is not a null pointer.
3. Pointing to const memory
The standard imposes no further requirements on the past the end pointer other than giving some information about the result of several operations, and the (again non-normative ;)) footnote clarifies that it can overlap with another object.
Thus, there is no guarantee that the memory the past the end pointer points at is non constant.
Since the first argument of memmove is a pointer to non-constant memory, passing the past the end pointer is not guaranteed to be valid and is potentially undefined behaviour.
4. Validity of array arguments
Chapter 7.21.1 describes the string handling header <string.h> and the first clause states:
The header declares one type and several functions, and defines one macro useful for manipulating arrays of character type and other objects treated as arrays of character type.
I don't think that the standard is very clear here whether the "objects treated as arrays of character type" refers to the functions or to the macro only.
If this sentence actually implies that memmove treats the first argument as an array of characters, passing the past the end pointer to memmove is undefined behaviour as per 7.1.4 (which requires a pointer to a valid object).
3.15 object
object: region of data storage in the execution environment, the contents of which can represent values
The memory that a pointer to one past the last element of an array object (or one past a single object) points to cannot represent values, since it cannot be dereferenced (6.5.6 Additive operators, paragraph 8).
7.24.2.1 The memcpy function
The memcpy function copies n characters from the object pointed to by s2 into the
object pointed to by s1. If copying takes place between objects that overlap, the behavior
is undefined.
Pointers passed to memcpy must point to an object.
6.5.3.4 The sizeof and _Alignof operators
When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1. When applied to an operand that has array type, the result is the total number of bytes in the array. When applied to an operand that has structure or union type, the result is the total number of bytes in such an object, including internal and trailing padding.
The sizeof operator doesn't count the one-past element as part of the object, since it doesn't count towards the size of the object. Yet it clearly gives the size of the entire object.
6.3.2.1 Lvalues, arrays, and function designators
An lvalue is an expression (with an object type other than void) that potentially
designates an object; 64) if an lvalue does not designate an object when it is evaluated, the
behavior is undefined.
I argue that a one-past pointer, whether formed from an array object or from a single object (both of which are allowed), does not point to an object.
int a;
int *p = &a + 1;
p is defined, but it does not point to an object: it cannot be dereferenced, the memory it points to cannot represent a value, and sizeof doesn't count that memory as part of the object. memcpy requires a pointer to an object.
Therefore passing a one-past pointer to memcpy causes undefined behavior.
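To make the single-object case concrete: for pointer arithmetic, a pointer to a non-array object behaves like a pointer to the first element of an array of length one (6.5.6p7), so the one-past pointer can be formed and subtracted, just not dereferenced. The helper name below is invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Forms the one-past pointer of a single int object and returns its
   distance from the object. The one-past pointer is never
   dereferenced; only the subtraction is performed, which 6.5.6p9
   defines for this case. */
static ptrdiff_t distance_to_one_past(int *obj)
{
    assert(obj != NULL);
    int *past = obj + 1;
    return past - obj;
}
```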
Update:
This part also supports the conclusion:
6.5.9 Equality operators
Two pointers compare equal if and only if both are null pointers, both are pointers to the
same object (including a pointer to an object and a subobject at its beginning) or function,
both are pointers to one past the last element of the same array object, or one is a pointer
to one past the end of one array object and the other is a pointer to the start of a different
array object that happens to immediately follow the first array object in the address
space.
This implies that a pointer to an object, if incremented to one past the object, can point to a different object. In that case it certainly cannot point to the object it pointed to originally, which shows that a pointer one past an object doesn't point to an object.
If we look at the C99 standard, there is this:
7.21.1.p2
Where an argument declared as size_t n specifies the length of the
array for a function, n can have the value zero on a call to that
function. Unless explicitly stated otherwise in the description of a
particular function in this subclause, pointer arguments on such a
call shall still have valid values, as described in 7.1.4. On such a
call, a function that locates a character finds no occurrence, a
function that compares two character sequences returns zero, and a
function that copies characters copies zero characters.
...
There is no explicit statement in the description of memcpy in 7.21.2.1
7.1.4.p1
... If a function argument is described as being an array, the pointer
actually passed to the function shall have a value such that all
address computations and accesses to objects (that would be valid if
the pointer did point to the first element of such an array) are in
fact valid.
Emphasis added. It seems the pointers have to point to valid locations (in the sense of dereferencing), and the paragraphs about pointer arithmetic allowing to point to the end + 1 do not apply here.
There is the question if the arguments to memcpy are arrays or not. Of course they are not declared as arrays, but
7.21.1.p1 says
The header string.h declares one type and several functions, and
defines one macro useful for manipulating arrays of character type and
other objects treated as arrays of character type.
and memcpy is in string.h.
So I would assume memcpy does treat the arguments as arrays of characters.
Because the macro mentioned is NULL, the "useful for..." part of the sentence clearly applies to the functions.
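Given the doubts raised in these answers, a defensive version of the question's make_space can simply skip the library call when nothing has to move, so the disputed pointer is never passed to memmove at all. This is a sketch only: the element type is fixed to int in place of the question's placeholder type, the name make_space_guarded is made up, and the caller is assumed to have room for old_length + 1 elements whenever something actually moves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Opens a gap at position index in an int array that currently holds
   old_length elements. When index == old_length there is nothing to
   move, so memmove is not called and no past-the-end pointer is
   passed to it. */
static void make_space_guarded(int *array, size_t old_length, size_t index)
{
    assert(array != NULL && index <= old_length);
    size_t count = old_length - index;
    if (count > 0) {
        memmove(array + index + 1, array + index, count * sizeof *array);
    }
}
```

The branch costs next to nothing and removes the need to settle the language-lawyer question for this particular call site.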