Does the C standard require pointers to be (integer) numbers?
One may argue that yes, because of pointer arithmetic...
But on the other hand operations like -- or ++ may be understood as previous memory location, next memory location, depending on how they are described in the standard, and actual implementation may use any representation to hold pointer data (as long as mentioned operations are implemented)...
Another question comes to mind - does C require arrays/buffers etc. to be contiguous, i.e. next element is stored in next memory location (++p where p is a pointer)? I ask because you can often see implementations online that seem to assume that it does.
No, pointers need not be plain numbers.
If you read the standard, there are provisions for that:
Two pointers to unrelated objects (meaning not part of a bigger object, remember structs and arrays) may not be compared, except for equality.
6.5.8 Relational operators
[...]
5 When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
Two pointers to unrelated objects may not be subtracted.
6.5.6 Additive operators
[...]
9 When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined. In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object, the expression (P)-(Q) has the value i−j provided the value fits in an object of type ptrdiff_t. Moreover, if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object.91)
There may not be a way to represent a pointer as a number, as no suitable type might exist. Thus, trying to convert might result in Undefined Behavior.
Any specific implementation defining a behavior does not mean it isn't UB according to the standard.
6.3.2.3 Pointers
[...]
6 Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
7.18.1.4 Integer types capable of holding object pointers
1 The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
intptr_t
The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
uintptr_t
These types are optional.
That's just off the top of my head, I'm sure there's more.
All quotes from n1256 (C99 draft).
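To make the uintptr_t provision concrete, here is a minimal sketch of the only round trip the standard guarantees (void* to uintptr_t and back). The helper name roundtrip_ok is hypothetical, and the UINTPTR_MAX check is there because the type is optional:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: reports whether p survives void* -> uintptr_t -> void*.
   Returns false on implementations that do not provide uintptr_t at all. */
bool roundtrip_ok(void *p)
{
#ifdef UINTPTR_MAX
    uintptr_t n = (uintptr_t)p;   /* implementation-defined integer value */
    void *back = (void *)n;       /* guaranteed to compare equal to p */
    return back == p;
#else
    (void)p;
    return false;
#endif
}
```

Note that the guarantee is about the pointer you get back, not about what the intermediate integer looks like.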
Arrays have always been required to be contiguous.
To answer your second question: array elements are stored in contiguous memory locations. That's why you can use pointer arithmetic to move between elements.
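As a small illustration of that contiguity, the following sketch walks an int array and checks that each element starts exactly sizeof(int) bytes after the previous one. The function name elements_are_adjacent is made up for this example:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: every element starts exactly sizeof(int) bytes
   after the previous one, i.e. there are no gaps between elements. */
bool elements_are_adjacent(const int *arr, size_t n)
{
    for (size_t i = 0; i + 1 < n; ++i) {
        const char *here = (const char *)&arr[i];
        const char *next = (const char *)&arr[i + 1];
        if (next - here != (ptrdiff_t)sizeof(int))
            return false;
    }
    return true;
}
```

For any real array this should always return true, which is just another way of saying that &arr[i] + 1 and &arr[i + 1] are the same pointer.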
Consider the code snippet below:
int *p;
/* Lets say p points to address 100
and sizeof(int) is 4 bytes. */
int *q = p+1;
unsigned long r = q-p;
/* r results in 1, hence for r = q-p
something is happening similar to r=(104-100)/4 */
Is there a real division by sizeof(datatype) going on at runtime when two pointers of the same type are subtracted, or is there some other mechanism through which pointer subtraction works?
The C standard states the following regarding pointer subtraction (section 6.5.6p9):
When two pointers are subtracted, both shall point to elements of the
same array object, or one past the last element of the array
object; the result is the difference of the subscripts of the
two array elements. The size of the result is
implementation-defined, and its type (a signed integer type) is
ptrdiff_t defined in the <stddef.h> header. If the result is not
representable in an object of that type, the behavior is
undefined. In other words, if the expressions P and Q point to,
respectively, the i-th and j-th elements of an array object, the
expression (P)-(Q) has the value i−j provided the value fits in an
object of type ptrdiff_t. Moreover,
if the expression P points either to an element of an array object or
one past the last element of an array object, and the expression Q
points to the last element of the same array object, the expression
((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as
-((P)-((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even
though the expression (Q)+1 does not point to an element of the array
object. 106)
Footnote 106 states:
Another way to approach pointer arithmetic is first to convert the
pointer(s) to character pointer(s): In this scheme the integer
expression added to or subtracted from the converted pointer is first
multiplied by the size of the object originally pointed to,
and the resulting pointer is converted back to the original
type. For pointer subtraction, the result of the difference
between the character pointers is similarly divided by the size of
the object originally pointed to. When viewed in this way, an
implementation need only provide one extra byte (which may
overlap another object in the program) just after the end of the
object in order to satisfy the "one past the last element"
requirements.
So the footnote states that pointer subtraction may be implemented by subtracting the raw pointer values and dividing by the size of the pointed-to object. It doesn't have to be implemented this way, however.
Note also that the standard requires that pointer subtraction is performed between pointers pointing to elements of the same array object (or one element past the end). If they don't then the behavior is undefined. In practice, if you're working on a system with a flat memory model you'll probably still get the "expected" values but you can't depend on that.
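The two views described above (difference of subscripts versus difference of character pointers divided by the element size) can be sketched as follows. The function names are hypothetical, and both assume the pointers refer to the same array or one past its end:

```c
#include <stddef.h>

/* Distance in elements; both pointers must point into (or one past)
   the same array, otherwise the behavior is undefined. */
ptrdiff_t element_distance(const int *first, const int *last)
{
    return last - first;
}

/* Distance in bytes, obtained by viewing the same array as characters. */
ptrdiff_t byte_distance(const int *first, const int *last)
{
    return (const char *)last - (const char *)first;
}
```

For pointers into one array, byte_distance is always element_distance times sizeof(int). Whether the compiler emits an actual division, a shift, or something else for the first function is up to the implementation.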
See #dbush's answer for an explanation of how pointer subtraction works.
If, instead, you are programming something low-level, say a kernel, driver, debugger or similar and you need to have actual subtraction of addresses, cast the pointers to char *:
(char *)q - (char *)p
The result will be of type ptrdiff_t, an implementation-defined signed integer type.
Of course, when the pointers do not point into the same array this is not defined or portable C, but it will work on most architectures and environments.
It looks like GCC, with some optimizations enabled, assumes that two pointers from different translation units can never be equal, even when they actually are.
Code:
main.c
#include <stdint.h>
#include <stdio.h>
int a __attribute__((section("test")));
extern int b;
void check(int cond) { puts(cond ? "TRUE" : "FALSE"); }
int main() {
int * p = &a + 1;
check(
(p == &b)
==
((uintptr_t)p == (uintptr_t)&b)
);
check(p == &b);
check((uintptr_t)p == (uintptr_t)&b);
return 0;
}
b.c
int b __attribute__((section("test")));
If I compile it with -O0, it prints
TRUE
TRUE
TRUE
But with -O1
FALSE
FALSE
TRUE
So p and &b are actually the same value, but the compiler optimized out their comparison assuming they can never be equal.
I can't figure out which optimization causes this.
It doesn't look like strict aliasing, because the pointers are of the same type, and the -fstrict-aliasing option doesn't produce this effect.
Is this documented behavior? Or is this a bug?
There are three aspects in your code which result in general problems:
Conversion of a pointer to an integer is implementation-defined. There is no guarantee that converting two equal pointers yields integers with all bits identical.
uintptr_t is guaranteed to convert from a pointer to the same type then back unchanged (i.e. compare equal to the original pointer). But nothing more. The integer values themselves are not guaranteed to compare equal. E.g. there could be unused bits with arbitrary value. See the standard, 7.20.1.4.
And (briefly) two pointers can only compare equal if they point into the same array or right behind it (last entry plus one) or at least one is a null pointer. For any other constellation, they compare unequal. For the exact details, see the standard, 6.5.9p6.
Finally, there is no guarantee how variables are placed in memory by the toolchain (typically the linker for static variables, the compiler for automatic variables). Only an array or a struct (i.e. a composite type) guarantees the ordering of its elements.
For your example, 6.5.9p7 also applies. For comparison purposes it treats a pointer to a non-array object like a pointer to the first element of an array of size 1. This does not cover a pointer incremented past the object, like &a + 1. What matters is the object the pointer is based on: that is object a for pointer p and b for pointer &b. The rest can be found in paragraph 6.
None of your variables is an array (last part of paragraph 6), so the pointers need not compare equal, even for &a + 1 == &b. The last "TRUE" might arise from gcc assuming the uintptr_t comparison returns true.
gcc is known to aggressively optimise while strictly following the standard. Other compilers are more conservative, but that results in less optimised code. Please don't try "solving" this by disabling optimisation or other hacks, but fix it using well-defined behaviour. It is a bug in the code.
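One way to get the adjacency the question relies on with well-defined behaviour is to make both objects elements of a single array, so the language itself guarantees their layout. A minimal sketch, where is_next is a hypothetical helper and not something from the question:

```c
#include <stdbool.h>

/* Hypothetical helper: true when q points at the element directly after p.
   Forming p + 1 is defined as long as p points at an object or array element. */
bool is_next(const int *p, const int *q)
{
    return p + 1 == q;
}
```

With int pair[2], the call is_next(&pair[0], &pair[1]) has to return true because array elements are contiguous, so no linker placement or section attribute is involved.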
p == &b is a pointer comparison and is subject to the following rules from the C Standard (6.5.9 Equality operators, paragraph 6):
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
(uintptr_t)p == (uintptr_t)&b is an arithmetic comparison and is subject to the following rules (6.5.9 Equality operators, paragraph 4):
If both of the operands have arithmetic type, the usual arithmetic conversions are performed. Values of complex types are equal if and only if both their real parts are equal and also their imaginary parts are equal. Any two values of arithmetic types from different type domains are equal if and only if the results of their conversions to the (complex) result type determined by the usual arithmetic conversions are equal.
These two excerpts require very different things from the implementation. And it is clear that the C specification places no requirement on an implementation to mimic the behavior of the former kind of comparison in cases where the latter kind is invoked and vice versa. The implementation is only required to follow this rule (7.18.1.4 Integer types capable of holding object pointers in C99 or 7.20.1.4 in C11):
The [uintptr_t] type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer.
(Addendum: The above quote isn't applicable in this case, because the conversion from int* to uintptr_t does not involve void* as an intermediate step. See Hadi's answer for an explanation and citation on this. Still, the conversion in question is implementation-defined and the two comparisons you are attempting are not required to exhibit the same behavior, which is the main takeaway here.)
As an example of the difference, consider two pointers that point at the same address of two different address spaces. Comparing them as pointers shouldn't return true, but comparing them as unsigned integers might.
&a + 1 is an integer added to a pointer, which is subject to the following rules (6.5.6 Additive operators, point 8):
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object. If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
I believe that this excerpt shows that pointer addition (and subtraction) is defined only for pointers within the same array object or one past the last element. And because (1) a is not an array and (2) a and b aren't members of the same array object, it seems to me that your pointer math operation invokes undefined behavior and your compiler takes advantage of it to assume that the pointer comparison returns false. Again as pointed out in Hadi's answer (and in contrast to what my original answer assumed at this point), pointers to non-array objects can be considered pointers to array objects of length one, and thus adding one to your pointer to the scalar does qualify as pointing to one past the end of the array.
Therefore your case seems to fall under the last part of the first excerpt mentioned in this answer, making your comparison well-defined to evaluate to true if and only if the two variables are linked in sequence and in ascending order. Whether this is true for your program is left unspecified by the standard and it's up to the implementation.
While one of the answers has already been accepted, the accepted answer (and all other answers for that matter) are critically wrong as I'll explain and then answer the question. I'll be quoting from the same C standard, namely n1570.
Let's start with &a + 1. In contrast to what #Theodoros and #Peter have stated, this expression has defined behavior. To see this, consider section 6.5.6 "Additive operators" paragraph 7, which states:
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
and paragraph 8 (in particular, the emphasized part):
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i−n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.
The expression (uintptr_t)p == (uintptr_t)&b has two parts. The conversion from a pointer to an uintptr_t is NOT defined by section 7.20.1.4 (in contrast to what #Olaf and #Theodoros have said):
The following type designates an unsigned integer type with the
property that any valid pointer to void can be converted to this type,
then converted back to pointer to void, and the result will compare
equal to the original pointer:
uintptr_t
It's important to recognize that this rule applies only to valid pointers to void. However, in this case, we have a valid pointer to int. A relevant paragraph can be found in section 6.3.2.3 paragraph 1:
A pointer to void may be converted to or from a pointer to any object
type. A pointer to any object type may be converted to a pointer to
void and back again; the result shall compare equal to the original
pointer.
This means that (uintptr_t)(void*)p is allowed according to this paragraph and 7.20.1.4. But (uintptr_t)p and (uintptr_t)&b are ruled by section 6.3.2.3 paragraph 6:
Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
integer type.
Note that uintptr_t is an integer type as stated in section 7.20.1.4 mentioned above and therefore this rule applies.
The second part of (uintptr_t)p == (uintptr_t)&b is comparing for equality. As previously discussed, since the result of conversion is implementation-defined, the result of equality is also implementation defined. This applies irrespective of whether the pointers themselves are equal or not.
Now I'll discuss p == &b. The third point in #Olaf's answer is wrong and #Theodoros's answer is incomplete regarding this expression. Section 6.5.9 "Equality operators" paragraph 7:
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
and paragraph 6:
Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an object
and a subobject at its beginning) or function, both are pointers to
one past the last element of the same array object, or one is a
pointer to one past the end of one array object and the other is a
pointer to the start of a different array object that happens to
immediately follow the first array object in the address space.
In contrast to what #Olaf has said, comparing pointers using the == operator never results in undefined behavior (which may occur only when using relational operators such as <= according to section 6.5.8 paragraph 5, which I'll omit here for brevity). Now since p points to the next int relative to a, it will be equal to &b only when the linker has placed b in that location in the binary. Otherwise, they are unequal. So this is implementation-dependent (the relative order of a and b is unspecified by the standard). Since the declarations of a and b use a language extension, namely __attribute__((section("test"))), the relative locations are indeed implementation-dependent by J.5 and 3.4.2 (omitted for brevity).
We conclude that the results of check(p == &b) and check((uintptr_t)p == (uintptr_t)&b) are implementation-dependent. So the answer depends on which version of which compiler you are using. I'm using gcc 4.8 and by compiling with default options except for the level of optimization, the output I get in both -O0 and -O1 cases is all TRUE.
According to C11 6.5.9/6 and C11 6.5.9/7, the test p == &b must give 1 if a and b are adjacent in the address space.
Your example shows that GCC appears to not fulfill this requirement of the Standard.
Update 26/Apr/2016: My original answer contained suggestions about modifying the code to remove other potential sources of UB and isolate this one condition.
However, it's since come to light that the issues raised by this thread are under review - N2012.
One of their recommendations is that p == &b should be unspecified, and they acknowledge that GCC does in fact not implement the ISO C11 requirement.
So I have removed the remaining text from my answer, as it is no longer necessary to prove a "compiler bug", since the non-conformance (whether you want to call it a bug or not) has been established.
Re-reading your program I see that you are (understandably) baffled by the fact that in the optimized version
p == &b
is false, while
(uintptr_t)p == (uintptr_t)&b;
is true. The last line indicates that the numerical values are indeed identical; how can p == &b then be false??
I must admit that I have no idea. I am convinced that it is a gcc bug.
After a discussion with M.M I think I can make the following case if the conversion to uintptr_t goes through an intermediate void pointer (you should include that in your program and see whether it changes anything):
Because both steps in the conversion chain int* -> void* -> uintptr_t are guaranteed to be reversible, unequal int pointers can logically not result in equal uintptr_t values.1 (Those equal uintptr_t values would have to convert back to equal int pointers, altering at least one of them and thus violating the value-preserving conversion rule.) In code (I'm not aiming for equality here, just demonstrating the conversions and comparisons):
int a,b, *ap=&a, *bp = &b;
assert(ap != bp);
void *avp = ap, *bvp = bp;
uintptr_t ua = (uintptr_t)avp, ub = (uintptr_t)bvp;
// Now the following holds:
// if ap != bp then *necessarily* ua != ub.
// This is violated by the OP's case (sans the void* step).
assert((int *)(void *)ua == (int*)(void*)ub);
1 This assumes that uintptr_t doesn't carry hidden information in the form of padding bits which are not evaluated in an arithmetic comparison but possibly in a type conversion. One can check that through CHAR_BIT, UINTPTR_MAX, sizeof(uintptr_t) and some bit fiddling.
For a similar reason it's conceivable that two uintptr_t values compare different but convert back to the same pointer (namely if there are bits in uintptr_t not used for storing a pointer value, and the conversion does not zero them). But that is the opposite of the OP's problem.
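The padding-bit check mentioned in the footnote could look roughly like this. It is only a sketch under the assumption that uintptr_t exists on the target; the function name is made up. It counts the value bits in UINTPTR_MAX and compares that with the storage width of the type:

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: uintptr_t has padding bits if the number of value
   bits (counted from UINTPTR_MAX) is smaller than its storage width. */
bool uintptr_has_padding_bits(void)
{
    unsigned value_bits = 0;
    for (uintptr_t m = UINTPTR_MAX; m != 0; m >>= 1)
        ++value_bits;
    return value_bits != sizeof(uintptr_t) * CHAR_BIT;
}
```

On mainstream desktop platforms I would expect this to return false, but that is an observation about those platforms, not something the standard promises.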
I have implemented an AVL tree in C. Only later did I read that pointer comparison is only valid between objects in the same array. In my implementation, I do certain equality tests. For example, to test whether a node is a right child of a parent I might test node==node->parent->right. However, the nodes are allocated as needed, not in a contiguous chunk. Is this behavior defined? How would you write this code instead if it is not?
For equality and inequality, in the standard (ISO/IEC 9899:2011) §6.5.9 Equality Operators ¶6 says:
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
There's no undefined behaviour in comparing pointers to unrelated objects for equality or inequality.
By contrast, §6.5.8 Relational Operators ¶5 says:
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
This means that comparing pointers with >, >=, < or <= when the pointers are not pointing to the same object (for the definition of 'same object' given in painstaking detail in the quote) is undefined behaviour.
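So an equality test on separately allocated tree nodes is fine as it stands. A minimal sketch of the right-child test from the question, assuming a node layout with parent, left and right links (the struct and the function name are illustrative, not taken from the asker's code):

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed node layout; the real AVL node will carry more fields. */
struct node {
    struct node *parent;
    struct node *left;
    struct node *right;
    int key;
};

/* Uses only == and !=, which are defined for pointers to unrelated objects. */
bool is_right_child(const struct node *n)
{
    return n != NULL && n->parent != NULL && n->parent->right == n;
}
```

Only an ordering test such as node < other would be a problem for nodes that live in separate allocations.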
As answered elsewhere, calling functions like memcpy with invalid or NULL pointers is undefined behaviour, even if the length argument is zero. In the context of such a function, especially memcpy and memmove, is a pointer just past the end of the array a valid pointer?
I'm asking this question because a pointer just past the end of an array is legal to obtain (as opposed to, e.g., a pointer two elements past the end of an array) but you are not allowed to dereference it, yet footnote 106 of ISO 9899:2011 indicates that such a pointer points into the address space of the program, a criterion required for a pointer to be valid according to §7.1.4.
Such usage occurs in code where I want to insert an item into the middle of an array, requiring me to move all items after the insertion point:
void make_space(type *array, size_t old_length, size_t index)
{
memmove(array + index + 1, array + index, (old_length - index) * sizeof *array);
}
If we want to insert at the end of the array, index is equal to old_length and array + index + 1 points just past the end of the array, but the number of copied elements is zero.
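Independent of how the standard question is settled, the doubtful call can be sidestepped by not calling memmove at all when nothing has to be moved. A hedged sketch for an int array, where make_space_int is a hypothetical variant of the function above and the caller is assumed to have room for old_length + 1 elements:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of make_space: opens a gap at index, but skips
   memmove entirely when there is nothing after the insertion point. */
void make_space_int(int *array, size_t old_length, size_t index)
{
    assert(index <= old_length);
    size_t count = old_length - index;   /* elements that must shift up */
    if (count > 0)
        memmove(array + index + 1, array + index, count * sizeof *array);
}
```

With this guard the pointer array + index + 1 is never even formed in the append case, so the question of whether it is a valid memmove argument does not arise.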
Passing the past the end pointer to the first argument of memmove has several pitfalls, probably resulting in a nasal demon attack.
Strictly speaking, there is no watertight guarantee that this is well defined.
(Unfortunately, there is not much information about the "past the last element" concept in the standard.)
Note: Sorry about having the other direction now...
The question is basically whether the "one past the end" pointer is a valid first function argument for memmove if 0 bytes are moved:
T array[length];
memmove(array + length, array + length - 1u, 0u);
The requirement in question is the validity of the first argument.
N1570, 7.1.4, 1
If a function argument is described as being an array, the pointer actually passed to the function shall have a value such that all address computations and accesses to objects (that would be valid if the pointer did point to the first element of such an array) are in fact valid.
If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined.
Making the argument valid if the pointer
is not outside the address space,
is not a null pointer,
is not a pointer to const memory
and if the argument type
is not of array type.
1. Address space
N1570, 6.5.6, 8
Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object.
N1570, 6.5.6, 9
Moreover, if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the
expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object.106
106 Another way to approach pointer arithmetic is first to convert the pointer(s) to character pointer(s): In this scheme the integer expression added to or subtracted from the converted pointer is first multiplied by the size of the object originally pointed to, and the resulting pointer is converted back to the original type. For pointer subtraction, the result of the difference between the character pointers is similarly divided by the size of the object originally pointed to.
When viewed in this way, an implementation need only provide one extra byte (which may overlap another object in the program) just after the end of the object in order to satisfy the "one past the last element" requirements.
Even though the footnote is not normative (as pointed out by Lundin), we have an explanation here that "an implementation need only provide one extra byte".
Although I can't prove it by quoting, I suspect this is a hint that the standard means to require the implementation to include memory inside the program's address space at the location pointed to by the past-the-end pointer.
2. Null Pointer
The past the end pointer is not a null pointer.
3. Pointing to const memory
The standard imposes no further requirements on the past-the-end pointer other than giving some information about the result of several operations, and the (again non-normative ;)) footnote clarifies that it can overlap with another object.
Thus, there is no guarantee that the memory the past-the-end pointer points at is non-constant.
Since the first argument of memmove is a pointer to non-constant memory, passing the past-the-end pointer is not guaranteed to be valid and is potentially undefined behaviour.
4. Validity of array arguments
Chapter 7.21.1 describes the string handling header <string.h> and the first clause states:
The header declares one type and several functions, and defines one macro useful for manipulating arrays of character type and other objects treated as arrays of character type.
I don't think that the standard is very clear here whether the "objects treated as arrays of character type" refers to the functions or to the macro only.
If this sentence actually implies that memmove treats the first argument as an array of characters, passing the past-the-end pointer to memmove is undefined behaviour as per 7.1.4 (which requires a pointer to a valid object).
3.15 object
object
region of data storage in the execution environment, the contents of which can represent
values
The memory that a pointer one past the last element of an array object (or one past a single object) points to cannot represent values, since that pointer cannot be dereferenced (6.5.6 Additive operators, paragraph 8).
7.24.2.1 The memcpy function
The memcpy function copies n characters from the object pointed to by s2 into the
object pointed to by s1. If copying takes place between objects that overlap, the behavior
is undefined.
Pointers passed to memcpy must point to an object.
6.5.3.4 The sizeof and _Alignof operators
When sizeof is applied to an operand that has type char, unsigned char, or
signed char, (or a qualified version thereof) the result is 1. When applied to an
operand that has array type, the result is the total number of bytes in the array. When
applied to an operand that has structure or union type, the result is the total number of bytes in such an object, including internal and trailing padding.
The sizeof operator does not count the one-past element as part of the object, since that element does not contribute to the object's size. Yet sizeof clearly gives the size of the entire object.
6.3.2.1 Lvalues, arrays, and function designators
An lvalue is an expression (with an object type other than void) that potentially
designates an object; 64) if an lvalue does not designate an object when it is evaluated, the
behavior is undefined.
I argue that a pointer one past an array object, or one past a single object, both of which may legally be formed, does not point to an object.
int a;
int *p = &a + 1;
p is defined, but it does not point to an object: it cannot be dereferenced, the memory it points to cannot represent a value, and sizeof does not count that memory as part of the object. memcpy requires a pointer to an object.
Therefore, passing a one-past pointer to memcpy causes undefined behavior.
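To make the distinction concrete, here is a minimal sketch of what may and may not be done with a one-past pointer to a single object. The function name one_past_distance is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Forms the one-past pointer of a single int and returns its distance
   from the pointer to the object itself. */
static ptrdiff_t one_past_distance(void)
{
    int a = 42;
    int *p = &a;        /* points to the object a */
    int *end = &a + 1;  /* valid to form: a behaves like an array of length 1 */

    assert(end != p);   /* the two pointers are distinct */
    /* Evaluating *end here would be undefined behavior (6.5.6 paragraph 8). */
    return end - p;     /* well-defined: both relate to the same object */
}
```

Forming, comparing, and subtracting the pointer are all fine; only access through it is ruled out.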
Update:
This part also supports the conclusion:
6.5.9 Equality operators
Two pointers compare equal if and only if both are null pointers, both are pointers to the
same object (including a pointer to an object and a subobject at its beginning) or function,
both are pointers to one past the last element of the same array object, or one is a pointer
to one past the end of one array object and the other is a pointer to the start of a different
array object that happens to immediately follow the first array object in the address
space.
This implies that a pointer to an object, once incremented to one past that object, can point to a different object. In that case it certainly cannot point to the object it originally pointed to, which shows that a pointer one past an object does not point to an object.
If we look at the C99 standard, there is this:
7.21.1.p2
Where an argument declared as size_t n specifies the length of the
array for a function, n can have the value zero on a call to that
function. Unless explicitly stated otherwise in the description of a
particular function in this subclause, pointer arguments on such a
call shall still have valid values, as described in 7.1.4. On such a
call, a function that locates a character finds no occurrence, a
function that compares two character sequences returns zero, and a
function that copies characters copies zero characters.
...
There is no explicit statement in the description of memcpy in 7.21.2.1
7.1.4.p1
... If a function argument is described as being an array, the pointer
actually passed to the function shall have a value such that all
address computations and accesses to objects (that would be valid if
the pointer did point to the first element of such an array) are in
fact valid.
Emphasis added. It seems the pointers have to point to valid locations (in the sense of dereferencing), and the paragraphs about pointer arithmetic allowing to point to the end + 1 do not apply here.
There is the question if the arguments to memcpy are arrays or not. Of course they are not declared as arrays, but
7.21.1.p1 says
The header string.h declares one type and several functions, and
defines one macro useful for manipulating arrays of character type and
other objects treated as arrays of character type.
and memcpy is in string.h.
So I would assume memcpy does treat the arguments as arrays of characters.
Because the macro mentioned is NULL, the "useful for..." part of the sentence clearly applies to the functions.
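If one accepts this reading, a defensive approach is to keep such pointers away from the library function altogether when the length is zero. The following is a sketch under that assumption; move_bytes and remove_at are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: skips the library call entirely when n is zero, so a
   one-past-the-end pointer never reaches memmove. */
static void *move_bytes(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst;   /* nothing to copy; the pointers are not inspected */
    assert(dst != NULL && src != NULL);
    return memmove(dst, src, n);
}

/* Example use: remove the element at index i by shifting the tail left.
   Returns the new length. */
static size_t remove_at(int *a, size_t len, size_t i)
{
    assert(a != NULL && i < len);
    /* When i is the last index, a + i + 1 is one past the end and the
       byte count is zero, so move_bytes returns without calling memmove. */
    move_bytes(a + i, a + i + 1, (len - i - 1) * sizeof *a);
    return len - 1;
}
```

The cost is one extra comparison per call, which avoids having to settle the question of whether the zero-length call is defined.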
Since adding two pointers together is illegal, how is this code snippet valid?
struct key *low = &tab[0];
struct key *high = &tab[n];
struct key *mid;
while (low < high)
{
    mid = low + (high-low) / 2; // isn't this adding pointers?
    // code continues...
The first statement in the while loop seems to add two addresses together, how is this legal?
This code is from K&R's The C Programming Language, page 122.
The difference of two pointers (high - low) is an integer (actually ptrdiff_t, which is a signed integer type), so you're adding an integer to a pointer, which is perfectly legal. This also explains why it's perfectly OK to divide the difference by 2, which is not something you could do with a pointer.
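For reference, a complete version of such a pointer-based binary search might look like the sketch below. The question's snippet is truncated, so the loop body, the fields of struct key, and the function name binsearch are assumptions filled in for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout; the question does not show the definition of struct key. */
struct key {
    const char *word;
    int count;
};

/* Returns a pointer to the entry matching word in the sorted table
   tab[0..n-1], or NULL if there is none. */
static struct key *binsearch(const char *word, struct key *tab, size_t n)
{
    struct key *low = &tab[0];
    struct key *high = &tab[n];   /* one past the end; never dereferenced */
    struct key *mid;

    assert(word != NULL && tab != NULL);
    while (low < high) {
        /* (high - low) has type ptrdiff_t, so this is pointer + integer. */
        mid = low + (high - low) / 2;
        int cond = strcmp(word, mid->word);
        if (cond < 0)
            high = mid;
        else if (cond > 0)
            low = mid + 1;
        else
            return mid;
    }
    return NULL;
}
```

Writing the midpoint as (low + high) / 2 would not compile, because adding two pointers violates the constraint on the + operator.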
You are allowed to subtract two pointers (the result is of type ptrdiff_t), and you are allowed to add an integer value to a pointer. This is covered in the draft C99 standard, section 6.5.6 Additive operators, paragraph 2:
For addition, either both operands shall have arithmetic type, or one operand shall be a
pointer to an object type and the other shall have integer type. (Incrementing is
equivalent to adding 1.)
and paragraph 3:
For subtraction, one of the following shall hold:
and includes the following bullets:
both operands are pointers to qualified or unqualified versions of compatible object
types; or
Some important notes: when subtracting two pointers, they must point into the same array. This is covered in paragraph 9, which says:
When two pointers are subtracted, both shall point to elements of the
same array object, or one past the last element of the array object; [...]
To avoid undefined behavior, the pointer resulting from an addition must still point into the same array or one past its end, and a pointer one past the end shall not be dereferenced. This is in paragraph 8, which says:
[...]If both the pointer operand and the result point to elements of
the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.
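A common idiom that stays within these rules is to use the one-past-the-end pointer purely as a loop bound. A minimal sketch, with sum_ints as a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>

/* Sums a[0..n-1] by walking a pointer up to, but not including,
   the one-past-the-end position. */
static long sum_ints(const int *a, size_t n)
{
    assert(a != NULL);
    const int *end = a + n;   /* may point one past the last element; only compared */
    long total = 0;

    for (const int *p = a; p != end; ++p)
        total += *p;          /* p is always strictly before end here */

    /* Computing a + n + 1 would be undefined behavior even without
       dereferencing the result. */
    return total;
}
```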
The difference of two pointers high - low is a value of a signed integer type (ptrdiff_t, in case you need to store it). It's perfectly allowed to add integer values to pointers.
If you're subtracting two pointers that point to elements of the same array, or just past the last element of the same array, then the result is the difference (in array elements) between them. The result has a signed integer type, ptrdiff_t, which is defined in <stddef.h>.
So high - low yields this signed integer, which is then added to low. You're not adding pointers; you're adding a signed integer to a pointer.