It has always been my understanding that the lack of a sequence point after the reading of the right expression in an assignment makes an example like the following produce undefined behavior:
void f(void)
{
int *p;
/*...*/
p = (int [2]){*p};
/*...*/
}
// p is assigned the address of the first element of an array of two ints, the
// first having the value previously pointed to by p and the second, zero. The
// expressions in this compound literal need not be constant. The unnamed object
// has automatic storage duration.
However, this is EXAMPLE 2 under "6.5.2.5 Compound literals" in the committee draft for the C11 standard, the version identified as n1570, which I understand to be the final draft (I don't have access to the final version).
So, my question: Is there something in the standard that gives this defined and specified behavior?
EDIT
I would like to expound on exactly what I see as the problem, in response to some of the discussion that has come up.
We have two conditions under which an assignment is explicitly stated to have
undefined behavior, as per 6.5p2 of the standard quoted in the answer given by dbush:
1) A side effect on a scalar object is unsequenced relative to a different side
effect on the same scalar object.
2) A side effect on a scalar object is unsequenced relative to a value
computation using the value of the same scalar object.
An example of item 1 is "i = ++i + 1". In this case the side effect of
writing the value i+1 into i due to ++i is unsequenced relative to the side effect of assigning the RHS to the LHS. There is a sequence point between the value calculations of each side and the assignment of RHS to LHS, as described in 6.5.16.1 given in the answer by Jens Gustedt below. However, the modification of i due to ++i is not subject to that sequence point, otherwise the behavior would
be defined.
In the example I give above, we have a similar situation. There is a value computation, which involves the creation of an array and the conversion of that array to a pointer to its first element. There is also a side effect of writing a value to part of that array, *p to the first element.
So, I don't see what guarantees we have in the standard that the modification
of the otherwise uninitialized first element of the array will be sequenced
before the writing of the array address to p. What about this modification (writing *p to the first element) is different from the modification of writing
i+1 to i?
To put it another way, suppose an implementation looked at the statement of interest in the example as three tasks: 1st, allocate space for the compound literal object; 2nd: assign a pointer to said space to p; 3rd: write *p to the first element in the newly allocated space. The value computation for both RHS and LHS would be sequenced before the assignment, as computing the value of the RHS only requires the address. In what way is this hypothetical implementation not standard compliant?
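Spelled out as C, the ordering this hypothetical implementation would use looks roughly like this (the name __cl for the unnamed array is purely illustrative):
int __cl[2];     /* 1st: allocate space for the compound literal object          */
p = __cl;        /* 2nd: assign a pointer to said space to p                     */
__cl[0] = *p;    /* 3rd: write *p to the first element -- but p has already been */
__cl[1] = 0;     /*      overwritten, so this would read the indeterminate __cl[0] */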
You need to look at the definition of the assignment operator in 6.5.16.1
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands.
The evaluations of the operands are unsequenced.
So here you clearly see that first it evaluates the expressions on both sides in any order or even concurrently, and then stores the value of the right into the object designated by the left.
Additionally, you should know that the LHS and RHS of an assignment are evaluated differently. The citations are a bit too long, so here is a summary:
For the LHS the evaluation leaves "lvalues", that is objects such as
p, untouched. In particular it doesn't look at the contents of the
object.
For the RHS there is "lvalue conversion", that is, for any object that is found there (e.g. *p) the contents of that object are loaded.
If the RHS contains an lvalue of array type, this array is converted to a pointer to its first element. This is what is happening to your compound literal.
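A minimal illustration of these two rules, using an ordinary named array instead of a compound literal:
int x = 1;
int a[3] = {7, 8, 9};
int *q;

q = a;      /* RHS: the lvalue a of array type converts to a pointer to a[0];
               LHS: q itself is not read, it is only the target of the store  */
x = *q;     /* RHS: lvalue conversion loads the value stored in *q (here 7);
               LHS: x is again only the store target                          */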
Edit: You added another question
What about this modification (writing *p to the first element) is
different from the modification of writing i+1 to i?
The difference is simply that i is in the LHS of the assignment and thus has to be updated. The array from the compound literal is not in the LHS and thus is of no concern for the update.
Section 6.5p2 of the C standard details why this is valid:
If a side effect on a scalar object is unsequenced relative to either
a different side effect on the same scalar object or a value
computation using the value of the same scalar object, the behavior is
undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an
unsequenced side effect occurs in any of the orderings. 84)
And footnote 84 states:
84) This paragraph renders undefined statement expressions such as
i = ++i + 1;
a[i++] = i;
while allowing
i = i + 1;
a[i] = i;
The posted snippet from 6.5.2.5 falls under the latter: there is no side effect on a scalar object that is unsequenced relative to another side effect on, or a value computation using, the same scalar object.
In (int [2]){*p}, *p provides an initial value for the compound literal. This is not an assignment, and it is not a side effect. The initial value is part of the object when the object is created. There is no moment when the array exists and it is not initialized.
In p = (int [2]){*p}, we know the side effect of updating p is sequenced after the computation of the right side because C 2011 [N1570] 6.5.16 3 says “The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands.”
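For illustration, here is a self-contained variant of the example (with p first pointing at a real int so that *p may be read), which by the above reasoning is well defined:
#include <stdio.h>

int main(void)
{
    int n = 42;
    int *p = &n;               /* p must point at a valid int before *p is read  */
    p = (int [2]){*p};         /* *p initializes the compound literal; the store
                                  to p is sequenced after the value computations
                                  of both operands                               */
    printf("%d %d\n", p[0], p[1]);   /* prints 42 0 */
    return 0;
}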
Related
Expression 1: *p++; where p is a pointer to integer.
p will be incremented first and then the value it points to is taken, due to associativity (right to left). Is that right?
Expression 2: a=*p++; where p is a pointer to integer.
The value of p is taken first and then assigned to a, then p is incremented due to the post-increment. Is that right?
First of all, let me tell you that, neither associativity nor order of evaluation is actually relevant here. It is all about the operator precedence. Let's see the definitions first. (emphasis mine)
Precedence : In mathematics and computer programming, the order of operations (or operator precedence) is a collection of rules that reflect conventions about which procedures to perform first in order to evaluate a given mathematical expression.
Associativity: In programming languages, the associativity (or fixity) of an operator is a property that determines how operators of the same precedence are grouped in the absence of parentheses.
Order of evaluation : Order of evaluation of the operands of any C operator, including the order of evaluation of function arguments in a function-call expression, and the order of evaluation of the subexpressions within any expression is unspecified, except a few cases. There's mainly two types of evaluation: a) value computation b) side effect.
Post-increment has higher precedence, so it will be evaluated first.
Now, it so happens that the increment is a side effect of the operation which is sequenced after the "value computation". So the result of the value computation will be the unchanged value of the operand p (which, here, gets dereferenced due to the use of the * operator), and then the increment takes place.
Quoting C11, chapter §6.5.2.4,
The result of the postfix ++ operator is the value of the operand. As a side effect, the
value of the operand object is incremented (that is, the value 1 of the appropriate type is
added to it). See the discussions of additive operators and compound assignment for
information on constraints, types, and conversions and the effects of operations on
pointers. The value computation of the result is sequenced before the side effect of
updating the stored value of the operand. [.....]
The order of evaluation in both cases is the same; the only difference is that in the first case the final value is discarded.
If you use the first expression "as-is", your compiler should produce a warning about unused value.
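For instance, a small self-contained sketch (the array contents are arbitrary):
#include <stdio.h>

int main(void)
{
    int arr[] = {10, 20, 30};
    int *p = arr;
    int a;

    *p++;        /* the value 10 is computed and discarded; p now points at arr[1] */
    a = *p++;    /* a receives 20 (the value at the old p); p then points at arr[2] */

    printf("a = %d, *p = %d\n", a, *p);   /* prints: a = 20, *p = 30 */
    return 0;
}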
Postfix operators have higher priorities than unary operators.
Thus this expression
*p++
is equivalent to the expression
*( p++ )
According to the C Standard (6.5.2.4 Postfix increment and decrement operators)
2 The result of the postfix ++ operator is the value of the
operand. As a side effect, the value of the operand object is
incremented (that is, the value 1 of the appropriate type is added to
it). See the discussions of additive operators and compound assignment
for information on constraints, types, and conversions and the effects
of operations on pointers. The value computation of the result is
sequenced before the side effect of updating the stored value of the
operand.
So p++ yields the original value of the pointer p as the result of the operation and has also a side effect of incrementing the operand itself.
As for the unary operator then (6.5.3.2 Address and indirection operators)
4 The unary * operator denotes indirection. If the operand points to a
function, the result is a function designator; if it points to an
object, the result is an lvalue designating the object. If the operand
has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an
invalid value has been assigned to the pointer, the behavior of the
unary * operator is undefined
So the final result of the expression
*( p++ )
is the value of the object pointed to by the pointer p that also is incremented due to the side effect. This value is assigned to the variable a in the statement
a=*p++;
For example if there are the following declarations
char s[] = "Hello";
char *p = s;
char a;
then after this statement
a = *p++;
the object a will have the character 'H' and the pointer p will point to the second character of the array s that is to the character 'e'.
Associativity is not relevant here. Associativity only matters when you have adjacent operators with the same precedence. But in this case, ++ has higher precedence than *, so only precedence matters. Because of precedence, the expression is equivalent to:
*(p++)
Since it uses post-increment, p++ increments the pointer, but the expression returns the value of the pointer before it was incremented. The indirection then uses that original pointer to fetch the value. It's effectively equivalent to:
int *temp = p;
p = p + 1;
*temp;
The second expression is the same, except it assigns the value to another variable, so that last statement becomes:
a = *temp;
The expression
*p++
is equivalent to
*(p++)
This is due to precedence (i.e., the postfix increment operator has higher precedence than the indirection operator),
and the expression
a=*p++
is for the same reason equivalent to
a=*(p++)
In both cases, the expression p++ evaluates to the original value of p; the increment happens as a side effect.
v = i++;: the value of i is handed to the assignment and stored in v. Subsequently, i is incremented (EDIT: technically it's not necessarily executed in this order). Thus v has the old value of i. I remember it like this: ++ is written last and therefore happens last.
v = ++i;: i is incremented, and then its value is assigned to v. v and i then have the same value.
When you don't use the returned value, they do the same (although different implementations may yield different performance in some cases). E.g. in for loops, for(int i=0; i<n; i++) is the same as for(int i=0; i<n; ++i). The latter is sometimes automatically preferred because it tends to be faster for some objects.
* has lower precedence than ++, so *p++ is the same as *(p++). Thus in this case the old value of p is returned to *, which dereferences it. Then the address in p is incremented by one element. *++p increments p first, then dereferences it.
v = (*p)++; sets v equal to the old value pointed to by p and then increments it, while v = ++(*p); increments the value pointed to by p and then sets v equal to it. The address in p is unchanged.
Example: If,
int arr[] = {1,2};
int *a = arr;
then
int v = *a++;
and
int v = *++a;
will both leave a incremented, but in the first case v will be 1 and in the latter it'll be 2.
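To illustrate the (*p)++ and ++(*p) forms mentioned above, a similar sketch:
#include <stdio.h>

int main(void)
{
    int x = 5;
    int *p = &x;

    int v1 = (*p)++;   /* v1 = 5, x becomes 6; p is unchanged */
    int v2 = ++(*p);   /* x becomes 7, v2 = 7; p is unchanged */

    printf("%d %d %d\n", v1, v2, x);   /* prints: 5 7 7 */
    return 0;
}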
*p++; where p is a pointer to integer.
p will be incremented first and then the value it points to is taken due to associativity (right to left). Is it right?
No. In a post-increment, the value is copied to a temporary (an rvalue), then the lvalue is incremented as a side effect.
a=*p++; where p is a pointer to integer.
The value of p is taken first and then assigned to a, then p is incremented due to the post-increment. Is it right?
No, that's not correct either. The increment of p might happen before the write to a. What's important is that the value being stored in a was loaded using the temporary copy of the prior value of p.
Whether that memory fetch occurs before the memory write with the new value of p isn't specified, and any code that relies on the order is undefined behavior.
Any of these sequences are allowed:
Copy p into temporary THEN increment p, THEN load value at address indicated in temporary THEN store loaded value to a
Copy p into temporary THEN load value at address indicated in temporary (this value itself is placed in a temporary) THEN increment p THEN store loaded value to a
Copy p into temporary THEN load value at address indicated in temporary THEN store loaded value to a THEN increment p
Here are two code examples that are undefined behavior because they rely on the order of side effects:
int a = 7;
int *p = &a;
a = (*p)++; // undefined behavior, do not do this!!
char *pc;
pc = (char *)&pc;   /* pc points at its own storage */
char c;
c = *(pc++); // undefined behavior, do not do this!!!
The parentheses do not create a sequence point (or sequenced before relationship, in the new wording). The version of the code with parentheses is just as undefined as the version without.
I've encountered a case where cross-platform code behaved differently on a basic assignment statement.
One compiler evaluated the Lvalue first, the Rvalue second, and then did the assignment.
Another compiler evaluated the Rvalue first, the Lvalue second, and then did the assignment.
This may have an impact when the evaluation of one side influences the result of evaluating the other, as shown in the following case:
#include <stdio.h>
#include <stdlib.h>

struct MM {
    int m;
};

int helper(struct MM **ppmm) {
    (*ppmm) = (struct MM *)malloc(sizeof(struct MM));
    (*ppmm)->m = 1000;
    return 100;
}

int main() {
    struct MM mm = {500};
    struct MM *pmm = &mm;
    pmm->m = helper(&pmm);
    printf(" %d %d ", mm.m, pmm->m);
}
In the example above, the line pmm->m = helper(&pmm); depends on the order of evaluation. If the Lvalue is evaluated first, then pmm->m is equivalent to mm.m, and if the Rvalue is evaluated first, then pmm->m refers to the MM instance allocated on the heap.
My question is whether the C standard determines the order of evaluation (I didn't find anything), or whether each compiler can choose what to do.
Are there any other similar pitfalls I should be aware of?
The semantics for evaluation of an = expression include that
The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
(C2011, 6.5.16/3; emphasis added)
The emphasized provision explicitly permits your observed difference in the behavior of the program when compiled by different compilers. Moreover, unsequenced means, among other things, that it is permissible for the evaluations to occur in different order even in different runs of the very same build of the program. If the function in which the unsequenced evaluations appear were called more than once, then it would be permissible for the evaluations to occur in different order during different calls within the same execution of the program.
That already answers the question, but it's important to see the bigger picture. Modifying an object or calling a function that does so is a side effect (C2011, 5.1.2.3/2). This key provision therefore comes into play:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
(C2011, 6.5/2)
The called function has the side effect of modifying the value stored in main()'s variable pmm, evaluation of the left-hand operand of the assignment involves a value computation using the value of pmm, and these are unsequenced, therefore the behavior is undefined.
Undefined behavior is to be avoided at all costs. Because your program's behavior is undefined, it is not limited to the two alternatives you observed (in case that wasn't bad enough). The C standard places no limitations whatever on what it may do. It might instead crash, zero out your hard drive's partition table, or, if you have suitable hardware, summon nasal demons. Or anything else. Most of these are unlikely, but the best viewpoint is that if your program has undefined behavior then your program is wrong.
When using the simple assignment operator: =, the order of evaluation of operands is unspecified. There is also no sequence point in between the evaluations.
For example, if you have two function calls in one statement:
*Get() = logf(2.0f);
It is not specified in which order they are called at any time, and yet this behavior is completely defined.
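For that fragment to make sense, Get must return a pointer to float; a minimal sketch of what is being assumed (Get, storage and demo are hypothetical names):
#include <math.h>

static float storage;        /* hypothetical backing object                      */

float *Get(void)             /* hypothetical function, not part of any real API  */
{
    return &storage;
}

void demo(void)
{
    /* Two function calls in one statement: whether Get or logf is called
       first is unspecified, yet the behavior is completely defined. */
    *Get() = logf(2.0f);
}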
A function call will introduce a sequence point. It will happen after the evaluation of the arguments and before the actual call. The operator ; will also introduce a sequence point. This is important because an object must not be modified twice without an intervening sequence point, otherwise the behavior is undefined.
Your example is particularly complicated due to unspecified behavior, and may have different results depending on whether the left or right operand is evaluated first.
The left operand is evaluated first.
The left operand is evaluated, and the pointer pmm points to the struct mm. Then the function is called, and a sequence point occurs. It modifies the pointer pmm by pointing it to the allocated memory, followed by a sequence point because of the operator ;. Then it stores the value 1000 to the member m, followed by another sequence point because of ;. The function returns 100, which is assigned to the left operand; since the left operand was evaluated first, the value 100 is stored into the object mm, more specifically its member m.
mm.m has the value 100 and pmm->m has the value 1000. This is defined behavior; no object is modified twice in-between sequence points.
The right operand is evaluated first.
The function is called first, and a sequence point occurs. It modifies the pointer pmm by pointing it to the newly allocated struct, followed by a sequence point. Then it stores the value 1000 to the member m, followed by a sequence point. Then the function returns. Then the left operand is evaluated: pmm->m now designates the m member of the newly allocated struct, and it is modified by assigning it the value 100.
mm.m will have the value 500 since it was never modified, and pmm->m will have the value 100. No object was modified twice in-between sequence points. The behavior is defined.
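If the intent is to always store into the newly allocated object, the ordering has to be made explicit, for example with a temporary:
int tmp = helper(&pmm);   /* the call completes here: pmm now points at the heap object */
pmm->m = tmp;             /* unambiguously stores 100 into the newly allocated struct   */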
Looks like GCC with some optimization thinks two pointers from different translation units can never be equal, even if they actually are.
Code:
main.c
#include <stdint.h>
#include <stdio.h>

int a __attribute__((section("test")));
extern int b;

void check(int cond) { puts(cond ? "TRUE" : "FALSE"); }

int main() {
    int *p = &a + 1;
    check(
        (p == &b)
        ==
        ((uintptr_t)p == (uintptr_t)&b)
    );
    check(p == &b);
    check((uintptr_t)p == (uintptr_t)&b);
    return 0;
}
b.c
int b __attribute__((section("test")));
If I compile it with -O0, it prints
TRUE
TRUE
TRUE
But with -O1
FALSE
FALSE
TRUE
So p and &b are actually the same value, but the compiler optimized out their comparison assuming they can never be equal.
I can't figure out which optimization did this.
It doesn't look like strict aliasing, because the pointers are of the same type, and the -fstrict-aliasing option doesn't cause this effect.
Is this the documented behaviour? Or is this a bug?
There are three aspects in your code which result in general problems:
Conversion of a pointer to an integer is implementation-defined. There is no guarantee that the converted values of two pointers have all bits identical.
uintptr_t only guarantees that a pointer converted to it and back yields a pointer that compares equal to the original. Nothing more. The integer values themselves are not guaranteed to compare equal; e.g. there could be unused bits with arbitrary values. See the standard, 7.20.1.4.
And (briefly) two pointers can only compare equal if they point into the same array or right behind it (last entry plus one) or at least one is a null pointer. For any other constellation, they compare unequal. For the exact details, see the standard, 6.5.9p6.
Finally, there is no guarantee how variables are placed in memory by the toolchain (typically the linker for static variables, the compiler for automatic variables). Only an array or a struct (i.e. composite types) guarantees the ordering of its elements.
For your example, 6.5.9p7 also applies. It basically says that, for comparison, a pointer to a non-array object behaves like a pointer to the first element of an array of length one. This does not cover an incremented pointer past the object like &a + 1. Relevant is the object the pointer is based on: that is the object a for the pointer p, and b for the pointer &b. The rest can be found in paragraph 6.
None of your variables is an array (last part of paragraph 6), so the pointers need not compare equal, even for &a + 1 == &b. The last "TRUE" might arise from gcc assuming the uintptr_t comparison returns true.
gcc is known to aggressively optimise while strictly following the standard. Other compilers are more conservative, but that results in less optimised code. Please don't try "solving" this by disabling optimisation or other hacks, but fix it using well-defined behaviour. It is a bug in the code.
p == &b is a pointer comparison and is subject to the following rules from the C Standard (6.5.9 Equality operators, point 6):
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
(uintptr_t)p == (uintptr_t)&b is an arithmetic comparison and is subject to the following rules (6.5.9 Equality operators, point 4):
If both of the operands have arithmetic type, the usual arithmetic conversions are performed. Values of complex types are equal if and only if both their real parts are equal and also their imaginary parts are equal. Any two values of arithmetic types from different type domains are equal if and only if the results of their conversions to the (complex) result type determined by the usual arithmetic conversions are equal.
These two excerpts require very different things from the implementation. And it is clear that the C specification places no requirement on an implementation to mimic the behavior of the former kind of comparison in cases where the latter kind is invoked and vice versa. The implementation is only required to follow this rule (7.18.1.4 Integer types capable of holding object pointers in C99 or 7.20.1.4 in C11):
The [uintptr_t] type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer.
(Addendum: The above quote isn't applicable in this case, because the conversion from int* to uintptr_t does not involve void* as an intermediate step. See Hadi's answer for an explanation and citation on this. Still, the conversion in question is implementation-defined and the two comparisons you are attempting are not required to exhibit the same behavior, which is the main takeaway here.)
As an example of the difference, consider two pointers that point at the same address of two different address spaces. Comparing them as pointers shouldn't return true, but comparing them as unsigned integers might.
&a + 1 is an integer added to a pointer, which is subject to the following rules (6.5.6 Additive operators, point 8):
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object. If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
I believe that this excerpt shows that pointer addition (and subtraction) is defined only for pointers within the same array object or one past the last element. And because (1) a is not an array and (2) a and b aren't members of the same array object, it seems to me that your pointer math operation invokes undefined behavior and your compiler takes advantage of it to assume that the pointer comparison returns false. Again as pointed out in Hadi's answer (and in contrast to what my original answer assumed at this point), pointers to non-array objects can be considered pointers to array objects of length one, and thus adding one to your pointer to the scalar does qualify as pointing to one past the end of the array.
Therefore your case seems to fall under the last part of the first excerpt mentioned in this answer, making your comparison well-defined to evaluate to true if and only if the two variables are linked in sequence and in ascending order. Whether this is true for your program is left unspecified by the standard and it's up to the implementation.
While one of the answers has already been accepted, the accepted answer (and all other answers for that matter) are critically wrong as I'll explain and then answer the question. I'll be quoting from the same C standard, namely n1570.
Let's start with &a + 1. In contrast to what #Theodoros and #Peter have stated, this expression has defined behavior. To see this, consider section 6.5.6 paragraph 7 "Additive operators", which states:
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
and paragraph 8 (in particular, the emphasized part):
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i−n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.
The expression (uintptr_t)p == (uintptr_t)&b has two parts. The conversion from a pointer to an uintptr_t is NOT defined by section 7.20.1.4 (in contrast to what #Olaf and #Theodoros have said):
The following type designates an unsigned integer type with the
property that any valid pointer to void can be converted to this type,
then converted back to pointer to void, and the result will compare
equal to the original pointer:
uintptr_t
It's important to recognize that this rule applies only to valid pointers to void. However, in this case, we have a valid pointer to int. A relevant paragraph can be found in section 6.3.2.3 paragraph 1:
A pointer to void may be converted to or from a pointer to any object
type. A pointer to any object type may be converted to a pointer to
void and back again; the result shall compare equal to the original
pointer.
This means that (uintptr_t)(void*)p is allowed according to this paragraph and 7.20.1.4. But (uintptr_t)p and (uintptr_t)&b are ruled by section 6.3.2.3 paragraph 6:
Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
integer type.
Note that uintptr_t is an integer type as stated in section 7.20.1.4 mentioned above and therefore this rule applies.
The second part of (uintptr_t)p == (uintptr_t)&b is comparing for equality. As previously discussed, since the result of conversion is implementation-defined, the result of equality is also implementation defined. This applies irrespective of whether the pointers themselves are equal or not.
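To make the distinction concrete, here is a sketch of the one round trip that 6.3.2.3 paragraph 1 and 7.20.1.4 do guarantee (the function is purely illustrative):
#include <stdint.h>

void round_trip(void)
{
    int x;
    int *p = &x;

    uintptr_t u = (uintptr_t)(void *)p;   /* int* -> void* -> integer (7.20.1.4 covers the void* step) */
    int *q = (int *)(void *)u;            /* integer -> void* -> int*                                  */

    /* q is guaranteed to compare equal to p; nothing is guaranteed about how
       the integer u compares to the integer obtained from a different pointer. */
    (void)q;
}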
Now I'll discuss p == &b. The third point in #Olaf's answer is wrong and #Theodoros's answer is incomplete regarding this expression. Section 6.5.9 "Equality operators" paragraph 7:
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
and paragraph 6:
Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an object
and a subobject at its beginning) or function, both are pointers to
one past the last element of the same array object, or one is a
pointer to one past the end of one array object and the other is a
pointer to the start of a different array object that happens to
immediately follow the first array object in the address space.)
In contrast to what #Olaf has said, comparing pointers using the == operator never results in undefined behavior (which may occur only when using relational operators such as <=, according to section 6.5.8 paragraph 5, which I'll omit here for brevity). Now since p points to the next int relative to a, it will be equal to &b only when the linker has placed b at that location in the binary. Otherwise, they are unequal. So this is implementation-dependent (the relative order of a and b is unspecified by the standard). Since the declarations of a and b use a language extension, namely __attribute__((section("test"))), the relative locations are indeed implementation-dependent by J.5 and 3.4.2 (omitted for brevity).
We conclude that the results of check(p == &b) and check((uintptr_t)p == (uintptr_t)&b) are implementation-dependent. So the answer depends on which version of which compiler you are using. I'm using gcc 4.8 and, compiling with default options except for the optimization level, the output I get in both the -O0 and -O1 cases is all TRUE.
According to C11 6.5.9/6 and C11 6.5.9/7, the test p == &b must give 1 if a and b are adjacent in the address space.
Your example shows that GCC appears to not fulfill this requirement of the Standard.
Update 26/Apr/2016: My original answer contained suggestions about modifying the code to remove other potential sources of UB and isolate this one condition.
However, it's since come to light that the issues raised by this thread are under review - N2012.
One of their recommendations is that p == &b should be unspecified, and they acknowledge that GCC does in fact not implement the ISO C11 requirement.
So I have removed the remaining text from my answer, as it is no longer necessary to prove a "compiler bug", since the non-conformance (whether you want to call it a bug or not) has been established.
Re-reading your program I see that you are (understandably) baffled by the fact that in the optimized version
p == &b
is false, while
(uintptr_t)p == (uintptr_t)&b;
is true. The last line indicates that the numerical values are indeed identical; how can p == &b then be false??
I must admit that I have no idea. I am convinced that it is a gcc bug.
After a discussion with M.M I think I can make the following case if the conversion to uintptr_t goes through an intermediate void pointer (you should include that in your program and see whether it changes anything):
Because both steps in the conversion chain int* -> void* -> uintptr_t are guaranteed to be reversible, unequal int pointers can logically not result in equal uintptr_t values.1 (Those equal uintptr_t values would have to convert back to equal int pointers, altering at least one of them and thus violating the value-preserving conversion rule.) In code (I'm not aiming for equality here, just demonstrating the conversions and comparisons):
int a, b, *ap = &a, *bp = &b;
assert(ap != bp);
void *avp = ap, *bvp = bp;
uintptr_t ua = (uintptr_t)avp, ub = (uintptr_t)bvp;
// Now the following holds:
// if ap != bp then *necessarily* ua != ub.
// This is violated by the OP's case (sans the void* step).
assert((int *)(void *)ua == (int *)(void *)ub);
1) This assumes that the uintptr_t doesn't carry hidden information in the form of padding bits which are not evaluated in an arithmetic comparison but possibly in a type conversion. One can check that through CHAR_BIT, UINTPTR_MAX, sizeof(uintptr_t) and some bit fiddling.
For a similar reason it's conceivable that two uintptr_t values compare different but convert back to the same pointer (namely if there are bits in uintptr_t not used for storing a pointer value, and the conversion does not zero them). But that is the opposite of the OP's problem.
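As a rough sketch of the bit fiddling hinted at in the footnote, one can count the value bits of uintptr_t and compare them with the size of its object representation to detect padding bits:
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Count the value bits of uintptr_t by shifting UINTPTR_MAX down to zero. */
    unsigned value_bits = 0;
    for (uintptr_t m = UINTPTR_MAX; m != 0; m >>= 1)
        ++value_bits;

    unsigned object_bits = CHAR_BIT * (unsigned)sizeof(uintptr_t);

    if (value_bits == object_bits)
        puts("uintptr_t has no padding bits");
    else
        printf("uintptr_t has %u padding bits\n", object_bits - value_bits);
    return 0;
}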
After hunting, to no avail, for a related or duplicate question concerning the following (I can only do marginal justice to the sheer number of pointer-arithmetic and post-decrement questions tagged C; suffice it to say "boatloads" does a grave injustice to that result-set count), I toss this in the ring in hopes of clarification or a referral to a duplicate that eluded me.
If the post-decrement operator is applied to a pointer such as below, a simple reverse-iteration of an array sequence, does the following code invoke undefined behavior?
#include <stdio.h>
#include <string.h>

int main()
{
    char s[] = "some string";
    const char *t = s + strlen(s);
    while (t-- > s)
        fputc(*t, stdout);
    fputc('\n', stdout);
    return 0;
}
It was recently proposed to me that 6.5.6p8 Additive operators, in conjunction with 6.5.2.4 Postfix increment and decrement operators, specifies that even performing a post-decrement upon t when it already contains the base address of s invokes undefined behavior, regardless of whether the resulting value of t (not the t-- expression result) is evaluated or not. I simply want to know if that is indeed the case.
The cited portions of the standard were:
6.5.6 Additive Operators
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined.
and its nearly tightly coupled relationship with...
6.5.2.4 Postfix increment and decrement operators Constraints
The operand of the postfix increment or decrement operator shall have
atomic, qualified, or unqualified real or pointer type, and shall be a
modifiable lvalue.
Semantics
The result of the postfix ++ operator is the value of the operand. As a side effect, the value of the operand object is incremented (that is, the value 1 of the appropriate type is added to it). See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers. The value computation of the result is sequenced before the side effect of updating the stored value of the operand. With respect to an indeterminately-sequenced function call, the operation of postfix ++ is a single evaluation. Postfix ++ on an object with atomic type is a read-modify-write operation with memory_order_seq_cst memory order semantics.98)
The postfix -- operator is analogous to the postfix ++ operator, except that the value of the operand is decremented (that is, the value 1 of the appropriate type is subtracted from it).
Forward references: additive operators (6.5.6), compound assignment (6.5.16.2).
The very reason for using the post-decrement operator in the posted sample is to avoid evaluating an eventually-invalid address value against the base address of the array. For example, the code above was a refactor of the following:
#include <stdio.h>
#include <string.h>

int main()
{
    char s[] = "some string";
    size_t len = strlen(s);
    char *t = s + len - 1;
    while (t >= s)
    {
        fputc(*t, stdout);
        t = t - 1;
    }
    fputc('\n', stdout);
}
Forgetting for a moment that this has a non-zero-length string for s, this general algorithm clearly has issues (perhaps not as clearly to some). If s[] were instead "", then t would be assigned the value s - 1, which itself is not in the valid range of s through its one-past address, and the evaluation for comparison against s that ensues is no good. If s has non-zero length, that addresses the initial s - 1 problem, but only temporarily, as eventually this still counts on that value (whatever it is) being valid for comparison against s to terminate the loop. It could be worse; it could naively have been:
size_t len = strlen(s) - 1;
char *t = s + len;
This has disaster written all over it if s were a zero-length string. The refactored code of this question opened with was intended to address all of these issues. But...
My paranoia may be getting to me, but it isn't paranoia if they're really all out to get you. So, per the standard (these sections, or perhaps others), does the original code (scroll to the top of this novel if you forgot what it looks like by now) indeed invoke undefined behavior or not?
I am pretty certain that the result of the post-decrement in this case is indeed undefined behaviour. The post-decrement clearly subtracts one from a pointer to the beginning of an object, so the result does not point to an element of the same array, and by the definition of pointer arithmetic (§6.5.6/8, as cited in the OP) that's undefined behaviour. The fact that you never use the resulting pointer is irrelevant.
What's wrong with:
char *t = s + strlen(s);
while (t > s) fputc(*--t, stdout);
Interesting but irrelevant fact: The implementation of reverse iterators in the standard C++ library usually holds in the reverse iterator a pointer to one past the target element. This allows the reverse iterator to be used normally without ever involving a pointer to "one before the beginning" of the container, which would be UB, as above.
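Rendered in C, that idiom keeps the cursor one past the element it currently designates, so a pointer before the start of the array is never formed; a sketch along those lines:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[] = "some string";
    const char *rit = s + strlen(s);   /* like a reverse iterator: one past the current element */

    while (rit != s)
    {
        fputc(rit[-1], stdout);        /* dereference the element just before rit  */
        --rit;                         /* step back; rit never drops below s       */
    }
    fputc('\n', stdout);
    return 0;
}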
#include <stdio.h>

int func(int **a)
{
    *a = NULL;
    return 1234;
}

int main()
{
    int x = 0, *ptr = &x;
    *ptr = func(&ptr); // <-???
    printf("%d\n", x); // print '1234'
    printf("%p\n", (void *)ptr); // print 'nil'
    return 0;
}
Is this an example of undefined behavior or has to do with sequence points?
why the line:
*ptr = func(&ptr);
doesn't behave like:
*NULL = 1234;
EDIT: I forgot to mention that I get the output '1234' and 'nil' with gcc 4.7.
Since there is no sequence point between evaluations of the left and right hand sides of the assignment operator, it is not specified whether *ptr or func(&ptr) is evaluated first. Thus it is not guaranteed that the evaluation of *ptr is allowed, and the program has undefined behaviour.
The language does not guarantee you that the right-hand side subexpression func(&ptr) in
*ptr = func(&ptr);
is evaluated first, and the left-hand side subexpression *ptr is evaluated later (which is apparently what you expected to happen). The left-hand side can legally be evaluated first, before the call to func. And this is exactly what happened in your case: *ptr got evaluated before the call, when ptr was still pointing to x. After that, the assignment destination became finalized (i.e. it became known that the code would assign to x). Once that happens, changing ptr no longer changes the assignment destination.
So, the immediate behavior of your code is unspecified due to unspecified order of evaluation. However, one possible evaluation schedule leads to undefined behavior by causing a null pointer dereference. This means that in general case the behavior is undefined.
If I had to model the behavior of this code in terms of C++ language, I'd say that the process of evaluation in this case can be split into these essential steps
1a. int &lhs = *ptr; // evaluate the left-hand side
1b. int rhs = func(&ptr); // evaluate the right-hand side
2. lhs = rhs; // perform the actual assignment
(Even though C language does not have references, internally it uses the same concept of "run-time bound lvalue" to store the result of evaluation of left-hand side of assignment.) The language specification allows enough freedom to make steps 1a and 1b to occur in any order. You expected 1b to occur first, while your compiler decided to start with 1a.
This is undefined behaviour, I believe. The standard does not stipulate when the LHS of the assignment is evaluated compared to the RHS. If *ptr is evaluated after the function is called, you will be dereferencing a null pointer; if it is evaluated before the function is called, then you get sane behaviour.
The code is thoroughly disreputable. Do not try using it, or anything similar, in real code.
Note that there is a sequence point immediately before a function is called, after its arguments have been evaluated; there is also a sequence point immediately before a function returns. Thus, there are sequence points related to the evaluation of the function arguments and its return value, but...and this is crucial in this context...it still does not tell you whether *ptr is evaluated before or after the function is called. Either is possible; both are correct; the code depends on which happens, which makes it rely on undefined behaviour.
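If the behavior gcc happened to produce (1234 stored into x, ptr set to null) is what is actually wanted, the two steps have to be sequenced explicitly, for example:
int *dest = ptr;          /* fix the destination before the call          */
int result = func(&ptr);  /* the call sets ptr to NULL and returns 1234   */
*dest = result;           /* well defined: stores 1234 into x             */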
The assignment operator is not a sequence point, so there is no guarantee as to which side will be evaluated first. It is unspecified behaviour.
In one of the cases (dereferencing a null pointer) it could exhibit undefined behavior.
Between consecutive "sequence points" an object's value can be
modified only once by an expression.
You can see a list of the defined sequence points in C in Annex C of the standard.
While the call of a function is a sequence point, this is bound to the evaluation of the parameters (before the call), not to the function's side effects (the call itself).