Assignment and pointers, undefined behavior? - c

int func(int **a)
{
*a = NULL;
return 1234;
}
int main()
{
int x = 0, *ptr = &x;
*ptr = func(&ptr); // <-???
printf("%d\n", x); // print '1234'
printf("%p\n", ptr); // print 'nil'
return 0;
}
Is this an example of undefined behavior or has to do with sequence points?
why the line:
*ptr = func(&ptr);
doesn't behave like:
*NULL = 1234;
EDIT: I forgot to mention that I get the output '1234' and 'nil' with gcc 4.7.

Since there is no sequence point between evaluations of the left and right hand sides of the assignment operator, it is not specified whether *ptr or func(&ptr) is evaluated first. Thus it is not guaranteed that the evaluation of *ptr is allowed, and the program has undefined behaviour.

The language does not guarantee you that the right-hand side subexpression func(&ptr) in
*ptr = func(&ptr);
is evaluated first, and the left-hand side subexpression *ptr is evaluated later (which is apparently what you expected to happen). The left-hand side can legally be evaluated first, before call to func. And this is exactly what happened in your case: *ptr got evaluated before the call, when ptr was still pointing to x. After that the assignment destination became finalized (i.e. it became known that the code will assign to x). Once it happens, changing ptr no longer changes the assignment destination.
So, the immediate behavior of your code is unspecified due to unspecified order of evaluation. However, one possible evaluation schedule leads to undefined behavior by causing a null pointer dereference. This means that in general case the behavior is undefined.
If I had to model the behavior of this code in terms of C++ language, I'd say that the process of evaluation in this case can be split into these essential steps
1a. int &lhs = *ptr; // evaluate the left-hand side
1b. int rhs = func(&ptr); // evaluate the right-hand side
2. lhs = rhs; // perform the actual assignment
(Even though C language does not have references, internally it uses the same concept of "run-time bound lvalue" to store the result of evaluation of left-hand side of assignment.) The language specification allows enough freedom to make steps 1a and 1b to occur in any order. You expected 1b to occur first, while your compiler decided to start with 1a.

This is undefined behaviour, I believe. The standard does not stipulate when the LHS of the assignment is evaluated compared to the RHS. If *ptr is evaluated after the function is called, you will be dereferencing a null pointer; if it is evaluated before the function is called, then you get sane behaviour.
The code is thoroughly disreputable. Do not try using it, or anything similar, in real code.
Note that there is a sequence point immediately before a function is called, after its arguments have been evaluated; there is also a sequence point immediately before a function returns. Thus, there are sequence points related to the evaluation of the function arguments and its return value, but...and this is crucial in this context...it still does not tell you whether *ptr is evaluated before or after the function is called. Either is possible; both are correct; the code depends on which happens, which makes it rely on undefined behaviour.

The assignment operator is not a sequence point. So, there is no guarantee, as to which side will be evaluated first. So, it is unspecified behaviour.
In one of the cases (dereferencing a NULLPTR) it could exhibit undefined behavior.
Between consecutive "sequence points" an object's value can be
modified only once by an expression.
You can see a list of what are defined sequence points in C here.

While the call of a function is a sequence point, this is bound to the evaluation of parameters (before the call), not the functions side-effects (the call itself).

Related

Is dereferencing a NULL pointer to array valid in C?

Is this behavior defined or not?
volatile long (*volatile ptr)[1] = (void*)NULL;
volatile long v = (long) *ptr;
printf("%ld\n", v);
It works because by dereferencing pointer to array we are receiving an array itself, then that array decaying to pointer to it's first element.
Updated demo: https://ideone.com/DqFF6T
Also, GCC even considers next code as a constant expression:
volatile long (*ptr2)[1] = (void*)NULL;
enum { this_is_constant_in_gcc = ((void*)ptr2 == (void*)*ptr2) };
printf("%d\n", this_is_constant_in_gcc);
Basically, dereferencing ptr2 at compile time;
This:
long (*ptr)[1] = NULL;
Is declaring a pointer to an "array of 1 long" (more precisely, the type is long int (*)[1]), with the initial value of NULL. Everything fine, any pointer can be NULL.
Then, this:
long v = (long) *ptr;
Is dereferencing the NULL pointer, which is undefined behavior. All bets are off, if your program does not crash, the following statement could print any value or do anything else really.
Let me make this clear one more time: undefined behavior means that anything can happen. There is no explanation as to why anything strange happens after invoking undefined behavior, nor there needs to be. The compiler could very well emit 16-bit Real Mode x86 assembly, produce a binary that deletes your entire home folder, emit the Apollo 11 Guidance Computer assembly code, or whatever else. It is not a bug. It's perfectly conforming to the standard.
The only reason your code seems to work is because GCC decides, purely out of coincidence, to do the following (Godbolt link):
mov QWORD PTR [rbp-8], 0 ; put NULL on the stack
mov rax, QWORD PTR [rbp-8]
mov QWORD PTR [rbp-16], rax ; move NULL to the variable v
Causing the NULL-dereference to never actually happen. This is most probably a consequence of the undefined behavior in dereferencing ptr ¯\_(ツ)_/¯
Funnily enough, I previously said in a comment:
dereferencing NULL is invalid and will basically always cause a segmentation fault.
But of course, since it is undefined behavior that "basically always" is wrong. I think this is the first time I ever see a null-pointer dereference not cause a SIGSEGV.
Is this behavior defined or not?
Not.
long (*ptr)[1] = NULL;
long v = (long) *ptr;
printf("%ld\n", v);
It works because by dereferencing pointer to array we are receiving an
array itself, then that array decaying to pointer to it's first
element.
No, you are confusing type with value. It is true that the expression *ptr on the second line has type long[1], but evaluating that expression produces undefined behavior regardless of the data type, and regardless of the automatic conversion that would be applied to the result if it were defined.
The relevant section of the spec is paragraph 6.5.2.3/4:
The unary * operator denotes indirection. If the operand points to a
function, the result is a function designator; if it points to an
object, the result is an lvalue designating the object. If the operand
has type ''pointer to type'', the result has type ''type''. If an
invalid value has been assigned to the pointer, the behavior of the
unary * operator is undefined.
A footnote goes on to clarify that
[...] Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer [...]
It may "work" for you in an empirical sense, but from a language perspective, any output at all or none is a conforming result.
Update:
It may be interesting to note that the answer would be different for explicitly taking the address of *ptr than it is for supposing that array decay will overcome the undefinedness of the dereference. The standard provides that, as a special case, where the operand of the unary & operator is the result of a unary * operator, neither of those operators is evaluated. Provided that all relevant constraints are satisfied, the result is as if they were both omitted altogether, except that it is never an lvalue.
Thus, this is ok:
long (*ptr)[1] = NULL;
long v = (long) &*ptr;
printf("%ld\n", v);
On many implementations it will reliably print 0, but do note that C does not specify that it must be 0.
The key distinction here is that in this case, the * operation is not evaluated (per spec). The * operation in the original code is is evaluated, notwithstanding the fact that if the pointer value were valid, the resulting array would be converted right back to a pointer (of a different type, but to the same location). That does suggest an obvious shortcut that implementations may take with the original code, and they may take it, if they wish, without regard to whether ptr's value is valid because if it is invalid then they can do whatever they want.
To just answer you´re provided questions:
Is dereferencing a NULL pointer to array valid in C?
No.
Is this behavior defined or not?
It is classified as "undefined behavior", so it is not defined.
Never mind of the case, that this trick with the array, maybe will work on some implementations and it fills absolutely no needs to do so (I imply you are asking out of curiousity), it is not valid per the C standard to dereference a NULL pointer in any way and will cause "Undefined Behavior".
Anything can happen when you implement such statements into your program.
Look at the answers on this question, which explain why:
What EXACTLY is meant by "de-referencing a NULL pointer"?
One qoute from Adam Rosenfield´s answer:
A null pointer is a pointer that does not point to any valid data (but it is not the only such pointer). The C standard says that it is undefined behavior to dereference a null pointer. This means that absolutely anything could happen: the program could crash, it could continue working silently, or it could erase your hard drive (although that's rather unlikely).
Is this behavior defined or not?
The behavior is undefined because you are applying * operator to a pointer that compares equal to null pointer constant.
The following stackoverflow thread tries to explain what undefined behavior is: Undefined, unspecified and implementation-defined behavior

Using pointed to content in assignment of a pointer

It has always been my understanding that the lack of a sequence point after the reading of the right expression in an assignment makes an example like the following produce undefined behavior:
void f(void)
{
int *p;
/*...*/
p = (int [2]){*p};
/*...*/
}
// p is assigned the address of the first element of an array of two ints, the
// first having the value previously pointed to by p and the second, zero. The
// expressions in this compound literal need not be constant. The unnamed object
// has automatic storage duration.
However, this is EXAMPLE 2 under "6.5.2.5 Compound literals" in the committee draft for the C11 standard, the version identified as n1570, which I understand to be the final draft (I don't have access to the final version).
So, my question: Is there something in the standard that gives this defined and specified behavior?
EDIT
I would like to expound on exactly what I see as the problem, in response to some of the discussion that has come up.
We have two conditions under which an assignment is explicitly stated to have
undefined behavior, as per 6.5p2 of the standard quoted in the answer given by dbush:
1) A side effect on a scalar object is unsequenced relative to a different side
effect on the same scalar object.
2) A side effect on a scalar object is unsequenced relative to a value
computation using the value of the same scalar object.
An example of item 1 is "i = ++i + 1". In this case the side effect of
writing the value i+1 into i due to ++i is unsequenced relative to the side effect of assigning the RHS to the LHS. There is a sequence point between the value calculations of each side and the assignment of RHS to LHS, as described in 6.5.16.1 given in the answer by Jens Gustedt below. However, the modification of i due to ++i is not subject to that sequence point, otherwise the behavior would
be defined.
In the example I give above, we have a similar situation. There is a value computation, which involves the creation of an array and the conversion of that array to a pointer to its first element. There is also a side effect of writing a value to part of that array, *p to the first element.
So, I don't see what gaurantees we have in the standard that the modification
of the otherwise uninitialized first element of the array will be sequenced
before the writing of the array address to p. What about this modification (writing *p to the first element) is different from the modification of writing
i+1 to i?
To put it another way, suppose an implementation looked at the statement of interest in the example as three tasks: 1st, allocate space for the compound literal object; 2nd: assign a pointer to said space to p; 3rd: write *p to the first element in the newly allocated space. The value computation for both RHS and LHS would be sequenced before the assignment, as computing the value of the RHS only requires the address. In what way is this hypothetical implementation not standard compliant?
You need to look at the definition of the assignment operator in 6.5.16.1
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands.
The evaluations of the operands are unsequenced.
So here you clearly see that first it evaluates the expressions on both sides in any order or even concurrently, and then stores the value of the right into the object designated by the left.
Additionally, you should know that LHS and RHS of an assignment are evaluated differently. Citations are a bit too long, so here is a summary
For the LHS the evaluation leaves "lvalues", that is objects such as
p, untouched. In particular it doesn't look at the contents of the
object.
For the RHS there is "lvalue conversion", that is for any object that is found there (e.g *p) the contents of that object is loaded.
If the RHS contains an lvalue of array type, this array is converted to a pointer to its first element. This is what is happening to your compound literal.
Edit: You added another question
What about this modification (writing *p to the first element) is
different from the modification of writing i+1 to i?
The difference is simply that i in the LHS of the assignment and thus has to be updated. The array from the compound literal is not in the LHS and thus is of no concern for the update.
Section 6.5p2 of the C standard details why this is valid:
If a side effect on a scalar object is unsequenced relative to either
a different side effect on the same scalar object or a value
computation using the value of the same scalar object, the behavior is
undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an
unsequenced side effect occurs in any of the orderings. 84)
And footnote 84 states:
84) This paragraph renders undefined statement expressions such as
i = ++i + 1;
a[i++] = i;
while allowing
i = i + 1;
a[i] = i;
The posted snippet from 6.5.2.5 falls under the latter, as there is no side effect.
In (int [2]){*p}, *p provides an initial value for the compound literal. This is not an assignment, and it is not a side effect. The initial value is part of the object when the object is created. There is no moment when the array exists and it is not initialized.
In p = (int [2]){*p}, we know the side effect of updating p is sequenced after the computation of the right side because C 2011 [N1570] 6.5.16 3 says “The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands.”

C order of evaluation of assignment statement

I've been encountered on a case where cross-platform code was behaving differently on a basic assignment statement.
One compiler evaluated the Lvalue first, Rvalue second and then the assignment.
Another compiler did the Rvalue first, Lvalue second and then the assignment.
This may have impact in case Lvalue influence the value of Rvalue as shown in the following case:
struct MM {
int m;
}
int helper (struct MM** ppmm ) {
(*ppmm) = (struct MM *) malloc (sizeof (struct MM));
(*ppmm)->m = 1000;
return 100;
}
int main() {
struct MM mm = {500};
struct MM* pmm = &mm
pmm->m = helper(&pmm);
printf(" %d %d " , mm.m , pmm->m);
}
The example above, the line pmm->m = helper(&mm);, depend on the order of evaluation. if Lvalue evaluated first, than pmm->m is equivalent to mm.m, and if Rvalue calculated first than pmm->m is equivalent to the MM instance that allocated on heap.
My question is whether there's a C standard to determine the order of evaluation (didn't find any), or each compiler can choose what to do.
are there any other similar pitfalls I should be aware of ?
The semantics for evaluation of an = expression include that
The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
(C2011, 6.5.16/3; emphasis added)
The emphasized provision explicitly permits your observed difference in the behavior of the program when compiled by different compilers. Moreover, unsequenced means, among other things, that it is permissible for the evaluations to occur in different order even in different runs of the very same build of the program. If the function in which the unsequenced evaluations appear were called more than once, then it would be permissible for the evaluations to occur in different order during different calls within the same execution of the program.
That already answers the question, but it's important to see the bigger picture. Modifying an object or calling a function that does so is a side effect (C2011, 5.1.2.3/2). This key provision therefore comes into play:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
(C2011, 6.5/2)
The called function has the side effect of modifying the value stored in main()'s variable pmm, evaluation of the left-hand operand of the assignment involves a value computation using the value of pmm, and these are unsequenced, therefore the behavior is undefined.
Undefined behavior is to be avoided at all costs. Because your program's behavior is undefined, is not limited to the two alternatives you observed (in case that wasn't bad enough). The C standard places no limitations whatever on what it may do. It might instead crash, zero out your hard drive's partition table, or, if you have suitable hardware, summon nasal demons. Or anything else. Most of these are unlikely, but the best viewpoint is that if your program has undefined behavior then your program is wrong.
When using the simple assignment operator: =, the order of evaluation of operands is unspecified. There is also no sequence point in between the evaluations.
For example if you have two functions:
*Get() = logf(2.0f);
It is not specified in which order they are called at any time, and yet this behavior is completely defined.
A function call will introduce a sequence point. It will happen after the evaluation of the arguments and before the actual call. The operator ; will also introduce a sequence point. This is important because an object must not be modified twice without an intervening sequence point, otherwise the behavior is undefined.
Your example is particularly complicated due to unspecified behavior, and may have different results, depending the left or right operand is evaluated first.
The left operand is evaluated first.
The left operand is evaluated and the pointer pmm will point to the struct mm. Then the function is called, and a sequence point occurs. it modifies the pointer pmm by pointing it to allocated memory, followed by a sequence point because of the operator ;. Then it stores the value 1000 to the member m, followed by another sequence point because of ;. The function returns 100 and assigns it to the left operand, but since the left operand was evaluated first, the value 100, it is assigned to the object mm, more specifically its member m.
mm->m has the value 100 and ppm->m has the value 1000. This is defined behavior, no object is modified twice in-between sequence points.
The right operand is evaluated first.
The function is called first, the sequence point occurs, it modifies the pointer ppm by pointing it to new allocated struct, followed by a sequence point. Then it stores the value 1000 to the member m, followed by a sequence point. Then the function returns. Then the left operand is evaluated, ppm->m will point to the new allocated struct, and its member m, is modified by assigning it the value 100.
mm->m will have the value 500 since it was never modified, and pmm->m will have the value 100. No object was modified twice in-between sequence points. The behavior is defined.

Does applying post-decrement on a pointer already addressing the base of an array invoke undefined behavior?

After hunting for a related or duplicate question concerning the following to no avail (I can only do marginal justice to describe the sheer number of pointer-arithmetic and post-decrement questions tagged with C, but suffice it to say "boatloads" does a grave injustice to that result set count) I toss this in the ring in hopes of clarification or a referral to a duplicate that eluded me.
If the post-decrement operator is applied to a pointer such as below, a simple reverse-iteration of an array sequence, does the following code invoke undefined behavior?
#include <stdio.h>
#include <string.h>
int main()
{
char s[] = "some string";
const char *t = s + strlen(s);
while(t-->s)
fputc(*t, stdout);
fputc('\n', stdout);
return 0;
}
It was recently proposed to me that 6.5.6.p8 Additive operators, in conjunction with 6.5.2.p4, Postfix increment and decrement operators, specifies even performing a post-decrement upon t when it already contains the base-address of s invokes undefined behavior, regardless of whether the resulting value of t (not the t-- expression result) is evaluated or not. I simply want to know if that is indeed the case.
The cited portions of the standard were:
6.5.6 Additive Operators
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined.
and its nearly tightly coupled relationship with...
6.5.2.4 Postfix increment and decrement operators Constraints
The operand of the postfix increment or decrement operator shall have
atomic, qualified, or unqualified real or pointer type, and shall be a
modifiable lvalue.
Semantics
The result of the postfix ++ operator is the value of the operand. As a side effect, the value of the operand object is incremented (that is, the value 1 of the appropriate type is added to it). See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers. The value computation of the result is sequenced before the side effect of updating the stored value of the operand. With respect to an indeterminately-sequenced function call, the operation of postfix ++ is a single evaluation. Postfix ++ on an object with atomic type is a read-modify-write operation with memory_order_seq_cst memory order semantics.98)
The postfix -- operator is analogous to the postfix ++ operator, except that the value of the operand is decremented (that is, the value 1 of the appropriate type is subtracted from it).
Forward references: additive operators (6.5.6), compound assignment (6.5.16.2).
The very reason for using the post-decrement operator in the posted sample is to avoid evaluating an eventually-invalid address value against the base address of the array. For example, the code above was a refactor of the following:
#include <stdio.h>
#include <string.h>
int main()
{
char s[] = "some string";
size_t len = strlen(s);
char *t = s + len - 1;
while(t >= s)
{
fputc(*t, stdout);
t = t - 1;
}
fputc('\n', stdout);
}
Forgetting for a moment this has a non-zero-length string for s, this general algorithm clearly has issues (perhaps not as clearly to some). If s[] were instead "", then t would be assigned a value of s-1, which itself is not in the valid range of s through its one-past-address, and the evaluation for comparison against s that ensues is no good. If s has non-zero length, that addresses the initial s-1 problem, but only temporarily, as eventually this is still counting on that value (whatever it is) being valid for comparison against s to terminate the loop. It could be worse. it could have naively been:
size_t len = strlen(s) - 1;
char *t = s + len;
This has disaster written all over it if s were a zero-length string. The refactored code of this question opened with was intended to address all of these issues. But...
My paranoia may be getting to me, but it isn't paranoia if they're really all out to get you. So, per the standard (these sections, or perhaps others), does the original code (scroll to the top of this novel if you forgot what it looks like by now) indeed invoke undefined behavior or not?
I am pretty certain that the result of the post-decrement in this case is indeed undefined behaviour. The post-decrement clearly subtracts one from a pointer to the beginning of an object, so the result does not point to an element of the same array, and by the definition of pointer arithmetic (§6.5.6/8, as cited in the OP) that's undefined behaviour. The fact that you never use the resulting pointer is irrelevant.
What's wrong with:
char *t = s + strlen(s);
while (t > s) fputc(*--t, stdout);
Interesting but irrelevant fact: The implementation of reverse iterators in the standard C++ library usually holds in the reverse iterator a pointer to one past the target element. This allows the reverse iterator to be used normally without ever involving a pointer to "one before the beginning" of the container, which would be UB, as above.

Why ++*p++ works while ++i++ does not?

Assume p is a integer pointer and i is an integer:
*p++ gives an integer value corresponding to p.
i++ gives an integer value incremented by 1
Since by behavior, both the above yields integer, ++*p++ and ++i++ shouldn't have same error reported? But why ++*p++ works while ++i++ gives compiler error?
int main()
{
int a[10] = {0};
int *p = (int*)&a;
int i = 0;
// printf("%d", ++i++); -- FAILS error: lvalue required as increment operand
printf("%d\n", ++*p++ ); // Prints 1
return 0;
}
EDIT
++i++ is decomposed as following:
i++
++(result)
This is exactly where I am confused :
In the same way we can decompose ++*p++ as
*p++
++(result).
*p++ returns a value (rvalue) , not a pointer. So why the difference ?
The result of post-increment is an rvalue. You're not allowed to modify it. ++i++ attempts to modify that rvalue, which the compiler rejects.
p++ produces an rvalue, but it's of pointer type. You're not allowed to modify it, but you are allowed to dereference it. *p++ dereferences that rvalue. This gives you the value it points at, as an lvalue. The pre-increment then modifies the lvalue it points at, not the rvalue that was produced by the post-increment.
Edit: I should probably also add one more point: even if ++i++ was allowed by the compiler, the result would be undefined behavior, because it attempts to modify i twice without an intervening sequence point. In the case of ++*p++, that doesn't happen either -- the post increment modifies the pointer itself, while the pre-increment modifies what the pointer pointed at (before it was incremented). Since we're modifying two entirely different locations, the result in not undefined behavior.
If you wanted to badly enough, you could still get undefined behavior by initializing the pointer to point at itself, in which case both increments would attempt to modify the pointer. The committee didn't work very hard at preventing this, probably because only the truly pedantic would be at all likely to even think of such an insane thing.
Bottom line: in this case, the compiler is mostly trying to protect you from yourself, though you can still shoot yourself in the foot if you try hard enough.
++i++ is decomposed as following:
i++
++(result)
Problem: i++ returns a rvalue, ie a 'temporary' value, not incrementable. This is because i++ returns i before incrementing it.

Resources