Clarification regarding change in wording of C99 standard

I realize that merely asking about undefined behavior leads to downvotes by some, but I have a question comparing the September 2007 version of C99 (the only one I have access to, and so the one that matters to me) with the one from 2011. The relevant quotes are from 6.5 (2) in each version:
2007: "Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored." (the second sentence is the highlighted one)
2011: "If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the same scalar object, the behavior is undefined. (...)"
An example given in the 2007 version to illustrate a violation of this rule is:
i = ++i + 1;
Since C always treats an assignment as an expression (there are no assignment statements), this expression lies between two sequence points. It is fairly obvious that both versions declare the above to result in undefined behavior.
However, given the highlighted sentence of the 2007 version, it would be my understanding that even the following expression (lying again between two sequence points) would result in undefined behavior:
++i; // or i++; or a = ++i;
This is because, clearly, the prior value is not only read to determine the "value to be stored" ('stored' is a bit ambiguous, but I would naturally read it as referring to the value that is read): it is read, incremented, then stored back. Under the 2011 wording, however, these steps are sequenced, so the expression is fine (as it probably should be).
Was this change in wording made to address the above, so that the description matches the intent?
Note: I realize that to an extent this is opinion-based, but (1) the best-case would be that someone actually involved in writing the standard sees this, and (2) while I believe my interpretation to be reasonable/"true", if someone argues convincingly against it, this would be useful too.

However, given the highlighted sentence of the 2007 version, it would be my understanding that even the following expression (lying again between two sequence points) would result in undefined behavior:
++i; // or i++; or a = ++i;
Your understanding is wrong. This is best explained in C FAQ question 3.8:
...And that's what the second sentence says: if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written. This rule effectively constrains legal expressions to those in which the accesses demonstrably precede the modification. For example, the old standby i = i + 1 is allowed, because the access of i is used to determine i's final value. The example
a[i] = i++
is disallowed because one of the accesses of i (the one in a[i]) has nothing to do with the value which ends up being stored in i (which happens over in i++), and so there's no good way to define--either for our understanding or the compiler's--whether the access should take place before or after the incremented value is stored.
In the case of i++; or ++i;, the access of i and its increment both have to do with the value which ends up being stored in i, so neither expression violates the rule.


Isn't a[i++] = 1 undefined, since the increment is unsequenced relative to the array indexing, violating 6.5p2?

As pointed out by @Karl Knechtel, I am confused: isn't the fetch performed by the array indexing unsequenced relative to the i++ increment operation? If they are unsequenced, why doesn't the following line of C Standard 6.5p2 apply (emphasis added to the words/phrases which I understand apply here):
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
I read the question "I can not understand some sentences in C99", in which the OP tries to understand why a[i++] = 1 is undefined. The accepted answer, one of the highest voted, by Pascal Cuoq, says that this is defined behavior.
I also tried compiling the program with the -std=c99, -Wall and -Wextra flags and a slew of other flags (basically all the warning flags available in GCC 11.2.0), but the compiler didn't emit any warning.
However, my question/confusion is: why is this defined behaviour?
From the C11 standard, 6.5p2:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behaviour is undefined if such an unsequenced side effect occurs in any of the orderings.
My understanding/reasoning, after reading through most of the threads on SO tagged [c] and [sequence-points], is that i++ results in a side effect of updating the value of i. In that case this side effect is unsequenced relative to the value computation using the same scalar object. I understand that a[integer object] constitutes a value computation. So shouldn't it be undefined behavior?
Even from C99 6.5p2:
Furthermore, the prior value shall be read only to determine the value to be stored.
Shouldn't this sentence also render a[i++] = 1 undefined?
In that case this side effect is unsequenced relative to the value computation using the same scalar object.
The scalar object involved in i++ is i. The side effect of updating i is not unsequenced relative to the computation of the value of i++ because C 2018 6.5.2.4 (which specifies behavior of postfix increment and decrement operators) paragraph 2 says:
… The value computation of the result is sequenced before the side effect of updating the stored value of the operand…
C 2011 has the same wording. (C 2018 contains only technical corrections and clarifications to C 2011.)
Even from C99 6.5p2:
Furthermore, the prior value shall be read only to determine the value to be stored.
A rule in the C 1999 standard has no application to the 2011 or 2018 standards; it must be interpreted separately. Between 1999 and 2011, the standard moved from solitary sequence points to finer rules about sequencing relationships.
In i++, the prior value is read to determine what the new value of i should be, so it conforms to that rule.
The rule was an attempt to say that any reads of a scalar object had to be in the prerequisite chain of any writes of the object. For example, in i = 3*i + i*i, all three reads of i are necessary to compute the value to be written to i, so they are necessarily performed before the write. But in i = ++i + i;, the read of i for that last term is not a prerequisite for writing to i for the ++i, so it is not necessarily performed before the write. Thus, it would not conform to the rule.
… I am confused that isn't fetching the operation of the array indexing unsequenced relative to the i++ increment operation?
The read of the array element is unsequenced relative to the update of i, and that is okay because there is no rule that requires it to be sequenced. C 2018 6.5 2 says, emphasis added:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
The array element is a different scalar object from i, so we do not care that there is no sequencing between the read of the array element and the update to i.
Thanks a lot for the quick responses from members. I will attempt an answer in language that I understood.
Following a suggestion to read the article "use of abstract tree to tackle sequence point problem" (I know, it's not normative),
let me represent a[i++] = 1 using an abstract syntax tree.

Why does this code print 1 2 2 and not the expected 3 3 1? [duplicate]

Notice: this is a self-Q/A, intended as a more visible target for correcting the erroneous information promoted by the book "Let us C". Also, please let's keep C++ out of the discussion; this question is about C.
I am reading the book "Let us C" by Yashwant Kanetkar.
In the book there is the following example:
#include <stdio.h>
int main(void) {
    int a = 1;
    printf("%d %d %d", a, ++a, a++);
}
The author claims that this code should output 3 3 1:
Surprisingly, it outputs 3 3 1. This is because C’s calling convention is from right to left. That is, firstly 1 is passed through the expression a++ and then a is incremented to 2. Then result of ++a is passed. That is, a is incremented to 3 and then passed. Finally, latest value of a, i.e. 3, is passed. Thus in right to left order 1, 3, 3 get passed. Once printf( ) collects them it prints them in the order in which we have asked it to get them printed (and not the order in which they were passed). Thus 3 3 1 gets printed.
However when I compile the code and run it with clang, the result is 1 2 2, not 3 3 1; why is that?
The author is wrong. Not only is the order of evaluation of function arguments unspecified in C, the evaluations are also unsequenced with regard to each other. Adding insult to injury, reading and modifying the same object without an intervening sequence point in independent expressions (here the value of a is evaluated in 3 independent expressions and modified in 2) has undefined behaviour, so the compiler has the liberty of producing any kind of code that it sees fit.
For details, see Why are these constructs using pre and post-increment undefined behavior?
C’s calling convention
This has nothing to do with calling conventions! C does not even specify a particular calling convention; "cdecl" etc. are x86 PC inventions (and have nothing to do with this). The correct, formal C language term is order of evaluation.
The order of evaluation is unspecified behavior (a formally defined term), meaning that we can't know whether it is left to right or right to left. The compiler need not document it and need not use a consistent order from case to case.
But there is a more severe problem yet here: the so-called unsequenced side-effects. C17 6.5/2 states:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.
This text is quite hard to digest for normal humans. A rough, simplified translation from language-lawyer nerd language to plain English:
In case the binary operators (1) used in the expression don't explicitly state the order in which the operands are evaluated (2), and
a side effect, such as changing the value, happens to a variable in the expression, and
that same variable is used elsewhere in the same expression,
then the program is broken and might do anything.
(1) Operators with 2 operands.
(2) Most operators don't do this; only a few exceptions like ||, && and the comma operator do.
The author is wrong and the book has multiple instances of incorrect statements like this one.
In C, the behavior of printf("%d %d %d", a, ++a, a++); is undefined because modifying the same object multiple times between 2 sequence points (and reading it for other purposes) has undefined behavior; separately, the order of evaluation of function arguments is unspecified, so the output could not be relied on even without that. And these are just two of the problems.
Note that the book is listed as "do not use" in The Definitive C Book Guide and List for giving incorrect advice with this precise example.
Note also that other languages may take a different view of this kind of statement, notably Java, where the behavior is fully defined.
Many years ago at university I read (in C Step by Step by Mitchell Waite) that some C compilers use the stack for printf, pushing the arguments from right to left (towards the format specifier) and then popping them one by one and printing them.
I wrote this code and the output is 3 3 1: Online Demo.
According to the book, the stack holds something like this:
But after some minor challenges from experts (in the comments), I found that this sequence may hold for some compilers but not for all.
@Lundin provided this code, and the output is 1 2 2: Online Demo
and @Bob__ provided another example whose output is totally different: Online Demo
The result depends entirely on the compiler implementation, because the behaviour is undefined.

Why does c = ++(a+b) give compilation error?

After researching, I read that the increment operator requires the operand to have a modifiable data object: https://en.wikipedia.org/wiki/Increment_and_decrement_operators.
From this I guess that it gives compilation error because (a+b) is a temporary integer and so is not modifiable.
Is this understanding correct? This was my first time trying to research a problem so if there was something I should have looked for please advise.
It's just a rule, that's all, and it is possibly there because (1) it makes C compilers easier to write and (2) nobody has convinced the C standards committee to relax it.
Informally speaking you can only write ++foo if foo can appear on the left hand side of an assignment expression like foo = bar. Since you can't write a + b = bar, you can't write ++(a + b) either.
There's no real reason why a + b couldn't yield a temporary on which ++ can operate, with the result of that being the value of the expression ++(a + b).
The C11 standard states in section 6.5.3.1:
The operand of the prefix increment or decrement operator shall have atomic, qualified, or unqualified real or pointer type, and shall be a modifiable lvalue
And "modifiable lvalue" is described in section 6.3.2.1, paragraph 1:
An lvalue is an expression (with an object type other than void) that potentially designates an object; if an lvalue does not designate an object when it is evaluated, the behavior is undefined. When an object is said to have a particular type, the type is specified by the lvalue used to designate the object. A modifiable lvalue is an lvalue that does not have array type, does not have an incomplete type, does not have a const-qualified type, and if it is a structure or union, does not have any member (including, recursively, any member or element of all contained aggregates or unions) with a const-qualified type.
So (a+b) is not a modifiable lvalue and is therefore not eligible for the prefix increment operator.
You are correct. The ++ operator tries to assign the new value to the original variable. So ++a takes the value of a, adds 1 to it and then assigns it back to a. Since, as you said, (a+b) is a temporary value and not a variable with an assigned memory address, the assignment can't be performed.
I think you mostly answered your own question.
I might make a small change to your phrasing and replace "temporary variable" with "rvalue" as C.Gibbons mentioned.
The terms variable, argument, temporary variable and so on will become more clear as you learn about C's memory model (this looks like a nice overview: https://www.geeksforgeeks.org/memory-layout-of-c-program/ ).
The term "rvalue" may seem opaque when you're just starting out, so I hope the following helps with developing an intuition about it.
Lvalue/rvalue refer to the different sides of an equals sign (assignment operator):
lvalue = left hand side (lowercase L, not a "one")
rvalue = right hand side
Learning a little about how C uses memory (and registers) will be helpful for seeing why the distinction is important. In broad brush strokes, the compiler creates a list of machine language instructions that compute the result of an expression (the rvalue) and then puts that result somewhere (the lvalue). Imagine a compiler dealing with the following code fragment:
x = y * 3
In assembly pseudocode it might look something like this toy example:
load register A with the value at memory address y
load register B with a value of 3
multiply register A and B, saving the result in A
write register A to memory address x
The ++ operator (and its -- counterpart) need a "somewhere" to modify, essentially anything that can work as an lvalue.
Understanding the C memory model will be helpful because you'll get a better idea in your head about how arguments get passed to functions and (eventually) how to work with dynamic memory allocation, like the malloc() function. For similar reasons you might study some simple assembly programming at some point to get a better idea of what the compiler is doing. Also if you're using gcc, the -S option "Stop after the stage of compilation proper; do not assemble." can be interesting (though I'd recommend trying it on a small code fragment).
Just as an aside:
The ++ operator has been around since 1969 (though it started in C's predecessor, B):
"(Ken Thompson's) observation (was) that the translation of ++x was smaller than that of x=x+1."
Following that Wikipedia reference will take you to an interesting writeup by Dennis Ritchie (the "R" in "K&R C") on the history of the C language, linked here for convenience: http://www.bell-labs.com/usr/dmr/www/chist.html where you can search for "++".
The reason is that the standard requires the operand to be an lvalue. The expression (a+b) is not an lvalue, so applying the increment operator isn't allowed.
Now, one might say "OK, that's indeed the reason, but there is actually no *real* reason other than that", but unfortunately the particular wording of how the operator works does in fact require that to be the case.
The expression ++E is equivalent to (E+=1).
Obviously, you cannot write E += 1 if E isn't an lvalue. Which is a shame, because one could just as well have said "increments E by one" and be done. In that case, applying the operator on a non-lvalue would (in principle) be perfectly possible, at the expense of making the compiler slightly more complex.
Now, the definition could trivially be reworded (I think it isn't even originally C but an heirloom of B), but doing so would fundamentally change the language to something that's no longer compatible with its former versions. Since the possible benefit is rather small but the possible implications are huge, that never happened and probably is never going to happen.
If you consider C++ in addition to C (the question is tagged C, but there was discussion about operator overloads), the story becomes even more complicated. In C, it's hard to imagine that this could be the case, but in C++ the result of (a+b) could very well be something that you cannot increment at all, or incrementing could have very considerable side effects (not just adding 1). The compiler must be able to cope with that, and diagnose problematic cases as they occur. On an lvalue, that's still kinda trivial to check. Not so for any kind of haphazard expression inside a parenthesis that you throw at the poor thing.
This isn't a real reason why it couldn't be done, but it does explain why the people who implemented this are not exactly ecstatic about adding such a feature, which promises very little benefit to very few people.
(a+b) evaluates to an rvalue, which cannot be incremented.
++ tries to store the new value back into the original variable, and since (a+b) is a temporary value, it cannot perform the operation. These are basically rules of the C language, meant to make programming easier. That's it.
When the expression ++(a+b) is evaluated, for example:
int a, b;
a = 10;
b = 20;
/* NOTE:
   step 1: the parenthesized expression must be evaluated before ++ can be applied
       ++ ( exp );
   in your case:
       ++ ( 10 + 20 );
   step 2: that result would then be incremented by one
       ++ ( 30 );
   here you would be applying ++ to a plain value, which is an invalid use of the ++ operator
*/
++(a+b);   /* does not compile: (a+b) is not a modifiable lvalue */

Order of evaluation: subexpressions, sequence points and postfix increments in C

A discussion arose around the C statement x = b[i] + i++; and its definedness.
The argument for said statement to be undefined goes something like this:
§ 6.5 of C99 states:
[…] the order of evaluation of subexpressions and the order in which side effects take place are both unspecified.
Thus it is not guaranteed that i is incremented after it is used in the subscript operator as index of the array.
However, I interpret said specification differently.
§ 6.5 of C99 additionally states:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
§ 5.1.2.3 of C99 states:
At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
A list of sequence points is given in Annex C, and IMHO only the following one matches the statement in question:
The end of a full expression
The evaluation of b[i] (the value of element i of b) and that of i++ (just i) can happen in any order before the addition (and evaluation of =, which is the value of the RHS) is done. However, the side effects of the whole statement are deferred until after all these evaluations because that's the only sequence point. In this case the side effects are the change of x and the increment of i.
Who is right? Are there additional paragraphs relevant for the argument? Is it any different in C++?
Side effects don't have to be deferred until the sequence point; they may be applied immediately upon evaluation. Or not.
C 2011 has some slightly different (more precise) language:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
C 2011, §6.5 ¶2
i++ has a side effect on i, b[i] uses i in a value computation, and the two subexpressions are unsequenced relative to each other (i.e., there is no intervening sequence point). Thus, the behavior of b[i] + i++ is undefined.
Your quotation from section 6.5 is the relevant one:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. [In that event, f]urthermore, the prior value shall be read [between those sequence points] only to determine the value to be stored.
(Clarifications mine.)
In your statement, the value of i is both modified and used as an index into b. Your statement contains no internal sequence points, so these effects must occur between the same pair of sequence points. The statement therefore violates the quoted requirement. Section 4, paragraph 2 then applies:
If a ''shall'' or ''shall not'' requirement that appears outside of a constraint is violated, the behavior is undefined. [...]
That's all there is to it. No other considerations are required. Your argument about actual order of operations is completely irrelevant.
Nevertheless, your claim that
the side effects of the whole statement are deferred until after all these evaluations because that's the only sequence point.
reflects a serious misunderstanding of sequence points. Sequence points do not represent times when things happen, but rather boundaries between which things happen. Not only are side effects not deferred to the next sequence point, they are far less constrained (by the standard) than operations involved in computing the values of expressions.
§6.5.2.4 states
The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.
This is just what Eugene's comment suggested. In case this is not clear enough, the statement cited in the question from § 6.5 (2)
the prior value shall be read only to determine the value to be stored.
is violated directly as well. The value of i is not just read to determine the value after incrementing but also as operand of the subscript operator.
This question and its accepted answer might give additional insights, as they discuss the sequence point introduced by the , operator and its interaction with the potentially UB-provoking behavior of assignments.

Strange C precedence evaluation

Can somebody explain what is happening with the precedence in this code? I've been trying to figure out what is happening by myself, but I couldn't handle it alone.
#include <stdio.h>
int main(void) {
    int v[]={20,35,76,80};
    int *a;
    a=&v[1];
    --(*++a);
    printf("%d,%d,%d,%d\n",v[0],v[1],v[2],v[3]);
    (*++a);
    printf("%d\n", *a);
    *a--=*a+1; // WHAT IS HAPPENING HERE?
    printf("%d\n", *a);
    printf("%d,%d,%d,%d\n",v[0],v[1],v[2],v[3]);
}
//OUTPUT
20,35,75,80
80
75
20,35,75,76
*a--=*a+1; // WHAT IS HAPPENING HERE?
What's happening is that the behavior is undefined.
6.5 Expressions
...
2 If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.84)
3 The grouping of operators and operands is indicated by the syntax.85) Except as specified later, side effects and value computations of subexpressions are unsequenced.86)
C 2011 Online Draft (N1570)
The expressions *a-- and *a are unsequenced relative to each other. Except in a few cases, C does not guarantee that expressions are evaluated left to right; therefore, it's not guaranteed that *a-- is evaluated (and the side effect applied) before *a.
*a-- has a side effect: it updates a to point to the previous element of the array. *a + 1 is a value computation: it adds 1 to the value of what a currently points to.
Depending on the order in which *a-- and *a are evaluated and when the side effect of the -- operator is actually applied, you could be assigning the result of v[3] + 1 to v[3], or v[2] + 1 to v[3], or v[3] + 1 to v[2], or v[2] + 1 to v[2], or something else entirely.
Since the behavior is undefined, the compiler is not required to do anything in particular - it may issue a diagnostic and halt translation, it may issue a diagnostic and finish translation, or it may finish translation without a diagnostic. At runtime, the code may crash, you may get an unexpected result, or the code may work as intended.
I'm not going to explain the whole program; I'm going to focus on the "WHAT IS HAPPENING HERE" line. I think we can agree that before this line, the v[] array looks like this, with a pointing at v's last element:
    +----+----+----+----+
 v: | 20 | 35 | 75 | 80 |
    +----+----+----+----+
      0    1    2    3
                     ^
                   +-|-+
                a: | * |
                   +---+
Now, we have
*a-- = *a+1;
It looks like this is going to assign something to where a points, and decrement a. So it looks like it will assign something to v[3], but leave a pointing at v[2].
And the value that gets assigned will evidently be the value that a points to, plus 1.
But the key question is, when we take *a+1 on the right-hand side, will it use the old or the new value of a, before or after the decrement on the left-hand side? It turns out this is a really, really hard question to answer.
If we take the value after the decrement, it'll be v[2], plus 1, or 76 that gets assigned to v[3]. It looks like that's how your compiler interpreted it. And this makes a certain amount of sense, because when we read from left to right, it's easy to imagine that by the time we get around to computing *a+1, the a-- has already happened.
Or, if we took the value before the decrement, it would be v[3], plus 1, or 81 that gets assigned to v[3]. And that's how it was interpreted by three different compilers I tried it on. And this makes a certain amount of sense, too, because of course assignments actually proceed from right to left, so it's easy to imagine that *a+1 happens before the a-- on the left-hand side.
So which compiler is correct, yours or mine, and which is wrong? This is where the answer gets a little strange, and/or surprising. The answer is that neither compiler is wrong. This is because it turns out that it's not just really hard to decide what should happen here, it is (by definition) impossible to figure out what happens here. The C standard does not define how this expression should behave. In fact, it goes one step further than not defining how this expression should behave: the C Standard explicitly says that this expression is undefined. So your compiler is right to put 76 in v[3], and my compilers are right to put 81. And since "undefined behavior" means that anything can happen, it wouldn't be wrong for a compiler to arrange to put some other number into v[3], or to end up assigning to something other than v[3].
So the other part of the answer is that you must not write code like this. You must not depend on undefined behavior. It will do different things under different compilers. It may do something completely unpredictable. It is impossible to understand, maintain, or explain.
It's pretty easy to detect when an expression is undefined due to order-of-evaluation ambiguity. There are two cases: (1) the same variable gets modified twice, as in x++ + x++. (2) The same variable gets modified in one place, and used in another, as in *a-- = *a+1.
It's worth noting that one of the three compilers I used said "eo.c:15: warning: unsequenced modification and access to 'a'", and another said "eo.c:15:5: warning: operation on ‘a’ may be undefined". If your compiler has an option to enable warnings like these, use it! (Under gcc it's -Wsequence-point or -Wall. Under clang, it's -Wunsequenced or -Wall.)
See John Bode's answer for the detailed language from the C Standard that makes this expression undefined. See also the canonical StackOverflow question on this topic, Why are these constructs (using ++) undefined behavior?
Not exactly sure which expression you are having problems with. Postfix increment and decrement have the highest precedence; prefix increment/decrement and dereference come next (they share a level and group right to left); addition and subtraction come after that.
But with regard to assignment, C does not specify the order of evaluation (right to left or left to right).
Will the right-hand side of an expression always be evaluated first?
C does not specify which of the right hand side or left hand side of the = operator is evaluated first.
*a--=*a+1;
So your pointer a could be decremented either before or after it is dereferenced on the right-hand side.
In other words, depending on the compiler, this expression could be equivalent to either:
*a = *(a-1)+1;
a--;
or
*a = *a+1;
a--;
I personally never rely too much on operator precedence in my code; it is more legible to either add parentheses or split the expression into separate lines.
(Unless you're building a compiler yourself and need to decide what assembly code to generate.)
