Parentheses operator clarification - c

int a=10,b=20;
b = a+b-(a=b);
In this expression why (a=b) is not first operation? if it is performs according to priority then it b has to get 20 itself. But b is getting 10 itself, why? Could any one clarify my doubt?

This invokes undefined behavior. Anything could be happen. Note that it is sure here that (a=b) evaluates before the subtraction but it does not guarantee that the value of b get assigned to a just after the evaluation. a may get modified after the next sequence point (the ; of the statement here).
The Standard states that
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.
Suggested reading: c-faq Question 3.8

Related

In place vector "abs" warning: operation may be undefined? [duplicate]

A sequence point in imperative programming defines any point in a computer program's execution at which it is guaranteed that all side effects of previous evaluations will have been performed, and no side effects from subsequent evaluations have yet been performed.
What does this mean? Can somebody please explain it in simple words?
When a sequence point occurs, it basically means that you are guaranteed that all previous operations are complete.
Changing a variable twice without an intervening sequence point is one example of undefined behaviour.
For example, i = i++; is undefined because there's no sequence point between the two changes to i.
Note that it's not just changing a variable twice that can cause a problem. It's actually a change involved with any other use. The standard uses the term "value computation and side effect" when discussing how things are sequenced. For example, in the expression a = i + i++, the i (value computation) and i++ (side effect) may be done in arbitrary order.
Wikipedia has a list of the sequence points in the C and C++ standards although the definitive list should always be taken from the ISO standard. From C11 appendix C (paraphrased):
The following are the sequence points described in the standard:
Between the evaluations of the function designator and actual arguments in a function call and the actual call;
Between the evaluations of the first and second operands of the operators &&, ||, and ,;
Between the evaluations of the first operand of the conditional ?: operator and whichever of the second and third operands is evaluated;
The end of a full declarator;
Between the evaluation of a full expression and the next full expression to be evaluated. The following are full expressions:
an initializer;
the expression in an expression statement;
the controlling expression of a selection statement (if or switch);
the controlling expression of a while or do statement;
each of the expressions of a for statement;
the expression in a return statement.
Immediately before a library function returns;
After the actions associated with each formatted input/output function conversion specifier;
Immediately before and immediately after each call to a comparison function, and also between any call to a comparison function and any movement of the objects passed as arguments to that call.
An important thing to note about sequence points is that they are not global, but rather should be regarded as a set of local constraints. For example, in the statement
a = f1(x++) + f2(y++);
There is a sequence point between the evaluation of x++ and the call to f1, and another sequence point between the evaluation of y++ and the call to f2. There is, however, no guarantee as to whether x will be incremented before or after f2 is called, nor whether y will be incremented before or after x is called. If f1 changes y or f2 changes x, the results will be undefined (it would be legitimate for the compiler's generated code to e.g. read x and y, increment x, call f1, check y against the previously-read value, and--if it changed--go on a rampage seeking out and destroying all Barney videos and merchandise; I don't think any real compilers generate code that would actually do that, alas, but it would be permitted under the standard).
Expanding on paxdiablo's answer with an example.
Assume the statement
x = i++ * ++j;
There are three side effects: assigning the result of i * (j+1) to x, adding 1 to i, and adding 1 to j. The order in which the side effects are applied is unspecified; i and j may each be incremented immediately after being evaluated, or they may not be incremented until after both have been evaluated but before x has been assigned, or they may not be incremented until after x has been assigned.
The sequence point is the point where all side effects have been applied (x, i, and j have all been updated), regardless of the order in which they were applied.
It means a compiler may do funky optimizations, tricks and magic but must reach a well-defined state at these so-called sequence points.

Does a[a[0]] = 1 produce undefined behavior?

Does this C99 code produce undefined behavior?
#include <stdio.h>
int main() {
int a[3] = {0, 0, 0};
a[a[0]] = 1;
printf("a[0] = %d\n", a[0]);
return 0;
}
In the statement a[a[0]] = 1; , a[0] is both read and modified.
I looked n1124 draft of ISO/IEC 9899. It says (in 6.5 Expressions):
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
It does not mention reading an object to determine the object itself to be modified. Thus this statement might produce undefined behavior.
However, I feel it strange. Does this actually produce undefined behavior?
(I also want to know about this problem in other ISO C versions.)
the prior value shall be read only to determine the value to be stored.
This is a bit vague and caused confusion, which is partly why C11 threw it out and introduced a new sequencing model.
What it is trying to say is that: if reading the old value is guaranteed to occur earlier in time than writing the new value, then that's fine. Otherwise it is UB. And of course it is a requirement that the new value be computed before it is written.
(Of course the description I have just written will be found by some to be more vague than the Standard text!)
For example x = x + 5 is correct because it is not possible to work out x + 5 without first knowing x. However a[i] = i++ is wrong because the read of i on the left hand side is not required in order to work out the new value to store in i. (The two reads of i are considered separately).
Back to your code now. I think it is well-defined behaviour because the read of a[0] in order to determine the array index is guaranteed to occur before the write.
We cannot write until we have determined where to write. And we do not know where to write until after we read a[0]. Therefore the read must come before the write, so there is no UB.
Someone commented about sequence points. In C99 there is no sequence point in this expression, so sequence points do not come into this discussion.
Does this C99 code produce undefined behavior?
No. It will not produce undefined behavior. a[0] is modified only once between two sequence points (first sequence point is at the end of initializer int a[3] = {0, 0, 0}; and second is after the full expression a[a[0]] = 1).
It does not mention reading an object to determine the object itself to be modified. Thus this statement might produce undefined behavior.
An object can be read more than once to modify itself and its a perfectly defined behavior. Look at this example
int x = 10;
x = x*x + 2*x + x%5;
Second statement of the quote says:
Furthermore, the prior value shall be read only to determine the value to be stored.
All the x in the above expression is read to determine the value of object x itself.
NOTE: Note that there are two parts of the quote mentioned in the question. First part says: Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression., and
therefore the expression like
i = i++;
comes under UB (Two modifications between previous and next sequence points).
Second part says: Furthermore, the prior value shall be read only to determine the value to be stored., and therefore the expressions like
a[i++] = i;
j = (i = 2) + i;
invoke UB. In both expressions i is modified only once between previous and next sequence points, but the reading of the rightmost i do not determine the value to be stored in i.
In C11 standard this has been changed to
6.5 Expressions:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. [...]
In expression a[a[0]] = 1, there is only one side effect to a[0] and the value computation of index a[0] is sequenced before the value computation of a[a[0]].
C99 presents an enumeration of all the sequence points in annex C. There is one at the end of
a[a[0]] = 1;
because it is a complete expression statement, but there are no sequence points inside. Although logic dictates that the subexpression a[0] must be evaluated first, and the result used to determine to which array element the value is assigned, the sequencing rules do not ensure it. When the initial value of a[0] is 0, a[0] is both read and written between two sequence points, and the read is not for the purpose of determining what value to write. Per C99 6.5/2, the behavior of evaluating the expression is therefore undefined, but in practice I don't think you need to worry about it.
C11 is better in this regard. Section 6.5, paragraph (1) says
An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof. The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.
Note in particular the second sentence, which has no analogue in C99. You might think that would be sufficient, but it isn't. It applies to the value computations, but it says nothing about the sequencing of side effects relative to the value computations. Updating the value of the left operand is a side effect, so that extra sentence does not directly apply.
C11 nevertheless comes through for us on this one, as the specifications for the assignment operators provide the needed sequencing (C11 6.5.16(3)):
[...] The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
(In contrast, C99 just says that updating the stored value of the left operand happens between the previous and next sequence points.) With sections 6.5 and 6.5.16 together, then, C11 gives a well-defined sequence: the inner [] is evaluated before the outer [], which is evaluated before the stored value is updated. This satisfies C11's version of 6.5(2), so in C11, the behavior of evaluating the expression is defined.
The value is well defined, unless a[0] contains a value that is not a valid array index (i.e. in your code is not negative and does not exceed 3). You could change the code to the more readable and equivalent
index = a[0];
a[index] = 1; /* still UB if index < 0 || index >= 3 */
In the expression a[a[0]] = 1 it is necessary to evaluate a[0] first. If a[0] happens to be zero, then a[0] will be modified. But there is no way for a compiler (short of not complying with the standard) to change order of evaluations and modify a[0] before attempting to read its value.
A side effect includes modification of an object1.
The C standard says that behavior is undefined if a side effect on object is unsequenced with a side effect on the same object or a value computation using the value of the same object2.
The object a[0] in this expression is modified (side effect) and it's value (value computation) is used to determine the index. It would seem this expression yields undefined behavior:
a[a[0]] = 1
However the text in assignment operators in the standard, explains that the value computation of both left and right operands of the operator =, is sequenced before the left operand is modified3.
The behavior is thus defined, as the first rule1 isn't violated, because the modification (side effect) is sequenced after the value computation of the same object.
1 (Quoted from ISO/IEC 9899:201x 5.1.2.3 Program Exectution 2):
Accessing a volatile object, modifying an object, modifying a file, or calling a function
that does any of those operations are all side effects, which are changes in the state of
the execution environment.
2 (Quoted from ISO/IEC 9899:201x 6.5 Expressions 2):
If a side effect on a scalar object is unsequenced relative to either a different side effect
on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined.
3 (Quoted from ISO/IEC 9899:201x 6.5.16 Assignment operators 3):
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands. The evaluations of
the operands are unsequenced.

Pre-increment operator returns lvalue or rvalue? [duplicate]

This question already has answers here:
Why are these constructs using pre and post-increment undefined behavior?
(14 answers)
Closed 9 years ago.
Going through other questions here, I've found that pre-increment operator in C returns rvalue, not lvalue. But, on trying the code below
int a=35;
printf("%d %d %d %d %d",a++,a,++a,a++,++a);
I expected the output
38 39 38 36 36
But I get the output
38 39 39 36 39
If pre-increment returns rvalue, then why is the final value of 'a' printed (implying lvalue is returned) instead of the value returned immediately after incrementing is performed?
You are invoking undefined behavior: Commas separating function call parameters are not the "comma operator". And unlike it, they are not sequence points.
Also, your example has nothing to do with whether the operator return rvalue or lvalue, since your code doesn't attempt to modify the returned value.
When you write a function call such as printf("%d %d %d %d %d",a++,a,++a,a++,++a);, it turns into a set of operations to be performed:
The first argument evaluates to a pointer to the string.
The second argument results in two operations: Produce the value of a. Set a to the value of a incremented by one.
The third argument evaluates to the value of a.
The fourth argument results in two operations: Produce one more than the value of a. Set a to the value of a increment by one.
The fifth argument results in two operations: Produce the value of a. Set a to the value of a incremented by one.
The sixth argument results in two operations: Produce one more than the value of a. Set a to the value of a increment by one.
The function call passes the arguments to the function and calls it.
The C implementation has to perform all these operations. However, the C standard does not fully specify the order in which they are performed. Per C 2011 (N1570) 6.5.2.3 10, there is a sequence point after the evaluations of the arguments and before the call. This means that the operations resulting from the arguments themselves must be performed before the call is performed.
However, the C standard does not say in what order the arguments are evaluated. It does not even say that the two operations resulting from a single expression (as in a++) have to be performed together. The C implementation is allowed to prepare the value of a, then evaluate some other arguments, then later set a to the value of a incremented by one.
This means that your function call might be using the value of a and setting the value of a, multiple times, in any number of different orders. That is a problem because it violates a rule in the C standard. 6.5 2 says:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.…
In this passage, “unsequenced” means that two operations do not have a definite order set by the C standard: If a C implementation is allowed to perform two operations in either order, then they are unsequenced. Your printf call contains several unsequenced operations that have side effects on a (setting a is a side effect of a++) and that compute a value (simply using a to pass an argument “computes” its value).
Therefore, your printf call violates the rules of the C standard, and the resulting behavior is undefined.
Every normal expression you write should modify any object only once. Never use a++ twice in the same expression. Additionally, if you modify an object, you cannot separately use it. If you have a++ one place in an expression, you cannot have a separately in another place.
Some special operators have exceptions. The && and || operators are required to evaluate their left operands first. This causes subexpressions on the left to be sequenced before subexpressions on the right, and that avoids violating the rule in C 2011 6.5 2. Therefore a++ && a++ is legal. The comma operator also evaluates subexpressions on the left first, so a++, ++a, a is legal. However, arguments to a function call are a list of arguments, not a use of the comma operator.

Defined behaviour for expressions

The C99 Standard says in $6.5.2.
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be read only to determine the value to be stored.
(emphasis by me)
It goes on to note, that the following example is valid (which seems obvious at first)
a[i] = i;
While it does not explicitly state what a and i are.
Although I believe it does not, I'd like to know whether this example covers the following case:
int i = 0, *a = &i;
a[i] = i;
This will not change the value of i, but access the value of i to determine the address where to put the value. Or is it irrelevant that we assign a value to i which is already stored in i? Please shed some light.
Bonus question; What about a[i]++ or a[i] = 1?
The first sentence:
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
is clear enough. The language doesn't impose an order of evaluation on subexpressions unless there's a sequence point between them, and rather than requiring some unspecified order of evaluation, it says that modifying an object twice produces undefined behavior. This allows aggressive optimization while still making it possible to write code that follows the rules.
The next sentence:
Furthermore, the prior value shall be read only to determine the value to be stored
does seem unintuitive at first (and second) glance; why should the purpose for which a value is read affect whether an expression has defined behavior?
But what it reflects is that if a subexpression B depends on the result of a subexpression A, then A must be evaluated before B can be evaluated. The C90 and C99 standards do not state this explicitly.
A clearer violation of that sentence, given in an example in the footnote, is:
a[i++] = i; /* undefined behavior */
Assuming that a is a declared array object and i is a declared integer object (no pointer or macro trickery), no object is modified more than once, so it doesn't violate the first sentence. But the evaluation of i++ on the LHS determines which object is to be modified, and the evaluation of i on the RHS determines the value to be stored in that object -- and the relative order of the read operation on the RHS and the write operation on the LHS is not defined. Again, the language could have required the subexpressions to be evaluated in some unspecified order, but instead it left the entire behavior undefined, to permit more aggressive optimization.
In your example:
int i = 0, *a = &i;
a[i] = i; /* undefined behavior (I think) */
the previous value of i is read both to determine the value to be stored and to determine which object it's going to be stored in. Since a[i] refers to i (but only because i==0), modifying the value of i would change the object to which the lvalue a[i] refers. It happens in this case that the value stored in i is the same as the value that was already stored there (0), but the standard doesn't make an exception for stores that happen to store the same value. I believe the behavior is undefined. (Of course the example in the standard wasn't intended to cover this case; it implicitly assumes that a is a declared array object unrelated to i.)
As for the example that the standard says is allowed:
int a[10], i = 0; /* implicit, not stated in standard */
a[i] = i;
one could interpret the standard to say that it's undefined. But I think that the second sentence, referring to "the prior value", applies only to the value of an object that's modified by the expression. i is never modified by the expression, so there's no conflict. The value of i is used both to determine the object to be modified by the assignment, and the value to be stored there, but that's ok, since the value of i itself never changes. The value of i isn't "the prior value", it's just the value.
The C11 standard has a new model for this kind of expression evaluation -- or rather, it expresses the same model in different words. Rather than "sequence points", it talks about side effects being sequenced before or after each other, or unsequenced relative to each other. It makes explicit the idea that if a subexpression B depends on the result of a subexpression A, then A must be evaluated before B can be evaluated.
In the N1570 draft, section 6.5 says:
1 An expression is a sequence of operators and operands that
specifies computation of a value, or that designates an object
or a function, or that generates side effects, or that performs
a combination thereof. The value computations of the operands
of an operator are sequenced before the value computation of the
result of the operator.
2 If a side effect on a scalar object is unsequenced relative to
either a different side effect on the same scalar object or a
value computation using the value of the same scalar object, the
behavior is undefined. If there are multiple allowable orderings
of the subexpressions of an expression, the behavior is undefined
if such an unsequenced side effect occurs in any of the orderings.
3 The grouping of operators and operands is indicated by the syntax.
Except as specified later, side effects and value computations
of subexpressions are unsequenced.
Reading an object's value to determine where to store it does not count as "determining the value to be stored". This means that the only point of contention can be whether or not we are "modifying" the object i: if we are, it is undefined; if we are not, it is OK.
Does storing the value 0 into an object which already contains the value 0 count as "modifying the stored value"? By the plain English definition of "modify" I would have to say not; leaving something unchanged is the opposite of modifying it.
It is, however, clear that this would be undefined behaviour:
int i = 0, *a = &i;
a[i] = 1;
Here there can be no doubt that the stored value is read for a purpose other than determining the value to be stored (the value to be stored is a constant), and that the value of i is modified.

Question about C programming

int a, b;
a = 1;
a = a + a++;
a = 1;
b = a + a++;
printf("%d %d, a, b);
output : 3,2
What's the difference between line 3 and 5?
What you are doing is undefined.
You can't change the value of a variable you are about to assign to.
You also can't change the value of a variable with a side effect and also try to use that same variable elsewhere in the same expression (unless there is a sequence point, but in this case there isn't). The order of evaluation for the two arguments for + is undefined.
So if there is a difference between the two lines, it is that the first is undefined for two reasons, and line 5 is only undefined for one reason. But the point is both line 3 and line 5 are undefined and doing either is wrong.
What you're doing on line 3 is undefined. C++ has the concept of "sequence points" (usually delimited by semicolons). If you modify an object more than once per sequence point, it's illegal, as you've done in line 3. As section 6.5 of C99 says:
(2) Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be read only to determine the value to be stored.
Line 5 is also undefined because of the second sentence. You read a to get its value, which you then use in another assignment in a++.
a++ is a post-fix operator, it gets the value of a then increments it.
So, for lines 2,3:
a = 1
a = 1 + 1, a is incremented.
a becomes 3 (Note, the order these operations are performed may vary between compilers, and a can easily also become 2)
for lines 4,5:
a = 1
b = 1 + 1, a is incremented.
b becomes 2, a becomes 2. (Due to undefined behaviour, b could also become 3 of a++ is processed before a)
Note that, other than for understanding how postfix operators work, I really wouldn't recommend using this trick. It's undefined behavior and will get different results when compiled using different compilers
As such, it is not only a needlessly confusing way to do things, but an unreliable, and worst-practice way of doing it.
EDIT: And has others have pointed out, this is actually undefined behavior.
Line 3 is undefined, line 5 is not.
EDIT:
As Prasoon correctly points out, both are UB.
The simple expression a + a++ is undefined because of the following:
The operator + is not a sequence point, so the side effects of each operands may happen in either order.
a is initially 1.
One of two possible [sensible] scenarios may occur:
The first operand, a is evaluated first,
a) Its value, 1 will be stored in a register, R. No side effects occur.
b) The second operand a++ is evaluated. It evaluates to 1 also, and is added to the same register R. As a side effect, the stored value of a is set to 2.
c) The result of the addition, currently in R is written back to a. The final value of a is 2.
The second operand a++ is evaluated first.
a) It is evaluated to 1 and stored in register R. The stored value of a is incremented to 2.
b) The first operand a is read. It now contains the value 2, not 1! It is added to R.
c) R contains 3, and this result is written back to a. The result of the addition is now 3, not 2, like in our first case!
In short, you mustn't rely on such code to work at all.

Resources