This question already has answers here:
sequence points in c
(4 answers)
Why are these constructs using pre and post-increment undefined behavior?
(14 answers)
Closed 6 years ago.
Can anyone help me solve this question?
void main()
{
int num, a=5;
num = -a-- + + ++a;
printf("%d %d\n", num, a);
}
Answer: 0 5
But how? Can anyone explain the logic behind?
Once upon a time I watched two aggressive Boston drivers trying to park in the same parking space, such that they actually crashed their cars into each other. Something very much like that happens here.
There are other things wrong with this code, but let's just focus on what gets stored into a. If we imagine that the expression a-- + ++a gets evaluated from left to right, the first thing that happens is the a-- part. a started out as 5, so a-- has the value of 5, with a note to store 4 into a. Imagine that this note is given to the driver of the first car for delivery.
Next we get to ++a. Assume for the moment that the message to store 4 into a hasn't gotten through yet. So ++a has the value 6, with a note to store 6 into a. Imagine that this note is given to the driver of the second car for delivery.
One big question is, what is the final value of num, but I'm not worried about that for now. Let's just ask, what is the final value of a? We've got one car driver trying to set it to 4, and one trying to set it to 6. But there's only one parking space labeled a. Maybe the first driver gets there first and sets it to 4, or maybe the second driver gets there and sets it to 6. Or maybe they get there at the exact same time, and crash into each other, and a gets set to a twisted piece of metal from the second car's front bumper. (Perhaps that twisted piece of metal looks a little like the number 5, so printing 5 was the best that printf could do.)
As I said, there are other things wrong with this code. For one thing, we shouldn't have imagined that it gets evaluated from left to right; there's no guarantee about that at all. We also don't know if the ++a part should operate on the original value of a, or the value affected by a--. In fact, we have no idea what this expression should do, and the perhaps surprising fact is, the compiler doesn't know, either! Different compilers will interpret an expression like this differently, giving wildly different answers. None of the answers are right, and none are wrong, because this expression is undefined.
The expression is undefined because there are two different attempts inside it to assign a new value to a. (You'll find more formal definitions of undefined behavior in the linked answers.) So a simple rule for avoiding undefined behavior is, "don't try to set a variable's value twice in one expression".
You might be thinking that this expression ought to have a well-defined interpretation. You might be thinking that it should obviously be evaluated from left to right. You might be thinking that a-- should store the new value into a immediately, such that it would be guaranteed to be the value seen by "later" parts of the expression. You might think those things, and they might be true about some other programming language, but not C. C does not guarantee that expressions are evaluated strictly left to right, and it does not guarantee that a-- or ++a update a's value immediately. Trying to update a's value twice in one expression leads to undefined behavior, and undefined behavior means that anything can happen.
Notice: this is a self-Q/A, meant as a more visible target for the erroneous information promoted by the book "Let us C". Also, please let's keep C++ out of the discussion; this question is about C.
I am reading the book "Let us C" by Yashwant Kanetkar.
In the book there is the following example:
#include <stdio.h>
int main(void) {
int a = 1;
printf("%d %d %d", a, ++a, a++);
}
The author claims that this code should output 3 3 1:
Surprisingly, it outputs 3 3 1. This is
because C’s calling convention is from right to left. That is, firstly
1 is passed through the expression a++ and then a is incremented
to 2. Then result of ++a is passed. That is, a is incremented to 3
and then passed. Finally, latest value of a, i.e. 3, is passed. Thus in
right to left order 1, 3, 3 get passed. Once printf( ) collects them it
prints them in the order in which we have asked it to get them
printed (and not the order in which they were passed). Thus 3 3 1
gets printed.
However when I compile the code and run it with clang, the result is 1 2 2, not 3 3 1; why is that?
The author is wrong. Not only is the order of evaluation of function arguments unspecified in C, the evaluations are unsequenced with regard to each other. Adding insult to injury, reading and modifying the same object without an intervening sequence point in independent expressions (here the value of a is evaluated in 3 independent expressions and modified in 2) has undefined behaviour, so the compiler has the liberty of producing any kind of code that it sees fit.
For details, see Why are these constructs using pre and post-increment undefined behavior?
C’s calling convention
This has nothing to do with calling convention! And C does not even specify a certain calling convention - "cdecl" etc are x86 PC inventions (and have nothing to do with this). The correct and formal C language term is order of evaluation.
The order of evaluation is unspecified behavior (a formally defined term), meaning that we can't know if it is left to right or right to left. The compiler need not document it and need not use a consistent order from one case to the next.
But there is a more severe problem yet here: the so-called unsequenced side-effects. C17 6.5/2 states:
If a side effect on a scalar object is unsequenced relative to either a different side effect
on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an unsequenced side
effect occurs in any of the orderings.
This text is quite hard to digest for normal humans. A rough, simplified translation from language-lawyer nerd language to plain English:
In case the binary operators (1) used in the expression don't explicitly state the order in which the operands are evaluated (2), and
a side effect, such as changing the value, happens to a variable in the expression, and
that same variable is used elsewhere in the same expression,
then the program is broken and might do anything.
(1) Operators with 2 operands.
(2) Most operators don't do this; only a few exceptions, such as the ||, && and , operators, do so.
The author is wrong and the book has multiple instances of incorrect statements like this one.
In C, the behavior of printf("%d %d %d", a, ++a, a++); is undefined both because the order of evaluation of function arguments is unspecified and because modifying the same object multiple times between 2 sequence points has undefined behavior, just to name these two.
Note that the book is referenced as do not use in The Definitive C Book Guide and List for providing incorrect advice with this precise example.
Note also that other languages may have a different take on this kind of statement, notably Java, where the behavior is fully defined.
Many years ago at university I read (in C Step by Step by Mitchell Waite) that some C compilers use the stack for printf, pushing the arguments from right to left (toward the format specifier) and then popping them one by one to print them.
I wrote this code and the output is 3 3 1: Online Demo.
Based on the book, the arguments would be pushed onto the stack in the order 1, 3, 3 (right to left).
But after some discussion with experts (in the comments), I found that this sequence may hold for some compilers but not for all.
@Lundin provided this code and the output is 1 2 2: Online Demo
and @Bob__ provided another example whose output is totally different: Online Demo
It totally depends on compiler implementation and has undefined behaviour.
Can somebody explain what is happening with the precedence in this code? I've been trying to figure out what is happening by myself but I couldn't handle it alone.
#include <stdio.h>
int main(void) {
int v[]={20,35,76,80};
int *a;
a=&v[1];
--(*++a);
printf("%d,%d,%d,%d\n",v[0],v[1],v[2],v[3]);
(*++a);
printf("%d\n", *a);
*a--=*a+1; // WHAT IS HAPPENING HERE?
printf("%d\n", *a);
printf("%d,%d,%d,%d\n",v[0],v[1],v[2],v[3]);
}
//OUTPUT
20,35,75,80
80
75
20,35,75,76
*a--=*a+1; // WHAT IS HAPPENING HERE?
What's happening is that the behavior is undefined.
6.5 Expressions
...
2 If a side effect on a scalar object is unsequenced relative to either a different side effect
on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an unsequenced side
effect occurs in any of the orderings.
3 The grouping of operators and operands is indicated by the syntax. Except as specified
later, side effects and value computations of subexpressions are unsequenced.
C 2011 Online Draft (N1570)
The expressions *a-- and *a are unsequenced relative to each other. Except in a few cases, C does not guarantee that expressions are evaluated left to right; therefore, it's not guaranteed that *a-- is evaluated (and the side effect applied) before *a.
*a-- has a side effect - it updates a to point to the previous element in the sequence. *a + 1 is a value computation - it adds 1 to the value of what a currently points to.
Depending on the order that *a-- and *a are evaluated and when the side effect of the -- operator is actually applied, you could be assigning the result of v[1] + 1 to v[0], or v[1] + 1 to v[1], or v[0] + 1 to v[0], or v[0] + 1 to v[1], or something else entirely.
Since the behavior is undefined, the compiler is not required to do anything in particular - it may issue a diagnostic and halt translation, it may issue a diagnostic and finish translation, or it may finish translation without a diagnostic. At runtime, the code may crash, you may get an unexpected result, or the code may work as intended.
I'm not going to explain the whole program; I'm going to focus on the "WHAT IS HAPPENING HERE" line. I think we can agree that before this line, the v[] array looks like this, with a pointing at v's last element:
   +----+----+----+----+
v: | 20 | 35 | 75 | 80 |
   +----+----+----+----+
     0    1    2    3
                    ^
                  +-|-+
               a: | * |
                  +---+
Now, we have
*a-- = *a+1;
It looks like this is going to assign something to where a points, and decrement a. So it looks like it will assign something to v[3], but leave a pointing at v[2].
And the value that gets assigned will evidently be the value that a points to, plus 1.
But the key question is, when we take *a+1 on the right-hand side, will it use the old or the new value of a, before or after the decrement on the left-hand side? It turns out this is a really, really hard question to answer.
If we take the value after the decrement, it'll be v[2], plus 1, or 76 that gets assigned to v[3]. It looks like that's how your compiler interpreted it. And this makes a certain amount of sense, because when we read from left to right, it's easy to imagine that by the time we get around to computing *a+1, the a-- has already happened.
Or, if we took the value before the decrement, it would be v[3], plus 1, or 81 that gets assigned to v[3]. And that's how it was interpreted by three different compilers I tried it on. And this makes a certain amount of sense, too, because of course assignments actually proceed from right to left, so it's easy to imagine that *a+1 happens before the a-- on the left-hand side.
So which compiler is correct, yours or mine, and which is wrong? This is where the answer gets a little strange, and/or surprising. The answer is that neither compiler is wrong. This is because it turns out that it's not just really hard to decide what should happen here, it is (by definition) impossible to figure out what happens here. The C standard does not define how this expression should behave. In fact, it goes one step further than not defining how this expression should behave: the C Standard explicitly says that this expression is undefined. So your compiler is right to put 76 in v[3], and my compilers are right to put 81. And since "undefined behavior" means that anything can happen, it wouldn't be wrong for a compiler to arrange to put some other number into v[3], or to end up assigning to something other than v[3].
So the other part of the answer is that you must not write code like this. You must not depend on undefined behavior. It will do different things under different compilers. It may do something completely unpredictable. It is impossible to understand, maintain, or explain.
It's pretty easy to detect when an expression is undefined due to order-of-evaluation ambiguity. There are two cases: (1) the same variable gets modified twice, as in x++ + x++. (2) The same variable gets modified in one place, and used in another, as in *a-- = *a+1.
It's worth noting that one of the three compilers I used said "eo.c:15: warning: unsequenced modification and access to 'a'", and another said "eo.c:15:5: warning: operation on ‘a’ may be undefined". If your compiler has an option to enable warnings like these, use it! (Under gcc it's -Wsequence-point or -Wall. Under clang, it's -Wunsequenced or -Wall.)
See John Bode's answer for the detailed language from the C Standard that makes this expression undefined. See also the canonical StackOverflow question on this topic, Why are these constructs (using ++) undefined behavior?
Not exactly sure which expression you have problems with. Postfix increment and decrement operators have the highest precedence. Prefix increment, prefix decrement and dereference come after. Addition and subtraction come after that.
But with regards to assignment, C does not specify order of evaluation (right to left or left to right).
will right hand side of an expression always evaluated first
C does not specify which of the right hand side or left hand side of the = operator is evaluated first.
*a--=*a+1;
So it could be that your pointer a is decremented first or after it's dereferenced on the right hand side.
In other words, depending on the compiler this expression could be equivalent to either:
a--;
*a = *a+1;
or
*(a-1)=*a+1;
a--;
I personally never rely too much on operator precedence in my code. It makes the code more legible to either add parentheses or split the expression across separate lines.
Unless you're building a compiler yourself and need to make a decision to what assembly code to generate.
int p=4, s=5;
int m;
m = (p=s) * (p==s);
printf("%d",m);
How is this expression evaluated?
What will be the value of m?
How do parentheses change the precedence of operators in this example?
I am getting m=4 in Turbo C.
Is this a code fragment that actually came up in your work, or is it an assignment someone gave you?
The expression contains two parts:
p=s /* in this part p's value is assigned */
p==s /* in this part p's value is used */
So before we can figure out what the value of the expression is, we have to figure out: does p's value get set before or after it gets used?
And the answer is -- I'm going to shout a little here -- WE DO NOT KNOW.
Let me say that again. We simply do not know whether p's value gets set before or after it gets used. So we have no way of predicting what value this expression will evaluate to.
Now, you might think that precedence will tell you whether p gets set before or after it gets used. But it turns out that, no, in this case precedence does not tell you whether p gets set before or after it gets used.
You might think that associativity will tell you whether p gets set before or after it gets used. But it turns out that, no, in this case associativity does not tell you whether p gets set before or after it gets used, either.
Finally, you might think that I'm wrong, that precedence and/or associativity have to be able to tell you what you need to know, that there has to be a way of figuring out what this expression does. But it turns out that I'm right: there is no way of figuring out what this expression does.
You could compile it and try it, but that will only tell you how the compiler you're using today chooses to evaluate it. I guarantee you that there's another compiler out there that will evaluate it differently.
You might ask, which compiler is right? And the answer is, they're both right (or at least, neither of them is wrong). As far as the C language is concerned, this expression is undefined. So there's no right answer, and your compiler isn't wrong no matter what it does.
If this is code that came up in your work, please delete it right away, and figure out what you were really trying to do, and figure out some cleaner, well-defined way of expressing it. And if this code is an assignment, your answer is simply: it's undefined. (If your instructor believes this expression has a well-defined result, you're in the unfortunate position of having an instructor who doesn't know what he's talking about.)
You're reading and writing the variable p multiple times in the same expression without a sequence point, which means the compiler is free to evaluate the two sub-expressions in any order.
This results in undefined behavior, meaning the program's behavior is unpredictable.
For my compiler class, we are gradually creating a pseudo-PASCAL compiler. It does, however, follow the same precedence as C. That being said, in the section where we create prefix and postfix operators, I get 0 for
int a = 1;
int b = 2;
++a - b++ - --b + a--
when C returns a 1. What I don't understand is how you can even get a 1. By doing straight prefix first, the answer should be 2. And by doing postfix first, the answer should be -2. By doing everything left to right, I get zero.
My question is, what should my precedence of my operators be to return a 1?
Operator precedence tells you for example whether ++a - b means (++a) - b or ++(a - b). Clearly it should be the former since the latter isn't even valid. In your implementation it's clearly the former (or you wouldn't be getting a result at all), so you implemented operator precedence correctly.
Operator precedence has nothing to do with the order in which subexpressions are evaluated. In fact the order in which the operands of + and - are evaluated is unspecified in C, and any code that modifies the same variable twice without a sequence point in between invokes undefined behavior. So whichever order you choose is fine and 0 is as valid a result as any other value.
It is illegal to change variables several times in a row like that (roughly, between assignments; the standard talks about sequence points). Technically, this is what the C standard calls undefined behaviour. The compiler has no obligation to detect that you are writing nonsense, and can assume you never will. Anything whatsoever can happen when you run the program (or even while compiling). Also check nasal demons in the Jargon File.
The ++ increment and -- decrement operators can be placed before or after a value, to different effect. If placed before the operand (prefix), its value is immediately changed; if placed after the operand (postfix), its value is noted first, then the value is changed.
McGrath, Mike. (2006). C programming in easy steps, 2nd Edition. United Kingdom : Computer Step.
Our class was asked this question by the C programming prof:
You are given the code:
int x=1;
printf("%d",++x,x+1);
What output will it always produce ?
Most students said undefined behavior. Can anyone help me understand why it is so?
Thanks for the edit and the answers but I'm still confused.
The output is likely to be 2 in every reasonable case. In reality, what you have is undefined behavior though.
Specifically, the standard says:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
There is a sequence point before evaluating the arguments to a function, and a sequence point after all the arguments have been evaluated (but the function not yet called). Between those two (i.e., while the arguments are being evaluated) there is not a sequence point (unless an argument is an expression that includes one internally, such as one using the &&, || or , operator).
That means the call to printf is reading the prior value both to determine the value being stored (i.e., the ++x) and to determine the value of the second argument (i.e., the x+1). This clearly violates the requirement quoted above, resulting in undefined behavior.
The fact that you've provided an extra argument for which no conversion specifier is given does not result in undefined behavior. If you supply fewer arguments than conversion specifiers, or if the (promoted) type of the argument disagrees with that of the conversion specifier, you get undefined behavior -- but passing an extra parameter does not.
Any time the behavior of a program is undefined, anything can happen — the classical phrase is that "demons may fly out of your nose" — although most implementations don't go that far.
The arguments of a function are conceptually evaluated in parallel (the technical term is that there is no sequence point between their evaluation). That means the expressions ++x and x+1 may be evaluated in this order, in the opposite order, or in some interleaved way. When you modify a variable and try to access its value in parallel, the behavior is undefined.
With many implementations, the arguments are evaluated in sequence (though not always from left to right). So you're unlikely to see anything but 2 in the real world.
However, a compiler could generate code like this:
1. Load x into register r1.
2. Calculate x+1 by adding 1 to r1.
3. Calculate ++x by adding 1 to r1. That's ok because x has been loaded into r1. Given how the compiler was designed, step 2 cannot have modified r1, because that could only happen if x was read as well as written between two sequence points. Which is forbidden by the C standard.
4. Store r1 into x.
And on this (hypothetical, but correct) compiler, the program would print 3.
(EDIT: passing an extra argument to printf is correct (§7.19.6.1-2 in N1256); thanks to Prasoon Saurav for pointing this out. Also: added an example.)
The correct answer is: the code produces undefined behavior.
The reason the behavior is undefined is that the two expressions ++x and x + 1 are modifying x and reading x for an unrelated (to modification) reason and these two actions are not separated by a sequence point. This results in undefined behavior in C (and C++). The requirement is given in 6.5/2 of C language standard.
Note, that the undefined behavior in this case has absolutely nothing to do with the fact that printf function is given only one format specifier and two actual arguments. To give more arguments to printf than there are format specifiers in the format string is perfectly legal in C. Again, the problem is rooted in the violation of expression evaluation requirements of C language.
Also note, that some participants of this discussion fail to grasp the concept of undefined behavior, and insist on mixing it with the concept of unspecified behavior. To better illustrate the difference let's consider the following simple example
int inc_x(int *x) { return ++*x; }
int x_plus_1(int x) { return x + 1; }
int x = 1;
printf("%d", inc_x(&x), x_plus_1(x));
The above code is "equivalent" to the original one, except that the operations that involve our x are wrapped into functions. What is going to happen in this latest example?
There's no undefined behavior in this code. But since the order of evaluation of printf arguments is unspecified, this code produces unspecified behavior, i.e. it is possible that printf will be called as printf("%d", 2, 2) or as printf("%d", 2, 3). In both cases the output will indeed be 2. However, the important difference of this variant is that all accesses to x are wrapped into sequence points present at the beginning and at the end of each function, so this variant does not produce undefined behavior.
This is exactly the reasoning some other posters are trying to force onto the original example. But it cannot be done. The original example produces undefined behavior, which is a completely different beast. They are apparently trying to insist that in practice undefined behavior is always equivalent to unspecified behavior. This is a totally bogus claim that only indicates the lack of expertise in those who make it. The original code produces undefined behavior, period.
To continue with the example, let's modify the previous code sample to
printf("%d %d", inc_x(&x), x_plus_1(x));
the output of the code will become generally unpredictable. It can print 2 2 or it can print 2 3. However, note that even though the behavior is unpredictable, it still does not produce undefined behavior. The behavior is unspecified, but not undefined. Unspecified behavior is restricted to two possibilities: either 2 2 or 2 3. Undefined behavior is not restricted to anything. It can format your hard drive instead of printing something. Feel the difference.
Most students said undefined behavior. Can anyone help me understand why it is so?
Because order in which function parameters are calculated is not specified.
What output will it always produce ?
It will produce 2 in all environments I can think of. Strict interpretation of the C99 standard however renders the behaviour undefined because the accesses to x do not meet the requirements that exist between sequence points.
Most students said undefined behavior. Can anyone help me understand why it is so?
I will now address the second question which I understand as "Why do most of the students of my class say that the shown code constitutes undefined behaviour?" and I think no other poster has answered so far. One part of the students will have remembered examples of undefined value of expressions like
f(++i,i)
The code you give fits this pattern, but the students erroneously think that the behaviour is defined anyway because printf ignores the last parameter. This nuance confuses many students. Another part of the students will be as well versed in the standard as David Thornley and say "undefined behaviour" for the correct reasons explained above.
The points made about undefined behavior are correct, but there is one additional wrinkle: printf may fail. It's doing file IO; there are any number of reasons it could fail, and it's impossible to eliminate them without knowing the complete program and the context in which it will be executed.
Echoing codaddict the answer is 2.
printf will be called with argument 2 and it will print it.
If this code is put in a context like:
void do_something()
{
int x=1;
printf("%d",++x,x+1);
}
Then the behaviour of that function is completely and unambiguously defined. I'm not of course arguing that this is good or correct or that the value of x is determinable afterwards.
The output will always be 2 (for 99.98% of the most important standard-compliant compilers and systems).
According to the standard, this seems to be, by definition, "undefined behaviour", a definition/answer that is self-justifying and that says nothing about what actually can happen, and especially why.
The utility splint (which is not a std compliance checking tool), and so splint's programmers, consider this as "unspecified behaviour". This means, basically, that the evaluation of (x+1) can give 1+1 or 2+1, depending on when the update of x is actually done. Since however the expression is discarded (printf format reads 1 argument), the output is unaffected, and we can still say it is 2.
undefined.c:7:20: Argument 2 modifies x, used by argument 3 (order of
evaluation of actual parameters is undefined): printf("%d\n", ++x, x + 1)
Code has unspecified behavior. Order of evaluation of function parameters or
subexpressions is not defined, so if a value is used and modified in
different places not separated by a sequence point constraining evaluation
order, then the result of the expression is unspecified.
As said before, the unspecified behaviour affects just the evaluation of (x+1), not the whole statement or other expressions in it. So in the case of "unspecified behaviour" we can say that the output is 2, and nobody could object.
But this is not unspecified behaviour; it seems to be "undefined behaviour". And the "undefined behaviour" seems to have to be something that affects the whole statement instead of the single expression. This is due to the mystery around where the "undefined behaviour" actually occurs (i.e. what exactly it affects).
If there were reasons to attach the "undefined behaviour" just to the (x+1) expression, as in the "unspecified behaviour" case, then we could still say that the output is always (100%) 2. Attaching the "undefined behaviour" just to (x+1) means that we are not able to say if it is 1+1 or 2+1; it is just "anything". But again, that "anything" is dropped because of the printf, and this means that the answer would be "always (100%) 2".
Instead, because of mysterious asymmetries, the "undefined behaviour" can't be attached just to the x+1, but indeed it must affect at least the ++x (which by the way is responsible for the undefined behaviour), if not the whole statement. If it infects just the ++x expression, the output is an "undefined value", i.e. any integer, e.g. -5847834 or 9032. If it infects the whole statement, then you could see garbage in your console output, and likely you would have to stop the program with ctrl-c, possibly before it starts to choke your cpu.
According to an urban legend, the "undefined behaviour" infects not only the whole program, but also your computer and the laws of physics, so that mysterious creatures can be created by your program and fly away or eat you.
No answers explain anything competently about the topic. They just say "oh see, the standard says this" (and it is just an interpretation, as usual!). So at least you have learned that "standards exist", and that they make educational questions arid (since of course, don't forget that your code is wrong, regardless of undefined/unspecified behaviourism and other standard facts), logical arguments useless, and deep investigation and understanding aimless.